MERS exploration server

Prediction of fine-tuned promoter activity from DNA sequence

Internal identifier: 000897 (Pmc/Curation)

Authors: Geoffrey Siwo [United States, South Africa]; Andrew Rider [United States]; Asako Tan [United States]; Richard Pinapati [United States]; Scott Emrich [United States]; Nitesh Chawla [United States]; Michael Ferdig [United States]

RBID : PMC:4916984

Abstract

The quantitative prediction of the transcriptional activity of genes from promoter sequence is fundamental to the engineering of biological systems for industrial purposes and to understanding the natural variation in gene expression. To catalyze the development of new algorithms for this purpose, the Dialogue on Reverse Engineering Assessment and Methods (DREAM) organized a community challenge seeking predictive models of promoter activity given normalized promoter activity data for 90 ribosomal protein promoters driving expression of a fluorescent reporter gene. By developing an unbiased modeling approach that performs an iterative search for predictive DNA sequence features using the frequencies of various k-mers, inferred DNA mechanical properties and spatial positions of promoter sequences, we achieved best-performer status in this challenge. The specific predictive features used in the model included the frequency of the nucleotide G, the length of polymeric tracts of T and TA, the frequencies of 6 distinct trinucleotides and 12 tetranucleotides, and the predicted protein deformability of the DNA sequence. Our method predicted the activity of 20 natural variants of ribosomal protein promoters (Spearman correlation r = 0.73) more accurately than that of 33 laboratory-mutated variants of the promoters (r = 0.57) in a test set that was hidden from participants. Notably, our model differed substantially from the rest in 2 main ways: i) it did not explicitly utilize transcription factor binding information, implying that subtle DNA sequence features are highly associated with gene expression, and ii) it was based exclusively on features extracted from the 100 bp region upstream of the translational start site, demonstrating that this region encodes much of the overall promoter activity. The findings from this study have important implications for the engineering of predictable gene expression systems and the evolution of gene expression in naturally occurring biological systems.


DOI: 10.12688/f1000research.7485.1
PubMed: 27347373
PubMed Central: 4916984

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Prediction of fine-tuned promoter activity from DNA sequence</title>
<author>
<name sortKey="Siwo, Geoffrey" sort="Siwo, Geoffrey" uniqKey="Siwo G" first="Geoffrey" last="Siwo">Geoffrey Siwo</name>
<affiliation wicri:level="1">
<nlm:aff id="a1">Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="a2">Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Biological Sciences, University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="a4">Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="a5">IBM TJ Watson Research Center, NY, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>IBM TJ Watson Research Center, NY</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="a6">IBM Research-Africa, Johannesburg, South Africa</nlm:aff>
<country xml:lang="fr">Afrique du Sud</country>
<wicri:regionArea>IBM Research-Africa, Johannesburg</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Rider, Andrew" sort="Rider, Andrew" uniqKey="Rider A" first="Andrew" last="Rider">Andrew Rider</name>
<affiliation wicri:level="1">
<nlm:aff id="a1">Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="a3">Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="a4">Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Tan, Asako" sort="Tan, Asako" uniqKey="Tan A" first="Asako" last="Tan">Asako Tan</name>
<affiliation wicri:level="1">
<nlm:aff id="a1">Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="a2">Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Biological Sciences, University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="a7">Epicentre, Madison, WI, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Epicentre, Madison, WI</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Pinapati, Richard" sort="Pinapati, Richard" uniqKey="Pinapati R" first="Richard" last="Pinapati">Richard Pinapati</name>
<affiliation wicri:level="1">
<nlm:aff id="a1">Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="a2">Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Biological Sciences, University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="a4">Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Emrich, Scott" sort="Emrich, Scott" uniqKey="Emrich S" first="Scott" last="Emrich">Scott Emrich</name>
<affiliation wicri:level="1">
<nlm:aff id="a1">Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="a3">Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="a4">Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Chawla, Nitesh" sort="Chawla, Nitesh" uniqKey="Chawla N" first="Nitesh" last="Chawla">Nitesh Chawla</name>
<affiliation wicri:level="1">
<nlm:aff id="a1">Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="a3">Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="a4">Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Ferdig, Michael" sort="Ferdig, Michael" uniqKey="Ferdig M" first="Michael" last="Ferdig">Michael Ferdig</name>
<affiliation wicri:level="1">
<nlm:aff id="a1">Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="a2">Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Biological Sciences, University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="a4">Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">27347373</idno>
<idno type="pmc">4916984</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4916984</idno>
<idno type="RBID">PMC:4916984</idno>
<idno type="doi">10.12688/f1000research.7485.1</idno>
<date when="2016">2016</date>
<idno type="wicri:Area/Pmc/Corpus">000897</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000897</idno>
<idno type="wicri:Area/Pmc/Curation">000897</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000897</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Prediction of fine-tuned promoter activity from DNA sequence</title>
<author>
<name sortKey="Siwo, Geoffrey" sort="Siwo, Geoffrey" uniqKey="Siwo G" first="Geoffrey" last="Siwo">Geoffrey Siwo</name>
<affiliation wicri:level="1">
<nlm:aff id="a1">Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="a2">Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Biological Sciences, University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="a4">Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="a5">IBM TJ Watson Research Center, NY, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>IBM TJ Watson Research Center, NY</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="a6">IBM Research-Africa, Johannesburg, South Africa</nlm:aff>
<country xml:lang="fr">Afrique du Sud</country>
<wicri:regionArea>IBM Research-Africa, Johannesburg</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Rider, Andrew" sort="Rider, Andrew" uniqKey="Rider A" first="Andrew" last="Rider">Andrew Rider</name>
<affiliation wicri:level="1">
<nlm:aff id="a1">Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="a3">Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="a4">Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Tan, Asako" sort="Tan, Asako" uniqKey="Tan A" first="Asako" last="Tan">Asako Tan</name>
<affiliation wicri:level="1">
<nlm:aff id="a1">Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="a2">Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Biological Sciences, University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="a7">Epicentre, Madison, WI, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Epicentre, Madison, WI</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Pinapati, Richard" sort="Pinapati, Richard" uniqKey="Pinapati R" first="Richard" last="Pinapati">Richard Pinapati</name>
<affiliation wicri:level="1">
<nlm:aff id="a1">Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="a2">Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Biological Sciences, University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="a4">Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Emrich, Scott" sort="Emrich, Scott" uniqKey="Emrich S" first="Scott" last="Emrich">Scott Emrich</name>
<affiliation wicri:level="1">
<nlm:aff id="a1">Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="a3">Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="a4">Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Chawla, Nitesh" sort="Chawla, Nitesh" uniqKey="Chawla N" first="Nitesh" last="Chawla">Nitesh Chawla</name>
<affiliation wicri:level="1">
<nlm:aff id="a1">Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="a3">Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="a4">Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Ferdig, Michael" sort="Ferdig, Michael" uniqKey="Ferdig M" first="Michael" last="Ferdig">Michael Ferdig</name>
<affiliation wicri:level="1">
<nlm:aff id="a1">Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="a2">Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Biological Sciences, University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="a4">Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN</wicri:regionArea>
</affiliation>
</author>
</analytic>
<series>
<title level="j">F1000Research</title>
<idno type="eISSN">2046-1402</idno>
<imprint>
<date when="2016">2016</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p>The quantitative prediction of the transcriptional activity of genes from promoter sequence is fundamental to the engineering of biological systems for industrial purposes and to understanding the natural variation in gene expression. To catalyze the development of new algorithms for this purpose, the Dialogue on Reverse Engineering Assessment and Methods (DREAM) organized a community challenge seeking predictive models of promoter activity given normalized promoter activity data for 90 ribosomal protein promoters driving expression of a fluorescent reporter gene. By developing an unbiased modeling approach that performs an iterative search for predictive DNA sequence features using the frequencies of various k-mers, inferred DNA mechanical properties and spatial positions of promoter sequences, we achieved best-performer status in this challenge. The specific predictive features used in the model included the frequency of the nucleotide G, the length of polymeric tracts of T and TA, the frequencies of 6 distinct trinucleotides and 12 tetranucleotides, and the predicted protein deformability of the DNA sequence. Our method predicted the activity of 20 natural variants of ribosomal protein promoters (Spearman correlation r = 0.73) more accurately than that of 33 laboratory-mutated variants of the promoters (r = 0.57) in a test set that was hidden from participants. Notably, our model differed substantially from the rest in 2 main ways: i) it did not explicitly utilize transcription factor binding information, implying that subtle DNA sequence features are highly associated with gene expression, and ii) it was based exclusively on features extracted from the 100 bp region upstream of the translational start site, demonstrating that this region encodes much of the overall promoter activity. The findings from this study have important implications for the engineering of predictable gene expression systems and the evolution of gene expression in naturally occurring biological systems.</p>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Schadt, Ee" uniqKey="Schadt E">EE Schadt</name>
</author>
<author>
<name sortKey="Monks, Sa" uniqKey="Monks S">SA Monks</name>
</author>
<author>
<name sortKey="Drake, Ta" uniqKey="Drake T">TA Drake</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tirosh, I" uniqKey="Tirosh I">I Tirosh</name>
</author>
<author>
<name sortKey="Reikhav, S" uniqKey="Reikhav S">S Reikhav</name>
</author>
<author>
<name sortKey="Sigal, N" uniqKey="Sigal N">N Sigal</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tirosh, I" uniqKey="Tirosh I">I Tirosh</name>
</author>
<author>
<name sortKey="Weinberger, A" uniqKey="Weinberger A">A Weinberger</name>
</author>
<author>
<name sortKey="Carmi, M" uniqKey="Carmi M">M Carmi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Field, Y" uniqKey="Field Y">Y Field</name>
</author>
<author>
<name sortKey="Fondufe Mittendorf, Y" uniqKey="Fondufe Mittendorf Y">Y Fondufe-Mittendorf</name>
</author>
<author>
<name sortKey="Moore, Ik" uniqKey="Moore I">IK Moore</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gonzales, Jm" uniqKey="Gonzales J">JM Gonzales</name>
</author>
<author>
<name sortKey="Patel, Jj" uniqKey="Patel J">JJ Patel</name>
</author>
<author>
<name sortKey="Ponmee, N" uniqKey="Ponmee N">N Ponmee</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ellis, T" uniqKey="Ellis T">T Ellis</name>
</author>
<author>
<name sortKey="Wang, X" uniqKey="Wang X">X Wang</name>
</author>
<author>
<name sortKey="Collins, Jj" uniqKey="Collins J">JJ Collins</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gertz, J" uniqKey="Gertz J">J Gertz</name>
</author>
<author>
<name sortKey="Cohen, Ba" uniqKey="Cohen B">BA Cohen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gertz, J" uniqKey="Gertz J">J Gertz</name>
</author>
<author>
<name sortKey="Siggia, Ed" uniqKey="Siggia E">ED Siggia</name>
</author>
<author>
<name sortKey="Cohen, Ba" uniqKey="Cohen B">BA Cohen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kim, Hd" uniqKey="Kim H">HD Kim</name>
</author>
<author>
<name sortKey="Shay, T" uniqKey="Shay T">T Shay</name>
</author>
<author>
<name sortKey="O Hea, Ek" uniqKey="O Hea E">EK O’Shea</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Segal, E" uniqKey="Segal E">E Segal</name>
</author>
<author>
<name sortKey="Widom, J" uniqKey="Widom J">J Widom</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Takahashi, K" uniqKey="Takahashi K">K Takahashi</name>
</author>
<author>
<name sortKey="Yamanaka, S" uniqKey="Yamanaka S">S Yamanaka</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kim, Hd" uniqKey="Kim H">HD Kim</name>
</author>
<author>
<name sortKey="O Hea, Ek" uniqKey="O Hea E">EK O’Shea</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Irie, T" uniqKey="Irie T">T Irie</name>
</author>
<author>
<name sortKey="Park, Sj" uniqKey="Park S">SJ Park</name>
</author>
<author>
<name sortKey="Yamashita, R" uniqKey="Yamashita R">R Yamashita</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cookson, W" uniqKey="Cookson W">W Cookson</name>
</author>
<author>
<name sortKey="Liang, L" uniqKey="Liang L">L Liang</name>
</author>
<author>
<name sortKey="Abecasis, G" uniqKey="Abecasis G">G Abecasis</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Karczewski, Kj" uniqKey="Karczewski K">KJ Karczewski</name>
</author>
<author>
<name sortKey="Tatonetti, Np" uniqKey="Tatonetti N">NP Tatonetti</name>
</author>
<author>
<name sortKey="Landt, Sg" uniqKey="Landt S">SG Landt</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mjolsness, E" uniqKey="Mjolsness E">E Mjolsness</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Das, D" uniqKey="Das D">D Das</name>
</author>
<author>
<name sortKey="Banerjee, N" uniqKey="Banerjee N">N Banerjee</name>
</author>
<author>
<name sortKey="Zhang, Mq" uniqKey="Zhang M">MQ Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lam, Fh" uniqKey="Lam F">FH Lam</name>
</author>
<author>
<name sortKey="Steger, Dj" uniqKey="Steger D">DJ Steger</name>
</author>
<author>
<name sortKey="O Hea, Ek" uniqKey="O Hea E">EK O’Shea</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mirny, La" uniqKey="Mirny L">LA Mirny</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, Xy" uniqKey="Li X">XY Li</name>
</author>
<author>
<name sortKey="Thomas, S" uniqKey="Thomas S">S Thomas</name>
</author>
<author>
<name sortKey="Sabo, Pj" uniqKey="Sabo P">PJ Sabo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Choi, Jk" uniqKey="Choi J">JK Choi</name>
</author>
<author>
<name sortKey="Kim, Yj" uniqKey="Kim Y">YJ Kim</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lidor Nili, E" uniqKey="Lidor Nili E">E Lidor Nili</name>
</author>
<author>
<name sortKey="Field, Y" uniqKey="Field Y">Y Field</name>
</author>
<author>
<name sortKey="Lubling, Y" uniqKey="Lubling Y">Y Lubling</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Raveh Sadka, T" uniqKey="Raveh Sadka T">T Raveh-Sadka</name>
</author>
<author>
<name sortKey="Levo, M" uniqKey="Levo M">M Levo</name>
</author>
<author>
<name sortKey="Segal, E" uniqKey="Segal E">E Segal</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Segal, E" uniqKey="Segal E">E Segal</name>
</author>
<author>
<name sortKey="Widom, J" uniqKey="Widom J">J Widom</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kaplan, N" uniqKey="Kaplan N">N Kaplan</name>
</author>
<author>
<name sortKey="Moore, Ik" uniqKey="Moore I">IK Moore</name>
</author>
<author>
<name sortKey="Fondufe Mittendorf, Y" uniqKey="Fondufe Mittendorf Y">Y Fondufe-Mittendorf</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Van Der Heijden, T" uniqKey="Van Der Heijden T">T van der Heijden</name>
</author>
<author>
<name sortKey="Van Vugt, Jj" uniqKey="Van Vugt J">JJ van Vugt</name>
</author>
<author>
<name sortKey="Logie, C" uniqKey="Logie C">C Logie</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Segal, E" uniqKey="Segal E">E Segal</name>
</author>
<author>
<name sortKey="Widom, J" uniqKey="Widom J">J Widom</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lee, Ck" uniqKey="Lee C">CK Lee</name>
</author>
<author>
<name sortKey="Shibata, Y" uniqKey="Shibata Y">Y Shibata</name>
</author>
<author>
<name sortKey="Rao, B" uniqKey="Rao B">B Rao</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Shivaswamy, S" uniqKey="Shivaswamy S">S Shivaswamy</name>
</author>
<author>
<name sortKey="Bhinge, A" uniqKey="Bhinge A">A Bhinge</name>
</author>
<author>
<name sortKey="Zhao, Y" uniqKey="Zhao Y">Y Zhao</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zeevi, D" uniqKey="Zeevi D">D Zeevi</name>
</author>
<author>
<name sortKey="Sharon, E" uniqKey="Sharon E">E Sharon</name>
</author>
<author>
<name sortKey="Lotan Pompan, M" uniqKey="Lotan Pompan M">M Lotan-Pompan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yang, Yh" uniqKey="Yang Y">YH Yang</name>
</author>
<author>
<name sortKey="Dudoit, S" uniqKey="Dudoit S">S Dudoit</name>
</author>
<author>
<name sortKey="Luu, P" uniqKey="Luu P">P Luu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Oshlack, A" uniqKey="Oshlack A">A Oshlack</name>
</author>
<author>
<name sortKey="Wakefield, Mj" uniqKey="Wakefield M">MJ Wakefield</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kalir, S" uniqKey="Kalir S">S Kalir</name>
</author>
<author>
<name sortKey="Mcclure, J" uniqKey="Mcclure J">J McClure</name>
</author>
<author>
<name sortKey="Pabbaraju, K" uniqKey="Pabbaraju K">K Pabbaraju</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Meyer, P" uniqKey="Meyer P">P Meyer</name>
</author>
<author>
<name sortKey="Siwo, G" uniqKey="Siwo G">G Siwo</name>
</author>
<author>
<name sortKey="Zeevi, D" uniqKey="Zeevi D">D Zeevi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brukner, I" uniqKey="Brukner I">I Brukner</name>
</author>
<author>
<name sortKey="Sanchez, R" uniqKey="Sanchez R">R Sánchez</name>
</author>
<author>
<name sortKey="Suck, D" uniqKey="Suck D">D Suck</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Olson, Wk" uniqKey="Olson W">WK Olson</name>
</author>
<author>
<name sortKey="Gorin, Aa" uniqKey="Gorin A">AA Gorin</name>
</author>
<author>
<name sortKey="Lu, Xj" uniqKey="Lu X">XJ Lu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sivolob, Av" uniqKey="Sivolob A">AV Sivolob</name>
</author>
<author>
<name sortKey="Khrapunov, Sn" uniqKey="Khrapunov S">SN Khrapunov</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Raveh Sadka, T" uniqKey="Raveh Sadka T">T Raveh-Sadka</name>
</author>
<author>
<name sortKey="Levo, M" uniqKey="Levo M">M Levo</name>
</author>
<author>
<name sortKey="Shabi, U" uniqKey="Shabi U">U Shabi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lascaris, Rf" uniqKey="Lascaris R">RF Lascaris</name>
</author>
<author>
<name sortKey="Mager, Wh" uniqKey="Mager W">WH Mager</name>
</author>
<author>
<name sortKey="Planta, Rj" uniqKey="Planta R">RJ Planta</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Packer, Mj" uniqKey="Packer M">MJ Packer</name>
</author>
<author>
<name sortKey="Dauncey, Mp" uniqKey="Dauncey M">MP Dauncey</name>
</author>
<author>
<name sortKey="Hunter, Ca" uniqKey="Hunter C">CA Hunter</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Laurens, N" uniqKey="Laurens N">N Laurens</name>
</author>
<author>
<name sortKey="Rusling, Da" uniqKey="Rusling D">DA Rusling</name>
</author>
<author>
<name sortKey="Pernstich, C" uniqKey="Pernstich C">C Pernstich</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Starr, Db" uniqKey="Starr D">DB Starr</name>
</author>
<author>
<name sortKey="Hoopes, Bc" uniqKey="Hoopes B">BC Hoopes</name>
</author>
<author>
<name sortKey="Hawley, Dk" uniqKey="Hawley D">DK Hawley</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vijayan, V" uniqKey="Vijayan V">V Vijayan</name>
</author>
<author>
<name sortKey="Zuzow, R" uniqKey="Zuzow R">R Zuzow</name>
</author>
<author>
<name sortKey="O Hea, Ek" uniqKey="O Hea E">EK O’Shea</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Parvin, Jd" uniqKey="Parvin J">JD Parvin</name>
</author>
<author>
<name sortKey="Mccormick, Rj" uniqKey="Mccormick R">RJ McCormick</name>
</author>
<author>
<name sortKey="Sharp, Pa" uniqKey="Sharp P">PA Sharp</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bosio, Mc" uniqKey="Bosio M">MC Bosio</name>
</author>
<author>
<name sortKey="Negri, R" uniqKey="Negri R">R Negri</name>
</author>
<author>
<name sortKey="Dieci, G" uniqKey="Dieci G">G Dieci</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yonetani, Y" uniqKey="Yonetani Y">Y Yonetani</name>
</author>
<author>
<name sortKey="Kono, H" uniqKey="Kono H">H Kono</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, B" uniqKey="Li B">B Li</name>
</author>
<author>
<name sortKey="Vilardell, J" uniqKey="Vilardell J">J Vilardell</name>
</author>
<author>
<name sortKey="Warner, Jr" uniqKey="Warner J">JR Warner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Deutschbauer, Am" uniqKey="Deutschbauer A">AM Deutschbauer</name>
</author>
<author>
<name sortKey="Jaramillo, Df" uniqKey="Jaramillo D">DF Jaramillo</name>
</author>
<author>
<name sortKey="Proctor, M" uniqKey="Proctor M">M Proctor</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Warner, Jr" uniqKey="Warner J">JR Warner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Spahn, Cm" uniqKey="Spahn C">CM Spahn</name>
</author>
<author>
<name sortKey="Beckmann, R" uniqKey="Beckmann R">R Beckmann</name>
</author>
<author>
<name sortKey="Eswar, N" uniqKey="Eswar N">N Eswar</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ju, Q" uniqKey="Ju Q">Q Ju</name>
</author>
<author>
<name sortKey="Warner, Jr" uniqKey="Warner J">JR Warner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Causton, Hc" uniqKey="Causton H">HC Causton</name>
</author>
<author>
<name sortKey="Ren, B" uniqKey="Ren B">B Ren</name>
</author>
<author>
<name sortKey="Koh, Ss" uniqKey="Koh S">SS Koh</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Oinn, T" uniqKey="Oinn T">T Oinn</name>
</author>
<author>
<name sortKey="Addis, M" uniqKey="Addis M">M Addis</name>
</author>
<author>
<name sortKey="Ferris, J" uniqKey="Ferris J">J Ferris</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Go I, Jr" uniqKey="Go I J">JR Goñi</name>
</author>
<author>
<name sortKey="Fenollosa, C" uniqKey="Fenollosa C">C Fenollosa</name>
</author>
<author>
<name sortKey="Perez, A" uniqKey="Perez A">A Pérez</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Satchwell, Sc" uniqKey="Satchwell S">SC Satchwell</name>
</author>
<author>
<name sortKey="Drew, Hr" uniqKey="Drew H">HR Drew</name>
</author>
<author>
<name sortKey="Travers, Aa" uniqKey="Travers A">AA Travers</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hall, M" uniqKey="Hall M">M Hall</name>
</author>
<author>
<name sortKey="Frank, E" uniqKey="Frank E">E Frank</name>
</author>
<author>
<name sortKey="Holmes, G" uniqKey="Holmes G">G Holmes</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Siwo, G" uniqKey="Siwo G">G Siwo</name>
</author>
<author>
<name sortKey="Rider, A" uniqKey="Rider A">A Rider</name>
</author>
<author>
<name sortKey="Tan, A" uniqKey="Tan A">A Tan</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="methods-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">F1000Res</journal-id>
<journal-id journal-id-type="iso-abbrev">F1000Res</journal-id>
<journal-id journal-id-type="pmc">F1000Research</journal-id>
<journal-title-group>
<journal-title>F1000Research</journal-title>
</journal-title-group>
<issn pub-type="epub">2046-1402</issn>
<publisher>
<publisher-name>F1000Research</publisher-name>
<publisher-loc>London, UK</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">27347373</article-id>
<article-id pub-id-type="pmc">4916984</article-id>
<article-id pub-id-type="doi">10.12688/f1000research.7485.1</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Method Article</subject>
</subj-group>
<subj-group>
<subject>Articles</subject>
<subj-group>
<subject>Bioinformatics</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Prediction of fine-tuned promoter activity from DNA sequence</article-title>
<fn-group content-type="pub-status">
<fn>
<p>[version 1; referees: 1 approved, 2 approved with reservations]</p>
</fn>
</fn-group>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Siwo</surname>
<given-names>Geoffrey</given-names>
</name>
<xref ref-type="corresp" rid="c1">a</xref>
<xref ref-type="aff" rid="a1">1</xref>
<xref ref-type="aff" rid="a2">2</xref>
<xref ref-type="aff" rid="a4">4</xref>
<xref ref-type="aff" rid="a5">5</xref>
<xref ref-type="aff" rid="a6">6</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Rider</surname>
<given-names>Andrew</given-names>
</name>
<xref ref-type="aff" rid="a1">1</xref>
<xref ref-type="aff" rid="a3">3</xref>
<xref ref-type="aff" rid="a4">4</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Tan</surname>
<given-names>Asako</given-names>
</name>
<xref ref-type="aff" rid="a1">1</xref>
<xref ref-type="aff" rid="a2">2</xref>
<xref ref-type="aff" rid="a7">7</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Pinapati</surname>
<given-names>Richard</given-names>
</name>
<xref ref-type="aff" rid="a1">1</xref>
<xref ref-type="aff" rid="a2">2</xref>
<xref ref-type="aff" rid="a4">4</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Emrich</surname>
<given-names>Scott</given-names>
</name>
<xref ref-type="aff" rid="a1">1</xref>
<xref ref-type="aff" rid="a3">3</xref>
<xref ref-type="aff" rid="a4">4</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Chawla</surname>
<given-names>Nitesh</given-names>
</name>
<xref ref-type="aff" rid="a1">1</xref>
<xref ref-type="aff" rid="a3">3</xref>
<xref ref-type="aff" rid="a4">4</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Ferdig</surname>
<given-names>Michael</given-names>
</name>
<xref ref-type="aff" rid="a1">1</xref>
<xref ref-type="aff" rid="a2">2</xref>
<xref ref-type="aff" rid="a4">4</xref>
</contrib>
<aff id="a1">
<label>1</label>
Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN, USA</aff>
<aff id="a2">
<label>2</label>
Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, USA</aff>
<aff id="a3">
<label>3</label>
Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA</aff>
<aff id="a4">
<label>4</label>
Interdisciplinary Center for Network Science and Applications (iCeNSA), University of Notre Dame, Notre Dame, IN, USA</aff>
<aff id="a5">
<label>5</label>
IBM TJ Watson Research Center, NY, USA</aff>
<aff id="a6">
<label>6</label>
IBM Research-Africa, Johannesburg, South Africa</aff>
<aff id="a7">
<label>7</label>
Epicentre, Madison, WI, USA</aff>
</contrib-group>
<author-notes>
<corresp id="c1">
<label>a</label>
<email xlink:href="mailto:siwomolbio@gmail.com">siwomolbio@gmail.com</email>
</corresp>
<fn fn-type="con">
<p>GHS, RSP, AT, AKR conceived the methods and performed the analysis. All authors wrote the manuscript.</p>
</fn>
<fn fn-type="COI-statement">
<p>
<bold>Competing interests: </bold>
The authors declare that they have no competing interests.</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>11</day>
<month>2</month>
<year>2016</year>
</pub-date>
<pub-date pub-type="collection">
<year>2016</year>
</pub-date>
<volume>5</volume>
<elocation-id>158</elocation-id>
<history>
<date date-type="accepted">
<day>8</day>
<month>2</month>
<year>2016</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright: © 2016 Siwo G et al.</copyright-statement>
<copyright-year>2016</copyright-year>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:type="simple" xlink:href="f1000research-5-8064.pdf"></self-uri>
<abstract>
<p>The quantitative prediction of transcriptional activity of genes using promoter sequence is fundamental to the engineering of biological systems for industrial purposes and understanding the natural variation in gene expression. To catalyze the development of new algorithms for this purpose, the Dialogue on Reverse Engineering Assessment and Methods (DREAM) organized a community challenge seeking predictive models of promoter activity given normalized promoter activity data for 90 ribosomal protein promoters driving expression of a fluorescent reporter gene. By developing an unbiased modeling approach that performs an iterative search for predictive DNA sequence features using the frequencies of various k-mers, inferred DNA mechanical properties and spatial positions of promoter sequences, we achieved the best performer status in this challenge. The specific predictive features used in the model included the frequency of the nucleotide G, the length of polymeric tracts of T and TA, the frequencies of 6 distinct trinucleotides and 12 tetranucleotides, and the predicted protein deformability of the DNA sequence. Our method accurately predicted the activity of 20 natural variants of ribosomal protein promoters (Spearman correlation r = 0.73) as compared to 33 laboratory-mutated variants of the promoters (r = 0.57) in a test set that was hidden from participants. Notably, our model differed substantially from the rest in 2 main ways: i) it did not explicitly utilize transcription factor binding information implying that subtle DNA sequence features are highly associated with gene expression, and ii) it was entirely based on features extracted exclusively from the 100 bp region upstream from the translational start site demonstrating that this region encodes much of the overall promoter activity. 
The findings from this study have important implications for the engineering of predictable gene expression systems and the evolution of gene expression in naturally occurring biological systems.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Promoter activity</kwd>
<kwd>Gene expression</kwd>
<kwd>Expression prediction</kwd>
<kwd>DREAM challenges</kwd>
<kwd>Machine learning</kwd>
<kwd>Gene regulation</kwd>
<kwd>DNA sequence</kwd>
<kwd>Transcription modeling</kwd>
</kwd-group>
<funding-group>
<funding-statement>The author(s) declared that no grants were involved in supporting this work.</funding-statement>
</funding-group>
</article-meta>
</front>
<sub-article id="report14225" article-type="peer-review">
<front-stub>
<article-id pub-id-type="doi">10.5256/f1000research.8064.r14225</article-id>
<title-group>
<article-title>Referee response for version 1</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Grau</surname>
<given-names>Jan</given-names>
</name>
<xref ref-type="aff" rid="r14225a1">1</xref>
<role>Referee</role>
</contrib>
<aff id="r14225a1">
<label>1</label>
Institute of Computer Science, Martin Luther University of Halle-Wittenberg, Halle, Germany</aff>
</contrib-group>
<author-notes>
<fn fn-type="COI-statement">
<p>
<bold>Competing interests: </bold>
No competing interests were disclosed.</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>20</day>
<month>6</month>
<year>2016</year>
</pub-date>
<related-article id="d35e3974" related-article-type="peer-reviewed-article" ext-link-type="doi" xlink:href="10.12688/f1000research.7485.1">Version 1</related-article>
<custom-meta-group>
<custom-meta>
<meta-name>recommendation</meta-name>
<meta-value>approve-with-reservations</meta-value>
</custom-meta>
</custom-meta-group>
</front-stub>
<body>
<p>The authors present FIrST, an approach for predicting promoter activity from sequence, which won one of the DREAM6 challenges. FIrST uses only simple sequence features in a limited range (100 bp) upstream of the translation start site for making its predictions, which distinguishes it from several other approaches in this field.</p>
<p> Prediction results are convincing and the method appears to be sound. However, the method is currently not described in enough detail. In addition, I have a few further major and minor concerns regarding the current version of the manuscript:</p>
<p> Major comments:
<list list-type="order">
<list-item>
<p>In the list of features described in section "Feature extraction", some seem redundant to me. For instance, the trinucleotide parameters for bendability are just computed from the k-mers for k=3. Also nucleosome binding prediction was based on trinucleotide preference. Please explain why it may be useful to also include those 3-mer-derived features in addition to the 3-mers themselves.</p>
</list-item>
<list-item>
<p>The description of methods in section "Machine learning model exploration" is too coarse. Please provide more detail on the SVMs, linear regression, and regression trees employed. It also remains unclear if the scales of features are normalized somehow, before their values are provided to the SVM.</p>
</list-item>
<list-item>
<p>No details are given on the selected 3-mers and 4-mers (Table 1). Please provide a list of the specific k-mers selected by FIrST. It may also be reasonable to discuss potential biological reasons for their importance (as partly covered for TATA-boxes on page 6).</p>
</list-item>
<list-item>
<p>Considering Fig. 3, I wondered if the difference in deformability may be related to transcription initiation. Or, stated differently, might we observe an even clearer signal if all sequences (and their deformability profiles) were aligned by the transcription start site (TSS) instead of the translation start site (TrSS)? One idea in the same direction, which could contribute to the novelty of the manuscript, would be to evaluate similar profiles (of sequences aligned to TSS or TrSS) for all features found to be informative by FIrST. For instance, one could expect to see something like general fluctuations of G/C content, or the TATA-box in 4-mer profiles as a spike approx. 35 bp before the TSS. From my perspective, this might improve the novelty of the manuscript and the interpretation of features.</p>
</list-item>
</list>
</p>
<p> Minor comments:
<list list-type="order">
<list-item>
<p>The data from the DREAM6 challenge only consider a special subset of genes (ribosomal genes) and only in yeast. It is unclear if the features derived by the authors' method would also be informative for higher eukaryotes. I understand that this question cannot be finally answered from the DREAM6 data, but the authors might comment on this issue.</p>
</list-item>
<list-item>
<p>Figure 1B remains a bit unclear. In the caption and the main text, the authors explain that they use non-overlapping 100bp sub-sequences. However, from the figure it rather seems that they consider upstream sequences of 300 bp, 200 bp and 100 bp (and the full promoter sequence) relative to the translation start site. Please clarify.</p>
</list-item>
<list-item>
<p>In section "DREAM6 challenge data" of "Methods", the authors refer to "the sequence 1200 bp upstream of a gene", where "upstream of the translation start site" (as in the remainder of the text) would be more specific.</p>
</list-item>
<list-item>
<p>In section "Feature extraction", the authors explain that "each promoter sequence was divided into 100 bp non-overlapping windows", while in the previous section they explain that the full 1200 bp sequences do not extend over the nearest gene. From my understanding, this may result in some of the sequences being shorter than 1200 bp, and their length might not be divisible by 100. Please explain how such cases are handled.</p>
</list-item>
<list-item>
<p>At the end of section "Validation of model by DREAM6 consortium", the authors explain that "the overall score was defined as the product of the four P-values", whereas later they explain that -log
<sub>10</sub>
of the geometric mean of the p-values was used as the overall measure. Although both definitions are equivalent with respect to the resulting ranking, I would suggest providing one consistent definition of the overall score.</p>
</list-item>
<list-item>
<p>From the manuscript it did not become fully clear if the TA-tracts (also termed poly(dA-dT) tracts in some parts of the manuscript) are tracts of poly "A or T" or tracts of poly "AT"-dinucleotides.</p>
</list-item>
<list-item>
<p>In section "Error profile of SVM promoter activity model", the authors explain that natural promoters had (slightly) lower activity than synthetic promoters and that the prediction error of the SVM is lower for natural promoters. However, I did not understand why this should explain that low-activity genes had larger prediction errors.</p>
</list-item>
<list-item>
<p>In section "Error profile of SVM promoter activity model", the authors explain that one reason why FIrST did not perform well for synthetic promoters is that most mutations had been introduced outside the 100 bp range considered by FIrST. However, this reasoning partly contradicts the authors' claim that most of the transcriptional activity may be explained from the sequence in that 100 bp window. If this were truly the case, mutations outside this range should have only minor effects.</p>
</list-item>
<list-item>
<p>In the Discussion, the authors mention that TF binding motifs are 6 to 8 bp in length. While this may be true for several yeast TFs, it is not correct for eukaryotes in general and motifs may be wider than 10 bp.</p>
</list-item>
</list>
</p>
<p>I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
</body>
</sub-article>
<sub-article id="report12380" article-type="peer-review">
<front-stub>
<article-id pub-id-type="doi">10.5256/f1000research.8064.r12380</article-id>
<title-group>
<article-title>Referee response for version 1</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Ruan</surname>
<given-names>Jianhua</given-names>
</name>
<xref ref-type="aff" rid="r12380a1">1</xref>
<role>Referee</role>
</contrib>
<aff id="r12380a1">
<label>1</label>
Department of Computer Science, The University of Texas at San Antonio, San Antonio, TX, USA</aff>
</contrib-group>
<author-notes>
<fn fn-type="COI-statement">
<p>
<bold>Competing interests: </bold>
No competing interests were disclosed.</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>7</day>
<month>4</month>
<year>2016</year>
</pub-date>
<related-article id="d35e4079" related-article-type="peer-reviewed-article" ext-link-type="doi" xlink:href="10.12688/f1000research.7485.1">Version 1</related-article>
<custom-meta-group>
<custom-meta>
<meta-name>recommendation</meta-name>
<meta-value>approve-with-reservations</meta-value>
</custom-meta>
</custom-meta-group>
</front-stub>
<body>
<p>This article describes the winning method of the DREAM6 promoter activity prediction challenge. While a meta-analysis of the methods that participated in the challenge has already been published (Meyer
<italic>et al.</italic>
, 2013), this article provides more details of the winning method and some additional analysis of the predictive model, which may lead to a better understanding of the predictability of gene transcription. While its contribution is not in doubt, this article should be revised to address several issues:</p>
<p>Major issues:
<list list-type="order">
<list-item>
<p>The 23 features utilized by the SVM model (as well as their coefficients in the model) are not provided explicitly in either the main text or the supplement file. Table 1 in the main text shows that 6 trinucleotides and 12 tetranucleotides are important features, but nowhere is it stated which tri- and tetranucleotides they are. For lengths of T or TA-tracts, the supplement file shows several different values, including mean, median and stdev. It is unclear which one is actually used by the SVM model. Similarly, the supplement file shows 79 values for deformability and it is unknown which one is used.</p>
</list-item>
<list-item>
<p>In the case of ranking the features by their SVM coefficients, the authors need to clarify whether the feature values were normalized prior to model building, as these features are on very different scales and, if not normalized, the ranking of the coefficients is not very meaningful.</p>
</list-item>
<list-item>
<p>The main conclusion in the subsection "Error profile of SVM promoter activity model" does not seem to make sense. First, promoters of low activity had larger prediction error. Then the authors stated that natural promoters had lower activity. This seems to contradict their observation that the prediction error was significantly less for natural promoters than for mutated promoters.</p>
</list-item>
</list>
</p>
<p>Minor issues:
<list list-type="order">
<list-item>
<p>The authors only mention that feature selection was done in WEKA with a wrapper. More details need to be given. For example, what was the selection strategy used by the wrapper, e.g., exhaustive search, greedy forward search, backward search, or other types of heuristics?</p>
</list-item>
<list-item>
<p>What is the purpose of first training 1000 SVM classifiers using 66% of data as training and 34% as testing, and then another 500 SVM classifiers using 80% as training and 20% as testing?</p>
</list-item>
</list>
</p>
<p>I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="rep-ref-12380-1">
<label>1</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Meyer</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Siwo</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Zeevi</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Sharon</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Norel</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Segal</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Stolovitzky</surname>
<given-names>G</given-names>
</name>
</person-group>
:
<article-title>Inferring gene expression from ribosomal promoter sequences, a crowdsourcing approach.</article-title>
<source>
<italic>Genome Res</italic>
</source>
.
<year>2013</year>
;
<volume>23</volume>
(
<issue>11</issue>
) :
<elocation-id>10.1101/gr.157420.113</elocation-id>
<fpage>1928</fpage>
-
<lpage>37</lpage>
<pub-id pub-id-type="doi">10.1101/gr.157420.113</pub-id>
<pub-id pub-id-type="pmid">23950146</pub-id>
</mixed-citation>
</ref>
</ref-list>
</back>
</sub-article>
<sub-article id="report12530" article-type="peer-review">
<front-stub>
<article-id pub-id-type="doi">10.5256/f1000research.8064.r12530</article-id>
<title-group>
<article-title>Referee response for version 1</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Pavlidis</surname>
<given-names>Paul</given-names>
</name>
<xref ref-type="aff" rid="r12530a1">1</xref>
<role>Referee</role>
</contrib>
<aff id="r12530a1">
<label>1</label>
Centre for High-Throughput Biology and Department of Psychiatry, University of British Columbia, Vancouver, BC, Canada</aff>
</contrib-group>
<author-notes>
<fn fn-type="COI-statement">
<p>
<bold>Competing interests: </bold>
No competing interests were disclosed.</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>2</day>
<month>3</month>
<year>2016</year>
</pub-date>
<related-article id="d35e4232" related-article-type="peer-reviewed-article" ext-link-type="doi" xlink:href="10.12688/f1000research.7485.1">Version 1</related-article>
<custom-meta-group>
<custom-meta>
<meta-name>recommendation</meta-name>
<meta-value>approve</meta-value>
</custom-meta>
</custom-meta-group>
</front-stub>
<body>
<p>Siwo
<italic>et al</italic>
. give a detailed report of their entry to the DREAM promoter activity prediction assessment, conducted in 2011. The paper describing the results of the assessment appeared in 2013 (Meyer
<italic>et al</italic>
.), and the entry from Siwo
<italic>et al</italic>
. (“FiRST”) was the top-performer overall. Meyer
<italic>et al</italic>
. give few details about the specific methods, mentioning only that the FiRST entry used an SVM and did not use TF binding site motif information. Here it is clarified that FiRST is a simple method that uses only part of the sequence and that the most prominent features were about nucleotide content.</p>
<p>Because it is perhaps a little eye-opening (even embarrassing, depending on one’s point of view) that the best method in the assessment is so simple, this paper is an important footnote to Meyer
<italic>et al</italic>
. but it could be fleshed out further to get at what is going on. My suggestions for revisions are to give more detail about the properties of the sequences used and the relationship to performance.</p>
<p>FiRST predicts from only the 100 bases of sequence upstream of the translation start (which was considered as part of the promoter by DREAM; I note this is not “upstream of the gene” as described by Siwo
<italic>et al</italic>
. in the methods section), and its predictions were dominated by the effect of a simple measure of G content. Siwo
<italic>et al</italic>
. report that they did worse at predicting the synthetically mutated promoters (this was apparently not true overall across methods as reported by Meyer
<italic>et al</italic>
.). In Meyer
<italic>et al</italic>
., adding TF binding information to FiRST improved performance.</p>
<p>The authors mention this, but the most important reason that FiRST does poorly at predicting the synthetic mutations seems to be that most of the mutations (apparently 29 out of 33, based on Table 1 of Meyer
<italic>et al</italic>
.) are not in the 100 bp window used. That is, because in most cases these synthetic sequences were (as I understand it) identical in features to other examples while having different activities, for the purposes of FiRST, they could only introduce prediction errors. In light of this fact the rest of the speculation about why performance varied in this way seems extraneous.</p>
<p>It would also be useful to see more detailed information on the sequences used (e.g., the G content or other features), and the prediction error in each case. How well does one predict using G content alone? This might all be reconstructed from the data supplement helpfully provided, but the authors should consider providing the analysis. It also seems reasonable to ask for more details about the performance of other sequence windows.</p>
<p>The other main missing piece from this paper is any discussion or evidence that the method works beyond the narrow confines of the DREAM setup. Even for the RP genes, does it make a useful prediction, e.g., that increasing the G content of RP promoters in that 100 bp window will decrease promoter activity? I am fine with leaving this as “future work” but it would be worth mentioning.</p>
<p>Figure 2B is apparently the same as part of Figure 1E from Meyer
<italic>et al</italic>
., except FiRST is not marked (actually there is a small difference in the values plotted; the combined score for FiRST looks closer to 2 than the 1.87 reported and plotted in Meyer
<italic>et al</italic>
.). The authors should clearly cite Meyer
<italic>et al</italic>
. in the figure caption as the source of the data for this figure, or simply point the readers to Meyer
<italic>et al</italic>
., or else explain where the data came from if not from Meyer
<italic>et al</italic>
.</p>
<p>I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
</body>
</sub-article>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000897 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd -nk 000897 | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Curation
   |type=    RBID
   |clé=     PMC:4916984
   |texte=   Prediction of fine-tuned promoter activity from DNA sequence
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Curation/RBID.i   -Sk "pubmed:27347373" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Curation/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021