MERS Exploration Server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.
***** Access problem to record *****

Internal identifier: 000306 (Pmc/Corpus); previous: 0003059; next: 0003070 ***** probable XML problem with record *****



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">iEnhancer-ECNN: identifying enhancers and their strength using ensembles of convolutional neural networks</title>
<author>
<name sortKey="Nguyen, Quang H" sort="Nguyen, Quang H" uniqKey="Nguyen Q" first="Quang H." last="Nguyen">Quang H. Nguyen</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="GRID">grid.440792.c</institution-id>
<institution>School of Information and Communication Technology, Hanoi University of Science and Technology,</institution>
</institution-wrap>
1 Dai Co Viet, Hanoi 100000, Vietnam</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Nguyen Vo, Thanh Hoang" sort="Nguyen Vo, Thanh Hoang" uniqKey="Nguyen Vo T" first="Thanh-Hoang" last="Nguyen-Vo">Thanh-Hoang Nguyen-Vo</name>
<affiliation>
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2292 3111</institution-id>
<institution-id institution-id-type="GRID">grid.267827.e</institution-id>
<institution>School of Mathematics and Statistics, Victoria University of Wellington,</institution>
</institution-wrap>
Gate 7, Kelburn Parade, Wellington, 6142 New Zealand</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Le, Nguyen Quoc Khanh" sort="Le, Nguyen Quoc Khanh" uniqKey="Le N" first="Nguyen Quoc Khanh" last="Le">Nguyen Quoc Khanh Le</name>
<affiliation>
<nlm:aff id="Aff3">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0000 9337 0481</institution-id>
<institution-id institution-id-type="GRID">grid.412896.0</institution-id>
<institution>Professional Master Program in Artificial Intelligence in Medicine, Taipei Medical University,</institution>
</institution-wrap>
Keelung Road, Da’an District, Taipei City, 106 Taiwan (R.O.C.)</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Do, Trang T T" sort="Do, Trang T T" uniqKey="Do T" first="Trang T. T." last="Do">Trang T. T. Do</name>
<affiliation>
<nlm:aff id="Aff4">Institute of Research and Development, Duy Tan University, Danang 550000, Vietnam</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Rahardja, Susanto" sort="Rahardja, Susanto" uniqKey="Rahardja S" first="Susanto" last="Rahardja">Susanto Rahardja</name>
<affiliation>
<nlm:aff id="Aff5">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 0307 1240</institution-id>
<institution-id institution-id-type="GRID">grid.440588.5</institution-id>
<institution>School of Marine Science and Technology, Northwestern Polytechnical University,</institution>
</institution-wrap>
127 West Youyi Road, Xi’an 710072, China</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Nguyen, Binh P" sort="Nguyen, Binh P" uniqKey="Nguyen B" first="Binh P." last="Nguyen">Binh P. Nguyen</name>
<affiliation>
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2292 3111</institution-id>
<institution-id institution-id-type="GRID">grid.267827.e</institution-id>
<institution>School of Mathematics and Statistics, Victoria University of Wellington,</institution>
</institution-wrap>
Gate 7, Kelburn Parade, Wellington, 6142 New Zealand</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">31874637</idno>
<idno type="pmc">6929481</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6929481</idno>
<idno type="RBID">PMC:6929481</idno>
<idno type="doi">10.1186/s12864-019-6336-3</idno>
<date when="2019">2019</date>
<idno type="wicri:Area/Pmc/Corpus">000306</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000306</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">iEnhancer-ECNN: identifying enhancers and their strength using ensembles of convolutional neural networks</title>
<author>
<name sortKey="Nguyen, Quang H" sort="Nguyen, Quang H" uniqKey="Nguyen Q" first="Quang H." last="Nguyen">Quang H. Nguyen</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="GRID">grid.440792.c</institution-id>
<institution>School of Information and Communication Technology, Hanoi University of Science and Technology,</institution>
</institution-wrap>
1 Dai Co Viet, Hanoi 100000, Vietnam</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Nguyen Vo, Thanh Hoang" sort="Nguyen Vo, Thanh Hoang" uniqKey="Nguyen Vo T" first="Thanh-Hoang" last="Nguyen-Vo">Thanh-Hoang Nguyen-Vo</name>
<affiliation>
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2292 3111</institution-id>
<institution-id institution-id-type="GRID">grid.267827.e</institution-id>
<institution>School of Mathematics and Statistics, Victoria University of Wellington,</institution>
</institution-wrap>
Gate 7, Kelburn Parade, Wellington, 6142 New Zealand</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Le, Nguyen Quoc Khanh" sort="Le, Nguyen Quoc Khanh" uniqKey="Le N" first="Nguyen Quoc Khanh" last="Le">Nguyen Quoc Khanh Le</name>
<affiliation>
<nlm:aff id="Aff3">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0000 9337 0481</institution-id>
<institution-id institution-id-type="GRID">grid.412896.0</institution-id>
<institution>Professional Master Program in Artificial Intelligence in Medicine, Taipei Medical University,</institution>
</institution-wrap>
Keelung Road, Da’an District, Taipei City, 106 Taiwan (R.O.C.)</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Do, Trang T T" sort="Do, Trang T T" uniqKey="Do T" first="Trang T. T." last="Do">Trang T. T. Do</name>
<affiliation>
<nlm:aff id="Aff4">Institute of Research and Development, Duy Tan University, Danang 550000, Vietnam</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Rahardja, Susanto" sort="Rahardja, Susanto" uniqKey="Rahardja S" first="Susanto" last="Rahardja">Susanto Rahardja</name>
<affiliation>
<nlm:aff id="Aff5">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 0307 1240</institution-id>
<institution-id institution-id-type="GRID">grid.440588.5</institution-id>
<institution>School of Marine Science and Technology, Northwestern Polytechnical University,</institution>
</institution-wrap>
127 West Youyi Road, Xi’an 710072, China</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Nguyen, Binh P" sort="Nguyen, Binh P" uniqKey="Nguyen B" first="Binh P." last="Nguyen">Binh P. Nguyen</name>
<affiliation>
<nlm:aff id="Aff2">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2292 3111</institution-id>
<institution-id institution-id-type="GRID">grid.267827.e</institution-id>
<institution>School of Mathematics and Statistics, Victoria University of Wellington,</institution>
</institution-wrap>
Gate 7, Kelburn Parade, Wellington, 6142 New Zealand</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Genomics</title>
<idno type="eISSN">1471-2164</idno>
<imprint>
<date when="2019">2019</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p id="Par1">Enhancers are non-coding DNA fragments that are crucial in gene regulation (e.g. transcription and translation). Because enhancers vary widely in location and are scattered freely across the 98% of the genome that is non-coding, identifying them is more complicated than identifying other genetic factors. To address this biological issue, several in silico studies have sought to identify and classify enhancer sequences among a myriad of DNA sequences using computational methods. Although recent studies have reported improved performance, shortfalls in these learning models remain. To overcome the limitations of existing learning models, we introduce iEnhancer-ECNN, an efficient prediction framework using one-hot encoding and
<italic>k</italic>
-mers for data transformation and ensembles of convolutional neural networks for model construction, to identify enhancers and classify their strength. The benchmark dataset from Liu et al.’s study was used to develop and evaluate the ensemble models. A comparative analysis between iEnhancer-ECNN and existing state-of-the-art methods was done to fairly assess the model performance.</p>
</sec>
<sec>
<title>Results</title>
<p id="Par2">Our experimental results demonstrate that iEnhancer-ECNN performs better than other state-of-the-art methods on the same dataset. The accuracies of the ensemble model for enhancer identification (layer 1) and enhancer classification (layer 2) are 0.769 and 0.678, respectively. Compared with other related studies, the improvements in the Area Under the Receiver Operating Characteristic Curve (AUC), sensitivity, and Matthews correlation coefficient (MCC) of our models are remarkable, especially for the layer 2 model, at about 11.0%, 46.5%, and 65.0%, respectively.</p>
</sec>
<sec>
<title>Conclusions</title>
<p id="Par3">iEnhancer-ECNN outperforms previously proposed methods, with significant improvements in most of the evaluation metrics. The strong gains in the MCC of both layers are highly meaningful in assuring the stability of our models.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Pennacchio, La" uniqKey="Pennacchio L">LA Pennacchio</name>
</author>
<author>
<name sortKey="Bickmore, W" uniqKey="Bickmore W">W Bickmore</name>
</author>
<author>
<name sortKey="Dean, A" uniqKey="Dean A">A Dean</name>
</author>
<author>
<name sortKey="Nobrega, Ma" uniqKey="Nobrega M">MA Nobrega</name>
</author>
<author>
<name sortKey="Bejerano, G" uniqKey="Bejerano G">G Bejerano</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liu, B" uniqKey="Liu B">B Liu</name>
</author>
<author>
<name sortKey="Fang, L" uniqKey="Fang L">L Fang</name>
</author>
<author>
<name sortKey="Long, R" uniqKey="Long R">R Long</name>
</author>
<author>
<name sortKey="Lan, X" uniqKey="Lan X">X Lan</name>
</author>
<author>
<name sortKey="Chou, K C" uniqKey="Chou K">K-C Chou</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Heintzman, Nd" uniqKey="Heintzman N">ND Heintzman</name>
</author>
<author>
<name sortKey="Stuart, Rk" uniqKey="Stuart R">RK Stuart</name>
</author>
<author>
<name sortKey="Hon, G" uniqKey="Hon G">G Hon</name>
</author>
<author>
<name sortKey="Fu, Y" uniqKey="Fu Y">Y Fu</name>
</author>
<author>
<name sortKey="Ching, Cw" uniqKey="Ching C">CW Ching</name>
</author>
<author>
<name sortKey="Hawkins, Rd" uniqKey="Hawkins R">RD Hawkins</name>
</author>
<author>
<name sortKey="Barrera, Lo" uniqKey="Barrera L">LO Barrera</name>
</author>
<author>
<name sortKey="Calcar, Sv" uniqKey="Calcar S">SV Calcar</name>
</author>
<author>
<name sortKey="Qu, C" uniqKey="Qu C">C Qu</name>
</author>
<author>
<name sortKey="Ching, Ka" uniqKey="Ching K">KA Ching</name>
</author>
<author>
<name sortKey="Wang, W" uniqKey="Wang W">W Wang</name>
</author>
<author>
<name sortKey="Weng, Z" uniqKey="Weng Z">Z Weng</name>
</author>
<author>
<name sortKey="Green, Rd" uniqKey="Green R">RD Green</name>
</author>
<author>
<name sortKey="Crawford, Ge" uniqKey="Crawford G">GE Crawford</name>
</author>
<author>
<name sortKey="Ren, B" uniqKey="Ren B">B Ren</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Visel, A" uniqKey="Visel A">A Visel</name>
</author>
<author>
<name sortKey="Blow, Mj" uniqKey="Blow M">MJ Blow</name>
</author>
<author>
<name sortKey="Li, Z" uniqKey="Li Z">Z Li</name>
</author>
<author>
<name sortKey="Zhang, T" uniqKey="Zhang T">T Zhang</name>
</author>
<author>
<name sortKey="Akiyama, Ja" uniqKey="Akiyama J">JA Akiyama</name>
</author>
<author>
<name sortKey="Holt, A" uniqKey="Holt A">A Holt</name>
</author>
<author>
<name sortKey="Plajzer Frick, I" uniqKey="Plajzer Frick I">I Plajzer-Frick</name>
</author>
<author>
<name sortKey="Shoukry, M" uniqKey="Shoukry M">M Shoukry</name>
</author>
<author>
<name sortKey="Wright, C" uniqKey="Wright C">C Wright</name>
</author>
<author>
<name sortKey="Chen, F" uniqKey="Chen F">F Chen</name>
</author>
<author>
<name sortKey="Afzal, V" uniqKey="Afzal V">V Afzal</name>
</author>
<author>
<name sortKey="Ren, B" uniqKey="Ren B">B Ren</name>
</author>
<author>
<name sortKey="Rubin, Em" uniqKey="Rubin E">EM Rubin</name>
</author>
<author>
<name sortKey="Pennacchio, La" uniqKey="Pennacchio L">LA Pennacchio</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kulaeva, Oi" uniqKey="Kulaeva O">OI Kulaeva</name>
</author>
<author>
<name sortKey="Nizovtseva, Ev" uniqKey="Nizovtseva E">EV Nizovtseva</name>
</author>
<author>
<name sortKey="Polikanov, Ys" uniqKey="Polikanov Y">YS Polikanov</name>
</author>
<author>
<name sortKey="Ulianov, Sv" uniqKey="Ulianov S">SV Ulianov</name>
</author>
<author>
<name sortKey="Studitsky, Vm" uniqKey="Studitsky V">VM Studitsky</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhang, G" uniqKey="Zhang G">G Zhang</name>
</author>
<author>
<name sortKey="Shi, J" uniqKey="Shi J">J Shi</name>
</author>
<author>
<name sortKey="Zhu, S" uniqKey="Zhu S">S Zhu</name>
</author>
<author>
<name sortKey="Lan, Y" uniqKey="Lan Y">Y Lan</name>
</author>
<author>
<name sortKey="Xu, L" uniqKey="Xu L">L Xu</name>
</author>
<author>
<name sortKey="Yuan, H" uniqKey="Yuan H">H Yuan</name>
</author>
<author>
<name sortKey="Liao, G" uniqKey="Liao G">G Liao</name>
</author>
<author>
<name sortKey="Liu, X" uniqKey="Liu X">X Liu</name>
</author>
<author>
<name sortKey="Zhang, Y" uniqKey="Zhang Y">Y Zhang</name>
</author>
<author>
<name sortKey="Xiao, Y" uniqKey="Xiao Y">Y Xiao</name>
</author>
<author>
<name sortKey="Li, X" uniqKey="Li X">X Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Corradin, O" uniqKey="Corradin O">O Corradin</name>
</author>
<author>
<name sortKey="Scacheri, Pc" uniqKey="Scacheri P">PC Scacheri</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Herz, H M" uniqKey="Herz H">H-M Herz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Boyd, M" uniqKey="Boyd M">M Boyd</name>
</author>
<author>
<name sortKey="Thodberg, M" uniqKey="Thodberg M">M Thodberg</name>
</author>
<author>
<name sortKey="Vitezic, M" uniqKey="Vitezic M">M Vitezic</name>
</author>
<author>
<name sortKey="Bornholdt, J" uniqKey="Bornholdt J">J Bornholdt</name>
</author>
<author>
<name sortKey="Vitting Seerup, K" uniqKey="Vitting Seerup K">K Vitting-Seerup</name>
</author>
<author>
<name sortKey="Chen, Y" uniqKey="Chen Y">Y Chen</name>
</author>
<author>
<name sortKey="Coskun, M" uniqKey="Coskun M">M Coskun</name>
</author>
<author>
<name sortKey="Li, Y" uniqKey="Li Y">Y Li</name>
</author>
<author>
<name sortKey="Lo, Bzs" uniqKey="Lo B">BZS Lo</name>
</author>
<author>
<name sortKey="Klausen, P" uniqKey="Klausen P">P Klausen</name>
</author>
<author>
<name sortKey="Schweiger, Pj" uniqKey="Schweiger P">PJ Schweiger</name>
</author>
<author>
<name sortKey="Pedersen, Ag" uniqKey="Pedersen A">AG Pedersen</name>
</author>
<author>
<name sortKey="Rapin, N" uniqKey="Rapin N">N Rapin</name>
</author>
<author>
<name sortKey="Skovgaard, K" uniqKey="Skovgaard K">K Skovgaard</name>
</author>
<author>
<name sortKey="Dahlgaard, K" uniqKey="Dahlgaard K">K Dahlgaard</name>
</author>
<author>
<name sortKey="Andersson, R" uniqKey="Andersson R">R Andersson</name>
</author>
<author>
<name sortKey="Terkelsen, Tb" uniqKey="Terkelsen T">TB Terkelsen</name>
</author>
<author>
<name sortKey="Lilje, B" uniqKey="Lilje B">B Lilje</name>
</author>
<author>
<name sortKey="Troelsen, Jt" uniqKey="Troelsen J">JT Troelsen</name>
</author>
<author>
<name sortKey="Petersen, Am" uniqKey="Petersen A">AM Petersen</name>
</author>
<author>
<name sortKey="Jensen, Kb" uniqKey="Jensen K">KB Jensen</name>
</author>
<author>
<name sortKey="Gogenur, I" uniqKey="Gogenur I">I Gögenur</name>
</author>
<author>
<name sortKey="Thielsen, P" uniqKey="Thielsen P">P Thielsen</name>
</author>
<author>
<name sortKey="Seidelin, Jb" uniqKey="Seidelin J">JB Seidelin</name>
</author>
<author>
<name sortKey="Nielsen, Oh" uniqKey="Nielsen O">OH Nielsen</name>
</author>
<author>
<name sortKey="Bjerrum, Jt" uniqKey="Bjerrum J">JT Bjerrum</name>
</author>
<author>
<name sortKey="Sandelin, A" uniqKey="Sandelin A">A Sandelin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Visel, A" uniqKey="Visel A">A Visel</name>
</author>
<author>
<name sortKey="Bristow, J" uniqKey="Bristow J">J Bristow</name>
</author>
<author>
<name sortKey="A Pennacchio, L" uniqKey="A Pennacchio L">LA Pennacchio</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zacher, B" uniqKey="Zacher B">B Zacher</name>
</author>
<author>
<name sortKey="Michel, M" uniqKey="Michel M">M Michel</name>
</author>
<author>
<name sortKey="Schwalb, B" uniqKey="Schwalb B">B Schwalb</name>
</author>
<author>
<name sortKey="Cramer, P" uniqKey="Cramer P">P Cramer</name>
</author>
<author>
<name sortKey="Tresch, A" uniqKey="Tresch A">A Tresch</name>
</author>
<author>
<name sortKey="Gagneur, J" uniqKey="Gagneur J">J Gagneur</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lai, Y T" uniqKey="Lai Y">Y-T Lai</name>
</author>
<author>
<name sortKey="Deem, Kd" uniqKey="Deem K">KD Deem</name>
</author>
<author>
<name sortKey="Borras Castells, F" uniqKey="Borras Castells F">F Borràs-Castells</name>
</author>
<author>
<name sortKey="Sambrani, N" uniqKey="Sambrani N">N Sambrani</name>
</author>
<author>
<name sortKey="Rudolf, H" uniqKey="Rudolf H">H Rudolf</name>
</author>
<author>
<name sortKey="Suryamohan, K" uniqKey="Suryamohan K">K Suryamohan</name>
</author>
<author>
<name sortKey="El Sherif, E" uniqKey="El Sherif E">E El-Sherif</name>
</author>
<author>
<name sortKey="Halfon, Ms" uniqKey="Halfon M">MS Halfon</name>
</author>
<author>
<name sortKey="Tomoyasu, Djm" uniqKey="Tomoyasu D">DJM Tomoyasu</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yip, Ky" uniqKey="Yip K">KY Yip</name>
</author>
<author>
<name sortKey="Cheng, C" uniqKey="Cheng C">C Cheng</name>
</author>
<author>
<name sortKey="Bhardwaj, N" uniqKey="Bhardwaj N">N Bhardwaj</name>
</author>
<author>
<name sortKey="Brown, Jb" uniqKey="Brown J">JB Brown</name>
</author>
<author>
<name sortKey="Leng, J" uniqKey="Leng J">J Leng</name>
</author>
<author>
<name sortKey="Kundaje, A" uniqKey="Kundaje A">A Kundaje</name>
</author>
<author>
<name sortKey="Rozowsky, J" uniqKey="Rozowsky J">J Rozowsky</name>
</author>
<author>
<name sortKey="Birney, E" uniqKey="Birney E">E Birney</name>
</author>
<author>
<name sortKey="Bickel, P" uniqKey="Bickel P">P Bickel</name>
</author>
<author>
<name sortKey="Snyder, M" uniqKey="Snyder M">M Snyder</name>
</author>
<author>
<name sortKey="Gerstein, M" uniqKey="Gerstein M">M Gerstein</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bernstein, Be" uniqKey="Bernstein B">BE Bernstein</name>
</author>
<author>
<name sortKey="Stamatoyannopoulos, Ja" uniqKey="Stamatoyannopoulos J">JA Stamatoyannopoulos</name>
</author>
<author>
<name sortKey="Costello, Jf" uniqKey="Costello J">JF Costello</name>
</author>
<author>
<name sortKey="Ren, B" uniqKey="Ren B">B Ren</name>
</author>
<author>
<name sortKey="Milosavljevic, A" uniqKey="Milosavljevic A">A Milosavljevic</name>
</author>
<author>
<name sortKey="Meissner, A" uniqKey="Meissner A">A Meissner</name>
</author>
<author>
<name sortKey="Kellis, M" uniqKey="Kellis M">M Kellis</name>
</author>
<author>
<name sortKey="Marra, Ma" uniqKey="Marra M">MA Marra</name>
</author>
<author>
<name sortKey="Beaudet, Al" uniqKey="Beaudet A">AL Beaudet</name>
</author>
<author>
<name sortKey="Ecker, Jr" uniqKey="Ecker J">JR Ecker</name>
</author>
<author>
<name sortKey="Farnham, Pj" uniqKey="Farnham P">PJ Farnham</name>
</author>
<author>
<name sortKey="Hirst, M" uniqKey="Hirst M">M Hirst</name>
</author>
<author>
<name sortKey="Lander, Es" uniqKey="Lander E">ES Lander</name>
</author>
<author>
<name sortKey="Mikkelsen, Ts" uniqKey="Mikkelsen T">TS Mikkelsen</name>
</author>
<author>
<name sortKey="Thomson, Ja" uniqKey="Thomson J">JA Thomson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rabani, M" uniqKey="Rabani M">M Rabani</name>
</author>
<author>
<name sortKey="Raychowdhury, R" uniqKey="Raychowdhury R">R Raychowdhury</name>
</author>
<author>
<name sortKey="Jovanovic, M" uniqKey="Jovanovic M">M Jovanovic</name>
</author>
<author>
<name sortKey="Rooney, M" uniqKey="Rooney M">M Rooney</name>
</author>
<author>
<name sortKey="Stumpo, Dj" uniqKey="Stumpo D">DJ Stumpo</name>
</author>
<author>
<name sortKey="Pauli, A" uniqKey="Pauli A">A Pauli</name>
</author>
<author>
<name sortKey="Hacohen, N" uniqKey="Hacohen N">N Hacohen</name>
</author>
<author>
<name sortKey="Schier, Af" uniqKey="Schier A">AF Schier</name>
</author>
<author>
<name sortKey="Blackshear, Pj" uniqKey="Blackshear P">PJ Blackshear</name>
</author>
<author>
<name sortKey="Friedman, N" uniqKey="Friedman N">N Friedman</name>
</author>
<author>
<name sortKey="Amit, I" uniqKey="Amit I">I Amit</name>
</author>
<author>
<name sortKey="Regev, A" uniqKey="Regev A">A Regev</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Churchman, Ls" uniqKey="Churchman L">LS Churchman</name>
</author>
<author>
<name sortKey="Weissman, Js" uniqKey="Weissman J">JS Weissman</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fang, Y" uniqKey="Fang Y">Y Fang</name>
</author>
<author>
<name sortKey="Wang, Y" uniqKey="Wang Y">Y Wang</name>
</author>
<author>
<name sortKey="Zhu, Q" uniqKey="Zhu Q">Q Zhu</name>
</author>
<author>
<name sortKey="Wang, J" uniqKey="Wang J">J Wang</name>
</author>
<author>
<name sortKey="Li, G" uniqKey="Li G">G Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Firpi, Ha" uniqKey="Firpi H">HA Firpi</name>
</author>
<author>
<name sortKey="Ucar, D" uniqKey="Ucar D">D Ucar</name>
</author>
<author>
<name sortKey="Tan, K" uniqKey="Tan K">K Tan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Erwin, Gd" uniqKey="Erwin G">GD Erwin</name>
</author>
<author>
<name sortKey="Oksenberg, N" uniqKey="Oksenberg N">N Oksenberg</name>
</author>
<author>
<name sortKey="Truty, Rm" uniqKey="Truty R">RM Truty</name>
</author>
<author>
<name sortKey="Kostka, D" uniqKey="Kostka D">D Kostka</name>
</author>
<author>
<name sortKey="Murphy, Kk" uniqKey="Murphy K">KK Murphy</name>
</author>
<author>
<name sortKey="Ahituv, N" uniqKey="Ahituv N">N Ahituv</name>
</author>
<author>
<name sortKey="Pollard, Ks" uniqKey="Pollard K">KS Pollard</name>
</author>
<author>
<name sortKey="Capra, Ja" uniqKey="Capra J">JA Capra</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bu, H" uniqKey="Bu H">H Bu</name>
</author>
<author>
<name sortKey="Gan, Y" uniqKey="Gan Y">Y Gan</name>
</author>
<author>
<name sortKey="Wang, Y" uniqKey="Wang Y">Y Wang</name>
</author>
<author>
<name sortKey="Zhou, S" uniqKey="Zhou S">S Zhou</name>
</author>
<author>
<name sortKey="Guan, J" uniqKey="Guan J">J Guan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Min, X" uniqKey="Min X">X Min</name>
</author>
<author>
<name sortKey="Zeng, W" uniqKey="Zeng W">W Zeng</name>
</author>
<author>
<name sortKey="Chen, S" uniqKey="Chen S">S Chen</name>
</author>
<author>
<name sortKey="Chen, N" uniqKey="Chen N">N Chen</name>
</author>
<author>
<name sortKey="Chen, T" uniqKey="Chen T">T Chen</name>
</author>
<author>
<name sortKey="Jiang, R" uniqKey="Jiang R">R Jiang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liu, B" uniqKey="Liu B">B Liu</name>
</author>
<author>
<name sortKey="Fang, L" uniqKey="Fang L">L Fang</name>
</author>
<author>
<name sortKey="Long, R" uniqKey="Long R">R Long</name>
</author>
<author>
<name sortKey="Lan, X" uniqKey="Lan X">X Lan</name>
</author>
<author>
<name sortKey="Chou, K C" uniqKey="Chou K">K-C Chou</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jiaa, C" uniqKey="Jiaa C">C Jiaa</name>
</author>
<author>
<name sortKey="He, W" uniqKey="He W">W He</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liu, B" uniqKey="Liu B">B Liu</name>
</author>
<author>
<name sortKey="Li, K" uniqKey="Li K">K Li</name>
</author>
<author>
<name sortKey="Huang, D S" uniqKey="Huang D">D-S Huang</name>
</author>
<author>
<name sortKey="Chou, K C" uniqKey="Chou K">K-C Chou</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Crooks, Ge" uniqKey="Crooks G">GE Crooks</name>
</author>
<author>
<name sortKey="Hon, G" uniqKey="Hon G">G Hon</name>
</author>
<author>
<name sortKey="Chandonia, J M" uniqKey="Chandonia J">J-M Chandonia</name>
</author>
<author>
<name sortKey="Brenner, Se" uniqKey="Brenner S">SE Brenner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="D Schneider, T" uniqKey="D Schneider T">TD Schneider</name>
</author>
<author>
<name sortKey="Stephens, Rm" uniqKey="Stephens R">RM Stephens</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chicco, D" uniqKey="Chicco D">D Chicco</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Min, X" uniqKey="Min X">X Min</name>
</author>
<author>
<name sortKey="Zeng, W" uniqKey="Zeng W">W Zeng</name>
</author>
<author>
<name sortKey="Chen, S" uniqKey="Chen S">S Chen</name>
</author>
<author>
<name sortKey="Chen, N" uniqKey="Chen N">N Chen</name>
</author>
<author>
<name sortKey="Chen, T" uniqKey="Chen T">T Chen</name>
</author>
<author>
<name sortKey="Jiang, R" uniqKey="Jiang R">R Jiang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fu, L" uniqKey="Fu L">L Fu</name>
</author>
<author>
<name sortKey="Niu, B" uniqKey="Niu B">B Niu</name>
</author>
<author>
<name sortKey="Zhu, Z" uniqKey="Zhu Z">Z Zhu</name>
</author>
<author>
<name sortKey="Wu, S" uniqKey="Wu S">S Wu</name>
</author>
<author>
<name sortKey="Li, W" uniqKey="Li W">W Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Genomics</journal-id>
<journal-id journal-id-type="iso-abbrev">BMC Genomics</journal-id>
<journal-title-group>
<journal-title>BMC Genomics</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2164</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
<publisher-loc>London</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">31874637</article-id>
<article-id pub-id-type="pmc">6929481</article-id>
<article-id pub-id-type="publisher-id">6336</article-id>
<article-id pub-id-type="doi">10.1186/s12864-019-6336-3</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>iEnhancer-ECNN: identifying enhancers and their strength using ensembles of convolutional neural networks</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Nguyen</surname>
<given-names>Quang H.</given-names>
</name>
<address>
<email>quangnh@soict.hust.edu.vn</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Nguyen-Vo</surname>
<given-names>Thanh-Hoang</given-names>
</name>
<address>
<email>thanhhoang.nguyenvo@vuw.ac.nz</email>
</address>
<xref ref-type="aff" rid="Aff2">2</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Le</surname>
<given-names>Nguyen Quoc Khanh</given-names>
</name>
<address>
<email>khanhlee@tmu.edu.tw</email>
</address>
<xref ref-type="aff" rid="Aff3">3</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Do</surname>
<given-names>Trang T.T.</given-names>
</name>
<address>
<email>dotrang@alumni.nus.edu.sg</email>
</address>
<xref ref-type="aff" rid="Aff4">4</xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Rahardja</surname>
<given-names>Susanto</given-names>
</name>
<address>
<email>susantorahardja@ieee.org</email>
</address>
<xref ref-type="aff" rid="Aff5">5</xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Nguyen</surname>
<given-names>Binh P.</given-names>
</name>
<address>
<email>binh.p.nguyen@vuw.ac.nz</email>
</address>
<xref ref-type="aff" rid="Aff2">2</xref>
</contrib>
<aff id="Aff1">
<label>1</label>
<institution-wrap>
<institution-id institution-id-type="GRID">grid.440792.c</institution-id>
<institution>School of Information and Communication Technology, Hanoi University of Science and Technology,</institution>
</institution-wrap>
1 Dai Co Viet, Hanoi 100000, Vietnam</aff>
<aff id="Aff2">
<label>2</label>
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 2292 3111</institution-id>
<institution-id institution-id-type="GRID">grid.267827.e</institution-id>
<institution>School of Mathematics and Statistics, Victoria University of Wellington,</institution>
</institution-wrap>
Gate 7, Kelburn Parade, Wellington, 6142 New Zealand</aff>
<aff id="Aff3">
<label>3</label>
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0000 9337 0481</institution-id>
<institution-id institution-id-type="GRID">grid.412896.0</institution-id>
<institution>Professional Master Program in Artificial Intelligence in Medicine, Taipei Medical University,</institution>
</institution-wrap>
Keelung Road, Da’an District, Taipei City, 106 Taiwan (R.O.C.)</aff>
<aff id="Aff4">
<label>4</label>
Institute of Research and Development, Duy Tan University, Danang 550000, Vietnam</aff>
<aff id="Aff5">
<label>5</label>
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0001 0307 1240</institution-id>
<institution-id institution-id-type="GRID">grid.440588.5</institution-id>
<institution>School of Marine Science and Technology, Northwestern Polytechnical University,</institution>
</institution-wrap>
127 West Youyi Road, Xi’an 710072, China</aff>
</contrib-group>
<pub-date pub-type="epub">
<day>24</day>
<month>12</month>
<year>2019</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>24</day>
<month>12</month>
<year>2019</year>
</pub-date>
<pub-date pub-type="collection">
<year>2019</year>
</pub-date>
<volume>20</volume>
<issue>Suppl 9</issue>
<elocation-id>951</elocation-id>
<permissions>
<copyright-statement>© The Author(s) 2019</copyright-statement>
<license license-type="OpenAccess">
<license-p>
<bold>Open Access</bold>
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/publicdomain/zero/1.0/">http://creativecommons.org/publicdomain/zero/1.0/</ext-link>
) applies to the data made available in this article, unless otherwise stated.</license-p>
</license>
</permissions>
<abstract id="Abs1">
<sec>
<title>Background</title>
<p id="Par1">Enhancers are non-coding DNA fragments that are crucial in gene regulation (e.g. transcription and translation). Because enhancers vary widely in location and are scattered freely across the 98% of the genome that is non-coding, identifying them is more complicated than identifying other genetic factors. To address this biological issue, several in silico studies have sought to identify and classify enhancer sequences among a myriad of DNA sequences using computational advances. Although recent studies have improved performance, shortfalls in these learning models remain. To overcome the limitations of existing learning models, we introduce iEnhancer-ECNN, an efficient prediction framework using one-hot encoding and
<italic>k</italic>
-mers for data transformation and ensembles of convolutional neural networks for model construction, to identify enhancers and classify their strength. The benchmark dataset from Liu et al.’s study was used to develop and evaluate the ensemble models. A comparative analysis between iEnhancer-ECNN and existing state-of-the-art methods was done to fairly assess the model performance.</p>
</sec>
<sec>
<title>Results</title>
<p id="Par2">Our experimental results demonstrate that iEnhancer-ECNN performs better than other state-of-the-art methods on the same dataset. The accuracies of the ensemble model for enhancer identification (layer 1) and enhancer classification (layer 2) are 0.769 and 0.678, respectively. Compared to other related studies, improvements in the Area Under the Receiver Operating Characteristic Curve (AUC), sensitivity, and Matthews’s correlation coefficient (MCC) of our models are remarkable, especially for the model of layer 2 with about 11.0%, 46.5%, and 65.0%, respectively.</p>
</sec>
<sec>
<title>Conclusions</title>
<p id="Par3">iEnhancer-ECNN outperforms previously proposed methods with significant improvement in most of the evaluation metrics. Strong gains in the MCC of both layers are highly meaningful in assuring the stability of our models.</p>
</sec>
</abstract>
<kwd-group xml:lang="en">
<title>Keywords</title>
<kwd>Enhancer</kwd>
<kwd>Identification</kwd>
<kwd>Classification</kwd>
<kwd>Ensemble</kwd>
<kwd>One-hot encoding</kwd>
<kwd>Convolutional neural network</kwd>
<kwd>Deep learning</kwd>
</kwd-group>
<conference xlink:href="https://incob2019.org/">
<conf-name>International Conference on Bioinformatics (InCoB 2019)</conf-name>
<conf-acronym>InCoB 2019</conf-acronym>
<conf-loc>Jakarta, Indonesia</conf-loc>
<conf-date>10-12 September 2019</conf-date>
</conference>
<custom-meta-group>
<custom-meta>
<meta-name>issue-copyright-statement</meta-name>
<meta-value>© The Author(s) 2019</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
<body>
<sec id="Sec1">
<title>Background</title>
<p>‘Omics’ science, including studies on genomics, transcriptomics, proteomics, and metabolomics, is a new research field that combines a background in molecular genetics with the power of computer science to address biological problems. In transcriptomics, enhancers [
<xref ref-type="bibr" rid="CR1">1</xref>
] refer to a group of non-coding DNA fragments holding responsibility for regulating gene expression in both transcription and translation. Unlike a promoter which is the transcriptional initializer of a particular gene [
<xref ref-type="bibr" rid="CR2">2</xref>
] located at the upstream region of the gene, an enhancer can be found at a region of up to 20 kb upstream/downstream with respect to the gene or even on other chromosomes not carrying that gene. Identification of new enhancers is therefore challenging because of their locational variation. Besides, since enhancers are sequences that do not encode any proteins, they are freely dispersed across the 98% of the human genome that is non-coding, which carries billions of base pairs [
<xref ref-type="bibr" rid="CR1">1</xref>
]. While molecular mechanisms of protein-coding genes can be relatively simply addressed, biological patterns of enhancers have not been well generalized. Furthermore, activities of enhancers vary depending on specific types of cells, time, and intrinsic/extrinsic stimulations [
<xref ref-type="bibr" rid="CR1">1</xref>
]. Previously, to identify and locate enhancers, scientists had no choice but to perform in vitro [
<xref ref-type="bibr" rid="CR3">3</xref>
] or in vivo [
<xref ref-type="bibr" rid="CR4">4</xref>
] experiments. Recent findings have revealed there are a large number of recognized enhancers shared by both human and other species including eukaryotes and prokaryotes [
<xref ref-type="bibr" rid="CR1">1</xref>
,
<xref ref-type="bibr" rid="CR5">5</xref>
]. Moreover, genetic variation in enhancers has been linked to many human illnesses [
<xref ref-type="bibr" rid="CR6">6</xref>
,
<xref ref-type="bibr" rid="CR7">7</xref>
] such as various types of cancer [
<xref ref-type="bibr" rid="CR6">6</xref>
,
<xref ref-type="bibr" rid="CR8">8</xref>
] and inflammatory bowel disease [
<xref ref-type="bibr" rid="CR9">9</xref>
].</p>
<p>Because enhancers are essential transcriptional regulators facilitating gene expression, enhancer identification/classification is currently one of the hot topics in biological research, appealing to both experimental and computational biologists [
<xref ref-type="bibr" rid="CR10">10</xref>
–
<xref ref-type="bibr" rid="CR12">12</xref>
]. In 2007, a comparative analysis on genomics was done by Pennacchio et al. [
<xref ref-type="bibr" rid="CR10">10</xref>
] to identify enhancers. Because the study used a small training dataset, limited prediction accuracy was one of its main challenges at the time. In 2017, Zacher et al. proposed a novel unsupervised genome segmentation algorithm called GenoSTAN (Genomic STate ANnotation) [
<xref ref-type="bibr" rid="CR11">11</xref>
] to improve the accuracy in enhancer/promoter identification by directly learning from sequencing data of chromatin states (no data transformation required). GenoSTAN used 127 cell types and tissues collected from the ENCODE [
<xref ref-type="bibr" rid="CR13">13</xref>
,
<xref ref-type="bibr" rid="CR14">14</xref>
] and NIH Roadmap Epigenomics Program [
<xref ref-type="bibr" rid="CR15">15</xref>
]. Although their study using chromatin state data to identify enhancers ended up with good results, the model sensitivity was still lower than that of other methods using transcription-based data because transcription-based predictive models using transient transcriptome profiling [
<xref ref-type="bibr" rid="CR16">16</xref>
,
<xref ref-type="bibr" rid="CR17">17</xref>
] and nascent transcriptome profiling [
<xref ref-type="bibr" rid="CR18">18</xref>
] could significantly boost the model sensitivity. A year later, Lai et al. [
<xref ref-type="bibr" rid="CR12">12</xref>
] conducted wet-lab experiments to identify the enhancers of red flour beetle (
<italic>Tribolium castaneum</italic>
) and evaluated their activity.</p>
<p>Unlike in the past, computational scientists are now equipped with high-performance computing resources and advanced techniques to deal with the growth of biological data, especially ‘omic’ data. Tackling biological problems with various in silico approaches is one of the best ways to take advantage of the abundant, readily available ‘omic’ data. For enhancer identification and classification, some in silico studies have also been conducted using genetic regulatory elements such as transcription factor binding motif occurrences [
<xref ref-type="bibr" rid="CR19">19</xref>
], chromatin signatures [
<xref ref-type="bibr" rid="CR20">20</xref>
], and combined multiple datasets [
<xref ref-type="bibr" rid="CR21">21</xref>
]. To improve model performance, computational scientists have applied various learning algorithms, e.g. the Random Forest (RF) [
<xref ref-type="bibr" rid="CR22">22</xref>
], deep belief networks [
<xref ref-type="bibr" rid="CR23">23</xref>
], deep-learning-based hybrid [
<xref ref-type="bibr" rid="CR24">24</xref>
] and neural network [
<xref ref-type="bibr" rid="CR20">20</xref>
] architectures. In 2016, iEnhancer-2L [
<xref ref-type="bibr" rid="CR25">25</xref>
] by Liu et al. and EnhancerPred [
<xref ref-type="bibr" rid="CR26">26</xref>
] by Jia and He were introduced as two effective methods using the same learning algorithm, the Support Vector Machine (SVM). While iEnhancer-2L used pseudo k-tuple nucleotide composition (PseKNC) as its sequence encoding scheme, EnhancerPred used bi-profile Bayes and pseudo-nucleotide composition. Both methods reported acceptable performance; however, their MCCs were relatively low. EnhancerPred performs slightly better than iEnhancer-2L, with a small improvement in MCC; however, its efficiency is still insufficient. In 2018, Liu et al. proposed iEnhancer-EL [
<xref ref-type="bibr" rid="CR27">27</xref>
] which is an upgraded version of iEnhancer-2L. It has a very complicated structure with two ensemble models from 16 individual key classifiers, and the key classifiers were constructed from 171 SVM-based elementary classifiers with three different types of features: the PseKNC, subsequence profile, and
<italic>k</italic>
-mers. Although iEnhancer-EL is currently one of the best methods for identifying enhancers and their strength, it should be possible to develop better models using novel learning algorithms and encoding schemes.</p>
<p>In this study, we propose a more efficient prediction framework called iEnhancer-ECNN using a combination of one-hot encoding (OHE) and
<italic>k</italic>
-mers as a sequence encoding scheme and ensembles of convolutional neural networks (CNNs). In order to make a fair comparison with other previous studies, the same dataset used in Liu et al.’s studies [
<xref ref-type="bibr" rid="CR25">25</xref>
,
<xref ref-type="bibr" rid="CR27">27</xref>
] and Jia and He’s study [
<xref ref-type="bibr" rid="CR26">26</xref>
] was used in our model construction and evaluation.</p>
</sec>
<sec id="Sec2">
<title>Results and discussions</title>
<sec id="Sec3">
<title>Sequence analysis</title>
<p>To perform comparative sequence analysis on biological patterns between enhancers and non-enhancers as well as those between strong enhancers and weak enhancers, Two Sample Logo [
<xref ref-type="bibr" rid="CR28">28</xref>
] with independent
<italic>t</italic>
-test (
<italic>p</italic>
&lt;0.05) was adopted to generate logos that visualize the sequences. The concept of presenting consensus sequences to visualize shared biological patterns in a set of aligned sequences was first proposed by Schneider et al. [
<xref ref-type="bibr" rid="CR29">29</xref>
] in 1990. Each sequence-logo map displays information about (i) the most prevalent nucleotides, ranked from the top of the stack at each location, (ii) the occurrence frequency of every nucleotide, signified by the proportional height of its character, and (iii) the significance of every location, indicated by the height of the entire stack of characters.</p>
<p>For both layers in this study, significance testing was conducted on the variance of biological patterns between enhancers and non-enhancers as well as between strong enhancers and weak enhancers. For layers 1 and 2, the enhancer set and strong enhancer set are considered positive sets, while the non-enhancer set and weak enhancer set are considered negative sets. The constructed map for each layer provides information about two groups of nucleotides observed in the positive set and the negative set (the base for comparison), respectively. A nucleotide that is commonly detected at a certain location in numerous samples from the positive set is named an ‘enriched nucleotide’, whereas a nucleotide that is seldom detected at a certain location in numerous samples from the positive set is named a ‘depleted nucleotide’. An independent
<italic>t</italic>
-test was done using the calculated occurrence frequencies of a nucleotide at certain locations to determine whether the occurrence of that nucleotide is accidental or directional.</p>
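<p>The enriched/depleted comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the toy sequences are made up rather than taken from the benchmark dataset, and the significance test performed by Two Sample Logo is omitted. A positive difference marks a nucleotide that is more frequent in the positive set (enriched), and a negative difference marks one that is less frequent (depleted).</p>
<preformat>
```python
from collections import Counter

def position_frequencies(seqs):
    """Return, for each position, the fraction of sequences carrying A, C, G, T."""
    length = len(seqs[0])
    table = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        table.append({nt: counts[nt] / len(seqs) for nt in "ACGT"})
    return table

def enrichment(positive, negative):
    """Frequency difference (positive minus negative) per position and nucleotide."""
    pos = position_frequencies(positive)
    neg = position_frequencies(negative)
    return [{nt: p[nt] - n[nt] for nt in "ACGT"} for p, n in zip(pos, neg)]

# Toy, made-up sequences of equal length (not from the benchmark dataset).
positive = ["GCGC", "GCGA", "GGGC"]
negative = ["ATAT", "ATGT", "TTAT"]

diff = enrichment(positive, negative)
print(diff[0])  # differences at the first position
```
</preformat>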
<p>Figure 
<xref rid="Fig1" ref-type="fig">1</xref>
shows the sequence characteristics of enhancers versus non-enhancers and of strong enhancers versus weak enhancers in the development set. Along most of the enhancer sequences, each location is clearly enriched with only G and C while depleted of A and T. This significant difference between enhancers and non-enhancers indicates a clear separation in biological patterns between the two groups; in other words, this finding is meaningful for our classification model. Besides, structural differences between strong enhancers and weak enhancers are evidently smaller than those between enhancers and non-enhancers because of many shared biological patterns. As shown in Fig. 
<xref rid="Fig1" ref-type="fig">1</xref>
b, strong enhancers tend to accumulate G and C rather than A and T, while weak enhancers show the reverse trend, with a dense population of A and T and a sparse population of G and C.
<fig id="Fig1">
<label>Fig. 1</label>
<caption>
<p>Sequence characteristics of
<bold>a</bold>
enhancers versus non-enhancers and
<bold>b</bold>
strong enhancers versus weak enhancers. Logo representations were created by Two Sample Logo with
<italic>t</italic>
-test (
<italic>p</italic>
&lt;0.05). A, T, G, and C are colored green, red, yellow, and blue, respectively</p>
</caption>
<graphic xlink:href="12864_2019_6336_Fig1_HTML" id="MO1"></graphic>
</fig>
</p>
</sec>
<sec id="Sec4">
<title>Model evaluation</title>
<p>Tables 
<xref rid="Tab1" ref-type="table">1</xref>
and
<xref rid="Tab3" ref-type="table">3</xref>
compare the performances on the independent test set of 5 single CNN models versus the ensemble model in layers 1 and 2, respectively, to examine the efficiency of using ensemble learning. Tables 
<xref rid="Tab2" ref-type="table">2</xref>
and
<xref rid="Tab4" ref-type="table">4</xref>
provide information on 10 testing trials in layers 1 and 2, respectively. For each trial, a random seed in the range from 3 to 21 was used to split the development dataset into five parts using stratified sampling. Each part was in turn used as the validation set for training a CNN model from the remaining 4 parts.
<table-wrap id="Tab1">
<label>Table 1</label>
<caption>
<p>Results of an enhancer identification trial (trial 2 in Table 
<xref rid="Tab2" ref-type="table">2</xref>
) on the independent test dataset</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Training : Validation (Ratio 4:1)</th>
<th align="left">ACC</th>
<th align="left">AUC</th>
<th align="left">SN</th>
<th align="left">SP</th>
<th align="left">MCC</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Model 1 (Parts 2, 3, 4, 5 : Part 1)</td>
<td align="left">0.756</td>
<td align="left">0.815</td>
<td align="left">0.750</td>
<td align="left">
<bold>0.765</bold>
</td>
<td align="left">0.515</td>
</tr>
<tr>
<td align="left">Model 2 (Parts 1, 3, 4, 5 : Part 2)</td>
<td align="left">0.753</td>
<td align="left">0.829</td>
<td align="left">0.775</td>
<td align="left">0.730</td>
<td align="left">0.506</td>
</tr>
<tr>
<td align="left">Model 3 (Parts 1, 2, 4, 5 : Part 3)</td>
<td align="left">0.740</td>
<td align="left">0.825</td>
<td align="left">
<bold>0.810</bold>
</td>
<td align="left">0.670</td>
<td align="left">0.485</td>
</tr>
<tr>
<td align="left">Model 4 (Parts 1, 2, 3, 5 : Part 4)</td>
<td align="left">0.776</td>
<td align="left">0.831</td>
<td align="left">0.790</td>
<td align="left">0.765</td>
<td align="left">
<bold>0.555</bold>
</td>
</tr>
<tr>
<td align="left">Model 5 (Parts 1, 2, 3, 4 : Part 5)</td>
<td align="left">0.746</td>
<td align="left">0.821</td>
<td align="left">0.745</td>
<td align="left">0.750</td>
<td align="left">0.495</td>
</tr>
<tr>
<td align="left">Ensemble Model</td>
<td align="left">
<bold>0.765</bold>
</td>
<td align="left">
<bold>0.834</bold>
</td>
<td align="left">0.790</td>
<td align="left">0.740</td>
<td align="left">0.531</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>The highest value for each metric is in bold</p>
</table-wrap-foot>
</table-wrap>
<table-wrap id="Tab2">
<label>Table 2</label>
<caption>
<p>Independent test identifying enhancers and non-enhancers under 10 trials</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">No. of Trials</th>
<th align="left">ACC</th>
<th align="left">AUC</th>
<th align="left">SN</th>
<th align="left">SP</th>
<th align="left">MCC</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">1</td>
<td align="left">0.768</td>
<td align="left">0.831</td>
<td align="left">0.780</td>
<td align="left">0.755</td>
<td align="left">0.535</td>
</tr>
<tr>
<td align="left">2</td>
<td align="left">0.765</td>
<td align="left">0.834</td>
<td align="left">0.790</td>
<td align="left">0.740</td>
<td align="left">0.531</td>
</tr>
<tr>
<td align="left">3</td>
<td align="left">0.770</td>
<td align="left">0.835</td>
<td align="left">0.775</td>
<td align="left">0.765</td>
<td align="left">0.540</td>
</tr>
<tr>
<td align="left">4</td>
<td align="left">0.768</td>
<td align="left">0.831</td>
<td align="left">0.795</td>
<td align="left">0.740</td>
<td align="left">0.536</td>
</tr>
<tr>
<td align="left">5</td>
<td align="left">0.773</td>
<td align="left">0.832</td>
<td align="left">0.785</td>
<td align="left">0.760</td>
<td align="left">0.545</td>
</tr>
<tr>
<td align="left">6</td>
<td align="left">0.778</td>
<td align="left">0.837</td>
<td align="left">0.800</td>
<td align="left">0.755</td>
<td align="left">0.556</td>
</tr>
<tr>
<td align="left">7</td>
<td align="left">0.773</td>
<td align="left">0.832</td>
<td align="left">0.780</td>
<td align="left">0.765</td>
<td align="left">0.545</td>
</tr>
<tr>
<td align="left">8</td>
<td align="left">0.773</td>
<td align="left">0.832</td>
<td align="left">0.780</td>
<td align="left">0.765</td>
<td align="left">0.545</td>
</tr>
<tr>
<td align="left">9</td>
<td align="left">0.758</td>
<td align="left">0.830</td>
<td align="left">0.785</td>
<td align="left">0.730</td>
<td align="left">0.516</td>
</tr>
<tr>
<td align="left">10</td>
<td align="left">0.763</td>
<td align="left">0.830</td>
<td align="left">0.780</td>
<td align="left">0.745</td>
<td align="left">0.525</td>
</tr>
<tr>
<td align="left">Mean</td>
<td align="left">0.769</td>
<td align="left">0.832</td>
<td align="left">0.785</td>
<td align="left">0.752</td>
<td align="left">0.537</td>
</tr>
<tr>
<td align="left">SD</td>
<td align="left">0.006</td>
<td align="left">0.002</td>
<td align="left">0.008</td>
<td align="left">0.013</td>
<td align="left">0.011</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="Tab3">
<label>Table 3</label>
<caption>
<p>Results of an enhancer classification trial (trial 9 in Table 
<xref rid="Tab4" ref-type="table">4</xref>
) on the independent test dataset</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Training : Validation (Ratio 4:1)</th>
<th align="left">ACC</th>
<th align="left">AUC</th>
<th align="left">SN</th>
<th align="left">SP</th>
<th align="left">MCC</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Model 1 (Parts 2, 3, 4, 5 : Part 1)</td>
<td align="left">
<bold>0.700</bold>
</td>
<td align="left">
<bold>0.764</bold>
</td>
<td align="left">0.780</td>
<td align="left">0.620</td>
<td align="left">0.405</td>
</tr>
<tr>
<td align="left">Model 2 (Parts 1, 3, 4, 5 : Part 2)</td>
<td align="left">0.660</td>
<td align="left">0.740</td>
<td align="left">0.720</td>
<td align="left">0.600</td>
<td align="left">0.322</td>
</tr>
<tr>
<td align="left">Model 3 (Parts 1, 2, 4, 5 : Part 3)</td>
<td align="left">0.670</td>
<td align="left">0.730</td>
<td align="left">
<bold>0.850</bold>
</td>
<td align="left">0.490</td>
<td align="left">0.364</td>
</tr>
<tr>
<td align="left">Model 4 (Parts 1, 2, 3, 5 : Part 4)</td>
<td align="left">0.665</td>
<td align="left">0.715</td>
<td align="left">0.660</td>
<td align="left">
<bold>0.670</bold>
</td>
<td align="left">0.330</td>
</tr>
<tr>
<td align="left">Model 5 (Parts 1, 2, 3, 4 : Part 5)</td>
<td align="left">0.600</td>
<td align="left">0.681</td>
<td align="left">0.680</td>
<td align="left">0.520</td>
<td align="left">0.203</td>
</tr>
<tr>
<td align="left">Ensemble Model</td>
<td align="left">0.695</td>
<td align="left">0.759</td>
<td align="left">0.840</td>
<td align="left">0.550</td>
<td align="left">
<bold>0.408</bold>
</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>The highest value for each metric is in bold</p>
</table-wrap-foot>
</table-wrap>
<table-wrap id="Tab4">
<label>Table 4</label>
<caption>
<p>Independent test classifying strong enhancers and weak enhancers under 10 trials</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">No. of Trials</th>
<th align="left">ACC</th>
<th align="left">AUC</th>
<th align="left">SN</th>
<th align="left">SP</th>
<th align="left">MCC</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">1</td>
<td align="left">0.650</td>
<td align="left">0.728</td>
<td align="left">0.680</td>
<td align="left">0.620</td>
<td align="left">0.301</td>
</tr>
<tr>
<td align="left">2</td>
<td align="left">0.710</td>
<td align="left">0.795</td>
<td align="left">0.880</td>
<td align="left">0.540</td>
<td align="left">0.447</td>
</tr>
<tr>
<td align="left">3</td>
<td align="left">0.695</td>
<td align="left">0.751</td>
<td align="left">0.920</td>
<td align="left">0.470</td>
<td align="left">0.437</td>
</tr>
<tr>
<td align="left">4</td>
<td align="left">0.670</td>
<td align="left">0.749</td>
<td align="left">0.750</td>
<td align="left">0.590</td>
<td align="left">0.344</td>
</tr>
<tr>
<td align="left">5</td>
<td align="left">0.660</td>
<td align="left">0.724</td>
<td align="left">0.720</td>
<td align="left">0.600</td>
<td align="left">0.322</td>
</tr>
<tr>
<td align="left">6</td>
<td align="left">0.690</td>
<td align="left">0.779</td>
<td align="left">0.810</td>
<td align="left">0.570</td>
<td align="left">0.391</td>
</tr>
<tr>
<td align="left">7</td>
<td align="left">0.670</td>
<td align="left">0.736</td>
<td align="left">0.740</td>
<td align="left">0.600</td>
<td align="left">0.343</td>
</tr>
<tr>
<td align="left">8</td>
<td align="left">0.660</td>
<td align="left">0.728</td>
<td align="left">0.750</td>
<td align="left">0.570</td>
<td align="left">0.325</td>
</tr>
<tr>
<td align="left">9</td>
<td align="left">0.695</td>
<td align="left">0.759</td>
<td align="left">0.840</td>
<td align="left">0.550</td>
<td align="left">0.408</td>
</tr>
<tr>
<td align="left">10</td>
<td align="left">0.675</td>
<td align="left">0.735</td>
<td align="left">0.820</td>
<td align="left">0.530</td>
<td align="left">0.366</td>
</tr>
<tr>
<td align="left">Mean</td>
<td align="left">0.678</td>
<td align="left">0.748</td>
<td align="left">0.791</td>
<td align="left">0.564</td>
<td align="left">0.368</td>
</tr>
<tr>
<td align="left">SD</td>
<td align="left">0.019</td>
<td align="left">0.024</td>
<td align="left">0.076</td>
<td align="left">0.044</td>
<td align="left">0.050</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
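<p>The five-part rotation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy labels, and the use of Python's built-in random module are assumptions. Only the five-way stratified split, the seed, and the 4:1 training-to-validation rotation come from the text.</p>
<preformat>
```python
import random

def stratified_folds(labels, n_folds=5, seed=3):
    """Assign sample indices to folds so that each fold keeps the class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin into the folds.
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds

def rotations(folds):
    """Yield (training indices, validation indices), one pair per fold."""
    for k, val in enumerate(folds):
        train = [i for j, f in enumerate(folds) if j != k for i in f]
        yield train, val

# Toy labels: 10 positive and 10 negative samples (made up for illustration).
labels = [1] * 10 + [0] * 10
folds = stratified_folds(labels, seed=3)
splits = list(rotations(folds))
for train, val in splits:
    print(len(train), len(val), sum(labels[i] for i in val))
```
</preformat>
<p>Each of the five rotations would train one CNN on the four training parts and monitor it on the remaining part.</p>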
<sec id="Sec5">
<title>Layer 1: enhancer identification</title>
<p>From five parts split from the development set, after 5 rotations, 5 trained CNN models were obtained to build up an ensemble model. As seen from Table 
<xref rid="Tab1" ref-type="table">1</xref>
, the accuracy of these models varies between 0.740 and 0.776 with a very small standard deviation. For the AUC, all values are over 0.800, with the highest AUC value being 0.831. Model 3 shows a trade-off between sensitivity on the one hand and specificity and MCC on the other: it obtains the highest sensitivity but the lowest specificity and MCC, which leads to higher standard deviations in these metrics. In terms of specificity and MCC, models 1 and 4 rank first, respectively. Although some metrics of single CNN models are slightly higher than those of the ensemble model, the ensemble model remains the most efficient overall. In comparison, the specificity of the ensemble model is only smaller than that of model 1, while its sensitivity and MCC are only smaller than those of models 3 and 4, respectively. To observe the variation in all the evaluation metrics of the ensemble model, 10 trials were done on the independent test set (Fig. 
<xref rid="Fig2" ref-type="fig">2</xref>
a and Table 
<xref rid="Tab2" ref-type="table">2</xref>
). The results indicate a very small variation in evaluation metrics among the 10 trials with no outlier found, especially for the AUC, the least varied metric. The sensitivity is the second least varied metric, followed by the accuracy and specificity. Moreover, the small variation of the MCC implies highly stable prediction over many trials.
<fig id="Fig2">
<label>Fig. 2</label>
<caption>
<p>Variation in evaluation metrics from 10 trials of independent test for
<bold>a</bold>
Layer 1: Enhancer Identification and
<bold>b</bold>
Layer 2: Enhancer Classification</p>
</caption>
<graphic xlink:href="12864_2019_6336_Fig2_HTML" id="MO2"></graphic>
</fig>
</p>
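<p>One common way to build an ensemble from five trained models is to average their predicted probabilities and threshold the mean. The sketch below assumes that combination rule, which this section does not state. The function name, the 0.5 threshold, and the probabilities are hypothetical placeholders rather than outputs of the actual CNNs.</p>
<preformat>
```python
def ensemble_predict(model_probs, threshold=0.5):
    """Average per-model probabilities and threshold the mean (assumed rule)."""
    n_models = len(model_probs)
    n_samples = len(model_probs[0])
    mean_probs = [
        sum(m[i] for m in model_probs) / n_models for i in range(n_samples)
    ]
    labels = [1 if p >= threshold else 0 for p in mean_probs]
    return mean_probs, labels

# Made-up probabilities from five hypothetical models for three sequences.
probs = [
    [0.9, 0.4, 0.2],
    [0.8, 0.6, 0.1],
    [0.7, 0.3, 0.3],
    [0.6, 0.7, 0.2],
    [0.9, 0.4, 0.1],
]
mean_probs, labels = ensemble_predict(probs)
print(labels)
```
</preformat>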
</sec>
<sec id="Sec6">
<title>Layer 2: enhancer classification</title>
<p>Similarly, layer 2 also had its development set split into five parts containing strong enhancers and weak enhancers in an equal ratio, of which 4 parts were used as a training set and 1 part as a validation set. The ensemble model was finally built up from the five separate CNN models (Table 
<xref rid="Tab3" ref-type="table">3</xref>
). Generally, the variation in evaluation metrics among the 5 models for enhancer classification is greater than that among the five models for enhancer identification. This can be explained by the different numbers of samples in the two prediction layers: the development set used in layer 1 is considerably larger than the one used in layer 2. Furthermore, differences between enhancers and non-enhancers are more specific than those between strong enhancers and weak enhancers (Fig. 
<xref rid="Fig1" ref-type="fig">1</xref>
a). Regardless of their strength, strong enhancers and weak enhancers are still functional enhancers sharing more structural similarities (Fig. 
<xref rid="Fig1" ref-type="fig">1</xref>
b). The sensitivity of the ensemble model holds the first place, followed by the AUC, accuracy, and specificity. The MCC of the ensemble model is only 0.408, but it is the highest value compared to those of the 5 single CNN models. Among these evaluation metrics, the AUC is the most stable, with the smallest variation compared to the others. The accuracy and AUC of model 1 are higher than those of the rest of the models. Models 3 and 4 have the highest sensitivity and highest specificity, respectively. Although the specificity of the ensemble model is lower than that of some single CNN models, its high sensitivity promises an effective computational framework because correctly detecting strong enhancers is arguably more important than correctly finding weak ones. The MCC of the enhancer classification model varies more broadly than that of the enhancer identification model. To observe the variation in all evaluation metrics of the ensemble model, 10 trials were done on the independent test set (Fig. 
<xref rid="Fig2" ref-type="fig">2</xref>
b and Table 
<xref rid="Tab4" ref-type="table">4</xref>
). The results indicate a quite large variation in sensitivity and MCC among the 10 trials. Despite the large variation, no outlier is found in any of the evaluation metrics. The averaged sensitivity of the model is significantly greater than the other metrics, but its variation is also higher than that of the rest of the metrics. The MCC is the least varied metric, followed by the AUC, accuracy, and specificity.</p>
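<p>The Mean and SD rows of Table 4 can be checked directly from the per-trial values. The sketch below does so for the MCC and accuracy columns. It assumes the SD row reports the sample standard deviation (n − 1 denominator), which is not stated in this section, and the helper name is hypothetical.</p>
<preformat>
```python
import statistics

# MCC and ACC of the ten layer-2 trials, copied from Table 4.
mcc_trials = [0.301, 0.447, 0.437, 0.344, 0.322, 0.391, 0.343, 0.325, 0.408, 0.366]
acc_trials = [0.650, 0.710, 0.695, 0.670, 0.660, 0.690, 0.670, 0.660, 0.695, 0.675]

def summarize(values):
    """Return (mean, sample standard deviation) of a list of trial scores."""
    # statistics.stdev uses the n - 1 denominator (assumed to match the SD row).
    return statistics.mean(values), statistics.stdev(values)

mcc_mean, mcc_sd = summarize(mcc_trials)
acc_mean, acc_sd = summarize(acc_trials)
print(f"MCC: mean={mcc_mean:.4f}, SD={mcc_sd:.4f}")
print(f"ACC: mean={acc_mean:.4f}, SD={acc_sd:.4f}")
```
</preformat>
<p>Under this assumption, the computed values agree with the tabulated Mean and SD rows to within rounding.</p>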
</sec>
</sec>
<sec id="Sec7">
<title>Comparative analysis</title>
<p>Table 
<xref rid="Tab5" ref-type="table">5</xref>
gives a detailed comparison of model performance between iEnhancer-ECNN and existing state-of-the-art methods from previous studies. Except for specificity, iEnhancer-ECNN achieves a significant improvement in model performance on the rest of the evaluation metrics. For both layers 1 and 2, the proposed method attains a lower specificity than the best of the methods introduced in previous studies. On the other hand, remarkable improvements in the AUC, sensitivity, and MCC are observed, especially in the model of layer 2, with a boost of about 11.0%, 46.5%, and 65.0%, respectively. A significant increase in the MCC indicates that the proposed method considerably improves the model stability as well as overall performance in comparison with the state-of-the-art methods, which have relatively small MCCs. This improvement is essential in model development to confirm reliability in the binary classification problem. The MCC is considered more informative than the accuracy because it takes into account the proportions of all four categories (TP, TN, FP, and FN) of the confusion matrix, giving a balanced evaluation in model assessment [
<xref ref-type="bibr" rid="CR30">30</xref>
]. Overall, iEnhancer-ECNN performs better than previously proposed methods, with gains in most of the evaluation metrics.
<table-wrap id="Tab5">
<label>Table 5</label>
<caption>
<p>Comparative analysis between results of the proposed method and other studies</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left"></th>
<th align="left">Method</th>
<th align="left">ACC</th>
<th align="left">AUC</th>
<th align="left">SN</th>
<th align="left">SP</th>
<th align="left">MCC</th>
<th align="left">Source</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Enhancer Identification</td>
<td align="left">iEnhancer-2L</td>
<td align="left">0.730</td>
<td align="left">0.806</td>
<td align="left">0.710</td>
<td align="left">0.750</td>
<td align="left">0.460</td>
<td align="left">Liu et al., 2016</td>
</tr>
<tr>
<td align="left"></td>
<td align="left">EnhancerPred</td>
<td align="left">0.740</td>
<td align="left">0.801</td>
<td align="left">0.735</td>
<td align="left">0.745</td>
<td align="left">0.480</td>
<td align="left">Jia and He, 2016</td>
</tr>
<tr>
<td align="left"></td>
<td align="left">iEnhancer-EL</td>
<td align="left">0.748</td>
<td align="left">0.817</td>
<td align="left">0.710</td>
<td align="left">
<bold>0.785</bold>
</td>
<td align="left">0.496</td>
<td align="left">Liu et al., 2018</td>
</tr>
<tr>
<td align="left"></td>
<td align="left">iEnhancer-ECNN</td>
<td align="left">
<bold>0.769</bold>
</td>
<td align="left">
<bold>0.832</bold>
</td>
<td align="left">
<bold>0.785</bold>
</td>
<td align="left">0.752</td>
<td align="left">
<bold>0.537</bold>
</td>
<td align="left">This study</td>
</tr>
<tr>
<td align="left">Enhancer Classification</td>
<td align="left">iEnhancer-2L</td>
<td align="left">0.605</td>
<td align="left">0.668</td>
<td align="left">0.470</td>
<td align="left">
<bold>0.740</bold>
</td>
<td align="left">0.218</td>
<td align="left">Liu et al., 2016</td>
</tr>
<tr>
<td align="left"></td>
<td align="left">EnhancerPred</td>
<td align="left">0.550</td>
<td align="left">0.579</td>
<td align="left">0.450</td>
<td align="left">0.650</td>
<td align="left">0.102</td>
<td align="left">Jia and He, 2016</td>
</tr>
<tr>
<td align="left"></td>
<td align="left">iEnhancer-EL</td>
<td align="left">0.610</td>
<td align="left">0.680</td>
<td align="left">0.540</td>
<td align="left">0.680</td>
<td align="left">0.222</td>
<td align="left">Liu et al., 2018</td>
</tr>
<tr>
<td align="left"></td>
<td align="left">iEnhancer-ECNN</td>
<td align="left">
<bold>0.678</bold>
</td>
<td align="left">
<bold>0.748</bold>
</td>
<td align="left">
<bold>0.791</bold>
</td>
<td align="left">0.564</td>
<td align="left">
<bold>0.368</bold>
</td>
<td align="left">This study</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Values which are significantly higher than the others are in bold</p>
</table-wrap-foot>
</table-wrap>
</p>
<p>CNNs and OHE have been used in prediction of enhancer-promoter interactions [
<xref ref-type="bibr" rid="CR31">31</xref>
] and enhancer identification (layer 1 only) [
<xref ref-type="bibr" rid="CR32">32</xref>
]. However, CNNs can only detect local features from OHE. Our method goes beyond that by including global features of the whole sequence through the statistics of 4 different types of
<italic>k</italic>
-mers. In addition, in ensemble learning, the training subsets of the individual CNN models together cover the whole development set. This leads to better generalization of the ensemble model compared to each individual CNN model. This likely explains why iEnhancer-ECNN outperforms other previously proposed methods using the same dataset, with significant improvements in most of the evaluation metrics.</p>
</sec>
</sec>
<sec id="Sec8" sec-type="conclusion">
<title>Conclusion</title>
<p>iEnhancer-ECNN, which uses ensembles of convolutional neural networks combined with one-hot encoding and the
<italic>k</italic>
-mer descriptor as the sequence encoding scheme, is an efficient computational framework for identifying enhancers and classifying their strength. The results indicate that the proposed method can robustly and effectively address difficulties in enhancer identification and classification, with significant improvements in most of the evaluation metrics compared to other state-of-the-art methods using the same benchmark dataset. In the future, other sequence encoding schemes and advanced ensemble learning methods will be explored so that a trained model can automatically aggregate the predictions of all the CNN models.</p>
</sec>
<sec id="Sec9">
<title>Methods</title>
<sec id="Sec10">
<title>Benchmark dataset</title>
<p>The dataset used in our experiments was collected from Liu et al.’s studies [
<xref ref-type="bibr" rid="CR25">25</xref>
,
<xref ref-type="bibr" rid="CR27">27</xref>
]. This dataset was also used in the development of iEnhancer-2L [
<xref ref-type="bibr" rid="CR25">25</xref>
], EnhancerPred [
<xref ref-type="bibr" rid="CR26">26</xref>
] and iEnhancer-EL [
<xref ref-type="bibr" rid="CR27">27</xref>
]. In this dataset, information about enhancers from 9 different cell lines was collected and DNA sequences were extracted as short fragments of the same length, 200 bp. The CD-HIT software [
<xref ref-type="bibr" rid="CR33">33</xref>
] was then used to remove sequences whose pairwise similarity exceeded 20%. The dataset consists of a development (or cross-validation) set and an independent test set. The development set contains 1,484 enhancer samples (742 strong and 742 weak enhancers) and 1,484 non-enhancer samples. The independent test set contains 200 enhancers (100 strong and 100 weak enhancers) and 200 non-enhancers. As in other studies, we used the development set to construct two models for two problems, enhancer identification (layer 1) and enhancer classification (layer 2), and then used the independent test set to test the models. For each layer, we first randomly divided the development set into 5 folds (or parts) using stratified sampling. Each fold was used in turn as the validation set while the remaining 4 folds were used as the training set for a CNN model. The five trained CNN models were then combined to create an ensemble model for the layer, which was evaluated on samples from the independent test set (Fig. 
<xref rid="Fig3" ref-type="fig">3</xref>
). This whole process, including data partitioning, model training and model testing, was repeated 10 times to observe the variation in model performance across the 10 trials. Tables 
<xref rid="Tab6" ref-type="table">6</xref>
and
<xref rid="Tab7" ref-type="table">7</xref>
present the data distribution in 5 folds used in model training for layers 1 and 2, respectively.
<fig id="Fig3">
<label>Fig. 3</label>
<caption>
<p>Overview of the model development</p>
</caption>
<graphic xlink:href="12864_2019_6336_Fig3_HTML" id="MO3"></graphic>
</fig>
<table-wrap id="Tab6">
<label>Table 6</label>
<caption>
<p>Data distribution of 5 parts in the development set for identifying enhancers and non-enhancers</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Part</th>
<th align="left">Non-enhancers</th>
<th align="left" colspan="2">Enhancers</th>
</tr>
<tr>
<th align="left"></th>
<th align="left"></th>
<th align="left">Strong</th>
<th align="left">Weak</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">1</td>
<td align="left">301</td>
<td align="left">151</td>
<td align="left">142</td>
</tr>
<tr>
<td align="left">2</td>
<td align="left">295</td>
<td align="left">153</td>
<td align="left">146</td>
</tr>
<tr>
<td align="left">3</td>
<td align="left">295</td>
<td align="left">148</td>
<td align="left">151</td>
</tr>
<tr>
<td align="left">4</td>
<td align="left">292</td>
<td align="left">153</td>
<td align="left">149</td>
</tr>
<tr>
<td align="left">5</td>
<td align="left">301</td>
<td align="left">137</td>
<td align="left">154</td>
</tr>
<tr>
<td align="left">
<bold>Total</bold>
</td>
<td align="left">
<bold>1484</bold>
</td>
<td align="left">
<bold>742</bold>
</td>
<td align="left">
<bold>742</bold>
</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="Tab7">
<label>Table 7</label>
<caption>
<p>Data distribution of 5 parts in the development set for classifying strong enhancers and weak enhancers</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Part</th>
<th align="left" colspan="2">Number of enhancers</th>
</tr>
<tr>
<th align="left"></th>
<th align="left">Strong</th>
<th align="left">Weak</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">1</td>
<td align="left">150</td>
<td align="left">147</td>
</tr>
<tr>
<td align="left">2</td>
<td align="left">154</td>
<td align="left">143</td>
</tr>
<tr>
<td align="left">3</td>
<td align="left">146</td>
<td align="left">151</td>
</tr>
<tr>
<td align="left">4</td>
<td align="left">148</td>
<td align="left">149</td>
</tr>
<tr>
<td align="left">5</td>
<td align="left">144</td>
<td align="left">152</td>
</tr>
<tr>
<td align="left">
<bold>Total</bold>
</td>
<td align="left">
<bold>742</bold>
</td>
<td align="left">
<bold>742</bold>
</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
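<p>The partitioning scheme described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the function names <monospace>stratified_folds</monospace> and <monospace>train_validation_splits</monospace> are hypothetical, and the round-robin assignment is an assumption that yields near-equal class counts per fold, so it will not reproduce the exact fold sizes in Tables 6 and 7.</p>
<preformat preformat-type="code"><![CDATA[
```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Assign every sample index to one of n_folds folds, class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices round-robin so that each fold
        # receives a near-equal share of this class.
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds


def train_validation_splits(folds):
    """Yield (train, validation) index lists, using each fold once for validation."""
    for k, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != k for idx in fold]
        yield train, validation


# Layer 1 development set: 1,484 enhancers (label 1) and 1,484 non-enhancers (label 0).
labels = [1] * 1484 + [0] * 1484
folds = stratified_folds(labels)
splits = list(train_validation_splits(folds))
```
]]></preformat>
<p>Each of the five (train, validation) pairs would be used to train one CNN model. Repeating the procedure with a different <monospace>seed</monospace> corresponds to one of the 10 trials.</p>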
</sec>
<sec id="Sec11">
<title>Sequence encoding scheme</title>
<p>We used one-hot encoding (OHE) and
<italic>k</italic>
-mer descriptor to encode each input sequence for our CNN model. Every enhancer in this study has a length of 200 bp and is composed of four types of nucleic acid bases: Adenine (A), Guanine (G), Cytosine (C), and Thymine (T). Adenine (A) and Guanine (G) are purines while Cytosine (C) and Thymine (T) are pyrimidines. For OHE, each character was transformed into a vector of 4 binary values, one per type of nucleic acid. In each vector, three values are 0 and the value at the position of the encoded nucleic acid is 1 (Table 
<xref rid="Tab8" ref-type="table">8</xref>
).
<table-wrap id="Tab8">
<label>Table 8</label>
<caption>
<p>The corresponding code of each nucleic acid in one-hot encoding</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Nucleic Acid</th>
<th align="left">Code</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">‘A’</td>
<td align="left">[ 1 0 0 0 ]</td>
</tr>
<tr>
<td align="left">‘C’</td>
<td align="left">[ 0 1 0 0 ]</td>
</tr>
<tr>
<td align="left">‘G’</td>
<td align="left">[ 0 0 1 0 ]</td>
</tr>
<tr>
<td align="left">‘T’</td>
<td align="left">[ 0 0 0 1 ]</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
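<p>As an illustration of Table 8, a minimal sketch of the one-hot encoding step is given below. The names <monospace>ONE_HOT</monospace> and <monospace>one_hot_encode</monospace> are hypothetical, and the sketch assumes the sequence contains only the characters A, C, G and T (any other character raises a <monospace>KeyError</monospace>).</p>
<preformat preformat-type="code"><![CDATA[
```python
# Codes taken from Table 8.
ONE_HOT = {
    "A": [1, 0, 0, 0],
    "C": [0, 1, 0, 0],
    "G": [0, 0, 1, 0],
    "T": [0, 0, 0, 1],
}


def one_hot_encode(sequence):
    """Return one 4-element binary row per nucleic acid in the sequence."""
    return [list(ONE_HOT[base]) for base in sequence.upper()]


encoded = one_hot_encode("ACGT")
```
]]></preformat>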
<p>In addition to OHE, we also used
<italic>k</italic>
-mers which are the occurrence frequencies of
<italic>k</italic>
neighboring nucleic acids. With respect to the nucleic acid
<italic>N</italic>
<sub>
<italic>i</italic>
</sub>
in a DNA sequence
<italic>S</italic>
with length
<italic>L</italic>
(
<italic>i</italic>
=1..
<italic>L</italic>
and
<italic>L</italic>
=200 in this study), in addition to the 4 binary values encoding
<italic>N</italic>
<sub>
<italic>i</italic>
</sub>
by OHE, the following 4 values
<italic>x,y</italic>
,
<italic>z,t</italic>
were formed and added to the encoding of
<italic>N</italic>
<sub>
<italic>i</italic>
</sub>
:
<list list-type="bullet">
<list-item>
<p>1-mer feature:
<inline-formula id="IEq1">
<alternatives>
<tex-math id="M1">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$x = \frac {{\# N_{i} \, \text {in} \, S}}{L}$\end{document}</tex-math>
<mml:math id="M2">
<mml:mi>x</mml:mi>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mi>#</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mspace width="0.3em"></mml:mspace>
<mml:mtext>in</mml:mtext>
<mml:mspace width="0.3em"></mml:mspace>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>L</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:math>
<inline-graphic xlink:href="12864_2019_6336_Article_IEq1.gif"></inline-graphic>
</alternatives>
</inline-formula>
,
<italic>N</italic>
<sub>
<italic>i</italic>
</sub>
∈{
<italic>A,C</italic>
,
<italic>G,T</italic>
}</p>
</list-item>
<list-item>
<p>2-mer (right) feature:
<disp-formula id="Equa">
<alternatives>
<tex-math id="M3">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$ y = \left\{ {\begin{array}{cc} {\frac{{\# N_{i,i + 1} \, \text{in} \, S}}{{L - 1}}} & {\text{if} \,\, i < L} \\ 0 & {\text{if}\, \, i = L} \\ \end{array}} \right. $$ \end{document}</tex-math>
<mml:math id="M4">
<mml:mrow>
<mml:mi>y</mml:mi>
<mml:mo>=</mml:mo>
<mml:mfenced close="" open="{" separators="">
<mml:mrow>
<mml:mtable class="array" columnalign="left">
<mml:mtr>
<mml:mtd>
<mml:mfrac>
<mml:mrow>
<mml:mi>#</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mspace width="0.3em"></mml:mspace>
<mml:mtext>in</mml:mtext>
<mml:mspace width="0.3em"></mml:mspace>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:mfrac>
</mml:mtd>
<mml:mtd>
<mml:mtext>if</mml:mtext>
<mml:mspace width="0.3em"></mml:mspace>
<mml:mspace width="0.3em"></mml:mspace>
<mml:mi>i</mml:mi>
<mml:mo>&lt;</mml:mo>
<mml:mi>L</mml:mi>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mn>0</mml:mn>
</mml:mtd>
<mml:mtd>
<mml:mtext>if</mml:mtext>
<mml:mspace width="0.3em"></mml:mspace>
<mml:mspace width="0.3em"></mml:mspace>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mi>L</mml:mi>
</mml:mtd>
</mml:mtr>
<mml:mtr></mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:math>
<graphic xlink:href="12864_2019_6336_Article_Equa.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
<disp-formula id="Equb">
<alternatives>
<tex-math id="M5">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$ N_{i,i + 1} \in \left\{ {AA,AC,AG,...,TG,TT} \right\} $$ \end{document}</tex-math>
<mml:math id="M6">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>∈</mml:mo>
<mml:mfenced close="}" open="{" separators="">
<mml:mrow>
<mml:mtext mathvariant="italic">AA</mml:mtext>
<mml:mo>,</mml:mo>
<mml:mtext mathvariant="italic">AC</mml:mtext>
<mml:mo>,</mml:mo>
<mml:mtext mathvariant="italic">AG</mml:mtext>
<mml:mo>,</mml:mo>
<mml:mi>...</mml:mi>
<mml:mo>,</mml:mo>
<mml:mtext mathvariant="italic">TG</mml:mtext>
<mml:mo>,</mml:mo>
<mml:mtext mathvariant="italic">TT</mml:mtext>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:math>
<graphic xlink:href="12864_2019_6336_Article_Equb.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
</list-item>
<list-item>
<p>2-mer (left) feature:
<disp-formula id="Equc">
<alternatives>
<tex-math id="M7">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$ z = \left\{ {\begin{array}{cc} {\frac{{\# N_{i-1,i} \, \text{in} \, S}}{{L - 1}}} & {\text{if} \,\, i > 1} \\ 0 & {\text{if} \,\, i = 1} \\ \end{array}} \right. $$ \end{document}</tex-math>
<mml:math id="M8">
<mml:mrow>
<mml:mi>z</mml:mi>
<mml:mo>=</mml:mo>
<mml:mfenced close="" open="{" separators="">
<mml:mrow>
<mml:mtable class="array" columnalign="left">
<mml:mtr>
<mml:mtd>
<mml:mfrac>
<mml:mrow>
<mml:mi>#</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mspace width="0.3em"></mml:mspace>
<mml:mtext>in</mml:mtext>
<mml:mspace width="0.3em"></mml:mspace>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:mfrac>
</mml:mtd>
<mml:mtd>
<mml:mtext>if</mml:mtext>
<mml:mspace width="0.3em"></mml:mspace>
<mml:mspace width="0.3em"></mml:mspace>
<mml:mi>i</mml:mi>
<mml:mo>></mml:mo>
<mml:mn>1</mml:mn>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mn>0</mml:mn>
</mml:mtd>
<mml:mtd>
<mml:mtext>if</mml:mtext>
<mml:mspace width="0.3em"></mml:mspace>
<mml:mspace width="0.3em"></mml:mspace>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mtd>
</mml:mtr>
<mml:mtr></mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:math>
<graphic xlink:href="12864_2019_6336_Article_Equc.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
<disp-formula id="Equd">
<alternatives>
<tex-math id="M9">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$ N_{i-1,i} \in \left\{ {AA,AC,AG,...,TG,TT} \right\} $$ \end{document}</tex-math>
<mml:math id="M10">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>∈</mml:mo>
<mml:mfenced close="}" open="{" separators="">
<mml:mrow>
<mml:mtext mathvariant="italic">AA</mml:mtext>
<mml:mo>,</mml:mo>
<mml:mtext mathvariant="italic">AC</mml:mtext>
<mml:mo>,</mml:mo>
<mml:mtext mathvariant="italic">AG</mml:mtext>
<mml:mo>,</mml:mo>
<mml:mi>...</mml:mi>
<mml:mo>,</mml:mo>
<mml:mtext mathvariant="italic">TG</mml:mtext>
<mml:mo>,</mml:mo>
<mml:mtext mathvariant="italic">TT</mml:mtext>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:math>
<graphic xlink:href="12864_2019_6336_Article_Equd.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
</list-item>
<list-item>
<p>3-mer feature:
<disp-formula id="Eque">
<alternatives>
<tex-math id="M11">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$ t = \left\{ {\begin{array}{cc} {\frac{{\# N_{i,i+1,i+2} \, \text{in} \, S}}{{L - 2}}} & {\text{if} \,\, i < L-1} \\ 0 & {\text{otherwise }} \\ \end{array}} \right. $$ \end{document}</tex-math>
<mml:math id="M12">
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>=</mml:mo>
<mml:mfenced close="" open="{" separators="">
<mml:mrow>
<mml:mtable class="array" columnalign="left">
<mml:mtr>
<mml:mtd>
<mml:mfrac>
<mml:mrow>
<mml:mi>#</mml:mi>
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mspace width="0.3em"></mml:mspace>
<mml:mtext>in</mml:mtext>
<mml:mspace width="0.3em"></mml:mspace>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:mfrac>
</mml:mtd>
<mml:mtd>
<mml:mtext>if</mml:mtext>
<mml:mspace width="0.3em"></mml:mspace>
<mml:mspace width="0.3em"></mml:mspace>
<mml:mi>i</mml:mi>
<mml:mo>&lt;</mml:mo>
<mml:mi>L</mml:mi>
<mml:mo>−</mml:mo>
<mml:mn>1</mml:mn>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mn>0</mml:mn>
</mml:mtd>
<mml:mtd>
<mml:mtext>otherwise</mml:mtext>
</mml:mtd>
</mml:mtr>
<mml:mtr></mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:math>
<graphic xlink:href="12864_2019_6336_Article_Eque.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
<disp-formula id="Equf">
<alternatives>
<tex-math id="M13">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$ N_{i,i+1,i+2} \in \left\{ {AAA,AAC,AAG,...,TTG,TTT} \right\} $$ \end{document}</tex-math>
<mml:math id="M14">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>∈</mml:mo>
<mml:mfenced close="}" open="{" separators="">
<mml:mrow>
<mml:mtext mathvariant="italic">AAA</mml:mtext>
<mml:mo>,</mml:mo>
<mml:mtext mathvariant="italic">AAC</mml:mtext>
<mml:mo>,</mml:mo>
<mml:mtext mathvariant="italic">AAG</mml:mtext>
<mml:mo>,</mml:mo>
<mml:mi>...</mml:mi>
<mml:mo>,</mml:mo>
<mml:mtext mathvariant="italic">TTG</mml:mtext>
<mml:mo>,</mml:mo>
<mml:mtext mathvariant="italic">TTT</mml:mtext>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:math>
<graphic xlink:href="12864_2019_6336_Article_Equf.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
</list-item>
</list>
</p>
<p>Thus, each enhancer sample with length 200 is encoded by a matrix of size 200×8.</p>
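<p>The full encoding can be sketched as below, under stated assumptions: <italic>k</italic>-mer occurrences are counted with overlapping sliding windows (consistent with the denominators <italic>L</italic>−1 and <italic>L</italic>−2 in the formulas above), positions are 0-based in the code while the text uses 1-based positions, and the names <monospace>kmer_counts</monospace> and <monospace>encode_sequence</monospace> are hypothetical rather than taken from the authors' source code.</p>
<preformat preformat-type="code"><![CDATA[
```python
from collections import Counter

ONE_HOT = {
    "A": [1, 0, 0, 0],
    "C": [0, 1, 0, 0],
    "G": [0, 0, 1, 0],
    "T": [0, 0, 0, 1],
}


def kmer_counts(sequence, k):
    """Count all overlapping k-mers in the sequence (assumed counting rule)."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def encode_sequence(sequence):
    """Return an L x 8 list of rows: 4 one-hot values followed by x, y, z, t."""
    seq = sequence.upper()
    length = len(seq)
    if length < 3:
        raise ValueError("sequence must contain at least 3 nucleic acids")
    c1, c2, c3 = (kmer_counts(seq, k) for k in (1, 2, 3))
    rows = []
    for i in range(length):  # 0-based position; the text uses i = 1..L
        # 1-mer feature: frequency of the nucleic acid at this position.
        x = c1[seq[i]] / length
        # 2-mer (right) feature: 0 at the last position.
        y = c2[seq[i:i + 2]] / (length - 1) if i + 1 < length else 0.0
        # 2-mer (left) feature: 0 at the first position.
        z = c2[seq[i - 1:i + 1]] / (length - 1) if i >= 1 else 0.0
        # 3-mer feature: 0 at the last two positions.
        t = c3[seq[i:i + 3]] / (length - 2) if i + 2 < length else 0.0
        rows.append(ONE_HOT[seq[i]] + [x, y, z, t])
    return rows


# A toy 200-bp sequence, used only to check the output shape.
matrix = encode_sequence("ACGT" * 50)
```
]]></preformat>
<p>For a 200-bp input, <monospace>matrix</monospace> has 200 rows of 8 values each, matching the 200×8 network input.</p>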
</sec>
<sec id="Sec12">
<title>CNN architecture</title>
<p>Our proposed CNN architecture is described in Fig. 
<xref rid="Fig4" ref-type="fig">4</xref>
. The network input is a 200×8 matrix encoding a sequence of length 200. The network consists of six 1-D CNN blocks with batch normalization. In addition, each group of three 1-D CNN blocks is followed by one 1-D max pooling layer. After the CNN and max pooling layers, 768 features are obtained and fed into two fully connected layers with 768 and 256 input neurons using the rectified linear unit (ReLU) and sigmoid activation functions, respectively, to produce the probability that the input sequence is an enhancer. The same architecture is used to classify strong enhancers and weak enhancers. The models were trained for 20 epochs using the binary cross entropy loss with the Adam optimizer [
<xref ref-type="bibr" rid="CR34">34</xref>
] and a learning rate of 0.0001. For each CNN model, the network retained was the one from the epoch at which the loss on the validation set was minimal.
<fig id="Fig4">
<label>Fig. 4</label>
<caption>
<p>Architecture of the proposed CNN models</p>
</caption>
<graphic xlink:href="12864_2019_6336_Fig4_HTML" id="MO4"></graphic>
</fig>
</p>
</sec>
<sec id="Sec13">
<title>Ensemble model</title>
<p>The training process finished with 5 trained CNN models for each layer. For each independent test sample passing through those 5 CNN models, 5 hypotheses (probabilities):
<italic>H</italic>
<sub>1</sub>
,
<italic>H</italic>
<sub>2</sub>
,
<italic>H</italic>
<sub>3</sub>
,
<italic>H</italic>
<sub>4</sub>
, and
<italic>H</italic>
<sub>5</sub>
were independently computed. We tested the following ensemble methods in order to select the most effective one.
<list list-type="bullet">
<list-item>
<p>
<italic>The Voting method</italic>
: First, each hypothesis was converted to a class using the threshold of 0.5, giving 5 class hypotheses. The resultant class was the most frequent one among them.</p>
</list-item>
<list-item>
<p>
<italic>The Averaging method</italic>
: The hypothesis
<italic>H</italic>
was calculated as the average of the five hypotheses and compared with the threshold of 0.5 to give the final result.</p>
</list-item>
<list-item>
<p>
<italic>The Median method</italic>
: The hypothesis
<italic>H</italic>
was calculated as the median of the five hypotheses and compared with the threshold of 0.5 to give the final result.</p>
</list-item>
</list>
</p>
<p>The threshold of 0.5 was chosen since that value is the default decision threshold in most classification algorithms. Since our preliminary screening showed that the Averaging method worked more effectively than the others in this study, we adopted this method to construct the ensemble models.</p>
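<p>The three ensemble rules can be sketched as follows. The function names are hypothetical, and the sketch assumes that a probability strictly greater than the threshold maps to the positive class, a tie-handling detail the text does not specify. The example probabilities are invented to show that the rules can disagree.</p>
<preformat preformat-type="code"><![CDATA[
```python
from statistics import mean, median

THRESHOLD = 0.5


def vote(probabilities, threshold=THRESHOLD):
    """Voting: threshold each hypothesis, then take the majority class."""
    positives = sum(p > threshold for p in probabilities)
    return int(positives > len(probabilities) / 2)


def average(probabilities, threshold=THRESHOLD):
    """Averaging: threshold the mean of the hypotheses."""
    return int(mean(probabilities) > threshold)


def median_rule(probabilities, threshold=THRESHOLD):
    """Median: threshold the median of the hypotheses."""
    return int(median(probabilities) > threshold)


# Hypothetical outputs H1..H5 of the five CNN models for one test sample.
hypotheses = [0.90, 0.80, 0.45, 0.40, 0.30]
decisions = {
    "voting": vote(hypotheses),
    "averaging": average(hypotheses),
    "median": median_rule(hypotheses),
}
```
]]></preformat>
<p>For these values, two confident models pull the mean to 0.57, so averaging predicts the positive class while voting and the median predict the negative class.</p>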
</sec>
<sec id="Sec14">
<title>Model evaluation</title>
<p>To evaluate the model performance, we used the following evaluation metrics: accuracy (ACC), sensitivity (SN), specificity (SP), Matthews correlation coefficient (MCC), and Area Under the ROC Curve (AUC). TP, FP, TN, and FN denote the numbers of True Positives, False Positives, True Negatives, and False Negatives, respectively. The formulas for the first four metrics are given below:
<disp-formula id="Equ1">
<label>1</label>
<alternatives>
<tex-math id="M15">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$\begin{array}{@{}rcl@{}} \text{Accuracy}\:(ACC) = \frac{TP+TN}{TP+TN+FP+FN}, \end{array} $$ \end{document}</tex-math>
<mml:math id="M16">
<mml:mtable class="eqnarray" columnalign="left center right">
<mml:mtr>
<mml:mtd class="eqnarray-1">
<mml:mtext>Accuracy</mml:mtext>
<mml:mspace width="2.22144pt"></mml:mspace>
<mml:mo>(</mml:mo>
<mml:mtext mathvariant="italic">ACC</mml:mtext>
<mml:mo>)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mtext mathvariant="italic">TP</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext mathvariant="italic">TN</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mtext mathvariant="italic">TP</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext mathvariant="italic">TN</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext mathvariant="italic">FP</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext mathvariant="italic">FN</mml:mtext>
</mml:mrow>
</mml:mfrac>
<mml:mo>,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<graphic xlink:href="12864_2019_6336_Article_Equ1.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
<p>
<disp-formula id="Equ2">
<label>2</label>
<alternatives>
<tex-math id="M17">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$\begin{array}{@{}rcl@{}} \text{Specificity}\:(SP) = \frac{TN}{TN+FP}, \end{array} $$ \end{document}</tex-math>
<mml:math id="M18">
<mml:mtable class="eqnarray" columnalign="left center right">
<mml:mtr>
<mml:mtd class="eqnarray-1">
<mml:mtext>Specificity</mml:mtext>
<mml:mspace width="2.22144pt"></mml:mspace>
<mml:mo>(</mml:mo>
<mml:mtext mathvariant="italic">SP</mml:mtext>
<mml:mo>)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mtext mathvariant="italic">TN</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mtext mathvariant="italic">TN</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext mathvariant="italic">FP</mml:mtext>
</mml:mrow>
</mml:mfrac>
<mml:mo>,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<graphic xlink:href="12864_2019_6336_Article_Equ2.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
<p>
<disp-formula id="Equ3">
<label>3</label>
<alternatives>
<tex-math id="M19">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$\begin{array}{@{}rcl@{}} \text{Sensitivity}\:(SN) = \frac{TP}{TP+FN}, \end{array} $$ \end{document}</tex-math>
<mml:math id="M20">
<mml:mtable class="eqnarray" columnalign="left center right">
<mml:mtr>
<mml:mtd class="eqnarray-1">
<mml:mtext>Sensitivity</mml:mtext>
<mml:mspace width="2.22144pt"></mml:mspace>
<mml:mo>(</mml:mo>
<mml:mtext mathvariant="italic">SN</mml:mtext>
<mml:mo>)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mtext mathvariant="italic">TP</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:mtext mathvariant="italic">TP</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext mathvariant="italic">FN</mml:mtext>
</mml:mrow>
</mml:mfrac>
<mml:mo>,</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<graphic xlink:href="12864_2019_6336_Article_Equ3.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
<p>
<disp-formula id="Equ4">
<label>4</label>
<alternatives>
<tex-math id="M21">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$\begin{array}{@{}rcl@{}} \textrm{MCC} = \frac{TP{\times}TN-FP{\times}FN}{\sqrt{(TP+FP)(TP\,+\,FN)(TN\,+\,FP)(TN\,+\,FN)}}. \end{array} $$ \end{document}</tex-math>
<mml:math id="M22">
<mml:mtable class="eqnarray" columnalign="left center right">
<mml:mtr>
<mml:mtd class="eqnarray-1">
<mml:mtext>MCC</mml:mtext>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mtext mathvariant="italic">TP</mml:mtext>
<mml:mo>×</mml:mo>
<mml:mtext mathvariant="italic">TN</mml:mtext>
<mml:mo>−</mml:mo>
<mml:mtext mathvariant="italic">FP</mml:mtext>
<mml:mo>×</mml:mo>
<mml:mtext mathvariant="italic">FN</mml:mtext>
</mml:mrow>
<mml:mrow>
<mml:msqrt>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mtext mathvariant="italic">TP</mml:mtext>
<mml:mo>+</mml:mo>
<mml:mtext mathvariant="italic">FP</mml:mtext>
<mml:mo>)</mml:mo>
<mml:mo>(</mml:mo>
<mml:mtext mathvariant="italic">TP</mml:mtext>
<mml:mspace width="0.3em"></mml:mspace>
<mml:mo>+</mml:mo>
<mml:mspace width="0.3em"></mml:mspace>
<mml:mtext mathvariant="italic">FN</mml:mtext>
<mml:mo>)</mml:mo>
<mml:mo>(</mml:mo>
<mml:mtext mathvariant="italic">TN</mml:mtext>
<mml:mspace width="0.3em"></mml:mspace>
<mml:mo>+</mml:mo>
<mml:mspace width="0.3em"></mml:mspace>
<mml:mtext mathvariant="italic">FP</mml:mtext>
<mml:mo>)</mml:mo>
<mml:mo>(</mml:mo>
<mml:mtext mathvariant="italic">TN</mml:mtext>
<mml:mspace width="0.3em"></mml:mspace>
<mml:mo>+</mml:mo>
<mml:mspace width="0.3em"></mml:mspace>
<mml:mtext mathvariant="italic">FN</mml:mtext>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msqrt>
</mml:mrow>
</mml:mfrac>
<mml:mi>.</mml:mi>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<graphic xlink:href="12864_2019_6336_Article_Equ4.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
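<p>Equations 1 to 4 translate directly into code. The sketch below uses a hypothetical function name and invented confusion-matrix counts. Returning 0 for the MCC when its denominator is zero is a common convention assumed here, not something stated in the text. AUC is omitted because it requires the predicted probabilities rather than the four counts.</p>
<preformat preformat-type="code"><![CDATA[
```python
from math import sqrt


def classification_metrics(tp, tn, fp, fn):
    """Compute ACC, SN, SP and MCC from confusion-matrix counts (Eqs. 1-4)."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sn = tp / (tp + fn)  # sensitivity; requires at least one positive sample
    sp = tn / (tn + fp)  # specificity; requires at least one negative sample
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: MCC is 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"ACC": acc, "SN": sn, "SP": sp, "MCC": mcc}


# Invented counts for a test set of 100 positives and 100 negatives.
metrics = classification_metrics(tp=80, tn=70, fp=30, fn=20)
```
]]></preformat>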
</sec>
</sec>
</body>
<back>
<glossary>
<title>Abbreviations</title>
<def-list>
<def-item>
<term>AUC</term>
<def>
<p>Area under the ROC curve</p>
</def>
</def-item>
<def-item>
<term>CNN</term>
<def>
<p>Convolutional neural network</p>
</def>
</def-item>
<def-item>
<term>ECNN</term>
<def>
<p>Ensemble of CNN</p>
</def>
</def-item>
<def-item>
<term>MCC</term>
<def>
<p>Matthews correlation coefficient</p>
</def>
</def-item>
<def-item>
<term>OHE</term>
<def>
<p>One-hot encoding</p>
</def>
</def-item>
<def-item>
<term>PseKNC</term>
<def>
<p>Pseudo k-tuple nucleotide composition</p>
</def>
</def-item>
<def-item>
<term>ReLU</term>
<def>
<p>Rectified Linear Unit</p>
</def>
</def-item>
<def-item>
<term>RF</term>
<def>
<p>Random Forest</p>
</def>
</def-item>
<def-item>
<term>ROC</term>
<def>
<p>Receiver operating characteristic</p>
</def>
</def-item>
<def-item>
<term>SVM</term>
<def>
<p>Support vector machine</p>
</def>
</def-item>
</def-list>
</glossary>
<fn-group>
<fn>
<p>
<bold>Publisher’s Note</bold>
</p>
<p>Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.</p>
</fn>
</fn-group>
<ack>
<title>Acknowledgements</title>
<p>BPN and QHN gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan V and Titan Xp GPUs used for this research.</p>
<sec id="d29e2875">
<title>About this supplement</title>
<p>This article has been published as part of
<italic>BMC Genomics, Volume 20 Supplement 9, 2019: 18th International Conference on Bioinformatics</italic>
. The full contents of the supplement are available at
<ext-link ext-link-type="uri" xlink:href="https://bmcgenomics.biomedcentral.com/articles/supplements/volume-20-supplement-9">https://bmcgenomics.biomedcentral.com/articles/supplements/volume-20-supplement-9</ext-link>
.</p>
</sec>
</ack>
<notes notes-type="author-contribution">
<title>Authors’ contributions</title>
<p>BPN and QHN designed the framework and experiments. QHN developed the code and performed the experiments. THNV and BPN wrote the manuscript. NQKL contributed to data preparation and interpretation of experimental results. TTTD contributed to the first draft of the manuscript. SR interpreted experimental results and significantly revised the manuscript. All authors have read and approved the final manuscript.</p>
</notes>
<notes notes-type="funding-information">
<title>Funding</title>
<p>The work of S. Rahardja was supported in part by the Overseas Expertise Introduction Project for Discipline Innovation (111 project: B18041). Publication of this supplement was covered by the authors.</p>
</notes>
<notes notes-type="data-availability">
<title>Availability of data and materials</title>
<p>The benchmark dataset used in this study was collected from the previous work of Liu et al., 2016. The benchmark dataset was downloaded from the Supplementary Section of the paper entitled “iEnhancer-EL: identifying enhancers and their strength with ensemble learning approach” by Liu et al. (10.1093/bioinformatics/bty458). Our source code is available at
<ext-link ext-link-type="uri" xlink:href="https://github.com/ngphubinh/enhancers">https://github.com/ngphubinh/enhancers</ext-link>
.</p>
</notes>
<notes>
<title>Ethics approval and consent to participate</title>
<p>Not applicable.</p>
</notes>
<notes>
<title>Consent for publication</title>
<p>Not applicable.</p>
</notes>
<notes notes-type="COI-statement">
<title>Competing interests</title>
<p>The authors declare that they have no competing interests.</p>
</notes>
<ref-list id="Bib1">
<title>References</title>
<ref id="CR1">
<label>1</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pennacchio</surname>
<given-names>LA</given-names>
</name>
<name>
<surname>Bickmore</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Dean</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Nobrega</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Bejerano</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>Enhancers: five essential questions</article-title>
<source>Nat Rev Genet</source>
<year>2013</year>
<volume>14</volume>
<issue>4</issue>
<fpage>288</fpage>
<pub-id pub-id-type="doi">10.1038/nrg3458</pub-id>
<pub-id pub-id-type="pmid">23503198</pub-id>
</element-citation>
</ref>
<ref id="CR2">
<label>2</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Fang</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Long</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Lan</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Chou</surname>
<given-names>K-C</given-names>
</name>
</person-group>
<article-title>iEnhancer-2L: a two-layer predictor for identifying enhancers and their strength by pseudo k-tuple nucleotide composition</article-title>
<source>Bioinformatics</source>
<year>2015</year>
<volume>32</volume>
<issue>3</issue>
<fpage>362</fpage>
<lpage>9</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btv604</pub-id>
<pub-id pub-id-type="pmid">26476782</pub-id>
</element-citation>
</ref>
<ref id="CR3">
<label>3</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Heintzman</surname>
<given-names>ND</given-names>
</name>
<name>
<surname>Stuart</surname>
<given-names>RK</given-names>
</name>
<name>
<surname>Hon</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Fu</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Ching</surname>
<given-names>CW</given-names>
</name>
<name>
<surname>Hawkins</surname>
<given-names>RD</given-names>
</name>
<name>
<surname>Barrera</surname>
<given-names>LO</given-names>
</name>
<name>
<surname>Van Calcar</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Qu</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Ching</surname>
<given-names>KA</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Weng</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Green</surname>
<given-names>RD</given-names>
</name>
<name>
<surname>Crawford</surname>
<given-names>GE</given-names>
</name>
<name>
<surname>Ren</surname>
<given-names>B</given-names>
</name>
</person-group>
<article-title>Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome</article-title>
<source>Nat Genet</source>
<year>2007</year>
<volume>39</volume>
<issue>3</issue>
<fpage>311</fpage>
<pub-id pub-id-type="doi">10.1038/ng1966</pub-id>
<pub-id pub-id-type="pmid">17277777</pub-id>
</element-citation>
</ref>
<ref id="CR4">
<label>4</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Visel</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Blow</surname>
<given-names>MJ</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Akiyama</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Holt</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Plajzer-Frick</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Shoukry</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Wright</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Afzal</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Ren</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Rubin</surname>
<given-names>EM</given-names>
</name>
<name>
<surname>Pennacchio</surname>
<given-names>LA</given-names>
</name>
</person-group>
<article-title>ChIP-seq accurately predicts tissue-specific activity of enhancers</article-title>
<source>Nature</source>
<year>2009</year>
<volume>457</volume>
<issue>7231</issue>
<fpage>854</fpage>
<pub-id pub-id-type="doi">10.1038/nature07730</pub-id>
<pub-id pub-id-type="pmid">19212405</pub-id>
</element-citation>
</ref>
<ref id="CR5">
<label>5</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kulaeva</surname>
<given-names>OI</given-names>
</name>
<name>
<surname>Nizovtseva</surname>
<given-names>EV</given-names>
</name>
<name>
<surname>Polikanov</surname>
<given-names>YS</given-names>
</name>
<name>
<surname>Ulianov</surname>
<given-names>SV</given-names>
</name>
<name>
<surname>Studitsky</surname>
<given-names>VM</given-names>
</name>
</person-group>
<article-title>Distant activation of transcription: Mechanisms of enhancer action</article-title>
<source>Mol Cell Biol</source>
<year>2012</year>
<volume>32</volume>
<issue>24</issue>
<fpage>4892</fpage>
<lpage>7</lpage>
<pub-id pub-id-type="doi">10.1128/MCB.01127-12</pub-id>
<pub-id pub-id-type="pmid">23045397</pub-id>
</element-citation>
</ref>
<ref id="CR6">
<label>6</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Shi</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Lan</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Xu</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Yuan</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Liao</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Xiao</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>X</given-names>
</name>
</person-group>
<article-title>DiseaseEnhancer: a resource of human disease-associated enhancer catalog</article-title>
<source>Nucleic Acids Res</source>
<year>2017</year>
<volume>46</volume>
<issue>D1</issue>
<fpage>78</fpage>
<lpage>84</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkx920</pub-id>
</element-citation>
</ref>
<ref id="CR7">
<label>7</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Corradin</surname>
<given-names>O</given-names>
</name>
<name>
<surname>Scacheri</surname>
<given-names>PC</given-names>
</name>
</person-group>
<article-title>Enhancer variants: evaluating functions in common disease</article-title>
<source>Genome Med</source>
<year>2014</year>
<volume>6</volume>
<issue>10</issue>
<fpage>85</fpage>
<pub-id pub-id-type="doi">10.1186/s13073-014-0085-3</pub-id>
<pub-id pub-id-type="pmid">25473424</pub-id>
</element-citation>
</ref>
<ref id="CR8">
<label>8</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Herz</surname>
<given-names>H-M</given-names>
</name>
</person-group>
<article-title>Enhancer deregulation in cancer and other diseases</article-title>
<source>BioEssays</source>
<year>2016</year>
<volume>38</volume>
<issue>10</issue>
<fpage>1003</fpage>
<lpage>15</lpage>
<pub-id pub-id-type="doi">10.1002/bies.201600106</pub-id>
<pub-id pub-id-type="pmid">27570183</pub-id>
</element-citation>
</ref>
<ref id="CR9">
<label>9</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Boyd</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Thodberg</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Vitezic</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Bornholdt</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Vitting-Seerup</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Coskun</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Lo</surname>
<given-names>BZS</given-names>
</name>
<name>
<surname>Klausen</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Schweiger</surname>
<given-names>PJ</given-names>
</name>
<name>
<surname>Pedersen</surname>
<given-names>AG</given-names>
</name>
<name>
<surname>Rapin</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Skovgaard</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Dahlgaard</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Andersson</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Terkelsen</surname>
<given-names>TB</given-names>
</name>
<name>
<surname>Lilje</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Troelsen</surname>
<given-names>JT</given-names>
</name>
<name>
<surname>Petersen</surname>
<given-names>AM</given-names>
</name>
<name>
<surname>Jensen</surname>
<given-names>KB</given-names>
</name>
<name>
<surname>Gögenur</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Thielsen</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Seidelin</surname>
<given-names>JB</given-names>
</name>
<name>
<surname>Nielsen</surname>
<given-names>OH</given-names>
</name>
<name>
<surname>Bjerrum</surname>
<given-names>JT</given-names>
</name>
<name>
<surname>Sandelin</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>Characterization of the enhancer and promoter landscape of inflammatory bowel disease from human colon biopsies</article-title>
<source>Nat Commun</source>
<year>2018</year>
<volume>9</volume>
<issue>1</issue>
<fpage>1661</fpage>
<pub-id pub-id-type="doi">10.1038/s41467-018-03766-z</pub-id>
<pub-id pub-id-type="pmid">29695774</pub-id>
</element-citation>
</ref>
<ref id="CR10">
<label>10</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Visel</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Bristow</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Pennacchio</surname>
<given-names>LA</given-names>
</name>
</person-group>
<article-title>Enhancer identification through comparative genomics</article-title>
<source>Semin Cell Dev Biol</source>
<year>2007</year>
<volume>18</volume>
<issue>1</issue>
<fpage>140</fpage>
<lpage>52</lpage>
<pub-id pub-id-type="doi">10.1016/j.semcdb.2006.12.014</pub-id>
<pub-id pub-id-type="pmid">17276707</pub-id>
</element-citation>
</ref>
<ref id="CR11">
<label>11</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zacher</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Michel</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Schwalb</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Cramer</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Tresch</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Gagneur</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Accurate promoter and enhancer identification in 127 ENCODE and Roadmap Epigenomics Cell Types and Tissues by GenoSTAN</article-title>
<source>PLoS ONE</source>
<year>2017</year>
<volume>12</volume>
<issue>1</issue>
<fpage>0169249</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pone.0169249</pub-id>
</element-citation>
</ref>
<ref id="CR12">
<label>12</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lai</surname>
<given-names>Y-T</given-names>
</name>
<name>
<surname>Deem</surname>
<given-names>KD</given-names>
</name>
<name>
<surname>Borràs-Castells</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Sambrani</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Rudolf</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Suryamohan</surname>
<given-names>K</given-names>
</name>
<name>
<surname>El-Sherif</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Halfon</surname>
<given-names>MS</given-names>
</name>
<name>
<surname>McKay</surname>
<given-names>DJ</given-names>
</name>
<name>
<surname>Tomoyasu</surname>
<given-names>Y</given-names>
</name>
</person-group>
<article-title>Enhancer identification and activity evaluation in the red flour beetle, Tribolium castaneum</article-title>
<source>Development</source>
<year>2018</year>
<volume>145</volume>
<issue>7</issue>
<fpage>160663</fpage>
<pub-id pub-id-type="doi">10.1242/dev.160663</pub-id>
</element-citation>
</ref>
<ref id="CR13">
<label>13</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<collab>The ENCODE Project Consortium</collab>
</person-group>
<article-title>An integrated encyclopedia of DNA elements in the human genome</article-title>
<source>Nature</source>
<year>2012</year>
<volume>489</volume>
<issue>7414</issue>
<fpage>57</fpage>
<pub-id pub-id-type="doi">10.1038/nature11247</pub-id>
<pub-id pub-id-type="pmid">22955616</pub-id>
</element-citation>
</ref>
<ref id="CR14">
<label>14</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yip</surname>
<given-names>KY</given-names>
</name>
<name>
<surname>Cheng</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Bhardwaj</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Brown</surname>
<given-names>JB</given-names>
</name>
<name>
<surname>Leng</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Kundaje</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Rozowsky</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Birney</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Bickel</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Snyder</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Gerstein</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors</article-title>
<source>Genome Biol</source>
<year>2012</year>
<volume>13</volume>
<issue>9</issue>
<fpage>48</fpage>
<pub-id pub-id-type="doi">10.1186/gb-2012-13-9-r48</pub-id>
</element-citation>
</ref>
<ref id="CR15">
<label>15</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bernstein</surname>
<given-names>BE</given-names>
</name>
<name>
<surname>Stamatoyannopoulos</surname>
<given-names>JA</given-names>
</name>
<name>
<surname>Costello</surname>
<given-names>JF</given-names>
</name>
<name>
<surname>Ren</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Milosavljevic</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Meissner</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Kellis</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Marra</surname>
<given-names>MA</given-names>
</name>
<name>
<surname>Beaudet</surname>
<given-names>AL</given-names>
</name>
<name>
<surname>Ecker</surname>
<given-names>JR</given-names>
</name>
<name>
<surname>Farnham</surname>
<given-names>PJ</given-names>
</name>
<name>
<surname>Hirst</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Lander</surname>
<given-names>ES</given-names>
</name>
<name>
<surname>Mikkelsen</surname>
<given-names>TS</given-names>
</name>
<name>
<surname>Thomson</surname>
<given-names>JA</given-names>
</name>
</person-group>
<article-title>The NIH Roadmap Epigenomics Mapping Consortium</article-title>
<source>Nat Biotechnol</source>
<year>2010</year>
<volume>28</volume>
<issue>10</issue>
<fpage>1045</fpage>
<pub-id pub-id-type="doi">10.1038/nbt1010-1045</pub-id>
<pub-id pub-id-type="pmid">20944595</pub-id>
</element-citation>
</ref>
<ref id="CR16">
<label>16</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rabani</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Raychowdhury</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Jovanovic</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Rooney</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Stumpo</surname>
<given-names>DJ</given-names>
</name>
<name>
<surname>Pauli</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Hacohen</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Schier</surname>
<given-names>AF</given-names>
</name>
<name>
<surname>Blackshear</surname>
<given-names>PJ</given-names>
</name>
<name>
<surname>Friedman</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Amit</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Regev</surname>
<given-names>A</given-names>
</name>
</person-group>
<article-title>High-resolution sequencing and modeling identifies distinct dynamic RNA regulatory strategies</article-title>
<source>Cell</source>
<year>2014</year>
<volume>159</volume>
<issue>7</issue>
<fpage>1698</fpage>
<lpage>710</lpage>
<pub-id pub-id-type="doi">10.1016/j.cell.2014.11.015</pub-id>
<pub-id pub-id-type="pmid">25497548</pub-id>
</element-citation>
</ref>
<ref id="CR17">
<label>17</label>
<mixed-citation publication-type="other">Miller C, Schwalb B, Maier K, Schulz D, Dümcke S, Zacher B, Mayer A, Sydow J, Marcinowski L, Dölken L, Martin DE, Tresch A, Cramer P. Dynamic transcriptome analysis measures rates of mRNA synthesis and decay in yeast. Mol Syst Biol. 2011; 7(1). 10.1038/msb.2010.112.</mixed-citation>
</ref>
<ref id="CR18">
<label>18</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Churchman</surname>
<given-names>LS</given-names>
</name>
<name>
<surname>Weissman</surname>
<given-names>JS</given-names>
</name>
</person-group>
<article-title>Nascent transcript sequencing visualizes transcription at nucleotide resolution</article-title>
<source>Nature</source>
<year>2011</year>
<volume>469</volume>
<issue>7330</issue>
<fpage>368</fpage>
<pub-id pub-id-type="doi">10.1038/nature09652</pub-id>
<pub-id pub-id-type="pmid">21248844</pub-id>
</element-citation>
</ref>
<ref id="CR19">
<label>19</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fang</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>G</given-names>
</name>
</person-group>
<article-title>In silico identification of enhancers on the basis of a combination of transcription factor binding motif occurrences</article-title>
<source>Sci Rep</source>
<year>2016</year>
<volume>6</volume>
<fpage>32476</fpage>
<pub-id pub-id-type="doi">10.1038/srep32476</pub-id>
<pub-id pub-id-type="pmid">27582178</pub-id>
</element-citation>
</ref>
<ref id="CR20">
<label>20</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Firpi</surname>
<given-names>HA</given-names>
</name>
<name>
<surname>Ucar</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Tan</surname>
<given-names>K</given-names>
</name>
</person-group>
<article-title>Discover regulatory DNA elements using chromatin signatures and artificial neural network</article-title>
<source>Bioinformatics</source>
<year>2010</year>
<volume>26</volume>
<issue>13</issue>
<fpage>1579</fpage>
<lpage>86</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btq248</pub-id>
<pub-id pub-id-type="pmid">20453004</pub-id>
</element-citation>
</ref>
<ref id="CR21">
<label>21</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Erwin</surname>
<given-names>GD</given-names>
</name>
<name>
<surname>Oksenberg</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Truty</surname>
<given-names>RM</given-names>
</name>
<name>
<surname>Kostka</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Murphy</surname>
<given-names>KK</given-names>
</name>
<name>
<surname>Ahituv</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Pollard</surname>
<given-names>KS</given-names>
</name>
<name>
<surname>Capra</surname>
<given-names>JA</given-names>
</name>
</person-group>
<article-title>Integrating diverse datasets improves developmental enhancer prediction</article-title>
<source>PLoS Comput Biol</source>
<year>2014</year>
<volume>10</volume>
<issue>6</issue>
<fpage>1003677</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pcbi.1003677</pub-id>
</element-citation>
</ref>
<ref id="CR22">
<label>22</label>
<mixed-citation publication-type="other">Rajagopal N, Xie W, Li Y, Wagner U, Wang W, Stamatoyannopoulos J, Ernst J, Kellis M, Ren B. RFECS: A random-forest based algorithm for enhancer identification from chromatin state. PLoS Comput Biol. 2013; 9(3). 10.1371/journal.pcbi.1002968.</mixed-citation>
</ref>
<ref id="CR23">
<label>23</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bu</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Gan</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Guan</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>A new method for enhancer prediction based on deep belief network</article-title>
<source>BMC Bioinformatics</source>
<year>2017</year>
<volume>18</volume>
<issue>12</issue>
<fpage>418</fpage>
<pub-id pub-id-type="doi">10.1186/s12859-017-1828-0</pub-id>
<pub-id pub-id-type="pmid">29072144</pub-id>
</element-citation>
</ref>
<ref id="CR24">
<label>24</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Min</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Zeng</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Predicting enhancers with deep convolutional neural networks</article-title>
<source>BMC Bioinformatics</source>
<year>2017</year>
<volume>18</volume>
<issue>13</issue>
<fpage>478</fpage>
<pub-id pub-id-type="doi">10.1186/s12859-017-1878-3</pub-id>
<pub-id pub-id-type="pmid">29219068</pub-id>
</element-citation>
</ref>
<ref id="CR25">
<label>25</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Fang</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Long</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Lan</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Chou</surname>
<given-names>K-C</given-names>
</name>
</person-group>
<article-title>iEnhancer-2L: a two-layer predictor for identifying enhancers and their strength by pseudo k-tuple nucleotide composition</article-title>
<source>Bioinformatics</source>
<year>2015</year>
<volume>32</volume>
<issue>3</issue>
<fpage>362</fpage>
<lpage>9</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btv604</pub-id>
<pub-id pub-id-type="pmid">26476782</pub-id>
</element-citation>
</ref>
<ref id="CR26">
<label>26</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jia</surname>
<given-names>C</given-names>
</name>
<name>
<surname>He</surname>
<given-names>W</given-names>
</name>
</person-group>
<article-title>EnhancerPred: a predictor for discovering enhancers based on the combination and selection of multiple features</article-title>
<source>Sci Rep</source>
<year>2016</year>
<volume>6</volume>
<fpage>38741</fpage>
<pub-id pub-id-type="doi">10.1038/srep38741</pub-id>
<pub-id pub-id-type="pmid">27941893</pub-id>
</element-citation>
</ref>
<ref id="CR27">
<label>27</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liu</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>D-S</given-names>
</name>
<name>
<surname>Chou</surname>
<given-names>K-C</given-names>
</name>
</person-group>
<article-title>iEnhancer-EL: identifying enhancers and their strength with ensemble learning approach</article-title>
<source>Bioinformatics</source>
<year>2018</year>
<volume>34</volume>
<issue>22</issue>
<fpage>3835</fpage>
<lpage>42</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/bty458</pub-id>
<pub-id pub-id-type="pmid">29878118</pub-id>
</element-citation>
</ref>
<ref id="CR28">
<label>28</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Crooks</surname>
<given-names>GE</given-names>
</name>
<name>
<surname>Hon</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Chandonia</surname>
<given-names>J-M</given-names>
</name>
<name>
<surname>Brenner</surname>
<given-names>SE</given-names>
</name>
</person-group>
<article-title>WebLogo: a sequence logo generator</article-title>
<source>Genome Res</source>
<year>2004</year>
<volume>14</volume>
<issue>6</issue>
<fpage>1188</fpage>
<lpage>90</lpage>
<pub-id pub-id-type="doi">10.1101/gr.849004</pub-id>
<pub-id pub-id-type="pmid">15173120</pub-id>
</element-citation>
</ref>
<ref id="CR29">
<label>29</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Schneider</surname>
<given-names>TD</given-names>
</name>
<name>
<surname>Stephens</surname>
<given-names>RM</given-names>
</name>
</person-group>
<article-title>Sequence logos: a new way to display consensus sequences</article-title>
<source>Nucleic Acids Res</source>
<year>1990</year>
<volume>18</volume>
<issue>20</issue>
<fpage>6097</fpage>
<lpage>100</lpage>
<pub-id pub-id-type="doi">10.1093/nar/18.20.6097</pub-id>
<pub-id pub-id-type="pmid">2172928</pub-id>
</element-citation>
</ref>
<ref id="CR30">
<label>30</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chicco</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>Ten quick tips for machine learning in computational biology</article-title>
<source>BioData Min</source>
<year>2017</year>
<volume>10</volume>
<issue>1</issue>
<fpage>35</fpage>
<pub-id pub-id-type="doi">10.1186/s13040-017-0155-3</pub-id>
<pub-id pub-id-type="pmid">29234465</pub-id>
</element-citation>
</ref>
<ref id="CR31">
<label>31</label>
<mixed-citation publication-type="other">Zhuang Z, Shen X, Pan W. A simple convolutional neural network for prediction of enhancer–promoter interactions with DNA sequence data. Bioinformatics. 2019:1–8. 10.1093/bioinformatics/bty1050.</mixed-citation>
</ref>
<ref id="CR32">
<label>32</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Min</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Zeng</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Jiang</surname>
<given-names>R</given-names>
</name>
</person-group>
<article-title>Predicting enhancers with deep convolutional neural networks</article-title>
<source>BMC Bioinformatics</source>
<year>2017</year>
<volume>18</volume>
<issue>13</issue>
<fpage>478</fpage>
<pub-id pub-id-type="doi">10.1186/s12859-017-1878-3</pub-id>
<pub-id pub-id-type="pmid">29219068</pub-id>
</element-citation>
</ref>
<ref id="CR33">
<label>33</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fu</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Niu</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Zhu</surname>
<given-names>Z</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>W</given-names>
</name>
</person-group>
<article-title>CD-HIT: accelerated for clustering the next-generation sequencing data</article-title>
<source>Bioinformatics</source>
<year>2012</year>
<volume>28</volume>
<issue>23</issue>
<fpage>3150</fpage>
<lpage>2</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/bts565</pub-id>
<pub-id pub-id-type="pmid">23060610</pub-id>
</element-citation>
</ref>
<ref id="CR34">
<label>34</label>
<mixed-citation publication-type="other">Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. 2014.</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000306  | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000306  | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     
   |texte=   
}}

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021