
<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">MicroRNA categorization using sequence motifs and k-mers</title>
<author>
<name sortKey="Yousef, Malik" sort="Yousef, Malik" uniqKey="Yousef M" first="Malik" last="Yousef">Malik Yousef</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0004 0418 023X</institution-id>
<institution-id institution-id-type="GRID">grid.460169.c</institution-id>
<institution></institution>
<institution>Community Information Systems, Zefat Academic College,</institution>
</institution-wrap>
Zefat, 13206 Israel</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Khalifa, Waleed" sort="Khalifa, Waleed" uniqKey="Khalifa W" first="Waleed" last="Khalifa">Waleed Khalifa</name>
<affiliation>
<nlm:aff id="Aff2">Computer Science, The College of Sakhnin, Sakhnin, 30810 Israel</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Acar, Lhan Erkin" sort="Acar, Lhan Erkin" uniqKey="Acar " first=" Lhan Erkin" last="Acar">İlhan Erkin Acar</name>
<affiliation>
<nlm:aff id="Aff3">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0000 9261 240X</institution-id>
<institution-id institution-id-type="GRID">grid.419609.3</institution-id>
<institution></institution>
<institution>Biotechnology, Izmir Institute of Technology,</institution>
</institution-wrap>
35430 Urla, Izmir Turkey</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Allmer, Jens" sort="Allmer, Jens" uniqKey="Allmer J" first="Jens" last="Allmer">Jens Allmer</name>
<affiliation>
<nlm:aff id="Aff4">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0000 9261 240X</institution-id>
<institution-id institution-id-type="GRID">grid.419609.3</institution-id>
<institution></institution>
<institution>Molecular Biology and Genetics, Izmir Institute of Technology,</institution>
</institution-wrap>
35430 Urla, Izmir Turkey</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff5">Bionia Incorporated, IZTEKGEB A8, 35430 Urla, Izmir Turkey</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">28292266</idno>
<idno type="pmc">5351198</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5351198</idno>
<idno type="RBID">PMC:5351198</idno>
<idno type="doi">10.1186/s12859-017-1584-1</idno>
<date when="2017">2017</date>
<idno type="wicri:Area/Pmc/Corpus">000266</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000266</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">MicroRNA categorization using sequence motifs and k-mers</title>
<author>
<name sortKey="Yousef, Malik" sort="Yousef, Malik" uniqKey="Yousef M" first="Malik" last="Yousef">Malik Yousef</name>
<affiliation>
<nlm:aff id="Aff1">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0004 0418 023X</institution-id>
<institution-id institution-id-type="GRID">grid.460169.c</institution-id>
<institution></institution>
<institution>Community Information Systems, Zefat Academic College,</institution>
</institution-wrap>
Zefat, 13206 Israel</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Khalifa, Waleed" sort="Khalifa, Waleed" uniqKey="Khalifa W" first="Waleed" last="Khalifa">Waleed Khalifa</name>
<affiliation>
<nlm:aff id="Aff2">Computer Science, The College of Sakhnin, Sakhnin, 30810 Israel</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Acar, Lhan Erkin" sort="Acar, Lhan Erkin" uniqKey="Acar " first=" Lhan Erkin" last="Acar">İlhan Erkin Acar</name>
<affiliation>
<nlm:aff id="Aff3">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0000 9261 240X</institution-id>
<institution-id institution-id-type="GRID">grid.419609.3</institution-id>
<institution></institution>
<institution>Biotechnology, Izmir Institute of Technology,</institution>
</institution-wrap>
35430 Urla, Izmir Turkey</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Allmer, Jens" sort="Allmer, Jens" uniqKey="Allmer J" first="Jens" last="Allmer">Jens Allmer</name>
<affiliation>
<nlm:aff id="Aff4">
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0000 9261 240X</institution-id>
<institution-id institution-id-type="GRID">grid.419609.3</institution-id>
<institution></institution>
<institution>Molecular Biology and Genetics, Izmir Institute of Technology,</institution>
</institution-wrap>
35430 Urla, Izmir Turkey</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="Aff5">Bionia Incorporated, IZTEKGEB A8, 35430 Urla, Izmir Turkey</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint>
<date when="2017">2017</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>Post-transcriptional gene dysregulation can be a hallmark of diseases like cancer, and microRNAs (miRNAs) play a key role in the modulation of translation efficiency. Known pre-miRNAs are listed in miRBase, and they have been discovered in a variety of organisms ranging from viruses and microbes to eukaryotic organisms. The computational detection of pre-miRNAs is of great interest, and such approaches usually employ machine learning to discriminate between miRNAs and other sequences. Many features have been proposed to describe pre-miRNAs, and we have previously introduced sequence motifs and k-mers as useful ones. There have been reports of xeno-miRNAs detected via next-generation sequencing. However, these may be contaminations; to aid the decision of whether they are, we aimed to establish a means to differentiate pre-miRNAs from different species.</p>
</sec>
<sec>
<title>Results</title>
<p>To distinguish between species, we used one species’ pre-miRNAs as the positive and another species’ pre-miRNAs as the negative training and test data to establish machine-learned models based on sequence motifs and
<italic>k</italic>
-mers as features. This approach yielded higher accuracy for distantly related species and lower accuracy for closely related species.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>We were able to differentiate among species with increasing success as the evolutionary distance increased. This conclusion is supported by previous reports of fast evolutionary changes in miRNAs, since fairly good discrimination was possible even between relatively closely related species.</p>
</sec>
<sec>
<title>Electronic supplementary material</title>
<p>The online version of this article (doi:10.1186/s12859-017-1584-1) contains supplementary material, which is available to authorized users.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Erson Bensan, Ae" uniqKey="Erson Bensan A">AE Erson-Bensan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bartel, Dp" uniqKey="Bartel D">DP Bartel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Grey, F" uniqKey="Grey F">F Grey</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yousef, M" uniqKey="Yousef M">M Yousef</name>
</author>
<author>
<name sortKey="Allmer, J" uniqKey="Allmer J">J Allmer</name>
</author>
<author>
<name sortKey="Khalifaa, W" uniqKey="Khalifaa W">W Khalifa</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kozomara, A" uniqKey="Kozomara A">A Kozomara</name>
</author>
<author>
<name sortKey="Griffiths Jones, S" uniqKey="Griffiths Jones S">S Griffiths-Jones</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Londin, E" uniqKey="Londin E">E Londin</name>
</author>
<author>
<name sortKey="Loher, P" uniqKey="Loher P">P Loher</name>
</author>
<author>
<name sortKey="Telonis, Ag" uniqKey="Telonis A">AG Telonis</name>
</author>
<author>
<name sortKey="Quann, K" uniqKey="Quann K">K Quann</name>
</author>
<author>
<name sortKey="Clark, P" uniqKey="Clark P">P Clark</name>
</author>
<author>
<name sortKey="Jing, Y" uniqKey="Jing Y">Y Jing</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sacar, Md" uniqKey="Sacar M">MD Saçar</name>
</author>
<author>
<name sortKey="Allmer, J" uniqKey="Allmer J">J Allmer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Allmer, J" uniqKey="Allmer J">J Allmer</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Allmer, J" uniqKey="Allmer J">J Allmer</name>
</author>
<author>
<name sortKey="Yousef, M" uniqKey="Yousef M">M Yousef</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yones, Ca" uniqKey="Yones C">CA Yones</name>
</author>
<author>
<name sortKey="Stegmayer, G" uniqKey="Stegmayer G">G Stegmayer</name>
</author>
<author>
<name sortKey="Kamenetzky, L" uniqKey="Kamenetzky L">L Kamenetzky</name>
</author>
<author>
<name sortKey="Milone, Dh" uniqKey="Milone D">DH Milone</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yousef, M" uniqKey="Yousef M">M Yousef</name>
</author>
<author>
<name sortKey="Allmer, J" uniqKey="Allmer J">J Allmer</name>
</author>
<author>
<name sortKey="Khalifa, W" uniqKey="Khalifa W">W Khalifa</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ding, J" uniqKey="Ding J">J Ding</name>
</author>
<author>
<name sortKey="Zhou, S" uniqKey="Zhou S">S Zhou</name>
</author>
<author>
<name sortKey="Guan, J" uniqKey="Guan J">J Guan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jiang, P" uniqKey="Jiang P">P Jiang</name>
</author>
<author>
<name sortKey="Wu, H" uniqKey="Wu H">H Wu</name>
</author>
<author>
<name sortKey="Wang, W" uniqKey="Wang W">W Wang</name>
</author>
<author>
<name sortKey="Ma, W" uniqKey="Ma W">W Ma</name>
</author>
<author>
<name sortKey="Sun, X" uniqKey="Sun X">X Sun</name>
</author>
<author>
<name sortKey="Lu, Z" uniqKey="Lu Z">Z Lu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Khalifa, W" uniqKey="Khalifa W">W Khalifa</name>
</author>
<author>
<name sortKey="Yousef, M" uniqKey="Yousef M">M Yousef</name>
</author>
<author>
<name sortKey="Sacar Demirci, Md" uniqKey="Sacar Demirci M">MD Saçar Demirci</name>
</author>
<author>
<name sortKey="Allmer, J" uniqKey="Allmer J">J Allmer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Liang, H" uniqKey="Liang H">H Liang</name>
</author>
<author>
<name sortKey="Li, W H" uniqKey="Li W">W-H Li</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lu, J" uniqKey="Lu J">J Lu</name>
</author>
<author>
<name sortKey="Shen, Y" uniqKey="Shen Y">Y Shen</name>
</author>
<author>
<name sortKey="Wu, Q" uniqKey="Wu Q">Q Wu</name>
</author>
<author>
<name sortKey="Kumar, S" uniqKey="Kumar S">S Kumar</name>
</author>
<author>
<name sortKey="He, B" uniqKey="He B">B He</name>
</author>
<author>
<name sortKey="Shi, S" uniqKey="Shi S">S Shi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fahlgren, N" uniqKey="Fahlgren N">N Fahlgren</name>
</author>
<author>
<name sortKey="Howell, Md" uniqKey="Howell M">MD Howell</name>
</author>
<author>
<name sortKey="Kasschau, Kd" uniqKey="Kasschau K">KD Kasschau</name>
</author>
<author>
<name sortKey="Chapman, Ej" uniqKey="Chapman E">EJ Chapman</name>
</author>
<author>
<name sortKey="Sullivan, Cm" uniqKey="Sullivan C">CM Sullivan</name>
</author>
<author>
<name sortKey="Cumbie, Js" uniqKey="Cumbie J">JS Cumbie</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ding, J" uniqKey="Ding J">J Ding</name>
</author>
<author>
<name sortKey="Zhou, S" uniqKey="Zhou S">S Zhou</name>
</author>
<author>
<name sortKey="Guan, J" uniqKey="Guan J">J Guan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="O N Lopes, I De" uniqKey="O N Lopes I">I de O. N. Lopes</name>
</author>
<author>
<name sortKey="Schliep, A" uniqKey="Schliep A">A Schliep</name>
</author>
<author>
<name sortKey="De L F De Carvalho, Ap" uniqKey="De L F De Carvalho A">AP de L. F. de Carvalho</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wu, Y" uniqKey="Wu Y">Y Wu</name>
</author>
<author>
<name sortKey="Wei, B" uniqKey="Wei B">B Wei</name>
</author>
<author>
<name sortKey="Liu, H" uniqKey="Liu H">H Liu</name>
</author>
<author>
<name sortKey="Li, T" uniqKey="Li T">T Li</name>
</author>
<author>
<name sortKey="Rayner, S" uniqKey="Rayner S">S Rayner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gerlach, D" uniqKey="Gerlach D">D Gerlach</name>
</author>
<author>
<name sortKey="Kriventseva, Ev" uniqKey="Kriventseva E">EV Kriventseva</name>
</author>
<author>
<name sortKey="Rahman, N" uniqKey="Rahman N">N Rahman</name>
</author>
<author>
<name sortKey="Vejnar, Ce" uniqKey="Vejnar C">CE Vejnar</name>
</author>
<author>
<name sortKey="Zdobnov, Em" uniqKey="Zdobnov E">EM Zdobnov</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ng, Kls" uniqKey="Ng K">KLS Ng</name>
</author>
<author>
<name sortKey="Mishra, Sk" uniqKey="Mishra S">SK Mishra</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Xue, C" uniqKey="Xue C">C Xue</name>
</author>
<author>
<name sortKey="Li, F" uniqKey="Li F">F Li</name>
</author>
<author>
<name sortKey="He, T" uniqKey="He T">T He</name>
</author>
<author>
<name sortKey="Liu, G P" uniqKey="Liu G">G-P Liu</name>
</author>
<author>
<name sortKey="Li, Y" uniqKey="Li Y">Y Li</name>
</author>
<author>
<name sortKey="Zhang, X" uniqKey="Zhang X">X Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Batuwita, R" uniqKey="Batuwita R">R Batuwita</name>
</author>
<author>
<name sortKey="Palade, V" uniqKey="Palade V">V Palade</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Van Der Burgt, A" uniqKey="Van Der Burgt A">A van der Burgt</name>
</author>
<author>
<name sortKey="Fiers, Mwje" uniqKey="Fiers M">MWJE Fiers</name>
</author>
<author>
<name sortKey="Nap, J P" uniqKey="Nap J">J-P Nap</name>
</author>
<author>
<name sortKey="Van Ham, Rchj" uniqKey="Van Ham R">RCHJ van Ham</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ba C, C" uniqKey="Ba C C">C Bağcı</name>
</author>
<author>
<name sortKey="Allmer, J" uniqKey="Allmer J">J Allmer</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bailey, Tl" uniqKey="Bailey T">TL Bailey</name>
</author>
<author>
<name sortKey="Boden, M" uniqKey="Boden M">M Boden</name>
</author>
<author>
<name sortKey="Buske, Fa" uniqKey="Buske F">FA Buske</name>
</author>
<author>
<name sortKey="Frith, M" uniqKey="Frith M">M Frith</name>
</author>
<author>
<name sortKey="Grant, Ce" uniqKey="Grant C">CE Grant</name>
</author>
<author>
<name sortKey="Clementi, L" uniqKey="Clementi L">L Clementi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bailey, Tl" uniqKey="Bailey T">TL Bailey</name>
</author>
<author>
<name sortKey="Elkan, C" uniqKey="Elkan C">C Elkan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yousef, M" uniqKey="Yousef M">M Yousef</name>
</author>
<author>
<name sortKey="Khalifa, W" uniqKey="Khalifa W">W Khalifa</name>
</author>
<author>
<name sortKey="Acar, E" uniqKey="Acar ">İE Acar</name>
</author>
<author>
<name sortKey="Allmer, J" uniqKey="Allmer J">J Allmer</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Vapnik, Vn" uniqKey="Vapnik V">VN Vapnik</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Xu, Q S" uniqKey="Xu Q">Q-S Xu</name>
</author>
<author>
<name sortKey="Liang, Y Z" uniqKey="Liang Y">Y-Z Liang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Amaldi, E" uniqKey="Amaldi E">E Amaldi</name>
</author>
<author>
<name sortKey="Kann, V" uniqKey="Kann V">V Kann</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Matthews, Bw" uniqKey="Matthews B">BW Matthews</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Letunic, I" uniqKey="Letunic I">I Letunic</name>
</author>
<author>
<name sortKey="Bork, P" uniqKey="Bork P">P Bork</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Bioinformatics</journal-id>
<journal-id journal-id-type="iso-abbrev">BMC Bioinformatics</journal-id>
<journal-title-group>
<journal-title>BMC Bioinformatics</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2105</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
<publisher-loc>London</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">28292266</article-id>
<article-id pub-id-type="pmc">5351198</article-id>
<article-id pub-id-type="publisher-id">1584</article-id>
<article-id pub-id-type="doi">10.1186/s12859-017-1584-1</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>MicroRNA categorization using sequence motifs and k-mers</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Yousef</surname>
<given-names>Malik</given-names>
</name>
<address>
<email>malik.yousef@gmail.com</email>
</address>
<xref ref-type="aff" rid="Aff1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Khalifa</surname>
<given-names>Waleed</given-names>
</name>
<address>
<email>walid.khalifa@gmail.com</email>
</address>
<xref ref-type="aff" rid="Aff2">2</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Acar</surname>
<given-names>İlhan Erkin</given-names>
</name>
<address>
<email>i.erkin.acar@gmail.com</email>
</address>
<xref ref-type="aff" rid="Aff3">3</xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<contrib-id contrib-id-type="orcid">http://orcid.org/0000-0002-2164-7335</contrib-id>
<name>
<surname>Allmer</surname>
<given-names>Jens</given-names>
</name>
<address>
<email>jens@allmer.de</email>
</address>
<xref ref-type="aff" rid="Aff4">4</xref>
<xref ref-type="aff" rid="Aff5">5</xref>
</contrib>
<aff id="Aff1">
<label>1</label>
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0004 0418 023X</institution-id>
<institution-id institution-id-type="GRID">grid.460169.c</institution-id>
<institution></institution>
<institution>Community Information Systems, Zefat Academic College,</institution>
</institution-wrap>
Zefat, 13206 Israel</aff>
<aff id="Aff2">
<label>2</label>
Computer Science, The College of Sakhnin, Sakhnin, 30810 Israel</aff>
<aff id="Aff3">
<label>3</label>
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0000 9261 240X</institution-id>
<institution-id institution-id-type="GRID">grid.419609.3</institution-id>
<institution></institution>
<institution>Biotechnology, Izmir Institute of Technology,</institution>
</institution-wrap>
35430 Urla, Izmir Turkey</aff>
<aff id="Aff4">
<label>4</label>
<institution-wrap>
<institution-id institution-id-type="ISNI">0000 0000 9261 240X</institution-id>
<institution-id institution-id-type="GRID">grid.419609.3</institution-id>
<institution></institution>
<institution>Molecular Biology and Genetics, Izmir Institute of Technology,</institution>
</institution-wrap>
35430 Urla, Izmir Turkey</aff>
<aff id="Aff5">
<label>5</label>
Bionia Incorporated, IZTEKGEB A8, 35430 Urla, Izmir Turkey</aff>
</contrib-group>
<pub-date pub-type="epub">
<day>14</day>
<month>3</month>
<year>2017</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>14</day>
<month>3</month>
<year>2017</year>
</pub-date>
<pub-date pub-type="collection">
<year>2017</year>
</pub-date>
<volume>18</volume>
<elocation-id>170</elocation-id>
<history>
<date date-type="received">
<day>18</day>
<month>11</month>
<year>2016</year>
</date>
<date date-type="accepted">
<day>4</day>
<month>3</month>
<year>2017</year>
</date>
</history>
<permissions>
<copyright-statement>© The Author(s). 2017</copyright-statement>
<license license-type="OpenAccess">
<license-p>
<bold>Open Access</bold>
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/publicdomain/zero/1.0/">http://creativecommons.org/publicdomain/zero/1.0/</ext-link>
) applies to the data made available in this article, unless otherwise stated.</license-p>
</license>
</permissions>
<abstract id="Abs1">
<sec>
<title>Background</title>
<p>Post-transcriptional gene dysregulation can be a hallmark of diseases like cancer, and microRNAs (miRNAs) play a key role in the modulation of translation efficiency. Known pre-miRNAs are listed in miRBase, and they have been discovered in a variety of organisms ranging from viruses and microbes to eukaryotic organisms. The computational detection of pre-miRNAs is of great interest, and such approaches usually employ machine learning to discriminate between miRNAs and other sequences. Many features have been proposed to describe pre-miRNAs, and we have previously introduced sequence motifs and k-mers as useful ones. There have been reports of xeno-miRNAs detected via next-generation sequencing. However, these may be contaminations; to aid the decision of whether they are, we aimed to establish a means to differentiate pre-miRNAs from different species.</p>
</sec>
<sec>
<title>Results</title>
<p>To distinguish between species, we used one species’ pre-miRNAs as the positive and another species’ pre-miRNAs as the negative training and test data to establish machine-learned models based on sequence motifs and
<italic>k</italic>
-mers as features. This approach yielded higher accuracy for distantly related species and lower accuracy for closely related species.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>We were able to differentiate among species with increasing success as the evolutionary distance increased. This conclusion is supported by previous reports of fast evolutionary changes in miRNAs, since fairly good discrimination was possible even between relatively closely related species.</p>
</sec>
<sec>
<title>Electronic supplementary material</title>
<p>The online version of this article (doi:10.1186/s12859-017-1584-1) contains supplementary material, which is available to authorized users.</p>
</sec>
</abstract>
<kwd-group xml:lang="en">
<title>Keywords</title>
<kwd>microRNA</kwd>
<kwd>Sequence motifs</kwd>
<kwd>Pre-microRNA</kwd>
<kwd>Machine learning</kwd>
<kwd>Differentiate miRNAs among species</kwd>
<kwd>k-mer</kwd>
<kwd>miRNA categorization</kwd>
</kwd-group>
<funding-group>
<award-group>
<funding-source>
<institution>The Scientific and Technological Research Council of Turkey</institution>
</funding-source>
<award-id>113E326</award-id>
<principal-award-recipient>
<name>
<surname>Allmer</surname>
<given-names>Jens</given-names>
</name>
</principal-award-recipient>
</award-group>
<award-group>
<funding-source>
<institution>Zefat Academic College</institution>
</funding-source>
<award-id>-</award-id>
<principal-award-recipient>
<name>
<surname>Yousef</surname>
<given-names>Malik</given-names>
</name>
</principal-award-recipient>
</award-group>
</funding-group>
<custom-meta-group>
<custom-meta>
<meta-name>issue-copyright-statement</meta-name>
<meta-value>© The Author(s) 2017</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
<body>
<sec id="Sec1" sec-type="introduction">
<title>Background</title>
<p>Gene expression can be fine-tuned on several levels, but dysregulation often leads to disease. MicroRNAs (miRNAs) are involved in post-transcriptional gene regulation [
<xref ref-type="bibr" rid="CR1">1</xref>
] which modulates protein abundance by fine-tuning translation rates [
<xref ref-type="bibr" rid="CR2">2</xref>
]. MicroRNAs contain a short stretch of nucleotides (~20) that acts as a recognition sequence to direct the RNA-induced silencing complex (RISC) to its target mRNA. This regulatory mechanism exists in a wide range of organisms, such as viruses [
<xref ref-type="bibr" rid="CR3">3</xref>
] and plants [
<xref ref-type="bibr" rid="CR4">4</xref>
]. Although the plant miRNA pathway is said to have evolved independently of the metazoan one [
<xref ref-type="bibr" rid="CR5">5</xref>
], the secondary pre-miRNA structures appear to be similar when visually inspected on miRBase [
<xref ref-type="bibr" rid="CR6">6</xref>
] which houses known pre-miRNAs and their mature miRNAs. Release 21 of miRBase contains 28,645 mature miRNAs (2588 for human), but the existence of many more miRNAs can be expected [
<xref ref-type="bibr" rid="CR7">7</xref>
]. The experimental detection of miRNAs is, however, complicated by the fact that they can only convey function when co-expressed with their target mRNAs [
<xref ref-type="bibr" rid="CR8">8</xref>
]. For this reason, and because discovering all miRNAs of an organism experimentally seems futile, computational prediction of miRNAs has become important. Most such approaches employ machine learning using two-class classification [
<xref ref-type="bibr" rid="CR9">9</xref>
,
<xref ref-type="bibr" rid="CR10">10</xref>
].</p>
<p>The so-called ab initio miRNA detection methodology has been well established in animals [
<xref ref-type="bibr" rid="CR11">11</xref>
], and we have shown that it also works well in plants [
<xref ref-type="bibr" rid="CR4">4</xref>
]. Machine learning depends on the parameterization of the biological structure, and many features have been described to represent a pre-miRNA numerically [
<xref ref-type="bibr" rid="CR12">12</xref>
,
<xref ref-type="bibr" rid="CR13">13</xref>
] to which we have recently added sequence motifs [
<xref ref-type="bibr" rid="CR14">14</xref>
]. These features are used to differentiate between the positive (miRNA) and the negative class employing a variety of classifiers like support vector machines [
<xref ref-type="bibr" rid="CR15">15</xref>
] and random forest [
<xref ref-type="bibr" rid="CR16">16</xref>
]. Unfortunately, bona fide negative pre-miRNA examples do not exist and, therefore, using two-class classification is limited and suffers from the use of arbitrary negative data of unknown quality [
<xref ref-type="bibr" rid="CR17">17</xref>
].</p>
<p>Here we used strategies similar to those of other two-class classification approaches for pre-miRNA detection, but with a different intention. The purpose of the present study was to differentiate pre-miRNAs between two species. Both the positive and the negative class for training were therefore derived from known pre-miRNAs, which removed the need to employ pseudo-negative data. This approach is viable for miRNAs because they have previously been shown to evolve quickly [
<xref ref-type="bibr" rid="CR18">18</xref>
–
<xref ref-type="bibr" rid="CR20">20</xref>
], so that, given larger evolutionary distances, at least the miRNA sequences should deviate enough to allow discrimination. Hence, we focused on sequence-based features and motifs to achieve proper discrimination. Previously, Ding et al. used n-grams (equivalent to our k-mers) to create miRNA families [
<xref ref-type="bibr" rid="CR21">21</xref>
], pursuing a similar aim from a different perspective. Ding et al. tried to solve the multi-class problem of assigning an unknown miRNA to its correct miRNA family. Such a family does not represent a species; it groups evolutionarily conserved miRNAs from different species. Lopes et al. also attempted to discriminate between species [
<xref ref-type="bibr" rid="CR22">22</xref>
], but used the same synthetic negative data that is generally used in pre-miRNA detection methods [
<xref ref-type="bibr" rid="CR23">23</xref>
–
<xref ref-type="bibr" rid="CR26">26</xref>
] and employed the same training and testing strategies as other approaches [
<xref ref-type="bibr" rid="CR16">16</xref>
,
<xref ref-type="bibr" rid="CR27">27</xref>
–
<xref ref-type="bibr" rid="CR29">29</xref>
]. They further focused on structural features, which we found not to be useful for discriminating between closely related species since structure is generally more conserved than sequence composition. An important contribution of the present work is that it avoids arbitrary negative examples of unknown quality by using the data of one species as positive examples and the data of the other species as negative examples, and vice versa. In summary, one purpose of the present study was to discriminate between two species using pre-miRNAs. Additionally, we aimed to establish the range of evolutionary distances at which differentiation into species can be achieved. We were able to show that discrimination among hominids is nearly impossible, while the comparison between, for example, human and worms is straightforward. In the future, a pre-miRNA classification strategy that can assign an unknown pre-miRNA to its most likely species of origin may be developed; this will be important in studies that depend on deep sequencing data, which often contain contaminating sequences [
<xref ref-type="bibr" rid="CR30">30</xref>
].</p>
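<p>As a minimal illustration of the labeling scheme described above, the following Python sketch turns pre-miRNA sequences from two species into k-mer frequency vectors and labels one species as the positive class and the other as the negative class. The function names (kmer_features, build_dataset), the choice of k = 2, the normalization by the number of windows, and the toy sequences are assumptions made for illustration only; they are not taken from the study, and the sketch omits both the sequence motif features and the classifier.</p>
<preformat preformat-type="code">
```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"  # RNA alphabet; DNA-style input is converted below


def kmer_features(sequence, k=2):
    """Return relative k-mer frequencies for one sequence, in a fixed order.

    Hypothetical helper: windows containing letters outside ACGU still count
    toward the total but do not contribute to any feature.
    """
    seq = sequence.upper().replace("T", "U")
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(windows)
    total = max(len(windows), 1)  # avoid division by zero for short input
    return [counts["".join(p)] / total for p in product(ALPHABET, repeat=k)]


def build_dataset(positive_seqs, negative_seqs, k=2):
    """Label one species' pre-miRNAs 1 and the other species' pre-miRNAs 0."""
    features = [kmer_features(s, k) for s in positive_seqs + negative_seqs]
    labels = [1] * len(positive_seqs) + [0] * len(negative_seqs)
    return features, labels


# Toy sequences invented for illustration; they are not miRBase entries.
species_a = ["UGAGGUAGUAGGUUGUAUAGUU", "UGGAAUGUAAAGAAGUAUGUAU"]
species_b = ["UCACCGGGUGUAAAUCAGCUUG", "UAUCACAGCCAGCUUUGAUGUG"]

X, y = build_dataset(species_a, species_b, k=2)
print(len(X), len(X[0]), y)
```
</preformat>
<p>Under these assumptions, the example prints 4 16 [1, 1, 0, 0]: four sequences, each described by the frequencies of the 16 possible dinucleotides, with the first species as the positive class. Swapping the two arguments of build_dataset gives the reverse ("vice versa") labeling, in which the second species serves as the positive class.</p>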
</sec>
<sec id="Sec2" sec-type="materials|methods">
<title>Methods</title>
<sec id="Sec3">
<title>Datasets</title>
<p>We downloaded microRNAs from three different clades (Hominidae, Nematoda, and Pisces) available on miRBase (Release 21); for details see Table 
<xref rid="Tab1" ref-type="table">1</xref>
.
<table-wrap id="Tab1">
<label>Table 1</label>
<caption>
<p>List of the species whose miRNAs were used in the present study and the number of pre-miRNAs available for each on miRBase. The number next to each species grouping (e.g., Hominidae) is the total number of pre-miRNAs for that group</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th>Species</th>
<th>Number of pre-miRNAs</th>
<th>Species</th>
<th>Number of pre-miRNAs</th>
<th>Species</th>
<th>Number of pre-miRNAs</th>
</tr>
</thead>
<tbody>
<tr>
<td>Hominidae</td>
<td>3629</td>
<td>Nematoda</td>
<td>1856</td>
<td>Pisces</td>
<td>1623</td>
</tr>
<tr>
<td>
<italic>Gorilla gorilla</italic>
</td>
<td>352</td>
<td>
<italic>Ascaris suum</italic>
</td>
<td>97</td>
<td>
<italic>Cyprinus carpio</italic>
</td>
<td>134</td>
</tr>
<tr>
<td>
<italic>Homo sapiens</italic>
</td>
<td>1881</td>
<td>
<italic>Brugia malayi</italic>
</td>
<td>115</td>
<td>
<italic>Danio rerio</italic>
</td>
<td>346</td>
</tr>
<tr>
<td>
<italic>Pan paniscus</italic>
</td>
<td>88</td>
<td>
<italic>Caenorhabditis brenneri</italic>
</td>
<td>214</td>
<td>
<italic>Fugu rubripes</italic>
</td>
<td>131</td>
</tr>
<tr>
<td>
<italic>Pongo pygmaeus</italic>
</td>
<td>642</td>
<td>
<italic>Caenorhabditis briggsae</italic>
</td>
<td>175</td>
<td>
<italic>Hippoglossus hippoglossus</italic>
</td>
<td>40</td>
</tr>
<tr>
<td>
<italic>Pan troglodytes</italic>
</td>
<td>655</td>
<td>
<italic>Caenorhabditis elegans</italic>
</td>
<td>250</td>
<td>
<italic>Ictalurus punctatus</italic>
</td>
<td>281</td>
</tr>
<tr>
<td>
<italic>Symphalangus syndactylus</italic>
</td>
<td>11</td>
<td>
<italic>Caenorhabditis remanei</italic>
</td>
<td>157</td>
<td>
<italic>Oryzias latipes</italic>
</td>
<td>168</td>
</tr>
<tr>
<td></td>
<td></td>
<td>
<italic>Haemonchus contortus</italic>
</td>
<td>188</td>
<td>
<italic>Paralichthys olivaceus</italic>
</td>
<td>20</td>
</tr>
<tr>
<td></td>
<td></td>
<td>
<italic>Pristionchus pacificus</italic>
</td>
<td>354</td>
<td>
<italic>Salmo salar</italic>
</td>
<td>371</td>
</tr>
<tr>
<td></td>
<td></td>
<td>
<italic>Panagrellus redivivus</italic>
</td>
<td>200</td>
<td>
<italic>Tetraodon nigroviridis</italic>
</td>
<td>132</td>
</tr>
<tr>
<td></td>
<td></td>
<td>
<italic>Strongyloides ratti</italic>
</td>
<td>106</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
<p>Pre-miRNAs in Table 
<xref rid="Tab1" ref-type="table">1</xref>
were filtered according to sequence similarity on a per-species basis to ensure that there is no bias due to multiple identical pre-miRNAs. For human, for example, 121 of the initial 1881 available pre-miRNAs were filtered out, leaving 1760 for machine learning.</p>
<p>In addition to the main data used in this study (Table 
<xref rid="Tab1" ref-type="table">1</xref>
), we used several further clades from miRBase. In those experiments, all pre-miRNAs from all species in a clade were combined into one dataset. For example, the Fabaceae dataset consisted of
<italic>Acacia auriculiformis</italic>
,
<italic>Arachis hypogaea</italic>
,
<italic>Acacia mangium</italic>
,
<italic>Glycine max</italic>
,
<italic>Glycine soja</italic>
,
<italic>Lotus japonicus</italic>
,
<italic>Medicago truncatula</italic>
,
<italic>Phaseolus vulgaris</italic>
, and
<italic>Vigna unguiculata</italic>
totaling about 1400 pre-miRNAs.</p>
</sec>
<sec id="Sec4">
<title>Parameterization of pre-miRNAs</title>
<sec id="Sec5">
<title>K-mers</title>
<p>Simple sequence-based features have been described and used for ab initio pre-miRNA detection in numerous studies. These sequence features, also called words, k-mers, or n-grams, describe a short sequence of nucleotides. For example, 1-mers over the alphabet {A, U, C, G} produce the words A, U, C, and G, while 2-mers over the same alphabet generate AA, AC, …, and UU. Larger k have also been used [
<xref ref-type="bibr" rid="CR31">31</xref>
], but here we chose 1-, 2-, and 3-mers as features since most previous studies restrict k to at most 3, because longer k-mers are less likely to be exactly conserved among species, and since sequence motifs already cover longer sequences as features. To obtain frequencies, the occurrences of each k-mer in an input sequence were counted and divided by the total number of k-mers in that sequence, given by len(sequence) - k + 1. We calculated k-mers with k = {1, 2, 3}, resulting in 84 (4 + 16 + 64) different features per example.</p>
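<p>The k-mer frequency calculation described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name kmer_features is hypothetical, overlapping windows are assumed (consistent with the denominator len(sequence) - k + 1), and converting T to U is an added convenience for DNA-style input.</p>
<preformat><![CDATA[
```python
from itertools import product

ALPHABET = "ACGU"


def kmer_features(sequence, ks=(1, 2, 3)):
    """Return normalized k-mer frequencies for one sequence.

    Assumption: k-mers are counted in overlapping windows, so a sequence
    of length n contains n - k + 1 k-mers.
    """
    sequence = sequence.upper().replace("T", "U")
    features = {}
    for k in ks:
        # Enumerate all 4**k possible words so every feature is present.
        words = ["".join(p) for p in product(ALPHABET, repeat=k)]
        counts = dict.fromkeys(words, 0)
        total = len(sequence) - k + 1
        for i in range(max(total, 0)):
            word = sequence[i:i + k]
            if word in counts:  # skip windows with ambiguous symbols
                counts[word] += 1
        for word in words:
            features[word] = counts[word] / total if total > 0 else 0.0
    return features


if __name__ == "__main__":
    feats = kmer_features("ACGUACGU")
    print(len(feats), feats["A"], feats["AC"], feats["ACG"])
```
]]></preformat>
<p>For k = 1, 2, and 3 the dictionary holds 4 + 16 + 64 = 84 entries, matching the number of k-mer features per example stated above.</p>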
</sec>
<sec id="Sec6">
<title>Motif features</title>
<p>Motif features are different from
<italic>k</italic>
-mers in that they are not exact and allow some degree of error-tolerance. Here a sequence motif is a short stretch of nucleotides that is frequent among a set of pre-miRNAs. Motif discovery, in turn, is the process of finding such short sequences within a larger pool of sequences. The MEME (Multiple Expectation Maximization for Motif Elicitation) Suite [
<xref ref-type="bibr" rid="CR32">32</xref>
] was used for motif discovery. The algorithm is based on [
<xref ref-type="bibr" rid="CR33">33</xref>
] which works by repeatedly searching for ungapped sequence motifs that occur within input sequences. MEME turned out to be the bottleneck in our analysis workflow, causing long processing times for motif extraction. MEME provides the results as regular expressions and sequence profiles. In our previous work, we represented motifs by using the regular expressions provided by MEME [
<xref ref-type="bibr" rid="CR4">4</xref>
,
<xref ref-type="bibr" rid="CR14">14</xref>
,
<xref ref-type="bibr" rid="CR34">34</xref>
]. However, regular expressions only allow for equally probable options at each position, whereas profiles store a frequency for each nucleotide option at each sequence position and are, therefore, more discriminative. We thus chose profiles to calculate motif scores. Using MEME, 100 motifs were discovered per species. Thus, 200 motif features were calculated for each input sequence, 100 from each of the two species. We chose 100 motifs per class since, for some experiments in this work, only few examples were available, and choosing more than 100 motifs would have led to few sequences supporting each discovered motif. With 100 motifs, we expect on average (considering all experiments in this study) ten examples to support each motif. To calculate a score, a profile was aligned with the target sequence and shifted along it until the end of the profile reached the end of the sequence, or vice versa in case the profile was longer than the sequence. At each position, a score was calculated by adding up the profile frequencies of the matching nucleotides at their respective positions. The highest score over all positions was reported as the final score for that input sequence. Motif lengths ranged between 11 and 50 with an average of 38. Among selected motifs (i.e., those passing feature selection; see below), the average length was about 40 (Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Table S1). The share of motif features among the selected features ranged between 15 and 84% across experiments, with an average of about 40% (Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Table S1, Selected Motifs). The number of selected motif features is strongly affected by the amount of data available, which leads to the fewest selected motifs for
<italic>Gorilla gorilla</italic>
(30%) followed by
<italic>Homo sapiens</italic>
(43%) and most selected motifs for experiments involving Hominidae (51%; Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Table S1, Selected Motifs).</p>
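<p>The profile-based motif score described above can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in the study: the profile is represented as a list of per-position dictionaries mapping nucleotides to frequencies (a hypothetical in-memory format, not MEME's output format), and the name motif_score is likewise hypothetical.</p>
<preformat><![CDATA[
```python
def motif_score(profile, sequence):
    """Slide a frequency profile along a sequence and return the best score.

    profile:  list of dicts, one per motif position, nucleotide -> frequency
    sequence: string of nucleotides

    If the profile is longer than the sequence, the sequence is slid
    along the profile instead.
    """
    m, n = len(profile), len(sequence)
    if m <= n:
        windows = [(profile, sequence[o:o + m]) for o in range(n - m + 1)]
    else:
        windows = [(profile[o:o + n], sequence) for o in range(m - n + 1)]

    best = 0.0
    for columns, window in windows:
        # Sum the profile frequency of the nucleotide seen at each position.
        score = sum(col.get(nt, 0.0) for col, nt in zip(columns, window))
        best = max(best, score)
    return best


if __name__ == "__main__":
    toy_profile = [
        {"A": 0.9, "C": 0.1},
        {"C": 1.0},
        {"G": 0.8, "U": 0.2},
    ]
    print(motif_score(toy_profile, "UUACGUU"))
```
]]></preformat>
<p>In the toy example, the window ACG matches the most frequent nucleotide at every profile position, so it yields the highest score and is the one reported for that sequence.</p>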
</sec>
<sec id="Sec7">
<title>Feature vector and feature selection</title>
<p>Each example is described by 84 k-mer and 200 motif features. However, not all features are equally effective at separating the positive from the negative class. Since information gain has previously been used for feature selection [
<xref ref-type="bibr" rid="CR35">35</xref>
], we used KNIME (version 3.1.2) [
<xref ref-type="bibr" rid="CR36">36</xref>
] to calculate information gain on a per-experiment basis. The 100 features with the highest information gain were selected for model establishment from the following possible features:</p>
<p>A, C, G, U, AA, AC, AG, AU, CA, CC, CG, CU, GA, GC, GG, GU, UA, UC, UG, UU, AAA, AAC, AAG, AAU, ACA, ACC, ACG, ACU, AGA, AGC, AGG, AGU, AUA, AUC, AUG, AUU, CAA, CAC, CAG, CAU, CCA, CCC, CCG, CCU, CGA, CGC, CGG, CGU, CUA, CUC, CUG, CUU, GAA, GAC, GAG, GAU, GCA, GCC, GCG, GCU, GGA, GGC, GGG, GGU, GUA, GUC, GUG, GUU, UAA, UAC, UAG, UAU, UCA, UCC, UCG, UCU, UGA, UGC, UGG, UGU, UUA, UUC, UUG, UUU, Motif1, Motif2, Motif3, …, Motifn; where n = 200.</p>
<p>Information gain as available in KNIME is implemented according to Yang and Pedersen [
<xref ref-type="bibr" rid="CR37">37</xref>
]. It describes the goodness of a term, in this case how well a feature separates the positive from the negative class compared with the other available features. We have previously shown that 50 features may be enough to establish successful models [
<xref ref-type="bibr" rid="CR12">12</xref>
] but chose to be conservative here and used 100 features. Additional file
<xref rid="MOESM2" ref-type="media">2</xref>
: Figure S6 shows the impact of the number of features on performance for test and holdout data in this study.</p>
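<p>To illustrate the ranking idea, the following sketch computes information gain for numeric features and keeps the highest-ranked ones. It is not the KNIME implementation: splitting each feature at its median is an assumption made here to obtain a binary present/absent style split, and the names entropy, information_gain, and top_features are hypothetical.</p>
<preformat><![CDATA[
```python
import math
from collections import Counter
from statistics import median


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((c / total) * math.log2(c / total) for c in counts)


def information_gain(values, labels):
    """Entropy reduction from splitting one feature at its median.

    Assumption: the median split stands in for whatever discretization
    the actual tool applies.
    """
    threshold = median(values)
    low = [y for x, y in zip(values, labels) if x <= threshold]
    high = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    remainder = (len(low) / n) * entropy(low) + (len(high) / n) * entropy(high)
    return entropy(labels) - remainder


def top_features(table, labels, n_keep=100):
    """Rank features (name -> list of values) by gain and keep the best."""
    gains = {name: information_gain(vals, labels) for name, vals in table.items()}
    return sorted(gains, key=gains.get, reverse=True)[:n_keep]


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    table = {"good": [0.1, 0.2, 0.8, 0.9], "bad": [0.1, 0.9, 0.1, 0.9]}
    print(top_features(table, labels, n_keep=1))
```
]]></preformat>
<p>A feature that separates the two classes perfectly reaches the maximum gain of one bit for balanced classes, whereas a feature whose split leaves both classes evenly mixed has a gain of zero.</p>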
</sec>
</sec>
<sec id="Sec8">
<title>Classification approach</title>
<p>Initially, we performed tests using support vector machines [
<xref ref-type="bibr" rid="CR38">38</xref>
], decision trees (DT), Naive Bayes (NB), and random forest (RF) classifiers, but since RF generally outperformed the other methods, we only used RF for the remainder of the study. All classifiers used are part of the data analytics platform KNIME [
<xref ref-type="bibr" rid="CR36">36</xref>
], and we used that platform for all analyses. The classifiers were trained and tested as follows. Initially, 10% of the examples were set aside as holdout data, and the remaining 90% were split into 80% training and 20% testing data. The numbers of negative and positive examples were forced to be equal since we previously showed that this is important for successful model establishment in pre-miRNA detection [
<xref ref-type="bibr" rid="CR12">12</xref>
]. 100-fold Monte Carlo cross-validation [
<xref ref-type="bibr" rid="CR39">39</xref>
] was used to establish the model, and its performance was recorded for each fold. Additionally, for each fold performance was tested on the holdout dataset (Fig. 
<xref rid="Fig1" ref-type="fig">1</xref>
). Feature selection is computationally expensive [
<xref ref-type="bibr" rid="CR40">40</xref>
] and was, therefore, done before training the models. Additionally, for one example (Hominidae vs. Laurasiatheria), we tested the effect of performing feature selection within each of 24 cross-validation iterations. We found that features generally achieved similar ranks across the 24 iterations (Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Table S1; Additional file
<xref rid="MOESM2" ref-type="media">2</xref>
: Figure S1). Additionally, we observed that there was no relevant impact on the accuracy distribution for the 24 tests (Additional file
<xref rid="MOESM2" ref-type="media">2</xref>
: Figure S2). Therefore, we used the model establishment schema as described in Fig. 
<xref rid="Fig1" ref-type="fig">1</xref>
.
<fig id="Fig1">
<label>Fig. 1</label>
<caption>
<p>Workflow for model establishment. Data are transformed into feature vectors, and the best 100 features are selected. Initially, 10% of the data is withheld from the 100-fold MCCV training and testing scheme. All performance measures for testing and holdout data are collected during CV and reported at the end of the workflow</p>
</caption>
<graphic xlink:href="12859_2017_1584_Fig1_HTML" id="MO1"></graphic>
</fig>
</p>
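<p>The splitting scheme described above (a constant 10% holdout, followed by repeated random 80/20 splits of the remainder) can be sketched as follows. This is an illustration only, since the study used KNIME: the names balance and mccv_splits, the seed, and the use of random undersampling to equalize class sizes are assumptions.</p>
<preformat><![CDATA[
```python
import random


def balance(positives, negatives, rng):
    """Undersample the larger class so both classes have equal size.

    Assumption: random undersampling; the text only states that class
    sizes were forced to be equal.
    """
    n = min(len(positives), len(negatives))
    return rng.sample(positives, n), rng.sample(negatives, n)


def mccv_splits(examples, n_folds=100, holdout_frac=0.10, train_frac=0.80, seed=0):
    """Return a fixed holdout set and n_folds random (train, test) splits."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)

    # The holdout set is drawn once and never used for training.
    n_holdout = round(len(shuffled) * holdout_frac)
    holdout, rest = shuffled[:n_holdout], shuffled[n_holdout:]

    # Monte Carlo cross-validation: an independent random split per fold.
    n_train = round(len(rest) * train_frac)
    folds = []
    for _ in range(n_folds):
        draw = list(rest)
        rng.shuffle(draw)
        folds.append((draw[:n_train], draw[n_train:]))
    return holdout, folds


if __name__ == "__main__":
    holdout, folds = mccv_splits(list(range(1000)))
    train, test = folds[0]
    print(len(holdout), len(folds), len(train), len(test))
```
]]></preformat>
<p>Unlike k-fold cross-validation, the test sets of different folds may overlap because each fold is drawn independently, while the holdout set stays identical across all folds.</p>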
<sec id="Sec9">
<title>Performance evaluation</title>
<p>For each established model, we calculated a number of performance measures for the evaluation of the classifier, such as sensitivity, specificity, accuracy, and the Matthews correlation coefficient (MCC), according to the following formulations (with TP: true positive, FP: false positive, TN: true negative, and FN: false negative classifications): [
<xref ref-type="bibr" rid="CR41">41</xref>
]
<disp-formula id="Equa">
<alternatives>
<tex-math id="M1">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ \begin{array}{l}\mathrm{Sensitivity} = \mathrm{TP}/\left(\mathrm{TP} + \mathrm{FN}\right);\ \mathrm{SE},\ \mathrm{Recall}\\ {}\mathrm{Specificity} = \mathrm{TN}/\left(\mathrm{TN} + \mathrm{FP}\right);\ \mathrm{SP}\\ {}\mathrm{Precision} = \mathrm{TP}/\left(\mathrm{TP} + \mathrm{FP}\right)\\ {}\mathrm{F}\hbox{-} \mathrm{Measure} = 2\ *\left(\mathrm{precision}\ *\ \mathrm{recall}\right)/\left(\mathrm{precision} + \mathrm{recall}\right)\\ {}\mathrm{Accuracy} = \left(\mathrm{TP} + \mathrm{TN}\right)/\left(\mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN}\right);\ \mathrm{ACC}\\ {}\mathrm{MCC}\kern0.5em =\frac{\left(\mathrm{TP}\ *\ \mathrm{TN} - \mathrm{FP}\ *\ \mathrm{FN}\right)}{\sqrt{\left(\mathrm{TP}+\mathrm{FP}\right)\left(\mathrm{TP}+\mathrm{FN}\right)\left(\mathrm{TN}+\mathrm{FN}\right)\left(\mathrm{TN}+\mathrm{FP}\right)}};\end{array} $$\end{document}</tex-math>
<mml:math id="M2">
<mml:mtable columnalign="left">
<mml:mtr>
<mml:mtd>
<mml:mi mathvariant="normal">Sensitivity</mml:mi>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mo>=</mml:mo>
<mml:mi mathvariant="normal">T</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
<mml:mo stretchy="true">/</mml:mo>
<mml:mfenced close=")" open="(">
<mml:mrow>
<mml:mi mathvariant="normal">T</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mo>+</mml:mo>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mi mathvariant="normal">F</mml:mi>
<mml:mi mathvariant="normal">N</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>;</mml:mo>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mi mathvariant="normal">S</mml:mi>
<mml:mi mathvariant="normal">E</mml:mi>
<mml:mo>,</mml:mo>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mi mathvariant="normal">Recall</mml:mi>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mi mathvariant="normal">Specificity</mml:mi>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mo>=</mml:mo>
<mml:mi mathvariant="normal">T</mml:mi>
<mml:mi mathvariant="normal">N</mml:mi>
<mml:mo stretchy="true">/</mml:mo>
<mml:mfenced close=")" open="(">
<mml:mrow>
<mml:mi mathvariant="normal">T</mml:mi>
<mml:mi mathvariant="normal">N</mml:mi>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mo>+</mml:mo>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mi mathvariant="normal">F</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>;</mml:mo>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mi mathvariant="normal">S</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mi mathvariant="normal">Precision</mml:mi>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mo>=</mml:mo>
<mml:mi mathvariant="normal">T</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
<mml:mo stretchy="true">/</mml:mo>
<mml:mfenced close=")" open="(">
<mml:mrow>
<mml:mi mathvariant="normal">T</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mo>+</mml:mo>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mi mathvariant="normal">F</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mi mathvariant="normal">F</mml:mi>
<mml:mo>-</mml:mo>
<mml:mi mathvariant="normal">Measure</mml:mi>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mo>=</mml:mo>
<mml:mn>2</mml:mn>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mo>*</mml:mo>
<mml:mfenced close=")" open="(">
<mml:mrow>
<mml:mi mathvariant="normal">precision</mml:mi>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mo>*</mml:mo>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mi mathvariant="normal">recall</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo stretchy="true">/</mml:mo>
<mml:mfenced close=")" open="(">
<mml:mrow>
<mml:mi mathvariant="normal">precision</mml:mi>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mo>+</mml:mo>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mi mathvariant="normal">recall</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mi mathvariant="normal">Accuracy</mml:mi>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mo>=</mml:mo>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mfenced close=")" open="(">
<mml:mrow>
<mml:mi mathvariant="normal">T</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mo>+</mml:mo>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mi mathvariant="normal">T</mml:mi>
<mml:mi mathvariant="normal">N</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo stretchy="true">/</mml:mo>
<mml:mfenced close=")" open="(">
<mml:mrow>
<mml:mi mathvariant="normal">T</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mo>+</mml:mo>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mi mathvariant="normal">T</mml:mi>
<mml:mi mathvariant="normal">N</mml:mi>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mo>+</mml:mo>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mi mathvariant="normal">F</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mo>+</mml:mo>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mi mathvariant="normal">F</mml:mi>
<mml:mi mathvariant="normal">N</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo>;</mml:mo>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mi mathvariant="normal">A</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mi mathvariant="normal">M</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
<mml:mi mathvariant="normal">C</mml:mi>
<mml:mspace width="0.5em"></mml:mspace>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mfenced close=")" open="(">
<mml:mrow>
<mml:mi mathvariant="normal">T</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
<mml:mo>*</mml:mo>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mi mathvariant="normal">T</mml:mi>
<mml:mi mathvariant="normal">N</mml:mi>
<mml:mo>−</mml:mo>
<mml:mi mathvariant="normal">F</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
<mml:mo>*</mml:mo>
<mml:mspace width="0.25em"></mml:mspace>
<mml:mi mathvariant="normal">F</mml:mi>
<mml:mi mathvariant="normal">N</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:msqrt>
<mml:mrow>
<mml:mfenced close=")" open="(">
<mml:mrow>
<mml:mi mathvariant="normal">T</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi mathvariant="normal">F</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mfenced close=")" open="(">
<mml:mrow>
<mml:mi mathvariant="normal">T</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi mathvariant="normal">F</mml:mi>
<mml:mi mathvariant="normal">N</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mfenced close=")" open="(">
<mml:mrow>
<mml:mi mathvariant="normal">T</mml:mi>
<mml:mi mathvariant="normal">N</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi mathvariant="normal">F</mml:mi>
<mml:mi mathvariant="normal">N</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mfenced close=")" open="(">
<mml:mrow>
<mml:mi mathvariant="normal">T</mml:mi>
<mml:mi mathvariant="normal">N</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi mathvariant="normal">F</mml:mi>
<mml:mi mathvariant="normal">P</mml:mi>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:msqrt>
</mml:mfrac>
<mml:mtext>;</mml:mtext>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<graphic xlink:href="12859_2017_1584_Article_Equa.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
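<p>The measures defined above can be computed from the four confusion-matrix counts as in the following sketch. The function name performance and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration.</p>
<preformat><![CDATA[
```python
import math


def performance(tp, fp, tn, fn):
    """Compute the listed performance measures from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Assumption: undefined ratios (zero denominator) are reported as 0.0.
        return numerator / denominator if denominator else 0.0

    sensitivity = ratio(tp, tp + fn)  # also called recall
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    f_measure = ratio(2 * precision * sensitivity, precision + sensitivity)
    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    mcc = ratio(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fn) * (tn + fp)),
    )
    return {
        "SE": sensitivity,
        "SP": specificity,
        "Precision": precision,
        "F-Measure": f_measure,
        "ACC": accuracy,
        "MCC": mcc,
    }


if __name__ == "__main__":
    for name, value in performance(tp=40, fp=10, tn=45, fn=5).items():
        print(f"{name}: {value:.3f}")
```
]]></preformat>
<p>With equal numbers of positive and negative examples, as used here, an accuracy of 0.5 and an MCC of 0 both correspond to chance-level classification.</p>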
<p>All reported performance measures refer to the average of 100-fold Monte Carlo cross-validation (MCCV). Since single statistics (e.g., averages) are of limited value for describing machine-learned models, and since receiver operating characteristic curves for hundreds of trained models would be hard to assess, we calculated accuracy distributions for all trained models and used them to describe model performance.</p>
</sec>
</sec>
</sec>
<sec id="Sec10" sec-type="results">
<title>Results and discussion</title>
<p>The random forest (RF) classifier was used to establish machine-learned models using a 10/80/20 split (10% holdout, with the remaining data split 80/20 into training and testing). 100-fold MCCV was used to train, test, and apply models to constant holdout data. The number of pre-miRNA examples available on miRBase per species is quite variable, and to ensure similar numbers of positive and negative examples, groups of species had to be considered. One such group is Hominidae, which consists of human and the great apes. Specifically,
<italic>Homo sapiens</italic>
,
<italic>Gorilla gorilla</italic>
,
<italic>Pan paniscus</italic>
,
<italic>Pongo pygmaeus</italic>
,
<italic>Pan troglodytes</italic>
, and
<italic>Symphalangus syndactylus</italic>
have available pre-miRNA examples in miRBase (Table 
<xref rid="Tab1" ref-type="table">1</xref>
). Taking Hominidae as positive data and pre-miRNAs from various other groups as negative data, models to differentiate the groups were trained and their performance was established (Table 
<xref rid="Tab2" ref-type="table">2</xref>
).
<table-wrap id="Tab2">
<label>Table 2</label>
<caption>
<p>Average performance of models trained to classify into Hominidae or one of the listed clades. The best 100 features were selected based on information gain, and training/testing was performed with a 10/80/20 split at 100-fold MCCV</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th rowspan="2">Hominidae
<break></break>
vs.</th>
<th colspan="3">Holdout</th>
<th colspan="3">Test</th>
</tr>
<tr>
<th>F-measure</th>
<th>Accuracy</th>
<th>MCC</th>
<th>F-measure</th>
<th>Accuracy</th>
<th>MCC</th>
</tr>
</thead>
<tbody>
<tr>
<td>Hexapoda</td>
<td char="." align="char">0.93</td>
<td char="." align="char">0.93</td>
<td char="." align="char">0.86</td>
<td char="." align="char">0.93</td>
<td char="." align="char">0.93</td>
<td char="." align="char">0.86</td>
</tr>
<tr>
<td>Brassicaceae</td>
<td char="." align="char">0.82</td>
<td char="." align="char">0.93</td>
<td char="." align="char">0.78</td>
<td char="." align="char">0.92</td>
<td char="." align="char">0.92</td>
<td char="." align="char">0.84</td>
</tr>
<tr>
<td>Monocotyle</td>
<td char="." align="char">0.88</td>
<td char="." align="char">0.92</td>
<td char="." align="char">0.83</td>
<td char="." align="char">0.91</td>
<td char="." align="char">0.91</td>
<td char="." align="char">0.82</td>
</tr>
<tr>
<td>Nematoda</td>
<td char="." align="char">0.87</td>
<td char="." align="char">0.91</td>
<td char="." align="char">0.80</td>
<td char="." align="char">0.90</td>
<td char="." align="char">0.90</td>
<td char="." align="char">0.80</td>
</tr>
<tr>
<td>Fabaceae</td>
<td char="." align="char">0.81</td>
<td char="." align="char">0.88</td>
<td char="." align="char">0.72</td>
<td char="." align="char">0.87</td>
<td char="." align="char">0.87</td>
<td char="." align="char">0.75</td>
</tr>
<tr>
<td>Pisces</td>
<td char="." align="char">0.80</td>
<td char="." align="char">0.86</td>
<td char="." align="char">0.70</td>
<td char="." align="char">0.86</td>
<td char="." align="char">0.86</td>
<td char="." align="char">0.72</td>
</tr>
<tr>
<td>Virus</td>
<td char="." align="char">0.44</td>
<td char="." align="char">0.83</td>
<td char="." align="char">0.43</td>
<td char="." align="char">0.82</td>
<td char="." align="char">0.82</td>
<td char="." align="char">0.64</td>
</tr>
<tr>
<td>Aves</td>
<td char="." align="char">0.59</td>
<td char="." align="char">0.75</td>
<td char="." align="char">0.41</td>
<td char="." align="char">0.72</td>
<td char="." align="char">0.72</td>
<td char="." align="char">0.45</td>
</tr>
<tr>
<td>Laurasiatheria</td>
<td char="." align="char">0.54</td>
<td char="." align="char">0.73</td>
<td char="." align="char">0.39</td>
<td char="." align="char">0.70</td>
<td char="." align="char">0.72</td>
<td char="." align="char">0.45</td>
</tr>
<tr>
<td>Rodentia</td>
<td char="." align="char">0.62</td>
<td char="." align="char">0.69</td>
<td char="." align="char">0.37</td>
<td char="." align="char">0.69</td>
<td char="." align="char">0.69</td>
<td char="." align="char">0.38</td>
</tr>
<tr>
<td>
<italic>Homo sapiens</italic>
</td>
<td char="." align="char">0.62</td>
<td char="." align="char">0.61</td>
<td char="." align="char">0.23</td>
<td char="." align="char">0.62</td>
<td char="." align="char">0.61</td>
<td char="." align="char">0.23</td>
</tr>
<tr>
<td>Cercopithecidae</td>
<td char="." align="char">0.26</td>
<td char="." align="char">0.51</td>
<td char="." align="char">0.01</td>
<td char="." align="char">0.50</td>
<td char="." align="char">0.50</td>
<td char="." align="char">0.01</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Note that for the test Hominidae vs.
<italic>H. sapiens</italic>
the
<italic>H. sapiens</italic>
examples were removed from Hominidae. The table is sorted according to average model accuracy. It presents average accuracy values; Additional file
<xref rid="MOESM2" ref-type="media">2</xref>
: Figures S3-S5 present the accuracy distributions for 100-fold MCCV</p>
</table-wrap-foot>
</table-wrap>
</p>
<p>In terms of accuracy, performance on holdout data is very similar to the testing performance (Table 
<xref rid="Tab2" ref-type="table">2</xref>
). Classifying into Hominidae or Hexapoda was very accurate (0.93 accuracy), while classification into Hominidae or Cercopithecidae was not possible (0.50 accuracy, i.e., chance level), which is likely due to the very close evolutionary relationship (Fig. 
<xref rid="Fig2" ref-type="fig">2</xref>
). To assess this further, the human pre-miRNA examples were removed from the Hominidae dataset, and the remaining data were used to establish a model versus human. A slightly better accuracy of 0.61 compared to Hominidae vs. Cercopithecidae was achieved. Since about half of the pre-miRNA examples in Hominidae stem from human and the evolutionary distance is also very low, a result similar to that of Hominidae vs. Cercopithecidae was expected.
<fig id="Fig2">
<label>Fig. 2</label>
<caption>
<p>Phylogenetic relationship among organisms and groups used in the present study (excluding viruses). iTOL (
<ext-link ext-link-type="uri" xlink:href="http://itol2.embl.de/">http://itol2.embl.de/</ext-link>
) was used to create the phylogenetic tree [
<xref ref-type="bibr" rid="CR42">42</xref>
]. Newick and PhyloXML formatted files to build the tree are available as Additional files
<xref rid="MOESM3" ref-type="media">3</xref>
and
<xref rid="MOESM4" ref-type="media">4</xref>
: Files S2 and S3, respectively</p>
</caption>
<graphic xlink:href="12859_2017_1584_Fig2_HTML" id="MO2"></graphic>
</fig>
</p>
<p>Results in Table 
<xref rid="Tab2" ref-type="table">2</xref>
and phylogenetic relationship among organisms and groups used in the present study (Fig. 
<xref rid="Fig2" ref-type="fig">2</xref>
) show a similar trend. Closely related organisms show similar average model accuracy, and, in general, average model accuracy increases with increasing phylogenetic distance.</p>
<p>Since an average accuracy can be misleading, the accuracy distribution over 100-fold MCCV during machine learning was reported (Fig. 
<xref rid="Fig3" ref-type="fig">3</xref>
). The interquartile ranges summarizing the 100-fold MCCV model training were quite small and increased only slightly with lower average accuracy, confirming that model training was successful on average and not based on outliers or overfitting. Only few virus examples (fewer than 300) are available on miRBase. Viral miRNAs targeting human genes need sequences similar to human miRNAs, whereas those targeting the viruses themselves should not have sequences similar to human ones. Therefore, the interquartile range is larger for viruses, and the overall accuracy distribution is lower than for other examples.
<fig id="Fig3">
<label>Fig. 3</label>
<caption>
<p>Accuracy distribution over 100-fold MCCV for six selected species and groups of species against Hominidae</p>
</caption>
<graphic xlink:href="12859_2017_1584_Fig3_HTML" id="MO3"></graphic>
</fig>
</p>
<p>
<italic>Gorilla gorilla</italic>
, also in the Hominidae group, has a sufficient number of pre-miRNA examples to establish a model. Therefore, models for human and for gorilla versus other species and groups of organisms were trained in parallel for comparison (Table 
<xref rid="Tab3" ref-type="table">3</xref>
). Since human and gorilla are very closely related, they should show similar average model accuracies when trained against the same species.
<table-wrap id="Tab3">
<label>Table 3</label>
<caption>
<p>Average accuracy (ACC) and Matthews correlation coefficient (MCC) for 100-fold MCCV model training using
<italic>Homo sapiens</italic>
(HSA) or
<italic>Gorilla gorilla</italic>
(GGO) as target class and Nematoda or Pisces as other class (sorted by HSA ACC). Results for HSA and GGO vs all Nematoda and Pisces are bolded</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th></th>
<th>Versus</th>
<th>HSA ACC</th>
<th>GGO ACC</th>
<th>HSA MCC</th>
<th>GGO MCC</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="11">Nematoda</td>
<td>
<italic>Caenorhabditis brenneri</italic>
</td>
<td char="." align="char">0.94</td>
<td char="." align="char">0.96</td>
<td char="." align="char">0.88</td>
<td char="." align="char">0.93</td>
</tr>
<tr>
<td>
<italic>Pristionchus pacificus</italic>
</td>
<td char="." align="char">0.93</td>
<td char="." align="char">0.94</td>
<td char="." align="char">0.86</td>
<td char="." align="char">0.88</td>
</tr>
<tr>
<td>
<italic>Panagrellus redivivus</italic>
</td>
<td char="." align="char">0.93</td>
<td char="." align="char">0.96</td>
<td char="." align="char">0.86</td>
<td char="." align="char">0.92</td>
</tr>
<tr>
<td>
<italic>Strongyloides ratti</italic>
</td>
<td char="." align="char">0.91</td>
<td char="." align="char">0.95</td>
<td char="." align="char">0.82</td>
<td char="." align="char">0.90</td>
</tr>
<tr>
<td>
<italic>Caenorhabditis remanei</italic>
</td>
<td char="." align="char">0.89</td>
<td char="." align="char">0.87</td>
<td char="." align="char">0.78</td>
<td char="." align="char">0.75</td>
</tr>
<tr>
<td>
<italic>Caenorhabditis briggsae</italic>
</td>
<td char="." align="char">0.87</td>
<td char="." align="char">0.86</td>
<td char="." align="char">0.75</td>
<td char="." align="char">0.72</td>
</tr>
<tr>
<td>
<italic>Ascaris suum</italic>
</td>
<td char="." align="char">0.86</td>
<td char="." align="char">0.87</td>
<td char="." align="char">0.73</td>
<td char="." align="char">0.75</td>
</tr>
<tr>
<td>
<italic>Haemonchus contortus</italic>
</td>
<td char="." align="char">0.86</td>
<td char="." align="char">0.87</td>
<td char="." align="char">0.72</td>
<td char="." align="char">0.75</td>
</tr>
<tr>
<td>
<italic>Caenorhabditis elegans</italic>
</td>
<td char="." align="char">0.86</td>
<td char="." align="char">0.87</td>
<td char="." align="char">0.71</td>
<td char="." align="char">0.73</td>
</tr>
<tr>
<td>
<italic>Brugia malayi</italic>
</td>
<td char="." align="char">0.84</td>
<td char="." align="char">0.80</td>
<td char="." align="char">0.68</td>
<td char="." align="char">0.60</td>
</tr>
<tr>
<td>
<italic>Nematoda</italic>
</td>
<td char="." align="char">0.89</td>
<td char="." align="char">0.88</td>
<td char="." align="char">0.78</td>
<td char="." align="char">0.68</td>
</tr>
<tr>
<td rowspan="10">Pisces</td>
<td>
<italic>Salmo salar</italic>
</td>
<td char="." align="char">0.92</td>
<td char="." align="char">0.97</td>
<td char="." align="char">0.84</td>
<td char="." align="char">0.94</td>
</tr>
<tr>
<td>
<italic>Ictalurus punctatus</italic>
</td>
<td char="." align="char">0.89</td>
<td char="." align="char">0.96</td>
<td char="." align="char">0.78</td>
<td char="." align="char">0.92</td>
</tr>
<tr>
<td>
<italic>Paralichthys olivaceus</italic>
</td>
<td char="." align="char">0.84</td>
<td char="." align="char">0.93</td>
<td char="." align="char">0.71</td>
<td char="." align="char">0.87</td>
</tr>
<tr>
<td>
<italic>Oryzias latipes</italic>
</td>
<td char="." align="char">0.83</td>
<td char="." align="char">0.77</td>
<td char="." align="char">0.67</td>
<td char="." align="char">0.56</td>
</tr>
<tr>
<td>
<italic>Danio rerio</italic>
</td>
<td char="." align="char">0.80</td>
<td char="." align="char">0.78</td>
<td char="." align="char">0.60</td>
<td char="." align="char">0.56</td>
</tr>
<tr>
<td>
<italic>Fugu rubripes</italic>
</td>
<td char="." align="char">0.77</td>
<td char="." align="char">0.79</td>
<td char="." align="char">0.55</td>
<td char="." align="char">0.59</td>
</tr>
<tr>
<td>
<italic>Cyprinus carpio</italic>
</td>
<td char="." align="char">0.76</td>
<td char="." align="char">0.77</td>
<td char="." align="char">0.53</td>
<td char="." align="char">0.53</td>
</tr>
<tr>
<td>
<italic>Tetraodon nigroviridis</italic>
</td>
<td char="." align="char">0.76</td>
<td char="." align="char">0.79</td>
<td char="." align="char">0.53</td>
<td char="." align="char">0.58</td>
</tr>
<tr>
<td>
<italic>Hippoglossus hippoglossus</italic>
</td>
<td char="." align="char">0.67</td>
<td char="." align="char">0.69</td>
<td char="." align="char">0.35</td>
<td char="." align="char">0.39</td>
</tr>
<tr>
<td>
<italic>Pisces</italic>
</td>
<td char="." align="char">0.84</td>
<td char="." align="char">0.83</td>
<td char="." align="char">0.68</td>
<td char="." align="char">0.57</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
<p>Nematoda are evolutionarily distant from Hominidae, so we expected well-performing models. In general, the results match that expectation, and all models achieve an average accuracy of more than 80%. However, species with more examples on miRBase tend to yield models that discriminate better between species. More examples generally lead to better models, and this finding simply confirms that concept.
<italic>C. elegans</italic>
is an outlier in this respect: it has the second-most examples on miRBase, yet its models are among the weaker ones, which suggests that some of the reported pre-miRNAs may not actually be miRNAs. Pisces are evolutionarily closer to human than Nematoda but still distant; therefore, we expected models with slightly lower performance. In general, this expectation held true, although
<italic>H. hippoglossus</italic>
performed particularly poorly, which is likely due to the small number of examples (40), some of which may additionally be wrong. Interestingly, the fish with the lowest number of examples,
<italic>P. olivaceus</italic>
(20), performed quite well, which is likely due to the calculation of performance measures, which may return biased results for classes with very few members. It may additionally mean that the reported miRNAs are of high quality. Human and gorilla results are very similar, which indicates that the results are not due to chance. Furthermore, when human or gorilla is trained against the complete group of Pisces or Nematoda, the results are similar to the expected group average, which shows that the actual behavior is consistent with the expectation.</p>
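<p>The remark about biased performance measures can be illustrated with a small worked example. The following Python sketch computes ACC and MCC from confusion-matrix counts (TP, TN, FP, FN, as listed under Abbreviations). The counts are hypothetical and are not taken from this study, and the function names are illustrative. The sketch only shows one way in which the two measures can diverge when one class has very few members.</p>
<preformat>
```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts (assumption): a class with 20 members tested
# against a class with 1000 members; only half of the small class is found.
tp, fn = 10, 10
tn, fp = 990, 10

print("ACC:", round(accuracy(tp, tn, fp, fn), 3))
print("MCC:", round(mcc(tp, tn, fp, fn), 3))
```
</preformat>
<p>With these illustrative counts, accuracy is about 0.98 whereas MCC is 0.49, which is why results for very small classes should be read with care.</p>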
<sec id="Sec11">
<title>Motif construction could cause spurious results</title>
<p>To ensure that the results are not due to improper motif selection or to chance, we performed an experiment with 10-fold MCCV in which motifs were extracted from a randomly chosen 50% of the input data in each fold. For this experiment, we selected Hominidae versus Laurasiatheria since this pair showed average performance compared to all other tested models (Table 
<xref rid="Tab2" ref-type="table">2</xref>
). In each fold, 10-fold MCCV was used to establish RF models, which led to a total of 100 RF models.</p>
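<p>The nested design described above can be sketched as follows. This is a minimal Python outline of the sampling plan only, under stated assumptions: the 70/30 train/test split, the function names, and the choice to draw the inner folds from all examples are illustrative and are not specified in the text. Motif discovery (for example with MEME) and RF training are indicated as comments rather than implemented.</p>
<preformat>
```python
import random


def mccv_splits(n_items, n_folds, first_fraction, rng):
    """Yield n_folds random (first, rest) index partitions (Monte Carlo CV)."""
    indices = list(range(n_items))
    n_first = int(n_items * first_fraction)
    for _ in range(n_folds):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield shuffled[:n_first], shuffled[n_first:]


def nested_plan(n_items, outer_folds=10, inner_folds=10,
                motif_fraction=0.5, train_fraction=0.7, seed=0):
    """Return one (motif_idx, train_idx, test_idx) triple per model."""
    rng = random.Random(seed)
    plan = []
    for motif_idx, _unused in mccv_splits(n_items, outer_folds, motif_fraction, rng):
        # Motifs would be discovered here, using only the examples in motif_idx.
        for train_idx, test_idx in mccv_splits(n_items, inner_folds, train_fraction, rng):
            # A random forest would be trained on train_idx and scored on test_idx.
            plan.append((motif_idx, train_idx, test_idx))
    return plan


plan = nested_plan(n_items=200)
print(len(plan))  # 10 outer folds x 10 inner folds
```
</preformat>
<p>Ten outer folds with ten inner folds each give the 100 models mentioned in the text.</p>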
<p>As expected, the average classification performance (0.71) over all 100 folds was similar to the previous performance (0.72), indicating that feature calculation and extraction were performed properly. Not only the average performance but also the accuracy distributions of the two experiments were very similar (Fig. 
<xref rid="Fig4" ref-type="fig">4</xref>
).
<fig id="Fig4">
<label>Fig. 4</label>
<caption>
<p>Model accuracy distribution for models trained with pre-created motifs and for the workflow where motifs were created in each iteration</p>
</caption>
<graphic xlink:href="12859_2017_1584_Fig4_HTML" id="MO4"></graphic>
</fig>
</p>
<p>The interquartile range for the pre-created motifs was somewhat smaller than for the motif re-creation approach (Fig. 
<xref rid="Fig4" ref-type="fig">4</xref>
), but that can be expected since only 50% of the data was used for motif finding, which should lead to lower-quality motifs. Additionally, the average accuracy for the second approach was about one percentage point lower (0.71 versus 0.72), but in general, the distributions are similar. Finally, motif re-creation per MCCV fold introduces more outliers, which are likely due to overfitting. Therefore, motifs should be discovered using the entire dataset rather than re-created from a subset of the data in each training iteration.</p>
</sec>
</sec>
<sec id="Sec12" sec-type="conclusion">
<title>Conclusions</title>
<p>Machine learning has become an important part of pre-miRNA detection, but it suffers from a lack of
<italic>bona fide</italic>
negative data [
<xref ref-type="bibr" rid="CR8">8</xref>
]. The current aim in the field is to detect pre-miRNAs in, for example, genomes. Pre-miRNAs have also previously been classified into groups, which detected conserved miRNA families [
<xref ref-type="bibr" rid="CR9">9</xref>
]. On the other hand, it has been shown that miRNAs can evolve rapidly [
<xref ref-type="bibr" rid="CR10">10</xref>
<xref ref-type="bibr" rid="CR12">12</xref>
]. Therefore, we were interested in whether a machine-learned model could be trained to classify miRNAs based on their species of origin. To achieve this, we used one species’ pre-miRNAs as positive and the other’s pre-miRNAs as negative data for the establishment of models. The features we employ are all sequence-based since structural features should be more conserved, thereby concealing smaller evolutionary distances.</p>
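<p>As an illustration of the sequence-based features mentioned above, the following Python sketch computes normalized k-mer frequencies for a single pre-miRNA sequence. It is a minimal example under assumptions: the values of k (1 to 3), the normalization by the number of sequence windows, and the function name are illustrative choices and not necessarily those used in this study. Motif features are not covered.</p>
<preformat>
```python
from itertools import product


def kmer_features(sequence, k_values=(1, 2, 3), alphabet="ACGU"):
    """Map every possible k-mer to its relative frequency in the sequence."""
    seq = sequence.upper().replace("T", "U")  # accept DNA or RNA notation
    features = {}
    for k in k_values:
        n_windows = len(seq) - k + 1
        counts = {"".join(p): 0 for p in product(alphabet, repeat=k)}
        for i in range(max(n_windows, 0)):
            kmer = seq[i:i + k]
            if kmer in counts:  # skip windows with ambiguous bases such as N
                counts[kmer] += 1
        divisor = max(n_windows, 1)
        for kmer, count in counts.items():
            features[kmer] = count / divisor
    return features


# Hypothetical toy sequence (assumption), not a real pre-miRNA.
vector = kmer_features("UGAGGUAGUAGGUUGUAUAGUU")
print(len(vector))  # 4 + 16 + 64 features for k = 1, 2, 3
```
</preformat>
<p>In a two-species setting, vectors of this kind computed for one species would be labeled positive and those of the other species negative before training.</p>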
<p>We showed that sequence motifs and
<italic>k</italic>
-mer features were properly created (Fig. 
<xref rid="Fig4" ref-type="fig">4</xref>
). In the same way, a model was established for Hominidae versus selected clades available on miRBase, and the average accuracy closely mirrored the evolutionary distance (Table 
<xref rid="Tab2" ref-type="table">2</xref>
; Fig. 
<xref rid="Fig2" ref-type="fig">2</xref>
). To check this result, human and gorilla were used as target species and trained against Nematoda and Pisces species available on miRBase. Both targets led to comparable results (Table 
<xref rid="Tab3" ref-type="table">3</xref>
), thereby confirming the viability of this approach. In conclusion, we show that a classifier can differentiate between pre-microRNAs from different species using a combined motif and
<italic>k</italic>
-mer signature. In future studies, this may lead to the ability to classify unknown pre-miRNAs into their correct category, which is important in studies involving xeno-miRNAs in order to separate interesting results from contamination. To achieve that end, models for all known pairs of species need to be established. Applying all models to an unknown example then creates a fingerprint for that example. After that, multi-class classification or clustering (self-organizing maps, nearest neighbor, etc.) can be applied to determine the class or cluster membership of the unknown example. This approach would use the distance information of all trained species models, and we expect it to be more powerful than applying multi-class classification or clustering directly to the examples. Thereby, unknown examples can be assigned a species of origin using a fingerprint.</p>
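<p>The proposed fingerprint idea could be sketched as follows. This Python example is only an outline under assumptions: the two "models" are hypothetical stand-ins (simple base-composition scores) for trained pairwise species classifiers, the reference fingerprints are invented numbers, and nearest-neighbor assignment with Euclidean distance is just one of the options named in the text.</p>
<preformat>
```python
import math


def fingerprint(example, models):
    """Apply every pairwise model (in a fixed name order) to one example."""
    return [models[name](example) for name in sorted(models)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_species(query, references):
    """Return the species whose reference fingerprint is closest to query."""
    best_species, best_distance = None, math.inf
    for species, fingerprints in references.items():
        for reference in fingerprints:
            distance = euclidean(query, reference)
            if best_distance > distance:
                best_species, best_distance = species, distance
    return best_species


# Hypothetical stand-ins (assumption): real models would be trained classifiers
# returning a score for one pair of species.
toy_models = {
    "human_vs_fish": lambda seq: seq.count("G") / len(seq),
    "human_vs_worm": lambda seq: seq.count("A") / len(seq),
}
# Invented reference fingerprints of examples with known species of origin.
toy_references = {
    "human": [[0.9, 0.8], [0.8, 0.9]],
    "fish": [[0.1, 0.7]],
}

query = fingerprint("GGGA", toy_models)
print(query, nearest_species(query, toy_references))
```
</preformat>
<p>Each entry of the fingerprint corresponds to one pairwise model, so the vector length grows with the number of species pairs for which models have been established.</p>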
</sec>
</body>
<back>
<app-group>
<app id="App1">
<sec id="Sec13">
<title>Additional files</title>
<p>
<media position="anchor" xlink:href="12859_2017_1584_MOESM1_ESM.xlsx" id="MOESM1">
<label>Additional file 1: Table S1.</label>
<caption>
<p>Extracted Motifs: All of the extracted sequence motifs are listed, as well as information gain scores for 24-fold cross validation. (XLSX 345 kb)</p>
</caption>
</media>
<media position="anchor" xlink:href="12859_2017_1584_MOESM2_ESM.docx" id="MOESM2">
<label>Additional file 2: Figures S1 to S6:</label>
<caption>
<p>
<bold>Figure S1</bold>
shows the rank distribution for k-mers and motif features.
<bold>Figure S2</bold>
displays how feature selection impacts accuracy.
<bold>Figures S3</bold>
and
<bold>S4</bold>
provide additional accuracy distributions for various clades versus Hominidae and
<bold>Figure S5</bold>
provides similar information for Cercopithecidae and Hominidae versus human. Figure S6 supports the chosen number of selected features. (DOCX 653 kb)</p>
</caption>
</media>
<media position="anchor" xlink:href="12859_2017_1584_MOESM3_ESM.txt" id="MOESM3">
<label>Additional file 3: File S2.</label>
<caption>
<p>Newick-formatted phylogenetic tree: This file can be directly uploaded to iTOL or other phylogenetic tree viewers for further analysis. (TXT 2 kb)</p>
</caption>
</media>
<media position="anchor" xlink:href="12859_2017_1584_MOESM4_ESM.txt" id="MOESM4">
<label>Additional file 4: File S3.</label>
<caption>
<p>PhyloXML-formatted phylogenetic tree: This file can be directly uploaded to iTOL or other phylogenetic tree viewers for further analysis. (TXT 7 kb)</p>
</caption>
</media>
</p>
</sec>
</app>
</app-group>
<glossary>
<title>Abbreviations</title>
<def-list>
<def-item>
<term>ACC</term>
<def>
<p>Accuracy</p>
</def>
</def-item>
<def-item>
<term>DT</term>
<def>
<p>Decision tree</p>
</def>
</def-item>
<def-item>
<term>FN</term>
<def>
<p>False negative</p>
</def>
</def-item>
<def-item>
<term>FP</term>
<def>
<p>False positive</p>
</def>
</def-item>
<def-item>
<term>MCC</term>
<def>
<p>Matthews correlation coefficient</p>
</def>
</def-item>
<def-item>
<term>MCCV</term>
<def>
<p>Monte Carlo cross validation</p>
</def>
</def-item>
<def-item>
<term>MEME</term>
<def>
<p>Multiple expectation maximization for motif elicitation</p>
</def>
</def-item>
<def-item>
<term>miRNA</term>
<def>
<p>microRNA</p>
</def>
</def-item>
<def-item>
<term>NB</term>
<def>
<p>Naïve Bayes</p>
</def>
</def-item>
<def-item>
<term>RF</term>
<def>
<p>Random forest</p>
</def>
</def-item>
<def-item>
<term>RISC</term>
<def>
<p>RNA-induced silencing complex</p>
</def>
</def-item>
<def-item>
<term>TN</term>
<def>
<p>True negative</p>
</def>
</def-item>
<def-item>
<term>TP</term>
<def>
<p>True positive</p>
</def>
</def-item>
</def-list>
</glossary>
<ack>
<title>Acknowledgements</title>
<p>Not applicable.</p>
<sec id="FPar2">
<title>Funding</title>
<p>The work was supported by the Scientific and Technological Research Council of Turkey [grant number 113E326] to JA and by Zefat Academic College to MY.</p>
</sec>
<sec id="FPar3">
<title>Availability of data and materials</title>
<p>All of the sequence data was obtained from
<ext-link ext-link-type="uri" xlink:href="http://www.mirbase.org/">www.mirbase.org</ext-link>
. The created motif files are available from the corresponding author upon request, but the final selected motifs are directly available in Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
: Table S1 as regular expressions. The analysis workflow cannot be shared because it depends on software that is only available locally.</p>
</sec>
<sec id="FPar4">
<title>Authors’ contributions</title>
<p>MY formulated the idea of using motifs as features and configured them accordingly for the data used in this study. WK performed tests for the motif extraction process and automated motif extraction. İEA created the workflow under the supervision of JA and with feedback from MY and WK. JA and MY jointly made strategic decisions for the machine learning approach. JA and MY wrote the manuscript. All authors read and approved the final manuscript.</p>
</sec>
<sec id="FPar5">
<title>Competing interests</title>
<p>The authors declare that they have no competing interests.</p>
</sec>
<sec id="FPar6">
<title>Consent for publication</title>
<p>Not applicable.</p>
</sec>
<sec id="FPar7">
<title>Ethics approval and consent to participate</title>
<p>Not applicable.</p>
</sec>
<sec id="FPar1">
<title>Publisher’s Note</title>
<p>Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.</p>
</sec>
</ack>
<ref-list id="Bib1">
<title>References</title>
<ref id="CR1">
<label>1.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Erson-Bensan</surname>
<given-names>AE</given-names>
</name>
</person-group>
<article-title>Introduction to microRNAs in biological systems</article-title>
<source>Methods Mol Biol</source>
<year>2014</year>
<volume>1107</volume>
<fpage>1</fpage>
<lpage>14</lpage>
<pub-id pub-id-type="doi">10.1007/978-1-62703-748-8_1</pub-id>
<pub-id pub-id-type="pmid">24272428</pub-id>
</element-citation>
</ref>
<ref id="CR2">
<label>2.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bartel</surname>
<given-names>DP</given-names>
</name>
</person-group>
<article-title>MicroRNAs: genomics, biogenesis, mechanism, and function</article-title>
<source>Cell</source>
<year>2004</year>
<volume>116</volume>
<fpage>281</fpage>
<lpage>297</lpage>
<pub-id pub-id-type="doi">10.1016/S0092-8674(04)00045-5</pub-id>
<pub-id pub-id-type="pmid">14744438</pub-id>
</element-citation>
</ref>
<ref id="CR3">
<label>3.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Grey</surname>
<given-names>F</given-names>
</name>
</person-group>
<article-title>Role of microRNAs in herpesvirus latency and persistence</article-title>
<source>J Gen Virol</source>
<year>2015</year>
<volume>96</volume>
<fpage>739</fpage>
<lpage>751</lpage>
<pub-id pub-id-type="doi">10.1099/vir.0.070862-0</pub-id>
<pub-id pub-id-type="pmid">25406174</pub-id>
</element-citation>
</ref>
<ref id="CR4">
<label>4.</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Yousef</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Allmer</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Khalifa</surname>
<given-names>W</given-names>
</name>
</person-group>
<source>Plant MicroRNA Prediction employing Sequence Motifs Achieves High Accuracy</source>
<year>2015</year>
</element-citation>
</ref>
<ref id="CR5">
<label>5.</label>
<mixed-citation publication-type="other">Chapman EJ, Carrington JC. Specialization and evolution of endogenous small RNA pathways. Nat. Rev. Genet. Nature Publishing Group; 2007;8:884–96.</mixed-citation>
</ref>
<ref id="CR6">
<label>6.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kozomara</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Griffiths-Jones</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>miRBase: integrating microRNA annotation and deep-sequencing data</article-title>
<source>Nucleic Acids Res</source>
<year>2011</year>
<volume>39</volume>
<fpage>D152</fpage>
<lpage>D157</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkq1027</pub-id>
<pub-id pub-id-type="pmid">21037258</pub-id>
</element-citation>
</ref>
<ref id="CR7">
<label>7.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Londin</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Loher</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Telonis</surname>
<given-names>AG</given-names>
</name>
<name>
<surname>Quann</surname>
<given-names>K</given-names>
</name>
<name>
<surname>Clark</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Jing</surname>
<given-names>Y</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Analysis of 13 cell types reveals evidence for the expression of numerous novel primate- and tissue-specific microRNAs</article-title>
<source>Proc Natl Acad Sci</source>
<year>2015</year>
<volume>112</volume>
<fpage>E1106</fpage>
<lpage>E1115</lpage>
<pub-id pub-id-type="doi">10.1073/pnas.1420955112</pub-id>
<pub-id pub-id-type="pmid">25713380</pub-id>
</element-citation>
</ref>
<ref id="CR8">
<label>8.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Saçar</surname>
<given-names>MD</given-names>
</name>
<name>
<surname>Allmer</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Current Limitations for Computational Analysis of miRNAs in Cancer</article-title>
<source>Pakistan J Clin Biomed Res</source>
<year>2013</year>
<volume>1</volume>
<fpage>3</fpage>
<lpage>5</lpage>
</element-citation>
</ref>
<ref id="CR9">
<label>9.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Allmer</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Computational and bioinformatics methods for microRNA gene prediction</article-title>
<source>Methods Mol Biol</source>
<year>2014</year>
<volume>1107</volume>
<fpage>157</fpage>
<lpage>175</lpage>
<pub-id pub-id-type="doi">10.1007/978-1-62703-748-8_9</pub-id>
<pub-id pub-id-type="pmid">24272436</pub-id>
</element-citation>
</ref>
<ref id="CR10">
<label>10.</label>
<mixed-citation publication-type="other">Saçar M, Allmer J. Machine Learning Methods for MicroRNA Gene Prediction. In: Yousef M, Allmer J, editors. miRNomics MicroRNA Biol. Comput. Anal. SE - 10. Humana Press; 2014. p. 177–87.</mixed-citation>
</ref>
<ref id="CR11">
<label>11.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Allmer</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Yousef</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Computational methods for ab initio detection of microRNAs</article-title>
<source>Front Genet</source>
<year>2012</year>
<volume>3</volume>
<fpage>209</fpage>
<pub-id pub-id-type="doi">10.3389/fgene.2012.00209</pub-id>
<pub-id pub-id-type="pmid">23087705</pub-id>
</element-citation>
</ref>
<ref id="CR12">
<label>12.</label>
<mixed-citation publication-type="other">Sacar MD, Allmer J. Data mining for microrna gene prediction: On the impact of class imbalance and feature number for microrna gene prediction. 2013 8th Int. Symp. Heal. Informatics Bioinforma. IEEE; 2013. p. 1–6.</mixed-citation>
</ref>
<ref id="CR13">
<label>13.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yones</surname>
<given-names>CA</given-names>
</name>
<name>
<surname>Stegmayer</surname>
<given-names>G</given-names>
</name>
<name>
<surname>Kamenetzky</surname>
<given-names>L</given-names>
</name>
<name>
<surname>Milone</surname>
<given-names>DH</given-names>
</name>
</person-group>
<article-title>miRNAfe: A comprehensive tool for feature extraction in microRNA prediction</article-title>
<source>Biosystems</source>
<year>2015</year>
<volume>138</volume>
<fpage>1</fpage>
<lpage>5</lpage>
<pub-id pub-id-type="doi">10.1016/j.biosystems.2015.10.003</pub-id>
<pub-id pub-id-type="pmid">26499212</pub-id>
</element-citation>
</ref>
<ref id="CR14">
<label>14.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yousef</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Allmer</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Khalifa</surname>
<given-names>W</given-names>
</name>
</person-group>
<article-title>Accurate Plant MicroRNA Prediction Can Be Achieved Using Sequence Motif Features</article-title>
<source>J Intell Learn Syst Appl</source>
<year>2016</year>
<volume>8</volume>
<fpage>9</fpage>
<lpage>22</lpage>
</element-citation>
</ref>
<ref id="CR15">
<label>15.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ding</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Guan</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>MiRenSVM: towards better prediction of microRNA precursors using an ensemble SVM classifier with multi-loop features</article-title>
<source>BMC Bioinformatics</source>
<year>2010</year>
<volume>11</volume>
<issue>Suppl 11</issue>
<fpage>S11</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-11-S11-S11</pub-id>
<pub-id pub-id-type="pmid">21172046</pub-id>
</element-citation>
</ref>
<ref id="CR16">
<label>16.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Jiang</surname>
<given-names>P</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Sun</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Lu</surname>
<given-names>Z</given-names>
</name>
</person-group>
<article-title>MiPred: classification of real and pseudo microRNA precursors using random forest prediction model with combined features</article-title>
<source>Nucleic Acids Res</source>
<year>2007</year>
<volume>35</volume>
<fpage>W339</fpage>
<lpage>W344</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkm368</pub-id>
<pub-id pub-id-type="pmid">17553836</pub-id>
</element-citation>
</ref>
<ref id="CR17">
<label>17.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Khalifa</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Yousef</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Saçar Demirci</surname>
<given-names>MD</given-names>
</name>
<name>
<surname>Allmer</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>The impact of feature selection on one and two-class classification performance for plant microRNAs</article-title>
<source>PeerJ</source>
<year>2016</year>
<volume>4</volume>
<fpage>e2135</fpage>
<pub-id pub-id-type="doi">10.7717/peerj.2135</pub-id>
<pub-id pub-id-type="pmid">27366641</pub-id>
</element-citation>
</ref>
<ref id="CR18">
<label>18.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Liang</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>W-H</given-names>
</name>
</person-group>
<article-title>Lowly expressed human microRNA genes evolve rapidly</article-title>
<source>Mol Biol Evol</source>
<year>2009</year>
<volume>26</volume>
<fpage>1195</fpage>
<lpage>1198</lpage>
<pub-id pub-id-type="doi">10.1093/molbev/msp053</pub-id>
<pub-id pub-id-type="pmid">19299536</pub-id>
</element-citation>
</ref>
<ref id="CR19">
<label>19.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lu</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Shen</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Kumar</surname>
<given-names>S</given-names>
</name>
<name>
<surname>He</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Shi</surname>
<given-names>S</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The birth and death of microRNA genes in Drosophila</article-title>
<source>Nat Genet</source>
<year>2008</year>
<volume>40</volume>
<fpage>351</fpage>
<lpage>355</lpage>
<pub-id pub-id-type="doi">10.1038/ng.73</pub-id>
<pub-id pub-id-type="pmid">18278047</pub-id>
</element-citation>
</ref>
<ref id="CR20">
<label>20.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Fahlgren</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Howell</surname>
<given-names>MD</given-names>
</name>
<name>
<surname>Kasschau</surname>
<given-names>KD</given-names>
</name>
<name>
<surname>Chapman</surname>
<given-names>EJ</given-names>
</name>
<name>
<surname>Sullivan</surname>
<given-names>CM</given-names>
</name>
<name>
<surname>Cumbie</surname>
<given-names>JS</given-names>
</name>
<etal></etal>
</person-group>
<article-title>High-throughput sequencing of Arabidopsis microRNAs: evidence for frequent birth and death of MIRNA genes</article-title>
<source>PLoS One</source>
<year>2007</year>
<volume>2</volume>
<fpage>e219</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pone.0000219</pub-id>
<pub-id pub-id-type="pmid">17299599</pub-id>
</element-citation>
</ref>
<ref id="CR21">
<label>21.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ding</surname>
<given-names>J</given-names>
</name>
<name>
<surname>Zhou</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Guan</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>miRFam: an effective automatic miRNA classification method based on n-grams and a multiclass SVM</article-title>
<source>BMC Bioinformatics</source>
<year>2011</year>
<volume>12</volume>
<fpage>216</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-12-216</pub-id>
<pub-id pub-id-type="pmid">21619662</pub-id>
</element-citation>
</ref>
<ref id="CR22">
<label>22.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>O. N. Lopes</surname>
<given-names>I de</given-names>
</name>
<name>
<surname>Schliep</surname>
<given-names>A</given-names>
</name>
<name>
<surname>de L. F. de Carvalho</surname>
<given-names>AP</given-names>
</name>
</person-group>
<article-title>Automatic learning of pre-miRNAs from different species</article-title>
<source>BMC Bioinformatics</source>
<year>2016</year>
<volume>17</volume>
<fpage>224</fpage>
<pub-id pub-id-type="doi">10.1186/s12859-016-1036-3</pub-id>
<pub-id pub-id-type="pmid">27233515</pub-id>
</element-citation>
</ref>
<ref id="CR23">
<label>23.</label>
<mixed-citation publication-type="other">Teune J-H, Steger G. NOVOMIR: De Novo Prediction of MicroRNA-Coding Regions in a Single Plant-Genome. J Nucleic Acids. 2010;2010:10. doi:10.4061/2010/495904.</mixed-citation>
</ref>
<ref id="CR24">
<label>24.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wu</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Wei</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Rayner</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>MiRPara: a SVM-based software tool for prediction of most probable microRNA coding regions in genome scale sequences</article-title>
<source>BMC Bioinformatics</source>
<year>2011</year>
<volume>12</volume>
<fpage>107</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-12-107</pub-id>
<pub-id pub-id-type="pmid">21504621</pub-id>
</element-citation>
</ref>
<ref id="CR25">
<label>25.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Gerlach</surname>
<given-names>D</given-names>
</name>
<name>
<surname>Kriventseva</surname>
<given-names>EV</given-names>
</name>
<name>
<surname>Rahman</surname>
<given-names>N</given-names>
</name>
<name>
<surname>Vejnar</surname>
<given-names>CE</given-names>
</name>
<name>
<surname>Zdobnov</surname>
<given-names>EM</given-names>
</name>
</person-group>
<article-title>miROrtho: computational survey of microRNA genes</article-title>
<source>Nucleic Acids Res</source>
<year>2009</year>
<volume>37</volume>
<fpage>D111</fpage>
<lpage>D117</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkn707</pub-id>
<pub-id pub-id-type="pmid">18927110</pub-id>
</element-citation>
</ref>
<ref id="CR26">
<label>26.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ng</surname>
<given-names>KLS</given-names>
</name>
<name>
<surname>Mishra</surname>
<given-names>SK</given-names>
</name>
</person-group>
<article-title>De novo SVM classification of precursor microRNAs from genomic pseudo hairpins using global and intrinsic folding measures</article-title>
<source>Bioinformatics</source>
<year>2007</year>
<volume>23</volume>
<fpage>1321</fpage>
<lpage>1330</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btm026</pub-id>
<pub-id pub-id-type="pmid">17267435</pub-id>
</element-citation>
</ref>
<ref id="CR27">
<label>27.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xue</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>F</given-names>
</name>
<name>
<surname>He</surname>
<given-names>T</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>G-P</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>X</given-names>
</name>
</person-group>
<article-title>Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine</article-title>
<source>BMC Bioinformatics</source>
<year>2005</year>
<volume>6</volume>
<fpage>310</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-6-310</pub-id>
<pub-id pub-id-type="pmid">16381612</pub-id>
</element-citation>
</ref>
<ref id="CR28">
<label>28.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Batuwita</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Palade</surname>
<given-names>V</given-names>
</name>
</person-group>
<article-title>microPred: effective classification of pre-miRNAs for human miRNA gene prediction</article-title>
<source>Bioinformatics</source>
<year>2009</year>
<volume>25</volume>
<fpage>989</fpage>
<lpage>995</lpage>
<pub-id pub-id-type="doi">10.1093/bioinformatics/btp107</pub-id>
<pub-id pub-id-type="pmid">19233894</pub-id>
</element-citation>
</ref>
<ref id="CR29">
<label>29.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>van der Burgt</surname>
<given-names>A</given-names>
</name>
<name>
<surname>Fiers</surname>
<given-names>MWJE</given-names>
</name>
<name>
<surname>Nap</surname>
<given-names>J-P</given-names>
</name>
<name>
<surname>van Ham</surname>
<given-names>RCHJ</given-names>
</name>
</person-group>
<article-title>In silico miRNA prediction in metazoan genomes: balancing between sensitivity and specificity</article-title>
<source>BMC Genomics</source>
<year>2009</year>
<volume>10</volume>
<fpage>204</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2164-10-204</pub-id>
<pub-id pub-id-type="pmid">19405940</pub-id>
</element-citation>
</ref>
<ref id="CR30">
<label>30.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bağcı</surname>
<given-names>C</given-names>
</name>
<name>
<surname>Allmer</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>One Step Forward, Two Steps Back; Xeno-MicroRNAs Reported in Breast Milk Are Artifacts</article-title>
<source>PLoS One</source>
<year>2016</year>
<volume>11</volume>
<fpage>e0145065</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pone.0145065</pub-id>
<pub-id pub-id-type="pmid">26824347</pub-id>
</element-citation>
</ref>
<ref id="CR31">
<label>31.</label>
<mixed-citation publication-type="other">Çakır MV, Allmer J. Systematic computational analysis of potential RNAi regulation in Toxoplasma gondii. 2010 5th Int. Symp. Heal. Informatics Bioinforma. Ankara, Turkey: IEEE; 2010. p. 31–8.</mixed-citation>
</ref>
<ref id="CR32">
<label>32.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bailey</surname>
<given-names>TL</given-names>
</name>
<name>
<surname>Boden</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Buske</surname>
<given-names>FA</given-names>
</name>
<name>
<surname>Frith</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Grant</surname>
<given-names>CE</given-names>
</name>
<name>
<surname>Clementi</surname>
<given-names>L</given-names>
</name>
<etal></etal>
</person-group>
<article-title>MEME SUITE: tools for motif discovery and searching</article-title>
<source>Nucleic Acids Res</source>
<year>2009</year>
<volume>37</volume>
<fpage>W202</fpage>
<lpage>W208</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkp335</pub-id>
<pub-id pub-id-type="pmid">19458158</pub-id>
</element-citation>
</ref>
<ref id="CR33">
<label>33.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Bailey</surname>
<given-names>TL</given-names>
</name>
<name>
<surname>Elkan</surname>
<given-names>C</given-names>
</name>
</person-group>
<article-title>Fitting a mixture model by expectation maximization to discover motifs in biopolymers</article-title>
<source>Proc Int Conf Intell Syst Mol Biol</source>
<year>1994</year>
<volume>2</volume>
<fpage>28</fpage>
<lpage>36</lpage>
<pub-id pub-id-type="pmid">7584402</pub-id>
</element-citation>
</ref>
<ref id="CR34">
<label>34.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yousef</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Khalifa</surname>
<given-names>W</given-names>
</name>
<name>
<surname>Acar</surname>
<given-names>İE</given-names>
</name>
<name>
<surname>Allmer</surname>
<given-names>J</given-names>
</name>
</person-group>
<article-title>Distinguishing Between MicroRNA Targets From Diverse Species Using Sequence Motifs And K-Mers</article-title>
<source>Proceedings of BIOSTEC 2017, 10th International Joint Conference on Biomedical Engineering Systems and Technologies</source>
<publisher-loc>Porto</publisher-loc>
<year>2017</year>
<volume>3</volume>
<fpage>133</fpage>
<lpage>139</lpage>
</element-citation>
</ref>
<ref id="CR35">
<label>35.</label>
<mixed-citation publication-type="other">Shaltout NAN, El-Hefnawi M, Rafea A, Moustafa A. Information gain as a feature selection method for the efficient classification of Influenza-A based on viral hosts. Proc. World Congr. Eng. Newswood Limited; 2014. p. 625–31.</mixed-citation>
</ref>
<ref id="CR36">
<label>36.</label>
<mixed-citation publication-type="other">Berthold MR, Cebron N, Dill F, Gabriel TR, Kötter T, Meinl T, et al. KNIME: The Konstanz Information Miner. SIGKDD Explor. 2008. p. 319–26.</mixed-citation>
</ref>
<ref id="CR37">
<label>37.</label>
<mixed-citation publication-type="other">Yang Y, Pedersen JO. A Comparative Study on Feature Selection in Text Categorization. Proceedings of the Fourteenth International Conference on Machine Learning (ICML’97). 1997. p. 412–20.</mixed-citation>
</ref>
<ref id="CR38">
<label>38.</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Vapnik</surname>
<given-names>VN</given-names>
</name>
</person-group>
<source>The nature of statistical learning theory</source>
<year>1995</year>
<publisher-loc>New York, USA</publisher-loc>
<publisher-name>Springer</publisher-name>
</element-citation>
</ref>
<ref id="CR39">
<label>39.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xu</surname>
<given-names>Q-S</given-names>
</name>
<name>
<surname>Liang</surname>
<given-names>Y-Z</given-names>
</name>
</person-group>
<article-title>Monte Carlo cross validation</article-title>
<source>Chemom Intell Lab Syst</source>
<year>2001</year>
<volume>56</volume>
<fpage>1</fpage>
<lpage>11</lpage>
<pub-id pub-id-type="doi">10.1016/S0169-7439(00)00122-2</pub-id>
</element-citation>
</ref>
<ref id="CR40">
<label>40.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Amaldi</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Kann</surname>
<given-names>V</given-names>
</name>
</person-group>
<article-title>On the approximability of minimizing nonzero variables or unsatisfied relations in linear systems</article-title>
<source>Theor Comput Sci</source>
<year>1998</year>
<volume>209</volume>
<fpage>237</fpage>
<lpage>260</lpage>
<pub-id pub-id-type="doi">10.1016/S0304-3975(97)00115-1</pub-id>
</element-citation>
</ref>
<ref id="CR41">
<label>41.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Matthews</surname>
<given-names>BW</given-names>
</name>
</person-group>
<article-title>Comparison of the predicted and observed secondary structure of T4 phage lysozyme</article-title>
<source>BBA - Protein Struct</source>
<year>1975</year>
<volume>405</volume>
<fpage>442</fpage>
<lpage>451</lpage>
<pub-id pub-id-type="doi">10.1016/0005-2795(75)90109-9</pub-id>
</element-citation>
</ref>
<ref id="CR42">
<label>42.</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Letunic</surname>
<given-names>I</given-names>
</name>
<name>
<surname>Bork</surname>
<given-names>P</given-names>
</name>
</person-group>
<article-title>Interactive Tree Of Life v2: online annotation and display of phylogenetic trees made easy</article-title>
<source>Nucleic Acids Res</source>
<year>2011</year>
<volume>39</volume>
<fpage>W475</fpage>
<lpage>W478</lpage>
<pub-id pub-id-type="doi">10.1093/nar/gkr201</pub-id>
<pub-id pub-id-type="pmid">21470960</pub-id>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000266  | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000266  | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     
   |texte=   
}}

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021