Server on medical data and libraries in the Maghreb (final version)




The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">New methodology for repetitive sequences identification in
<italic>human</italic>
X and Y chromosomes</title>
<author>
<name sortKey="Touati, Rabeb" sort="Touati, Rabeb" uniqKey="Touati R" first="Rabeb" last="Touati">Rabeb Touati</name>
<affiliation>
<nlm:aff id="aff0005">University of Tunis El Manar, LR99ES10 Human Genetics Laboratory, Faculty of Medicine of Tunis (FMT), Tunisia</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff0010">University of Tunis El Manar, SITI Laboratory, National School of Engineers of Tunis, BP 37, Le Belvédère, 1002, Tunis, Tunisia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Tajouri, Asma" sort="Tajouri, Asma" uniqKey="Tajouri A" first="Asma" last="Tajouri">Asma Tajouri</name>
<affiliation>
<nlm:aff id="aff0005">University of Tunis El Manar, LR99ES10 Human Genetics Laboratory, Faculty of Medicine of Tunis (FMT), Tunisia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mesaoudi, Imen" sort="Mesaoudi, Imen" uniqKey="Mesaoudi I" first="Imen" last="Mesaoudi">Imen Mesaoudi</name>
<affiliation>
<nlm:aff id="aff0010">University of Tunis El Manar, SITI Laboratory, National School of Engineers of Tunis, BP 37, Le Belvédère, 1002, Tunis, Tunisia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Oueslati, Afef Elloumi" sort="Oueslati, Afef Elloumi" uniqKey="Oueslati A" first="Afef Elloumi" last="Oueslati">Afef Elloumi Oueslati</name>
<affiliation>
<nlm:aff id="aff0010">University of Tunis El Manar, SITI Laboratory, National School of Engineers of Tunis, BP 37, Le Belvédère, 1002, Tunis, Tunisia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Lachiri, Zied" sort="Lachiri, Zied" uniqKey="Lachiri Z" first="Zied" last="Lachiri">Zied Lachiri</name>
<affiliation>
<nlm:aff id="aff0010">University of Tunis El Manar, SITI Laboratory, National School of Engineers of Tunis, BP 37, Le Belvédère, 1002, Tunis, Tunisia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Kharrat, Maher" sort="Kharrat, Maher" uniqKey="Kharrat M" first="Maher" last="Kharrat">Maher Kharrat</name>
<affiliation>
<nlm:aff id="aff0005">University of Tunis El Manar, LR99ES10 Human Genetics Laboratory, Faculty of Medicine of Tunis (FMT), Tunisia</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">33101452</idno>
<idno type="pmc">7572123</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7572123</idno>
<idno type="RBID">PMC:7572123</idno>
<idno type="doi">10.1016/j.bspc.2020.102207</idno>
<date when="2020">2020</date>
<idno type="wicri:Area/Pmc/Corpus">000168</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000168</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">New methodology for repetitive sequences identification in
<italic>human</italic>
X and Y chromosomes</title>
<author>
<name sortKey="Touati, Rabeb" sort="Touati, Rabeb" uniqKey="Touati R" first="Rabeb" last="Touati">Rabeb Touati</name>
<affiliation>
<nlm:aff id="aff0005">University of Tunis El Manar, LR99ES10 Human Genetics Laboratory, Faculty of Medicine of Tunis (FMT), Tunisia</nlm:aff>
</affiliation>
<affiliation>
<nlm:aff id="aff0010">University of Tunis El Manar, SITI Laboratory, National School of Engineers of Tunis, BP 37, Le Belvédère, 1002, Tunis, Tunisia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Tajouri, Asma" sort="Tajouri, Asma" uniqKey="Tajouri A" first="Asma" last="Tajouri">Asma Tajouri</name>
<affiliation>
<nlm:aff id="aff0005">University of Tunis El Manar, LR99ES10 Human Genetics Laboratory, Faculty of Medicine of Tunis (FMT), Tunisia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Mesaoudi, Imen" sort="Mesaoudi, Imen" uniqKey="Mesaoudi I" first="Imen" last="Mesaoudi">Imen Mesaoudi</name>
<affiliation>
<nlm:aff id="aff0010">University of Tunis El Manar, SITI Laboratory, National School of Engineers of Tunis, BP 37, Le Belvédère, 1002, Tunis, Tunisia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Oueslati, Afef Elloumi" sort="Oueslati, Afef Elloumi" uniqKey="Oueslati A" first="Afef Elloumi" last="Oueslati">Afef Elloumi Oueslati</name>
<affiliation>
<nlm:aff id="aff0010">University of Tunis El Manar, SITI Laboratory, National School of Engineers of Tunis, BP 37, Le Belvédère, 1002, Tunis, Tunisia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Lachiri, Zied" sort="Lachiri, Zied" uniqKey="Lachiri Z" first="Zied" last="Lachiri">Zied Lachiri</name>
<affiliation>
<nlm:aff id="aff0010">University of Tunis El Manar, SITI Laboratory, National School of Engineers of Tunis, BP 37, Le Belvédère, 1002, Tunis, Tunisia</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Kharrat, Maher" sort="Kharrat, Maher" uniqKey="Kharrat M" first="Maher" last="Kharrat">Maher Kharrat</name>
<affiliation>
<nlm:aff id="aff0005">University of Tunis El Manar, LR99ES10 Human Genetics Laboratory, Faculty of Medicine of Tunis (FMT), Tunisia</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Biomedical Signal Processing and Control</title>
<idno type="ISSN">1746-8094</idno>
<idno type="eISSN">1746-8094</idno>
<imprint>
<date when="2020">2020</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<title>Graphical abstract</title>
<fig id="fig0090" position="anchor">
<graphic xlink:href="fx1_lrg"></graphic>
</fig>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Venter, J C" uniqKey="Venter J">J.C. Venter</name>
</author>
<author>
<name sortKey="Adams, M D" uniqKey="Adams M">M.D. Adams</name>
</author>
<author>
<name sortKey="Myers, E W" uniqKey="Myers E">E.W. Myers</name>
</author>
<author>
<name sortKey="Li, P W" uniqKey="Li P">P.W. Li</name>
</author>
<author>
<name sortKey="Mural, R J" uniqKey="Mural R">R.J. Mural</name>
</author>
<author>
<name sortKey="Sutton, G G" uniqKey="Sutton G">G.G. Sutton</name>
</author>
<author>
<name sortKey="Gocayne, J D" uniqKey="Gocayne J">J.D. Gocayne</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="De Freitas, N L" uniqKey="De Freitas N">N.L. de Freitas</name>
</author>
<author>
<name sortKey="Al Rikabi, A B" uniqKey="Al Rikabi A">A.B. Al-Rikabi</name>
</author>
<author>
<name sortKey="Bertollo, L A C" uniqKey="Bertollo L">L.A.C. Bertollo</name>
</author>
<author>
<name sortKey="Ezaz, T" uniqKey="Ezaz T">T. Ezaz</name>
</author>
<author>
<name sortKey="Yano, C F" uniqKey="Yano C">C.F. Yano</name>
</author>
<author>
<name sortKey="De Oliveira, E A" uniqKey="De Oliveira E">E.A. de Oliveira</name>
</author>
<author>
<name sortKey="De Bello Cioffi, M" uniqKey="De Bello Cioffi M">M. de Bello Cioffi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ramel, C" uniqKey="Ramel C">C. Ramel</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Biscotti, M A" uniqKey="Biscotti M">M.A. Biscotti</name>
</author>
<author>
<name sortKey="Olmo, E" uniqKey="Olmo E">E. Olmo</name>
</author>
<author>
<name sortKey="Heslop Harrison, J P" uniqKey="Heslop Harrison J">J.P. Heslop-Harrison</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Treangen, T J" uniqKey="Treangen T">T.J. Treangen</name>
</author>
<author>
<name sortKey="Salzberg, S L" uniqKey="Salzberg S">S.L. Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jabs, E W" uniqKey="Jabs E">E.W. Jabs</name>
</author>
<author>
<name sortKey="Persico, M G" uniqKey="Persico M">M.G. Persico</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Blackburn, E H" uniqKey="Blackburn E">E.H. Blackburn</name>
</author>
<author>
<name sortKey="Gall, J G" uniqKey="Gall J">J.G. Gall</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Stewart, J A" uniqKey="Stewart J">J.A. Stewart</name>
</author>
<author>
<name sortKey="Chaiken, M F" uniqKey="Chaiken M">M.F. Chaiken</name>
</author>
<author>
<name sortKey="Wang, F" uniqKey="Wang F">F. Wang</name>
</author>
<author>
<name sortKey="Price, C M" uniqKey="Price C">C.M. Price</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Moyzis, R K" uniqKey="Moyzis R">R.K. Moyzis</name>
</author>
<author>
<name sortKey="Buckingham, J M" uniqKey="Buckingham J">J.M. Buckingham</name>
</author>
<author>
<name sortKey="Cram, L S" uniqKey="Cram L">L.S. Cram</name>
</author>
<author>
<name sortKey="Dani, M" uniqKey="Dani M">M. Dani</name>
</author>
<author>
<name sortKey="Deaven, L L" uniqKey="Deaven L">L.L. Deaven</name>
</author>
<author>
<name sortKey="Jones, M D" uniqKey="Jones M">M.D. Jones</name>
</author>
<author>
<name sortKey="Wu, J R" uniqKey="Wu J">J.R. Wu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zakian, V A" uniqKey="Zakian V">V.A. Zakian</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Peng, J C" uniqKey="Peng J">J.C. Peng</name>
</author>
<author>
<name sortKey="Karpen, G H" uniqKey="Karpen G">G.H. Karpen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lim, Kian Guan" uniqKey="Lim K">Kian Guan Lim</name>
</author>
<author>
<name sortKey="Kwoh, Chee Keong" uniqKey="Kwoh C">Chee Keong Kwoh</name>
</author>
<author>
<name sortKey="Hsu, Li Yang" uniqKey="Hsu L">Li Yang Hsu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Thiel, Teresa" uniqKey="Thiel T">Teresa Thiel</name>
</author>
<author>
<name sortKey="Michalek, W" uniqKey="Michalek W">W. Michalek</name>
</author>
<author>
<name sortKey="Varshney, R" uniqKey="Varshney R">R. Varshney</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kolpakov, Roman" uniqKey="Kolpakov R">Roman Kolpakov</name>
</author>
<author>
<name sortKey="Bana, Ghizlane" uniqKey="Bana G">Ghizlane Bana</name>
</author>
<author>
<name sortKey="Kucherov, Gregory" uniqKey="Kucherov G">Gregory Kucherov</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Abajian, Chris" uniqKey="Abajian C">Chris Abajian</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sarachu, Martin" uniqKey="Sarachu M">Martín Sarachu</name>
</author>
<author>
<name sortKey="Colet, Marc" uniqKey="Colet M">Marc Colet</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Benson, Gary" uniqKey="Benson G">Gary Benson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tarailo Graovac, Maja" uniqKey="Tarailo Graovac M">Maja Tarailo-Graovac</name>
</author>
<author>
<name sortKey="Chen, Nansheng" uniqKey="Chen N">Nansheng Chen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Flicek, P" uniqKey="Flicek P">P. Flicek</name>
</author>
<author>
<name sortKey="Birney, E" uniqKey="Birney E">E. Birney</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="De Koning, A J" uniqKey="De Koning A">A.J. de Koning</name>
</author>
<author>
<name sortKey="Gu, W" uniqKey="Gu W">W. Gu</name>
</author>
<author>
<name sortKey="Castoe, T A" uniqKey="Castoe T">T.A. Castoe</name>
</author>
<author>
<name sortKey="Batzer, M A" uniqKey="Batzer M">M.A. Batzer</name>
</author>
<author>
<name sortKey="Pollock, D D" uniqKey="Pollock D">D.D. Pollock</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="The Ncbi" uniqKey="The Ncbi">The NCBI</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Venter, J C" uniqKey="Venter J">J.C. Venter</name>
</author>
<author>
<name sortKey="Adams, M D" uniqKey="Adams M">M.D. Adams</name>
</author>
<author>
<name sortKey="Myers, E W" uniqKey="Myers E">E.W. Myers</name>
</author>
<author>
<name sortKey="Li, P W" uniqKey="Li P">P.W. Li</name>
</author>
<author>
<name sortKey="Mural, R J" uniqKey="Mural R">R.J. Mural</name>
</author>
<author>
<name sortKey="Sutton, G G" uniqKey="Sutton G">G.G. Sutton</name>
</author>
<author>
<name sortKey="Gocayne, J D" uniqKey="Gocayne J">J.D. Gocayne</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Touati, R" uniqKey="Touati R">R. Touati</name>
</author>
<author>
<name sortKey="Haddad Boubaker, S" uniqKey="Haddad Boubaker S">S. Haddad-Boubaker</name>
</author>
<author>
<name sortKey="Ferchichi, I" uniqKey="Ferchichi I">I. Ferchichi</name>
</author>
<author>
<name sortKey="Messaoudi, I" uniqKey="Messaoudi I">I. Messaoudi</name>
</author>
<author>
<name sortKey="Ouesleti, A E" uniqKey="Ouesleti A">A.E. Ouesleti</name>
</author>
<author>
<name sortKey="Triki, H" uniqKey="Triki H">H. Triki</name>
</author>
<author>
<name sortKey="Kharrat, M" uniqKey="Kharrat M">M. Kharrat</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Touati, R" uniqKey="Touati R">R. Touati</name>
</author>
<author>
<name sortKey="Oueslati, A E" uniqKey="Oueslati A">A.E. Oueslati</name>
</author>
<author>
<name sortKey="Messaoudi, I" uniqKey="Messaoudi I">I. Messaoudi</name>
</author>
<author>
<name sortKey="Lachiri, Z" uniqKey="Lachiri Z">Z. Lachiri</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Buchner, M" uniqKey="Buchner M">M. Buchner</name>
</author>
<author>
<name sortKey="Janjarasjitt, S" uniqKey="Janjarasjitt S">S. Janjarasjitt</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sharma, S D" uniqKey="Sharma S">S.D. Sharma</name>
</author>
<author>
<name sortKey="Sharma, S N" uniqKey="Sharma S">S.N. Sharma</name>
</author>
<author>
<name sortKey="Saxena, R" uniqKey="Saxena R">R. Saxena</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chechetkin, V R" uniqKey="Chechetkin V">V.R. Chechetkin</name>
</author>
<author>
<name sortKey="Turygin, A Y" uniqKey="Turygin A">A.Y. Turygin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sharma, D" uniqKey="Sharma D">D. Sharma</name>
</author>
<author>
<name sortKey="Issac, B" uniqKey="Issac B">B. Issac</name>
</author>
<author>
<name sortKey="Raghava, G P S" uniqKey="Raghava G">G.P.S. Raghava</name>
</author>
<author>
<name sortKey="Ramaswamy, R" uniqKey="Ramaswamy R">R. Ramaswamy</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Touati, R" uniqKey="Touati R">R. Touati</name>
</author>
<author>
<name sortKey="Messaoudi, I" uniqKey="Messaoudi I">I. Messaoudi</name>
</author>
<author>
<name sortKey="Oueslati, A E" uniqKey="Oueslati A">A.E. Oueslati</name>
</author>
<author>
<name sortKey="Lachiri, Z" uniqKey="Lachiri Z">Z. Lachiri</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Touati, R" uniqKey="Touati R">R. Touati</name>
</author>
<author>
<name sortKey="Messaoudi, I" uniqKey="Messaoudi I">I. Messaoudi</name>
</author>
<author>
<name sortKey="Oueslati, A E" uniqKey="Oueslati A">A.E. Oueslati</name>
</author>
<author>
<name sortKey="Lachiri, Z" uniqKey="Lachiri Z">Z. Lachiri</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Touati, R" uniqKey="Touati R">R. Touati</name>
</author>
<author>
<name sortKey="Messaoudi, I" uniqKey="Messaoudi I">I. Messaoudi</name>
</author>
<author>
<name sortKey="Oueslati, A E" uniqKey="Oueslati A">A.E. Oueslati</name>
</author>
<author>
<name sortKey="Lachiri, Z" uniqKey="Lachiri Z">Z. Lachiri</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Grossmann, A" uniqKey="Grossmann A">A. Grossmann</name>
</author>
<author>
<name sortKey="Morlet, J" uniqKey="Morlet J">J. Morlet</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Merry, R J E" uniqKey="Merry R">R.J.E. Merry</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Najmi, A H" uniqKey="Najmi A">A.H. Najmi</name>
</author>
<author>
<name sortKey="Sadowsky, J" uniqKey="Sadowsky J">J. Sadowsky</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kumar, M" uniqKey="Kumar M">M. Kumar</name>
</author>
<author>
<name sortKey="Saxena, R" uniqKey="Saxena R">R. Saxena</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sahni, P" uniqKey="Sahni P">P. Sahni</name>
</author>
<author>
<name sortKey="Mittal, N" uniqKey="Mittal N">N. Mittal</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Canny, J" uniqKey="Canny J">J. Canny</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bao, P" uniqKey="Bao P">P. Bao</name>
</author>
<author>
<name sortKey="Zhang, L" uniqKey="Zhang L">L. Zhang</name>
</author>
<author>
<name sortKey="Wu, X" uniqKey="Wu X">X. Wu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Soille, P" uniqKey="Soille P">P. Soille</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kent, W J" uniqKey="Kent W">W.J. Kent</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wheeler, T J" uniqKey="Wheeler T">T.J. Wheeler</name>
</author>
<author>
<name sortKey="Clements, J" uniqKey="Clements J">J. Clements</name>
</author>
<author>
<name sortKey="Eddy, S R" uniqKey="Eddy S">S.R. Eddy</name>
</author>
<author>
<name sortKey="Hubley, R" uniqKey="Hubley R">R. Hubley</name>
</author>
<author>
<name sortKey="Jones, T A" uniqKey="Jones T">T.A. Jones</name>
</author>
<author>
<name sortKey="Jurka, J" uniqKey="Jurka J">J. Jurka</name>
</author>
<author>
<name sortKey="Finn, R D" uniqKey="Finn R">R.D. Finn</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lecun, Y" uniqKey="Lecun Y">Y. LeCun</name>
</author>
<author>
<name sortKey="Bottou, L" uniqKey="Bottou L">L. Bottou</name>
</author>
<author>
<name sortKey="Bengio, Y" uniqKey="Bengio Y">Y. Bengio</name>
</author>
<author>
<name sortKey="Haffner, P" uniqKey="Haffner P">P. Haffner</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Abd Alhalem, S M" uniqKey="Abd Alhalem S">S.M. Abd-Alhalem</name>
</author>
<author>
<name sortKey="Soliman, N F" uniqKey="Soliman N">N.F. Soliman</name>
</author>
<author>
<name sortKey="Eldin, S" uniqKey="Eldin S">S. Eldin</name>
</author>
<author>
<name sortKey="Abd Elrahman, S E" uniqKey="Abd Elrahman S">S.E. Abd Elrahman</name>
</author>
<author>
<name sortKey="Ismail, N A" uniqKey="Ismail N">N.A. Ismail</name>
</author>
<author>
<name sortKey="El Rabaie, E S M" uniqKey="El Rabaie E">E.S.M. El-Rabaie</name>
</author>
<author>
<name sortKey="El Samie, F E A" uniqKey="El Samie F">F.E.A. El-Samie</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zeng, H" uniqKey="Zeng H">H. Zeng</name>
</author>
<author>
<name sortKey="Edwards, M D" uniqKey="Edwards M">M.D. Edwards</name>
</author>
<author>
<name sortKey="Liu, G" uniqKey="Liu G">G. Liu</name>
</author>
<author>
<name sortKey="Gifford, D K" uniqKey="Gifford D">D.K. Gifford</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Al Ajlan, A" uniqKey="Al Ajlan A">A. Al-Ajlan</name>
</author>
<author>
<name sortKey="El Allali, A" uniqKey="El Allali A">A. El Allali</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Elbashir, M K" uniqKey="Elbashir M">M.K. Elbashir</name>
</author>
<author>
<name sortKey="Ezz, M" uniqKey="Ezz M">M. Ezz</name>
</author>
<author>
<name sortKey="Mohammed, M" uniqKey="Mohammed M">M. Mohammed</name>
</author>
<author>
<name sortKey="Saloum, S S" uniqKey="Saloum S">S.S. Saloum</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhou, J" uniqKey="Zhou J">J. Zhou</name>
</author>
<author>
<name sortKey="Luo, L Y" uniqKey="Luo L">L.Y. Luo</name>
</author>
<author>
<name sortKey="Dou, Q" uniqKey="Dou Q">Q. Dou</name>
</author>
<author>
<name sortKey="Chen, H" uniqKey="Chen H">H. Chen</name>
</author>
<author>
<name sortKey="Chen, C" uniqKey="Chen C">C. Chen</name>
</author>
<author>
<name sortKey="Li, G J" uniqKey="Li G">G.J. Li</name>
</author>
<author>
<name sortKey="Heng, P A" uniqKey="Heng P">P.A. Heng</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ghoneim, A" uniqKey="Ghoneim A">A. Ghoneim</name>
</author>
<author>
<name sortKey="Muhammad, G" uniqKey="Muhammad G">G. Muhammad</name>
</author>
<author>
<name sortKey="Hossain, M S" uniqKey="Hossain M">M.S. Hossain</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Porumb, M" uniqKey="Porumb M">M. Porumb</name>
</author>
<author>
<name sortKey="Iadanza, E" uniqKey="Iadanza E">E. Iadanza</name>
</author>
<author>
<name sortKey="Massaro, S" uniqKey="Massaro S">S. Massaro</name>
</author>
<author>
<name sortKey="Pecchia, L" uniqKey="Pecchia L">L. Pecchia</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Mukhopadhyay, A K" uniqKey="Mukhopadhyay A">A.K. Mukhopadhyay</name>
</author>
<author>
<name sortKey="Samui, S" uniqKey="Samui S">S. Samui</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kundu, S" uniqKey="Kundu S">S. Kundu</name>
</author>
<author>
<name sortKey="Ari, S" uniqKey="Ari S">S. Ari</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhang, J" uniqKey="Zhang J">J. Zhang</name>
</author>
<author>
<name sortKey="Fan, D" uniqKey="Fan D">D. Fan</name>
</author>
<author>
<name sortKey="Jian, Z" uniqKey="Jian Z">Z. Jian</name>
</author>
<author>
<name sortKey="Chen, G G" uniqKey="Chen G">G.G. Chen</name>
</author>
<author>
<name sortKey="Lai, P B" uniqKey="Lai P">P.B. Lai</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kobayashi, T" uniqKey="Kobayashi T">T. Kobayashi</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zhang, G" uniqKey="Zhang G">G. Zhang</name>
</author>
<author>
<name sortKey="Luo, Y" uniqKey="Luo Y">Y. Luo</name>
</author>
<author>
<name sortKey="Li, G" uniqKey="Li G">G. Li</name>
</author>
<author>
<name sortKey="Wang, L" uniqKey="Wang L">L. Wang</name>
</author>
<author>
<name sortKey="Na, D" uniqKey="Na D">D. Na</name>
</author>
<author>
<name sortKey="Wu, X" uniqKey="Wu X">X. Wu</name>
</author>
<author>
<name sortKey="Wang, L" uniqKey="Wang L">L. Wang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Brown, W R" uniqKey="Brown W">W.R. Brown</name>
</author>
<author>
<name sortKey="Mackinnon, P J" uniqKey="Mackinnon P">P.J. MacKinnon</name>
</author>
<author>
<name sortKey="Villasante, A" uniqKey="Villasante A">A. Villasanté</name>
</author>
<author>
<name sortKey="Spurr, N" uniqKey="Spurr N">N. Spurr</name>
</author>
<author>
<name sortKey="Buckle, V J" uniqKey="Buckle V">V.J. Buckle</name>
</author>
<author>
<name sortKey="Dobson, M J" uniqKey="Dobson M">M.J. Dobson</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Biomed Signal Process Control</journal-id>
<journal-id journal-id-type="iso-abbrev">Biomed Signal Process Control</journal-id>
<journal-title-group>
<journal-title>Biomedical Signal Processing and Control</journal-title>
</journal-title-group>
<issn pub-type="ppub">1746-8094</issn>
<issn pub-type="epub">1746-8094</issn>
<publisher>
<publisher-name>Elsevier Ltd.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">33101452</article-id>
<article-id pub-id-type="pmc">7572123</article-id>
<article-id pub-id-type="pii">S1746-8094(20)30341-4</article-id>
<article-id pub-id-type="doi">10.1016/j.bspc.2020.102207</article-id>
<article-id pub-id-type="publisher-id">102207</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>New methodology for repetitive sequences identification in
<italic>human</italic>
X and Y chromosomes</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" id="aut0005">
<name>
<surname>Touati</surname>
<given-names>Rabeb</given-names>
</name>
<xref rid="aff0005" ref-type="aff">a</xref>
<xref rid="aff0010" ref-type="aff">b</xref>
<xref rid="cor0005" ref-type="corresp">*</xref>
</contrib>
<contrib contrib-type="author" id="aut0010">
<name>
<surname>Tajouri</surname>
<given-names>Asma</given-names>
</name>
<xref rid="aff0005" ref-type="aff">a</xref>
</contrib>
<contrib contrib-type="author" id="aut0015">
<name>
<surname>Mesaoudi</surname>
<given-names>Imen</given-names>
</name>
<xref rid="aff0010" ref-type="aff">b</xref>
</contrib>
<contrib contrib-type="author" id="aut0020">
<name>
<surname>Oueslati</surname>
<given-names>Afef Elloumi</given-names>
</name>
<xref rid="aff0010" ref-type="aff">b</xref>
</contrib>
<contrib contrib-type="author" id="aut0025">
<name>
<surname>Lachiri</surname>
<given-names>Zied</given-names>
</name>
<xref rid="aff0010" ref-type="aff">b</xref>
</contrib>
<contrib contrib-type="author" id="aut0030">
<name>
<surname>Kharrat</surname>
<given-names>Maher</given-names>
</name>
<xref rid="aff0005" ref-type="aff">a</xref>
</contrib>
<aff id="aff0005">
<label>a</label>
University of Tunis El Manar, LR99ES10 Human Genetics Laboratory, Faculty of Medicine of Tunis (FMT), Tunisia</aff>
<aff id="aff0010">
<label>b</label>
University of Tunis El Manar, SITI Laboratory, National School of Engineers of Tunis, BP 37, Le Belvédère, 1002, Tunis, Tunisia</aff>
</contrib-group>
<author-notes>
<corresp id="cor0005">
<label></label>
Corresponding author at: University of Tunis El Manar, LR99ES10 Human Genetics Laboratory, Faculty of Medicine of Tunis (FMT), Tunisia.</corresp>
</author-notes>
<pub-date pub-type="pmc-release">
<day>19</day>
<month>10</month>
<year>2020</year>
</pub-date>
<pmc-comment> PMC Release delay is 0 months and 0 days and was based on .</pmc-comment>
<pub-date pub-type="ppub">
<month>2</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="epub">
<day>19</day>
<month>10</month>
<year>2020</year>
</pub-date>
<volume>64</volume>
<fpage>102207</fpage>
<lpage>102207</lpage>
<history>
<date date-type="received">
<day>20</day>
<month>12</month>
<year>2019</year>
</date>
<date date-type="rev-recd">
<day>23</day>
<month>7</month>
<year>2020</year>
</date>
<date date-type="accepted">
<day>1</day>
<month>9</month>
<year>2020</year>
</date>
</history>
<permissions>
<copyright-statement>© 2020 Elsevier Ltd. All rights reserved.</copyright-statement>
<copyright-year>2020</copyright-year>
<copyright-holder>Elsevier Ltd</copyright-holder>
<license>
<license-p>Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.</license-p>
</license>
</permissions>
<abstract abstract-type="graphical" id="abs0005">
<title>Graphical abstract</title>
<fig id="fig0090" position="anchor">
<graphic xlink:href="fx1_lrg"></graphic>
</fig>
</abstract>
<abstract abstract-type="author-highlights" id="abs0010">
<title>Highlights</title>
<p>
<list list-type="simple" id="lis0005">
<list-item id="lsti0005">
<label></label>
<p id="par0005">We converted the genomic sequences of the X and Y chromosomes into a numerical representation: DNA images.</p>
</list-item>
<list-item id="lsti0010">
<label></label>
<p id="par0010">We developed a new algorithm to localize the repetitive patterns in the DNA images that correspond to repetitive sequences.</p>
</list-item>
<list-item id="lsti0015">
<label></label>
<p id="par0015">Using a convolutional neural network (CNN), we developed a classification system to predict repetitive DNA classes.</p>
</list-item>
<list-item id="lsti0020">
<label></label>
<p id="par0020">To the best of our knowledge, this work provides the first deep learning method applied to the classification of DNA images.</p>
</list-item>
</list>
</p>
</abstract>
<abstract id="abs0015">
<p>Repetitive DNA sequences make up the largest proportion of the DNA in the human genome, and in the genomes of other species as well. The importance of each type of repetitive DNA depends on many factors, including its structural and functional roles and the positions, lengths and numbers of the repeats. Whether such sequences are conserved at different locations in the chromosome remains an open question for biologists, and detecting their locations despite their great variability, as well as finding novel repetitive sequences, remains a challenging task. To address this problem, we developed a new method based on signal and image processing tools. With this method, we could find repetitive patterns in DNA images regardless of the repeat length. The technique appears to detect new repetitive sequences more efficiently than classical bioinformatics tools, whose performance is limited, especially in the presence of mutations (insertions or deletions). In a DNA image, by contrast, modifying one or a few pixels does not affect the overall shape of the repetitive pattern. As a result, we generated a new database of repetitive patterns that contains tandem and dispersed repeated sequences. The highly repetitive sequences that we identified in the X and Y chromosomes are also found in other human chromosomes or in other genomes. The generated data are then used as input to a convolutional neural network classifier. The resulting system is efficient and achieves an average recognition score of 94.4%.</p>
</abstract>
<kwd-group id="kwd0005">
<title>Keywords</title>
<kwd>Repetitive sequences</kwd>
<kwd>Satellites</kwd>
<kwd>Wavelet transform</kwd>
<kwd>Canny edge detection</kwd>
<kwd>Human genome</kwd>
<kwd>New repetitions database</kwd>
<kwd>Convolutional neural network (CNN)</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="sec0005">
<label>1</label>
<title>Introduction</title>
<p id="par0025">Repetitive DNAs are sequences with multiple copies in the genome. They are rarely associated with clearly defined biological functions. Some of the moderately repetitive sequences may be involved in the regulation of gene expression. Others are mobile DNA made up of transposable elements (TEs), which are involved in genome evolution. TEs are divided into classes according to their transposition mechanism and structure. Retrotransposons, for example, are a class of TEs that move via an RNA intermediate: this RNA is transcribed from the DNA and subsequently copied back into DNA. Repetitive DNA occurs either as tandem repeats or as scattered (dispersed) repeated sequences. These repetitive DNA sequences can also be classified into two types: highly repetitive or moderately repetitive sequences [
<xref rid="bib0005" ref-type="bibr">1</xref>
,
<xref rid="bib0010" ref-type="bibr">2</xref>
].</p>
<p id="par0030">The major repetitive sequences in all eukaryotic cells are classified into five types according to the sequence’s length. In this classification, the microsatellite sequences (Short Tandem Repeats: STRs) are the smallest. They are characterized by a periodicity of between 2 and 4 nucleotides per unit. The second class consists of the minisatellites, with a length varying between 10 and 60 base pairs (bp). The third class is composed of the satellites, whose units contain about 100 nucleotides or more (100–200 bp) [
<xref rid="bib0015" ref-type="bibr">3</xref>
,
<xref rid="bib0020" ref-type="bibr">4</xref>
,
<xref rid="bib0025" ref-type="bibr">5</xref>
]. Retrotransposons such as SINEs and LINEs form the fourth class, which is characterized by a length varying from 50 bp to 6 kb. The final class consists of the ribosomal RNA gene repeats (rDNA), which are the longest, with a length between 9 and 45 kb.</p>
<p id="par0035">In the human genome, rare fragile sites are chromosomal DNA regions characterized in particular by repetitive sequences. In these regions, DNA damage occurs more frequently than at other locations. Owing to chromosome structure, the common fragile sites can be sensitive to replication stress, and they are often rearranged in cancer. In mammalian centromeres and telomeres, the presence of repetitive sequences is necessary to protect chromosomes from damage. For example, alphoid DNA is a satellite DNA with a length of 173 bp. It is located in the middle of the chromosome and makes up the larger part of the human centromere region [
<xref rid="bib0030" ref-type="bibr">6</xref>
]. Moreover, the telomere regions located at the chromosome ends are made up of repeated sequences of 5–7 bp. These elements are called telomere repeats [
<xref rid="bib0035" ref-type="bibr">7</xref>
]. The repetitive sequence ‘TTAGGG’ is one example. The chromosome integrity is protected by telomere repeats [
<xref rid="bib0040" ref-type="bibr">8</xref>
,
<xref rid="bib0045" ref-type="bibr">9</xref>
]. Indeed, telomeres prevent chromosome fusion and protect chromosomes against degradation by exonucleases [
<xref rid="bib0050" ref-type="bibr">10</xref>
].</p>
<p id="par0040">These repetitive functional elements are not likely to become fragile sites because they are hidden in heterochromatin. Through mechanisms that are not yet identified, this heterochromatin prevents the formation of unusual DNA structures that lead to recombination [
<xref rid="bib0055" ref-type="bibr">11</xref>
].</p>
<p id="par0045">Repetitive sequences are abundant in various genomes, from bacteria to mammals, and they cover nearly half of the human genome [
<xref rid="bib0025" ref-type="bibr">5</xref>
]. Finding new common repetitive sequences within and between different chromosomes and genomes is an important theme of research in biology. In fact, the detection of all repetitive sequences in DNA could serve in elucidating important biological phenomena. To identify the repetitive sequences, different bioinformatics tools were used [
<xref rid="bib0060" ref-type="bibr">12</xref>
,
<xref rid="bib0065" ref-type="bibr">13</xref>
]. Their principle is based on a comparison between DNA consensus sequences and candidate repeats. Mreps [
<xref rid="bib0070" ref-type="bibr">14</xref>
], MISA [
<xref rid="bib0065" ref-type="bibr">13</xref>
], Sputnik [
<xref rid="bib0075" ref-type="bibr">15</xref>
], EMBOSS (etandem and equitandem) [
<xref rid="bib0080" ref-type="bibr">16</xref>
], TRF [
<xref rid="bib0085" ref-type="bibr">17</xref>
] and RepeatMasker [
<xref rid="bib0090" ref-type="bibr">18</xref>
] are well-known examples. In the comparison step, these tools use different approaches such as regular expressions [
<xref rid="bib0090" ref-type="bibr">18</xref>
], Hamming distance [
<xref rid="bib0060" ref-type="bibr">12</xref>
], recursive match and penalty scores [
<xref rid="bib0085" ref-type="bibr">17</xref>
]. Localizing new repetitive sequences always presents technical challenges, because such repeats create ambiguities in alignment and assembly programs [
<xref rid="bib0095" ref-type="bibr">19</xref>
]. In this work, we have developed a new algorithm to detect repetitive patterns that correspond to new repetitive sequences. For this purpose, we used a combination of coding, signal processing and image processing techniques. As a result, we have constructed a repetitive sequence database, which we subdivided into two sub-databases. The first one contains the existing and validated repetitive sequences. The second one groups the newly detected sequences.</p>
<p id="par0050">In this context, we call a "new repetitive sequence" a sequence that is not detected by any of the current bioinformatics systems or alignment programs. In this research, we converted all of the DNA sequences into a synthetic image representation. After that, we extracted all patterns that correspond to the repeated DNA sequences. The second part of this work consists in classifying the obtained data. A deep learning model is chosen for this purpose: the convolutional neural network (CNN).</p>
<p id="par0055">This paper is divided into four sections. After the introduction, we describe the materials and methods. In Section
<xref rid="sec0010" ref-type="sec">2</xref>
, we first present the biological database subject of this study. We also introduce the coding technique we used to transform the biological data into a numerical one. After that, we describe how we convert the obtained signal into an image based on the wavelet analysis. Further, we introduce the CNN architecture we establish for the repetitive DNA classification. The final parts of this section consist of the employed detection steps and the adopted evaluation system. In Section
<xref rid="sec0060" ref-type="sec">3</xref>
, we provide and discuss the results in terms of repetitive DNA sequences detection and classification. Finally, Section
<xref rid="sec0075" ref-type="sec">4</xref>
concludes the paper.</p>
</sec>
<sec id="sec0010">
<label>2</label>
<title>Material and methods</title>
<p id="par0060">Two-thirds of the human genome consists of repetitive DNA sequences [
<xref rid="bib0100" ref-type="bibr">20</xref>
], which makes the identification and localization of these elements very important. In this section, we present a novel approach for repetitive DNA sequence identification. This method is effective in detecting dispersed or tandem repeats such as minisatellites and satellites. The detection system is composed of four main blocks. The first one extracts the human DNA sequences from an existing database. The second block codes the DNA into a numerical representation. The third block is the "Find Human Repetitive Sequences" (FHRS) method, which we propose for repetitive DNA sequence detection; it applies wavelet analysis to detect the repetitive patterns. The last block determines the repetitive sequences and builds the repetitive DNA sequence database.
<xref rid="fig0005" ref-type="fig">Fig. 1</xref>
shows the corresponding flowchart.
<fig id="fig0005">
<label>Fig. 1</label>
<caption>
<p>Organizational flowchart of the identification of the Repetitive sequences.</p>
</caption>
<alt-text id="at0290">Fig. 1</alt-text>
<graphic xlink:href="gr1_lrg"></graphic>
</fig>
</p>
<sec id="sec0015">
<label>2.1</label>
<title>Human sequences database (DNA library)</title>
<p id="par0065">The human genome (
<italic>Homo sapiens</italic>
) contains 22 pairs of autosomes and one pair of sex chromosomes (X and Y), for a total of 46 chromosomes. Each human cell carries one pair of sex chromosomes: two X chromosomes in females, and one X and one Y chromosome in males. A detailed description of the human DNA material is available in the NCBI database (National Center for Biotechnology Information) [
<xref rid="bib0105" ref-type="bibr">21</xref>
]. The consensus sequence of the euchromatic portion of the human genome counts 2.91 billion base pairs (bp) [
<xref rid="bib0110" ref-type="bibr">22</xref>
]. Given that this is a huge amount of data, we based our work only on the X and Y chromosomes. Even these two chromosomes alone represent a large mass of data. As an example, we give in
<xref rid="fig0010" ref-type="fig">Fig. 2</xref>
the number of occurrences of each dinucleotide in both the X and Y chromosomes.
<fig id="fig0010">
<label>Fig. 2</label>
<caption>
<p>Dinucleotide occurrence in X and Y chromosomes in the human genome.</p>
</caption>
<alt-text id="at0295">Fig. 2</alt-text>
<graphic xlink:href="gr2_lrg"></graphic>
</fig>
</p>
<p id="par0070">Our goal is to find repetitive DNA on these chromosomes. It is important to mention that the more complex the genome is, the more difficult it is to find new repetitive sequences within it. Therefore, the challenge addressed in this work is identifying new repetitive DNA sequences in the human X and Y chromosomes.</p>
</sec>
<sec id="sec0020">
<label>2.2</label>
<title>DNA coding for numerical representation</title>
<p id="par0075">Aiming to visualize repetitive patterns in the human genome, the DNA sequences have to be transformed into numerical data. This transformation is called “DNA coding”. In this work, we opted for a special coding technique called “Order 2 Frequency Chaos Game Signal” (FCGS
<sub>2</sub>
) [
<xref rid="bib0115" ref-type="bibr">23</xref>
,
<xref rid="bib0120" ref-type="bibr">24</xref>
]. The FCGS
<sub>2</sub>
coding is a statistical representation of DNA. In the proposed method, chromosomes are transformed based on the occurrence probability of successive dinucleotide groups. This technique represents the time-frequency evolution of the dinucleotides in the chromosome. The transformation is given by Eq. (
<xref rid="eq0005" ref-type="disp-formula">1</xref>
).
<disp-formula id="eq0005">
<label>(1)</label>
<mml:math id="M1" altimg="si1.svg">
<mml:mfenced open="{" close="">
<mml:mrow>
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:maligngroup></mml:maligngroup>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
<mml:mi>C</mml:mi>
<mml:mi>G</mml:mi>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mi>x</mml:mi>
</mml:mfenced>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mo stretchy="false">∑</mml:mo>
<mml:mi>x</mml:mi>
</mml:msub>
<mml:mrow>
<mml:msub>
<mml:mo stretchy="false">∑</mml:mo>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mrow>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mn>2</mml:mn>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>u</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>l</mml:mi>
<mml:mtext>e</mml:mtext>
<mml:mi>o</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>e</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mrow>
</mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:maligngroup></mml:maligngroup>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mn>2</mml:mn>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>u</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>l</mml:mi>
<mml:mtext>e</mml:mtext>
<mml:mi>o</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>e</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mi>N</mml:mi>
<mml:mrow>
<mml:msub>
<mml:mn>2</mml:mn>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>u</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>l</mml:mi>
<mml:mtext>e</mml:mtext>
<mml:mi>o</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>e</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:mrow>
<mml:mo>/</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>g</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>h</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>C</mml:mi>
<mml:mi>h</mml:mi>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
</mml:math>
</disp-formula>
where
<inline-formula>
<mml:math id="M2" altimg="si2.svg">
<mml:msub>
<mml:mrow>
<mml:mtext></mml:mtext>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mn>2</mml:mn>
<mml:mrow>
<mml:mi>n</mml:mi>
<mml:mi>u</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>l</mml:mi>
<mml:mtext>e</mml:mtext>
<mml:mi>o</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>i</mml:mi>
<mml:mi>d</mml:mi>
<mml:mi>e</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mtext></mml:mtext>
</mml:math>
</inline-formula>
is the number of occurrences of the dinucleotide group in the whole chromosome and
<inline-formula>
<mml:math id="M3" altimg="si3.svg">
<mml:msub>
<mml:mrow>
<mml:mtext></mml:mtext>
<mml:mi>L</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>g</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>h</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>C</mml:mi>
<mml:mi>h</mml:mi>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mtext></mml:mtext>
</mml:math>
</inline-formula>
is the chromosome’s length.</p>
<p id="par0080">In this work, we coded the entire human chromosomes X and Y. The signal that represents chromosome X has a length of 156,040,895 bp, and the one that represents chromosome Y has a length of 57,227,415 bp.</p>
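<p>As an illustration only, the FCGS<sub>2</sub> coding of Eq. (1) can be sketched in Python as follows. The sketch assumes one plausible reading of the method: each position of the sequence is replaced by the frequency, over the whole sequence, of the dinucleotide that starts at that position. The function name fcgs2_signal and the toy sequence are hypothetical and serve only as an example.</p>
<preformat>
```python
from collections import Counter


def fcgs2_signal(sequence):
    """Replace each position by the frequency of the dinucleotide starting there.

    Assumption: P = N_dinucleotide / Length_Chr, as in Eq. (1), with
    overlapping dinucleotides counted over the whole input sequence.
    """
    seq = sequence.upper()
    # Overlapping dinucleotides at positions 0 .. len(seq) - 2
    pairs = [seq[k:k + 2] for k in range(len(seq) - 1)]
    counts = Counter(pairs)
    length = len(seq)  # plays the role of Length_Chr
    prob = {pair: n / length for pair, n in counts.items()}
    return [prob[pair] for pair in pairs]


# Toy example (hypothetical sequence, not taken from the chromosomes)
signal = fcgs2_signal("ACACACGT")
print(signal)
```
</preformat>
<p>In this toy sequence, the dinucleotide AC occurs three times in eight bases, so every position where AC starts is coded by 3/8. A periodic sequence therefore produces a periodic signal, which is the property exploited by the wavelet analysis.</p>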
</sec>
<sec id="sec0025">
<label>2.3</label>
<title>Find human repetitive sequences (FHRS) approach</title>
<p id="par0085">The identification of repetitive DNA sequences is becoming increasingly important. Many algorithms, drawing on various fields of knowledge, have been implemented for the localization of repetitive sequences. In this context, signal processing approaches have been used to detect repetitive sequences according to their periodicity [
<xref rid="bib0125" ref-type="bibr">25</xref>
,
<xref rid="bib0130" ref-type="bibr">26</xref>
,
<xref rid="bib0135" ref-type="bibr">27</xref>
,
<xref rid="bib0140" ref-type="bibr">28</xref>
,
<xref rid="bib0145" ref-type="bibr">29</xref>
]. In this paper, we propose an efficient algorithm based on signal and image processing tools to localize repetitive DNA sequences. This method has the advantage of requiring no prior knowledge about the repeated sequences. This section presents the new algorithm we designed to detect the repetitive DNA sequences after transforming them into numerical signals. This algorithm is called Find Human Repetitive Sequences (FHRS). It contains three steps:
<list list-type="simple" id="lis0010">
<list-item id="lsti0025">
<label>-</label>
<p id="par0090">DNA signals to DNA images transformation: the scalogram representation;</p>
</list-item>
<list-item id="lsti0030">
<label>-</label>
<p id="par0095">Energy calculation for each scalogram image obtained by the wavelet analysis, then retention of the images whose energy amplitude exceeds a chosen threshold (equal to 10 here);</p>
</list-item>
<list-item id="lsti0035">
<label>-</label>
<p id="par0100">Finding the reference repetitive sequence in the retained image. It is the longest repeated unit in the considered DNA sequence.</p>
</list-item>
</list>
</p>
<sec id="sec0030">
<label>2.3.1</label>
<title>The DNA time frequency representation by the complex Morlet analysis</title>
<p id="par0105">The scalogram representation of a DNA sequence is an image that we obtain by wavelet analysis and encode in the RGB space (three color channels: Red, Green, and Blue). This time-frequency representation is shown to be efficient in terms of visualizing and detecting repetitive patterns. Here, the idea is to use this type of DNA image to find repetitive patterns that correspond to periodic sequences.</p>
<p id="par0110">The motivation behind this choice is that changing a pixel in the image has no influence on the overall shape of the repetitive pattern. Indeed, even if the repeated sequence contains variations in nucleotide composition, the overall shape of the repetitive pattern in the DNA image is not greatly affected. Furthermore, our choice of this method is reinforced by its performance in characterizing different classes of transposable elements [
<xref rid="bib0150" ref-type="bibr">30</xref>
,
<xref rid="bib0155" ref-type="bibr">31</xref>
]. For the wavelet analysis, we use the complex Morlet wavelet which is best suited to localize repetitive DNA in the time-frequency domain. The principle consists of applying the wavelet analysis to the signal obtained by the FCGS
<sub>2</sub>
coding. This analysis decomposes a given DNA signal into a sum of basis functions called wavelets. These wavelets are derived from the mother wavelet by two operations: dilation and translation. They take into account both time and frequency variations, which allows them to capture the different hidden frequencies in the signal [
<xref rid="bib0160" ref-type="bibr">32</xref>
,
<xref rid="bib0165" ref-type="bibr">33</xref>
,
<xref rid="bib0170" ref-type="bibr">34</xref>
]. Unlike the mother wavelet ψ(t), which depends only on time, the daughter wavelet also depends on a scale parameter a and a translation (time-shift) parameter b. It is generated according to the following equation:
<disp-formula id="eq0010">
<label>(2)</label>
<mml:math id="M4" altimg="si4.svg">
<mml:msub>
<mml:mtext>ψ</mml:mtext>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mtext></mml:mtext>
<mml:mi>b</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mi>t</mml:mi>
</mml:mfenced>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:msqrt>
<mml:mi>a</mml:mi>
</mml:msqrt>
</mml:mrow>
</mml:mfrac>
<mml:msup>
<mml:mtext>ψ</mml:mtext>
<mml:mtext>*</mml:mtext>
</mml:msup>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>-</mml:mo>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mi>a</mml:mi>
</mml:mfrac>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:mtext></mml:mtext>
<mml:mi>a</mml:mi>
<mml:mo>></mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>,</mml:mo>
<mml:mi>b</mml:mi>
<mml:mo>∈</mml:mo>
<mml:mi mathvariant="double-struck">R</mml:mi>
</mml:math>
</disp-formula>
where * denotes the complex conjugate. As the analysis window, we chose a Gaussian-windowed complex sinusoid (the complex Morlet wavelet), which is used as the analyzing wavelet of the Continuous Wavelet Transform (CWT) and is written as:
<disp-formula id="eq0015">
<label>(3)</label>
<mml:math id="M5" altimg="si5.svg">
<mml:msub>
<mml:mtext>ψ</mml:mtext>
<mml:mrow>
<mml:mi>c</mml:mi>
<mml:mi>m</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mi>t</mml:mi>
</mml:mfenced>
<mml:mo>=</mml:mo>
<mml:msup>
<mml:mtext>π</mml:mtext>
<mml:mrow>
<mml:mo>-</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mn>4</mml:mn>
</mml:mfrac>
</mml:mrow>
</mml:msup>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:msup>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:msub>
<mml:mtext>ω</mml:mtext>
<mml:mn>0</mml:mn>
</mml:msub>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>-</mml:mo>
<mml:msup>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mo>-</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mn>2</mml:mn>
</mml:mfrac>
<mml:msubsup>
<mml:mtext>ω</mml:mtext>
<mml:mn>0</mml:mn>
<mml:mn>2</mml:mn>
</mml:msubsup>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:mfenced>
<mml:msup>
<mml:mi>e</mml:mi>
<mml:mrow>
<mml:mo>-</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:msup>
<mml:mi>t</mml:mi>
<mml:mn>2</mml:mn>
</mml:msup>
</mml:mrow>
<mml:mn>2</mml:mn>
</mml:mfrac>
</mml:mrow>
</mml:msup>
</mml:math>
</disp-formula>
Here, the number of oscillations (
<inline-formula>
<mml:math id="M6" altimg="si6.svg">
<mml:msub>
<mml:mtext>ω</mml:mtext>
<mml:mn>0</mml:mn>
</mml:msub>
</mml:math>
</inline-formula>
) must be greater than 5 (admissibility condition). The continuous wavelet coefficients of a DNA signal
<inline-formula>
<mml:math id="M7" altimg="si7.svg">
<mml:mtext></mml:mtext>
<mml:mi>x</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mi>t</mml:mi>
</mml:mfenced>
</mml:math>
</inline-formula>
form a matrix whose elements are calculated by the following formula:
<disp-formula id="eq0020">
<label>(4)</label>
<mml:math id="M8" altimg="si8.svg">
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>b</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:msub>
<mml:mfenced open="[" close="]">
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mi>t</mml:mi>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
<mml:mo>=</mml:mo>
<mml:mfrac>
<mml:mn>1</mml:mn>
<mml:mrow>
<mml:msqrt>
<mml:mi>a</mml:mi>
</mml:msqrt>
</mml:mrow>
</mml:mfrac>
<mml:mrow>
<mml:msubsup>
<mml:mo stretchy="false">∫</mml:mo>
<mml:mrow>
<mml:mo>-</mml:mo>
<mml:mi>∞</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>+</mml:mo>
<mml:mi>∞</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mrow>
<mml:mi>x</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mi>t</mml:mi>
</mml:mfenced>
</mml:mrow>
</mml:mrow>
<mml:msup>
<mml:mi>ψ</mml:mi>
<mml:mtext>*</mml:mtext>
</mml:msup>
<mml:mfenced open="(" close=")">
<mml:mrow>
<mml:mfrac>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>-</mml:mo>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mi>a</mml:mi>
</mml:mfrac>
</mml:mrow>
</mml:mfenced>
<mml:mi>d</mml:mi>
<mml:mi>t</mml:mi>
</mml:math>
</disp-formula>
</p>
<p id="par0115">The modulus of these coefficients |
<inline-formula>
<mml:math id="M9" altimg="si9.svg">
<mml:msub>
<mml:mi>W</mml:mi>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>b</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:msub>
</mml:math>
</inline-formula>
| provides the scalogram representation of the DNA sequence.</p>
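<p>A minimal sketch of the scalogram computation is given below, assuming a direct discretization of Eq. (4) with the complex Morlet wavelet. The value of the oscillation parameter (6), the truncation of the wavelet at four times the scale, the zero contribution of samples outside the segment and the function names morlet and scalogram are all assumptions made for illustration; an optimized library would normally be used on real data.</p>
<preformat>
```python
import cmath
import math


def morlet(t, omega0=6.0):
    """Complex Morlet wavelet with the usual zero-mean correction term."""
    correction = math.exp(-0.5 * omega0 ** 2)
    carrier = cmath.exp(1j * omega0 * t) - correction
    return math.pi ** -0.25 * carrier * math.exp(-0.5 * t * t)


def scalogram(signal, scales, omega0=6.0):
    """Return |W(a, b)| as a list of rows, one row per scale a."""
    n = len(signal)
    rows = []
    for a in scales:
        # The Gaussian envelope is negligible beyond 4 * a samples (assumption)
        half = int(math.ceil(4 * a))
        # Conjugated, normalized daughter wavelet sampled at offsets -half..half
        kernel = [morlet(k / a, omega0).conjugate() / math.sqrt(a)
                  for k in range(-half, half + 1)]
        row = []
        for b in range(n):
            lo = max(0, b - half)
            hi = min(n, b + half + 1)
            # Discrete version of the integral of x(t) * conj(psi((t - b) / a))
            acc = sum(signal[t] * kernel[t - b + half] for t in range(lo, hi))
            row.append(abs(acc))
        rows.append(row)
    return rows


# Toy example: a signal with a period of 8 samples, analyzed at scales 1..16
toy = [math.sin(2 * math.pi * t / 8) for t in range(128)]
image = scalogram(toy, range(1, 17))
```
</preformat>
<p>For such a periodic toy signal, the modulus is concentrated in a narrow band of scales around the period, which is the kind of horizontal band that the repetitive patterns produce in the DNA images.</p>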
</sec>
<sec id="sec0035">
<label>2.3.2</label>
<title>The energy calculation of the DNA scalograms</title>
<p id="par0120">Since chromosomes X and Y are too long to be processed at once, we decompose
<inline-formula>
<mml:math id="M10" altimg="si7.svg">
<mml:mtext></mml:mtext>
<mml:mi>x</mml:mi>
<mml:mfenced open="(" close=")">
<mml:mi>t</mml:mi>
</mml:mfenced>
</mml:math>
</inline-formula>
, which is the corresponding FCGS
<sub>2</sub>
signal, into a set of segments. Each segment
<inline-formula>
<mml:math id="M11" altimg="si10.svg">
<mml:msub>
<mml:mi>x</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mfenced open="(" close=")">
<mml:mi>t</mml:mi>
</mml:mfenced>
</mml:math>
</inline-formula>
has a size of 1000 bp. After this segmentation, we apply the CWT to each segment and calculate the corresponding energies. As a result, we obtain a new database of human DNA representations. In total, we count 156,041 images for the X chromosome and 57,228 images for the Y chromosome. The wavelet coefficient matrix contains the time-frequency information about a signal. To further explore this information, we calculate the energy (E) at each scale for every segment, according to the following equation:
<disp-formula id="eq0025">
<label>(5)</label>
<mml:math id="M12" altimg="si11.svg">
<mml:mrow>
<mml:mo>{</mml:mo>
<mml:mrow>
<mml:mtable displaystyle="true">
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:maligngroup></mml:maligngroup>
<mml:msub>
<mml:mrow>
<mml:mi>E</mml:mi>
</mml:mrow>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>=</mml:mo>
<mml:mrow>
<mml:msubsup>
<mml:mo stretchy="true">∑</mml:mo>
<mml:mrow>
<mml:mi>b</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mn>1000</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mo>|</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>W</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>a</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msub>
<mml:mrow>
<mml:mo>[</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>x</mml:mi>
</mml:mrow>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>]</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>|</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:mrow>
<mml:maligngroup></mml:maligngroup>
<mml:mi>f</mml:mi>
<mml:mi>o</mml:mi>
<mml:mi>r</mml:mi>
<mml:mtext>  </mml:mtext>
<mml:mi>e</mml:mi>
<mml:mi>a</mml:mi>
<mml:mi>c</mml:mi>
<mml:mi>h</mml:mi>
<mml:mtext></mml:mtext>
<mml:mtext></mml:mtext>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>:</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>L</mml:mi>
<mml:mi>e</mml:mi>
<mml:mi>n</mml:mi>
<mml:mi>g</mml:mi>
<mml:mi>t</mml:mi>
<mml:mi>h</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>C</mml:mi>
<mml:mi>h</mml:mi>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>/</mml:mo>
<mml:mn>1000</mml:mn>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mrow>
</mml:math>
</disp-formula>
Here, the parameter
<inline-formula>
<mml:math id="M13" altimg="si12.svg">
<mml:mi>a</mml:mi>
</mml:math>
</inline-formula>
represents the scale in the wavelet analysis; it varies from 1 to 64. As for the indicator
<italic>i</italic>
, it represents the image number.</p>
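<p>The segmentation and the energy computation of Eq. (5) can be sketched as follows, under the assumption that each scalogram is stored as a list of rows of moduli, one row per scale. The function names are hypothetical, and the threshold of 10 is the value quoted for the FHRS algorithm.</p>
<preformat>
```python
def split_segments(signal, size=1000):
    """Cut the coded signal into consecutive segments of `size` samples."""
    return [signal[k:k + size] for k in range(0, len(signal), size)]


def scale_energy(scalogram_rows):
    """E_i(a): sum over the positions b of |W(a, b)|^2, one value per scale."""
    return [sum(value * value for value in row) for row in scalogram_rows]


def has_repetitive_pattern(scalogram_rows, threshold=10.0):
    """Keep a segment when at least one energy value exceeds the threshold."""
    return max(scale_energy(scalogram_rows)) > threshold


# Number of 1000-sample segments for the two chromosome lengths
x_images = -(-156040895 // 1000)  # ceiling division
y_images = -(-57227415 // 1000)
print(x_images, y_images)
```
</preformat>
<p>The ceiling division reproduces the image counts quoted above, 156,041 for chromosome X and 57,228 for chromosome Y, because the last, shorter segment of each chromosome is also counted.</p>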
<p id="par0125">By applying Eq. (5), we obtain a vector that contains the energy of the DNA scalogram. Peak values higher than 10 in this vector indicate the existence of repetitive patterns in the DNA image.
<xref rid="fig0015" ref-type="fig">Fig. 3</xref>
shows an example of the FCGS
<sub>2</sub>
signal, the corresponding scalogram in a 3D representation and the wavelet energy of a sequence located in chromosome X of the human genome. This sequence corresponds to the portion [342,500 bp: 344,000 bp] in the
<italic>PPP2R3B</italic>
gene.
<fig id="fig0015">
<label>Fig. 3</label>
<caption>
<p>Illustration of the repetitive DNA detection steps based on CWT analysis: a) DNA coding with FCGS
<sub>2</sub>
; b) 3D scalogram; c) wavelet energy, where peaks greater than 10 indicate the existence of repetitive sequences.</p>
</caption>
<alt-text id="at0300">Fig. 3</alt-text>
<graphic xlink:href="gr3_lrg"></graphic>
</fig>
</p>
<p id="par0130">As we can see, the magnitude of the wavelet energy indicates the presence of periodicities in the sequence. Considering the frequency content, we note that the repetitive sequence is characterized by a specific frequency band, whose limits correspond to the repetitive DNA portion in the analyzed sequence. As for the 3D representation, it contains repetitive patterns of a particular shape that are related to the DNA repetitions. Following this method, we constructed our database of repetitive DNA images. The patterned images were selected according to the wavelet-energy peaks. The generated database was named "
<italic>repeat-Data</italic>
".</p>
</sec>
<sec id="sec0040">
<label>2.3.3</label>
<title>The reference repetitive sequence search</title>
<p id="par0135">For each DNA image in the
<italic>repeat-Data</italic>
database, we aim to identify a DNA-reference sequence, which corresponds to the repetitive pattern visible in the scalogram. This DNA-reference sequence is the longest repeated subsequence, selected in terms of size and number of repetitions. After this step, we built a database that contains the location and the repetition number of all the localized reference sequences. As we focus on detecting new repetitive sequences in the human genome, we verified whether each reference repetitive sequence is available in the public databases. For this, we checked whether the sequence is annotated in both
<italic>DFAM</italic>
and
<italic>NCBI</italic>
databases. If the sequence is not listed in these public databases, we add it to our new database, which is called "
<italic>New-repeat-Data</italic>
".</p>
</sec>
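<p>The search for a reference repeat can be illustrated by the following sketch. It is not the FHRS procedure itself: it assumes exact matching, whereas the image-based approach tolerates mutations, and it interprets the reference sequence as the longest substring that occurs at least twice without overlap. The function names and the toy sequence are hypothetical.</p>
<preformat>
```python
def non_overlapping(positions, length):
    """Greedy left-to-right selection of occurrences that do not overlap."""
    kept = []
    for p in positions:
        if not kept or p - kept[-1] >= length:
            kept.append(p)
    return kept


def repeated_substring(seq, length):
    """Most frequent substring of this length with 2 or more non-overlapping copies."""
    seen = {}
    for start in range(len(seq) - length + 1):
        seen.setdefault(seq[start:start + length], []).append(start)
    best = None
    for sub, positions in seen.items():
        kept = non_overlapping(positions, length)
        if len(kept) >= 2 and (best is None or len(kept) > len(best[1])):
            best = (sub, kept)
    return best


def longest_repeat(seq, min_length=2):
    """Binary search on the length: a repeat of length L implies one of length L - 1."""
    lo, hi = min_length, len(seq) - 1
    best = None
    while hi >= lo:
        mid = (lo + hi) // 2
        found = repeated_substring(seq, mid)
        if found:
            best, lo = found, mid + 1
        else:
            hi = mid - 1
    return best


# Toy example: the unit ACGT is repeated three times in tandem
print(longest_repeat("GGACGTACGTACGTTT"))
```
</preformat>
<p>The returned pair gives the repeated unit and the start positions of its copies, that is, the kind of location and repetition-number record stored for each reference sequence.</p>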
</sec>
<sec id="sec0045">
<label>2.4</label>
<title>Patterns extraction based on adaptive local thresholding and morphological processing</title>
<p id="par0140">After collecting the new repetitive sequences using the FHRS algorithm, we move on to extracting the repeat patterns using image processing tools.
<xref rid="fig0020" ref-type="fig">Fig. 4</xref>
summarizes the proposed methodology for extracting tandem repeat patterns from the DNA images. It illustrates the results obtained for the “TRseq1” sequence. This sequence is 261 base pairs long and spans positions 28,076,765 bp to 28,077,025 bp on the human X chromosome.
<fig id="fig0020">
<label>Fig. 4</label>
<caption>
<p>Flowchart diagram of the adopted segmentation methodology to extract the repetitive patterns.</p>
</caption>
<alt-text id="at0305">Fig. 4</alt-text>
<graphic xlink:href="gr4_lrg"></graphic>
</fig>
</p>
<p id="par0145">As in this example, the data treated here are the scalogram images previously stored in the "
<italic>New-repeat-Data</italic>
" database. The main goal of this part of the work is to detect and localize the repetitive patterns in the scalogram representations, which is why we rely on a segmentation algorithm. Our method first decomposes the DNA image into its three color channels (red, green and blue) and keeps the blue one. We made this choice after testing all the color bands: the blue channel gives the best segmentation result because it has the best contrast. Then, for binarization, a simple threshold is applied to keep only the pixels with an intensity value less than or equal to 26. Next, to keep only the region of interest, we use an edge detection technique. The Canny edge detector provides good detection and localization relative to other operators [
<xref rid="bib0175" ref-type="bibr">35</xref>
,
<xref rid="bib0180" ref-type="bibr">36</xref>
]. The algorithm detects brightness discontinuities in the image using a Canny filter. It is a multi-stage algorithm used to detect a wide range of edges in images [
<xref rid="bib0185" ref-type="bibr">37</xref>
,
<xref rid="bib0190" ref-type="bibr">38</xref>
]. The Canny operator uses two thresholds, a high one and a low one. The high threshold selects strong, significant edges such as lines and contours in the image. The low threshold ensures that weaker details connected to them are not missed. The Canny edge detector is widely used to locate sharp intensity changes and to find object boundaries in an image, especially in computer vision. A pixel is classified as an edge by computing its gradient magnitude and comparing it with that of its neighbors along the gradient direction, where the intensity varies the most. Finally, we fill the holes in the areas of interest using morphological operators [
<xref rid="bib0195" ref-type="bibr">39</xref>
]. The result is an image that only contains repetitive patterns. Based on this method, we can then extract and isolate the particular regions of repetitive DNA patterns.</p>
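<p>The binarization and hole-filling steps described above can be illustrated with a minimal pure-Python sketch. It assumes the image is a nested list of (R, G, B) tuples and uses a flood fill from the border as a stand-in for the morphological hole filling; the Canny step is omitted, and the function names are hypothetical rather than taken from our implementation.</p>

```python
from collections import deque


def binarize_blue(rgb_image, threshold=26):
    """Return a 0/1 mask that is 1 where the blue value is at most `threshold`."""
    return [[1 if threshold >= pixel[2] else 0 for pixel in row] for row in rgb_image]


def fill_holes(mask):
    """Set to 1 every background pixel that cannot be reached from the image border."""
    rows, cols = len(mask), len(mask[0])
    outside = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seed the flood fill with every background pixel lying on the border.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and mask[r][c] == 0:
                outside[r][c] = True
                queue.append((r, c))
    # 4-connected flood fill of the background connected to the border.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if nr in range(rows) and nc in range(cols):
                if mask[nr][nc] == 0 and not outside[nr][nc]:
                    outside[nr][nc] = True
                    queue.append((nr, nc))
    # Pixels not marked as outside are foreground or enclosed holes.
    return [[0 if outside[r][c] else 1 for c in range(cols)] for r in range(rows)]
```

<p>Calling fill_holes(binarize_blue(image)) on a scalogram returns a mask in which each dark region, including any bright pixels it encloses, forms one solid candidate pattern.</p>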
</sec>
<sec id="sec0050">
<label>2.5</label>
<title>DNA-reference sequences location in other human chromosomes and other species</title>
<p id="par0150">After finding the repetitive DNA sequences in the human X and Y chromosomes (which can be tandem or scattered repeats), we checked whether they also occur in other chromosomes or even in other genomes. To this end, we used two public bioinformatics resources: BLAT [
<xref rid="bib0200" ref-type="bibr">40</xref>
] and DFAM [
<xref rid="bib0205" ref-type="bibr">41</xref>
]. We searched for each new repetitive sequence we detected in the whole human genome and in other genomes using the BLAT platform. As an example, we consider the new scattered repeated sequence “Rseq1”.</p>
<p id="par0155">Rseq1="CTTTAGAGTCTGCATTGGGCCTAGGTCTCATTGAGGACAGATAGAGAGCAGACTGTGCAAC".</p>
<p id="par0160">It is a 61 base pair (bp) sequence that is repeated 12 times in the whole human genome. Its positions on both the X and Y chromosomes are given in the following table (
<xref rid="tbl0005" ref-type="table">Table 1</xref>
).
<table-wrap position="float" id="tbl0005">
<label>Table 1</label>
<caption>
<p>Position of “Rseq1” on both X and Y chromosomes of the human genome.</p>
</caption>
<alt-text id="at0375">Table 1</alt-text>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Start (bp)</th>
<th align="left">End (bp)</th>
<th align="left">Start (bp)</th>
<th align="left">End (bp)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">26,609</td>
<td align="left">26,669</td>
<td align="left">41,241</td>
<td align="left">41,301</td>
</tr>
<tr>
<td align="left">26,792</td>
<td align="left">26,852</td>
<td align="left">42,400</td>
<td align="left">42,460</td>
</tr>
<tr>
<td align="left">28,317</td>
<td align="left">28,377</td>
<td align="left">243,312</td>
<td align="left">243,372</td>
</tr>
<tr>
<td align="left">34,474</td>
<td align="left">34,534</td>
<td align="left">244,958</td>
<td align="left">245,018</td>
</tr>
<tr>
<td align="left">34,657</td>
<td align="left">34,717</td>
<td align="left">246,787</td>
<td align="left">246,847</td>
</tr>
<tr>
<td align="left">41,058</td>
<td align="left">41,118</td>
<td align="left">248,556</td>
<td align="left">248,616</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
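<p>As an illustration of how the start and end coordinates in Table 1 relate to the sequence length, the following sketch scans a string for exact copies of a motif and reports 1-based inclusive positions. It is only an exact-match stand-in for the BLAT search, which is an alignment tool that also tolerates mismatches; the function name and the toy sequence are assumptions made for the example.</p>

```python
def find_occurrences(genome, motif):
    """Return 1-based inclusive (start, end) positions of every exact match."""
    hits = []
    index = genome.find(motif)
    while index != -1:
        hits.append((index + 1, index + len(motif)))
        # Restart one base further so overlapping copies are also reported.
        index = genome.find(motif, index + 1)
    return hits


RSEQ1 = "CTTTAGAGTCTGCATTGGGCCTAGGTCTCATTGAGGACAGATAGAGAGCAGACTGTGCAAC"

# Toy sequence (not real genomic data): two copies of Rseq1 separated by a spacer.
toy = "ACGT" * 3 + RSEQ1 + "TTGACA" + RSEQ1

for start, end in find_occurrences(toy, RSEQ1):
    print(start, end, end - start + 1)
```

<p>With this convention, each hit spans end - start + 1 = 61 bases, which matches the rows of Table 1 (for example 26,609 to 26,669).</p>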
<p id="par0165">After localizing “Rseq1” in X and Y chromosomes, we searched for the existence of this sequence in other regions.
<xref rid="fig0025" ref-type="fig">Fig. 5</xref>
shows the result of checking for “Rseq1” in other species. As we can see, “Rseq1” exists in several genomes, such as
<italic>Human</italic>
,
<italic>Gorilla</italic>
,
<italic>Chimpanzee</italic>
,
<italic>Green monkey</italic>
,
<italic>Bonobo</italic>
, etc.
<fig id="fig0025">
<label>Fig. 5</label>
<caption>
<p>Illustrative example of using the BLAT algorithm to search for a new repetitive DNA sequence in the whole human genome and in other genomes.</p>
</caption>
<alt-text id="at0310">Fig. 5</alt-text>
<graphic xlink:href="gr5_lrg"></graphic>
</fig>
</p>
<p id="par0170">After establishing that the newly discovered repetitive sequence exists in other genomes, we checked whether it is located in genes. In particular, we searched for it in exonic regions and in other families of DNA. If the sequence was found in none of these, we classified it as a new type of repetitive DNA sequence. In addition, we verified the uniqueness of these new sequences with our FHRS approach, by comparing the repetitive patterns in their scalogram representations.</p>
<p id="par0175">To make this work as meaningful and effective as possible, we built a classification system for these new datasets (the new repetitive DNA sequences). The scalogram representation (a 2D image) serves as the input to the system. As the classifier, we chose CNNs because they are effective for image classification (
<xref rid="fig0030" ref-type="fig">Fig. 6</xref>
).
<fig id="fig0030">
<label>Fig. 6</label>
<caption>
<p>Architecture of the convolutional neural network for DNA image recognition. The first layer is a convolutional layer with 64 channels and a kernel size of 3 × 3 pixels. The second is a max-pooling layer, whose output is the input of the third layer, a convolutional layer with 32 channels. The output of this convolutional layer is the input of the fourth layer, a second max-pooling layer. The output of that max-pooling layer is then flattened into a vector and passed to the fully connected layer. The images from the dataset (N = 980) were split into roughly 80% for training (780 images) and 20% for testing (200 images). Training ran for 100 epochs.</p>
</caption>
<alt-text id="at0315">Fig. 6</alt-text>
<graphic xlink:href="gr6_lrg"></graphic>
</fig>
</p>
</sec>
<sec id="sec0055">
<label>2.6</label>
<title>Convolutional neural network: CNN</title>
<p id="par0180">A CNN is a special type of neural network designed for data with a grid topology [
<xref rid="bib0210" ref-type="bibr">42</xref>
]. CNNs were developed by LeCun et al. (1998) to recognize handwritten characters on bank checks. The CNN is a deep learning model inspired by the visual mechanism of living organisms. It uses convolutional layers to extract features from the input data: the neurons of each convolutional layer extract higher-level, more abstract features from the features extracted by the previous layer. CNNs have been applied successfully in DNA studies [
<xref rid="bib0215" ref-type="bibr">43</xref>
,
<xref rid="bib0220" ref-type="bibr">44</xref>
,
<xref rid="bib0225" ref-type="bibr">45</xref>
,
<xref rid="bib0230" ref-type="bibr">46</xref>
], breast cancer cell segmentation [
<xref rid="bib0235" ref-type="bibr">47</xref>
,
<xref rid="bib0240" ref-type="bibr">48</xref>
], medical diagnosis [
<xref rid="bib0245" ref-type="bibr">49</xref>
,
<xref rid="bib0250" ref-type="bibr">50</xref>
], character recognition [
<xref rid="bib0255" ref-type="bibr">51</xref>
] and in other areas of application.</p>
<p id="par0185">In this work, we used a CNN to build a system that recognizes new repetitive DNA sequences in the human X and Y chromosomes. The input of the classification system is the RGB scalogram representation of the DNA, with a size of 75 × 100.</p>
<p id="par0190">The DNA images are then passed through a stack of convolutional layers with filters that have a very small receptive field (3 × 3). These filters act as scanners that capture motifs in different orientations (up/down, center, left/right). Each neuron output of a convolutional layer is the result of a convolution between the kernel matrix and the neuron input. Max-pooling is performed over a 2 × 2 pixel window. Each convolutional layer is followed by a max-pooling layer, which keeps only the maximum value of the corresponding convolutional outputs. Max-pooling can be seen as a sample-based discretization process whose goal is to down-sample the input and reduce its dimensionality.</p>
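<p>To make the effect of the 3 × 3 convolutions and 2 × 2 max-pooling on the 75 × 100 input concrete, the sketch below traces the feature-map sizes through the two convolution and pooling blocks. The text does not state the padding or stride, so the sketch assumes no padding, a convolution stride of 1 and non-overlapping pooling windows; with other settings the numbers would differ.</p>

```python
def conv_output(size, kernel=3, stride=1, padding=0):
    """Spatial size along one axis after a convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_output(size, window=2):
    """Spatial size along one axis after non-overlapping max-pooling."""
    return size // window


def trace_shapes(height=75, width=100, channels=(64, 32)):
    """Follow an input through conv + pool blocks; return the shapes and the flattened length."""
    shapes = []
    for n_channels in channels:
        height, width = conv_output(height), conv_output(width)
        shapes.append(("conv", height, width, n_channels))
        height, width = pool_output(height), pool_output(width)
        shapes.append(("pool", height, width, n_channels))
    return shapes, height * width * channels[-1]


shapes, flat = trace_shapes()
for shape in shapes:
    print(shape)
print("flattened length:", flat)
```

<p>Under these assumptions the last pooling layer produces 17 × 23 maps with 32 channels, so the vector passed to the fully connected layer has 17 × 23 × 32 = 12,512 entries.</p>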
<p id="par0195">To put the image in a form suitable for the multilayer perceptron, it must be flattened into a column vector. This flattened output is then fed to a feed-forward neural network.</p>
<p id="par0200">Back-propagation was applied at every training iteration. A fully connected layer was added to learn non-linear combinations of the high-level features represented by the output of the flatten layer; in other words, the fully connected layer learns a possibly non-linear function in that feature space. Over a series of epochs, and using softmax classification, the model becomes able to distinguish between dominant and certain low-level features in the images and to classify the repetitive DNA classes.</p>
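<p>The softmax step mentioned above turns the raw scores of the last fully connected layer into class probabilities. The following minimal sketch shows the computation in plain Python; the class labels and score values are hypothetical and serve only as an illustration.</p>

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores, labels):
    """Return the label with the highest softmax probability and that probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical scores for two repeat classes.
label, probability = predict([0.5, 3.0], ("tandem", "interspersed"))
print(label, round(probability, 3))
```

<p>The predicted class is simply the one with the largest probability; during training, the same probabilities enter the loss that back-propagation minimizes.</p>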
</sec>
</sec>
<sec id="sec0060">
<label>3</label>
<title>Results</title>
<p id="par0210">Sex chromosomes offer a unique opportunity to understand the mechanisms of evolution from one species to another. These mechanisms may depend on the accumulation of repetitive sequences [
<xref rid="bib0010" ref-type="bibr">2</xref>
]. In this work, we first applied the FHRS technique to detect new repetitive sequences within the human sex chromosomes (X and Y). We then fed these sequences to a CNN-based classification system in order to recognize them.</p>
<sec id="sec0065">
<label>3.1</label>
<title>New repetitive DNA detection results</title>
<p id="par0215">In this work, we used the FHRS (Find Human Repetitive Sequences) approach, which combines wavelet analysis with a specific coding technique, to represent repetitive patterns as an image. This method has the advantage of identifying new repetitive sequences without any prior knowledge about the input DNA sequence. With it, we discovered various new repetitive DNA sequences, both tandem and interspersed, within the sex chromosomes. We then looked for these sequences in the other human chromosomes and in other genomes, and checked whether they occur in genes. Finally, we classified these repetitive sequences according to their location relative to heterochromatin, telomeres, and centromeres.</p>
<p id="par0220">As a result, we constructed a database comprising two sub-databases. The first contains newly discovered repetitive sequences of the satellite and minisatellite types. The second contains already known repetitive sequences.</p>
<p id="par0225">The new repetitive sequences database provides the composition of the new highly repetitive DNA sequences and their corresponding locations. The repetitive sequences have different sizes and are classified into two types: tandem repeat sequences and interspersed repeat sequences. We called this new database "
<italic>New-repeat-Data</italic>
".</p>
<p id="par0230">With our approach, we found highly conserved repetitive DNA sequences in the human genome that have no annotation in the NCBI or DFAM libraries.</p>
<p id="par0235">In the telomeres of the X and Y chromosomes, we found both short and long highly repetitive sequences. The sequence “Rseq2” (Rseq2=CTTTAGAGTCTG) is an example of a short minisatellite of 12 base pairs. It is repeated 312 times between positions 26,304 bp and 249,544 bp. In addition, the sequence (CCCTAA)
<sub>n</sub>
, which is annotated in the NCBI database, was correctly localized by our algorithm.</p>
<p id="par0240">As an example of a long repetitive minisatellite, we discovered the new sequence “Rseq1”, which has 61 base pairs and a repetition number of 12. These repetitive sequences occur at the same locations within large portions of chromosome Y.
<xref rid="fig0035" ref-type="fig">Fig. 7</xref>
shows an example of the global signature of a new telomeric repetitive sequence with a size of 71,000 bp.
<fig id="fig0035">
<label>Fig. 7</label>
<caption>
<p>Telomere image signature of homologous regions corresponding to the minisatellite “Rseq2” (
<italic>CTTTAGAGTCTG</italic>
)
<sub>n</sub>
within X and Y chromosomes.</p>
</caption>
<alt-text id="at0320">Fig. 7</alt-text>
<graphic xlink:href="gr7_lrg"></graphic>
</fig>
</p>
<p id="par0245">In addition, the highly repetitive sequence “Rseq3” (Rseq3=‘TTTAAAGAT’, 9 bp in size) appeared as a new repetitive sequence in the human genome. This short repetitive DNA sequence was also found in many other species, such as chimpanzee and bonobo, and even in the SARS-CoV-2 (COVID-19) coronavirus genome, with a repetition number of 2.
<xref rid="tbl0010" ref-type="table">Table 2</xref>
shows the repetition number of this microsatellite in some chromosomes of the human genome.
<table-wrap position="float" id="tbl0010">
<label>Table 2</label>
<caption>
<p>Repetition numbers of the newly discovered repeat sequence “Rseq3” in 12 chromosomes of the human genome.</p>
</caption>
<alt-text id="at0380">Table 2</alt-text>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Chromosome</th>
<th align="left">Repetition number</th>
<th align="left">Chromosome</th>
<th align="left">Repetition number</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Chr X</td>
<td align="left">2302</td>
<td align="left">Chr 6</td>
<td align="left">3016</td>
</tr>
<tr>
<td align="left">Chr 1</td>
<td align="left">3447</td>
<td align="left">Chr 7</td>
<td align="left">2562</td>
</tr>
<tr>
<td align="left">Chr 2</td>
<td align="left">4193</td>
<td align="left">Chr 8</td>
<td align="left">2405</td>
</tr>
<tr>
<td align="left">Chr 3</td>
<td align="left">3450</td>
<td align="left">Chr 9</td>
<td align="left">1916</td>
</tr>
<tr>
<td align="left">Chr 4</td>
<td align="left">3647</td>
<td align="left">Chr 10</td>
<td align="left">837</td>
</tr>
<tr>
<td align="left">Chr 5</td>
<td align="left">3263</td>
<td align="left">Chr 11</td>
<td align="left">2021</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
<p id="par0250">Other sequences were found to be very highly repetitive in the human genome, such as “Rseq4” (Rseq4=‘GTATACA’), which appears 1375 times in the X chromosome. This sequence also exists in the COVID-19 coronavirus genome.</p>
<p id="par0255">Furthermore, we found a new minisatellite with a size of 61 bp in the human genome. Using the BLAT algorithm, this sequence was also found in the X chromosome of
<italic>Gorilla</italic>
(
<italic>gorGor4</italic>
) at position 15,499 bp to 15,559 bp.
<xref rid="fig0040" ref-type="fig">Fig. 8</xref>
shows the method adopted to localize this repetitive sequence in other regions.
<fig id="fig0040">
<label>Fig. 8</label>
<caption>
<p>Repetitive DNA sequence detection in the X chromosome. (a) The 2-D representation provides a visual way to see three characterized long repetitive sequences; (b) location of these repetitive sequences in other regions.</p>
</caption>
<alt-text id="at0325">Fig. 8</alt-text>
<graphic xlink:href="gr8_lrg"></graphic>
</fig>
</p>
<p id="par0260">
<xref rid="fig0040" ref-type="fig">Fig. 8</xref>
is divided into two result blocks. The first shows the scalogram corresponding to the new repetitive DNA sequence. The second contains the result of locating the sequence in the other genomes with the BLAT algorithm.</p>
<p id="par0265">In the first result block, we provide the scalogram representation of the DNA sequence located on the X chromosome of the human genome (Xp22.33, position: [321001:322000 bp]). The scalogram representation makes it possible to see all the specific repetitive patterns. We then extracted the reference sequence, which is the most repeated sequence of maximum size in the DNA sequence. In this way, we found two new repetitive sequences that are not referenced by current bioinformatics systems or sequence alignment programs. The locations of these two new repetitive sequences in both the X and Y chromosomes are given in
<xref rid="tbl0015" ref-type="table">Table 3</xref>
. The repetitive patterns in the scalograms reveal the presence of two repetitive sequences: Rseq5, whose size is 130 bp, and Rseq6, whose size is 28 bp.
<table-wrap position="float" id="tbl0015">
<label>Table 3</label>
<caption>
<p>Positions of the two newly discovered repeat sequences Rseq5 and Rseq6 in the X and Y chromosomes of the human genome.</p>
</caption>
<alt-text id="at0385">Table 3</alt-text>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Start (bp)</th>
<th align="left">End (bp)</th>
<th align="left">Size (bp)</th>
<th align="left">Sequence</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">321,472</td>
<td align="left">321,602</td>
<td align="left">130</td>
<td align="left" valign="middle" rowspan="3">“Rseq5”</td>
</tr>
<tr>
<td align="left">321,612</td>
<td align="left">321,742</td>
<td align="left">130</td>
</tr>
<tr>
<td align="left">321,752</td>
<td align="left">321,882</td>
<td align="left">130</td>
</tr>
<tr>
<td align="left">321,267</td>
<td align="left">321,295</td>
<td align="left">28</td>
<td align="left" valign="middle" rowspan="7">“Rseq6”</td>
</tr>
<tr>
<td align="left">321,419</td>
<td align="left">321,447</td>
<td align="left">28</td>
</tr>
<tr>
<td align="left">321,561</td>
<td align="left">321,589</td>
<td align="left">28</td>
</tr>
<tr>
<td align="left">321,701</td>
<td align="left">321,729</td>
<td align="left">28</td>
</tr>
<tr>
<td align="left">321,841</td>
<td align="left">321,869</td>
<td align="left">28</td>
</tr>
<tr>
<td align="left">322,148</td>
<td align="left">322,176</td>
<td align="left">28</td>
</tr>
<tr>
<td align="left">323,095</td>
<td align="left">323,123</td>
<td align="left">28</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
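<p>The regular spacing of the copies listed in Table 3 can be checked with a few lines of arithmetic. The sketch below takes the coordinates from Table 3, computes each size as end minus start (the convention the table appears to use) and the distance between consecutive start positions; the helper names are hypothetical.</p>

```python
def interval_sizes(intervals):
    """Size of each (start, end) interval, computed as end - start."""
    return [end - start for start, end in intervals]


def start_spacings(intervals):
    """Distance in bp between the start positions of consecutive copies."""
    starts = sorted(start for start, _ in intervals)
    return [b - a for a, b in zip(starts, starts[1:])]


# Coordinates copied from Table 3.
RSEQ5 = [(321472, 321602), (321612, 321742), (321752, 321882)]
RSEQ6 = [(321267, 321295), (321419, 321447), (321561, 321589), (321701, 321729),
         (321841, 321869), (322148, 322176), (323095, 323123)]

print(interval_sizes(RSEQ5), start_spacings(RSEQ5))
print(interval_sizes(RSEQ6), start_spacings(RSEQ6))
```

<p>The three Rseq5 copies start 140 bp apart, and several consecutive Rseq6 copies show the same 140 bp spacing, which is consistent with Rseq6 being contained in Rseq5.</p>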
<p id="par0270">These sequences are:
<list list-type="simple" id="lis0015">
<list-item id="lsti0040">
<label>-</label>
<p id="par0275">Rseq5=‘AAAAAAAAAAAAAAAAGAAAAGCCGGGCGTGGTGGTGGGTGCCTGTGGTCCCAGCTGCTCGGGACGCTGAGGTGGGAGGATTGCTTGAGCCCAGGAGTTTGACACCAGCATGGGCAATATGGTAAGACCC’.</p>
</list-item>
<list-item id="lsti0045">
<label>-</label>
<p id="par0280">Rseq6=‘CCCAGGAGTTTGACACCAGCATGGGCAA’.</p>
</list-item>
</list>
</p>
<p id="par0285">After localizing these two repetitive DNA sequences (Rseq5 and Rseq6), we used the BLAT alignment tool to see whether they occur at other locations in the other human chromosomes or in other genomes. Repetitive sequences that migrate to different regions of the genome are of great importance and have been classified as conserved mobile DNA sequences. They are even more important when these conserved regions are located in genes.</p>
<p id="par0290">As a result, we found the Rseq6 sequence at position 321,267 bp to 321,447 bp in the intronic region of the non-protein coding RNA 685 (
<italic>LINC00685</italic>
) gene, in both the X and Y chromosomes [
<xref rid="bib0260" ref-type="bibr">52</xref>
].</p>
<p id="par0295">In sub-figure b of
<xref rid="fig0040" ref-type="fig">Fig. 8</xref>
(second result), we show that the new repetitive sequence Rseq6 is located not only within several chromosomes (1, 5, 15, X and Y) of the human genome, but also in other genomes such as
<italic>chimpanzee</italic>
and
<italic>bonobo</italic>
. The results in
<xref rid="tbl0020" ref-type="table">Table 4</xref>
show that Rseq6 is located in intronic regions of different chromosomes of the human genome: 1, 5, 15, X and Y.
<table-wrap position="float" id="tbl0020">
<label>Table 4</label>
<caption>
<p>Positions of the newly discovered scattered repeat sequence Rseq6 (28 bp) in different chromosomes of the human genome.</p>
</caption>
<alt-text id="at0390">Table 4</alt-text>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Start</th>
<th align="left">End</th>
<th align="left">Chromosome</th>
<th align="left">Gene</th>
<th align="left">Location in gene</th>
<th align="left">Location in genome</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">321,267</td>
<td align="left">321,295</td>
<td align="left">X and Y</td>
<td align="left">LINC00685</td>
<td align="left">intron 1/1</td>
<td align="left">Xp22.33 and Yp11.31</td>
</tr>
<tr>
<td align="left">157,332,849</td>
<td align="left">157,332,876</td>
<td align="left">5</td>
<td align="left">CYFIP2</td>
<td align="left">intron 21/31</td>
<td align="left">5q33.3</td>
</tr>
<tr>
<td align="left">22,225,387</td>
<td align="left">22,225,414</td>
<td align="left" valign="middle" rowspan="2">15</td>
<td align="left" valign="middle" rowspan="2">LOC101928039</td>
<td align="left" valign="middle" rowspan="2">uncharacterized</td>
<td align="left" valign="middle" rowspan="2">15q11.2</td>
</tr>
<tr>
<td align="left">22,225,680</td>
<td align="left">22,225,707</td>
</tr>
<tr>
<td align="left">237,615,395</td>
<td align="left">237,615,421</td>
<td align="left">1</td>
<td align="left">RYR2</td>
<td align="left">intron 37/104</td>
<td align="left">1q43</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
<p id="par0300">In fact, the sequence “Rseq6” is a special conserved intronic region found not only in different chromosomes but also in different genomes. Rseq6, which has a size of 28 bp, was localized in two genes of the
<italic>chimpanzee</italic>
genome. It is located at position 135476733–135476760 in the
<italic>DDX46</italic>
gene ([135444165:135519361 bp]) of chromosome 5. It is also located at position 86759–57686796 bp in the
<italic>FAM13C</italic>
gene ([57587233:57707637 bp]) of chromosome 10.</p>
<p id="par0305">In addition, we present another example of a special new repetitive sequence, “Rseq7”, found using our approach.
<xref rid="fig0045" ref-type="fig">Fig. 9</xref>
shows the time-frequency representation of the
<italic>LOC652608</italic>
gene, which has a size of 2532 bp. The gene is located at position 1172583–1175114 bp in the X chromosome of the human genome. This pseudogene is a 60S ribosomal protein L6-like. The DNA image shown in
<xref rid="fig0045" ref-type="fig">Fig. 9</xref>
shows three exonic regions and two intronic regions.
<fig id="fig0045">
<label>Fig. 9</label>
<caption>
<p>The LOC652608 gene in the X chromosome contains a tandem repeat sequence, Rseq7, which starts in an intronic region (Intron 2) and extends into an exonic region (Exon 3).</p>
</caption>
<alt-text id="at0330">Fig. 9</alt-text>
<graphic xlink:href="gr9_lrg"></graphic>
</fig>
</p>
<p id="par0310">We can clearly see that the second intronic region is composed of a specific tandem sequence, which we called “Rseq7”. Its modified version has the same size as “Rseq7”, namely 208 bp.</p>
<p id="par0315">This particular repetitive sequence starts in the intronic zone (Intron 2) and extends into the exonic zone (Exon 3), with a modification of 11 nucleotides.</p>
<p id="par0320">Intron 2 is a noncoding region composed of multiple repetitions of the 208 bp sequence “Rseq7”.</p>
<p id="par0325">Rseq7=‘TGATGGTTTTCCTGAAGCAGCTGGCTAGTGGCTTGTTACTCGTAACTGGACCTCTGGTCCTCAATCGAGTCCCTCCACGAAGAACGCACCAGAAATTTGTCATTGCCACCTCAACCAAAATCGGTATCAGCAATGTAAAAATCTCAAAACATCTTAGTGATGCTGACTTGAAGAAGAAGAAGCTGTGGAAGCCCAGACACCAGGAGAG’.</p>
<p id="par0330">We then searched for this new tandem repeat “Rseq7” in the other chromosomes. We found that the sequence exists in 7 chromosomes with some nucleotide modifications. Moreover, we located this modified intronic sequence in gene regions of other chromosomes of the human genome.</p>
<p id="par0335">
<xref rid="fig0050" ref-type="fig">Fig. 10</xref>
shows two reference sequences and the modified version. The first exonic sequence example corresponds to the
<italic>LOC652608</italic>
gene, located in the X chromosome (
<xref rid="fig0050" ref-type="fig">Fig. 10</xref>
a). The second exonic sequence corresponds to the
<italic>RPL6P22</italic>
gene, located in chromosome 7 (
<xref rid="fig0050" ref-type="fig">Fig. 10</xref>
b).
<fig id="fig0050">
<label>Fig. 10</label>
<caption>
<p>Two examples of conserved intronic repetitive sequences (satellites) and noncoding sequence located in coding region such as senescence [
<xref rid="bib0265" ref-type="bibr">53</xref>
].</p>
</caption>
<alt-text id="at0335">Fig. 10</alt-text>
<graphic xlink:href="gr10_lrg"></graphic>
</fig>
</p>
<p id="par0340">In both examples, the intronic sequence “Rseq7” and the exonic sequence differ by 11 nucleotides, but at different locations.</p>
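<p>Counting the nucleotides that differ between a reference repeat and its modified version amounts to listing the mismatch positions of two equal-length sequences. The sketch below shows one simple way to do this; the variant used in the example is invented for illustration and is not one of the exonic sequences reported here.</p>

```python
def mismatches(reference, variant):
    """Return the 1-based positions where two equal-length sequences differ."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return [i for i, (a, b) in enumerate(zip(reference, variant), start=1) if a != b]


# First 12 bases of Rseq7 and a made-up variant with two substitutions.
reference = "TGATGGTTTTCC"
variant = "TGATCGTTTACC"

positions = mismatches(reference, variant)
print(len(positions), positions)
```

<p>Applied to Rseq7 and one of its modified exonic copies, the length of the returned list gives the number of modified nucleotides and the list itself gives their locations within the repeat.</p>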
<p id="par0345">We also used image processing techniques to extract the repetitive sequences. The idea is to segment the scalogram image in order to extract the repetitive patterns. For this purpose, we developed a new segmentation algorithm applied to the DNA scalograms.
<xref rid="fig0055" ref-type="fig">Fig. 11</xref>
illustrates the results obtained by our segmentation algorithm with a threshold value of 26. It shows the location of the “Rseq7” repetitive sequences and their modified versions. In the first subfigure (scalogram), the repetitive pattern is located at 1173583–1175114 bp in the X chromosome of the human genome. The second subfigure presents the segmented image. The repetitive patterns correspond to the repetitive sequences that start in the intronic region and end in the exonic region, with some nucleotide modifications (11 nucleotides) at the beginning and at the end (
<xref rid="fig0055" ref-type="fig">Fig. 11</xref>
).
<fig id="fig0055">
<label>Fig. 11</label>
<caption>
<p>Example of DNA image segmentation, from which we can obtain the beginning and the end of the repetitive patterns located in the intronic region (Intron 2), as well as the corresponding modified sequences (especially in the exonic region) and the modification region.</p>
</caption>
<alt-text id="at0340">Fig. 11</alt-text>
<graphic xlink:href="gr11_lrg"></graphic>
</fig>
</p>
<p id="par0350">After localizing the repetitive sequences, we checked whether they also occur in other regions of the human genome and in the genomes of other species.
<xref rid="tbl0025" ref-type="table">Table 5</xref>
shows the location of the repetitive sequence “Rseq7” and its modified versions in different gene regions of different chromosomes in the human genome. We note that this new repetitive sequence characterizes a ribosomal protein (RP) region in the human genome. The ribosomal RNA gene repeat (rDNA) is the largest repetitive region in the eukaryotic genome. Genome stability depends on the stability of the rDNA, which in turn affects cellular functions.
<table-wrap position="float" id="tbl0025">
<label>Table 5</label>
<caption>
<p>Locations of the repetitive intronic satellite sequence “Rseq7” and the corresponding modified exonic sequences in different chromosomes of the human genome.</p>
</caption>
<alt-text id="at0395">Table 5</alt-text>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Start</th>
<th align="left">End</th>
<th align="left">Chromosome</th>
<th align="left">Gene</th>
<th align="left">Location in gene</th>
<th align="left">Location in genome</th>
<th align="left">Sequence</th>
<th align="left">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">1,174,105</td>
<td align="left">1,174,312</td>
<td align="left" valign="middle" rowspan="4">X and Y</td>
<td align="left" valign="middle" rowspan="4">LOC652608</td>
<td align="left" valign="middle" rowspan="3">Intron 2</td>
<td align="left" valign="middle" rowspan="4">Xp22.33 and Yp11.2</td>
<td align="left" valign="middle" rowspan="3">Rseq7(208 bp)</td>
<td align="left" valign="middle" rowspan="4">60S
<italic>ribosomal protein</italic>
L6-like</td>
</tr>
<tr>
<td align="left">1,174,313</td>
<td align="left">1,174,520</td>
</tr>
<tr>
<td align="left">1,174,521</td>
<td align="left">1,174,728</td>
</tr>
<tr>
<td align="left">1,174,729</td>
<td align="left">1,174,936</td>
<td align="left">Exon 3</td>
<td align="left">modified Rseq7</td>
</tr>
<tr>
<td align="left">45,781,761</td>
<td align="left">45,781,958</td>
<td align="left">1</td>
<td align="left">RPL6P1</td>
<td align="left">Exon 1</td>
<td align="left">1p34.1</td>
<td align="left">modified Rseq7</td>
<td align="left">
<italic>ribosomal protein</italic>
L6 pseudogene 1</td>
</tr>
<tr>
<td align="left" valign="middle" rowspan="2">65,573,946</td>
<td align="left" valign="middle" rowspan="2">65,574,144</td>
<td align="left" valign="middle" rowspan="2">4</td>
<td align="left">EPHA5</td>
<td align="left">Exon 1</td>
<td align="left">4q13.1-q13.2</td>
<td align="left">modified Rseq7</td>
<td align="left">EPH receptor A5</td>
</tr>
<tr>
<td align="left">RPL6P10</td>
<td align="left">Exon 1</td>
<td align="left">4q13.2</td>
<td align="left">modified Rseq7</td>
<td align="left">
<italic>ribosomal protein</italic>
L6 pseudogene 10</td>
</tr>
<tr>
<td align="left">137,722,471</td>
<td align="left">137,722,672</td>
<td align="left" valign="middle" rowspan="3">7</td>
<td align="left">DGKI</td>
<td align="left">Intron 2</td>
<td align="left">7q33</td>
<td align="left">modified Rseq7</td>
<td align="left">OTTHUMP00000208597</td>
</tr>
<tr>
<td align="left">14,070,714</td>
<td align="left">14,070,911</td>
<td align="left">RPL6P21</td>
<td align="left">Exon 2</td>
<td align="left">7p21.3</td>
<td align="left">modified Rseq7</td>
<td align="left">
<italic>ribosomal protein</italic>
L6 pseudogene 21</td>
</tr>
<tr>
<td align="left">64,141,719</td>
<td align="left">64,141,920</td>
<td align="left">AC091685.2</td>
<td align="left">Exon 2</td>
<td align="left">7q11.21</td>
<td align="left">modified Rseq7</td>
<td align="left">
<italic>ribosomal protein</italic>
L6 pseudogene 11</td>
</tr>
<tr>
<td align="left" valign="middle" rowspan="2">33,859,542</td>
<td align="left" valign="middle" rowspan="2">33,859,740</td>
<td align="left" valign="middle" rowspan="2">8</td>
<td align="left">LOC105379364</td>
<td align="left">uncharacterized</td>
<td align="left">8p12</td>
<td align="left">modified Rseq7</td>
<td align="left">uncharacterized LOC105379364</td>
</tr>
<tr>
<td align="left">RPL6P22</td>
<td align="left">Exon 1</td>
<td align="left">8p12</td>
<td align="left">modified Rseq7</td>
<td align="left">
<italic>ribosomal protein</italic>
L6 pseudogene 22</td>
</tr>
<tr>
<td align="left">83,151,812</td>
<td align="left">83,152,006</td>
<td align="left" valign="middle" rowspan="2">12</td>
<td align="left">RPL6P25</td>
<td align="left">Exon 1</td>
<td align="left">12q21.31</td>
<td align="left">modified Rseq7</td>
<td align="left">
<italic>ribosomal protein</italic>
L6 pseudogene 25</td>
</tr>
<tr>
<td align="left">112,405,884</td>
<td align="left">112,406,338</td>
<td align="left">RPL6</td>
<td align="left">Exon 6</td>
<td align="left">12q24.13</td>
<td align="left">modified Rseq7</td>
<td align="left">
<italic>ribosomal protein</italic>
L6</td>
</tr>
<tr>
<td align="left">6,462,328</td>
<td align="left">6,462,526</td>
<td align="left">18</td>
<td align="left">RPL6P27</td>
<td align="left">Exon 1</td>
<td align="left">18p11.31</td>
<td align="left">modified Rseq7</td>
<td align="left">
<italic>ribosomal protein</italic>
L6 pseudogene 27</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
<p id="par0355">The next example in
<xref rid="fig0060" ref-type="fig">Fig. 12</xref>
shows highly repetitive patterns in the X chromosome at position 2277000–2282500 bp (Xp22.33 region) of the human genome. This region contains tandem repeat sequences and interspersed repeat sequences. In addition, the localization results show that these specific patterns lie in the intronic region of the
<italic>DHRSX</italic>
gene ([2,219,506 bp: 2,500,974 bp]) in the X chromosome and even in other genes located in other chromosomes.
<fig id="fig0060">
<label>Fig. 12</label>
<caption>
<p>Scalogram corresponding to a DNA sequence in the X chromosome that contains repetitive sequences in an intronic region.</p>
</caption>
<alt-text id="at0345">Fig. 12</alt-text>
<graphic xlink:href="gr12_lrg"></graphic>
</fig>
</p>
<p id="par0360">
The
<italic>DHRSX</italic>
gene was discovered in 2014 at Xp22.33 and Yp11.2 in the human genome. The protein encoded by this gene has been shown to be implicated in the positive regulation of starvation-induced autophagy [
<xref rid="bib0270" ref-type="bibr">54</xref>
].</p>
<p id="par0365">The scalogram represented in
<xref rid="fig0060" ref-type="fig">Fig. 12</xref>
indicates the presence of repetitive patterns in intronic regions. The reference tandem repeat sequence “Rseq8” is 89 bp long and is repeated 14 times. Other repetitive sequences are also localized in these intronic regions:
<list list-type="simple" id="lis0020">
<list-item id="lsti0050">
<label>-</label>
<p id="par0370">“Rseq9” with a size of 42 bp and 26 as repetition number</p>
</list-item>
<list-item id="lsti0055">
<label>-</label>
<p id="par0375">“Rseq10” with a size of 19 bp and 63 as repetition number</p>
</list-item>
<list-item id="lsti0060">
<label>-</label>
<p id="par0380">“Rseq11” with a size of 6 bp and 123 as repetition number.</p>
</list-item>
</list>
</p>
<p id="par0385">All these repetitive sequences are of the minisatellite type. In the NCBI database, these regions are annotated only as low-complexity G-rich repeats, with no further information given.
<list list-type="simple" id="lis0025">
<list-item id="lsti0065">
<label></label>
<p id="par0390">Rseq8="AGGGAGAGAGAGGGAGGGCAAACGAGAGGGAGAGAGAAGGAGGAGGAGGAAATGGGGGAAAGAGAGAGAAAGAGAGATGGAGAGGGAAC"</p>
</list-item>
<list-item id="lsti0070">
<label></label>
<p id="par0395">Rseq9="AGAGAGATGGAGAGGGAACAGGGAGAGAGAGGGAGGGCAAAC"</p>
</list-item>
<list-item id="lsti0075">
<label></label>
<p id="par0400">Rseq10="AGAGAGATGGAGAGGGAAC"</p>
</list-item>
<list-item id="lsti0080">
<label></label>
<p id="par0405">Rseq11="AGAGAGAA"</p>
</list-item>
</list>
</p>
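<p>The motif sizes quoted above can be checked directly from the listed strings. The following is a minimal sketch, not part of the original pipeline: the helper names motif_lengths and spans_tandem_junction are hypothetical. It also tests one relation that follows from the printed strings, namely whether “Rseq9” appears only across the junction of two back-to-back copies of “Rseq8”.</p>

```python
# Motif strings copied from the list above.
RSEQ8 = "AGGGAGAGAGAGGGAGGGCAAACGAGAGGGAGAGAGAAGGAGGAGGAGGAAATGGGGGAAAGAGAGAGAAAGAGAGATGGAGAGGGAAC"
RSEQ9 = "AGAGAGATGGAGAGGGAACAGGGAGAGAGAGGGAGGGCAAAC"
RSEQ10 = "AGAGAGATGGAGAGGGAAC"


def motif_lengths(motifs):
    """Return the length in bp of each named motif (hypothetical helper)."""
    return {name: len(seq) for name, seq in motifs.items()}


def spans_tandem_junction(inner, unit):
    """True if `inner` occurs in two adjacent copies of `unit` but not in a single copy."""
    return inner in unit + unit and inner not in unit


lengths = motif_lengths({"Rseq8": RSEQ8, "Rseq9": RSEQ9, "Rseq10": RSEQ10})
print(lengths)
print(RSEQ8.endswith(RSEQ10))
print(spans_tandem_junction(RSEQ9, RSEQ8))
```

<p>Under this check, the shorter motifs can be read as sub-patterns of the tandem array formed by “Rseq8”; this is an observation on the strings only, not a claim about how the authors derived them.</p>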
<p id="par0410">These repetitive sequences are also located at the same position, in the intronic region of the
<italic>DHRSX</italic>
gene in the Y chromosome of the human genome.</p>
<p id="par0415">
<xref rid="tbl0030" ref-type="table">Table 6</xref>
details the location of the new repetitive sequence “Rseq8” inside the X and Y chromosomes.
<table-wrap position="float" id="tbl0030">
<label>Table 6</label>
<caption>
<p>Location of the intronic repetitive sequence “Rseq8” in the X and Y chromosomes of the human genome.</p>
</caption>
<alt-text id="at0400">Table 6</alt-text>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Start</th>
<th align="left">End</th>
<th align="left">Repetition type</th>
<th align="left">Gene</th>
<th align="left">Location in gene</th>
<th align="left">Location in genome</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">2,277,547</td>
<td align="left">2,277,635</td>
<td align="left" valign="middle" rowspan="3">Tandem</td>
<td align="left" valign="middle" rowspan="14">DHRSX</td>
<td align="left" valign="middle" rowspan="14">Intron 4/6</td>
<td align="left" valign="middle" rowspan="14">Xp22.33 and Yp11.2</td>
</tr>
<tr>
<td align="left">2,277,636</td>
<td align="left">2,277,724</td>
</tr>
<tr>
<td align="left">2,277,725</td>
<td align="left">2,277,813</td>
</tr>
<tr>
<td align="left">2,277,903</td>
<td align="left">2,277,991</td>
<td align="left">Dispersed</td>
</tr>
<tr>
<td align="left">2,278,791</td>
<td align="left">2,278,879</td>
<td align="left" valign="middle" rowspan="3">Tandem</td>
</tr>
<tr>
<td align="left">2,278,880</td>
<td align="left">2,278,968</td>
</tr>
<tr>
<td align="left">2,278,969</td>
<td align="left">2,279,057</td>
</tr>
<tr>
<td align="left">2,279,147</td>
<td align="left">2,279,235</td>
<td align="left" valign="middle" rowspan="2">Tandem</td>
</tr>
<tr>
<td align="left">2,279,236</td>
<td align="left">2,279,324</td>
</tr>
<tr>
<td align="left">2,280,290</td>
<td align="left">2,280,378</td>
<td align="left" valign="middle" rowspan="3">Tandem</td>
</tr>
<tr>
<td align="left">2,280,379</td>
<td align="left">2,280,467</td>
</tr>
<tr>
<td align="left">2,280,468</td>
<td align="left">2,280,556</td>
</tr>
<tr>
<td align="left">2,280,646</td>
<td align="left">2,280,734</td>
<td align="left" valign="middle" rowspan="2">Tandem</td>
</tr>
<tr>
<td align="left">2,280,735</td>
<td align="left">2,280,823</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>
<xref rid="tbl0035" ref-type="table">Table 7</xref>
provides the locations of “Rseq9” in the X chromosome of other genomes.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<table-wrap position="float" id="tbl0035">
<label>Table 7</label>
<caption>
<p>Position of “Rseq9” in X chromosome of other genomes.</p>
</caption>
<alt-text id="at0405">Table 7</alt-text>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Start</th>
<th align="left">End</th>
<th align="left">Chromosome</th>
<th align="left">Genome</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">1,833,384</td>
<td align="left">1,833,425</td>
<td align="left">X</td>
<td align="left">
<italic>Gorilla</italic>
</td>
</tr>
<tr>
<td align="left">1,927,705</td>
<td align="left">1,927,746</td>
<td align="left" valign="middle" rowspan="7">X</td>
<td align="left" valign="middle" rowspan="7">
<italic>Chimpanzee</italic>
</td>
</tr>
<tr>
<td align="left">1,927,883</td>
<td align="left">1,927,924</td>
</tr>
<tr>
<td align="left">1,928,326</td>
<td align="left">1,928,367</td>
</tr>
<tr>
<td align="left">1,928,414</td>
<td align="left">1,928,455</td>
</tr>
<tr>
<td align="left">1,928,503</td>
<td align="left">1,928,544</td>
</tr>
<tr>
<td align="left">1,928,592</td>
<td align="left">1,928,633</td>
</tr>
<tr>
<td align="left">1,928,771</td>
<td align="left">1,928,812</td>
</tr>
<tr>
<td align="left">2,220,841</td>
<td align="left">2,220,882</td>
<td align="left">X</td>
<td align="left">
<italic>Bonobo</italic>
</td>
</tr>
<tr>
<td align="left">1,800,783</td>
<td align="left">1,800,824</td>
<td align="left">X</td>
<td align="left">
<italic>Rhesus</italic>
</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
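<p>The tandem and dispersed labels in Table 6 can be reproduced from the coordinates alone. The sketch below assumes a simple rule that is not stated in the text: a copy belongs to a tandem run when it starts exactly one base after the previous copy ends, and an isolated copy is labelled dispersed. The function names are hypothetical.</p>

```python
# (start, end) coordinates of the 14 "Rseq8" copies listed in Table 6.
RSEQ8_HITS = [
    (2277547, 2277635), (2277636, 2277724), (2277725, 2277813),
    (2277903, 2277991),
    (2278791, 2278879), (2278880, 2278968), (2278969, 2279057),
    (2279147, 2279235), (2279236, 2279324),
    (2280290, 2280378), (2280379, 2280467), (2280468, 2280556),
    (2280646, 2280734), (2280735, 2280823),
]


def group_adjacent(hits):
    """Group sorted intervals into runs where each copy starts right after the previous one ends."""
    runs = []
    for start, end in sorted(hits):
        if runs and start == runs[-1][-1][1] + 1:
            runs[-1].append((start, end))
        else:
            runs.append([(start, end)])
    return runs


def label_runs(runs):
    """Assumed rule: two or more adjacent copies form a tandem run, a single copy is dispersed."""
    return [("Tandem" if len(run) > 1 else "Dispersed", len(run)) for run in runs]


labels = label_runs(group_adjacent(RSEQ8_HITS))
print(labels)
```

<p>With the coordinates of Table 6, this grouping yields five tandem runs and one isolated copy, in the same order as the table rows.</p>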
<p id="par0420">Furthermore, this repetitive sequence is located inside the intronic region of the
<italic>DHRSX</italic>
gene, in both tandem repeat and dispersed repeat forms.</p>
<p id="par0425">
<xref rid="fig0065" ref-type="fig">Fig. 13</xref>
shows another tandem repeat pattern, found in the X chromosome at position 27,210,460–27,211,308 bp in the human genome. The annotation of this sequence in the NCBI database indicates the simple repeat classes (TA)n ([27,210,497 : 27,210,679]), (CATATA)n ([27,210,682 : 27,210,757]) and (TA)n ([27,210,758 : 27,211,323]). Our approach also located these confirmed repetitive sequences. In addition, we discovered the new repetitive sequences ‘ATATATGATATATACTATATATGTCATATATACATATACAC’, ‘ATATATGATATATAC’, ‘TGATAT’, ‘TACATA’ and ‘GATATA’. These sequences are localized inside the LOC105373150 gene ([27,153,368 : 27,399,005]) within the Xp21.3 region of the human genome.
<list list-type="simple" id="lis0030">
<list-item id="lsti0085">
<label></label>
<p id="par0430">Rseq12=“ATATATGATATATACTATATATGTCATATATACATATACAC”</p>
</list-item>
</list>
<fig id="fig0065">
<label>Fig. 13</label>
<caption>
<p>Scalogram representation of a newly discovered tandem repeat sequence, “Rseq12”: (ATATATGATATATACTATATATGTCATATATACATATACAC)
<sub>n</sub>
.</p>
</caption>
<alt-text id="at0350">Fig. 13</alt-text>
<graphic xlink:href="gr13_lrg"></graphic>
</fig>
</p>
<p id="par0435">The short repetitive sequence "TACATA" (6 bp) appears 22 times in this DNA sequence and occurs 69,710 times in the X chromosome.</p>
<p id="par0440">We also searched for the tandem repeat sequence “Rseq12” in the X chromosome of other genomes and located it in the
<italic>Bonobo</italic>
genome [27,166,538 bp: 27,166,578 bp] and the
<italic>Chimpanzee</italic>
genome [27,158,028 bp: 27,158,068 bp].</p>
<p id="par0445">Using our algorithm, we also found 9 repetitions of another new short repetitive sequence arranged as a tandem repeat (TR). We called this 29 bp sequence “Rseq13”.
<list list-type="simple" id="lis0035">
<list-item id="lsti0090">
<label>-</label>
<p id="par0450">Rseq13=“CTGTATAACCTAAATAATATAGGTTATAT”</p>
</list-item>
</list>
</p>
<p id="par0455">
<xref rid="fig0070" ref-type="fig">Fig. 14</xref>
shows the scalogram of a new repetitive DNA sequence, “TRseq1”, built from the pattern that we called “Rseq13”. The sequence has a size of 261 bp and is localized at 28,076,765–28,077,025 bp in the X chromosome. It is a tandem repeat sequence with a 29 bp pattern, “Rseq13”. Neither the NCBI nor the Dfam database reports such a repetitive sequence (“Rseq13”). Our approach detected this tandem repeat without any prior knowledge of its existence.
<fig id="fig0070">
<label>Fig. 14</label>
<caption>
<p>Scalogram image corresponding to the DNA sequence “TRseq1” (261 bp) containing the tandem repeat sequence “Rseq13” with a repetition number equal to 9.</p>
</caption>
<alt-text id="at0355">Fig. 14</alt-text>
<graphic xlink:href="gr14_lrg"></graphic>
</fig>
</p>
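<p>The figures quoted for “Rseq13” are mutually consistent: 9 copies of a 29 bp pattern give 261 bp, which is also the length of the inclusive interval 28,076,765–28,077,025. The sketch below illustrates this with an exact-match tandem counter. It is an illustration only: the function names are hypothetical, the region is a synthetic stand-in built from nine exact copies rather than the real chromosome sequence, and exact string matching is not the scalogram-based method of this work.</p>

```python
RSEQ13 = "CTGTATAACCTAAATAATATAGGTTATAT"


def count_tandem_copies(sequence, motif, start=0):
    """Count exact back-to-back copies of `motif` in `sequence`, beginning at index `start`."""
    count = 0
    pos = start
    while sequence.startswith(motif, pos):
        count += 1
        pos += len(motif)
    return count


def interval_length(first, last):
    """Length in bp of an inclusive coordinate interval."""
    return last - first + 1


# Synthetic stand-in for the 261 bp region (assumption: nine exact copies).
synthetic_region = RSEQ13 * 9
n_copies = count_tandem_copies(synthetic_region, RSEQ13)
region_bp = interval_length(28076765, 28077025)
print(len(RSEQ13), n_copies, region_bp)
```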
<p id="par0460">The repetitive sequence “Rseq13” is located not only in the X chromosome of the human genome but also in the X chromosomes of other genomes:
<italic>Bonobo</italic>
([28,032,917 bp-28,033,158 bp]),
<italic>Chimpanzee</italic>
([28,028,604 bp-28,028,816 bp]) and
<italic>Gorilla</italic>
([28,333,971 bp-28,334,231 bp]), with two nucleotide modifications.</p>
<p id="par0465">
<xref rid="fig0075" ref-type="fig">Fig. 15</xref>
shows the scalogram of a new DNA sequence, “TRseq2”, with a size of 261 bp. The sequence is positioned at 156,029,111–156,029,371 bp in the X chromosome. The scalogram contains a repetitive pattern corresponding to a tandem repeat sequence, “Rseq14”. This subsequence ("TCTCTGCGCCTGCGCCGGCGCGGCGCGCC") is 29 bp long and is repeated 9 times.
<fig id="fig0075">
<label>Fig. 15</label>
<caption>
<p>Scalogram image corresponding to DNA sequence “TRseq2”, confirming the existence of the “Rseq14” tandem repeat sequence (TCTCTGCGCCTGCGCCGGCGCGGCGCGCC)
<sub>n</sub>
annotated in [
<xref rid="bib0225" ref-type="bibr">45</xref>
] as a minisatellite sequence with a repetition number equal to 9.</p>
</caption>
<alt-text id="at0360">Fig. 15</alt-text>
<graphic xlink:href="gr15_lrg"></graphic>
</fig>
</p>
<p id="par0470">Rseq14 is not annotated as a tandem repeat in the NCBI or the Dfam databases, but it is defined as a TAR1 of the telomeric satellite family [
<xref rid="bib0275" ref-type="bibr">55</xref>
].</p>
<p id="par0475">In
<xref rid="tbl0040" ref-type="table">Table 8</xref>
, we provide the localization results of “Rseq14” in the whole human genome and in other genomes.
<table-wrap position="float" id="tbl0040">
<label>Table 8</label>
<caption>
<p>Locations of the repetitive sequence “Rseq14” in the human X chromosome and in other genomes.</p>
</caption>
<alt-text id="at0410">Table 8</alt-text>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Start</th>
<th align="left">End</th>
<th align="left">Repetition number</th>
<th align="left">Repetition type</th>
<th align="left">Chromosome</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">12,491</td>
<td align="left">12,751</td>
<td align="left">9</td>
<td align="left" valign="middle" rowspan="10">TAR1: telomeric satellite</td>
<td align="left">5</td>
</tr>
<tr>
<td align="left">12,520</td>
<td align="left">12,780</td>
<td align="left">9</td>
<td align="left">5</td>
</tr>
<tr>
<td align="left">57,215,631</td>
<td align="left">57,215,891</td>
<td align="left">9</td>
<td align="left">Y</td>
</tr>
<tr>
<td align="left">156,029,111</td>
<td align="left">156,029,371</td>
<td align="left">9</td>
<td align="left">X</td>
</tr>
<tr>
<td align="left">10,629</td>
<td align="left">10,950</td>
<td align="left">2</td>
<td align="left" valign="middle" rowspan="2">1</td>
</tr>
<tr>
<td align="left">181,167</td>
<td align="left">181,311</td>
<td align="left">2</td>
</tr>
<tr>
<td align="left">10,754</td>
<td align="left">11,079</td>
<td align="left">2</td>
<td align="left">12</td>
</tr>
<tr>
<td align="left">10,601</td>
<td align="left">10,629</td>
<td align="left">1</td>
<td align="left">16</td>
</tr>
<tr>
<td align="left">101,980,093</td>
<td align="left">101,980,000</td>
<td align="left">1</td>
<td align="left">15</td>
</tr>
<tr>
<td align="left">135,076,184</td>
<td align="left">135,076,000</td>
<td align="left">1</td>
<td align="left">11</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
<p id="par0480">
<xref rid="fig0080" ref-type="fig">Fig. 16</xref>
shows the scalogram of another new DNA sequence, “TRseq3”, with a size of 500 bp, extending from 2,845,001 bp to 2,845,500 bp in the X chromosome of the human genome. The sequence contains a tandem repeat sequence, “Rseq15” (CGTGTGTATGTATATTTATATACA), which is 24 bp long and is repeated 18 times. This sequence is not annotated as a tandem repeat sequence in either the NCBI or the Dfam database.
<fig id="fig0080">
<label>Fig. 16</label>
<caption>
<p>Scalogram image corresponding to the DNA sequence “TRseq3” that contains “Rseq15” as tandem repeat motif.</p>
</caption>
<alt-text id="at0365">Fig. 16</alt-text>
<graphic xlink:href="gr16_lrg"></graphic>
</fig>
</p>
<p id="par0485">Our "New-repeat-Data" database of all newly discovered repetitive sequences is presented in the “Supplementary Material” file. To conclude, we implemented an efficient algorithm for detecting repetitive sequences. The sequences we detected are of two types: satellites and minisatellites. Moreover, we obtained better results than those of the existing bioinformatics tools. The main advantage of this work is that it is independent of any prior knowledge about the searched repeat.</p>
</sec>
<sec id="sec0070">
<label>3.2</label>
<title>CNN classification results</title>
<p id="par0490">In this section, we present the results of using a CNN model to classify the DNA scalograms obtained in the first part of this work. Our goal is to identify the different classes of the new repetitive sequences we discovered and stored in the "New-repeat-Data" database. As data, we randomly took 200 non-repetitive sequences (NonRep) and 780 repetitive sequences (Rep). The repetitive sequences are divided into 4 classes depending on their repetitive pattern length (
<xref rid="tbl0045" ref-type="table">Table 9</xref>
). These classes are: Rep1 (size greater than 100), Rep2 (size between 60 and 100), Rep3 (size between 30 and 60) and Rep4 (size smaller than 30). Overall, our constructed database contains five classes: four contain scalograms of repetitive sequences and one contains scalograms without repetitive sequences. For classification, the whole dataset (980 scalogram images) was split into 80% for training (784 images) and 20% for testing (196 images). With such a classification system, we can discover images that contain similar repetitive patterns and differentiate them from images that contain no repetitions.
<table-wrap position="float" id="tbl0045">
<label>Table 9</label>
<caption>
<p>Description of the input data to the CNN classification system.</p>
</caption>
<alt-text id="at0415">Table 9</alt-text>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">CLASS</th>
<th align="left">Repetitive pattern with size X</th>
<th align="left">NUMBER</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Rep1</td>
<td align="left">X>100</td>
<td align="left">180</td>
</tr>
<tr>
<td align="left">Rep2</td>
<td align="left">X between 60 and 100</td>
<td align="left">150</td>
</tr>
<tr>
<td align="left">Rep3</td>
<td align="left">X between 30 and 60</td>
<td align="left">200</td>
</tr>
<tr>
<td align="left">Rep4</td>
<td align="left">X<30</td>
<td align="left">250</td>
</tr>
<tr>
<td align="left">NonRep</td>
<td align="left">NONE</td>
<td align="left">200</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
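<p>The class sizes and the 80/20 split described above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: rep_class and split_sizes are hypothetical helpers, and the treatment of the exact boundary values (30, 60 and 100) is an assumption, since the text does not say on which side they fall.</p>

```python
# Number of scalogram images per class, as listed in Table 9.
CLASS_COUNTS = {"Rep1": 180, "Rep2": 150, "Rep3": 200, "Rep4": 250, "NonRep": 200}


def rep_class(pattern_length):
    """Map a repetitive-pattern length to a class.

    Assumption: boundary values 60 and 30 are assigned to Rep2 and Rep3;
    the source does not specify this.
    """
    if pattern_length > 100:
        return "Rep1"
    if pattern_length >= 60:
        return "Rep2"
    if pattern_length >= 30:
        return "Rep3"
    return "Rep4"


def split_sizes(total, train_percent=80):
    """Return (training, testing) image counts using integer arithmetic."""
    n_train = total * train_percent // 100
    return n_train, total - n_train


total_images = sum(CLASS_COUNTS.values())
print(total_images, split_sizes(total_images))
# Example pattern lengths of 89, 42 and 19 bp, used purely as illustration.
print(rep_class(89), rep_class(42), rep_class(19))
```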
<p id="par0495">
<xref rid="fig0085" ref-type="fig">Fig. 17</xref>
shows the classification results for the four repetitive DNA classes (images with repetitive patterns) against the one class of non-repetitive DNA (images with no repetitive patterns).
<fig id="fig0085">
<label>Fig. 17</label>
<caption>
<p>Confusion matrix obtained by our classification system.</p>
</caption>
<alt-text id="at0370">Fig. 17</alt-text>
<graphic xlink:href="gr17_lrg"></graphic>
</fig>
</p>
<p id="par0500">With the CNN model, we distinguished different specific types of DNA images. The per-class score ranges from 89% to 100%, and the obtained results yield an average score of 94.4%.</p>
<p id="par0505">The confusion matrix of the classification rates confirms that our system is efficient in distinguishing between small repetitive patterns (Rep4) and non-repetitive DNA sequences (NonRep), with a score of 100%. This result is expected, since the scalogram images of these two classes are very different.</p>
<p id="par0510">
<xref rid="tbl0050" ref-type="table">Table 10</xref>
reports the three measures we used to evaluate our classification system: precision, recall and F1-score.
<table-wrap position="float" id="tbl0050">
<label>Table 10</label>
<caption>
<p>Evaluation measurements of our classification system.</p>
</caption>
<alt-text id="at0420">Table 10</alt-text>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Class</th>
<th align="left">Precision</th>
<th align="left">Recall</th>
<th align="left">F1-score</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Rep1</td>
<td align="left">94</td>
<td align="left">92</td>
<td align="left">93</td>
</tr>
<tr>
<td align="left">Rep2</td>
<td align="left">89</td>
<td align="left">89</td>
<td align="left">89</td>
</tr>
<tr>
<td align="left">Rep3</td>
<td align="left">88</td>
<td align="left">91</td>
<td align="left">90</td>
</tr>
<tr>
<td align="left">Rep4</td>
<td align="left">100</td>
<td align="left">100</td>
<td align="left">100</td>
</tr>
<tr>
<td align="left">NonRep</td>
<td align="left">100</td>
<td align="left">100</td>
<td align="left">100</td>
</tr>
<tr>
<td align="left">avg/ total</td>
<td align="left">94.2</td>
<td align="left">94.4</td>
<td align="left">94.4</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
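<p>The last row of Table 10 is consistent with an unweighted (macro) average of the five per-class values. The sketch below recomputes it from the table; the name macro_average is hypothetical, and whether the averages were originally computed this way or weighted by class size is not stated, so this is a consistency check under that assumption.</p>

```python
# Per-class percentages copied from Table 10: (precision, recall, F1-score).
TABLE10 = {
    "Rep1": (94, 92, 93),
    "Rep2": (89, 89, 89),
    "Rep3": (88, 91, 90),
    "Rep4": (100, 100, 100),
    "NonRep": (100, 100, 100),
}


def macro_average(scores):
    """Unweighted mean of each metric across classes, rounded to one decimal."""
    columns = zip(*scores.values())
    return tuple(round(sum(col) / len(scores), 1) for col in columns)


precision_avg, recall_avg, f1_avg = macro_average(TABLE10)
print(precision_avg, recall_avg, f1_avg)
```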
<p id="par0515">Overall, our system gives good results in recognizing the four classes of new repetitive DNA sequences, with averages of 94.2% in precision and 94.4% in recall and F1-score.</p>
</sec>
</sec>
<sec id="sec0075">
<label>4</label>
<title>Conclusion</title>
<p id="par0520">Improving genetic knowledge of the human genome is a complex and continuous research process. To contribute to this process, bioinformatics, signal processing and image processing tools have been applied to reveal hidden spectral features of DNA sequences. Although repetitive DNA sequences occupy 40% of the
<italic>Human</italic>
genome, their localization remains incomplete because it is a very difficult task.</p>
<p id="par0525">In this paper, we proposed a new algorithm based on signal and image processing tools to extract, from DNA images, the repetitive patterns that correspond to repetitive DNA sequences. The main goal is to create a new database that contains the locations of all the newly discovered repetitive sequences. As an example of the obtained results, we found a new modified repetitive sequence, “Rseq7”, that can characterize the 60S ribosomal protein. Deeper studies that may give a biological interpretation of these results would therefore be welcome.</p>
<p id="par0530">We also proposed a novel and highly effective method for DNA image prediction based on a CNN model. In our prediction system, the accuracy scores obtained over 100-fold cross-validation ranged from 89% to 100%, with an overall score of 94.4%.</p>
<p id="par0535">On behalf of all authors, the corresponding author states that there is no conflict of interest.</p>
</sec>
<sec id="sec0080">
<title>Declaration of Competing Interest</title>
<p id="par0540">The authors declare that there is no conflict of interest and that there are no competing interests regarding the publication of this paper.</p>
</sec>
<sec id="sec00005">
<title>CRediT authorship contribution statement</title>
<p id="par00005">
<bold>Rabeb Touati:</bold>
Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Validation, Visualization, Writing - original draft, Writing - review & editing.
<bold>Asma Tajouri:</bold>
Writing - review & editing, Validation.
<bold>Imen Mesaoudi:</bold>
Writing - review & editing.
<bold>Afef Elloumi Oueslati:</bold>
Validation, Formal analysis.
<bold>Zied Lachiri:</bold>
Validation.
<bold>Maher Kharrat:</bold>
Conceptualization, Supervision.</p>
</sec>
</body>
<back>
<ref-list id="bibl0005">
<title>References</title>
<ref id="bib0005">
<label>1</label>
<element-citation publication-type="journal" id="sbref0005">
<person-group person-group-type="author">
<name>
<surname>Venter</surname>
<given-names>J.C.</given-names>
</name>
<name>
<surname>Adams</surname>
<given-names>M.D.</given-names>
</name>
<name>
<surname>Myers</surname>
<given-names>E.W.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>P.W.</given-names>
</name>
<name>
<surname>Mural</surname>
<given-names>R.J.</given-names>
</name>
<name>
<surname>Sutton</surname>
<given-names>G.G.</given-names>
</name>
<name>
<surname>Gocayne</surname>
<given-names>J.D.</given-names>
</name>
</person-group>
<article-title>The sequence of the human genome</article-title>
<source>Science</source>
<volume>291</volume>
<issue>5507</issue>
<year>2001</year>
<fpage>1304</fpage>
<lpage>1351</lpage>
<pub-id pub-id-type="pmid">11181995</pub-id>
</element-citation>
</ref>
<ref id="bib0010">
<label>2</label>
<element-citation publication-type="journal" id="sbref0010">
<person-group person-group-type="author">
<name>
<surname>de Freitas</surname>
<given-names>N.L.</given-names>
</name>
<name>
<surname>Al-Rikabi</surname>
<given-names>A.B.</given-names>
</name>
<name>
<surname>Bertollo</surname>
<given-names>L.A.C.</given-names>
</name>
<name>
<surname>Ezaz</surname>
<given-names>T.</given-names>
</name>
<name>
<surname>Yano</surname>
<given-names>C.F.</given-names>
</name>
<name>
<surname>de Oliveira</surname>
<given-names>E.A.</given-names>
</name>
<name>
<surname>de Bello Cioffi</surname>
<given-names>M.</given-names>
</name>
</person-group>
<article-title>Early stages of XY sex chromosomes differentiation in the fish Hoplias malabaricus (Characiformes, Erythrinidae) revealed by DNA repeats accumulation</article-title>
<source>Curr. Genomics</source>
<volume>19</volume>
<issue>3</issue>
<year>2018</year>
<fpage>216</fpage>
<lpage>226</lpage>
<pub-id pub-id-type="pmid">29606909</pub-id>
</element-citation>
</ref>
<ref id="bib0015">
<label>3</label>
<element-citation publication-type="journal" id="sbref0015">
<person-group person-group-type="author">
<name>
<surname>Ramel</surname>
<given-names>C.</given-names>
</name>
</person-group>
<article-title>Mini- and microsatellites</article-title>
<source>Environ. Health Perspect.</source>
<volume>105</volume>
<issue>suppl 4</issue>
<year>1997</year>
<fpage>781</fpage>
<lpage>789</lpage>
<pub-id pub-id-type="pmid">9255562</pub-id>
</element-citation>
</ref>
<ref id="bib0020">
<label>4</label>
<element-citation publication-type="book" id="sbref0020">
<person-group person-group-type="author">
<name>
<surname>Biscotti</surname>
<given-names>M.A.</given-names>
</name>
<name>
<surname>Olmo</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Heslop-Harrison</surname>
<given-names>J.P.</given-names>
</name>
</person-group>
<chapter-title>Repetitive DNA in Eukaryotic Genomes</chapter-title>
<year>2015</year>
</element-citation>
</ref>
<ref id="bib0025">
<label>5</label>
<element-citation publication-type="journal" id="sbref0025">
<person-group person-group-type="author">
<name>
<surname>Treangen</surname>
<given-names>T.J.</given-names>
</name>
<name>
<surname>Salzberg</surname>
<given-names>S.L.</given-names>
</name>
</person-group>
<article-title>Repetitive DNA and next-generation sequencing: computational challenges and solutions</article-title>
<source>Nat. Rev. Genet.</source>
<volume>13</volume>
<issue>1</issue>
<year>2012</year>
<fpage>36</fpage>
</element-citation>
</ref>
<ref id="bib0030">
<label>6</label>
<element-citation publication-type="journal" id="sbref0030">
<person-group person-group-type="author">
<name>
<surname>Jabs</surname>
<given-names>E.W.</given-names>
</name>
<name>
<surname>Persico</surname>
<given-names>M.G.</given-names>
</name>
</person-group>
<article-title>Characterization of human centromeric regions of specific chromosomes by means of alphoid DNA sequences</article-title>
<source>Am. J. Hum. Genet.</source>
<volume>41</volume>
<year>1987</year>
<fpage>374</fpage>
<lpage>390</lpage>
<pub-id pub-id-type="pmid">3631075</pub-id>
</element-citation>
</ref>
<ref id="bib0035">
<label>7</label>
<element-citation publication-type="journal" id="sbref0035">
<person-group person-group-type="author">
<name>
<surname>Blackburn</surname>
<given-names>E.H.</given-names>
</name>
<name>
<surname>Gall</surname>
<given-names>J.G.</given-names>
</name>
</person-group>
<article-title>A tandemly repeated sequence at the termini of the extrachromosomal ribosomal RNA genes in Tetrahymena</article-title>
<source>J. Mol. Biol.</source>
<volume>120</volume>
<year>1978</year>
<fpage>33</fpage>
<lpage>53</lpage>
<pub-id pub-id-type="pmid">642006</pub-id>
</element-citation>
</ref>
<ref id="bib0040">
<label>8</label>
<element-citation publication-type="journal" id="sbref0040">
<person-group person-group-type="author">
<name>
<surname>Stewart</surname>
<given-names>J.A.</given-names>
</name>
<name>
<surname>Chaiken</surname>
<given-names>M.F.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>F.</given-names>
</name>
<name>
<surname>Price</surname>
<given-names>C.M.</given-names>
</name>
</person-group>
<article-title>Maintaining the end: roles of telomere proteins in end-protection, telomere replication and length regulation</article-title>
<source>Mutat. Res. Mol. Mech. Mutagen.</source>
<volume>730</volume>
<issue>1-2</issue>
<year>2012</year>
<fpage>12</fpage>
<lpage>19</lpage>
</element-citation>
</ref>
<ref id="bib0045">
<label>9</label>
<element-citation publication-type="journal" id="sbref0045">
<person-group person-group-type="author">
<name>
<surname>Moyzis</surname>
<given-names>R.K.</given-names>
</name>
<name>
<surname>Buckingham</surname>
<given-names>J.M.</given-names>
</name>
<name>
<surname>Cram</surname>
<given-names>L.S.</given-names>
</name>
<name>
<surname>Dani</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Deaven</surname>
<given-names>L.L.</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>M.D.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>J.R.</given-names>
</name>
</person-group>
<article-title>A highly conserved repetitive DNA sequence, (TTAGGG)n, present at the telomeres of human chromosomes</article-title>
<source>Proc. Natl. Acad. Sci.</source>
<volume>85</volume>
<issue>18</issue>
<year>1988</year>
<fpage>6622</fpage>
<lpage>6626</lpage>
<pub-id pub-id-type="pmid">3413114</pub-id>
</element-citation>
</ref>
<ref id="bib0050">
<label>10</label>
<element-citation publication-type="journal" id="sbref0050">
<person-group person-group-type="author">
<name>
<surname>Zakian</surname>
<given-names>V.A.</given-names>
</name>
</person-group>
<article-title>Structure and function of telomeres</article-title>
<source>Annu. Rev. Genet.</source>
<volume>23</volume>
<year>1989</year>
<fpage>579</fpage>
<lpage>604</lpage>
<pub-id pub-id-type="pmid">2694944</pub-id>
</element-citation>
</ref>
<ref id="bib0055">
<label>11</label>
<element-citation publication-type="journal" id="sbref0055">
<person-group person-group-type="author">
<name>
<surname>Peng</surname>
<given-names>J.C.</given-names>
</name>
<name>
<surname>Karpen</surname>
<given-names>G.H.</given-names>
</name>
</person-group>
<article-title>Epigenetic regulation of heterochromatic DNA stability</article-title>
<source>Curr. Opin. Genet. Dev.</source>
<volume>18</volume>
<issue>2</issue>
<year>2008</year>
<fpage>204</fpage>
<lpage>211</lpage>
<pub-id pub-id-type="pmid">18372168</pub-id>
</element-citation>
</ref>
<ref id="bib0060">
<label>12</label>
<element-citation publication-type="journal" id="sbref0060">
<person-group person-group-type="author">
<name>
<surname>Lim</surname>
<given-names>Kian Guan</given-names>
</name>
<name>
<surname>Kwoh</surname>
<given-names>Chee Keong</given-names>
</name>
<name>
<surname>Hsu</surname>
<given-names>Li Yang</given-names>
</name>
</person-group>
<article-title>Review of tandem repeat search tools: a systematic approach to evaluating algorithmic performance</article-title>
<source>Brief. Bioinformatics</source>
<volume>14</volume>
<issue>1</issue>
<year>2012</year>
<fpage>67</fpage>
<lpage>81</lpage>
<pub-id pub-id-type="pmid">22648964</pub-id>
</element-citation>
</ref>
<ref id="bib0065">
<label>13</label>
<element-citation publication-type="journal" id="sbref0065">
<person-group person-group-type="author">
<name>
<surname>Thiel</surname>
<given-names>Teresa</given-names>
</name>
<name>
<surname>Michalek</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Varshney</surname>
<given-names>R.</given-names>
</name>
</person-group>
<article-title>Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.)</article-title>
<source>Theor. Appl. Genet.</source>
<volume>106</volume>
<issue>3</issue>
<year>2003</year>
<fpage>411</fpage>
<lpage>422</lpage>
<pub-id pub-id-type="pmid">12589540</pub-id>
</element-citation>
</ref>
<ref id="bib0070">
<label>14</label>
<element-citation publication-type="journal" id="sbref0070">
<person-group person-group-type="author">
<name>
<surname>Kolpakov</surname>
<given-names>Roman</given-names>
</name>
<name>
<surname>Bana</surname>
<given-names>Ghizlane</given-names>
</name>
<name>
<surname>Kucherov</surname>
<given-names>Gregory</given-names>
</name>
</person-group>
<article-title>mreps: efficient and flexible detection of tandem repeats in DNA</article-title>
<source>Nucleic Acids Res.</source>
<volume>31</volume>
<issue>13</issue>
<year>2003</year>
<fpage>3672</fpage>
<lpage>3678</lpage>
<pub-id pub-id-type="pmid">12824391</pub-id>
</element-citation>
</ref>
<ref id="bib0075">
<label>15</label>
<element-citation publication-type="book" id="sbref0075">
<person-group person-group-type="author">
<name>
<surname>Abajian</surname>
<given-names>Chris</given-names>
</name>
</person-group>
<chapter-title>Sputnik - DNA Microsatellite Repeat Search Utility</chapter-title>
<year>1994</year>
</element-citation>
</ref>
<ref id="bib0080">
<label>16</label>
<element-citation publication-type="journal" id="sbref0080">
<person-group person-group-type="author">
<name>
<surname>Sarachu</surname>
<given-names>Martín</given-names>
</name>
<name>
<surname>Colet</surname>
<given-names>Marc</given-names>
</name>
</person-group>
<article-title>wEMBOSS: a web interface for EMBOSS</article-title>
<source>Bioinformatics</source>
<volume>21</volume>
<issue>4</issue>
<year>2004</year>
<fpage>540</fpage>
<lpage>541</lpage>
<pub-id pub-id-type="pmid">15388516</pub-id>
</element-citation>
</ref>
<ref id="bib0085">
<label>17</label>
<element-citation publication-type="journal" id="sbref0085">
<person-group person-group-type="author">
<name>
<surname>Benson</surname>
<given-names>Gary</given-names>
</name>
</person-group>
<article-title>Tandem repeats finder: a program to analyze DNA sequences</article-title>
<source>Nucleic Acids Res.</source>
<volume>27</volume>
<issue>2</issue>
<year>1999</year>
<fpage>573</fpage>
<lpage>580</lpage>
<pub-id pub-id-type="pmid">9862982</pub-id>
</element-citation>
</ref>
<ref id="bib0090">
<label>18</label>
<element-citation publication-type="journal" id="sbref0090">
<person-group person-group-type="author">
<name>
<surname>Tarailo‐Graovac</surname>
<given-names>Maja</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>Nansheng</given-names>
</name>
</person-group>
<article-title>Using RepeatMasker to identify repetitive elements in genomic sequences</article-title>
<source>Curr. Protoc. Bioinformatics</source>
<volume>25</volume>
<issue>1</issue>
<year>2009</year>
<fpage>4.10.1</fpage>
<lpage>4.10.14</lpage>
</element-citation>
</ref>
<ref id="bib0095">
<label>19</label>
<element-citation publication-type="journal" id="sbref0095">
<person-group person-group-type="author">
<name>
<surname>Flicek</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Birney</surname>
<given-names>E.</given-names>
</name>
</person-group>
<article-title>Sense from sequence reads: methods for alignment and assembly</article-title>
<source>Nat. Methods</source>
<volume>6</volume>
<issue>11s</issue>
<year>2009</year>
<fpage>S6</fpage>
<pub-id pub-id-type="pmid">19844229</pub-id>
</element-citation>
</ref>
<ref id="bib0100">
<label>20</label>
<element-citation publication-type="journal" id="sbref0100">
<person-group person-group-type="author">
<name>
<surname>de Koning</surname>
<given-names>A.J.</given-names>
</name>
<name>
<surname>Gu</surname>
<given-names>W.</given-names>
</name>
<name>
<surname>Castoe</surname>
<given-names>T.A.</given-names>
</name>
<name>
<surname>Batzer</surname>
<given-names>M.A.</given-names>
</name>
<name>
<surname>Pollock</surname>
<given-names>D.D.</given-names>
</name>
</person-group>
<article-title>Repetitive elements may comprise over two-thirds of the human genome</article-title>
<source>PLoS Genet.</source>
<volume>7</volume>
<issue>12</issue>
<year>2011</year>
<object-id pub-id-type="publisher-id">e1002384</object-id>
</element-citation>
</ref>
<ref id="bib0105">
<label>21</label>
<element-citation publication-type="book" id="sbref0105">
<person-group person-group-type="author">
<name>
<surname>The NCBI</surname>
</name>
</person-group>
<chapter-title>GenBank Database</chapter-title>
<year>2019</year>
<comment>Available:
<ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/Genbank/" id="intr0010">http://www.ncbi.nlm.nih.gov/Genbank/</ext-link>
(Accessed 1 September 2019).</comment>
</element-citation>
</ref>
<ref id="bib0110">
<label>22</label>
<element-citation publication-type="journal" id="sbref0110">
<person-group person-group-type="author">
<name>
<surname>Venter</surname>
<given-names>J.C.</given-names>
</name>
<name>
<surname>Adams</surname>
<given-names>M.D.</given-names>
</name>
<name>
<surname>Myers</surname>
<given-names>E.W.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>P.W.</given-names>
</name>
<name>
<surname>Mural</surname>
<given-names>R.J.</given-names>
</name>
<name>
<surname>Sutton</surname>
<given-names>G.G.</given-names>
</name>
<name>
<surname>Gocayne</surname>
<given-names>J.D.</given-names>
</name>
</person-group>
<article-title>The sequence of the human genome</article-title>
<source>Science</source>
<volume>291</volume>
<issue>5507</issue>
<year>2001</year>
<fpage>1304</fpage>
<lpage>1351</lpage>
<pub-id pub-id-type="pmid">11181995</pub-id>
</element-citation>
</ref>
<ref id="bib0115">
<label>23</label>
<element-citation publication-type="journal" id="sbref0115">
<person-group person-group-type="author">
<name>
<surname>Touati</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Haddad-Boubaker</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Ferchichi</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Messaoudi</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Ouesleti</surname>
<given-names>A.E.</given-names>
</name>
<name>
<surname>Triki</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Kharrat</surname>
<given-names>M.</given-names>
</name>
</person-group>
<article-title>Comparative genomic signature representations of the emerging COVID-19 coronavirus and other coronaviruses: high identity and possible recombination between Bat and Pangolin coronaviruses</article-title>
<source>Genomics</source>
<volume>112</volume>
<issue>6</issue>
<year>2020</year>
<fpage>4189</fpage>
<lpage>4202</lpage>
<pub-id pub-id-type="pmid">32645523</pub-id>
</element-citation>
</ref>
<ref id="bib0120">
<label>24</label>
<element-citation publication-type="journal" id="sbref0120">
<person-group person-group-type="author">
<name>
<surname>Touati</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Oueslati</surname>
<given-names>A.E.</given-names>
</name>
<name>
<surname>Messaoudi</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Lachiri</surname>
<given-names>Z.</given-names>
</name>
</person-group>
<article-title>The Helitron family classification using SVM based on Fourier transform features applied on an unbalanced dataset</article-title>
<source>Med. Biol. Eng. Comput.</source>
<volume>57</volume>
<issue>10</issue>
<year>2019</year>
<fpage>2289</fpage>
<lpage>2304</lpage>
<pub-id pub-id-type="pmid">31422557</pub-id>
</element-citation>
</ref>
<ref id="bib0125">
<label>25</label>
<element-citation publication-type="journal" id="sbref0125">
<person-group person-group-type="author">
<name>
<surname>Buchner</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Janjarasjitt</surname>
<given-names>S.</given-names>
</name>
</person-group>
<article-title>Detection and visualization of tandem repeats in DNA sequences</article-title>
<source>IEEE Trans. Signal Process.</source>
<volume>51</volume>
<issue>9</issue>
<year>2003</year>
<fpage>2280</fpage>
<lpage>2287</lpage>
</element-citation>
</ref>
<ref id="bib0130">
<label>26</label>
<element-citation publication-type="journal" id="sbref0130">
<person-group person-group-type="author">
<name>
<surname>Sharma</surname>
<given-names>S.D.</given-names>
</name>
<name>
<surname>Sharma</surname>
<given-names>S.N.</given-names>
</name>
<name>
<surname>Saxena</surname>
<given-names>R.</given-names>
</name>
</person-group>
<article-title>Identification of short exons disunited by a short intron in eukaryotic DNA regions</article-title>
<source>IEEE/ACM Trans. Comput. Biol. Bioinform.</source>
<year>2019</year>
</element-citation>
</ref>
<ref id="bib0135">
<label>27</label>
<element-citation publication-type="journal" id="sbref0135">
<person-group person-group-type="author">
<name>
<surname>Chechetkin</surname>
<given-names>V.R.</given-names>
</name>
<name>
<surname>Turygin</surname>
<given-names>A.Y.</given-names>
</name>
</person-group>
<article-title>Search of hidden periodicities in DNA sequences</article-title>
<source>J. Theor. Biol.</source>
<volume>175</volume>
<issue>4</issue>
<year>1995</year>
<fpage>477</fpage>
<lpage>494</lpage>
<pub-id pub-id-type="pmid">7475085</pub-id>
</element-citation>
</ref>
<ref id="bib0140">
<label>28</label>
<element-citation publication-type="journal" id="sbref0140">
<person-group person-group-type="author">
<name>
<surname>Sharma</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Issac</surname>
<given-names>B.</given-names>
</name>
<name>
<surname>Raghava</surname>
<given-names>G.P.S.</given-names>
</name>
<name>
<surname>Ramaswamy</surname>
<given-names>R.</given-names>
</name>
</person-group>
<article-title>Spectral Repeat Finder (SRF): identification of repetitive sequences using Fourier transformation</article-title>
<source>Bioinformatics</source>
<volume>20</volume>
<issue>9</issue>
<year>2004</year>
<fpage>1405</fpage>
<lpage>1412</lpage>
<pub-id pub-id-type="pmid">14976032</pub-id>
</element-citation>
</ref>
<ref id="bib0145">
<label>29</label>
<element-citation publication-type="journal" id="sbref0145">
<person-group person-group-type="author">
<name>
<surname>Touati</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Messaoudi</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Oueslati</surname>
<given-names>A.E.</given-names>
</name>
<name>
<surname>Lachiri</surname>
<given-names>Z.</given-names>
</name>
</person-group>
<article-title>Helitron’s periodicities identification in C. elegans based on the smoothed spectral analysis and the frequency chaos game signal coding</article-title>
<source>Int J Adv Comput Sci Appl</source>
<volume>9</volume>
<issue>4</issue>
<year>2018</year>
</element-citation>
</ref>
<ref id="bib0150">
<label>30</label>
<element-citation publication-type="journal" id="sbref0150">
<person-group person-group-type="author">
<name>
<surname>Touati</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Messaoudi</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Oueslati</surname>
<given-names>A.E.</given-names>
</name>
<name>
<surname>Lachiri</surname>
<given-names>Z.</given-names>
</name>
</person-group>
<article-title>A combined support vector machine-FCGS classification based on the wavelet transform for Helitrons recognition in C. elegans</article-title>
<source>Multimed. Tools Appl.</source>
<volume>78</volume>
<issue>10</issue>
<year>2019</year>
<fpage>13047</fpage>
<lpage>13066</lpage>
</element-citation>
</ref>
<ref id="bib0155">
<label>31</label>
<element-citation publication-type="journal" id="sbref0155">
<person-group person-group-type="author">
<name>
<surname>Touati</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Messaoudi</surname>
<given-names>I.</given-names>
</name>
<name>
<surname>Oueslati</surname>
<given-names>A.E.</given-names>
</name>
<name>
<surname>Lachiri</surname>
<given-names>Z.</given-names>
</name>
</person-group>
<article-title>Distinguishing between intra-genomic helitron families using time-frequency features and random forest approaches</article-title>
<source>Biomed. Signal Process. Control</source>
<volume>54</volume>
<year>2019</year>
<object-id pub-id-type="publisher-id">101579</object-id>
</element-citation>
</ref>
<ref id="bib0160">
<label>32</label>
<element-citation publication-type="journal" id="sbref0160">
<person-group person-group-type="author">
<name>
<surname>Grossmann</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Morlet</surname>
<given-names>J.</given-names>
</name>
</person-group>
<article-title>Decomposition of Hardy functions into square integrable wavelets of constant shape</article-title>
<source>SIAM J. Math. Anal.</source>
<volume>15</volume>
<issue>4</issue>
<year>1984</year>
<fpage>723</fpage>
<lpage>736</lpage>
</element-citation>
</ref>
<ref id="bib0165">
<label>33</label>
<element-citation publication-type="journal" id="sbref0165">
<person-group person-group-type="author">
<name>
<surname>Merry</surname>
<given-names>R.J.E.</given-names>
</name>
</person-group>
<article-title>Wavelet theory and applications: a literature study</article-title>
<source>DCT rapporten</source>
<volume>2005</volume>
<year>2005</year>
</element-citation>
</ref>
<ref id="bib0170">
<label>34</label>
<element-citation publication-type="journal" id="sbref0170">
<person-group person-group-type="author">
<name>
<surname>Najmi</surname>
<given-names>A.H.</given-names>
</name>
<name>
<surname>Sadowsky</surname>
<given-names>J.</given-names>
</name>
</person-group>
<article-title>The continuous wavelet transform and variable resolution time-frequency analysis</article-title>
<source>Johns Hopkins APL Tech. Dig.</source>
<volume>18</volume>
<issue>1</issue>
<year>1997</year>
<fpage>134</fpage>
<lpage>140</lpage>
</element-citation>
</ref>
<ref id="bib0175">
<label>35</label>
<element-citation publication-type="journal" id="sbref0175">
<person-group person-group-type="author">
<name>
<surname>Kumar</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Saxena</surname>
<given-names>R.</given-names>
</name>
</person-group>
<article-title>Algorithm and technique on various edge detection: a survey</article-title>
<source>Signal &amp; Image Processing</source>
<volume>4</volume>
<issue>3</issue>
<year>2013</year>
<fpage>65</fpage>
</element-citation>
</ref>
<ref id="bib0180">
<label>36</label>
<element-citation publication-type="book" id="sbref0180">
<person-group person-group-type="author">
<name>
<surname>Sahni</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Mittal</surname>
<given-names>N.</given-names>
</name>
</person-group>
<chapter-title>Breast cancer detection using image processing techniques</chapter-title>
<source>Advances in Interdisciplinary Engineering</source>
<year>2019</year>
<publisher-name>Springer</publisher-name>
<publisher-loc>Singapore</publisher-loc>
<fpage>813</fpage>
<lpage>823</lpage>
</element-citation>
</ref>
<ref id="bib0185">
<label>37</label>
<element-citation publication-type="journal" id="sbref0185">
<person-group person-group-type="author">
<name>
<surname>Canny</surname>
<given-names>J.</given-names>
</name>
</person-group>
<article-title>A computational approach to edge detection</article-title>
<source>IEEE Trans. Pattern Anal. Mach. Intell.</source>
<volume>6</volume>
<year>1986</year>
<fpage>679</fpage>
<lpage>698</lpage>
</element-citation>
</ref>
<ref id="bib0190">
<label>38</label>
<element-citation publication-type="journal" id="sbref0190">
<person-group person-group-type="author">
<name>
<surname>Bao</surname>
<given-names>P.</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>X.</given-names>
</name>
</person-group>
<article-title>Canny edge detection enhancement by scale multiplication</article-title>
<source>IEEE Trans. Pattern Anal. Mach. Intell.</source>
<volume>27</volume>
<issue>9</issue>
<year>2005</year>
<fpage>1485</fpage>
<lpage>1490</lpage>
<pub-id pub-id-type="pmid">16173190</pub-id>
</element-citation>
</ref>
<ref id="bib0195">
<label>39</label>
<element-citation publication-type="book" id="sbref0195">
<person-group person-group-type="author">
<name>
<surname>Soille</surname>
<given-names>P.</given-names>
</name>
</person-group>
<chapter-title>Morphological Image Analysis: Principles and Applications</chapter-title>
<year>2013</year>
<publisher-name>Springer Science &amp; Business Media</publisher-name>
</element-citation>
</ref>
<ref id="bib0200">
<label>40</label>
<element-citation publication-type="journal" id="sbref0200">
<person-group person-group-type="author">
<name>
<surname>Kent</surname>
<given-names>W.J.</given-names>
</name>
</person-group>
<article-title>BLAT—the BLAST-like alignment tool</article-title>
<source>Genome Res.</source>
<volume>12</volume>
<issue>4</issue>
<year>2002</year>
<fpage>656</fpage>
<lpage>664</lpage>
<comment>Available 2019,
<ext-link ext-link-type="uri" xlink:href="https://genome.ucsc.edu" id="intr0015">https://genome.ucsc.edu</ext-link>
(Accessed 2019).</comment>
<pub-id pub-id-type="pmid">11932250</pub-id>
</element-citation>
</ref>
<ref id="bib0205">
<label>41</label>
<element-citation publication-type="journal" id="sbref0205">
<person-group person-group-type="author">
<name>
<surname>Wheeler</surname>
<given-names>T.J.</given-names>
</name>
<name>
<surname>Clements</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Eddy</surname>
<given-names>S.R.</given-names>
</name>
<name>
<surname>Hubley</surname>
<given-names>R.</given-names>
</name>
<name>
<surname>Jones</surname>
<given-names>T.A.</given-names>
</name>
<name>
<surname>Jurka</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Finn</surname>
<given-names>R.D.</given-names>
</name>
</person-group>
<article-title>Dfam: a database of repetitive DNA based on profile hidden Markov models</article-title>
<source>Nucleic Acids Res.</source>
<volume>41</volume>
<issue>D1</issue>
<year>2012</year>
<fpage>D70</fpage>
<lpage>D82</lpage>
<comment>(Accessed 2019)</comment>
<ext-link ext-link-type="uri" xlink:href="http://www.dfam.org/home" id="intr0020">http://www.dfam.org/home</ext-link>
<pub-id pub-id-type="pmid">23203985</pub-id>
</element-citation>
</ref>
<ref id="bib0210">
<label>42</label>
<element-citation publication-type="journal" id="sbref0210">
<person-group person-group-type="author">
<name>
<surname>LeCun</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Bottou</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Bengio</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Haffner</surname>
<given-names>P.</given-names>
</name>
</person-group>
<article-title>Gradient-based learning applied to document recognition</article-title>
<source>Proc. IEEE</source>
<volume>86</volume>
<issue>11</issue>
<year>1998</year>
<fpage>2278</fpage>
<lpage>2324</lpage>
</element-citation>
</ref>
<ref id="bib0215">
<label>43</label>
<element-citation publication-type="journal" id="sbref0215">
<person-group person-group-type="author">
<name>
<surname>Abd–Alhalem</surname>
<given-names>S.M.</given-names>
</name>
<name>
<surname>Soliman</surname>
<given-names>N.F.</given-names>
</name>
<name>
<surname>Eldin</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Abd Elrahman</surname>
<given-names>S.E.</given-names>
</name>
<name>
<surname>Ismail</surname>
<given-names>N.A.</given-names>
</name>
<name>
<surname>El-Rabaie</surname>
<given-names>E.S.M.</given-names>
</name>
<name>
<surname>El-Samie</surname>
<given-names>F.E.A.</given-names>
</name>
</person-group>
<article-title>Bacterial classification with convolutional neural networks based on different data reduction layers</article-title>
<source>Nucleosides Nucleotides Nucleic Acids</source>
<year>2019</year>
<fpage>1</fpage>
<lpage>11</lpage>
<pub-id pub-id-type="pmid">30587086</pub-id>
</element-citation>
</ref>
<ref id="bib0220">
<label>44</label>
<element-citation publication-type="journal" id="sbref0220">
<person-group person-group-type="author">
<name>
<surname>Zeng</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Edwards</surname>
<given-names>M.D.</given-names>
</name>
<name>
<surname>Liu</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Gifford</surname>
<given-names>D.K.</given-names>
</name>
</person-group>
<article-title>Convolutional neural network architectures for predicting DNA–protein binding</article-title>
<source>Bioinformatics</source>
<volume>32</volume>
<issue>12</issue>
<year>2016</year>
<fpage>i121</fpage>
<lpage>i127</lpage>
<pub-id pub-id-type="pmid">27307608</pub-id>
</element-citation>
</ref>
<ref id="bib0225">
<label>45</label>
<element-citation publication-type="journal" id="sbref0225">
<person-group person-group-type="author">
<name>
<surname>Al-Ajlan</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>El Allali</surname>
<given-names>A.</given-names>
</name>
</person-group>
<article-title>CNN-MGP: convolutional neural networks for metagenomics gene prediction</article-title>
<source>Interdiscip. Sci.</source>
<volume>11</volume>
<issue>4</issue>
<year>2019</year>
<fpage>628</fpage>
<lpage>635</lpage>
<pub-id pub-id-type="pmid">30588558</pub-id>
</element-citation>
</ref>
<ref id="bib0230">
<label>46</label>
<element-citation publication-type="journal" id="sbref0230">
<person-group person-group-type="author">
<name>
<surname>Elbashir</surname>
<given-names>M.K.</given-names>
</name>
<name>
<surname>Ezz</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Mohammed</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Saloum</surname>
<given-names>S.S.</given-names>
</name>
</person-group>
<article-title>Lightweight convolutional neural network for breast cancer classification using RNA-Seq gene expression data</article-title>
<source>IEEE Access</source>
<volume>7</volume>
<year>2019</year>
<fpage>185338</fpage>
<lpage>185348</lpage>
</element-citation>
</ref>
<ref id="bib0235">
<label>47</label>
<element-citation publication-type="journal" id="sbref0235">
<person-group person-group-type="author">
<name>
<surname>Zhou</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Luo</surname>
<given-names>L.Y.</given-names>
</name>
<name>
<surname>Dou</surname>
<given-names>Q.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>H.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>C.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>G.J.</given-names>
</name>
<name>
<surname>Heng</surname>
<given-names>P.A.</given-names>
</name>
</person-group>
<article-title>Weakly supervised 3D deep learning for breast cancer classification and localization of the lesions in MR images</article-title>
<source>J. Magn. Reson. Imaging</source>
<volume>50</volume>
<issue>4</issue>
<year>2019</year>
<fpage>1144</fpage>
<lpage>1151</lpage>
<pub-id pub-id-type="pmid">30924997</pub-id>
</element-citation>
</ref>
<ref id="bib0240">
<label>48</label>
<element-citation publication-type="journal" id="sbref0240">
<person-group person-group-type="author">
<name>
<surname>Ghoneim</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Muhammad</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Hossain</surname>
<given-names>M.S.</given-names>
</name>
</person-group>
<article-title>Cervical cancer classification using convolutional neural networks and extreme learning machines</article-title>
<source>Future Gener. Comput. Syst.</source>
<volume>102</volume>
<year>2020</year>
<fpage>643</fpage>
<lpage>649</lpage>
</element-citation>
</ref>
<ref id="bib0245">
<label>49</label>
<element-citation publication-type="journal" id="sbref0245">
<person-group person-group-type="author">
<name>
<surname>Porumb</surname>
<given-names>M.</given-names>
</name>
<name>
<surname>Iadanza</surname>
<given-names>E.</given-names>
</name>
<name>
<surname>Massaro</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Pecchia</surname>
<given-names>L.</given-names>
</name>
</person-group>
<article-title>A convolutional neural network approach to detect congestive heart failure</article-title>
<source>Biomed. Signal Process. Control</source>
<volume>55</volume>
<year>2020</year>
<object-id pub-id-type="publisher-id">101597</object-id>
</element-citation>
</ref>
<ref id="bib0250">
<label>50</label>
<element-citation publication-type="journal" id="sbref0250">
<person-group person-group-type="author">
<name>
<surname>Mukhopadhyay</surname>
<given-names>A.K.</given-names>
</name>
<name>
<surname>Samui</surname>
<given-names>S.</given-names>
</name>
</person-group>
<article-title>An experimental study on upper limb position invariant EMG signal classification based on deep neural network</article-title>
<source>Biomed. Signal Process. Control</source>
<volume>55</volume>
<year>2020</year>
<object-id pub-id-type="publisher-id">101669</object-id>
</element-citation>
</ref>
<ref id="bib0255">
<label>51</label>
<element-citation publication-type="journal" id="sbref0255">
<person-group person-group-type="author">
<name>
<surname>Kundu</surname>
<given-names>S.</given-names>
</name>
<name>
<surname>Ari</surname>
<given-names>S.</given-names>
</name>
</person-group>
<article-title>P300 based character recognition using convolutional neural network and support vector machine</article-title>
<source>Biomed. Signal Process. Control</source>
<volume>55</volume>
<year>2020</year>
<object-id pub-id-type="publisher-id">101645</object-id>
</element-citation>
</ref>
<ref id="bib0260">
<label>52</label>
<element-citation publication-type="journal" id="sbref0260">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>J.</given-names>
</name>
<name>
<surname>Fan</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Jian</surname>
<given-names>Z.</given-names>
</name>
<name>
<surname>Chen</surname>
<given-names>G.G.</given-names>
</name>
<name>
<surname>Lai</surname>
<given-names>P.B.</given-names>
</name>
</person-group>
<article-title>Cancer specific long noncoding RNAs show differential expression patterns and competing endogenous RNA potential in hepatocellular carcinoma</article-title>
<source>PLoS One</source>
<volume>10</volume>
<issue>10</issue>
<year>2015</year>
<object-id pub-id-type="publisher-id">e0141042</object-id>
</element-citation>
</ref>
<ref id="bib0265">
<label>53</label>
<element-citation publication-type="book" id="sbref0265">
<person-group person-group-type="author">
<name>
<surname>Kobayashi</surname>
<given-names>T.</given-names>
</name>
</person-group>
<chapter-title>Genome instability of repetitive sequence: lesson from the ribosomal RNA gene repeat</chapter-title>
<source>DNA Replication, Recombination, and Repair</source>
<year>2016</year>
<publisher-name>Springer</publisher-name>
<publisher-loc>Tokyo</publisher-loc>
<fpage>235</fpage>
<lpage>247</lpage>
</element-citation>
</ref>
<ref id="bib0270">
<label>54</label>
<element-citation publication-type="journal" id="sbref0270">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Luo</surname>
<given-names>Y.</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>G.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>L.</given-names>
</name>
<name>
<surname>Na</surname>
<given-names>D.</given-names>
</name>
<name>
<surname>Wu</surname>
<given-names>X.</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>L.</given-names>
</name>
</person-group>
<article-title>DHRSX, a novel non-classical secretory protein associated with starvation induced autophagy</article-title>
<source>Int. J. Med. Sci.</source>
<volume>11</volume>
<issue>9</issue>
<year>2014</year>
<fpage>962</fpage>
<pub-id pub-id-type="pmid">25076851</pub-id>
</element-citation>
</ref>
<ref id="bib0275">
<label>55</label>
<element-citation publication-type="journal" id="sbref0275">
<person-group person-group-type="author">
<name>
<surname>Brown</surname>
<given-names>W.R.</given-names>
</name>
<name>
<surname>MacKinnon</surname>
<given-names>P.J.</given-names>
</name>
<name>
<surname>Villasanté</surname>
<given-names>A.</given-names>
</name>
<name>
<surname>Spurr</surname>
<given-names>N.</given-names>
</name>
<name>
<surname>Buckle</surname>
<given-names>V.J.</given-names>
</name>
<name>
<surname>Dobson</surname>
<given-names>M.J.</given-names>
</name>
</person-group>
<article-title>Structure and polymorphism of human telomere-associated DNA</article-title>
<source>Cell</source>
<volume>63</volume>
<issue>1</issue>
<year>1990</year>
<fpage>119</fpage>
<lpage>132</lpage>
<pub-id pub-id-type="pmid">2208276</pub-id>
</element-citation>
</ref>
</ref-list>
<bio>
<graphic xlink:href="fx2_lrg"></graphic>
<p>
<bold>Rabeb Touati</bold>
: PhD, master's degree, and engineering degree in electrical engineering from the National Engineering School of Tunisia (ENIT). She currently holds a postdoctoral position at the Laboratory of Human Genetics (LR99ES10) at the Faculty of Medicine of Tunis. Her research interests include biomedical and genomic signal and image processing, bioinformatics, pattern recognition, and machine learning.</p>
</bio>
<bio>
<graphic xlink:href="fx3_lrg"></graphic>
<p>
<bold>Asma Tajouri</bold>
: PhD in Human Genetics from the Faculty of Medicine of Tunis. She has a Post-Doctoral position at the Laboratory of Human Genetics (LR99ES10) at the Faculty of Medicine of Tunis. Her research interests include Human Genetics.</p>
</bio>
<bio>
<graphic xlink:href="fx4_lrg"></graphic>
<p>
<bold>Imen Messaoudi</bold>
: PhD in electrical engineering from the National Engineering School of Tunisia. She is an Assistant Professor at the Higher Institute of Information Technologies and Communications, Carthage University. Her research interests include biomedical and genomic signal processing.</p>
</bio>
<bio>
<graphic xlink:href="fx5_lrg"></graphic>
<p>
<bold>Afef Elloumi Oueslati</bold>
: PhD in electrical engineering from the National Engineering School of Tunisia (ENIT). She is an Associate Professor at the National School of Engineers of Carthage (ENICarthage). Her research interests include signal and image processing applied to the biomedical and genomic fields.</p>
</bio>
<bio>
<graphic xlink:href="fx6_lrg"></graphic>
<p>
<bold>Zied Lachiri</bold>
: PhD in electrical engineering from the National Engineering School of Tunisia (ENIT). He is a Professor and Research Director in the Signal, Image and Information Technology laboratory (LR-SITI, ENIT). His research interests include pattern recognition and signal and image processing in biomedical, multimedia, and man-machine communication applications.</p>
</bio>
<bio>
<graphic xlink:href="fx7_lrg"></graphic>
<p>
<bold>Maher Kharrat</bold>
: PhD in Human Genetics from the Faculty of Medicine of Tunis (FMT). He is an Associate Professor and Research Director in the Human Genetics Laboratory (LR99ES10) at the Faculty of Medicine of Tunis (FMT), University of Tunis El Manar. He conducts research in the field of human genetics.</p>
</bio>
<sec id="sec0090" sec-type="supplementary-material">
<label>Appendix A</label>
<title>Supplementary data</title>
<p id="par0555">The following are Supplementary data to this article:
<supplementary-material content-type="local-data" id="upi0005">
<media xlink:href="mmc1.zip"></media>
</supplementary-material>
<supplementary-material content-type="local-data" id="upi0010">
<media xlink:href="mmc2.xlsx"></media>
</supplementary-material>
</p>
</sec>
<ack id="ack0005">
<title>Acknowledgment</title>
<p id="par0545">This study was funded by the
<funding-source id="gs0005">
<institution-wrap>
<institution-id institution-id-type="doi">10.13039/501100004562</institution-id>
<institution>Ministry of Higher Education and Research</institution>
</institution-wrap>
</funding-source>
, LR99ES10 Human Genetics Laboratory.</p>
</ack>
<fn-group>
<fn id="sec0085" fn-type="supplementary-material">
<label>Appendix A</label>
<p id="par0550">Supplementary material related to this article can be found, in the online version, at
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.bspc.2020.102207" id="intr0005">https://doi.org/10.1016/j.bspc.2020.102207</ext-link>
.</p>
</fn>
</fn-group>
</back>
</pmc>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Sante/explor/MaghrebDataLibMedV2/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 0001689 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 0001689 | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Sante
   |area=    MaghrebDataLibMedV2
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     
   |texte=   
}}

Wicri

This area was generated with Dilib version V0.6.38.
Data generation: Wed Jun 30 18:27:05 2021. Site generation: Wed Jun 30 18:34:21 2021