MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Internal identifier: 000F580 (Pmc/Corpus)



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">TAREAN: a computational tool for identification and characterization of satellite DNA from unassembled short reads</title>
<author>
<name sortKey="Novak, Petr" sort="Novak, Petr" uniqKey="Novak P" first="Petr" last="Novák">Petr Novák</name>
<affiliation>
<nlm:aff id="AFF1">Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Avila Robledillo, Laura" sort="Avila Robledillo, Laura" uniqKey="Avila Robledillo L" first="Laura" last="Ávila Robledillo">Laura Ávila Robledillo</name>
<affiliation>
<nlm:aff id="AFF1">Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Koblizkova, Andrea" sort="Koblizkova, Andrea" uniqKey="Koblizkova A" first="Andrea" last="Koblížková">Andrea Koblížková</name>
<affiliation>
<nlm:aff id="AFF1">Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Vrbova, Iva" sort="Vrbova, Iva" uniqKey="Vrbova I" first="Iva" last="Vrbová">Iva Vrbová</name>
<affiliation>
<nlm:aff id="AFF1">Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Neumann, Pavel" sort="Neumann, Pavel" uniqKey="Neumann P" first="Pavel" last="Neumann">Pavel Neumann</name>
<affiliation>
<nlm:aff id="AFF1">Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Macas, Jiri" sort="Macas, Jiri" uniqKey="Macas J" first="Jiří" last="Macas">Jiří Macas</name>
<affiliation>
<nlm:aff id="AFF1">Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">28402514</idno>
<idno type="pmc">5499541</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5499541</idno>
<idno type="RBID">PMC:5499541</idno>
<idno type="doi">10.1093/nar/gkx257</idno>
<date when="2017">2017</date>
<idno type="wicri:Area/Pmc/Corpus">000F58</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000F58</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">TAREAN: a computational tool for identification and characterization of satellite DNA from unassembled short reads</title>
<author>
<name sortKey="Novak, Petr" sort="Novak, Petr" uniqKey="Novak P" first="Petr" last="Novák">Petr Novák</name>
<affiliation>
<nlm:aff id="AFF1">Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Avila Robledillo, Laura" sort="Avila Robledillo, Laura" uniqKey="Avila Robledillo L" first="Laura" last="Ávila Robledillo">Laura Ávila Robledillo</name>
<affiliation>
<nlm:aff id="AFF1">Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Koblizkova, Andrea" sort="Koblizkova, Andrea" uniqKey="Koblizkova A" first="Andrea" last="Koblížková">Andrea Koblížková</name>
<affiliation>
<nlm:aff id="AFF1">Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Vrbova, Iva" sort="Vrbova, Iva" uniqKey="Vrbova I" first="Iva" last="Vrbová">Iva Vrbová</name>
<affiliation>
<nlm:aff id="AFF1">Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Neumann, Pavel" sort="Neumann, Pavel" uniqKey="Neumann P" first="Pavel" last="Neumann">Pavel Neumann</name>
<affiliation>
<nlm:aff id="AFF1">Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Macas, Jiri" sort="Macas, Jiri" uniqKey="Macas J" first="Jiří" last="Macas">Jiří Macas</name>
<affiliation>
<nlm:aff id="AFF1">Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Nucleic Acids Research</title>
<idno type="ISSN">0305-1048</idno>
<idno type="eISSN">1362-4962</idno>
<imprint>
<date when="2017">2017</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<title>Abstract</title>
<p>Satellite DNA is one of the major classes of repetitive DNA, characterized by tandemly arranged repeat copies that form contiguous arrays up to megabases in length. This type of genomic organization makes satellite DNA difficult to assemble, which hampers characterization of satellite sequences by computational analysis of genomic contigs. Here, we present tandem repeat analyzer (TAREAN), a novel computational pipeline that circumvents this problem by detecting satellite repeats directly from unassembled short reads. The pipeline first employs graph-based sequence clustering to identify groups of reads that represent repetitive elements. Putative satellite repeats are subsequently detected by the presence of circular structures in their cluster graphs. Consensus sequences of repeat monomers are then reconstructed from the most frequent <italic>k</italic>-mers obtained by decomposing read sequences from corresponding clusters. The pipeline performance was successfully validated by analyzing low-pass genome sequencing data from five plant species where satellite DNA was previously experimentally characterized. Moreover, novel satellite repeats were predicted for the genome of <italic>Vicia faba</italic> and three of these repeats were verified by detecting their sequences on metaphase chromosomes using fluorescence <italic>in situ</italic> hybridization.</p>
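<!--
The abstract states that putative satellite repeats are detected by the presence of circular structures in their cluster graphs. The Python sketch below illustrates that idea under a simplifying assumption that is not taken from this record: instead of TAREAN's read-similarity cluster graph, it builds a directed graph linking each k-mer to the next overlapping k-mer in a read (a de Bruijn-style graph) and reports the fraction of nodes that fall in the largest strongly connected component. Reads from a tandem array wrap around into one cycle, so the fraction approaches 1.0, while a non-repetitive sequence gives a simple path with a small fraction. The helper names kmer_graph and largest_scc_fraction are hypothetical.

```python
from collections import defaultdict


def kmer_graph(reads, k):
    """Hypothetical helper: link each k-mer to the k-mer that follows it in a read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            graph[read[i:i + k]].add(read[i + 1:i + k + 1])
    return graph


def largest_scc_fraction(graph):
    """Fraction of nodes in the largest strongly connected component (Kosaraju)."""
    nodes = set(graph)
    for targets in graph.values():
        nodes |= targets
    if not nodes:
        return 0.0

    # Pass 1: iterative DFS recording nodes in order of completion.
    visited, order = set(), []
    for root in nodes:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(graph.get(root, ())))]
        while stack:
            node, successors = stack[-1]
            nxt = next(successors, None)
            if nxt is None:
                stack.pop()
                order.append(node)
            elif nxt not in visited:
                visited.add(nxt)
                stack.append((nxt, iter(graph.get(nxt, ()))))

    # Pass 2: explore the reversed graph in reverse completion order;
    # each tree found this way is one strongly connected component.
    reverse = defaultdict(set)
    for src, targets in graph.items():
        for dst in targets:
            reverse[dst].add(src)
    assigned, largest = set(), 0
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        size, stack = 0, [root]
        while stack:
            node = stack.pop()
            size += 1
            for prev in reverse.get(node, ()):
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        largest = max(largest, size)
    return largest / len(nodes)


if __name__ == "__main__":
    tandem_reads = [("ACGTTGCA" * 5)[i:i + 20] for i in range(8)]
    print(largest_scc_fraction(kmer_graph(tandem_reads, 4)))  # 1.0
    print(largest_scc_fraction(kmer_graph(["ATGCCGTAAGCTTAGGCATC"], 4)))  # 1/17, about 0.059
```

A real pipeline would also have to merge reverse-complement k-mers and tolerate sequencing errors and monomer variants; the sketch ignores both to keep the circularity signal easy to see.
-->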
</div>
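<!--
The abstract also states that consensus sequences of repeat monomers are reconstructed from the most frequent k-mers obtained from the reads of a cluster. A minimal sketch, assuming a simple greedy walk that is not necessarily the procedure TAREAN uses: count all k-mers, seed with the most frequent one, repeatedly extend by the most frequent k-mer that overlaps the current one by k - 1 bases, and stop when the walk returns to the seed, at which point exactly one monomer has been traversed. The function names count_kmers and greedy_monomer, and the max_len safety limit, are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads (reverse complements are ignored)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def greedy_monomer(reads, k, max_len=10_000):
    """Hypothetical greedy walk over frequent k-mers; returns one monomer if a cycle closes."""
    counts = count_kmers(reads, k)
    if not counts:
        return ""
    seed = counts.most_common(1)[0][0]
    consensus, current, seen = seed, seed, {seed}
    while len(consensus) < max_len:
        suffix = current[1:]
        # Candidate successors overlap the current k-mer by k - 1 bases.
        candidates = [(counts[suffix + b], b) for b in "ACGT" if counts[suffix + b] > 0]
        if not candidates:
            break
        _, base = max(candidates)  # most frequent successor; ties broken by base letter
        current = suffix + base
        if current == seed:
            # Back at the seed: the last k - 1 bases repeat the start of the monomer.
            return consensus[:len(consensus) - (k - 1)]
        if current in seen:
            break  # trapped in a cycle that avoids the seed
        seen.add(current)
        consensus += base
    return consensus  # no cycle closed: return the linear extension instead


if __name__ == "__main__":
    tandem_reads = [("ACGTTGCA" * 5)[i:i + 20] for i in range(8)]
    print(greedy_monomer(tandem_reads, 4))  # some rotation of ACGTTGCA
```

Because a satellite monomer has no natural start, the returned consensus is only defined up to rotation; seeding with the most frequent k-mer is one arbitrary but reproducible choice.
-->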
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Macas, J" uniqKey="Macas J">J. Macas</name>
</author>
<author>
<name sortKey="Meszaros, T" uniqKey="Meszaros T">T. Mészáros</name>
</author>
<author>
<name sortKey="Nouzova, M" uniqKey="Nouzova M">M. Nouzová</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Garrido Ramos, M A" uniqKey="Garrido Ramos M">M.A. Garrido-Ramos</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Plohl, M" uniqKey="Plohl M">M. Plohl</name>
</author>
<author>
<name sortKey="Luchetti, A" uniqKey="Luchetti A">A. Luchetti</name>
</author>
<author>
<name sortKey="Mestrovic, N" uniqKey="Mestrovic N">N. Meštrović</name>
</author>
<author>
<name sortKey="Mantovani, B" uniqKey="Mantovani B">B. Mantovani</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ellegren, H" uniqKey="Ellegren H">H. Ellegren</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Richard, G F" uniqKey="Richard G">G.-F. Richard</name>
</author>
<author>
<name sortKey="Kerrest, A" uniqKey="Kerrest A">A. Kerrest</name>
</author>
<author>
<name sortKey="Dujon, B" uniqKey="Dujon B">B. Dujon</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Plohl, M" uniqKey="Plohl M">M. Plohl</name>
</author>
<author>
<name sortKey="Mestrovic, N" uniqKey="Mestrovic N">N. Meštrović</name>
</author>
<author>
<name sortKey="Mravinac, B" uniqKey="Mravinac B">B. Mravinac</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fuchs, J" uniqKey="Fuchs J">J. Fuchs</name>
</author>
<author>
<name sortKey="Strehl, S" uniqKey="Strehl S">S. Strehl</name>
</author>
<author>
<name sortKey="Brandes, A" uniqKey="Brandes A">A. Brandes</name>
</author>
<author>
<name sortKey="Schweizer, D" uniqKey="Schweizer D">D. Schweizer</name>
</author>
<author>
<name sortKey="Schubert, I" uniqKey="Schubert I">I. Schubert</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Macas, J" uniqKey="Macas J">J. Macas</name>
</author>
<author>
<name sortKey="Pozarkova, D" uniqKey="Pozarkova D">D. Požárková</name>
</author>
<author>
<name sortKey="Navratilova, A" uniqKey="Navratilova A">A. Navrátilová</name>
</author>
<author>
<name sortKey="Nouzova, M" uniqKey="Nouzova M">M. Nouzová</name>
</author>
<author>
<name sortKey="Neumann, P" uniqKey="Neumann P">P. Neumann</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cai, Z" uniqKey="Cai Z">Z. Cai</name>
</author>
<author>
<name sortKey="Liu, H" uniqKey="Liu H">H. Liu</name>
</author>
<author>
<name sortKey="He, Q" uniqKey="He Q">Q. He</name>
</author>
<author>
<name sortKey="Pu, M" uniqKey="Pu M">M. Pu</name>
</author>
<author>
<name sortKey="Chen, J" uniqKey="Chen J">J. Chen</name>
</author>
<author>
<name sortKey="Lai, J" uniqKey="Lai J">J. Lai</name>
</author>
<author>
<name sortKey="Li, X" uniqKey="Li X">X. Li</name>
</author>
<author>
<name sortKey="Jin, W" uniqKey="Jin W">W. Jin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Navratilova, A" uniqKey="Navratilova A">A. Navrátilová</name>
</author>
<author>
<name sortKey="Neumann, P" uniqKey="Neumann P">P. Neumann</name>
</author>
<author>
<name sortKey="Macas, J" uniqKey="Macas J">J. Macas</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kit, S" uniqKey="Kit S">S. Kit</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hemleben, V" uniqKey="Hemleben V">V. Hemleben</name>
</author>
<author>
<name sortKey="Kovarik, A" uniqKey="Kovarik A">A. Kovařík</name>
</author>
<author>
<name sortKey="Torres Ruiz, R A" uniqKey="Torres Ruiz R">R.A. Torres-Ruiz</name>
</author>
<author>
<name sortKey="Volkov, R A" uniqKey="Volkov R">R.A. Volkov</name>
</author>
<author>
<name sortKey="Beridze, T" uniqKey="Beridze T">T. Beridze</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Benson, G" uniqKey="Benson G">G. Benson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gluncic, M" uniqKey="Gluncic M">M. Glunčić</name>
</author>
<author>
<name sortKey="Paar, V" uniqKey="Paar V">V. Paar</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Herzel, H" uniqKey="Herzel H">H. Herzel</name>
</author>
<author>
<name sortKey="Weiss, O" uniqKey="Weiss O">O. Weiss</name>
</author>
<author>
<name sortKey="Trifonov, E N" uniqKey="Trifonov E">E.N. Trifonov</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Macas, J" uniqKey="Macas J">J. Macas</name>
</author>
<author>
<name sortKey="Navratilova, A" uniqKey="Navratilova A">A. Navrátilová</name>
</author>
<author>
<name sortKey="Koblizkova, A" uniqKey="Koblizkova A">A. Koblížková</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sharma, D" uniqKey="Sharma D">D. Sharma</name>
</author>
<author>
<name sortKey="Issac, B" uniqKey="Issac B">B. Issac</name>
</author>
<author>
<name sortKey="Raghava, G P S" uniqKey="Raghava G">G.P.S. Raghava</name>
</author>
<author>
<name sortKey="Ramaswamy, R" uniqKey="Ramaswamy R">R. Ramaswamy</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Treangen, T J" uniqKey="Treangen T">T.J. Treangen</name>
</author>
<author>
<name sortKey="Salzberg, S L" uniqKey="Salzberg S">S.L. Salzberg</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Novak, P" uniqKey="Novak P">P. Novák</name>
</author>
<author>
<name sortKey="Neumann, P" uniqKey="Neumann P">P. Neumann</name>
</author>
<author>
<name sortKey="Macas, J" uniqKey="Macas J">J. Macas</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Novak, P" uniqKey="Novak P">P. Novák</name>
</author>
<author>
<name sortKey="Neumann, P" uniqKey="Neumann P">P. Neumann</name>
</author>
<author>
<name sortKey="Pech, J" uniqKey="Pech J">J. Pech</name>
</author>
<author>
<name sortKey="Steinhaisl, J" uniqKey="Steinhaisl J">J. Steinhaisl</name>
</author>
<author>
<name sortKey="Macas, J" uniqKey="Macas J">J. Macas</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Weiss Schneeweiss, H" uniqKey="Weiss Schneeweiss H">H. Weiss-Schneeweiss</name>
</author>
<author>
<name sortKey="Leitch, A R" uniqKey="Leitch A">A.R. Leitch</name>
</author>
<author>
<name sortKey="Mccann, J" uniqKey="Mccann J">J. McCann</name>
</author>
<author>
<name sortKey="Jang, T S" uniqKey="Jang T">T.-S. Jang</name>
</author>
<author>
<name sortKey="Macas, J" uniqKey="Macas J">J. Macas</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pagan, H J T" uniqKey="Pagan H">H.J.T. Pagan</name>
</author>
<author>
<name sortKey="Macas, J" uniqKey="Macas J">J. Macas</name>
</author>
<author>
<name sortKey="Novak, P" uniqKey="Novak P">P. Novák</name>
</author>
<author>
<name sortKey="Mcculloch, E S" uniqKey="Mcculloch E">E.S. McCulloch</name>
</author>
<author>
<name sortKey="Stevens, R D" uniqKey="Stevens R">R.D. Stevens</name>
</author>
<author>
<name sortKey="Ray, D A" uniqKey="Ray D">D.A. Ray</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Garcia, G" uniqKey="Garcia G">G. García</name>
</author>
<author>
<name sortKey="Rios, N" uniqKey="Rios N">N. Ríos</name>
</author>
<author>
<name sortKey="Gutierrez, V" uniqKey="Gutierrez V">V. Gutiérrez</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Camacho, J P M" uniqKey="Camacho J">J.P.M. Camacho</name>
</author>
<author>
<name sortKey="Ruiz Ruano, F J" uniqKey="Ruiz Ruano F">F.J. Ruiz-Ruano</name>
</author>
<author>
<name sortKey="Martin Blazquez, R" uniqKey="Martin Blazquez R">R. Martín-Blázquez</name>
</author>
<author>
<name sortKey="Lopez Leon, M D" uniqKey="Lopez Leon M">M.D. López-León</name>
</author>
<author>
<name sortKey="Cabrero, J" uniqKey="Cabrero J">J. Cabrero</name>
</author>
<author>
<name sortKey="Lorite, P" uniqKey="Lorite P">P. Lorite</name>
</author>
<author>
<name sortKey="Cabral De Mello, D C" uniqKey="Cabral De Mello D">D.C. Cabral-de-Mello</name>
</author>
<author>
<name sortKey="Bakkali, M" uniqKey="Bakkali M">M. Bakkali</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Neumann, P" uniqKey="Neumann P">P. Neumann</name>
</author>
<author>
<name sortKey="Navratilova, A" uniqKey="Navratilova A">A. Navrátilová</name>
</author>
<author>
<name sortKey="Schroeder Reiter, E" uniqKey="Schroeder Reiter E">E. Schroeder-Reiter</name>
</author>
<author>
<name sortKey="Koblizkova, A" uniqKey="Koblizkova A">A. Koblížková</name>
</author>
<author>
<name sortKey="Steinbauerova, V" uniqKey="Steinbauerova V">V. Steinbauerová</name>
</author>
<author>
<name sortKey="Chocholova, E" uniqKey="Chocholova E">E. Chocholová</name>
</author>
<author>
<name sortKey="Novak, P" uniqKey="Novak P">P. Novák</name>
</author>
<author>
<name sortKey="Wanner, G" uniqKey="Wanner G">G. Wanner</name>
</author>
<author>
<name sortKey="Macas, J" uniqKey="Macas J">J. Macas</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Marques, A" uniqKey="Marques A">A. Marques</name>
</author>
<author>
<name sortKey="Ribeiro, T" uniqKey="Ribeiro T">T. Ribeiro</name>
</author>
<author>
<name sortKey="Neumann, P" uniqKey="Neumann P">P. Neumann</name>
</author>
<author>
<name sortKey="Macas, J" uniqKey="Macas J">J. Macas</name>
</author>
<author>
<name sortKey="Novak, P" uniqKey="Novak P">P. Novák</name>
</author>
<author>
<name sortKey="Schubert, V" uniqKey="Schubert V">V. Schubert</name>
</author>
<author>
<name sortKey="Pellino, M" uniqKey="Pellino M">M. Pellino</name>
</author>
<author>
<name sortKey="Fuchs, J" uniqKey="Fuchs J">J. Fuchs</name>
</author>
<author>
<name sortKey="Ma, W" uniqKey="Ma W">W. Ma</name>
</author>
<author>
<name sortKey="Kuhlmann, M" uniqKey="Kuhlmann M">M. Kuhlmann</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Heckmann, S" uniqKey="Heckmann S">S. Heckmann</name>
</author>
<author>
<name sortKey="Macas, J" uniqKey="Macas J">J. Macas</name>
</author>
<author>
<name sortKey="Kumke, K" uniqKey="Kumke K">K. Kumke</name>
</author>
<author>
<name sortKey="Fuchs, J" uniqKey="Fuchs J">J. Fuchs</name>
</author>
<author>
<name sortKey="Schubert, V" uniqKey="Schubert V">V. Schubert</name>
</author>
<author>
<name sortKey="Ma, L" uniqKey="Ma L">L. Ma</name>
</author>
<author>
<name sortKey="Novak, P" uniqKey="Novak P">P. Novák</name>
</author>
<author>
<name sortKey="Neumann, P" uniqKey="Neumann P">P. Neumann</name>
</author>
<author>
<name sortKey="Taudien, S" uniqKey="Taudien S">S. Taudien</name>
</author>
<author>
<name sortKey="Platzer, M" uniqKey="Platzer M">M. Platzer</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ruiz Ruano, F J" uniqKey="Ruiz Ruano F">F.J. Ruiz-Ruano</name>
</author>
<author>
<name sortKey="Lopez Leon, M D" uniqKey="Lopez Leon M">M.D. López-León</name>
</author>
<author>
<name sortKey="Cabrero, J" uniqKey="Cabrero J">J. Cabrero</name>
</author>
<author>
<name sortKey="Camacho, J P M" uniqKey="Camacho J">J.P.M. Camacho</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Macas, J" uniqKey="Macas J">J. Macas</name>
</author>
<author>
<name sortKey="Kejnovsky, E" uniqKey="Kejnovsky E">E. Kejnovský</name>
</author>
<author>
<name sortKey="Neumann, P" uniqKey="Neumann P">P. Neumann</name>
</author>
<author>
<name sortKey="Novak, P" uniqKey="Novak P">P. Novák</name>
</author>
<author>
<name sortKey="Koblizkova, A" uniqKey="Koblizkova A">A. Koblížková</name>
</author>
<author>
<name sortKey="Vyskot, B" uniqKey="Vyskot B">B. Vyskot</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Macas, J" uniqKey="Macas J">J. Macas</name>
</author>
<author>
<name sortKey="Novak, P" uniqKey="Novak P">P. Novák</name>
</author>
<author>
<name sortKey="Pellicer, J" uniqKey="Pellicer J">J. Pellicer</name>
</author>
<author>
<name sortKey="Cizkova, J" uniqKey="Cizkova J">J. Čížková</name>
</author>
<author>
<name sortKey="Koblizkova, A" uniqKey="Koblizkova A">A. Koblížková</name>
</author>
<author>
<name sortKey="Neumann, P" uniqKey="Neumann P">P. Neumann</name>
</author>
<author>
<name sortKey="Fukova, I" uniqKey="Fukova I">I. Fuková</name>
</author>
<author>
<name sortKey="Dolezel, J" uniqKey="Dolezel J">J. Doležel</name>
</author>
<author>
<name sortKey="Kelly, L J" uniqKey="Kelly L">L.J. Kelly</name>
</author>
<author>
<name sortKey="Leitch, I J" uniqKey="Leitch I">I.J. Leitch</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Renny Byfield, S" uniqKey="Renny Byfield S">S. Renny-Byfield</name>
</author>
<author>
<name sortKey="Kovarik, A" uniqKey="Kovarik A">A. Kovařík</name>
</author>
<author>
<name sortKey="Chester, M" uniqKey="Chester M">M. Chester</name>
</author>
<author>
<name sortKey="Nichols, R A" uniqKey="Nichols R">R.A. Nichols</name>
</author>
<author>
<name sortKey="Macas, J" uniqKey="Macas J">J. Macas</name>
</author>
<author>
<name sortKey="Novak, P" uniqKey="Novak P">P. Novák</name>
</author>
<author>
<name sortKey="Leitch, A R" uniqKey="Leitch A">A.R. Leitch</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Macas, J" uniqKey="Macas J">J. Macas</name>
</author>
<author>
<name sortKey="Neumann, P" uniqKey="Neumann P">P. Neumann</name>
</author>
<author>
<name sortKey="Novak, P" uniqKey="Novak P">P. Novák</name>
</author>
<author>
<name sortKey="Jiang, J" uniqKey="Jiang J">J. Jiang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Torres, G A" uniqKey="Torres G">G.A. Torres</name>
</author>
<author>
<name sortKey="Gong, Z" uniqKey="Gong Z">Z. Gong</name>
</author>
<author>
<name sortKey="Iovene, M" uniqKey="Iovene M">M. Iovene</name>
</author>
<author>
<name sortKey="Hirsch, C D" uniqKey="Hirsch C">C.D. Hirsch</name>
</author>
<author>
<name sortKey="Buell, C R" uniqKey="Buell C">C.R. Buell</name>
</author>
<author>
<name sortKey="Bryan, G J" uniqKey="Bryan G">G.J. Bryan</name>
</author>
<author>
<name sortKey="Novak, P" uniqKey="Novak P">P. Novák</name>
</author>
<author>
<name sortKey="Macas, J" uniqKey="Macas J">J. Macas</name>
</author>
<author>
<name sortKey="Jiang, J" uniqKey="Jiang J">J. Jiang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Blondel, V D" uniqKey="Blondel V">V.D. Blondel</name>
</author>
<author>
<name sortKey="Guillaume, J L" uniqKey="Guillaume J">J.-L. Guillaume</name>
</author>
<author>
<name sortKey="Lambiotte, R" uniqKey="Lambiotte R">R. Lambiotte</name>
</author>
<author>
<name sortKey="Lefebvre, E" uniqKey="Lefebvre E">E. Lefebvre</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wilson, R J" uniqKey="Wilson R">R.J. Wilson</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Zaslavsky, T" uniqKey="Zaslavsky T">T. Zaslavsky</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fraley, C" uniqKey="Fraley C">C. Fraley</name>
</author>
<author>
<name sortKey="Raftery, A E" uniqKey="Raftery A">A.E. Raftery</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Havecker, E R" uniqKey="Havecker E">E.R. Havecker</name>
</author>
<author>
<name sortKey="Gao, X" uniqKey="Gao X">X. Gao</name>
</author>
<author>
<name sortKey="Voytas, D F" uniqKey="Voytas D">D.F. Voytas</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Csardi, G" uniqKey="Csardi G">G. Csardi</name>
</author>
<author>
<name sortKey="Nepusz, T" uniqKey="Nepusz T">T. Nepusz</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Afgan, E" uniqKey="Afgan E">E. Afgan</name>
</author>
<author>
<name sortKey="Baker, D" uniqKey="Baker D">D. Baker</name>
</author>
<author>
<name sortKey="Van Den Beek, M" uniqKey="Van Den Beek M">M. van den Beek</name>
</author>
<author>
<name sortKey="Blankenberg, D" uniqKey="Blankenberg D">D. Blankenberg</name>
</author>
<author>
<name sortKey="Bouvier, D" uniqKey="Bouvier D">D. Bouvier</name>
</author>
<author>
<name sortKey="Cech, M" uniqKey="Cech M">M. Čech</name>
</author>
<author>
<name sortKey="Chilton, J" uniqKey="Chilton J">J. Chilton</name>
</author>
<author>
<name sortKey="Clements, D" uniqKey="Clements D">D. Clements</name>
</author>
<author>
<name sortKey="Coraor, N" uniqKey="Coraor N">N. Coraor</name>
</author>
<author>
<name sortKey="Eberhard, C" uniqKey="Eberhard C">C. Eberhard</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kato, A" uniqKey="Kato A">A. Kato</name>
</author>
<author>
<name sortKey="Albert, P" uniqKey="Albert P">P. Albert</name>
</author>
<author>
<name sortKey="Vega, J" uniqKey="Vega J">J. Vega</name>
</author>
<author>
<name sortKey="Birchler, J" uniqKey="Birchler J">J. Birchler</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Macas, J" uniqKey="Macas J">J. Macas</name>
</author>
<author>
<name sortKey="Neumann, P" uniqKey="Neumann P">P. Neumann</name>
</author>
<author>
<name sortKey="Navratilova, A" uniqKey="Navratilova A">A. Navrátilová</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Kato, A" uniqKey="Kato A">A. Kato</name>
</author>
<author>
<name sortKey="Yakura, K" uniqKey="Yakura K">K. Yakura</name>
</author>
<author>
<name sortKey="Tanifuji, S" uniqKey="Tanifuji S">S. Tanifuji</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fuchs, J" uniqKey="Fuchs J">J. Fuchs</name>
</author>
<author>
<name sortKey="Pich, U" uniqKey="Pich U">U. Pich</name>
</author>
<author>
<name sortKey="Meister, A" uniqKey="Meister A">A. Meister</name>
</author>
<author>
<name sortKey="Schubert, I" uniqKey="Schubert I">I. Schubert</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ananiev, E V" uniqKey="Ananiev E">E.V. Ananiev</name>
</author>
<author>
<name sortKey="Phillips, R L" uniqKey="Phillips R">R.L. Phillips</name>
</author>
<author>
<name sortKey="Rines, H W" uniqKey="Rines H">H.W. Rines</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ananiev, E V" uniqKey="Ananiev E">E.V. Ananiev</name>
</author>
<author>
<name sortKey="Phillips, R L" uniqKey="Phillips R">R.L. Phillips</name>
</author>
<author>
<name sortKey="Rines, H W" uniqKey="Rines H">H.W. Rines</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Ananiev, E V" uniqKey="Ananiev E">E.V. Ananiev</name>
</author>
<author>
<name sortKey="Phillips, R L" uniqKey="Phillips R">R.L. Phillips</name>
</author>
<author>
<name sortKey="Rines, H W" uniqKey="Rines H">H.W. Rines</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Maggini, F" uniqKey="Maggini F">F. Maggini</name>
</author>
<author>
<name sortKey="Cremonini, R" uniqKey="Cremonini R">R. Cremonini</name>
</author>
<author>
<name sortKey="Zolfino, C" uniqKey="Zolfino C">C. Zolfino</name>
</author>
<author>
<name sortKey="Tucci, G F" uniqKey="Tucci G">G.F. Tucci</name>
</author>
<author>
<name sortKey="D Ovidio, R" uniqKey="D Ovidio R">R. D’Ovidio</name>
</author>
<author>
<name sortKey="Delre, V" uniqKey="Delre V">V. Delre</name>
</author>
<author>
<name sortKey="Depace, C" uniqKey="Depace C">C. DePace</name>
</author>
<author>
<name sortKey="Scarascia Mugnozza, G T" uniqKey="Scarascia Mugnozza G">G.T. Scarascia Mugnozza</name>
</author>
<author>
<name sortKey="Cionini, P G" uniqKey="Cionini P">P.G. Cionini</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Macas, J" uniqKey="Macas J">J. Macas</name>
</author>
<author>
<name sortKey="Koblizkova, A" uniqKey="Koblizkova A">A. Koblížková</name>
</author>
<author>
<name sortKey="Navratilova, A" uniqKey="Navratilova A">A. Navrátilová</name>
</author>
<author>
<name sortKey="Neumann, P" uniqKey="Neumann P">P. Neumann</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schaper, E" uniqKey="Schaper E">E. Schaper</name>
</author>
<author>
<name sortKey="Kajava, A V" uniqKey="Kajava A">A.V. Kajava</name>
</author>
<author>
<name sortKey="Hauser, A" uniqKey="Hauser A">A. Hauser</name>
</author>
<author>
<name sortKey="Anisimova, M" uniqKey="Anisimova M">M. Anisimova</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lim, K G" uniqKey="Lim K">K.G. Lim</name>
</author>
<author>
<name sortKey="Kwoh, C K" uniqKey="Kwoh C">C.K. Kwoh</name>
</author>
<author>
<name sortKey="Hsu, L Y" uniqKey="Hsu L">L.Y. Hsu</name>
</author>
<author>
<name sortKey="Wirawan, A" uniqKey="Wirawan A">A. Wirawan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fertin, G" uniqKey="Fertin G">G. Fertin</name>
</author>
<author>
<name sortKey="Jean, G" uniqKey="Jean G">G. Jean</name>
</author>
<author>
<name sortKey="Radulescu, A" uniqKey="Radulescu A">A. Radulescu</name>
</author>
<author>
<name sortKey="Rusu, I" uniqKey="Rusu I">I. Rusu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Fertin, G" uniqKey="Fertin G">G. Fertin</name>
</author>
<author>
<name sortKey="Jean, G" uniqKey="Jean G">G. Jean</name>
</author>
<author>
<name sortKey="Radulescu, A" uniqKey="Radulescu A">A. Radulescu</name>
</author>
<author>
<name sortKey="Rusu, I" uniqKey="Rusu I">I. Rusu</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Simpson, J T" uniqKey="Simpson J">J.T. Simpson</name>
</author>
<author>
<name sortKey="Wong, K" uniqKey="Wong K">K. Wong</name>
</author>
<author>
<name sortKey="Jackman, S D" uniqKey="Jackman S">S.D. Jackman</name>
</author>
<author>
<name sortKey="Schein, J E" uniqKey="Schein J">J.E. Schein</name>
</author>
<author>
<name sortKey="Jones, S J M" uniqKey="Jones S">S.J.M. Jones</name>
</author>
<author>
<name sortKey="Birol, I" uniqKey="Birol I">I. Birol</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Gong, Z" uniqKey="Gong Z">Z. Gong</name>
</author>
<author>
<name sortKey="Wu, Y" uniqKey="Wu Y">Y. Wu</name>
</author>
<author>
<name sortKey="Koblizkova, A" uniqKey="Koblizkova A">A. Koblížková</name>
</author>
<author>
<name sortKey="Torres, G A" uniqKey="Torres G">G.A. Torres</name>
</author>
<author>
<name sortKey="Wang, K" uniqKey="Wang K">K. Wang</name>
</author>
<author>
<name sortKey="Iovene, M" uniqKey="Iovene M">M. Iovene</name>
</author>
<author>
<name sortKey="Neumann, P" uniqKey="Neumann P">P. Neumann</name>
</author>
<author>
<name sortKey="Zhang, W" uniqKey="Zhang W">W. Zhang</name>
</author>
<author>
<name sortKey="Novak, P" uniqKey="Novak P">P. Novák</name>
</author>
<author>
<name sortKey="Buell, C R" uniqKey="Buell C">C.R. Buell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Macas, J" uniqKey="Macas J">J. Macas</name>
</author>
<author>
<name sortKey="Navratilova, A" uniqKey="Navratilova A">A. Navrátilová</name>
</author>
<author>
<name sortKey="Meszaros, T" uniqKey="Meszaros T">T. Mészáros</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">Nucleic Acids Res</journal-id>
<journal-id journal-id-type="iso-abbrev">Nucleic Acids Res</journal-id>
<journal-id journal-id-type="publisher-id">nar</journal-id>
<journal-title-group>
<journal-title>Nucleic Acids Research</journal-title>
</journal-title-group>
<issn pub-type="ppub">0305-1048</issn>
<issn pub-type="epub">1362-4962</issn>
<publisher>
<publisher-name>Oxford University Press</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">28402514</article-id>
<article-id pub-id-type="pmc">5499541</article-id>
<article-id pub-id-type="doi">10.1093/nar/gkx257</article-id>
<article-id pub-id-type="publisher-id">gkx257</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Methods Online</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>TAREAN: a computational tool for identification and characterization of satellite DNA from unassembled short reads</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Novák</surname>
<given-names>Petr</given-names>
</name>
<xref ref-type="aff" rid="AFF1"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Ávila Robledillo</surname>
<given-names>Laura</given-names>
</name>
<xref ref-type="aff" rid="AFF1"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Koblížková</surname>
<given-names>Andrea</given-names>
</name>
<xref ref-type="aff" rid="AFF1"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Vrbová</surname>
<given-names>Iva</given-names>
</name>
<xref ref-type="aff" rid="AFF1"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Neumann</surname>
<given-names>Pavel</given-names>
</name>
<xref ref-type="aff" rid="AFF1"></xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Macas</surname>
<given-names>Jiří</given-names>
</name>
<pmc-comment>macas@umbr.cas.cz</pmc-comment>
<xref ref-type="aff" rid="AFF1"></xref>
<xref ref-type="corresp" rid="COR1"></xref>
</contrib>
</contrib-group>
<aff id="AFF1">Institute of Plant Molecular Biology, Biology Centre CAS, České Budějovice CZ-37005, Czech Republic</aff>
<author-notes>
<corresp id="COR1">
<label>*</label>
To whom correspondence should be addressed. Tel: +420 387 775 516; Fax: +420 385 310 356; Email:
<email>macas@umbr.cas.cz</email>
</corresp>
</author-notes>
<pub-date pub-type="ppub">
<day>07</day>
<month>7</month>
<year>2017</year>
</pub-date>
<pub-date pub-type="epub" iso-8601-date="2017-04-10">
<day>10</day>
<month>4</month>
<year>2017</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>10</day>
<month>4</month>
<year>2017</year>
</pub-date>
<pmc-comment> PMC Release delay is 0 months and 0 days and was based on the . </pmc-comment>
<volume>45</volume>
<issue>12</issue>
<fpage>e111</fpage>
<lpage>e111</lpage>
<history>
<date date-type="accepted">
<day>04</day>
<month>4</month>
<year>2017</year>
</date>
<date date-type="rev-recd">
<day>23</day>
<month>3</month>
<year>2017</year>
</date>
<date date-type="received">
<day>25</day>
<month>1</month>
<year>2017</year>
</date>
</history>
<permissions>
<copyright-statement>© The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.</copyright-statement>
<copyright-year>2017</copyright-year>
<license license-type="cc-by-nc" xlink:href="http://creativecommons.org/licenses/by-nc/4.0/">
<license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (
<uri xlink:href="http://creativecommons.org/licenses/by-nc/4.0/">http://creativecommons.org/licenses/by-nc/4.0/</uri>
), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact
<email>journals.permissions@oup.com</email>
</license-p>
</license>
</permissions>
<self-uri xlink:href="gkx257.pdf"></self-uri>
<abstract>
<title>Abstract</title>
<p>Satellite DNA is one of the major classes of repetitive DNA, characterized by tandemly arranged repeat copies that form contiguous arrays up to megabases in length. This type of genomic organization makes satellite DNA difficult to assemble, which hampers characterization of satellite sequences by computational analysis of genomic contigs. Here, we present tandem repeat analyzer (TAREAN), a novel computational pipeline that circumvents this problem by detecting satellite repeats directly from unassembled short reads. The pipeline first employs graph-based sequence clustering to identify groups of reads that represent repetitive elements. Putative satellite repeats are subsequently detected by the presence of circular structures in their cluster graphs. Consensus sequences of repeat monomers are then reconstructed from the most frequent
<italic>k</italic>
-mers obtained by decomposing read sequences from corresponding clusters. The pipeline performance was successfully validated by analyzing low-pass genome sequencing data from five plant species where satellite DNA was previously experimentally characterized. Moreover, novel satellite repeats were predicted for the genome of
<italic>Vicia faba</italic>
and three of these repeats were verified by detecting their sequences on metaphase chromosomes using fluorescence
<italic>in situ</italic>
hybridization.</p>
</abstract>
<counts>
<page-count count="10"></page-count>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="SEC1">
<title>INTRODUCTION</title>
<p>Satellite DNA (satDNA) is a class of repetitive DNA that is characterized by its genomic organization into long arrays of tandemly arranged units called monomers. The monomer sequences are typically hundreds of nucleotides long and highly homogenized (
<xref rid="B1" ref-type="bibr">1</xref>
). Although monomer length is often used to classify genomic tandem repeats as microsatellites (2–7 bp), minisatellites (tens of bp) or satellites (hundreds of bp), it appears that satellites are best distinguished by forming longer arrays (tens of kilobases up to megabases) concentrated in relatively few genomic loci, while micro- and mini-satellite arrays are much shorter and scattered across the genome. These differences in genomic organization probably reflect different amplification and homogenization mechanisms acting on these repeats (
<xref rid="B2" ref-type="bibr">2</xref>
–<xref rid="B5" ref-type="bibr">5</xref>
). In the majority of eukaryotic genomes studied to date, satDNA was predominantly located in subtelomeric and centromeric chromosome regions, and the role of satDNA in centromere determination and function is the subject of ongoing research (
<xref rid="B6" ref-type="bibr">6</xref>
). In some organisms, such as higher plants, satellite repeats are also located in interstitial chromosome regions, forming prominent heterochromatic bands (
<xref rid="B7" ref-type="bibr">7</xref>
,
<xref rid="B8" ref-type="bibr">8</xref>
). The overall patterns of satDNA distribution revealed by fluorescence
<italic>in situ</italic>
hybridization (FISH) are frequently used in karyotype studies because they can provide markers for distinguishing morphologically similar chromosomes (
<xref rid="B9" ref-type="bibr">9</xref>
,
<xref rid="B10" ref-type="bibr">10</xref>
).</p>
<p>Investigation of satDNA or its utilization as a cytogenetic marker requires
<italic>a priori</italic>
knowledge of the nucleotide sequences of satellite repeats in the species of interest. However, satDNA is among the most dynamic components of eukaryotic genomes, and its high evolutionary rate results in considerable sequence diversification; therefore, most satellite repeat families are species- or genus-specific (
<xref rid="B1" ref-type="bibr">1</xref>
). Consequently, identification of satDNA by its similarity to known repeats from phylogenetically distant taxa is not possible. For these reasons, there has been continuous demand for efficient
<italic>ab initio</italic>
methods for satDNA identification. Satellite DNA acquired its name from density gradient centrifugation experiments, where it was discovered as a constituent of satellite bands formed due to its different buoyant density compared to the bulk of genomic DNA (
<xref rid="B11" ref-type="bibr">11</xref>
). Thus, density centrifugation was the first method of satDNA isolation, followed by other experimental approaches based, for example, on the presence of specific restriction sites in monomer sequences (
<xref rid="B12" ref-type="bibr">12</xref>
) or on the self-priming of tandemly repeated sequences in a modified PCR protocol (
<xref rid="B8" ref-type="bibr">8</xref>
). Although these methods led to the identification of numerous satellite repeats, they are mostly limited to isolating highly amplified repeats and are biased towards those that can be distinguished by some property of their sequences, such as the presence of a conserved restriction site. Satellites lacking such features may therefore remain unnoticed.</p>
<p>An alternative to experimental methods of satDNA isolation is to identify satellites in genomic sequence data. With the introduction of next-generation sequencing (NGS) technologies, generating such data is no longer a limiting factor for genome investigation. Bioinformatics tools, such as Tandem Repeats Finder (TRF) (
<xref rid="B13" ref-type="bibr">13</xref>
), can then be used to search genomic sequences for tandem repeats including satellite DNA. As reviewed by Glunčić and Paar (
<xref rid="B14" ref-type="bibr">14</xref>
), TRF is a representative of string matching algorithms, which are utilized in a number of computational tools for tandem repeat prediction, along with alternative approaches based on nucleotide autocorrelation functions (
<xref rid="B15" ref-type="bibr">15</xref>
,
<xref rid="B16" ref-type="bibr">16</xref>
) and Fourier transforms (
<xref rid="B17" ref-type="bibr">17</xref>
). However, a common limitation of these tools is that they require long input sequences spanning more than one repeat monomer. Although such long contigs are routinely available from whole genome assemblies, satellite repeats are often missing from these assemblies or severely underrepresented in them, because their structure and high sequence homogeneity make them extremely difficult to assemble (
<xref rid="B18" ref-type="bibr">18</xref>
). Thus, the search for satellite repeats should ideally be performed on unassembled reads, but this approach is hampered by the relatively short reads produced by most currently used NGS technologies.</p>
<p>The task of repeat identification from unassembled NGS reads has been addressed by the introduction of a similarity-based clustering algorithm which evaluates all-to-all sequence comparisons between whole genome shotgun reads (
<xref rid="B19" ref-type="bibr">19</xref>
,
<xref rid="B20" ref-type="bibr">20</xref>
). When applied to low-coverage (0.01–0.50×) genome sequencing data, almost no similarities are detected between reads derived from single-copy sequences. In contrast, reads that originate from repetitive elements produce multiple similarity hits and can thus be identified as clusters of frequently overlapping sequences. The number of reads in each cluster is proportional to the genomic abundance of the corresponding repeat, which enables its quantification. This repeat clustering analysis is at the core of the RepeatExplorer pipeline (
<xref rid="B20" ref-type="bibr">20</xref>
), which was originally designed and used for repeat characterization in plants (reviewed in (
<xref rid="B21" ref-type="bibr">21</xref>
)), but also proved to be efficient in repeat identification in other organisms, including bats (
<xref rid="B22" ref-type="bibr">22</xref>
), fish (
<xref rid="B23" ref-type="bibr">23</xref>
) and insects (
<xref rid="B24" ref-type="bibr">24</xref>
).</p>
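<p>Because the number of reads in a cluster scales with the genomic abundance of the repeat, the cluster's share of all input reads can be read as the fraction of the genome occupied by that repeat, and multiplying it by the genome size gives the repeat's approximate amount. The minimal Python sketch below illustrates only this arithmetic; the function names, the read counts and the 4,000 Mbp genome size are hypothetical, and it assumes that reads are sampled uniformly across the genome.</p>
<preformat preformat-type="code">
```python
def genome_proportion(cluster_reads: int, total_reads: int) -> float:
    """Fraction of the genome occupied by a repeat, estimated from its share of reads."""
    if 0 >= total_reads:
        raise ValueError("total_reads must be positive")
    return cluster_reads / total_reads


def repeat_amount_mbp(cluster_reads: int, total_reads: int, genome_size_mbp: float) -> float:
    """Approximate genomic amount of the repeat in megabases."""
    return genome_proportion(cluster_reads, total_reads) * genome_size_mbp


# Hypothetical numbers: 25,000 of 1,000,000 reads fall into one cluster,
# and the genome is assumed to be 4,000 Mbp in size.
share = genome_proportion(25_000, 1_000_000)
amount = repeat_amount_mbp(25_000, 1_000_000, 4000)
print(f"{share:.2%} of the genome, about {amount:.0f} Mbp")  # 2.50% of the genome, about 100 Mbp
```
</preformat>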
<p>The clustering algorithm employed by RepeatExplorer represents the reads and their sequence similarities as nodes and connecting edges, respectively, in a virtual graph, and identifies read clusters by examination of the graph topology (
<xref rid="B19" ref-type="bibr">19</xref>
). In addition to efficient partitioning of the graph into clusters, this approach has the benefit of providing graphical representation of individual clusters. The shapes of these graphs reflect the genomic organization and sequence variability of corresponding repeats, ranging from linear structures typical for dispersed transposable elements to circular or globular shapes of tandemly repeated sequences (
<xref rid="B19" ref-type="bibr">19</xref>
). It has been demonstrated for a number of species analyzed using graph-based read clustering that the graph shapes can be reliably used to discover novel satellite repeats (
<xref rid="B25" ref-type="bibr">25</xref>
–<xref rid="B31" ref-type="bibr">31</xref>
). However, the need to visually inspect graph shapes limited this approach and prevented its full automation. Another problem concerned the identification of the most abundant monomer sequence variants, which are needed for downstream applications such as the design of hybridization probes or PCR primers. Inferring a monomer consensus with traditional methods based on multiple sequence alignment is not feasible because of the large number of analyzed reads, and the conceptually similar approach of sequence assembly yields multiple contigs that require further manual processing. However, it was shown that alignment-free approaches utilizing
<italic>k</italic>
-mer frequency statistics are more suitable for monomer reconstruction from unassembled sequence reads (
<xref rid="B29" ref-type="bibr">29</xref>
,
<xref rid="B32" ref-type="bibr">32</xref>
,
<xref rid="B33" ref-type="bibr">33</xref>
) and therefore could fill this last gap in the automated workflow once implemented into an efficient computational tool.</p>
<p>In this work, we present tandem repeat analyzer (TAREAN), a computational pipeline that builds on the principles of graph-based repeat clustering and extends them with additional tools facilitating unsupervised identification and characterization of satellite repeats from unassembled sequence reads. The pipeline uses low-pass whole genome sequence reads as its input and performs their graph-based clustering as the first step of the analysis. The resulting clusters, representing all types of repeats, are then examined for the presence of circular structures characteristic of tandem repeats. This is achieved by constructing directed graphs from read similarities and selecting clusters whose graphs contain strongly connected components. In addition, paired-end read information is utilized to discriminate clusters representing potential satellite repeats from other types of tandemly repeated sequences. Reads from these clusters are then decomposed into
<italic>k</italic>
-mers and fractions of the most frequent
<italic>k</italic>
-mers are used for reconstructing representative monomer sequences for each satellite repeat. To test the efficiency and specificity of the pipeline, we first analyzed NGS data from five plant species with various numbers of previously characterized satellite repeat families which differ in their genomic abundance. Moreover, we demonstrated that TAREAN can also identify novel satellite repeats that were subsequently verified by their detection on metaphase chromosomes using FISH with probes designed according to reconstructed monomer sequences.</p>
</sec>
<sec sec-type="materials|methods" id="SEC2">
<title>MATERIALS AND METHODS</title>
<sec id="SEC2-1">
<title>The workflow of TAREAN</title>
<p>
<italic>Input data</italic>
. The analysis requires paired-end reads generated by whole genome shotgun sequencing, provided as a single FASTA-formatted file. Read length should be 100–200 nt, and the number of analyzed reads should represent less than 1× genome equivalent (genome coverage of 0.01–0.50× is recommended). Illumina 2 × 100 nt reads were used in this work; however, paired-end reads generated by other NGS platforms should also be suitable for analysis, provided that the sequenced fragments are long enough to avoid frequent overlaps between the sequences of paired-end reads. Reads should be of uniform length and quality-filtered (quality score ≥10 over 95% of bases, no Ns allowed), and only complete read pairs should be submitted for analysis. The analysis workflow is schematically depicted in Figure
<xref ref-type="fig" rid="F1">1</xref>
.</p>
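<p>The input requirements listed above can be checked before submission. The following Python sketch is a hypothetical pre-filter, not part of TAREAN: it assumes that reads have already been parsed into (sequence, Phred quality list) tuples, with first and second mates held in two parallel lists, and it applies the stated criteria (uniform length, quality score ≥10 over at least 95% of bases, no Ns, complete pairs only). It also computes genome coverage so that the input size can be compared with the recommended 0.01–0.50× range.</p>
<preformat preformat-type="code">
```python
from typing import List, Tuple

Read = Tuple[str, List[int]]  # (sequence, per-base Phred quality scores)


def passes_quality(read: Read, min_q: int = 10, min_fraction: float = 0.95) -> bool:
    """No N allowed, and at least min_fraction of bases must have quality >= min_q."""
    seq, quals = read
    if "N" in seq.upper():
        return False
    good_bases = sum(1 for q in quals if q >= min_q)
    return good_bases >= min_fraction * len(quals)


def complete_pairs(mates1: List[Read], mates2: List[Read], read_length: int) -> List[Tuple[str, str]]:
    """Keep a pair only if both mates have the uniform length and pass the quality filter."""
    kept = []
    for r1, r2 in zip(mates1, mates2):
        if all(len(r[0]) == read_length and passes_quality(r) for r in (r1, r2)):
            kept.append((r1[0], r2[0]))
    return kept


def genome_coverage(n_reads: int, read_length: int, genome_size_bp: int) -> float:
    """Sequenced bases divided by genome size (genome equivalents)."""
    return n_reads * read_length / genome_size_bp


pairs = complete_pairs(
    [("ACGT" * 25, [30] * 100), ("ACGN" * 25, [30] * 100)],
    [("TTGA" * 25, [30] * 100), ("TTGA" * 25, [30] * 100)],
    read_length=100,
)
print(len(pairs))  # 1 (the second pair is discarded because one mate contains N)
# Hypothetical: 1,000,000 pairs of 100 nt reads from a 4 Gbp genome
print(genome_coverage(2 * 1_000_000, 100, 4_000_000_000))  # 0.05
```
</preformat>
<p>With 2,000,000 reads of 100 nt and a hypothetical 4 Gbp genome, the resulting coverage of 0.05× falls within the recommended range.</p>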
<fig id="F1" orientation="portrait" position="float">
<label>Figure 1.</label>
<caption>
<p>Schematic representation of the TAREAN analysis workflow.</p>
</caption>
<graphic xlink:href="gkx257fig1"></graphic>
</fig>
<sec id="SEC2-1-1">
<title>Graph-based clustering</title>
<p>The read clustering algorithm is the same as described by Novák
<italic>et al</italic>
. (
<xref rid="B19" ref-type="bibr">19</xref>
). Briefly, reads are subjected to all-to-all sequence comparisons and their mutual similarities exceeding a specified threshold (90% similarity over at least 55% of the read length) are represented as a graph in which vertices correspond to sequence reads and overlapping reads are connected by edges. The resulting graph is then subjected to the Louvain method for community detection and partitioned into clusters (
<xref rid="B34" ref-type="bibr">34</xref>
) (Figure
<xref ref-type="fig" rid="F1">1A</xref>
). Although this method is computationally efficient and does not generate chimeric clusters, its drawback is that some families of repetitive elements frequently get split into multiple clusters rather than being represented as a single cluster (
<xref rid="B19" ref-type="bibr">19</xref>
,
<xref rid="B30" ref-type="bibr">30</xref>
). However, these split clusters can be identified and merged by utilizing paired-end read information. Merging is performed for clusters that share significant proportions of broken read pairs (that is, pairs whose two reads are assigned to different clusters), as determined with the formula:
<disp-formula>
<tex-math id="M1">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} \begin{equation*}k_{x,y} = \frac{2W}{n_x + n_y}\end{equation*}\end{document}</tex-math>
</disp-formula>
where
<italic>W</italic>
is the number of read pairs shared between clusters
<italic>x</italic>
and
<italic>y</italic>
, and
<italic>n
<sub>x</sub>
</italic>
and
<italic>n
<sub>y</sub>
</italic>
are the numbers of broken read pairs in clusters
<italic>x</italic>
and
<italic>y</italic>
, respectively. The cutoff for cluster merging used in our analysis was set to
<italic>k
<sub>x,y</sub>
</italic>
≥ 0.2.</p>
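<p>The merging criterion can be illustrated with a minimal Python sketch. It assumes a hypothetical input that maps each read pair to the clusters of its two mates; a pair is broken when the mates fall into different clusters, <italic>k</italic><sub>x,y</sub> is computed from the formula above, and cluster pairs with <italic>k</italic><sub>x,y</sub> ≥ 0.2 are merged. Resolving chains of merges transitively with a union-find structure is an assumption of this sketch; the text does not specify how TAREAN handles such chains.</p>
<preformat preformat-type="code">
```python
from collections import Counter
from typing import Dict, Tuple


def merge_split_clusters(pair_clusters: Dict[str, Tuple[int, int]],
                         cutoff: float = 0.2) -> Dict[int, int]:
    """Return a representative cluster id for every cluster after merging."""
    broken: Counter = Counter()  # n_x: broken read pairs per cluster
    shared: Counter = Counter()  # W: read pairs shared by a pair of clusters
    for c1, c2 in pair_clusters.values():
        if c1 != c2:
            broken[c1] += 1
            broken[c2] += 1
            shared[tuple(sorted((c1, c2)))] += 1

    clusters = {c for pair in pair_clusters.values() for c in pair}
    parent = {c: c for c in clusters}

    def find(c: int) -> int:
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for (x, y), w in shared.items():
        k_xy = 2 * w / (broken[x] + broken[y])
        if k_xy >= cutoff:
            parent[find(x)] = find(y)  # merge (transitively) via union-find
    return {c: find(c) for c in clusters}


# Hypothetical read pairs: pair id -> (cluster of mate 1, cluster of mate 2)
pairs = {"p1": (1, 1), "p2": (1, 2), "p3": (1, 2), "p4": (2, 3)}
pairs.update({f"p{i}": (3, 4) for i in range(5, 14)})
groups = merge_split_clusters(pairs)
print(groups[1] == groups[2], groups[2] == groups[3], groups[3] == groups[4])  # True False True
```
</preformat>
<p>In this toy example, clusters 1 and 2 are merged (<italic>k</italic> = 0.8), as are clusters 3 and 4 (<italic>k</italic> ≈ 0.95), whereas clusters 2 and 3 remain separate (<italic>k</italic> = 2/13 ≈ 0.15).</p>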
</sec>
<sec id="SEC2-1-2">
<title>Automated detection of circular structures in cluster graphs</title>
<p>Following clustering, each cluster that represents an abundant genomic repeat is examined for the presence of circular structures indicative of tandem repeats (by default, clusters that contain at least 0.01% of input reads are analyzed). This is achieved by constructing a directed graph from the read similarities (Figure
<xref ref-type="fig" rid="F1">1B</xref>
) and testing whether the graph is strongly connected, which means that every read can be reached from every other read through a series of directed similarity overlaps (
<xref rid="B35" ref-type="bibr">35</xref>
). This is implemented by first constructing an edge signed graph
<italic>Σ</italic>
where vertices represent reads, edges connect overlapping reads and signs of the edges reflect orientation of the overlapping reads, being positive for forward to forward and negative for forward to reverse complement overlaps (
<xref rid="B36" ref-type="bibr">36</xref>
). The minimum spanning tree
<italic>Σ</italic>
<sub>msp</sub>
of the graph
<italic>Σ</italic>
is then traversed using depth-first search, and each vertex that is connected to a previously visited vertex by a negative edge is switched (i.e. reverse-complemented). The resulting switching-equivalent graph of
<italic>Σ</italic>
<sub>msp</sub>
is used in the next iteration, finally leading to the directed graph
<italic>G</italic>
where all edges are positively signed. Next, the proportion of the largest strongly connected component in graph
<italic>G</italic>
is calculated as the connected component index
<italic>C</italic>
:
<disp-formula>
<tex-math id="M2">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} \begin{equation*}C = \frac{V(G_{LSCC})}{V(G)}\end{equation*}\end{document}</tex-math>
</disp-formula>
where
<italic>V</italic>
(
<italic>G</italic>
) is the number of vertices of the graph
<italic>G</italic>
and
<italic>V</italic>
(
<italic>G</italic>
<sub>LSCC</sub>
) is the number of vertices in the graph
<italic>G
<sub>LSCC</sub>
</italic>
, which is a subgraph of
<italic>G</italic>
and corresponds to its largest strongly connected component (Figure
<xref ref-type="fig" rid="F1">1C</xref>
). For graphs derived from exact tandem repeats,
<italic>C</italic>
= 1.</p>
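<p>The orientation and strong-connectivity test can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the TAREAN implementation, which relies on igraph: reads are oriented along a breadth-first spanning tree rather than by the depth-first traversal of a minimum spanning tree described above, overlaps that remain inconsistent after switching are simply dropped, and each overlap is given as a hypothetical tuple stating which read precedes the other and in which orientation. The largest strongly connected component is found with Kosaraju's algorithm, and <italic>C</italic> is its share of all vertices.</p>
<preformat preformat-type="code">
```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple

# Overlap (u, v, u_rc, v_rc): read u (reverse-complemented if u_rc) precedes
# read v (reverse-complemented if v_rc). The edge sign is negative when u_rc != v_rc.
Overlap = Tuple[str, str, bool, bool]


def orient_reads(nodes: List[str], overlaps: List[Overlap]) -> Dict[str, bool]:
    """Choose a flip flag per read (True = reverse-complement) along a BFS spanning tree."""
    adj = defaultdict(list)
    for u, v, u_rc, v_rc in overlaps:
        negative = u_rc != v_rc
        adj[u].append((v, negative))
        adj[v].append((u, negative))
    flip: Dict[str, bool] = {}
    for start in nodes:
        if start in flip:
            continue
        flip[start] = False
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, negative in adj[u]:
                if v not in flip:
                    # switching: across a negative edge, v takes the opposite orientation
                    flip[v] = flip[u] != negative
                    queue.append(v)
    return flip


def directed_graph(overlaps: List[Overlap], flip: Dict[str, bool]) -> List[Tuple[str, str]]:
    """Arcs of G; overlaps still inconsistent after switching are dropped."""
    arcs = []
    for u, v, u_rc, v_rc in overlaps:
        if flip[u] == u_rc and flip[v] == v_rc:
            arcs.append((u, v))
        elif flip[u] != u_rc and flip[v] != v_rc:
            arcs.append((v, u))  # reverse-complementing both reads reverses their order
    return arcs


def largest_scc_size(nodes: List[str], arcs: List[Tuple[str, str]]) -> int:
    """Size of the largest strongly connected component (iterative Kosaraju)."""
    fwd, rev = defaultdict(list), defaultdict(list)
    for u, v in arcs:
        fwd[u].append(v)
        rev[v].append(u)
    order, seen = [], set()
    for root in nodes:  # pass 1: record vertices by DFS finishing time
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(fwd[root]))]
        while stack:
            node, children = stack[-1]
            nxt = next((w for w in children if w not in seen), None)
            if nxt is None:
                stack.pop()
                order.append(node)
            else:
                seen.add(nxt)
                stack.append((nxt, iter(fwd[nxt])))
    best, assigned = 0, set()
    for root in reversed(order):  # pass 2: each DFS tree on reversed arcs is one SCC
        if root in assigned:
            continue
        assigned.add(root)
        size, stack = 0, [root]
        while stack:
            node = stack.pop()
            size += 1
            for w in rev[node]:
                if w not in assigned:
                    assigned.add(w)
                    stack.append(w)
        best = max(best, size)
    return best


def connected_component_index(nodes: List[str], overlaps: List[Overlap]) -> float:
    """C = V(G_LSCC) / V(G)."""
    arcs = directed_graph(overlaps, orient_reads(nodes, overlaps))
    return largest_scc_size(nodes, arcs) / len(nodes)


reads = ["r1", "r2", "r3", "r4"]  # r3 is stored as a reverse complement
circle = [("r1", "r2", False, False), ("r2", "r3", False, True),
          ("r3", "r4", True, False), ("r4", "r1", False, False)]
print(connected_component_index(reads, circle))      # 1.0
print(connected_component_index(reads, circle[:3]))  # 0.25
```
</preformat>
<p>Four reads that wrap around a tandem monomer form one directed cycle (<italic>C</italic> = 1.0) even though one read is stored as a reverse complement; removing the closing overlap leaves a linear chain in which every read forms its own component (<italic>C</italic> = 0.25).</p>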
</sec>
<sec id="SEC2-1-3">
<title>Identification of putative satellite repeats</title>
<p>Although the parameter
<italic>C</italic>
facilitates the identification of clusters representing tandemly repeated genomic sequences, it does not efficiently discriminate clusters derived from satellite DNA from those representing other types of tandem repeats. Therefore, an additional cluster characteristic based on the proportion of broken read pairs is calculated. A typical feature of satellite repeats is that they occur in long contiguous arrays of monomers ranging up to megabases in length, whereas other tandem repeats form arrays of hundreds to thousands of bp. Consequently, clusters of satDNA contain low proportions of broken read pairs, because most sequenced DNA fragments consist entirely of the same repeat. In contrast, the proportions of broken pairs are much higher for tandem repeats scattered across the genome in many short arrays, because many sequenced fragments span the junctions between a tandem repeat array and its neighboring genomic sequences. This is evaluated as the pair completeness index
<italic>P</italic>
using the formula:
<disp-formula>
<tex-math id="M3">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} \begin{equation*}P = \frac{N_C}{N_C + N_I}\end{equation*}\end{document}</tex-math>
</disp-formula>
where
<italic>N
<sub>C</sub>
</italic>
is the number of complete read pairs in the cluster
and
<italic>N
<sub>I</sub>
</italic>
is the number of broken pairs. Both criteria,
<italic>C</italic>
and
<italic>P</italic>
, are then used simultaneously to detect putative satellite repeats, which have expected values close to 1 for both. Estimation of the threshold values of
<italic>C</italic>
and
<italic>P</italic>
suitable for sensitive yet reliable identification of putative satellite repeats was performed by re-analyzing 2968 manually annotated clusters from 11 plant species selected from the dataset published by Macas
<italic>et al</italic>
. (
<xref rid="B30" ref-type="bibr">30</xref>
). The estimation was done using discriminant analysis based on a Gaussian finite mixture model (
<xref rid="B37" ref-type="bibr">37</xref>
) as implemented in the R package mclust.</p>
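<p>The pair completeness index can be computed directly from the cluster assignments of mates. In this minimal Python sketch, a hypothetical dictionary maps each read pair to the cluster ids of its two mates, with None marking a mate that was not assigned to any cluster (an assumption for illustration); a pair counts as complete for a cluster when both mates belong to it and as broken when only one does.</p>
<preformat preformat-type="code">
```python
from typing import Dict, Optional, Tuple

# Hypothetical input: pair id -> cluster ids of its two mates (None = mate not clustered)
PairMap = Dict[str, Tuple[Optional[int], Optional[int]]]


def pair_completeness(cluster: int, pair_clusters: PairMap) -> float:
    """P = N_C / (N_C + N_I) for a single cluster."""
    n_complete = n_broken = 0
    for c1, c2 in pair_clusters.values():
        if c1 == cluster and c2 == cluster:
            n_complete += 1  # both mates in this cluster
        elif cluster in (c1, c2):
            n_broken += 1  # only one mate in this cluster
    total = n_complete + n_broken
    return n_complete / total if total else 0.0


pairs: PairMap = {f"s{i}": (7, 7) for i in range(18)}
pairs.update({"s18": (7, 3), "s19": (7, None)})
pairs.update({f"t{i}": (9, 9) for i in range(6)})
pairs.update({f"u{i}": (9, 4) for i in range(14)})
print(pair_completeness(7, pairs))  # 0.9 (mostly complete pairs)
print(pair_completeness(9, pairs))  # 0.3 (mostly broken pairs)
```
</preformat>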
</sec>
<sec id="SEC2-1-4">
<title>Reconstruction of monomer sequences from the most frequent k-mers</title>
<p>Reconstruction of prevailing sequence variants is performed by counting the occurrences of
<italic>k</italic>
-mers in a set of oriented reads obtained from the directed graph
<italic>G</italic>. <italic>k</italic>
-mers with lengths
<italic>k</italic>
= 11–27 are analyzed in parallel. Using oriented reads ensures that each sequence is reconstructed in one orientation only, avoiding redundant reconstruction of its reverse complement. Identified
<italic>k</italic>
-mers are sorted based on their proportions in the analyzed sequence data and the resulting sorted list with the most frequent
<italic>k</italic>
-mers at the top is used in the subsequent analysis. The most frequent
<italic>k</italic>
-mers that together account for 50% of the sequence data are used to construct a de Bruijn graph
<italic>B</italic>
and are removed from the
<italic>k</italic>
-mer list.
<italic>k</italic>
-mer frequencies are represented in the graph as weights of the corresponding vertices. The graph is then checked for the presence of cycles, and the subgraph
<italic>B
<sub>LSCC</sub>
</italic>
with the largest strongly connected component is identified. If there is no strongly connected component or if the sum of vertex weights in
<italic>B
<sub>LSCC</sub>
</italic>
is less than the threshold
<italic>p
<sub>km</sub>
</italic>
, additional
<italic>k</italic>
-mers from the top of the list are iteratively added until the threshold is reached (the optimal value of
<italic>p
<sub>km</sub>
</italic>
was determined empirically and set to 0.225). This process yields graphs with reduced numbers of vertices that still contain cycles corresponding to the prevalent monomers of tandem repeats (Figure
<xref ref-type="fig" rid="F1">1D</xref>
). Variants of monomer sequences are then extracted from the cycles by converting the sequences of
<italic>k</italic>
-mers making up de Bruijn graphs to nucleotide sequences, aligning sequences of the same length and calculating consensus and position probability matrices (PPM) from
<italic>k</italic>
-mer weights (Figure
<xref ref-type="fig" rid="F1">1E</xref>
). To limit the number of cycles used for monomer reconstruction, only the cycle with the highest weight (the sum of weights of all vertices in the cycle) is considered for each graph branch. In the case of length variation among the reconstructed monomer sequences, multiple PPMs are produced and the one with the highest total weight is reported.</p>
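<p>The core idea of the <italic>k</italic>-mer step, namely that a tandem monomer appears as a cycle in a de Bruijn graph of the most frequent <italic>k</italic>-mers, can be illustrated with a deliberately simplified Python sketch. It counts <italic>k</italic>-mers in reads of one orientation, adds <italic>k</italic>-mers in order of decreasing frequency starting from those covering 50% of all occurrences, and after each addition walks from the most frequent <italic>k</italic>-mer along the heaviest successor until the walk returns to its start. The greedy walk and its stopping rule (first closed cycle) are stand-ins for the strongly connected component, <italic>p</italic><sub>km</sub> threshold and PPM steps described above; <italic>k</italic> = 5 and the toy reads are hypothetical choices made for readability, whereas TAREAN uses <italic>k</italic> = 11–27.</p>
<preformat preformat-type="code">
```python
from collections import Counter
from typing import Dict, List, Optional


def count_kmers(reads: List[str], k: int) -> Counter:
    """Occurrences of every k-mer in a set of reads sharing one orientation."""
    return Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))


def greedy_cycle(kmers: Dict[str, int]) -> Optional[str]:
    """Walk de Bruijn successors from the top k-mer; return a monomer if the walk closes."""
    start = max(kmers, key=kmers.get)
    path, seen, current = [start], {start}, start
    while True:
        successors = [current[1:] + b for b in "ACGT" if current[1:] + b in kmers]
        if not successors:
            return None  # dead end: no cycle among the retained k-mers
        current = max(successors, key=kmers.get)  # heaviest successor
        if current == start:
            # each step adds one base, so the first bases of the path spell one monomer
            return "".join(kmer[0] for kmer in path)
        if current in seen:
            return None  # entered a cycle that avoids the start k-mer
        path.append(current)
        seen.add(current)


def reconstruct_monomer(reads: List[str], k: int, start_fraction: float = 0.5) -> Optional[str]:
    """Add k-mers by decreasing frequency, starting at start_fraction of all occurrences."""
    ranked = count_kmers(reads, k).most_common()
    total = sum(n for _, n in ranked)
    kept: Dict[str, int] = {}
    covered = 0
    for kmer, n in ranked:
        kept[kmer] = n
        covered += n
        if covered >= start_fraction * total:
            monomer = greedy_cycle(kept)
            if monomer:
                return monomer
    return None


monomer = "ACGTTGCAAT"   # toy 10 bp monomer
array = monomer * 20      # simulated tandem array
reads = [array[i:i + 30] for i in range(0, 150, 7)]
result = reconstruct_monomer(reads, k=5)
print(result, len(result))  # a rotation of ACGTTGCAAT and its length, 10
```
</preformat>
<p>Because the walk may start anywhere on the cycle, the monomer is returned as one of its rotations. With real reads, sequence variants create branching paths that this greedy walk does not summarize.</p>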
</sec>
<sec id="SEC2-1-5">
<title>Identification of other types of repetitive sequences</title>
<p>Genes coding for 45S and 5S ribosomal RNAs are arranged as multi-copy tandem arrays in eukaryotic genomes and are therefore detected as putative satellites by TAREAN. However, they are recognized by similarity searches against a custom database of Viridiplantae rDNA sequences and reported separately in the program output. LTR-retrotransposons can also produce false-positive results, because the presence of direct terminal repeats causes them to form circular graphs (
<xref rid="B19" ref-type="bibr">19</xref>
). To avoid this misclassification, we search for LTR retrotransposon-specific features in the reconstructed consensus sequences, including the presence of primer binding sites (PBS) complementary to some tRNAs (
<xref rid="B38" ref-type="bibr">38</xref>
) and retrotransposon protein-coding open reading frames longer than 300 bp.</p>
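<p>One of the LTR-retrotransposon checks, the search for open reading frames longer than 300 bp, can be approximated by scanning all six reading frames of a consensus sequence. In this minimal Python sketch an ORF is taken to be any stretch of codons without a stop codon (a start codon is not required), and the consensus is treated as a linear sequence; both are assumptions. The sketch does not test whether an ORF actually encodes retrotransposon proteins, nor does it search for primer binding sites.</p>
<preformat preformat-type="code">
```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def longest_open_frame(seq: str) -> int:
    """Length in bp of the longest stop-free codon stretch across all six frames."""
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            run = 0
            for i in range(frame, len(strand) - 2, 3):
                if strand[i:i + 3] in STOPS:
                    run = 0  # a stop codon ends the current open stretch
                else:
                    run += 3
                    best = max(best, run)
    return best


def has_long_orf(seq: str, min_len: int = 300) -> bool:
    """True if some frame contains an open stretch longer than min_len bp."""
    return longest_open_frame(seq) > min_len


print(has_long_orf("GCT" * 110))  # True (330 bp without stop codons)
print(has_long_orf("GCT" * 90))   # False (270 bp)
```
</preformat>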
</sec>
</sec>
<sec id="SEC2-2">
<title>Implementation</title>
<p>TAREAN is implemented using custom Python and R scripts. Graph analysis was performed using igraph, a software collection for complex network research (
<xref rid="B39" ref-type="bibr">39</xref>
). All scripts and databases are available for download from
<ext-link ext-link-type="uri" xlink:href="http://w3lamc.umbr.cas.cz/lamc/resources.php">http://w3lamc.umbr.cas.cz/lamc/resources.php</ext-link>
. Additionally, TAREAN was implemented in the Galaxy web-based environment (
<xref rid="B40" ref-type="bibr">40</xref>
) and made available as a tool in the public RepeatExplorer server (
<xref rid="B20" ref-type="bibr">20</xref>
) at
<ext-link ext-link-type="uri" xlink:href="http://www.repeatexplorer.org">http://www.repeatexplorer.org</ext-link>
. The analyses presented in this paper were performed on Linux-based servers equipped with 16 GB RAM and 4–16 CPUs.</p>
</sec>
<sec id="SEC2-3">
<title>Pipeline testing and validation of the results</title>
<sec id="SEC2-3-1">
<title>Data</title>
<p>The pipeline was tested using genomic shotgun Illumina reads from five species with previously characterized satellite repeats. The reads were downloaded from the European Nucleotide Archive (
<ext-link ext-link-type="uri" xlink:href="http://www.ebi.ac.uk/ena">http://www.ebi.ac.uk/ena</ext-link>
) under accession numbers ERX379412 (
<italic>Vicia faba</italic>
L.), ERR063464 (
<italic>Pisum sativum</italic>
L.), ERP001569 (
<italic>Luzula elegans</italic>
Lowe), PRJEB9643 (
<italic>Rhynchospora pubera</italic>
(Vahl) Boeckeler) and SRX118541 (
<italic>Zea mays</italic>
L.).</p>
</sec>
<sec id="SEC2-3-2">
<title>Experimental validation of predicted satellite repeats</title>
<p>Reconstructed consensus sequences of satellites predicted by TAREAN in
<italic>Vicia faba</italic>
were used to design oligonucleotide probes for fluorescence
<italic>in situ</italic>
hybridization: Vf_TA11_H2, biotin-5΄-GGT TAC TTC ATC ACT AAG AAA CTA AGT TAA AAG ACT ATT AMT TAA TGA CAC-3΄; FokI_H1, fluorescein-5΄-CTA CCT TCC ATA ATG ACA AGG CTA CCA TCC ATT GGA GTA ACA AAA ATC TC-3΄. The oligo-probes were labeled with biotin or fluorescein at their 5΄ ends during synthesis. Alternatively, PCR primers were designed for amplification and cloning of satellites with longer monomers: Vf_TA39_1, 5΄-AGC ACG AAT AAA ACT AAA GTT C-3΄; Vf_TA39_2, 5΄-TAC TTT TGA AGT GAA ATG GAG-3΄; Vf_TA157_1, 5΄-GGT ATG AGA ATG GTG TAT CTT TTA TCA-3΄; Vf_TA157_2, 5΄-AGA AAA GAT ATT TGG TTT CGA ATG A-3΄. All oligonucleotides were synthesized by Integrated DNA Technologies (Leuven, Belgium). Probe amplification from total genomic DNA of
<italic>V. faba</italic>
and cloning were performed as described in Macas
<italic>et al</italic>
. (
<xref rid="B30" ref-type="bibr">30</xref>
). Probes were labeled with biotin-16-dUTP (Roche Diagnostics GmbH, Mannheim, Germany) or Alexa Fluor 568 (Thermo Fisher Scientific, Waltham, MA, USA) using nick translation (
<xref rid="B41" ref-type="bibr">41</xref>
) and FISH was performed according to Macas
<italic>et al</italic>
. (
<xref rid="B42" ref-type="bibr">42</xref>
). The oligo-probe FokI_H1 specific for FokI satellite (
<xref rid="B43" ref-type="bibr">43</xref>
) was used for simultaneous hybridization (two-color FISH) with the novel repeats to provide characteristic banding patterns allowing the discrimination of all chromosomes within the
<italic>V. faba</italic>
karyotype (
<xref rid="B44" ref-type="bibr">44</xref>
). Chromosomes were counterstained with DAPI and examined using a Nikon Eclipse 600 microscope. Images were captured using a DS-Qi1Mc cooled camera and NIS Elements 3.0 software (Laboratory Imaging, Praha, Czech Republic).</p>
</sec>
</sec>
</sec>
<sec sec-type="results" id="SEC3">
<title>RESULTS</title>
<sec id="SEC3-1">
<title>Major features of the pipeline and estimation of optimal parameters</title>
<p>The TAREAN pipeline takes paired-end NGS reads as input and outputs a list of clusters identified as putative satellite repeats, their genomic abundance and various cluster characteristics. The lengths and nucleotide sequences of reconstructed monomers are also provided and are accompanied by a detailed output from
<italic>k</italic>
-mer-based reconstruction including sequences and sequence logos of alternative variants of monomer sequences. A summary of this information is provided in HTML format and includes a table listing all analyzed clusters (an example of the HTML output is provided as
<xref ref-type="supplementary-material" rid="sup1">Supplementary Data</xref>
). More detailed information about clusters is provided in additional files. When the analysis is performed on a Galaxy server, all generated results can be downloaded as a zip archive. Because read clustering yields thousands of clusters, the search for satellite repeats is limited to a subset of the largest clusters, which correspond to the most abundant genomic repeats. By default, the pipeline analyzes all clusters representing at least 0.01% of the input reads, but this size threshold can be changed to adjust the sensitivity of the analysis. Besides satellite repeats, three other groups of clusters are reported in the output: (i) LTR-retrotransposons, (ii) 45S and 5S rDNA and (iii) all remaining clusters passing the size threshold. Because categories (i) and (ii) contain sequences with circular graphs, their consensus sequences are calculated in the same way as for satellite repeats.</p>
<p>Since two cluster characteristics, the connected component index
<italic>C</italic>
and the pair completeness index
<italic>P</italic>
, are crucial for identification of satellite repeats, we searched for their optimal cutoff values by evaluating a pool of 2968 clusters from 11 species of legume plants. These clusters were manually annotated during our previous study (
<xref rid="B30" ref-type="bibr">30</xref>
) and included 174 satellites; the remaining clusters represented other kinds of genomic repeats. The
<italic>C</italic>
and
<italic>P</italic>
values of these clusters were used as training data for discriminant analysis to find the best model for satellite prediction (Figure
<xref ref-type="fig" rid="F2">2A</xref>
). Clusters identified as satellites according to this model were denoted as
<italic>high-confidence satellites</italic>
. Additionally, we defined less strict criteria of
<italic>P</italic>
> 0.4 and
<italic>C</italic>
> 0.7 to detect less typical satellite sequences, which are reported as
<italic>low-confidence satellites</italic>
. Examples of clusters with different
<italic>P</italic>
and
<italic>C</italic>
values with corresponding graph shapes are shown in Figure
<xref ref-type="fig" rid="F2">2B</xref>
–
<xref ref-type="fig" rid="F2">E</xref>
. In the model-based prediction using discriminant analysis, 143 (82%) of the 174 reference satellite clusters were correctly classified as high-confidence satellites, with a false positive rate of 1.4% (Table
<xref rid="tbl1" ref-type="table">1</xref>
). Applying the low-confidence criteria resulted in the detection of 173 of the 174 reference satellite clusters, but this higher sensitivity came at the cost of an increased false positive rate (18%).</p>
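<p>The two-tier classification described above can be sketched as follows. The low-confidence criteria (<italic>P</italic> > 0.4 and <italic>C</italic> > 0.7) are taken from the text, whereas the linear rule standing in for the fitted discriminant model uses invented coefficients, because the model parameters are not given here. The sketch also assumes that the high-confidence region lies within the less strict low-confidence region, so the low-confidence criteria are checked first.</p>
<preformat preformat-type="code">
```python
from typing import Callable

# Low-confidence criteria stated in the text: P > 0.4 and C > 0.7.
P_MIN_LOW = 0.4
C_MIN_LOW = 0.7


def hypothetical_discriminant(c: float, p: float) -> bool:
    """Placeholder for the fitted discriminant model (coefficients are invented)."""
    return 0.5 * c + 0.5 * p > 0.8


def classify(c: float, p: float,
             high_model: Callable[[float, float], bool] = hypothetical_discriminant) -> str:
    """Return the satellite confidence tier for a cluster with indices C and P.

    Assumes the high-confidence region lies inside the low-confidence region.
    """
    if not (p > P_MIN_LOW and c > C_MIN_LOW):
        return "not satellite"
    if high_model(c, p):
        return "high-confidence"
    return "low-confidence"


print(classify(0.95, 0.90))  # high-confidence
print(classify(0.80, 0.50))  # low-confidence
print(classify(0.30, 0.95))  # not satellite
```
</preformat>
<p>Passing the decision rule as a callable keeps the sketch independent of the particular discriminant model; a model refitted on a different training set could be dropped in without changing the low-confidence criteria.</p>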
<fig id="F2" orientation="portrait" position="float">
<label>Figure 2.</label>
<caption>
<p>Training dataset and examples of cluster graphs. (
<bold>A</bold>
) Scatter plot of
<italic>C</italic>
(connected component index) and
<italic>P</italic>
(pair completeness index) values for all reference clusters. Red dots mark clusters that were manually annotated as satellite repeats. The classification threshold based on the best discriminant analysis model is shown as a blue line and defines the high-confidence satellite group. Green lines mark the empirically selected thresholds for the low-confidence category. (
<bold>B–D</bold>
) Examples of repeat clusters visualized as graphs in which nodes represent sequence reads and edges connect reads sharing sequence similarity. Nodes belonging to the largest strongly connected component of each graph are shown in red; the corresponding
<italic>C</italic>
and
<italic>P</italic>
values are shown below each graph.</p>
</caption>
<graphic xlink:href="gkx257fig2"></graphic>
</fig>
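<p>The largest strongly connected component highlighted in these graphs underlies the connected component index. Assuming, as a simplification, that <italic>C</italic> is the proportion of reads in the largest strongly connected component of a directed cluster graph, and that <italic>P</italic> is the proportion of read pairs represented in a cluster for which both mates are present, the two indices could be computed as below. The read naming scheme (mates share a name ending in /1 or /2), the toy graph and the use of Kosaraju's algorithm are illustrative choices, not a description of the TAREAN implementation.</p>
<preformat preformat-type="code">
```python
from collections import defaultdict


def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm, written iteratively to avoid deep recursion."""
    graph, reverse = defaultdict(list), defaultdict(list)
    for a, b in edges:
        graph[a].append(b)
        reverse[b].append(a)

    # Pass 1: depth-first search, recording nodes in order of completion.
    visited, order = set(), []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, children = stack[-1]
            for child in children:
                if child not in visited:
                    visited.add(child)
                    stack.append((child, iter(graph[child])))
                    break
            else:  # all children explored: the node is finished
                stack.pop()
                order.append(node)

    # Pass 2: search the reversed graph in reverse completion order;
    # each search collects exactly one strongly connected component.
    assigned, components = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, stack = [], [start]
        while stack:
            node = stack.pop()
            component.append(node)
            for parent in reverse[node]:
                if parent not in assigned:
                    assigned.add(parent)
                    stack.append(parent)
        components.append(component)
    return components


def connected_component_index(nodes, edges):
    """C: proportion of reads in the largest strongly connected component."""
    largest = max(len(comp) for comp in strongly_connected_components(nodes, edges))
    return largest / len(nodes)


def pair_completeness_index(read_ids):
    """P: proportion of read pairs in the cluster with both mates present.

    Assumes hypothetical read IDs of the form 'NAME/1' and 'NAME/2'.
    """
    mates = defaultdict(set)
    for read_id in read_ids:
        pair_name, mate = read_id.rsplit("/", 1)
        mates[pair_name].add(mate)
    complete = sum(1 for m in mates.values() if len(m) == 2)
    return complete / len(mates)


# Toy cluster: reads a/1, a/2 and b/1 form a directed cycle of similarity edges.
nodes = ["a/1", "a/2", "b/1", "b/2", "c/1"]
edges = [("a/1", "a/2"), ("a/2", "b/1"), ("b/1", "a/1"),
         ("b/1", "b/2"), ("c/1", "b/2")]
C = connected_component_index(nodes, edges)
P = pair_completeness_index(nodes)
print(round(C, 2), round(P, 2))  # 0.6 0.67
```
</preformat>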
<table-wrap id="tbl1" orientation="portrait" position="float">
<label>Table 1.</label>
<caption>
<title>Performance of the automatic classification model. Confusion matrix comparing the numbers of clusters classified as satellite or non-satellite repeats by the model with the reference manual annotation</title>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" rowspan="1" colspan="1"></th>
<th align="left" rowspan="1" colspan="1"></th>
<th colspan="2" align="center" rowspan="1">Manual annotation (reference)</th>
</tr>
<tr>
<th align="left" rowspan="1" colspan="1"></th>
<th align="left" rowspan="1" colspan="1"></th>
<th align="center" rowspan="1" colspan="1">Non-satellite</th>
<th align="center" rowspan="1" colspan="1">Satellite</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">Automatic (model)</td>
<td align="left" rowspan="1" colspan="1">Non-satellite</td>
<td align="center" rowspan="1" colspan="1">2756</td>
<td align="center" rowspan="1" colspan="1">31</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">Satellite</td>
<td align="center" rowspan="1" colspan="1">38</td>
<td align="center" rowspan="1" colspan="1">143</td>
</tr>
</tbody>
</table>
</table-wrap>
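<p>The sensitivity and false positive rate quoted in the text can be recomputed from Table 1. The calculation below treats the false positive rate as the fraction of the 2794 manually annotated non-satellite clusters that the model called satellites; this definition is an assumption, although it reproduces the reported 1.4%.</p>
<preformat preformat-type="code">
```python
# Confusion matrix from Table 1: keys are (model prediction, manual reference).
TABLE_1 = {
    ("non-satellite", "non-satellite"): 2756,
    ("non-satellite", "satellite"): 31,
    ("satellite", "non-satellite"): 38,
    ("satellite", "satellite"): 143,
}

tp = TABLE_1[("satellite", "satellite")]          # reference satellites found by the model
fn = TABLE_1[("non-satellite", "satellite")]      # reference satellites missed
fp = TABLE_1[("satellite", "non-satellite")]      # other repeats called satellites
tn = TABLE_1[("non-satellite", "non-satellite")]  # other repeats correctly rejected

total = tp + fn + fp + tn
sensitivity = tp / (tp + fn)
false_positive_rate = fp / (fp + tn)

print(total)                         # 2968
print(f"{sensitivity:.1%}")          # 82.2%
print(f"{false_positive_rate:.1%}")  # 1.4%
```
</preformat>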
</sec>
<sec id="SEC3-2">
<title>Testing the pipeline performance using previously characterized satellite repeats</title>
<p>To validate the sensitivity and accuracy of the pipeline, we analyzed NGS data from five plant species in which satellite DNA had previously been characterized experimentally (Table
<xref rid="tbl2" ref-type="table">2</xref>
). The satellites in these species were identified either by restriction digestion-based cloning and/or library screening (
<italic>Z. mays, V. faba</italic>
; (
<xref rid="B43" ref-type="bibr">43</xref>
,
<xref rid="B45" ref-type="bibr">45</xref>
–
<xref rid="B48" ref-type="bibr">48</xref>
)) or with bioinformatic tools followed by verification through cloning, sequencing and FISH analysis (
<italic>Rhynchospora pubera, Pisum sativum, Luzula elegans</italic>
; (
<xref rid="B25" ref-type="bibr">25</xref>
–
<xref rid="B27" ref-type="bibr">27</xref>
,
<xref rid="B42" ref-type="bibr">42</xref>
)). These control species were selected because they carry diverse satellites differing in monomer length, sequence variability, abundance and genomic location. Moreover, these species represent three different types of chromosome organization, including species with monocentric (
<italic>Z. mays, V. faba</italic>
), meta-polycentric (
<italic>P. sativum</italic>
) and holocentric chromosomes (
<italic>L. elegans, R. pubera</italic>
).</p>
<table-wrap id="tbl2" orientation="portrait" position="float">
<label>Table 2.</label>
<caption>
<title>Evaluation of TAREAN performance in species with previously characterized satellites. Successful detection is marked by ‘++’ (high-confidence) or ‘+’ (low-confidence)</title>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">
<italic>Species</italic>
</th>
<th align="left" rowspan="1" colspan="1"></th>
<th align="left" rowspan="1" colspan="1"></th>
<th colspan="3" align="center" rowspan="1">TAREAN 500k</th>
<th colspan="4" align="center" rowspan="1">TAREAN max</th>
<th align="left" rowspan="1" colspan="1"></th>
</tr>
<tr>
<th align="left" rowspan="1" colspan="1">Satellite</th>
<th align="left" rowspan="1" colspan="1"></th>
<th align="left" rowspan="1" colspan="1"></th>
<th align="left" rowspan="1" colspan="1"></th>
<th colspan="2" align="center" rowspan="1">Merge</th>
<th align="center" rowspan="1" colspan="1"></th>
<th colspan="2" align="center" rowspan="1">Merge</th>
<th align="left" rowspan="1" colspan="1"></th>
<th align="left" rowspan="1" colspan="1"></th>
</tr>
<tr>
<th align="left" rowspan="1" colspan="1"></th>
<th align="left" rowspan="1" colspan="1">Monomer [bp]</th>
<th align="left" rowspan="1" colspan="1">Abundance [% genome]</th>
<th align="left" rowspan="1" colspan="1">Coverage</th>
<th align="left" rowspan="1" colspan="1">No</th>
<th align="left" rowspan="1" colspan="1">0.2</th>
<th align="left" rowspan="1" colspan="1">Coverage</th>
<th align="left" rowspan="1" colspan="1">No</th>
<th align="left" rowspan="1" colspan="1">0.2</th>
<th align="left" rowspan="1" colspan="1">Monomer [bp]</th>
<th align="left" rowspan="1" colspan="1">Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td colspan="11" align="left" rowspan="1">
<bold>
<italic>Zea mays</italic>
</bold>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Zea/Tripsacum</td>
<td align="left" rowspan="1" colspan="1">180</td>
<td align="left" rowspan="1" colspan="1">2.10</td>
<td align="left" rowspan="1" colspan="1">5775</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">180</td>
<td align="left" rowspan="1" colspan="1">(
<xref rid="B47" ref-type="bibr">47</xref>
)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">CentC</td>
<td align="left" rowspan="1" colspan="1">156</td>
<td align="left" rowspan="1" colspan="1">0.20</td>
<td align="left" rowspan="1" colspan="1">635</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">156</td>
<td align="left" rowspan="1" colspan="1">(
<xref rid="B45" ref-type="bibr">45</xref>
)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">TR-1 (knobs)</td>
<td align="left" rowspan="1" colspan="1">350/180</td>
<td align="left" rowspan="1" colspan="1">0.11</td>
<td align="left" rowspan="1" colspan="1">156/303</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">359</td>
<td align="left" rowspan="1" colspan="1">(
<xref rid="B46" ref-type="bibr">46</xref>
)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>
<italic>Rhynchospora pubera</italic>
</bold>
</td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">(
<xref rid="B26" ref-type="bibr">26</xref>
)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Tyba-1</td>
<td align="left" rowspan="1" colspan="1">171</td>
<td align="left" rowspan="1" colspan="1">1.80</td>
<td align="left" rowspan="1" colspan="1">5211</td>
<td align="left" rowspan="1" colspan="1">
<bold>+</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>+</bold>
</td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">172</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Tyba-2</td>
<td align="left" rowspan="1" colspan="1">171</td>
<td align="left" rowspan="1" colspan="1">1.16</td>
<td align="left" rowspan="1" colspan="1">3358</td>
<td align="left" rowspan="1" colspan="1">
<bold>+</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>+</bold>
</td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">172</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>
<italic>Luzula elegans</italic>
</bold>
</td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">(
<xref rid="B27" ref-type="bibr">27</xref>
)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">LeSAT4</td>
<td align="left" rowspan="1" colspan="1">190/220/360</td>
<td align="left" rowspan="1" colspan="1">2.40</td>
<td align="left" rowspan="1" colspan="1">6253/3300</td>
<td align="left" rowspan="1" colspan="1">
<bold>+</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">25263/13333</td>
<td align="left" rowspan="1" colspan="1">
<bold>+</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">170</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">LeSAT11</td>
<td align="left" rowspan="1" colspan="1">56</td>
<td align="left" rowspan="1" colspan="1">1.10</td>
<td align="left" rowspan="1" colspan="1">9723</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">39286</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">56</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">LeSAT7</td>
<td align="left" rowspan="1" colspan="1">75</td>
<td align="left" rowspan="1" colspan="1">0.97</td>
<td align="left" rowspan="1" colspan="1">6402</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">25867</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">75</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">LeSAT16</td>
<td align="left" rowspan="1" colspan="1">178/195</td>
<td align="left" rowspan="1" colspan="1">0.82</td>
<td align="left" rowspan="1" colspan="1">2280/2028</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">9213/8410</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>+</bold>
</td>
<td align="left" rowspan="1" colspan="1">177/195</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">LeSAT18</td>
<td align="left" rowspan="1" colspan="1">Variable</td>
<td align="left" rowspan="1" colspan="1">0.55</td>
<td align="left" rowspan="1" colspan="1">n.a.</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">n.a.</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">56</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">LeSAT23</td>
<td align="left" rowspan="1" colspan="1">57</td>
<td align="left" rowspan="1" colspan="1">0.50</td>
<td align="left" rowspan="1" colspan="1">3397/1037</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">17544</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">57</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">LeSAT17</td>
<td align="left" rowspan="1" colspan="1">161</td>
<td align="left" rowspan="1" colspan="1">0.48</td>
<td align="left" rowspan="1" colspan="1">1476</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">5963</td>
<td align="left" rowspan="1" colspan="1">
<bold>+</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">161</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">LeSAT38</td>
<td align="left" rowspan="1" colspan="1">137</td>
<td align="left" rowspan="1" colspan="1">0.37</td>
<td align="left" rowspan="1" colspan="1">1337</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">5401</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">LeSAT25</td>
<td align="left" rowspan="1" colspan="1">6</td>
<td align="left" rowspan="1" colspan="1">0.36</td>
<td align="left" rowspan="1" colspan="1">29700</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">120000</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">SSR</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">LeSAT22</td>
<td align="left" rowspan="1" colspan="1">51/167</td>
<td align="left" rowspan="1" colspan="1">0.35</td>
<td align="left" rowspan="1" colspan="1">1037</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">13725/4192</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">51</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">LeSAT28</td>
<td align="left" rowspan="1" colspan="1">390/730</td>
<td align="left" rowspan="1" colspan="1">0.32</td>
<td align="left" rowspan="1" colspan="1">406/217</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">1730/877</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">392</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">LeSAT43</td>
<td align="left" rowspan="1" colspan="1">190</td>
<td align="left" rowspan="1" colspan="1">0.23</td>
<td align="left" rowspan="1" colspan="1">599</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">2421</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">189</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">LeSAT9 + 21</td>
<td align="left" rowspan="1" colspan="1">43</td>
<td align="left" rowspan="1" colspan="1">0.22</td>
<td align="left" rowspan="1" colspan="1">2533</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">10233</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">43</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">LeSAT63</td>
<td align="left" rowspan="1" colspan="1">90</td>
<td align="left" rowspan="1" colspan="1">0.13</td>
<td align="left" rowspan="1" colspan="1">715</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">2889</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">89</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">LeSAT72</td>
<td align="left" rowspan="1" colspan="1">4</td>
<td align="left" rowspan="1" colspan="1">0.13</td>
<td align="left" rowspan="1" colspan="1">16088</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">65000</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">SSR</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">LeSAT36</td>
<td align="left" rowspan="1" colspan="1">6</td>
<td align="left" rowspan="1" colspan="1">0.12</td>
<td align="left" rowspan="1" colspan="1">9900</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">40000</td>
<td align="left" rowspan="1" colspan="1">
<bold>+</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>+</bold>
</td>
<td align="left" rowspan="1" colspan="1">24</td>
<td align="left" rowspan="1" colspan="1">SSR</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">LeSAT99</td>
<td align="left" rowspan="1" colspan="1">180</td>
<td align="left" rowspan="1" colspan="1">0.11</td>
<td align="left" rowspan="1" colspan="1">303</td>
<td align="left" rowspan="1" colspan="1">
<bold>+</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>+</bold>
</td>
<td align="left" rowspan="1" colspan="1">1222</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">180</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">LeSAT109</td>
<td align="left" rowspan="1" colspan="1">33</td>
<td align="left" rowspan="1" colspan="1">0.08</td>
<td align="left" rowspan="1" colspan="1">1245</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">5030</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">33</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">LeSAT89</td>
<td align="left" rowspan="1" colspan="1">41</td>
<td align="left" rowspan="1" colspan="1">0.06</td>
<td align="left" rowspan="1" colspan="1">724</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">2927</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">LeSAT27</td>
<td align="left" rowspan="1" colspan="1">42</td>
<td align="left" rowspan="1" colspan="1">0.06</td>
<td align="left" rowspan="1" colspan="1">672</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">2714</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">84</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">
<bold>
<italic>Pisum sativum</italic>
</bold>
</td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">(
<xref rid="B25" ref-type="bibr">25</xref>
,
<xref rid="B42" ref-type="bibr">42</xref>
)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">PisTR-B</td>
<td align="left" rowspan="1" colspan="1">50</td>
<td align="left" rowspan="1" colspan="1">1.37</td>
<td align="left" rowspan="1" colspan="1">13740</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">39516</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">50</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">TR-5</td>
<td align="left" rowspan="1" colspan="1">54</td>
<td align="left" rowspan="1" colspan="1">0.51</td>
<td align="left" rowspan="1" colspan="1">4731</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">13608</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">54</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">TR-2</td>
<td align="left" rowspan="1" colspan="1">440</td>
<td align="left" rowspan="1" colspan="1">0.21</td>
<td align="left" rowspan="1" colspan="1">235</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">677</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">TR-4</td>
<td align="left" rowspan="1" colspan="1">172</td>
<td align="left" rowspan="1" colspan="1">0.20</td>
<td align="left" rowspan="1" colspan="1">581</td>
<td align="left" rowspan="1" colspan="1">
<bold>+</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>+</bold>
</td>
<td align="left" rowspan="1" colspan="1">1672</td>
<td align="left" rowspan="1" colspan="1">
<bold>+</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>+</bold>
</td>
<td align="left" rowspan="1" colspan="1">173</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">TR-7</td>
<td align="left" rowspan="1" colspan="1">164</td>
<td align="left" rowspan="1" colspan="1">0.14</td>
<td align="left" rowspan="1" colspan="1">412</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">1184</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">164</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">TR-11</td>
<td align="left" rowspan="1" colspan="1">510</td>
<td align="left" rowspan="1" colspan="1">0.10</td>
<td align="left" rowspan="1" colspan="1">101</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">290</td>
<td align="left" rowspan="1" colspan="1">
<bold>+</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>+</bold>
</td>
<td align="left" rowspan="1" colspan="1">459</td>
<td align="left" rowspan="1" colspan="1">Low coverage</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">TR-3</td>
<td align="left" rowspan="1" colspan="1">81</td>
<td align="left" rowspan="1" colspan="1">0.06</td>
<td align="left" rowspan="1" colspan="1">358</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">1030</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">82</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">TR-19</td>
<td align="left" rowspan="1" colspan="1">2094</td>
<td align="left" rowspan="1" colspan="1">0.03</td>
<td align="left" rowspan="1" colspan="1">8</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">23</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">Low coverage</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">TR-1</td>
<td align="left" rowspan="1" colspan="1">867</td>
<td align="left" rowspan="1" colspan="1">0.02</td>
<td align="left" rowspan="1" colspan="1">12</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">35</td>
<td align="left" rowspan="1" colspan="1">
<bold>+</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>+</bold>
</td>
<td align="left" rowspan="1" colspan="1">866</td>
<td align="left" rowspan="1" colspan="1">Low coverage</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">TR-18</td>
<td align="left" rowspan="1" colspan="1">1644</td>
<td align="left" rowspan="1" colspan="1">0.01</td>
<td align="left" rowspan="1" colspan="1">4</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">11</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">Low coverage</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">TR-17</td>
<td align="left" rowspan="1" colspan="1">191</td>
<td align="left" rowspan="1" colspan="1">0.01</td>
<td align="left" rowspan="1" colspan="1">31</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">90</td>
<td align="left" rowspan="1" colspan="1">
<bold>+</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>+</bold>
</td>
<td align="left" rowspan="1" colspan="1">191</td>
<td align="left" rowspan="1" colspan="1">Low coverage</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">TR-6</td>
<td align="left" rowspan="1" colspan="1">245</td>
<td align="left" rowspan="1" colspan="1">0.01</td>
<td align="left" rowspan="1" colspan="1">22</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">65</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">Low coverage</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">TR-10</td>
<td align="left" rowspan="1" colspan="1">659</td>
<td align="left" rowspan="1" colspan="1">0.01</td>
<td align="left" rowspan="1" colspan="1">8</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">22</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold></bold>
</td>
<td align="left" rowspan="1" colspan="1"></td>
<td align="left" rowspan="1" colspan="1">Low coverage</td>
</tr>
<tr>
<td colspan="11" align="left" rowspan="1">
<bold>
<italic>Vicia faba</italic>
</bold>
</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">FokI</td>
<td align="left" rowspan="1" colspan="1">59</td>
<td align="left" rowspan="1" colspan="1">3.20</td>
<td align="left" rowspan="1" colspan="1">26847</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">53695</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>++</bold>
</td>
<td align="left" rowspan="1" colspan="1">59</td>
<td align="left" rowspan="1" colspan="1">(
<xref rid="B43" ref-type="bibr">43</xref>
)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">pVf7</td>
<td align="left" rowspan="1" colspan="1">168</td>
<td align="left" rowspan="1" colspan="1">0.33</td>
<td align="left" rowspan="1" colspan="1">972</td>
<td align="left" rowspan="1" colspan="1">
<bold>+</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>+</bold>
</td>
<td align="left" rowspan="1" colspan="1">1945</td>
<td align="left" rowspan="1" colspan="1">
<bold>+</bold>
</td>
<td align="left" rowspan="1" colspan="1">
<bold>+</bold>
</td>
<td align="left" rowspan="1" colspan="1">169</td>
<td align="left" rowspan="1" colspan="1">(
<xref rid="B48" ref-type="bibr">48</xref>
)</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="T2TFN1">
<p>The columns show (from left to right) the previously reported monomer sizes and genome abundances of the reference satellite repeats, and the results of their detection by TAREAN with 500 000 reads (‘TAREAN 500k’) and with the maximal number of reads that could be analyzed (‘TAREAN max’). The last ‘Monomer’ column gives the lengths of the consensus monomer sequences reconstructed by TAREAN. Multiple values reflect several repeat variants differing in monomer length.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>TAREAN runs were performed with 500 000 input reads, which should provide sufficient sensitivity for abundant satellites while keeping the computation time on the order of hours. Two analysis conditions were tested: the cluster-merging option was either disabled or enabled with a cutoff value of 0.2. This analysis successfully detected the highly amplified satellites previously reported for
<italic>Z. mays, V. faba</italic>
and
<italic>R. pubera</italic>
(Table
<xref rid="tbl2" ref-type="table">2</xref>
). In
<italic>R. pubera</italic>
, both previously characterized subfamilies of the same satellite (Tyba-1 and Tyba-2) sharing ∼70% similarity (
<xref rid="B26" ref-type="bibr">26</xref>
) were detected and distinguished. In the other two species,
<italic>Luzula elegans</italic>
and
<italic>Pisum sativum</italic>
, the analysis identified all highly abundant satellites with genome proportions exceeding 0.5%, but failed to detect some less-amplified satellites with estimated genomic proportions between 0.01 and 0.50% (Table
<xref rid="tbl2" ref-type="table">2</xref>
). Thus, additional runs were performed with 2 million reads for
<italic>L. elegans</italic>
and 1.44 million for
<italic>P. sativum</italic>
, representing the maximal numbers of reads that could be processed on the given hardware configuration (the read numbers differ because they depend on the number of similarity hits between reads, which reflects the different proportions of repeats in each species). Although processing more reads improved the detection of four satellite repeats, 9 of the 33 control satellites in these species remained unidentified. Investigating the properties of these repeats explained these results and clarified the sensitivity limits of TAREAN analysis, which were determined mostly by sequencing coverage, sequence homogeneity of monomers, genomic organization and similarity of satellites to other genomic repeats.</p>
<p>The sequencing coverage of a satellite was calculated as the total length of reads covering its sequences divided by its monomer length. Coverage thus normalizes the genomic abundance (%) values for monomer length: satellites with the same genome proportion but different monomer lengths have different coverages. For example, the
<italic>P. sativum</italic>
satellites TR-17 and TR-18 both had genome proportions of 0.01%, but the former had a higher coverage owing to its shorter monomer. This group of less-amplified satellites (labeled ‘Low coverage’ in Table
<xref rid="tbl2" ref-type="table">2</xref>
) revealed that the sensitivity limit of the analysis lay in the coverage range of 30–100×, probably depending on the sequence homogeneity of individual satellites. Thus, the failure to detect the
<italic>P. sativum</italic>
satellites TR-18, TR-6, TR-10 and TR-19 could be explained by their low abundance. Moreover, the detection of TR-19 was also hampered because this repeat is a longer monomer variant of TR-11, so the two repeats occurred in the same cluster. Since TR-19 has a lower genomic copy number than TR-11 (
<xref rid="B25" ref-type="bibr">25</xref>
), its monomer was not reconstructed and reported.</p>
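<p>The coverage definition above can be sketched as follows. This is a minimal illustration, not part of the TAREAN code: the helper names, the assumed read length of 100 nt and the two hypothetical satellites (each 0.01% of the genome, with 180-bp and 720-bp monomers) are illustrative assumptions, and detection_outlook() is only a coarse reading of the 30–100× range reported above.</p>
<preformat preformat-type="code">
```python
"""Sketch: sequencing coverage of a satellite repeat.

Coverage is defined as in the text: the total length of reads derived from
the satellite divided by its monomer length. expected_coverage() and
detection_outlook() are hypothetical helpers added for illustration.
"""


def satellite_coverage(read_lengths, monomer_length):
    """Coverage (x) from the lengths of reads assigned to one satellite."""
    if not monomer_length > 0:
        raise ValueError("monomer_length must be positive")
    return sum(read_lengths) / monomer_length


def expected_coverage(n_reads, read_length, genome_proportion, monomer_length):
    """Expected coverage, assuming reads sample the genome uniformly.

    genome_proportion is a fraction (0.01% -> 0.0001).
    """
    satellite_bases = n_reads * read_length * genome_proportion
    return satellite_bases / monomer_length


def detection_outlook(coverage):
    """Coarse reading of the 30-100x sensitivity range reported in the text."""
    if 30 > coverage:
        return "likely below detection limit"
    if 100 > coverage:
        return "borderline; depends on monomer homogeneity"
    return "likely detectable"


if __name__ == "__main__":
    # Two hypothetical satellites, both 0.01% of the genome; 100-nt reads assumed.
    for name, monomer in [("sat_A", 180), ("sat_B", 720)]:
        for n_reads in (500_000, 2_000_000):
            cov = expected_coverage(n_reads, 100, 0.0001, monomer)
            print(f"{name} ({monomer} bp), {n_reads} reads: {cov:.1f}x ({detection_outlook(cov)})")
```
</preformat>
<p>Under these assumptions, a four-fold longer monomer requires four times as many reads to reach the same coverage, which is why genome proportion alone does not determine whether a satellite can be detected.</p>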
<p>It has been reported that some satellite repeats originated through the amplification of short tandem arrays present in other genomic repeats such as retrotransposons (
<xref rid="B49" ref-type="bibr">49</xref>
). Such satellites may be difficult for TAREAN to detect because their cluster graphs can contain substantial non-circular components representing the neighboring regions of the repeats from which they originated. This was the case in the
<italic>L. elegans</italic>
satellite, LeSAT38, and in TR-2 in
<italic>P. sativum</italic>
. Another group of undetected satellites comprised three
<italic>L. elegans</italic>
repeats with extremely short monomers corresponding to simple sequence repeats of 4 bp (LeSAT72) or 6 bp (LeSAT25 and LeSAT36). These repeats did not form clusters because low-complexity regions are masked during sequence similarity searches. A partial exception was LeSAT36, in which the basic 6-bp motif also occurred as a mutated higher-order repeat of 24 bp that was reported by TAREAN (Table
<xref rid="tbl2" ref-type="table">2</xref>
).</p>
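<p>Why a mutated 24-bp higher-order repeat is reported with a 24-bp rather than a 6-bp period can be illustrated by the smallest exact period of a sequence, computed here with the Knuth–Morris–Pratt prefix function. This is a sketch under simplifying assumptions: it handles only exact periodicity, it is not the procedure used by TAREAN, and the motif and the 24-bp unit are hypothetical.</p>
<preformat preformat-type="code">
```python
"""Sketch: smallest exact period of a tandem array (KMP prefix function).

Illustration only: it handles exact repeats, whereas real satellites are
approximate, and it is not the procedure used by TAREAN. The motif and
the mutated 24-bp unit below are hypothetical.
"""


def prefix_function(seq):
    """pi[i] = length of the longest proper border of seq[:i + 1]."""
    pi = [0] * len(seq)
    for i in range(1, len(seq)):
        k = pi[i - 1]
        while k > 0 and seq[i] != seq[k]:
            k = pi[k - 1]
        if seq[i] == seq[k]:
            k += 1
        pi[i] = k
    return pi


def minimal_period(seq):
    """Smallest p such that seq[i] == seq[i + p] for every valid i."""
    if not seq:
        return 0
    return len(seq) - prefix_function(seq)[-1]


if __name__ == "__main__":
    motif = "AACCCT"                    # hypothetical 6-bp SSR motif
    unit = motif * 3 + "AACGCT"         # 24-bp unit; 4th copy has one substitution
    print(minimal_period(motif * 20))   # perfect SSR array: prints 6
    print(minimal_period(unit * 5))     # mutated higher-order array: prints 24
```
</preformat>
<p>A single substitution in one of the four motif copies is enough to make the whole 24-bp unit, not the 6-bp motif, the shortest exactly repeating element.</p>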
</sec>
<sec id="SEC3-3">
<title>Identification and verification of novel satellite repeats</title>
<p>In addition to the previously described repeats, TAREAN reported putative novel satellites for some of the analyzed species. The highest number, 12 novel satellites, was identified for
<italic>V. faba</italic>
in the run analyzing the maximum of 990 000 reads (Table
<xref rid="tbl3" ref-type="table">3</xref>
and
<xref ref-type="supplementary-material" rid="sup1">Supplementary Data</xref>
). One of them, Vf_TA70, was partially similar to the VicTR-B satellite described in several other
<italic>Vicia</italic>
species (
<xref rid="B8" ref-type="bibr">8</xref>
). No significant similarities were found for the other novel satellites.</p>
<table-wrap id="tbl3" orientation="portrait" position="float">
<label>Table 3.</label>
<caption>
<title>Putative novel satellite repeats identified in
<italic>Vicia faba</italic>
</title>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left" rowspan="1" colspan="1">Satellite</th>
<th align="left" rowspan="1" colspan="1">Monomer [bp]</th>
<th align="left" rowspan="1" colspan="1">Genome proportion [%]</th>
<th align="left" rowspan="1" colspan="1">Copy number/1C</th>
<th align="left" rowspan="1" colspan="1">Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="1" colspan="1">Vf_TA_11</td>
<td align="left" rowspan="1" colspan="1">191</td>
<td align="left" rowspan="1" colspan="1">1.20</td>
<td align="left" rowspan="1" colspan="1">843 000</td>
<td align="left" rowspan="1" colspan="1">Verified by FISH (Figure
<xref ref-type="fig" rid="F3">3A</xref>
)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Vf_TA_39</td>
<td align="left" rowspan="1" colspan="1">702</td>
<td align="left" rowspan="1" colspan="1">0.29</td>
<td align="left" rowspan="1" colspan="1">55 400</td>
<td align="left" rowspan="1" colspan="1">Verified by FISH (Figure
<xref ref-type="fig" rid="F3">3B</xref>
)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Vf_TA_62</td>
<td align="left" rowspan="1" colspan="1">687</td>
<td align="left" rowspan="1" colspan="1">0.15</td>
<td align="left" rowspan="1" colspan="1">29 300</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Vf_TA_70</td>
<td align="left" rowspan="1" colspan="1">38</td>
<td align="left" rowspan="1" colspan="1">0.12</td>
<td align="left" rowspan="1" colspan="1">423 000</td>
<td align="left" rowspan="1" colspan="1">Similar to VicTR-B</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Vf_TA_108</td>
<td align="left" rowspan="1" colspan="1">1482</td>
<td align="left" rowspan="1" colspan="1">0.05</td>
<td align="left" rowspan="1" colspan="1">4500</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Vf_TA_109</td>
<td align="left" rowspan="1" colspan="1">870</td>
<td align="left" rowspan="1" colspan="1">0.05</td>
<td align="left" rowspan="1" colspan="1">7700</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Vf_TA_123</td>
<td align="left" rowspan="1" colspan="1">878</td>
<td align="left" rowspan="1" colspan="1">0.03</td>
<td align="left" rowspan="1" colspan="1">4600</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Vf_TA_137</td>
<td align="left" rowspan="1" colspan="1">603</td>
<td align="left" rowspan="1" colspan="1">0.03</td>
<td align="left" rowspan="1" colspan="1">6700</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Vf_TA_143</td>
<td align="left" rowspan="1" colspan="1">352</td>
<td align="left" rowspan="1" colspan="1">0.02</td>
<td align="left" rowspan="1" colspan="1">7600</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Vf_TA_154</td>
<td align="left" rowspan="1" colspan="1">560</td>
<td align="left" rowspan="1" colspan="1">0.02</td>
<td align="left" rowspan="1" colspan="1">4800</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Vf_TA_157</td>
<td align="left" rowspan="1" colspan="1">781</td>
<td align="left" rowspan="1" colspan="1">0.02</td>
<td align="left" rowspan="1" colspan="1">3400</td>
<td align="left" rowspan="1" colspan="1">Verified by FISH (Figure
<xref ref-type="fig" rid="F3">3C</xref>
)</td>
</tr>
<tr>
<td align="left" rowspan="1" colspan="1">Vf_TA_158</td>
<td align="left" rowspan="1" colspan="1">313</td>
<td align="left" rowspan="1" colspan="1">0.02</td>
<td align="left" rowspan="1" colspan="1">8600</td>
<td align="left" rowspan="1" colspan="1"></td>
</tr>
</tbody>
</table>
</table-wrap>
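<p>The copy numbers in Table 3 are consistent with a simple conversion from genome proportion and monomer length, sketched below. The 1C genome size of 13.4 Gb for <italic>V. faba</italic> is an assumption (it is not given in this section), chosen because it approximately reproduces the tabulated values.</p>
<preformat preformat-type="code">
```python
"""Sketch: converting genome proportion to copy number per haploid (1C) genome.

copies = (genome proportion) x (1C genome size) / (monomer length)

The 1C genome size of 13.4 Gb for V. faba is an assumption; it is not
stated in this section, but it approximately reproduces Table 3.
"""

GENOME_SIZE_1C_BP = 13_400_000_000  # assumed 1C size of Vicia faba (bp)


def copies_per_1c(proportion_percent, monomer_bp, genome_size_bp=GENOME_SIZE_1C_BP):
    """Monomer copies per 1C genome for a satellite of the given abundance."""
    return proportion_percent / 100 * genome_size_bp / monomer_bp


if __name__ == "__main__":
    # (name, genome proportion [%], monomer [bp]) taken from Table 3
    rows = [("Vf_TA_11", 1.20, 191), ("Vf_TA_39", 0.29, 702),
            ("Vf_TA_70", 0.12, 38), ("Vf_TA_157", 0.02, 781)]
    for name, percent, monomer in rows:
        print(f"{name}: ~{copies_per_1c(percent, monomer):,.0f} copies/1C")
```
</preformat>
<p>With these inputs the sketch yields about 842 000, 55 400, 423 000 and 3400 copies for the four listed satellites, matching Table 3 to within the rounding of the genome proportions.</p>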
<p>Three of the novel repeats, Vf_TA11, Vf_TA39 and Vf_TA157, which differ in monomer length and genomic abundance (Table
<xref rid="tbl3" ref-type="table">3</xref>
) were chosen for experimental validation by detecting their hybridization patterns on
<italic>V. faba</italic>
metaphase chromosomes using FISH. For all three repeats, band- or dot-like signal patterns typical of satellite DNA were detected. The repeat Vf_TA11 represented a highly amplified satellite with a 191 bp monomer, estimated to occur in 843 000 copies per haploid genome. The oligonucleotide probe (49 nt), designed from the most conserved part of its reconstructed monomer sequence, produced a strong signal on the satellite arm of metacentric chromosome 1 and a number of minor signals near the centromeres of all acrocentric chromosomes (Figure
<xref ref-type="fig" rid="F3">3A</xref>
). Because the monomer sequences predicted for the other two satellites (701 and 782 bp) were too long to be covered by oligonucleotide probes, FISH was performed using cloned PCR-amplified genomic fragments. In both cases, PCR with primers designed according to the reconstructed monomer sequences yielded specific bands of the expected lengths, and their sequences confirmed the predictions made by TAREAN (95.5% and 93.6% similarity to the reconstructed consensus of Vf_TA39 and Vf_TA157, respectively). The FISH pattern of Vf_TA39 consisted of multiple intercalary bands on most chromosomes, in agreement with the relatively high abundance (55 400 copies/1C) of this repeat in the genome (Figure
<xref ref-type="fig" rid="F3">3B</xref>
). As expected, the much less amplified satellite Vf_TA157 (3400 copies/1C) produced weaker labeling, limited to a single locus on chromosome 3 (Figure
<xref ref-type="fig" rid="F3">3C</xref>
).</p>
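<p>The text states only that the 49-nt probe was designed from the most conserved part of the reconstructed monomer. One way to locate such a region, sketched here under our own assumptions, is to score each column of a gapless alignment of monomer copies by the count of its majority base and to pick the window with the highest total. The scoring scheme and the toy alignment are hypothetical and are not the probe-design procedure actually used.</p>
<preformat preformat-type="code">
```python
"""Sketch: choosing the most conserved window of a monomer for a probe.

Assumed scoring: count of the majority base per alignment column, summed
over a sliding window. The toy alignment is hypothetical.
"""
from collections import Counter


def column_scores(aligned):
    """Count of the most frequent base in each column of a gapless alignment."""
    return [Counter(column).most_common(1)[0][1] for column in zip(*aligned)]


def consensus(aligned):
    """Majority base per column (ties resolved by first occurrence)."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


def most_conserved_window(aligned, width):
    """Return (start, fraction of bases matching the majority) of the best window."""
    scores = column_scores(aligned)
    if width > len(scores):
        raise ValueError("window longer than monomer")
    window = sum(scores[:width])
    best_start, best_sum = 0, window
    for start in range(1, len(scores) - width + 1):
        # slide by one column: add the entering column, drop the leaving one
        window += scores[start + width - 1] - scores[start - 1]
        if window > best_sum:
            best_start, best_sum = start, window
    return best_start, best_sum / (width * len(aligned))


if __name__ == "__main__":
    copies = ["AAGTTCGAAT", "CAGTTCGTAG", "GTGTTCGCAC"]  # toy monomer copies
    start, identity = most_conserved_window(copies, width=4)  # 49 for a real probe
    print(start, consensus(copies)[start:start + 4], identity)  # prints 2 GTTC 1.0
```
</preformat>
<p>Summing integer counts rather than fractions keeps the window comparison exact, so ties are broken deterministically in favor of the leftmost window.</p>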
<fig id="F3" orientation="portrait" position="float">
<label>Figure 3.</label>
<caption>
<p>FISH localization of novel satellite repeats on metaphase chromosomes of
<italic>Vicia faba</italic>
. The probes for the novel satellites, Vf_TA11 (panel
<bold>A</bold>
), Vf_TA39 (panel
<bold>B</bold>
) and Vf_TA157 (panel
<bold>C</bold>
), are green, FokI repeats used for chromosome discrimination are labeled in red, and chromosomes counterstained with DAPI are blue.</p>
</caption>
<graphic xlink:href="gkx257fig3"></graphic>
</fig>
</sec>
</sec>
<sec sec-type="discussion" id="SEC4">
<title>DISCUSSION</title>
<p>In this work, we have introduced and validated TAREAN, a computational pipeline for the automated identification of satellite repeats from unassembled NGS reads. Although there are a number of computational tools available for detecting tandem repeats in assembled genomic sequences (
<xref rid="B50" ref-type="bibr">50</xref>
,
<xref rid="B51" ref-type="bibr">51</xref>
), corresponding tools that use short sequence reads are scarce. To the best of our knowledge, only two algorithms, DExTaR and MixTaR (
<xref rid="B52" ref-type="bibr">52</xref>
,
<xref rid="B53" ref-type="bibr">53</xref>
), have been published to address this problem. DExTaR detects tandem repeats in de Bruijn graphs constructed for genome assembly: it searches the parts of the graphs omitted from the assembly for cycles, which represent potential tandem repeats. The method requires a prior global assembly by a de Bruijn assembler such as ABySS (
<xref rid="B54" ref-type="bibr">54</xref>
) and is limited to the identification of exact tandem repeats. MixTaR is an improved approach that allows the detection of approximate tandem repeats; however, it requires long PacBio reads in addition to short Illumina reads. Moreover, the algorithm was tested only on repeats with monomers of up to 100 bp, whereas most satellite DNA families have longer monomers (
<xref rid="B1" ref-type="bibr">1</xref>
).</p>
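<p>The principle exploited by DExTaR, that an exact tandem repeat forms a cycle in a de Bruijn graph, can be illustrated with a small sketch. It is not DExTaR's implementation; the monomer, k-mer size and read sampling are hypothetical, and only exact repeats whose (k-1)-mers are unique within the monomer yield a clean cycle whose length equals the monomer length.</p>
<preformat preformat-type="code">
```python
"""Sketch: a tandem repeat shows up as a cycle in a de Bruijn graph.

Illustrates the principle described in the text (exact tandem repeats
appear as cycles); it is not DExTaR's implementation. The monomer and
read sampling are hypothetical.
"""
from collections import defaultdict


def de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each k-mer in a read adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def simple_cycle_length(graph, start):
    """Length of the unbranched cycle through start, or None if there is none."""
    node, steps = start, 0
    while True:
        successors = graph.get(node, set())
        if len(successors) != 1:
            return None            # dead end or branch point
        node = next(iter(successors))
        steps += 1
        if node == start:
            return steps
        if steps > len(graph):
            return None            # entered a cycle that avoids start


if __name__ == "__main__":
    monomer = "ACGGTCATTG"                    # hypothetical 10-bp monomer
    array = monomer * 6
    reads = [array[i:i + 20] for i in range(0, len(array) - 19, 3)]
    print(simple_cycle_length(de_bruijn(reads, k=5), "ACGG"))               # prints 10
    print(simple_cycle_length(de_bruijn(["ACGGTCATTGAAC"], k=5), "ACGG"))   # prints None
```
</preformat>
<p>Reads from the tandem array close a cycle of length 10, the monomer length, whereas a non-repetitive read produces a path that simply ends.</p>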
<p>In our previous work, we demonstrated that an alternative approach based on graph representations of repeat populations in eukaryotic genomes can be utilized for the identification of satellite repeats (
<xref rid="B19" ref-type="bibr">19</xref>
,
<xref rid="B29" ref-type="bibr">29</xref>
,
<xref rid="B31" ref-type="bibr">31</xref>
). This method, employing the RepeatExplorer pipeline (
<xref rid="B20" ref-type="bibr">20</xref>
) for similarity-based repeat clustering and graph visualization, allows the identification of approximate tandem repeats of any length, provided they are sufficiently represented in the analyzed short reads to form recognizable circular structures in their cluster graphs. Consequently, satellite repeats with various degrees of sequence conservation and monomer lengths of up to 5 kb can be identified (
<xref rid="B25" ref-type="bibr">25</xref>
,
<xref rid="B30" ref-type="bibr">30</xref>
,
<xref rid="B55" ref-type="bibr">55</xref>
). Recently, a modification of this approach that employs iterative clustering to improve its sensitivity to low-copy tandem repeats was published by Ruiz-Ruano
<italic>et al</italic>
. (
<xref rid="B28" ref-type="bibr">28</xref>
). Nevertheless, both setups require human intervention to examine graph shapes, and the former does not provide consensus sequences of the identified satellite repeats; both limitations are fully addressed in TAREAN.</p>
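<p>Similarity-based clustering of reads can be sketched, in strongly simplified form, as finding the connected components of a graph whose nodes are reads and whose edges are pairwise similarity hits. This is an illustration under that assumption rather than the RepeatExplorer procedure, which is more elaborate; the read identifiers and hits are hypothetical.</p>
<preformat preformat-type="code">
```python
"""Sketch: grouping reads into clusters from pairwise similarity hits.

Simplified stand-in for similarity-based repeat clustering: clusters are
taken as connected components of the read similarity graph (union-find).
Read IDs and hits are hypothetical.
"""


def cluster_reads(read_ids, hits):
    """Return clusters (lists of read IDs), largest first."""
    parent = {read: read for read in read_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a         # merge the two components

    clusters = {}
    for read in read_ids:
        clusters.setdefault(find(read), []).append(read)
    return sorted(clusters.values(), key=len, reverse=True)


if __name__ == "__main__":
    reads = ["r1", "r2", "r3", "r4", "r5", "r6"]
    hits = [("r1", "r2"), ("r2", "r3"), ("r4", "r5")]
    print(cluster_reads(reads, hits))  # prints [['r1', 'r2', 'r3'], ['r4', 'r5'], ['r6']]
```
</preformat>
<p>In this simplified view, the number of hits, not the number of reads, drives memory use, which is consistent with the limiting factor discussed below.</p>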
<p>Testing TAREAN performance using short NGS reads from five control species revealed its excellent efficiency in detecting highly abundant satellite repeats and very good performance in identifying less-amplified satellites. In addition to identifying the repeats, TAREAN accurately reconstructed consensus monomer sequences in most cases. On the other hand, a fraction of previously described repeat families was not identified in the test runs. When evaluating TAREAN performance, it should be acknowledged that the tool was specifically designed to detect genuine satellite repeats, a category of tandemly repeated sequences characterized by long contiguous arrays of highly homogenized monomer sequences. However, this category is not always clearly separated from other genomic tandem repeats. For example, some satellite repeats originate through the amplification of short tandem repeat arrays present in mobile elements. Thus, the same monomer sequences occur in the genome as short, dispersed tandem arrays as well as in a few long arrays typical of satellite repeats (
<xref rid="B49" ref-type="bibr">49</xref>
). Such repeats then produce cluster graphs with intermediate features combining circular and linear structures, thus hampering their identification. Satellite repeats derived from a large intergenic spacer (IGS) of 45S rDNA represent a similar case, being present as short arrays within IGS and as amplified satellites elsewhere in the genome (
<xref rid="B48" ref-type="bibr">48</xref>
,
<xref rid="B56" ref-type="bibr">56</xref>
). The repeat pVf7 from
<italic>Vicia faba</italic>
(
<xref rid="B48" ref-type="bibr">48</xref>
) represented IGS-derived satellites in our data; although it was successfully identified by TAREAN, it was only listed in the low-confidence category (Table
<xref rid="tbl2" ref-type="table">2</xref>
) due to its complex graph structure. Another example of a satellite with intermediate features that was reported with lower confidence was the Tyba satellite of
<italic>Rhynchospora pubera</italic>
(Table
<xref rid="tbl2" ref-type="table">2</xref>
). Tyba is the satellite associated with centromeric chromatin, which is dispersed along holocentric
<italic>R. pubera</italic>
chromosomes. Thus, Tyba is organized in multiple arrays that are at most tens of kilobases long, as revealed by FISH and sequence analysis of BAC clones (
<xref rid="B26" ref-type="bibr">26</xref>
).</p>
<p>Regarding the ability to detect less abundant satellite repeats, there is no simple rule that could be used to determine a TAREAN sensitivity threshold. This is because the successful identification of a particular repeat depends on multiple factors, including its copy number in the genome, sequence variability, genomic organization and number of reads that were analyzed. In principle, increasing the number of analyzed reads results in more efficient detection of less amplified satellites (Table
<xref rid="tbl2" ref-type="table">2</xref>
), but the genome sequencing coverage should not exceed 0.5–1.0× to avoid similarity hits between single-copy sequences during clustering analysis. However, in species with large, repeat-rich genomes, such coverage may be hard to reach because of limited computational resources. The limiting factor for read clustering analysis is the number of similarity hits between reads, which increases with the proportion of high-copy-number repeats in the genome. Therefore, fewer reads can be clustered for highly repetitive genomes than for genomes with low proportions of highly repeated sequences (
<xref rid="B19" ref-type="bibr">19</xref>
). On the other hand, even the low coverage used in this study for the large, repeat-rich genome of
<italic>V. faba</italic>
(990 000 reads correspond to 0.007× genome equivalents) proved sufficient to identify relatively rare repeats such as Vf_TA157, which has only thousands of copies per haploid genome (Table
<xref rid="tbl3" ref-type="table">3</xref>
, Figure
<xref ref-type="fig" rid="F3">3C</xref>
).</p>
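<p>The genome equivalents quoted above follow from the number of reads, the read length and the genome size. In the sketch below, the read length of 100 nt and the 1C genome size of 13.4 Gb for <italic>V. faba</italic> are assumptions not stated in this section.</p>
<preformat preformat-type="code">
```python
"""Sketch: genome coverage (genome equivalents) of a read sample.

coverage = number of reads x read length / 1C genome size

Read length (100 nt) and the V. faba 1C genome size (13.4 Gb) are
assumptions; neither is stated in this section.
"""
import math

GENOME_SIZE_1C_BP = 13_400_000_000  # assumed, Vicia faba
READ_LENGTH_BP = 100                # assumed


def genome_coverage(n_reads, read_length_bp=READ_LENGTH_BP,
                    genome_size_bp=GENOME_SIZE_1C_BP):
    """Genome equivalents represented by n_reads reads."""
    return n_reads * read_length_bp / genome_size_bp


def reads_for_coverage(coverage, read_length_bp=READ_LENGTH_BP,
                       genome_size_bp=GENOME_SIZE_1C_BP):
    """Reads needed to reach a given coverage (rounded up)."""
    return math.ceil(coverage * genome_size_bp / read_length_bp)


if __name__ == "__main__":
    print(f"990 000 reads: {genome_coverage(990_000):.4f}x")
    for target in (0.5, 1.0):
        print(f"{target}x needs {reads_for_coverage(target):,} reads")
```
</preformat>
<p>Under these assumptions, 990 000 reads amount to about 0.0074×, in line with the 0.007× given above, whereas even 0.5× would require roughly 67 million reads, illustrating why such coverage is hard to reach for large genomes.</p>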
</sec>
<sec id="SEC5">
<title>AVAILABILITY</title>
<p>The command-line version of TAREAN can be downloaded from
<ext-link ext-link-type="uri" xlink:href="http://w3lamc.umbr.cas.cz/lamc/resources.php">http://w3lamc.umbr.cas.cz/lamc/resources.php</ext-link>
. The pipeline can also be run via the Galaxy web interface on our public RepeatExplorer server (
<ext-link ext-link-type="uri" xlink:href="http://www.repeatexplorer.org">http://www.repeatexplorer.org</ext-link>
).</p>
</sec>
<sec sec-type="supplementary-material">
<title>Supplementary Material</title>
<supplementary-material content-type="local-data" id="sup1">
<label>Supplementary Data</label>
<media xlink:href="gkx257_supp.pdf">
</media>
</supplementary-material>
</sec>
</body>
<back>
<ack>
<title>ACKNOWLEDGEMENTS</title>
<p>Access to computing and data storage facilities provided by the ELIXIR CZ infrastructure is greatly appreciated.</p>
<p>
<italic>Author contributions:</italic>
P.No. implemented the algorithms. P.No. and J.M. designed the algorithms and analyzed NGS data. L.A.R., A.K. and I.V. performed FISH experiments. P.Ne. participated in the data analysis and discussion of the results. J.M. supervised the work and drafted the manuscript. All authors contributed to preparation of the final manuscript.</p>
</ack>
<sec id="SEC6">
<title>SUPPLEMENTARY DATA</title>
<p>
<xref ref-type="supplementary-material" rid="sup1">Supplementary Data</xref>
are available at NAR Online.</p>
</sec>
<sec id="SEC7">
<title>FUNDING</title>
<p>Czech Ministry of Education, Youth and Sports [LM2015047]; Czech Science Foundation [BP501/12/G090]; Czech Academy of Sciences [RVO:60077344]. Funding for open access charge: Czech Ministry of Education, Youth and Sports [LM2015047].</p>
<p>
<italic>Conflict of interest statement</italic>
. None declared.</p>
</sec>
<ref-list>
<title>REFERENCES</title>
<ref id="B1">
<label>1.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Macas</surname>
<given-names>J.</given-names>
</name>
,
<name name-style="western">
<surname>Mészáros</surname>
<given-names>T.</given-names>
</name>
,
<name name-style="western">
<surname>Nouzová</surname>
<given-names>M.</given-names>
</name>
</person-group>
<article-title>PlantSat: a specialized database for plant satellite repeats</article-title>
.
<source>Bioinformatics</source>
.
<year>2002</year>
;
<volume>18</volume>
:
<fpage>28</fpage>
<lpage>35</lpage>
.
<pub-id pub-id-type="pmid">11836208</pub-id>
</mixed-citation>
</ref>
<ref id="B2">
<label>2.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Garrido-Ramos</surname>
<given-names>M.A.</given-names>
</name>
</person-group>
<article-title>Satellite DNA in plants: More than just rubbish</article-title>
.
<source>Cytogenet. Genome Res.</source>
<year>2015</year>
;
<volume>146</volume>
:
<fpage>153</fpage>
<lpage>170</lpage>
.
<pub-id pub-id-type="pmid">26202574</pub-id>
</mixed-citation>
</ref>
<ref id="B3">
<label>3.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Plohl</surname>
<given-names>M.</given-names>
</name>
,
<name name-style="western">
<surname>Luchetti</surname>
<given-names>A.</given-names>
</name>
,
<name name-style="western">
<surname>Meštrović</surname>
<given-names>N.</given-names>
</name>
,
<name name-style="western">
<surname>Mantovani</surname>
<given-names>B.</given-names>
</name>
</person-group>
<article-title>Satellite DNAs between selfishness and functionality: structure, genomics and evolution of tandem repeats in centromeric (hetero)chromatin</article-title>
.
<source>Gene</source>
.
<year>2008</year>
;
<volume>409</volume>
:
<fpage>72</fpage>
<lpage>82</lpage>
.
<pub-id pub-id-type="pmid">18182173</pub-id>
</mixed-citation>
</ref>
<ref id="B4">
<label>4.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Ellegren</surname>
<given-names>H.</given-names>
</name>
</person-group>
<article-title>Microsatellites: simple sequences with complex evolution</article-title>
.
<source>Nat. Rev. Genet.</source>
<year>2004</year>
;
<volume>5</volume>
:
<fpage>435</fpage>
<lpage>445</lpage>
.
<pub-id pub-id-type="pmid">15153996</pub-id>
</mixed-citation>
</ref>
<ref id="B5">
<label>5.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Richard</surname>
<given-names>G.-F.</given-names>
</name>
,
<name name-style="western">
<surname>Kerrest</surname>
<given-names>A.</given-names>
</name>
,
<name name-style="western">
<surname>Dujon</surname>
<given-names>B.</given-names>
</name>
</person-group>
<article-title>Comparative genomics and molecular dynamics of DNA repeats in eukaryotes</article-title>
.
<source>Microbiol. Mol. Biol. Rev.</source>
<year>2008</year>
;
<volume>72</volume>
:
<fpage>686</fpage>
<lpage>727</lpage>
.
<pub-id pub-id-type="pmid">19052325</pub-id>
</mixed-citation>
</ref>
<ref id="B6">
<label>6.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Plohl</surname>
<given-names>M.</given-names>
</name>
,
<name name-style="western">
<surname>Meštrović</surname>
<given-names>N.</given-names>
</name>
,
<name name-style="western">
<surname>Mravinac</surname>
<given-names>B.</given-names>
</name>
</person-group>
<article-title>Centromere identity from the DNA point of view</article-title>
.
<source>Chromosoma</source>
.
<year>2014</year>
;
<volume>123</volume>
:
<fpage>313</fpage>
<lpage>325</lpage>
.
<pub-id pub-id-type="pmid">24763964</pub-id>
</mixed-citation>
</ref>
<ref id="B7">
<label>7.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Fuchs</surname>
<given-names>J.</given-names>
</name>
,
<name name-style="western">
<surname>Strehl</surname>
<given-names>S.</given-names>
</name>
,
<name name-style="western">
<surname>Brandes</surname>
<given-names>A.</given-names>
</name>
,
<name name-style="western">
<surname>Schweizer</surname>
<given-names>D.</given-names>
</name>
,
<name name-style="western">
<surname>Schubert</surname>
<given-names>I.</given-names>
</name>
</person-group>
<article-title>Molecular-cytogenetic characterization of the
<italic>Vicia faba</italic>
genome – heterochromatin differentiation, replication patterns and sequence localization</article-title>
.
<source>Chromosome Res.</source>
<year>1998</year>
;
<volume>6</volume>
:
<fpage>219</fpage>
<lpage>230</lpage>
.</mixed-citation>
</ref>
<ref id="B8">
<label>8.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Macas</surname>
<given-names>J.</given-names>
</name>
,
<name name-style="western">
<surname>Požárková</surname>
<given-names>D.</given-names>
</name>
,
<name name-style="western">
<surname>Navrátilová</surname>
<given-names>A.</given-names>
</name>
,
<name name-style="western">
<surname>Nouzová</surname>
<given-names>M.</given-names>
</name>
,
<name name-style="western">
<surname>Neumann</surname>
<given-names>P.</given-names>
</name>
</person-group>
<article-title>Two new families of tandem repeats isolated from genus
<italic>Vicia</italic>
using genomic self-priming PCR</article-title>
.
<source>Mol. Gen. Genet.</source>
<year>2000</year>
;
<volume>263</volume>
:
<fpage>741</fpage>
<lpage>751</lpage>
.
<pub-id pub-id-type="pmid">10905342</pub-id>
</mixed-citation>
</ref>
<ref id="B9">
<label>9.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Cai</surname>
<given-names>Z.</given-names>
</name>
,
<name name-style="western">
<surname>Liu</surname>
<given-names>H.</given-names>
</name>
,
<name name-style="western">
<surname>He</surname>
<given-names>Q.</given-names>
</name>
,
<name name-style="western">
<surname>Pu</surname>
<given-names>M.</given-names>
</name>
,
<name name-style="western">
<surname>Chen</surname>
<given-names>J.</given-names>
</name>
,
<name name-style="western">
<surname>Lai</surname>
<given-names>J.</given-names>
</name>
,
<name name-style="western">
<surname>Li</surname>
<given-names>X.</given-names>
</name>
,
<name name-style="western">
<surname>Jin</surname>
<given-names>W.</given-names>
</name>
</person-group>
<article-title>Differential genome evolution and speciation of
<italic>Coix lacryma-jobi</italic>
L. and
<italic>Coix aquatica</italic>
Roxb. hybrid guangxi revealed by repetitive sequence analysis and fine karyotyping</article-title>
.
<source>BMC Genomics</source>
.
<year>2014</year>
;
<volume>15</volume>
:
<fpage>1025</fpage>
.
<pub-id pub-id-type="pmid">25425126</pub-id>
</mixed-citation>
</ref>
<ref id="B10">
<label>10.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Navrátilová</surname>
<given-names>A.</given-names>
</name>
,
<name name-style="western">
<surname>Neumann</surname>
<given-names>P.</given-names>
</name>
,
<name name-style="western">
<surname>Macas</surname>
<given-names>J.</given-names>
</name>
</person-group>
<article-title>Karyotype analysis of four Vicia species using
<italic>in situ</italic>
hybridization with repetitive sequences</article-title>
.
<source>Ann. Bot.</source>
<year>2003</year>
;
<volume>91</volume>
:
<fpage>921</fpage>
<lpage>926</lpage>
.
<pub-id pub-id-type="pmid">12770847</pub-id>
</mixed-citation>
</ref>
<ref id="B11">
<label>11.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Kit</surname>
<given-names>S.</given-names>
</name>
</person-group>
<article-title>Equilibrium sedimentation in density gradients of DNA preparations from animal tissues</article-title>
.
<source>J. Mol. Biol.</source>
<year>1961</year>
;
<volume>3</volume>
:
<fpage>711</fpage>
<lpage>716</lpage>
.
<pub-id pub-id-type="pmid">14456492</pub-id>
</mixed-citation>
</ref>
<ref id="B12">
<label>12.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Hemleben</surname>
<given-names>V.</given-names>
</name>
,
<name name-style="western">
<surname>Kovařík</surname>
<given-names>A.</given-names>
</name>
,
<name name-style="western">
<surname>Torres-Ruiz</surname>
<given-names>R.A.</given-names>
</name>
,
<name name-style="western">
<surname>Volkov</surname>
<given-names>R.A.</given-names>
</name>
,
<name name-style="western">
<surname>Beridze</surname>
<given-names>T.</given-names>
</name>
</person-group>
<article-title>Plant highly repeated satellite DNA: molecular evolution, distribution and use for identification of hybrids</article-title>
.
<source>Syst. Biodivers.</source>
<year>2007</year>
;
<volume>5</volume>
:
<fpage>277</fpage>
<lpage>289</lpage>
.</mixed-citation>
</ref>
<ref id="B13">
<label>13.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Benson</surname>
<given-names>G.</given-names>
</name>
</person-group>
<article-title>Tandem Repeats Finder: a program to analyse DNA sequences</article-title>
.
<source>Nucleic Acids Res.</source>
<year>1999</year>
;
<volume>27</volume>
:
<fpage>573</fpage>
<lpage>578</lpage>
.
<pub-id pub-id-type="pmid">9862982</pub-id>
</mixed-citation>
</ref>
<ref id="B14">
<label>14.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Glunčić</surname>
<given-names>M.</given-names>
</name>
,
<name name-style="western">
<surname>Paar</surname>
<given-names>V.</given-names>
</name>
</person-group>
<article-title>Direct mapping of symbolic DNA sequence into frequency domain in global repeat map algorithm</article-title>
.
<source>Nucleic Acids Res.</source>
<year>2013</year>
;
<volume>41</volume>
:
<fpage>e17</fpage>
.
<pub-id pub-id-type="pmid">22977183</pub-id>
</mixed-citation>
</ref>
<ref id="B15">
<label>15.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Herzel</surname>
<given-names>H.</given-names>
</name>
,
<name name-style="western">
<surname>Weiss</surname>
<given-names>O.</given-names>
</name>
,
<name name-style="western">
<surname>Trifonov</surname>
<given-names>E.N.</given-names>
</name>
</person-group>
<article-title>10-11 bp periodicities in complete genomes reflect protein structure and DNA folding</article-title>
.
<source>Bioinformatics</source>
.
<year>1999</year>
;
<volume>15</volume>
:
<fpage>187</fpage>
<lpage>193</lpage>
.
<pub-id pub-id-type="pmid">10222405</pub-id>
</mixed-citation>
</ref>
<ref id="B16">
<label>16.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Macas</surname>
<given-names>J.</given-names>
</name>
,
<name name-style="western">
<surname>Navrátilová</surname>
<given-names>A.</given-names>
</name>
,
<name name-style="western">
<surname>Koblížková</surname>
<given-names>A.</given-names>
</name>
</person-group>
<article-title>Sequence homogenization and chromosomal localization of VicTR-B satellites differ between closely related Vicia species</article-title>
.
<source>Chromosoma</source>
.
<year>2006</year>
;
<volume>115</volume>
:
<fpage>437</fpage>
<lpage>447</lpage>
.
<pub-id pub-id-type="pmid">16788823</pub-id>
</mixed-citation>
</ref>
<ref id="B17">
<label>17.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Sharma</surname>
<given-names>D.</given-names>
</name>
,
<name name-style="western">
<surname>Issac</surname>
<given-names>B.</given-names>
</name>
,
<name name-style="western">
<surname>Raghava</surname>
<given-names>G.P.S.</given-names>
</name>
,
<name name-style="western">
<surname>Ramaswamy</surname>
<given-names>R.</given-names>
</name>
</person-group>
<article-title>Spectral repeat finders (SRF): identification of repetitive sequences using Fourier transformation</article-title>
.
<source>Bioinformatics</source>
.
<year>2004</year>
;
<volume>20</volume>
:
<fpage>1405</fpage>
<lpage>1412</lpage>
.
<pub-id pub-id-type="pmid">14976032</pub-id>
</mixed-citation>
</ref>
<ref id="B18">
<label>18.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Treangen</surname>
<given-names>T.J.</given-names>
</name>
,
<name name-style="western">
<surname>Salzberg</surname>
<given-names>S.L.</given-names>
</name>
</person-group>
<article-title>Repetitive DNA and next-generation sequencing: computational challenges and solutions</article-title>
.
<source>Nat. Rev. Genet.</source>
<year>2012</year>
;
<volume>13</volume>
:
<fpage>36</fpage>
<lpage>46</lpage>
.</mixed-citation>
</ref>
<ref id="B19">
<label>19.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Novák</surname>
<given-names>P.</given-names>
</name>
,
<name name-style="western">
<surname>Neumann</surname>
<given-names>P.</given-names>
</name>
,
<name name-style="western">
<surname>Macas</surname>
<given-names>J.</given-names>
</name>
</person-group>
<article-title>Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data</article-title>
.
<source>BMC Bioinformatics</source>
.
<year>2010</year>
;
<volume>11</volume>
:
<fpage>378</fpage>
.
<pub-id pub-id-type="pmid">20633259</pub-id>
</mixed-citation>
</ref>
<ref id="B20">
<label>20.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Novák</surname>
<given-names>P.</given-names>
</name>
,
<name name-style="western">
<surname>Neumann</surname>
<given-names>P.</given-names>
</name>
,
<name name-style="western">
<surname>Pech</surname>
<given-names>J.</given-names>
</name>
,
<name name-style="western">
<surname>Steinhaisl</surname>
<given-names>J.</given-names>
</name>
,
<name name-style="western">
<surname>Macas</surname>
<given-names>J.</given-names>
</name>
</person-group>
<article-title>RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads</article-title>
.
<source>Bioinformatics</source>
.
<year>2013</year>
;
<volume>29</volume>
:
<fpage>792</fpage>
<lpage>793</lpage>
.
<pub-id pub-id-type="pmid">23376349</pub-id>
</mixed-citation>
</ref>
<ref id="B21">
<label>21.</label>
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name name-style="western">
<surname>Weiss-Schneeweiss</surname>
<given-names>H.</given-names>
</name>
,
<name name-style="western">
<surname>Leitch</surname>
<given-names>A.R.</given-names>
</name>
,
<name name-style="western">
<surname>McCann</surname>
<given-names>J.</given-names>
</name>
,
<name name-style="western">
<surname>Jang</surname>
<given-names>T.-S.</given-names>
</name>
,
<name name-style="western">
<surname>Macas</surname>
<given-names>J.</given-names>
</name>
</person-group>
<person-group person-group-type="editor">
<name name-style="western">
<surname>Hörandl</surname>
<given-names>E</given-names>
</name>
,
<name name-style="western">
<surname>Appelhans</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>Employing next generation sequencing to explore the repeat landscape of the plant genome</article-title>
.
<source>Next Generation Sequencing in Plant Systematics. Regnum Vegetabile 158</source>
.
<year>2015</year>
;
<volume>158</volume>
,
<publisher-loc>Königstein</publisher-loc>
:
<publisher-name>Koeltz Scientific Books</publisher-name>
<fpage>155</fpage>
<lpage>179</lpage>
.</mixed-citation>
</ref>
<ref id="B22">
<label>22.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Pagan</surname>
<given-names>H.J.T.</given-names>
</name>
,
<name name-style="western">
<surname>Macas</surname>
<given-names>J.</given-names>
</name>
,
<name name-style="western">
<surname>Novák</surname>
<given-names>P.</given-names>
</name>
,
<name name-style="western">
<surname>McCulloch</surname>
<given-names>E.S.</given-names>
</name>
,
<name name-style="western">
<surname>Stevens</surname>
<given-names>R.D.</given-names>
</name>
,
<name name-style="western">
<surname>Ray</surname>
<given-names>D.A.</given-names>
</name>
</person-group>
<article-title>Survey sequencing reveals elevated DNA transposon activity, novel elements, and variation in repetitive landscapes among vesper bats</article-title>
.
<source>Genome Biol. Evol.</source>
<year>2012</year>
;
<volume>4</volume>
:
<fpage>575</fpage>
<lpage>585</lpage>
.
<pub-id pub-id-type="pmid">22491057</pub-id>
</mixed-citation>
</ref>
<ref id="B23">
<label>23.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>García</surname>
<given-names>G.</given-names>
</name>
,
<name name-style="western">
<surname>Ríos</surname>
<given-names>N.</given-names>
</name>
,
<name name-style="western">
<surname>Gutiérrez</surname>
<given-names>V.</given-names>
</name>
</person-group>
<article-title>Next-generation sequencing detects repetitive elements expansion in giant genomes of annual killifish genus
<italic>Austrolebias</italic>
(Cyprinodontiformes, Rivulidae)</article-title>
.
<source>Genetica</source>
.
<year>2015</year>
;
<volume>143</volume>
:
<fpage>353</fpage>
<lpage>360</lpage>
.
<pub-id pub-id-type="pmid">25792372</pub-id>
</mixed-citation>
</ref>
<ref id="B24">
<label>24.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Camacho</surname>
<given-names>J.P.M.</given-names>
</name>
,
<name name-style="western">
<surname>Ruiz-Ruano</surname>
<given-names>F.J.</given-names>
</name>
,
<name name-style="western">
<surname>Martín-Blázquez</surname>
<given-names>R.</given-names>
</name>
,
<name name-style="western">
<surname>López-León</surname>
<given-names>M.D.</given-names>
</name>
,
<name name-style="western">
<surname>Cabrero</surname>
<given-names>J.</given-names>
</name>
,
<name name-style="western">
<surname>Lorite</surname>
<given-names>P.</given-names>
</name>
,
<name name-style="western">
<surname>Cabral-de-Mello</surname>
<given-names>D.C.</given-names>
</name>
,
<name name-style="western">
<surname>Bakkali</surname>
<given-names>M.</given-names>
</name>
</person-group>
<article-title>A step to the gigantic genome of the desert locust: chromosome sizes and repeated DNAs</article-title>
.
<source>Chromosoma</source>
.
<year>2014</year>
;
<volume>124</volume>
:
<fpage>263</fpage>
<lpage>275</lpage>
.
<pub-id pub-id-type="pmid">25472934</pub-id>
</mixed-citation>
</ref>
<ref id="B25">
<label>25.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Neumann</surname>
<given-names>P.</given-names>
</name>
,
<name name-style="western">
<surname>Navrátilová</surname>
<given-names>A.</given-names>
</name>
,
<name name-style="western">
<surname>Schroeder-Reiter</surname>
<given-names>E.</given-names>
</name>
,
<name name-style="western">
<surname>Koblížková</surname>
<given-names>A.</given-names>
</name>
,
<name name-style="western">
<surname>Steinbauerová</surname>
<given-names>V.</given-names>
</name>
,
<name name-style="western">
<surname>Chocholová</surname>
<given-names>E.</given-names>
</name>
,
<name name-style="western">
<surname>Novák</surname>
<given-names>P.</given-names>
</name>
,
<name name-style="western">
<surname>Wanner</surname>
<given-names>G.</given-names>
</name>
,
<name name-style="western">
<surname>Macas</surname>
<given-names>J.</given-names>
</name>
</person-group>
<article-title>Stretching the rules: monocentric chromosomes with multiple centromere domains</article-title>
.
<source>PLoS Genet.</source>
<year>2012</year>
;
<volume>8</volume>
:
<fpage>e1002777</fpage>
.
<pub-id pub-id-type="pmid">22737088</pub-id>
</mixed-citation>
</ref>
<ref id="B26">
<label>26.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Marques</surname>
<given-names>A.</given-names>
</name>
,
<name name-style="western">
<surname>Ribeiro</surname>
<given-names>T.</given-names>
</name>
,
<name name-style="western">
<surname>Neumann</surname>
<given-names>P.</given-names>
</name>
,
<name name-style="western">
<surname>Macas</surname>
<given-names>J.</given-names>
</name>
,
<name name-style="western">
<surname>Novák</surname>
<given-names>P.</given-names>
</name>
,
<name name-style="western">
<surname>Schubert</surname>
<given-names>V.</given-names>
</name>
,
<name name-style="western">
<surname>Pellino</surname>
<given-names>M.</given-names>
</name>
,
<name name-style="western">
<surname>Fuchs</surname>
<given-names>J.</given-names>
</name>
,
<name name-style="western">
<surname>Ma</surname>
<given-names>W.</given-names>
</name>
,
<name name-style="western">
<surname>Kuhlmann</surname>
<given-names>M.</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Holocentromeres in
<italic>Rhynchospora</italic>
are associated with genome-wide centromere-specific repeat arrays interspersed among euchromatin</article-title>
.
<source>Proc. Natl. Acad. Sci. U.S.A.</source>
<year>2015</year>
;
<volume>112</volume>
:
<fpage>13633</fpage>
<lpage>13638</lpage>
.
<pub-id pub-id-type="pmid">26489653</pub-id>
</mixed-citation>
</ref>
<ref id="B27">
<label>27.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Heckmann</surname>
<given-names>S.</given-names>
</name>
,
<name name-style="western">
<surname>Macas</surname>
<given-names>J.</given-names>
</name>
,
<name name-style="western">
<surname>Kumke</surname>
<given-names>K.</given-names>
</name>
,
<name name-style="western">
<surname>Fuchs</surname>
<given-names>J.</given-names>
</name>
,
<name name-style="western">
<surname>Schubert</surname>
<given-names>V.</given-names>
</name>
,
<name name-style="western">
<surname>Ma</surname>
<given-names>L.</given-names>
</name>
,
<name name-style="western">
<surname>Novák</surname>
<given-names>P.</given-names>
</name>
,
<name name-style="western">
<surname>Neumann</surname>
<given-names>P.</given-names>
</name>
,
<name name-style="western">
<surname>Taudien</surname>
<given-names>S.</given-names>
</name>
,
<name name-style="western">
<surname>Platzer</surname>
<given-names>M.</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The holocentric species
<italic>Luzula elegans</italic>
shows interplay between centromere and large-scale genome organization</article-title>
.
<source>Plant J.</source>
<year>2013</year>
;
<volume>73</volume>
:
<fpage>555</fpage>
<lpage>565</lpage>
.
<pub-id pub-id-type="pmid">23078243</pub-id>
</mixed-citation>
</ref>
<ref id="B28">
<label>28.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Ruiz-Ruano</surname>
<given-names>F.J.</given-names>
</name>
,
<name name-style="western">
<surname>López-León</surname>
<given-names>M.D.</given-names>
</name>
,
<name name-style="western">
<surname>Cabrero</surname>
<given-names>J.</given-names>
</name>
,
<name name-style="western">
<surname>Camacho</surname>
<given-names>J.P.M.</given-names>
</name>
</person-group>
<article-title>High-throughput analysis of the satellitome illuminates satellite DNA evolution</article-title>
.
<source>Sci. Rep.</source>
<year>2016</year>
;
<volume>6</volume>
:
<fpage>28333</fpage>
.
<pub-id pub-id-type="pmid">27385065</pub-id>
</mixed-citation>
</ref>
<ref id="B29">
<label>29.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Macas</surname>
<given-names>J.</given-names>
</name>
,
<name name-style="western">
<surname>Kejnovský</surname>
<given-names>E.</given-names>
</name>
,
<name name-style="western">
<surname>Neumann</surname>
<given-names>P.</given-names>
</name>
,
<name name-style="western">
<surname>Novák</surname>
<given-names>P.</given-names>
</name>
,
<name name-style="western">
<surname>Koblížková</surname>
<given-names>A.</given-names>
</name>
,
<name name-style="western">
<surname>Vyskot</surname>
<given-names>B.</given-names>
</name>
</person-group>
<article-title>Next generation sequencing-based analysis of repetitive DNA in the model dioecious plant
<italic>Silene latifolia</italic>
</article-title>
.
<source>PLoS One</source>
.
<year>2011</year>
;
<volume>6</volume>
:
<fpage>e27335</fpage>
.
<pub-id pub-id-type="pmid">22096552</pub-id>
</mixed-citation>
</ref>
<ref id="B30">
<label>30.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Macas</surname>
<given-names>J.</given-names>
</name>
,
<name name-style="western">
<surname>Novák</surname>
<given-names>P.</given-names>
</name>
,
<name name-style="western">
<surname>Pellicer</surname>
<given-names>J.</given-names>
</name>
,
<name name-style="western">
<surname>Čížková</surname>
<given-names>J.</given-names>
</name>
,
<name name-style="western">
<surname>Koblížková</surname>
<given-names>A.</given-names>
</name>
,
<name name-style="western">
<surname>Neumann</surname>
<given-names>P.</given-names>
</name>
,
<name name-style="western">
<surname>Fuková</surname>
<given-names>I.</given-names>
</name>
,
<name name-style="western">
<surname>Doležel</surname>
<given-names>J.</given-names>
</name>
,
<name name-style="western">
<surname>Kelly</surname>
<given-names>L.J.</given-names>
</name>
,
<name name-style="western">
<surname>Leitch</surname>
<given-names>I.J.</given-names>
</name>
</person-group>
<article-title>In depth characterization of repetitive DNA in 23 plant genomes reveals sources of genome size variation in the legume tribe Fabeae</article-title>
.
<source>PLoS One</source>
.
<year>2015</year>
;
<volume>10</volume>
:
<fpage>e0143424</fpage>
.
<pub-id pub-id-type="pmid">26606051</pub-id>
</mixed-citation>
</ref>
<ref id="B31">
<label>31.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Renny-Byfield</surname>
<given-names>S.</given-names>
</name>
,
<name name-style="western">
<surname>Kovařík</surname>
<given-names>A.</given-names>
</name>
,
<name name-style="western">
<surname>Chester</surname>
<given-names>M.</given-names>
</name>
,
<name name-style="western">
<surname>Nichols</surname>
<given-names>R.A.</given-names>
</name>
,
<name name-style="western">
<surname>Macas</surname>
<given-names>J.</given-names>
</name>
,
<name name-style="western">
<surname>Novák</surname>
<given-names>P.</given-names>
</name>
,
<name name-style="western">
<surname>Leitch</surname>
<given-names>A.R.</given-names>
</name>
</person-group>
<article-title>Independent, rapid and targeted loss of highly repetitive DNA in natural and synthetic allopolyploids of
<italic>Nicotiana tabacum</italic>
</article-title>
.
<source>PLoS One</source>
.
<year>2012</year>
;
<volume>7</volume>
:
<fpage>e36963</fpage>
.
<pub-id pub-id-type="pmid">22606317</pub-id>
</mixed-citation>
</ref>
<ref id="B32">
<label>32.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Macas</surname>
<given-names>J.</given-names>
</name>
,
<name name-style="western">
<surname>Neumann</surname>
<given-names>P.</given-names>
</name>
,
<name name-style="western">
<surname>Novák</surname>
<given-names>P.</given-names>
</name>
,
<name name-style="western">
<surname>Jiang</surname>
<given-names>J.</given-names>
</name>
</person-group>
<article-title>Global sequence characterization of rice centromeric satellite based on oligomer frequency analysis in large-scale sequencing data</article-title>
.
<source>Bioinformatics</source>
.
<year>2010</year>
;
<volume>26</volume>
:
<fpage>2101</fpage>
<lpage>2108</lpage>
.
<pub-id pub-id-type="pmid">20616383</pub-id>
</mixed-citation>
</ref>
<ref id="B33">
<label>33.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Torres</surname>
<given-names>G.A.</given-names>
</name>
,
<name name-style="western">
<surname>Gong</surname>
<given-names>Z.</given-names>
</name>
,
<name name-style="western">
<surname>Iovene</surname>
<given-names>M.</given-names>
</name>
,
<name name-style="western">
<surname>Hirsch</surname>
<given-names>C.D.</given-names>
</name>
,
<name name-style="western">
<surname>Buell</surname>
<given-names>C.R.</given-names>
</name>
,
<name name-style="western">
<surname>Bryan</surname>
<given-names>G.J.</given-names>
</name>
,
<name name-style="western">
<surname>Novák</surname>
<given-names>P.</given-names>
</name>
,
<name name-style="western">
<surname>Macas</surname>
<given-names>J.</given-names>
</name>
,
<name name-style="western">
<surname>Jiang</surname>
<given-names>J.</given-names>
</name>
</person-group>
<article-title>Organization and evolution of subtelomeric satellite repeats in the potato genome</article-title>
.
<source>G3</source>
.
<year>2011</year>
;
<volume>1</volume>
:
<fpage>85</fpage>
<lpage>92</lpage>
.
<pub-id pub-id-type="pmid">22384321</pub-id>
</mixed-citation>
</ref>
<ref id="B34">
<label>34.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Blondel</surname>
<given-names>V.D.</given-names>
</name>
,
<name name-style="western">
<surname>Guillaume</surname>
<given-names>J.-L.</given-names>
</name>
,
<name name-style="western">
<surname>Lambiotte</surname>
<given-names>R.</given-names>
</name>
,
<name name-style="western">
<surname>Lefebvre</surname>
<given-names>E.</given-names>
</name>
</person-group>
<article-title>Fast unfolding of communities in large networks</article-title>
.
<source>J. Stat. Mech. Theory Exp.</source>
<year>2008</year>
;
<volume>2008</volume>
:
<fpage>P10008</fpage>
.</mixed-citation>
</ref>
<ref id="B35">
<label>35.</label>
<mixed-citation publication-type="book">
<person-group person-group-type="author">
<name name-style="western">
<surname>Wilson</surname>
<given-names>R.J.</given-names>
</name>
</person-group>
<source>Introduction to Graph Theory</source>
.
<year>1996</year>
;
<edition>4th edn</edition>
,
<publisher-name>Addison Wesley Longman Limited</publisher-name>
.</mixed-citation>
</ref>
<ref id="B36">
<label>36.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Zaslavsky</surname>
<given-names>T.</given-names>
</name>
</person-group>
<article-title>Signed graphs</article-title>
.
<source>Discret. Appl. Math.</source>
<year>1982</year>
;
<volume>4</volume>
:
<fpage>47</fpage>
<lpage>74</lpage>
.</mixed-citation>
</ref>
<ref id="B37">
<label>37.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Fraley</surname>
<given-names>C.</given-names>
</name>
,
<name name-style="western">
<surname>Raftery</surname>
<given-names>A.E.</given-names>
</name>
</person-group>
<article-title>Model-based clustering, discriminant analysis, and density estimation</article-title>
.
<source>J. Am. Stat. Assoc.</source>
<year>2002</year>
;
<volume>97</volume>
:
<fpage>611</fpage>
<lpage>631</lpage>
.</mixed-citation>
</ref>
<ref id="B38">
<label>38.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Havecker</surname>
<given-names>E.R.</given-names>
</name>
,
<name name-style="western">
<surname>Gao</surname>
<given-names>X.</given-names>
</name>
,
<name name-style="western">
<surname>Voytas</surname>
<given-names>D.F.</given-names>
</name>
</person-group>
<article-title>The diversity of LTR retrotransposons</article-title>
.
<source>Genome Biol.</source>
<year>2004</year>
;
<volume>5</volume>
:
<fpage>225</fpage>
.
<pub-id pub-id-type="pmid">15186483</pub-id>
</mixed-citation>
</ref>
<ref id="B39">
<label>39.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Csardi</surname>
<given-names>G.</given-names>
</name>
,
<name name-style="western">
<surname>Nepusz</surname>
<given-names>T.</given-names>
</name>
</person-group>
<article-title>The igraph software package for complex network research</article-title>
.
<source>InterJournal Complex Syst.</source>
<year>2006</year>
.</mixed-citation>
</ref>
<ref id="B40">
<label>40.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Afgan</surname>
<given-names>E.</given-names>
</name>
,
<name name-style="western">
<surname>Baker</surname>
<given-names>D.</given-names>
</name>
,
<name name-style="western">
<surname>van den Beek</surname>
<given-names>M.</given-names>
</name>
,
<name name-style="western">
<surname>Blankenberg</surname>
<given-names>D.</given-names>
</name>
,
<name name-style="western">
<surname>Bouvier</surname>
<given-names>D.</given-names>
</name>
,
<name name-style="western">
<surname>Čech</surname>
<given-names>M.</given-names>
</name>
,
<name name-style="western">
<surname>Chilton</surname>
<given-names>J.</given-names>
</name>
,
<name name-style="western">
<surname>Clements</surname>
<given-names>D.</given-names>
</name>
,
<name name-style="western">
<surname>Coraor</surname>
<given-names>N.</given-names>
</name>
,
<name name-style="western">
<surname>Eberhard</surname>
<given-names>C.</given-names>
</name>
<etal></etal>
</person-group>
<article-title>The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update</article-title>
.
<source>Nucleic Acids Res.</source>
<year>2016</year>
;
<volume>44</volume>
:
<fpage>W3</fpage>
<lpage>W10</lpage>
.
<pub-id pub-id-type="pmid">27137889</pub-id>
</mixed-citation>
</ref>
<ref id="B41">
<label>41.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Kato</surname>
<given-names>A.</given-names>
</name>
,
<name name-style="western">
<surname>Albert</surname>
<given-names>P.</given-names>
</name>
,
<name name-style="western">
<surname>Vega</surname>
<given-names>J.</given-names>
</name>
,
<name name-style="western">
<surname>Birchler</surname>
<given-names>J.</given-names>
</name>
</person-group>
<article-title>Sensitive fluorescence
<italic>in situ</italic>
hybridization signal detection in maize using directly labeled probes produced by high concentration DNA polymerase nick translation</article-title>
.
<source>Biotech. Histochem.</source>
<year>2006</year>
;
<volume>81</volume>
:
<fpage>71</fpage>
<lpage>78</lpage>
.
<pub-id pub-id-type="pmid">16908431</pub-id>
</mixed-citation>
</ref>
<ref id="B42">
<label>42.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Macas</surname>
<given-names>J.</given-names>
</name>
,
<name name-style="western">
<surname>Neumann</surname>
<given-names>P.</given-names>
</name>
,
<name name-style="western">
<surname>Navrátilová</surname>
<given-names>A.</given-names>
</name>
</person-group>
<article-title>Repetitive DNA in the pea (
<italic>Pisum sativum</italic>
L.) genome: comprehensive characterization using 454 sequencing and comparison to soybean and
<italic>Medicago truncatula</italic>
</article-title>
.
<source>BMC Genomics</source>
.
<year>2007</year>
;
<volume>8</volume>
:
<fpage>427</fpage>
.
<pub-id pub-id-type="pmid">18031571</pub-id>
</mixed-citation>
</ref>
<ref id="B43">
<label>43.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Kato</surname>
<given-names>A.</given-names>
</name>
,
<name name-style="western">
<surname>Yakura</surname>
<given-names>K.</given-names>
</name>
,
<name name-style="western">
<surname>Tanifuji</surname>
<given-names>S.</given-names>
</name>
</person-group>
<article-title>Sequence analysis of
<italic>Vicia faba</italic>
repeated DNA, the FokI repeat element</article-title>
.
<source>Nucleic Acids Res.</source>
<year>1984</year>
;
<volume>12</volume>
:
<fpage>6415</fpage>
<lpage>6426</lpage>
.
<pub-id pub-id-type="pmid">6089113</pub-id>
</mixed-citation>
</ref>
<ref id="B44">
<label>44.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Fuchs</surname>
<given-names>J.</given-names>
</name>
,
<name name-style="western">
<surname>Pich</surname>
<given-names>U.</given-names>
</name>
,
<name name-style="western">
<surname>Meister</surname>
<given-names>A.</given-names>
</name>
,
<name name-style="western">
<surname>Schubert</surname>
<given-names>I.</given-names>
</name>
</person-group>
<article-title>Differentiation of field bean heterochromatin by
<italic>in situ</italic>
hybridization with a repeated
<italic>Fok</italic>
I sequence</article-title>
.
<source>Chromosom. Res.</source>
<year>1994</year>
;
<volume>2</volume>
:
<fpage>25</fpage>
<lpage>28</lpage>
.</mixed-citation>
</ref>
<ref id="B45">
<label>45.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Ananiev</surname>
<given-names>E.V.</given-names>
</name>
,
<name name-style="western">
<surname>Phillips</surname>
<given-names>R.L.</given-names>
</name>
,
<name name-style="western">
<surname>Rines</surname>
<given-names>H.W.</given-names>
</name>
</person-group>
<article-title>Chromosome-specific molecular organization of maize (
<italic>Zea mays</italic>
L.) centromeric regions</article-title>
.
<source>Proc. Natl. Acad. Sci. U.S.A.</source>
<year>1998</year>
;
<volume>95</volume>
:
<fpage>13073</fpage>
<lpage>13078</lpage>
.
<pub-id pub-id-type="pmid">9789043</pub-id>
</mixed-citation>
</ref>
<ref id="B46">
<label>46.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Ananiev</surname>
<given-names>E.V.</given-names>
</name>
,
<name name-style="western">
<surname>Phillips</surname>
<given-names>R.L.</given-names>
</name>
,
<name name-style="western">
<surname>Rines</surname>
<given-names>H.W.</given-names>
</name>
</person-group>
<article-title>A knob-associated tandem repeat in maize capable of forming fold-back DNA segments: are chromosome knobs megatransposons?</article-title>
.
<source>Proc. Natl. Acad. Sci. U.S.A.</source>
<year>1998</year>
;
<volume>95</volume>
:
<fpage>10785</fpage>
<lpage>10790</lpage>
.
<pub-id pub-id-type="pmid">9724782</pub-id>
</mixed-citation>
</ref>
<ref id="B47">
<label>47.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Ananiev</surname>
<given-names>E.V.</given-names>
</name>
,
<name name-style="western">
<surname>Phillips</surname>
<given-names>R.L.</given-names>
</name>
,
<name name-style="western">
<surname>Rines</surname>
<given-names>H.W.</given-names>
</name>
</person-group>
<article-title>Complex structure of knob DNA on maize chromosome 9: retrotransposon invasion into heterochromatin</article-title>
.
<source>Genetics</source>
.
<year>1998</year>
;
<volume>149</volume>
:
<fpage>2025</fpage>
<lpage>2037</lpage>
.
<pub-id pub-id-type="pmid">9691055</pub-id>
</mixed-citation>
</ref>
<ref id="B48">
<label>48.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Maggini</surname>
<given-names>F.</given-names>
</name>
,
<name name-style="western">
<surname>Cremonini</surname>
<given-names>R.</given-names>
</name>
,
<name name-style="western">
<surname>Zolfino</surname>
<given-names>C.</given-names>
</name>
,
<name name-style="western">
<surname>Tucci</surname>
<given-names>G.F.</given-names>
</name>
,
<name name-style="western">
<surname>D’Ovidio</surname>
<given-names>R.</given-names>
</name>
,
<name name-style="western">
<surname>Delre</surname>
<given-names>V.</given-names>
</name>
,
<name name-style="western">
<surname>DePace</surname>
<given-names>C.</given-names>
</name>
,
<name name-style="western">
<surname>Scarascia Mugnozza</surname>
<given-names>G.T.</given-names>
</name>
,
<name name-style="western">
<surname>Cionini</surname>
<given-names>P.G.</given-names>
</name>
</person-group>
<article-title>Structure and chromosomal localization of DNA sequences related to ribosomal subrepeats in
<italic>Vicia faba</italic>
</article-title>
.
<source>Chromosoma</source>
.
<year>1991</year>
;
<volume>100</volume>
:
<fpage>229</fpage>
<lpage>234</lpage>
.
<pub-id pub-id-type="pmid">2055134</pub-id>
</mixed-citation>
</ref>
<ref id="B49">
<label>49.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Macas</surname>
<given-names>J.</given-names>
</name>
,
<name name-style="western">
<surname>Koblížková</surname>
<given-names>A.</given-names>
</name>
,
<name name-style="western">
<surname>Navrátilová</surname>
<given-names>A.</given-names>
</name>
,
<name name-style="western">
<surname>Neumann</surname>
<given-names>P.</given-names>
</name>
</person-group>
<article-title>Hypervariable 3′ UTR region of plant LTR-retrotransposons as a source of novel satellite repeats</article-title>
.
<source>Gene</source>
.
<year>2009</year>
;
<volume>448</volume>
:
<fpage>198</fpage>
<lpage>206</lpage>
.
<pub-id pub-id-type="pmid">19563868</pub-id>
</mixed-citation>
</ref>
<ref id="B50">
<label>50.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Schaper</surname>
<given-names>E.</given-names>
</name>
,
<name name-style="western">
<surname>Kajava</surname>
<given-names>A.V.</given-names>
</name>
,
<name name-style="western">
<surname>Hauser</surname>
<given-names>A.</given-names>
</name>
,
<name name-style="western">
<surname>Anisimova</surname>
<given-names>M.</given-names>
</name>
</person-group>
<article-title>Repeat or not repeat? Statistical validation of tandem repeat prediction in genomic sequences</article-title>
.
<source>Nucleic Acids Res.</source>
<year>2012</year>
;
<volume>40</volume>
:
<fpage>10005</fpage>
<lpage>10017</lpage>
.
<pub-id pub-id-type="pmid">22923522</pub-id>
</mixed-citation>
</ref>
<ref id="B51">
<label>51.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Lim</surname>
<given-names>K.G.</given-names>
</name>
,
<name name-style="western">
<surname>Kwoh</surname>
<given-names>C.K.</given-names>
</name>
,
<name name-style="western">
<surname>Hsu</surname>
<given-names>L.Y.</given-names>
</name>
,
<name name-style="western">
<surname>Wirawan</surname>
<given-names>A.</given-names>
</name>
</person-group>
<article-title>Review of tandem repeat search tools: a systematic approach to evaluating algorithmic performance</article-title>
.
<source>Brief. Bioinform.</source>
<year>2013</year>
;
<volume>14</volume>
:
<fpage>67</fpage>
<lpage>81</lpage>
.
<pub-id pub-id-type="pmid">22648964</pub-id>
</mixed-citation>
</ref>
<ref id="B52">
<label>52.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Fertin</surname>
<given-names>G.</given-names>
</name>
,
<name name-style="western">
<surname>Jean</surname>
<given-names>G.</given-names>
</name>
,
<name name-style="western">
<surname>Radulescu</surname>
<given-names>A.</given-names>
</name>
,
<name name-style="western">
<surname>Rusu</surname>
<given-names>I.</given-names>
</name>
</person-group>
<article-title>DExTaR: detection of exact tandem repeats based on the de Bruijn graph</article-title>
.
<source>Proc. - 2014 IEEE Int. Conf. Bioinforma. Biomed. IEEE BIBM 2014</source>
.
<year>2014</year>
;
<comment>doi:10.1109/BIBM.2014.6999134</comment>
.</mixed-citation>
</ref>
<ref id="B53">
<label>53.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Fertin</surname>
<given-names>G.</given-names>
</name>
,
<name name-style="western">
<surname>Jean</surname>
<given-names>G.</given-names>
</name>
,
<name name-style="western">
<surname>Radulescu</surname>
<given-names>A.</given-names>
</name>
,
<name name-style="western">
<surname>Rusu</surname>
<given-names>I.</given-names>
</name>
</person-group>
<article-title>Hybrid de novo tandem repeat detection using short and long reads</article-title>
.
<source>BMC Med. Genomics</source>
.
<year>2015</year>
;
<volume>8</volume>
(
<issue>Suppl. 3</issue>
):
<fpage>S5</fpage>
.</mixed-citation>
</ref>
<ref id="B54">
<label>54.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Simpson</surname>
<given-names>J.T.</given-names>
</name>
,
<name name-style="western">
<surname>Wong</surname>
<given-names>K.</given-names>
</name>
,
<name name-style="western">
<surname>Jackman</surname>
<given-names>S.D.</given-names>
</name>
,
<name name-style="western">
<surname>Schein</surname>
<given-names>J.E.</given-names>
</name>
,
<name name-style="western">
<surname>Jones</surname>
<given-names>S.J.M.</given-names>
</name>
,
<name name-style="western">
<surname>Birol</surname>
<given-names>I.</given-names>
</name>
</person-group>
<article-title>ABySS: a parallel assembler for short read sequence data</article-title>
.
<source>Genome Res.</source>
<year>2009</year>
;
<volume>19</volume>
:
<fpage>1117</fpage>
<lpage>1123</lpage>
.
<pub-id pub-id-type="pmid">19251739</pub-id>
</mixed-citation>
</ref>
<ref id="B55">
<label>55.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Gong</surname>
<given-names>Z.</given-names>
</name>
,
<name name-style="western">
<surname>Wu</surname>
<given-names>Y.</given-names>
</name>
,
<name name-style="western">
<surname>Koblížková</surname>
<given-names>A.</given-names>
</name>
,
<name name-style="western">
<surname>Torres</surname>
<given-names>G.A.</given-names>
</name>
,
<name name-style="western">
<surname>Wang</surname>
<given-names>K.</given-names>
</name>
,
<name name-style="western">
<surname>Iovene</surname>
<given-names>M.</given-names>
</name>
,
<name name-style="western">
<surname>Neumann</surname>
<given-names>P.</given-names>
</name>
,
<name name-style="western">
<surname>Zhang</surname>
<given-names>W.</given-names>
</name>
,
<name name-style="western">
<surname>Novák</surname>
<given-names>P.</given-names>
</name>
,
<name name-style="western">
<surname>Buell</surname>
<given-names>C.R.</given-names>
</name>
<etal></etal>
</person-group>
<article-title>Repeatless and repeat-based centromeres in potato: implications for centromere evolution</article-title>
.
<source>Plant Cell</source>
.
<year>2012</year>
;
<volume>24</volume>
:
<fpage>3559</fpage>
<lpage>3574</lpage>
.
<pub-id pub-id-type="pmid">22968715</pub-id>
</mixed-citation>
</ref>
<ref id="B56">
<label>56.</label>
<mixed-citation publication-type="journal">
<person-group person-group-type="author">
<name name-style="western">
<surname>Macas</surname>
<given-names>J.</given-names>
</name>
,
<name name-style="western">
<surname>Navrátilová</surname>
<given-names>A.</given-names>
</name>
,
<name name-style="western">
<surname>Mészáros</surname>
<given-names>T.</given-names>
</name>
</person-group>
<article-title>Sequence subfamilies of satellite repeats related to rDNA intergenic spacer are differentially amplified on
<italic>Vicia sativa</italic>
chromosomes</article-title>
.
<source>Chromosoma</source>
.
<year>2003</year>
;
<volume>112</volume>
:
<fpage>152</fpage>
<lpage>158</lpage>
.
<pub-id pub-id-type="pmid">14579131</pub-id>
</mixed-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000F580 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000F580 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     
   |texte=   
}}

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021