Serveur d'exploration MERS

Attention, ce site est en cours de développement !
Attention, site généré par des moyens informatiques à partir de corpus bruts.
Les informations ne sont donc pas validées.
***** Acces problem to record *****\

Identifieur interne : 0002949 ( Pmc/Corpus ); précédent : 0002948; suivant : 0002950 ***** probable Xml problem with record *****

Links to Exploration step


Le document en format XML

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Efficient sequential and parallel algorithms for finding edit distance based motifs</title>
<author>
<name sortKey="Pal, Soumitra" sort="Pal, Soumitra" uniqKey="Pal S" first="Soumitra" last="Pal">Soumitra Pal</name>
<affiliation>
<nlm:aff id="Aff1">Department of Computer Science and Engineering, University of Connecticut, 371 Fairfield Road,, Storrs, 06269 CT USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Xiao, Peng" sort="Xiao, Peng" uniqKey="Xiao P" first="Peng" last="Xiao">Peng Xiao</name>
<affiliation>
<nlm:aff id="Aff1">Department of Computer Science and Engineering, University of Connecticut, 371 Fairfield Road,, Storrs, 06269 CT USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Rajasekaran, Sanguthevar" sort="Rajasekaran, Sanguthevar" uniqKey="Rajasekaran S" first="Sanguthevar" last="Rajasekaran">Sanguthevar Rajasekaran</name>
<affiliation>
<nlm:aff id="Aff2">Department of Computer Science and Engineering, University of Connecticut, 371 Fairfield Road,, Storrs, 06269 CT USA</nlm:aff>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">27557423</idno>
<idno type="pmc">5001234</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5001234</idno>
<idno type="RBID">PMC:5001234</idno>
<idno type="doi">10.1186/s12864-016-2789-9</idno>
<date when="2016">2016</date>
<idno type="wicri:Area/Pmc/Corpus">000294</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000294</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Efficient sequential and parallel algorithms for finding edit distance based motifs</title>
<author>
<name sortKey="Pal, Soumitra" sort="Pal, Soumitra" uniqKey="Pal S" first="Soumitra" last="Pal">Soumitra Pal</name>
<affiliation>
<nlm:aff id="Aff1">Department of Computer Science and Engineering, University of Connecticut, 371 Fairfield Road,, Storrs, 06269 CT USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Xiao, Peng" sort="Xiao, Peng" uniqKey="Xiao P" first="Peng" last="Xiao">Peng Xiao</name>
<affiliation>
<nlm:aff id="Aff1">Department of Computer Science and Engineering, University of Connecticut, 371 Fairfield Road,, Storrs, 06269 CT USA</nlm:aff>
</affiliation>
</author>
<author>
<name sortKey="Rajasekaran, Sanguthevar" sort="Rajasekaran, Sanguthevar" uniqKey="Rajasekaran S" first="Sanguthevar" last="Rajasekaran">Sanguthevar Rajasekaran</name>
<affiliation>
<nlm:aff id="Aff2">Department of Computer Science and Engineering, University of Connecticut, 371 Fairfield Road,, Storrs, 06269 CT USA</nlm:aff>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Genomics</title>
<idno type="eISSN">1471-2164</idno>
<imprint>
<date when="2016">2016</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>Motif search is an important step in extracting meaningful patterns from biological data. The general problem of motif search is intractable and there is a pressing need to develop efficient, exact and approximation algorithms to solve this problem. In this paper, we present several novel, exact, sequential and parallel algorithms for solving the (
<italic>l,d</italic>
)
<italic>Edit-distance-based Motif Search (EMS)</italic>
problem: given two integers
<italic>l,d</italic>
and
<italic>n</italic>
biological strings, find all strings of length
<italic>l</italic>
that appear in each input string with atmost
<italic>d</italic>
errors of types substitution, insertion and deletion.</p>
</sec>
<sec>
<title>Methods</title>
<p>One popular technique to solve the problem is to explore for each input string the set of all possible
<italic>l</italic>
-mers that belong to the
<italic>d</italic>
-neighborhood of any substring of the input string and output those which are common for all input strings. We introduce a novel and provably efficient neighborhood exploration technique. We show that it is enough to consider the candidates in neighborhood which are at a distance exactly
<italic>d</italic>
. We compactly represent these candidate motifs using wildcard characters and efficiently explore them with very few repetitions. Our sequential algorithm uses a trie based data structure to efficiently store and sort the candidate motifs. Our parallel algorithm in a multi-core shared memory setting uses arrays for storing and a novel modification of radix-sort for sorting the candidate motifs.</p>
</sec>
<sec>
<title>Results</title>
<p>The algorithms for EMS are customarily evaluated on several challenging instances such as (8,1), (12,2), (16,3), (20,4), and so on. The best previously known algorithm, EMS1, is sequential and in estimated 3 days solves up to instance (16,3). Our sequential algorithms are more than 20 times faster on (16,3). On other hard instances such as (9,2), (11,3), (13,4), our algorithms are much faster. Our parallel algorithm has more than 600 % scaling performance while using 16 threads.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>Our algorithms have pushed up the state-of-the-art of EMS solvers and we believe that the techniques introduced in this paper are also applicable to other motif search problems such as Planted Motif Search (PMS) and Simple Motif Search (SMS).</p>
</sec>
<sec>
<title>Electronic supplementary material</title>
<p>The online version of this article (doi:10.1186/s12864-016-2789-9) contains supplementary material, which is available to authorized users.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Nicolae, M" uniqKey="Nicolae M">M Nicolae</name>
</author>
<author>
<name sortKey="Rajasekaran, S" uniqKey="Rajasekaran S">S Rajasekaran</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Tanaka, S" uniqKey="Tanaka S">S Tanaka</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Yu, Q" uniqKey="Yu Q">Q Yu</name>
</author>
<author>
<name sortKey="Huo, H" uniqKey="Huo H">H Huo</name>
</author>
<author>
<name sortKey="Zhang, Y" uniqKey="Zhang Y">Y Zhang</name>
</author>
<author>
<name sortKey="Guo, H" uniqKey="Guo H">H Guo</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Karlin, S" uniqKey="Karlin S">S Karlin</name>
</author>
<author>
<name sortKey="Ost, F" uniqKey="Ost F">F Ost</name>
</author>
<author>
<name sortKey="Blaisdell, Be" uniqKey="Blaisdell B">BE Blaisdell</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rocke, E" uniqKey="Rocke E">E Rocke</name>
</author>
<author>
<name sortKey="Tompa, M" uniqKey="Tompa M">M Tompa</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Sagot, Mf" uniqKey="Sagot M">MF Sagot</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Lanctot, Jk" uniqKey="Lanctot J">JK Lanctot</name>
</author>
<author>
<name sortKey="Li, M" uniqKey="Li M">M Li</name>
</author>
<author>
<name sortKey="Ma, B" uniqKey="Ma B">B Ma</name>
</author>
<author>
<name sortKey="Wang, S" uniqKey="Wang S">S Wang</name>
</author>
<author>
<name sortKey="Zhang, L" uniqKey="Zhang L">L Zhang</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Adebiyi, Ef" uniqKey="Adebiyi E">EF Adebiyi</name>
</author>
<author>
<name sortKey="Kaufmann, M" uniqKey="Kaufmann M">M Kaufmann</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Wang, X" uniqKey="Wang X">X Wang</name>
</author>
<author>
<name sortKey="Miao, Y" uniqKey="Miao Y">Y Miao</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rajasekaran, S" uniqKey="Rajasekaran S">S Rajasekaran</name>
</author>
<author>
<name sortKey="Balla, S" uniqKey="Balla S">S Balla</name>
</author>
<author>
<name sortKey="Huang, Ch" uniqKey="Huang C">CH Huang</name>
</author>
<author>
<name sortKey="Thapar, V" uniqKey="Thapar V">V Thapar</name>
</author>
<author>
<name sortKey="Gryk, M" uniqKey="Gryk M">M Gryk</name>
</author>
<author>
<name sortKey="Maciejewski, M" uniqKey="Maciejewski M">M Maciejewski</name>
</author>
<author>
<name sortKey="Schiller, M" uniqKey="Schiller M">M Schiller</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Pathak, S" uniqKey="Pathak S">S Pathak</name>
</author>
<author>
<name sortKey="Rajasekaran, S" uniqKey="Rajasekaran S">S Rajasekaran</name>
</author>
<author>
<name sortKey="Nicolae, M" uniqKey="Nicolae M">M Nicolae</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Knuth, De" uniqKey="Knuth D">DE Knuth</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<pmc article-type="research-article">
<pmc-dir>properties open_access</pmc-dir>
<front>
<journal-meta>
<journal-id journal-id-type="nlm-ta">BMC Genomics</journal-id>
<journal-id journal-id-type="iso-abbrev">BMC Genomics</journal-id>
<journal-title-group>
<journal-title>BMC Genomics</journal-title>
</journal-title-group>
<issn pub-type="epub">1471-2164</issn>
<publisher>
<publisher-name>BioMed Central</publisher-name>
<publisher-loc>London</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="pmid">27557423</article-id>
<article-id pub-id-type="pmc">5001234</article-id>
<article-id pub-id-type="publisher-id">2789</article-id>
<article-id pub-id-type="doi">10.1186/s12864-016-2789-9</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Efficient sequential and parallel algorithms for finding edit distance based motifs</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname>Pal</surname>
<given-names>Soumitra</given-names>
</name>
<xref ref-type="aff" rid="Aff1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Xiao</surname>
<given-names>Peng</given-names>
</name>
<xref ref-type="aff" rid="Aff1">1</xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Rajasekaran</surname>
<given-names>Sanguthevar</given-names>
</name>
<address>
<email>rajasek@engr.uconn.edu</email>
</address>
<xref ref-type="aff" rid="Aff2">2</xref>
</contrib>
<aff id="Aff1">
<label>1</label>
Department of Computer Science and Engineering, University of Connecticut, 371 Fairfield Road,, Storrs, 06269 CT USA</aff>
<aff id="Aff2">
<label>2</label>
Department of Computer Science and Engineering, University of Connecticut, 371 Fairfield Road,, Storrs, 06269 CT USA</aff>
</contrib-group>
<pub-date pub-type="epub">
<day>18</day>
<month>8</month>
<year>2016</year>
</pub-date>
<pub-date pub-type="pmc-release">
<day>18</day>
<month>8</month>
<year>2016</year>
</pub-date>
<pub-date pub-type="collection">
<year>2016</year>
</pub-date>
<volume>17</volume>
<issue>Suppl 4</issue>
<issue-sponsor>Publication of this supplement has not been supported by sponsorship. Information about the source of funding for publication charges can be found in the individual articles. The articles have undergone the journal's standard peer review process for supplements. The Supplement Editor declares that they have no competing interests.</issue-sponsor>
<elocation-id>465</elocation-id>
<permissions>
<copyright-statement>© The Author(s) 2016</copyright-statement>
<license license-type="OpenAccess">
<license-p>
<bold>Open Access</bold>
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</ext-link>
), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (
<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/publicdomain/zero/1.0/">http://creativecommons.org/publicdomain/zero/1.0/</ext-link>
) applies to the data made available in this article, unless otherwise stated.</license-p>
</license>
</permissions>
<abstract id="Abs1">
<sec>
<title>Background</title>
<p>Motif search is an important step in extracting meaningful patterns from biological data. The general problem of motif search is intractable and there is a pressing need to develop efficient, exact and approximation algorithms to solve this problem. In this paper, we present several novel, exact, sequential and parallel algorithms for solving the (
<italic>l,d</italic>
)
<italic>Edit-distance-based Motif Search (EMS)</italic>
problem: given two integers
<italic>l,d</italic>
and
<italic>n</italic>
biological strings, find all strings of length
<italic>l</italic>
that appear in each input string with atmost
<italic>d</italic>
errors of types substitution, insertion and deletion.</p>
</sec>
<sec>
<title>Methods</title>
<p>One popular technique to solve the problem is to explore for each input string the set of all possible
<italic>l</italic>
-mers that belong to the
<italic>d</italic>
-neighborhood of any substring of the input string and output those which are common for all input strings. We introduce a novel and provably efficient neighborhood exploration technique. We show that it is enough to consider the candidates in neighborhood which are at a distance exactly
<italic>d</italic>
. We compactly represent these candidate motifs using wildcard characters and efficiently explore them with very few repetitions. Our sequential algorithm uses a trie based data structure to efficiently store and sort the candidate motifs. Our parallel algorithm in a multi-core shared memory setting uses arrays for storing and a novel modification of radix-sort for sorting the candidate motifs.</p>
</sec>
<sec>
<title>Results</title>
<p>The algorithms for EMS are customarily evaluated on several challenging instances such as (8,1), (12,2), (16,3), (20,4), and so on. The best previously known algorithm, EMS1, is sequential and in estimated 3 days solves up to instance (16,3). Our sequential algorithms are more than 20 times faster on (16,3). On other hard instances such as (9,2), (11,3), (13,4), our algorithms are much faster. Our parallel algorithm has more than 600 % scaling performance while using 16 threads.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>Our algorithms have pushed up the state-of-the-art of EMS solvers and we believe that the techniques introduced in this paper are also applicable to other motif search problems such as Planted Motif Search (PMS) and Simple Motif Search (SMS).</p>
</sec>
<sec>
<title>Electronic supplementary material</title>
<p>The online version of this article (doi:10.1186/s12864-016-2789-9) contains supplementary material, which is available to authorized users.</p>
</sec>
</abstract>
<kwd-group xml:lang="en">
<title>Keywords</title>
<kwd>Motif</kwd>
<kwd>Edit distance</kwd>
<kwd>Trie</kwd>
<kwd>Radix sort</kwd>
</kwd-group>
<conference>
<conf-name>IEEE International Conference on Bioinformatics and Biomedicine 2015</conf-name>
<conf-loc>Washington, DC, USA</conf-loc>
<conf-date>9-12 November 2015</conf-date>
<string-conf>
<uri>http://cci.drexel.edu/ieeebibm/bibm2015/</uri>
</string-conf>
</conference>
<custom-meta-group>
<custom-meta>
<meta-name>issue-copyright-statement</meta-name>
<meta-value>© The Author(s) 2016</meta-value>
</custom-meta>
</custom-meta-group>
</article-meta>
</front>
<body>
<sec id="Sec1">
<title>Background</title>
<p>Motif search has applications in solving such crucial problems as identification of alternative splicing sites, determination of open reading frames, identification of promoter elements of genes, identification of transcription factors and their binding sites, etc. (see e.g., Nicolae and Rajasekaran [
<xref ref-type="bibr" rid="CR1">1</xref>
]). There are many formulations of the motif search problem. A widely studied formulation is known as (
<italic>l,d</italic>
)-motif search or Planted Motif Search (PMS) [
<xref ref-type="bibr" rid="CR2">2</xref>
]. Given two integers
<italic>l,d</italic>
and
<italic>n</italic>
biological strings the problem is to find all strings of length
<italic>l</italic>
that appear in each of the
<italic>n</italic>
input strings with atmost
<italic>d</italic>
mismatches. There is a significant amount of work in the literature on PMS (see e.g., [
<xref ref-type="bibr" rid="CR1">1</xref>
,
<xref ref-type="bibr" rid="CR3">3</xref>
<xref ref-type="bibr" rid="CR5">5</xref>
], and so on).</p>
<p>PMS considers only point mutations as events of divergence in biological sequences. However, insertions and deletions also play important roles in divergence [
<xref ref-type="bibr" rid="CR2">2</xref>
,
<xref ref-type="bibr" rid="CR6">6</xref>
]. Therefore, researchers have also considered a formulation in which the Levenshtein distance (or edit distance), instead of mismatches, is used for measuring the degree of divergence [
<xref ref-type="bibr" rid="CR7">7</xref>
,
<xref ref-type="bibr" rid="CR8">8</xref>
]. Given
<italic>n</italic>
strings
<italic>S</italic>
<sup>(1)</sup>
,
<italic>S</italic>
<sup>(2)</sup>
,…,
<italic>S</italic>
<sup>(
<italic>n</italic>
)</sup>
, each of length
<italic>m</italic>
from a fixed alphabet
<italic>Σ</italic>
, and integers
<italic>l,d</italic>
, the
<italic>Edit-distance-based Motif Search (EMS)</italic>
problem is to find all patterns
<italic>M</italic>
of length
<italic>l</italic>
that occur in atleast one position in each
<italic>S</italic>
<sup>(
<italic>i</italic>
)</sup>
with an edit distance of atmost
<italic>d</italic>
. More formally,
<italic>M</italic>
is a motif if and only if ∀
<italic>i</italic>
, there exist
<italic>k</italic>
∈ [
<italic>l</italic>
<italic>d,l</italic>
+
<italic>d</italic>
],
<italic>j</italic>
∈ [ 1,
<italic>m</italic>
<italic>k</italic>
+1] such that for the substring
<inline-formula id="IEq1">
<alternatives>
<tex-math id="M1">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$S^{(i)}_{j,k}$\end{document}</tex-math>
<mml:math id="M2">
<mml:msubsup>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq1.gif"></inline-graphic>
</alternatives>
</inline-formula>
of length
<italic>k</italic>
at position
<italic>j</italic>
of
<italic>S</italic>
<sup>(
<italic>i</italic>
)</sup>
,
<inline-formula id="IEq2">
<alternatives>
<tex-math id="M3">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$ED\left (S^{(i)}_{j,k},M\right) \le d$\end{document}</tex-math>
<mml:math id="M4">
<mml:mtext mathvariant="italic">ED</mml:mtext>
<mml:mfenced close=")" open="(" separators="">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:mi>M</mml:mi>
</mml:mrow>
</mml:mfenced>
<mml:mo></mml:mo>
<mml:mi>d</mml:mi>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq2.gif"></inline-graphic>
</alternatives>
</inline-formula>
. Here
<italic>E</italic>
<italic>D</italic>
(
<italic>X,Y</italic>
) stands for the edit distance between two strings
<italic>X</italic>
and
<italic>Y</italic>
.</p>
<p>EMS is also NP-hard since PMS is a special case of EMS and PMS is known to be NP-hard [
<xref ref-type="bibr" rid="CR9">9</xref>
]. As a result, any exact algorithm for EMS that finds all the motifs for a given input can be expected to have an exponential (in some of the parameters) worst case runtime. One of the earliest EMS algorithms is due to Rocke and Tompa [
<xref ref-type="bibr" rid="CR7">7</xref>
] and is based on Gibbs Sampling which requires repeated searching of the motifs in a constantly evolving collection of aligned strings, and each search pass requires
<italic>O</italic>
(
<italic>n</italic>
<italic>l</italic>
) time. This is an approximate algorithm. Sagot [
<xref ref-type="bibr" rid="CR8">8</xref>
] gave a suffix tree based exact algorithm that takes
<italic>O</italic>
(
<italic>n</italic>
<sup>2</sup>
<italic>m</italic>
<italic>l</italic>
<sup>
<italic>d</italic>
</sup>
|
<italic>Σ</italic>
|
<sup>
<italic>d</italic>
</sup>
) time and
<italic>O</italic>
(
<italic>n</italic>
<sup>2</sup>
<italic>m</italic>
/
<italic>w</italic>
) space where
<italic>w</italic>
is the word length of the computer. Adebiyi and Kaufmann [
<xref ref-type="bibr" rid="CR10">10</xref>
] proposed an exact algorithm with an expected runtime of
<italic>O</italic>
(
<italic>n</italic>
<italic>m</italic>
+
<italic>d</italic>
(
<italic>n</italic>
<italic>m</italic>
)
<sup>(1+
<italic>p</italic>
<italic>o</italic>
<italic>w</italic>
(
<italic>ε</italic>
))</sup>
log
<italic>n</italic>
<italic>m</italic>
) where
<italic>ε</italic>
=
<italic>d</italic>
/
<italic>l</italic>
and
<italic>p</italic>
<italic>o</italic>
<italic>w</italic>
(
<italic>ε</italic>
) is an increasing concave function. The value of
<italic>p</italic>
<italic>o</italic>
<italic>w</italic>
(
<italic>ε</italic>
) is roughly 0.9 for protein and DNA sequences. Wang and Miao [
<xref ref-type="bibr" rid="CR11">11</xref>
] gave an expectation minimization based heuristic genetic algorithm.</p>
<p>Rajasekaran et al. [
<xref ref-type="bibr" rid="CR12">12</xref>
] proposed a simpler Deterministic Motif Search (DMS) that has the same worst case time complexity as the algorithm by Sagot [
<xref ref-type="bibr" rid="CR8">8</xref>
]. The algorithm generates and stores the neighborhood of every substring of length in the range [
<italic>l</italic>
<italic>d,l</italic>
+
<italic>d</italic>
] of every input string and using a radix sort based method, outputs the neighbors that are common to atleast one substring of each input string. This algorithm was implemented by Pathak et al. [
<xref ref-type="bibr" rid="CR13">13</xref>
].</p>
<p>Following a useful practice for PMS algorithms, Pathak et al. [
<xref ref-type="bibr" rid="CR13">13</xref>
] evaluated their algorithm on certain instances that are considered challenging for PMS: (9,2), (11,3), (13,4) and so on [
<xref ref-type="bibr" rid="CR1">1</xref>
], and are generated as follows:
<italic>n</italic>
=20 random DNA/protein strings of length
<italic>m</italic>
=600, and a short random string
<italic>M</italic>
of length
<italic>l</italic>
are generated according to the independent identically distributed (i.i.d) model. A separate random
<italic>d</italic>
-hamming distance neighbor of
<italic>M</italic>
is “planted” in each of the
<italic>n</italic>
input strings. Such an (
<italic>l,d</italic>
) instance is defined to be a
<italic>challenging instance</italic>
if
<italic>l</italic>
is the largest integer for which the expected number of spurious motifs,
<italic>i.e.</italic>
, the motifs that would occur in the input by random chance, is atleast 1.</p>
<p>The expected number of spurious motifs in EMS are different from those in PMS. Table
<xref rid="Tab1" ref-type="table">1</xref>
shows the expected number of spurious motifs for
<italic>l</italic>
∈ [ 5,21] and
<italic>d</italic>
upto max{
<italic>l</italic>
−2,13},
<italic>n</italic>
=20,
<italic>m</italic>
=600 and
<italic>Σ</italic>
={
<italic>A,C,G,T</italic>
} [see Additional file
<xref rid="MOESM1" ref-type="media">1</xref>
]. The challenging instances for EMS turn out to be (8,1), (12,2), (16,3), (20,4) and so on. To compare with [
<xref ref-type="bibr" rid="CR13">13</xref>
], we consider both types of instances, specifically, (8,1), (9,2), (11,3), (12,2), (13,4) and (16,3).
<table-wrap id="Tab1">
<label>Table 1</label>
<caption>
<p>Expected number of spurious motifs in random instances for
<italic>n</italic>
=20,
<italic>m</italic>
=600. Here,
<italic></italic>
represents value ≥1.0
<italic>e</italic>
+7</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">
<italic>l</italic>
</th>
<th align="left">
<italic>d</italic>
=0</th>
<th align="left">1</th>
<th align="left">2</th>
<th align="left">3</th>
<th align="left">4</th>
<th align="left">5</th>
<th align="left">6</th>
<th align="left">7</th>
<th align="left">8</th>
<th align="left">9</th>
<th align="left">10</th>
<th align="left">11</th>
<th align="left">12</th>
<th align="left">13</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">5</td>
<td align="left">0.0</td>
<td align="left">1024.0</td>
<td align="left">1024.0</td>
<td align="left">
<italic></italic>
</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">6</td>
<td align="left">0.0</td>
<td align="left">4096.0</td>
<td align="left">4096.0</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">7</td>
<td align="left">0.0</td>
<td align="left">14141.8</td>
<td align="left">16384.0</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">8</td>
<td align="left">0.0</td>
<td align="left">
<bold>225.8</bold>
</td>
<td align="left">65536.0</td>
<td align="left">65536.0</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">9</td>
<td align="left">0.0</td>
<td align="left">0.0</td>
<td align="left">262144.0</td>
<td align="left">262144.0</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">10</td>
<td align="left">0.0</td>
<td align="left">0.0</td>
<td align="left">1047003.6</td>
<td align="left">1048576.0</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">11</td>
<td align="left">0.0</td>
<td align="left">0.0</td>
<td align="left">1332519.5</td>
<td align="left">4194304.0</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">12</td>
<td align="left">0.0</td>
<td align="left">0.0</td>
<td align="left">
<bold>294.7</bold>
</td>
<td align="left">1.678e+07</td>
<td align="left">1.678e+07</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left"></td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">13</td>
<td align="left">0.0</td>
<td align="left">0.0</td>
<td align="left">0.0</td>
<td align="left">6.711e+07</td>
<td align="left">6.711e+07</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left"></td>
<td align="left"></td>
</tr>
<tr>
<td align="left">14</td>
<td align="left">0.0</td>
<td align="left">0.0</td>
<td align="left">0.0</td>
<td align="left">2.517e+08</td>
<td align="left">2.684e+08</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left"></td>
</tr>
<tr>
<td align="left">15</td>
<td align="left">0.0</td>
<td align="left">0.0</td>
<td align="left">0.0</td>
<td align="left">2.749e+07</td>
<td align="left">1.074e+09</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
</tr>
<tr>
<td align="left">16</td>
<td align="left">0.0</td>
<td align="left">0.0</td>
<td align="left">0.0</td>
<td align="left">
<bold>139.1</bold>
</td>
<td align="left">4.295e+09</td>
<td align="left">4.295e+09</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
</tr>
<tr>
<td align="left">17</td>
<td align="left">0.0</td>
<td align="left">0.0</td>
<td align="left">0.0</td>
<td align="left">0.0</td>
<td align="left">1.718e+10</td>
<td align="left">1.718e+10</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
</tr>
<tr>
<td align="left">18</td>
<td align="left">0.0</td>
<td align="left">0.0</td>
<td align="left">0.0</td>
<td align="left">0.0</td>
<td align="left">3.965e+10</td>
<td align="left">6.872e+10</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
</tr>
<tr>
<td align="left">19</td>
<td align="left">0.0</td>
<td align="left">0.0</td>
<td align="left">0.0</td>
<td align="left">0.0</td>
<td align="left">1.226e+08</td>
<td align="left">2.749e+11</td>
<td align="left">2.749e+11</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
</tr>
<tr>
<td align="left">20</td>
<td align="left">0.0</td>
<td align="left">0.0</td>
<td align="left">0.0</td>
<td align="left">0.0</td>
<td align="left">
<bold>35.8</bold>
</td>
<td align="left">1.100e+12</td>
<td align="left">1.100e+12</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
</tr>
<tr>
<td align="left">21</td>
<td align="left">0.0</td>
<td align="left">0.0</td>
<td align="left">0.0</td>
<td align="left">0.0</td>
<td align="left">0.0</td>
<td align="left">4.333e+12</td>
<td align="left">4.398e+12</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
<td align="left">
<italic></italic>
</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>The instances in bold represents challenging instances</p>
</table-wrap-foot>
</table-wrap>
</p>
<p>The sequential algorithm by Pathak et al. [
<xref ref-type="bibr" rid="CR13">13</xref>
] solves the moderately hard instance (11,3) in a few hours and does not solve the next difficult instance (13,4) even after 3 days. A key time-consuming part of the algorithm is in the generation of the edit distance neighborhood of all substrings as there are many common neighbors.</p>
<sec id="Sec2">
<title>Contributions</title>
<p>In this paper we present several improved algorithms for EMS that solve instance (11,3) in less than a couple of minutes and instance (13,4) in less than a couple of hours. On (16,3) our algorithm is more than 20 times faster than EMS1. Our algorithm uses an efficient technique (introduced in this paper) to generate the edit distance neighborhood of length
<italic>l</italic>
with distance atmost
<italic>d</italic>
of all substrings of an input string. Our parallel algorithm in the multi-core shared memory setting has more than 600 % scaling performance on 16 threads. Our approach uses following five ideas which can be applied to other motif search problems as well:</p>
<p>
<bold>Efficient neighborhood generation:</bold>
We show that it is enough to consider the neighbors which are at a distance exactly
<italic>d</italic>
from all possible substrings of the input strings. This works because the neighbors at a lesser distance are also included in the neighborhood of some other substrings.</p>
<p>
<bold>Compact representation using wildcard characters:</bold>
We represent all possible neighbors which are due to an insertion or a substitution at a position by a single neighbor using a wildcard character at the same position. This compact representation of the candidate motifs in the neighborhood requires less space.</p>
<p>
<bold>Avoiding duplication of candidate motifs:</bold>
Our algorithm uses several rules to avoid duplication in candidate motifs and we prove that our technique generates neighborhood that is nearly duplication free. In other words, our neighborhood generation technique does not spend a lot of time generating neighbors that have already been generated.</p>
<p>
<bold>Trie based data structure for storing compact motifs:</bold>
We use a trie based data structure to efficiently store the neighborhood. This not only simplifies the removal of duplicate neighbors but also helps in outputting the final motifs in sorted order using a depth first search traversal of the trie.</p>
<p>
<bold>Modified radix-sort for compact motifs:</bold>
Our parallel algorithm stores the compact motifs in an array and uses a modified radix-sort algorithm to sort them. Use of arrays instead of tries simplifies updating the set of candidate motifs by multiple threads.</p>
</sec>
</sec>
<sec id="Sec3">
<title>Methods</title>
<p>In this section we introduce some notations and observations.</p>
<p>An (
<italic>l,d</italic>
)-friend of a
<italic>k</italic>
-mer
<italic>L</italic>
is an
<italic>l</italic>
-mer at an exact distance of
<italic>d</italic>
from
<italic>L</italic>
. Let
<italic>F</italic>
<sub>
<italic>l,d</italic>
</sub>
(
<italic>L</italic>
) denote the set of all (
<italic>l,d</italic>
)-friends of
<italic>L</italic>
. An (
<italic>l,d</italic>
)
<italic>-neighbor</italic>
of a
<italic>k</italic>
-mer
<italic>L</italic>
is an
<italic>l</italic>
-mer at a distance of atmost
<italic>d</italic>
from
<italic>L</italic>
. Let
<italic>N</italic>
<sub>
<italic>l,d</italic>
</sub>
(
<italic>L</italic>
) denote the set of all (
<italic>l,d</italic>
)-neighbors of
<italic>L</italic>
. Then
<disp-formula id="Equ1">
<label>1</label>
<alternatives>
<tex-math id="M5">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$\begin{array}{*{20}l} N_{l,d}(L) = \cup_{t = 0}^{d} F_{l,t}(L). \end{array} $$ \end{document}</tex-math>
<mml:math id="M6">
<mml:mtable class="align" columnalign="left">
<mml:mtr>
<mml:mtd class="align-1">
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:mi>L</mml:mi>
<mml:mo>)</mml:mo>
<mml:mo>=</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mo></mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:mi>L</mml:mi>
<mml:mo>)</mml:mo>
<mml:mi>.</mml:mi>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<graphic xlink:href="12864_2016_2789_Article_Equ1.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
<p>For a string
<italic>S</italic>
of length
<italic>m</italic>
, an (
<italic>l,d</italic>
)
<italic>-motif</italic>
of
<italic>S</italic>
is an
<italic>l</italic>
-mer at a distance atmost
<italic>d</italic>
from some substring of
<italic>S</italic>
. Thus an (
<italic>l,d</italic>
)-motif of
<italic>S</italic>
is an (
<italic>l,d</italic>
)-neighbor of atleast one substring
<italic>S</italic>
<sub>
<italic>j,k</italic>
</sub>
=
<italic>S</italic>
<sub>
<italic>j</italic>
</sub>
<italic>S</italic>
<sub>
<italic>j</italic>
+1</sub>
<italic>S</italic>
<sub>
<italic>j</italic>
+
<italic>k</italic>
−1</sub>
where
<italic>k</italic>
∈[
<italic>l</italic>
<italic>d,l</italic>
+
<italic>d</italic>
]. Therefore, the set of (
<italic>l,d</italic>
)-motifs of
<italic>S</italic>
, denoted by
<italic>M</italic>
<sub>
<italic>l,d</italic>
</sub>
(
<italic>S</italic>
), is given by
<disp-formula id="Equ2">
<label>2</label>
<alternatives>
<tex-math id="M7">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$\begin{array}{*{20}l} M_{l,d}(S) = \cup_{k=l-d}^{l+d} \cup_{j=1}^{m-k+1} N_{l,d}(S_{j,k}). \end{array} $$ \end{document}</tex-math>
<mml:math id="M8">
<mml:mtable class="align" columnalign="left">
<mml:mtr>
<mml:mtd class="align-1">
<mml:msub>
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo>)</mml:mo>
<mml:mo>=</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mo></mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>=</mml:mo>
<mml:mi>l</mml:mi>
<mml:mo></mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msubsup>
<mml:mrow>
<mml:mo></mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo></mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi>N</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
<mml:mi>.</mml:mi>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<graphic xlink:href="12864_2016_2789_Article_Equ2.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
<p>For a collection of strings
<inline-formula id="IEq3">
<alternatives>
<tex-math id="M9">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\mathcal {S} = \{S^{(1)}, S^{(2)}, \ldots, S^{(m)}\}$\end{document}</tex-math>
<mml:math id="M10">
<mml:mi mathvariant="script">S</mml:mi>
<mml:mo>=</mml:mo>
<mml:mo>{</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mo></mml:mo>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo>}</mml:mo>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq3.gif"></inline-graphic>
</alternatives>
</inline-formula>
, a (common) (
<italic>l,d</italic>
)-motif is an
<italic>l</italic>
-mer at a distance atmost
<italic>d</italic>
from atleast one substring of each
<italic>S</italic>
<sup>(
<italic>i</italic>
)</sup>
. Thus the set of (common) (
<italic>l,d</italic>
)-motifs of
<inline-formula id="IEq4">
<alternatives>
<tex-math id="M11">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\mathcal {S}$\end{document}</tex-math>
<mml:math id="M12">
<mml:mi mathvariant="script">S</mml:mi>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq4.gif"></inline-graphic>
</alternatives>
</inline-formula>
, denoted by
<inline-formula id="IEq5">
<alternatives>
<tex-math id="M13">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$M_{l,d}(\mathcal {S})$\end{document}</tex-math>
<mml:math id="M14">
<mml:msub>
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">S</mml:mi>
<mml:mo>)</mml:mo>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq5.gif"></inline-graphic>
</alternatives>
</inline-formula>
, is given by
<disp-formula id="Equ3">
<label>3</label>
<alternatives>
<tex-math id="M15">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$\begin{array}{*{20}l} M_{l,d}(\mathcal{S}) = \cap_{i=1}^{n} M_{l,d}(S^{(i)}). \end{array} $$ \end{document}</tex-math>
<mml:math id="M16">
<mml:mtable class="align" columnalign="left">
<mml:mtr>
<mml:mtd class="align-1">
<mml:msub>
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:mi mathvariant="script">S</mml:mi>
<mml:mo>)</mml:mo>
<mml:mo>=</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mo></mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>i</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>n</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo>)</mml:mo>
<mml:mi>.</mml:mi>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<graphic xlink:href="12864_2016_2789_Article_Equ3.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
<p>One simple way of computing
<italic>F</italic>
<sub>
<italic>l,d</italic>
</sub>
(
<italic>L</italic>
) is to grow the friendhood of
<italic>L</italic>
by one distance at a time for
<italic>d</italic>
times and to select only the friends having length
<italic>l</italic>
. Let
<italic>G</italic>
(
<italic>L</italic>
) denote the set of strings obtained by one edit operation on
<italic>L</italic>
and
<inline-formula id="IEq6">
<alternatives>
<tex-math id="M17">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$G(\{L_{1}, L_{2}, \ldots, L_{r}\}) = \cup _{t=1}^{r} G(L_{t})$\end{document}</tex-math>
<mml:math id="M18">
<mml:mi>G</mml:mi>
<mml:mo>(</mml:mo>
<mml:mo>{</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo></mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>}</mml:mo>
<mml:mo>)</mml:mo>
<mml:mo>=</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mo></mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:mi>G</mml:mi>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq6.gif"></inline-graphic>
</alternatives>
</inline-formula>
. If
<italic>G</italic>
<sup>1</sup>
(
<italic>L</italic>
)=
<italic>G</italic>
(
<italic>L</italic>
), and for
<italic>t</italic>
>1,
<italic>G</italic>
<sup>
<italic>t</italic>
</sup>
(
<italic>L</italic>
)=
<italic>G</italic>
(
<italic>G</italic>
<sup>
<italic>t</italic>
−1</sup>
(
<italic>L</italic>
)) then
<disp-formula id="Equ4">
<label>4</label>
<alternatives>
<tex-math id="M19">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$\begin{array}{*{20}l} F_{l,d}(L) = \{x \in G^{d}(L) : |x| = l\}. \end{array} $$ \end{document}</tex-math>
<mml:math id="M20">
<mml:mtable class="align" columnalign="left">
<mml:mtr>
<mml:mtd class="align-1">
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:mi>L</mml:mi>
<mml:mo>)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mo>{</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo></mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>G</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msup>
<mml:mo>(</mml:mo>
<mml:mi>L</mml:mi>
<mml:mo>)</mml:mo>
<mml:mo>:</mml:mo>
<mml:mo>|</mml:mo>
<mml:mi>x</mml:mi>
<mml:mo>|</mml:mo>
<mml:mo>=</mml:mo>
<mml:mi>l</mml:mi>
<mml:mo>}</mml:mo>
<mml:mi>.</mml:mi>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<graphic xlink:href="12864_2016_2789_Article_Equ4.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
<p>Using Eqs. (
<xref rid="Equ1" ref-type="">1</xref>
), (
<xref rid="Equ2" ref-type="">2</xref>
), (
<xref rid="Equ3" ref-type="">3</xref>
) and (
<xref rid="Equ4" ref-type="">4</xref>
), Pathak et al. [
<xref ref-type="bibr" rid="CR13">13</xref>
] gave an algorithm that stores all possible candidate motifs in an array of size |
<italic>Σ</italic>
|
<sup>
<italic>l</italic>
</sup>
. However the algorithm is inefficient in generating the neighborhood as the same candidate motif is generated by several combinations of the basic edit operations. Also, the
<italic>O</italic>
(|
<italic>Σ</italic>
|
<sup>
<italic>l</italic>
</sup>
) memory requirement makes the algorithm inapplicable for larger instances. In this paper we mitigate these two limitations.</p>
<sec id="Sec4">
<title>Efficient neighborhood generation</title>
<p>We now give a more efficient algorithm to generate the (
<italic>l,d</italic>
)-neighborhood of all possible
<italic>k</italic>
-mers of a string. Instead of computing (
<italic>l,t</italic>
)-friendhood for all 0≤
<italic>t</italic>
<italic>d</italic>
, we compute only the exact (
<italic>l,d</italic>
)-friendhood.</p>
<sec id="d30e2676">
<title>
<bold>Lemma</bold>
<bold>1</bold>
.</title>
<p>
<inline-formula id="IEq7">
<alternatives>
<tex-math id="M21">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$M_{l,d}(S) = \cup _{k=l-d}^{l+d} \cup _{j=1}^{m-k+1} F_{l,d}(S_{j,k})$\end{document}</tex-math>
<mml:math id="M22">
<mml:msub>
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo>)</mml:mo>
<mml:mo>=</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mo></mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>=</mml:mo>
<mml:mi>l</mml:mi>
<mml:mo></mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msubsup>
<mml:mrow>
<mml:mo></mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>m</mml:mi>
<mml:mo></mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq7.gif"></inline-graphic>
</alternatives>
</inline-formula>
.</p>
</sec>
<sec id="d30e2781">
<title>
<italic>Proof</italic>
.</title>
<p>Consider the
<italic>k</italic>
-mer
<italic>L</italic>
=
<italic>S</italic>
<sub>
<italic>j,k</italic>
</sub>
. If
<italic>k</italic>
=
<italic>l</italic>
+
<italic>d</italic>
then we need
<italic>d</italic>
deletions to make
<italic>L</italic>
an
<italic>l</italic>
-mer. There cannot be any (
<italic>l,t</italic>
)-neighbor of
<italic>L</italic>
for
<italic>t</italic>
<
<italic>d</italic>
. Thus
<disp-formula id="Equ5">
<label>5</label>
<alternatives>
<tex-math id="M23">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$\begin{array}{*{20}l} \cup_{t = 0}^{d} F_{l,t}(S_{j,l+d}) = F_{l,d}(S_{j,l+d}). \end{array} $$ \end{document}</tex-math>
<mml:math id="M24">
<mml:mtable class="align" columnalign="left">
<mml:mtr>
<mml:mtd class="align-1">
<mml:msubsup>
<mml:mrow>
<mml:mo></mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>l</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>l</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
<mml:mi>.</mml:mi>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<graphic xlink:href="12864_2016_2789_Article_Equ5.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
<p>Suppose
<italic>k</italic>
<
<italic>l</italic>
+
<italic>d</italic>
. Any (
<italic>l,d</italic>
−1)-neighbor
<italic>B</italic>
of
<italic>L</italic>
is also an (
<italic>l,d</italic>
)-neighbor of
<italic>L</italic>
<sup></sup>
=
<italic>S</italic>
<sub>
<italic>j,k</italic>
+1</sub>
because
<italic>E</italic>
<italic>D</italic>
(
<italic>B,L</italic>
<sup></sup>
)≤
<italic>E</italic>
<italic>D</italic>
(
<italic>B,L</italic>
)+
<italic>E</italic>
<italic>D</italic>
(
<italic>L,L</italic>
<sup></sup>
)≤(
<italic>d</italic>
−1)+1=
<italic>d</italic>
. Thus
<disp-formula id="Equa">
<alternatives>
<tex-math id="M25">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$\begin{array}{*{20}l} \cup_{t=0}^{d} F_{l,t}(S_{j,k}) \subseteq F_{l,d}(S_{j,k}) \bigcup \cup_{t=0}^{d} F_{l,t}(S_{j,k+1}) \end{array} $$ \end{document}</tex-math>
<mml:math id="M26">
<mml:mtable class="align" columnalign="left">
<mml:mtr>
<mml:mtd class="align-1">
<mml:msubsup>
<mml:mrow>
<mml:mo></mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
<mml:mo></mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
<mml:mo mathsize="big"></mml:mo>
<mml:munderover accent="false" accentunder="false">
<mml:mrow>
<mml:mo></mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<graphic xlink:href="12864_2016_2789_Article_Equa.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
<p>which implies that
<disp-formula id="Equ6">
<label>6</label>
<alternatives>
<tex-math id="M27">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$\begin{array}{*{20}l} \cup_{r=k}^{k{+}1}\cup_{t{=}0}^{d} F_{l,t}(S_{j,r}) = F_{l,d}(S_{j,k}) \bigcup \cup_{t=0}^{d} F_{l,t}(S_{j,k{+}1}). \end{array} $$ \end{document}</tex-math>
<mml:math id="M28">
<mml:mtable class="align" columnalign="left">
<mml:mtr>
<mml:mtd class="align-1">
<mml:msubsup>
<mml:mrow>
<mml:mo></mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>r</mml:mi>
<mml:mo>=</mml:mo>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msubsup>
<mml:msubsup>
<mml:mrow>
<mml:mo></mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msubsup>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>r</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>k</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
<mml:mo mathsize="big"></mml:mo>
<mml:munderover accent="false" accentunder="false">
<mml:mrow>
<mml:mo></mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>t</mml:mi>
<mml:mo>=</mml:mo>
<mml:mn>0</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:msub>
<mml:mrow>
<mml:mi>F</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>t</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>)</mml:mo>
<mml:mi>.</mml:mi>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<graphic xlink:href="12864_2016_2789_Article_Equ6.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
<p>Applying (
<xref rid="Equ6" ref-type="">6</xref>
) repeatedly for
<italic>k</italic>
=
<italic>l</italic>
<italic>d,l</italic>
<italic>d</italic>
+1,…,
<italic>l</italic>
+
<italic>d</italic>
−1, along with (
<xref rid="Equ5" ref-type="">5</xref>
) in (
<xref rid="Equ1" ref-type="">1</xref>
) and (
<xref rid="Equ2" ref-type="">2</xref>
) gives the result of the lemma.</p>
<p>We generate
<italic>F</italic>
<sub>
<italic>l,d</italic>
</sub>
(
<italic>S</italic>
<sub>
<italic>j,k</italic>
</sub>
) in three phases: we apply
<italic>δ</italic>
deletions in the first phase,
<italic>β</italic>
substitutions in the second phase, and
<italic>α</italic>
insertions in the final phase, where
<italic>d</italic>
=
<italic>δ</italic>
+
<italic>α</italic>
+
<italic>β</italic>
and
<italic>l</italic>
=
<italic>k</italic>
<italic>δ</italic>
+
<italic>α</italic>
. Solving for
<italic>α,β,δ</italic>
gives max{0,
<italic>q</italic>
}≤
<italic>δ</italic>
≤(
<italic>d</italic>
+
<italic>q</italic>
)/2,
<italic>α</italic>
=
<italic>δ</italic>
<italic>q</italic>
and
<italic>β</italic>
=
<italic>d</italic>
−2
<italic>δ</italic>
+
<italic>q</italic>
where
<italic>q</italic>
=
<italic>k</italic>
<italic>l</italic>
. In each of the phases, the neighborhood is grown by one edit operation at a time.</p>
</sec>
</sec>
<sec id="Sec5">
<title>Compact motifs</title>
<p>The candidate motifs in
<italic>F</italic>
<sub>
<italic>l,d</italic>
</sub>
(
<italic>S</italic>
<sub>
<italic>j,k</italic>
</sub>
) are generated in a compact way. Instead of inserting each character in
<italic>Σ</italic>
separately at a required position in
<italic>S</italic>
<sub>
<italic>j,k</italic>
</sub>
, we insert a new character ∗∉
<italic>Σ</italic>
at that position. Similarly, instead of substituting a character
<italic>σ</italic>
<italic>S</italic>
<sub>
<italic>j,k</italic>
</sub>
by each
<italic>σ</italic>
<sup></sup>
<italic>Σ</italic>
∖{
<italic>σ</italic>
} separately, we substitute
<italic>σ</italic>
by ∗. The motifs common to all strings in
<inline-formula id="IEq8">
<alternatives>
<tex-math id="M29">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\mathcal {S}$\end{document}</tex-math>
<mml:math id="M30">
<mml:mi mathvariant="script">S</mml:mi>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq8.gif"></inline-graphic>
</alternatives>
</inline-formula>
is determined by using the usual definition of union and the following definition of intersection on compact strings
<italic>A,B</italic>
∈(
<italic>Σ</italic>
∪{∗})
<sup>
<italic>l</italic>
</sup>
in (
<xref rid="Equ3" ref-type="">3</xref>
):
<disp-formula id="Equ7">
<label>7</label>
<alternatives>
<tex-math id="M31">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document} $$\begin{array}{@{}rcl@{}} \arraycolsep=1pt A {\cap} B = \left\{ \begin{array}{ll} \emptyset & \text{if}~ \exists j~\text{s.t.}~A_{j}, B_{j} \in \Sigma, A_{j} \ne B_{j}\\ \sigma_{1}\sigma_{2}\ldots \sigma_{l} & \text{else, where}~\sigma_{j} = \left\{ \begin{array}{ll} b_{j} & \text{if}~a_{j} {=} * \\ a_{j} & \text{if}~b_{j} {=} *. \end{array} \right. \end{array} \right. \end{array} $$ \end{document}</tex-math>
<mml:math id="M32">
<mml:mtable class="eqnarray" columnalign="left center right">
<mml:mtr>
<mml:mtd class="eqnarray-1">
<mml:mi>A</mml:mi>
<mml:mo></mml:mo>
<mml:mi>B</mml:mi>
<mml:mo>=</mml:mo>
<mml:mfenced close="" open="{" separators="">
<mml:mrow>
<mml:mtable class="array" columnalign="left">
<mml:mtr>
<mml:mtd>
<mml:mi></mml:mi>
</mml:mtd>
<mml:mtd>
<mml:mtext>if</mml:mtext>
<mml:mspace width="1em"></mml:mspace>
<mml:mo></mml:mo>
<mml:mi>j</mml:mi>
<mml:mspace width="1em"></mml:mspace>
<mml:mtext>s.t.</mml:mtext>
<mml:mspace width="1em"></mml:mspace>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>B</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo></mml:mo>
<mml:mi>Σ</mml:mi>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>A</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo></mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>B</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:msub>
<mml:mrow>
<mml:mi>σ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:msub>
<mml:mrow>
<mml:mi>σ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo></mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>σ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mtd>
<mml:mtd>
<mml:mtext>else, where</mml:mtext>
<mml:mspace width="1em"></mml:mspace>
<mml:msub>
<mml:mrow>
<mml:mi>σ</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mfenced close="" open="{" separators="">
<mml:mrow>
<mml:mtable class="array" columnalign="left">
<mml:mtr>
<mml:mtd>
<mml:msub>
<mml:mrow>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mtd>
<mml:mtd>
<mml:mtext>if</mml:mtext>
<mml:mspace width="1em"></mml:mspace>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mo></mml:mo>
</mml:mtd>
</mml:mtr>
<mml:mtr>
<mml:mtd>
<mml:msub>
<mml:mrow>
<mml:mi>a</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mtd>
<mml:mtd>
<mml:mtext>if</mml:mtext>
<mml:mspace width="1em"></mml:mspace>
<mml:msub>
<mml:mrow>
<mml:mi>b</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mo></mml:mo>
<mml:mi>.</mml:mi>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:mrow>
</mml:mfenced>
</mml:mtd>
</mml:mtr>
</mml:mtable>
</mml:math>
<graphic xlink:href="12864_2016_2789_Article_Equ7.gif" position="anchor"></graphic>
</alternatives>
</disp-formula>
</p>
</sec>
<sec id="Sec6">
<title>Trie for storing compact motifs</title>
<p>We store the compact motifs in a trie based data structure which we call a
<italic>motif trie</italic>
. This helps implement the intersection defined in (
<xref rid="Equ7" ref-type="">7</xref>
). Each node in the motif trie has atmost |
<italic>Σ</italic>
| children. The edges from a node
<italic>u</italic>
to its children
<italic>v</italic>
are labeled with mutually exclusive subsets
<italic>l</italic>
<italic>a</italic>
<italic>b</italic>
<italic>e</italic>
<italic>l</italic>
(
<italic>u,v</italic>
)⊆
<italic>Σ</italic>
. An empty set of compact motifs is represented by a single root node. A non-empty trie has
<italic>l</italic>
+1 levels of nodes, the root being at level 0. The trie stores the
<italic>l</italic>
-mer
<italic>σ</italic>
<sub>1</sub>
<italic>σ</italic>
<sub>2</sub>
<italic>σ</italic>
<sub>
<italic>l</italic>
</sub>
, all
<italic>σ</italic>
<sub>
<italic>j</italic>
</sub>
<italic>Σ</italic>
, if there is a path from the root to a leaf where
<italic>σ</italic>
<sub>
<italic>j</italic>
</sub>
appears in the label of the edge from level
<italic>j</italic>
−1 to level
<italic>j</italic>
.</p>
<p>For each string
<inline-formula id="IEq9">
<alternatives>
<tex-math id="M33">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$S=\mathcal {S}^{(i)}$\end{document}</tex-math>
<mml:math id="M34">
<mml:mi>S</mml:mi>
<mml:mo>=</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi mathvariant="script">S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>i</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msup>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq9.gif"></inline-graphic>
</alternatives>
</inline-formula>
we keep a separate motif trie
<italic>M</italic>
<sup>(
<italic>i</italic>
)</sup>
. Each compact neighbor
<italic>A</italic>
<italic>F</italic>
<sub>
<italic>l,d</italic>
</sub>
(
<italic>S</italic>
<sub>
<italic>j,k</italic>
</sub>
) is inserted into the motif trie recursively as follows. We start with the root node where we insert
<italic>A</italic>
<sub>1</sub>
<italic>A</italic>
<sub>2</sub>
<italic>A</italic>
<sub>
<italic>l</italic>
</sub>
. At a node
<italic>u</italic>
at level
<italic>j</italic>
where the prefix
<italic>A</italic>
<sub>1</sub>
<italic>A</italic>
<sub>2</sub>
<italic>A</italic>
<sub>
<italic>j</italic>
−1</sub>
is already inserted, we insert the suffix
<italic>A</italic>
<sub>
<italic>j</italic>
</sub>
<italic>A</italic>
<sub>
<italic>j</italic>
+1</sub>
<italic>A</italic>
<sub>
<italic>l</italic>
</sub>
as follows. If
<italic>A</italic>
<sub>
<italic>j</italic>
</sub>
<italic>Σ</italic>
we insert
<italic>A</italic>
<sup></sup>
=
<italic>A</italic>
<sub>
<italic>j</italic>
+1</sub>
<italic>A</italic>
<sub>
<italic>j</italic>
+2</sub>
<italic>A</italic>
<sub>
<italic>l</italic>
</sub>
to the children
<italic>v</italic>
of
<italic>u</italic>
such that
<italic>A</italic>
<sub>
<italic>j</italic>
</sub>
<italic>l</italic>
<italic>a</italic>
<italic>b</italic>
<italic>e</italic>
<italic>l</italic>
(
<italic>u,v</italic>
). If
<italic>l</italic>
<italic>a</italic>
<italic>b</italic>
<italic>e</italic>
<italic>l</italic>
(
<italic>u,v</italic>
)≠{
<italic>A</italic>
<sub>
<italic>j</italic>
</sub>
}, before inserting we make a copy of subtrie rooted at
<italic>v</italic>
. Let
<italic>v</italic>
<sup></sup>
be the root of the new copy. We make
<italic>v</italic>
<sup></sup>
a new child of
<italic>u</italic>
, set
<italic>l</italic>
<italic>a</italic>
<italic>b</italic>
<italic>e</italic>
<italic>l</italic>
(
<italic>u,v</italic>
<sup></sup>
)={
<italic>A</italic>
<sub>
<italic>j</italic>
</sub>
}, remove
<italic>A</italic>
<sub>
<italic>j</italic>
</sub>
from
<italic>l</italic>
<italic>a</italic>
<italic>b</italic>
<italic>e</italic>
<italic>l</italic>
(
<italic>u,v</italic>
), and insert
<italic>A</italic>
<sup></sup>
to
<italic>v</italic>
<sup></sup>
. On the other hand if
<italic>A</italic>
<sub>
<italic>j</italic>
</sub>
=∗ we insert
<italic>A</italic>
<sup></sup>
to each children of
<italic>u</italic>
. Let
<italic>T</italic>
=
<italic>Σ</italic>
if
<italic>A</italic>
<sub>
<italic>j</italic>
</sub>
=∗ and
<italic>T</italic>
={
<italic>A</italic>
<sub>
<italic>j</italic>
</sub>
} otherwise. Let
<italic>R</italic>
=
<italic>T</italic>
∖∪
<sub>
<italic>v</italic>
</sub>
<italic>l</italic>
<italic>a</italic>
<italic>b</italic>
<italic>e</italic>
<italic>l</italic>
(
<italic>u,v</italic>
). If
<italic>T</italic>
<italic></italic>
we create a new child
<italic>v</italic>
<sup></sup>
of
<italic>u</italic>
, set
<italic>l</italic>
<italic>a</italic>
<italic>b</italic>
<italic>e</italic>
<italic>l</italic>
(
<italic>u,v</italic>
<sup></sup>
)=
<italic>R</italic>
and recursively insert
<italic>A</italic>
<sup></sup>
to
<italic>v</italic>
<sup></sup>
. Figure
<xref rid="Fig1" ref-type="fig">1</xref>
shows examples of inserting into the motif trie.
<fig id="Fig1">
<label>Fig. 1</label>
<caption>
<p>Inserting into motif trie for
<italic>Σ</italic>
={
<italic>A,C,G,T</italic>
} and
<italic>l</italic>
=3.
<bold>a</bold>
After inserting ∗
<italic>G</italic>
<italic>T</italic>
into empty trie.
<bold>b</bold>
After inserting another string
<italic>A</italic>
<italic>C</italic>
</p>
</caption>
<graphic xlink:href="12864_2016_2789_Fig1_HTML" id="MO1"></graphic>
</fig>
</p>
<p>We also maintain a motif trie
<inline-formula id="IEq10">
<alternatives>
<tex-math id="M35">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\mathcal {M}$\end{document}</tex-math>
<mml:math id="M36">
<mml:mi mathvariant="script"></mml:mi>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq10.gif"></inline-graphic>
</alternatives>
</inline-formula>
for the common compact motifs found so far, starting with
<inline-formula id="IEq11">
<alternatives>
<tex-math id="M37">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\mathcal {M} = M^{(1)}$\end{document}</tex-math>
<mml:math id="M38">
<mml:mi mathvariant="script"></mml:mi>
<mml:mo>=</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msup>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq11.gif"></inline-graphic>
</alternatives>
</inline-formula>
. After processing string
<italic>S</italic>
<sup>(
<italic>i</italic>
)</sup>
we intersect the root of
<italic>M</italic>
<sup>(
<italic>i</italic>
)</sup>
with the root of
<inline-formula id="IEq12">
<alternatives>
<tex-math id="M39">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\mathcal {M}$\end{document}</tex-math>
<mml:math id="M40">
<mml:mi mathvariant="script"></mml:mi>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq12.gif"></inline-graphic>
</alternatives>
</inline-formula>
. In general a node
<italic>u</italic>
<sub>2</sub>
<italic>M</italic>
<sup>(
<italic>i</italic>
)</sup>
at level
<italic>j</italic>
is intersected with a node
<inline-formula id="IEq13">
<alternatives>
<tex-math id="M41">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$u_{1} \in \mathcal {M}$\end{document}</tex-math>
<mml:math id="M42">
<mml:msub>
<mml:mrow>
<mml:mi>u</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo></mml:mo>
<mml:mi mathvariant="script"></mml:mi>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq13.gif"></inline-graphic>
</alternatives>
</inline-formula>
at level
<italic>j</italic>
using the procedure shown in Algorithm 1. Figure
<xref rid="Fig2" ref-type="fig">2</xref>
shows an example of the intersection of two motif tries.
<fig id="Fig2">
<label>Fig. 2</label>
<caption>
<p>Intersection of motif tries.
<bold>a</bold>
Trie for
<italic>A</italic>
<italic>G</italic>
∗∪
<italic>C</italic>
<italic>T</italic>
.
<bold>b</bold>
Intersection of trie in Fig.
<xref rid="Fig1" ref-type="fig">1</xref>
<xref rid="Fig1" ref-type="fig">b</xref>
and trie in Fig. 2
<bold>a</bold>
</p>
</caption>
<graphic xlink:href="12864_2016_2789_Fig2_HTML" id="MO2"></graphic>
</fig>
</p>
<p>
<graphic xlink:href="12864_2016_2789_Figa_HTML.gif" id="MO3"></graphic>
</p>
<p>The final set of common motifs is obtained by a depth-first traversal of
<inline-formula id="IEq14">
<alternatives>
<tex-math id="M43">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\mathcal {M}$\end{document}</tex-math>
<mml:math id="M44">
<mml:mi mathvariant="script"></mml:mi>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq14.gif"></inline-graphic>
</alternatives>
</inline-formula>
outputting the label of the path from the root whenever a leaf is traversed. An edge (
<italic>u,v</italic>
) is traversed separately for each
<italic>σ</italic>
<italic>l</italic>
<italic>a</italic>
<italic>b</italic>
<italic>e</italic>
<italic>l</italic>
(
<italic>u,v</italic>
).</p>
</sec>
<sec id="Sec7">
<title>Efficient compact neighborhood generation</title>
<p>A significant part of the time taken by our algorithm is in inserting compact neighbors into the motif trie as it is executed for each neighbor in the friendhood. Our efficient neighborhood generation technique and the use of compact neighbors reduce duplication in neighborhood but do not guarantee completely duplication free neighborhood. In this section, we design few simple rules to reduce duplication further. Later we will see that these rules are quite close to the ideal as we will prove that the compact motif generated after skipping using the rules, are distinct if all the characters in the input string are distinct.</p>
<p>To differentiate multiple copies of the same compact neighbor, we augment it with the information about how it is generated. This information is required only in the proof and is not used in the actual algorithm. Formally, each compact neighbor
<italic>L</italic>
of a
<italic>k</italic>
-mer
<italic>S</italic>
<sub>
<italic>j,k</italic>
</sub>
is represented as an ordered tuple 〈
<italic>S</italic>
<sub>
<italic>j,k</italic>
</sub>
,
<italic>T</italic>
〉 where
<italic>T</italic>
denotes the sequence of edit operations applied to
<italic>S</italic>
<sub>
<italic>j,k</italic>
</sub>
. Each edit operation in
<italic>T</italic>
is represented as a tuple 〈
<italic>p,o</italic>
〉 where
<italic>p</italic>
denotes the position (as in
<italic>S</italic>
) where the edit operation is applied and
<italic>o</italic>
∈{
<italic>D,R,I</italic>
} denotes the type of the operation – deletion, substitution and insertion, respectively. At each position there can be one deletion or one substitution but one or more insertions. The tuples in
<italic>T</italic>
are sorted lexicographically with the natural order for
<italic>p</italic>
and for
<italic>o</italic>
,
<italic>D</italic>
<
<italic>R</italic>
<
<italic>I</italic>
.</p>
<p>The rules for skipping compact neighbors are given in Table
<xref rid="Tab2" ref-type="table">2</xref>
. Rule 1 applies when
<italic>S</italic>
<sub>
<italic>j,k</italic>
</sub>
is not the rightmost
<italic>k</italic>
-mer and the current edit operation deletes the leftmost base of
<italic>S</italic>
<sub>
<italic>j,k</italic>
</sub>
,
<italic>i.e.</italic>
,
<italic>S</italic>
<sub>
<italic>j</italic>
</sub>
. Rule 2 applies when the current edit operation substitutes a base just after a base that was already deleted. Rule 3 skips the neighbor which is generated from a
<italic>k</italic>
-mer except the rightmost by deleting a base and substituting all bases before it. Rules 4–9 apply when the current operation is an insertion. Rule 4,6 apply when the insertion is just before a deletion and a substitution, respectively. Rule 5 applies when the insertion is just after a deletion. Rule 7,8 apply when the
<italic>k</italic>
-mer is not the leftmost. Rule 7 applies when the insertion is at the leftmost position and Rule 8 applies when all bases before the position of insertion are already substituted. Rule 9 applies when the
<italic>k</italic>
-mer is not the rightmost and the insertion is at the right end. The first in each pair of the figures in Fig.
<xref rid="Fig3" ref-type="fig">3</xref>
illustrates the situation where the corresponding rule applies.
<fig id="Fig3">
<label>Fig. 3</label>
<caption>
<p>Construction of
<italic>L</italic>
<sup></sup>
under different rules in the proof of Lemma
<xref rid="Sec7" ref-type="sec">2</xref>
. Insertions are shown using arrows, deletions using − and substitutions using ∗. Rule 5 case (i) is similar to Rule 4 case (i)</p>
</caption>
<graphic xlink:href="12864_2016_2789_Fig3_HTML" id="MO4"></graphic>
</fig>
<table-wrap id="Tab2">
<label>Table 2</label>
<caption>
<p>Conditions for skipping motif
<italic>L</italic>
=〈
<italic>M,S</italic>
<sub>
<italic>j,k</italic>
</sub>
,
<italic>T</italic>
</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Rule</th>
<th align="left">Conditions (in all rules
<italic>t</italic>
≥0)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">1</td>
<td align="left">(
<italic>j</italic>
+
<italic>k</italic>
<italic>m</italic>
)∧〈
<italic>j,D</italic>
〉∈
<italic>T</italic>
</td>
</tr>
<tr>
<td align="left">2</td>
<td align="left">{〈
<italic>j</italic>
+
<italic>t,D</italic>
〉,〈
<italic>j</italic>
+
<italic>t</italic>
+1,
<italic>R</italic>
〉}⊆
<italic>T</italic>
</td>
</tr>
<tr>
<td align="left">3</td>
<td align="left">(
<italic>j</italic>
+
<italic>k</italic>
<italic>m</italic>
)∧{〈
<italic>j,R</italic>
〉,〈
<italic>j</italic>
+1,
<italic>R</italic>
〉,…,〈
<italic>j</italic>
+
<italic>t,R</italic>
〉,〈
<italic>j</italic>
+
<italic>t</italic>
+1,
<italic>D</italic>
〉}⊆
<italic>T</italic>
</td>
</tr>
<tr>
<td align="left">4</td>
<td align="left">{〈
<italic>j</italic>
+
<italic>t,D</italic>
〉,〈
<italic>j</italic>
+
<italic>t,I</italic>
〉}⊆
<italic>T</italic>
</td>
</tr>
<tr>
<td align="left">5</td>
<td align="left">{〈
<italic>j</italic>
+
<italic>t,D</italic>
〉,〈
<italic>j</italic>
+
<italic>t</italic>
+1,
<italic>I</italic>
〉}⊆
<italic>T</italic>
</td>
</tr>
<tr>
<td align="left">6</td>
<td align="left">{〈
<italic>j</italic>
+
<italic>t,R</italic>
〉,〈
<italic>j</italic>
+
<italic>t,I</italic>
〉}⊆
<italic>T</italic>
</td>
</tr>
<tr>
<td align="left">7</td>
<td align="left">(
<italic>j</italic>
>1)∧〈
<italic>j,I</italic>
〉∈
<italic>T</italic>
</td>
</tr>
<tr>
<td align="left">8</td>
<td align="left">(
<italic>j</italic>
>1)∧{〈
<italic>j,R</italic>
〉,〈
<italic>j</italic>
+1,
<italic>R</italic>
〉,…,〈
<italic>j</italic>
+
<italic>t,R</italic>
〉,〈
<italic>j</italic>
+
<italic>t</italic>
+1,
<italic>I</italic>
〉}⊆
<italic>T</italic>
</td>
</tr>
<tr>
<td align="left">9</td>
<td align="left">(
<italic>j</italic>
+
<italic>k</italic>
<italic>m</italic>
)∧〈
<italic>j</italic>
+
<italic>k,I</italic>
〉∈
<italic>T</italic>
</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
<p>Let
<inline-formula id="IEq15">
<alternatives>
<tex-math id="M45">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\bar {M}_{l,\,d}(S)$\end{document}</tex-math>
<mml:math id="M46">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
<mml:mo>¯</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>,</mml:mo>
<mml:mspace width="0.3em"></mml:mspace>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo>)</mml:mo>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq15.gif"></inline-graphic>
</alternatives>
</inline-formula>
denote the multi-set of tuples for the compact motifs of
<italic>S</italic>
that were not skipped by our algorithm using the rules in Table
<xref rid="Tab2" ref-type="table">2</xref>
and
<italic>M</italic>
<sub>
<italic>l</italic>
,
<italic>d</italic>
</sub>
(
<italic>S</italic>
) be the set of compact motifs generated by (
<xref rid="Equ3" ref-type="">3</xref>
). Let
<italic>Γ</italic>
(〈
<italic>S</italic>
<sub>
<italic>j</italic>
,
<italic>k</italic>
</sub>
,
<italic>T</italic>
〉) be the resulting string when the operations in
<italic>T</italic>
are applied to
<italic>S</italic>
<sub>
<italic>j</italic>
,
<italic>k</italic>
</sub>
and
<italic>Γ</italic>
(
<italic>Z</italic>
)=∪
<sub>
<italic>L</italic>
<italic>Z</italic>
</sub>
<italic>Γ</italic>
(
<italic>L</italic>
).</p>
<sec id="d30e4828">
<title>
<bold>Lemma</bold>
<bold>2</bold>
.</title>
<p>
<inline-formula id="IEq16">
<alternatives>
<tex-math id="M47">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\Gamma (\bar {M}_{l,d}(S)) = M_{l,d}(S)$\end{document}</tex-math>
<mml:math id="M48">
<mml:mi>Γ</mml:mi>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
<mml:mo>¯</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo>)</mml:mo>
<mml:mo>)</mml:mo>
<mml:mo>=</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo>)</mml:mo>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq16.gif"></inline-graphic>
</alternatives>
</inline-formula>
.</p>
</sec>
<sec id="d30e4891">
<title>
<italic>Proof</italic>
.</title>
<p>By construction,
<inline-formula id="IEq17">
<alternatives>
<tex-math id="M49">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\Gamma (\bar {M}_{l,d}(S)) \subseteq M_{l,d}(S)$\end{document}</tex-math>
<mml:math id="M50">
<mml:mi>Γ</mml:mi>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
<mml:mo>¯</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo>)</mml:mo>
<mml:mo>)</mml:mo>
<mml:mo></mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo>)</mml:mo>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq17.gif"></inline-graphic>
</alternatives>
</inline-formula>
. We show
<inline-formula id="IEq18">
<alternatives>
<tex-math id="M51">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$M_{l,d}(S) \subseteq \Gamma (\bar {M}_{l,d}(S))$\end{document}</tex-math>
<mml:math id="M52">
<mml:msub>
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo>)</mml:mo>
<mml:mo></mml:mo>
<mml:mi>Γ</mml:mi>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
<mml:mo>¯</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo>)</mml:mo>
<mml:mo>)</mml:mo>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq18.gif"></inline-graphic>
</alternatives>
</inline-formula>
by giving a contradiction when
<inline-formula id="IEq19">
<alternatives>
<tex-math id="M53">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$M_{l,d}(S) \setminus \Gamma (\bar {M}_{l,d}(S)) \ne \emptyset $\end{document}</tex-math>
<mml:math id="M54">
<mml:msub>
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo>)</mml:mo>
<mml:mo></mml:mo>
<mml:mi>Γ</mml:mi>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
<mml:mo>¯</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo>)</mml:mo>
<mml:mo>)</mml:mo>
<mml:mo></mml:mo>
<mml:mi></mml:mi>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq19.gif"></inline-graphic>
</alternatives>
</inline-formula>
.</p>
<p>We define an order on the compact neighbors
<inline-formula id="IEq20">
<alternatives>
<tex-math id="M55">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\phantom {\dot {i}\!}L_{1} = \langle {S_{j_{1},k_{1}}, T_{1}}\rangle $\end{document}</tex-math>
<mml:math id="M56">
<mml:msub>
<mml:mrow>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mo></mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo></mml:mo>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq20.gif"></inline-graphic>
</alternatives>
</inline-formula>
and
<inline-formula id="IEq21">
<alternatives>
<tex-math id="M57">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\phantom {\dot {i}\!}L_{2} = \langle {S_{j_{2},k_{2}}, T_{2}}\rangle $\end{document}</tex-math>
<mml:math id="M58">
<mml:msub>
<mml:mrow>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mo></mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo></mml:mo>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq21.gif"></inline-graphic>
</alternatives>
</inline-formula>
as follows:
<italic>L</italic>
<sub>1</sub>
<
<italic>L</italic>
<sub>2</sub>
if
<italic>Γ</italic>
(
<italic>L</italic>
<sub>1</sub>
)<
<italic>Γ</italic>
(
<italic>L</italic>
<sub>2</sub>
) and
<italic>L</italic>
<sub>2</sub>
<
<italic>L</italic>
<sub>1</sub>
if
<italic>Γ</italic>
(
<italic>L</italic>
<sub>2</sub>
)<
<italic>Γ</italic>
(
<italic>L</italic>
<sub>1</sub>
). When
<italic>Γ</italic>
(
<italic>L</italic>
<sub>1</sub>
)=
<italic>Γ</italic>
(
<italic>L</italic>
<sub>2</sub>
) we have
<italic>L</italic>
<sub>1</sub>
<
<italic>L</italic>
<sub>2</sub>
if and only if (
<italic>k</italic>
<sub>1</sub>
<
<italic>k</italic>
<sub>2</sub>
)∨((
<italic>k</italic>
<sub>1</sub>
=
<italic>k</italic>
<sub>2</sub>
)∧(
<italic>p</italic>
<sub>1</sub>
<
<italic>p</italic>
<sub>2</sub>
))∨((
<italic>k</italic>
<sub>1</sub>
=
<italic>k</italic>
<sub>2</sub>
)∧(
<italic>p</italic>
<sub>1</sub>
=
<italic>p</italic>
<sub>2</sub>
)∧(
<italic>o</italic>
<sub>1</sub>
<
<italic>o</italic>
<sub>2</sub>
)) where 〈
<italic>p</italic>
<sub>1</sub>
,
<italic>o</italic>
<sub>1</sub>
〉∈
<italic>T</italic>
<sub>1</sub>
,〈
<italic>p</italic>
<sub>2</sub>
,
<italic>o</italic>
<sub>2</sub>
〉∈
<italic>T</italic>
<sub>2</sub>
are the leftmost edit operations where
<italic>T</italic>
<sub>1</sub>
,
<italic>T</italic>
<sub>2</sub>
differ.</p>
<p>Suppose
<inline-formula id="IEq22">
<alternatives>
<tex-math id="M59">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$M \in M_{l,d}(S) \setminus \Gamma (\bar {M}_{l,d}(S))$\end{document}</tex-math>
<mml:math id="M60">
<mml:mi>M</mml:mi>
<mml:mo></mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo>)</mml:mo>
<mml:mo></mml:mo>
<mml:mi>Γ</mml:mi>
<mml:mo>(</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
<mml:mo>¯</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo>)</mml:mo>
<mml:mo>)</mml:mo>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq22.gif"></inline-graphic>
</alternatives>
</inline-formula>
. Let
<italic>L</italic>
=〈
<italic>S</italic>
<sub>
<italic>j</italic>
,
<italic>k</italic>
</sub>
,
<italic>T</italic>
〉 be the largest (in the order defined above) tuple skipped by our algorithm such that
<italic>Γ</italic>
(
<italic>L</italic>
)=
<italic>M</italic>
. For each
<italic>r</italic>
=1,…,9 we show a contradiction that if
<italic>L</italic>
is skipped by Rule
<italic>r</italic>
then there is another
<inline-formula id="IEq23">
<alternatives>
<tex-math id="M61">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\phantom {\dot {i}\!}L'=\langle {S_{j',\,k'},T'}\rangle $\end{document}</tex-math>
<mml:math id="M62">
<mml:msup>
<mml:mrow>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo></mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo>=</mml:mo>
<mml:mo></mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msup>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo></mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo>,</mml:mo>
<mml:mspace width="0.3em"></mml:mspace>
<mml:msup>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo></mml:mo>
</mml:mrow>
</mml:msup>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msup>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo></mml:mo>
</mml:mrow>
</mml:msup>
<mml:mo></mml:mo>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq23.gif"></inline-graphic>
</alternatives>
</inline-formula>
with the same number of edit operations and
<italic>Γ</italic>
(
<italic>L</italic>
<sup></sup>
)=
<italic>M</italic>
but
<italic>L</italic>
<
<italic>L</italic>
<sup></sup>
. Figure
<xref rid="Fig3" ref-type="fig">3</xref>
illustrates the choice of
<italic>L</italic>
<sup></sup>
under different rules.</p>
<p>Rule 1. Here
<italic>j</italic>
+
<italic>k</italic>
<italic>m</italic>
and 〈
<italic>j,D</italic>
〉∈
<italic>T</italic>
. Consider
<italic>T</italic>
<sup></sup>
=(
<italic>T</italic>
∖{〈
<italic>j,D</italic>
〉})∪{
<italic>j</italic>
+
<italic>k,D</italic>
}, and
<italic>j</italic>
<sup></sup>
=
<italic>j</italic>
+1,
<italic>k</italic>
<sup></sup>
=
<italic>k</italic>
.</p>
<p>Rule 2. Consider
<italic>T</italic>
<sup></sup>
=
<italic>T</italic>
∖{〈
<italic>j</italic>
+
<italic>t,D</italic>
〉,〈
<italic>j</italic>
+
<italic>t</italic>
+1,
<italic>R</italic>
〉}∪{〈
<italic>j</italic>
+
<italic>t,R</italic>
〉,〈
<italic>j</italic>
+
<italic>t</italic>
+1,
<italic>D</italic>
〉}, and
<italic>j</italic>
<sup></sup>
=
<italic>j,k</italic>
<sup></sup>
=
<italic>k</italic>
.</p>
<p>Rule 3.
<italic>T</italic>
<sup></sup>
=
<italic>T</italic>
∖{〈
<italic>j,R</italic>
〉,〈
<italic>j</italic>
+
<italic>t</italic>
+1,
<italic>D</italic>
〉}∪{〈
<italic>j</italic>
+
<italic>t</italic>
+1,
<italic>R</italic>
〉,〈
<italic>j</italic>
+
<italic>k,D</italic>
〉},
<italic>j</italic>
<sup></sup>
=
<italic>j</italic>
+1,
<italic>k</italic>
<sup></sup>
=
<italic>k</italic>
.</p>
<p>Rule 4. For this and subsequent rules
<italic>k</italic>
<
<italic>l</italic>
+
<italic>d</italic>
as there is atleast one insertion and hence
<italic>k</italic>
<sup></sup>
could possibly be equal to
<italic>k</italic>
+1. We consider two cases. Case (i)
<italic>j</italic>
+
<italic>k</italic>
<italic>m</italic>
:
<italic>T</italic>
<sup></sup>
=
<italic>T</italic>
∖{〈
<italic>j</italic>
+
<italic>t,D</italic>
〉,〈
<italic>j</italic>
+
<italic>t,I</italic>
〉}∪{〈
<italic>j</italic>
+
<italic>t,R</italic>
〉,〈
<italic>j</italic>
+
<italic>k,D</italic>
〉},
<italic>j</italic>
<sup></sup>
=
<italic>j,k</italic>
<sup></sup>
=
<italic>k</italic>
+1. Case (ii)
<italic>j</italic>
+
<italic>k</italic>
=
<italic>m</italic>
+1: Here deletion of
<italic>S</italic>
<sub>
<italic>j</italic>
</sub>
is allowed by Rule 1.
<italic>T</italic>
<sup></sup>
=
<italic>T</italic>
∖{〈
<italic>j</italic>
+
<italic>t,D</italic>
〉,〈
<italic>j</italic>
+
<italic>t,I</italic>
〉}∪{〈
<italic>j</italic>
−1,
<italic>D</italic>
〉,〈
<italic>j</italic>
+
<italic>t,R</italic>
〉},
<italic>j</italic>
<sup></sup>
=
<italic>j</italic>
−1,
<italic>k</italic>
<sup></sup>
=
<italic>k</italic>
+1.</p>
<p>Rule 5. The same argument for Rule 4 applies considering 〈
<italic>j</italic>
+
<italic>t</italic>
+1,
<italic>I</italic>
〉 instead of 〈
<italic>j</italic>
+
<italic>t,I</italic>
〉.</p>
<p>Rule 6.
<italic>T</italic>
<sup></sup>
=
<italic>T</italic>
∖{〈
<italic>j</italic>
+
<italic>t,I</italic>
〉}∪{〈
<italic>j</italic>
+
<italic>t</italic>
+1,
<italic>I</italic>
〉}, and
<italic>j</italic>
<sup></sup>
=
<italic>j,k</italic>
<sup></sup>
=
<italic>k</italic>
.</p>
<p>Rule 7.
<italic>T</italic>
<sup></sup>
=
<italic>T</italic>
∖{〈
<italic>j,I</italic>
〉}∪{〈
<italic>j</italic>
−1,
<italic>R</italic>
〉},
<italic>j</italic>
<sup></sup>
=
<italic>j</italic>
−1,
<italic>k</italic>
<sup></sup>
=
<italic>k</italic>
+1.</p>
<p>Rule 8.
<italic>T</italic>
<sup></sup>
=
<italic>T</italic>
∖{〈
<italic>j</italic>
+
<italic>t,I</italic>
〉}∪{〈
<italic>j</italic>
−1,
<italic>R</italic>
〉},
<italic>j</italic>
<sup></sup>
=
<italic>j</italic>
−1,
<italic>k</italic>
<sup></sup>
=
<italic>k</italic>
+1.</p>
<p>Rule 9.
<italic>T</italic>
<sup></sup>
=
<italic>T</italic>
∖{〈
<italic>j</italic>
+
<italic>k,I</italic>
〉}∪{〈
<italic>j</italic>
+
<italic>k,R</italic>
〉},
<italic>j</italic>
<sup></sup>
=
<italic>j,k</italic>
<sup></sup>
=
<italic>k</italic>
+1.</p>
<p>Consider two compact motifs
<inline-formula id="IEq24">
<alternatives>
<tex-math id="M63">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\phantom {\dot {i}\!}M_{1} = \langle {S_{j_{1},k_{1}}, T_{1}}\rangle $\end{document}</tex-math>
<mml:math id="M64">
<mml:msub>
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mo></mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo></mml:mo>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq24.gif"></inline-graphic>
</alternatives>
</inline-formula>
and
<inline-formula id="IEq25">
<alternatives>
<tex-math id="M65">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\phantom {\dot {i}\!}M_{2} = \langle {S_{j_{2},k_{2}}, T_{2}}\rangle $\end{document}</tex-math>
<mml:math id="M66">
<mml:msub>
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>=</mml:mo>
<mml:mo></mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>T</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo></mml:mo>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq25.gif"></inline-graphic>
</alternatives>
</inline-formula>
in
<inline-formula id="IEq26">
<alternatives>
<tex-math id="M67">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\bar {M}_{l,d}(S)$\end{document}</tex-math>
<mml:math id="M68">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
<mml:mo>¯</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo>)</mml:mo>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq26.gif"></inline-graphic>
</alternatives>
</inline-formula>
. For
<italic>q</italic>
∈{1,2}, let
<inline-formula id="IEq27">
<alternatives>
<tex-math id="M69">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\left \langle {p_{q}^{(1)}, o_{q}^{(1)}}\right \rangle, \left \langle {p_{q}^{(2)}, o_{q}^{(2)}}\right \rangle, \ldots, \left \langle {p_{q}^{(d)}, o_{q}^{(d)}}\right \rangle $\end{document}</tex-math>
<mml:math id="M70">
<mml:mfenced close="〉" open="〈" separators="">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>o</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:mfenced close="〉" open="〈" separators="">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>o</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:mo></mml:mo>
<mml:mo>,</mml:mo>
<mml:mfenced close="〉" open="〈" separators="">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>d</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>o</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>d</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq27.gif"></inline-graphic>
</alternatives>
</inline-formula>
be the sequence of edit operations in
<italic>T</italic>
<sub>
<italic>q</italic>
</sub>
arranged in the order as the neighbors are generated by our algorithm, and the intermediate neighbors be
<inline-formula id="IEq28">
<alternatives>
<tex-math id="M71">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$L_{q}^{(h)} = \left \langle S_{j_{q},k_{q}}, \left \{\left \langle {p_{q}^{(1)}, o_{q}^{(1)}}\right \rangle,\right.\right.\left.\left. \left \langle {p_{q}^{(2)}, o_{q}^{(2)}}\right \rangle, \ldots, \left \langle {p_{q}^{(h)}, o_{q}^{(h)}}\right \rangle \right \} \right \rangle $\end{document}</tex-math>
<mml:math id="M72">
<mml:msubsup>
<mml:mrow>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>h</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>=</mml:mo>
<mml:mfenced close="" open="〈" separators="">
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mfenced close="" open="{" separators="">
<mml:mrow>
<mml:mfenced close="〉" open="〈" separators="">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>o</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
<mml:mfenced close="〉" open="" separators="">
<mml:mrow>
<mml:mfenced close="}" open="" separators="">
<mml:mrow>
<mml:mfenced close="〉" open="〈" separators="">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>o</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>2</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
<mml:mo>,</mml:mo>
<mml:mo></mml:mo>
<mml:mo>,</mml:mo>
<mml:mfenced close="〉" open="〈" separators="">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>h</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>o</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>h</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
</mml:mrow>
</mml:mfenced>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq28.gif"></inline-graphic>
</alternatives>
</inline-formula>
for all
<italic>h</italic>
=1,2,…,
<italic>d</italic>
. We also denote the initial
<italic>k</italic>
-mer as a neighbor
<inline-formula id="IEq29">
<alternatives>
<tex-math id="M73">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$L_{q}^{(0)} = \langle {S_{j_{q},k_{q}}, \emptyset }\rangle $\end{document}</tex-math>
<mml:math id="M74">
<mml:msubsup>
<mml:mrow>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mn>0</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>=</mml:mo>
<mml:mo></mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mi></mml:mi>
<mml:mo></mml:mo>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq29.gif"></inline-graphic>
</alternatives>
</inline-formula>
.</p>
</sec>
<sec id="d30e6484">
<title>
<bold>Lemma</bold>
<bold>3</bold>
.</title>
<p>If
<italic>S</italic>
<sub>
<italic>j</italic>
</sub>
s are all distinct and
<inline-formula id="IEq30">
<alternatives>
<tex-math id="M75">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\Gamma \left (L_{1}^{(h)}\right) = \Gamma \left (L_{2}^{(h)}\right)$\end{document}</tex-math>
<mml:math id="M76">
<mml:mi>Γ</mml:mi>
<mml:mfenced close=")" open="(" separators="">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>h</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
<mml:mo>=</mml:mo>
<mml:mi>Γ</mml:mi>
<mml:mfenced close=")" open="(" separators="">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>h</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq30.gif"></inline-graphic>
</alternatives>
</inline-formula>
for 1≤
<italic>h</italic>
<italic>d</italic>
then
<inline-formula id="IEq31">
<alternatives>
<tex-math id="M77">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\left \langle {p_{1}^{(h)}, o_{1}^{(h)}}\right \rangle = \left \langle {p_{2}^{(h)}, o_{2}^{(h)}}\right \rangle $\end{document}</tex-math>
<mml:math id="M78">
<mml:mfenced close="〉" open="〈" separators="">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>h</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>o</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>h</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
<mml:mo>=</mml:mo>
<mml:mfenced close="〉" open="〈" separators="">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>h</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>o</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>h</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq31.gif"></inline-graphic>
</alternatives>
</inline-formula>
and
<inline-formula id="IEq32">
<alternatives>
<tex-math id="M79">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\Gamma \left (L_{1}^{(h-1)}\right) = \Gamma \left (L_{2}^{(h-1)}\right)$\end{document}</tex-math>
<mml:math id="M80">
<mml:mi>Γ</mml:mi>
<mml:mfenced close=")" open="(" separators="">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>h</mml:mi>
<mml:mo></mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
<mml:mo>=</mml:mo>
<mml:mi>Γ</mml:mi>
<mml:mfenced close=")" open="(" separators="">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>h</mml:mi>
<mml:mo></mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq32.gif"></inline-graphic>
</alternatives>
</inline-formula>
.</p>
</sec>
<sec id="d30e6680">
<title>
<italic>Proof</italic>
.</title>
<p>To simplify the proof, we use
<italic>p</italic>
<sub>
<italic>q</italic>
</sub>
,
<italic>o</italic>
<sub>
<italic>q</italic>
</sub>
,
<italic>L</italic>
<sub>
<italic>q</italic>
</sub>
to denote
<inline-formula id="IEq33">
<alternatives>
<tex-math id="M81">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$p_{q}^{(h)}, o_{q}^{(h)}, L_{q}^{(h)}$\end{document}</tex-math>
<mml:math id="M82">
<mml:msubsup>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>h</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>o</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>h</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
<mml:mo>,</mml:mo>
<mml:msubsup>
<mml:mrow>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>q</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>h</mml:mi>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq33.gif"></inline-graphic>
</alternatives>
</inline-formula>
, respectively, for all
<italic>q</italic>
∈{1,2}. Without loss of generality we assume
<italic>p</italic>
<sub>1</sub>
<italic>p</italic>
<sub>2</sub>
.</p>
<p>As
<italic>p</italic>
<sub>1</sub>
,
<italic>p</italic>
<sub>2</sub>
are positions in
<italic>S</italic>
, it would be enough to prove 〈
<italic>p</italic>
<sub>1</sub>
,
<italic>o</italic>
<sub>1</sub>
〉=〈
<italic>p</italic>
<sub>2</sub>
,
<italic>o</italic>
<sub>2</sub>
〉 because that would imply
<inline-formula id="IEq34">
<alternatives>
<tex-math id="M83">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\Gamma \left (L_{1}^{(h-1)}\right) = \Gamma \left (L_{2}^{(h-1)}\right)$\end{document}</tex-math>
<mml:math id="M84">
<mml:mi>Γ</mml:mi>
<mml:mfenced close=")" open="(" separators="">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>h</mml:mi>
<mml:mo></mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
<mml:mo>=</mml:mo>
<mml:mi>Γ</mml:mi>
<mml:mfenced close=")" open="(" separators="">
<mml:mrow>
<mml:msubsup>
<mml:mrow>
<mml:mi>L</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mi>h</mml:mi>
<mml:mo></mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:msubsup>
</mml:mrow>
</mml:mfenced>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq34.gif"></inline-graphic>
</alternatives>
</inline-formula>
.</p>
<p>If 〈
<italic>p</italic>
<sub>1</sub>
,
<italic>o</italic>
<sub>1</sub>
〉≠〈
<italic>p</italic>
<sub>2</sub>
,
<italic>o</italic>
<sub>2</sub>
〉 then either (a)
<italic>o</italic>
<sub>1</sub>
=
<italic>o</italic>
<sub>2</sub>
and
<italic>p</italic>
<sub>1</sub>
<
<italic>p</italic>
<sub>2</sub>
or (b)
<italic>o</italic>
<sub>1</sub>
<italic>o</italic>
<sub>2</sub>
and
<italic>p</italic>
<sub>1</sub>
<italic>p</italic>
<sub>2</sub>
, giving us the following 9 possible cases. We complete the proof by giving a contradiction in each of these 9 cases:
<table-wrap id="Taba">
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">
<bold>Case</bold>
</th>
<th align="left">
<bold>
<italic>o</italic>
</bold>
<sub>
<bold>1</bold>
</sub>
</th>
<th align="left">
<bold>
<italic>o</italic>
</bold>
<sub>
<bold>2</bold>
</sub>
</th>
<th align="left">
<bold>cond.</bold>
</th>
<th align="left">
<bold>Case</bold>
</th>
<th align="left">
<bold>
<italic>o</italic>
</bold>
<sub>
<bold>1</bold>
</sub>
</th>
<th align="left">
<bold>
<italic>o</italic>
</bold>
<sub>
<bold>2</bold>
</sub>
</th>
<th align="left">
<bold>cond.</bold>
</th>
<th align="left">
<bold>Case</bold>
</th>
<th align="left">
<bold>
<italic>o</italic>
</bold>
<sub>
<bold>1</bold>
</sub>
</th>
<th align="left">
<bold>
<italic>o</italic>
</bold>
<sub>
<bold>2</bold>
</sub>
</th>
<th align="left">
<bold>cond.</bold>
</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center">1</td>
<td align="center">
<italic>D</italic>
</td>
<td align="center">
<italic>D</italic>
</td>
<td align="center">
<italic>p</italic>
<sub>1</sub>
<
<italic>p</italic>
<sub>2</sub>
</td>
<td align="center">4</td>
<td align="center">
<italic>R</italic>
</td>
<td align="center">
<italic>D</italic>
</td>
<td align="center">
<italic>p</italic>
<sub>1</sub>
<italic>p</italic>
<sub>2</sub>
</td>
<td align="center">7</td>
<td align="center">
<italic>I</italic>
</td>
<td align="center">
<italic>D</italic>
</td>
<td align="center">
<italic>p</italic>
<sub>1</sub>
<italic>p</italic>
<sub>2</sub>
</td>
</tr>
<tr>
<td align="center">2</td>
<td align="center">
<italic>D</italic>
</td>
<td align="center">
<italic>R</italic>
</td>
<td align="center">
<italic>p</italic>
<sub>1</sub>
<italic>p</italic>
<sub>2</sub>
</td>
<td align="center">5</td>
<td align="center">
<italic>R</italic>
</td>
<td align="center">
<italic>R</italic>
</td>
<td align="center">
<italic>p</italic>
<sub>1</sub>
<
<italic>p</italic>
<sub>2</sub>
</td>
<td align="center">8</td>
<td align="center">
<italic>I</italic>
</td>
<td align="center">
<italic>R</italic>
</td>
<td align="center">
<italic>p</italic>
<sub>1</sub>
<italic>p</italic>
<sub>2</sub>
</td>
</tr>
<tr>
<td align="center">3</td>
<td align="center">
<italic>D</italic>
</td>
<td align="center">
<italic>I</italic>
</td>
<td align="center">
<italic>p</italic>
<sub>1</sub>
<italic>p</italic>
<sub>2</sub>
</td>
<td align="center">6</td>
<td align="center">
<italic>R</italic>
</td>
<td align="center">
<italic>I</italic>
</td>
<td align="center">
<italic>p</italic>
<sub>1</sub>
<italic>p</italic>
<sub>2</sub>
</td>
<td align="center">9</td>
<td align="center">
<italic>I</italic>
</td>
<td align="center">
<italic>I</italic>
</td>
<td align="center">
<italic>p</italic>
<sub>1</sub>
<
<italic>p</italic>
<sub>2</sub>
</td>
</tr>
</tbody>
</table>
</table-wrap>
</p>
<p>
<bold>
<italic>Cases 2, 3, 4, 7</italic>
</bold>
</p>
<p>Our algorithm applies edit operations in phases: first deletions, followed by substitutions and finally insertions. In all these cases, one of
<italic>Γ</italic>
(
<italic>L</italic>
<sub>1</sub>
),
<italic>Γ</italic>
(
<italic>L</italic>
<sub>2</sub>
) does not have any ∗ because only deletions have been applied so far and the other has at least one ∗ because a substitution or an insertion has been applied. This implies
<italic>Γ</italic>
(
<italic>L</italic>
<sub>1</sub>
)≠
<italic>Γ</italic>
(
<italic>L</italic>
<sub>2</sub>
), a contradiction.</p>
<p>
<bold>
<italic>Case 1</italic>
</bold>
</p>
<p>
<italic>L</italic>
<sub>2</sub>
has
<inline-formula id="IEq35">
<alternatives>
<tex-math id="M85">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$S_{p_{2}}$\end{document}</tex-math>
<mml:math id="M86">
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq35.gif"></inline-graphic>
</alternatives>
</inline-formula>
deleted. As
<italic>Γ</italic>
(
<italic>L</italic>
<sub>1</sub>
)=
<italic>Γ</italic>
(
<italic>L</italic>
<sub>2</sub>
),
<inline-formula id="IEq36">
<alternatives>
<tex-math id="M87">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$S_{p_{2}}$\end{document}</tex-math>
<mml:math id="M88">
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq36.gif"></inline-graphic>
</alternatives>
</inline-formula>
must have been deleted in some operation prior to reaching
<italic>L</italic>
<sub>1</sub>
. As the deletions are applied in order, left to right, we must have
<italic>p</italic>
<sub>1</sub>
=
<italic>p</italic>
<sub>2</sub>
which is a contradiction.</p>
<p>
<bold>
<italic>Case 5</italic>
</bold>
</p>
<p>This case has been illustrated in Fig.
<xref rid="Fig4" ref-type="fig">4</xref>
<xref rid="Fig4" ref-type="fig">a</xref>
.
<italic>L</italic>
<sub>1</sub>
has no substitution at a position >
<italic>p</italic>
<sub>1</sub>
and no insertion at all. The ∗ at
<italic>p</italic>
<sub>2</sub>
in
<italic>L</italic>
<sub>2</sub>
must be matched with the ∗ at
<italic>p</italic>
<sub>1</sub>
in
<italic>L</italic>
<sub>1</sub>
and as the characters in
<italic>S</italic>
are distinct, all of
<inline-formula id="IEq37">
<alternatives>
<tex-math id="M89">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$S_{p_{1}+1},\ldots,S_{p_{2}}$\end{document}</tex-math>
<mml:math id="M90">
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo></mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq37.gif"></inline-graphic>
</alternatives>
</inline-formula>
cannot appear in
<italic>L</italic>
<sub>1</sub>
and hence must be deleted in
<italic>L</italic>
<sub>1</sub>
.
<fig id="Fig4">
<label>Fig. 4</label>
<caption>
<p>Proof of uniqueness (Lemma
<xref rid="Sec7" ref-type="sec">2</xref>
). Subfigures a,b,c,d illustrates the cases 5,6,7,8,9 respectively</p>
</caption>
<graphic xlink:href="12864_2016_2789_Fig4_HTML" id="MO5"></graphic>
</fig>
</p>
<p>Now for each
<italic>t</italic>
<
<italic>p</italic>
<sub>1</sub>
, right to left, and
<italic>y</italic>
=
<italic>t</italic>
+
<italic>p</italic>
<sub>2</sub>
<italic>p</italic>
<sub>1</sub>
, we have the following:
<italic>S</italic>
<sub>
<italic>y</italic>
</sub>
is either deleted or substituted in
<italic>L</italic>
<sub>1</sub>
, which implies that
<italic>S</italic>
<sub>
<italic>y</italic>
</sub>
must be substituted in
<italic>L</italic>
<sub>2</sub>
as the deletion of
<italic>S</italic>
<sub>
<italic>y</italic>
</sub>
in
<italic>L</italic>
<sub>2</sub>
is prohibited by Rule 2, and finally to match this ∗ in
<italic>L</italic>
<sub>2</sub>
,
<italic>S</italic>
<sub>
<italic>t</italic>
</sub>
must be substituted in
<italic>L</italic>
<sub>1</sub>
as
<italic>S</italic>
<sub>
<italic>t</italic>
</sub>
cannot be deleted in
<italic>L</italic>
<sub>1</sub>
, again by Rule 2.</p>
<p>But this makes Rule 3 applicable to
<italic>L</italic>
<sub>1</sub>
and
<italic>L</italic>
<sub>1</sub>
must have been skipped. This is a contradiction.</p>
<p>
<bold>
<italic>Case 6</italic>
</bold>
</p>
<p>By Rule 9 the insertion in
<italic>L</italic>
<sub>2</sub>
cannot be at the rightmost position and hence
<italic>L</italic>
<sub>2</sub>
must have at least one character after the insertion. By Rules 4 and 6,
<inline-formula id="IEq38">
<alternatives>
<tex-math id="M91">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$S_{p_{2}}$\end{document}</tex-math>
<mml:math id="M92">
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq38.gif"></inline-graphic>
</alternatives>
</inline-formula>
must not be deleted or substituted in
<italic>L</italic>
<sub>2</sub>
and hence it must not be deleted or substituted in
<italic>L</italic>
<sub>1</sub>
either. Thus
<italic>p</italic>
<sub>1</sub>
<
<italic>p</italic>
<sub>2</sub>
. There cannot be any insertion or substitution at a position >
<italic>p</italic>
<sub>1</sub>
in
<italic>L</italic>
<sub>1</sub>
. Thus the ∗ due to the insertion at
<italic>p</italic>
<sub>2</sub>
in
<italic>L</italic>
<sub>2</sub>
must be matched by the ∗ due to the substitution at
<italic>p</italic>
<sub>1</sub>
in
<italic>L</italic>
<sub>1</sub>
and all of
<inline-formula id="IEq39">
<alternatives>
<tex-math id="M93">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$S_{p_{1}+1},\ldots,S_{p_{2}-1}$\end{document}</tex-math>
<mml:math id="M94">
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo></mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo></mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq39.gif"></inline-graphic>
</alternatives>
</inline-formula>
must be deleted in
<italic>L</italic>
<sub>1</sub>
.</p>
<p>By Rule 7,
<inline-formula id="IEq40">
<alternatives>
<tex-math id="M95">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$S_{p_{2}}$\end{document}</tex-math>
<mml:math id="M96">
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq40.gif"></inline-graphic>
</alternatives>
</inline-formula>
cannot be the leftmost in
<inline-formula id="IEq41">
<alternatives>
<tex-math id="M97">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$S_{j_{2},k_{2}}$\end{document}</tex-math>
<mml:math id="M98">
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq41.gif"></inline-graphic>
</alternatives>
</inline-formula>
. So we consider
<inline-formula id="IEq42">
<alternatives>
<tex-math id="M99">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$S_{p_{2}-1}$\end{document}</tex-math>
<mml:math id="M100">
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo></mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq42.gif"></inline-graphic>
</alternatives>
</inline-formula>
in
<italic>L</italic>
<sub>1</sub>
,
<italic>L</italic>
<sub>2</sub>
. It is either deleted or substituted in
<italic>L</italic>
<sub>1</sub>
and hence by Rule 5, it must be substituted in
<inline-formula id="IEq43">
<alternatives>
<tex-math id="M101">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$S_{p_{2}}$\end{document}</tex-math>
<mml:math id="M102">
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq43.gif"></inline-graphic>
</alternatives>
</inline-formula>
(there can be multiple insertions at
<italic>p</italic>
<sub>2</sub>
in
<italic>L</italic>
<sub>2</sub>
but that does not affect this argument). To match this ∗,
<inline-formula id="IEq44">
<alternatives>
<tex-math id="M103">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$S_{p_{1}-1}$\end{document}</tex-math>
<mml:math id="M104">
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo></mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq44.gif"></inline-graphic>
</alternatives>
</inline-formula>
must be substituted in
<italic>L</italic>
<sub>1</sub>
.</p>
<p>Using a similar argument as in Case 5,
<italic>S</italic>
<sub>
<italic>t</italic>
</sub>
must be substituted in
<italic>L</italic>
<sub>1</sub>
for each
<italic>t</italic>
<
<italic>p</italic>
<sub>1</sub>
−1. But this again makes Rule 3 applicable to
<italic>L</italic>
<sub>1</sub>
and
<italic>L</italic>
<sub>1</sub>
must have been skipped, which is not possible. This case has been illustrated in Fig.
<xref rid="Fig4" ref-type="fig">4</xref>
<xref rid="Fig4" ref-type="fig">b</xref>
.</p>
<p>
<bold>
<italic>Case 8</italic>
</bold>
</p>
<p>Due to Rules 4, 6 and 9,
<inline-formula id="IEq45">
<alternatives>
<tex-math id="M105">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$S_{p_{1}}$\end{document}</tex-math>
<mml:math id="M106">
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq45.gif"></inline-graphic>
</alternatives>
</inline-formula>
must not be deleted or substituted in
<italic>L</italic>
<sub>1</sub>
and hence it must not be deleted or substituted in
<italic>L</italic>
<sub>2</sub>
either. Thus
<italic>p</italic>
<sub>1</sub>
<
<italic>p</italic>
<sub>2</sub>
. The ∗ due to the insertion in
<italic>L</italic>
<sub>1</sub>
must be matched by a substitution at
<italic>p</italic>
<sub>3</sub>
<
<italic>p</italic>
<sub>1</sub>
such that all of
<inline-formula id="IEq46">
<alternatives>
<tex-math id="M107">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$S_{p_{3}+1}, \dots, S_{p_{1}-1}$\end{document}</tex-math>
<mml:math id="M108">
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo></mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo></mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq46.gif"></inline-graphic>
</alternatives>
</inline-formula>
are deleted in
<italic>L</italic>
<sub>2</sub>
.</p>
<p>By Rule 7,
<italic>p</italic>
<sub>1</sub>
cannot be the leftmost in
<italic>L</italic>
<sub>1</sub>
. For each
<italic>t</italic>
<
<italic>p</italic>
<sub>1</sub>
, right to left, and
<italic>y</italic>
=
<italic>t</italic>
+
<italic>p</italic>
<sub>3</sub>
<italic>p</italic>
<sub>1</sub>
, we have the following:
<italic>S</italic>
<sub>
<italic>y</italic>
</sub>
is substituted in
<italic>L</italic>
<sub>1</sub>
because as the deletion of
<italic>S</italic>
<sub>
<italic>y</italic>
</sub>
in
<italic>L</italic>
<sub>1</sub>
is prohibited by Rules 2 and 5,
<italic>S</italic>
<sub>
<italic>y</italic>
</sub>
must be substituted in
<italic>L</italic>
<sub>2</sub>
again by Rule 2, and to match this ∗,
<italic>S</italic>
<sub>
<italic>t</italic>
</sub>
must be substituted in
<italic>L</italic>
<sub>1</sub>
.</p>
<p>But this makes Rule 8 applicable to
<italic>L</italic>
<sub>1</sub>
and
<italic>L</italic>
<sub>1</sub>
must have been skipped which is not possible. This case has been illustrated in Fig.
<xref rid="Fig4" ref-type="fig">4</xref>
<xref rid="Fig4" ref-type="fig">c</xref>
.</p>
<p>
<bold>
<italic>Case 9</italic>
</bold>
</p>
<p>This case has been illustrated in Fig.
<xref rid="Fig4" ref-type="fig">4</xref>
<xref rid="Fig4" ref-type="fig">d</xref>
. Due to Rules 4, 6 and 9,
<inline-formula id="IEq47">
<alternatives>
<tex-math id="M109">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$S_{p_{1}},S_{p_{2}}$\end{document}</tex-math>
<mml:math id="M110">
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq47.gif"></inline-graphic>
</alternatives>
</inline-formula>
must not be deleted or substituted in
<italic>L</italic>
<sub>1</sub>
,
<italic>L</italic>
<sub>2</sub>
. The insertion at
<italic>p</italic>
<sub>2</sub>
in
<italic>L</italic>
<sub>2</sub>
must be matched by a substitution at a position
<italic>p</italic>
<sub>3</sub>
in
<italic>L</italic>
<sub>1</sub>
such that
<italic>p</italic>
<sub>1</sub>
<
<italic>p</italic>
<sub>3</sub>
<
<italic>p</italic>
<sub>2</sub>
and all of
<inline-formula id="IEq48">
<alternatives>
<tex-math id="M111">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$S_{p_{3}+1},\ldots,S_{p_{2}-1}$\end{document}</tex-math>
<mml:math id="M112">
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>3</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:mo></mml:mo>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>p</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo></mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq48.gif"></inline-graphic>
</alternatives>
</inline-formula>
must be deleted in
<italic>L</italic>
<sub>1</sub>
.</p>
<p>Now for each position
<italic>y</italic>
, from right to left, where
<italic>p</italic>
<sub>1</sub>
<
<italic>y</italic>
<
<italic>p</italic>
<sub>2</sub>
,
<italic>S</italic>
<sub>
<italic>y</italic>
</sub>
is either deleted or substituted in
<italic>S</italic>
<sub>1</sub>
,
<italic>S</italic>
<sub>
<italic>y</italic>
</sub>
cannot be deleted in
<italic>L</italic>
<sub>2</sub>
by Rules 2 and 5 and hence must be substituted in
<italic>L</italic>
<sub>2</sub>
, which again must be matched by a substitution at a position
<italic>t</italic>
in
<italic>L</italic>
<sub>1</sub>
such that
<italic>p</italic>
<sub>1</sub>
<
<italic>t</italic>
<
<italic>p</italic>
<sub>3</sub>
. However this is impossible as the number of possible
<italic>y</italic>
s is larger than the number of possible
<italic>t</italic>
s.</p>
<p>If all
<italic>S</italic>
<sub>
<italic>j</italic>
</sub>
s are distinct and
<italic>Γ</italic>
(
<italic>M</italic>
<sub>1</sub>
)=
<italic>Γ</italic>
(
<italic>M</italic>
<sub>2</sub>
) then applying Lemma 3 repeatedly for
<italic>h</italic>
=
<italic>d,d</italic>
−1,…,0 gives us the fact that starting
<italic>k</italic>
-mers
<inline-formula id="IEq49">
<alternatives>
<tex-math id="M113">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$S_{j_{1},k_{1}},S_{j_{2},k_{2}}$\end{document}</tex-math>
<mml:math id="M114">
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>S</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:msub>
<mml:mrow>
<mml:mi>j</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
<mml:mo>,</mml:mo>
<mml:msub>
<mml:mrow>
<mml:mi>k</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:msub>
</mml:mrow>
</mml:msub>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq49.gif"></inline-graphic>
</alternatives>
</inline-formula>
as well as the corresponding edit operations in
<italic>T</italic>
<sub>1</sub>
,
<italic>T</italic>
<sub>2</sub>
for
<italic>M</italic>
<sub>1</sub>
,
<italic>M</italic>
<sub>2</sub>
must be the same. This is another way of stating the following theorem.</p>
</sec>
<sec id="d30e8518">
<title>
<bold>Theorem</bold>
<bold>1</bold>
.</title>
<p>If
<italic>S</italic>
<sub>
<italic>j</italic>
</sub>
s are all distinct then
<inline-formula id="IEq50">
<alternatives>
<tex-math id="M115">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\bar {M}_{l,d}(S)$\end{document}</tex-math>
<mml:math id="M116">
<mml:msub>
<mml:mrow>
<mml:mover accent="true">
<mml:mrow>
<mml:mi>M</mml:mi>
</mml:mrow>
<mml:mo>¯</mml:mo>
</mml:mover>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>,</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:msub>
<mml:mo>(</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo>)</mml:mo>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq50.gif"></inline-graphic>
</alternatives>
</inline-formula>
is duplication free.</p>
<p>In general
<italic>S</italic>
<sub>
<italic>j</italic>
</sub>
s are not distinct. However, as the input strings are random, the duplication due to repeated characters are limited. On instance (11,3) our algorithm generates each compact motif, on an average, 1.55 times using the rules compared to 3.63 times without the rules (see Fig.
<xref rid="Fig5" ref-type="fig">5</xref>
).
<fig id="Fig5">
<label>Fig. 5</label>
<caption>
<p>Histogram of number of times a motif is repeated with and without using the skipping rules 1–9</p>
</caption>
<graphic xlink:href="12864_2016_2789_Fig5_HTML" id="MO6"></graphic>
</fig>
</p>
<p>
<bold>Implementation</bold>
To track the deleted characters, instead of actually deleting we substitute them by a new symbol − not in
<italic>Σ</italic>
<sup></sup>
. We populate the motif trie
<italic>M</italic>
<sup>(
<italic>i</italic>
)</sup>
by calling
<italic>g</italic>
<italic>e</italic>
<italic>n</italic>
<italic>A</italic>
<italic>l</italic>
<italic>l</italic>
(
<italic>S</italic>
<sup>(
<italic>i</italic>
)</sup>
) given in Algorithm 2. Rules 1–8 are incorporated in
<italic>G</italic>
(
<italic>L,j</italic>
,
<italic>δ</italic>
,
<italic>β</italic>
,
<italic>α</italic>
),
<italic>H</italic>
(
<italic>L,j</italic>
,
<italic>β</italic>
,
<italic>α</italic>
) and
<italic>I</italic>
(
<italic>L,j</italic>
,
<italic>α</italic>
) which are shown in Algorithms 3, 4, and 5, respectively where
<italic>s</italic>
<italic>u</italic>
<italic>b</italic>
(
<italic>L,j</italic>
,
<italic>σ</italic>
) substitutes
<italic>L</italic>
<sub>
<italic>j</italic>
</sub>
by
<italic>σ</italic>
and
<italic>i</italic>
<italic>n</italic>
<italic>s</italic>
(
<italic>L,j</italic>
,
<italic>σ</italic>
) inserts
<italic>σ</italic>
just before
<italic>L</italic>
<sub>
<italic>j</italic>
</sub>
.</p>
<p>
<graphic xlink:href="12864_2016_2789_Figb_HTML.gif" id="MO7"></graphic>
</p>
<p>
<graphic xlink:href="12864_2016_2789_Figc_HTML.gif" id="MO8"></graphic>
</p>
<p>
<graphic xlink:href="12864_2016_2789_Figd_HTML.gif" id="MO9"></graphic>
</p>
<p>
<graphic xlink:href="12864_2016_2789_Fige_HTML.gif" id="MO10"></graphic>
</p>
</sec>
</sec>
<sec id="Sec8">
<title>Modified radix-sort for compact motifs</title>
<p>A simpler data structure alternative to tries for storing compact motifs could be an array. However, it becomes difficult to compute the intersection in (
<xref rid="Equ3" ref-type="">3</xref>
) as defined in (
<xref rid="Equ7" ref-type="">7</xref>
) when the compact motifs are stored in arrays. One straight-forward solution is to first expand the ∗s in the compact motifs, then sort the expanded motifs and finally compute the intersection by scanning through the two sorted arrays. This, to a great extent, wipes out the advantage using the ∗s in the compact motifs. However, we salvage execution time by executing a modified radix-sort that simultaneously expands and sorts the array of compact motifs: Compact-Radix-Sort(
<italic>A,l</italic>
) where the first parameter
<italic>A</italic>
represents the array of compact motifs and the second parameter represents the number of digits of the elements in
<italic>A</italic>
which is equal to the number of base positions
<italic>l</italic>
in a motif.</p>
<p>As in the standard radix-sort, our algorithm uses
<italic>l</italic>
phases, one for each base position in the motif. In the
<italic>i</italic>
th phase it sorts the motifs using bucket sort on the
<italic>i</italic>
th base of the motifs. However, in case of compact motifs, for each ∗ at a base position, the bucket counters for all
<italic>σ</italic>
<italic>Σ</italic>
are incremented. While reordering the motifs as per the bucket counts, if there is a ∗ at
<italic>i</italic>
th base position of a motif, |
<italic>Σ</italic>
| copies of the motif are created and they are placed at appropriate locations in the array after finalizing the correct
<italic>σ</italic>
for the ∗. The details are given in Algorithm 6. In each phase a bucket counter
<italic>B</italic>
and a cumulative counter
<italic>C</italic>
are used. The temporary array
<italic>T</italic>
stores the partially expanded motifs from the current phase.</p>
<p>
<bold>Discussion</bold>
We did an experiment to compare the time taken by the two approaches – (i) using the expanded motifs,
<italic>i.e.</italic>
, without using the wildcard character, and (ii) using compact motifs and sorting them using Compact-Radix-Sort. For a single input string of instance (16,3), the first approach generated in 24.4 s 198,991,822 expanded motifs in which 53,965,581 are unique. The second approach generated in 13.7 s 11,474,938 compact motifs with the same number of unique expanded motifs. This shows the effectiveness of the second approach.</p>
<p>
<graphic xlink:href="12864_2016_2789_Figf_HTML.gif" id="MO11"></graphic>
</p>
</sec>
<sec id="Sec9">
<title>Parallel algorithm</title>
<p>We now give our parallel algorithm in the multi-core shared memory setting. To process each input sequence
<italic>S</italic>
<sup>(
<italic>i</italic>
)</sup>
the algorithm uses
<italic>p</italic>
+1 threads. The main thread first prepares the workload for other
<italic>p</italic>
threads. A workload involves the generation of the neighborhood for a
<italic>k</italic>
-mer of
<italic>S</italic>
<sup>(
<italic>i</italic>
)</sup>
, where
<italic>l</italic>
<italic>d</italic>
<italic>k</italic>
<italic>l</italic>
+
<italic>d</italic>
. There are total
<inline-formula id="IEq51">
<alternatives>
<tex-math id="M117">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\sum _{k=l-d}^{l+d} (m-k+1) = (2d+1)(m-l+1)$\end{document}</tex-math>
<mml:math id="M118">
<mml:munderover>
<mml:mrow>
<mml:mo></mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mi>k</mml:mi>
<mml:mo>=</mml:mo>
<mml:mi>l</mml:mi>
<mml:mo></mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>l</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>d</mml:mi>
</mml:mrow>
</mml:munderover>
<mml:mo>(</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo></mml:mo>
<mml:mi>k</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>)</mml:mo>
<mml:mo>=</mml:mo>
<mml:mo>(</mml:mo>
<mml:mn>2</mml:mn>
<mml:mi>d</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>)</mml:mo>
<mml:mo>(</mml:mo>
<mml:mi>m</mml:mi>
<mml:mo></mml:mo>
<mml:mi>l</mml:mi>
<mml:mo>+</mml:mo>
<mml:mn>1</mml:mn>
<mml:mo>)</mml:mo>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq51.gif"></inline-graphic>
</alternatives>
</inline-formula>
workloads. The number of neighbors generated in the workloads are not the same due to the skipping of some neighbors using rules 1–9. For load balancing, we randomly and evenly distribute workloads to threads. Each thread first generates all the compact motifs in its workloads and then sort them using Compact-Radix-Sort. If
<italic>i</italic>
>2 then it removes all neighbors not present in
<italic>M</italic>
<sup>(
<italic>i</italic>
−1)</sup>
which is the set of common motifs of
<italic>S</italic>
<sup>(1)</sup>
,
<italic>S</italic>
<sup>(2)</sup>
,…,
<italic>S</italic>
<sup>(
<italic>i</italic>
−1)</sup>
. The master thread then merges the residue candidate motifs from all the
<italic>p</italic>
threads to compute
<italic>M</italic>
<sup>(
<italic>i</italic>
)</sup>
. The merging takes place in log2
<italic>p</italic>
phases in a binary tree fashion where the
<italic>j</italic>
th phase uses
<inline-formula id="IEq52">
<alternatives>
<tex-math id="M119">\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$2^{\log _{2}{p} - j}$\end{document}</tex-math>
<mml:math id="M120">
<mml:msup>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
<mml:mrow>
<mml:munder>
<mml:mrow>
<mml:mo>log</mml:mo>
</mml:mrow>
<mml:mrow>
<mml:mn>2</mml:mn>
</mml:mrow>
</mml:munder>
<mml:mi>p</mml:mi>
<mml:mo></mml:mo>
<mml:mi>j</mml:mi>
</mml:mrow>
</mml:msup>
</mml:math>
<inline-graphic xlink:href="12864_2016_2789_Article_IEq52.gif"></inline-graphic>
</alternatives>
</inline-formula>
threads each merging two sorted arrays of motifs.</p>
</sec>
</sec>
<sec id="Sec10">
<title>Results and discussion</title>
<p>We implemented our algorithms in C++ and evaluated on a Dell Precisions Workstation T7910 running RHEL 7.0 on two sockets each containing 8 Dual Intel Xeon Processors E5-2667 (8C HT, 20 MB Cache, 3.2 GHz) and 256 GB RAM. For our experiments we used only one of the two sockets. We generated random (
<italic>l,d</italic>
) instances according to Pevzner and Sze [
<xref ref-type="bibr" rid="CR2">2</xref>
] and as described in the background section. For every (
<italic>l,d</italic>
) combination we report the average time taken by 4 runs. We compare the following four implementations:
<list list-type="bullet">
<list-item>
<p>
<bold>EMS1:</bold>
A modified implementation of the algorithm in [
<xref ref-type="bibr" rid="CR13">13</xref>
] which considered the neighborhood of only
<italic>l</italic>
-mers whereas the modified version considers the neighborhood of all
<italic>k</italic>
-mers where
<italic>l</italic>
<italic>d</italic>
<italic>k</italic>
<italic>l</italic>
+
<italic>d</italic>
.</p>
</list-item>
<list-item>
<p>
<bold>EMS2:</bold>
A faster implementation of our sequential algorithm which uses tries for storing candidate motifs where each node of the trie stores an array of pointers to each children of the node. However, this makes the space required to store a tree node dependent on the size of the alphabet
<italic>Σ</italic>
.</p>
</list-item>
<list-item>
<p>
<bold>EMS2M:</bold>
A slightly slower but memory efficient implementation of our sequential algorithm where each node of the trie keeps two pointers: one to the leftmost child and the other to the immediate right sibling. Access to the other children are simulated using the sibling pointers.</p>
</list-item>
<list-item>
<p>
<bold>EMS2P:</bold>
Our parallel algorithm which uses arrays for storing motifs. We experimented with
<italic>p</italic>
=1,2,4,8,16 threads.</p>
</list-item>
</list>
</p>
<p>We run the four algorithms on the challenging instances (8,1), (12,2), (16,3) and on the instances (9,2), (11,3), (13,4) which are challenging for PMS and have been used for experimentation in [
<xref ref-type="bibr" rid="CR13">13</xref>
]. We report the runtime and the memory usage of the four algorithms in Table
<xref rid="Tab3" ref-type="table">3</xref>
.
<table-wrap id="Tab3">
<label>Table 3</label>
<caption>
<p>Comparison between EMS1 and three implementations of EMS2</p>
</caption>
<table frame="hsides" rules="groups">
<thead>
<tr>
<th align="left">Instance</th>
<th align="left">Metric</th>
<th align="left">EMS1</th>
<th align="left">EMS2</th>
<th align="left">EMS2M</th>
<th align="left" colspan="5">EMS2P threads</th>
</tr>
<tr>
<th align="left"></th>
<th align="left"></th>
<th align="left"></th>
<th align="left"></th>
<th align="left"></th>
<th align="left">1</th>
<th align="left">2</th>
<th align="left">4</th>
<th align="left">8</th>
<th align="left">16</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">(8,1)</td>
<td align="left">time</td>
<td align="left">0.11 s</td>
<td align="left">0.13 s</td>
<td align="left">0.12 s</td>
<td align="left">0.09 s</td>
<td align="left">0.08 s</td>
<td align="left">0.06 s</td>
<td align="left">0.05 s</td>
<td align="left">0.06 s</td>
</tr>
<tr>
<td align="left"></td>
<td align="left">memory</td>
<td align="left">2.69 MB</td>
<td align="left">4.25 MB</td>
<td align="left">3.17 MB</td>
<td align="left">2.67 MB</td>
<td align="left">3.20 MB</td>
<td align="left">3.55 MB</td>
<td align="left">6.02 MB</td>
<td align="left">7.99 MB</td>
</tr>
<tr>
<td align="left">(12,2)</td>
<td align="left">time</td>
<td align="left">19.87 s</td>
<td align="left">15.60 s</td>
<td align="left">16.62 s</td>
<td align="left">2.71 s</td>
<td align="left">1.94 s</td>
<td align="left">1.44 s</td>
<td align="left">0.89 s</td>
<td align="left">0.55 s</td>
</tr>
<tr>
<td align="left"></td>
<td align="left">memory</td>
<td align="left">34.28 MB</td>
<td align="left">210.47 MB</td>
<td align="left">126.91 MB</td>
<td align="left">84.98 MB</td>
<td align="left">104.60 MB</td>
<td align="left">125.18 MB</td>
<td align="left">142.82 MB</td>
<td align="left">150.23 MB</td>
</tr>
<tr>
<td align="left">(16,3)</td>
<td align="left">time</td>
<td align="left">1.74 h</td>
<td align="left">23.73 m</td>
<td align="left">26.79 m</td>
<td align="left">3.73 m</td>
<td align="left">2.32 m</td>
<td align="left">1.38 m</td>
<td align="left">48.58 s</td>
<td align="left">36.93 s</td>
</tr>
<tr>
<td align="left"></td>
<td align="left">memory</td>
<td align="left">8.39 GB</td>
<td align="left">11.62 GB</td>
<td align="left">6.97 GB</td>
<td align="left">8.55 GB</td>
<td align="left">10.21 GB</td>
<td align="left">10.53 GB</td>
<td align="left">9.84 GB</td>
<td align="left">9.91 GB</td>
</tr>
<tr>
<td align="left">(9,2)</td>
<td align="left">time</td>
<td align="left">10.84 s</td>
<td align="left">1.72 s</td>
<td align="left">3.02 s</td>
<td align="left">1.12 s</td>
<td align="left">0.96 s</td>
<td align="left">0.78 s</td>
<td align="left">0.49 s</td>
<td align="left">0.35 s</td>
</tr>
<tr>
<td align="left"></td>
<td align="left">memory</td>
<td align="left">3.44 MB</td>
<td align="left">26.67 MB</td>
<td align="left">17.04 MB</td>
<td align="left">42.86 MB</td>
<td align="left">57.76 MB</td>
<td align="left">54.77 MB</td>
<td align="left">59.85 MB</td>
<td align="left">66.53 MB</td>
</tr>
<tr>
<td align="left">(11,3)</td>
<td align="left">time</td>
<td align="left">33.48 m</td>
<td align="left">1.91 m</td>
<td align="left">3.57 m</td>
<td align="left">45.85 s</td>
<td align="left">30.78 s</td>
<td align="left">19.68 s</td>
<td align="left">13.49 s</td>
<td align="left">9.78 s</td>
</tr>
<tr>
<td align="left"></td>
<td align="left">memory</td>
<td align="left">92.86 MB</td>
<td align="left">477.12 MB</td>
<td align="left">313.33 MB</td>
<td align="left">2.27 GB</td>
<td align="left">2.63 GB</td>
<td align="left">2.65 GB</td>
<td align="left">2.55 GB</td>
<td align="left">2.60 GB</td>
</tr>
<tr>
<td align="left">(13,4)</td>
<td align="left">time</td>
<td align="left">-</td>
<td align="left">1.08 h</td>
<td align="left">1.76 h</td>
<td align="left">44.03 m</td>
<td align="left">26.16 m</td>
<td align="left">14.51 m</td>
<td align="left">8.62 m</td>
<td align="left">6.82 m</td>
</tr>
<tr>
<td align="left"></td>
<td align="left">memory</td>
<td align="left">-</td>
<td align="left">8.26 GB</td>
<td align="left">5.58 GB</td>
<td align="left">149.60 GB</td>
<td align="left">179.66 GB</td>
<td align="left">180.13 GB</td>
<td align="left">168.80 GB</td>
<td align="left">172.74 GB</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p>Time is in seconds (s), minutes (m) or hours (h). An empty cell implies the algorithm did not complete in the stipulated 72 h</p>
</table-wrap-foot>
</table-wrap>
</p>
<p>Our efficient neighborhood generation enables our algorithm to solve instance (13,4) in less than two hours which EMS1 could not solve even in 3 days. The factor by which EMS2 takes more memory compared to EMS1 gradually decreases as instances become harder. As EMS2 stores 4 child pointers for
<italic>A,C,G,T</italic>
in each node of the motif trie whereas EMS2M simulates access to children using only 2 pointers, EMS2 is faster. Memory reduction in EMS2M is not exactly by a factor 2(=4/2) because we also keep a bit vector in each node to represent the subset of {
<italic>A,C,G,T</italic>
} a child corresponds to. The memory reduction would be significant for protein strings.</p>
<p>Our parallel algorithm EMS2P using one thread is significantly faster than the sequential algorithms EMS2 and EMS2M but uses more memory. This space-time trade off is due to the fact that the arrays are faster to access but the tries use lesser memory. Moreover, the repeated motifs are uniquely stored in a single leaf node in the trie but stored separately in the array. The scaling performance using multiple threads are shown in Fig.
<xref rid="Fig6" ref-type="fig">6</xref>
where we plot the ratio of time taken by
<italic>p</italic>
threads to the time taken by a single thread on the Y-axis. The time required for handling 16 threads turns out to be costlier than actually processing the motifs in the smallest instance (8,1). We observe speed up consistent across other bigger instances. For example, instance (16,3) takes about 224 s using 1 thread and 37 s using 16 threads. This gives more than 600 % scaling performance using 16 threads.
<fig id="Fig6">
<label>Fig. 6</label>
<caption>
<p>Scaling performance of our parallel algorithm</p>
</caption>
<graphic xlink:href="12864_2016_2789_Fig6_HTML" id="MO12"></graphic>
</fig>
</p>
</sec>
<sec id="Sec11" sec-type="conclusion">
<title>Conclusions</title>
<p>We presented several efficient sequential and parallel algorithms for the EMS problem. Our algorithms use some novel and elegant rules to explore the candidate motifs in such a way that only a small fraction of the candidate motifs are explored twice or more. In fact, we also proved that these rules are close to ideal in the sense that no candidate motif is explored twice if the characters in the input string are all distinct. This condition may not be practical and ideas from [
<xref ref-type="bibr" rid="CR14">14</xref>
] can be used when the characters in the input string are repeated. Nevertheless, the rules help because the instances are randomly generated and the
<italic>k</italic>
-mers in the input string are not much frequent. The second reason for the efficiency of our sequential algorithms is the use of a trie based data structure to compactly store the motifs. Our parallel algorithm stores candidate motifs in an array and uses a modified radix-sort based method for filtering out invalid candidate motifs.</p>
<p>Our algorithms pushed up the state-of-the-art of EMS solvers to a state where the challenging instance (16,3) is solved in slightly more than half a minute using 16 threads. Future work could be to solve harder instances, including those involving protein strings, and possibly using many-core distributed algorithms.</p>
</sec>
</body>
<back>
<app-group>
<app id="App1">
<sec id="Sec12">
<title>Additional file</title>
<p>
<media position="anchor" xlink:href="12864_2016_2789_MOESM1_ESM.pdf" id="MOESM1">
<label>Additional file 1</label>
<caption>
<p>Expected number of spurious motifs. This file gives the expression for the expected number of spurious (
<italic>l,d</italic>
)-motifs in
<italic>n</italic>
random strings of length
<italic>m</italic>
from the alphabet
<italic>Σ</italic>
. (PDF 143 kb)</p>
</caption>
</media>
</p>
</sec>
</app>
</app-group>
<ack>
<p>This work has been supported in part by the NIH grant R01-LM010101 and NSF grant 1447711.</p>
<sec id="d30e9526">
<title>Declarations</title>
<p>Publication of this article was funded by the NIH grant R01-LM010101 and NSF grant 1447711. This article has been published as part of BMC Genomics Vol 17 Suppl 4 2016: Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine 2015: genomics. The full contents of the supplement are available online at
<ext-link ext-link-type="uri" xlink:href="https://github.com/soumitrakp/ems2.git">https://github.com/soumitrakp/ems2.git</ext-link>
.</p>
</sec>
<sec id="d30e9536">
<title>Availability</title>
<p>A C++ based implementation of our algorithm can be found at the following github public repository:</p>
<p>
<ext-link ext-link-type="uri" xlink:href="https://github.com/soumitrakp/ems2.git">https://github.com/soumitrakp/ems2.git</ext-link>
.</p>
</sec>
<sec id="d30e9547">
<title>Authors’ contributions</title>
<p>SP and SR conceived the study. SP implemented the algorithms and PX carried out the experiments. SP and SR analyzed the results and wrote the paper. All authors reviewed the manuscript. All authors read and approved the final manuscript.</p>
</sec>
<sec id="d30e9552">
<title>Competing interests</title>
<p>The authors declare that they have no competing interests.</p>
</sec>
</ack>
<ref-list id="Bib1">
<title>References</title>
<ref id="CR1">
<label>1</label>
<mixed-citation publication-type="other">Nicolae M, Rajasekaran S. qPMS9: An Efficient Algorithm for Quorum Planted Motif Search. Nat Sci Rep. 2015;5. doi:
<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.1038/srep07813">10.1038/srep07813</ext-link>
.</mixed-citation>
</ref>
<ref id="CR2">
<label>2</label>
<mixed-citation publication-type="other">Floratou A, Tata S, Patel JM. Efficient and Accurate Discovery of Patterns in Sequence Data Sets. IEEE Trans Knowl Data Eng. 2011; 23(8):1154–68.
<ext-link ext-link-type="uri" xlink:href="http://doi.ieeecomputersociety.org/10.1109/TKDE.2011.69">http://doi.ieeecomputersociety.org/10.1109/TKDE.2011.69</ext-link>
.</mixed-citation>
</ref>
<ref id="CR3">
<label>3</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Nicolae</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Rajasekaran</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Efficient Sequential and Parallel Algorithms for Planted Motif Search</article-title>
<source>BMC Bioinformatics</source>
<year>2014</year>
<volume>15</volume>
<issue>1</issue>
<fpage>34</fpage>
<pub-id pub-id-type="doi">10.1186/1471-2105-15-34</pub-id>
<pub-id pub-id-type="pmid">24479443</pub-id>
</element-citation>
</ref>
<ref id="CR4">
<label>4</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Tanaka</surname>
<given-names>S</given-names>
</name>
</person-group>
<article-title>Improved Exact Enumerative Algorithms for the Planted (
<italic>l,d</italic>
)-motif Search Problem</article-title>
<source>IEEE/ACM Trans Comput Biol Bioinformatics (TCBB)</source>
<year>2014</year>
<volume>11</volume>
<issue>2</issue>
<fpage>361</fpage>
<lpage>74</lpage>
<pub-id pub-id-type="doi">10.1109/TCBB.2014.2306842</pub-id>
</element-citation>
</ref>
<ref id="CR5">
<label>5</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yu</surname>
<given-names>Q</given-names>
</name>
<name>
<surname>Huo</surname>
<given-names>H</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>Y</given-names>
</name>
<name>
<surname>Guo</surname>
<given-names>H</given-names>
</name>
</person-group>
<article-title>PairMotif: A new pattern-driven algorithm for planted (
<italic>l,d</italic>
) DNA motif search</article-title>
<source>PloS One</source>
<year>2012</year>
<volume>7</volume>
<issue>10</issue>
<fpage>48442</fpage>
<pub-id pub-id-type="doi">10.1371/journal.pone.0048442</pub-id>
</element-citation>
</ref>
<ref id="CR6">
<label>6</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Karlin</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Ost</surname>
<given-names>F</given-names>
</name>
<name>
<surname>Blaisdell</surname>
<given-names>BE</given-names>
</name>
</person-group>
<person-group person-group-type="editor">
<name>
<surname>Waterman</surname>
<given-names>MS</given-names>
</name>
</person-group>
<article-title>Patterns in DNA and Amino Acid Sequences and Their Statistical Significance</article-title>
<source>Mathematical Methods for DNA Sequences</source>
<year>1989</year>
<publisher-loc>Boca Raton, FL, USA</publisher-loc>
<publisher-name>CRC Press Inc</publisher-name>
</element-citation>
</ref>
<ref id="CR7">
<label>7</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Rocke</surname>
<given-names>E</given-names>
</name>
<name>
<surname>Tompa</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>An Algorithm for Finding Novel Gapped Motifs in DNA Sequences</article-title>
<source>Proceedings of the Second Annual International Conference on Computational Molecular Biology</source>
<year>1998</year>
<publisher-loc>New York, NY, USA</publisher-loc>
<publisher-name>ACM</publisher-name>
</element-citation>
</ref>
<ref id="CR8">
<label>8</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Sagot</surname>
<given-names>MF</given-names>
</name>
</person-group>
<article-title>Spelling Approximate Repeated or Common Motifs using a Suffix Tree</article-title>
<source>LATIN’98: Theoretical Informatics</source>
<year>1998</year>
<publisher-loc>Brazil</publisher-loc>
<publisher-name>Springer</publisher-name>
</element-citation>
</ref>
<ref id="CR9">
<label>9</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Lanctot</surname>
<given-names>JK</given-names>
</name>
<name>
<surname>Li</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Ma</surname>
<given-names>B</given-names>
</name>
<name>
<surname>Wang</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Zhang</surname>
<given-names>L</given-names>
</name>
</person-group>
<article-title>Distinguishing string selection problems</article-title>
<source>Inform Comput</source>
<year>2003</year>
<volume>185</volume>
<issue>1</issue>
<fpage>41</fpage>
<lpage>55</lpage>
<pub-id pub-id-type="doi">10.1016/S0890-5401(03)00057-9</pub-id>
</element-citation>
</ref>
<ref id="CR10">
<label>10</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Adebiyi</surname>
<given-names>EF</given-names>
</name>
<name>
<surname>Kaufmann</surname>
<given-names>M</given-names>
</name>
</person-group>
<person-group person-group-type="editor">
<name>
<surname>Guigó</surname>
<given-names>R</given-names>
</name>
<name>
<surname>Gusfield</surname>
<given-names>D</given-names>
</name>
</person-group>
<article-title>Extracting Common Motifs under the Levenshtein Measure: Theory and Experimentation</article-title>
<source>Algorithms in Bioinformatics: Second International Workshop, WABI 2002 Rome, Italy, September 17–21, 2002 Proceedings</source>
<year>2002</year>
<publisher-loc>Berlin, Heidelberg</publisher-loc>
<publisher-name>Springer Berlin Heidelberg</publisher-name>
</element-citation>
</ref>
<ref id="CR11">
<label>11</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wang</surname>
<given-names>X</given-names>
</name>
<name>
<surname>Miao</surname>
<given-names>Y</given-names>
</name>
</person-group>
<article-title>GAEM: A Hybrid Algorithm Incorporating GA with EM for Planted Edited Motif Finding Problem</article-title>
<source>Curr Bioinformatics</source>
<year>2014</year>
<volume>9</volume>
<issue>5</issue>
<fpage>463</fpage>
<lpage>9</lpage>
<pub-id pub-id-type="doi">10.2174/1574893609666140901222327</pub-id>
</element-citation>
</ref>
<ref id="CR12">
<label>12</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Rajasekaran</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Balla</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Huang</surname>
<given-names>CH</given-names>
</name>
<name>
<surname>Thapar</surname>
<given-names>V</given-names>
</name>
<name>
<surname>Gryk</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Maciejewski</surname>
<given-names>M</given-names>
</name>
<name>
<surname>Schiller</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>High-performance Exact Algorithms for Motif Search</article-title>
<source>J Clin Monitoring Comput</source>
<year>2005</year>
<volume>19</volume>
<issue>4–5</issue>
<fpage>319</fpage>
<lpage>28</lpage>
<pub-id pub-id-type="doi">10.1007/s10877-005-0677-y</pub-id>
</element-citation>
</ref>
<ref id="CR13">
<label>13</label>
<element-citation publication-type="journal">
<person-group person-group-type="author">
<name>
<surname>Pathak</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Rajasekaran</surname>
<given-names>S</given-names>
</name>
<name>
<surname>Nicolae</surname>
<given-names>M</given-names>
</name>
</person-group>
<article-title>EMS1: An Elegant Algorithm for Edit Distance Based Motif Search</article-title>
<source>Int J Foundations Comput Sci</source>
<year>2013</year>
<volume>24</volume>
<issue>04</issue>
<fpage>473</fpage>
<lpage>86</lpage>
<pub-id pub-id-type="doi">10.1142/S0129054113500159</pub-id>
</element-citation>
</ref>
<ref id="CR14">
<label>14</label>
<element-citation publication-type="book">
<person-group person-group-type="author">
<name>
<surname>Knuth</surname>
<given-names>DE</given-names>
</name>
</person-group>
<source>The Art of Computer Programming, Volume 4, Generating All Tuples and Permutations, Fascicle 2</source>
<year>2005</year>
<publisher-loc>New Jersey, USA</publisher-loc>
<publisher-name>Addison Wesley</publisher-name>
</element-citation>
</ref>
</ref-list>
</back>
</pmc>
</record>

Pour manipuler ce document sous Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 0002949 | SxmlIndent | more

Ou

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 0002949 | SxmlIndent | more

Pour mettre un lien sur cette page dans le réseau Wicri

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     
   |texte=   
}}

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021