Matching a set of strings with variable length don’t cares
Internal identifier: 00C520 (Main/Merge); previous: 00C519; next: 00C521
Authors: G. Kucherov [France]; M. Rusinowitch [France]
Source:
- Theoretical computer science [ISSN 0304-3975]; 1997.
French descriptors
- Pascal (Inist): Alphabet; Concordance forme; Algorithme
English descriptors
- KwdEn: Algorithm; Alphabet; Pattern matching
Abstract
Copyright (c) 1997 Elsevier Science B.V. All rights reserved. Given an alphabet $A$, a pattern $p$ is a word $v_1 @ \cdots @ v_m$, where $v_i \in A^*$ and $@ \notin A$ is a distinguished symbol called a variable length don’t care symbol. Pattern $p$ matches a text $t \in A^*$ if $t = u_0 v_1 u_1 \cdots u_{m-1} v_m u_m$ for some $u_0, \ldots, u_m \in A^*$. We address the following problem: given a set $P$ of patterns and a text $t$, test whether one of the patterns of $P$ matches $t$. We describe an algorithm that solves the problem in time $O((|t| + |P|) \log |P|)$. In contrast to most existing string matching algorithms (such as that of Aho-Corasick), our algorithm is not composed of two successive stages, namely preprocessing the pattern (resp. the text) and then reading through the text (resp. the pattern); instead, it essentially interleaves these two stages. Our approach is based on the DAWG (Directed Acyclic Word Graph), a data structure studied by A. Blumer, J. Blumer, Haussler, Ehrenfeucht, Crochemore, Chen, and Seiferas.
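For a single pattern, the matching relation defined above can be checked by a greedy scan: place each keyword v_i at its leftmost occurrence after the end of v_(i-1). The Python sketch below is only a naive baseline for illustration. It is not the DAWG-based O((|t|+|P|) log |P|) algorithm described in the abstract; it tests each pattern of P independently and so may rescan t once per pattern. The function names `matches` and `any_matches` are hypothetical, and the sketch assumes patterns are given as strings in which the character `@` plays the role of the don't care symbol and never occurs in the alphabet.

```python
from typing import Iterable

# Assumption: '@' is the variable length don't care symbol and is not in the alphabet A.
DONT_CARE = "@"


def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if pattern v1@...@vm matches text, i.e.
    text = u0 v1 u1 ... u_{m-1} vm um for some words u0, ..., um.

    Greedy strategy: place each keyword vi at its leftmost occurrence that
    starts after the previous keyword ends. Choosing the leftmost occurrence
    is always safe, because it leaves the longest possible suffix of the text
    for the remaining keywords.
    """
    pos = 0
    for keyword in pattern.split(DONT_CARE):
        idx = text.find(keyword, pos)  # leftmost occurrence at or after pos
        if idx < 0:
            return False
        pos = idx + len(keyword)  # next keyword must start after this one ends
    return True


def any_matches(patterns: Iterable[str], text: str) -> bool:
    """Hypothetical naive baseline: test each pattern of P separately."""
    return any(matches(p, text) for p in patterns)
```

For example, `any_matches(["ab@cd", "x@y"], "zzabqcd")` returns `True`: `ab` occurs at position 2 and `cd` occurs after it. Empty keywords (as in the pattern `@ab`) are handled naturally, since the empty word occurs at every position.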
Links to previous steps (curation, corpus, ...)
- to stream PascalFrancis, to step Corpus: 000C58
- to stream PascalFrancis, to step Curation: 000C21
- to stream PascalFrancis, to step Checkpoint: 000C23
Links to Exploration step
Pascal:97-0339249
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Matching a set of strings with variable length don’t cares</title>
<author><name sortKey="Kucherov, G" sort="Kucherov, G" uniqKey="Kucherov G" first="G." last="Kucherov">G. Kucherov</name>
<affiliation wicri:level="3"><inist:fA14 i1="01"><s1>INRIA-Lorraine and CRIN/CNRS, Campus Scientifique, 615, rue du Jardin Botanique, BP 101</s1>
<s2>54602 Villers-lès-Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Villers-lès-Nancy</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Rusinowitch, M" sort="Rusinowitch, M" uniqKey="Rusinowitch M" first="M." last="Rusinowitch">M. Rusinowitch</name>
<affiliation wicri:level="3"><inist:fA14 i1="01"><s1>INRIA-Lorraine and CRIN/CNRS, Campus Scientifique, 615, rue du Jardin Botanique, BP 101</s1>
<s2>54602 Villers-lès-Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Villers-lès-Nancy</settlement>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">97-0339249</idno>
<date when="1997">1997</date>
<idno type="stanalyst">PASCAL 97-0339249 Elsevier</idno>
<idno type="RBID">Pascal:97-0339249</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000C58</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000C21</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000C23</idno>
<idno type="wicri:explorRef" wicri:stream="PascalFrancis" wicri:step="Checkpoint">000C23</idno>
<idno type="wicri:doubleKey">0304-3975:1997:Kucherov G:matching:a:set</idno>
<idno type="wicri:Area/Main/Merge">00C520</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Matching a set of strings with variable length don’t cares</title>
<author><name sortKey="Kucherov, G" sort="Kucherov, G" uniqKey="Kucherov G" first="G." last="Kucherov">G. Kucherov</name>
<affiliation wicri:level="3"><inist:fA14 i1="01"><s1>INRIA-Lorraine and CRIN/CNRS, Campus Scientifique, 615, rue du Jardin Botanique, BP 101</s1>
<s2>54602 Villers-lès-Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Villers-lès-Nancy</settlement>
</placeName>
</affiliation>
</author>
<author><name sortKey="Rusinowitch, M" sort="Rusinowitch, M" uniqKey="Rusinowitch M" first="M." last="Rusinowitch">M. Rusinowitch</name>
<affiliation wicri:level="3"><inist:fA14 i1="01"><s1>INRIA-Lorraine and CRIN/CNRS, Campus Scientifique, 615, rue du Jardin Botanique, BP 101</s1>
<s2>54602 Villers-lès-Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName><region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Villers-lès-Nancy</settlement>
</placeName>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Theoretical computer science</title>
<title level="j" type="abbreviated">Theor. comput. sci.</title>
<idno type="ISSN">0304-3975</idno>
<imprint><date when="1997">1997</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Theoretical computer science</title>
<title level="j" type="abbreviated">Theor. comput. sci.</title>
<idno type="ISSN">0304-3975</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Algorithm</term>
<term>Alphabet</term>
<term>Pattern matching</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Alphabet</term>
<term>Concordance forme</term>
<term>Algorithme</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Copyright (c) 1997 Elsevier Science B.V. All rights reserved. Given an alphabet A, a pattern p is a word v<sub>1</sub>@⋯@v<sub>m</sub>, where v<sub>i</sub>∈A∗ and @∉A is a distinguished symbol called a variable length don’t care symbol. Pattern p matches a text t∈A∗ if t=u<sub>0</sub>v<sub>1</sub>u<sub>1</sub>…u<sub>m-1</sub>v<sub>m</sub>u<sub>m</sub> for some u<sub>0</sub>,…,u<sub>m</sub>∈A∗. We address the following problem: given a set P of patterns and a text t, test whether one of the patterns of P matches t. We describe an algorithm that solves the problem in time O((|t|+|P|)log |P|). In contrast to most of the existing string matching algorithms (such as that of Aho-Corasick) our algorithm is not composed of two successive stages - preprocessing the pattern (resp. the text) and reading through the text (resp. the pattern) - but has these two stages essentially interleaved. Our approach is based on using the DAWG (Directed Acyclic Word Graph), a data structure studied by A. Blumer, J. Blumer, Haussler, Ehrenfeucht, Crochemore, Chen, Seiferas.</div>
</front>
</TEI>
<affiliations><list><country><li>France</li>
</country>
<region><li>Grand Est</li>
<li>Lorraine (région)</li>
</region>
<settlement><li>Villers-lès-Nancy</li>
</settlement>
</list>
<tree><country name="France"><region name="Grand Est"><name sortKey="Kucherov, G" sort="Kucherov, G" uniqKey="Kucherov G" first="G." last="Kucherov">G. Kucherov</name>
</region>
<name sortKey="Rusinowitch, M" sort="Rusinowitch, M" uniqKey="Rusinowitch M" first="M." last="Rusinowitch">M. Rusinowitch</name>
</country>
</tree>
</affiliations>
</record>
To process this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/Main/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 00C520 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 00C520 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Wicri/Lorraine |area= InforLorV4 |flux= Main |étape= Merge |type= RBID |clé= Pascal:97-0339249 |texte= Matching a set of strings with variable length don’t cares }}
This area was generated with Dilib version V0.6.33.