Exploration server for computer science research in Lorraine

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Matching a set of strings with variable length don’t cares

Internal identifier: 00C520 (Main/Merge); previous: 00C519; next: 00C521

Authors: G. Kucherov [France]; M. Rusinowitch [France]

Source: Theoretical computer science (ISSN 0304-3975), 1997

RBID : Pascal:97-0339249

French descriptors

English descriptors

Abstract

Copyright (c) 1997 Elsevier Science B.V. All rights reserved. Given an alphabet $A$, a pattern $p$ is a word $v_1 @ \cdots @ v_m$, where $v_i \in A^*$ and $@ \notin A$ is a distinguished symbol called a variable length don’t care symbol. Pattern $p$ matches a text $t \in A^*$ if $t = u_0 v_1 u_1 \ldots u_{m-1} v_m u_m$ for some $u_0, \ldots, u_m \in A^*$. We address the following problem: given a set $P$ of patterns and a text $t$, test whether one of the patterns of $P$ matches $t$. We describe an algorithm that solves the problem in time $O((|t|+|P|) \log |P|)$. In contrast to most of the existing string matching algorithms (such as that of Aho-Corasick), our algorithm is not composed of two successive stages - preprocessing the pattern (resp. the text) and reading through the text (resp. the pattern) - but has these two stages essentially interleaved. Our approach is based on using the DAWG (Directed Acyclic Word Graph), a data structure studied by A. Blumer, J. Blumer, Haussler, Ehrenfeucht, Crochemore, Chen, Seiferas.
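
To make the matching definition in the abstract concrete, here is a minimal Python sketch of a naive baseline. It is not the DAWG-based algorithm the abstract describes and does not achieve its time bound. It only checks the definition directly: each keyword must occur in the text, in order and without overlap, with arbitrary gaps in between. The function names `matches` and `any_matches` are hypothetical, chosen for illustration.

```python
def matches(pattern: str, text: str, dont_care: str = "@") -> bool:
    """Check whether pattern v1@v2@...@vm matches text, per the definition.

    Naive baseline (an assumption of this sketch, not the paper's method):
    take the leftmost occurrence of each keyword after the previous one.
    Choosing the leftmost occurrence leaves the longest possible suffix
    for the remaining keywords, so the greedy choice never misses a match.
    """
    pos = 0  # first text position still available to the next keyword
    for keyword in pattern.split(dont_care):
        found = text.find(keyword, pos)
        if found == -1:
            return False
        pos = found + len(keyword)
    return True


def any_matches(patterns, text, dont_care="@"):
    """Return True if at least one pattern of the set matches the text."""
    return any(matches(p, text, dont_care) for p in patterns)


if __name__ == "__main__":
    print(matches("ab@cd", "xxabyycdzz"))        # keywords occur in order
    print(matches("ab@cd", "cdab"))              # keywords occur in the wrong order
    print(any_matches(["zz@q", "a@c"], "abc"))   # second pattern matches
```

This baseline scans the text once per pattern, so its cost grows with the number of patterns, whereas the abstract states a bound of $O((|t|+|P|) \log |P|)$ for the whole set.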

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Matching a set of strings with variable length don’t cares</title>
<author>
<name sortKey="Kucherov, G" sort="Kucherov, G" uniqKey="Kucherov G" first="G." last="Kucherov">G. Kucherov</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>INRIA-Lorraine and CRIN/CNRS, Campus Scientifique, 615, rue du Jardin Botanique, BP 101</s1>
<s2>54602 Villers-lès-Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Villers-lès-Nancy</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Rusinowitch, M" sort="Rusinowitch, M" uniqKey="Rusinowitch M" first="M." last="Rusinowitch">M. Rusinowitch</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>INRIA-Lorraine and CRIN/CNRS, Campus Scientifique, 615, rue du Jardin Botanique, BP 101</s1>
<s2>54602 Villers-lès-Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Villers-lès-Nancy</settlement>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">97-0339249</idno>
<date when="1997">1997</date>
<idno type="stanalyst">PASCAL 97-0339249 Elsevier</idno>
<idno type="RBID">Pascal:97-0339249</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000C58</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000C21</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000C23</idno>
<idno type="wicri:explorRef" wicri:stream="PascalFrancis" wicri:step="Checkpoint">000C23</idno>
<idno type="wicri:doubleKey">0304-3975:1997:Kucherov G:matching:a:set</idno>
<idno type="wicri:Area/Main/Merge">00C520</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Matching a set of strings with variable length don’t cares</title>
<author>
<name sortKey="Kucherov, G" sort="Kucherov, G" uniqKey="Kucherov G" first="G." last="Kucherov">G. Kucherov</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>INRIA-Lorraine and CRIN/CNRS, Campus Scientifique, 615, rue du Jardin Botanique, BP 101</s1>
<s2>54602 Villers-lès-Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Villers-lès-Nancy</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Rusinowitch, M" sort="Rusinowitch, M" uniqKey="Rusinowitch M" first="M." last="Rusinowitch">M. Rusinowitch</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>INRIA-Lorraine and CRIN/CNRS, Campus Scientifique, 615, rue du Jardin Botanique, BP 101</s1>
<s2>54602 Villers-lès-Nancy</s2>
<s3>FRA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
<settlement type="city">Villers-lès-Nancy</settlement>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Theoretical computer science</title>
<title level="j" type="abbreviated">Theor. comput. sci.</title>
<idno type="ISSN">0304-3975</idno>
<imprint>
<date when="1997">1997</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Theoretical computer science</title>
<title level="j" type="abbreviated">Theor. comput. sci.</title>
<idno type="ISSN">0304-3975</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithm</term>
<term>Alphabet</term>
<term>Pattern matching</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Alphabet</term>
<term>Concordance forme</term>
<term>Algorithme</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Copyright (c) 1997 Elsevier Science B.V. All rights reserved. Given an alphabet A, a pattern p is a word v<sub>1</sub>@⋯@v<sub>m</sub>, where v<sub>i</sub>∈A∗ and @∉A is a distinguished symbol called a variable length don’t care symbol. Pattern p matches a text t∈A∗ if t=u<sub>0</sub>v<sub>1</sub>u<sub>1</sub>…u<sub>m-1</sub>v<sub>m</sub>u<sub>m</sub> for some u<sub>0</sub>, …, u<sub>m</sub>∈A∗. We address the following problem: given a set P of patterns and a text t, test whether one of the patterns of P matches t. We describe an algorithm that solves the problem in time O((|t|+|P|)log |P|). In contrast to most of the existing string matching algorithms (such as that of Aho-Corasick), our algorithm is not composed of two successive stages - preprocessing the pattern (resp. the text) and reading through the text (resp. the pattern) - but has these two stages essentially interleaved. Our approach is based on using the DAWG (Directed Acyclic Word Graph), a data structure studied by A. Blumer, J. Blumer, Haussler, Ehrenfeucht, Crochemore, Chen, Seiferas.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>France</li>
</country>
<region>
<li>Grand Est</li>
<li>Lorraine (région)</li>
</region>
<settlement>
<li>Villers-lès-Nancy</li>
</settlement>
</list>
<tree>
<country name="France">
<region name="Grand Est">
<name sortKey="Kucherov, G" sort="Kucherov, G" uniqKey="Kucherov G" first="G." last="Kucherov">G. Kucherov</name>
</region>
<name sortKey="Rusinowitch, M" sort="Rusinowitch, M" uniqKey="Rusinowitch M" first="M." last="Rusinowitch">M. Rusinowitch</name>
</country>
</tree>
</affiliations>
</record>
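
As a rough illustration, the record above can also be read with a general-purpose XML library. The sketch below uses Python's standard `xml.etree.ElementTree` on a trimmed-down copy of the record. Because the record uses the `wicri:` and `inist:` prefixes without declaring them, a strict parser would reject it as-is. The wrapper element and the placeholder namespace URIs (`urn:example:...`) are assumptions made for this example, not the real namespace names, and the helpers `parse_record` and `summarize` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the record (only a few fields kept for illustration).
RECORD = """<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Matching a set of strings with variable length don't cares</title>
<author>
<name sortKey="Kucherov, G" first="G." last="Kucherov">G. Kucherov</name>
<affiliation wicri:level="3">
<country>France</country>
</affiliation>
</author>
<author>
<name sortKey="Rusinowitch, M" first="M." last="Rusinowitch">M. Rusinowitch</name>
<affiliation wicri:level="3">
<country>France</country>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="RBID">Pascal:97-0339249</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""

# Placeholder namespace bindings (assumed, not the real URIs) so that the
# undeclared wicri: and inist: prefixes become parseable.
WRAPPER_OPEN = (
    '<wrapper xmlns:wicri="urn:example:wicri" '
    'xmlns:inist="urn:example:inist">'
)
WRAPPER_CLOSE = "</wrapper>"


def parse_record(record_xml: str) -> ET.Element:
    """Parse one <record> by wrapping it in an element that binds the prefixes."""
    wrapper = ET.fromstring(WRAPPER_OPEN + record_xml + WRAPPER_CLOSE)
    return wrapper.find("record")


def summarize(record: ET.Element) -> dict:
    """Extract the title, author names, and RBID from a parsed record."""
    stmt = record.find("./TEI/teiHeader/fileDesc/titleStmt")
    return {
        "title": stmt.findtext("title"),
        "authors": [name.text for name in stmt.findall("author/name")],
        "rbid": record.findtext(".//idno[@type='RBID']"),
    }


if __name__ == "__main__":
    print(summarize(parse_record(RECORD)))
```

The wrapper leaves the record text itself untouched, at the cost of inventing namespace URIs, so any attribute lookup on `wicri:`-prefixed names would have to use those placeholder URIs.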

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/Main/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 00C520 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 00C520 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Lorraine
   |area=    InforLorV4
   |flux=    Main
   |étape=   Merge
   |type=    RBID
   |clé=     Pascal:97-0339249
   |texte=   Matching a set of strings with variable length don’t cares
}}

This area was generated with Dilib version V0.6.33.
Data generation: Mon Jun 10 21:56:28 2019. Site generation: Fri Feb 25 15:29:27 2022