Serveur d'exploration sur SGML

Attention, ce site est en cours de développement !
Attention, site généré par des moyens informatiques à partir de corpus bruts.
Les informations ne sont donc pas validées.

Algebras for Querying Text Regions: Expressive Power and Optimization

Identifieur interne : 002499 ( Istex/Corpus ); précédent : 002498; suivant : 002500

Algebras for Querying Text Regions: Expressive Power and Optimization

Auteurs : Mariano P. Consens ; Tova Milo

Source :

RBID : ISTEX:8F238C02CAF9C38C8B20CBE1BE4A28F3F746BB83

English descriptors

Abstract

Abstract: There is a significant amount of interest in combining and extending database and information retrieval technologies to manage textual data. The challenge is becoming more relevant due to increased availability of documents in digital form. Document data has a natural hierarchical structure, which may be made explicit due to the use of markup conventions (as with SGML). An important aspect of managing structured and semistructured textual data consists of supporting the efficient retrieval of text components based both on their content and on their structure. In this paper we study issues related to the expressive power and optimization of a class of algebras that support combining string (or pattern) searches with queries on the hierarchical structure of the text. Theregion algebrastudied is a set-at-a-time algebra for manipulatingtext regions(substrings of the text) that supports finding out nesting and ordering properties of the text regions. This algebra is part of the language in use in commercial text retrieval systems and can form the basis for supporting SQL-like access to textual data. By presenting a close relationship between the region algebra and the monadic first order theory of finite binary trees, we show that queries in the algebra can be optimized, in the sense that equivalence to less expensive expressions can be tested. This optimization can be difficult (co-NP-hard in the general case), but there is an important class of queries that can be optimized in polynomial time. On the negative side, we show that the language is incapable of capturing some important properties of the text structure, related to the nesting and ordering of text regions. We conclude by suggesting possible extensions to increase the expressive power of the language and consider one such example.

Url:
DOI: 10.1006/jcss.1998.1564

Links to Exploration step

ISTEX:8F238C02CAF9C38C8B20CBE1BE4A28F3F746BB83

Le document en format XML

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Algebras for Querying Text Regions: Expressive Power and Optimization</title>
<author>
<name sortKey="Consens, Mariano P" sort="Consens, Mariano P" uniqKey="Consens M" first="Mariano P." last="Consens">Mariano P. Consens</name>
<affiliation>
<mods:affiliation>Department of Computer Science, University of Waterloo, Waterloo, Canada, N2L 3G1 f1 E-mail: mconsens@uwaterloo.caf1</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Milo, Tova" sort="Milo, Tova" uniqKey="Milo T" first="Tova" last="Milo">Tova Milo</name>
<affiliation>
<mods:affiliation>Department of Computer Science, Tel Aviv University, Tel Aviv, Israel, 69978 f2 E-mail: milo@math.tau.ac.ilf2</mods:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:8F238C02CAF9C38C8B20CBE1BE4A28F3F746BB83</idno>
<date when="1998" year="1998">1998</date>
<idno type="doi">10.1006/jcss.1998.1564</idno>
<idno type="url">https://api.istex.fr/ark:/67375/6H6-36VTK1LS-3/fulltext.pdf</idno>
<idno type="wicri:Area/Istex/Corpus">002499</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">002499</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Algebras for Querying Text Regions: Expressive Power and Optimization</title>
<author>
<name sortKey="Consens, Mariano P" sort="Consens, Mariano P" uniqKey="Consens M" first="Mariano P." last="Consens">Mariano P. Consens</name>
<affiliation>
<mods:affiliation>Department of Computer Science, University of Waterloo, Waterloo, Canada, N2L 3G1 f1 E-mail: mconsens@uwaterloo.caf1</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Milo, Tova" sort="Milo, Tova" uniqKey="Milo T" first="Tova" last="Milo">Tova Milo</name>
<affiliation>
<mods:affiliation>Department of Computer Science, Tel Aviv University, Tel Aviv, Israel, 69978 f2 E-mail: milo@math.tau.ac.ilf2</mods:affiliation>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">Journal of Computer and System Sciences</title>
<title level="j" type="abbrev">YJCSS</title>
<idno type="ISSN">0022-0000</idno>
<imprint>
<publisher>ELSEVIER</publisher>
<date type="published" when="1998">1998</date>
<biblScope unit="volume">57</biblScope>
<biblScope unit="issue">3</biblScope>
<biblScope unit="page" from="272">272</biblScope>
<biblScope unit="page" to="288">288</biblScope>
</imprint>
<idno type="ISSN">0022-0000</idno>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0022-0000</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="Teeft" xml:lang="en">
<term>Academic press</term>
<term>Algebra</term>
<term>Algebra expression</term>
<term>Algebra expressions</term>
<term>Algebra queries</term>
<term>Anchor regions</term>
<term>Binary strings</term>
<term>Certain queries</term>
<term>Computer science</term>
<term>Consens</term>
<term>Corresponding formula</term>
<term>Corresponding region instances</term>
<term>Database</term>
<term>Direct inclusion</term>
<term>Direct inclusions</term>
<term>Direct prefix</term>
<term>Disjoint</term>
<term>Disjoint regions</term>
<term>Distinct word</term>
<term>Edges state</term>
<term>Emptiness testing</term>
<term>Exact position</term>
<term>Expressible</term>
<term>Expression proc</term>
<term>Expressive power</term>
<term>Fmft</term>
<term>Fmft formula</term>
<term>Fmft formulas</term>
<term>Fmft model</term>
<term>Fmft models</term>
<term>General case</term>
<term>Header</term>
<term>Hierarchical</term>
<term>Hierarchical instances</term>
<term>Hierarchical structure</term>
<term>Html</term>
<term>Html documents</term>
<term>Ieee data</term>
<term>Important class</term>
<term>Important properties</term>
<term>Inclusion</term>
<term>Inclusion expressions</term>
<term>Induction assumption</term>
<term>Induction step</term>
<term>Isomorphic</term>
<term>Lexicographical order</term>
<term>Longest path</term>
<term>Mapping</term>
<term>Mapping region</term>
<term>Milo</term>
<term>Monadic theory</term>
<term>Negative side</term>
<term>Next theorem</term>
<term>Node</term>
<term>Optimization</term>
<term>Optimized</term>
<term>Order relations</term>
<term>Order relationships</term>
<term>Other direction</term>
<term>Other hand</term>
<term>Other procedures</term>
<term>Other region</term>
<term>Other regions</term>
<term>Other word</term>
<term>Outermost</term>
<term>Outermost regions</term>
<term>Pattern language</term>
<term>Polynomial time</term>
<term>Possible extensions</term>
<term>Precedence relationship</term>
<term>Prefix</term>
<term>Prefix order</term>
<term>Previous section</term>
<term>Previous sections</term>
<term>Proc</term>
<term>Proc regions</term>
<term>Prog proc</term>
<term>Program files</term>
<term>Programming language</term>
<term>Query</term>
<term>Reduction mapping</term>
<term>Region</term>
<term>Region algebra</term>
<term>Region algebra expression</term>
<term>Region expressions</term>
<term>Region inclusion graph</term>
<term>Region index</term>
<term>Region index schema</term>
<term>Region instance</term>
<term>Region instances</term>
<term>Region name</term>
<term>Region names</term>
<term>Region order graph</term>
<term>Region sets</term>
<term>Regions names</term>
<term>Regular expressions</term>
<term>Relational algebra</term>
<term>Relative order</term>
<term>Resp</term>
<term>Retrieval</term>
<term>Same lines</term>
<term>Same name</term>
<term>Same region</term>
<term>Same result</term>
<term>Satisfying assignment</term>
<term>Satisfying requirements</term>
<term>Schema</term>
<term>Sgml documents</term>
<term>Similar approach</term>
<term>Source code</term>
<term>Source code regions</term>
<term>Successive region sets</term>
<term>Such example</term>
<term>Such regions</term>
<term>Text regions</term>
<term>Text search</term>
<term>Text structure</term>
<term>Textual data</term>
<term>Time polynomial</term>
<term>Total number</term>
<term>Unsatisfiable</term>
<term>Variable definitions</term>
<term>Word index</term>
<term>Word index name</term>
</keywords>
</textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: There is a significant amount of interest in combining and extending database and information retrieval technologies to manage textual data. The challenge is becoming more relevant due to increased availability of documents in digital form. Document data has a natural hierarchical structure, which may be made explicit due to the use of markup conventions (as with SGML). An important aspect of managing structured and semistructured textual data consists of supporting the efficient retrieval of text components based both on their content and on their structure. In this paper we study issues related to the expressive power and optimization of a class of algebras that support combining string (or pattern) searches with queries on the hierarchical structure of the text. Theregion algebrastudied is a set-at-a-time algebra for manipulatingtext regions(substrings of the text) that supports finding out nesting and ordering properties of the text regions. This algebra is part of the language in use in commercial text retrieval systems and can form the basis for supporting SQL-like access to textual data. By presenting a close relationship between the region algebra and the monadic first order theory of finite binary trees, we show that queries in the algebra can be optimized, in the sense that equivalence to less expensive expressions can be tested. This optimization can be difficult (co-NP-hard in the general case), but there is an important class of queries that can be optimized in polynomial time. On the negative side, we show that the language is incapable of capturing some important properties of the text structure, related to the nesting and ordering of text regions. We conclude by suggesting possible extensions to increase the expressive power of the language and consider one such example.</div>
</front>
</TEI>
<istex>
<corpusName>elsevier</corpusName>
<keywords>
<teeft>
<json:string>fmft</json:string>
<json:string>query</json:string>
<json:string>node</json:string>
<json:string>algebra</json:string>
<json:string>region algebra</json:string>
<json:string>prefix</json:string>
<json:string>database</json:string>
<json:string>direct inclusion</json:string>
<json:string>proc</json:string>
<json:string>text regions</json:string>
<json:string>fmft formulas</json:string>
<json:string>region index schema</json:string>
<json:string>disjoint</json:string>
<json:string>region name</json:string>
<json:string>word index</json:string>
<json:string>milo</json:string>
<json:string>consens</json:string>
<json:string>hierarchical</json:string>
<json:string>region instance</json:string>
<json:string>resp</json:string>
<json:string>induction step</json:string>
<json:string>html</json:string>
<json:string>inclusion</json:string>
<json:string>outermost</json:string>
<json:string>fmft model</json:string>
<json:string>optimized</json:string>
<json:string>isomorphic</json:string>
<json:string>unsatisfiable</json:string>
<json:string>algebra expression</json:string>
<json:string>variable definitions</json:string>
<json:string>order relations</json:string>
<json:string>optimization</json:string>
<json:string>lexicographical order</json:string>
<json:string>reduction mapping</json:string>
<json:string>region names</json:string>
<json:string>region algebra expression</json:string>
<json:string>region sets</json:string>
<json:string>disjoint regions</json:string>
<json:string>source code</json:string>
<json:string>fmft formula</json:string>
<json:string>region inclusion graph</json:string>
<json:string>polynomial time</json:string>
<json:string>pattern language</json:string>
<json:string>expressive power</json:string>
<json:string>computer science</json:string>
<json:string>induction assumption</json:string>
<json:string>emptiness testing</json:string>
<json:string>other hand</json:string>
<json:string>html documents</json:string>
<json:string>header</json:string>
<json:string>expressible</json:string>
<json:string>schema</json:string>
<json:string>retrieval</json:string>
<json:string>direct prefix</json:string>
<json:string>precedence relationship</json:string>
<json:string>similar approach</json:string>
<json:string>text structure</json:string>
<json:string>mapping region</json:string>
<json:string>textual data</json:string>
<json:string>relational algebra</json:string>
<json:string>regular expressions</json:string>
<json:string>time polynomial</json:string>
<json:string>hierarchical instances</json:string>
<json:string>previous sections</json:string>
<json:string>general case</json:string>
<json:string>region index</json:string>
<json:string>same result</json:string>
<json:string>proc regions</json:string>
<json:string>longest path</json:string>
<json:string>such regions</json:string>
<json:string>algebra expressions</json:string>
<json:string>region</json:string>
<json:string>such example</json:string>
<json:string>prefix order</json:string>
<json:string>sgml documents</json:string>
<json:string>certain queries</json:string>
<json:string>important class</json:string>
<json:string>hierarchical structure</json:string>
<json:string>fmft models</json:string>
<json:string>negative side</json:string>
<json:string>important properties</json:string>
<json:string>other word</json:string>
<json:string>exact position</json:string>
<json:string>order relationships</json:string>
<json:string>region instances</json:string>
<json:string>other procedures</json:string>
<json:string>distinct word</json:string>
<json:string>prog proc</json:string>
<json:string>text search</json:string>
<json:string>satisfying requirements</json:string>
<json:string>possible extensions</json:string>
<json:string>regions names</json:string>
<json:string>total number</json:string>
<json:string>edges state</json:string>
<json:string>corresponding formula</json:string>
<json:string>corresponding region instances</json:string>
<json:string>other region</json:string>
<json:string>algebra queries</json:string>
<json:string>region expressions</json:string>
<json:string>satisfying assignment</json:string>
<json:string>other direction</json:string>
<json:string>other regions</json:string>
<json:string>next theorem</json:string>
<json:string>academic press</json:string>
<json:string>source code regions</json:string>
<json:string>outermost regions</json:string>
<json:string>word index name</json:string>
<json:string>anchor regions</json:string>
<json:string>same name</json:string>
<json:string>same region</json:string>
<json:string>same lines</json:string>
<json:string>relative order</json:string>
<json:string>region order graph</json:string>
<json:string>monadic theory</json:string>
<json:string>inclusion expressions</json:string>
<json:string>previous section</json:string>
<json:string>expression proc</json:string>
<json:string>ieee data</json:string>
<json:string>direct inclusions</json:string>
<json:string>programming language</json:string>
<json:string>successive region sets</json:string>
<json:string>binary strings</json:string>
<json:string>program files</json:string>
<json:string>mapping</json:string>
</teeft>
</keywords>
<author>
<json:item>
<name>Mariano P. Consens</name>
<affiliations>
<json:string>Department of Computer Science, University of Waterloo, Waterloo, Canada, N2L 3G1 f1 E-mail: mconsens@uwaterloo.caf1</json:string>
</affiliations>
</json:item>
<json:item>
<name>Tova Milo</name>
<affiliations>
<json:string>Department of Computer Science, Tel Aviv University, Tel Aviv, Israel, 69978 f2 E-mail: milo@math.tau.ac.ilf2</json:string>
</affiliations>
</json:item>
</author>
<arkIstex>ark:/67375/6H6-36VTK1LS-3</arkIstex>
<language>
<json:string>eng</json:string>
</language>
<originalGenre>
<json:string>Full-length article</json:string>
</originalGenre>
<abstract>Abstract: There is a significant amount of interest in combining and extending database and information retrieval technologies to manage textual data. The challenge is becoming more relevant due to increased availability of documents in digital form. Document data has a natural hierarchical structure, which may be made explicit due to the use of markup conventions (as with SGML). An important aspect of managing structured and semistructured textual data consists of supporting the efficient retrieval of text components based both on their content and on their structure. In this paper we study issues related to the expressive power and optimization of a class of algebras that support combining string (or pattern) searches with queries on the hierarchical structure of the text. Theregion algebrastudied is a set-at-a-time algebra for manipulatingtext regions(substrings of the text) that supports finding out nesting and ordering properties of the text regions. This algebra is part of the language in use in commercial text retrieval systems and can form the basis for supporting SQL-like access to textual data. By presenting a close relationship between the region algebra and the monadic first order theory of finite binary trees, we show that queries in the algebra can be optimized, in the sense that equivalence to less expensive expressions can be tested. This optimization can be difficult (co-NP-hard in the general case), but there is an important class of queries that can be optimized in polynomial time. On the negative side, we show that the language is incapable of capturing some important properties of the text structure, related to the nesting and ordering of text regions. We conclude by suggesting possible extensions to increase the expressive power of the language and consider one such example.</abstract>
<qualityIndicators>
<score>10</score>
<pdfWordCount>14424</pdfWordCount>
<pdfCharCount>67697</pdfCharCount>
<pdfVersion>1.2</pdfVersion>
<pdfPageCount>17</pdfPageCount>
<pdfPageSize>533 x 722 pts</pdfPageSize>
<refBibsNative>true</refBibsNative>
<abstractWordCount>284</abstractWordCount>
<abstractCharCount>1827</abstractCharCount>
<keywordCount>0</keywordCount>
</qualityIndicators>
<title>Algebras for Querying Text Regions: Expressive Power and Optimization</title>
<pii>
<json:string>S0022-0000(98)91564-1</json:string>
</pii>
<genre>
<json:string>research-article</json:string>
</genre>
<host>
<title>Journal of Computer and System Sciences</title>
<language>
<json:string>unknown</json:string>
</language>
<publicationDate>1998</publicationDate>
<issn>
<json:string>0022-0000</json:string>
</issn>
<pii>
<json:string>S0022-0000(00)X0022-0</json:string>
</pii>
<volume>57</volume>
<issue>3</issue>
<pages>
<first>272</first>
<last>288</last>
</pages>
<genre>
<json:string>journal</json:string>
</genre>
</host>
<namedEntities>
<unitex>
<date>
<json:string>1998</json:string>
</date>
<geogName>
<json:string>Href region</json:string>
<json:string>Proc region</json:string>
</geogName>
<orgName>
<json:string>University of Toronto</json:string>
<json:string>Department of Computer Science</json:string>
<json:string>Tova Milo Department of Computer Science, Tel Aviv University, Tel Aviv, Israel</json:string>
<json:string>Institute for Robotics and Intelligent Systems</json:string>
<json:string>University of Waterloo</json:string>
<json:string>Grant CRD 147259</json:string>
<json:string>Pat</json:string>
</orgName>
<orgName_funder>
<json:string>Grant CRD 147259</json:string>
<json:string>Pat</json:string>
</orgName_funder>
<orgName_provider></orgName_provider>
<persName>
<json:string>I. Induction</json:string>
<json:string>Frank Tompa</json:string>
<json:string>W. Let</json:string>
<json:string>MILO Consider</json:string>
<json:string>Gonzalo Navarro</json:string>
<json:string>Stephane Grumbach</json:string>
<json:string>A. Consider</json:string>
<json:string>I. Note</json:string>
<json:string>To</json:string>
<json:string>I. Proof</json:string>
<json:string>S. Note</json:string>
</persName>
<placeName></placeName>
<ref_url></ref_url>
<ref_bibl>
<json:string>[0, 1]</json:string>
</ref_bibl>
<bibl></bibl>
</unitex>
</namedEntities>
<ark>
<json:string>ark:/67375/6H6-36VTK1LS-3</json:string>
</ark>
<categories>
<wos>
<json:string>1 - science</json:string>
<json:string>2 - computer science, theory & methods</json:string>
<json:string>2 - computer science, hardware & architecture</json:string>
</wos>
<scienceMetrix>
<json:string>1 - applied sciences</json:string>
<json:string>2 - information & communication technologies</json:string>
<json:string>3 - computation theory & mathematics</json:string>
</scienceMetrix>
<scopus>
<json:string>1 - Physical Sciences</json:string>
<json:string>2 - Mathematics</json:string>
<json:string>3 - Applied Mathematics</json:string>
<json:string>1 - Physical Sciences</json:string>
<json:string>2 - Computer Science</json:string>
<json:string>3 - Computational Theory and Mathematics</json:string>
<json:string>1 - Physical Sciences</json:string>
<json:string>2 - Computer Science</json:string>
<json:string>3 - Computer Networks and Communications</json:string>
<json:string>1 - Physical Sciences</json:string>
<json:string>2 - Mathematics</json:string>
<json:string>3 - Theoretical Computer Science</json:string>
</scopus>
<inist>
<json:string>1 - sciences humaines et sociales</json:string>
</inist>
</categories>
<publicationDate>1998</publicationDate>
<copyrightDate>1998</copyrightDate>
<doi>
<json:string>10.1006/jcss.1998.1564</json:string>
</doi>
<id>8F238C02CAF9C38C8B20CBE1BE4A28F3F746BB83</id>
<score>1</score>
<fulltext>
<json:item>
<extension>pdf</extension>
<original>true</original>
<mimetype>application/pdf</mimetype>
<uri>https://api.istex.fr/ark:/67375/6H6-36VTK1LS-3/fulltext.pdf</uri>
</json:item>
<json:item>
<extension>zip</extension>
<original>false</original>
<mimetype>application/zip</mimetype>
<uri>https://api.istex.fr/ark:/67375/6H6-36VTK1LS-3/bundle.zip</uri>
</json:item>
<istex:fulltextTEI uri="https://api.istex.fr/ark:/67375/6H6-36VTK1LS-3/fulltext.tei">
<teiHeader>
<fileDesc>
<titleStmt>
<title level="a" type="main" xml:lang="en">Algebras for Querying Text Regions: Expressive Power and Optimization</title>
</titleStmt>
<publicationStmt>
<authority>ISTEX</authority>
<publisher scheme="https://scientific-publisher.data.istex.fr">ELSEVIER</publisher>
<availability>
<licence>
<p>©1998 Academic Press</p>
</licence>
<p scheme="https://loaded-corpus.data.istex.fr/ark:/67375/XBH-HKKZVM7B-M">elsevier</p>
</availability>
<date>1998</date>
</publicationStmt>
<notesStmt>
<note type="research-article" scheme="https://content-type.data.istex.fr/ark:/67375/XTP-1JC4F85T-7">research-article</note>
<note type="journal" scheme="https://publication-type.data.istex.fr/ark:/67375/JMC-0GLKJH51-B">journal</note>
<note type="content">Section title: Regular Article</note>
</notesStmt>
<sourceDesc>
<biblStruct type="inbook">
<analytic>
<title level="a" type="main" xml:lang="en">Algebras for Querying Text Regions: Expressive Power and Optimization</title>
<author xml:id="author-0000">
<persName>
<forename type="first">Mariano P.</forename>
<surname>Consens</surname>
</persName>
<email>mconsens@uwaterloo.caf1</email>
</author>
<author xml:id="author-0001">
<persName>
<forename type="first">Tova</forename>
<surname>Milo</surname>
</persName>
<email>milo@math.tau.ac.ilf2</email>
</author>
<idno type="istex">8F238C02CAF9C38C8B20CBE1BE4A28F3F746BB83</idno>
<idno type="ark">ark:/67375/6H6-36VTK1LS-3</idno>
<idno type="DOI">10.1006/jcss.1998.1564</idno>
<idno type="PII">S0022-0000(98)91564-1</idno>
</analytic>
<monogr>
<title level="j">Journal of Computer and System Sciences</title>
<title level="j" type="abbrev">YJCSS</title>
<idno type="pISSN">0022-0000</idno>
<idno type="PII">S0022-0000(00)X0022-0</idno>
<imprint>
<publisher>ELSEVIER</publisher>
<date type="published" when="1998"></date>
<biblScope unit="volume">57</biblScope>
<biblScope unit="issue">3</biblScope>
<biblScope unit="page" from="272">272</biblScope>
<biblScope unit="page" to="288">288</biblScope>
</imprint>
</monogr>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<creation>
<date>1998</date>
</creation>
<langUsage>
<language ident="en">en</language>
</langUsage>
<abstract xml:lang="en">
<p>Abstract: There is a significant amount of interest in combining and extending database and information retrieval technologies to manage textual data. The challenge is becoming more relevant due to increased availability of documents in digital form. Document data has a natural hierarchical structure, which may be made explicit due to the use of markup conventions (as with SGML). An important aspect of managing structured and semistructured textual data consists of supporting the efficient retrieval of text components based both on their content and on their structure. In this paper we study issues related to the expressive power and optimization of a class of algebras that support combining string (or pattern) searches with queries on the hierarchical structure of the text. Theregion algebrastudied is a set-at-a-time algebra for manipulatingtext regions(substrings of the text) that supports finding out nesting and ordering properties of the text regions. This algebra is part of the language in use in commercial text retrieval systems and can form the basis for supporting SQL-like access to textual data. By presenting a close relationship between the region algebra and the monadic first order theory of finite binary trees, we show that queries in the algebra can be optimized, in the sense that equivalence to less expensive expressions can be tested. This optimization can be difficult (co-NP-hard in the general case), but there is an important class of queries that can be optimized in polynomial time. On the negative side, we show that the language is incapable of capturing some important properties of the text structure, related to the nesting and ordering of text regions. We conclude by suggesting possible extensions to increase the expressive power of the language and consider one such example.</p>
</abstract>
</profileDesc>
<revisionDesc>
<change when="1998">Published</change>
</revisionDesc>
</teiHeader>
</istex:fulltextTEI>
<json:item>
<extension>txt</extension>
<original>false</original>
<mimetype>text/plain</mimetype>
<uri>https://api.istex.fr/ark:/67375/6H6-36VTK1LS-3/fulltext.txt</uri>
</json:item>
</fulltext>
<metadata>
<istex:metadataXml wicri:clean="Elsevier, elements deleted: tail">
<istex:xmlDeclaration>version="1.0" encoding="utf-8"</istex:xmlDeclaration>
<istex:docType PUBLIC="-//ES//DTD journal article DTD version 4.5.2//EN//XML" URI="art452.dtd" name="istex:docType"></istex:docType>
<istex:document>
<converted-article version="4.5.2" docsubtype="fla" xml:lang="en">
<item-info>
<jid>YJCSS</jid>
<aid>91564</aid>
<ce:pii>S0022-0000(98)91564-1</ce:pii>
<ce:doi>10.1006/jcss.1998.1564</ce:doi>
<ce:copyright type="full-transfer" year="1998">Academic Press</ce:copyright>
</item-info>
<head>
<ce:dochead>
<ce:textfn>Regular Article</ce:textfn>
</ce:dochead>
<ce:title>Algebras for Querying Text Regions: Expressive Power and Optimization</ce:title>
<ce:author-group>
<ce:author>
<ce:given-name>Mariano P.</ce:given-name>
<ce:surname>Consens</ce:surname>
<ce:cross-ref refid="SS981564A0">
<ce:sup>a</ce:sup>
</ce:cross-ref>
</ce:author>
<ce:author>
<ce:given-name>Tova</ce:given-name>
<ce:surname>Milo</ce:surname>
<ce:cross-ref refid="SS981564A1">
<ce:sup>b</ce:sup>
</ce:cross-ref>
</ce:author>
<ce:affiliation id="SS981564A0">
<ce:label>a</ce:label>
<ce:textfn>Department of Computer Science, University of Waterloo, Waterloo, Canada, N2L 3G1
<ce:footnote id="F1">
<ce:label>f1</ce:label>
<ce:note-para>E-mail: mconsens@uwaterloo.ca</ce:note-para>
</ce:footnote>
<ce:cross-ref refid="F1">
<ce:sup>f1</ce:sup>
</ce:cross-ref>
</ce:textfn>
</ce:affiliation>
<ce:affiliation id="SS981564A1">
<ce:label>b</ce:label>
<ce:textfn>Department of Computer Science, Tel Aviv University, Tel Aviv, Israel, 69978
<ce:footnote id="F2">
<ce:label>f2</ce:label>
<ce:note-para>E-mail: milo@math.tau.ac.il</ce:note-para>
</ce:footnote>
<ce:cross-ref refid="F2">
<ce:sup>f2</ce:sup>
</ce:cross-ref>
</ce:textfn>
</ce:affiliation>
</ce:author-group>
<ce:date-received day="20" month="12" year="1995"></ce:date-received>
<ce:abstract>
<ce:section-title>Abstract</ce:section-title>
<ce:abstract-sec>
<ce:simple-para>There is a significant amount of interest in combining and extending database and information retrieval technologies to manage textual data. The challenge is becoming more relevant due to increased availability of documents in digital form. Document data has a natural hierarchical structure, which may be made explicit due to the use of markup conventions (as with SGML). An important aspect of managing structured and semistructured textual data consists of supporting the efficient retrieval of text components based both on their content and on their structure. In this paper we study issues related to the expressive power and optimization of a class of algebras that support combining string (or pattern) searches with queries on the hierarchical structure of the text. The
<ce:italic>region algebra</ce:italic>
studied is a set-at-a-time algebra for manipulating
<ce:italic>text regions</ce:italic>
(substrings of the text) that supports finding out nesting and ordering properties of the text regions. This algebra is part of the language in use in commercial text retrieval systems and can form the basis for supporting SQL-like access to textual data. By presenting a close relationship between the region algebra and the monadic first order theory of finite binary trees, we show that queries in the algebra can be optimized, in the sense that equivalence to less expensive expressions can be tested. This optimization can be difficult (co-NP-hard in the general case), but there is an important class of queries that can be optimized in polynomial time. On the negative side, we show that the language is incapable of capturing some important properties of the text structure, related to the nesting and ordering of text regions. We conclude by suggesting possible extensions to increase the expressive power of the language and consider one such example.</ce:simple-para>
</ce:abstract-sec>
</ce:abstract>
</head>
</converted-article>
</istex:document>
</istex:metadataXml>
<mods version="3.6">
<titleInfo lang="en">
<title>Algebras for Querying Text Regions: Expressive Power and Optimization</title>
</titleInfo>
<titleInfo type="alternative" lang="en" contentType="CDATA">
<title>Algebras for Querying Text Regions: Expressive Power and Optimization</title>
</titleInfo>
<name type="personal">
<namePart type="given">Mariano P.</namePart>
<namePart type="family">Consens</namePart>
<affiliation>Department of Computer Science, University of Waterloo, Waterloo, Canada, N2L 3G1 f1 E-mail: mconsens@uwaterloo.caf1</affiliation>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tova</namePart>
<namePart type="family">Milo</namePart>
<affiliation>Department of Computer Science, Tel Aviv University, Tel Aviv, Israel, 69978 f2 E-mail: milo@math.tau.ac.ilf2</affiliation>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<typeOfResource>text</typeOfResource>
<genre type="research-article" displayLabel="Full-length article" authority="ISTEX" authorityURI="https://content-type.data.istex.fr" valueURI="https://content-type.data.istex.fr/ark:/67375/XTP-1JC4F85T-7">research-article</genre>
<originInfo>
<publisher>ELSEVIER</publisher>
<dateIssued encoding="w3cdtf">1998</dateIssued>
<copyrightDate encoding="w3cdtf">1998</copyrightDate>
</originInfo>
<language>
<languageTerm type="code" authority="iso639-2b">eng</languageTerm>
<languageTerm type="code" authority="rfc3066">en</languageTerm>
</language>
<abstract lang="en">Abstract: There is a significant amount of interest in combining and extending database and information retrieval technologies to manage textual data. The challenge is becoming more relevant due to increased availability of documents in digital form. Document data has a natural hierarchical structure, which may be made explicit due to the use of markup conventions (as with SGML). An important aspect of managing structured and semistructured textual data consists of supporting the efficient retrieval of text components based both on their content and on their structure. In this paper we study issues related to the expressive power and optimization of a class of algebras that support combining string (or pattern) searches with queries on the hierarchical structure of the text. Theregion algebrastudied is a set-at-a-time algebra for manipulatingtext regions(substrings of the text) that supports finding out nesting and ordering properties of the text regions. This algebra is part of the language in use in commercial text retrieval systems and can form the basis for supporting SQL-like access to textual data. By presenting a close relationship between the region algebra and the monadic first order theory of finite binary trees, we show that queries in the algebra can be optimized, in the sense that equivalence to less expensive expressions can be tested. This optimization can be difficult (co-NP-hard in the general case), but there is an important class of queries that can be optimized in polynomial time. On the negative side, we show that the language is incapable of capturing some important properties of the text structure, related to the nesting and ordering of text regions. We conclude by suggesting possible extensions to increase the expressive power of the language and consider one such example.</abstract>
<note type="content">Section title: Regular Article</note>
<relatedItem type="host">
<titleInfo>
<title>Journal of Computer and System Sciences</title>
</titleInfo>
<titleInfo type="abbreviated">
<title>YJCSS</title>
</titleInfo>
<genre type="journal" authority="ISTEX" authorityURI="https://publication-type.data.istex.fr" valueURI="https://publication-type.data.istex.fr/ark:/67375/JMC-0GLKJH51-B">journal</genre>
<originInfo>
<publisher>ELSEVIER</publisher>
<dateIssued encoding="w3cdtf">1998</dateIssued>
</originInfo>
<identifier type="ISSN">0022-0000</identifier>
<identifier type="PII">S0022-0000(00)X0022-0</identifier>
<part>
<date>1998</date>
<detail type="volume">
<number>57</number>
<caption>vol.</caption>
</detail>
<detail type="issue">
<number>3</number>
<caption>no.</caption>
</detail>
<extent unit="issue-pages">
<start>233</start>
<end>402</end>
</extent>
<extent unit="pages">
<start>272</start>
<end>288</end>
</extent>
</part>
</relatedItem>
<identifier type="istex">8F238C02CAF9C38C8B20CBE1BE4A28F3F746BB83</identifier>
<identifier type="ark">ark:/67375/6H6-36VTK1LS-3</identifier>
<identifier type="DOI">10.1006/jcss.1998.1564</identifier>
<identifier type="PII">S0022-0000(98)91564-1</identifier>
<accessCondition type="use and reproduction" contentType="copyright">©1998 Academic Press</accessCondition>
<recordInfo>
<recordContentSource authority="ISTEX" authorityURI="https://loaded-corpus.data.istex.fr" valueURI="https://loaded-corpus.data.istex.fr/ark:/67375/XBH-HKKZVM7B-M">elsevier</recordContentSource>
<recordOrigin>Academic Press, ©1998</recordOrigin>
</recordInfo>
</mods>
<json:item>
<extension>json</extension>
<original>false</original>
<mimetype>application/json</mimetype>
<uri>https://api.istex.fr/ark:/67375/6H6-36VTK1LS-3/record.json</uri>
</json:item>
</metadata>
<serie></serie>
</istex>
</record>

Pour manipuler ce document sous Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Informatique/explor/SgmlV1/Data/Istex/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002499 | SxmlIndent | more

Ou

HfdSelect -h $EXPLOR_AREA/Data/Istex/Corpus/biblio.hfd -nk 002499 | SxmlIndent | more

Pour mettre un lien sur cette page dans le réseau Wicri

{{Explor lien
   |wiki=    Wicri/Informatique
   |area=    SgmlV1
   |flux=    Istex
   |étape=   Corpus
   |type=    RBID
   |clé=     ISTEX:8F238C02CAF9C38C8B20CBE1BE4A28F3F746BB83
   |texte=   Algebras for Querying Text Regions: Expressive Power and Optimization
}}

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Jul 1 14:26:08 2019. Site generation: Wed Apr 28 21:40:44 2021