SgmlV1, PascalFrancis, Corpus, bibRecord, 000188

A unified framework for wrapping, mediating and restructuring information from the Web

Internal identifier: 000188 (PascalFrancis/Corpus); previous: 000187; next: 000189

Authors: W. May; R. Himmeröder; G. Lausen; B. Ludäscher

Source :

Lecture notes in computer science [ 0302-9743 ] ; 1999.

RBID : Pascal:00-0015422

French descriptors

Pascal (Inist)
- Système information, Base donnée orientée objet, Organisation information, Intégration information, Extraction information, Réseau WWW.

English descriptors

KwdEn :
- Information extraction, Information integration, Information organization, Information system, Object-oriented databases, World wide web.

Abstract

The goal of information extraction from the Web is to provide an integrated view on heterogeneous information sources via a common data model and query language. A main problem with current approaches is that they rely on very different formalisms and tools for wrappers and mediators, thus leading to an "impedance mismatch" between the wrapper and mediator level. In contrast, our approach integrates wrapping and mediation in a unified framework based on an object-oriented data model which represents both the Web structure and the data of the application domain. Wrappers and mediators are written in a rule-based object-oriented language which is augmented with features for Web access and structured document analysis, i.e., pattern matching by regular expressions and SGML parsing. In this paper, we develop generic, reusable rule patterns for typical extraction, integration, and restructuring tasks using this framework. We show the practicability of our approach by using the FLORID system [10].

Record in standard format (ISO 2709)

See the documentation on the Inist Standard format.

A01	`01`	`1`		`@0 0302-9743`
A05				`@2 1727`
A08	`01`	`1`	`ENG`	`@1 A unified framework for wrapping, mediating and restructuring information from the Web`
A09	`01`	`1`	`ENG`	`@1 Advances in conceptual modeling : Paris, 15-18 November 1999`
A11	`01`	`1`		`@1 MAY (W.)`
A11	`02`	`1`		`@1 HIMMERÖDER (R.)`
A11	`03`	`1`		`@1 LAUSEN (G.)`
A11	`04`	`1`		`@1 LUDÄSCHER (B.)`
A12	`01`	`1`		`@1 CHEN (Peter P.) @9 ed.`
A12	`02`	`1`		`@1 EMBLEY (David W.) @9 ed.`
A12	`03`	`1`		`@1 KOULOUMDJIAN (Jacques) @9 ed.`
A12	`04`	`1`		`@1 LIDDLE (Stephen W.) @9 ed.`
A12	`05`	`1`		`@1 RODDICK (John F.) @9 ed.`
A14	`01`			`@1 Institut für Informatik, Universität Freiburg @3 DEU @Z 1 aut. @Z 2 aut. @Z 3 aut.`
A14	`02`			`@1 San Diego Supercomputing Center @3 USA @Z 4 aut.`
A20				`@1 307-320`
A21				`@1 1999`
A23	`01`			`@0 ENG`
A26	`01`			`@0 3-540-66653-2`
A43	`01`			`@1 INIST @2 16343 @5 354000080103660250`
A44				`@0 0000 @1 © 2000 INIST-CNRS. All rights reserved.`
A45				`@0 17 ref.`
A47	`01`	`1`		`@0 00-0015422`
A60				`@1 P @2 C`
A61				`@0 A`
A64	`01`	`1`		`@0 Lecture notes in computer science`
A66	`01`			`@0 DEU`
C01	`01`		`ENG`	@0 The goal of information extraction from the Web is to provide an integrated view on heterogeneous information sources via a common data model and query language. A main problem with current approaches is that they rely on very different formalisms and tools for wrappers and mediators, thus leading to an "impedance mismatch" between the wrapper and mediator level. In contrast, our approach integrates wrapping and mediation in a unified framework based on an object-oriented data model which represents both the Web structure and the data of the application domain. Wrappers and mediators are written in a rule-based object-oriented language which is augmented with features for Web access and structured document analysis, i.e., pattern matching by regular expressions and SGML parsing. In this paper, we develop generic, reusable rule patterns for typical extraction, integration, and restructuring tasks using this framework. We show the practicability of our approach by using the FLORID system [10].
C02	`01`	`X`		`@0 001D02B07D`
C03	`01`	`X`	`FRE`	`@0 Système information @5 01`
C03	`01`	`X`	`ENG`	`@0 Information system @5 01`
C03	`01`	`X`	`SPA`	`@0 Sistema información @5 01`
C03	`02`	`3`	`FRE`	`@0 Base donnée orientée objet @5 02`
C03	`02`	`3`	`ENG`	`@0 Object-oriented databases @5 02`
C03	`03`	`X`	`FRE`	`@0 Organisation information @5 03`
C03	`03`	`X`	`ENG`	`@0 Information organization @5 03`
C03	`03`	`X`	`SPA`	`@0 Organización información @5 03`
C03	`04`	`X`	`FRE`	`@0 Intégration information @5 04`
C03	`04`	`X`	`ENG`	`@0 Information integration @5 04`
C03	`04`	`X`	`SPA`	`@0 Integración información @5 04`
C03	`05`	`X`	`FRE`	`@0 Extraction information @5 05`
C03	`05`	`X`	`ENG`	`@0 Information extraction @5 05`
C03	`05`	`X`	`SPA`	`@0 Extractión información @5 05`
C03	`06`	`X`	`FRE`	`@0 Réseau WWW @5 06`
C03	`06`	`X`	`ENG`	`@0 World wide web @5 06`
C03	`06`	`X`	`SPA`	`@0 Red WWW @5 06`
N21				`@1 010`

A30	`01`	`1`	`ENG`	`@1 ER '99 workshops on evolution and change in data management, reverse engineering in information systems, and the world wide web and conceptual modeling @3 Paris FRA @4 1999-11-15`

Inist format (server)

NO :	PASCAL 00-0015422 INIST
ET :	A unified framework for wrapping, mediating and restructuring information from the Web
AU :	MAY (W.); HIMMERÖDER (R.); LAUSEN (G.); LUDÄSCHER (B.); CHEN (Peter P.); EMBLEY (David W.); KOULOUMDJIAN (Jacques); LIDDLE (Stephen W.); RODDICK (John F.)
AF :	Institut für Informatik, Universität Freiburg/Allemagne (1 aut., 2 aut., 3 aut.); San Diego Supercomputing Center/Etats-Unis (4 aut.)
DT :	Publication en série; Congrès; Niveau analytique
SO :	Lecture notes in computer science; ISSN 0302-9743; Allemagne; Da. 1999; Vol. 1727; Pp. 307-320; Bibl. 17 ref.
LA :	Anglais
EA :	The goal of information extraction from the Web is to provide an integrated view on heterogeneous information sources via a common data model and query language. A main problem with current approaches is that they rely on very different formalisms and tools for wrappers and mediators, thus leading to an "impedance mismatch" between the wrapper and mediator level. In contrast, our approach integrates wrapping and mediation in a unified framework based on an object-oriented data model which represents both the Web structure and the data of the application domain. Wrappers and mediators are written in a rule-based object-oriented language which is augmented with features for Web access and structured document analysis, i.e., pattern matching by regular expressions and SGML parsing. In this paper, we develop generic, reusable rule patterns for typical extraction, integration, and restructuring tasks using this framework. We show the practicability of our approach by using the FLORID system [10].
CC :	001D02B07D
FD :	Système information; Base donnée orientée objet; Organisation information; Intégration information; Extraction information; Réseau WWW
ED :	Information system; Object-oriented databases; Information organization; Information integration; Information extraction; World wide web
SD :	Sistema información; Organización información; Integración información; Extractión información; Red WWW
LO :	INIST-16343.354000080103660250
ID :	00-0015422
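
The server format above is a flat list of two-letter tags, each followed by a colon and a value, with some values holding semicolon-separated lists. As a minimal sketch of how such a listing could be read programmatically, the Python below splits each line at its first colon and breaks list-valued fields apart. The function name `parse_server_format`, the choice of which tags are treated as lists, and the abbreviated `SAMPLE` text are assumptions made for illustration; they are not part of Dilib or of any Inist tool.

```python
# Hypothetical helper: not part of Dilib or any Inist software.
def parse_server_format(text, list_fields=("AU", "FD", "ED", "SD")):
    """Parse 'TAG :<tab>value' lines into a dict keyed by tag.

    Tags named in list_fields are split on ';' into lists of strings.
    """
    record = {}
    for line in text.splitlines():
        # Split at the first colon only, so colons inside the value survive.
        tag, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'TAG : value' line
        tag, value = tag.strip(), value.strip()
        if tag in list_fields:
            record[tag] = [item.strip() for item in value.split(";")]
        else:
            record[tag] = value
    return record


# Abbreviated sample built from a few fields of the listing above.
SAMPLE = (
    "NO :\tPASCAL 00-0015422 INIST\n"
    "AU :\tMAY (W.); HIMMERÖDER (R.); LAUSEN (G.); LUDÄSCHER (B.)\n"
    "LA :\tAnglais\n"
    "ED :\tInformation system; Object-oriented databases\n"
)

if __name__ == "__main__":
    for tag, value in parse_server_format(SAMPLE).items():
        print(tag, "->", value)
```

Note that in the full record the `AU` field also lists the volume editors after the four authors, so a consumer that needs authors only would have to cross-check against the A11 and A12 fields of the standard format.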

Links to Exploration step

Pascal:00-0015422

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">A unified framework for wrapping, mediating and restructuring information from the Web</title>
<author><name sortKey="May, W" sort="May, W" uniqKey="May W" first="W." last="May">W. May</name>
<affiliation><inist:fA14 i1="01"><s1>Institut für Informatik, Universität Freiburg</s1>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Himmeroder, R" sort="Himmeroder, R" uniqKey="Himmeroder R" first="R." last="Himmeröder">R. Himmeröder</name>
<affiliation><inist:fA14 i1="01"><s1>Institut für Informatik, Universität Freiburg</s1>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Lausen, G" sort="Lausen, G" uniqKey="Lausen G" first="G." last="Lausen">G. Lausen</name>
<affiliation><inist:fA14 i1="01"><s1>Institut für Informatik, Universität Freiburg</s1>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Lud Scher, B" sort="Lud Scher, B" uniqKey="Lud Scher B" first="B." last="Ludäscher">B. Ludäscher</name>
<affiliation><inist:fA14 i1="02"><s1>San Diego Supercomputing Center</s1>
<s3>USA</s3>
<sZ>4 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">00-0015422</idno>
<date when="1999">1999</date>
<idno type="stanalyst">PASCAL 00-0015422 INIST</idno>
<idno type="RBID">Pascal:00-0015422</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000188</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">A unified framework for wrapping, mediating and restructuring information from the Web</title>
<author><name sortKey="May, W" sort="May, W" uniqKey="May W" first="W." last="May">W. May</name>
<affiliation><inist:fA14 i1="01"><s1>Institut für Informatik, Universität Freiburg</s1>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Himmeroder, R" sort="Himmeroder, R" uniqKey="Himmeroder R" first="R." last="Himmeröder">R. Himmeröder</name>
<affiliation><inist:fA14 i1="01"><s1>Institut für Informatik, Universität Freiburg</s1>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Lausen, G" sort="Lausen, G" uniqKey="Lausen G" first="G." last="Lausen">G. Lausen</name>
<affiliation><inist:fA14 i1="01"><s1>Institut für Informatik, Universität Freiburg</s1>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Lud Scher, B" sort="Lud Scher, B" uniqKey="Lud Scher B" first="B." last="Ludäscher">B. Ludäscher</name>
<affiliation><inist:fA14 i1="02"><s1>San Diego Supercomputing Center</s1>
<s3>USA</s3>
<sZ>4 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Lecture notes in computer science</title>
<idno type="ISSN">0302-9743</idno>
<imprint><date when="1999">1999</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Lecture notes in computer science</title>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Information extraction</term>
<term>Information integration</term>
<term>Information organization</term>
<term>Information system</term>
<term>Object-oriented databases</term>
<term>World wide web</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Système information</term>
<term>Base donnée orientée objet</term>
<term>Organisation information</term>
<term>Intégration information</term>
<term>Extraction information</term>
<term>Réseau WWW</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">The goal of information extraction from the Web is to provide an integrated view on heterogeneous information sources via a common data model and query language. A main problem with current approaches is that they rely on very different formalisms and tools for wrappers and mediators, thus leading to an "impedance mismatch" between the wrapper and mediator level. In contrast, our approach integrates wrapping and mediation in a unified framework based on an object-oriented data model which represents both the Web structure and the data of the application domain. Wrappers and mediators are written in a rule-based object-oriented language which is augmented with features for Web access and structured document analysis, i.e., pattern matching by regular expressions and SGML parsing. In this paper, we develop generic, reusable rule patterns for typical extraction, integration, and restructuring tasks using this framework. We show the practicability of our approach by using the FLORID system [10].</div>
</front>
</TEI>
<inist><standard h6="B"><pA><fA01 i1="01" i2="1"><s0>0302-9743</s0>
</fA01>
<fA05><s2>1727</s2>
</fA05>
<fA08 i1="01" i2="1" l="ENG"><s1>A unified framework for wrapping, mediating and restructuring information from the Web</s1>
</fA08>
<fA09 i1="01" i2="1" l="ENG"><s1>Advances in conceptual modeling : Paris, 15-18 November 1999</s1>
</fA09>
<fA11 i1="01" i2="1"><s1>MAY (W.)</s1>
</fA11>
<fA11 i1="02" i2="1"><s1>HIMMERÖDER (R.)</s1>
</fA11>
<fA11 i1="03" i2="1"><s1>LAUSEN (G.)</s1>
</fA11>
<fA11 i1="04" i2="1"><s1>LUDÄSCHER (B.)</s1>
</fA11>
<fA12 i1="01" i2="1"><s1>CHEN (Peter P.)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="02" i2="1"><s1>EMBLEY (David W.)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="03" i2="1"><s1>KOULOUMDJIAN (Jacques)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="04" i2="1"><s1>LIDDLE (Stephen W.)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="05" i2="1"><s1>RODDICK (John F.)</s1>
<s9>ed.</s9>
</fA12>
<fA14 i1="01"><s1>Institut für Informatik, Universität Freiburg</s1>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</fA14>
<fA14 i1="02"><s1>San Diego Supercomputing Center</s1>
<s3>USA</s3>
<sZ>4 aut.</sZ>
</fA14>
<fA20><s1>307-320</s1>
</fA20>
<fA21><s1>1999</s1>
</fA21>
<fA23 i1="01"><s0>ENG</s0>
</fA23>
<fA26 i1="01"><s0>3-540-66653-2</s0>
</fA26>
<fA43 i1="01"><s1>INIST</s1>
<s2>16343</s2>
<s5>354000080103660250</s5>
</fA43>
<fA44><s0>0000</s0>
<s1>© 2000 INIST-CNRS. All rights reserved.</s1>
</fA44>
<fA45><s0>17 ref.</s0>
</fA45>
<fA47 i1="01" i2="1"><s0>00-0015422</s0>
</fA47>
<fA60><s1>P</s1>
<s2>C</s2>
</fA60>
<fA61><s0>A</s0>
</fA61>
<fA64 i1="01" i2="1"><s0>Lecture notes in computer science</s0>
</fA64>
<fA66 i1="01"><s0>DEU</s0>
</fA66>
<fC01 i1="01" l="ENG"><s0>The goal of information extraction from the Web is to provide an integrated view on heterogeneous information sources via a common data model and query language. A main problem with current approaches is that they rely on very different formalisms and tools for wrappers and mediators, thus leading to an "impedance mismatch" between the wrapper and mediator level. In contrast, our approach integrates wrapping and mediation in a unified framework based on an object-oriented data model which represents both the Web structure and the data of the application domain. Wrappers and mediators are written in a rule-based object-oriented language which is augmented with features for Web access and structured document analysis, i.e., pattern matching by regular expressions and SGML parsing. In this paper, we develop generic, reusable rule patterns for typical extraction, integration, and restructuring tasks using this framework. We show the practicability of our approach by using the FLORID system [10].</s0>
</fC01>
<fC02 i1="01" i2="X"><s0>001D02B07D</s0>
</fC02>
<fC03 i1="01" i2="X" l="FRE"><s0>Système information</s0>
<s5>01</s5>
</fC03>
<fC03 i1="01" i2="X" l="ENG"><s0>Information system</s0>
<s5>01</s5>
</fC03>
<fC03 i1="01" i2="X" l="SPA"><s0>Sistema información</s0>
<s5>01</s5>
</fC03>
<fC03 i1="02" i2="3" l="FRE"><s0>Base donnée orientée objet</s0>
<s5>02</s5>
</fC03>
<fC03 i1="02" i2="3" l="ENG"><s0>Object-oriented databases</s0>
<s5>02</s5>
</fC03>
<fC03 i1="03" i2="X" l="FRE"><s0>Organisation information</s0>
<s5>03</s5>
</fC03>
<fC03 i1="03" i2="X" l="ENG"><s0>Information organization</s0>
<s5>03</s5>
</fC03>
<fC03 i1="03" i2="X" l="SPA"><s0>Organización información</s0>
<s5>03</s5>
</fC03>
<fC03 i1="04" i2="X" l="FRE"><s0>Intégration information</s0>
<s5>04</s5>
</fC03>
<fC03 i1="04" i2="X" l="ENG"><s0>Information integration</s0>
<s5>04</s5>
</fC03>
<fC03 i1="04" i2="X" l="SPA"><s0>Integración información</s0>
<s5>04</s5>
</fC03>
<fC03 i1="05" i2="X" l="FRE"><s0>Extraction information</s0>
<s5>05</s5>
</fC03>
<fC03 i1="05" i2="X" l="ENG"><s0>Information extraction</s0>
<s5>05</s5>
</fC03>
<fC03 i1="05" i2="X" l="SPA"><s0>Extractión información</s0>
<s5>05</s5>
</fC03>
<fC03 i1="06" i2="X" l="FRE"><s0>Réseau WWW</s0>
<s5>06</s5>
</fC03>
<fC03 i1="06" i2="X" l="ENG"><s0>World wide web</s0>
<s5>06</s5>
</fC03>
<fC03 i1="06" i2="X" l="SPA"><s0>Red WWW</s0>
<s5>06</s5>
</fC03>
<fN21><s1>010</s1>
</fN21>
</pA>
<pR><fA30 i1="01" i2="1" l="ENG"><s1>ER '99 workshops on evolution and change in data management, reverse engineering in information systems, and the world wide web and conceptual modeling</s1>
<s3>Paris FRA</s3>
<s4>1999-11-15</s4>
</fA30>
</pR>
</standard>
<server><NO>PASCAL 00-0015422 INIST</NO>
<ET>A unified framework for wrapping, mediating and restructuring information from the Web</ET>
<AU>MAY (W.); HIMMERÖDER (R.); LAUSEN (G.); LUDÄSCHER (B.); CHEN (Peter P.); EMBLEY (David W.); KOULOUMDJIAN (Jacques); LIDDLE (Stephen W.); RODDICK (John F.)</AU>
<AF>Institut für Informatik, Universität Freiburg/Allemagne (1 aut., 2 aut., 3 aut.); San Diego Supercomputing Center/Etats-Unis (4 aut.)</AF>
<DT>Publication en série; Congrès; Niveau analytique</DT>
<SO>Lecture notes in computer science; ISSN 0302-9743; Allemagne; Da. 1999; Vol. 1727; Pp. 307-320; Bibl. 17 ref.</SO>
<LA>Anglais</LA>
<EA>The goal of information extraction from the Web is to provide an integrated view on heterogeneous information sources via a common data model and query language. A main problem with current approaches is that they rely on very different formalisms and tools for wrappers and mediators, thus leading to an "impedance mismatch" between the wrapper and mediator level. In contrast, our approach integrates wrapping and mediation in a unified framework based on an object-oriented data model which represents both the Web structure and the data of the application domain. Wrappers and mediators are written in a rule-based object-oriented language which is augmented with features for Web access and structured document analysis, i.e., pattern matching by regular expressions and SGML parsing. In this paper, we develop generic, reusable rule patterns for typical extraction, integration, and restructuring tasks using this framework. We show the practicability of our approach by using the FLORID system [10].</EA>
<CC>001D02B07D</CC>
<FD>Système information; Base donnée orientée objet; Organisation information; Intégration information; Extraction information; Réseau WWW</FD>
<ED>Information system; Object-oriented databases; Information organization; Information integration; Information extraction; World wide web</ED>
<SD>Sistema información; Organización información; Integración información; Extractión información; Red WWW</SD>
<LO>INIST-16343.354000080103660250</LO>
<ID>00-0015422</ID>
</server>
</inist>
</record>
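
The TEI header in the XML above carries the title and the author names as ordinary elements and attributes, so they can be pulled out with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on a reduced fragment of the `titleStmt` element. The fragment, the function name `title_and_authors`, and the decision to drop the `inist:fA14` affiliation elements are assumptions for illustration: the listing shows the `inist:` prefix without a namespace declaration, which a namespace-aware parser would likely reject unless the prefix is declared or removed first.

```python
import xml.etree.ElementTree as ET

# Reduced fragment of the titleStmt element shown above. The affiliation
# elements (inist:fA14) are left out because their prefix is not declared
# in the listing; this is a simplification, not the full record.
FRAGMENT = """<titleStmt>
<title xml:lang="en" level="a">A unified framework for wrapping, mediating and restructuring information from the Web</title>
<author><name sortKey="May, W" first="W." last="May">W. May</name></author>
<author><name sortKey="Lausen, G" first="G." last="Lausen">G. Lausen</name></author>
</titleStmt>"""


def title_and_authors(xml_text):
    """Return the title text and a list of (last, first) name pairs."""
    root = ET.fromstring(xml_text)
    title = root.findtext("title")  # direct child <title> of the root
    authors = [(n.get("last"), n.get("first")) for n in root.iter("name")]
    return title, authors


if __name__ == "__main__":
    title, authors = title_and_authors(FRAGMENT)
    print(title)
    for last, first in authors:
        print(f"{last}, {first}")
```

Reading the `first` and `last` attributes rather than the element text avoids having to split display names such as "W. May" by hand.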

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Informatique/explor/SgmlV1/Data/PascalFrancis/Corpus

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000188 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/PascalFrancis/Corpus/biblio.hfd -nk 000188 | SxmlIndent | more

To add a link to this page in the Wicri network

{{Explor lien
   |wiki=    Wicri/Informatique
   |area=    SgmlV1
   |flux=    PascalFrancis
   |étape=   Corpus
   |type=    RBID
   |clé=     Pascal:00-0015422
   |texte=   A unified framework for wrapping, mediating and restructuring information from the Web
}}

This area was generated with Dilib version V0.6.33.
Data generation: Mon Jul 1 14:26:08 2019. Site generation: Wed Apr 28 21:40:44 2021

	SGML exploration server
	Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.
