OCR Exploration Server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

The Lixto Project: Exploring New Frontiers of Web Data Extraction

Internal identifier: 001738 (Istex/Corpus); previous: 001737; next: 001739

Authors: Julien Carme; Michal Ceresna; Oliver Frölich; Georg Gottlob; Tamir Hassan; Marcus Herzog; Wolfgang Holzinger; Bernhard Krüpl

Source:

RBID : ISTEX:0C718A8F5FAB0E25106D6113A0357246B7356F14

Abstract

The Lixto project is an ongoing research effort in the area of Web data extraction. Whereas the project originally started out with the idea to develop a logic-based extraction language and a tool to visually define extraction programs from sample Web pages, the scope of the project has been extended over time. Today, new issues such as employing learning algorithms for the definition of extraction programs, automatically extracting data from Web pages featuring a table-centric visual appearance, and extracting from alternative document formats such as PDF are being investigated.

Url:
DOI: 10.1007/11788911_1

Links to Exploration step

ISTEX:0C718A8F5FAB0E25106D6113A0357246B7356F14

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">The Lixto Project: Exploring New Frontiers of Web Data Extraction</title>
<author>
<name sortKey="Carme, Julien" sort="Carme, Julien" uniqKey="Carme J" first="Julien" last="Carme">Julien Carme</name>
<affiliation>
<mods:affiliation>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Ceresna, Michal" sort="Ceresna, Michal" uniqKey="Ceresna M" first="Michal" last="Ceresna">Michal Ceresna</name>
<affiliation>
<mods:affiliation>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Frolich, Oliver" sort="Frolich, Oliver" uniqKey="Frolich O" first="Oliver" last="Frölich">Oliver Frölich</name>
<affiliation>
<mods:affiliation>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Gottlob, Georg" sort="Gottlob, Georg" uniqKey="Gottlob G" first="Georg" last="Gottlob">Georg Gottlob</name>
<affiliation>
<mods:affiliation>Oxford University Computing Laboratory, Wolfson Building, Parks Road, OX1 3QD, Oxford, United Kingdom</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Hassan, Tamir" sort="Hassan, Tamir" uniqKey="Hassan T" first="Tamir" last="Hassan">Tamir Hassan</name>
<affiliation>
<mods:affiliation>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Herzog, Marcus" sort="Herzog, Marcus" uniqKey="Herzog M" first="Marcus" last="Herzog">Marcus Herzog</name>
<affiliation>
<mods:affiliation>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Holzinger, Wolfgang" sort="Holzinger, Wolfgang" uniqKey="Holzinger W" first="Wolfgang" last="Holzinger">Wolfgang Holzinger</name>
<affiliation>
<mods:affiliation>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Krupl, Bernhard" sort="Krupl, Bernhard" uniqKey="Krupl B" first="Bernhard" last="Krüpl">Bernhard Krüpl</name>
<affiliation>
<mods:affiliation>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</mods:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:0C718A8F5FAB0E25106D6113A0357246B7356F14</idno>
<date when="2006" year="2006">2006</date>
<idno type="doi">10.1007/11788911_1</idno>
<idno type="url">https://api.istex.fr/document/0C718A8F5FAB0E25106D6113A0357246B7356F14/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001738</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">The Lixto Project: Exploring New Frontiers of Web Data Extraction</title>
<author>
<name sortKey="Carme, Julien" sort="Carme, Julien" uniqKey="Carme J" first="Julien" last="Carme">Julien Carme</name>
<affiliation>
<mods:affiliation>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Ceresna, Michal" sort="Ceresna, Michal" uniqKey="Ceresna M" first="Michal" last="Ceresna">Michal Ceresna</name>
<affiliation>
<mods:affiliation>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Frolich, Oliver" sort="Frolich, Oliver" uniqKey="Frolich O" first="Oliver" last="Frölich">Oliver Frölich</name>
<affiliation>
<mods:affiliation>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Gottlob, Georg" sort="Gottlob, Georg" uniqKey="Gottlob G" first="Georg" last="Gottlob">Georg Gottlob</name>
<affiliation>
<mods:affiliation>Oxford University Computing Laboratory, Wolfson Building, Parks Road, OX1 3QD, Oxford, United Kingdom</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Hassan, Tamir" sort="Hassan, Tamir" uniqKey="Hassan T" first="Tamir" last="Hassan">Tamir Hassan</name>
<affiliation>
<mods:affiliation>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Herzog, Marcus" sort="Herzog, Marcus" uniqKey="Herzog M" first="Marcus" last="Herzog">Marcus Herzog</name>
<affiliation>
<mods:affiliation>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Holzinger, Wolfgang" sort="Holzinger, Wolfgang" uniqKey="Holzinger W" first="Wolfgang" last="Holzinger">Wolfgang Holzinger</name>
<affiliation>
<mods:affiliation>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Krupl, Bernhard" sort="Krupl, Bernhard" uniqKey="Krupl B" first="Bernhard" last="Krüpl">Bernhard Krüpl</name>
<affiliation>
<mods:affiliation>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</mods:affiliation>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2006</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
</series>
<idno type="istex">0C718A8F5FAB0E25106D6113A0357246B7356F14</idno>
<idno type="DOI">10.1007/11788911_1</idno>
<idno type="ChapterID">1</idno>
<idno type="ChapterID">Chap1</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: The Lixto project is an ongoing research effort in the area of Web data extraction. Whereas the project originally started out with the idea to develop a logic-based extraction language and a tool to visually define extraction programs from sample Web pages, the scope of the project has been extended over time. Today, new issues such as employing learning algorithms for the definition of extraction programs, automatically extracting data from Web pages featuring a table-centric visual appearance, and extracting from alternative document formats such as PDF are being investigated.</div>
</front>
</TEI>
<istex>
<corpusName>springer</corpusName>
<author>
<json:item>
<name>Julien Carme</name>
<affiliations>
<json:string>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</json:string>
</affiliations>
</json:item>
<json:item>
<name>Michal Ceresna</name>
<affiliations>
<json:string>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</json:string>
</affiliations>
</json:item>
<json:item>
<name>Oliver Frölich</name>
<affiliations>
<json:string>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</json:string>
</affiliations>
</json:item>
<json:item>
<name>Georg Gottlob</name>
<affiliations>
<json:string>Oxford University Computing Laboratory, Wolfson Building, Parks Road, OX1 3QD, Oxford, United Kingdom</json:string>
</affiliations>
</json:item>
<json:item>
<name>Tamir Hassan</name>
<affiliations>
<json:string>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</json:string>
</affiliations>
</json:item>
<json:item>
<name>Marcus Herzog</name>
<affiliations>
<json:string>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</json:string>
</affiliations>
</json:item>
<json:item>
<name>Wolfgang Holzinger</name>
<affiliations>
<json:string>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</json:string>
</affiliations>
</json:item>
<json:item>
<name>Bernhard Krüpl</name>
<affiliations>
<json:string>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</json:string>
</affiliations>
</json:item>
</author>
<language>
<json:string>eng</json:string>
</language>
<abstract>Abstract: The Lixto project is an ongoing research effort in the area of Web data extraction. Whereas the project originally started out with the idea to develop a logic-based extraction language and a tool to visually define extraction programs from sample Web pages, the scope of the project has been extended over time. Today, new issues such as employing learning algorithms for the definition of extraction programs, automatically extracting data from Web pages featuring a table-centric visual appearance, and extracting from alternative document formats such as PDF are being investigated.</abstract>
<qualityIndicators>
<score>6.08</score>
<pdfVersion>1.3</pdfVersion>
<pdfPageSize>430 x 660 pts</pdfPageSize>
<refBibsNative>false</refBibsNative>
<keywordCount>0</keywordCount>
<abstractCharCount>596</abstractCharCount>
<pdfWordCount>5805</pdfWordCount>
<pdfCharCount>33279</pdfCharCount>
<pdfPageCount>15</pdfPageCount>
<abstractWordCount>90</abstractWordCount>
</qualityIndicators>
<title>The Lixto Project: Exploring New Frontiers of Web Data Extraction</title>
<genre.original>
<json:string>OriginalPaper</json:string>
</genre.original>
<chapterId>
<json:string>1</json:string>
<json:string>Chap1</json:string>
</chapterId>
<genre>
<json:string>conference [eBooks]</json:string>
</genre>
<serie>
<editor>
<json:item>
<name>David Hutchison</name>
<affiliations>
<json:string>Lancaster University, UK</json:string>
</affiliations>
</json:item>
<json:item>
<name>Takeo Kanade</name>
<affiliations>
<json:string>Carnegie Mellon University, Pittsburgh, PA, USA</json:string>
</affiliations>
</json:item>
<json:item>
<name>Josef Kittler</name>
<affiliations>
<json:string>University of Surrey, Guildford, UK</json:string>
</affiliations>
</json:item>
<json:item>
<name>Jon M. Kleinberg</name>
<affiliations>
<json:string>Cornell University, Ithaca, NY, USA</json:string>
</affiliations>
</json:item>
<json:item>
<name>Friedemann Mattern</name>
<affiliations>
<json:string>ETH Zurich, Switzerland</json:string>
</affiliations>
</json:item>
<json:item>
<name>John C. Mitchell</name>
<affiliations>
<json:string>Stanford University, CA, USA</json:string>
</affiliations>
</json:item>
<json:item>
<name>Moni Naor</name>
<affiliations>
<json:string>Weizmann Institute of Science, Rehovot, Israel</json:string>
</affiliations>
</json:item>
<json:item>
<name>Oscar Nierstrasz</name>
<affiliations>
<json:string>University of Bern, Switzerland</json:string>
</affiliations>
</json:item>
<json:item>
<name>C. Pandu Rangan</name>
<affiliations>
<json:string>Indian Institute of Technology, Madras, India</json:string>
</affiliations>
</json:item>
<json:item>
<name>Bernhard Steffen</name>
<affiliations>
<json:string>University of Dortmund, Germany</json:string>
</affiliations>
</json:item>
<json:item>
<name>Madhu Sudan</name>
<affiliations>
<json:string>Massachusetts Institute of Technology, MA, USA</json:string>
</affiliations>
</json:item>
<json:item>
<name>Demetri Terzopoulos</name>
<affiliations>
<json:string>University of California, Los Angeles, CA, USA</json:string>
</affiliations>
</json:item>
<json:item>
<name>Doug Tygar</name>
<affiliations>
<json:string>University of California, Berkeley, CA, USA</json:string>
</affiliations>
</json:item>
<json:item>
<name>Moshe Y. Vardi</name>
<affiliations>
<json:string>Rice University, Houston, TX, USA</json:string>
</affiliations>
</json:item>
<json:item>
<name>Gerhard Weikum</name>
<affiliations>
<json:string>Max-Planck Institute of Computer Science, Saarbruecken, Germany</json:string>
</affiliations>
</json:item>
</editor>
<issn>
<json:string>0302-9743</json:string>
</issn>
<language>
<json:string>unknown</json:string>
</language>
<eissn>
<json:string>1611-3349</json:string>
</eissn>
<title>Lecture Notes in Computer Science</title>
<copyrightDate>2006</copyrightDate>
</serie>
<host>
<editor>
<json:item>
<name>David A. Bell</name>
<affiliations>
<json:string>The School of Electronics, Electrical Engineering and Computer Science, Queen’s University Belfast, BT7 1NN N.I., Belfast, UK</json:string>
<json:string>E-mail: da.bell@qub.ac.uk</json:string>
</affiliations>
</json:item>
<json:item>
<name>Jun Hong</name>
<affiliations>
<json:string>School of Electronics, Electrical Engineering and Computer Science, Queen’s University Belfast, BT7 1NN, Belfast, UK</json:string>
<json:string>E-mail: j.hong@qub.ac.uk</json:string>
</affiliations>
</json:item>
</editor>
<subject>
<json:item>
<value>Computer Science</value>
</json:item>
<json:item>
<value>Computer Science</value>
</json:item>
<json:item>
<value>Database Management</value>
</json:item>
<json:item>
<value>Information Storage and Retrieval</value>
</json:item>
<json:item>
<value>Information Systems Applications (incl.Internet)</value>
</json:item>
</subject>
<isbn>
<json:string>978-3-540-35969-2</json:string>
</isbn>
<language>
<json:string>unknown</json:string>
</language>
<eissn>
<json:string>1611-3349</json:string>
</eissn>
<title>Flexible and Efficient Information Handling</title>
<genre.original>
<json:string>Proceedings</json:string>
</genre.original>
<bookId>
<json:string>978-3-540-35971-5</json:string>
</bookId>
<volume>4042</volume>
<pages>
<last>15</last>
<first>1</first>
</pages>
<issn>
<json:string>0302-9743</json:string>
</issn>
<genre>
<json:string>Book Series</json:string>
</genre>
<eisbn>
<json:string>978-3-540-35971-5</json:string>
</eisbn>
<copyrightDate>2006</copyrightDate>
<doi>
<json:string>10.1007/11788911</json:string>
</doi>
</host>
<publicationDate>2006</publicationDate>
<copyrightDate>2006</copyrightDate>
<doi>
<json:string>10.1007/11788911_1</json:string>
</doi>
<id>0C718A8F5FAB0E25106D6113A0357246B7356F14</id>
<fulltext>
<json:item>
<original>true</original>
<mimetype>application/pdf</mimetype>
<extension>pdf</extension>
<uri>https://api.istex.fr/document/0C718A8F5FAB0E25106D6113A0357246B7356F14/fulltext/pdf</uri>
</json:item>
<json:item>
<original>false</original>
<mimetype>application/zip</mimetype>
<extension>zip</extension>
<uri>https://api.istex.fr/document/0C718A8F5FAB0E25106D6113A0357246B7356F14/fulltext/zip</uri>
</json:item>
<istex:fulltextTEI uri="https://api.istex.fr/document/0C718A8F5FAB0E25106D6113A0357246B7356F14/fulltext/tei">
<teiHeader>
<fileDesc>
<titleStmt>
<title level="a" type="main" xml:lang="en">The Lixto Project: Exploring New Frontiers of Web Data Extraction</title>
<respStmt xml:id="ISTEX-API" resp="Références bibliographiques récupérées via GROBID" name="ISTEX-API (INIST-CNRS)"></respStmt>
</titleStmt>
<publicationStmt>
<authority>ISTEX</authority>
<publisher>Springer Berlin Heidelberg</publisher>
<pubPlace>Berlin, Heidelberg</pubPlace>
<availability>
<p>SPRINGER</p>
</availability>
<date>2006</date>
</publicationStmt>
<sourceDesc>
<biblStruct type="inbook">
<analytic>
<title level="a" type="main" xml:lang="en">The Lixto Project: Exploring New Frontiers of Web Data Extraction</title>
<author>
<persName>
<forename type="first">Julien</forename>
<surname>Carme</surname>
</persName>
<affiliation>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</affiliation>
</author>
<author>
<persName>
<forename type="first">Michal</forename>
<surname>Ceresna</surname>
</persName>
<affiliation>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</affiliation>
</author>
<author>
<persName>
<forename type="first">Oliver</forename>
<surname>Frölich</surname>
</persName>
<affiliation>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</affiliation>
</author>
<author>
<persName>
<forename type="first">Georg</forename>
<surname>Gottlob</surname>
</persName>
<affiliation>Oxford University Computing Laboratory, Wolfson Building, Parks Road, OX1 3QD, Oxford, United Kingdom</affiliation>
</author>
<author>
<persName>
<forename type="first">Tamir</forename>
<surname>Hassan</surname>
</persName>
<affiliation>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</affiliation>
</author>
<author>
<persName>
<forename type="first">Marcus</forename>
<surname>Herzog</surname>
</persName>
<affiliation>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</affiliation>
</author>
<author>
<persName>
<forename type="first">Wolfgang</forename>
<surname>Holzinger</surname>
</persName>
<affiliation>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</affiliation>
</author>
<author>
<persName>
<forename type="first">Bernhard</forename>
<surname>Krüpl</surname>
</persName>
<affiliation>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</affiliation>
</author>
</analytic>
<monogr>
<title level="m">Flexible and Efficient Information Handling</title>
<title level="m" type="sub">23rd British National Conference on Databases, BNCOD 23, Belfast, Northern Ireland, UK, July 18-20, 2006. Proceedings</title>
<idno type="pISBN">978-3-540-35969-2</idno>
<idno type="eISBN">978-3-540-35971-5</idno>
<idno type="pISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="DOI">10.1007/11788911</idno>
<idno type="BookID">978-3-540-35971-5</idno>
<idno type="BookTitleID">139983</idno>
<idno type="BookSequenceNumber">4042</idno>
<idno type="BookVolumeNumber">4042</idno>
<idno type="BookChapterCount">33</idno>
<editor>
<persName>
<forename type="first">David</forename>
<forename type="first">A.</forename>
<surname>Bell</surname>
</persName>
<email>da.bell@qub.ac.uk</email>
<affiliation>The School of Electronics, Electrical Engineering and Computer Science, Queen’s University Belfast, BT7 1NN N.I., Belfast, UK</affiliation>
</editor>
<editor>
<persName>
<forename type="first">Jun</forename>
<surname>Hong</surname>
</persName>
<email>j.hong@qub.ac.uk</email>
<affiliation>School of Electronics, Electrical Engineering and Computer Science, Queen’s University Belfast, BT7 1NN, Belfast, UK</affiliation>
</editor>
<imprint>
<publisher>Springer Berlin Heidelberg</publisher>
<pubPlace>Berlin, Heidelberg</pubPlace>
<date type="published" when="2006"></date>
<biblScope unit="volume">4042</biblScope>
<biblScope unit="page" from="1">1</biblScope>
<biblScope unit="page" to="15">15</biblScope>
</imprint>
</monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<editor>
<persName>
<forename type="first">David</forename>
<surname>Hutchison</surname>
</persName>
<affiliation>Lancaster University, UK</affiliation>
</editor>
<editor>
<persName>
<forename type="first">Takeo</forename>
<surname>Kanade</surname>
</persName>
<affiliation>Carnegie Mellon University, Pittsburgh, PA, USA</affiliation>
</editor>
<editor>
<persName>
<forename type="first">Josef</forename>
<surname>Kittler</surname>
</persName>
<affiliation>University of Surrey, Guildford, UK</affiliation>
</editor>
<editor>
<persName>
<forename type="first">Jon</forename>
<forename type="first">M.</forename>
<surname>Kleinberg</surname>
</persName>
<affiliation>Cornell University, Ithaca, NY, USA</affiliation>
</editor>
<editor>
<persName>
<forename type="first">Friedemann</forename>
<surname>Mattern</surname>
</persName>
<affiliation>ETH Zurich, Switzerland</affiliation>
</editor>
<editor>
<persName>
<forename type="first">John</forename>
<forename type="first">C.</forename>
<surname>Mitchell</surname>
</persName>
<affiliation>Stanford University, CA, USA</affiliation>
</editor>
<editor>
<persName>
<forename type="first">Moni</forename>
<surname>Naor</surname>
</persName>
<affiliation>Weizmann Institute of Science, Rehovot, Israel</affiliation>
</editor>
<editor>
<persName>
<forename type="first">Oscar</forename>
<surname>Nierstrasz</surname>
</persName>
<affiliation>University of Bern, Switzerland</affiliation>
</editor>
<editor>
<persName>
<forename type="first">C.</forename>
<surname>Pandu Rangan</surname>
</persName>
<affiliation>Indian Institute of Technology, Madras, India</affiliation>
</editor>
<editor>
<persName>
<forename type="first">Bernhard</forename>
<surname>Steffen</surname>
</persName>
<affiliation>University of Dortmund, Germany</affiliation>
</editor>
<editor>
<persName>
<forename type="first">Madhu</forename>
<surname>Sudan</surname>
</persName>
<affiliation>Massachusetts Institute of Technology, MA, USA</affiliation>
</editor>
<editor>
<persName>
<forename type="first">Demetri</forename>
<surname>Terzopoulos</surname>
</persName>
<affiliation>University of California, Los Angeles, CA, USA</affiliation>
</editor>
<editor>
<persName>
<forename type="first">Doug</forename>
<surname>Tygar</surname>
</persName>
<affiliation>University of California, Berkeley, CA, USA</affiliation>
</editor>
<editor>
<persName>
<forename type="first">Moshe</forename>
<forename type="first">Y.</forename>
<surname>Vardi</surname>
</persName>
<affiliation>Rice University, Houston, TX, USA</affiliation>
</editor>
<editor>
<persName>
<forename type="first">Gerhard</forename>
<surname>Weikum</surname>
</persName>
<affiliation>Max-Planck Institute of Computer Science, Saarbruecken, Germany</affiliation>
</editor>
<biblScope>
<date>2006</date>
</biblScope>
<idno type="pISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="seriesId">558</idno>
</series>
<idno type="istex">0C718A8F5FAB0E25106D6113A0357246B7356F14</idno>
<idno type="DOI">10.1007/11788911_1</idno>
<idno type="ChapterID">1</idno>
<idno type="ChapterID">Chap1</idno>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<creation>
<date>2006</date>
</creation>
<langUsage>
<language ident="en">en</language>
</langUsage>
<abstract xml:lang="en">
<p>Abstract: The Lixto project is an ongoing research effort in the area of Web data extraction. Whereas the project originally started out with the idea to develop a logic-based extraction language and a tool to visually define extraction programs from sample Web pages, the scope of the project has been extended over time. Today, new issues such as employing learning algorithms for the definition of extraction programs, automatically extracting data from Web pages featuring a table-centric visual appearance, and extracting from alternative document formats such as PDF are being investigated.</p>
</abstract>
<textClass>
<keywords scheme="Book Subject Collection">
<list>
<label>SUCO11645</label>
<item>
<term>Computer Science</term>
</item>
</list>
</keywords>
</textClass>
<textClass>
<keywords scheme="Book Subject Group">
<list>
<label>I</label>
<label>I18024</label>
<label>I18032</label>
<label>I18040</label>
<item>
<term>Computer Science</term>
</item>
<item>
<term>Database Management</term>
</item>
<item>
<term>Information Storage and Retrieval</term>
</item>
<item>
<term>Information Systems Applications (incl.Internet)</term>
</item>
</list>
</keywords>
</textClass>
</profileDesc>
<revisionDesc>
<change when="2006">Published</change>
<change xml:id="refBibs-istex" who="#ISTEX-API" when="2016-3-19">References added</change>
</revisionDesc>
</teiHeader>
</istex:fulltextTEI>
<json:item>
<original>false</original>
<mimetype>text/plain</mimetype>
<extension>txt</extension>
<uri>https://api.istex.fr/document/0C718A8F5FAB0E25106D6113A0357246B7356F14/fulltext/txt</uri>
</json:item>
</fulltext>
<metadata>
<istex:metadataXml wicri:clean="Springer, Publisher found" wicri:toSee="no header">
<istex:xmlDeclaration>version="1.0" encoding="UTF-8"</istex:xmlDeclaration>
<istex:docType PUBLIC="-//Springer-Verlag//DTD A++ V2.4//EN" URI="http://devel.springer.de/A++/V2.4/DTD/A++V2.4.dtd" name="istex:docType"></istex:docType>
<istex:document>
<Publisher>
<PublisherInfo>
<PublisherName>Springer Berlin Heidelberg</PublisherName>
<PublisherLocation>Berlin, Heidelberg</PublisherLocation>
</PublisherInfo>
<Series>
<SeriesInfo SeriesType="Series" TocLevels="0">
<SeriesID>558</SeriesID>
<SeriesPrintISSN>0302-9743</SeriesPrintISSN>
<SeriesElectronicISSN>1611-3349</SeriesElectronicISSN>
<SeriesTitle Language="En">Lecture Notes in Computer Science</SeriesTitle>
</SeriesInfo>
<SeriesHeader>
<EditorGroup>
<Editor AffiliationIDS="Aff1">
<EditorName DisplayOrder="Western">
<GivenName>David</GivenName>
<FamilyName>Hutchison</FamilyName>
</EditorName>
</Editor>
<Editor AffiliationIDS="Aff2">
<EditorName DisplayOrder="Western">
<GivenName>Takeo</GivenName>
<FamilyName>Kanade</FamilyName>
</EditorName>
</Editor>
<Editor AffiliationIDS="Aff3">
<EditorName DisplayOrder="Western">
<GivenName>Josef</GivenName>
<FamilyName>Kittler</FamilyName>
</EditorName>
</Editor>
<Editor AffiliationIDS="Aff4">
<EditorName DisplayOrder="Western">
<GivenName>Jon</GivenName>
<GivenName>M.</GivenName>
<FamilyName>Kleinberg</FamilyName>
</EditorName>
</Editor>
<Editor AffiliationIDS="Aff5">
<EditorName DisplayOrder="Western">
<GivenName>Friedemann</GivenName>
<FamilyName>Mattern</FamilyName>
</EditorName>
</Editor>
<Editor AffiliationIDS="Aff6">
<EditorName DisplayOrder="Western">
<GivenName>John</GivenName>
<GivenName>C.</GivenName>
<FamilyName>Mitchell</FamilyName>
</EditorName>
</Editor>
<Editor AffiliationIDS="Aff7">
<EditorName DisplayOrder="Western">
<GivenName>Moni</GivenName>
<FamilyName>Naor</FamilyName>
</EditorName>
</Editor>
<Editor AffiliationIDS="Aff8">
<EditorName DisplayOrder="Western">
<GivenName>Oscar</GivenName>
<FamilyName>Nierstrasz</FamilyName>
</EditorName>
</Editor>
<Editor AffiliationIDS="Aff9">
<EditorName DisplayOrder="Western">
<GivenName>C.</GivenName>
<FamilyName>Pandu Rangan</FamilyName>
</EditorName>
</Editor>
<Editor AffiliationIDS="Aff10">
<EditorName DisplayOrder="Western">
<GivenName>Bernhard</GivenName>
<FamilyName>Steffen</FamilyName>
</EditorName>
</Editor>
<Editor AffiliationIDS="Aff11">
<EditorName DisplayOrder="Western">
<GivenName>Madhu</GivenName>
<FamilyName>Sudan</FamilyName>
</EditorName>
</Editor>
<Editor AffiliationIDS="Aff12">
<EditorName DisplayOrder="Western">
<GivenName>Demetri</GivenName>
<FamilyName>Terzopoulos</FamilyName>
</EditorName>
</Editor>
<Editor AffiliationIDS="Aff13">
<EditorName DisplayOrder="Western">
<GivenName>Doug</GivenName>
<FamilyName>Tygar</FamilyName>
</EditorName>
</Editor>
<Editor AffiliationIDS="Aff14">
<EditorName DisplayOrder="Western">
<GivenName>Moshe</GivenName>
<GivenName>Y.</GivenName>
<FamilyName>Vardi</FamilyName>
</EditorName>
</Editor>
<Editor AffiliationIDS="Aff15">
<EditorName DisplayOrder="Western">
<GivenName>Gerhard</GivenName>
<FamilyName>Weikum</FamilyName>
</EditorName>
</Editor>
<Affiliation ID="Aff1">
<OrgName>Lancaster University</OrgName>
<OrgAddress>
<Country>UK</Country>
</OrgAddress>
</Affiliation>
<Affiliation ID="Aff2">
<OrgName>Carnegie Mellon University</OrgName>
<OrgAddress>
<City>Pittsburgh</City>
<State>PA</State>
<Country>USA</Country>
</OrgAddress>
</Affiliation>
<Affiliation ID="Aff3">
<OrgName>University of Surrey</OrgName>
<OrgAddress>
<City>Guildford</City>
<Country>UK</Country>
</OrgAddress>
</Affiliation>
<Affiliation ID="Aff4">
<OrgName>Cornell University</OrgName>
<OrgAddress>
<City>Ithaca</City>
<State>NY</State>
<Country>USA</Country>
</OrgAddress>
</Affiliation>
<Affiliation ID="Aff5">
<OrgName>ETH Zurich</OrgName>
<OrgAddress>
<Country>Switzerland</Country>
</OrgAddress>
</Affiliation>
<Affiliation ID="Aff6">
<OrgName>Stanford University</OrgName>
<OrgAddress>
<City>CA</City>
<Country>USA</Country>
</OrgAddress>
</Affiliation>
<Affiliation ID="Aff7">
<OrgName>Weizmann Institute of Science</OrgName>
<OrgAddress>
<City>Rehovot</City>
<Country>Israel</Country>
</OrgAddress>
</Affiliation>
<Affiliation ID="Aff8">
<OrgName>University of Bern</OrgName>
<OrgAddress>
<Country>Switzerland</Country>
</OrgAddress>
</Affiliation>
<Affiliation ID="Aff9">
<OrgName>Indian Institute of Technology</OrgName>
<OrgAddress>
<City>Madras</City>
<Country>India</Country>
</OrgAddress>
</Affiliation>
<Affiliation ID="Aff10">
<OrgName>University of Dortmund</OrgName>
<OrgAddress>
<Country>Germany</Country>
</OrgAddress>
</Affiliation>
<Affiliation ID="Aff11">
<OrgName>Massachusetts Institute of Technology</OrgName>
<OrgAddress>
<City>MA</City>
<Country>USA</Country>
</OrgAddress>
</Affiliation>
<Affiliation ID="Aff12">
<OrgName>University of California</OrgName>
<OrgAddress>
<City>Los Angeles</City>
<State>CA</State>
<Country>USA</Country>
</OrgAddress>
</Affiliation>
<Affiliation ID="Aff13">
<OrgName>University of California</OrgName>
<OrgAddress>
<City>Berkeley</City>
<State>CA</State>
<Country>USA</Country>
</OrgAddress>
</Affiliation>
<Affiliation ID="Aff14">
<OrgName>Rice University</OrgName>
<OrgAddress>
<City>Houston</City>
<State>TX</State>
<Country>USA</Country>
</OrgAddress>
</Affiliation>
<Affiliation ID="Aff15">
<OrgName>Max-Planck Institute of Computer Science</OrgName>
<OrgAddress>
<City>Saarbruecken</City>
<Country>Germany</Country>
</OrgAddress>
</Affiliation>
</EditorGroup>
</SeriesHeader>
<Book Language="En">
<BookInfo BookProductType="Proceedings" ContainsESM="No" Language="En" MediaType="eBook" NumberingDepth="2" NumberingStyle="ContentOnly" OutputMedium="All" TocLevels="0">
<BookID>978-3-540-35971-5</BookID>
<BookTitle>Flexible and Efficient Information Handling</BookTitle>
<BookSubTitle>23rd British National Conference on Databases, BNCOD 23, Belfast, Northern Ireland, UK, July 18-20, 2006. Proceedings</BookSubTitle>
<BookVolumeNumber>4042</BookVolumeNumber>
<BookSequenceNumber>4042</BookSequenceNumber>
<BookDOI>10.1007/11788911</BookDOI>
<BookTitleID>139983</BookTitleID>
<BookPrintISBN>978-3-540-35969-2</BookPrintISBN>
<BookElectronicISBN>978-3-540-35971-5</BookElectronicISBN>
<BookChapterCount>33</BookChapterCount>
<BookCopyright>
<CopyrightHolderName>Springer-Verlag Berlin Heidelberg</CopyrightHolderName>
<CopyrightYear>2006</CopyrightYear>
</BookCopyright>
<BookSubjectGroup>
<BookSubject Code="I" Type="Primary">Computer Science</BookSubject>
<BookSubject Code="I18024" Priority="1" Type="Secondary">Database Management</BookSubject>
<BookSubject Code="I18032" Priority="2" Type="Secondary">Information Storage and Retrieval</BookSubject>
<BookSubject Code="I18040" Priority="3" Type="Secondary">Information Systems Applications (incl. Internet)</BookSubject>
<SubjectCollection Code="SUCO11645">Computer Science</SubjectCollection>
</BookSubjectGroup>
<BookContext>
<SeriesID>558</SeriesID>
</BookContext>
</BookInfo>
<BookHeader>
<EditorGroup>
<Editor AffiliationIDS="Aff16">
<EditorName DisplayOrder="Western">
<GivenName>David</GivenName>
<GivenName>A.</GivenName>
<FamilyName>Bell</FamilyName>
</EditorName>
<Contact>
<Email>da.bell@qub.ac.uk</Email>
</Contact>
</Editor>
<Editor AffiliationIDS="Aff17">
<EditorName DisplayOrder="Western">
<GivenName>Jun</GivenName>
<FamilyName>Hong</FamilyName>
</EditorName>
<Contact>
<Email>j.hong@qub.ac.uk</Email>
</Contact>
</Editor>
<Affiliation ID="Aff16">
<OrgDivision>The School of Electronics, Electrical Engineering and Computer Science</OrgDivision>
<OrgName>Queen’s University Belfast</OrgName>
<OrgAddress>
<Postcode>BT7 1NN N.I.</Postcode>
<City>Belfast</City>
<Country>UK</Country>
</OrgAddress>
</Affiliation>
<Affiliation ID="Aff17">
<OrgDivision>School of Electronics, Electrical Engineering and Computer Science</OrgDivision>
<OrgName>Queen’s University Belfast</OrgName>
<OrgAddress>
<Postcode>BT7 1NN</Postcode>
<City>Belfast</City>
<Country>UK</Country>
</OrgAddress>
</Affiliation>
</EditorGroup>
</BookHeader>
<Part ID="Part1">
<PartInfo TocLevels="0">
<PartID>1</PartID>
<PartSequenceNumber>1</PartSequenceNumber>
<PartTitle>Invited Papers</PartTitle>
<PartChapterCount>2</PartChapterCount>
<PartContext>
<SeriesID>558</SeriesID>
<BookTitle>Flexible and Efficient Information Handling</BookTitle>
</PartContext>
</PartInfo>
<Chapter ID="Chap1" Language="En">
<ChapterInfo ChapterType="OriginalPaper" ContainsESM="No" NumberingDepth="2" NumberingStyle="ContentOnly" TocLevels="0">
<ChapterID>1</ChapterID>
<ChapterDOI>10.1007/11788911_1</ChapterDOI>
<ChapterSequenceNumber>1</ChapterSequenceNumber>
<ChapterTitle Language="En">The Lixto Project: Exploring New Frontiers of Web Data Extraction</ChapterTitle>
<ChapterFirstPage>1</ChapterFirstPage>
<ChapterLastPage>15</ChapterLastPage>
<ChapterCopyright>
<CopyrightHolderName>Springer-Verlag Berlin Heidelberg</CopyrightHolderName>
<CopyrightYear>2006</CopyrightYear>
</ChapterCopyright>
<ChapterGrants Type="Regular">
<MetadataGrant Grant="OpenAccess"></MetadataGrant>
<AbstractGrant Grant="OpenAccess"></AbstractGrant>
<BodyPDFGrant Grant="Restricted"></BodyPDFGrant>
<BodyHTMLGrant Grant="Restricted"></BodyHTMLGrant>
<BibliographyGrant Grant="Restricted"></BibliographyGrant>
<ESMGrant Grant="Restricted"></ESMGrant>
</ChapterGrants>
<ChapterContext>
<SeriesID>558</SeriesID>
<PartID>1</PartID>
<BookID>978-3-540-35971-5</BookID>
<BookTitle>Flexible and Efficient Information Handling</BookTitle>
</ChapterContext>
</ChapterInfo>
<ChapterHeader>
<AuthorGroup>
<Author AffiliationIDS="Aff18">
<AuthorName DisplayOrder="Western">
<GivenName>Julien</GivenName>
<FamilyName>Carme</FamilyName>
</AuthorName>
</Author>
<Author AffiliationIDS="Aff18">
<AuthorName DisplayOrder="Western">
<GivenName>Michal</GivenName>
<FamilyName>Ceresna</FamilyName>
</AuthorName>
</Author>
<Author AffiliationIDS="Aff18">
<AuthorName DisplayOrder="Western">
<GivenName>Oliver</GivenName>
<FamilyName>Frölich</FamilyName>
</AuthorName>
</Author>
<Author AffiliationIDS="Aff19">
<AuthorName DisplayOrder="Western">
<GivenName>Georg</GivenName>
<FamilyName>Gottlob</FamilyName>
</AuthorName>
</Author>
<Author AffiliationIDS="Aff18">
<AuthorName DisplayOrder="Western">
<GivenName>Tamir</GivenName>
<FamilyName>Hassan</FamilyName>
</AuthorName>
</Author>
<Author AffiliationIDS="Aff18">
<AuthorName DisplayOrder="Western">
<GivenName>Marcus</GivenName>
<FamilyName>Herzog</FamilyName>
</AuthorName>
</Author>
<Author AffiliationIDS="Aff18">
<AuthorName DisplayOrder="Western">
<GivenName>Wolfgang</GivenName>
<FamilyName>Holzinger</FamilyName>
</AuthorName>
</Author>
<Author AffiliationIDS="Aff18">
<AuthorName DisplayOrder="Western">
<GivenName>Bernhard</GivenName>
<FamilyName>Krüpl</FamilyName>
</AuthorName>
</Author>
<Affiliation ID="Aff18">
<OrgDivision>Database and Artificial Intelligence Group</OrgDivision>
<OrgName>Vienna University of Technology</OrgName>
<OrgAddress>
<Street>Favoritenstraße 9-11</Street>
<Postcode>A-1040</Postcode>
<City>Wien</City>
<Country>Austria</Country>
</OrgAddress>
</Affiliation>
<Affiliation ID="Aff19">
<OrgName>Oxford University Computing Laboratory</OrgName>
<OrgAddress>
<Street>Wolfson Building, Parks Road</Street>
<City>Oxford</City>
<Postcode>OX1 3QD</Postcode>
<Country>United Kingdom</Country>
</OrgAddress>
</Affiliation>
</AuthorGroup>
<Abstract ID="Abs1" Language="En">
<Heading>Abstract</Heading>
<Para>The Lixto project is an ongoing research effort in the area of Web data extraction. Whereas the project originally started out with the idea to develop a logic-based extraction language and a tool to visually define extraction programs from sample Web pages, the scope of the project has been extended over time. Today, new issues such as employing learning algorithms for the definition of extraction programs, automatically extracting data from Web pages featuring a table-centric visual appearance, and extracting from alternative document formats such as PDF are being investigated.</Para>
</Abstract>
<ArticleNote Type="Misc">
<SimplePara>This work is funded in part by the Austrian Federal Ministry for Transport, Innovation and Technology under the FIT-IT Semantic Systems program.</SimplePara>
</ArticleNote>
</ChapterHeader>
<NoBody></NoBody>
</Chapter>
</Part>
</Book>
</Series>
</Publisher>
</istex:document>
</istex:metadataXml>
<mods version="3.6">
<titleInfo lang="en">
<title>The Lixto Project: Exploring New Frontiers of Web Data Extraction</title>
</titleInfo>
<titleInfo type="alternative" contentType="CDATA" lang="en">
<title>The Lixto Project: Exploring New Frontiers of Web Data Extraction</title>
</titleInfo>
<name type="personal">
<namePart type="given">Julien</namePart>
<namePart type="family">Carme</namePart>
<affiliation>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</affiliation>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Michal</namePart>
<namePart type="family">Ceresna</namePart>
<affiliation>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</affiliation>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Oliver</namePart>
<namePart type="family">Frölich</namePart>
<affiliation>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</affiliation>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Georg</namePart>
<namePart type="family">Gottlob</namePart>
<affiliation>Oxford University Computing Laboratory, Wolfson Building, Parks Road, OX1 3QD, Oxford, United Kingdom</affiliation>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tamir</namePart>
<namePart type="family">Hassan</namePart>
<affiliation>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</affiliation>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Marcus</namePart>
<namePart type="family">Herzog</namePart>
<affiliation>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</affiliation>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Wolfgang</namePart>
<namePart type="family">Holzinger</namePart>
<affiliation>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</affiliation>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Bernhard</namePart>
<namePart type="family">Krüpl</namePart>
<affiliation>Database and Artificial Intelligence Group, Vienna University of Technology, Favoritenstraße 9-11, A-1040, Wien, Austria</affiliation>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<typeOfResource>text</typeOfResource>
<genre type="conference [eBooks]" displayLabel="OriginalPaper"></genre>
<originInfo>
<publisher>Springer Berlin Heidelberg</publisher>
<place>
<placeTerm type="text">Berlin, Heidelberg</placeTerm>
</place>
<dateIssued encoding="w3cdtf">2006</dateIssued>
<copyrightDate encoding="w3cdtf">2006</copyrightDate>
</originInfo>
<language>
<languageTerm type="code" authority="rfc3066">en</languageTerm>
<languageTerm type="code" authority="iso639-2b">eng</languageTerm>
</language>
<physicalDescription>
<internetMediaType>text/html</internetMediaType>
</physicalDescription>
<abstract lang="en">The Lixto project is an ongoing research effort in the area of Web data extraction. Whereas the project originally started out with the idea to develop a logic-based extraction language and a tool to visually define extraction programs from sample Web pages, the scope of the project has been extended over time. Today, new issues such as employing learning algorithms for the definition of extraction programs, automatically extracting data from Web pages featuring a table-centric visual appearance, and extracting from alternative document formats such as PDF are being investigated.</abstract>
<relatedItem type="host">
<titleInfo>
<title>Flexible and Efficient Information Handling</title>
<subTitle>23rd British National Conference on Databases, BNCOD 23, Belfast, Northern Ireland, UK, July 18-20, 2006. Proceedings</subTitle>
</titleInfo>
<name type="personal">
<namePart type="given">David</namePart>
<namePart type="given">A.</namePart>
<namePart type="family">Bell</namePart>
<affiliation>The School of Electronics, Electrical Engineering and Computer Science, Queen’s University Belfast, BT7 1NN N.I., Belfast, UK</affiliation>
<affiliation>E-mail: da.bell@qub.ac.uk</affiliation>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jun</namePart>
<namePart type="family">Hong</namePart>
<affiliation>School of Electronics, Electrical Engineering and Computer Science, Queen’s University Belfast, BT7 1NN, Belfast, UK</affiliation>
<affiliation>E-mail: j.hong@qub.ac.uk</affiliation>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<genre type="Book Series" displayLabel="Proceedings"></genre>
<originInfo>
<copyrightDate encoding="w3cdtf">2006</copyrightDate>
<issuance>monographic</issuance>
</originInfo>
<subject>
<genre>Book Subject Collection</genre>
<topic authority="SpringerSubjectCodes" authorityURI="SUCO11645">Computer Science</topic>
</subject>
<subject>
<genre>Book Subject Group</genre>
<topic authority="SpringerSubjectCodes" authorityURI="I">Computer Science</topic>
<topic authority="SpringerSubjectCodes" authorityURI="I18024">Database Management</topic>
<topic authority="SpringerSubjectCodes" authorityURI="I18032">Information Storage and Retrieval</topic>
<topic authority="SpringerSubjectCodes" authorityURI="I18040">Information Systems Applications (incl. Internet)</topic>
</subject>
<identifier type="DOI">10.1007/11788911</identifier>
<identifier type="ISBN">978-3-540-35969-2</identifier>
<identifier type="eISBN">978-3-540-35971-5</identifier>
<identifier type="ISSN">0302-9743</identifier>
<identifier type="eISSN">1611-3349</identifier>
<identifier type="BookTitleID">139983</identifier>
<identifier type="BookID">978-3-540-35971-5</identifier>
<identifier type="BookChapterCount">33</identifier>
<identifier type="BookVolumeNumber">4042</identifier>
<identifier type="BookSequenceNumber">4042</identifier>
<identifier type="PartChapterCount">2</identifier>
<part>
<date>2006</date>
<detail type="part">
<title>Invited Papers</title>
</detail>
<detail type="volume">
<number>4042</number>
<caption>vol.</caption>
</detail>
<extent unit="pages">
<start>1</start>
<end>15</end>
</extent>
</part>
<recordInfo>
<recordOrigin>Springer-Verlag Berlin Heidelberg, 2006</recordOrigin>
</recordInfo>
</relatedItem>
<relatedItem type="series">
<titleInfo>
<title>Lecture Notes in Computer Science</title>
</titleInfo>
<name type="personal">
<namePart type="given">David</namePart>
<namePart type="family">Hutchison</namePart>
<affiliation>Lancaster University, UK</affiliation>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Takeo</namePart>
<namePart type="family">Kanade</namePart>
<affiliation>Carnegie Mellon University, Pittsburgh, PA, USA</affiliation>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Josef</namePart>
<namePart type="family">Kittler</namePart>
<affiliation>University of Surrey, Guildford, UK</affiliation>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Jon</namePart>
<namePart type="given">M.</namePart>
<namePart type="family">Kleinberg</namePart>
<affiliation>Cornell University, Ithaca, NY, USA</affiliation>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Friedemann</namePart>
<namePart type="family">Mattern</namePart>
<affiliation>ETH Zurich, Switzerland</affiliation>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">John</namePart>
<namePart type="given">C.</namePart>
<namePart type="family">Mitchell</namePart>
<affiliation>Stanford University, CA, USA</affiliation>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Moni</namePart>
<namePart type="family">Naor</namePart>
<affiliation>Weizmann Institute of Science, Rehovot, Israel</affiliation>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Oscar</namePart>
<namePart type="family">Nierstrasz</namePart>
<affiliation>University of Bern, Switzerland</affiliation>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">C.</namePart>
<namePart type="family">Pandu Rangan</namePart>
<affiliation>Indian Institute of Technology, Madras, India</affiliation>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Bernhard</namePart>
<namePart type="family">Steffen</namePart>
<affiliation>University of Dortmund, Germany</affiliation>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Madhu</namePart>
<namePart type="family">Sudan</namePart>
<affiliation>Massachusetts Institute of Technology, MA, USA</affiliation>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Demetri</namePart>
<namePart type="family">Terzopoulos</namePart>
<affiliation>University of California, Los Angeles, CA, USA</affiliation>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Doug</namePart>
<namePart type="family">Tygar</namePart>
<affiliation>University of California, Berkeley, CA, USA</affiliation>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Moshe</namePart>
<namePart type="given">Y.</namePart>
<namePart type="family">Vardi</namePart>
<affiliation>Rice University, Houston, TX, USA</affiliation>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Gerhard</namePart>
<namePart type="family">Weikum</namePart>
<affiliation>Max-Planck Institute of Computer Science, Saarbruecken, Germany</affiliation>
<role>
<roleTerm type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<copyrightDate encoding="w3cdtf">2006</copyrightDate>
<issuance>serial</issuance>
</originInfo>
<identifier type="ISSN">0302-9743</identifier>
<identifier type="eISSN">1611-3349</identifier>
<identifier type="SeriesID">558</identifier>
<recordInfo>
<recordOrigin>Springer-Verlag Berlin Heidelberg, 2006</recordOrigin>
</recordInfo>
</relatedItem>
<identifier type="istex">0C718A8F5FAB0E25106D6113A0357246B7356F14</identifier>
<identifier type="DOI">10.1007/11788911_1</identifier>
<identifier type="ChapterID">1</identifier>
<identifier type="ChapterID">Chap1</identifier>
<accessCondition type="use and reproduction" contentType="copyright">Springer-Verlag Berlin Heidelberg, 2006</accessCondition>
<recordInfo>
<recordContentSource>SPRINGER</recordContentSource>
<recordOrigin>Springer-Verlag Berlin Heidelberg, 2006</recordOrigin>
</recordInfo>
</mods>
</metadata>
<enrichments>
<istex:refBibTEI uri="https://api.istex.fr/document/0C718A8F5FAB0E25106D6113A0357246B7356F14/enrichments/refBib">
<teiHeader></teiHeader>
<text>
<front></front>
<body></body>
<back>
<listBibl>
<biblStruct xml:id="b0">
<analytic>
<title level="a" type="main">Document understanding for a broad class of documents</title>
<author>
<persName>
<forename type="first">M</forename>
<surname>Aiello</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">C</forename>
<surname>Monz</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">L</forename>
<surname>Todoran</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">M</forename>
<surname>Worring</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="j">Int. J. of Document Anal. and Recog</title>
<imprint>
<biblScope unit="volume">5</biblScope>
<biblScope unit="issue">1</biblScope>
<biblScope unit="page" from="1" to="16"></biblScope>
<date type="published" when="2002"></date>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b1">
<analytic>
<title level="a" type="main">Transforming Paper Documents into XML Format with WISDOM++</title>
<author>
<persName>
<forename type="first">O</forename>
<surname>Altamura</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">F</forename>
<surname>Esposito</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">D</forename>
<surname>Malerba</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="j">Intl. J. of Doc. Anal. and Recog</title>
<imprint>
<biblScope unit="volume">4</biblScope>
<biblScope unit="issue">1</biblScope>
<biblScope unit="page" from="2" to="17"></biblScope>
<date type="published" when="2001"></date>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b2">
<analytic>
<title level="a" type="main">Visual Web Information Extraction with Lixto</title>
<author>
<persName>
<forename type="first">R</forename>
<surname>Baumgartner</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">S</forename>
<surname>Flesca</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">G</forename>
<surname>Gottlob</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="m">Proceedings of the 27th International Conference on Very Large Data Bases</title>
<meeting>the 27th International Conference on Very Large Data Bases
<address>
<addrLine>Rome, Italy</addrLine>
</address>
</meeting>
<imprint>
<date type="published" when="2001"></date>
<biblScope unit="page" from="119" to="128"></biblScope>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b3">
<analytic>
<title level="a" type="main">Automating Web Navigation in Web Data Extraction</title>
<author>
<persName>
<forename type="first">R</forename>
<surname>Baumgartner</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">M</forename>
<surname>Ceresna</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">G</forename>
<surname>Ledermüller</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="m">Proceedings of International Conference on Intelligent Agents, Web Technology and Internet Commerce</title>
<meeting>International Conference on Intelligent Agents, Web Technology and Internet Commerce
<address>
<addrLine>Vienna, Austria</addrLine>
</address>
</meeting>
<imprint>
<date type="published" when="2005"></date>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b4">
<analytic>
<title level="a" type="main">Learnability and the Vapnik-Chervonenkis dimension</title>
<author>
<persName>
<forename type="first">A</forename>
<surname>Blumer</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">A</forename>
<surname>Ehrenfeucht</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">D</forename>
<surname>Haussler</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">M</forename>
<forename type="middle">K</forename>
<surname>Warmuth</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="j">J. ACM</title>
<imprint>
<biblScope unit="volume">36</biblScope>
<biblScope unit="issue">4</biblScope>
<biblScope unit="page" from="929" to="965"></biblScope>
<date type="published" when="1989"></date>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b5">
<analytic>
<title level="a" type="main">Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery</title>
<author>
<persName>
<forename type="first">S</forename>
<surname>Chakrabarti</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">M</forename>
<surname>van den Berg</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">B</forename>
<surname>Dom</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="j">Computer Networks</title>
<imprint>
<biblScope unit="volume">31</biblScope>
<biblScope unit="page" from="11" to="16"></biblScope>
<date type="published" when="1999"></date>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b6">
<analytic>
<title level="a" type="main">Query Based Learning of XPath Fragments</title>
<author>
<persName>
<forename type="first">M</forename>
<surname>Ceresna</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">G</forename>
<surname>Gottlob</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="m">Proceedings of Dagstuhl Seminar on Machine Learning for the Semantic Web (05071)</title>
<meeting>Dagstuhl Seminar on Machine Learning for the Semantic Web (05071)
<address>
<addrLine>Dagstuhl, Germany</addrLine>
</address>
</meeting>
<imprint>
<date type="published" when="2005"></date>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b7">
<analytic>
<title level="a" type="main">Toward Semantic Understanding – An Approach Based on Information Extraction Ontologies</title>
<author>
<persName>
<forename type="first">D</forename>
<forename type="middle">W</forename>
<surname>Embley</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="m">Proceedings of the Fifteenth Australasian Database Conference</title>
<meeting>the Fifteenth Australasian Database Conference
<address>
<addrLine>Dunedin, New Zealand</addrLine>
</address>
</meeting>
<imprint>
<date type="published" when="2004"></date>
<biblScope unit="page">3</biblScope>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b8">
<analytic>
<title level="a" type="main">A Formal Comparison of Visual Web Wrapper Generators</title>
<author>
<persName>
<forename type="first">G</forename>
<surname>Gottlob</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">C</forename>
<surname>Koch</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="m">32nd Conference on Current Trends in Theory and Practice of Computer Science</title>
<meeting>
<address>
<addrLine>Měřín, Czech Republic</addrLine>
</address>
</meeting>
<imprint>
<date type="published" when="2006"></date>
<biblScope unit="page" from="30" to="48"></biblScope>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b9">
<analytic>
<title level="a" type="main">Monadic datalog and the expressive power of languages for Web information extraction</title>
<author>
<persName>
<forename type="first">G</forename>
<surname>Gottlob</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">C</forename>
<surname>Koch</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="j">J. ACM</title>
<imprint>
<biblScope unit="volume">51</biblScope>
<biblScope unit="issue">1</biblScope>
<biblScope unit="page" from="74" to="113"></biblScope>
<date type="published" when="2004"></date>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b10">
<analytic>
<title level="a" type="main">The Lixto Data Extraction Project - Back and Forth between Theory and Practice</title>
<author>
<persName>
<forename type="first">G</forename>
<surname>Gottlob</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">C</forename>
<surname>Koch</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">R</forename>
<surname>Baumgartner</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">M</forename>
<surname>Herzog</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">S</forename>
<surname>Flesca</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="m">Proceedings of the Twenty-third ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems</title>
<meeting>the Twenty-third ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems
<address>
<addrLine>Paris, France</addrLine>
</address>
</meeting>
<imprint>
<date type="published" when="2004"></date>
<biblScope unit="page" from="1" to="12"></biblScope>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b11">
<analytic>
<title level="a" type="main">Efficient algorithms for processing XPath queries</title>
<author>
<persName>
<forename type="first">G</forename>
<surname>Gottlob</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">C</forename>
<surname>Koch</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">R</forename>
<surname>Pichler</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="j">ACM Trans. Database Syst</title>
<imprint>
<biblScope unit="volume">30</biblScope>
<biblScope unit="issue">2</biblScope>
<biblScope unit="page" from="444" to="491"></biblScope>
<date type="published" when="2005"></date>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b12">
<analytic>
<title level="a" type="main">Using Graph Matching Techniques to Wrap Data from PDF Documents</title>
<author>
<persName>
<forename type="first">T</forename>
<surname>Hassan</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">R</forename>
<surname>Baumgartner</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="m">To appear in Proceedings of the 15th International World Wide Web Conference (Poster Track)</title>
<meeting>
<address>
<addrLine>Edinburgh, UK</addrLine>
</address>
</meeting>
<imprint>
<date type="published" when="2006"></date>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b13">
<monogr>
<title level="m" type="main">The Interpretation of Tables in Texts</title>
<author>
<persName>
<forename type="first">M</forename>
<surname>Hurst</surname>
</persName>
</author>
<imprint>
<date type="published" when="2000"></date>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b14">
<analytic>
<title level="a" type="main">Binary Codes Capable of Correcting Spurious Insertions and Deletions of Ones</title>
<author>
<persName>
<forename type="first">V</forename>
<forename type="middle">I</forename>
<surname>Levenshtein</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="j">Russian Problemy Peredachi Informatsii</title>
<imprint>
<biblScope unit="volume">1</biblScope>
<biblScope unit="page" from="12" to="25"></biblScope>
<date type="published" when="1965"></date>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b15">
<analytic>
<title level="a" type="main">Symbol Recognition by Error-Tolerant Subgraph Matching between Region Adjacency Graphs</title>
<author>
<persName>
<forename type="first">J</forename>
<surname>Llados</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">E</forename>
<surname>Marti</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">J</forename>
<forename type="middle">J</forename>
<surname>Villanueva</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="j">IEEE Tran. on Pattern Anal. and Mach. Intel</title>
<imprint>
<biblScope unit="volume">23</biblScope>
<biblScope unit="issue">10</biblScope>
<biblScope unit="page" from="1137" to="1143"></biblScope>
<date type="published" when="2001"></date>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b16">
<analytic>
<title level="a" type="main">The Anatomy of a Large-Scale Hypertextual Web Search Engine</title>
<author>
<persName>
<forename type="first">L</forename>
<surname>Page</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">S</forename>
<surname>Brin</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="j">Computer Networks</title>
<imprint>
<biblScope unit="volume">30</biblScope>
<biblScope unit="page" from="1" to="7"></biblScope>
<date type="published" when="1998"></date>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b17">
<analytic>
<title level="a" type="main">Automatic Selection of Table Areas in Documents for Information Extraction</title>
<author>
<persName>
<forename type="first">A</forename>
<forename type="middle">C</forename>
<surname>Silva</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">J</forename>
<surname>Alipio</surname>
</persName>
</author>
<author>
<persName>
<forename type="first">L</forename>
<surname>Torgo</surname>
</persName>
</author>
</analytic>
<monogr>
<title level="m">11th Portuguese Conference on Artificial Intelligence</title>
<imprint>
<date type="published" when="2003"></date>
<biblScope unit="page" from="460" to="465"></biblScope>
</imprint>
</monogr>
</biblStruct>
<biblStruct xml:id="b18">
<monogr>
<title level="m" type="main">XML Path Language (XPath) Version 1.0</title>
<imprint></imprint>
</monogr>
</biblStruct>
</listBibl>
</back>
</text>
</istex:refBibTEI>
</enrichments>
</istex>
</record>

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Istex/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001738 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Istex/Corpus/biblio.hfd -nk 001738 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Istex
   |étape=   Corpus
   |type=    RBID
   |clé=     ISTEX:0C718A8F5FAB0E25106D6113A0357246B7356F14
   |texte=   The Lixto Project: Exploring New Frontiers of Web Data Extraction
}}


This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024