Exploration server for computer science research in Lorraine

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Pattern-Based Approach to Table Extraction

Internal identifier: 003A51 (Istex/Curation); previous: 003A50; next: 003A52

Authors: K. C. Santosh [France]; Abdel Belaïd [France]

RBID : ISTEX:F624E5693265B0B7AB42EEE1FB3BF2E3FE659AF0

Abstract

In this paper, we present a client-driven approach to automatically extracting information from tables in document images. We start with a graph-based representation of a set of key fields selected by clients and perform graph mining on a document to learn them and produce a model. Such models are intended to extract information when clients are absent. To avoid the general NP-hard problem, our graph matching relies on relation assignment to check whether pairs of nodes are semantically identical. We have validated the concept on a real-world industrial problem.
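
The abstract does not spell out how relation assignment works, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes, hypothetically, that each key field is a labelled node with a centre point in image coordinates, and that the relation between two nodes is a coarse spatial label. The names `spatial_relation`, `build_relations` and `matches`, the relation vocabulary, and the field labels are all invented for this example. Because nodes are compared by label and relation rather than by searching over all possible node assignments, the check stays polynomial in the number of fields.

```python
from itertools import permutations


def spatial_relation(a, b):
    """Coarse relation of point a relative to point b (hypothetical vocabulary).

    Points are (x, y) centres in image coordinates, with y growing downwards.
    """
    dx, dy = b[0] - a[0], b[1] - a[1]
    if abs(dx) >= abs(dy):
        return "left_of" if dx > 0 else "right_of"
    return "above" if dy > 0 else "below"


def build_relations(fields):
    """Turn {label: (x, y)} into a set of (label_a, relation, label_b) triples."""
    return {
        (la, spatial_relation(fields[la], fields[lb]), lb)
        for la, lb in permutations(fields, 2)
    }


def matches(pattern, document_fields):
    """True if every relation triple of the pattern also holds in the document."""
    return pattern <= build_relations(document_fields)


# Pattern built from key fields a client might have selected (assumed data).
model = {"Item": (10, 50), "Qty": (120, 50), "Total": (120, 200)}
pattern = build_relations(model)

# A new document with the same layout plus an unrelated extra field.
invoice = {"Item": (15, 80), "Qty": (140, 82), "Total": (138, 260), "Date": (300, 10)}
print(matches(pattern, invoice))  # True

# A document where "Qty" sits below "Item" instead of to its right.
other = {"Item": (15, 80), "Qty": (15, 200), "Total": (138, 260)}
print(matches(pattern, other))  # False
```

In this sketch, extra fields in the document do not prevent a match, while a missing or displaced key field does. A real system would also need tolerance for layout noise and for OCR errors in labels, which this example does not address.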

DOI: 10.1007/978-3-642-38628-2_91

Links to Exploration step

ISTEX:F624E5693265B0B7AB42EEE1FB3BF2E3FE659AF0

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Pattern-Based Approach to Table Extraction</title>
<author>
<name sortKey="Santosh, K C" sort="Santosh, K C" uniqKey="Santosh K" first="K. C." last="Santosh">K. C. Santosh</name>
<affiliation wicri:level="1">
<mods:affiliation>Loria – Université de Lorraine, Loria Campus Scientifique, BP 239, 54506, Nancy Cedex, France</mods:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>Loria – Université de Lorraine, Loria Campus Scientifique, BP 239, 54506, Nancy Cedex</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<mods:affiliation>E-mail: santosh.kc@loria.fr</mods:affiliation>
<country wicri:rule="url">France</country>
</affiliation>
</author>
<author>
<name sortKey="Belaid, Abdel" sort="Belaid, Abdel" uniqKey="Belaid A" first="Abdel" last="Belaïd">Abdel Belaïd</name>
<affiliation wicri:level="1">
<mods:affiliation>Loria – Université de Lorraine, Loria Campus Scientifique, BP 239, 54506, Nancy Cedex, France</mods:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>Loria – Université de Lorraine, Loria Campus Scientifique, BP 239, 54506, Nancy Cedex</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<mods:affiliation>E-mail: abdel.belaid@loria.fr</mods:affiliation>
<country wicri:rule="url">France</country>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:F624E5693265B0B7AB42EEE1FB3BF2E3FE659AF0</idno>
<date when="2013" year="2013">2013</date>
<idno type="doi">10.1007/978-3-642-38628-2_91</idno>
<idno type="url">https://api.istex.fr/ark:/67375/HCB-6BW59TPP-8/fulltext.pdf</idno>
<idno type="wicri:Area/Istex/Corpus">003A95</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">003A95</idno>
<idno type="wicri:Area/Istex/Curation">003A51</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Pattern-Based Approach to Table Extraction</title>
<author>
<name sortKey="Santosh, K C" sort="Santosh, K C" uniqKey="Santosh K" first="K. C." last="Santosh">K. C. Santosh</name>
<affiliation wicri:level="1">
<mods:affiliation>Loria – Université de Lorraine, Loria Campus Scientifique, BP 239, 54506, Nancy Cedex, France</mods:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>Loria – Université de Lorraine, Loria Campus Scientifique, BP 239, 54506, Nancy Cedex</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<mods:affiliation>E-mail: santosh.kc@loria.fr</mods:affiliation>
<country wicri:rule="url">France</country>
</affiliation>
</author>
<author>
<name sortKey="Belaid, Abdel" sort="Belaid, Abdel" uniqKey="Belaid A" first="Abdel" last="Belaïd">Abdel Belaïd</name>
<affiliation wicri:level="1">
<mods:affiliation>Loria – Université de Lorraine, Loria Campus Scientifique, BP 239, 54506, Nancy Cedex, France</mods:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>Loria – Université de Lorraine, Loria Campus Scientifique, BP 239, 54506, Nancy Cedex</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<mods:affiliation>E-mail: abdel.belaid@loria.fr</mods:affiliation>
<country wicri:rule="url">France</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s" type="main" xml:lang="en">Lecture Notes in Computer Science</title>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: In this paper, we present a client-driven approach to automatically extracting information from tables in document images. We start with a graph-based representation of a set of key fields selected by clients and perform graph mining on a document to learn them and produce a model. Such models are intended to extract information when clients are absent. To avoid the general NP-hard problem, our graph matching relies on relation assignment to check whether pairs of nodes are semantically identical. We have validated the concept on a real-world industrial problem.</div>
</front>
</TEI>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/Istex/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 003A51 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Istex/Curation/biblio.hfd -nk 003A51 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Lorraine
   |area=    InforLorV4
   |flux=    Istex
   |étape=   Curation
   |type=    RBID
   |clé=     ISTEX:F624E5693265B0B7AB42EEE1FB3BF2E3FE659AF0
   |texte=   Pattern-Based Approach to Table Extraction
}}

This area was generated with Dilib version V0.6.33.
Data generation: Mon Jun 10 21:56:28 2019. Site generation: Fri Feb 25 15:29:27 2022