Pattern-Based Approach to Table Extraction
Internal identifier: 003A51 (Istex/Curation); previous: 003A50; next: 003A52
Authors: K. C. Santosh [France]; Abdel Belaïd [France]
Source:
- Lecture Notes in Computer Science [0302-9743]
Abstract
In this paper, we present a client-driven approach to automatically extracting the information content of tables in document images. We start with a graph-based representation of a set of key-fields selected by clients and perform graph mining in a document to learn them and produce a model. Such models are intended to extract information content in the absence of clients. To avoid the general NP-hard problem, our graph matching is based on relation assignment, which checks whether pairs of nodes are semantically identical. We have validated the concept on a real-world industrial problem.
Url: https://api.istex.fr/ark:/67375/HCB-6BW59TPP-8/fulltext.pdf
DOI: 10.1007/978-3-642-38628-2_91
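The pairwise, relation-based matching mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the node fields (`id`, `type`, `x`, `y`), the four relation labels and the helper names `relation` and `match_pairs` are assumptions made purely for illustration. The idea shown is that each pattern edge is checked against pairs of document nodes independently, which keeps the cost polynomial instead of solving general graph matching.

```python
from itertools import permutations

def relation(a, b):
    """Coarse spatial relation from node a to node b (hypothetical labels).

    Coordinates follow image conventions: y grows downward.
    """
    dx, dy = b["x"] - a["x"], b["y"] - a["y"]
    if abs(dy) <= abs(dx):
        return "left-of" if dx > 0 else "right-of"
    return "above" if dy > 0 else "below"

def match_pairs(pattern, doc_nodes):
    """For each pattern triple (type_a, relation, type_b), list the ordered
    pairs of document node ids that have the same types and relation."""
    matches = {}
    for type_a, rel, type_b in pattern:
        matches[(type_a, rel, type_b)] = [
            (a["id"], b["id"])
            for a, b in permutations(doc_nodes, 2)
            if a["type"] == type_a
            and b["type"] == type_b
            and relation(a, b) == rel
        ]
    return matches

# Hypothetical key-fields: one label with an amount to its right and one below.
doc_nodes = [
    {"id": 1, "type": "label", "x": 0, "y": 0},
    {"id": 2, "type": "amount", "x": 10, "y": 0},
    {"id": 3, "type": "amount", "x": 0, "y": 10},
]
pattern = [("label", "left-of", "amount")]
result = match_pairs(pattern, doc_nodes)
print(result)
```

With `n` document nodes and `p` pattern triples, this check examines on the order of `p * n * n` pairs, which is the kind of saving the abstract alludes to when it avoids the general NP-hard formulation.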
Links toward previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 003A95
Links to Exploration step
ISTEX:F624E5693265B0B7AB42EEE1FB3BF2E3FE659AF0
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Pattern-Based Approach to Table Extraction</title>
<author><name sortKey="Santosh, K C" sort="Santosh, K C" uniqKey="Santosh K" first="K. C." last="Santosh">K. C. Santosh</name>
<affiliation wicri:level="1"><mods:affiliation>Loria – Université de Lorraine, Loria Campus Scientifique, BP 239, 54506, Nancy Cedex, France</mods:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>Loria – Université de Lorraine, Loria Campus Scientifique, BP 239, 54506, Nancy Cedex</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1"><mods:affiliation>E-mail: santosh.kc@loria.fr</mods:affiliation>
<country wicri:rule="url">France</country>
</affiliation>
</author>
<author><name sortKey="Belaid, Abdel" sort="Belaid, Abdel" uniqKey="Belaid A" first="Abdel" last="Belaïd">Abdel Belaïd</name>
<affiliation wicri:level="1"><mods:affiliation>Loria – Université de Lorraine, Loria Campus Scientifique, BP 239, 54506, Nancy Cedex, France</mods:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>Loria – Université de Lorraine, Loria Campus Scientifique, BP 239, 54506, Nancy Cedex</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1"><mods:affiliation>E-mail: abdel.belaid@loria.fr</mods:affiliation>
<country wicri:rule="url">France</country>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:F624E5693265B0B7AB42EEE1FB3BF2E3FE659AF0</idno>
<date when="2013" year="2013">2013</date>
<idno type="doi">10.1007/978-3-642-38628-2_91</idno>
<idno type="url">https://api.istex.fr/ark:/67375/HCB-6BW59TPP-8/fulltext.pdf</idno>
<idno type="wicri:Area/Istex/Corpus">003A95</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">003A95</idno>
<idno type="wicri:Area/Istex/Curation">003A51</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Pattern-Based Approach to Table Extraction</title>
<author><name sortKey="Santosh, K C" sort="Santosh, K C" uniqKey="Santosh K" first="K. C." last="Santosh">K. C. Santosh</name>
<affiliation wicri:level="1"><mods:affiliation>Loria – Université de Lorraine, Loria Campus Scientifique, BP 239, 54506, Nancy Cedex, France</mods:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>Loria – Université de Lorraine, Loria Campus Scientifique, BP 239, 54506, Nancy Cedex</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1"><mods:affiliation>E-mail: santosh.kc@loria.fr</mods:affiliation>
<country wicri:rule="url">France</country>
</affiliation>
</author>
<author><name sortKey="Belaid, Abdel" sort="Belaid, Abdel" uniqKey="Belaid A" first="Abdel" last="Belaïd">Abdel Belaïd</name>
<affiliation wicri:level="1"><mods:affiliation>Loria – Université de Lorraine, Loria Campus Scientifique, BP 239, 54506, Nancy Cedex, France</mods:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>Loria – Université de Lorraine, Loria Campus Scientifique, BP 239, 54506, Nancy Cedex</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1"><mods:affiliation>E-mail: abdel.belaid@loria.fr</mods:affiliation>
<country wicri:rule="url">France</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s" type="main" xml:lang="en">Lecture Notes in Computer Science</title>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: In this paper, we address a client-driven approach to automatically extract information content within the table in document images. We start with a graph-based representation of a set of key-fields selected by clients and perform graph mining in a document in order to learn them to produce a model. Such models are aimed to use to extract information content in the absence of clients. To avoid NP-hard general problem, our graph matching is based on relation assignment to see whether pairs of nodes are semantically identical. We have validated the concept by using a real-world industrial problem.</div>
</front>
</TEI>
</record>
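The record above uses the `wicri:` and `mods:` prefixes without declaring them, so a strict XML parser rejects it as it stands. The following is a minimal sketch of one way to read the title, authors and DOI with Python's standard library. The wrapper element and the two namespace URIs are placeholders assumed here for illustration; they are not defined by the record or by Dilib.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumptions): the record does not declare them.
NS_DECL = 'xmlns:wicri="urn:example:wicri" xmlns:mods="urn:example:mods"'

def parse_record(record_xml):
    """Wrap the fragment so its prefixes are bound, then pull out basic fields."""
    root = ET.fromstring(f"<wrapper {NS_DECL}>{record_xml}</wrapper>")
    return {
        "title": root.findtext(".//titleStmt/title"),
        "authors": [n.text for n in root.findall(".//titleStmt/author/name")],
        "doi": root.findtext(".//publicationStmt/idno[@type='doi']"),
    }

# Trimmed copy of the record shown above, kept short for the example.
SAMPLE = """<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc>
<titleStmt><title xml:lang="en">Pattern-Based Approach to Table Extraction</title>
<author><name sortKey="Santosh, K C">K. C. Santosh</name></author>
<author><name sortKey="Belaid, Abdel">Abdel Belaïd</name></author>
</titleStmt>
<publicationStmt><idno type="doi">10.1007/978-3-642-38628-2_91</idno></publicationStmt>
</fileDesc></teiHeader></TEI></record>"""

info = parse_record(SAMPLE)
print(info)
```

Wrapping the fragment leaves the record text itself unchanged, which is why it is used here instead of stripping the prefixes.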
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/Istex/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 003A51 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Istex/Curation/biblio.hfd -nk 003A51 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Wicri/Lorraine |area= InforLorV4 |flux= Istex |étape= Curation |type= RBID |clé= ISTEX:F624E5693265B0B7AB42EEE1FB3BF2E3FE659AF0 |texte= Pattern-Based Approach to Table Extraction }}
This area was generated with Dilib version V0.6.33.