Serveur d'exploration sur la recherche en informatique en Lorraine

Attention, ce site est en cours de développement !
Attention, site généré par des moyens informatiques à partir de corpus bruts.
Les informations ne sont donc pas validées.

Text extraction from graphical document images using sparse representation

Identifieur interne : 004A71 ( Hal/Curation ); précédent : 004A70; suivant : 004A72

Text extraction from graphical document images using sparse representation

Auteurs : Thai V. Hoang [France] ; Salvatore Tabbone [France]

Source :

RBID : Hal:inria-00494513

Abstract

A novel text extraction method from graphical document images is presented in this paper. Graphical document images containing text and graphics components are considered as two-dimensional signals by which text and graphics have different morphological characteristics. The proposed algorithm relies upon a sparse representation framework with two appropriately chosen discriminative overcomplete dictionaries, each one gives sparse representation over one type of signal and non-sparse representation over the other. Separation of text and graphics components is obtained by promoting sparse representation of input images in these two dictionaries. Some heuristic rules are used for grouping text components into text strings in post-processing steps. The proposed method overcomes the problem of touching between text and graphics. Preliminary experiments show some promising results on different types of document.

Url:
DOI: 10.1145/1815330.1815349

Links toward previous steps (curation, corpus...)


Links to Exploration step

Hal:inria-00494513

Le document en format XML

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Text extraction from graphical document images using sparse representation</title>
<author>
<name sortKey="Hoang, Thai V" sort="Hoang, Thai V" uniqKey="Hoang T" first="Thai V." last="Hoang">Thai V. Hoang</name>
<affiliation wicri:level="1">
<hal:affiliation type="researchteam" xml:id="struct-119680" status="OLD">
<orgName>Querying Graphics through Analysis and Recognition</orgName>
<orgName type="acronym">QGAR</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://qgar.loria.fr</ref>
</desc>
<listRelation>
<relation active="#struct-160" type="direct"></relation>
<relation name="UMR7503" active="#struct-441569" type="indirect"></relation>
<relation active="#struct-300009" type="indirect"></relation>
<relation active="#struct-300291" type="indirect"></relation>
<relation active="#struct-300292" type="indirect"></relation>
<relation active="#struct-300293" type="indirect"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-160" type="direct">
<org type="laboratory" xml:id="struct-160" status="OLD">
<orgName>Laboratoire Lorrain de Recherche en Informatique et ses Applications</orgName>
<orgName type="acronym">LORIA</orgName>
<desc>
<address>
<addrLine>Campus Scientifique BP 239 54506 Vandoeuvre-lès-Nancy Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.loria.fr</ref>
</desc>
<listRelation>
<relation name="UMR7503" active="#struct-441569" type="direct"></relation>
<relation active="#struct-300009" type="direct"></relation>
<relation active="#struct-300291" type="direct"></relation>
<relation active="#struct-300292" type="direct"></relation>
<relation active="#struct-300293" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle name="UMR7503" active="#struct-441569" type="indirect">
<org type="institution" xml:id="struct-441569" status="VALID">
<idno type="ISNI">0000000122597504</idno>
<idno type="IdRef">02636817X</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300009" type="indirect">
<org type="institution" xml:id="struct-300009" status="VALID">
<orgName>Institut National de Recherche en Informatique et en Automatique</orgName>
<orgName type="acronym">Inria</orgName>
<desc>
<address>
<addrLine>Domaine de VoluceauRocquencourt - BP 10578153 Le Chesnay Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/en/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300291" type="indirect">
<org type="institution" xml:id="struct-300291" status="OLD">
<orgName>Université Henri Poincaré - Nancy 1</orgName>
<orgName type="acronym">UHP</orgName>
<date type="end">2011-12-31</date>
<desc>
<address>
<addrLine>24-30 rue Lionnois, BP 60120, 54 003 NANCY cedex, France</addrLine>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300292" type="indirect">
<org type="institution" xml:id="struct-300292" status="OLD">
<orgName>Université Nancy 2</orgName>
<date type="end">2011-12-31</date>
<desc>
<address>
<addrLine>91 avenue de la Libération, BP 454, 54001 Nancy cedex</addrLine>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300293" type="indirect">
<org type="institution" xml:id="struct-300293" status="OLD">
<orgName>Institut National Polytechnique de Lorraine</orgName>
<orgName type="acronym">INPL</orgName>
<date type="end">2011-12-31</date>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">Nancy</settlement>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
</placeName>
<orgName type="university">Université Nancy 2</orgName>
<orgName type="institution" wicri:auto="newGroup">Université de Lorraine</orgName>
<placeName>
<settlement type="city">Nancy</settlement>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
</placeName>
<orgName type="university">Institut national polytechnique de Lorraine</orgName>
<orgName type="institution" wicri:auto="newGroup">Université de Lorraine</orgName>
</affiliation>
</author>
<author>
<name sortKey="Tabbone, Salvatore" sort="Tabbone, Salvatore" uniqKey="Tabbone S" first="Salvatore" last="Tabbone">Salvatore Tabbone</name>
<affiliation wicri:level="1">
<hal:affiliation type="researchteam" xml:id="struct-119680" status="OLD">
<orgName>Querying Graphics through Analysis and Recognition</orgName>
<orgName type="acronym">QGAR</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://qgar.loria.fr</ref>
</desc>
<listRelation>
<relation active="#struct-160" type="direct"></relation>
<relation name="UMR7503" active="#struct-441569" type="indirect"></relation>
<relation active="#struct-300009" type="indirect"></relation>
<relation active="#struct-300291" type="indirect"></relation>
<relation active="#struct-300292" type="indirect"></relation>
<relation active="#struct-300293" type="indirect"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-160" type="direct">
<org type="laboratory" xml:id="struct-160" status="OLD">
<orgName>Laboratoire Lorrain de Recherche en Informatique et ses Applications</orgName>
<orgName type="acronym">LORIA</orgName>
<desc>
<address>
<addrLine>Campus Scientifique BP 239 54506 Vandoeuvre-lès-Nancy Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.loria.fr</ref>
</desc>
<listRelation>
<relation name="UMR7503" active="#struct-441569" type="direct"></relation>
<relation active="#struct-300009" type="direct"></relation>
<relation active="#struct-300291" type="direct"></relation>
<relation active="#struct-300292" type="direct"></relation>
<relation active="#struct-300293" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle name="UMR7503" active="#struct-441569" type="indirect">
<org type="institution" xml:id="struct-441569" status="VALID">
<idno type="ISNI">0000000122597504</idno>
<idno type="IdRef">02636817X</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300009" type="indirect">
<org type="institution" xml:id="struct-300009" status="VALID">
<orgName>Institut National de Recherche en Informatique et en Automatique</orgName>
<orgName type="acronym">Inria</orgName>
<desc>
<address>
<addrLine>Domaine de VoluceauRocquencourt - BP 10578153 Le Chesnay Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/en/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300291" type="indirect">
<org type="institution" xml:id="struct-300291" status="OLD">
<orgName>Université Henri Poincaré - Nancy 1</orgName>
<orgName type="acronym">UHP</orgName>
<date type="end">2011-12-31</date>
<desc>
<address>
<addrLine>24-30 rue Lionnois, BP 60120, 54 003 NANCY cedex, France</addrLine>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300292" type="indirect">
<org type="institution" xml:id="struct-300292" status="OLD">
<orgName>Université Nancy 2</orgName>
<date type="end">2011-12-31</date>
<desc>
<address>
<addrLine>91 avenue de la Libération, BP 454, 54001 Nancy cedex</addrLine>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300293" type="indirect">
<org type="institution" xml:id="struct-300293" status="OLD">
<orgName>Institut National Polytechnique de Lorraine</orgName>
<orgName type="acronym">INPL</orgName>
<date type="end">2011-12-31</date>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">Nancy</settlement>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
</placeName>
<orgName type="university">Université Nancy 2</orgName>
<orgName type="institution" wicri:auto="newGroup">Université de Lorraine</orgName>
<placeName>
<settlement type="city">Nancy</settlement>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
</placeName>
<orgName type="university">Institut national polytechnique de Lorraine</orgName>
<orgName type="institution" wicri:auto="newGroup">Université de Lorraine</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">HAL</idno>
<idno type="RBID">Hal:inria-00494513</idno>
<idno type="halId">inria-00494513</idno>
<idno type="halUri">https://hal.inria.fr/inria-00494513</idno>
<idno type="url">https://hal.inria.fr/inria-00494513</idno>
<idno type="doi">10.1145/1815330.1815349</idno>
<date when="2010-06-09">2010-06-09</date>
<idno type="wicri:Area/Hal/Corpus">004A71</idno>
<idno type="wicri:Area/Hal/Curation">004A71</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Text extraction from graphical document images using sparse representation</title>
<author>
<name sortKey="Hoang, Thai V" sort="Hoang, Thai V" uniqKey="Hoang T" first="Thai V." last="Hoang">Thai V. Hoang</name>
<affiliation wicri:level="1">
<hal:affiliation type="researchteam" xml:id="struct-119680" status="OLD">
<orgName>Querying Graphics through Analysis and Recognition</orgName>
<orgName type="acronym">QGAR</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://qgar.loria.fr</ref>
</desc>
<listRelation>
<relation active="#struct-160" type="direct"></relation>
<relation name="UMR7503" active="#struct-441569" type="indirect"></relation>
<relation active="#struct-300009" type="indirect"></relation>
<relation active="#struct-300291" type="indirect"></relation>
<relation active="#struct-300292" type="indirect"></relation>
<relation active="#struct-300293" type="indirect"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-160" type="direct">
<org type="laboratory" xml:id="struct-160" status="OLD">
<orgName>Laboratoire Lorrain de Recherche en Informatique et ses Applications</orgName>
<orgName type="acronym">LORIA</orgName>
<desc>
<address>
<addrLine>Campus Scientifique BP 239 54506 Vandoeuvre-lès-Nancy Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.loria.fr</ref>
</desc>
<listRelation>
<relation name="UMR7503" active="#struct-441569" type="direct"></relation>
<relation active="#struct-300009" type="direct"></relation>
<relation active="#struct-300291" type="direct"></relation>
<relation active="#struct-300292" type="direct"></relation>
<relation active="#struct-300293" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle name="UMR7503" active="#struct-441569" type="indirect">
<org type="institution" xml:id="struct-441569" status="VALID">
<idno type="ISNI">0000000122597504</idno>
<idno type="IdRef">02636817X</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300009" type="indirect">
<org type="institution" xml:id="struct-300009" status="VALID">
<orgName>Institut National de Recherche en Informatique et en Automatique</orgName>
<orgName type="acronym">Inria</orgName>
<desc>
<address>
<addrLine>Domaine de VoluceauRocquencourt - BP 10578153 Le Chesnay Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/en/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300291" type="indirect">
<org type="institution" xml:id="struct-300291" status="OLD">
<orgName>Université Henri Poincaré - Nancy 1</orgName>
<orgName type="acronym">UHP</orgName>
<date type="end">2011-12-31</date>
<desc>
<address>
<addrLine>24-30 rue Lionnois, BP 60120, 54 003 NANCY cedex, France</addrLine>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300292" type="indirect">
<org type="institution" xml:id="struct-300292" status="OLD">
<orgName>Université Nancy 2</orgName>
<date type="end">2011-12-31</date>
<desc>
<address>
<addrLine>91 avenue de la Libération, BP 454, 54001 Nancy cedex</addrLine>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300293" type="indirect">
<org type="institution" xml:id="struct-300293" status="OLD">
<orgName>Institut National Polytechnique de Lorraine</orgName>
<orgName type="acronym">INPL</orgName>
<date type="end">2011-12-31</date>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">Nancy</settlement>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
</placeName>
<orgName type="university">Université Nancy 2</orgName>
<orgName type="institution" wicri:auto="newGroup">Université de Lorraine</orgName>
<placeName>
<settlement type="city">Nancy</settlement>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
</placeName>
<orgName type="university">Institut national polytechnique de Lorraine</orgName>
<orgName type="institution" wicri:auto="newGroup">Université de Lorraine</orgName>
</affiliation>
</author>
<author>
<name sortKey="Tabbone, Salvatore" sort="Tabbone, Salvatore" uniqKey="Tabbone S" first="Salvatore" last="Tabbone">Salvatore Tabbone</name>
<affiliation wicri:level="1">
<hal:affiliation type="researchteam" xml:id="struct-119680" status="OLD">
<orgName>Querying Graphics through Analysis and Recognition</orgName>
<orgName type="acronym">QGAR</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://qgar.loria.fr</ref>
</desc>
<listRelation>
<relation active="#struct-160" type="direct"></relation>
<relation name="UMR7503" active="#struct-441569" type="indirect"></relation>
<relation active="#struct-300009" type="indirect"></relation>
<relation active="#struct-300291" type="indirect"></relation>
<relation active="#struct-300292" type="indirect"></relation>
<relation active="#struct-300293" type="indirect"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-160" type="direct">
<org type="laboratory" xml:id="struct-160" status="OLD">
<orgName>Laboratoire Lorrain de Recherche en Informatique et ses Applications</orgName>
<orgName type="acronym">LORIA</orgName>
<desc>
<address>
<addrLine>Campus Scientifique BP 239 54506 Vandoeuvre-lès-Nancy Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.loria.fr</ref>
</desc>
<listRelation>
<relation name="UMR7503" active="#struct-441569" type="direct"></relation>
<relation active="#struct-300009" type="direct"></relation>
<relation active="#struct-300291" type="direct"></relation>
<relation active="#struct-300292" type="direct"></relation>
<relation active="#struct-300293" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle name="UMR7503" active="#struct-441569" type="indirect">
<org type="institution" xml:id="struct-441569" status="VALID">
<idno type="ISNI">0000000122597504</idno>
<idno type="IdRef">02636817X</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300009" type="indirect">
<org type="institution" xml:id="struct-300009" status="VALID">
<orgName>Institut National de Recherche en Informatique et en Automatique</orgName>
<orgName type="acronym">Inria</orgName>
<desc>
<address>
<addrLine>Domaine de VoluceauRocquencourt - BP 10578153 Le Chesnay Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/en/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300291" type="indirect">
<org type="institution" xml:id="struct-300291" status="OLD">
<orgName>Université Henri Poincaré - Nancy 1</orgName>
<orgName type="acronym">UHP</orgName>
<date type="end">2011-12-31</date>
<desc>
<address>
<addrLine>24-30 rue Lionnois, BP 60120, 54 003 NANCY cedex, France</addrLine>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300292" type="indirect">
<org type="institution" xml:id="struct-300292" status="OLD">
<orgName>Université Nancy 2</orgName>
<date type="end">2011-12-31</date>
<desc>
<address>
<addrLine>91 avenue de la Libération, BP 454, 54001 Nancy cedex</addrLine>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle active="#struct-300293" type="indirect">
<org type="institution" xml:id="struct-300293" status="OLD">
<orgName>Institut National Polytechnique de Lorraine</orgName>
<orgName type="acronym">INPL</orgName>
<date type="end">2011-12-31</date>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">Nancy</settlement>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
</placeName>
<orgName type="university">Université Nancy 2</orgName>
<orgName type="institution" wicri:auto="newGroup">Université de Lorraine</orgName>
<placeName>
<settlement type="city">Nancy</settlement>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
</placeName>
<orgName type="university">Institut national polytechnique de Lorraine</orgName>
<orgName type="institution" wicri:auto="newGroup">Université de Lorraine</orgName>
</affiliation>
</author>
</analytic>
<idno type="DOI">10.1145/1815330.1815349</idno>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">A novel text extraction method from graphical document images is presented in this paper. Graphical document images containing text and graphics components are considered as two-dimensional signals by which text and graphics have different morphological characteristics. The proposed algorithm relies upon a sparse representation framework with two appropriately chosen discriminative overcomplete dictionaries, each one gives sparse representation over one type of signal and non-sparse representation over the other. Separation of text and graphics components is obtained by promoting sparse representation of input images in these two dictionaries. Some heuristic rules are used for grouping text components into text strings in post-processing steps. The proposed method overcomes the problem of touching between text and graphics. Preliminary experiments show some promising results on different types of document.</div>
</front>
</TEI>
<hal api="V3">
<titleStmt>
<title xml:lang="en">Text extraction from graphical document images using sparse representation</title>
<author role="aut">
<persName>
<forename type="first">Thai V.</forename>
<surname>Hoang</surname>
</persName>
<email>vanthai.hoang@loria.fr</email>
<idno type="idhal">hoang</idno>
<idno type="halauthor">446005</idno>
<affiliation ref="#struct-119680"></affiliation>
</author>
<author role="aut">
<persName>
<forename type="first">Salvatore</forename>
<surname>Tabbone</surname>
</persName>
<email>tabbone@loria.fr</email>
<idno type="halauthor">59043</idno>
<affiliation ref="#struct-119680"></affiliation>
</author>
<editor role="depositor">
<persName>
<forename>Thai V.</forename>
<surname>Hoang</surname>
</persName>
<email>hoang@igbmc.fr</email>
</editor>
</titleStmt>
<editionStmt>
<edition n="v1" type="current">
<date type="whenSubmitted">2010-06-23 14:23:01</date>
<date type="whenModified">2016-05-19 01:04:38</date>
<date type="whenReleased">2010-06-23 17:19:19</date>
<date type="whenProduced">2010-06-09</date>
<date type="whenEndEmbargoed">2010-06-23</date>
<ref type="file" target="https://hal.inria.fr/inria-00494513/document">
<date notBefore="2010-06-23"></date>
</ref>
<ref type="file" subtype="author" n="1" target="https://hal.inria.fr/inria-00494513/file/DAS2010.pdf">
<date notBefore="2010-06-23"></date>
</ref>
</edition>
<respStmt>
<resp>contributor</resp>
<name key="145187">
<persName>
<forename>Thai V.</forename>
<surname>Hoang</surname>
</persName>
<email>hoang@igbmc.fr</email>
</name>
</respStmt>
</editionStmt>
<publicationStmt>
<distributor>CCSD</distributor>
<idno type="halId">inria-00494513</idno>
<idno type="halUri">https://hal.inria.fr/inria-00494513</idno>
<idno type="halBibtex">hoang:inria-00494513</idno>
<idno type="halRefHtml">International Workshop on Document Analysis Systems - DAS'2010, Jun 2010, Boston, United States. ACM, pp.143-150, 2010, ACM International Conference Proceeding Series; Proceedings of the 8th IAPR International Workshop on Document Analysis Systems. <10.1145/1815330.1815349></idno>
<idno type="halRef">International Workshop on Document Analysis Systems - DAS'2010, Jun 2010, Boston, United States. ACM, pp.143-150, 2010, ACM International Conference Proceeding Series; Proceedings of the 8th IAPR International Workshop on Document Analysis Systems. <10.1145/1815330.1815349></idno>
</publicationStmt>
<seriesStmt>
<idno type="stamp" n="CNRS">CNRS - Centre national de la recherche scientifique</idno>
<idno type="stamp" n="INRIA">INRIA - Institut National de Recherche en Informatique et en Automatique</idno>
<idno type="stamp" n="INPL">Institut National Polytechnique de Lorraine</idno>
<idno type="stamp" n="LORIA2">Publications du LORIA</idno>
<idno type="stamp" n="LABO-LORIA-SET" p="LORIA">LABO-LORIA-SET</idno>
<idno type="stamp" n="LORIA">LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications</idno>
<idno type="stamp" n="LORIA-TALC" p="LORIA">Traitement automatique des langues et des connaissances</idno>
<idno type="stamp" n="UNIV-LORRAINE">Université de Lorraine</idno>
</seriesStmt>
<notesStmt>
<note type="audience" n="2">International</note>
<note type="invited" n="0">No</note>
<note type="popular" n="0">No</note>
<note type="peer" n="1">Yes</note>
<note type="proceedings" n="1">Yes</note>
</notesStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Text extraction from graphical document images using sparse representation</title>
<author role="aut">
<persName>
<forename type="first">Thai V.</forename>
<surname>Hoang</surname>
</persName>
<email>vanthai.hoang@loria.fr</email>
<idno type="idHal">hoang</idno>
<idno type="halAuthorId">446005</idno>
<affiliation ref="#struct-119680"></affiliation>
</author>
<author role="aut">
<persName>
<forename type="first">Salvatore</forename>
<surname>Tabbone</surname>
</persName>
<email>tabbone@loria.fr</email>
<idno type="halAuthorId">59043</idno>
<affiliation ref="#struct-119680"></affiliation>
</author>
</analytic>
<monogr>
<meeting>
<title>International Workshop on Document Analysis Systems - DAS'2010</title>
<date type="start">2010-06-09</date>
<date type="end">2010-06-11</date>
<settlement>Boston</settlement>
<country key="US">United States</country>
</meeting>
<imprint>
<publisher>ACM</publisher>
<biblScope unit="serie"></biblScope>
<biblScope unit="pp">143-150</biblScope>
<date type="datePub">2010-06-09</date>
</imprint>
</monogr>
<idno type="doi">10.1145/1815330.1815349</idno>
</biblStruct>
</sourceDesc>
<profileDesc>
<langUsage>
<language ident="en">English</language>
</langUsage>
<textClass>
<classCode scheme="halDomain" n="info.info-ti">Computer Science [cs]/Image Processing</classCode>
<classCode scheme="halTypology" n="COMM">Conference papers</classCode>
</textClass>
<abstract xml:lang="en">A novel text extraction method from graphical document images is presented in this paper. Graphical document images containing text and graphics components are considered as two-dimensional signals by which text and graphics have different morphological characteristics. The proposed algorithm relies upon a sparse representation framework with two appropriately chosen discriminative overcomplete dictionaries, each one gives sparse representation over one type of signal and non-sparse representation over the other. Separation of text and graphics components is obtained by promoting sparse representation of input images in these two dictionaries. Some heuristic rules are used for grouping text components into text strings in post-processing steps. The proposed method overcomes the problem of touching between text and graphics. Preliminary experiments show some promising results on different types of document.</abstract>
</profileDesc>
</hal>
</record>

Pour manipuler ce document sous Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/Hal/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 004A71 | SxmlIndent | more

Ou

HfdSelect -h $EXPLOR_AREA/Data/Hal/Curation/biblio.hfd -nk 004A71 | SxmlIndent | more

Pour mettre un lien sur cette page dans le réseau Wicri

{{Explor lien
   |wiki=    Wicri/Lorraine
   |area=    InforLorV4
   |flux=    Hal
   |étape=   Curation
   |type=    RBID
   |clé=     Hal:inria-00494513
   |texte=   Text extraction from graphical document images using sparse representation
}}

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Jun 10 21:56:28 2019. Site generation: Fri Feb 25 15:29:27 2022