OCR Exploration Server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Boundary Extraction from Gray-Scale Document Images Based on Surface Data Structures

Internal identifier: 002366 (Main/Merge); previous: 002365; next: 002367

Author: Hirobumi Nishida [Japan]

Source: Graphical Models and Image Processing, vol. 60, no. 1, pp. 35-45 (ISSN 1077-3169)

RBID : ISTEX:5BB038E1D76DFE40A97E5310AAB865174355D26B

Abstract

Recognition of documents of poor image quality is a challenging and important problem from a practical point of view. In traditional approaches, features such as center lines of strokes or contours are extracted from binary images obtained by thresholding the gray-scale intensity images. Wang and Pavlidis (IEEE Trans. Pattern Anal. Machine Intell. 15(10), 1993, 1053–1067) have recently pointed out that effective features for recognition should be extracted directly from original gray-scale intensity images in order to avoid a significant amount of information loss caused by binarization. In this paper, a novel method is presented for extracting closed boundaries of document components such as characters and symbols directly from gray-scale document images, based on the surface data structures and structural features. The gray-scale document image can be treated as a surface defined over a two-dimensional space by regarding intensity values associated with pixels as height. This method is based on a simple model that assumes a closed boundary of document components can be approximated as a series of horizontal (parallel to the image plane) line segments and can be extracted by linking surface components with steep gradients based on configurations of intersections of horizontal planes and surface components. Furthermore, the gray-scale image can be converted into a binary image based on extracted boundaries so that any recognition system can accept output of the proposed algorithm as input. The performance of the proposed algorithm is compared with some binarization algorithms based on global and local thresholding of intensity values and is shown to be effective for improving recognition accuracy for very poor quality data.
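
The surface view described in the abstract can be illustrated with a small sketch. This is not the paper's boundary-extraction algorithm: it only reads a synthetic gray-scale image as a height map, estimates how steep that surface is at each pixel, and contrasts the result with a plain global threshold. The function names `global_threshold` and `slope_magnitude`, the synthetic 7x7 page, and the numeric thresholds are all assumptions made for illustration.

```python
def global_threshold(image, level):
    """Binarize with a single global level: True marks dark (ink) pixels."""
    return [[value < level for value in row] for row in image]


def slope_magnitude(image):
    """Treat intensity as height and estimate the surface slope per pixel.

    Uses central differences inside the image and one-sided differences
    at the border. Assumes the image has at least 2 rows and 2 columns.
    """
    rows, cols = len(image), len(image[0])
    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(r - 1, 0), min(r + 1, rows - 1)
            c0, c1 = max(c - 1, 0), min(c + 1, cols - 1)
            d_row = (image[r1][c] - image[r0][c]) / (r1 - r0)
            d_col = (image[r][c1] - image[r][c0]) / (c1 - c0)
            result[r][c] = (d_row ** 2 + d_col ** 2) ** 0.5
    return result


# Hypothetical test page: bright background (200) with a dark 3x3 blob (50).
page = [[200.0] * 7 for _ in range(7)]
for r in range(2, 5):
    for c in range(2, 5):
        page[r][c] = 50.0

ink = global_threshold(page, level=128)
slope = slope_magnitude(page)
# Illustrative cutoff for "steep"; the paper does not give this value.
steep = [[s >= 50.0 for s in row] for row in slope]

ink_count = sum(sum(row) for row in ink)
print("ink pixels:", ink_count)
print("steep at blob corner:", steep[2][2], "steep at blob center:", steep[3][3])
```

In this toy case the steep pixels form a ring around the blob while its flat interior and the flat background are not marked, which is the kind of information a single global threshold does not expose.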

Url: https://api.istex.fr/document/5BB038E1D76DFE40A97E5310AAB865174355D26B/fulltext/pdf
DOI: 10.1006/gmip.1997.0452

Links to Exploration step

ISTEX:5BB038E1D76DFE40A97E5310AAB865174355D26B

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Boundary Extraction from Gray-Scale Document Images Based on Surface Data Structures</title>
<author>
<name sortKey="Nishida, Hirobumi" sort="Nishida, Hirobumi" uniqKey="Nishida H" first="Hirobumi" last="Nishida">Hirobumi Nishida</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:5BB038E1D76DFE40A97E5310AAB865174355D26B</idno>
<date when="1998" year="1998">1998</date>
<idno type="doi">10.1006/gmip.1997.0452</idno>
<idno type="url">https://api.istex.fr/document/5BB038E1D76DFE40A97E5310AAB865174355D26B/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000382</idno>
<idno type="wicri:Area/Istex/Curation">000377</idno>
<idno type="wicri:Area/Istex/Checkpoint">001753</idno>
<idno type="wicri:doubleKey">1077-3169:1998:Nishida H:boundary:extraction:from</idno>
<idno type="wicri:Area/Main/Merge">002366</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Boundary Extraction from Gray-Scale Document Images Based on Surface Data Structures</title>
<author>
<name sortKey="Nishida, Hirobumi" sort="Nishida, Hirobumi" uniqKey="Nishida H" first="Hirobumi" last="Nishida">Hirobumi Nishida</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Japon</country>
<wicri:regionArea>Software Research Center, Ricoh Co., Ltd. 1-1-17 Koishikawa, Bunkyo-ku, Tokyo, 112</wicri:regionArea>
<wicri:noRegion>112</wicri:noRegion>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">Graphical Models and Image Processing</title>
<title level="j" type="abbrev">YGMIP</title>
<idno type="ISSN">1077-3169</idno>
<imprint>
<publisher>ELSEVIER</publisher>
<date type="published" when="1997">1997</date>
<biblScope unit="volume">60</biblScope>
<biblScope unit="issue">1</biblScope>
<biblScope unit="page" from="35">35</biblScope>
<biblScope unit="page" to="45">45</biblScope>
</imprint>
<idno type="ISSN">1077-3169</idno>
</series>
<idno type="istex">5BB038E1D76DFE40A97E5310AAB865174355D26B</idno>
<idno type="DOI">10.1006/gmip.1997.0452</idno>
<idno type="PII">S1077-3169(97)90452-4</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">1077-3169</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Recognition of documents of poor image quality is a challenging and important problem from a practical point of view. In traditional approaches, features such as center lines of strokes or contours are extracted from binary images obtained by thresholding the gray-scale intensity images. Wang and Pavlidis (IEEE Trans. Pattern Anal. Machine Intell. 15(10), 1993, 1053–1067) have recently pointed out that effective features for recognition should be extracted directly from original gray-scale intensity images in order to avoid a significant amount of information loss caused by binarization. In this paper, a novel method is presented for extracting closed boundaries of document components such as characters and symbols directly from gray-scale document images, based on the surface data structures and structural features. The gray-scale document image can be treated as a surface defined over a two-dimensional space by regarding intensity values associated with pixels as height. This method is based on a simple model that assumes a closed boundary of document components can be approximated as a series of horizontal (parallel to the image plane) line segments and can be extracted by linking surface components with steep gradients based on configurations of intersections of horizontal planes and surface components. Furthermore, the gray-scale image can be converted into a binary image based on extracted boundaries so that any recognition system can accept output of the proposed algorithm as input. The performance of the proposed algorithm is compared with some binarization algorithms based on global and local thresholding of intensity values and is shown to be effective for improving recognition accuracy for very poor quality data.</div>
</front>
</TEI>
</record>
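
As a hedged illustration of reading fields from this record, the sketch below parses only the `<imprint>` fragment with Python's standard `xml.etree.ElementTree`. The full record as printed here uses the `wicri:` prefix without a visible namespace declaration, so a namespace-aware parser would likely reject it unless that declaration is supplied; that is why only a prefix-free fragment is used. The helper name `read_imprint` and the output keys `first_page` and `last_page` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the <imprint> element of the record above.
IMPRINT = """<imprint>
<publisher>ELSEVIER</publisher>
<date type="published" when="1997">1997</date>
<biblScope unit="volume">60</biblScope>
<biblScope unit="issue">1</biblScope>
<biblScope unit="page" from="35">35</biblScope>
<biblScope unit="page" to="45">45</biblScope>
</imprint>"""


def read_imprint(xml_text):
    """Collect publisher, volume, issue, and page range from an <imprint>."""
    root = ET.fromstring(xml_text)
    info = {"publisher": root.findtext("publisher")}
    for scope in root.findall("biblScope"):
        unit = scope.get("unit")
        if unit == "page":
            # The record splits the page range across two elements,
            # distinguished by a "from" or "to" attribute.
            key = "first_page" if scope.get("from") else "last_page"
        else:
            key = unit
        info[key] = scope.text
    return info


imprint = read_imprint(IMPRINT)
print(imprint)
```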

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002366 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 002366 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Merge
   |type=    RBID
   |clé=     ISTEX:5BB038E1D76DFE40A97E5310AAB865174355D26B
   |texte=   Boundary Extraction from Gray-Scale Document Images Based on Surface Data Structures
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024