Boundary Extraction from Gray-Scale Document Images Based on Surface Data Structures
Internal identifier: 002366 (Main/Merge); previous: 002365; next: 002367
Authors: Hirobumi Nishida [Japan]
Source:
- Graphical Models and Image Processing [ 1077-3169 ] ; 1997.
Abstract
Recognition of documents of poor image quality is a challenging and important problem from a practical point of view. In traditional approaches, features such as center lines of strokes or contours are extracted from binary images obtained by thresholding the gray-scale intensity images. Wang and Pavlidis (IEEE Trans. Pattern Anal. Machine Intell. 15(10), 1993, 1053–1067) have recently pointed out that effective features for recognition should be extracted directly from original gray-scale intensity images in order to avoid a significant amount of information loss caused by binarization. In this paper, a novel method is presented for extracting closed boundaries of document components such as characters and symbols directly from gray-scale document images, based on the surface data structures and structural features. The gray-scale document image can be treated as a surface defined over a two-dimensional space by regarding intensity values associated with pixels as height. This method is based on a simple model that assumes a closed boundary of document components can be approximated as a series of horizontal (parallel to the image plane) line segments and can be extracted by linking surface components with steep gradients based on configurations of intersections of horizontal planes and surface components. Furthermore, the gray-scale image can be converted into a binary image based on extracted boundaries so that any recognition system can accept output of the proposed algorithm as input. The performance of the proposed algorithm is compared with some binarization algorithms based on global and local thresholding of intensity values and is shown to be effective for improving recognition accuracy for very poor quality data.
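The surface view described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: the abstract does not give enough detail to reproduce the linking of surface components into closed boundaries. The sketch only shows the two ingredients the abstract names, under the assumption that an image is a list of rows of gray levels. The function names `gradient_magnitude`, `steep_mask` and `plane_section`, and the values 100 used as slope and plane height, are hypothetical choices made for illustration.

```python
from typing import List

Image = List[List[int]]  # rows of gray levels, 0 (dark) to 255 (bright)


def gradient_magnitude(img: Image) -> List[List[float]]:
    """Unnormalized central-difference slope of the intensity surface."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Neighbours are clamped at the border, so border pixels use a
            # one-sided difference instead of a central one.
            dx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            dy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            out[y][x] = (dx * dx + dy * dy) ** 0.5
    return out


def steep_mask(img: Image, min_slope: float) -> List[List[bool]]:
    """Mark pixels where the surface is steep (candidate boundary pixels)."""
    return [[g >= min_slope for g in row] for row in gradient_magnitude(img)]


def plane_section(img: Image, level: int) -> List[List[bool]]:
    """Mark pixels lying below a horizontal plane at height `level`.

    With a single plane this is plain global thresholding, the baseline
    that the abstract compares against.
    """
    return [[v < level for v in row] for row in img]


# A dark vertical stroke on a bright background, with blurred edges.
IMG: Image = [
    [200, 200, 120, 40, 40, 120, 200],
    [200, 200, 120, 40, 40, 120, 200],
    [200, 200, 120, 40, 40, 120, 200],
]

edges = steep_mask(IMG, 100)   # steep flanks of the stroke
ink = plane_section(IMG, 100)  # stroke interior under one global plane
```

In this toy image the steep-slope mask picks out the two blurred flanks of the stroke, while the single horizontal plane picks out only its dark core. Where the boundary is drawn therefore depends on the chosen plane height, which is the kind of information loss the abstract attributes to binarization.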
DOI: 10.1006/gmip.1997.0452
Links toward previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 000382
- to stream Istex, to step Curation: 000377
- to stream Istex, to step Checkpoint: 001753
Links to Exploration step
ISTEX:5BB038E1D76DFE40A97E5310AAB865174355D26B
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Boundary Extraction from Gray-Scale Document Images Based on Surface Data Structures</title>
<author><name sortKey="Nishida, Hirobumi" sort="Nishida, Hirobumi" uniqKey="Nishida H" first="Hirobumi" last="Nishida">Hirobumi Nishida</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:5BB038E1D76DFE40A97E5310AAB865174355D26B</idno>
<date when="1998" year="1998">1998</date>
<idno type="doi">10.1006/gmip.1997.0452</idno>
<idno type="url">https://api.istex.fr/document/5BB038E1D76DFE40A97E5310AAB865174355D26B/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000382</idno>
<idno type="wicri:Area/Istex/Curation">000377</idno>
<idno type="wicri:Area/Istex/Checkpoint">001753</idno>
<idno type="wicri:doubleKey">1077-3169:1998:Nishida H:boundary:extraction:from</idno>
<idno type="wicri:Area/Main/Merge">002366</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Boundary Extraction from Gray-Scale Document Images Based on Surface Data Structures</title>
<author><name sortKey="Nishida, Hirobumi" sort="Nishida, Hirobumi" uniqKey="Nishida H" first="Hirobumi" last="Nishida">Hirobumi Nishida</name>
<affiliation wicri:level="1"><country xml:lang="fr">Japon</country>
<wicri:regionArea>Software Research Center, Ricoh Co., Ltd. 1-1-17 Koishikawa, Bunkyo-ku, Tokyo, 112</wicri:regionArea>
<wicri:noRegion>112</wicri:noRegion>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="j">Graphical Models and Image Processing</title>
<title level="j" type="abbrev">YGMIP</title>
<idno type="ISSN">1077-3169</idno>
<imprint><publisher>ELSEVIER</publisher>
<date type="published" when="1997">1997</date>
<biblScope unit="volume">60</biblScope>
<biblScope unit="issue">1</biblScope>
<biblScope unit="page" from="35">35</biblScope>
<biblScope unit="page" to="45">45</biblScope>
</imprint>
<idno type="ISSN">1077-3169</idno>
</series>
<idno type="istex">5BB038E1D76DFE40A97E5310AAB865174355D26B</idno>
<idno type="DOI">10.1006/gmip.1997.0452</idno>
<idno type="PII">S1077-3169(97)90452-4</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">1077-3169</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Recognition of documents of poor image quality is a challenging and important problem from a practical point of view. In traditional approaches, features such as center lines of strokes or contours are extracted from binary images obtained by thresholding the gray-scale intensity images. Wang and Pavlidis (IEEE Trans. Pattern Anal. Machine Intell. 15(10), 1993, 1053–1067) have recently pointed out that effective features for recognition should be extracted directly from original gray-scale intensity images in order to avoid a significant amount of information loss caused by binarization. In this paper, a novel method is presented for extracting closed boundaries of document components such as characters and symbols directly from gray-scale document images, based on the surface data structures and structural features. The gray-scale document image can be treated as a surface defined over a two-dimensional space by regarding intensity values associated with pixels as height. This method is based on a simple model that assumes a closed boundary of document components can be approximated as a series of horizontal (parallel to the image plane) line segments and can be extracted by linking surface components with steep gradients based on configurations of intersections of horizontal planes and surface components. Furthermore, the gray-scale image can be converted into a binary image based on extracted boundaries so that any recognition system can accept output of the proposed algorithm as input. The performance of the proposed algorithm is compared with some binarization algorithms based on global and local thresholding of intensity values and is shown to be effective for improving recognition accuracy for very poor quality data.</div>
</front>
</TEI>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002366 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 002366 | SxmlIndent | more
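Outside Dilib, the same record could be read with a general-purpose XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`, not part of the Dilib toolchain. One caveat: the record as shown uses the `wicri:` prefix without declaring it, so a strict parser would reject it. The sketch works around this by wrapping the record in a hypothetical `wrapper` element that binds the prefix to a placeholder URI (`urn:example:wicri`, an assumption). `SAMPLE` is an abridged copy of the record, and the helper names are illustrative.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI: the record does not declare the "wicri" prefix,
# so this value is an assumption made only to satisfy the parser.
WICRI_NS = "urn:example:wicri"


def parse_record(record_xml: str) -> ET.Element:
    """Wrap the record so the wicri prefix is bound, then return <record>."""
    wrapped = f'<wrapper xmlns:wicri="{WICRI_NS}">{record_xml}</wrapper>'
    return ET.fromstring(wrapped).find("record")


def summarize(record: ET.Element) -> dict:
    """Collect a few bibliographic fields from the biblStruct element."""
    bibl = record.find(".//biblStruct")
    pages = {}
    # The first and last page are stored on two separate biblScope elements.
    for scope in bibl.iterfind(".//biblScope[@unit='page']"):
        pages.update({k: v for k, v in scope.attrib.items() if k in ("from", "to")})
    return {
        "title": bibl.findtext("analytic/title"),
        "author": bibl.findtext("analytic/author/name"),
        "journal": bibl.findtext("series/title"),
        "volume": bibl.findtext(".//biblScope[@unit='volume']"),
        "pages": (pages.get("from"), pages.get("to")),
        "doi": bibl.findtext("idno[@type='DOI']"),
    }


# Abridged copy of the record shown above.
SAMPLE = """<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc>
<sourceDesc><biblStruct><analytic>
<title level="a" type="main" xml:lang="en">Boundary Extraction from Gray-Scale Document Images Based on Surface Data Structures</title>
<author><name>Hirobumi Nishida</name></author>
</analytic>
<series><title level="j">Graphical Models and Image Processing</title>
<imprint><biblScope unit="volume">60</biblScope>
<biblScope unit="page" from="35">35</biblScope>
<biblScope unit="page" to="45">45</biblScope></imprint>
</series>
<idno type="DOI">10.1006/gmip.1997.0452</idno>
</biblStruct></sourceDesc></fileDesc></teiHeader></TEI></record>"""

info = summarize(parse_record(SAMPLE))
```

Wrapping the record leaves its text unmodified, which keeps the workaround in one place if the prefix is later declared properly.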
To add a link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Merge |type= RBID |clé= ISTEX:5BB038E1D76DFE40A97E5310AAB865174355D26B |texte= Boundary Extraction from Gray-Scale Document Images Based on Surface Data Structures }}
This area was generated with Dilib version V0.6.32.