OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Chinese Handwritten character segmentation in form documents

Internal identifier: 002176 (Main/Merge); previous: 002175; next: 002177

Authors: J.-L. Chen [Taiwan]; C.-H. Wu [Taiwan]; H.-J. Lee [Taiwan]

Source: Lecture notes in computer science [ISSN 0302-9743]; 1999

RBID : Pascal:99-0486879

French descriptors

English descriptors

Abstract

This paper presents a projection-based method for segmenting handwritten Chinese characters in form documents with known structures. In the preprocessing phase, a noise removal method is proposed that preserves stroke connections and character edge points. In the character segmentation phase, projection profile analysis is used to segment a text line image into projection blocks. Each projection block is then classified into one of four types: mark, half-word, single-word, or two-word. Large blocks are split and small blocks are merged. In addition, an OCR system is adopted to eliminate errors resulting from the inappropriate merging of Chinese numerical characters with other characters. In experiments on 1319 Chinese characters, correct segmentation rates of 92.34% and 91.76% were obtained with and without the OCR module, respectively.
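
To make the segmentation phase concrete, the following is a minimal Python sketch of projection profile analysis on a binary text-line image, followed by a width-based labelling of the resulting blocks. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `char_width` parameter, and the cut-off ratios used for the four block types are all hypothetical, since the abstract does not give the actual classification rules.

```python
from typing import List, Tuple


def vertical_projection(image: List[List[int]]) -> List[int]:
    """Count foreground (1) pixels in each column of a binary text-line image."""
    if not image:
        return []
    width = len(image[0])
    return [sum(row[x] for row in image) for x in range(width)]


def projection_blocks(profile: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, where the profile is non-zero."""
    blocks = []
    start = None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                      # a new block begins
        elif count == 0 and start is not None:
            blocks.append((start, x))      # a gap closes the current block
            start = None
    if start is not None:
        blocks.append((start, len(profile)))
    return blocks


def classify_block(width: int, char_width: int) -> str:
    """Label a block by its width relative to an assumed nominal character width.

    The cut-offs 0.3, 0.75 and 1.5 are illustrative guesses, not values
    taken from the paper.
    """
    ratio = width / char_width
    if ratio < 0.3:
        return "mark"
    if ratio < 0.75:
        return "half-word"
    if ratio < 1.5:
        return "single-word"
    return "two-word"


# A tiny 3x10 binary "text line": 1 = ink, 0 = background.
line = [
    [1, 1, 0, 0, 1, 1, 1, 1, 0, 1],
    [1, 0, 0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 0, 0, 1, 1, 1, 1, 0, 1],
]
profile = vertical_projection(line)
blocks = projection_blocks(profile)
labels = [classify_block(end - start, char_width=4) for start, end in blocks]
```

In a fuller pipeline, blocks labelled "two-word" would be candidates for splitting and "mark" or "half-word" blocks candidates for merging with a neighbour; how those decisions are made, and how the OCR module vetoes bad merges, is not specified in the abstract and is left out of this sketch.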

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Chinese Handwritten character segmentation in form documents</title>
<author>
<name sortKey="Chen, J L" sort="Chen, J L" uniqKey="Chen J" first="J.-L." last="Chen">J.-L. Chen</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Computer Science and Information Engineering, National Chiao Tung University</s1>
<s2>Hsinchu, 30050</s2>
<s3>TWN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Taïwan</country>
<wicri:noRegion>Hsinchu, 30050</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Wu, C H" sort="Wu, C H" uniqKey="Wu C" first="C.-H." last="Wu">C.-H. Wu</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Computer Science and Information Engineering, National Chiao Tung University</s1>
<s2>Hsinchu, 30050</s2>
<s3>TWN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Taïwan</country>
<wicri:noRegion>Hsinchu, 30050</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Lee, H J" sort="Lee, H J" uniqKey="Lee H" first="H.-J." last="Lee">H.-J. Lee</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Computer Science and Information Engineering, National Chiao Tung University</s1>
<s2>Hsinchu, 30050</s2>
<s3>TWN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Taïwan</country>
<wicri:noRegion>Hsinchu, 30050</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">99-0486879</idno>
<date when="1999">1999</date>
<idno type="stanalyst">PASCAL 99-0486879 INIST</idno>
<idno type="RBID">Pascal:99-0486879</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000802</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000B92</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000795</idno>
<idno type="wicri:doubleKey">0302-9743:1999:Chen J:chinese:handwritten:character</idno>
<idno type="wicri:Area/Main/Merge">002176</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Chinese Handwritten character segmentation in form documents</title>
<author>
<name sortKey="Chen, J L" sort="Chen, J L" uniqKey="Chen J" first="J.-L." last="Chen">J.-L. Chen</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Computer Science and Information Engineering, National Chiao Tung University</s1>
<s2>Hsinchu, 30050</s2>
<s3>TWN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Taïwan</country>
<wicri:noRegion>Hsinchu, 30050</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Wu, C H" sort="Wu, C H" uniqKey="Wu C" first="C.-H." last="Wu">C.-H. Wu</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Computer Science and Information Engineering, National Chiao Tung University</s1>
<s2>Hsinchu, 30050</s2>
<s3>TWN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Taïwan</country>
<wicri:noRegion>Hsinchu, 30050</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Lee, H J" sort="Lee, H J" uniqKey="Lee H" first="H.-J." last="Lee">H.-J. Lee</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Computer Science and Information Engineering, National Chiao Tung University</s1>
<s2>Hsinchu, 30050</s2>
<s3>TWN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Taïwan</country>
<wicri:noRegion>Hsinchu, 30050</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Lecture notes in computer science</title>
<idno type="ISSN">0302-9743</idno>
<imprint>
<date when="1999">1999</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Lecture notes in computer science</title>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Chinese</term>
<term>Document analysis</term>
<term>Document image processing</term>
<term>Handwriting recognition</term>
<term>Optical character recognition</term>
<term>Pattern recognition</term>
<term>Segmentation</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Analyse documentaire</term>
<term>Traitement image document</term>
<term>Reconnaissance optique caractère</term>
<term>Reconnaissance forme</term>
<term>Reconnaissance écriture</term>
<term>Segmentation</term>
<term>Chinois</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">This paper presents a projection based method for segmenting handwritten Chinese characters in form documents with known structures. In the preprocessing phase, a noise removal method is proposed that preserves stroke connections and character edge points. In the character segmentation phase, the projection profile analysis method is used to segment a text line image into projection blocks. In addition, projection blocks are classified into one of four types: mark, half-word, single-word, and two-word. Large blocks are then split and small blocks are merged. In addition, an OCR system is adopted to eliminate errors resulting from the inappropriate merging of Chinese numerical characters with other characters. As for 1319 Chinese characters are tested during our experiments, the correct segmentation rates of 92.34% and 91.76% are obtained with and without the OCR module.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Taïwan</li>
</country>
</list>
<tree>
<country name="Taïwan">
<noRegion>
<name sortKey="Chen, J L" sort="Chen, J L" uniqKey="Chen J" first="J.-L." last="Chen">J.-L. Chen</name>
</noRegion>
<name sortKey="Lee, H J" sort="Lee, H J" uniqKey="Lee H" first="H.-J." last="Lee">H.-J. Lee</name>
<name sortKey="Wu, C H" sort="Wu, C H" uniqKey="Wu C" first="C.-H." last="Wu">C.-H. Wu</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002176 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 002176 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Merge
   |type=    RBID
   |clé=     Pascal:99-0486879
   |texte=   Chinese Handwritten character segmentation in form documents
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024