Chinese Handwritten character segmentation in form documents
Internal identifier: 002176 (Main/Merge); previous: 002175; next: 002177
Authors: J.-L. Chen [Taiwan]; C.-H. Wu [Taiwan]; H.-J. Lee [Taiwan]
Source:
- Lecture notes in computer science [0302-9743]; 1999.
French descriptors
- Pascal (Inist)
English descriptors
- KwdEn :
Abstract
This paper presents a projection-based method for segmenting handwritten Chinese characters in form documents with known structures. In the preprocessing phase, a noise removal method is proposed that preserves stroke connections and character edge points. In the character segmentation phase, projection profile analysis is used to segment a text line image into projection blocks. Each projection block is then classified into one of four types: mark, half-word, single-word, or two-word. Large blocks are then split and small blocks are merged. In addition, an OCR system is adopted to eliminate errors resulting from the inappropriate merging of Chinese numerical characters with other characters. In experiments on 1319 Chinese characters, correct segmentation rates of 92.34% and 91.76% were obtained with and without the OCR module, respectively.
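The projection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0/1 pixel representation, the nominal character width, and the width-ratio thresholds are all assumptions chosen for illustration, and the split, merge, and OCR stages are omitted.

```python
from typing import List, Tuple


def projection_blocks(line: List[List[int]]) -> List[Tuple[int, int]]:
    """Split a binary text-line image (rows of 0/1 pixels) into blocks.

    Returns (start, end) column ranges, end exclusive, covering each
    maximal run of columns that contain at least one ink pixel.
    """
    if not line:
        return []
    width = len(line[0])
    # Vertical projection profile: number of ink pixels in each column.
    profile = [sum(row[x] for row in line) for x in range(width)]
    blocks = []
    start = None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                      # a block begins
        elif count == 0 and start is not None:
            blocks.append((start, x))      # a block ends at an empty column
            start = None
    if start is not None:
        blocks.append((start, width))      # block touching the right edge
    return blocks


def classify_block(block_width: int, char_width: int) -> str:
    """Label a block by its width relative to a nominal character width.

    The four labels come from the abstract; the thresholds below are
    hypothetical and are not taken from the paper.
    """
    ratio = block_width / char_width
    if ratio < 0.25:
        return "mark"
    if ratio < 0.75:
        return "half-word"
    if ratio < 1.5:
        return "single-word"
    return "two-word"


if __name__ == "__main__":
    line = [
        [0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0],
        [0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0],
    ]
    for start, end in projection_blocks(line):
        print((start, end), classify_block(end - start, char_width=4))
```

In a form with a known structure, the nominal character width could plausibly be derived from the field dimensions; here it is simply passed in as a parameter.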
Links toward previous steps (curation, corpus...)
- to stream PascalFrancis, to step Corpus: 000802
- to stream PascalFrancis, to step Curation: 000B92
- to stream PascalFrancis, to step Checkpoint: 000795
Links to Exploration step
Pascal:99-0486879
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Chinese Handwritten character segmentation in form documents</title>
<author><name sortKey="Chen, J L" sort="Chen, J L" uniqKey="Chen J" first="J.-L." last="Chen">J.-L. Chen</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Computer Science and Information Engineering, National Chiao Tung University</s1>
<s2>Hsinchu, 30050</s2>
<s3>TWN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Taïwan</country>
<wicri:noRegion>Hsinchu, 30050</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Wu, C H" sort="Wu, C H" uniqKey="Wu C" first="C.-H." last="Wu">C.-H. Wu</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Computer Science and Information Engineering, National Chiao Tung University</s1>
<s2>Hsinchu, 30050</s2>
<s3>TWN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Taïwan</country>
<wicri:noRegion>Hsinchu, 30050</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Lee, H J" sort="Lee, H J" uniqKey="Lee H" first="H.-J." last="Lee">H.-J. Lee</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Computer Science and Information Engineering, National Chiao Tung University</s1>
<s2>Hsinchu, 30050</s2>
<s3>TWN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Taïwan</country>
<wicri:noRegion>Hsinchu, 30050</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">99-0486879</idno>
<date when="1999">1999</date>
<idno type="stanalyst">PASCAL 99-0486879 INIST</idno>
<idno type="RBID">Pascal:99-0486879</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000802</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000B92</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000795</idno>
<idno type="wicri:doubleKey">0302-9743:1999:Chen J:chinese:handwritten:character</idno>
<idno type="wicri:Area/Main/Merge">002176</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Chinese Handwritten character segmentation in form documents</title>
<author><name sortKey="Chen, J L" sort="Chen, J L" uniqKey="Chen J" first="J.-L." last="Chen">J.-L. Chen</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Computer Science and Information Engineering, National Chiao Tung University</s1>
<s2>Hsinchu, 30050</s2>
<s3>TWN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Taïwan</country>
<wicri:noRegion>Hsinchu, 30050</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Wu, C H" sort="Wu, C H" uniqKey="Wu C" first="C.-H." last="Wu">C.-H. Wu</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Computer Science and Information Engineering, National Chiao Tung University</s1>
<s2>Hsinchu, 30050</s2>
<s3>TWN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Taïwan</country>
<wicri:noRegion>Hsinchu, 30050</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Lee, H J" sort="Lee, H J" uniqKey="Lee H" first="H.-J." last="Lee">H.-J. Lee</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Computer Science and Information Engineering, National Chiao Tung University</s1>
<s2>Hsinchu, 30050</s2>
<s3>TWN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Taïwan</country>
<wicri:noRegion>Hsinchu, 30050</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Lecture notes in computer science</title>
<idno type="ISSN">0302-9743</idno>
<imprint><date when="1999">1999</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Lecture notes in computer science</title>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Chinese</term>
<term>Document analysis</term>
<term>Document image processing</term>
<term>Handwriting recognition</term>
<term>Optical character recognition</term>
<term>Pattern recognition</term>
<term>Segmentation</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Analyse documentaire</term>
<term>Traitement image document</term>
<term>Reconnaissance optique caractère</term>
<term>Reconnaissance forme</term>
<term>Reconnaissance écriture</term>
<term>Segmentation</term>
<term>Chinois</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">This paper presents a projection-based method for segmenting handwritten Chinese characters in form documents with known structures. In the preprocessing phase, a noise removal method is proposed that preserves stroke connections and character edge points. In the character segmentation phase, projection profile analysis is used to segment a text line image into projection blocks. Each projection block is then classified into one of four types: mark, half-word, single-word, or two-word. Large blocks are then split and small blocks are merged. In addition, an OCR system is adopted to eliminate errors resulting from the inappropriate merging of Chinese numerical characters with other characters. In experiments on 1319 Chinese characters, correct segmentation rates of 92.34% and 91.76% were obtained with and without the OCR module, respectively.</div>
</front>
</TEI>
<affiliations><list><country><li>Taïwan</li>
</country>
</list>
<tree><country name="Taïwan"><noRegion><name sortKey="Chen, J L" sort="Chen, J L" uniqKey="Chen J" first="J.-L." last="Chen">J.-L. Chen</name>
</noRegion>
<name sortKey="Lee, H J" sort="Lee, H J" uniqKey="Lee H" first="H.-J." last="Lee">H.-J. Lee</name>
<name sortKey="Wu, C H" sort="Wu, C H" uniqKey="Wu C" first="C.-H." last="Wu">C.-H. Wu</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002176 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 002176 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Merge |type= RBID |clé= Pascal:99-0486879 |texte= Chinese Handwritten character segmentation in form documents }}
This area was generated with Dilib version V0.6.32.