Chinese Handwritten Character Segmentation in Form Documents
Internal identifier: 002110 (Main/Merge); previous: 002109; next: 002111
Authors: Jiun-Lin Chen [Taiwan]; Chi-Hong Wu [Taiwan]; Hsi-Jian Lee [Taiwan]
Source:
- Lecture Notes in Computer Science [0302-9743]; 1999.
Abstract
This paper presents a projection-based method for segmenting handwritten Chinese characters in form documents with known structures. In the preprocessing phase, a noise removal method is proposed that preserves stroke connections and character edge points. In the character segmentation phase, projection profile analysis is used to segment a text line image into projection blocks. The projection blocks are then classified into one of four types: mark, half-word, single-word, and two-word. Large blocks are split and small blocks are merged. In addition, an OCR system is adopted to eliminate errors resulting from the inappropriate merging of Chinese numerical characters with other characters. In experiments on 1319 Chinese characters, correct segmentation rates of 92.34% and 91.76% are obtained with and without the OCR module, respectively.
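The projection and block classification steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `char_width` parameter, and the width-ratio thresholds are all assumptions made for illustration, and the noise removal, split/merge, and OCR steps are left out.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of 0/1 pixels, 1 = ink (assumed binarized input)


def projection_profile(line: Image) -> List[int]:
    """Count ink pixels in each column of a binarized text-line image."""
    if not line:
        return []
    return [sum(col) for col in zip(*line)]


def projection_blocks(profile: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, of runs containing ink."""
    blocks = []
    start = None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                      # a new block begins
        elif count == 0 and start is not None:
            blocks.append((start, x))      # a blank column closes the block
            start = None
    if start is not None:
        blocks.append((start, len(profile)))
    return blocks


def classify_block(width: int, char_width: int) -> str:
    """Label a block by its width relative to an expected character width.

    The ratio thresholds below are hypothetical, not values from the paper.
    """
    ratio = width / char_width
    if ratio < 0.25:
        return "mark"
    if ratio < 0.75:
        return "half-word"
    if ratio < 1.5:
        return "single-word"
    return "two-word"


if __name__ == "__main__":
    # Synthetic one-row "text line": runs of ink 8, 1, 4 and 16 columns wide.
    row = [1] * 8 + [0] + [1] + [0] + [1] * 4 + [0] * 2 + [1] * 16
    blocks = projection_blocks(projection_profile([row]))
    print([classify_block(end - start, char_width=8) for start, end in blocks])
```

In a form document with known structure, the expected character width could plausibly be derived from the field size. A block labeled two-word would then be a candidate for splitting, and adjacent mark or half-word blocks would be candidates for merging.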
DOI: 10.1007/3-540-48172-9_28
Links toward previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 000090
- to stream Istex, to step Curation: 000088
- to stream Istex, to step Checkpoint: 001552
Links to Exploration step
ISTEX:F710E043F28CD45F3198B3485EF75D8F3CD4FA1A
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Chinese Handwritten Character Segmentation in Form Documents</title>
<author><name sortKey="Chen, Jiun Lin" sort="Chen, Jiun Lin" uniqKey="Chen J" first="Jiun-Lin" last="Chen">Jiun-Lin Chen</name>
</author>
<author><name sortKey="Wu, Chi Hong" sort="Wu, Chi Hong" uniqKey="Wu C" first="Chi-Hong" last="Wu">Chi-Hong Wu</name>
</author>
<author><name sortKey="Lee, Hsi Jian" sort="Lee, Hsi Jian" uniqKey="Lee H" first="Hsi-Jian" last="Lee">Hsi-Jian Lee</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:F710E043F28CD45F3198B3485EF75D8F3CD4FA1A</idno>
<date when="1999" year="1999">1999</date>
<idno type="doi">10.1007/3-540-48172-9_28</idno>
<idno type="url">https://api.istex.fr/document/F710E043F28CD45F3198B3485EF75D8F3CD4FA1A/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000090</idno>
<idno type="wicri:Area/Istex/Curation">000088</idno>
<idno type="wicri:Area/Istex/Checkpoint">001552</idno>
<idno type="wicri:doubleKey">0302-9743:1999:Chen J:chinese:handwritten:character</idno>
<idno type="wicri:Area/Main/Merge">002110</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Chinese Handwritten Character Segmentation in Form Documents</title>
<author><name sortKey="Chen, Jiun Lin" sort="Chen, Jiun Lin" uniqKey="Chen J" first="Jiun-Lin" last="Chen">Jiun-Lin Chen</name>
<affiliation wicri:level="1"><country xml:lang="fr">Taïwan</country>
<wicri:regionArea>Department of Computer Science and Information Engineering, National Chiao Tung University, 30050, Hsinchu</wicri:regionArea>
<wicri:noRegion>Hsinchu</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Wu, Chi Hong" sort="Wu, Chi Hong" uniqKey="Wu C" first="Chi-Hong" last="Wu">Chi-Hong Wu</name>
<affiliation wicri:level="1"><country xml:lang="fr">Taïwan</country>
<wicri:regionArea>Department of Computer Science and Information Engineering, National Chiao Tung University, 30050, Hsinchu</wicri:regionArea>
<wicri:noRegion>Hsinchu</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Lee, Hsi Jian" sort="Lee, Hsi Jian" uniqKey="Lee H" first="Hsi-Jian" last="Lee">Hsi-Jian Lee</name>
<affiliation wicri:level="1"><country xml:lang="fr">Taïwan</country>
<wicri:regionArea>Department of Computer Science and Information Engineering, National Chiao Tung University, 30050, Hsinchu</wicri:regionArea>
<wicri:noRegion>Hsinchu</wicri:noRegion>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>1999</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">F710E043F28CD45F3198B3485EF75D8F3CD4FA1A</idno>
<idno type="DOI">10.1007/3-540-48172-9_28</idno>
<idno type="ChapterID">28</idno>
<idno type="ChapterID">Chap28</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: This paper presents a projection-based method for segmenting handwritten Chinese characters in form documents with known structures. In the preprocessing phase, a noise removal method is proposed that preserves stroke connections and character edge points. In the character segmentation phase, projection profile analysis is used to segment a text line image into projection blocks. The projection blocks are then classified into one of four types: mark, half-word, single-word, and two-word. Large blocks are split and small blocks are merged. In addition, an OCR system is adopted to eliminate errors resulting from the inappropriate merging of Chinese numerical characters with other characters. In experiments on 1319 Chinese characters, correct segmentation rates of 92.34% and 91.76% are obtained with and without the OCR module, respectively.</div>
</front>
</TEI>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002110 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 002110 | SxmlIndent | more
To add a link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Merge |type= RBID |clé= ISTEX:F710E043F28CD45F3198B3485EF75D8F3CD4FA1A |texte= Chinese Handwritten Character Segmentation in Form Documents }}
This area was generated with Dilib version V0.6.32.