OCR exploration server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Chinese Handwritten Character Segmentation in Form Documents

Internal identifier: 002001 (Main/Exploration)

Authors: Jiun-Lin Chen [Taiwan]; Chi-Hong Wu [Taiwan]; Hsi-Jian Lee [Taiwan]

Source: Lecture Notes in Computer Science, 1999 (ISSN 0302-9743)

RBID : ISTEX:F710E043F28CD45F3198B3485EF75D8F3CD4FA1A

Abstract

This paper presents a projection-based method for segmenting handwritten Chinese characters in form documents with known structures. In the preprocessing phase, a noise removal method is proposed that preserves stroke connections and character edge points. In the character segmentation phase, projection profile analysis is used to segment a text line image into projection blocks. The projection blocks are then classified into one of four types: mark, half-word, single-word, and two-word. Large blocks are subsequently split and small blocks are merged. In addition, an OCR system is adopted to eliminate errors resulting from the inappropriate merging of Chinese numerical characters with other characters. In experiments on 1319 Chinese characters, correct segmentation rates of 92.34% and 91.76% were obtained with and without the OCR module, respectively.
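The segmentation phase summarized above can be illustrated with a minimal Python sketch. It computes a vertical projection profile for a binarized text line (1 = ink), cuts the line at empty columns into projection blocks, labels each block with one of the four types by its width relative to an estimated character width, merges adjacent half-word blocks, and splits two-word blocks at the column with the least ink near their middle. The abstract does not give the paper's thresholds or its split/merge rules, so the width ratios, the merge condition, the cut rule, and all function names (`vertical_projection`, `projection_blocks`, `classify`, `refine`, `segment_line`) are hypothetical; the noise-removal and OCR steps are not modeled.

```python
from typing import List, Tuple

# A block is a half-open column range [start, end) within the text line.
Block = Tuple[int, int]


def vertical_projection(image: List[List[int]]) -> List[int]:
    """Count ink pixels (value 1) in each column of a binarized line image."""
    return [sum(column) for column in zip(*image)]


def projection_blocks(profile: List[int]) -> List[Block]:
    """Cut the line at empty columns into maximal runs of non-empty columns."""
    blocks: List[Block] = []
    start = None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x
        elif count == 0 and start is not None:
            blocks.append((start, x))
            start = None
    if start is not None:
        blocks.append((start, len(profile)))
    return blocks


def classify(block: Block, char_width: float) -> str:
    """Label a block by width / char_width (char_width > 0; ratios are hypothetical)."""
    ratio = (block[1] - block[0]) / char_width
    if ratio < 0.3:
        return "mark"
    if ratio < 0.75:
        return "half-word"
    if ratio < 1.5:
        return "single-word"
    return "two-word"


def refine(blocks: List[Block], profile: List[int], char_width: float) -> List[Block]:
    """Merge small blocks and split large ones (hypothetical rules, not the paper's)."""
    merged: List[Block] = []
    for block in blocks:
        if (
            merged
            and classify(block, char_width) == "half-word"
            and classify(merged[-1], char_width) == "half-word"
            # Only merge if the union still looks like a single character.
            and classify((merged[-1][0], block[1]), char_width) == "single-word"
        ):
            merged[-1] = (merged[-1][0], block[1])
        else:
            merged.append(block)

    result: List[Block] = []
    for start, end in merged:
        width = end - start
        if classify((start, end), char_width) == "two-word" and width >= 2:
            # Cut at the least-ink column, ignoring width // 3 columns at each
            # end (at least one at the start) so both halves are non-empty.
            third = width // 3
            window = range(start + max(1, third), end - third)
            cut = min(window, key=lambda x: profile[x])
            result.extend([(start, cut), (cut, end)])
        else:
            result.append((start, end))
    return result


def segment_line(image: List[List[int]], char_width: float) -> List[Block]:
    """Run projection, block extraction, and split/merge refinement on one line."""
    profile = vertical_projection(image)
    return refine(projection_blocks(profile), profile, char_width)


if __name__ == "__main__":
    line = [
        [1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1],
        [1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1],
    ]
    print(segment_line(line, char_width=4))
```

With `char_width=4`, the two-row example line yields `[(0, 4), (5, 10), (11, 15), (15, 19), (20, 21)]`: the two half-word blocks at columns 5-6 and 8-9 are merged into one character, the eight-column block is cut at column 15 where the profile dips, and the one-column block is kept as a mark.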

URL: https://api.istex.fr/document/F710E043F28CD45F3198B3485EF75D8F3CD4FA1A/fulltext/pdf
DOI: 10.1007/3-540-48172-9_28

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Chinese Handwritten Character Segmentation in Form Documents</title>
<author>
<name sortKey="Chen, Jiun Lin" sort="Chen, Jiun Lin" uniqKey="Chen J" first="Jiun-Lin" last="Chen">Jiun-Lin Chen</name>
</author>
<author>
<name sortKey="Wu, Chi Hong" sort="Wu, Chi Hong" uniqKey="Wu C" first="Chi-Hong" last="Wu">Chi-Hong Wu</name>
</author>
<author>
<name sortKey="Lee, Hsi Jian" sort="Lee, Hsi Jian" uniqKey="Lee H" first="Hsi-Jian" last="Lee">Hsi-Jian Lee</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:F710E043F28CD45F3198B3485EF75D8F3CD4FA1A</idno>
<date when="1999" year="1999">1999</date>
<idno type="doi">10.1007/3-540-48172-9_28</idno>
<idno type="url">https://api.istex.fr/document/F710E043F28CD45F3198B3485EF75D8F3CD4FA1A/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000090</idno>
<idno type="wicri:Area/Istex/Curation">000088</idno>
<idno type="wicri:Area/Istex/Checkpoint">001552</idno>
<idno type="wicri:doubleKey">0302-9743:1999:Chen J:chinese:handwritten:character</idno>
<idno type="wicri:Area/Main/Merge">002110</idno>
<idno type="wicri:source">INIST</idno>
<idno type="RBID">Pascal:99-0486879</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000802</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000B92</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000795</idno>
<idno type="wicri:doubleKey">0302-9743:1999:Chen J:chinese:handwritten:character</idno>
<idno type="wicri:Area/Main/Merge">002176</idno>
<idno type="wicri:Area/Main/Curation">002001</idno>
<idno type="wicri:Area/Main/Exploration">002001</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Chinese Handwritten Character Segmentation in Form Documents</title>
<author>
<name sortKey="Chen, Jiun Lin" sort="Chen, Jiun Lin" uniqKey="Chen J" first="Jiun-Lin" last="Chen">Jiun-Lin Chen</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Taïwan</country>
<wicri:regionArea>Department of Computer Science and Information Engineering, National Chiao Tung University, 30050, Hsinchu</wicri:regionArea>
<wicri:noRegion>Hsinchu</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Wu, Chi Hong" sort="Wu, Chi Hong" uniqKey="Wu C" first="Chi-Hong" last="Wu">Chi-Hong Wu</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Taïwan</country>
<wicri:regionArea>Department of Computer Science and Information Engineering, National Chiao Tung University, 30050, Hsinchu</wicri:regionArea>
<wicri:noRegion>Hsinchu</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Lee, Hsi Jian" sort="Lee, Hsi Jian" uniqKey="Lee H" first="Hsi-Jian" last="Lee">Hsi-Jian Lee</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Taïwan</country>
<wicri:regionArea>Department of Computer Science and Information Engineering, National Chiao Tung University, 30050, Hsinchu</wicri:regionArea>
<wicri:noRegion>Hsinchu</wicri:noRegion>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>1999</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">F710E043F28CD45F3198B3485EF75D8F3CD4FA1A</idno>
<idno type="DOI">10.1007/3-540-48172-9_28</idno>
<idno type="ChapterID">28</idno>
<idno type="ChapterID">Chap28</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Chinese</term>
<term>Document analysis</term>
<term>Document image processing</term>
<term>Handwriting recognition</term>
<term>Optical character recognition</term>
<term>Pattern recognition</term>
<term>Segmentation</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Analyse documentaire</term>
<term>Chinois</term>
<term>Reconnaissance forme</term>
<term>Reconnaissance optique caractère</term>
<term>Reconnaissance écriture</term>
<term>Segmentation</term>
<term>Traitement image document</term>
</keywords>
</textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: This paper presents a projection-based method for segmenting handwritten Chinese characters in form documents with known structures. In the preprocessing phase, a noise removal method is proposed that preserves stroke connections and character edge points. In the character segmentation phase, projection profile analysis is used to segment a text line image into projection blocks. The projection blocks are then classified into one of four types: mark, half-word, single-word, and two-word. Large blocks are subsequently split and small blocks are merged. In addition, an OCR system is adopted to eliminate errors resulting from the inappropriate merging of Chinese numerical characters with other characters. In experiments on 1319 Chinese characters, correct segmentation rates of 92.34% and 91.76% were obtained with and without the OCR module, respectively.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Taïwan</li>
</country>
</list>
<tree>
<country name="Taïwan">
<noRegion>
<name sortKey="Chen, Jiun Lin" sort="Chen, Jiun Lin" uniqKey="Chen J" first="Jiun-Lin" last="Chen">Jiun-Lin Chen</name>
</noRegion>
<name sortKey="Lee, Hsi Jian" sort="Lee, Hsi Jian" uniqKey="Lee H" first="Hsi-Jian" last="Lee">Hsi-Jian Lee</name>
<name sortKey="Wu, Chi Hong" sort="Wu, Chi Hong" uniqKey="Wu C" first="Chi-Hong" last="Wu">Chi-Hong Wu</name>
</country>
</tree>
</affiliations>
</record>
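To pull the main bibliographic fields out of a record like the one above without Dilib, a short Python sketch using the standard-library `xml.etree.ElementTree` module is enough. One wrinkle: the record uses the `wicri:` prefix without declaring a namespace for it, which ElementTree rejects as an unbound prefix, so the sketch binds that prefix to a placeholder URI before parsing. The URI, the function name `parse_record`, and the trimmed sample record are assumptions for illustration; the element paths follow the structure of the record shown above.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Placeholder namespace URI for the undeclared "wicri:" prefix (an assumption,
# not an official Wicri namespace).
WICRI_NS = "urn:example:wicri"


def parse_record(xml_text: str) -> Dict[str, object]:
    """Extract title, author names, and DOI from a Wicri/TEI <record>."""
    # Declare the wicri prefix on the root element so ElementTree can parse it.
    fixed = xml_text.replace("<record>", f'<record xmlns:wicri="{WICRI_NS}">', 1)
    root = ET.fromstring(fixed)
    file_desc = root.find("TEI/teiHeader/fileDesc")
    if file_desc is None:
        raise ValueError("no TEI/teiHeader/fileDesc element found")
    title: Optional[str] = file_desc.findtext("titleStmt/title")
    authors: List[str] = [
        name.text or "" for name in file_desc.findall("titleStmt/author/name")
    ]
    doi: Optional[str] = file_desc.findtext("publicationStmt/idno[@type='doi']")
    return {"title": title, "authors": authors, "doi": doi}


# Trimmed excerpt of the record above (only the elements the sketch reads).
SAMPLE = """<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Chinese Handwritten Character Segmentation in Form Documents</title>
<author><name last="Chen">Jiun-Lin Chen</name></author>
<author><name last="Wu">Chi-Hong Wu</name></author>
<author><name last="Lee">Hsi-Jian Lee</name></author>
</titleStmt>
<publicationStmt>
<idno type="RBID">ISTEX:F710E043F28CD45F3198B3485EF75D8F3CD4FA1A</idno>
<idno type="doi">10.1007/3-540-48172-9_28</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""

if __name__ == "__main__":
    print(parse_record(SAMPLE))
```

Rewriting the root tag with `str.replace` is a deliberate shortcut: it only works when the root element is written exactly as `<record>`, as it is in this record.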

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002001 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 002001 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:F710E043F28CD45F3198B3485EF75D8F3CD4FA1A
   |texte=   Chinese Handwritten Character Segmentation in Form Documents
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024