Chinese text distinction and font identification by recognizing most frequently used characters
Internal identifier: 001C50 (Main/Merge); previous: 001C49; next: 001C51
Authors: Chi-Fang Lin [Taiwan, People's Republic of China]; Yu-Fan Fang [People's Republic of China]; Yau-Tarng Juang [People's Republic of China]
Source:
- Image and Vision Computing [ 0262-8856 ] ; 2000.
English descriptors
- KwdEn: Character recognition; Feature extraction; Font identification; Template matching; Text distinction
Abstract
This study proposes a method for implementing three functions that can greatly assist a traditional OCCR (Optical Chinese Character Recognition) system: (1) identifying the font used in a document; (2) detecting and recognizing the most frequently used (MFU) characters; and (3) distinguishing between machine-printed and hand-written characters. According to a study by Chang and Chen (Proceedings of the ICCC, 1994, pp. 310–316), the top-40 MFU characters account for about 20% of the Chinese characters in a text document. If those MFU characters can be detected before the traditional OCCR method is applied, computation time can be reduced considerably. The proposed character-detection method consists of three stages: segmentation, feature extraction, and classification. In the first stage, the projection-profile-based method presented by Wang et al. (Pattern Recognition 30 (1997) 1213) is used to segment individual characters from the input text document. In the second stage, three types of features are introduced: the density of black pixels, the projection profile code, and the modified skeleton template. These features are used to check whether a segmented character is semi-matched or fully matched with an MFU template. In the last stage, based on the matching result, three algorithms implementing the aforementioned functions are provided. Experimental results are given to demonstrate the practicality and superiority of the proposed method.
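As a rough illustration of two ingredients named in the abstract, projection-profile segmentation and black-pixel density, the sketch below works on a tiny binary image stored as nested lists. The function names (`column_profile`, `segment_columns`, `black_density`) and the toy image are hypothetical and not taken from the paper; the actual segmentation method of Wang et al. and the authors' feature definitions may well differ.

```python
def column_profile(image):
    """Count black pixels (value 1) in each column of a binary image."""
    return [sum(col) for col in zip(*image)]


def segment_columns(image):
    """Split a text line into (start, end) column spans at blank columns.

    `end` is exclusive. This is a simplified assumption: real Chinese
    characters can contain internal gaps that need extra merging logic.
    """
    spans = []
    start = None
    for x, count in enumerate(column_profile(image)):
        if count > 0 and start is None:
            start = x                      # entering a run of inked columns
        elif count == 0 and start is not None:
            spans.append((start, x))       # leaving the run at a blank column
            start = None
    if start is not None:                  # run reaches the right edge
        spans.append((start, len(image[0])))
    return spans


def black_density(image, span):
    """Fraction of black pixels inside the given column span."""
    start, end = span
    black = sum(sum(row[start:end]) for row in image)
    return black / ((end - start) * len(image))


# Hypothetical 3x6 text line containing two glyph-like blobs.
TOY_LINE = [
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 0],
]

if __name__ == "__main__":
    for span in segment_columns(TOY_LINE):
        print(span, round(black_density(TOY_LINE, span), 3))
```

A density computed this way could serve as a cheap first filter before more expensive template matching, which is the general spirit of the staged matching the abstract describes.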
Url: https://api.istex.fr/document/4A8175B424D8D0E33BD442A591B43A5C1A0428A3/fulltext/pdf
DOI: 10.1016/S0262-8856(00)00082-2
Links toward previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 000057
- to stream Istex, to step Curation: 000056
- to stream Istex, to step Checkpoint: 001211
Links to Exploration step
ISTEX:4A8175B424D8D0E33BD442A591B43A5C1A0428A3
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Chinese text distinction and font identification by recognizing most frequently used characters</title>
<author><name sortKey="Lin, Chi Fang" sort="Lin, Chi Fang" uniqKey="Lin C" first="Chi-Fang" last="Lin">Chi-Fang Lin</name>
</author>
<author><name sortKey="Fang, Yu Fan" sort="Fang, Yu Fan" uniqKey="Fang Y" first="Yu-Fan" last="Fang">Yu-Fan Fang</name>
</author>
<author><name sortKey="Juang, Yau Tarng" sort="Juang, Yau Tarng" uniqKey="Juang Y" first="Yau-Tarng" last="Juang">Yau-Tarng Juang</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:4A8175B424D8D0E33BD442A591B43A5C1A0428A3</idno>
<date when="2001" year="2001">2001</date>
<idno type="doi">10.1016/S0262-8856(00)00082-2</idno>
<idno type="url">https://api.istex.fr/document/4A8175B424D8D0E33BD442A591B43A5C1A0428A3/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000057</idno>
<idno type="wicri:Area/Istex/Curation">000056</idno>
<idno type="wicri:Area/Istex/Checkpoint">001211</idno>
<idno type="wicri:doubleKey">0262-8856:2001:Lin C:chinese:text:distinction</idno>
<idno type="wicri:Area/Main/Merge">001C50</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Chinese text distinction and font identification by recognizing most frequently used characters</title>
<author><name sortKey="Lin, Chi Fang" sort="Lin, Chi Fang" uniqKey="Lin C" first="Chi-Fang" last="Lin">Chi-Fang Lin</name>
<affiliation wicri:level="1"><country wicri:rule="url">Taïwan</country>
</affiliation>
<affiliation wicri:level="1"><country xml:lang="fr" wicri:curation="lc">République populaire de Chine</country>
<wicri:regionArea>Department of Computer Engineering and Science, Yuan-Ze University, Chung-Li 320, Taiwan</wicri:regionArea>
<wicri:noRegion>Taiwan</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Fang, Yu Fan" sort="Fang, Yu Fan" uniqKey="Fang Y" first="Yu-Fan" last="Fang">Yu-Fan Fang</name>
<affiliation wicri:level="1"><country xml:lang="fr" wicri:curation="lc">République populaire de Chine</country>
<wicri:regionArea>Institute of Computer Science and Electronic Engineering, National Center University, Chung-Li 320, Taiwan</wicri:regionArea>
<wicri:noRegion>Taiwan</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Juang, Yau Tarng" sort="Juang, Yau Tarng" uniqKey="Juang Y" first="Yau-Tarng" last="Juang">Yau-Tarng Juang</name>
<affiliation wicri:level="1"><country xml:lang="fr" wicri:curation="lc">République populaire de Chine</country>
<wicri:regionArea>Institute of Computer Science and Electronic Engineering, National Center University, Chung-Li 320, Taiwan</wicri:regionArea>
<wicri:noRegion>Taiwan</wicri:noRegion>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="j">Image and Vision Computing</title>
<title level="j" type="abbrev">IMAVIS</title>
<idno type="ISSN">0262-8856</idno>
<imprint><publisher>ELSEVIER</publisher>
<date type="published" when="2000">2000</date>
<biblScope unit="volume">19</biblScope>
<biblScope unit="issue">6</biblScope>
<biblScope unit="page" from="329">329</biblScope>
<biblScope unit="page" to="338">338</biblScope>
</imprint>
<idno type="ISSN">0262-8856</idno>
</series>
<idno type="istex">4A8175B424D8D0E33BD442A591B43A5C1A0428A3</idno>
<idno type="DOI">10.1016/S0262-8856(00)00082-2</idno>
<idno type="PII">S0262-8856(00)00082-2</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0262-8856</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Character recognition</term>
<term>Feature extraction</term>
<term>Font identification</term>
<term>Template matching</term>
<term>Text distinction</term>
</keywords>
</textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">In this study, the method of implementing the three functions that can offer great help for a traditional OCCR (Optical Chinese Character Recognition) system is proposed: (1) to identify the font used in a document; (2) to detect and recognize the most frequently used (MFU) characters; and (3) to distinguish between the machine-printed and hand-written characters. According to the study investigated by Chang and Chen (Proceedings of the ICCC, 1994, pp. 310–316), about 20% of Chinese characters in a text document are predominated by the top-40 MFU characters. If those MFU characters in a text document can be detected before adopting the traditional OCCR method, there will be great savings in computation time. The proposed method for character detection consists of the following three stages: the stage of segmentation, the stage of feature extraction, and the stage of classification. In the first stage, based on the concept of projection profile, the method presented by Wang et al. (Pattern Recognition 30 (1997) 1213) is utilized to segment characters individually from the input text document. In the second stage, three different types of features are introduced, including the density of black pixels, the projection profile code, and the modified skeleton template. These features are used to check whether the segmented character is semi-matched or fully-matched with the MFU template. Finally, in the last stage, based on the matching result, three different algorithms for implementing the aforementioned functions are provided. Experimental results are given in this study to demonstrate the practicality and superiority of the proposed method.</div>
</front>
</TEI>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001C50 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 001C50 | SxmlIndent | more
To add a link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Merge |type= RBID |clé= ISTEX:4A8175B424D8D0E33BD442A591B43A5C1A0428A3 |texte= Chinese text distinction and font identification by recognizing most frequently used characters }}
This area was generated with Dilib version V0.6.32.