OCR exploration server

Warning: this site is under development!
Warning: this site is generated automatically from raw corpora.
The information it contains has therefore not been validated.

Chinese text distinction and font identification by recognizing most frequently used characters

Internal identifier: 000057 (Istex/Corpus); previous: 000056; next: 000058

Chinese text distinction and font identification by recognizing most frequently used characters

Auteurs : Chi-Fang Lin ; Yu-Fan Fang ; Yau-Tarng Juang

Source:

RBID : ISTEX:4A8175B424D8D0E33BD442A591B43A5C1A0428A3

English descriptors

Abstract

In this study, the method of implementing the three functions that can offer great help for a traditional OCCR (Optical Chinese Character Recognition) system is proposed: (1) to identify the font used in a document; (2) to detect and recognize the most frequently used (MFU) characters; and (3) to distinguish between the machine-printed and hand-written characters. According to the study investigated by Chang and Chen (Proceedings of the ICCC, 1994, pp. 310–316), about 20% of Chinese characters in a text document are predominated by the top-40 MFU characters. If those MFU characters in a text document can be detected before adopting the traditional OCCR method, there will be great savings in computation time. The proposed method for character detection consists of the following three stages: the stage of segmentation, the stage of feature extraction, and the stage of classification. In the first stage, based on the concept of projection profile, the method presented by Wang et al. (Pattern Recognition 30 (1997) 1213) is utilized to segment characters individually from the input text document. In the second stage, three different types of features are introduced, including the density of black pixels, the projection profile code, and the modified skeleton template. These features are used to check whether the segmented character is semi-matched or fully-matched with the MFU template. Finally, in the last stage, based on the matching result, three different algorithms for implementing the aforementioned functions are provided. Experimental results are given in this study to demonstrate the practicality and superiority of the proposed method.
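The abstract's second stage describes three cheap features computed on each segmented character: the density of black pixels, a projection profile code, and a modified skeleton template. As an illustration only, a minimal sketch of the first two features and a hypothetical semi-match test follows (the function names, the raw-profile representation, and the density tolerance are assumptions; the paper's actual coding scheme and thresholds are not reproduced here):

```python
def black_pixel_density(img):
    # img: binarized character image as a list of rows of 0/1 ints (1 = black).
    total = sum(sum(row) for row in img)
    return total / (len(img) * len(img[0]))

def projection_profiles(img):
    # Per-row and per-column black-pixel counts: the raw data behind a
    # "projection profile code" (the paper quantizes these; we return them as-is).
    rows = [sum(r) for r in img]
    cols = [sum(c) for c in zip(*img)]
    return rows, cols

def is_semi_matched(img, template, tol=0.05):
    # Hypothetical first-pass filter: a character is "semi-matched" with an
    # MFU template when their black-pixel densities agree within a tolerance;
    # only semi-matched candidates would go on to full template matching.
    return abs(black_pixel_density(img) - black_pixel_density(template)) <= tol
```

In a pipeline like the one described, such a density check would discard most non-MFU characters before the more expensive skeleton-template comparison.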

URL:
DOI: 10.1016/S0262-8856(00)00082-2

Links to Exploration step

ISTEX:4A8175B424D8D0E33BD442A591B43A5C1A0428A3

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Chinese text distinction and font identification by recognizing most frequently used characters</title>
<author>
<name sortKey="Lin, Chi Fang" sort="Lin, Chi Fang" uniqKey="Lin C" first="Chi-Fang" last="Lin">Chi-Fang Lin</name>
<affiliation>
<mods:affiliation>E-mail: cscflin@cs.yzu.edu.tw</mods:affiliation>
</affiliation>
<affiliation>
<mods:affiliation>Department of Computer Engineering and Science, Yuan-Ze University, Chung-Li 320, Taiwan, ROC</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Fang, Yu Fan" sort="Fang, Yu Fan" uniqKey="Fang Y" first="Yu-Fan" last="Fang">Yu-Fan Fang</name>
<affiliation>
<mods:affiliation>Institute of Computer Science and Electronic Engineering, National Center University, Chung-Li 320, Taiwan, ROC</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Juang, Yau Tarng" sort="Juang, Yau Tarng" uniqKey="Juang Y" first="Yau-Tarng" last="Juang">Yau-Tarng Juang</name>
<affiliation>
<mods:affiliation>Institute of Computer Science and Electronic Engineering, National Center University, Chung-Li 320, Taiwan, ROC</mods:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:4A8175B424D8D0E33BD442A591B43A5C1A0428A3</idno>
<date when="2001" year="2001">2001</date>
<idno type="doi">10.1016/S0262-8856(00)00082-2</idno>
<idno type="url">https://api.istex.fr/document/4A8175B424D8D0E33BD442A591B43A5C1A0428A3/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000057</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Chinese text distinction and font identification by recognizing most frequently used characters</title>
<author>
<name sortKey="Lin, Chi Fang" sort="Lin, Chi Fang" uniqKey="Lin C" first="Chi-Fang" last="Lin">Chi-Fang Lin</name>
<affiliation>
<mods:affiliation>E-mail: cscflin@cs.yzu.edu.tw</mods:affiliation>
</affiliation>
<affiliation>
<mods:affiliation>Department of Computer Engineering and Science, Yuan-Ze University, Chung-Li 320, Taiwan, ROC</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Fang, Yu Fan" sort="Fang, Yu Fan" uniqKey="Fang Y" first="Yu-Fan" last="Fang">Yu-Fan Fang</name>
<affiliation>
<mods:affiliation>Institute of Computer Science and Electronic Engineering, National Center University, Chung-Li 320, Taiwan, ROC</mods:affiliation>
</affiliation>
</author>
<author>
<name sortKey="Juang, Yau Tarng" sort="Juang, Yau Tarng" uniqKey="Juang Y" first="Yau-Tarng" last="Juang">Yau-Tarng Juang</name>
<affiliation>
<mods:affiliation>Institute of Computer Science and Electronic Engineering, National Center University, Chung-Li 320, Taiwan, ROC</mods:affiliation>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">Image and Vision Computing</title>
<title level="j" type="abbrev">IMAVIS</title>
<idno type="ISSN">0262-8856</idno>
<imprint>
<publisher>ELSEVIER</publisher>
<date type="published" when="2000">2000</date>
<biblScope unit="volume">19</biblScope>
<biblScope unit="issue">6</biblScope>
<biblScope unit="page" from="329">329</biblScope>
<biblScope unit="page" to="338">338</biblScope>
</imprint>
<idno type="ISSN">0262-8856</idno>
</series>
<idno type="istex">4A8175B424D8D0E33BD442A591B43A5C1A0428A3</idno>
<idno type="DOI">10.1016/S0262-8856(00)00082-2</idno>
<idno type="PII">S0262-8856(00)00082-2</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0262-8856</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Character recognition</term>
<term>Feature extraction</term>
<term>Font identification</term>
<term>Template matching</term>
<term>Text distinction</term>
</keywords>
</textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">In this study, the method of implementing the three functions that can offer great help for a traditional OCCR (Optical Chinese Character Recognition) system is proposed: (1) to identify the font used in a document; (2) to detect and recognize the most frequently used (MFU) characters; and (3) to distinguish between the machine-printed and hand-written characters. According to the study investigated by Chang and Chen (Proceedings of the ICCC, 1994, pp. 310–316), about 20% of Chinese characters in a text document are predominated by the top-40 MFU characters. If those MFU characters in a text document can be detected before adopting the traditional OCCR method, there will be great savings in computation time. The proposed method for character detection consists of the following three stages: the stage of segmentation, the stage of feature extraction, and the stage of classification. In the first stage, based on the concept of projection profile, the method presented by Wang et al. (Pattern Recognition 30 (1997) 1213) is utilized to segment characters individually from the input text document. In the second stage, three different types of features are introduced, including the density of black pixels, the projection profile code, and the modified skeleton template. These features are used to check whether the segmented character is semi-matched or fully-matched with the MFU template. Finally, in the last stage, based on the matching result, three different algorithms for implementing the aforementioned functions are provided. Experimental results are given in this study to demonstrate the practicality and superiority of the proposed method.</div>
</front>
</TEI>
<istex>
<corpusName>elsevier</corpusName>
<author>
<json:item>
<name>Chi-Fang Lin</name>
<affiliations>
<json:string>E-mail: cscflin@cs.yzu.edu.tw</json:string>
<json:string>Department of Computer Engineering and Science, Yuan-Ze University, Chung-Li 320, Taiwan, ROC</json:string>
</affiliations>
</json:item>
<json:item>
<name>Yu-Fan Fang</name>
<affiliations>
<json:string>Institute of Computer Science and Electronic Engineering, National Center University, Chung-Li 320, Taiwan, ROC</json:string>
</affiliations>
</json:item>
<json:item>
<name>Yau-Tarng Juang</name>
<affiliations>
<json:string>Institute of Computer Science and Electronic Engineering, National Center University, Chung-Li 320, Taiwan, ROC</json:string>
</affiliations>
</json:item>
</author>
<subject>
<json:item>
<lang>
<json:string>eng</json:string>
</lang>
<value>Feature extraction</value>
</json:item>
<json:item>
<lang>
<json:string>eng</json:string>
</lang>
<value>Template matching</value>
</json:item>
<json:item>
<lang>
<json:string>eng</json:string>
</lang>
<value>Character recognition</value>
</json:item>
<json:item>
<lang>
<json:string>eng</json:string>
</lang>
<value>Font identification</value>
</json:item>
<json:item>
<lang>
<json:string>eng</json:string>
</lang>
<value>Text distinction</value>
</json:item>
</subject>
<language>
<json:string>eng</json:string>
</language>
<abstract>In this study, the method of implementing the three functions that can offer great help for a traditional OCCR (Optical Chinese Character Recognition) system is proposed: (1) to identify the font used in a document; (2) to detect and recognize the most frequently used (MFU) characters; and (3) to distinguish between the machine-printed and hand-written characters. According to the study investigated by Chang and Chen (Proceedings of the ICCC, 1994, pp. 310–316), about 20% of Chinese characters in a text document are predominated by the top-40 MFU characters. If those MFU characters in a text document can be detected before adopting the traditional OCCR method, there will be great savings in computation time. The proposed method for character detection consists of the following three stages: the stage of segmentation, the stage of feature extraction, and the stage of classification. In the first stage, based on the concept of projection profile, the method presented by Wang et al. (Pattern Recognition 30 (1997) 1213) is utilized to segment characters individually from the input text document. In the second stage, three different types of features are introduced, including the density of black pixels, the projection profile code, and the modified skeleton template. These features are used to check whether the segmented character is semi-matched or fully-matched with the MFU template. Finally, in the last stage, based on the matching result, three different algorithms for implementing the aforementioned functions are provided. Experimental results are given in this study to demonstrate the practicality and superiority of the proposed method.</abstract>
<qualityIndicators>
<score>7.93</score>
<pdfVersion>1.2</pdfVersion>
<pdfPageSize>595 x 794 pts</pdfPageSize>
<refBibsNative>true</refBibsNative>
<keywordCount>5</keywordCount>
<abstractCharCount>1666</abstractCharCount>
<pdfWordCount>4930</pdfWordCount>
<pdfCharCount>28717</pdfCharCount>
<pdfPageCount>10</pdfPageCount>
<abstractWordCount>254</abstractWordCount>
</qualityIndicators>
<title>Chinese text distinction and font identification by recognizing most frequently used characters</title>
<pii>
<json:string>S0262-8856(00)00082-2</json:string>
</pii>
<genre>
<json:string>research-article</json:string>
</genre>
<host>
<volume>19</volume>
<pii>
<json:string>S0262-8856(00)X0074-1</json:string>
</pii>
<pages>
<last>338</last>
<first>329</first>
</pages>
<issn>
<json:string>0262-8856</json:string>
</issn>
<issue>6</issue>
<genre>
<json:string>Journal</json:string>
</genre>
<language>
<json:string>unknown</json:string>
</language>
<title>Image and Vision Computing</title>
<publicationDate>2001</publicationDate>
</host>
<categories>
<wos>
<json:string>COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE</json:string>
<json:string>COMPUTER SCIENCE, SOFTWARE ENGINEERING</json:string>
<json:string>COMPUTER SCIENCE, THEORY & METHODS</json:string>
<json:string>ENGINEERING, ELECTRICAL & ELECTRONIC</json:string>
<json:string>OPTICS</json:string>
</wos>
</categories>
<publicationDate>2000</publicationDate>
<copyrightDate>2001</copyrightDate>
<doi>
<json:string>10.1016/S0262-8856(00)00082-2</json:string>
</doi>
<id>4A8175B424D8D0E33BD442A591B43A5C1A0428A3</id>
<fulltext>
<json:item>
<original>true</original>
<mimetype>application/pdf</mimetype>
<extension>pdf</extension>
<uri>https://api.istex.fr/document/4A8175B424D8D0E33BD442A591B43A5C1A0428A3/fulltext/pdf</uri>
</json:item>
<json:item>
<original>true</original>
<mimetype>text/plain</mimetype>
<extension>txt</extension>
<uri>https://api.istex.fr/document/4A8175B424D8D0E33BD442A591B43A5C1A0428A3/fulltext/txt</uri>
</json:item>
<json:item>
<original>false</original>
<mimetype>application/zip</mimetype>
<extension>zip</extension>
<uri>https://api.istex.fr/document/4A8175B424D8D0E33BD442A591B43A5C1A0428A3/fulltext/zip</uri>
</json:item>
<istex:fulltextTEI uri="https://api.istex.fr/document/4A8175B424D8D0E33BD442A591B43A5C1A0428A3/fulltext/tei">
<teiHeader>
<fileDesc>
<titleStmt>
<title level="a" type="main" xml:lang="en">Chinese text distinction and font identification by recognizing most frequently used characters</title>
</titleStmt>
<publicationStmt>
<authority>ISTEX</authority>
<publisher>ELSEVIER</publisher>
<availability>
<p>ELSEVIER</p>
</availability>
<date>2001</date>
</publicationStmt>
<notesStmt>
<note type="content">Fig. 1: The flowchart of the proposed system.</note>
<note type="content">Fig. 2: The result of character segmentation. (a) The original image. (b) The segmentation result.</note>
<note type="content">Fig. 3: An illustration for the method of projection profile coding.</note>
<note type="content">Fig. 4: The thinning result. (a) The original character. (b) The skeleton template after the removal of thin strokes.</note>
<note type="content">Fig. 5: Test document I.</note>
<note type="content">Fig. 6: Test document J.</note>
<note type="content">Fig. 7: The chart of the recognition rate and the number of MFU characters selected.</note>
<note type="content">Fig. 8: The test image used for measuring the performance of our method.</note>
<note type="content">Fig. 9: The top-40 MFU characters detected for the test image shown in Fig. 6.</note>
<note type="content">Table 1: The frequency table of the top-50 MFU Chinese characters</note>
<note type="content">Table 2: The list of the selected top-40 MFU characters</note>
<note type="content">Table 3: The summarized results for test images A–H</note>
<note type="content">Table 4: The summarized results for the five sets of test images</note>
</notesStmt>
<sourceDesc>
<biblStruct type="inbook">
<analytic>
<title level="a" type="main" xml:lang="en">Chinese text distinction and font identification by recognizing most frequently used characters</title>
<author>
<persName>
<forename type="first">Chi-Fang</forename>
<surname>Lin</surname>
</persName>
<email>cscflin@cs.yzu.edu.tw</email>
<note type="correspondence">
<p>Corresponding author. Tel.: +886-3-463-8800; fax: +886-3-463-8850</p>
</note>
<affiliation>Department of Computer Engineering and Science, Yuan-Ze University, Chung-Li 320, Taiwan, ROC</affiliation>
</author>
<author>
<persName>
<forename type="first">Yu-Fan</forename>
<surname>Fang</surname>
</persName>
<affiliation>Institute of Computer Science and Electronic Engineering, National Center University, Chung-Li 320, Taiwan, ROC</affiliation>
</author>
<author>
<persName>
<forename type="first">Yau-Tarng</forename>
<surname>Juang</surname>
</persName>
<affiliation>Institute of Computer Science and Electronic Engineering, National Center University, Chung-Li 320, Taiwan, ROC</affiliation>
</author>
</analytic>
<monogr>
<title level="j">Image and Vision Computing</title>
<title level="j" type="abbrev">IMAVIS</title>
<idno type="pISSN">0262-8856</idno>
<idno type="PII">S0262-8856(00)X0074-1</idno>
<imprint>
<publisher>ELSEVIER</publisher>
<date type="published" when="2000"></date>
<biblScope unit="volume">19</biblScope>
<biblScope unit="issue">6</biblScope>
<biblScope unit="page" from="329">329</biblScope>
<biblScope unit="page" to="338">338</biblScope>
</imprint>
</monogr>
<idno type="istex">4A8175B424D8D0E33BD442A591B43A5C1A0428A3</idno>
<idno type="DOI">10.1016/S0262-8856(00)00082-2</idno>
<idno type="PII">S0262-8856(00)00082-2</idno>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<creation>
<date>2001</date>
</creation>
<langUsage>
<language ident="en">en</language>
</langUsage>
<abstract xml:lang="en">
<p>In this study, the method of implementing the three functions that can offer great help for a traditional OCCR (Optical Chinese Character Recognition) system is proposed: (1) to identify the font used in a document; (2) to detect and recognize the most frequently used (MFU) characters; and (3) to distinguish between the machine-printed and hand-written characters. According to the study investigated by Chang and Chen (Proceedings of the ICCC, 1994, pp. 310–316), about 20% of Chinese characters in a text document are predominated by the top-40 MFU characters. If those MFU characters in a text document can be detected before adopting the traditional OCCR method, there will be great savings in computation time. The proposed method for character detection consists of the following three stages: the stage of segmentation, the stage of feature extraction, and the stage of classification. In the first stage, based on the concept of projection profile, the method presented by Wang et al. (Pattern Recognition 30 (1997) 1213) is utilized to segment characters individually from the input text document. In the second stage, three different types of features are introduced, including the density of black pixels, the projection profile code, and the modified skeleton template. These features are used to check whether the segmented character is semi-matched or fully-matched with the MFU template. Finally, in the last stage, based on the matching result, three different algorithms for implementing the aforementioned functions are provided. Experimental results are given in this study to demonstrate the practicality and superiority of the proposed method.</p>
</abstract>
<textClass xml:lang="en">
<keywords scheme="keyword">
<list>
<head>Keywords</head>
<item>
<term>Feature extraction</term>
</item>
<item>
<term>Template matching</term>
</item>
<item>
<term>Character recognition</term>
</item>
<item>
<term>Font identification</term>
</item>
<item>
<term>Text distinction</term>
</item>
</list>
</keywords>
</textClass>
</profileDesc>
<revisionDesc>
<change when="2000-09-26">Registration</change>
<change when="2000-08-09">Modified</change>
<change when="2000">Published</change>
</revisionDesc>
</teiHeader>
</istex:fulltextTEI>
</fulltext>
<metadata>
<istex:metadataXml wicri:clean="Elsevier, elements deleted: ce:floats; body; tail">
<istex:xmlDeclaration>version="1.0" encoding="utf-8"</istex:xmlDeclaration>
<istex:docType PUBLIC="-//ES//DTD journal article DTD version 4.5.2//EN//XML" URI="art452.dtd" name="istex:docType">
<istex:entity SYSTEM="fx1" NDATA="IMAGE" name="fx1"></istex:entity>
<istex:entity SYSTEM="fx26" NDATA="IMAGE" name="fx26"></istex:entity>
<istex:entity SYSTEM="fx2" NDATA="IMAGE" name="fx2"></istex:entity>
<istex:entity SYSTEM="fx27" NDATA="IMAGE" name="fx27"></istex:entity>
<istex:entity SYSTEM="fx3" NDATA="IMAGE" name="fx3"></istex:entity>
<istex:entity SYSTEM="fx28" NDATA="IMAGE" name="fx28"></istex:entity>
<istex:entity SYSTEM="fx4" NDATA="IMAGE" name="fx4"></istex:entity>
<istex:entity SYSTEM="fx29" NDATA="IMAGE" name="fx29"></istex:entity>
<istex:entity SYSTEM="fx5" NDATA="IMAGE" name="fx5"></istex:entity>
<istex:entity SYSTEM="fx30" NDATA="IMAGE" name="fx30"></istex:entity>
<istex:entity SYSTEM="fx6" NDATA="IMAGE" name="fx6"></istex:entity>
<istex:entity SYSTEM="fx31" NDATA="IMAGE" name="fx31"></istex:entity>
<istex:entity SYSTEM="fx7" NDATA="IMAGE" name="fx7"></istex:entity>
<istex:entity SYSTEM="fx32" NDATA="IMAGE" name="fx32"></istex:entity>
<istex:entity SYSTEM="fx8" NDATA="IMAGE" name="fx8"></istex:entity>
<istex:entity SYSTEM="fx33" NDATA="IMAGE" name="fx33"></istex:entity>
<istex:entity SYSTEM="fx9" NDATA="IMAGE" name="fx9"></istex:entity>
<istex:entity SYSTEM="fx34" NDATA="IMAGE" name="fx34"></istex:entity>
<istex:entity SYSTEM="fx10" NDATA="IMAGE" name="fx10"></istex:entity>
<istex:entity SYSTEM="fx35" NDATA="IMAGE" name="fx35"></istex:entity>
<istex:entity SYSTEM="fx11" NDATA="IMAGE" name="fx11"></istex:entity>
<istex:entity SYSTEM="fx36" NDATA="IMAGE" name="fx36"></istex:entity>
<istex:entity SYSTEM="fx12" NDATA="IMAGE" name="fx12"></istex:entity>
<istex:entity SYSTEM="fx37" NDATA="IMAGE" name="fx37"></istex:entity>
<istex:entity SYSTEM="fx13" NDATA="IMAGE" name="fx13"></istex:entity>
<istex:entity SYSTEM="fx38" NDATA="IMAGE" name="fx38"></istex:entity>
<istex:entity SYSTEM="fx14" NDATA="IMAGE" name="fx14"></istex:entity>
<istex:entity SYSTEM="fx39" NDATA="IMAGE" name="fx39"></istex:entity>
<istex:entity SYSTEM="fx15" NDATA="IMAGE" name="fx15"></istex:entity>
<istex:entity SYSTEM="fx40" NDATA="IMAGE" name="fx40"></istex:entity>
<istex:entity SYSTEM="fx16" NDATA="IMAGE" name="fx16"></istex:entity>
<istex:entity SYSTEM="fx41" NDATA="IMAGE" name="fx41"></istex:entity>
<istex:entity SYSTEM="fx17" NDATA="IMAGE" name="fx17"></istex:entity>
<istex:entity SYSTEM="fx42" NDATA="IMAGE" name="fx42"></istex:entity>
<istex:entity SYSTEM="fx18" NDATA="IMAGE" name="fx18"></istex:entity>
<istex:entity SYSTEM="fx43" NDATA="IMAGE" name="fx43"></istex:entity>
<istex:entity SYSTEM="fx19" NDATA="IMAGE" name="fx19"></istex:entity>
<istex:entity SYSTEM="fx44" NDATA="IMAGE" name="fx44"></istex:entity>
<istex:entity SYSTEM="fx20" NDATA="IMAGE" name="fx20"></istex:entity>
<istex:entity SYSTEM="fx45" NDATA="IMAGE" name="fx45"></istex:entity>
<istex:entity SYSTEM="fx21" NDATA="IMAGE" name="fx21"></istex:entity>
<istex:entity SYSTEM="fx46" NDATA="IMAGE" name="fx46"></istex:entity>
<istex:entity SYSTEM="fx22" NDATA="IMAGE" name="fx22"></istex:entity>
<istex:entity SYSTEM="fx47" NDATA="IMAGE" name="fx47"></istex:entity>
<istex:entity SYSTEM="fx23" NDATA="IMAGE" name="fx23"></istex:entity>
<istex:entity SYSTEM="fx48" NDATA="IMAGE" name="fx48"></istex:entity>
<istex:entity SYSTEM="fx24" NDATA="IMAGE" name="fx24"></istex:entity>
<istex:entity SYSTEM="fx49" NDATA="IMAGE" name="fx49"></istex:entity>
<istex:entity SYSTEM="fx25" NDATA="IMAGE" name="fx25"></istex:entity>
<istex:entity SYSTEM="fx50" NDATA="IMAGE" name="fx50"></istex:entity>
<istex:entity SYSTEM="gr1" NDATA="IMAGE" name="gr1"></istex:entity>
<istex:entity SYSTEM="gr2" NDATA="IMAGE" name="gr2"></istex:entity>
<istex:entity SYSTEM="gr3" NDATA="IMAGE" name="gr3"></istex:entity>
<istex:entity SYSTEM="gr4" NDATA="IMAGE" name="gr4"></istex:entity>
<istex:entity SYSTEM="gr5" NDATA="IMAGE" name="gr5"></istex:entity>
<istex:entity SYSTEM="gr6" NDATA="IMAGE" name="gr6"></istex:entity>
<istex:entity SYSTEM="gr7" NDATA="IMAGE" name="gr7"></istex:entity>
<istex:entity SYSTEM="gr8" NDATA="IMAGE" name="gr8"></istex:entity>
<istex:entity SYSTEM="gr9" NDATA="IMAGE" name="gr9"></istex:entity>
</istex:docType>
<istex:document>
<converted-article version="4.5.2" docsubtype="fla" xml:lang="en">
<item-info>
<jid>IMAVIS</jid>
<aid>1766</aid>
<ce:pii>S0262-8856(00)00082-2</ce:pii>
<ce:doi>10.1016/S0262-8856(00)00082-2</ce:doi>
<ce:copyright type="full-transfer" year="2001">Elsevier Science B.V.</ce:copyright>
</item-info>
<head>
<ce:title>Chinese text distinction and font identification by recognizing most frequently used characters</ce:title>
<ce:author-group>
<ce:author>
<ce:given-name>Chi-Fang</ce:given-name>
<ce:surname>Lin</ce:surname>
<ce:cross-ref refid="AFF1">
<ce:sup>a</ce:sup>
</ce:cross-ref>
<ce:cross-ref refid="CORR1">*</ce:cross-ref>
<ce:e-address>cscflin@cs.yzu.edu.tw</ce:e-address>
</ce:author>
<ce:author>
<ce:given-name>Yu-Fan</ce:given-name>
<ce:surname>Fang</ce:surname>
<ce:cross-ref refid="AFF2">
<ce:sup>b</ce:sup>
</ce:cross-ref>
</ce:author>
<ce:author>
<ce:given-name>Yau-Tarng</ce:given-name>
<ce:surname>Juang</ce:surname>
<ce:cross-ref refid="AFF2">
<ce:sup>b</ce:sup>
</ce:cross-ref>
</ce:author>
<ce:affiliation id="AFF1">
<ce:label>a</ce:label>
<ce:textfn>Department of Computer Engineering and Science, Yuan-Ze University, Chung-Li 320, Taiwan, ROC</ce:textfn>
</ce:affiliation>
<ce:affiliation id="AFF2">
<ce:label>b</ce:label>
<ce:textfn>Institute of Computer Science and Electronic Engineering, National Center University, Chung-Li 320, Taiwan, ROC</ce:textfn>
</ce:affiliation>
<ce:correspondence id="CORR1">
<ce:label>*</ce:label>
<ce:text>Corresponding author. Tel.: +886-3-463-8800; fax: +886-3-463-8850</ce:text>
</ce:correspondence>
</ce:author-group>
<ce:date-received day="17" month="2" year="1999"></ce:date-received>
<ce:date-revised day="9" month="8" year="2000"></ce:date-revised>
<ce:date-accepted day="26" month="9" year="2000"></ce:date-accepted>
<ce:abstract>
<ce:section-title>Abstract</ce:section-title>
<ce:abstract-sec>
<ce:simple-para>In this study, the method of implementing the three functions that can offer great help for a traditional OCCR (Optical Chinese Character Recognition) system is proposed: (1) to identify the font used in a document; (2) to detect and recognize the most frequently used (MFU) characters; and (3) to distinguish between the machine-printed and hand-written characters. According to the study investigated by Chang and Chen (Proceedings of the ICCC, 1994, pp. 310–316), about 20% of Chinese characters in a text document are predominated by the top-40 MFU characters. If those MFU characters in a text document can be detected before adopting the traditional OCCR method, there will be great savings in computation time.</ce:simple-para>
<ce:simple-para>The proposed method for character detection consists of the following three stages: the stage of segmentation, the stage of feature extraction, and the stage of classification. In the first stage, based on the concept of projection profile, the method presented by Wang et al. (Pattern Recognition 30 (1997) 1213) is utilized to segment characters individually from the input text document. In the second stage, three different types of features are introduced, including the density of black pixels, the projection profile code, and the modified skeleton template. These features are used to check whether the segmented character is semi-matched or fully-matched with the MFU template. Finally, in the last stage, based on the matching result, three different algorithms for implementing the aforementioned functions are provided. Experimental results are given in this study to demonstrate the practicality and superiority of the proposed method.</ce:simple-para>
</ce:abstract-sec>
</ce:abstract>
<ce:keywords class="keyword" xml:lang="en">
<ce:section-title>Keywords</ce:section-title>
<ce:keyword>
<ce:text>Feature extraction</ce:text>
</ce:keyword>
<ce:keyword>
<ce:text>Template matching</ce:text>
</ce:keyword>
<ce:keyword>
<ce:text>Character recognition</ce:text>
</ce:keyword>
<ce:keyword>
<ce:text>Font identification</ce:text>
</ce:keyword>
<ce:keyword>
<ce:text>Text distinction</ce:text>
</ce:keyword>
</ce:keywords>
</head>
</converted-article>
</istex:document>
</istex:metadataXml>
<mods version="3.6">
<titleInfo lang="en">
<title>Chinese text distinction and font identification by recognizing most frequently used characters</title>
</titleInfo>
<titleInfo type="alternative" lang="en" contentType="CDATA">
<title>Chinese text distinction and font identification by recognizing most frequently used characters</title>
</titleInfo>
<name type="personal">
<namePart type="given">Chi-Fang</namePart>
<namePart type="family">Lin</namePart>
<affiliation>E-mail: cscflin@cs.yzu.edu.tw</affiliation>
<affiliation>Department of Computer Engineering and Science, Yuan-Ze University, Chung-Li 320, Taiwan, ROC</affiliation>
<description>Corresponding author. Tel.: +886-3-463-8800; fax: +886-3-463-8850</description>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yu-Fan</namePart>
<namePart type="family">Fang</namePart>
<affiliation>Institute of Computer Science and Electronic Engineering, National Center University, Chung-Li 320, Taiwan, ROC</affiliation>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Yau-Tarng</namePart>
<namePart type="family">Juang</namePart>
<affiliation>Institute of Computer Science and Electronic Engineering, National Center University, Chung-Li 320, Taiwan, ROC</affiliation>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<typeOfResource>text</typeOfResource>
<genre type="research-article" displayLabel="Full-length article"></genre>
<originInfo>
<publisher>ELSEVIER</publisher>
<dateIssued encoding="w3cdtf">2000</dateIssued>
<dateValid encoding="w3cdtf">2000-09-26</dateValid>
<dateModified encoding="w3cdtf">2000-08-09</dateModified>
<copyrightDate encoding="w3cdtf">2001</copyrightDate>
</originInfo>
<language>
<languageTerm type="code" authority="iso639-2b">eng</languageTerm>
<languageTerm type="code" authority="rfc3066">en</languageTerm>
</language>
<physicalDescription>
<internetMediaType>text/html</internetMediaType>
</physicalDescription>
<abstract lang="en">In this study, the method of implementing the three functions that can offer great help for a traditional OCCR (Optical Chinese Character Recognition) system is proposed: (1) to identify the font used in a document; (2) to detect and recognize the most frequently used (MFU) characters; and (3) to distinguish between the machine-printed and hand-written characters. According to the study investigated by Chang and Chen (Proceedings of the ICCC, 1994, pp. 310–316), about 20% of Chinese characters in a text document are predominated by the top-40 MFU characters. If those MFU characters in a text document can be detected before adopting the traditional OCCR method, there will be great savings in computation time. The proposed method for character detection consists of the following three stages: the stage of segmentation, the stage of feature extraction, and the stage of classification. In the first stage, based on the concept of projection profile, the method presented by Wang et al. (Pattern Recognition 30 (1997) 1213) is utilized to segment characters individually from the input text document. In the second stage, three different types of features are introduced, including the density of black pixels, the projection profile code, and the modified skeleton template. These features are used to check whether the segmented character is semi-matched or fully-matched with the MFU template. Finally, in the last stage, based on the matching result, three different algorithms for implementing the aforementioned functions are provided. Experimental results are given in this study to demonstrate the practicality and superiority of the proposed method.</abstract>
<note type="content">Fig. 1: The flowchart of the proposed system.</note>
<note type="content">Fig. 2: The result of character segmentation. (a) The original image. (b) The segmentation result.</note>
<note type="content">Fig. 3: An illustration for the method of projection profile coding.</note>
<note type="content">Fig. 4: The thinning result. (a) The original character. (b) The skeleton template after the removal of thin strokes.</note>
<note type="content">Fig. 5: Test document I.</note>
<note type="content">Fig. 6: Test document J.</note>
<note type="content">Fig. 7: The chart of the recognition rate and the number of MFU characters selected.</note>
<note type="content">Fig. 8: The test image used for measuring the performance of our method.</note>
<note type="content">Fig. 9: The top-40 MFU characters detected for the test image shown in Fig. 6.</note>
<note type="content">Table 1: The frequency table of the top-50 MFU Chinese characters</note>
<note type="content">Table 2: The list of the selected top-40 MFU characters</note>
<note type="content">Table 3: The summarized results for test images A–H</note>
<note type="content">Table 4: The summarized results for the five sets of test images</note>
<subject lang="en">
<genre>Keywords</genre>
<topic>Feature extraction</topic>
<topic>Template matching</topic>
<topic>Character recognition</topic>
<topic>Font identification</topic>
<topic>Text distinction</topic>
</subject>
<relatedItem type="host">
<titleInfo>
<title>Image and Vision Computing</title>
</titleInfo>
<titleInfo type="abbreviated">
<title>IMAVIS</title>
</titleInfo>
<genre type="Journal">journal</genre>
<originInfo>
<dateIssued encoding="w3cdtf">20010415</dateIssued>
</originInfo>
<identifier type="ISSN">0262-8856</identifier>
<identifier type="PII">S0262-8856(00)X0074-1</identifier>
<part>
<date>20010415</date>
<detail type="volume">
<number>19</number>
<caption>vol.</caption>
</detail>
<detail type="issue">
<number>6</number>
<caption>no.</caption>
</detail>
<extent unit="issue pages">
<start>317</start>
<end>412</end>
</extent>
<extent unit="pages">
<start>329</start>
<end>338</end>
</extent>
</part>
</relatedItem>
<identifier type="istex">4A8175B424D8D0E33BD442A591B43A5C1A0428A3</identifier>
<identifier type="DOI">10.1016/S0262-8856(00)00082-2</identifier>
<identifier type="PII">S0262-8856(00)00082-2</identifier>
<accessCondition type="use and reproduction" contentType="">© 2001 Elsevier Science B.V.</accessCondition>
<recordInfo>
<recordContentSource>ELSEVIER</recordContentSource>
<recordOrigin>Elsevier Science B.V., © 2001</recordOrigin>
</recordInfo>
</mods>
</metadata>
<enrichments>
<istex:catWosTEI uri="https://api.istex.fr/document/4A8175B424D8D0E33BD442A591B43A5C1A0428A3/enrichments/catWos">
<teiHeader>
<profileDesc>
<textClass>
<classCode scheme="WOS">COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE</classCode>
<classCode scheme="WOS">COMPUTER SCIENCE, SOFTWARE ENGINEERING</classCode>
<classCode scheme="WOS">COMPUTER SCIENCE, THEORY & METHODS</classCode>
<classCode scheme="WOS">ENGINEERING, ELECTRICAL & ELECTRONIC</classCode>
<classCode scheme="WOS">OPTICS</classCode>
</textClass>
</profileDesc>
</teiHeader>
</istex:catWosTEI>
</enrichments>
<serie></serie>
</istex>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Istex/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000057 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Istex/Corpus/biblio.hfd -nk 000057 | SxmlIndent | more

To add a link to this page in the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Istex
   |étape=   Corpus
   |type=    RBID
   |clé=     ISTEX:4A8175B424D8D0E33BD442A591B43A5C1A0428A3
   |texte=   Chinese text distinction and font identification by recognizing most frequently used characters
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024