Visual similarity analysis of Chinese characters and its uses in Japanese OCR
Internal identifier: 002B77 (Main/Exploration); previous: 002B76; next: 002B78
Authors: Tao Hong [United States]; S. W. Lam [United States]; J. J. Hull; S. N. Srihari [United States]
Source:
- SPIE proceedings series [ISSN 1017-2653]; 1995.
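French descriptors
- Pascal (Inist): Reconnaissance optique caractère; Chinois; Japonais; Document; Reconnaissance caractère; Reconnaissance forme; Caractère imprimé; Similarité.
- Wicri:
- topic: Document.
English descriptors
- KwdEn: Character recognition; Chinese; Document; Japanese; Optical character recognition; Pattern recognition; Printed character; Similarity.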
Abstract
Traditionally, a Chinese or Japanese Optical Character Reader (OCR) has to represent each character category individually as one or more feature prototypes, or a structural description which is a composition of manually derived components such as radicals. Here we propose a new approach in which various kinds of visual similarities between different Chinese characters are analyzed automatically at the feature level. Using this method, character categories will be related to each other by training on fonts; and character images from a text page can be related to each other based on visual similarities they share. This method provides a way to interpret character images from a text page systematically, instead of a sequence of isolated character recognitions. The use of the method for postprocessing in Japanese text recognition will also be discussed.
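The abstract does not say which features or similarity measures the authors use, so the following is only an illustrative sketch of "partial visual similarity at the feature level", not the paper's method. It assumes toy binary glyphs given as equal-sized lists of strings, a fixed grid of zones, and per-zone pixel agreement as the feature; the names `zone_agreement` and `shared_zones` and the 0.9 threshold are hypothetical.

```python
from typing import List

Glyph = List[str]  # rows of '#' (ink) and '.' (background); toy stand-in for a character image


def zone_agreement(a: Glyph, b: Glyph, rows: int = 1, cols: int = 2) -> List[float]:
    """Fraction of matching pixels in each cell of a rows x cols grid.

    Assumes both glyphs have the same size and the grid is no finer than the glyph.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("glyphs must have the same size")
    h, w = len(a), len(a[0])
    scores = []
    for r in range(rows):
        for c in range(cols):
            ys = range(r * h // rows, (r + 1) * h // rows)
            xs = range(c * w // cols, (c + 1) * w // cols)
            same = sum(a[y][x] == b[y][x] for y in ys for x in xs)
            scores.append(same / (len(ys) * len(xs)))
    return scores


def shared_zones(a: Glyph, b: Glyph, rows: int = 1, cols: int = 2,
                 threshold: float = 0.9) -> List[int]:
    """Indices of zones (row-major order) where the two glyphs look alike."""
    scores = zone_agreement(a, b, rows, cols)
    return [i for i, s in enumerate(scores) if s >= threshold]


# Two toy glyphs that share their left half (as two characters might share
# a left-hand component) but differ completely on the right.
GLYPH_A = ["#..#",
           "#.##",
           "#..#",
           "#.##"]
GLYPH_B = ["#.#.",
           "#...",
           "#.#.",
           "#..."]

if __name__ == "__main__":
    print(zone_agreement(GLYPH_A, GLYPH_B))
    print(shared_zones(GLYPH_A, GLYPH_B))
```

In this toy setting, zone 0 (the left half) is reported as shared while zone 1 is not. Relating two categories through the zones they share, rather than treating each character in isolation, is the kind of relationship the abstract describes; a real system would use richer features and learn them from font data.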
Links to previous steps (curation, corpus...)
- to stream PascalFrancis, to step Corpus: 000960
- to stream PascalFrancis, to step Curation: 000A39
- to stream PascalFrancis, to step Checkpoint: 000981
- to stream Main, to step Merge: 002D34
- to stream Main, to step Curation: 002B77
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Visual similarity analysis of Chinese characters and its uses in Japanese OCR</title>
<author><name sortKey="Tao Hong" sort="Tao Hong" uniqKey="Tao Hong" last="Tao Hong">TAO HONG</name>
<affiliation wicri:level="2"><inist:fA14 i1="01"><s1>Center of Excellence for Document Analysis and Recognition (CEDAR), State University of New York at Buffalo, The UB Commons, 520 Lee Entrance, Suite 202</s1>
<s2>Amherst, NY 14228-2567</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName><region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Lam, S W" sort="Lam, S W" uniqKey="Lam S" first="S. W." last="Lam">S. W. Lam</name>
<affiliation wicri:level="2"><inist:fA14 i1="01"><s1>Center of Excellence for Document Analysis and Recognition (CEDAR), State University of New York at Buffalo, The UB Commons, 520 Lee Entrance, Suite 202</s1>
<s2>Amherst, NY 14228-2567</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName><region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Hull, J J" sort="Hull, J J" uniqKey="Hull J" first="J. J." last="Hull">J. J. Hull</name>
</author>
<author><name sortKey="Srihari, S N" sort="Srihari, S N" uniqKey="Srihari S" first="S. N." last="Srihari">S. N. Srihari</name>
<affiliation wicri:level="2"><inist:fA14 i1="01"><s1>Center of Excellence for Document Analysis and Recognition (CEDAR), State University of New York at Buffalo, The UB Commons, 520 Lee Entrance, Suite 202</s1>
<s2>Amherst, NY 14228-2567</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName><region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">97-0121468</idno>
<date when="1995">1995</date>
<idno type="stanalyst">PASCAL 97-0121468 INIST</idno>
<idno type="RBID">Pascal:97-0121468</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000960</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000A39</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000981</idno>
<idno type="wicri:doubleKey">1017-2653:1995:Tao Hong:visual:similarity:analysis</idno>
<idno type="wicri:Area/Main/Merge">002D34</idno>
<idno type="wicri:Area/Main/Curation">002B77</idno>
<idno type="wicri:Area/Main/Exploration">002B77</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Visual similarity analysis of Chinese characters and its uses in Japanese OCR</title>
<author><name sortKey="Tao Hong" sort="Tao Hong" uniqKey="Tao Hong" last="Tao Hong">TAO HONG</name>
<affiliation wicri:level="2"><inist:fA14 i1="01"><s1>Center of Excellence for Document Analysis and Recognition (CEDAR), State University of New York at Buffalo, The UB Commons, 520 Lee Entrance, Suite 202</s1>
<s2>Amherst, NY 14228-2567</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName><region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Lam, S W" sort="Lam, S W" uniqKey="Lam S" first="S. W." last="Lam">S. W. Lam</name>
<affiliation wicri:level="2"><inist:fA14 i1="01"><s1>Center of Excellence for Document Analysis and Recognition (CEDAR), State University of New York at Buffalo, The UB Commons, 520 Lee Entrance, Suite 202</s1>
<s2>Amherst, NY 14228-2567</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName><region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Hull, J J" sort="Hull, J J" uniqKey="Hull J" first="J. J." last="Hull">J. J. Hull</name>
</author>
<author><name sortKey="Srihari, S N" sort="Srihari, S N" uniqKey="Srihari S" first="S. N." last="Srihari">S. N. Srihari</name>
<affiliation wicri:level="2"><inist:fA14 i1="01"><s1>Center of Excellence for Document Analysis and Recognition (CEDAR), State University of New York at Buffalo, The UB Commons, 520 Lee Entrance, Suite 202</s1>
<s2>Amherst, NY 14228-2567</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName><region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">SPIE proceedings series</title>
<idno type="ISSN">1017-2653</idno>
<imprint><date when="1995">1995</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">SPIE proceedings series</title>
<idno type="ISSN">1017-2653</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Character recognition</term>
<term>Chinese</term>
<term>Document</term>
<term>Japanese</term>
<term>Optical character recognition</term>
<term>Pattern recognition</term>
<term>Printed character</term>
<term>Similarity</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Reconnaissance optique caractère</term>
<term>Chinois</term>
<term>Japonais</term>
<term>Document</term>
<term>Reconnaissance caractère</term>
<term>Reconnaissance forme</term>
<term>Caractère imprimé</term>
<term>Similarité</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr"><term>Document</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Traditionally, a Chinese or Japanese Optical Character Reader (OCR) has to represent each character category individually as one or more feature prototypes, or a structural description which is a composition of manually derived components such as radicals. Here we propose a new approach in which various kinds of visual similarities between different Chinese characters are analyzed automatically at the feature level. Using this method, character categories will be related to each other by training on fonts; and character images from a text page can be related to each other based on visual similarities they share. This method provides a way to interpret character images from a text page systematically, instead of a sequence of isolated character recognitions. The use of the method for postprocessing in Japanese text recognition will also be discussed.</div>
</front>
</TEI>
<affiliations><list><country><li>États-Unis</li>
</country>
<region><li>État de New York</li>
</region>
</list>
<tree><noCountry><name sortKey="Hull, J J" sort="Hull, J J" uniqKey="Hull J" first="J. J." last="Hull">J. J. Hull</name>
</noCountry>
<country name="États-Unis"><region name="État de New York"><name sortKey="Tao Hong" sort="Tao Hong" uniqKey="Tao Hong" last="Tao Hong">TAO HONG</name>
</region>
<name sortKey="Lam, S W" sort="Lam, S W" uniqKey="Lam S" first="S. W." last="Lam">S. W. Lam</name>
<name sortKey="Srihari, S N" sort="Srihari, S N" uniqKey="Srihari S" first="S. N." last="Srihari">S. N. Srihari</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002B77 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 002B77 | SxmlIndent | more
To add a link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= Pascal:97-0121468 |texte= Visual similarity analysis of Chinese characters and its uses in Japanese OCR }}
This area was generated with Dilib version V0.6.32.