OCR exploration server


Visual similarity analysis of Chinese characters and its uses in Japanese OCR

Internal identifier: 002B77 (Main/Exploration); previous: 002B76; next: 002B78


Authors: TAO HONG [United States]; S. W. Lam [United States]; J. J. Hull; S. N. Srihari [United States]

Source: SPIE proceedings series [ISSN 1017-2653]; 1995

RBID : Pascal:97-0121468

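English descriptors: Character recognition; Chinese; Document; Japanese; Optical character recognition; Pattern recognition; Printed character; Similarity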

Abstract

Traditionally, a Chinese or Japanese Optical Character Reader (OCR) has to represent each character category individually as one or more feature prototypes, or a structural description which is a composition of manually derived components such as radicals. Here we propose a new approach in which various kinds of visual similarities between different Chinese characters are analyzed automatically at the feature level. Using this method, character categories will be related to each other by training on fonts; and character images from a text page can be related to each other based on visual similarities they share. This method provides a way to interpret character images from a text page systematically, instead of a sequence of isolated character recognitions. The use of the method for postprocessing in Japanese text recognition will also be discussed.
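As a rough illustration of what relating characters "at the feature level" could look like, the sketch below compares two toy glyph bitmaps zone by zone and reports which zones they share. It is not the authors' method, which the abstract does not specify: the zone ink-density features, the names `zone_features` and `shared_zones`, the tolerance `tol`, and the two toy bitmaps are all assumptions made for this example.

```python
from typing import List

Bitmap = List[str]  # each row is a string of '#' (ink) and '.' (background)


def zone_features(glyph: Bitmap, zones: int = 2) -> List[float]:
    """Ink density in each cell of a zones x zones grid, in row-major order."""
    height, width = len(glyph), len(glyph[0])
    features = []
    for zr in range(zones):
        for zc in range(zones):
            rows = range(zr * height // zones, (zr + 1) * height // zones)
            cols = range(zc * width // zones, (zc + 1) * width // zones)
            ink = sum(glyph[r][c] == "#" for r in rows for c in cols)
            features.append(ink / (len(rows) * len(cols)))
    return features


def shared_zones(a: Bitmap, b: Bitmap, zones: int = 2, tol: float = 0.05) -> List[int]:
    """Indices of grid cells whose ink densities agree within tol (assumed threshold)."""
    fa, fb = zone_features(a, zones), zone_features(b, zones)
    return [i for i, (x, y) in enumerate(zip(fa, fb)) if abs(x - y) <= tol]


# Hypothetical 4x4 glyphs that differ only in the lower-right zone.
GLYPH_A = ["##..",
           "##..",
           "##.#",
           "##.#"]
GLYPH_B = ["##..",
           "##..",
           "####",
           "####"]

print(shared_zones(GLYPH_A, GLYPH_B))
```

In this toy setting, two character images that agree on most zones could be linked to each other even before either one is recognized, which is the kind of page-level relation the abstract describes.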




The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Visual similarity analysis of Chinese characters and its uses in Japanese OCR</title>
<author>
<name sortKey="Tao Hong" sort="Tao Hong" uniqKey="Tao Hong" last="Tao Hong">TAO HONG</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Center of Excellence for Document Analysis and Recognition (CEDAR), State University of New York at Buffalo, The UB Commons, 520 Lee Entrance, Suite 202</s1>
<s2>Amherst, NY 14228-2567</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Lam, S W" sort="Lam, S W" uniqKey="Lam S" first="S. W." last="Lam">S. W. Lam</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Center of Excellence for Document Analysis and Recognition (CEDAR), State University of New York at Buffalo, The UB Commons, 520 Lee Entrance, Suite 202</s1>
<s2>Amherst, NY 14228-2567</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Hull, J J" sort="Hull, J J" uniqKey="Hull J" first="J. J." last="Hull">J. J. Hull</name>
</author>
<author>
<name sortKey="Srihari, S N" sort="Srihari, S N" uniqKey="Srihari S" first="S. N." last="Srihari">S. N. Srihari</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Center of Excellence for Document Analysis and Recognition (CEDAR), State University of New York at Buffalo, The UB Commons, 520 Lee Entrance, Suite 202</s1>
<s2>Amherst, NY 14228-2567</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">97-0121468</idno>
<date when="1995">1995</date>
<idno type="stanalyst">PASCAL 97-0121468 INIST</idno>
<idno type="RBID">Pascal:97-0121468</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000960</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000A39</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000981</idno>
<idno type="wicri:doubleKey">1017-2653:1995:Tao Hong:visual:similarity:analysis</idno>
<idno type="wicri:Area/Main/Merge">002D34</idno>
<idno type="wicri:Area/Main/Curation">002B77</idno>
<idno type="wicri:Area/Main/Exploration">002B77</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Visual similarity analysis of Chinese characters and its uses in Japanese OCR</title>
<author>
<name sortKey="Tao Hong" sort="Tao Hong" uniqKey="Tao Hong" last="Tao Hong">TAO HONG</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Center of Excellence for Document Analysis and Recognition (CEDAR), State University of New York at Buffalo, The UB Commons, 520 Lee Entrance, Suite 202</s1>
<s2>Amherst, NY 14228-2567</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Lam, S W" sort="Lam, S W" uniqKey="Lam S" first="S. W." last="Lam">S. W. Lam</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Center of Excellence for Document Analysis and Recognition (CEDAR), State University of New York at Buffalo, The UB Commons, 520 Lee Entrance, Suite 202</s1>
<s2>Amherst, NY 14228-2567</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Hull, J J" sort="Hull, J J" uniqKey="Hull J" first="J. J." last="Hull">J. J. Hull</name>
</author>
<author>
<name sortKey="Srihari, S N" sort="Srihari, S N" uniqKey="Srihari S" first="S. N." last="Srihari">S. N. Srihari</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Center of Excellence for Document Analysis and Recognition (CEDAR), State University of New York at Buffalo, The UB Commons, 520 Lee Entrance, Suite 202</s1>
<s2>Amherst, NY 14228-2567</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">SPIE proceedings series</title>
<idno type="ISSN">1017-2653</idno>
<imprint>
<date when="1995">1995</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">SPIE proceedings series</title>
<idno type="ISSN">1017-2653</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Character recognition</term>
<term>Chinese</term>
<term>Document</term>
<term>Japanese</term>
<term>Optical character recognition</term>
<term>Pattern recognition</term>
<term>Printed character</term>
<term>Similarity</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Reconnaissance optique caractère</term>
<term>Chinois</term>
<term>Japonais</term>
<term>Document</term>
<term>Reconnaissance caractère</term>
<term>Reconnaissance forme</term>
<term>Caractère imprimé</term>
<term>Similarité</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr">
<term>Document</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Traditionally, a Chinese or Japanese Optical Character Reader (OCR) has to represent each character category individually as one or more feature prototypes, or a structural description which is a composition of manually derived components such as radicals. Here we propose a new approach in which various kinds of visual similarities between different Chinese characters are analyzed automatically at the feature level. Using this method, character categories will be related to each other by training on fonts; and character images from a text page can be related to each other based on visual similarities they share. This method provides a way to interpret character images from a text page systematically, instead of a sequence of isolated character recognitions. The use of the method for postprocessing in Japanese text recognition will also be discussed.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
<region>
<li>État de New York</li>
</region>
</list>
<tree>
<noCountry>
<name sortKey="Hull, J J" sort="Hull, J J" uniqKey="Hull J" first="J. J." last="Hull">J. J. Hull</name>
</noCountry>
<country name="États-Unis">
<region name="État de New York">
<name sortKey="Tao Hong" sort="Tao Hong" uniqKey="Tao Hong" last="Tao Hong">TAO HONG</name>
</region>
<name sortKey="Lam, S W" sort="Lam, S W" uniqKey="Lam S" first="S. W." last="Lam">S. W. Lam</name>
<name sortKey="Srihari, S N" sort="Srihari, S N" uniqKey="Srihari S" first="S. N." last="Srihari">S. N. Srihari</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002B77 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 002B77 | SxmlIndent | more
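Outside Dilib, the record can also be read with a general-purpose XML parser. The Python sketch below assumes the record text is already available as a string. Because the record uses the `wicri:` and `inist:` prefixes without declaring them, a strict parser would reject it as is, so the example wraps it in an element that binds both prefixes to placeholder URIs. Those URIs, the `wrapper` element, the function names `parse_record` and `summarize`, and the abridged `SAMPLE` record are assumptions made for this example, not part of Dilib or Wicri.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumptions): they only serve to bind the
# wicri: and inist: prefixes that the record leaves undeclared.
WRAPPER_OPEN = '<wrapper xmlns:wicri="urn:example:wicri" xmlns:inist="urn:example:inist">'
WRAPPER_CLOSE = "</wrapper>"


def parse_record(record_xml: str) -> ET.Element:
    """Parse a <record> fragment after binding its undeclared prefixes."""
    wrapper = ET.fromstring(WRAPPER_OPEN + record_xml + WRAPPER_CLOSE)
    return wrapper.find("record")


def summarize(record: ET.Element) -> dict:
    """Collect the title, author names and year from the TEI header."""
    title_stmt = record.find(".//titleStmt")
    return {
        "title": title_stmt.findtext("title"),
        "authors": [name.text for name in title_stmt.findall("author/name")],
        "year": record.find(".//publicationStmt/date").get("when"),
    }


# Abridged copy of the record, kept short for the example.
SAMPLE = """<record><TEI><teiHeader><fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Visual similarity analysis of Chinese characters and its uses in Japanese OCR</title>
<author><name sortKey="Tao Hong" last="Tao Hong">TAO HONG</name>
<affiliation wicri:level="2"><country>États-Unis</country></affiliation></author>
<author><name sortKey="Hull, J J" first="J. J." last="Hull">J. J. Hull</name></author>
</titleStmt>
<publicationStmt><date when="1995">1995</date></publicationStmt>
</fileDesc></teiHeader></TEI></record>"""

info = summarize(parse_record(SAMPLE))
print(info["title"])
print(info["authors"], info["year"])
```

Wrapping the fragment leaves the record text itself unchanged, which avoids rewriting the prefixed attribute and element names before parsing.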

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:97-0121468
   |texte=   Visual similarity analysis of Chinese characters and its uses in Japanese OCR
}}


This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024