OCR exploration server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Two template matching approaches to arabic, amharic and latin isolated characters recognition

Internal identifier: 000395 (PascalFrancis/Corpus); previous: 000394; next: 000396

Authors: John Cowell; Fiaz Hussain

Source: Machine graphics & vision, ISSN 1230-0535, 2005, Vol. 14, No. 2, pp. 213-232

RBID : Pascal:06-0200198

French descriptors

English descriptors

Abstract

With the establishment of commercial OCR systems for Latin text, recent research efforts have been directed at the design of recognition systems for non-Latin scripts, such as Japanese, Cyrillic, Chinese, Hindi, Tibetan, and in particular Arabic. The Unicode 4.0 standard supports 50 scripts that are used across the world, and many, such as Amharic (Ethiopic), have attracted virtually no attention from researchers. An extensive literature review reveals no papers which report on an OCR system for Amharic. This paper describes a normalised technique which can be used for recognition of isolated Arabic, Amharic and Latin characters. Two approaches are considered for identifying the characters by comparing them to a series of templates and using a signature template scheme. The degrees of similarity between pairs of Amharic, Arabic and typical Latin characters are presented in the confusion matrix, and the performance of the two approaches is compared for each of these three character sets.
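
The abstract gives no implementation details, so the following is only a generic sketch of template matching on normalised binary character images, not the authors' method. The 8 x 8 grid size, the bounding-box normalisation, the pixel-agreement score, and all function names are assumptions made for illustration. The signature template scheme mentioned in the abstract is not covered.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]  # binary image: 1 = ink, 0 = background


def normalise(img: Grid, size: int = 8) -> Grid:
    """Crop to the ink bounding box, then resample to size x size (nearest neighbour)."""
    rows = [r for r, row in enumerate(img) if any(row)]
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    if not rows:  # blank image: nothing to crop
        return [[0] * size for _ in range(size)]
    top, left = rows[0], cols[0]
    h, w = rows[-1] - top + 1, cols[-1] - left + 1
    return [[img[top + (y * h) // size][left + (x * w) // size]
             for x in range(size)] for y in range(size)]


def similarity(a: Grid, b: Grid) -> float:
    """Fraction of pixels on which two equally sized grids agree."""
    total = len(a) * len(a[0])
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total


def classify(img: Grid, templates: Dict[str, Grid], size: int = 8) -> Tuple[str, float]:
    """Return the label and score of the best-matching template."""
    norm = normalise(img, size)
    scores = {label: similarity(norm, normalise(t, size))
              for label, t in templates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def pairwise_similarity(templates: Dict[str, Grid], size: int = 8) -> Dict[Tuple[str, str], float]:
    """Similarity between every pair of templates (a confusion-style table)."""
    norm = {label: normalise(t, size) for label, t in templates.items()}
    return {(a, b): similarity(norm[a], norm[b]) for a in norm for b in norm}


# Toy templates and a larger, shifted "L" used as the unknown character.
TEMPLATES: Dict[str, Grid] = {
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
}
SAMPLE: Grid = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1, 1, 1],
]

if __name__ == "__main__":
    print(classify(SAMPLE, TEMPLATES))
    print(pairwise_similarity(TEMPLATES))
```

Normalising before comparison is what lets the shifted, enlarged sample match the small template. A real system would need a more robust score and many more templates per character.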

Record in standard format (ISO 2709)

See the documentation on the Inist Standard format.

pA  
A01 01  1    @0 1230-0535
A03   1    @0 Mach. graph. vis.
A05       @2 14
A06       @2 2
A08 01  1  ENG  @1 Two template matching approaches to arabic, amharic and latin isolated characters recognition
A11 01  1    @1 COWELL (John)
A11 02  1    @1 HUSSAIN (Fiaz)
A14 01      @1 Centre for Computational Intelligence, De Montfort University, The Gateway @2 Leicester, LE1 9BH, England @3 GBR @Z 1 aut.
A14 02      @1 Dept. of Computing Information Systems, University of Luton,Park Square @2 Luton, LU1 3JU,England @3 GBR @Z 2 aut.
A20       @1 213-232
A21       @1 2005
A23 01      @0 ENG
A43 01      @1 INIST @2 27544 @5 354000134694410060
A44       @0 0000 @1 © 2006 INIST-CNRS. All rights reserved.
A45       @0 27 ref.
A47 01  1    @0 06-0200198
A60       @1 P
A61       @0 A
A64 01  1    @0 Machine graphics & vision
A66 01      @0 POL
C01 01    ENG  @0 With the establishment of commercial OCR systems for Latin text, recent research efforts have been directed at the design of recognition systems for non-Latin scripts, such as Japanese, Cyrillic, Chinese, Hindi, Tibetan, and in particular Arabic. The Unicode 4.0 standard supports 50 scripts that are used across the world, and many, such as Amharic (Ethiopic), have attracted virtually no attention from researchers. An extensive literature review reveals no papers which report on an OCR system for Amharic. This paper describes a normalised technique which can be used for recognition of isolated Arabic, Amharic and Latin characters. Two approaches are considered for identifying the characters by comparing them to a series of templates and using a signature template scheme. The degrees of similarity between pairs of Amharic, Arabic and typical Latin characters are presented in the confusion matrix, and the performance of the two approaches is compared for each of these three character sets.
C02 01  X    @0 001D02C03
C03 01  X  FRE  @0 Concordance forme @5 06
C03 01  X  ENG  @0 Pattern matching @5 06
C03 02  3  FRE  @0 Appariement image @5 07
C03 02  3  ENG  @0 Image matching @5 07
C03 03  X  FRE  @0 Reconnaissance caractère @5 08
C03 03  X  ENG  @0 Character recognition @5 08
C03 03  X  SPA  @0 Reconocimiento carácter @5 08
C03 04  X  FRE  @0 Reconnaissance forme @5 09
C03 04  X  ENG  @0 Pattern recognition @5 09
C03 04  X  SPA  @0 Reconocimiento patrón @5 09
C03 05  X  FRE  @0 Reconnaissance optique caractère @5 10
C03 05  X  ENG  @0 Optical character recognition @5 10
C03 05  X  SPA  @0 Reconocimento óptico de caracteres @5 10
C03 06  X  FRE  @0 Texte @5 11
C03 06  X  ENG  @0 Text @5 11
C03 06  X  SPA  @0 Texto @5 11
C03 07  X  FRE  @0 Chinois @5 12
C03 07  X  ENG  @0 Chinese @5 12
C03 07  X  SPA  @0 Chino @5 12
C03 08  X  FRE  @0 Signature électronique @5 13
C03 08  X  ENG  @0 Digital signature @5 13
C03 08  X  SPA  @0 Firma numérica @5 13
C03 09  X  FRE  @0 Similitude @5 14
C03 09  X  ENG  @0 Similarity @5 14
C03 09  X  SPA  @0 Similitud @5 14
C03 10  X  FRE  @0 Arabe @5 18
C03 10  X  ENG  @0 Arabic @5 18
C03 10  X  SPA  @0 Árabe @5 18
C03 11  X  FRE  @0 Japonais @5 19
C03 11  X  ENG  @0 Japanese @5 19
C03 11  X  SPA  @0 Japonés @5 19
C03 12  X  FRE  @0 Jeu caractère @5 20
C03 12  X  ENG  @0 Character set @5 20
C03 12  X  SPA  @0 Juego caracter @5 20
N21       @1 128
N44 01      @1 OTO
N82       @1 OTO
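
The listing above can be read mechanically. The sketch below assumes, from the lines shown here only, that each line starts with a field tag, followed by optional indicators, followed by subfields introduced by `@` and a one-character code. The function names are hypothetical, and this is not an official Inist or ISO 2709 parser.

```python
import re
from typing import Dict, List, Tuple

# A subfield marker: whitespace, "@", a one-character code, whitespace.
SUBFIELD = re.compile(r"\s@([0-9A-Za-z])\s+")

Field = Tuple[List[str], Dict[str, str]]  # (indicators, subfields)


def parse_line(line: str) -> Tuple[str, List[str], Dict[str, str]]:
    """Split one listing line into tag, indicators and subfields."""
    head, *rest = SUBFIELD.split(line)
    tokens = head.split()
    tag, indicators = tokens[0], tokens[1:]
    # re.split with a capture group alternates codes and values.
    subfields = {code: value.strip() for code, value in zip(rest[0::2], rest[1::2])}
    return tag, indicators, subfields


def parse_record(text: str) -> Dict[str, List[Field]]:
    """Group all non-blank lines of a listing by field tag."""
    fields: Dict[str, List[Field]] = {}
    for line in text.splitlines():
        if line.strip():
            tag, indicators, subfields = parse_line(line)
            fields.setdefault(tag, []).append((indicators, subfields))
    return fields


SAMPLE_RECORD = """\
A05       @2 14
A11 01  1    @1 COWELL (John)
A11 02  1    @1 HUSSAIN (Fiaz)
A14 01      @1 Centre for Computational Intelligence, De Montfort University, The Gateway @2 Leicester, LE1 9BH, England @3 GBR @Z 1 aut.
"""

if __name__ == "__main__":
    for tag, occurrences in parse_record(SAMPLE_RECORD).items():
        print(tag, occurrences)
```

Repeated tags such as A11 are kept as a list so that the order of the authors is preserved.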

Inist format (server)

NO : PASCAL 06-0200198 INIST
ET : Two template matching approaches to arabic, amharic and latin isolated characters recognition
AU : COWELL (John); HUSSAIN (Fiaz)
AF : Centre for Computational Intelligence, De Montfort University, The Gateway/Leicester, LE1 9BH, England/Royaume-Uni (1 aut.); Dept. of Computing Information Systems, University of Luton,Park Square/Luton, LU1 3JU,England/Royaume-Uni (2 aut.)
DT : Publication en série; Niveau analytique
SO : Machine graphics & vision; ISSN 1230-0535; Pologne; Da. 2005; Vol. 14; No. 2; Pp. 213-232; Bibl. 27 ref.
LA : Anglais
EA : With the establishment of commercial OCR systems for Latin text, recent research efforts have been directed at the design of recognition systems for non-Latin scripts, such as Japanese, Cyrillic, Chinese, Hindi, Tibetan, and in particular Arabic. The Unicode 4.0 standard supports 50 scripts that are used across the world, and many, such as Amharic (Ethiopic), have attracted virtually no attention from researchers. An extensive literature review reveals no papers which report on an OCR system for Amharic. This paper describes a normalised technique which can be used for recognition of isolated Arabic, Amharic and Latin characters. Two approaches are considered for identifying the characters by comparing them to a series of templates and using a signature template scheme. The degrees of similarity between pairs of Amharic, Arabic and typical Latin characters are presented in the confusion matrix, and the performance of the two approaches is compared for each of these three character sets.
CC : 001D02C03
FD : Concordance forme; Appariement image; Reconnaissance caractère; Reconnaissance forme; Reconnaissance optique caractère; Texte; Chinois; Signature électronique; Similitude; Arabe; Japonais; Jeu caractère
ED : Pattern matching; Image matching; Character recognition; Pattern recognition; Optical character recognition; Text; Chinese; Digital signature; Similarity; Arabic; Japanese; Character set
SD : Reconocimiento carácter; Reconocimiento patrón; Reconocimento óptico de caracteres; Texto; Chino; Firma numérica; Similitud; Árabe; Japonés; Juego caracter
LO : INIST-27544.354000134694410060
ID : 06-0200198
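
The server format above is a simple key and value listing. A minimal sketch for loading it, assuming every field sits on one line as a two-letter uppercase key followed by ` : `, and that multi-valued fields are separated by semicolons (both assumptions are inferred from this record only; the function names are hypothetical):

```python
from typing import Dict, List


def parse_server_record(text: str) -> Dict[str, str]:
    """Map each two-letter key to its raw value string."""
    record: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        key = key.strip()
        # Ignore lines that do not look like "XX : value".
        if sep and len(key) == 2 and key.isupper():
            record[key] = value.strip()
    return record


def split_values(value: str) -> List[str]:
    """Split a multi-valued field such as AU or ED on semicolons."""
    return [part.strip() for part in value.split(";") if part.strip()]


SAMPLE_SERVER = """\
NO : PASCAL 06-0200198 INIST
AU : COWELL (John); HUSSAIN (Fiaz)
LA : Anglais
"""

if __name__ == "__main__":
    rec = parse_server_record(SAMPLE_SERVER)
    print(rec["NO"], split_values(rec["AU"]))
```

Free-text fields such as EA should be kept whole rather than passed to `split_values`, since prose may contain semicolons.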

Links to Exploration step

Pascal:06-0200198

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Two template matching approaches to arabic, amharic and latin isolated characters recognition</title>
<author>
<name sortKey="Cowell, John" sort="Cowell, John" uniqKey="Cowell J" first="John" last="Cowell">John Cowell</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Centre for Computational Intelligence, De Montfort University, The Gateway</s1>
<s2>Leicester, LE1 9BH, England</s2>
<s3>GBR</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Hussain, Fiaz" sort="Hussain, Fiaz" uniqKey="Hussain F" first="Fiaz" last="Hussain">Fiaz Hussain</name>
<affiliation>
<inist:fA14 i1="02">
<s1>Dept. of Computing Information Systems, University of Luton,Park Square</s1>
<s2>Luton, LU1 3JU,England</s2>
<s3>GBR</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">06-0200198</idno>
<date when="2005">2005</date>
<idno type="stanalyst">PASCAL 06-0200198 INIST</idno>
<idno type="RBID">Pascal:06-0200198</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000395</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Two template matching approaches to arabic, amharic and latin isolated characters recognition</title>
<author>
<name sortKey="Cowell, John" sort="Cowell, John" uniqKey="Cowell J" first="John" last="Cowell">John Cowell</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Centre for Computational Intelligence, De Montfort University, The Gateway</s1>
<s2>Leicester, LE1 9BH, England</s2>
<s3>GBR</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Hussain, Fiaz" sort="Hussain, Fiaz" uniqKey="Hussain F" first="Fiaz" last="Hussain">Fiaz Hussain</name>
<affiliation>
<inist:fA14 i1="02">
<s1>Dept. of Computing Information Systems, University of Luton,Park Square</s1>
<s2>Luton, LU1 3JU,England</s2>
<s3>GBR</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Machine graphics &amp; vision</title>
<title level="j" type="abbreviated">Mach. graph. vis.</title>
<idno type="ISSN">1230-0535</idno>
<imprint>
<date when="2005">2005</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Machine graphics &amp; vision</title>
<title level="j" type="abbreviated">Mach. graph. vis.</title>
<idno type="ISSN">1230-0535</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Arabic</term>
<term>Character recognition</term>
<term>Character set</term>
<term>Chinese</term>
<term>Digital signature</term>
<term>Image matching</term>
<term>Japanese</term>
<term>Optical character recognition</term>
<term>Pattern matching</term>
<term>Pattern recognition</term>
<term>Similarity</term>
<term>Text</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Concordance forme</term>
<term>Appariement image</term>
<term>Reconnaissance caractère</term>
<term>Reconnaissance forme</term>
<term>Reconnaissance optique caractère</term>
<term>Texte</term>
<term>Chinois</term>
<term>Signature électronique</term>
<term>Similitude</term>
<term>Arabe</term>
<term>Japonais</term>
<term>Jeu caractère</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">With the establishment of commercial OCR systems for Latin text, recent research efforts have been directed at the design of recognition systems for non-Latin scripts, such as Japanese, Cyrillic, Chinese, Hindi, Tibetan, and in particular Arabic. The Unicode 4.0 standard supports 50 scripts that are used across the world, and many, such as Amharic (Ethiopic), have attracted virtually no attention from researchers. An extensive literature review reveals no papers which report on an OCR system for Amharic. This paper describes a normalised technique which can be used for recognition of isolated Arabic, Amharic and Latin characters. Two approaches are considered for identifying the characters by comparing them to a series of templates and using a signature template scheme. The degrees of similarity between pairs of Amharic, Arabic and typical Latin characters are presented in the confusion matrix, and the performance of the two approaches is compared for each of these three character sets.</div>
</front>
</TEI>
<inist>
<standard h6="B">
<pA>
<fA01 i1="01" i2="1">
<s0>1230-0535</s0>
</fA01>
<fA03 i2="1">
<s0>Mach. graph. vis.</s0>
</fA03>
<fA05>
<s2>14</s2>
</fA05>
<fA06>
<s2>2</s2>
</fA06>
<fA08 i1="01" i2="1" l="ENG">
<s1>Two template matching approaches to arabic, amharic and latin isolated characters recognition</s1>
</fA08>
<fA11 i1="01" i2="1">
<s1>COWELL (John)</s1>
</fA11>
<fA11 i1="02" i2="1">
<s1>HUSSAIN (Fiaz)</s1>
</fA11>
<fA14 i1="01">
<s1>Centre for Computational Intelligence, De Montfort University, The Gateway</s1>
<s2>Leicester, LE1 9BH, England</s2>
<s3>GBR</s3>
<sZ>1 aut.</sZ>
</fA14>
<fA14 i1="02">
<s1>Dept. of Computing Information Systems, University of Luton,Park Square</s1>
<s2>Luton, LU1 3JU,England</s2>
<s3>GBR</s3>
<sZ>2 aut.</sZ>
</fA14>
<fA20>
<s1>213-232</s1>
</fA20>
<fA21>
<s1>2005</s1>
</fA21>
<fA23 i1="01">
<s0>ENG</s0>
</fA23>
<fA43 i1="01">
<s1>INIST</s1>
<s2>27544</s2>
<s5>354000134694410060</s5>
</fA43>
<fA44>
<s0>0000</s0>
<s1>© 2006 INIST-CNRS. All rights reserved.</s1>
</fA44>
<fA45>
<s0>27 ref.</s0>
</fA45>
<fA47 i1="01" i2="1">
<s0>06-0200198</s0>
</fA47>
<fA60>
<s1>P</s1>
</fA60>
<fA61>
<s0>A</s0>
</fA61>
<fA64 i1="01" i2="1">
<s0>Machine graphics &amp; vision</s0>
</fA64>
<fA66 i1="01">
<s0>POL</s0>
</fA66>
<fC01 i1="01" l="ENG">
<s0>With the establishment of commercial OCR systems for Latin text, recent research efforts have been directed at the design of recognition systems for non-Latin scripts, such as Japanese, Cyrillic, Chinese, Hindi, Tibetan, and in particular Arabic. The Unicode 4.0 standard supports 50 scripts that are used across the world, and many, such as Amharic (Ethiopic), have attracted virtually no attention from researchers. An extensive literature review reveals no papers which report on an OCR system for Amharic. This paper describes a normalised technique which can be used for recognition of isolated Arabic, Amharic and Latin characters. Two approaches are considered for identifying the characters by comparing them to a series of templates and using a signature template scheme. The degrees of similarity between pairs of Amharic, Arabic and typical Latin characters are presented in the confusion matrix, and the performance of the two approaches is compared for each of these three character sets.</s0>
</fC01>
<fC02 i1="01" i2="X">
<s0>001D02C03</s0>
</fC02>
<fC03 i1="01" i2="X" l="FRE">
<s0>Concordance forme</s0>
<s5>06</s5>
</fC03>
<fC03 i1="01" i2="X" l="ENG">
<s0>Pattern matching</s0>
<s5>06</s5>
</fC03>
<fC03 i1="02" i2="3" l="FRE">
<s0>Appariement image</s0>
<s5>07</s5>
</fC03>
<fC03 i1="02" i2="3" l="ENG">
<s0>Image matching</s0>
<s5>07</s5>
</fC03>
<fC03 i1="03" i2="X" l="FRE">
<s0>Reconnaissance caractère</s0>
<s5>08</s5>
</fC03>
<fC03 i1="03" i2="X" l="ENG">
<s0>Character recognition</s0>
<s5>08</s5>
</fC03>
<fC03 i1="03" i2="X" l="SPA">
<s0>Reconocimiento carácter</s0>
<s5>08</s5>
</fC03>
<fC03 i1="04" i2="X" l="FRE">
<s0>Reconnaissance forme</s0>
<s5>09</s5>
</fC03>
<fC03 i1="04" i2="X" l="ENG">
<s0>Pattern recognition</s0>
<s5>09</s5>
</fC03>
<fC03 i1="04" i2="X" l="SPA">
<s0>Reconocimiento patrón</s0>
<s5>09</s5>
</fC03>
<fC03 i1="05" i2="X" l="FRE">
<s0>Reconnaissance optique caractère</s0>
<s5>10</s5>
</fC03>
<fC03 i1="05" i2="X" l="ENG">
<s0>Optical character recognition</s0>
<s5>10</s5>
</fC03>
<fC03 i1="05" i2="X" l="SPA">
<s0>Reconocimento óptico de caracteres</s0>
<s5>10</s5>
</fC03>
<fC03 i1="06" i2="X" l="FRE">
<s0>Texte</s0>
<s5>11</s5>
</fC03>
<fC03 i1="06" i2="X" l="ENG">
<s0>Text</s0>
<s5>11</s5>
</fC03>
<fC03 i1="06" i2="X" l="SPA">
<s0>Texto</s0>
<s5>11</s5>
</fC03>
<fC03 i1="07" i2="X" l="FRE">
<s0>Chinois</s0>
<s5>12</s5>
</fC03>
<fC03 i1="07" i2="X" l="ENG">
<s0>Chinese</s0>
<s5>12</s5>
</fC03>
<fC03 i1="07" i2="X" l="SPA">
<s0>Chino</s0>
<s5>12</s5>
</fC03>
<fC03 i1="08" i2="X" l="FRE">
<s0>Signature électronique</s0>
<s5>13</s5>
</fC03>
<fC03 i1="08" i2="X" l="ENG">
<s0>Digital signature</s0>
<s5>13</s5>
</fC03>
<fC03 i1="08" i2="X" l="SPA">
<s0>Firma numérica</s0>
<s5>13</s5>
</fC03>
<fC03 i1="09" i2="X" l="FRE">
<s0>Similitude</s0>
<s5>14</s5>
</fC03>
<fC03 i1="09" i2="X" l="ENG">
<s0>Similarity</s0>
<s5>14</s5>
</fC03>
<fC03 i1="09" i2="X" l="SPA">
<s0>Similitud</s0>
<s5>14</s5>
</fC03>
<fC03 i1="10" i2="X" l="FRE">
<s0>Arabe</s0>
<s5>18</s5>
</fC03>
<fC03 i1="10" i2="X" l="ENG">
<s0>Arabic</s0>
<s5>18</s5>
</fC03>
<fC03 i1="10" i2="X" l="SPA">
<s0>Árabe</s0>
<s5>18</s5>
</fC03>
<fC03 i1="11" i2="X" l="FRE">
<s0>Japonais</s0>
<s5>19</s5>
</fC03>
<fC03 i1="11" i2="X" l="ENG">
<s0>Japanese</s0>
<s5>19</s5>
</fC03>
<fC03 i1="11" i2="X" l="SPA">
<s0>Japonés</s0>
<s5>19</s5>
</fC03>
<fC03 i1="12" i2="X" l="FRE">
<s0>Jeu caractère</s0>
<s5>20</s5>
</fC03>
<fC03 i1="12" i2="X" l="ENG">
<s0>Character set</s0>
<s5>20</s5>
</fC03>
<fC03 i1="12" i2="X" l="SPA">
<s0>Juego caracter</s0>
<s5>20</s5>
</fC03>
<fN21>
<s1>128</s1>
</fN21>
<fN44 i1="01">
<s1>OTO</s1>
</fN44>
<fN82>
<s1>OTO</s1>
</fN82>
</pA>
</standard>
<server>
<NO>PASCAL 06-0200198 INIST</NO>
<ET>Two template matching approaches to arabic, amharic and latin isolated characters recognition</ET>
<AU>COWELL (John); HUSSAIN (Fiaz)</AU>
<AF>Centre for Computational Intelligence, De Montfort University, The Gateway/Leicester, LE1 9BH, England/Royaume-Uni (1 aut.); Dept. of Computing Information Systems, University of Luton,Park Square/Luton, LU1 3JU,England/Royaume-Uni (2 aut.)</AF>
<DT>Publication en série; Niveau analytique</DT>
<SO>Machine graphics &amp; vision; ISSN 1230-0535; Pologne; Da. 2005; Vol. 14; No. 2; Pp. 213-232; Bibl. 27 ref.</SO>
<LA>Anglais</LA>
<EA>With the establishment of commercial OCR systems for Latin text, recent research efforts have been directed at the design of recognition systems for non-Latin scripts, such as Japanese, Cyrillic, Chinese, Hindi, Tibetan, and in particular Arabic. The Unicode 4.0 standard supports 50 scripts that are used across the world, and many, such as Amharic (Ethiopic), have attracted virtually no attention from researchers. An extensive literature review reveals no papers which report on an OCR system for Amharic. This paper describes a normalised technique which can be used for recognition of isolated Arabic, Amharic and Latin characters. Two approaches are considered for identifying the characters by comparing them to a series of templates and using a signature template scheme. The degrees of similarity between pairs of Amharic, Arabic and typical Latin characters are presented in the confusion matrix, and the performance of the two approaches is compared for each of these three character sets.</EA>
<CC>001D02C03</CC>
<FD>Concordance forme; Appariement image; Reconnaissance caractère; Reconnaissance forme; Reconnaissance optique caractère; Texte; Chinois; Signature électronique; Similitude; Arabe; Japonais; Jeu caractère</FD>
<ED>Pattern matching; Image matching; Character recognition; Pattern recognition; Optical character recognition; Text; Chinese; Digital signature; Similarity; Arabic; Japanese; Character set</ED>
<SD>Reconocimiento carácter; Reconocimiento patrón; Reconocimento óptico de caracteres; Texto; Chino; Firma numérica; Similitud; Árabe; Japonés; Juego caracter</SD>
<LO>INIST-27544.354000134694410060</LO>
<ID>06-0200198</ID>
</server>
</inist>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/PascalFrancis/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000395 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PascalFrancis/Corpus/biblio.hfd -nk 000395 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    PascalFrancis
   |étape=   Corpus
   |type=    RBID
   |clé=     Pascal:06-0200198
   |texte=   Two template matching approaches to arabic, amharic and latin isolated characters recognition
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024