OCR exploration server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Computer recognition of printed Bangla script

Internal identifier: 002C14 (Main/Exploration); previous: 002C13; next: 002C15

Authors: U. Pal [India]; Bidyut Baran Chaudhuri [India]

Source: International journal of systems science (ISSN 0020-7721), 1995

RBID : Pascal:96-0007912

French descriptors

English descriptors

Abstract

This paper considers optical character recognition (OCR) of Bangla, the second most popular script in the Indian subcontinent. A complete OCR system is described for documents of single Bangla font, where more than three hundred character shapes are recognized by a combination of template and feature-matching approach. Here the document image captured by a flatbed scanner is subject to tilt correction, line, word and character segmentation, simple and compound character separation, feature extraction and finally character recognition. Some character occurrence statistics have been computed to aid the recognition process. The simple character recognition is done by a feature-based tree classifier, and the compound character recognition involves a template matching approach preceded by a feature-based grouping. At present, recognition accuracy of about 96% is obtained by the system.
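
The abstract lists line segmentation as one stage of the pipeline but does not say how it is done. As a minimal sketch only, assuming a horizontal projection profile (a common technique, not confirmed here as the authors' method), text lines can be located as runs of rows that contain ink. The function name `segment_lines`, the `min_ink` threshold, and the toy page are illustrative assumptions.

```python
def segment_lines(image, min_ink=1):
    """Return (start_row, end_row) pairs, end exclusive, one per text line.

    image: list of rows, each a list of 0 (background) or 1 (ink).
    min_ink: assumed threshold; rows with fewer ink pixels count as blank.
    """
    # Horizontal projection profile: number of ink pixels in each row.
    profile = [sum(row) for row in image]

    lines = []
    start = None
    for row_index, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = row_index                # first inked row of a line
        elif ink < min_ink and start is not None:
            lines.append((start, row_index))  # blank row closes the line
            start = None
    if start is not None:                    # line touching the bottom edge
        lines.append((start, len(profile)))
    return lines


# Toy 10 x 8 page with two "text lines" (rows 1-2 and rows 5-7).
page = [[0] * 8 for _ in range(10)]
for r in (1, 2, 5, 6, 7):
    page[r][2:6] = [1] * 4

print(segment_lines(page))  # [(1, 3), (5, 8)]
```

The same idea applied to column sums within one line band would give a first cut at word and character boundaries, again as an assumption rather than a description of the published system.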


Affiliations:




The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Computer recognition of printed Bangla script</title>
<author>
<name sortKey="Pal, U" sort="Pal, U" uniqKey="Pal U" first="U." last="Pal">U. Pal</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Indian statistical inst., computer vision pattern recognition unit</s1>
<s2>Calcutta 700 035</s2>
<s3>IND</s3>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Calcutta 700 035</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Chaudhuri, B B" sort="Chaudhuri, B B" uniqKey="Chaudhuri B" first="B. B." last="Chaudhuri">Bidyut Baran Chaudhuri</name>
<affiliation>
<country>Inde</country>
<placeName>
<settlement type="city">Calcutta</settlement>
<region type="province">Bengale-Occidental</region>
</placeName>
<orgName type="lab" n="5">Institut indien de statistiques</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">96-0007912</idno>
<date when="1995">1995</date>
<idno type="stanalyst">PASCAL 96-0007912 INIST</idno>
<idno type="RBID">Pascal:96-0007912</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000A36</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000963</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000A23</idno>
<idno type="wicri:doubleKey">0020-7721:1995:Pal U:computer:recognition:of</idno>
<idno type="wicri:Area/Main/Merge">002D76</idno>
<idno type="wicri:Area/Main/Curation">002C14</idno>
<idno type="wicri:Area/Main/Exploration">002C14</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Computer recognition of printed Bangla script</title>
<author>
<name sortKey="Pal, U" sort="Pal, U" uniqKey="Pal U" first="U." last="Pal">U. Pal</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Indian statistical inst., computer vision pattern recognition unit</s1>
<s2>Calcutta 700 035</s2>
<s3>IND</s3>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Calcutta 700 035</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Chaudhuri, B B" sort="Chaudhuri, B B" uniqKey="Chaudhuri B" first="B. B." last="Chaudhuri">Bidyut Baran Chaudhuri</name>
<affiliation>
<country>Inde</country>
<placeName>
<settlement type="city">Calcutta</settlement>
<region type="province">Bengale-Occidental</region>
</placeName>
<orgName type="lab" n="5">Institut indien de statistiques</orgName>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">International journal of systems science</title>
<title level="j" type="abbreviated">Int. j. syst. sci.</title>
<idno type="ISSN">0020-7721</idno>
<imprint>
<date when="1995">1995</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">International journal of systems science</title>
<title level="j" type="abbreviated">Int. j. syst. sci.</title>
<idno type="ISSN">0020-7721</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Character recognition</term>
<term>Character string</term>
<term>Classifier</term>
<term>Digitizing</term>
<term>Histogram</term>
<term>Image processing</term>
<term>India</term>
<term>OCR</term>
<term>Pattern extraction</term>
<term>Pattern recognition</term>
<term>Segmentation</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Reconnaissance caractère</term>
<term>Inde</term>
<term>Segmentation</term>
<term>Extraction forme</term>
<term>Classificateur</term>
<term>Traitement image</term>
<term>Numérisation</term>
<term>Histogramme</term>
<term>Reconnaissance forme</term>
<term>Chaîne caractère</term>
<term>OCR</term>
</keywords>
<keywords scheme="Wicri" type="geographic" xml:lang="fr">
<term>Inde</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr">
<term>Numérisation</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">This paper considers optical character recognition (OCR) of Bangla, the second most popular script in the Indian subcontinent. A complete OCR system is described for documents of single Bangla font, where more than three hundred character shapes are recognized by a combination of template and feature-matching approach. Here the document image captured by a flatbed scanner is subject to tilt correction, line, word and character segmentation, simple and compound character separation, feature extraction and finally character recognition. Some character occurrence statistics have been computed to aid the recognition process. The simple character recognition is done by a feature-based tree classifier, and the compound character recognition involves a template matching approach preceded by a feature-based grouping. At present, recognition accuracy of about 96% is obtained by the system.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Inde</li>
</country>
<region>
<li>Bengale-Occidental</li>
</region>
<settlement>
<li>Calcutta</li>
</settlement>
<orgName>
<li>Institut indien de statistiques</li>
</orgName>
</list>
<tree>
<country name="Inde">
<noRegion>
<name sortKey="Pal, U" sort="Pal, U" uniqKey="Pal U" first="U." last="Pal">U. Pal</name>
</noRegion>
<name sortKey="Chaudhuri, B B" sort="Chaudhuri, B B" uniqKey="Chaudhuri B" first="B. B." last="Chaudhuri">Bidyut Baran Chaudhuri</name>
</country>
</tree>
</affiliations>
</record>
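
As a hedged illustration of how the record above might be read programmatically, the sketch below pulls out the title, authors, RBID, and year with Python's standard `xml.etree.ElementTree`. It works on a trimmed copy of the record: the full record uses the `wicri:` and `inist:` prefixes without declaring them, so a namespace-aware parser would reject it as shown. The function name `summarize` and the trimmed `RECORD` string are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the record above. Elements and attributes with the
# undeclared wicri: and inist: prefixes are left out so the fragment parses.
RECORD = """<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Computer recognition of printed Bangla script</title>
<author>
<name sortKey="Pal, U" sort="Pal, U" uniqKey="Pal U" first="U." last="Pal">U. Pal</name>
</author>
<author>
<name sortKey="Chaudhuri, B B" sort="Chaudhuri, B B" uniqKey="Chaudhuri B" first="B. B." last="Chaudhuri">Bidyut Baran Chaudhuri</name>
</author>
</titleStmt>
<publicationStmt>
<date when="1995">1995</date>
<idno type="RBID">Pascal:96-0007912</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""


def summarize(record_xml):
    """Extract a few bibliographic fields from a trimmed record string."""
    root = ET.fromstring(record_xml)
    title_stmt = root.find("./TEI/teiHeader/fileDesc/titleStmt")
    pub_stmt = root.find("./TEI/teiHeader/fileDesc/publicationStmt")
    return {
        "title": title_stmt.findtext("title"),
        "authors": [name.text for name in title_stmt.findall("author/name")],
        "rbid": pub_stmt.findtext("idno[@type='RBID']"),
        "year": pub_stmt.find("date").get("when"),
    }


print(summarize(RECORD))
```

To parse the full record instead, one option would be to wrap it in an element that declares placeholder namespaces for the two prefixes; the real namespace URIs are not given in this page.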

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002C14 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 002C14 | SxmlIndent | more

To link to this page from within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:96-0007912
   |texte=   Computer recognition of printed Bangla script
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024