Computer recognition of printed Bangla script

Internal identifier: 002C14 (Main/Exploration); previous: 002C13; next: 002C15

Authors: U. Pal [India]; Bidyut Baran Chaudhuri [India]

Source:

International journal of systems science [ISSN 0020-7721]; 1995.

RBID: Pascal:96-0007912

French descriptors

Pascal (Inist):
- Reconnaissance caractère, Inde, Segmentation, Extraction forme, Classificateur, Traitement image, Numérisation, Histogramme, Reconnaissance forme, Chaîne caractère, OCR.
Wicri:
- geographic: Inde.
- topic: Numérisation.

English descriptors

KwdEn:
- Character recognition, Character string, Classifier, Digitizing, Histogram, Image processing, India, OCR, Pattern extraction, Pattern recognition, Segmentation.

Abstract

This paper considers optical character recognition (OCR) of Bangla, the second most popular script in the Indian subcontinent. It describes a complete OCR system for documents printed in a single Bangla font, in which more than three hundred character shapes are recognized by a combination of template-matching and feature-matching approaches. The document image, captured by a flatbed scanner, is subjected to tilt correction; line, word and character segmentation; separation of simple and compound characters; feature extraction; and finally character recognition. Some character occurrence statistics have been computed to aid the recognition process. Simple characters are recognized by a feature-based tree classifier, while compound characters are recognized by template matching preceded by a feature-based grouping. At present, the system achieves a recognition accuracy of about 96%.
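
The abstract lists line segmentation as one stage of the pipeline but does not say how it is done. As an illustration only, the sketch below uses a horizontal projection histogram, a common technique for separating text lines in a binarized page; it is an assumption, not the authors' documented method. The function name `segment_lines`, the `min_ink` threshold and the toy `page` array are all hypothetical.

```python
from typing import List, Tuple


def segment_lines(image: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges of text lines in a binary image.

    image: rows of 0 (background) / 1 (ink) pixels.
    min_ink: hypothetical threshold; rows with fewer ink pixels count as gaps.
    """
    # Horizontal projection histogram: number of ink pixels in each row.
    profile = [sum(row) for row in image]

    lines = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # entering a text line
        elif ink < min_ink and start is not None:
            lines.append((start, y - 1))   # leaving a text line
            start = None
    if start is not None:                  # text line touching the bottom edge
        lines.append((start, len(profile) - 1))
    return lines


# Toy 7x6 "page": two text lines separated by a blank row.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(segment_lines(page))  # [(1, 2), (4, 5)]
```

The same idea applied to columns within a line band would give a rough word or character split, although real Bangla text, with its connecting headline, would likely need more than a plain projection.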

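Affiliations:
- Country: India
- Region: West Bengal
- Settlement: Calcutta
- Organisation: Indian Statistical Institute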

Links to previous steps (curation, corpus...)

to stream PascalFrancis, to step Corpus: 000A36
to stream PascalFrancis, to step Curation: 000963
to stream PascalFrancis, to step Checkpoint: 000A23
to stream Main, to step Merge: 002D76
to stream Main, to step Curation: 002C14

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Computer recognition of printed Bangla script</title>
<author><name sortKey="Pal, U" sort="Pal, U" uniqKey="Pal U" first="U." last="Pal">U. Pal</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Indian statistical inst., computer vision pattern recognition unit</s1>
<s2>Calcutta 700 035</s2>
<s3>IND</s3>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Calcutta 700 035</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Chaudhuri, B B" sort="Chaudhuri, B B" uniqKey="Chaudhuri B" first="B. B." last="Chaudhuri">Bidyut Baran Chaudhuri</name>
<affiliation><country>Inde</country>
<placeName><settlement type="city">Calcutta</settlement>
<region type="province">Bengale-Occidental</region>
</placeName>
<orgName type="lab" n="5">Institut indien de statistiques</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">96-0007912</idno>
<date when="1995">1995</date>
<idno type="stanalyst">PASCAL 96-0007912 INIST</idno>
<idno type="RBID">Pascal:96-0007912</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000A36</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000963</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000A23</idno>
<idno type="wicri:doubleKey">0020-7721:1995:Pal U:computer:recognition:of</idno>
<idno type="wicri:Area/Main/Merge">002D76</idno>
<idno type="wicri:Area/Main/Curation">002C14</idno>
<idno type="wicri:Area/Main/Exploration">002C14</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Computer recognition of printed Bangla script</title>
<author><name sortKey="Pal, U" sort="Pal, U" uniqKey="Pal U" first="U." last="Pal">U. Pal</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Indian statistical inst., computer vision pattern recognition unit</s1>
<s2>Calcutta 700 035</s2>
<s3>IND</s3>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Calcutta 700 035</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Chaudhuri, B B" sort="Chaudhuri, B B" uniqKey="Chaudhuri B" first="B. B." last="Chaudhuri">Bidyut Baran Chaudhuri</name>
<affiliation><country>Inde</country>
<placeName><settlement type="city">Calcutta</settlement>
<region type="province">Bengale-Occidental</region>
</placeName>
<orgName type="lab" n="5">Institut indien de statistiques</orgName>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">International journal of systems science</title>
<title level="j" type="abbreviated">Int. j. syst. sci.</title>
<idno type="ISSN">0020-7721</idno>
<imprint><date when="1995">1995</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">International journal of systems science</title>
<title level="j" type="abbreviated">Int. j. syst. sci.</title>
<idno type="ISSN">0020-7721</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Character recognition</term>
<term>Character string</term>
<term>Classifier</term>
<term>Digitizing</term>
<term>Histogram</term>
<term>Image processing</term>
<term>India</term>
<term>OCR</term>
<term>Pattern extraction</term>
<term>Pattern recognition</term>
<term>Segmentation</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Reconnaissance caractère</term>
<term>Inde</term>
<term>Segmentation</term>
<term>Extraction forme</term>
<term>Classificateur</term>
<term>Traitement image</term>
<term>Numérisation</term>
<term>Histogramme</term>
<term>Reconnaissance forme</term>
<term>Chaîne caractère</term>
<term>OCR</term>
</keywords>
<keywords scheme="Wicri" type="geographic" xml:lang="fr"><term>Inde</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr"><term>Numérisation</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">This paper considers optical character recognition (OCR) of Bangla, the second most popular script in the Indian subcontinent. It describes a complete OCR system for documents printed in a single Bangla font, in which more than three hundred character shapes are recognized by a combination of template-matching and feature-matching approaches. The document image, captured by a flatbed scanner, is subjected to tilt correction; line, word and character segmentation; separation of simple and compound characters; feature extraction; and finally character recognition. Some character occurrence statistics have been computed to aid the recognition process. Simple characters are recognized by a feature-based tree classifier, while compound characters are recognized by template matching preceded by a feature-based grouping. At present, the system achieves a recognition accuracy of about 96%.</div>
</front>
</TEI>
<affiliations><list><country><li>Inde</li>
</country>
<region><li>Bengale-Occidental</li>
</region>
<settlement><li>Calcutta</li>
</settlement>
<orgName><li>Institut indien de statistiques</li>
</orgName>
</list>
<tree><country name="Inde"><noRegion><name sortKey="Pal, U" sort="Pal, U" uniqKey="Pal U" first="U." last="Pal">U. Pal</name>
</noRegion>
<name sortKey="Chaudhuri, B B" sort="Chaudhuri, B B" uniqKey="Chaudhuri B" first="B. B." last="Chaudhuri">Bidyut Baran Chaudhuri</name>
</country>
</tree>
</affiliations>
</record>
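
As a rough illustration of how the keyword lists in this record could be read programmatically, the sketch below parses a shortened excerpt of the `<textClass>` element with Python's standard `xml.etree.ElementTree` and groups the terms by scheme, type and language. The helper name `keywords_by_scheme` and the `SNIPPET` constant are hypothetical. Note that the full record as printed here uses the `wicri:` and `inist:` prefixes without visible namespace declarations, so a namespace-aware parser is likely to reject it unless those prefixes are bound first; the excerpt avoids the problem because it only uses the predefined `xml:` prefix.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Excerpt of the <textClass> element from the record above (term lists shortened).
SNIPPET = """
<textClass>
<keywords scheme="KwdEn" xml:lang="en"><term>Character recognition</term>
<term>OCR</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Reconnaissance caractère</term>
<term>OCR</term>
</keywords>
<keywords scheme="Wicri" type="geographic" xml:lang="fr"><term>Inde</term>
</keywords>
</textClass>
"""

# ElementTree exposes xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def keywords_by_scheme(xml_text):
    """Group <term> values by (scheme, type, language) of their <keywords> parent."""
    root = ET.fromstring(xml_text)
    grouped = defaultdict(list)
    for kw in root.iter("keywords"):
        key = (kw.get("scheme"), kw.get("type"), kw.get(XML_LANG))
        grouped[key].extend(term.text for term in kw.findall("term"))
    return dict(grouped)


for key, terms in keywords_by_scheme(SNIPPET).items():
    print(key, terms)
```

Keying on the full `(scheme, type, language)` triple keeps the two Wicri lists (geographic and topic) apart, which a key on `scheme` alone would merge.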

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002C14 | SxmlIndent | more

or, with EXPLOR_AREA set to the area root (here presumably $WICRI_ROOT/Ticri/CIDE/explor/OcrV1):

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 002C14 | SxmlIndent | more

To link to this page from within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:96-0007912
   |texte=   Computer recognition of printed Bangla script
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

OCR exploration server
Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.
