Serveur d'exploration sur l'OCR

Attention, ce site est en cours de développement !
Attention, site généré par des moyens informatiques à partir de corpus bruts.
Les informations ne sont donc pas validées.

Synthesis of Common Arabic Handwritings to Aid Optical Character Recognition Research

Identifieur interne : 000028 ( Pmc/Corpus ); précédent : 000027; suivant : 000029

Synthesis of Common Arabic Handwritings to Aid Optical Character Recognition Research

Auteurs : Laslo Dinges ; Ayoub Al-Hamadi ; Moftah Elzobi ; Sherif El-Etriby

Source :

RBID : PMC:4813921

Abstract

Document analysis tasks such as pattern recognition, word spotting or segmentation, require comprehensive databases for training and validation. Not only variations in writing style but also the used list of words is of importance in the case that training samples should reflect the input of a specific area of application. However, generation of training samples is expensive in the sense of manpower and time, particularly if complete text pages including complex ground truth are required. This is why there is a lack of such databases, especially for Arabic, the second most popular language. However, Arabic handwriting recognition involves different preprocessing, segmentation and recognition methods. Each requires particular ground truth or samples to enable optimal training and validation, which are often not covered by the currently available databases. To overcome this issue, we propose a system that synthesizes Arabic handwritten words and text pages and generates corresponding detailed ground truth. We use these syntheses to validate a new, segmentation based system that recognizes handwritten Arabic words. We found that a modification of an Active Shape Model based character classifiers—that we proposed earlier—improves the word recognition accuracy. Further improvements are achieved, by using a vocabulary of the 50,000 most common Arabic words for error correction.


Url:
DOI: 10.3390/s16030346
PubMed: 26978368
PubMed Central: 4813921

Links to Exploration step

PMC:4813921***** Acces problem to record *****\

Le document en format XML


Pour manipuler ce document sous Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Pmc/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000028 | SxmlIndent | more

Ou

HfdSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd -nk 000028 | SxmlIndent | more

Pour mettre un lien sur cette page dans le réseau Wicri

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Pmc
   |étape=   Corpus
   |type=    RBID
   |clé=     PMC:4813921
   |texte=   Synthesis of Common Arabic Handwritings to Aid Optical Character Recognition Research
}}

Pour générer des pages wiki

HfdIndexSelect -h $EXPLOR_AREA/Data/Pmc/Corpus/RBID.i   -Sk "pubmed:26978368" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Pmc/Corpus/biblio.hfd   \
       | NlmPubMed2Wicri -a OcrV1 

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024