OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Offline Handwritten Arabic Character Segmentation with Probabilistic Model

Internal identifier: 001051 (Main/Merge); previous: 001050; next: 001052

Authors: Pingping Xiu [People's Republic of China]; Liangrui Peng [People's Republic of China]; Xiaoqing Ding [People's Republic of China]; Hua Wang [People's Republic of China]

Source:

RBID : ISTEX:E97F72643B849104A483AD5BE74808CEC4A1E1BB

Abstract

Research on offline handwritten Arabic character recognition has received increasing attention in recent years because of the growing need for Arabic document digitization. The variation in Arabic handwriting makes character segmentation and recognition very difficult; e.g., the sub-parts (diacritics) of an Arabic character may shift away from the main part. In this paper, a new probabilistic segmentation model is proposed. First, a contour-based over-segmentation method cuts the word image into graphemes. The graphemes are sorted into three queues: character main parts, sub-parts (diacritics) above the main parts, and sub-parts below the main parts. The confidence for each character is calculated by the probabilistic model, taking into account both the recognizer output and the geometric confidence, together with logical constraints. Then, a global optimization is conducted to find the optimal cutting path, taking the weighted average of character confidences as the objective function. Experiments on handwritten Arabic documents with various writing styles show that the proposed method is effective.
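The final step of the abstract, a global search for the cutting path that maximizes a weighted average of character confidences, can be illustrated with a small dynamic-programming sketch. This is not the authors' implementation: it assumes a single ordered sequence of graphemes (ignoring the three queues), a hypothetical `confidence(i, j)` callback standing in for the combined recognizer and geometric score, and weights equal to the number of graphemes in each character. The names `best_cutting_path`, `max_span`, and the toy scores are all invented for illustration.

```python
from typing import Callable, List, Tuple


def best_cutting_path(
    n: int,
    confidence: Callable[[int, int], float],
    max_span: int = 3,
) -> Tuple[float, List[Tuple[int, int]]]:
    """Group graphemes 0..n-1 into consecutive characters.

    Each character covers graphemes [i, j) with 1 <= j - i <= max_span.
    The objective is the average of character confidences, weighted by
    the number of graphemes per character (an assumption of this sketch).
    Those weights always sum to n, so maximizing the weighted sum is
    equivalent to maximizing the weighted average.
    """
    if n == 0:
        return 0.0, []
    best = [float("-inf")] * (n + 1)  # best[j]: best weighted sum for [0, j)
    back = [0] * (n + 1)              # back[j]: start of the last character
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_span), j):
            score = best[i] + (j - i) * confidence(i, j)
            if score > best[j]:
                best[j], back[j] = score, i
    # Walk the back-pointers to recover the chosen cuts.
    cuts: List[Tuple[int, int]] = []
    j = n
    while j > 0:
        cuts.append((back[j], j))
        j = back[j]
    cuts.reverse()
    return best[n] / n, cuts


# Toy confidences for four graphemes; unlisted spans score 0.0.
TOY_SCORES = {
    (0, 1): 0.4, (0, 2): 0.9, (1, 2): 0.3, (1, 3): 0.2,
    (2, 3): 0.8, (2, 4): 0.5, (3, 4): 0.7,
}


def toy_confidence(i: int, j: int) -> float:
    return TOY_SCORES.get((i, j), 0.0)


score, cuts = best_cutting_path(4, toy_confidence)
print(score, cuts)
```

With the toy scores, the search merges the first two graphemes into one character because the merged span scores higher than the two graphemes taken separately. A real system would replace `toy_confidence` with the model's per-character confidence.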

Url:
DOI: 10.1007/11669487_36

Links toward previous steps (curation, corpus...)


Links to Exploration step

ISTEX:E97F72643B849104A483AD5BE74808CEC4A1E1BB

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Offline Handwritten Arabic Character Segmentation with Probabilistic Model</title>
<author>
<name sortKey="Xiu, Pingping" sort="Xiu, Pingping" uniqKey="Xiu P" first="Pingping" last="Xiu">Pingping Xiu</name>
</author>
<author>
<name sortKey="Peng, Liangrui" sort="Peng, Liangrui" uniqKey="Peng L" first="Liangrui" last="Peng">Liangrui Peng</name>
</author>
<author>
<name sortKey="Ding, Xiaoqing" sort="Ding, Xiaoqing" uniqKey="Ding X" first="Xiaoqing" last="Ding">Xiaoqing Ding</name>
</author>
<author>
<name sortKey="Wang, Hua" sort="Wang, Hua" uniqKey="Wang H" first="Hua" last="Wang">Hua Wang</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:E97F72643B849104A483AD5BE74808CEC4A1E1BB</idno>
<date when="2006" year="2006">2006</date>
<idno type="doi">10.1007/11669487_36</idno>
<idno type="url">https://api.istex.fr/document/E97F72643B849104A483AD5BE74808CEC4A1E1BB/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000695</idno>
<idno type="wicri:Area/Istex/Curation">000687</idno>
<idno type="wicri:Area/Istex/Checkpoint">000A07</idno>
<idno type="wicri:doubleKey">0302-9743:2006:Xiu P:offline:handwritten:arabic</idno>
<idno type="wicri:Area/Main/Merge">001051</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Offline Handwritten Arabic Character Segmentation with Probabilistic Model</title>
<author>
<name sortKey="Xiu, Pingping" sort="Xiu, Pingping" uniqKey="Xiu P" first="Pingping" last="Xiu">Pingping Xiu</name>
<affiliation wicri:level="3">
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Dept. of Electronic Engineering, Tsinghua University, State Key Laboratory of Intelligent Technology and Systems, 100084, Beijing</wicri:regionArea>
<placeName>
<settlement type="city">Pékin</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">République populaire de Chine</country>
</affiliation>
</author>
<author>
<name sortKey="Peng, Liangrui" sort="Peng, Liangrui" uniqKey="Peng L" first="Liangrui" last="Peng">Liangrui Peng</name>
<affiliation wicri:level="3">
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Dept. of Electronic Engineering, Tsinghua University, State Key Laboratory of Intelligent Technology and Systems, 100084, Beijing</wicri:regionArea>
<placeName>
<settlement type="city">Pékin</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">République populaire de Chine</country>
</affiliation>
</author>
<author>
<name sortKey="Ding, Xiaoqing" sort="Ding, Xiaoqing" uniqKey="Ding X" first="Xiaoqing" last="Ding">Xiaoqing Ding</name>
<affiliation wicri:level="3">
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Dept. of Electronic Engineering, Tsinghua University, State Key Laboratory of Intelligent Technology and Systems, 100084, Beijing</wicri:regionArea>
<placeName>
<settlement type="city">Pékin</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">République populaire de Chine</country>
</affiliation>
</author>
<author>
<name sortKey="Wang, Hua" sort="Wang, Hua" uniqKey="Wang H" first="Hua" last="Wang">Hua Wang</name>
<affiliation wicri:level="3">
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Dept. of Electronic Engineering, Tsinghua University, State Key Laboratory of Intelligent Technology and Systems, 100084, Beijing</wicri:regionArea>
<placeName>
<settlement type="city">Pékin</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">République populaire de Chine</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2006</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">E97F72643B849104A483AD5BE74808CEC4A1E1BB</idno>
<idno type="DOI">10.1007/11669487_36</idno>
<idno type="ChapterID">36</idno>
<idno type="ChapterID">Chap36</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: The research on offline handwritten Arabic character recognition has received more and more attention in recent years, because of the increasing needs of Arabic document digitization. The variation in Arabic handwriting brings great difficulty in character segmentation and recognition, eg., the sub-parts (diacritics) of the Arabic character may shift away from the main part. In this paper, a new probabilistic segmentation model is proposed. First, a contour-based over-segmentation method is conducted, cutting the word image into graphemes. The graphemes are sorted into 3 queues, which are character main parts, sub-parts (diacritics) above or below main parts respectively. The confidence for each character is calculated by the probabilistic model, taking into account both of the recognizer output and the geometric confidence besides with logical constraint. Then, the global optimization is conducted to find optimal cutting path, taking weighted average of character confidences as objective function. Experiments on handwritten Arabic documents with various writing styles show the proposed method is effective.</div>
</front>
</TEI>
</record>
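As a rough illustration of how the record above could be read programmatically, the following sketch extracts the title, authors, DOI, and year with Python's standard `xml.etree.ElementTree`. It works on an abridged copy of the record. The `xmlns:wicri` declaration and its placeholder URI are assumptions added here, because the record as displayed uses the `wicri:` prefix without declaring it and a namespace-aware parser would otherwise reject it. The `summarize` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the record. The xmlns:wicri declaration is an assumption
# of this sketch; "urn:example:wicri" is a placeholder, not an official URI.
RECORD = """<record xmlns:wicri="urn:example:wicri">
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Offline Handwritten Arabic Character Segmentation with Probabilistic Model</title>
<author>
<name sortKey="Xiu, Pingping" first="Pingping" last="Xiu">Pingping Xiu</name>
</author>
<author>
<name sortKey="Peng, Liangrui" first="Liangrui" last="Peng">Liangrui Peng</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="RBID">ISTEX:E97F72643B849104A483AD5BE74808CEC4A1E1BB</idno>
<date when="2006" year="2006">2006</date>
<idno type="doi">10.1007/11669487_36</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""


def summarize(record_xml: str) -> dict:
    """Return a few bibliographic fields from a record shaped like the one above."""
    root = ET.fromstring(record_xml)
    title_stmt = root.find(".//titleStmt")
    pub_stmt = root.find(".//publicationStmt")
    return {
        "title": title_stmt.findtext("title"),
        "authors": [name.text for name in title_stmt.findall("author/name")],
        "doi": pub_stmt.findtext("idno[@type='doi']"),
        "year": pub_stmt.find("date").get("when"),
    }


summary = summarize(RECORD)
print(summary)
```

The elements themselves carry no namespace in this record, so plain tag names are enough in the search paths. Only the `wicri:`-prefixed attributes and elements would need the namespace URI.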

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001051 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 001051 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Merge
   |type=    RBID
   |clé=     ISTEX:E97F72643B849104A483AD5BE74808CEC4A1E1BB
   |texte=   Offline Handwritten Arabic Character Segmentation with Probabilistic Model
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024