OcrV1, PubMed, Checkpoint, bibRecord, 000001

Synthesis of Common Arabic Handwritings to Aid Optical Character Recognition Research.

Internal identifier: 000001 (PubMed/Checkpoint); previous: 000000; next: 000002

Authors: Laslo Dinges [Germany]; Ayoub Al-Hamadi [Germany]; Moftah Elzobi [Germany]; Sherif El-Etriby [Egypt]

Source:

Sensors (Basel, Switzerland) [ISSN 1424-8220]; 2016.

RBID : pubmed:26978368

Abstract

Document analysis tasks such as pattern recognition, word spotting, or segmentation require comprehensive databases for training and validation. Not only variations in writing style but also the list of words used is important when training samples should reflect the input of a specific area of application. However, generating training samples is expensive in terms of manpower and time, particularly if complete text pages including complex ground truth are required. This is why there is a lack of such databases, especially for Arabic, the second most popular language. Moreover, Arabic handwriting recognition involves different preprocessing, segmentation, and recognition methods. Each requires particular ground truth or samples to enable optimal training and validation, which are often not covered by the currently available databases. To overcome this issue, we propose a system that synthesizes Arabic handwritten words and text pages and generates corresponding detailed ground truth. We use these syntheses to validate a new, segmentation-based system that recognizes handwritten Arabic words. We found that a modification of an Active Shape Model based character classifier, which we proposed earlier, improves the word recognition accuracy. Further improvements are achieved by using a vocabulary of the 50,000 most common Arabic words for error correction.

DOI: 10.3390/s16030346
PubMed: 26978368
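
The abstract mentions error correction against a vocabulary of common words but does not say how the lookup is done. As a minimal sketch, assuming a nearest-neighbour search by Levenshtein (edit) distance, a recognized word could be mapped to its closest vocabulary entry as shown below. The function names, the distance measure and the toy transliterated vocabulary are all illustrative assumptions, not the authors' method or data.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with the classic two-row dynamic programme."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def correct(word: str, vocabulary: list[str]) -> str:
    """Return the vocabulary entry closest to `word`; ties go to the earliest entry."""
    return min(vocabulary, key=lambda entry: edit_distance(word, entry))


# Hypothetical toy vocabulary (transliterated placeholders, not the paper's word list).
VOCABULARY = ["kitab", "maktab", "madrasa"]

if __name__ == "__main__":
    # A recognizer output with one wrong character is mapped to the nearest entry.
    print(correct("kitap", VOCABULARY))
```

A linear scan like this is only practical for small lists; with a vocabulary of 50,000 words one would more likely index the entries (for example in a trie or BK-tree), which is outside the scope of this sketch.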

Affiliations:

Germany, Egypt

Links to previous steps (curation, corpus...)

to stream PubMed, to step Corpus: 000001
to stream PubMed, to step Curation: 000001

Links to Exploration step

pubmed:26978368

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Synthesis of Common Arabic Handwritings to Aid Optical Character Recognition Research.</title>
<author><name sortKey="Dinges, Laslo" sort="Dinges, Laslo" uniqKey="Dinges L" first="Laslo" last="Dinges">Laslo Dinges</name>
<affiliation wicri:level="1"><nlm:affiliation>Institute for Information Technology and Communications (IIKT), Otto-von-Guericke-University Magdeburg, D-39016 Magdeburg, Germany. Laslo.Dinges@ovgu.de.</nlm:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Institute for Information Technology and Communications (IIKT), Otto-von-Guericke-University Magdeburg, D-39016 Magdeburg</wicri:regionArea>
<wicri:noRegion>39016 Magdeburg</wicri:noRegion>
<wicri:noRegion>39016 Magdeburg</wicri:noRegion>
<wicri:noRegion>D-39016 Magdeburg</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Al Hamadi, Ayoub" sort="Al Hamadi, Ayoub" uniqKey="Al Hamadi A" first="Ayoub" last="Al-Hamadi">Ayoub Al-Hamadi</name>
<affiliation wicri:level="1"><nlm:affiliation>Institute for Information Technology and Communications (IIKT), Otto-von-Guericke-University Magdeburg, D-39016 Magdeburg, Germany. Ayoub.Al-Hamadi@ovgu.de.</nlm:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Institute for Information Technology and Communications (IIKT), Otto-von-Guericke-University Magdeburg, D-39016 Magdeburg</wicri:regionArea>
<wicri:noRegion>39016 Magdeburg</wicri:noRegion>
<wicri:noRegion>39016 Magdeburg</wicri:noRegion>
<wicri:noRegion>D-39016 Magdeburg</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Elzobi, Moftah" sort="Elzobi, Moftah" uniqKey="Elzobi M" first="Moftah" last="Elzobi">Moftah Elzobi</name>
<affiliation wicri:level="1"><nlm:affiliation>Institute for Information Technology and Communications (IIKT), Otto-von-Guericke-University Magdeburg, D-39016 Magdeburg, Germany. Moftah.Elzobi@ovgu.de.</nlm:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Institute for Information Technology and Communications (IIKT), Otto-von-Guericke-University Magdeburg, D-39016 Magdeburg</wicri:regionArea>
<wicri:noRegion>39016 Magdeburg</wicri:noRegion>
<wicri:noRegion>39016 Magdeburg</wicri:noRegion>
<wicri:noRegion>D-39016 Magdeburg</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="El Etriby, Sherif" sort="El Etriby, Sherif" uniqKey="El Etriby S" first="Sherif" last="El-Etriby">Sherif El-Etriby</name>
<affiliation wicri:level="1"><nlm:affiliation>Faculty of Computers and Information, Menoufia University-MUFIC, Menoufia 32721, Egypt. sherif.el-etriby@ci.menofia.edu.eg.</nlm:affiliation>
<country xml:lang="fr">Égypte</country>
<wicri:regionArea>Faculty of Computers and Information, Menoufia University-MUFIC, Menoufia 32721</wicri:regionArea>
<wicri:noRegion>Menoufia 32721</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2016">2016</date>
<idno type="doi">10.3390/s16030346</idno>
<idno type="RBID">pubmed:26978368</idno>
<idno type="pmid">26978368</idno>
<idno type="wicri:Area/PubMed/Corpus">000001</idno>
<idno type="wicri:Area/PubMed/Curation">000001</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000001</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">Synthesis of Common Arabic Handwritings to Aid Optical Character Recognition Research.</title>
<author><name sortKey="Dinges, Laslo" sort="Dinges, Laslo" uniqKey="Dinges L" first="Laslo" last="Dinges">Laslo Dinges</name>
<affiliation wicri:level="1"><nlm:affiliation>Institute for Information Technology and Communications (IIKT), Otto-von-Guericke-University Magdeburg, D-39016 Magdeburg, Germany. Laslo.Dinges@ovgu.de.</nlm:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Institute for Information Technology and Communications (IIKT), Otto-von-Guericke-University Magdeburg, D-39016 Magdeburg</wicri:regionArea>
<wicri:noRegion>39016 Magdeburg</wicri:noRegion>
<wicri:noRegion>39016 Magdeburg</wicri:noRegion>
<wicri:noRegion>D-39016 Magdeburg</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Al Hamadi, Ayoub" sort="Al Hamadi, Ayoub" uniqKey="Al Hamadi A" first="Ayoub" last="Al-Hamadi">Ayoub Al-Hamadi</name>
<affiliation wicri:level="1"><nlm:affiliation>Institute for Information Technology and Communications (IIKT), Otto-von-Guericke-University Magdeburg, D-39016 Magdeburg, Germany. Ayoub.Al-Hamadi@ovgu.de.</nlm:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Institute for Information Technology and Communications (IIKT), Otto-von-Guericke-University Magdeburg, D-39016 Magdeburg</wicri:regionArea>
<wicri:noRegion>39016 Magdeburg</wicri:noRegion>
<wicri:noRegion>39016 Magdeburg</wicri:noRegion>
<wicri:noRegion>D-39016 Magdeburg</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Elzobi, Moftah" sort="Elzobi, Moftah" uniqKey="Elzobi M" first="Moftah" last="Elzobi">Moftah Elzobi</name>
<affiliation wicri:level="1"><nlm:affiliation>Institute for Information Technology and Communications (IIKT), Otto-von-Guericke-University Magdeburg, D-39016 Magdeburg, Germany. Moftah.Elzobi@ovgu.de.</nlm:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Institute for Information Technology and Communications (IIKT), Otto-von-Guericke-University Magdeburg, D-39016 Magdeburg</wicri:regionArea>
<wicri:noRegion>39016 Magdeburg</wicri:noRegion>
<wicri:noRegion>39016 Magdeburg</wicri:noRegion>
<wicri:noRegion>D-39016 Magdeburg</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="El Etriby, Sherif" sort="El Etriby, Sherif" uniqKey="El Etriby S" first="Sherif" last="El-Etriby">Sherif El-Etriby</name>
<affiliation wicri:level="1"><nlm:affiliation>Faculty of Computers and Information, Menoufia University-MUFIC, Menoufia 32721, Egypt. sherif.el-etriby@ci.menofia.edu.eg.</nlm:affiliation>
<country xml:lang="fr">Égypte</country>
<wicri:regionArea>Faculty of Computers and Information, Menoufia University-MUFIC, Menoufia 32721</wicri:regionArea>
<wicri:noRegion>Menoufia 32721</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series><title level="j">Sensors (Basel, Switzerland)</title>
<idno type="eISSN">1424-8220</idno>
<imprint><date when="2016" type="published">2016</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Document analysis tasks such as pattern recognition, word spotting or segmentation, require comprehensive databases for training and validation. Not only variations in writing style but also the used list of words is of importance in the case that training samples should reflect the input of a specific area of application. However, generation of training samples is expensive in the sense of manpower and time, particularly if complete text pages including complex ground truth are required. This is why there is a lack of such databases, especially for Arabic, the second most popular language. However, Arabic handwriting recognition involves different preprocessing, segmentation and recognition methods. Each requires particular ground truth or samples to enable optimal training and validation, which are often not covered by the currently available databases. To overcome this issue, we propose a system that synthesizes Arabic handwritten words and text pages and generates corresponding detailed ground truth. We use these syntheses to validate a new, segmentation based system that recognizes handwritten Arabic words. We found that a modification of an Active Shape Model based character classifiers-that we proposed earlier-improves the word recognition accuracy. Further improvements are achieved, by using a vocabulary of the 50,000 most common Arabic words for error correction.</div>
</front>
</TEI>
<pubmed><MedlineCitation Owner="NLM" Status="In-Data-Review"><PMID Version="1">26978368</PMID>
<DateCreated><Year>2016</Year>
<Month>03</Month>
<Day>16</Day>
</DateCreated>
<DateRevised><Year>2016</Year>
<Month>04</Month>
<Day>07</Day>
</DateRevised>
<Article PubModel="Electronic"><Journal><ISSN IssnType="Electronic">1424-8220</ISSN>
<JournalIssue CitedMedium="Internet"><Volume>16</Volume>
<Issue>3</Issue>
<PubDate><Year>2016</Year>
</PubDate>
</JournalIssue>
<Title>Sensors (Basel, Switzerland)</Title>
<ISOAbbreviation>Sensors (Basel)</ISOAbbreviation>
</Journal>
<ArticleTitle>Synthesis of Common Arabic Handwritings to Aid Optical Character Recognition Research.</ArticleTitle>
<Pagination><MedlinePgn></MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.3390/s16030346</ELocationID>
<ELocationID EIdType="pii" ValidYN="Y">E346</ELocationID>
<Abstract><AbstractText>Document analysis tasks such as pattern recognition, word spotting or segmentation, require comprehensive databases for training and validation. Not only variations in writing style but also the used list of words is of importance in the case that training samples should reflect the input of a specific area of application. However, generation of training samples is expensive in the sense of manpower and time, particularly if complete text pages including complex ground truth are required. This is why there is a lack of such databases, especially for Arabic, the second most popular language. However, Arabic handwriting recognition involves different preprocessing, segmentation and recognition methods. Each requires particular ground truth or samples to enable optimal training and validation, which are often not covered by the currently available databases. To overcome this issue, we propose a system that synthesizes Arabic handwritten words and text pages and generates corresponding detailed ground truth. We use these syntheses to validate a new, segmentation based system that recognizes handwritten Arabic words. We found that a modification of an Active Shape Model based character classifiers-that we proposed earlier-improves the word recognition accuracy. Further improvements are achieved, by using a vocabulary of the 50,000 most common Arabic words for error correction.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y"><Author ValidYN="Y"><LastName>Dinges</LastName>
<ForeName>Laslo</ForeName>
<Initials>L</Initials>
<AffiliationInfo><Affiliation>Institute for Information Technology and Communications (IIKT), Otto-von-Guericke-University Magdeburg, D-39016 Magdeburg, Germany. Laslo.Dinges@ovgu.de.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y"><LastName>Al-Hamadi</LastName>
<ForeName>Ayoub</ForeName>
<Initials>A</Initials>
<AffiliationInfo><Affiliation>Institute for Information Technology and Communications (IIKT), Otto-von-Guericke-University Magdeburg, D-39016 Magdeburg, Germany. Ayoub.Al-Hamadi@ovgu.de.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y"><LastName>Elzobi</LastName>
<ForeName>Moftah</ForeName>
<Initials>M</Initials>
<AffiliationInfo><Affiliation>Institute for Information Technology and Communications (IIKT), Otto-von-Guericke-University Magdeburg, D-39016 Magdeburg, Germany. Moftah.Elzobi@ovgu.de.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y"><LastName>El-Etriby</LastName>
<ForeName>Sherif</ForeName>
<Initials>S</Initials>
<AffiliationInfo><Affiliation>Faculty of Computers and Information, Menoufia University-MUFIC, Menoufia 32721, Egypt. sherif.el-etriby@ci.menofia.edu.eg.</Affiliation>
</AffiliationInfo>
<AffiliationInfo><Affiliation>Department of Computer, Umm Al-Qura University, Makkah 21421, Saudi Arabia. sherif.el-etriby@ci.menofia.edu.eg.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList><PublicationType UI="D016428">Journal Article</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic"><Year>2016</Year>
<Month>03</Month>
<Day>11</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo><Country>Switzerland</Country>
<MedlineTA>Sensors (Basel)</MedlineTA>
<NlmUniqueID>101204366</NlmUniqueID>
<ISSNLinking>1424-8220</ISSNLinking>
</MedlineJournalInfo>
<CitationSubset>IM</CitationSubset>
<CommentsCorrectionsList><CommentsCorrections RefType="Cites"><RefSource>IEEE Trans Pattern Anal Mach Intell. 2006 May;28(5):712-24</RefSource>
<PMID Version="1">16640258</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="Cites"><RefSource>Sensors (Basel). 2015;15(6):14241-60</RefSource>
<PMID Version="1">26091392</PMID>
</CommentsCorrections>
</CommentsCorrectionsList>
<OtherID Source="NLM">PMC4813921</OtherID>
<KeywordList Owner="NOTNLM"><Keyword MajorTopicYN="N">Active Shape Model</Keyword>
<Keyword MajorTopicYN="N">Arabic handwritings</Keyword>
<Keyword MajorTopicYN="N">digital pens</Keyword>
<Keyword MajorTopicYN="N">feature extraction and analysis</Keyword>
<Keyword MajorTopicYN="N">handwriting synthesis</Keyword>
<Keyword MajorTopicYN="N">intelligent systems</Keyword>
<Keyword MajorTopicYN="N">optical character recognition (OCR)</Keyword>
<Keyword MajorTopicYN="N">recognition and interpretation</Keyword>
<Keyword MajorTopicYN="N">word segmentation</Keyword>
</KeywordList>
</MedlineCitation>
<PubmedData><History><PubMedPubDate PubStatus="received"><Year>2015</Year>
<Month>12</Month>
<Day>26</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="revised"><Year>2016</Year>
<Month>2</Month>
<Day>26</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted"><Year>2016</Year>
<Month>2</Month>
<Day>26</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez"><Year>2016</Year>
<Month>3</Month>
<Day>16</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed"><Year>2016</Year>
<Month>3</Month>
<Day>16</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline"><Year>2016</Year>
<Month>3</Month>
<Day>16</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>epublish</PublicationStatus>
<ArticleIdList><ArticleId IdType="pii">s16030346</ArticleId>
<ArticleId IdType="doi">10.3390/s16030346</ArticleId>
<ArticleId IdType="pubmed">26978368</ArticleId>
<ArticleId IdType="pmc">PMC4813921</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
<affiliations><list><country><li>Allemagne</li>
<li>Égypte</li>
</country>
</list>
<tree><country name="Allemagne"><noRegion><name sortKey="Dinges, Laslo" sort="Dinges, Laslo" uniqKey="Dinges L" first="Laslo" last="Dinges">Laslo Dinges</name>
</noRegion>
<name sortKey="Al Hamadi, Ayoub" sort="Al Hamadi, Ayoub" uniqKey="Al Hamadi A" first="Ayoub" last="Al-Hamadi">Ayoub Al-Hamadi</name>
<name sortKey="Elzobi, Moftah" sort="Elzobi, Moftah" uniqKey="Elzobi M" first="Moftah" last="Elzobi">Moftah Elzobi</name>
</country>
<country name="Égypte"><noRegion><name sortKey="El Etriby, Sherif" sort="El Etriby, Sherif" uniqKey="El Etriby S" first="Sherif" last="El-Etriby">Sherif El-Etriby</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/PubMed/Checkpoint

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000001 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/PubMed/Checkpoint/biblio.hfd -nk 000001 | SxmlIndent | more
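
Once a record has been extracted, it can also be inspected with general-purpose tools. The following is a minimal sketch, assuming Python and its standard `xml.etree.ElementTree` module, that pulls the PMID, title, DOI and author names out of the `<pubmed>` part of the record. It works on a trimmed excerpt embedded as a string (the `EXCERPT` constant and the `summarize` helper are hypothetical names). The full record also uses `wicri:` and `nlm:` prefixes whose namespace declarations are not shown on this page, so a strict XML parser may reject it unless those are declared.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the <pubmed> part of the record shown above.
EXCERPT = """
<pubmed><MedlineCitation Owner="NLM" Status="In-Data-Review"><PMID Version="1">26978368</PMID>
<Article PubModel="Electronic">
<ArticleTitle>Synthesis of Common Arabic Handwritings to Aid Optical Character Recognition Research.</ArticleTitle>
<ELocationID EIdType="doi" ValidYN="Y">10.3390/s16030346</ELocationID>
<ELocationID EIdType="pii" ValidYN="Y">E346</ELocationID>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y"><LastName>Dinges</LastName><ForeName>Laslo</ForeName></Author>
<Author ValidYN="Y"><LastName>Al-Hamadi</LastName><ForeName>Ayoub</ForeName></Author>
<Author ValidYN="Y"><LastName>Elzobi</LastName><ForeName>Moftah</ForeName></Author>
<Author ValidYN="Y"><LastName>El-Etriby</LastName><ForeName>Sherif</ForeName></Author>
</AuthorList>
</Article>
</MedlineCitation>
</pubmed>
"""


def summarize(xml_text: str) -> dict:
    """Extract a few bibliographic fields from a <pubmed> element."""
    root = ET.fromstring(xml_text.strip())
    article = root.find("./MedlineCitation/Article")
    authors = [
        f"{author.findtext('ForeName')} {author.findtext('LastName')}"
        for author in article.iter("Author")
    ]
    return {
        "pmid": root.findtext("./MedlineCitation/PMID"),
        "title": article.findtext("ArticleTitle"),
        # Select the ELocationID whose EIdType attribute is "doi".
        "doi": article.findtext("ELocationID[@EIdType='doi']"),
        "authors": authors,
    }


if __name__ == "__main__":
    for field, value in summarize(EXCERPT).items():
        print(f"{field}: {value}")
```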

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    PubMed
   |étape=   Checkpoint
   |type=    RBID
   |clé=     pubmed:26978368
   |texte=   Synthesis of Common Arabic Handwritings to Aid Optical Character Recognition Research.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/PubMed/Checkpoint/RBID.i   -Sk "pubmed:26978368" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/PubMed/Checkpoint/biblio.hfd   \
       | NlmPubMed2Wicri -a OcrV1

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

OCR exploration server
Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.
