A Restoration and Segmentation Unit for the Historic Persian Documents
Internal identifier: 001388 (Main/Merge); previous: 001387; next: 001389
Authors: Shahpour Alirezaee [Iran]; Shayesteh Fard [Iran]; Hassan Aghaeinia [Iran]; Karim Faez [Iran]
Source:
- Lecture Notes in Computer Science [0302-9743]; 2005.
Abstract
This paper presents a document restoration and segmentation algorithm for historic Middle Persian (Pahlavi) manuscripts. The proposed algorithm uses mathematical morphology and connected-component analysis to segment the overlapping lines, words, and characters found in Middle Persian documents in preparation for OCR. To evaluate the restoration algorithm, 200 pages of Pahlavi documents were used as experimental data. Numerical results indicate that the proposed algorithm can remove noise and destructive effects. The results also show 99.14% accuracy for baseline detection, 97.35% accuracy for text-line extraction and removal of overlaps from other lines, and 99.5% accuracy for segmenting the extracted text lines into their components.
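As a rough illustration of the two building blocks the abstract names (mathematical morphology and connected components), the sketch below dilates a tiny binary image horizontally and then labels its 8-connected regions. It is not the authors' algorithm: the toy `PAGE` image, the function names `dilate_horizontal` and `connected_components`, and the choice of a 1x3 structuring element are all assumptions made for this example.

```python
from collections import deque

# Hypothetical toy page: 1 = ink, 0 = background. Three separate strokes.
PAGE = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]

def dilate_horizontal(image, radius=1):
    """Morphological dilation with a horizontal line element of width 2*radius+1."""
    width = len(image[0]) if image else 0
    return [
        [1 if any(row[max(0, x - radius):x + radius + 1]) else 0 for x in range(width)]
        for row in image
    ]

def connected_components(image):
    """Return bounding boxes (top, left, bottom, right) of 8-connected ink regions."""
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

strokes = connected_components(PAGE)                    # three separate strokes
blobs = connected_components(dilate_horizontal(PAGE))   # nearby strokes merged
print(len(strokes), len(blobs))
```

On this toy input, dilation bridges the one-pixel gap between the first two strokes but not the wider gap before the third, so the number of components drops from three to two. Merging nearby strokes before labeling is one common way to move from character-level to word-level regions; the paper itself may proceed differently.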
URL: https://api.istex.fr/document/25A5B5A5BE3B1263DAEBBEBDAB6E9BA6A67855BB/fulltext/pdf
DOI: 10.1007/11558484_85
Links toward previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 001883
- to stream Istex, to step Curation: 001785
- to stream Istex, to step Checkpoint: 000C61
Links to Exploration step
ISTEX:25A5B5A5BE3B1263DAEBBEBDAB6E9BA6A67855BB
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">A Restoration and Segmentation Unit for the Historic Persian Documents</title>
<author><name sortKey="Alirezaee, Shahpour" sort="Alirezaee, Shahpour" uniqKey="Alirezaee S" first="Shahpour" last="Alirezaee">Shahpour Alirezaee</name>
</author>
<author><name sortKey="Fard, Shayesteh" sort="Fard, Shayesteh" uniqKey="Fard S" first="Shayesteh" last="Fard">Shayesteh Fard</name>
</author>
<author><name sortKey="Aghaeinia, Hassan" sort="Aghaeinia, Hassan" uniqKey="Aghaeinia H" first="Hassan" last="Aghaeinia">Hassan Aghaeinia</name>
</author>
<author><name sortKey="Faez, Karim" sort="Faez, Karim" uniqKey="Faez K" first="Karim" last="Faez">Karim Faez</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:25A5B5A5BE3B1263DAEBBEBDAB6E9BA6A67855BB</idno>
<date when="2005" year="2005">2005</date>
<idno type="doi">10.1007/11558484_85</idno>
<idno type="url">https://api.istex.fr/document/25A5B5A5BE3B1263DAEBBEBDAB6E9BA6A67855BB/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001883</idno>
<idno type="wicri:Area/Istex/Curation">001785</idno>
<idno type="wicri:Area/Istex/Checkpoint">000C61</idno>
<idno type="wicri:doubleKey">0302-9743:2005:Alirezaee S:a:restoration:and</idno>
<idno type="wicri:Area/Main/Merge">001388</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">A Restoration and Segmentation Unit for the Historic Persian Documents</title>
<author><name sortKey="Alirezaee, Shahpour" sort="Alirezaee, Shahpour" uniqKey="Alirezaee S" first="Shahpour" last="Alirezaee">Shahpour Alirezaee</name>
<affiliation wicri:level="1"><country xml:lang="fr">Iran</country>
<wicri:regionArea>Electrical Engineering Department, Islamic Azad University of Abhar, Abhar</wicri:regionArea>
<wicri:noRegion>Abhar</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Iran</country>
</affiliation>
</author>
<author><name sortKey="Fard, Shayesteh" sort="Fard, Shayesteh" uniqKey="Fard S" first="Shayesteh" last="Fard">Shayesteh Fard</name>
<affiliation wicri:level="1"><country xml:lang="fr">Iran</country>
<wicri:regionArea>Electrical Engineering Department, Zanjan University, Zanjan</wicri:regionArea>
<wicri:noRegion>Zanjan</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Iran</country>
</affiliation>
</author>
<author><name sortKey="Aghaeinia, Hassan" sort="Aghaeinia, Hassan" uniqKey="Aghaeinia H" first="Hassan" last="Aghaeinia">Hassan Aghaeinia</name>
<affiliation wicri:level="1"><country xml:lang="fr">Iran</country>
<wicri:regionArea>Electrical Engineering Department, Amirkabir University of Technology, Hafez Ave., Tehran</wicri:regionArea>
<wicri:noRegion>Tehran</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Iran</country>
</affiliation>
</author>
<author><name sortKey="Faez, Karim" sort="Faez, Karim" uniqKey="Faez K" first="Karim" last="Faez">Karim Faez</name>
<affiliation wicri:level="1"><country xml:lang="fr">Iran</country>
<wicri:regionArea>Electrical Engineering Department, Amirkabir University of Technology, Hafez Ave., Tehran</wicri:regionArea>
<wicri:noRegion>Tehran</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Iran</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2005</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
</series>
<idno type="istex">25A5B5A5BE3B1263DAEBBEBDAB6E9BA6A67855BB</idno>
<idno type="DOI">10.1007/11558484_85</idno>
<idno type="ChapterID">85</idno>
<idno type="ChapterID">Chap85</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: This paper aims to provide a document restoration and segmentation algorithm for the Historic Middle Persian or Pahlavi manuscripts. The proposed algorithm uses the mathematical morphology and connected component concept to segment the line, word, and character overlapped in the Middle-age Persian documents in preparation for OCR application. To evaluate the performance of the restoration algorithm, 200 pages of the Pahlavi documents are used as experimental data in our test. Numerical results indicate that the proposed algorithm can remove the noise and destructive effects. The results also show 99.14% accuracy on the baseline detection, 97.35% accuracy on the text line extraction and removing other lines overlaps, and 99.5% accuracy for segmenting the extracted text lines to their components.</div>
</front>
</TEI>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001388 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 001388 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Merge |type= RBID |clé= ISTEX:25A5B5A5BE3B1263DAEBBEBDAB6E9BA6A67855BB |texte= A Restoration and Segmentation Unit for the Historic Persian Documents }}
This area was generated with Dilib version V0.6.32.