A Novel Approach for Word Spotting Using Merge-Split Edit Distance
Internal identifier: 000D32 (Istex/Curation); previous: 000D31; next: 000D33
Authors: Khurram Khurshid [France]; Claudie Faure [France]; Nicole Vincent [France]
Source:
- Lecture Notes in Computer Science [0302-9743]; 2009.
Abstract
Edit distance matching has been used in the literature for word spotting with characters taken as primitives. The recognition rate, however, is limited by segmentation inconsistencies of characters (broken or merged) caused by noisy images or distorted characters. In this paper, we propose a Merge-Split edit distance that overcomes these segmentation problems by incorporating a multi-purpose merge cost function. The system extracts the words and characters in the text and then attributes a set of features to each character. Characters are matched by comparing their extracted feature sets using Dynamic Time Warping (DTW), while words are matched by comparing the strings of characters using the proposed Merge-Split edit distance algorithm. Evaluation of the method on 19th-century historical document images shows extremely promising results.
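The two matching levels named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not give their merge cost function or feature set, so the toy 1-D feature sequences, the absolute-difference local cost, the fixed `gap` penalty, and the function names `dtw` and `merge_split_distance` are all assumptions made for illustration. Characters are compared with classic DTW, and words with an edit distance that also allows two adjacent characters on one side to be matched against a single character on the other.

```python
from math import inf


def dtw(x, y):
    """Classic DTW distance between two 1-D numeric sequences (toy features)."""
    n, m = len(x), len(y)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])  # assumed local cost
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def merge_split_distance(a, b, char_cost=dtw, gap=1.0):
    """Edit distance over lists of characters with extra merge/split moves.

    a, b: lists of characters, each a list of numbers (toy feature sequence).
    gap:  hypothetical fixed cost for inserting or deleting one character.
    """
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:  # delete a[i-1]
                D[i][j] = min(D[i][j], D[i - 1][j] + gap)
            if j > 0:  # insert b[j-1]
                D[i][j] = min(D[i][j], D[i][j - 1] + gap)
            if i > 0 and j > 0:  # match or substitute one character
                D[i][j] = min(D[i][j],
                              D[i - 1][j - 1] + char_cost(a[i - 1], b[j - 1]))
            if i > 1 and j > 0:  # merge two characters of a against one of b
                merged = a[i - 2] + a[i - 1]
                D[i][j] = min(D[i][j],
                              D[i - 2][j - 1] + char_cost(merged, b[j - 1]))
            if i > 0 and j > 1:  # split: one character of a against two of b
                merged = b[j - 2] + b[j - 1]
                D[i][j] = min(D[i][j],
                              D[i - 1][j - 2] + char_cost(a[i - 1], merged))
    return D[n][m]
```

In this sketch, a word whose first character was broken into two pieces during segmentation, such as `[[1, 2], [3], [5, 5], [9, 8]]`, can still align with the query `[[1, 2, 3], [5, 5], [9, 8]]` through a single merge move instead of paying an insertion or deletion penalty.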
DOI: 10.1007/978-3-642-03767-2_26
Links toward previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 000D61
Links to Exploration step
ISTEX:8C1F3989D2466FF4A187343DA0F0E8326A4176F7
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">A Novel Approach for Word Spotting Using Merge-Split Edit Distance</title>
<author><name sortKey="Khurshid, Khurram" sort="Khurshid, Khurram" uniqKey="Khurshid K" first="Khurram" last="Khurshid">Khurram Khurshid</name>
<affiliation wicri:level="1"><mods:affiliation>Laboratoire CRIP5 – SIP, Université Paris Descartes, 45 rue des Saints-Pères, 75006, Paris, France</mods:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>Laboratoire CRIP5 – SIP, Université Paris Descartes, 45 rue des Saints-Pères, 75006, Paris</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1"><mods:affiliation>E-mail: khurram.khurshid@mi.parisdescartes.fr</mods:affiliation>
<country wicri:rule="url">France</country>
</affiliation>
</author>
<author><name sortKey="Faure, Claudie" sort="Faure, Claudie" uniqKey="Faure C" first="Claudie" last="Faure">Claudie Faure</name>
<affiliation wicri:level="1"><mods:affiliation>UMR CNRS 5141 - GET ENST, 46 rue Barrault, 75634, Paris Cedex 13, France</mods:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>UMR CNRS 5141 - GET ENST, 46 rue Barrault, 75634, Paris Cedex 13</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1"><mods:affiliation>E-mail: cfaure@enst.fr</mods:affiliation>
<country wicri:rule="url">France</country>
</affiliation>
</author>
<author><name sortKey="Vincent, Nicole" sort="Vincent, Nicole" uniqKey="Vincent N" first="Nicole" last="Vincent">Nicole Vincent</name>
<affiliation wicri:level="1"><mods:affiliation>Laboratoire CRIP5 – SIP, Université Paris Descartes, 45 rue des Saints-Pères, 75006, Paris, France</mods:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>Laboratoire CRIP5 – SIP, Université Paris Descartes, 45 rue des Saints-Pères, 75006, Paris</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1"><mods:affiliation>E-mail: nicole.vincent@mi.parisdescartes.fr</mods:affiliation>
<country wicri:rule="url">France</country>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:8C1F3989D2466FF4A187343DA0F0E8326A4176F7</idno>
<date when="2009" year="2009">2009</date>
<idno type="doi">10.1007/978-3-642-03767-2_26</idno>
<idno type="url">https://api.istex.fr/document/8C1F3989D2466FF4A187343DA0F0E8326A4176F7/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000D61</idno>
<idno type="wicri:Area/Istex/Curation">000D32</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">A Novel Approach for Word Spotting Using Merge-Split Edit Distance</title>
<author><name sortKey="Khurshid, Khurram" sort="Khurshid, Khurram" uniqKey="Khurshid K" first="Khurram" last="Khurshid">Khurram Khurshid</name>
<affiliation wicri:level="1"><mods:affiliation>Laboratoire CRIP5 – SIP, Université Paris Descartes, 45 rue des Saints-Pères, 75006, Paris, France</mods:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>Laboratoire CRIP5 – SIP, Université Paris Descartes, 45 rue des Saints-Pères, 75006, Paris</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1"><mods:affiliation>E-mail: khurram.khurshid@mi.parisdescartes.fr</mods:affiliation>
<country wicri:rule="url">France</country>
</affiliation>
</author>
<author><name sortKey="Faure, Claudie" sort="Faure, Claudie" uniqKey="Faure C" first="Claudie" last="Faure">Claudie Faure</name>
<affiliation wicri:level="1"><mods:affiliation>UMR CNRS 5141 - GET ENST, 46 rue Barrault, 75634, Paris Cedex 13, France</mods:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>UMR CNRS 5141 - GET ENST, 46 rue Barrault, 75634, Paris Cedex 13</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1"><mods:affiliation>E-mail: cfaure@enst.fr</mods:affiliation>
<country wicri:rule="url">France</country>
</affiliation>
</author>
<author><name sortKey="Vincent, Nicole" sort="Vincent, Nicole" uniqKey="Vincent N" first="Nicole" last="Vincent">Nicole Vincent</name>
<affiliation wicri:level="1"><mods:affiliation>Laboratoire CRIP5 – SIP, Université Paris Descartes, 45 rue des Saints-Pères, 75006, Paris, France</mods:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>Laboratoire CRIP5 – SIP, Université Paris Descartes, 45 rue des Saints-Pères, 75006, Paris</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1"><mods:affiliation>E-mail: nicole.vincent@mi.parisdescartes.fr</mods:affiliation>
<country wicri:rule="url">France</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2009</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">8C1F3989D2466FF4A187343DA0F0E8326A4176F7</idno>
<idno type="DOI">10.1007/978-3-642-03767-2_26</idno>
<idno type="ChapterID">26</idno>
<idno type="ChapterID">Chap26</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: Edit distance matching has been used in literature for word spotting with characters taken as primitives. The recognition rate however, is limited by the segmentation inconsistencies of characters (broken or merged) caused by noisy images or distorted characters. In this paper, we have proposed a Merge-split edit distance which overcomes these segmentation problems by incorporating a multi-purpose merge cost function. The system is based on the extraction of words and characters in the text and then attributing each character with a set of features. Characters are matched by comparing their extracted feature sets using Dynamic Time Warping (DTW) while the words are matched by comparing the strings of characters using the proposed Merge-Split Edit distance algorithm. Evaluation of the method on 19th century historical document images exhibits extremely promising results.</div>
</front>
</TEI>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Istex/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000D32 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Istex/Curation/biblio.hfd -nk 000D32 | SxmlIndent | more
To add a link to this page in the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Istex |étape= Curation |type= RBID |clé= ISTEX:8C1F3989D2466FF4A187343DA0F0E8326A4176F7 |texte= A Novel Approach for Word Spotting Using Merge-Split Edit Distance }}
This area was generated with Dilib version V0.6.32.