OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

A Novel Approach for Word Spotting Using Merge-Split Edit Distance

Internal identifier: 000D32 (Istex/Curation); previous: 000D31; next: 000D33

Authors: Khurram Khurshid [France]; Claudie Faure [France]; Nicole Vincent [France]

Source :

RBID : ISTEX:8C1F3989D2466FF4A187343DA0F0E8326A4176F7

Abstract

Edit distance matching has been used in the literature for word spotting, with characters taken as primitives. The recognition rate, however, is limited by inconsistent character segmentation (broken or merged characters) caused by noisy images or distorted characters. In this paper, we propose a Merge-Split Edit Distance that overcomes these segmentation problems by incorporating a multi-purpose merge cost function. The system extracts the words and characters in the text and then describes each character by a set of features. Characters are matched by comparing their extracted feature sets using Dynamic Time Warping (DTW), while words are matched by comparing their strings of characters using the proposed Merge-Split Edit Distance algorithm. Evaluation of the method on 19th-century historical document images shows extremely promising results.
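
The two matching levels described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the feature set or the merge cost function, so the sketch assumes each character is a tuple of numeric features, that merging two adjacent characters means concatenating their feature tuples, and that insertions and deletions have a fixed cost `gap`. The names `dtw`, `merge_split_distance`, `gap` and `allow_merge` are hypothetical.

```python
def dtw(x, y):
    """Dynamic Time Warping cost between two numeric feature sequences."""
    inf = float("inf")
    n, m = len(x), len(y)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])  # assumed local cost: absolute difference
            d[i][j] = local + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def merge_split_distance(a, b, gap=1.0, allow_merge=True):
    """Edit distance between two words (lists of character feature tuples).

    Besides insertion, deletion and substitution, two adjacent characters of
    one word may be merged (concatenated) and matched against a single
    character of the other word. Seen from the other side, this is a split.
    """
    n, m = len(a), len(b)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(
                D[i - 1][j] + gap,                          # delete a[i-1]
                D[i][j - 1] + gap,                          # insert b[j-1]
                D[i - 1][j - 1] + dtw(a[i - 1], b[j - 1]),  # substitute
            )
            if allow_merge and i >= 2:
                # merge a[i-2] and a[i-1], match the result against b[j-1]
                best = min(best, D[i - 2][j - 1] + dtw(a[i - 2] + a[i - 1], b[j - 1]))
            if allow_merge and j >= 2:
                # merge b[j-2] and b[j-1], match the result against a[i-1]
                best = min(best, D[i - 1][j - 2] + dtw(a[i - 1], b[j - 2] + b[j - 1]))
            D[i][j] = best
    return D[n][m]


# Toy data: the first query character is broken into two pieces in the candidate.
query = [(1.0, 2.0), (5.0,)]
candidate = [(1.0,), (2.0,), (5.0,)]
print(merge_split_distance(query, candidate))                     # 0.0
print(merge_split_distance(query, candidate, allow_merge=False))  # 2.0
```

In this toy case the broken character costs 2.0 under a plain edit distance but nothing once merging is allowed, which is the kind of segmentation inconsistency the abstract says the method addresses. A real system would need a non-zero merge cost to avoid over-merging; how that cost is defined is not stated in the abstract.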

Url:
DOI: 10.1007/978-3-642-03767-2_26

Links to Exploration step

ISTEX:8C1F3989D2466FF4A187343DA0F0E8326A4176F7

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">A Novel Approach for Word Spotting Using Merge-Split Edit Distance</title>
<author>
<name sortKey="Khurshid, Khurram" sort="Khurshid, Khurram" uniqKey="Khurshid K" first="Khurram" last="Khurshid">Khurram Khurshid</name>
<affiliation wicri:level="1">
<mods:affiliation>Laboratoire CRIP5 – SIP, Université Paris Descartes, 45 rue des Saints-Pères, 75006, Paris, France</mods:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>Laboratoire CRIP5 – SIP, Université Paris Descartes, 45 rue des Saints-Pères, 75006, Paris</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<mods:affiliation>E-mail: khurram.khurshid@mi.parisdescartes.fr</mods:affiliation>
<country wicri:rule="url">France</country>
</affiliation>
</author>
<author>
<name sortKey="Faure, Claudie" sort="Faure, Claudie" uniqKey="Faure C" first="Claudie" last="Faure">Claudie Faure</name>
<affiliation wicri:level="1">
<mods:affiliation>UMR CNRS 5141 - GET ENST, 46 rue Barrault, 75634, Paris Cedex 13, France</mods:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>UMR CNRS 5141 - GET ENST, 46 rue Barrault, 75634, Paris Cedex 13</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<mods:affiliation>E-mail: cfaure@enst.fr</mods:affiliation>
<country wicri:rule="url">France</country>
</affiliation>
</author>
<author>
<name sortKey="Vincent, Nicole" sort="Vincent, Nicole" uniqKey="Vincent N" first="Nicole" last="Vincent">Nicole Vincent</name>
<affiliation wicri:level="1">
<mods:affiliation>Laboratoire CRIP5 – SIP, Université Paris Descartes, 45 rue des Saints-Pères, 75006, Paris, France</mods:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>Laboratoire CRIP5 – SIP, Université Paris Descartes, 45 rue des Saints-Pères, 75006, Paris</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<mods:affiliation>E-mail: nicole.vincent@mi.parisdescartes.fr</mods:affiliation>
<country wicri:rule="url">France</country>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:8C1F3989D2466FF4A187343DA0F0E8326A4176F7</idno>
<date when="2009" year="2009">2009</date>
<idno type="doi">10.1007/978-3-642-03767-2_26</idno>
<idno type="url">https://api.istex.fr/document/8C1F3989D2466FF4A187343DA0F0E8326A4176F7/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000D61</idno>
<idno type="wicri:Area/Istex/Curation">000D32</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">A Novel Approach for Word Spotting Using Merge-Split Edit Distance</title>
<author>
<name sortKey="Khurshid, Khurram" sort="Khurshid, Khurram" uniqKey="Khurshid K" first="Khurram" last="Khurshid">Khurram Khurshid</name>
<affiliation wicri:level="1">
<mods:affiliation>Laboratoire CRIP5 – SIP, Université Paris Descartes, 45 rue des Saints-Pères, 75006, Paris, France</mods:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>Laboratoire CRIP5 – SIP, Université Paris Descartes, 45 rue des Saints-Pères, 75006, Paris</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<mods:affiliation>E-mail: khurram.khurshid@mi.parisdescartes.fr</mods:affiliation>
<country wicri:rule="url">France</country>
</affiliation>
</author>
<author>
<name sortKey="Faure, Claudie" sort="Faure, Claudie" uniqKey="Faure C" first="Claudie" last="Faure">Claudie Faure</name>
<affiliation wicri:level="1">
<mods:affiliation>UMR CNRS 5141 - GET ENST, 46 rue Barrault, 75634, Paris Cedex 13, France</mods:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>UMR CNRS 5141 - GET ENST, 46 rue Barrault, 75634, Paris Cedex 13</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<mods:affiliation>E-mail: cfaure@enst.fr</mods:affiliation>
<country wicri:rule="url">France</country>
</affiliation>
</author>
<author>
<name sortKey="Vincent, Nicole" sort="Vincent, Nicole" uniqKey="Vincent N" first="Nicole" last="Vincent">Nicole Vincent</name>
<affiliation wicri:level="1">
<mods:affiliation>Laboratoire CRIP5 – SIP, Université Paris Descartes, 45 rue des Saints-Pères, 75006, Paris, France</mods:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>Laboratoire CRIP5 – SIP, Université Paris Descartes, 45 rue des Saints-Pères, 75006, Paris</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<mods:affiliation>E-mail: nicole.vincent@mi.parisdescartes.fr</mods:affiliation>
<country wicri:rule="url">France</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2009</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
</series>
<idno type="istex">8C1F3989D2466FF4A187343DA0F0E8326A4176F7</idno>
<idno type="DOI">10.1007/978-3-642-03767-2_26</idno>
<idno type="ChapterID">26</idno>
<idno type="ChapterID">Chap26</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Edit distance matching has been used in the literature for word spotting, with characters taken as primitives. The recognition rate, however, is limited by inconsistent character segmentation (broken or merged characters) caused by noisy images or distorted characters. In this paper, we propose a Merge-Split Edit Distance that overcomes these segmentation problems by incorporating a multi-purpose merge cost function. The system extracts the words and characters in the text and then describes each character by a set of features. Characters are matched by comparing their extracted feature sets using Dynamic Time Warping (DTW), while words are matched by comparing their strings of characters using the proposed Merge-Split Edit Distance algorithm. Evaluation of the method on 19th-century historical document images shows extremely promising results.</div>
</front>
</TEI>
</record>

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Istex/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000D32 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Istex/Curation/biblio.hfd -nk 000D32 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Istex
   |étape=   Curation
   |type=    RBID
   |clé=     ISTEX:8C1F3989D2466FF4A187343DA0F0E8326A4176F7
   |texte=   A Novel Approach for Word Spotting Using Merge-Split Edit Distance
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024