OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Keyword Spotting on Korean Document Images by Matching the Keyword Image

Internal identifier: 001320 (Main/Merge); previous: 001319; next: 001321

Authors: Hyung Kim [South Korea]; Cheol Park [South Korea]; Bu Jeong [South Korea]; Soo Kim [South Korea]; Ro Park [South Korea]; Sang Lee [South Korea]

RBID : ISTEX:540DE1824CA5AA632F38A3D06C4870EADF34EE56

Abstract

In this paper, we propose a keyword spotting system for Korean document images and compare it with an OCR-based document retrieval system. The system consists of character segmentation, feature extraction for the query keyword, and word-to-word matching. In the character segmentation step, we propose an effective method for resolving connections between adjacent characters. In the query creation step, the feature vector for the query is constructed by combining the features of its constituent characters. In the matching step, word-to-word matching is performed on the basis of character matching. We demonstrated that the proposed keyword spotting system is more efficient than the OCR-based one at searching for a keyword in Korean document images, especially when the quality of the documents is quite poor.
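
The matching idea in the abstract can be illustrated with a small sketch. The abstract does not give the actual features, distance measure, or decision rule, so everything below is an assumption made for illustration: characters are represented by toy numeric feature tuples, character matching uses Euclidean distance, a word is matched only against candidates with the same number of characters, and `threshold` is a hypothetical tuning parameter. It is not the authors' implementation.

```python
import math

def char_distance(a, b):
    """Euclidean distance between two character feature vectors (assumed metric)."""
    return math.dist(a, b)

def word_distance(query, word):
    """Word-to-word distance built from character-to-character distances.

    Both arguments are sequences of character feature vectors. Words with a
    different number of characters are treated as non-matching (an assumption).
    """
    if len(query) != len(word):
        return math.inf
    return sum(char_distance(q, w) for q, w in zip(query, word)) / len(query)

def spot(query, words, threshold):
    """Return the indices of candidate words whose distance is within threshold."""
    return [i for i, w in enumerate(words) if word_distance(query, w) <= threshold]

# Toy data: a two-character query and four segmented candidate words.
query = [(0.0, 1.0), (1.0, 0.0)]
words = [
    [(0.0, 1.0), (1.0, 0.0)],  # identical to the query
    [(0.1, 0.9), (0.9, 0.1)],  # slightly degraded copy
    [(1.0, 0.0), (0.0, 1.0)],  # different characters
    [(0.0, 1.0)],              # different length
]
hits = spot(query, words, threshold=0.2)
print(hits)
```

Representing the query as a sequence of per-character vectors keeps the "combination of the features for the constituent characters" explicit, while the word-level score is simply the average of the character-level scores.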

DOI: 10.1007/11599517_18

Links to Exploration step

ISTEX:540DE1824CA5AA632F38A3D06C4870EADF34EE56

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Keyword Spotting on Korean Document Images by Matching the Keyword Image</title>
<author>
<name sortKey="Kim, Hyung" sort="Kim, Hyung" uniqKey="Kim H" first="Hyung" last="Kim">Hyung Kim</name>
</author>
<author>
<name sortKey="Park, Cheol" sort="Park, Cheol" uniqKey="Park C" first="Cheol" last="Park">Cheol Park</name>
</author>
<author>
<name sortKey="Jeong, Bu" sort="Jeong, Bu" uniqKey="Jeong B" first="Bu" last="Jeong">Bu Jeong</name>
</author>
<author>
<name sortKey="Kim, Soo" sort="Kim, Soo" uniqKey="Kim S" first="Soo" last="Kim">Soo Kim</name>
</author>
<author>
<name sortKey="Park, Ro" sort="Park, Ro" uniqKey="Park R" first="Ro" last="Park">Ro Park</name>
</author>
<author>
<name sortKey="Lee, Sang" sort="Lee, Sang" uniqKey="Lee S" first="Sang" last="Lee">Sang Lee</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:540DE1824CA5AA632F38A3D06C4870EADF34EE56</idno>
<date when="2005" year="2005">2005</date>
<idno type="doi">10.1007/11599517_18</idno>
<idno type="url">https://api.istex.fr/document/540DE1824CA5AA632F38A3D06C4870EADF34EE56/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000419</idno>
<idno type="wicri:Area/Istex/Curation">000412</idno>
<idno type="wicri:Area/Istex/Checkpoint">000B93</idno>
<idno type="wicri:doubleKey">0302-9743:2005:Kim H:keyword:spotting:on</idno>
<idno type="wicri:Area/Main/Merge">001320</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Keyword Spotting on Korean Document Images by Matching the Keyword Image</title>
<author>
<name sortKey="Kim, Hyung" sort="Kim, Hyung" uniqKey="Kim H" first="Hyung" last="Kim">Hyung Kim</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Corée du Sud</country>
<wicri:regionArea>Department of Computer Science, Chonnam National University, 300 Yongbong-dong, Buk-gu, 500-700, Kwangju</wicri:regionArea>
<wicri:noRegion>Kwangju</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Corée du Sud</country>
</affiliation>
</author>
<author>
<name sortKey="Park, Cheol" sort="Park, Cheol" uniqKey="Park C" first="Cheol" last="Park">Cheol Park</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Corée du Sud</country>
<wicri:regionArea>Department of Computer Science, Chonnam National University, 300 Yongbong-dong, Buk-gu, 500-700, Kwangju</wicri:regionArea>
<wicri:noRegion>Kwangju</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Corée du Sud</country>
</affiliation>
</author>
<author>
<name sortKey="Jeong, Bu" sort="Jeong, Bu" uniqKey="Jeong B" first="Bu" last="Jeong">Bu Jeong</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Corée du Sud</country>
<wicri:regionArea>Department of Internet Software, Honam University, 59-1 Sebong-dong, Gwangsan-gu, 506-714, Kwangju</wicri:regionArea>
<wicri:noRegion>Kwangju</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Corée du Sud</country>
</affiliation>
</author>
<author>
<name sortKey="Kim, Soo" sort="Kim, Soo" uniqKey="Kim S" first="Soo" last="Kim">Soo Kim</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Corée du Sud</country>
<wicri:regionArea>Department of Computer Science, Chonnam National University, 300 Yongbong-dong, Buk-gu, 500-700, Kwangju</wicri:regionArea>
<wicri:noRegion>Kwangju</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Corée du Sud</country>
</affiliation>
</author>
<author>
<name sortKey="Park, Ro" sort="Park, Ro" uniqKey="Park R" first="Ro" last="Park">Ro Park</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Corée du Sud</country>
<wicri:regionArea>Department of Computer Science, Chonnam National University, 300 Yongbong-dong, Buk-gu, 500-700, Kwangju</wicri:regionArea>
<wicri:noRegion>Kwangju</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Corée du Sud</country>
</affiliation>
</author>
<author>
<name sortKey="Lee, Sang" sort="Lee, Sang" uniqKey="Lee S" first="Sang" last="Lee">Sang Lee</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Corée du Sud</country>
<wicri:regionArea>Department of Computer Science, Chonnam National University, 300 Yongbong-dong, Buk-gu, 500-700, Kwangju</wicri:regionArea>
<wicri:noRegion>Kwangju</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Corée du Sud</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2005</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">540DE1824CA5AA632F38A3D06C4870EADF34EE56</idno>
<idno type="DOI">10.1007/11599517_18</idno>
<idno type="ChapterID">18</idno>
<idno type="ChapterID">Chap18</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: In this paper, we propose a keyword spotting system for Korean document images and compare the proposed system with an OCR-based document retrieval system. The system is composed of character segmentation, feature extraction for the query keyword, and word-to-word matching. In the character segmentation step, we propose an effective method to resolve the connection between adjacent characters. In the query creation step, feature vector for the query is constructed by a combination of the features for the constituent characters. In the matching step, word-to-word matching is applied based on a character matching. We demonstrated that the proposed keyword spotting system is more efficient than the OCR-based one to search a keyword on Korean document images, especially when the quality of documents is quite poor.</div>
</front>
</TEI>
</record>
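
As a minimal sketch, the record above can be read with Python's standard `xml.etree.ElementTree` module. The record as displayed uses the `wicri:` prefix without declaring it, which a namespace-aware parser rejects, so the sketch wraps the fragment in a helper element that binds the prefix. The URI `urn:example:wicri`, the `wrapper` element, and the function names are assumptions made for this example, and `SAMPLE` is a shortened copy of the record rather than the full text.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder URI: the record does not declare the wicri prefix,
# so some binding has to be supplied before the fragment can be parsed.
WICRI_NS = "urn:example:wicri"

def parse_record(record_xml: str) -> ET.Element:
    """Parse a <record> fragment whose wicri: prefix is undeclared."""
    wrapped = f'<wrapper xmlns:wicri="{WICRI_NS}">{record_xml}</wrapper>'
    return ET.fromstring(wrapped).find("record")

def summarize(record: ET.Element) -> dict:
    """Collect the title, DOI and author names from the TEI header."""
    title_stmt = record.find("./TEI/teiHeader/fileDesc/titleStmt")
    pub_stmt = record.find("./TEI/teiHeader/fileDesc/publicationStmt")
    return {
        "title": title_stmt.findtext("title"),
        "doi": pub_stmt.findtext("idno[@type='doi']"),
        "authors": [name.text for name in title_stmt.findall("author/name")],
    }

# Shortened copy of the record shown above (two of the six authors).
SAMPLE = """<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader><fileDesc>
<titleStmt>
<title xml:lang="en">Keyword Spotting on Korean Document Images by Matching the Keyword Image</title>
<author><name sortKey="Kim, Hyung">Hyung Kim</name></author>
<author><name sortKey="Park, Cheol">Cheol Park</name></author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="doi">10.1007/11599517_18</idno>
</publicationStmt>
</fileDesc></teiHeader>
</TEI>
</record>"""

info = summarize(parse_record(SAMPLE))
print(info["title"])
print(info["doi"], info["authors"])
```

The wrapper approach leaves the record text itself untouched; it would not work for input that starts with an XML declaration, which is not the case for the fragment shown here.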

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001320 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 001320 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Merge
   |type=    RBID
   |clé=     ISTEX:540DE1824CA5AA632F38A3D06C4870EADF34EE56
   |texte=   Keyword Spotting on Korean Document Images by Matching the Keyword Image
}}
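
For pages that need many such links, the template call above could be generated rather than typed by hand. The following is a minimal sketch: the template name and parameter names are copied from the example above, while the function name `explor_lien` and the column alignment are assumptions of this sketch. It only formats text and does not check which parameters the template actually accepts.

```python
def explor_lien(params: dict) -> str:
    """Format an {{Explor lien}} template call, one parameter per line."""
    lines = ["{{Explor lien"]
    for key, value in params.items():
        label = f"|{key}="
        # Pad the label to 10 columns so the values line up as in the example.
        lines.append(f"   {label:<10}{value}")
    lines.append("}}")
    return "\n".join(lines)

link = explor_lien({
    "wiki": "Ticri/CIDE",
    "area": "OcrV1",
    "flux": "Main",
    "étape": "Merge",
    "type": "RBID",
    "clé": "ISTEX:540DE1824CA5AA632F38A3D06C4870EADF34EE56",
    "texte": "Keyword Spotting on Korean Document Images by Matching the Keyword Image",
})
print(link)
```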

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024