Keyword Spotting on Korean Document Images by Matching the Keyword Image
Internal identifier: 001320 (Main/Merge); previous: 001319; next: 001321
Authors: Hyung Kim [South Korea]; Cheol Park [South Korea]; Bu Jeong [South Korea]; Soo Kim [South Korea]; Ro Park [South Korea]; Sang Lee [South Korea]
Source:
- Lecture Notes in Computer Science [0302-9743]; 2005.
Abstract
In this paper, we propose a keyword spotting system for Korean document images and compare it with an OCR-based document retrieval system. The system consists of character segmentation, feature extraction for the query keyword, and word-to-word matching. In the character segmentation step, we propose an effective method to resolve connections between adjacent characters. In the query creation step, the feature vector for the query is constructed by combining the features of its constituent characters. In the matching step, word-to-word matching is performed on the basis of character matching. We demonstrate that the proposed keyword spotting system is more efficient than the OCR-based one at searching for a keyword in Korean document images, especially when the document quality is poor.
DOI: 10.1007/11599517_18
Links to previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 000419
- to stream Istex, to step Curation: 000412
- to stream Istex, to step Checkpoint: 000B93
Links to Exploration step
ISTEX:540DE1824CA5AA632F38A3D06C4870EADF34EE56
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Keyword Spotting on Korean Document Images by Matching the Keyword Image</title>
<author><name sortKey="Kim, Hyung" sort="Kim, Hyung" uniqKey="Kim H" first="Hyung" last="Kim">Hyung Kim</name>
</author>
<author><name sortKey="Park, Cheol" sort="Park, Cheol" uniqKey="Park C" first="Cheol" last="Park">Cheol Park</name>
</author>
<author><name sortKey="Jeong, Bu" sort="Jeong, Bu" uniqKey="Jeong B" first="Bu" last="Jeong">Bu Jeong</name>
</author>
<author><name sortKey="Kim, Soo" sort="Kim, Soo" uniqKey="Kim S" first="Soo" last="Kim">Soo Kim</name>
</author>
<author><name sortKey="Park, Ro" sort="Park, Ro" uniqKey="Park R" first="Ro" last="Park">Ro Park</name>
</author>
<author><name sortKey="Lee, Sang" sort="Lee, Sang" uniqKey="Lee S" first="Sang" last="Lee">Sang Lee</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:540DE1824CA5AA632F38A3D06C4870EADF34EE56</idno>
<date when="2005" year="2005">2005</date>
<idno type="doi">10.1007/11599517_18</idno>
<idno type="url">https://api.istex.fr/document/540DE1824CA5AA632F38A3D06C4870EADF34EE56/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000419</idno>
<idno type="wicri:Area/Istex/Curation">000412</idno>
<idno type="wicri:Area/Istex/Checkpoint">000B93</idno>
<idno type="wicri:doubleKey">0302-9743:2005:Kim H:keyword:spotting:on</idno>
<idno type="wicri:Area/Main/Merge">001320</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Keyword Spotting on Korean Document Images by Matching the Keyword Image</title>
<author><name sortKey="Kim, Hyung" sort="Kim, Hyung" uniqKey="Kim H" first="Hyung" last="Kim">Hyung Kim</name>
<affiliation wicri:level="1"><country xml:lang="fr">Corée du Sud</country>
<wicri:regionArea>Department of Computer Science, Chonnam National University, 300 Yongbong-dong, Buk-gu, 500-700, Kwangju</wicri:regionArea>
<wicri:noRegion>Kwangju</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Corée du Sud</country>
</affiliation>
</author>
<author><name sortKey="Park, Cheol" sort="Park, Cheol" uniqKey="Park C" first="Cheol" last="Park">Cheol Park</name>
<affiliation wicri:level="1"><country xml:lang="fr">Corée du Sud</country>
<wicri:regionArea>Department of Computer Science, Chonnam National University, 300 Yongbong-dong, Buk-gu, 500-700, Kwangju</wicri:regionArea>
<wicri:noRegion>Kwangju</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Corée du Sud</country>
</affiliation>
</author>
<author><name sortKey="Jeong, Bu" sort="Jeong, Bu" uniqKey="Jeong B" first="Bu" last="Jeong">Bu Jeong</name>
<affiliation wicri:level="1"><country xml:lang="fr">Corée du Sud</country>
<wicri:regionArea>Department of Internet Software, Honam University, 59-1 Sebong-dong, Gwangsan-gu, 506-714, Kwangju</wicri:regionArea>
<wicri:noRegion>Kwangju</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Corée du Sud</country>
</affiliation>
</author>
<author><name sortKey="Kim, Soo" sort="Kim, Soo" uniqKey="Kim S" first="Soo" last="Kim">Soo Kim</name>
<affiliation wicri:level="1"><country xml:lang="fr">Corée du Sud</country>
<wicri:regionArea>Department of Computer Science, Chonnam National University, 300 Yongbong-dong, Buk-gu, 500-700, Kwangju</wicri:regionArea>
<wicri:noRegion>Kwangju</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Corée du Sud</country>
</affiliation>
</author>
<author><name sortKey="Park, Ro" sort="Park, Ro" uniqKey="Park R" first="Ro" last="Park">Ro Park</name>
<affiliation wicri:level="1"><country xml:lang="fr">Corée du Sud</country>
<wicri:regionArea>Department of Computer Science, Chonnam National University, 300 Yongbong-dong, Buk-gu, 500-700, Kwangju</wicri:regionArea>
<wicri:noRegion>Kwangju</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Corée du Sud</country>
</affiliation>
</author>
<author><name sortKey="Lee, Sang" sort="Lee, Sang" uniqKey="Lee S" first="Sang" last="Lee">Sang Lee</name>
<affiliation wicri:level="1"><country xml:lang="fr">Corée du Sud</country>
<wicri:regionArea>Department of Computer Science, Chonnam National University, 300 Yongbong-dong, Buk-gu, 500-700, Kwangju</wicri:regionArea>
<wicri:noRegion>Kwangju</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Corée du Sud</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2005</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">540DE1824CA5AA632F38A3D06C4870EADF34EE56</idno>
<idno type="DOI">10.1007/11599517_18</idno>
<idno type="ChapterID">18</idno>
<idno type="ChapterID">Chap18</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: In this paper, we propose a keyword spotting system for Korean document images and compare it with an OCR-based document retrieval system. The system consists of character segmentation, feature extraction for the query keyword, and word-to-word matching. In the character segmentation step, we propose an effective method to resolve connections between adjacent characters. In the query creation step, the feature vector for the query is constructed by combining the features of its constituent characters. In the matching step, word-to-word matching is performed on the basis of character matching. We demonstrate that the proposed keyword spotting system is more efficient than the OCR-based one at searching for a keyword in Korean document images, especially when the document quality is poor.</div>
</front>
</TEI>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001320 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 001320 | SxmlIndent | more
To add a link to this page in the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Merge |type= RBID |clé= ISTEX:540DE1824CA5AA632F38A3D06C4870EADF34EE56 |texte= Keyword Spotting on Korean Document Images by Matching the Keyword Image }}
This area was generated with Dilib version V0.6.32.