OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

An Approach for Adding Noise-Tolerance to Restricted-Domain Information Retrieval

Internal identifier: 000153 (Istex/Curation); previous: 000152; next: 000154

Authors: Katia Vila [Cuba, Spain]; Josval Díaz [Cuba]; Antonio Fernández [Cuba]; Antonio Ferrández [Spain]

Source:

RBID : ISTEX:861EAEEFE366F3A4F5DD834D8DE62A6F533D21F2

Abstract

The corpora of Information Retrieval (IR) systems consist of text documents that often come from rather heterogeneous sources, such as Web sites or OCR (Optical Character Recognition) systems. Faithfully converting these sources into flat text files is not a trivial task, since noise is easily introduced through spelling or typesetting errors. Importantly, if the corpus is large enough, redundancy helps to control the effects of noise, because the same text often appears both with and without noise throughout the corpus. Conversely, noise becomes a serious problem in restricted-domain IR, where the corpus is usually small and has little or no redundancy. Noise therefore hinders the retrieval task in restricted domains, and erroneous results are likely. To overcome this situation, this paper presents an approach that uses restricted-domain resources, such as Knowledge Organization Systems (KOS), to add noise tolerance to existing IR systems. To show the suitability of our approach in a real restricted-domain case study, a set of experiments was carried out for the agricultural domain.

Url:
DOI: 10.1007/978-3-642-13881-2_1

Links to Exploration step

ISTEX:861EAEEFE366F3A4F5DD834D8DE62A6F533D21F2

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">An Approach for Adding Noise-Tolerance to Restricted-Domain Information Retrieval</title>
<author>
<name sortKey="Vila, Katia" sort="Vila, Katia" uniqKey="Vila K" first="Katia" last="Vila">Katia Vila</name>
<affiliation wicri:level="1">
<mods:affiliation>Department of Informatics, University of Matanzas, Varadero Road, 40100, Matanzas, Cuba</mods:affiliation>
<country xml:lang="fr">Cuba</country>
<wicri:regionArea>Department of Informatics, University of Matanzas, Varadero Road, 40100, Matanzas</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<mods:affiliation>E-mail: kvila@dlsi.ua.es</mods:affiliation>
<country wicri:rule="url">Espagne</country>
</affiliation>
</author>
<author>
<name sortKey="Diaz, Josval" sort="Diaz, Josval" uniqKey="Diaz J" first="Josval" last="Díaz">Josval Díaz</name>
<affiliation wicri:level="1">
<mods:affiliation>Department of Informatics, University of Matanzas, Varadero Road, 40100, Matanzas, Cuba</mods:affiliation>
<country xml:lang="fr">Cuba</country>
<wicri:regionArea>Department of Informatics, University of Matanzas, Varadero Road, 40100, Matanzas</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<mods:affiliation>E-mail: josval.diaz@umcc.cu</mods:affiliation>
<country wicri:rule="url">Cuba</country>
</affiliation>
</author>
<author>
<name sortKey="Fernandez, Antonio" sort="Fernandez, Antonio" uniqKey="Fernandez A" first="Antonio" last="Fernández">Antonio Fernández</name>
<affiliation wicri:level="1">
<mods:affiliation>Department of Informatics, University of Matanzas, Varadero Road, 40100, Matanzas, Cuba</mods:affiliation>
<country xml:lang="fr">Cuba</country>
<wicri:regionArea>Department of Informatics, University of Matanzas, Varadero Road, 40100, Matanzas</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<mods:affiliation>E-mail: antonio.fernandez@umcc.cu</mods:affiliation>
<country wicri:rule="url">Cuba</country>
</affiliation>
</author>
<author>
<name sortKey="Ferrandez, Antonio" sort="Ferrandez, Antonio" uniqKey="Ferrandez A" first="Antonio" last="Ferrández">Antonio Ferrández</name>
<affiliation wicri:level="1">
<mods:affiliation>Department of Software and Computing Systems, University of Alicante, San Vicente del Raspeig Road, 03690, Alicante, Spain</mods:affiliation>
<country xml:lang="fr">Espagne</country>
<wicri:regionArea>Department of Software and Computing Systems, University of Alicante, San Vicente del Raspeig Road, 03690, Alicante</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<mods:affiliation>E-mail: antonio@dlsi.ua.es</mods:affiliation>
<country wicri:rule="url">Espagne</country>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:861EAEEFE366F3A4F5DD834D8DE62A6F533D21F2</idno>
<date when="2010" year="2010">2010</date>
<idno type="doi">10.1007/978-3-642-13881-2_1</idno>
<idno type="url">https://api.istex.fr/document/861EAEEFE366F3A4F5DD834D8DE62A6F533D21F2/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000155</idno>
<idno type="wicri:Area/Istex/Curation">000153</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">An Approach for Adding Noise-Tolerance to Restricted-Domain Information Retrieval</title>
<author>
<name sortKey="Vila, Katia" sort="Vila, Katia" uniqKey="Vila K" first="Katia" last="Vila">Katia Vila</name>
<affiliation wicri:level="1">
<mods:affiliation>Department of Informatics, University of Matanzas, Varadero Road, 40100, Matanzas, Cuba</mods:affiliation>
<country xml:lang="fr">Cuba</country>
<wicri:regionArea>Department of Informatics, University of Matanzas, Varadero Road, 40100, Matanzas</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<mods:affiliation>E-mail: kvila@dlsi.ua.es</mods:affiliation>
<country wicri:rule="url">Espagne</country>
</affiliation>
</author>
<author>
<name sortKey="Diaz, Josval" sort="Diaz, Josval" uniqKey="Diaz J" first="Josval" last="Díaz">Josval Díaz</name>
<affiliation wicri:level="1">
<mods:affiliation>Department of Informatics, University of Matanzas, Varadero Road, 40100, Matanzas, Cuba</mods:affiliation>
<country xml:lang="fr">Cuba</country>
<wicri:regionArea>Department of Informatics, University of Matanzas, Varadero Road, 40100, Matanzas</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<mods:affiliation>E-mail: josval.diaz@umcc.cu</mods:affiliation>
<country wicri:rule="url">Cuba</country>
</affiliation>
</author>
<author>
<name sortKey="Fernandez, Antonio" sort="Fernandez, Antonio" uniqKey="Fernandez A" first="Antonio" last="Fernández">Antonio Fernández</name>
<affiliation wicri:level="1">
<mods:affiliation>Department of Informatics, University of Matanzas, Varadero Road, 40100, Matanzas, Cuba</mods:affiliation>
<country xml:lang="fr">Cuba</country>
<wicri:regionArea>Department of Informatics, University of Matanzas, Varadero Road, 40100, Matanzas</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<mods:affiliation>E-mail: antonio.fernandez@umcc.cu</mods:affiliation>
<country wicri:rule="url">Cuba</country>
</affiliation>
</author>
<author>
<name sortKey="Ferrandez, Antonio" sort="Ferrandez, Antonio" uniqKey="Ferrandez A" first="Antonio" last="Ferrández">Antonio Ferrández</name>
<affiliation wicri:level="1">
<mods:affiliation>Department of Software and Computing Systems, University of Alicante, San Vicente del Raspeig Road, 03690, Alicante, Spain</mods:affiliation>
<country xml:lang="fr">Espagne</country>
<wicri:regionArea>Department of Software and Computing Systems, University of Alicante, San Vicente del Raspeig Road, 03690, Alicante</wicri:regionArea>
</affiliation>
<affiliation wicri:level="1">
<mods:affiliation>E-mail: antonio@dlsi.ua.es</mods:affiliation>
<country wicri:rule="url">Espagne</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2010</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
</series>
<idno type="istex">861EAEEFE366F3A4F5DD834D8DE62A6F533D21F2</idno>
<idno type="DOI">10.1007/978-3-642-13881-2_1</idno>
<idno type="ChapterID">1</idno>
<idno type="ChapterID">Chap1</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: Corpus of Information Retrieval (IR) systems are formed by text documents that often come from rather heterogeneous sources, such as Web sites or OCR (Optical Character Recognition) systems. Faithfully converting these sources into flat text files is not a trivial task, since noise can be easily introduced due to spelling or typeset errors. Importantly, if the size of the corpus is large enough, then redundancy helps in controlling the effects of noise because the same text often appears with and without noise throughout the corpus. Conversely, noise becomes a serious problem in restricted-domain IR where corpus is usually small and it has little or no redundancy. Therefore, noise hinders the retrieval task in restricted domains and erroneous results are likely to be obtained. In order to overcome this situation, this paper presents an approach for using restricted-domain resources, such as Knowledge Organization Systems (KOS), to add noise-tolerance to existing IR systems. To show the suitability of our approach in one real restricted-domain case study, a set of experiments has been carried out for the agricultural domain.</div>
</front>
</TEI>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Istex/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000153 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Istex/Curation/biblio.hfd -nk 000153 | SxmlIndent | more
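
Where Dilib is not installed, the record shown above can presumably also be read with a general-purpose XML library. The following Python sketch is only an illustration and is not part of the Wicri or Dilib tooling: the helper names `parse_record` and `summarize`, the `wrapper` element and the `urn:example:...` namespace URIs are all assumptions. The placeholder URIs are needed because the record uses the `wicri:` and `mods:` prefixes without declaring them, which a namespace-aware parser would otherwise reject. `SAMPLE` is a trimmed copy of the record.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumptions): the record uses the wicri: and
# mods: prefixes without declaring them, so they are bound here only to let
# a namespace-aware parser accept the text.
NS = {
    "wicri": "urn:example:wicri",
    "mods": "urn:example:mods",
}

# Trimmed copy of the record shown on this page (one author only).
SAMPLE = """<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">An Approach for Adding Noise-Tolerance to Restricted-Domain Information Retrieval</title>
<author>
<name sortKey="Vila, Katia" first="Katia" last="Vila">Katia Vila</name>
<affiliation wicri:level="1">
<mods:affiliation>E-mail: kvila@dlsi.ua.es</mods:affiliation>
<country wicri:rule="url">Espagne</country>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="doi">10.1007/978-3-642-13881-2_1</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""


def parse_record(record_xml: str) -> ET.Element:
    """Parse a <record> fragment after binding its undeclared prefixes."""
    decls = " ".join(f'xmlns:{prefix}="{uri}"' for prefix, uri in NS.items())
    # Hypothetical wrapper element that carries the namespace declarations.
    wrapped = f"<wrapper {decls}>{record_xml}</wrapper>"
    return ET.fromstring(wrapped).find("record")


def summarize(record: ET.Element) -> dict:
    """Collect the title, author names and DOI from the TEI header."""
    file_desc = record.find("TEI/teiHeader/fileDesc")
    title_stmt = file_desc.find("titleStmt")
    return {
        "title": title_stmt.findtext("title"),
        "authors": [name.text for name in title_stmt.findall("author/name")],
        "doi": file_desc.findtext("publicationStmt/idno[@type='doi']"),
    }


if __name__ == "__main__":
    print(summarize(parse_record(SAMPLE)))
```

Applied to the full record, `summarize` should list all four authors in document order; the paths would need adjusting to read the `analytic` block or the series identifiers instead.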

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Istex
   |étape=   Curation
   |type=    RBID
   |clé=     ISTEX:861EAEEFE366F3A4F5DD834D8DE62A6F533D21F2
   |texte=   An Approach for Adding Noise-Tolerance to Restricted-Domain Information Retrieval
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024