Do Thesauri enhance rule-based categorization for OCR text?
Internal identifier: 001877 (Main/Merge); previous: 001876; next: 001878
Authors: Kazem Taghva [United States]; Jeffrey Coombs [United States]
Source:
- SPIE proceedings series [ 1017-2653 ] ; 2003.
French descriptors
- Pascal (Inist)
English descriptors
- KwdEn :
Abstract
A rule-based automatic text categorizer was tested to see if two types of thesaurus expansion, called query expansion and Junker expansion respectively, would improve categorization. Thesauri used were domain-specific to an OCR (Optical Character Recognition) test collection focussed on a single topic. Results show that neither type of expansion significantly improved categorization.
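As a rough illustration of the general idea of thesaurus expansion in a rule-based categorizer, the sketch below widens each rule's trigger terms with related terms before matching. It is only a minimal sketch under stated assumptions: the thesaurus, the rules, and the function names `expand_terms` and `categorize` are hypothetical, and it reproduces neither the categorizer tested in the paper nor the Junker expansion variant.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy thesaurus: each term maps to a set of related terms.
THESAURUS: Dict[str, Set[str]] = {
    "car": {"automobile", "vehicle"},
    "engine": {"motor"},
}

# Hypothetical rule base: a category fires if any trigger term occurs in the text.
RULES: Dict[str, Set[str]] = {
    "automotive": {"car", "engine"},
    "aviation": {"aircraft"},
}


def expand_terms(terms: Set[str], thesaurus: Dict[str, Set[str]]) -> Set[str]:
    """Return the original terms plus every related term listed in the thesaurus."""
    expanded = set(terms)
    for term in terms:
        expanded |= thesaurus.get(term, set())
    return expanded


def categorize(
    text: str,
    rules: Dict[str, Set[str]],
    thesaurus: Optional[Dict[str, Set[str]]] = None,
) -> List[str]:
    """Assign every category whose (optionally expanded) trigger terms occur in the text."""
    tokens = set(text.lower().split())
    categories = []
    for category, terms in sorted(rules.items()):
        triggers = expand_terms(terms, thesaurus) if thesaurus else terms
        if tokens & triggers:
            categories.append(category)
    return categories


if __name__ == "__main__":
    sample = "the automobile has a new motor"
    print(categorize(sample, RULES))             # matching without expansion
    print(categorize(sample, RULES, THESAURUS))  # matching with thesaurus expansion
```

Comparing the two calls on the same text shows where expansion changes the assigned categories. The abstract's finding is that, on the OCR collection tested, such changes did not significantly improve categorization.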
Links toward previous steps (curation, corpus...)
- to stream PascalFrancis, to step Corpus: 000597
- to stream PascalFrancis, to step Curation: 000194
- to stream PascalFrancis, to step Checkpoint: 000563
Links to Exploration step
Pascal:03-0421336
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Do Thesauri enhance rule-based categorization for OCR text?</title>
<author><name sortKey="Taghva, Kazem" sort="Taghva, Kazem" uniqKey="Taghva K" first="Kazem" last="Taghva">Kazem Taghva</name>
<affiliation wicri:level="2"><inist:fA14 i1="01"><s1>Information Science Research Institute, University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName><region type="state">Nevada</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Coombs, Jeffrey" sort="Coombs, Jeffrey" uniqKey="Coombs J" first="Jeffrey" last="Coombs">Jeffrey Coombs</name>
<affiliation wicri:level="2"><inist:fA14 i1="01"><s1>Information Science Research Institute, University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName><region type="state">Nevada</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">03-0421336</idno>
<date when="2003">2003</date>
<idno type="stanalyst">PASCAL 03-0421336 INIST</idno>
<idno type="RBID">Pascal:03-0421336</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000597</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000194</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000563</idno>
<idno type="wicri:doubleKey">1017-2653:2003:Taghva K:do:thesauri:enhance</idno>
<idno type="wicri:Area/Main/Merge">001877</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Do Thesauri enhance rule-based categorization for OCR text?</title>
<author><name sortKey="Taghva, Kazem" sort="Taghva, Kazem" uniqKey="Taghva K" first="Kazem" last="Taghva">Kazem Taghva</name>
<affiliation wicri:level="2"><inist:fA14 i1="01"><s1>Information Science Research Institute, University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName><region type="state">Nevada</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Coombs, Jeffrey" sort="Coombs, Jeffrey" uniqKey="Coombs J" first="Jeffrey" last="Coombs">Jeffrey Coombs</name>
<affiliation wicri:level="2"><inist:fA14 i1="01"><s1>Information Science Research Institute, University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName><region type="state">Nevada</region>
</placeName>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">SPIE proceedings series</title>
<idno type="ISSN">1017-2653</idno>
<imprint><date when="2003">2003</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">SPIE proceedings series</title>
<idno type="ISSN">1017-2653</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Automatic classification</term>
<term>Categorization</term>
<term>Improvement</term>
<term>Optical character recognition</term>
<term>Performance evaluation</term>
<term>Query expansion</term>
<term>Thesaurus</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Reconnaissance optique caractère</term>
<term>Catégorisation</term>
<term>Classification automatique</term>
<term>Thesaurus</term>
<term>Amélioration</term>
<term>Evaluation performance</term>
<term>Règle</term>
<term>Junker (M.)</term>
<term>C-KANT (Clips Knowledge Acquisition eNgine for Text categorization)</term>
<term>Elargissement question</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">A rule-based automatic text categorizer was tested to see if two types of thesaurus expansion, called query expansion and Junker expansion respectively, would improve categorization. Thesauri used were domain-specific to an OCR (Optical Character Recognition) test collection focussed on a single topic. Results show that neither type of expansion significantly improved categorization.</div>
</front>
</TEI>
<affiliations><list><country><li>États-Unis</li>
</country>
<region><li>Nevada</li>
</region>
</list>
<tree><country name="États-Unis"><region name="Nevada"><name sortKey="Taghva, Kazem" sort="Taghva, Kazem" uniqKey="Taghva K" first="Kazem" last="Taghva">Kazem Taghva</name>
</region>
<name sortKey="Coombs, Jeffrey" sort="Coombs, Jeffrey" uniqKey="Coombs J" first="Jeffrey" last="Coombs">Jeffrey Coombs</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001877 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 001877 | SxmlIndent | more
To add a link to this page in the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Merge |type= RBID |clé= Pascal:03-0421336 |texte= Do Thesauri enhance rule-based categorization for OCR text? }}
This area was generated with Dilib version V0.6.32.