OCR exploration server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Using Consensus Sequence Voting to Correct OCR Errors

Internal identifier: 002594 (Main/Merge); previous: 002593; next: 002595

Authors: Daniel Lopresti; Jiangying Zhou

Source: Computer Vision and Image Understanding, vol. 67, no. 1, pp. 39-47

RBID : ISTEX:15AD7C1ECD56158935609D33695A742405F45145

Abstract

We present experimental results suggesting that between 20 and 50% of the errors caused by a single OCR package can be eliminated by simply scanning a page three times and running a “consensus sequence” voting procedure. This technique, which originates from molecular biology, takes exponential time in general, but can be specialized to a fast heuristic guaranteed to be optimal for the cases of interest. The improvement in recognition accuracy is achieved without making a priori assumptions about the distribution of OCR errors (i.e., no “training” is required).
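
To make the voting idea concrete, here is a minimal Python sketch. It is not the authors' procedure, which the abstract does not spell out: it swaps in a simplified pivot-based majority vote, where the first scan is taken as a reference, every other scan is aligned to it by edit distance, and each aligned position is decided by strict majority. The function names (`align_to_pivot`, `winner`, `consensus`) and the sample strings are hypothetical, chosen for illustration only.

```python
from collections import Counter

GAP = ""  # marks a pivot character with no counterpart in the other scan


def align_to_pivot(pivot, other):
    """Align `other` to `pivot` by edit distance (assumed unit costs).

    Returns (matched, inserted):
      matched[i]  -- character of `other` aligned with pivot[i], or GAP
      inserted[i] -- characters of `other` falling just before pivot[i]
                     (inserted[len(pivot)] holds trailing insertions)
    """
    n, m = len(pivot), len(other)
    # cost[i][j] = edit distance between pivot[:i] and other[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (pivot[i - 1] != other[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    matched = [GAP] * n
    inserted = [""] * (n + 1)
    i, j = n, m
    while i > 0 or j > 0:  # trace one optimal alignment back to the origin
        if (i > 0 and j > 0 and
                cost[i][j] == cost[i - 1][j - 1] + (pivot[i - 1] != other[j - 1])):
            matched[i - 1] = other[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            i -= 1  # pivot character missing from `other`
        else:
            inserted[i] = other[j - 1] + inserted[i]
            j -= 1
    return matched, inserted


def winner(votes, fallback):
    """Return the strict-majority value, else the pivot's own value."""
    value, count = votes.most_common(1)[0]
    return value if 2 * count > sum(votes.values()) else fallback


def consensus(scans):
    """Majority-vote several OCR outputs of the same text."""
    pivot, *others = scans
    aligned = [align_to_pivot(pivot, other) for other in others]
    out = []
    for i in range(len(pivot) + 1):
        # Vote on text inserted before pivot[i]; the pivot votes "nothing".
        slot = Counter([""] + [ins[i] for _, ins in aligned])
        out.append(winner(slot, ""))
        if i < len(pivot):
            # Vote on the character itself (GAP means "delete it").
            votes = Counter([pivot[i]] + [mat[i] for mat, _ in aligned])
            out.append(winner(votes, pivot[i]))
    return "".join(out)


scans = ["consensvs voting", "consensus votlng", "conensus voting"]
print(consensus(scans))  # consensus voting
```

Each of the three sample scans carries a different error (a substitution in two of them, a dropped character in the third), so every position still has two correct votes. Errors repeated identically in two scans would win the vote, which is why such a scheme can only remove part of the errors.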

Url:
DOI: 10.1006/cviu.1996.0502


The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Using Consensus Sequence Voting to Correct OCR Errors</title>
<author>
<name sortKey="Lopresti, Daniel" sort="Lopresti, Daniel" uniqKey="Lopresti D" first="Daniel" last="Lopresti">Daniel Lopresti</name>
</author>
<author>
<name sortKey="Zhou, Jiangying" sort="Zhou, Jiangying" uniqKey="Zhou J" first="Jiangying" last="Zhou">Jiangying Zhou</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:15AD7C1ECD56158935609D33695A742405F45145</idno>
<date when="1997" year="1997">1997</date>
<idno type="doi">10.1006/cviu.1996.0502</idno>
<idno type="url">https://api.istex.fr/document/15AD7C1ECD56158935609D33695A742405F45145/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000036</idno>
<idno type="wicri:Area/Istex/Curation">000036</idno>
<idno type="wicri:Area/Istex/Checkpoint">001915</idno>
<idno type="wicri:doubleKey">1077-3142:1997:Lopresti D:using:consensus:sequence</idno>
<idno type="wicri:Area/Main/Merge">002594</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Using Consensus Sequence Voting to Correct OCR Errors</title>
<author>
<name sortKey="Lopresti, Daniel" sort="Lopresti, Daniel" uniqKey="Lopresti D" first="Daniel" last="Lopresti">Daniel Lopresti</name>
<affiliation>
<wicri:noCountry code="subField">08540</wicri:noCountry>
</affiliation>
</author>
<author>
<name sortKey="Zhou, Jiangying" sort="Zhou, Jiangying" uniqKey="Zhou J" first="Jiangying" last="Zhou">Jiangying Zhou</name>
<affiliation>
<wicri:noCountry code="subField">08540</wicri:noCountry>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">Computer Vision and Image Understanding</title>
<title level="j" type="abbrev">YCVIU</title>
<idno type="ISSN">1077-3142</idno>
<imprint>
<publisher>ELSEVIER</publisher>
<date type="published" when="1996">1996</date>
<biblScope unit="volume">67</biblScope>
<biblScope unit="issue">1</biblScope>
<biblScope unit="page" from="39">39</biblScope>
<biblScope unit="page" to="47">47</biblScope>
</imprint>
<idno type="ISSN">1077-3142</idno>
</series>
<idno type="istex">15AD7C1ECD56158935609D33695A742405F45145</idno>
<idno type="DOI">10.1006/cviu.1996.0502</idno>
<idno type="PII">S1077-3142(96)90502-0</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">1077-3142</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">We present experimental results suggesting that between 20 and 50% of the errors caused by a single OCR package can be eliminated by simply scanning a page three times and running a “consensus sequence” voting procedure. This technique, which originates from molecular biology, takes exponential time in general, but can be specialized to a fast heuristic guaranteed to be optimal for the cases of interest. The improvement in recognition accuracy is achieved without making a priori assumptions about the distribution of OCR errors (i.e., no “training” is required).</div>
</front>
</TEI>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002594 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 002594 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Merge
   |type=    RBID
   |clé=     ISTEX:15AD7C1ECD56158935609D33695A742405F45145
   |texte=   Using Consensus Sequence Voting to Correct OCR Errors
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024