
Using Consensus Sequence Voting to Correct OCR Errors

Internal identifier: 002594 (Main/Merge); previous: 002593; next: 002595

Authors: Daniel Lopresti; Jiangying Zhou

Source:

Computer Vision and Image Understanding [1077-3142]; 1996.

RBID : ISTEX:15AD7C1ECD56158935609D33695A742405F45145

Abstract

We present experimental results suggesting that between 20 and 50% of the errors caused by a single OCR package can be eliminated by simply scanning a page three times and running a “consensus sequence” voting procedure. This technique, which originates from molecular biology, takes exponential time in general, but can be specialized to a fast heuristic guaranteed to be optimal for the cases of interest. The improvement in recognition accuracy is achieved without making a priori assumptions about the distribution of OCR errors (i.e., no “training” is required).
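
The voting idea in the abstract can be illustrated with a small sketch. It is not the authors' algorithm and does not reproduce their fast heuristic; it assumes only three short strings, for which an exhaustive three-way alignment (cubic time in the line length) is affordable. Each alignment column is charged the number of entries that disagree with its most common entry, and the consensus is read off by majority vote per column. The names `consensus`, `column_cost`, `vote`, and `GAP` are hypothetical and chosen for this example.

```python
from collections import Counter
from itertools import product

GAP = None  # marks "no character" in an alignment column (illustrative choice)

# The seven ways to advance in at least one of the three strings.
MOVES = [m for m in product((0, 1), repeat=3) if any(m)]


def column_cost(col):
    """Number of entries that disagree with the column's most common entry."""
    return len(col) - max(Counter(col).values())


def vote(col):
    """Most common entry of a column; ties go to the earliest scan."""
    counts = Counter(col)
    best = max(counts.values())
    for sym in col:
        if counts[sym] == best:
            return sym


def consensus(a, b, c):
    """Align three strings by dynamic programming, then vote column by column."""
    seqs = (a, b, c)
    end = (len(a), len(b), len(c))
    cost = {(0, 0, 0): 0}
    back = {}
    # Lexicographic order guarantees every predecessor cell is already filled.
    for cell in product(*(range(n + 1) for n in end)):
        if cell == (0, 0, 0):
            continue
        best, best_step = float("inf"), None
        for move in MOVES:
            prev = tuple(x - d for x, d in zip(cell, move))
            if min(prev) < 0:
                continue
            # Characters consumed by this move; GAP where a string does not advance.
            col = tuple(
                s[x - 1] if d else GAP for s, x, d in zip(seqs, cell, move)
            )
            cand = cost[prev] + column_cost(col)
            if cand < best:
                best, best_step = cand, (prev, col)
        cost[cell] = best
        back[cell] = best_step

    # Trace back from the end, keeping the voted symbol of each column.
    out = []
    cell = end
    while cell != (0, 0, 0):
        cell, col = back[cell]
        sym = vote(col)
        if sym is not GAP:
            out.append(sym)
    return "".join(reversed(out))


# Three simulated scans of the same line, each with a different OCR slip.
print(consensus("hello world", "hcllo world", "hello wor1d"))
```

In this sketch a character that is wrong in only one of the three scans is outvoted by the other two; an error that two scans share would survive the vote.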

Url:

https://api.istex.fr/document/15AD7C1ECD56158935609D33695A742405F45145/fulltext/pdf

DOI: 10.1006/cviu.1996.0502

Links toward previous steps (curation, corpus...)

to stream Istex, to step Corpus: 000036
to stream Istex, to step Curation: 000036
to stream Istex, to step Checkpoint: 001915

Links to Exploration step

ISTEX:15AD7C1ECD56158935609D33695A742405F45145

The document in XML format

<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Using Consensus Sequence Voting to Correct OCR Errors</title>
<author><name sortKey="Lopresti, Daniel" sort="Lopresti, Daniel" uniqKey="Lopresti D" first="Daniel" last="Lopresti">Daniel Lopresti</name>
</author>
<author><name sortKey="Zhou, Jiangying" sort="Zhou, Jiangying" uniqKey="Zhou J" first="Jiangying" last="Zhou">Jiangying Zhou</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:15AD7C1ECD56158935609D33695A742405F45145</idno>
<date when="1997" year="1997">1997</date>
<idno type="doi">10.1006/cviu.1996.0502</idno>
<idno type="url">https://api.istex.fr/document/15AD7C1ECD56158935609D33695A742405F45145/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000036</idno>
<idno type="wicri:Area/Istex/Curation">000036</idno>
<idno type="wicri:Area/Istex/Checkpoint">001915</idno>
<idno type="wicri:doubleKey">1077-3142:1997:Lopresti D:using:consensus:sequence</idno>
<idno type="wicri:Area/Main/Merge">002594</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Using Consensus Sequence Voting to Correct OCR Errors</title>
<author><name sortKey="Lopresti, Daniel" sort="Lopresti, Daniel" uniqKey="Lopresti D" first="Daniel" last="Lopresti">Daniel Lopresti</name>
<affiliation><wicri:noCountry code="subField">08540</wicri:noCountry>
</affiliation>
</author>
<author><name sortKey="Zhou, Jiangying" sort="Zhou, Jiangying" uniqKey="Zhou J" first="Jiangying" last="Zhou">Jiangying Zhou</name>
<affiliation><wicri:noCountry code="subField">08540</wicri:noCountry>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="j">Computer Vision and Image Understanding</title>
<title level="j" type="abbrev">YCVIU</title>
<idno type="ISSN">1077-3142</idno>
<imprint><publisher>ELSEVIER</publisher>
<date type="published" when="1996">1996</date>
<biblScope unit="volume">67</biblScope>
<biblScope unit="issue">1</biblScope>
<biblScope unit="page" from="39">39</biblScope>
<biblScope unit="page" to="47">47</biblScope>
</imprint>
<idno type="ISSN">1077-3142</idno>
</series>
<idno type="istex">15AD7C1ECD56158935609D33695A742405F45145</idno>
<idno type="DOI">10.1006/cviu.1996.0502</idno>
<idno type="PII">S1077-3142(96)90502-0</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">1077-3142</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">We present experimental results suggesting that between 20 and 50% of the errors caused by a single OCR package can be eliminated by simply scanning a page three times and running a “consensus sequence” voting procedure. This technique, which originates from molecular biology, takes exponential time in general, but can be specialized to a fast heuristic guaranteed to be optimal for the cases of interest. The improvement in recognition accuracy is achieved without making a priori assumptions about the distribution of OCR errors (i.e., no “training” is required).</div>
</front>
</TEI>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Merge

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002594 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 002594 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Merge
   |type=    RBID
   |clé=     ISTEX:15AD7C1ECD56158935609D33695A742405F45145
   |texte=   Using Consensus Sequence Voting to Correct OCR Errors
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

	OCR exploration server
	Warning: this site is under development! Warning: this site was generated automatically from raw corpora. The information has therefore not been validated.
