Using Consensus Sequence Voting to Correct OCR Errors
Internal identifier: 002594 (Main/Merge); previous: 002593; next: 002595
Authors: Daniel Lopresti; Jiangying Zhou
Source:
- Computer Vision and Image Understanding [1077-3142]; 1996.
Abstract
We present experimental results suggesting that between 20 and 50% of the errors caused by a single OCR package can be eliminated by simply scanning a page three times and running a “consensus sequence” voting procedure. This technique, which originates from molecular biology, takes exponential time in general, but can be specialized to a fast heuristic guaranteed to be optimal for the cases of interest. The improvement in recognition accuracy is achieved without making a priori assumptions about the distribution of OCR errors (i.e., no “training” is required).
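The voting idea in the abstract can be illustrated with a small sketch. It is not the authors' procedure or their fast heuristic, which this record does not describe. It assumes unit sum-of-pairs costs, an exact three-way alignment by dynamic programming (cubic in the text length, so only practical for short lines), and a fallback to the first scan when all three disagree. The names `GAP`, `column_cost`, `align3`, and `consensus` are hypothetical.

```python
from collections import Counter
from itertools import product

GAP = "\0"  # hypothetical gap marker, assumed never to occur in OCR output


def column_cost(a, b, c):
    """Sum-of-pairs cost of one alignment column: one unit per disagreeing pair."""
    return (a != b) + (a != c) + (b != c)


def align3(s1, s2, s3):
    """Return a minimum-cost alignment of three strings as a list of columns."""
    n1, n2, n3 = len(s1), len(s2), len(s3)
    # Each move consumes a character (1) or places a gap (0) in each string;
    # the all-gap move is excluded.
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    cost = {(0, 0, 0): 0}
    back = {}
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            for k in range(n3 + 1):
                if (i, j, k) == (0, 0, 0):
                    continue
                best = None
                for di, dj, dk in moves:
                    pi, pj, pk = i - di, j - dj, k - dk
                    if pi < 0 or pj < 0 or pk < 0:
                        continue
                    col = (
                        s1[pi] if di else GAP,
                        s2[pj] if dj else GAP,
                        s3[pk] if dk else GAP,
                    )
                    cand = cost[(pi, pj, pk)] + column_cost(*col)
                    if best is None or cand < best[0]:
                        best = (cand, (pi, pj, pk), col)
                cost[(i, j, k)] = best[0]
                back[(i, j, k)] = (best[1], best[2])
    # Walk the back-pointers from the end to recover the columns in order.
    cols = []
    pos = (n1, n2, n3)
    while pos != (0, 0, 0):
        pos, col = back[pos]
        cols.append(col)
    cols.reverse()
    return cols


def consensus(s1, s2, s3):
    """Majority vote per aligned column; with no majority, keep the first scan."""
    out = []
    for col in align3(s1, s2, s3):
        char, votes = Counter(col).most_common(1)[0]
        if votes < 2:
            char = col[0]  # assumption: trust the first scan when all differ
        if char != GAP:
            out.append(char)
    return "".join(out)
```

For example, `consensus("the cat", "tha cat", "the cot")` returns `"the cat"`, since each substitution error is outvoted by the two other scans.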
DOI: 10.1006/cviu.1996.0502
Links toward previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 000036
- to stream Istex, to step Curation: 000036
- to stream Istex, to step Checkpoint: 001915
Links to Exploration step
ISTEX:15AD7C1ECD56158935609D33695A742405F45145
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Using Consensus Sequence Voting to Correct OCR Errors</title>
<author><name sortKey="Lopresti, Daniel" sort="Lopresti, Daniel" uniqKey="Lopresti D" first="Daniel" last="Lopresti">Daniel Lopresti</name>
</author>
<author><name sortKey="Zhou, Jiangying" sort="Zhou, Jiangying" uniqKey="Zhou J" first="Jiangying" last="Zhou">Jiangying Zhou</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:15AD7C1ECD56158935609D33695A742405F45145</idno>
<date when="1997" year="1997">1997</date>
<idno type="doi">10.1006/cviu.1996.0502</idno>
<idno type="url">https://api.istex.fr/document/15AD7C1ECD56158935609D33695A742405F45145/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000036</idno>
<idno type="wicri:Area/Istex/Curation">000036</idno>
<idno type="wicri:Area/Istex/Checkpoint">001915</idno>
<idno type="wicri:doubleKey">1077-3142:1997:Lopresti D:using:consensus:sequence</idno>
<idno type="wicri:Area/Main/Merge">002594</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Using Consensus Sequence Voting to Correct OCR Errors</title>
<author><name sortKey="Lopresti, Daniel" sort="Lopresti, Daniel" uniqKey="Lopresti D" first="Daniel" last="Lopresti">Daniel Lopresti</name>
<affiliation><wicri:noCountry code="subField">08540</wicri:noCountry>
</affiliation>
</author>
<author><name sortKey="Zhou, Jiangying" sort="Zhou, Jiangying" uniqKey="Zhou J" first="Jiangying" last="Zhou">Jiangying Zhou</name>
<affiliation><wicri:noCountry code="subField">08540</wicri:noCountry>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="j">Computer Vision and Image Understanding</title>
<title level="j" type="abbrev">YCVIU</title>
<idno type="ISSN">1077-3142</idno>
<imprint><publisher>ELSEVIER</publisher>
<date type="published" when="1996">1996</date>
<biblScope unit="volume">67</biblScope>
<biblScope unit="issue">1</biblScope>
<biblScope unit="page" from="39">39</biblScope>
<biblScope unit="page" to="47">47</biblScope>
</imprint>
<idno type="ISSN">1077-3142</idno>
</series>
<idno type="istex">15AD7C1ECD56158935609D33695A742405F45145</idno>
<idno type="DOI">10.1006/cviu.1996.0502</idno>
<idno type="PII">S1077-3142(96)90502-0</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">1077-3142</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">We present experimental results suggesting that between 20 and 50% of the errors caused by a single OCR package can be eliminated by simply scanning a page three times and running a “consensus sequence” voting procedure. This technique, which originates from molecular biology, takes exponential time in general, but can be specialized to a fast heuristic guaranteed to be optimal for the cases of interest. The improvement in recognition accuracy is achieved without making a priori assumptions about the distribution of OCR errors (i.e., no “training” is required).</div>
</front>
</TEI>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002594 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 002594 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Merge |type= RBID |clé= ISTEX:15AD7C1ECD56158935609D33695A742405F45145 |texte= Using Consensus Sequence Voting to Correct OCR Errors }}
This area was generated with Dilib version V0.6.32.