
Visual text recognition through contextual processing

Internal identifier: 003B17 (Main/Merge)


Authors: R. M. K. Sinha [Canada]; Birendra Prasada [Canada]

Source:

Pattern Recognition [ISSN 0031-3203]; 1988, vol. 21, no. 5, pp. 463-479.

RBID : ISTEX:40428F17F872B9FA6D65042F200F226F25774C00

Abstract

Most work on dictionary-based contextual processing assumes that every word of the document lies within the dictionary and that clear word boundaries exist. In practice, neither assumption holds. In this work we present a two-pass contextual processing algorithm, limited to the word level, that uses a partial dictionary with an augmented-dictionary approach, a modified Viterbi algorithm, and heuristics based on pragmatic features. A character confusion matrix, obtained through training and weighted with respect to the dictionary words found within the document, is used to generate aliases for each input word. The system has been tested with an omnifont character recogniser on documents of varying types. Its overall performance exceeds 98% correct character recognition (97% word recognition), which is better than that of other reported works.
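The alias-generation step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the confusion-matrix values, the function names (`char_candidates`, `generate_aliases`, `correct_word`), and the ranking of aliases by a product of per-character probabilities are all hypothetical. The sketch assumes aliases have the same length as the input word and omits the document-specific weighting, the modified Viterbi search, the pragmatic heuristics, and word-boundary handling.

```python
import math
from itertools import product

# Hypothetical confusion data: CONFUSION[r][t] is an illustrative estimate of
# P(true char = t | recogniser output = r). Values are invented, not from the paper.
CONFUSION = {
    "c": {"c": 0.90, "e": 0.08, "o": 0.02},
    "l": {"l": 0.85, "i": 0.10, "1": 0.05},
    "a": {"a": 0.95, "o": 0.05},
    "t": {"t": 0.97, "f": 0.03},
}


def char_candidates(r, min_prob):
    """Return (true_char, prob) pairs for recognised char r with prob >= min_prob.

    Characters absent from the matrix are assumed to be recognised correctly.
    """
    row = CONFUSION.get(r, {r: 1.0})
    return [(t, p) for t, p in row.items() if p >= min_prob]


def generate_aliases(word, min_prob=0.01, max_aliases=50):
    """Enumerate same-length candidate words for a recognised word.

    Each alias is scored by the product of its per-character probabilities;
    results are sorted best-first and truncated to max_aliases.
    """
    per_char = [char_candidates(ch, min_prob) for ch in word]
    aliases = []
    for combo in product(*per_char):
        text = "".join(t for t, _ in combo)
        score = math.prod(p for _, p in combo)
        aliases.append((text, score))
    aliases.sort(key=lambda item: item[1], reverse=True)
    return aliases[:max_aliases]


def correct_word(word, dictionary):
    """Return the best-scoring alias found in a (partial) dictionary.

    If no alias is in the dictionary, the input word is returned unchanged.
    """
    for text, _ in generate_aliases(word):
        if text in dictionary:
            return text
    return word


if __name__ == "__main__":
    print(correct_word("cat", {"eat", "oat"}))  # prints "eat"
```

Returning the input unchanged when no alias matches reflects, in spirit only, the abstract's point that a partial dictionary will not contain every word of the document.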

URL:

https://api.istex.fr/document/40428F17F872B9FA6D65042F200F226F25774C00/fulltext/pdf

DOI: 10.1016/0031-3203(88)90006-4

Links to previous steps (curation, corpus, ...)

to stream Istex, to step Corpus: 000838
to stream Istex, to step Curation: 000829
to stream Istex, to step Checkpoint: 002A95

Links to Exploration step

ISTEX:40428F17F872B9FA6D65042F200F226F25774C00

The document in XML format

<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title>Visual text recognition through contextual processing</title>
<author><name sortKey="Sinha, R M K" sort="Sinha, R M K" uniqKey="Sinha R" first="R. M. K." last="Sinha">R. M. K. Sinha</name>
</author>
<author><name sortKey="Prasada, Birendra" sort="Prasada, Birendra" uniqKey="Prasada B" first="Birendra" last="Prasada">Birendra Prasada</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:40428F17F872B9FA6D65042F200F226F25774C00</idno>
<date when="1988" year="1988">1988</date>
<idno type="doi">10.1016/0031-3203(88)90006-4</idno>
<idno type="url">https://api.istex.fr/document/40428F17F872B9FA6D65042F200F226F25774C00/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000838</idno>
<idno type="wicri:Area/Istex/Curation">000829</idno>
<idno type="wicri:Area/Istex/Checkpoint">002A95</idno>
<idno type="wicri:doubleKey">0031-3203:1988:Sinha R:visual:text:recognition</idno>
<idno type="wicri:Area/Main/Merge">003B17</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a">Visual text recognition through contextual processing</title>
<author><name sortKey="Sinha, R M K" sort="Sinha, R M K" uniqKey="Sinha R" first="R. M. K." last="Sinha">R. M. K. Sinha</name>
<affiliation wicri:level="1"><country>Canada</country>
<wicri:regionArea>INRS-Télécommunications, 3 Place du Commerce, Ile des Soeurs, Québec</wicri:regionArea>
<wicri:noRegion>Québec</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Prasada, Birendra" sort="Prasada, Birendra" uniqKey="Prasada B" first="Birendra" last="Prasada">Birendra Prasada</name>
<affiliation wicri:level="1"><country>Canada</country>
<wicri:regionArea>INRS-Télécommunications, 3 Place du Commerce, Ile des Soeurs, Québec</wicri:regionArea>
<wicri:noRegion>Québec</wicri:noRegion>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="j">Pattern Recognition</title>
<title level="j" type="abbrev">PR</title>
<idno type="ISSN">0031-3203</idno>
<imprint><publisher>ELSEVIER</publisher>
<date type="published" when="1987">1987</date>
<biblScope unit="volume">21</biblScope>
<biblScope unit="issue">5</biblScope>
<biblScope unit="page" from="463">463</biblScope>
<biblScope unit="page" to="479">479</biblScope>
</imprint>
<idno type="ISSN">0031-3203</idno>
</series>
<idno type="istex">40428F17F872B9FA6D65042F200F226F25774C00</idno>
<idno type="DOI">10.1016/0031-3203(88)90006-4</idno>
<idno type="PII">0031-3203(88)90006-4</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0031-3203</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">In most of the works on contextual processing using a dictionary it is assumed that all the words of the document lie within the dictionary. It is also assumed that a clear word boundary exists. In practice neither is true. In this work we present a two-pass contextual processing algorithm limited to the word level using a partial dictionary with an augmented dictionary approach, modified Viterbi algorithm and some heuristics based on pragmatic features. A character confusion matrix obtained through training and weighted with respect to dictionary words within the document is used to generate aliases for the input word. It has been tested with an omnifont character recogniser on documents of varying types. The overall performance of our system exceeds 98% correct character recognition (97% word recognition) which is better than that of other reported works.</div>
</front>
</TEI>
</record>

To handle this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Merge

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 003B17 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 003B17 | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Merge
   |type=    RBID
   |clé=     ISTEX:40428F17F872B9FA6D65042F200F226F25774C00
   |texte=   Visual text recognition through contextual processing
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

Exploration server on OCR
Warning: this site is under development! Warning: this site was generated by computational means from raw corpora. The information is therefore not validated.
