An integrated system for the analysis and the recognition of characters in ancient documents

Internal identifier: 000622 (PascalFrancis/Corpus); previous: 000621; next: 000623

Authors: Stefano Vezzosi; Luigi Bedini; Anna Tonazzini

Source :

Lecture Notes in Computer Science [ISSN 0302-9743], vol. 2423, pp. 49-52; 2002.

RBID : Pascal:03-0248637

French descriptors

Pascal (Inist)
- Réseau neuronal, Système intégré, Analyse système, Reconnaissance caractère, Reconnaissance forme, Algorithme rétropropagation, Reconnaissance optique caractère, Méthode adaptative, Transformation ondelette, Détection seuil, Caractère imprimé, Document imprimé, Document imprimé ancien.

English descriptors

KwdEn :
- Adaptive method, Backpropagation algorithm, Character recognition, Integrated system, Neural network, Optical character recognition, Pattern recognition, Printed character, Printed document, System analysis, Threshold detection, Wavelet transformation.

Abstract

This paper describes an integrated system for processing and analyzing highly degraded ancient printed documents. For each page, the system reduces noise by wavelet-based filtering, extracts the text lines and segments them into characters by fast adaptive thresholding, and performs OCR with a feed-forward multilayer neural network trained by back-propagation. The recognition probability serves as a discriminant parameter that decides whether a feedback process is activated automatically. This process leads back to a block that refines the segmentation. The block acts only on the small portions of text where recognition was unreliable, and it uses blind deconvolution and MRF-based (Markov random field) segmentation techniques. Experimental results show that the whole system performs well even on strongly degraded texts.
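The confidence-gated feedback described in the abstract can be sketched as shown below. This is a minimal illustration and not the authors' implementation. The threshold value 0.8 and the `refine` callback are hypothetical. The callback stands in for the blind-deconvolution and MRF segmentation block. All other names are also illustrative. For each character, the classifier supplies class probabilities. Only characters whose top probability falls below the threshold are sent back for refinement.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Recognized:
    label: str
    confidence: float  # probability of the winning class
    refined: bool      # True if the (hypothetical) refinement block was invoked


def argmax(p: Sequence[float]) -> int:
    return max(range(len(p)), key=lambda k: p[k])


def recognize_with_feedback(
    probs: Sequence[Sequence[float]],
    alphabet: str,
    refine: Callable[[int], Sequence[float]],
    threshold: float = 0.8,  # hypothetical trust threshold
) -> List[Recognized]:
    """Keep confident characters; send only untrusted ones back for re-segmentation."""
    out: List[Recognized] = []
    for i, p in enumerate(probs):
        best = argmax(p)
        if p[best] >= threshold:
            out.append(Recognized(alphabet[best], p[best], refined=False))
            continue
        # Feedback path: re-process only this character's region, then re-classify it.
        q = refine(i)
        best = argmax(q)
        out.append(Recognized(alphabet[best], q[best], refined=True))
    return out


# Usage with made-up classifier outputs for two characters over the alphabet "abc".
calls: List[int] = []


def fake_refine(i: int) -> Sequence[float]:
    calls.append(i)            # record which characters were sent back
    return [0.10, 0.85, 0.05]  # pretend refinement makes "b" the clear winner


result = recognize_with_feedback(
    [[0.90, 0.05, 0.05], [0.40, 0.35, 0.25]], "abc", fake_refine
)
```

With these outputs, only the second character is refined. In this sketch, a refined character is accepted even if its confidence is still below the threshold. A real system might iterate or flag the character instead.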

Record in standard format (ISO 2709)

See the Inist documentation for details of the Inist Standard format.

A01	`01`	`1`		`@0 0302-9743`
A05				`@2 2423`
A08	`01`	`1`	`ENG`	`@1 An integrated system for the analysis and the recognition of characters in ancient documents`
A09	`01`	`1`	`ENG`	`@1 DAS 2002 : document analysis systems V : Princeton NJ, 19-21 August 2002`
A11	`01`	`1`		`@1 VEZZOSI (Stefano)`
A11	`02`	`1`		`@1 BEDINI (Luigi)`
A11	`03`	`1`		`@1 TONAZZINI (Anna)`
A12	`01`	`1`		`@1 LOPRESTI (Daniel) @9 ed.`
A12	`02`	`1`		`@1 JIANYING HU @9 ed.`
A12	`03`	`1`		`@1 KASHI (Ramanujan) @9 ed.`
A14	`01`			`@1 Istituto di Elaborazione della Informazione - CNR, Via G. Moruzzi, 1 @2 56124 Pisa @3 ITA @Z 1 aut. @Z 2 aut. @Z 3 aut.`
A20				`@1 49-52`
A21				`@1 2002`
A23	`01`			`@0 ENG`
A26	`01`			`@0 3-540-44068-2`
A43	`01`			`@1 INIST @2 16343 @5 354000108470940050`
A44				`@0 0000 @1 © 2003 INIST-CNRS. All rights reserved.`
A45				`@0 7 ref.`
A47	`01`	`1`		`@0 03-0248637`
A60				`@1 P @2 C`
A61				`@0 A`
A64	`01`	`1`		`@0 Lecture notes in computer science`
A66	`01`			`@0 DEU`
C01	`01`		`ENG`	@0 This paper describes an integrated system for processing and analyzing highly degraded ancient printed documents. For each page, the system reduces noise by wavelet-based filtering, extracts and segments the text lines into characters by a fast adaptive thresholding, and performs OCR by a feed-forward back-propagation multilayer neural network. The probability recognition is used as a discriminant parameter for determining the automatic activation of a feed-back process, leading back to a block for refining segmentation. This block acts only on the small portions of the text where the recognition was not trustable, and makes use of blind deconvolution and MRF-based segmentation techniques. The experimental results highlight the good performance of the whole system in the analysis of even strongly degraded texts.
C02	`01`	`X`		`@0 001D02C03`
C03	`01`	`X`	`FRE`	`@0 Réseau neuronal @5 01`
C03	`01`	`X`	`ENG`	`@0 Neural network @5 01`
C03	`01`	`X`	`SPA`	`@0 Red neuronal @5 01`
C03	`02`	`X`	`FRE`	`@0 Système intégré @5 02`
C03	`02`	`X`	`ENG`	`@0 Integrated system @5 02`
C03	`02`	`X`	`SPA`	`@0 Sistema integrado @5 02`
C03	`03`	`X`	`FRE`	`@0 Analyse système @5 03`
C03	`03`	`X`	`ENG`	`@0 System analysis @5 03`
C03	`03`	`X`	`SPA`	`@0 Análisis sistema @5 03`
C03	`04`	`X`	`FRE`	`@0 Reconnaissance caractère @5 04`
C03	`04`	`X`	`ENG`	`@0 Character recognition @5 04`
C03	`04`	`X`	`SPA`	`@0 Reconocimiento carácter @5 04`
C03	`05`	`X`	`FRE`	`@0 Reconnaissance forme @5 05`
C03	`05`	`X`	`ENG`	`@0 Pattern recognition @5 05`
C03	`05`	`X`	`SPA`	`@0 Reconocimiento patrón @5 05`
C03	`06`	`X`	`FRE`	`@0 Algorithme rétropropagation @5 06`
C03	`06`	`X`	`ENG`	`@0 Backpropagation algorithm @5 06`
C03	`06`	`X`	`SPA`	`@0 Algoritmo retropropagación @5 06`
C03	`07`	`X`	`FRE`	`@0 Reconnaissance optique caractère @5 07`
C03	`07`	`X`	`ENG`	`@0 Optical character recognition @5 07`
C03	`07`	`X`	`SPA`	`@0 Reconocimento óptico de caracteres @5 07`
C03	`08`	`X`	`FRE`	`@0 Méthode adaptative @5 08`
C03	`08`	`X`	`ENG`	`@0 Adaptive method @5 08`
C03	`08`	`X`	`SPA`	`@0 Método adaptativo @5 08`
C03	`09`	`X`	`FRE`	`@0 Transformation ondelette @5 09`
C03	`09`	`X`	`ENG`	`@0 Wavelet transformation @5 09`
C03	`09`	`X`	`SPA`	`@0 Transformación ondita @5 09`
C03	`10`	`X`	`FRE`	`@0 Détection seuil @5 10`
C03	`10`	`X`	`ENG`	`@0 Threshold detection @5 10`
C03	`10`	`X`	`SPA`	`@0 Detección umbral @5 10`
C03	`11`	`X`	`FRE`	`@0 Caractère imprimé @5 11`
C03	`11`	`X`	`ENG`	`@0 Printed character @5 11`
C03	`11`	`X`	`SPA`	`@0 Carácter impreso @5 11`
C03	`12`	`X`	`FRE`	`@0 Document imprimé @5 12`
C03	`12`	`X`	`ENG`	`@0 Printed document @5 12`
C03	`12`	`X`	`SPA`	`@0 Documento impreso @5 12`
C03	`13`	`X`	`FRE`	`@0 Document imprimé ancien @4 INC @5 82`
N21				`@1 160`
N82				`@1 PSI`

A30	`01`	`1`	`ENG`	`@1 IAPR workshop on document analysis systems @2 5 @3 Princeton NJ USA @4 2002-08-19`
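Each line of the notice above holds one field. It starts with a tag (e.g. `A11`), followed by indicator columns wrapped in backticks. Last comes the content string, whose subfields begin with `@` and a one-character code (`@1`, `@9`, `@Z`, ...). The Python sketch below splits such a line. The layout is inferred from this record and not from the Inist specification. `parse_notice_line` and `NoticeField` are hypothetical names. A value that itself contains `@x ` would be mis-split.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class NoticeField:
    tag: str
    indicators: Tuple[str, ...]
    subfields: List[Tuple[str, str]]  # (code, value) pairs in order; codes may repeat

    def first(self, code: str) -> Optional[str]:
        """Return the first value for a subfield code, or None if absent."""
        return next((v for c, v in self.subfields if c == code), None)


# A subfield starts with "@", a one-character code, then a space (assumed layout).
_SUBFIELD = re.compile(r"@([0-9A-Za-z]) ")


def parse_notice_line(line: str) -> NoticeField:
    parts = line.rstrip("\n").split("\t")
    tag = parts[0]
    indicators = tuple(p.strip("`") for p in parts[1:-1])
    content = parts[-1].strip("`")
    # re.split keeps captured codes: [prefix, code1, value1, code2, value2, ...]
    pieces = _SUBFIELD.split(content)
    subfields = [(pieces[i], pieces[i + 1].strip()) for i in range(1, len(pieces) - 1, 2)]
    return NoticeField(tag, indicators, subfields)


editor = parse_notice_line("A12\t`01`\t`1`\t\t`@1 LOPRESTI (Daniel) @9 ed.`")
pages = parse_notice_line("A20\t\t\t\t`@1 49-52`")
```

Here `editor.first("9")` returns `"ed."`, which marks the person as an editor rather than an author. Repeated codes such as the three `@Z` entries of `A14` are kept as separate pairs.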

Inist format (server)

NO :	PASCAL 03-0248637 INIST
ET :	An integrated system for the analysis and the recognition of characters in ancient documents
AU :	VEZZOSI (Stefano); BEDINI (Luigi); TONAZZINI (Anna); LOPRESTI (Daniel); JIANYING HU; KASHI (Ramanujan)
AF :	Istituto di Elaborazione della Informazione - CNR, Via G. Moruzzi, 1/56124 Pisa/Italie (1 aut., 2 aut., 3 aut.)
DT :	Publication en série; Congrès; Niveau analytique
SO :	Lecture notes in computer science; ISSN 0302-9743; Allemagne; Da. 2002; Vol. 2423; Pp. 49-52; Bibl. 7 ref.
LA :	Anglais
EA :	This paper describes an integrated system for processing and analyzing highly degraded ancient printed documents. For each page, the system reduces noise by wavelet-based filtering, extracts and segments the text lines into characters by a fast adaptive thresholding, and performs OCR by a feed-forward back-propagation multilayer neural network. The probability recognition is used as a discriminant parameter for determining the automatic activation of a feed-back process, leading back to a block for refining segmentation. This block acts only on the small portions of the text where the recognition was not trustable, and makes use of blind deconvolution and MRF-based segmentation techniques. The experimental results highlight the good performance of the whole system in the analysis of even strongly degraded texts.
CC :	001D02C03
FD :	Réseau neuronal; Système intégré; Analyse système; Reconnaissance caractère; Reconnaissance forme; Algorithme rétropropagation; Reconnaissance optique caractère; Méthode adaptative; Transformation ondelette; Détection seuil; Caractère imprimé; Document imprimé; Document imprimé ancien
ED :	Neural network; Integrated system; System analysis; Character recognition; Pattern recognition; Backpropagation algorithm; Optical character recognition; Adaptive method; Wavelet transformation; Threshold detection; Printed character; Printed document
SD :	Red neuronal; Sistema integrado; Análisis sistema; Reconocimiento carácter; Reconocimiento patrón; Algoritmo retropropagación; Reconocimento óptico de caracteres; Método adaptativo; Transformación ondita; Detección umbral; Carácter impreso; Documento impreso
LO :	INIST-16343.354000108470940050
ID :	03-0248637
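The server format uses a two-letter tag followed by ` :` and a tab. Several of its fields hold semicolon-separated lists. The sketch below turns such a block into a dictionary. It assumes which tags are list-valued (`AU`, `DT`, `FD`, `ED`, `SD`) and that the separator is always a tab, based only on this record. `parse_server_record` is a hypothetical helper.

```python
import re
from typing import Dict, List, Union

# Two-letter tag, space, colon, tab, then the value (layout inferred from this record).
_LINE = re.compile(r"^([A-Z]{2}) :\t(.*)$")
# Assumption: these fields hold semicolon-separated lists.
LIST_TAGS = {"AU", "DT", "FD", "ED", "SD"}


def parse_server_record(text: str) -> Dict[str, Union[str, List[str]]]:
    record: Dict[str, Union[str, List[str]]] = {}
    for line in text.splitlines():
        m = _LINE.match(line)
        if m is None:
            continue  # skip blank lines and anything not in "TAG :<TAB>value" form
        tag, value = m.groups()
        if tag in LIST_TAGS:
            record[tag] = [v.strip() for v in value.split(";")]
        else:
            record[tag] = value
    return record


sample = (
    "NO :\tPASCAL 03-0248637 INIST\n"
    "AU :\tVEZZOSI (Stefano); BEDINI (Luigi); TONAZZINI (Anna)\n"
    "LA :\tAnglais\n"
)
rec = parse_server_record(sample)
```

The `AU` field in this record mixes authors and volume editors. This sketch does not separate them; the `@9 ed.` subfield in the standard-format notice carries that distinction.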

Links to Exploration step

Pascal:03-0248637

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">An integrated system for the analysis and the recognition of characters in ancient documents</title>
<author><name sortKey="Vezzosi, Stefano" sort="Vezzosi, Stefano" uniqKey="Vezzosi S" first="Stefano" last="Vezzosi">Stefano Vezzosi</name>
<affiliation><inist:fA14 i1="01"><s1>Istituto di Elaborazione della Informazione - CNR, Via G. Moruzzi, 1</s1>
<s2>56124 Pisa</s2>
<s3>ITA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Bedini, Luigi" sort="Bedini, Luigi" uniqKey="Bedini L" first="Luigi" last="Bedini">Luigi Bedini</name>
<affiliation><inist:fA14 i1="01"><s1>Istituto di Elaborazione della Informazione - CNR, Via G. Moruzzi, 1</s1>
<s2>56124 Pisa</s2>
<s3>ITA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Tonazzini, Anna" sort="Tonazzini, Anna" uniqKey="Tonazzini A" first="Anna" last="Tonazzini">Anna Tonazzini</name>
<affiliation><inist:fA14 i1="01"><s1>Istituto di Elaborazione della Informazione - CNR, Via G. Moruzzi, 1</s1>
<s2>56124 Pisa</s2>
<s3>ITA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">03-0248637</idno>
<date when="2002">2002</date>
<idno type="stanalyst">PASCAL 03-0248637 INIST</idno>
<idno type="RBID">Pascal:03-0248637</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000622</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">An integrated system for the analysis and the recognition of characters in ancient documents</title>
<author><name sortKey="Vezzosi, Stefano" sort="Vezzosi, Stefano" uniqKey="Vezzosi S" first="Stefano" last="Vezzosi">Stefano Vezzosi</name>
<affiliation><inist:fA14 i1="01"><s1>Istituto di Elaborazione della Informazione - CNR, Via G. Moruzzi, 1</s1>
<s2>56124 Pisa</s2>
<s3>ITA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Bedini, Luigi" sort="Bedini, Luigi" uniqKey="Bedini L" first="Luigi" last="Bedini">Luigi Bedini</name>
<affiliation><inist:fA14 i1="01"><s1>Istituto di Elaborazione della Informazione - CNR, Via G. Moruzzi, 1</s1>
<s2>56124 Pisa</s2>
<s3>ITA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Tonazzini, Anna" sort="Tonazzini, Anna" uniqKey="Tonazzini A" first="Anna" last="Tonazzini">Anna Tonazzini</name>
<affiliation><inist:fA14 i1="01"><s1>Istituto di Elaborazione della Informazione - CNR, Via G. Moruzzi, 1</s1>
<s2>56124 Pisa</s2>
<s3>ITA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Lecture notes in computer science</title>
<idno type="ISSN">0302-9743</idno>
<imprint><date when="2002">2002</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Lecture notes in computer science</title>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Adaptive method</term>
<term>Backpropagation algorithm</term>
<term>Character recognition</term>
<term>Integrated system</term>
<term>Neural network</term>
<term>Optical character recognition</term>
<term>Pattern recognition</term>
<term>Printed character</term>
<term>Printed document</term>
<term>System analysis</term>
<term>Threshold detection</term>
<term>Wavelet transformation</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Réseau neuronal</term>
<term>Système intégré</term>
<term>Analyse système</term>
<term>Reconnaissance caractère</term>
<term>Reconnaissance forme</term>
<term>Algorithme rétropropagation</term>
<term>Reconnaissance optique caractère</term>
<term>Méthode adaptative</term>
<term>Transformation ondelette</term>
<term>Détection seuil</term>
<term>Caractère imprimé</term>
<term>Document imprimé</term>
<term>Document imprimé ancien</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">This paper describes an integrated system for processing and analyzing highly degraded ancient printed documents. For each page, the system reduces noise by wavelet-based filtering, extracts and segments the text lines into characters by a fast adaptive thresholding, and performs OCR by a feed-forward back-propagation multilayer neural network. The probability recognition is used as a discriminant parameter for determining the automatic activation of a feed-back process, leading back to a block for refining segmentation. This block acts only on the small portions of the text where the recognition was not trustable, and makes use of blind deconvolution and MRF-based segmentation techniques. The experimental results highlight the good performance of the whole system in the analysis of even strongly degraded texts.</div>
</front>
</TEI>
<inist><standard h6="B"><pA><fA01 i1="01" i2="1"><s0>0302-9743</s0>
</fA01>
<fA05><s2>2423</s2>
</fA05>
<fA08 i1="01" i2="1" l="ENG"><s1>An integrated system for the analysis and the recognition of characters in ancient documents</s1>
</fA08>
<fA09 i1="01" i2="1" l="ENG"><s1>DAS 2002 : document analysis systems V : Princeton NJ, 19-21 August 2002</s1>
</fA09>
<fA11 i1="01" i2="1"><s1>VEZZOSI (Stefano)</s1>
</fA11>
<fA11 i1="02" i2="1"><s1>BEDINI (Luigi)</s1>
</fA11>
<fA11 i1="03" i2="1"><s1>TONAZZINI (Anna)</s1>
</fA11>
<fA12 i1="01" i2="1"><s1>LOPRESTI (Daniel)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="02" i2="1"><s1>JIANYING HU</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="03" i2="1"><s1>KASHI (Ramanujan)</s1>
<s9>ed.</s9>
</fA12>
<fA14 i1="01"><s1>Istituto di Elaborazione della Informazione - CNR, Via G. Moruzzi, 1</s1>
<s2>56124 Pisa</s2>
<s3>ITA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</fA14>
<fA20><s1>49-52</s1>
</fA20>
<fA21><s1>2002</s1>
</fA21>
<fA23 i1="01"><s0>ENG</s0>
</fA23>
<fA26 i1="01"><s0>3-540-44068-2</s0>
</fA26>
<fA43 i1="01"><s1>INIST</s1>
<s2>16343</s2>
<s5>354000108470940050</s5>
</fA43>
<fA44><s0>0000</s0>
<s1>© 2003 INIST-CNRS. All rights reserved.</s1>
</fA44>
<fA45><s0>7 ref.</s0>
</fA45>
<fA47 i1="01" i2="1"><s0>03-0248637</s0>
</fA47>
<fA60><s1>P</s1>
<s2>C</s2>
</fA60>
<fA61><s0>A</s0>
</fA61>
<fA64 i1="01" i2="1"><s0>Lecture notes in computer science</s0>
</fA64>
<fA66 i1="01"><s0>DEU</s0>
</fA66>
<fC01 i1="01" l="ENG"><s0>This paper describes an integrated system for processing and analyzing highly degraded ancient printed documents. For each page, the system reduces noise by wavelet-based filtering, extracts and segments the text lines into characters by a fast adaptive thresholding, and performs OCR by a feed-forward back-propagation multilayer neural network. The probability recognition is used as a discriminant parameter for determining the automatic activation of a feed-back process, leading back to a block for refining segmentation. This block acts only on the small portions of the text where the recognition was not trustable, and makes use of blind deconvolution and MRF-based segmentation techniques. The experimental results highlight the good performance of the whole system in the analysis of even strongly degraded texts.</s0>
</fC01>
<fC02 i1="01" i2="X"><s0>001D02C03</s0>
</fC02>
<fC03 i1="01" i2="X" l="FRE"><s0>Réseau neuronal</s0>
<s5>01</s5>
</fC03>
<fC03 i1="01" i2="X" l="ENG"><s0>Neural network</s0>
<s5>01</s5>
</fC03>
<fC03 i1="01" i2="X" l="SPA"><s0>Red neuronal</s0>
<s5>01</s5>
</fC03>
<fC03 i1="02" i2="X" l="FRE"><s0>Système intégré</s0>
<s5>02</s5>
</fC03>
<fC03 i1="02" i2="X" l="ENG"><s0>Integrated system</s0>
<s5>02</s5>
</fC03>
<fC03 i1="02" i2="X" l="SPA"><s0>Sistema integrado</s0>
<s5>02</s5>
</fC03>
<fC03 i1="03" i2="X" l="FRE"><s0>Analyse système</s0>
<s5>03</s5>
</fC03>
<fC03 i1="03" i2="X" l="ENG"><s0>System analysis</s0>
<s5>03</s5>
</fC03>
<fC03 i1="03" i2="X" l="SPA"><s0>Análisis sistema</s0>
<s5>03</s5>
</fC03>
<fC03 i1="04" i2="X" l="FRE"><s0>Reconnaissance caractère</s0>
<s5>04</s5>
</fC03>
<fC03 i1="04" i2="X" l="ENG"><s0>Character recognition</s0>
<s5>04</s5>
</fC03>
<fC03 i1="04" i2="X" l="SPA"><s0>Reconocimiento carácter</s0>
<s5>04</s5>
</fC03>
<fC03 i1="05" i2="X" l="FRE"><s0>Reconnaissance forme</s0>
<s5>05</s5>
</fC03>
<fC03 i1="05" i2="X" l="ENG"><s0>Pattern recognition</s0>
<s5>05</s5>
</fC03>
<fC03 i1="05" i2="X" l="SPA"><s0>Reconocimiento patrón</s0>
<s5>05</s5>
</fC03>
<fC03 i1="06" i2="X" l="FRE"><s0>Algorithme rétropropagation</s0>
<s5>06</s5>
</fC03>
<fC03 i1="06" i2="X" l="ENG"><s0>Backpropagation algorithm</s0>
<s5>06</s5>
</fC03>
<fC03 i1="06" i2="X" l="SPA"><s0>Algoritmo retropropagación</s0>
<s5>06</s5>
</fC03>
<fC03 i1="07" i2="X" l="FRE"><s0>Reconnaissance optique caractère</s0>
<s5>07</s5>
</fC03>
<fC03 i1="07" i2="X" l="ENG"><s0>Optical character recognition</s0>
<s5>07</s5>
</fC03>
<fC03 i1="07" i2="X" l="SPA"><s0>Reconocimento óptico de caracteres</s0>
<s5>07</s5>
</fC03>
<fC03 i1="08" i2="X" l="FRE"><s0>Méthode adaptative</s0>
<s5>08</s5>
</fC03>
<fC03 i1="08" i2="X" l="ENG"><s0>Adaptive method</s0>
<s5>08</s5>
</fC03>
<fC03 i1="08" i2="X" l="SPA"><s0>Método adaptativo</s0>
<s5>08</s5>
</fC03>
<fC03 i1="09" i2="X" l="FRE"><s0>Transformation ondelette</s0>
<s5>09</s5>
</fC03>
<fC03 i1="09" i2="X" l="ENG"><s0>Wavelet transformation</s0>
<s5>09</s5>
</fC03>
<fC03 i1="09" i2="X" l="SPA"><s0>Transformación ondita</s0>
<s5>09</s5>
</fC03>
<fC03 i1="10" i2="X" l="FRE"><s0>Détection seuil</s0>
<s5>10</s5>
</fC03>
<fC03 i1="10" i2="X" l="ENG"><s0>Threshold detection</s0>
<s5>10</s5>
</fC03>
<fC03 i1="10" i2="X" l="SPA"><s0>Detección umbral</s0>
<s5>10</s5>
</fC03>
<fC03 i1="11" i2="X" l="FRE"><s0>Caractère imprimé</s0>
<s5>11</s5>
</fC03>
<fC03 i1="11" i2="X" l="ENG"><s0>Printed character</s0>
<s5>11</s5>
</fC03>
<fC03 i1="11" i2="X" l="SPA"><s0>Carácter impreso</s0>
<s5>11</s5>
</fC03>
<fC03 i1="12" i2="X" l="FRE"><s0>Document imprimé</s0>
<s5>12</s5>
</fC03>
<fC03 i1="12" i2="X" l="ENG"><s0>Printed document</s0>
<s5>12</s5>
</fC03>
<fC03 i1="12" i2="X" l="SPA"><s0>Documento impreso</s0>
<s5>12</s5>
</fC03>
<fC03 i1="13" i2="X" l="FRE"><s0>Document imprimé ancien</s0>
<s4>INC</s4>
<s5>82</s5>
</fC03>
<fN21><s1>160</s1>
</fN21>
<fN82><s1>PSI</s1>
</fN82>
</pA>
<pR><fA30 i1="01" i2="1" l="ENG"><s1>IAPR workshop on document analysis systems</s1>
<s2>5</s2>
<s3>Princeton NJ USA</s3>
<s4>2002-08-19</s4>
</fA30>
</pR>
</standard>
<server><NO>PASCAL 03-0248637 INIST</NO>
<ET>An integrated system for the analysis and the recognition of characters in ancient documents</ET>
<AU>VEZZOSI (Stefano); BEDINI (Luigi); TONAZZINI (Anna); LOPRESTI (Daniel); JIANYING HU; KASHI (Ramanujan)</AU>
<AF>Istituto di Elaborazione della Informazione - CNR, Via G. Moruzzi, 1/56124 Pisa/Italie (1 aut., 2 aut., 3 aut.)</AF>
<DT>Publication en série; Congrès; Niveau analytique</DT>
<SO>Lecture notes in computer science; ISSN 0302-9743; Allemagne; Da. 2002; Vol. 2423; Pp. 49-52; Bibl. 7 ref.</SO>
<LA>Anglais</LA>
<EA>This paper describes an integrated system for processing and analyzing highly degraded ancient printed documents. For each page, the system reduces noise by wavelet-based filtering, extracts and segments the text lines into characters by a fast adaptive thresholding, and performs OCR by a feed-forward back-propagation multilayer neural network. The probability recognition is used as a discriminant parameter for determining the automatic activation of a feed-back process, leading back to a block for refining segmentation. This block acts only on the small portions of the text where the recognition was not trustable, and makes use of blind deconvolution and MRF-based segmentation techniques. The experimental results highlight the good performance of the whole system in the analysis of even strongly degraded texts.</EA>
<CC>001D02C03</CC>
<FD>Réseau neuronal; Système intégré; Analyse système; Reconnaissance caractère; Reconnaissance forme; Algorithme rétropropagation; Reconnaissance optique caractère; Méthode adaptative; Transformation ondelette; Détection seuil; Caractère imprimé; Document imprimé; Document imprimé ancien</FD>
<ED>Neural network; Integrated system; System analysis; Character recognition; Pattern recognition; Backpropagation algorithm; Optical character recognition; Adaptive method; Wavelet transformation; Threshold detection; Printed character; Printed document</ED>
<SD>Red neuronal; Sistema integrado; Análisis sistema; Reconocimiento carácter; Reconocimiento patrón; Algoritmo retropropagación; Reconocimento óptico de caracteres; Método adaptativo; Transformación ondita; Detección umbral; Carácter impreso; Documento impreso</SD>
<LO>INIST-16343.354000108470940050</LO>
<ID>03-0248637</ID>
</server>
</inist>
</record>
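A strict namespace-aware parser cannot read the XML record above as-is. It uses the `inist:` prefix (e.g. `inist:fA14`) without declaring it, so Python's `xml.etree.ElementTree` rejects it as an unbound prefix. The sketch below declares a placeholder namespace on `<record>` before parsing. The URI is an assumption, not the real Inist namespace. It then extracts the title, author surnames and English keywords from the TEI header. `summarize_record` is a hypothetical helper, and `SAMPLE` is a trimmed copy of the record.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Union

# Placeholder URI (assumption): declared only so the undeclared `inist:` prefix parses.
INIST_NS = "urn:x-placeholder:inist"


def summarize_record(xml_text: str) -> Dict[str, Union[Optional[str], List[str]]]:
    fixed = xml_text.replace("<record>", f'<record xmlns:inist="{INIST_NS}">', 1)
    root = ET.fromstring(fixed)
    title = root.find(".//titleStmt/title")
    return {
        "title": title.text if title is not None else None,
        # the `last` attribute of each <name> under titleStmt/author
        "authors": [n.get("last", "") for n in root.findall(".//titleStmt/author/name")],
        "keywords_en": [t.text or "" for t in root.findall(".//keywords[@scheme='KwdEn']/term")],
    }


SAMPLE = """<record><TEI><teiHeader><fileDesc><titleStmt>
<title xml:lang="en" level="a">An integrated system for the analysis and the recognition of characters in ancient documents</title>
<author><name first="Stefano" last="Vezzosi">Stefano Vezzosi</name>
<affiliation><inist:fA14 i1="01"><s1>Istituto di Elaborazione della Informazione - CNR</s1></inist:fA14></affiliation>
</author>
</titleStmt></fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Neural network</term>
<term>Wavelet transformation</term></keywords></textClass></profileDesc>
</teiHeader></TEI></record>"""

summary = summarize_record(SAMPLE)
```

The path `.//titleStmt/author/name` is used instead of `.//author/name` because the record repeats each author under `biblStruct/analytic`.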

To handle this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/PascalFrancis/Corpus

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000622 | SxmlIndent | more

Or, equivalently, starting from the root of the exploration area:

EXPLOR_AREA=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1

HfdSelect -h $EXPLOR_AREA/Data/PascalFrancis/Corpus/biblio.hfd -nk 000622 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    PascalFrancis
   |étape=   Corpus
   |type=    RBID
   |clé=     Pascal:03-0248637
   |texte=   An integrated system for the analysis and the recognition of characters in ancient documents
}}
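When links are needed for many records, the template call can be generated rather than typed. The sketch below reproduces only the parameter names and column alignment shown in the example: each `|key=` is padded to 10 characters after a 3-space indent. `explor_link` is a hypothetical helper, not part of Dilib or Wicri.

```python
from typing import Dict


def explor_link(params: Dict[str, str]) -> str:
    """Render an {{Explor lien}} call aligned like the example above."""
    lines = ["{{Explor lien"]
    for key, value in params.items():  # dicts keep insertion order
        lines.append("   " + f"|{key}=".ljust(10) + value)
    lines.append("}}")
    return "\n".join(lines)


link = explor_link({
    "wiki": "Ticri/CIDE",
    "area": "OcrV1",
    "flux": "PascalFrancis",
    "étape": "Corpus",
    "type": "RBID",
    "clé": "Pascal:03-0248637",
    "texte": "An integrated system for the analysis and the recognition of characters in ancient documents",
})
```

The French parameter names `étape` and `clé` are kept exactly as the template expects them.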

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

	OCR exploration server
	Warning: this site is under development! Warning: this site was generated automatically from raw corpora, so the information has not been validated.
