A hierarchical, HMM-based automatic evaluation of OCR accuracy for a digital library of books

Internal identifier: 000307 (PascalFrancis/Corpus); previous: 000306; next: 000308

Authors: Shaolei Feng; R. Manmatha

RBID : Francis:08-0091673

French descriptors

Pascal (Inist)
- Traitement automatique, Evaluation performance, Reconnaissance optique caractère, Etude utilisation, Bibliothèque électronique, Résultat.

English descriptors

KwdEn :
- Automatic processing, Electronic library, Optical character recognition, Performance evaluation, Result, Use study.

Abstract

A number of projects are creating searchable digital libraries of printed books. These include the Million Book Project, the Google Book project and similar efforts from Yahoo and Microsoft. Content-based online book retrieval usually requires first converting printed text into machine-readable (e.g. ASCII) text using an optical character recognition (OCR) engine and then doing full-text search on the results. Many of these books are old, and a variety of processing steps are required to create an end-to-end system. Changing any step (including the scanning process) can affect OCR performance, and hence a good automatic statistical evaluation of OCR performance on book-length material is needed. Evaluating OCR performance on an entire book is non-trivial. The only easily obtainable ground truth (the Gutenberg e-texts) must be automatically aligned with the OCR output over the entire length of a book. This may be viewed as equivalent to the problem of aligning two large (easily a million long) sequences. The problem is further complicated by OCR errors as well as the possibility of large chunks of missing material in one of the sequences. We propose a Hidden Markov Model (HMM)-based hierarchical alignment algorithm to align OCR output and the ground truth for books. We believe this is the first work to automatically align a whole book without using any book structure information. The alignment process works by breaking up the problem of aligning two long sequences into the problem of aligning many smaller subsequences. This can be done rapidly and effectively. Experimental results show that our hierarchical alignment approach works very well even if the OCR output has a high recognition error rate. Finally, we evaluate the performance of a commercial OCR engine over a large dataset of books based on the alignment results.
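
The divide-and-conquer idea in the abstract (split two long sequences into many short subsequences, then align each one) can be illustrated with a minimal Python sketch. This is not the authors' HMM-based method, whose details the abstract does not give. As a stand-in, the sketch assumes that words occurring exactly once in both texts can serve as anchors, and it aligns the short gaps between anchors with the standard library's `difflib.SequenceMatcher`. The function names `unique_word_anchors`, `longest_increasing`, and `word_accuracy` are hypothetical.

```python
from bisect import bisect_left
from collections import Counter
from difflib import SequenceMatcher


def longest_increasing(pairs):
    """Keep the longest subset of (i, j) pairs whose j values strictly increase.

    `pairs` must already be sorted by i; this is patience sorting with back-pointers.
    """
    tails, tail_idx, prev = [], [], [None] * len(pairs)
    for k, (_, j) in enumerate(pairs):
        pos = bisect_left(tails, j)
        if pos == len(tails):
            tails.append(j)
            tail_idx.append(k)
        else:
            tails[pos] = j
            tail_idx[pos] = k
        prev[k] = tail_idx[pos - 1] if pos > 0 else None
    out, k = [], (tail_idx[-1] if tail_idx else None)
    while k is not None:
        out.append(pairs[k])
        k = prev[k]
    return out[::-1]


def unique_word_anchors(truth, ocr):
    """Top level: pair up words that occur exactly once in both word lists."""
    truth_counts, ocr_counts = Counter(truth), Counter(ocr)
    ocr_pos = {w: j for j, w in enumerate(ocr) if ocr_counts[w] == 1}
    pairs = [(i, ocr_pos[w]) for i, w in enumerate(truth)
             if truth_counts[w] == 1 and w in ocr_pos]
    return longest_increasing(pairs)


def word_accuracy(truth, ocr):
    """Lower level: align each gap between anchors, then count matched truth words."""
    anchors = unique_word_anchors(truth, ocr)
    bounds = [(-1, -1)] + anchors + [(len(truth), len(ocr))]
    matched = len(anchors)  # anchor words match exactly by construction
    for (i0, j0), (i1, j1) in zip(bounds, bounds[1:]):
        sm = SequenceMatcher(None, truth[i0 + 1:i1], ocr[j0 + 1:j1], autojunk=False)
        matched += sum(block.size for block in sm.get_matching_blocks())
    return matched / len(truth) if truth else 1.0


truth = "the quick brown fox jumps over the lazy dog".split()
noisy = "the qu1ck brown fox jumps ovcr the lazy dog".split()   # two OCR errors
gappy = "the quick over the lazy dog".split()                   # missing chunk
print(word_accuracy(truth, noisy), word_accuracy(truth, gappy))
```

Because each gap between anchors is short, the quadratic cost of the inner alignment applies only to small subsequences, and a missing chunk in one text simply produces a gap with nothing to match on the other side.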

Record in standard format (ISO 2709)

See the documentation on the Inist Standard format.

A08	`01`	`1`	`ENG`	`@1 A hierarchical, HMM-based automatic evaluation of OCR accuracy for a digital library of books`
A09	`01`	`1`	`ENG`	`@1 6th ACM/IEEE-CS Joint Conference on Digital Libraries 2006 : opening information horizons : June 11-15, 2006, Chapel Hill NC`
A11	`01`	`1`		`@1 SHAOLEI FENG`
A11	`02`	`1`		`@1 MANMATHA (R.)`
A14	`01`			`@1 Multimedia Indexing and Retrieval Group Center for Intelligent Information Retrieval Computer Science Department University of Massachusetts @2 Amherst @3 USA @Z 1 aut. @Z 2 aut.`
A18	`01`	`1`		`@1 Association for Computing Machinery. Special Interest Group on Information Retrieval @3 USA @9 org-cong.`
A18	`02`	`1`		`@1 Association for Computing Machinery. Special Interest Group on Hypertext, Hypermedia and Web @3 USA @9 org-cong.`
A18	`03`	`1`		`@1 IEEE Computer Society. Technical Committee on Digital Libraries @3 USA @9 org-cong.`
A20				`@1 109-118`
A21				`@1 2006`
A23	`01`			`@0 ENG`
A25	`01`			`@1 ACM Press @2 New York NY`
A26	`01`			`@0 1-59593-354-9`
A30	`01`	`1`	`ENG`	`@1 ACM/IEEE Joint Conference on Digital Libraries @2 6 @3 Chapel Hill NC USA @4 2006`
A43	`01`			`@1 INIST @2 Y 38968 @5 354000153512330170`
A44				`@0 0000 @1 © 2008 INIST-CNRS. All rights reserved.`
A45				`@0 21 ref.`
A47	`01`	`1`		`@0 08-0091673`
A60				`@1 C`
A61				`@0 A`
A66	`01`			`@0 USA`
C01	`01`		`ENG`	@0 A number of projects are creating searchable digital libraries of printed books. These include the Million Book Project, the Google Book project and similar efforts from Yahoo and Microsoft. Content-based on line book retrieval usually requires first converting printed text into machine readable (e.g. ASCII) text using an optical character recognition (OCR) engine and then doing full text search on the results. Many of these books are old and there are a variety of processing steps that are required to create an end to end system. Changing any step (including the scanning process) can affect OCR performance and hence a good automatic statistical evaluation of OCR performance on book length material is needed. Evaluating OCR performance on the entire book is non-trivial. The only easily obtainable ground truth (the Gutenberg e-texts) must be automatically aligned with the OCR output over the entire length of a book. This may be viewed as equivalent to the problem of aligning two large (easily a million long) sequences. The problem is further complicated by OCR errors as well as the possibility of large chunks of missing material in one of the sequences. We propose a Hidden Markov Model (HMM) based hierarchical alignment algorithm to align OCR output and the ground truth for books. We believe this is the first work to automatically align a whole book without using any book structure information. The alignment process works by breaking up the problem of aligning two long sequences into the problem of aligning many smaller subsequences. This can be rapidly and effectively done. Experimental results show that our hierarchical alignment approach works very well even if OCR output has a high recognition error rate. Finally, we evaluate the performance of a commercial OCR engine over a large dataset of books based on the alignment results.
C02	`01`	`X`		`@0 790B05 @1 II`
C03	`01`	`X`	`FRE`	`@0 Traitement automatique @5 04`
C03	`01`	`X`	`ENG`	`@0 Automatic processing @5 04`
C03	`01`	`X`	`SPA`	`@0 Tratamiento automático @5 04`
C03	`02`	`X`	`FRE`	`@0 Evaluation performance @5 05`
C03	`02`	`X`	`ENG`	`@0 Performance evaluation @5 05`
C03	`02`	`X`	`SPA`	`@0 Evaluación prestación @5 05`
C03	`03`	`X`	`FRE`	`@0 Reconnaissance optique caractère @5 06`
C03	`03`	`X`	`ENG`	`@0 Optical character recognition @5 06`
C03	`03`	`X`	`SPA`	`@0 Reconocimento óptico de caracteres @5 06`
C03	`04`	`X`	`FRE`	`@0 Etude utilisation @5 07`
C03	`04`	`X`	`ENG`	`@0 Use study @5 07`
C03	`04`	`X`	`SPA`	`@0 Estudio utilización @5 07`
C03	`05`	`X`	`FRE`	`@0 Bibliothèque électronique @5 08`
C03	`05`	`X`	`ENG`	`@0 Electronic library @5 08`
C03	`05`	`X`	`SPA`	`@0 Biblioteca electronica @5 08`
C03	`06`	`X`	`FRE`	`@0 Résultat @5 09`
C03	`06`	`X`	`ENG`	`@0 Result @5 09`
C03	`06`	`X`	`SPA`	`@0 Resultado @5 09`
N21				`@1 052`

Inist format (server)

NO :	FRANCIS 08-0091673 INIST
ET :	A hierarchical, HMM-based automatic evaluation of OCR accuracy for a digital library of books
AU :	SHAOLEI FENG; MANMATHA (R.)
AF :	Multimedia Indexing and Retrieval Group Center for Intelligent Information Retrieval Computer Science Department University of Massachusetts/Amherst/Etats-Unis (1 aut., 2 aut.)
DT :	Congrès; Niveau analytique
SO :	ACM/IEEE Joint Conference on Digital Libraries/6/2006/Chapel Hill NC USA; Etats-Unis; New York NY: ACM Press; Da. 2006; Pp. 109-118; ISBN 1-59593-354-9
LA :	Anglais
EA :	A number of projects are creating searchable digital libraries of printed books. These include the Million Book Project, the Google Book project and similar efforts from Yahoo and Microsoft. Content-based on line book retrieval usually requires first converting printed text into machine readable (e.g. ASCII) text using an optical character recognition (OCR) engine and then doing full text search on the results. Many of these books are old and there are a variety of processing steps that are required to create an end to end system. Changing any step (including the scanning process) can affect OCR performance and hence a good automatic statistical evaluation of OCR performance on book length material is needed. Evaluating OCR performance on the entire book is non-trivial. The only easily obtainable ground truth (the Gutenberg e-texts) must be automatically aligned with the OCR output over the entire length of a book. This may be viewed as equivalent to the problem of aligning two large (easily a million long) sequences. The problem is further complicated by OCR errors as well as the possibility of large chunks of missing material in one of the sequences. We propose a Hidden Markov Model (HMM) based hierarchical alignment algorithm to align OCR output and the ground truth for books. We believe this is the first work to automatically align a whole book without using any book structure information. The alignment process works by breaking up the problem of aligning two long sequences into the problem of aligning many smaller subsequences. This can be rapidly and effectively done. Experimental results show that our hierarchical alignment approach works very well even if OCR output has a high recognition error rate. Finally, we evaluate the performance of a commercial OCR engine over a large dataset of books based on the alignment results.
CC :	790B05
FD :	Traitement automatique; Evaluation performance; Reconnaissance optique caractère; Etude utilisation; Bibliothèque électronique; Résultat
ED :	Automatic processing; Performance evaluation; Optical character recognition; Use study; Electronic library; Result
SD :	Tratamiento automático; Evaluación prestación; Reconocimento óptico de caracteres; Estudio utilización; Biblioteca electronica; Resultado
LO :	INIST-Y 38968.354000153512330170
ID :	08-0091673
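
The server-format listing above can be read into a dictionary with a few lines of Python. This is a minimal sketch that assumes each field is a two-letter tag followed by a colon and that multi-valued fields are separated by semicolons; that layout is inferred from this page rather than from an Inist specification, and the helper names `parse_inist_server` and `split_terms` are hypothetical.

```python
def parse_inist_server(text):
    """Map each two-letter tag (e.g. 'ET', 'AU') to its raw string value."""
    record = {}
    for line in text.splitlines():
        # Split on the first colon only, since values may contain colons too.
        tag, sep, value = line.partition(":")
        tag = tag.strip()
        if sep and len(tag) == 2:
            record[tag] = value.strip()
    return record


def split_terms(value):
    """Split a semicolon-separated field into a list of trimmed terms."""
    return [term.strip() for term in value.split(";") if term.strip()]


sample = (
    "NO :\tFRANCIS 08-0091673 INIST\n"
    "AU :\tSHAOLEI FENG; MANMATHA (R.)\n"
    "SO :\tACM/IEEE Joint Conference on Digital Libraries/6/2006/Chapel Hill NC USA; "
    "Etats-Unis; New York NY: ACM Press; Da. 2006; Pp. 109-118; ISBN 1-59593-354-9\n"
    "ED :\tAutomatic processing; Performance evaluation; Optical character recognition; "
    "Use study; Electronic library; Result\n"
    "ID :\t08-0091673\n"
)
record = parse_inist_server(sample)
print(record["ID"], split_terms(record["AU"]), len(split_terms(record["ED"])))
```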

Links to Exploration step

Francis:08-0091673

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">A hierarchical, HMM-based automatic evaluation of OCR accuracy for a digital library of books</title>
<author><name sortKey="Shaolei Feng" sort="Shaolei Feng" uniqKey="Shaolei Feng" last="Shaolei Feng">SHAOLEI FENG</name>
<affiliation><inist:fA14 i1="01"><s1>Multimedia Indexing and Retrieval Group Center for Intelligent Information Retrieval Computer Science Department University of Massachusetts</s1>
<s2>Amherst</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Manmatha, R" sort="Manmatha, R" uniqKey="Manmatha R" first="R." last="Manmatha">R. Manmatha</name>
<affiliation><inist:fA14 i1="01"><s1>Multimedia Indexing and Retrieval Group Center for Intelligent Information Retrieval Computer Science Department University of Massachusetts</s1>
<s2>Amherst</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">08-0091673</idno>
<date when="2006">2006</date>
<idno type="stanalyst">FRANCIS 08-0091673 INIST</idno>
<idno type="RBID">Francis:08-0091673</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000307</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">A hierarchical, HMM-based automatic evaluation of OCR accuracy for a digital library of books</title>
<author><name sortKey="Shaolei Feng" sort="Shaolei Feng" uniqKey="Shaolei Feng" last="Shaolei Feng">SHAOLEI FENG</name>
<affiliation><inist:fA14 i1="01"><s1>Multimedia Indexing and Retrieval Group Center for Intelligent Information Retrieval Computer Science Department University of Massachusetts</s1>
<s2>Amherst</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Manmatha, R" sort="Manmatha, R" uniqKey="Manmatha R" first="R." last="Manmatha">R. Manmatha</name>
<affiliation><inist:fA14 i1="01"><s1>Multimedia Indexing and Retrieval Group Center for Intelligent Information Retrieval Computer Science Department University of Massachusetts</s1>
<s2>Amherst</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</analytic>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Automatic processing</term>
<term>Electronic library</term>
<term>Optical character recognition</term>
<term>Performance evaluation</term>
<term>Result</term>
<term>Use study</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Traitement automatique</term>
<term>Evaluation performance</term>
<term>Reconnaissance optique caractère</term>
<term>Etude utilisation</term>
<term>Bibliothèque électronique</term>
<term>Résultat</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">A number of projects are creating searchable digital libraries of printed books. These include the Million Book Project, the Google Book project and similar efforts from Yahoo and Microsoft. Content-based on line book retrieval usually requires first converting printed text into machine readable (e.g. ASCII) text using an optical character recognition (OCR) engine and then doing full text search on the results. Many of these books are old and there are a variety of processing steps that are required to create an end to end system. Changing any step (including the scanning process) can affect OCR performance and hence a good automatic statistical evaluation of OCR performance on book length material is needed. Evaluating OCR performance on the entire book is non-trivial. The only easily obtainable ground truth (the Gutenberg e-texts) must be automatically aligned with the OCR output over the entire length of a book. This may be viewed as equivalent to the problem of aligning two large (easily a million long) sequences. The problem is further complicated by OCR errors as well as the possibility of large chunks of missing material in one of the sequences. We propose a Hidden Markov Model (HMM) based hierarchical alignment algorithm to align OCR output and the ground truth for books. We believe this is the first work to automatically align a whole book without using any book structure information. The alignment process works by breaking up the problem of aligning two long sequences into the problem of aligning many smaller subsequences. This can be rapidly and effectively done. Experimental results show that our hierarchical alignment approach works very well even if OCR output has a high recognition error rate. Finally, we evaluate the performance of a commercial OCR engine over a large dataset of books based on the alignment results.</div>
</front>
</TEI>
<inist><standard h6="B"><pA><fA08 i1="01" i2="1" l="ENG"><s1>A hierarchical, HMM-based automatic evaluation of OCR accuracy for a digital library of books</s1>
</fA08>
<fA09 i1="01" i2="1" l="ENG"><s1>6th ACM/IEEE-CS Joint Conference on Digital Libraries 2006 : opening information horizons : June 11-15, 2006, Chapel Hill NC</s1>
</fA09>
<fA11 i1="01" i2="1"><s1>SHAOLEI FENG</s1>
</fA11>
<fA11 i1="02" i2="1"><s1>MANMATHA (R.)</s1>
</fA11>
<fA14 i1="01"><s1>Multimedia Indexing and Retrieval Group Center for Intelligent Information Retrieval Computer Science Department University of Massachusetts</s1>
<s2>Amherst</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</fA14>
<fA18 i1="01" i2="1"><s1>Association for Computing Machinery. Special Interest Group on Information Retrieval</s1>
<s3>USA</s3>
<s9>org-cong.</s9>
</fA18>
<fA18 i1="02" i2="1"><s1>Association for Computing Machinery. Special Interest Group on Hypertext, Hypermedia and Web</s1>
<s3>USA</s3>
<s9>org-cong.</s9>
</fA18>
<fA18 i1="03" i2="1"><s1>IEEE Computer Society. Technical Committee on Digital Libraries</s1>
<s3>USA</s3>
<s9>org-cong.</s9>
</fA18>
<fA20><s1>109-118</s1>
</fA20>
<fA21><s1>2006</s1>
</fA21>
<fA23 i1="01"><s0>ENG</s0>
</fA23>
<fA25 i1="01"><s1>ACM Press</s1>
<s2>New York NY</s2>
</fA25>
<fA26 i1="01"><s0>1-59593-354-9</s0>
</fA26>
<fA30 i1="01" i2="1" l="ENG"><s1>ACM/IEEE Joint Conference on Digital Libraries</s1>
<s2>6</s2>
<s3>Chapel Hill NC USA</s3>
<s4>2006</s4>
</fA30>
<fA43 i1="01"><s1>INIST</s1>
<s2>Y 38968</s2>
<s5>354000153512330170</s5>
</fA43>
<fA44><s0>0000</s0>
<s1>© 2008 INIST-CNRS. All rights reserved.</s1>
</fA44>
<fA45><s0>21 ref.</s0>
</fA45>
<fA47 i1="01" i2="1"><s0>08-0091673</s0>
</fA47>
<fA60><s1>C</s1>
</fA60>
<fA61><s0>A</s0>
</fA61>
<fA66 i1="01"><s0>USA</s0>
</fA66>
<fC01 i1="01" l="ENG"><s0>A number of projects are creating searchable digital libraries of printed books. These include the Million Book Project, the Google Book project and similar efforts from Yahoo and Microsoft. Content-based on line book retrieval usually requires first converting printed text into machine readable (e.g. ASCII) text using an optical character recognition (OCR) engine and then doing full text search on the results. Many of these books are old and there are a variety of processing steps that are required to create an end to end system. Changing any step (including the scanning process) can affect OCR performance and hence a good automatic statistical evaluation of OCR performance on book length material is needed. Evaluating OCR performance on the entire book is non-trivial. The only easily obtainable ground truth (the Gutenberg e-texts) must be automatically aligned with the OCR output over the entire length of a book. This may be viewed as equivalent to the problem of aligning two large (easily a million long) sequences. The problem is further complicated by OCR errors as well as the possibility of large chunks of missing material in one of the sequences. We propose a Hidden Markov Model (HMM) based hierarchical alignment algorithm to align OCR output and the ground truth for books. We believe this is the first work to automatically align a whole book without using any book structure information. The alignment process works by breaking up the problem of aligning two long sequences into the problem of aligning many smaller subsequences. This can be rapidly and effectively done. Experimental results show that our hierarchical alignment approach works very well even if OCR output has a high recognition error rate. Finally, we evaluate the performance of a commercial OCR engine over a large dataset of books based on the alignment results.</s0>
</fC01>
<fC02 i1="01" i2="X"><s0>790B05</s0>
<s1>II</s1>
</fC02>
<fC03 i1="01" i2="X" l="FRE"><s0>Traitement automatique</s0>
<s5>04</s5>
</fC03>
<fC03 i1="01" i2="X" l="ENG"><s0>Automatic processing</s0>
<s5>04</s5>
</fC03>
<fC03 i1="01" i2="X" l="SPA"><s0>Tratamiento automático</s0>
<s5>04</s5>
</fC03>
<fC03 i1="02" i2="X" l="FRE"><s0>Evaluation performance</s0>
<s5>05</s5>
</fC03>
<fC03 i1="02" i2="X" l="ENG"><s0>Performance evaluation</s0>
<s5>05</s5>
</fC03>
<fC03 i1="02" i2="X" l="SPA"><s0>Evaluación prestación</s0>
<s5>05</s5>
</fC03>
<fC03 i1="03" i2="X" l="FRE"><s0>Reconnaissance optique caractère</s0>
<s5>06</s5>
</fC03>
<fC03 i1="03" i2="X" l="ENG"><s0>Optical character recognition</s0>
<s5>06</s5>
</fC03>
<fC03 i1="03" i2="X" l="SPA"><s0>Reconocimento óptico de caracteres</s0>
<s5>06</s5>
</fC03>
<fC03 i1="04" i2="X" l="FRE"><s0>Etude utilisation</s0>
<s5>07</s5>
</fC03>
<fC03 i1="04" i2="X" l="ENG"><s0>Use study</s0>
<s5>07</s5>
</fC03>
<fC03 i1="04" i2="X" l="SPA"><s0>Estudio utilización</s0>
<s5>07</s5>
</fC03>
<fC03 i1="05" i2="X" l="FRE"><s0>Bibliothèque électronique</s0>
<s5>08</s5>
</fC03>
<fC03 i1="05" i2="X" l="ENG"><s0>Electronic library</s0>
<s5>08</s5>
</fC03>
<fC03 i1="05" i2="X" l="SPA"><s0>Biblioteca electronica</s0>
<s5>08</s5>
</fC03>
<fC03 i1="06" i2="X" l="FRE"><s0>Résultat</s0>
<s5>09</s5>
</fC03>
<fC03 i1="06" i2="X" l="ENG"><s0>Result</s0>
<s5>09</s5>
</fC03>
<fC03 i1="06" i2="X" l="SPA"><s0>Resultado</s0>
<s5>09</s5>
</fC03>
<fN21><s1>052</s1>
</fN21>
</pA>
</standard>
<server><NO>FRANCIS 08-0091673 INIST</NO>
<ET>A hierarchical, HMM-based automatic evaluation of OCR accuracy for a digital library of books</ET>
<AU>SHAOLEI FENG; MANMATHA (R.)</AU>
<AF>Multimedia Indexing and Retrieval Group Center for Intelligent Information Retrieval Computer Science Department University of Massachusetts/Amherst/Etats-Unis (1 aut., 2 aut.)</AF>
<DT>Congrès; Niveau analytique</DT>
<SO>ACM/IEEE Joint Conference on Digital Libraries/6/2006/Chapel Hill NC USA; Etats-Unis; New York NY: ACM Press; Da. 2006; Pp. 109-118; ISBN 1-59593-354-9</SO>
<LA>Anglais</LA>
<EA>A number of projects are creating searchable digital libraries of printed books. These include the Million Book Project, the Google Book project and similar efforts from Yahoo and Microsoft. Content-based on line book retrieval usually requires first converting printed text into machine readable (e.g. ASCII) text using an optical character recognition (OCR) engine and then doing full text search on the results. Many of these books are old and there are a variety of processing steps that are required to create an end to end system. Changing any step (including the scanning process) can affect OCR performance and hence a good automatic statistical evaluation of OCR performance on book length material is needed. Evaluating OCR performance on the entire book is non-trivial. The only easily obtainable ground truth (the Gutenberg e-texts) must be automatically aligned with the OCR output over the entire length of a book. This may be viewed as equivalent to the problem of aligning two large (easily a million long) sequences. The problem is further complicated by OCR errors as well as the possibility of large chunks of missing material in one of the sequences. We propose a Hidden Markov Model (HMM) based hierarchical alignment algorithm to align OCR output and the ground truth for books. We believe this is the first work to automatically align a whole book without using any book structure information. The alignment process works by breaking up the problem of aligning two long sequences into the problem of aligning many smaller subsequences. This can be rapidly and effectively done. Experimental results show that our hierarchical alignment approach works very well even if OCR output has a high recognition error rate. Finally, we evaluate the performance of a commercial OCR engine over a large dataset of books based on the alignment results.</EA>
<CC>790B05</CC>
<FD>Traitement automatique; Evaluation performance; Reconnaissance optique caractère; Etude utilisation; Bibliothèque électronique; Résultat</FD>
<ED>Automatic processing; Performance evaluation; Optical character recognition; Use study; Electronic library; Result</ED>
<SD>Tratamiento automático; Evaluación prestación; Reconocimento óptico de caracteres; Estudio utilización; Biblioteca electronica; Resultado</SD>
<LO>INIST-Y 38968.354000153512330170</LO>
<ID>08-0091673</ID>
</server>
</inist>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/PascalFrancis/Corpus

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000307 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/PascalFrancis/Corpus/biblio.hfd -nk 000307 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    PascalFrancis
   |étape=   Corpus
   |type=    RBID
   |clé=     Francis:08-0091673
   |texte=   A hierarchical, HMM-based automatic evaluation of OCR accuracy for a digital library of books
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

	OCR exploration server
	Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.
