Public domain optical character recognition

Internal identifier: 000953 (PascalFrancis/Corpus); previous: 000952; next: 000954

Authors: M. D. Garris; J. L. Blue; G. T. Candela; D. L. Dimmick; J. Geist; P. J. Grother; S. A. Janet; C. L. Wilson

Source :

SPIE proceedings series [ 1017-2653 ] ; 1995.

RBID : Pascal:97-0135005

French descriptors

Pascal (Inist)
- Reconnaissance caractère, Reconnaissance forme, Réseau neuronal, Reconnaissance optique caractère, Document, Ecriture, Traitement document, Domaine public, Texte manuscrit.

English descriptors

KwdEn :
- Character recognition, Document, Document processing, Hand writing, Handwritten text, Neural network, Optical character recognition, Pattern recognition, Public domain.

Abstract

A public domain document processing system has been developed by the National Institute of Standards and Technology (NIST). The system is a standard reference form-based handprint recognition system for evaluating optical character recognition (OCR), and it is intended to provide a baseline of performance on an open application. The system's source code, training data, performance assessment tools, and type of forms processed are all publicly available. The system recognizes the handprint entered on Handwriting Sample Forms like the ones distributed with NIST Special Database I. From these forms, the system reads hand-printed numeric fields, upper and lowercase alphabetic fields, and unconstrained text paragraphs comprised of words from a limited-size dictionary. The modular design of the system makes it useful for component evaluation and comparison, training and testing set validation, and multiple system voting schemes. The system contains a number of significant contributions to OCR technology, including an optimized Probabilistic Neural Network (PNN) classifier that operates a factor of 20 times faster than traditional software implementations of the algorithm. The source code for the recognition system is written in C and is organized into 11 libraries. In all, there are approximately 19,000 lines of code supporting more than 550 subroutines. Source code is provided for form registration, form removal, field isolation, field segmentation, character normalization, feature extraction, character classification, and dictionary-based postprocessing. The recognition system has been successfully compiled and tested on a host of UNIX workstations including computers manufactured by Digital Equipment Corporation, Hewlett Packard, IBM, Silicon Graphics Incorporated, and Sun Microsystems. This paper gives an overview of the recognition system's software architecture, including descriptions of the various system components along with timing and accuracy statistics.
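
The abstract mentions a Probabilistic Neural Network (PNN) classifier. As a rough illustration of the general technique only, the sketch below scores each class by the average Gaussian kernel response of its training vectors and picks the best class. It is not the NIST implementation (which is written in C) and does not reproduce the optimizations the abstract describes; the function name `pnn_classify`, the smoothing parameter `sigma`, and the toy data are all assumptions made for this example.

```python
import math
from collections import defaultdict

def pnn_classify(x, prototypes, labels, sigma=1.0):
    """Return the label with the highest average Gaussian kernel response.

    x          -- feature vector to classify
    prototypes -- training feature vectors (one kernel per vector)
    labels     -- class label of each training vector
    sigma      -- kernel width (hypothetical smoothing parameter)
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for proto, label in zip(prototypes, labels):
        # Squared Euclidean distance between the input and this prototype.
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, proto))
        sums[label] += math.exp(-sq_dist / (2.0 * sigma ** 2))
        counts[label] += 1
    # Average per class so that class size does not bias the decision.
    return max(sums, key=lambda lab: sums[lab] / counts[lab])

# Toy 2-D feature vectors (made-up data, not NIST features).
prototypes = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
labels = ["0", "0", "1", "1"]
```

Every training vector contributes to every decision, which is why a plain software PNN is slow on large training sets and why a faster variant would be worth reporting.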

Record in standard format (ISO 2709)

See the documentation on the Inist Standard format.

A01	`01`	`1`		`@0 1017-2653`
A05				`@2 2422`
A08	`01`	`1`	`ENG`	`@1 Public domain optical character recognition`
A09	`01`	`1`	`ENG`	`@1 Document recognition II : San Jose CA, 6-7 February 1995`
A11	`01`	`1`		`@1 GARRIS (M. D.)`
A11	`02`	`1`		`@1 BLUE (J. L.)`
A11	`03`	`1`		`@1 CANDELA (G. T.)`
A11	`04`	`1`		`@1 DIMMICK (D. L.)`
A11	`05`	`1`		`@1 GEIST (J.)`
A11	`06`	`1`		`@1 GROTHER (P. J.)`
A11	`07`	`1`		`@1 JANET (S. A.)`
A11	`08`	`1`		`@1 WILSON (C. L.)`
A12	`01`	`1`		`@1 VINCENT (Luc M.) @9 ed.`
A12	`02`	`1`		`@1 BAIRD (Henry S.) @9 ed.`
A14	`01`			`@1 National Institute of Standards and Technology @2 Gaithersburg, Maryland 20899 @3 USA @Z 1 aut. @Z 2 aut. @Z 3 aut. @Z 4 aut. @Z 5 aut. @Z 6 aut. @Z 7 aut. @Z 8 aut.`
A18	`01`	`1`		`@1 International Society for Optical Engineering @2 Bellingham WA @3 USA @9 patr.`
A18	`02`	`1`		`@1 Society for Imaging Science and Technology @2 Springfield VA @3 USA @9 patr.`
A20				`@1 2-14`
A21				`@1 1995`
A23	`01`			`@0 ENG`
A43	`01`			`@1 INIST @2 21760 @5 354000053416650010`
A44				`@0 0000 @1 © 1997 INIST-CNRS. All rights reserved.`
A47	`01`	`1`		`@0 97-0135005`
A60				`@1 P @2 C`
A61				`@0 A`
A64	`01`	`1`		`@0 SPIE proceedings series`
A66	`01`			`@0 USA`
C01	`01`		`ENG`	@0 A public domain document processing system has been developed by the National Institute of Standards and Technology (NIST). The system is a standard reference form-based handprint recognition system for evaluating optical character recognition (OCR), and it is intended to provide a baseline of performance on an open application. The system's source code, training data, performance assessment tools, and type of forms processed are all publicly available. The system recognizes the handprint entered on Handwriting Sample Forms like the ones distributed with NIST Special Database I. From these forms, the system reads hand-printed numeric fields, upper and lowercase alphabetic fields, and unconstrained text paragraphs comprised of words from a limited-size dictionary. The modular design of the system makes it useful for component evaluation and comparison, training and testing set validation, and multiple system voting schemes. The system contains a number of significant contributions to OCR technology, including an optimized Probabilistic Neural Network (PNN) classifier that operates a factor of 20 times faster than traditional software implementations of the algorithm. The source code for the recognition system is written in C and is organized into 11 libraries. In all, there are approximately 19,000 lines of code supporting more than 550 subroutines. Source code is provided for form registration, form removal, field isolation, field segmentation, character normalization, feature extraction, character classification, and dictionary-based postprocessing. The recognition system has been successfully compiled and tested on a host of UNIX workstations including computers manufactured by Digital Equipment Corporation, Hewlett Packard, IBM, Silicon Graphics Incorporated, and Sun Microsystems. 
This paper gives an overview of the recognition system's software architecture, including descriptions of the various system components along with timing and accuracy statistics.
C02	`01`	`X`		`@0 001A01G02A`
C02	`02`	`X`		`@0 205`
C03	`01`	`X`	`FRE`	`@0 Reconnaissance caractère @5 04`
C03	`01`	`X`	`ENG`	`@0 Character recognition @5 04`
C03	`01`	`X`	`SPA`	`@0 Reconocimiento carácter @5 04`
C03	`02`	`X`	`FRE`	`@0 Reconnaissance forme @5 05`
C03	`02`	`X`	`ENG`	`@0 Pattern recognition @5 05`
C03	`02`	`X`	`GER`	`@0 Mustererkennung @5 05`
C03	`02`	`X`	`SPA`	`@0 Reconocimiento patrón @5 05`
C03	`03`	`X`	`FRE`	`@0 Réseau neuronal @5 06`
C03	`03`	`X`	`ENG`	`@0 Neural network @5 06`
C03	`03`	`X`	`SPA`	`@0 Red neuronal @5 06`
C03	`04`	`X`	`FRE`	`@0 Reconnaissance optique caractère @5 07`
C03	`04`	`X`	`ENG`	`@0 Optical character recognition @5 07`
C03	`04`	`X`	`SPA`	`@0 Reconocimiento óptico de caracteres @5 07`
C03	`05`	`X`	`FRE`	`@0 Document @5 11`
C03	`05`	`X`	`ENG`	`@0 Document @5 11`
C03	`05`	`X`	`SPA`	`@0 Documento @5 11`
C03	`06`	`X`	`FRE`	`@0 Ecriture @5 12`
C03	`06`	`X`	`ENG`	`@0 Hand writing @5 12`
C03	`06`	`X`	`SPA`	`@0 Escritura manual @5 12`
C03	`07`	`X`	`FRE`	`@0 Traitement document @5 13`
C03	`07`	`X`	`ENG`	`@0 Document processing @5 13`
C03	`07`	`X`	`SPA`	`@0 Tratamiento documento @5 13`
C03	`08`	`X`	`FRE`	`@0 Domaine public @4 CD @5 96`
C03	`08`	`X`	`ENG`	`@0 Public domain @4 CD @5 96`
C03	`09`	`X`	`FRE`	`@0 Texte manuscrit @4 CD @5 97`
C03	`09`	`X`	`ENG`	`@0 Handwritten text @4 CD @5 97`
N21				`@1 055`

A30	`01`	`1`	`ENG`	`@1 Document recognition. Conference @3 San Jose CA USA @4 1995-02-06`

Inist format (server)

NO :	PASCAL 97-0135005 INIST
ET :	Public domain optical character recognition
AU :	GARRIS (M. D.); BLUE (J. L.); CANDELA (G. T.); DIMMICK (D. L.); GEIST (J.); GROTHER (P. J.); JANET (S. A.); WILSON (C. L.); VINCENT (Luc M.); BAIRD (Henry S.)
AF :	National Institute of Standards and Technology/Gaithersburg, Maryland 20899/Etats-Unis (1 aut., 2 aut., 3 aut., 4 aut., 5 aut., 6 aut., 7 aut., 8 aut.)
DT :	Publication en série; Congrès; Niveau analytique
SO :	SPIE proceedings series; ISSN 1017-2653; Etats-Unis; Da. 1995; Vol. 2422; Pp. 2-14
LA :	Anglais
EA :	A public domain document processing system has been developed by the National Institute of Standards and Technology (NIST). The system is a standard reference form-based handprint recognition system for evaluating optical character recognition (OCR), and it is intended to provide a baseline of performance on an open application. The system's source code, training data, performance assessment tools, and type of forms processed are all publicly available. The system recognizes the handprint entered on Handwriting Sample Forms like the ones distributed with NIST Special Database I. From these forms, the system reads hand-printed numeric fields, upper and lowercase alphabetic fields, and unconstrained text paragraphs comprised of words from a limited-size dictionary. The modular design of the system makes it useful for component evaluation and comparison, training and testing set validation, and multiple system voting schemes. The system contains a number of significant contributions to OCR technology, including an optimized Probabilistic Neural Network (PNN) classifier that operates a factor of 20 times faster than traditional software implementations of the algorithm. The source code for the recognition system is written in C and is organized into 11 libraries. In all, there are approximately 19,000 lines of code supporting more than 550 subroutines. Source code is provided for form registration, form removal, field isolation, field segmentation, character normalization, feature extraction, character classification, and dictionary-based postprocessing. The recognition system has been successfully compiled and tested on a host of UNIX workstations including computers manufactured by Digital Equipment Corporation, Hewlett Packard, IBM, Silicon Graphics Incorporated, and Sun Microsystems. This paper gives an overview of the recognition system's software architecture, including descriptions of the various system components along with timing and accuracy statistics.
CC :	001A01G02A; 205
FD :	Reconnaissance caractère; Reconnaissance forme; Réseau neuronal; Reconnaissance optique caractère; Document; Ecriture; Traitement document; Domaine public; Texte manuscrit
ED :	Character recognition; Pattern recognition; Neural network; Optical character recognition; Document; Hand writing; Document processing; Public domain; Handwritten text
GD :	Mustererkennung
SD :	Reconocimiento carácter; Reconocimiento patrón; Red neuronal; Reconocimiento óptico de caracteres; Documento; Escritura manual; Tratamiento documento
LO :	INIST-21760.354000053416650010
ID :	97-0135005
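
The server format above is a flat list of two-letter tags, each followed by a colon and a value, with multi-valued fields separated by semicolons. A minimal sketch of reading such a record into a dictionary is shown below; the helper names `parse_inist_server` and `split_values` are hypothetical, and the sketch assumes one field per line as displayed on this page.

```python
def parse_inist_server(text):
    """Map two-letter field tags (NO, ET, AU, ...) to their raw string values."""
    record = {}
    for line in text.splitlines():
        tag, sep, value = line.partition(":")
        tag = tag.strip()
        # Keep only lines that look like "XX : value".
        if sep and len(tag) == 2 and tag.isupper():
            record[tag] = value.strip()
    return record

def split_values(value):
    """Split a semicolon-separated field into a list of trimmed items."""
    return [item.strip() for item in value.split(";") if item.strip()]

sample = (
    "NO :\tPASCAL 97-0135005 INIST\n"
    "AU :\tGARRIS (M. D.); BLUE (J. L.)\n"
    "SO :\tSPIE proceedings series; ISSN 1017-2653; Etats-Unis; Da. 1995; Vol. 2422; Pp. 2-14\n"
)
record = parse_inist_server(sample)
authors = split_values(record["AU"])
```

Splitting on the first colon only keeps any later colons inside a value intact.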

Links to Exploration step

Pascal:97-0135005

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Public domain optical character recognition</title>
<author><name sortKey="Garris, M D" sort="Garris, M D" uniqKey="Garris M" first="M. D." last="Garris">M. D. Garris</name>
<affiliation><inist:fA14 i1="01"><s1>National Institute of Standards and Technology</s1>
<s2>Gaithersburg, Maryland 20899</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
<sZ>7 aut.</sZ>
<sZ>8 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Blue, J L" sort="Blue, J L" uniqKey="Blue J" first="J. L." last="Blue">J. L. Blue</name>
<affiliation><inist:fA14 i1="01"><s1>National Institute of Standards and Technology</s1>
<s2>Gaithersburg, Maryland 20899</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
<sZ>7 aut.</sZ>
<sZ>8 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Candela, G T" sort="Candela, G T" uniqKey="Candela G" first="G. T." last="Candela">G. T. Candela</name>
<affiliation><inist:fA14 i1="01"><s1>National Institute of Standards and Technology</s1>
<s2>Gaithersburg, Maryland 20899</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
<sZ>7 aut.</sZ>
<sZ>8 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Dimmick, D L" sort="Dimmick, D L" uniqKey="Dimmick D" first="D. L." last="Dimmick">D. L. Dimmick</name>
<affiliation><inist:fA14 i1="01"><s1>National Institute of Standards and Technology</s1>
<s2>Gaithersburg, Maryland 20899</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
<sZ>7 aut.</sZ>
<sZ>8 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Geist, J" sort="Geist, J" uniqKey="Geist J" first="J." last="Geist">J. Geist</name>
<affiliation><inist:fA14 i1="01"><s1>National Institute of Standards and Technology</s1>
<s2>Gaithersburg, Maryland 20899</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
<sZ>7 aut.</sZ>
<sZ>8 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Grother, P J" sort="Grother, P J" uniqKey="Grother P" first="P. J." last="Grother">P. J. Grother</name>
<affiliation><inist:fA14 i1="01"><s1>National Institute of Standards and Technology</s1>
<s2>Gaithersburg, Maryland 20899</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
<sZ>7 aut.</sZ>
<sZ>8 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Janet, S A" sort="Janet, S A" uniqKey="Janet S" first="S. A." last="Janet">S. A. Janet</name>
<affiliation><inist:fA14 i1="01"><s1>National Institute of Standards and Technology</s1>
<s2>Gaithersburg, Maryland 20899</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
<sZ>7 aut.</sZ>
<sZ>8 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Wilson, C L" sort="Wilson, C L" uniqKey="Wilson C" first="C. L." last="Wilson">C. L. Wilson</name>
<affiliation><inist:fA14 i1="01"><s1>National Institute of Standards and Technology</s1>
<s2>Gaithersburg, Maryland 20899</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
<sZ>7 aut.</sZ>
<sZ>8 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">97-0135005</idno>
<date when="1995">1995</date>
<idno type="stanalyst">PASCAL 97-0135005 INIST</idno>
<idno type="RBID">Pascal:97-0135005</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000953</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Public domain optical character recognition</title>
<author><name sortKey="Garris, M D" sort="Garris, M D" uniqKey="Garris M" first="M. D." last="Garris">M. D. Garris</name>
<affiliation><inist:fA14 i1="01"><s1>National Institute of Standards and Technology</s1>
<s2>Gaithersburg, Maryland 20899</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
<sZ>7 aut.</sZ>
<sZ>8 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Blue, J L" sort="Blue, J L" uniqKey="Blue J" first="J. L." last="Blue">J. L. Blue</name>
<affiliation><inist:fA14 i1="01"><s1>National Institute of Standards and Technology</s1>
<s2>Gaithersburg, Maryland 20899</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
<sZ>7 aut.</sZ>
<sZ>8 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Candela, G T" sort="Candela, G T" uniqKey="Candela G" first="G. T." last="Candela">G. T. Candela</name>
<affiliation><inist:fA14 i1="01"><s1>National Institute of Standards and Technology</s1>
<s2>Gaithersburg, Maryland 20899</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
<sZ>7 aut.</sZ>
<sZ>8 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Dimmick, D L" sort="Dimmick, D L" uniqKey="Dimmick D" first="D. L." last="Dimmick">D. L. Dimmick</name>
<affiliation><inist:fA14 i1="01"><s1>National Institute of Standards and Technology</s1>
<s2>Gaithersburg, Maryland 20899</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
<sZ>7 aut.</sZ>
<sZ>8 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Geist, J" sort="Geist, J" uniqKey="Geist J" first="J." last="Geist">J. Geist</name>
<affiliation><inist:fA14 i1="01"><s1>National Institute of Standards and Technology</s1>
<s2>Gaithersburg, Maryland 20899</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
<sZ>7 aut.</sZ>
<sZ>8 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Grother, P J" sort="Grother, P J" uniqKey="Grother P" first="P. J." last="Grother">P. J. Grother</name>
<affiliation><inist:fA14 i1="01"><s1>National Institute of Standards and Technology</s1>
<s2>Gaithersburg, Maryland 20899</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
<sZ>7 aut.</sZ>
<sZ>8 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Janet, S A" sort="Janet, S A" uniqKey="Janet S" first="S. A." last="Janet">S. A. Janet</name>
<affiliation><inist:fA14 i1="01"><s1>National Institute of Standards and Technology</s1>
<s2>Gaithersburg, Maryland 20899</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
<sZ>7 aut.</sZ>
<sZ>8 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Wilson, C L" sort="Wilson, C L" uniqKey="Wilson C" first="C. L." last="Wilson">C. L. Wilson</name>
<affiliation><inist:fA14 i1="01"><s1>National Institute of Standards and Technology</s1>
<s2>Gaithersburg, Maryland 20899</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
<sZ>7 aut.</sZ>
<sZ>8 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">SPIE proceedings series</title>
<idno type="ISSN">1017-2653</idno>
<imprint><date when="1995">1995</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">SPIE proceedings series</title>
<idno type="ISSN">1017-2653</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Character recognition</term>
<term>Document</term>
<term>Document processing</term>
<term>Hand writing</term>
<term>Handwritten text</term>
<term>Neural network</term>
<term>Optical character recognition</term>
<term>Pattern recognition</term>
<term>Public domain</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Reconnaissance caractère</term>
<term>Reconnaissance forme</term>
<term>Réseau neuronal</term>
<term>Reconnaissance optique caractère</term>
<term>Document</term>
<term>Ecriture</term>
<term>Traitement document</term>
<term>Domaine public</term>
<term>Texte manuscrit</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">A public domain document processing system has been developed by the National Institute of Standards and Technology (NIST). The system is a standard reference form-based handprint recognition system for evaluating optical character recognition (OCR), and it is intended to provide a baseline of performance on an open application. The system's source code, training data, performance assessment tools, and type of forms processed are all publicly available. The system recognizes the handprint entered on Handwriting Sample Forms like the ones distributed with NIST Special Database I. From these forms, the system reads hand-printed numeric fields, upper and lowercase alphabetic fields, and unconstrained text paragraphs comprised of words from a limited-size dictionary. The modular design of the system makes it useful for component evaluation and comparison, training and testing set validation, and multiple system voting schemes. The system contains a number of significant contributions to OCR technology, including an optimized Probabilistic Neural Network (PNN) classifier that operates a factor of 20 times faster than traditional software implementations of the algorithm. The source code for the recognition system is written in C and is organized into 11 libraries. In all, there are approximately 19,000 lines of code supporting more than 550 subroutines. Source code is provided for form registration, form removal, field isolation, field segmentation, character normalization, feature extraction, character classification, and dictionary-based postprocessing. The recognition system has been successfully compiled and tested on a host of UNIX workstations including computers manufactured by Digital Equipment Corporation, Hewlett Packard, IBM, Silicon Graphics Incorporated, and Sun Microsystems. 
This paper gives an overview of the recognition system's software architecture, including descriptions of the various system components along with timing and accuracy statistics.</div>
</front>
</TEI>
<inist><standard h6="B"><pA><fA01 i1="01" i2="1"><s0>1017-2653</s0>
</fA01>
<fA05><s2>2422</s2>
</fA05>
<fA08 i1="01" i2="1" l="ENG"><s1>Public domain optical character recognition</s1>
</fA08>
<fA09 i1="01" i2="1" l="ENG"><s1>Document recognition II : San Jose CA, 6-7 February 1995</s1>
</fA09>
<fA11 i1="01" i2="1"><s1>GARRIS (M. D.)</s1>
</fA11>
<fA11 i1="02" i2="1"><s1>BLUE (J. L.)</s1>
</fA11>
<fA11 i1="03" i2="1"><s1>CANDELA (G. T.)</s1>
</fA11>
<fA11 i1="04" i2="1"><s1>DIMMICK (D. L.)</s1>
</fA11>
<fA11 i1="05" i2="1"><s1>GEIST (J.)</s1>
</fA11>
<fA11 i1="06" i2="1"><s1>GROTHER (P. J.)</s1>
</fA11>
<fA11 i1="07" i2="1"><s1>JANET (S. A.)</s1>
</fA11>
<fA11 i1="08" i2="1"><s1>WILSON (C. L.)</s1>
</fA11>
<fA12 i1="01" i2="1"><s1>VINCENT (Luc M.)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="02" i2="1"><s1>BAIRD (Henry S.)</s1>
<s9>ed.</s9>
</fA12>
<fA14 i1="01"><s1>National Institute of Standards and Technology</s1>
<s2>Gaithersburg, Maryland 20899</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
<sZ>6 aut.</sZ>
<sZ>7 aut.</sZ>
<sZ>8 aut.</sZ>
</fA14>
<fA18 i1="01" i2="1"><s1>International Society for Optical Engineering</s1>
<s2>Bellingham WA</s2>
<s3>USA</s3>
<s9>patr.</s9>
</fA18>
<fA18 i1="02" i2="1"><s1>Society for Imaging Science and Technology</s1>
<s2>Springfield VA</s2>
<s3>USA</s3>
<s9>patr.</s9>
</fA18>
<fA20><s1>2-14</s1>
</fA20>
<fA21><s1>1995</s1>
</fA21>
<fA23 i1="01"><s0>ENG</s0>
</fA23>
<fA43 i1="01"><s1>INIST</s1>
<s2>21760</s2>
<s5>354000053416650010</s5>
</fA43>
<fA44><s0>0000</s0>
<s1>© 1997 INIST-CNRS. All rights reserved.</s1>
</fA44>
<fA47 i1="01" i2="1"><s0>97-0135005</s0>
</fA47>
<fA60><s1>P</s1>
<s2>C</s2>
</fA60>
<fA61><s0>A</s0>
</fA61>
<fA64 i1="01" i2="1"><s0>SPIE proceedings series</s0>
</fA64>
<fA66 i1="01"><s0>USA</s0>
</fA66>
<fC01 i1="01" l="ENG"><s0>A public domain document processing system has been developed by the National Institute of Standards and Technology (NIST). The system is a standard reference form-based handprint recognition system for evaluating optical character recognition (OCR), and it is intended to provide a baseline of performance on an open application. The system's source code, training data, performance assessment tools, and type of forms processed are all publicly available. The system recognizes the handprint entered on Handwriting Sample Forms like the ones distributed with NIST Special Database I. From these forms, the system reads hand-printed numeric fields, upper and lowercase alphabetic fields, and unconstrained text paragraphs comprised of words from a limited-size dictionary. The modular design of the system makes it useful for component evaluation and comparison, training and testing set validation, and multiple system voting schemes. The system contains a number of significant contributions to OCR technology, including an optimized Probabilistic Neural Network (PNN) classifier that operates a factor of 20 times faster than traditional software implementations of the algorithm. The source code for the recognition system is written in C and is organized into 11 libraries. In all, there are approximately 19,000 lines of code supporting more than 550 subroutines. Source code is provided for form registration, form removal, field isolation, field segmentation, character normalization, feature extraction, character classification, and dictionary-based postprocessing. The recognition system has been successfully compiled and tested on a host of UNIX workstations including computers manufactured by Digital Equipment Corporation, Hewlett Packard, IBM, Silicon Graphics Incorporated, and Sun Microsystems. 
This paper gives an overview of the recognition system's software architecture, including descriptions of the various system components along with timing and accuracy statistics.</s0>
</fC01>
<fC02 i1="01" i2="X"><s0>001A01G02A</s0>
</fC02>
<fC02 i1="02" i2="X"><s0>205</s0>
</fC02>
<fC03 i1="01" i2="X" l="FRE"><s0>Reconnaissance caractère</s0>
<s5>04</s5>
</fC03>
<fC03 i1="01" i2="X" l="ENG"><s0>Character recognition</s0>
<s5>04</s5>
</fC03>
<fC03 i1="01" i2="X" l="SPA"><s0>Reconocimiento carácter</s0>
<s5>04</s5>
</fC03>
<fC03 i1="02" i2="X" l="FRE"><s0>Reconnaissance forme</s0>
<s5>05</s5>
</fC03>
<fC03 i1="02" i2="X" l="ENG"><s0>Pattern recognition</s0>
<s5>05</s5>
</fC03>
<fC03 i1="02" i2="X" l="GER"><s0>Mustererkennung</s0>
<s5>05</s5>
</fC03>
<fC03 i1="02" i2="X" l="SPA"><s0>Reconocimiento patrón</s0>
<s5>05</s5>
</fC03>
<fC03 i1="03" i2="X" l="FRE"><s0>Réseau neuronal</s0>
<s5>06</s5>
</fC03>
<fC03 i1="03" i2="X" l="ENG"><s0>Neural network</s0>
<s5>06</s5>
</fC03>
<fC03 i1="03" i2="X" l="SPA"><s0>Red neuronal</s0>
<s5>06</s5>
</fC03>
<fC03 i1="04" i2="X" l="FRE"><s0>Reconnaissance optique caractère</s0>
<s5>07</s5>
</fC03>
<fC03 i1="04" i2="X" l="ENG"><s0>Optical character recognition</s0>
<s5>07</s5>
</fC03>
<fC03 i1="04" i2="X" l="SPA"><s0>Reconocimiento óptico de caracteres</s0>
<s5>07</s5>
</fC03>
<fC03 i1="05" i2="X" l="FRE"><s0>Document</s0>
<s5>11</s5>
</fC03>
<fC03 i1="05" i2="X" l="ENG"><s0>Document</s0>
<s5>11</s5>
</fC03>
<fC03 i1="05" i2="X" l="SPA"><s0>Documento</s0>
<s5>11</s5>
</fC03>
<fC03 i1="06" i2="X" l="FRE"><s0>Ecriture</s0>
<s5>12</s5>
</fC03>
<fC03 i1="06" i2="X" l="ENG"><s0>Hand writing</s0>
<s5>12</s5>
</fC03>
<fC03 i1="06" i2="X" l="SPA"><s0>Escritura manual</s0>
<s5>12</s5>
</fC03>
<fC03 i1="07" i2="X" l="FRE"><s0>Traitement document</s0>
<s5>13</s5>
</fC03>
<fC03 i1="07" i2="X" l="ENG"><s0>Document processing</s0>
<s5>13</s5>
</fC03>
<fC03 i1="07" i2="X" l="SPA"><s0>Tratamiento documento</s0>
<s5>13</s5>
</fC03>
<fC03 i1="08" i2="X" l="FRE"><s0>Domaine public</s0>
<s4>CD</s4>
<s5>96</s5>
</fC03>
<fC03 i1="08" i2="X" l="ENG"><s0>Public domain</s0>
<s4>CD</s4>
<s5>96</s5>
</fC03>
<fC03 i1="09" i2="X" l="FRE"><s0>Texte manuscrit</s0>
<s4>CD</s4>
<s5>97</s5>
</fC03>
<fC03 i1="09" i2="X" l="ENG"><s0>Handwritten text</s0>
<s4>CD</s4>
<s5>97</s5>
</fC03>
<fN21><s1>055</s1>
</fN21>
</pA>
<pR><fA30 i1="01" i2="1" l="ENG"><s1>Document recognition. Conference</s1>
<s3>San Jose CA USA</s3>
<s4>1995-02-06</s4>
</fA30>
</pR>
</standard>
<server><NO>PASCAL 97-0135005 INIST</NO>
<ET>Public domain optical character recognition</ET>
<AU>GARRIS (M. D.); BLUE (J. L.); CANDELA (G. T.); DIMMICK (D. L.); GEIST (J.); GROTHER (P. J.); JANET (S. A.); WILSON (C. L.); VINCENT (Luc M.); BAIRD (Henry S.)</AU>
<AF>National Institute of Standards and Technology/Gaithersburg, Maryland 20899/Etats-Unis (1 aut., 2 aut., 3 aut., 4 aut., 5 aut., 6 aut., 7 aut., 8 aut.)</AF>
<DT>Publication en série; Congrès; Niveau analytique</DT>
<SO>SPIE proceedings series; ISSN 1017-2653; Etats-Unis; Da. 1995; Vol. 2422; Pp. 2-14</SO>
<LA>Anglais</LA>
<EA>A public domain document processing system has been developed by the National Institute of Standards and Technology (NIST). The system is a standard reference form-based handprint recognition system for evaluating optical character recognition (OCR), and it is intended to provide a baseline of performance on an open application. The system's source code, training data, performance assessment tools, and type of forms processed are all publicly available. The system recognizes the handprint entered on Handwriting Sample Forms like the ones distributed with NIST Special Database I. From these forms, the system reads hand-printed numeric fields, upper and lowercase alphabetic fields, and unconstrained text paragraphs comprised of words from a limited-size dictionary. The modular design of the system makes it useful for component evaluation and comparison, training and testing set validation, and multiple system voting schemes. The system contains a number of significant contributions to OCR technology, including an optimized Probabilistic Neural Network (PNN) classifier that operates a factor of 20 times faster than traditional software implementations of the algorithm. The source code for the recognition system is written in C and is organized into 11 libraries. In all, there are approximately 19,000 lines of code supporting more than 550 subroutines. Source code is provided for form registration, form removal, field isolation, field segmentation, character normalization, feature extraction, character classification, and dictionary-based postprocessing. The recognition system has been successfully compiled and tested on a host of UNIX workstations including computers manufactured by Digital Equipment Corporation, Hewlett Packard, IBM, Silicon Graphics Incorporated, and Sun Microsystems. 
This paper gives an overview of the recognition system's software architecture, including descriptions of the various system components along with timing and accuracy statistics.</EA>
<CC>001A01G02A; 205</CC>
<FD>Reconnaissance caractère; Reconnaissance forme; Réseau neuronal; Reconnaissance optique caractère; Document; Ecriture; Traitement document; Domaine public; Texte manuscrit</FD>
<ED>Character recognition; Pattern recognition; Neural network; Optical character recognition; Document; Hand writing; Document processing; Public domain; Handwritten text</ED>
<GD>Mustererkennung</GD>
<SD>Reconocimiento carácter; Reconocimiento patrón; Red neuronal; Reconocimiento óptico de caracteres; Documento; Escritura manual; Tratamiento documento</SD>
<LO>INIST-21760.354000053416650010</LO>
<ID>97-0135005</ID>
</server>
</inist>
</record>
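
As a hedged illustration of reading fields back out of the XML above, the sketch below parses a shortened copy of the `<server>` element with Python's standard `xml.etree.ElementTree`. It deliberately works on that fragment only: the full record uses an `inist:` prefix with no namespace declaration shown on this page, which a strict XML parser would reject as an unbound prefix. The function name `server_fields` is an assumption for this example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the <server> element from the record above.
SERVER_FRAGMENT = """<server><NO>PASCAL 97-0135005 INIST</NO>
<ET>Public domain optical character recognition</ET>
<SO>SPIE proceedings series; ISSN 1017-2653; Etats-Unis; Da. 1995; Vol. 2422; Pp. 2-14</SO>
<ID>97-0135005</ID>
</server>"""

def server_fields(xml_text):
    """Return a dict mapping each child tag of <server> to its text content."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}

fields = server_fields(SERVER_FRAGMENT)
```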

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/PascalFrancis/Corpus

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000953 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/PascalFrancis/Corpus/biblio.hfd -nk 000953 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    PascalFrancis
   |étape=   Corpus
   |type=    RBID
   |clé=     Pascal:97-0135005
   |texte=   Public domain optical character recognition
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

	OCR exploration server
	Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.