OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Performance evaluation for document analysis

Internal identifier: 000127 (Istex/Corpus); previous: 000126; next: 000128

Author: Jonathan J. Hull

RBID : ISTEX:4148AFEA65A3D5C80768F42F0DA929541A59586F

Abstract

A framework for evaluating the performance of a document analysis system is presented. This framework takes into account the task definition for the document analysis system, a data base on which that system is evaluated, the metrics used to evaluate performance, and the generalization of the results achieved beyond the confines of the test. Several recent significant efforts in evaluating document analysis systems are surveyed. How these efforts fit the general framework is discussed. The specific task that was evaluated, the data base used for the evaluation, and the generalization of the derived performance is presented. Most of these projects were designed for limited applications in which the translation of images of text into ASCII was the primary consideration. However, this is only part of what a document analysis system must often calculate. Other, less easily measured tasks, such as the subdivision of a document image into zones that represent regions of graphics, photographs, and text, must also be performed. Generally accepted solutions for measuring the performance of such tasks often do not exist. Several of them are mentioned as areas for future research. © 1996 John Wiley & Sons, Inc.

DOI: 10.1002/(SICI)1098-1098(199624)7:4<357::AID-IMA10>3.0.CO;2-T

Links to Exploration step

ISTEX:4148AFEA65A3D5C80768F42F0DA929541A59586F

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Performance evaluation for document analysis</title>
<author>
<name sortKey="Hull, Jonathan J" sort="Hull, Jonathan J" uniqKey="Hull J" first="Jonathan J." last="Hull">Jonathan J. Hull</name>
<affiliation>
<mods:affiliation>Ricoh California Research Center, 2882 Sand Hill Road, Suite 115, Menlo Park, CA 94025</mods:affiliation>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:4148AFEA65A3D5C80768F42F0DA929541A59586F</idno>
<date when="1996" year="1996">1996</date>
<idno type="doi">10.1002/(SICI)1098-1098(199624)7:4<357::AID-IMA10>3.0.CO;2-T</idno>
<idno type="url">https://api.istex.fr/document/4148AFEA65A3D5C80768F42F0DA929541A59586F/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000127</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Performance evaluation for document analysis</title>
<author>
<name sortKey="Hull, Jonathan J" sort="Hull, Jonathan J" uniqKey="Hull J" first="Jonathan J." last="Hull">Jonathan J. Hull</name>
<affiliation>
<mods:affiliation>Ricoh California Research Center, 2882 Sand Hill Road, Suite 115, Menlo Park, CA 94025</mods:affiliation>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">International Journal of Imaging Systems and Technology</title>
<title level="j" type="abbrev">Int. J. Imaging Syst. Technol.</title>
<idno type="ISSN">0899-9457</idno>
<idno type="eISSN">1098-1098</idno>
<imprint>
<publisher>Wiley Subscription Services, Inc., A Wiley Company</publisher>
<pubPlace>Hoboken</pubPlace>
<date type="published" when="1996-12">1996-12</date>
<biblScope unit="volume">7</biblScope>
<biblScope unit="issue">4</biblScope>
<biblScope unit="page" from="357">357</biblScope>
<biblScope unit="page" to="362">362</biblScope>
</imprint>
<idno type="ISSN">0899-9457</idno>
</series>
<idno type="istex">4148AFEA65A3D5C80768F42F0DA929541A59586F</idno>
<idno type="DOI">10.1002/(SICI)1098-1098(199624)7:4<357::AID-IMA10>3.0.CO;2-T</idno>
<idno type="ArticleID">IMA10</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0899-9457</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">A framework for evaluating the performance of a document analysis system is presented. This framework takes into account the task definition for the document analysis system, a data base on which that system is evaluated, the metrics used to evaluate performance, and the generalization of the results achieved beyond the confines of the test. Several recent significant efforts in evaluating document analysis systems are surveyed. How these efforts fit the general framework is discussed. The specific task that was evaluated, the data base used for the evaluation, and the generalization of the derived performance is presented. Most of these projects were designed for limited applications in which the translation of images of text into ASCII was the primary consideration. However, this is only part of what a document analysis system must often calculate. Other, less easily measured tasks, such as the subdivision of a document image into zones that represent regions of graphics, photographs, and text, must also be performed. Generally accepted solutions for measuring the performance of such tasks often do not exist. Several of them are mentioned as areas for future research. © 1996 John Wiley & Sons, Inc.</div>
</front>
</TEI>
<istex>
<corpusName>wiley</corpusName>
<author>
<json:item>
<name>Jonathan J. Hull</name>
<affiliations>
<json:string>Ricoh California Research Center, 2882 Sand Hill Road, Suite 115, Menlo Park, CA 94025</json:string>
</affiliations>
</json:item>
</author>
<articleId>
<json:string>IMA10</json:string>
</articleId>
<language>
<json:string>eng</json:string>
</language>
<abstract>A framework for evaluating the performance of a document analysis system is presented. This framework takes into account the task definition for the document analysis system, a data base on which that system is evaluated, the metrics used to evaluate performance, and the generalization of the results achieved beyond the confines of the test. Several recent significant efforts in evaluating document analysis systems are surveyed. How these efforts fit the general framework is discussed. The specific task that was evaluated, the data base used for the evaluation, and the generalization of the derived performance is presented. Most of these projects were designed for limited applications in which the translation of images of text into ASCII was the primary consideration. However, this is only part of what a document analysis system must often calculate. Other, less easily measured tasks, such as the subdivision of a document image into zones that represent regions of graphics, photographs, and text, must also be performed. Generally accepted solutions for measuring the performance of such tasks often do not exist. Several of them are mentioned as areas for future research. © 1996 John Wiley & Sons, Inc.</abstract>
<qualityIndicators>
<score>7.004</score>
<pdfVersion>1.3</pdfVersion>
<pdfPageSize>612 x 792 pts (letter)</pdfPageSize>
<refBibsNative>true</refBibsNative>
<keywordCount>0</keywordCount>
<abstractCharCount>1219</abstractCharCount>
<pdfWordCount>4712</pdfWordCount>
<pdfCharCount>27577</pdfCharCount>
<pdfPageCount>6</pdfPageCount>
<abstractWordCount>191</abstractWordCount>
</qualityIndicators>
<title>Performance evaluation for document analysis</title>
<genre.original>
<json:string>article</json:string>
</genre.original>
<genre>
<json:string>article</json:string>
</genre>
<host>
<volume>7</volume>
<publisherId>
<json:string>IMA</json:string>
</publisherId>
<pages>
<total>6</total>
<last>362</last>
<first>357</first>
</pages>
<issn>
<json:string>0899-9457</json:string>
</issn>
<issue>4</issue>
<genre>
<json:string>journal</json:string>
</genre>
<language>
<json:string>unknown</json:string>
</language>
<eissn>
<json:string>1098-1098</json:string>
</eissn>
<title>International Journal of Imaging Systems and Technology</title>
<doi>
<json:string>10.1002/(ISSN)1098-1098</json:string>
</doi>
</host>
<publicationDate>1996</publicationDate>
<copyrightDate>1996</copyrightDate>
<doi>
<json:string>10.1002/(SICI)1098-1098(199624)7:4<357::AID-IMA10>3.0.CO;2-T</json:string>
</doi>
<id>4148AFEA65A3D5C80768F42F0DA929541A59586F</id>
<fulltext>
<json:item>
<original>true</original>
<mimetype>application/pdf</mimetype>
<extension>pdf</extension>
<uri>https://api.istex.fr/document/4148AFEA65A3D5C80768F42F0DA929541A59586F/fulltext/pdf</uri>
</json:item>
<json:item>
<original>false</original>
<mimetype>application/zip</mimetype>
<extension>zip</extension>
<uri>https://api.istex.fr/document/4148AFEA65A3D5C80768F42F0DA929541A59586F/fulltext/zip</uri>
</json:item>
<istex:fulltextTEI uri="https://api.istex.fr/document/4148AFEA65A3D5C80768F42F0DA929541A59586F/fulltext/tei">
<teiHeader>
<fileDesc>
<titleStmt>
<title level="a" type="main" xml:lang="en">Performance evaluation for document analysis</title>
</titleStmt>
<publicationStmt>
<authority>ISTEX</authority>
<publisher>Wiley Subscription Services, Inc., A Wiley Company</publisher>
<pubPlace>Hoboken</pubPlace>
<availability>
<p>WILEY</p>
</availability>
<date>1996</date>
</publicationStmt>
<sourceDesc>
<biblStruct type="inbook">
<analytic>
<title level="a" type="main" xml:lang="en">Performance evaluation for document analysis</title>
<author>
<persName>
<forename type="first">Jonathan J.</forename>
<surname>Hull</surname>
</persName>
<note type="correspondence">
<p>Correspondence: Ricoh California Research Center, 2882 Sand Hill Road, Suite 115, Menlo Park, CA 94025</p>
</note>
<affiliation>Ricoh California Research Center, 2882 Sand Hill Road, Suite 115, Menlo Park, CA 94025</affiliation>
</author>
</analytic>
<monogr>
<title level="j">International Journal of Imaging Systems and Technology</title>
<title level="j" type="abbrev">Int. J. Imaging Syst. Technol.</title>
<idno type="pISSN">0899-9457</idno>
<idno type="eISSN">1098-1098</idno>
<idno type="DOI">10.1002/(ISSN)1098-1098</idno>
<imprint>
<publisher>Wiley Subscription Services, Inc., A Wiley Company</publisher>
<pubPlace>Hoboken</pubPlace>
<date type="published" when="1996-12"></date>
<biblScope unit="volume">7</biblScope>
<biblScope unit="issue">4</biblScope>
<biblScope unit="page" from="357">357</biblScope>
<biblScope unit="page" to="362">362</biblScope>
</imprint>
</monogr>
<idno type="istex">4148AFEA65A3D5C80768F42F0DA929541A59586F</idno>
<idno type="DOI">10.1002/(SICI)1098-1098(199624)7:4<357::AID-IMA10>3.0.CO;2-T</idno>
<idno type="ArticleID">IMA10</idno>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<creation>
<date>1996</date>
</creation>
<langUsage>
<language ident="en">en</language>
</langUsage>
<abstract xml:lang="en">
<p>A framework for evaluating the performance of a document analysis system is presented. This framework takes into account the task definition for the document analysis system, a data base on which that system is evaluated, the metrics used to evaluate performance, and the generalization of the results achieved beyond the confines of the test. Several recent significant efforts in evaluating document analysis systems are surveyed. How these efforts fit the general framework is discussed. The specific task that was evaluated, the data base used for the evaluation, and the generalization of the derived performance is presented. Most of these projects were designed for limited applications in which the translation of images of text into ASCII was the primary consideration. However, this is only part of what a document analysis system must often calculate. Other, less easily measured tasks, such as the subdivision of a document image into zones that represent regions of graphics, photographs, and text, must also be performed. Generally accepted solutions for measuring the performance of such tasks often do not exist. Several of them are mentioned as areas for future research. © 1996 John Wiley & Sons, Inc.</p>
</abstract>
</profileDesc>
<revisionDesc>
<change when="1996-12">Published</change>
</revisionDesc>
</teiHeader>
</istex:fulltextTEI>
<json:item>
<original>false</original>
<mimetype>text/plain</mimetype>
<extension>txt</extension>
<uri>https://api.istex.fr/document/4148AFEA65A3D5C80768F42F0DA929541A59586F/fulltext/txt</uri>
</json:item>
</fulltext>
<metadata>
<istex:metadataXml wicri:clean="Wiley, elements deleted: body">
<istex:xmlDeclaration>version="1.0" encoding="UTF-8" standalone="yes"</istex:xmlDeclaration>
<istex:document>
<component version="2.0" type="serialArticle" xml:lang="en">
<header>
<publicationMeta level="product">
<publisherInfo>
<publisherName>Wiley Subscription Services, Inc., A Wiley Company</publisherName>
<publisherLoc>Hoboken</publisherLoc>
</publisherInfo>
<doi registered="yes">10.1002/(ISSN)1098-1098</doi>
<issn type="print">0899-9457</issn>
<issn type="electronic">1098-1098</issn>
<idGroup>
<id type="product" value="IMA"></id>
</idGroup>
<titleGroup>
<title type="main" xml:lang="en" sort="INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY">International Journal of Imaging Systems and Technology</title>
<title type="short">Int. J. Imaging Syst. Technol.</title>
</titleGroup>
</publicationMeta>
<publicationMeta level="part" position="40">
<doi origin="wiley" registered="yes">10.1002/(SICI)1098-1098(199624)7:4<>1.0.CO;2-4</doi>
<numberingGroup>
<numbering type="journalVolume" number="7">7</numbering>
<numbering type="journalIssue">4</numbering>
</numberingGroup>
<coverDate startDate="1996-12">Winter 1996</coverDate>
</publicationMeta>
<publicationMeta level="unit" type="article" position="110" status="forIssue">
<doi origin="wiley" registered="yes">10.1002/(SICI)1098-1098(199624)7:4<357::AID-IMA10>3.0.CO;2-T</doi>
<idGroup>
<id type="unit" value="IMA10"></id>
</idGroup>
<countGroup>
<count type="pageTotal" number="6"></count>
</countGroup>
<copyright ownership="publisher">Copyright © 1996 John Wiley & Sons, Inc.</copyright>
<eventGroup>
<event type="manuscriptRevised" date="1996-05-29"></event>
<event type="firstOnline" date="1998-12-07"></event>
<event type="publishedOnlineFinalForm" date="1998-12-07"></event>
<event type="xmlConverted" agent="Converter:JWSART34_TO_WML3G version:2.3.2 mode:FullText source:HeaderRef result:HeaderRef" date="2010-03-04"></event>
<event type="xmlConverted" agent="Converter:WILEY_ML3G_TO_WILEY_ML3GV2 version:3.8.8" date="2014-01-28"></event>
<event type="xmlConverted" agent="Converter:WML3G_To_WML3G version:4.1.7 mode:FullText,remove_FC" date="2014-10-23"></event>
</eventGroup>
<numberingGroup>
<numbering type="pageFirst">357</numbering>
<numbering type="pageLast">362</numbering>
</numberingGroup>
<correspondenceTo>Ricoh California Research Center, 2882 Sand Hill Road, Suite 115, Menlo Park, CA 94025</correspondenceTo>
<linkGroup>
<link type="toTypesetVersion" href="file:IMA.IMA10.pdf"></link>
</linkGroup>
</publicationMeta>
<contentMeta>
<countGroup>
<count type="figureTotal" number="4"></count>
<count type="tableTotal" number="1"></count>
<count type="referenceTotal" number="14"></count>
</countGroup>
<titleGroup>
<title type="main" xml:lang="en">Performance evaluation for document analysis</title>
</titleGroup>
<creators>
<creator xml:id="au1" creatorRole="author" affiliationRef="#af1" corresponding="yes">
<personName>
<givenNames>Jonathan J.</givenNames>
<familyName>Hull</familyName>
</personName>
</creator>
</creators>
<affiliationGroup>
<affiliation xml:id="af1" countryCode="US" type="organization">
<unparsedAffiliation>Ricoh California Research Center, 2882 Sand Hill Road, Suite 115, Menlo Park, CA 94025</unparsedAffiliation>
</affiliation>
</affiliationGroup>
<abstractGroup>
<abstract type="main" xml:lang="en">
<title type="main">Abstract</title>
<p>A framework for evaluating the performance of a document analysis system is presented. This framework takes into account the task definition for the document analysis system, a data base on which that system is evaluated, the metrics used to evaluate performance, and the generalization of the results achieved beyond the confines of the test. Several recent significant efforts in evaluating document analysis systems are surveyed. How these efforts fit the general framework is discussed. The specific task that was evaluated, the data base used for the evaluation, and the generalization of the derived performance is presented. Most of these projects were designed for limited applications in which the translation of images of text into ASCII was the primary consideration. However, this is only part of what a document analysis system must often calculate. Other, less easily measured tasks, such as the subdivision of a document image into zones that represent regions of graphics, photographs, and text, must also be performed. Generally accepted solutions for measuring the performance of such tasks often do not exist. Several of them are mentioned as areas for future research. © 1996 John Wiley & Sons, Inc.</p>
</abstract>
</abstractGroup>
</contentMeta>
</header>
</component>
</istex:document>
</istex:metadataXml>
<mods version="3.6">
<titleInfo lang="en">
<title>Performance evaluation for document analysis</title>
</titleInfo>
<titleInfo type="alternative" contentType="CDATA" lang="en">
<title>Performance evaluation for document analysis</title>
</titleInfo>
<name type="personal">
<namePart type="given">Jonathan J.</namePart>
<namePart type="family">Hull</namePart>
<affiliation>Ricoh California Research Center, 2882 Sand Hill Road, Suite 115, Menlo Park, CA 94025</affiliation>
<description>Correspondence: Ricoh California Research Center, 2882 Sand Hill Road, Suite 115, Menlo Park, CA 94025</description>
<role>
<roleTerm type="text">author</roleTerm>
</role>
</name>
<typeOfResource>text</typeOfResource>
<genre type="article" displayLabel="article"></genre>
<originInfo>
<publisher>Wiley Subscription Services, Inc., A Wiley Company</publisher>
<place>
<placeTerm type="text">Hoboken</placeTerm>
</place>
<dateIssued encoding="w3cdtf">1996-12</dateIssued>
<copyrightDate encoding="w3cdtf">1996</copyrightDate>
</originInfo>
<language>
<languageTerm type="code" authority="rfc3066">en</languageTerm>
<languageTerm type="code" authority="iso639-2b">eng</languageTerm>
</language>
<physicalDescription>
<internetMediaType>text/html</internetMediaType>
<extent unit="figures">4</extent>
<extent unit="tables">1</extent>
<extent unit="references">14</extent>
</physicalDescription>
<abstract lang="en">A framework for evaluating the performance of a document analysis system is presented. This framework takes into account the task definition for the document analysis system, a data base on which that system is evaluated, the metrics used to evaluate performance, and the generalization of the results achieved beyond the confines of the test. Several recent significant efforts in evaluating document analysis systems are surveyed. How these efforts fit the general framework is discussed. The specific task that was evaluated, the data base used for the evaluation, and the generalization of the derived performance is presented. Most of these projects were designed for limited applications in which the translation of images of text into ASCII was the primary consideration. However, this is only part of what a document analysis system must often calculate. Other, less easily measured tasks, such as the subdivision of a document image into zones that represent regions of graphics, photographs, and text, must also be performed. Generally accepted solutions for measuring the performance of such tasks often do not exist. Several of them are mentioned as areas for future research. © 1996 John Wiley & Sons, Inc.</abstract>
<relatedItem type="host">
<titleInfo>
<title>International Journal of Imaging Systems and Technology</title>
</titleInfo>
<titleInfo type="abbreviated">
<title>Int. J. Imaging Syst. Technol.</title>
</titleInfo>
<genre type="journal">journal</genre>
<identifier type="ISSN">0899-9457</identifier>
<identifier type="eISSN">1098-1098</identifier>
<identifier type="DOI">10.1002/(ISSN)1098-1098</identifier>
<identifier type="PublisherID">IMA</identifier>
<part>
<date>1996</date>
<detail type="volume">
<caption>vol.</caption>
<number>7</number>
</detail>
<detail type="issue">
<caption>no.</caption>
<number>4</number>
</detail>
<extent unit="pages">
<start>357</start>
<end>362</end>
<total>6</total>
</extent>
</part>
</relatedItem>
<identifier type="istex">4148AFEA65A3D5C80768F42F0DA929541A59586F</identifier>
<identifier type="DOI">10.1002/(SICI)1098-1098(199624)7:4<357::AID-IMA10>3.0.CO;2-T</identifier>
<identifier type="ArticleID">IMA10</identifier>
<accessCondition type="use and reproduction" contentType="copyright">Copyright © 1996 John Wiley & Sons, Inc.</accessCondition>
<recordInfo>
<recordContentSource>WILEY</recordContentSource>
<recordOrigin>Wiley Subscription Services, Inc., A Wiley Company</recordOrigin>
</recordInfo>
</mods>
</metadata>
<serie></serie>
</istex>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Istex/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000127 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Istex/Corpus/biblio.hfd -nk 000127 | SxmlIndent | more
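The commands above print the indented XML record. As a minimal sketch of pulling simple fields out of that text, the Python snippet below uses a regular expression rather than an XML parser, on the assumption that the record as displayed uses namespace prefixes (`mods:`, `json:`, `istex:`) without declaring them, which a strict parser would reject. The function name `extract_field` and the `sample` string are hypothetical and not part of Dilib; the sample lines are copied from the record on this page.

```python
import re


def extract_field(record_text: str, tag: str) -> list[str]:
    """Return the text of every <tag ...>text</tag> element in record_text.

    Only matches elements whose content contains no '<', so values such as
    the SICI-style DOI (which contains '<357') are deliberately skipped.
    """
    name = re.escape(tag)
    # The tag name must be followed by whitespace+attributes or by '>' directly,
    # so "title" does not match "<titleStmt>".
    pattern = re.compile(r"<%s(?:\s[^>]*)?>([^<]*)</%s>" % (name, name))
    return [m.group(1).strip() for m in pattern.finditer(record_text)]


# Hypothetical sample: a few lines copied from the record shown on this page.
sample = """<titleStmt>
<title xml:lang="en">Performance evaluation for document analysis</title>
</titleStmt>
<biblScope unit="volume">7</biblScope>
<biblScope unit="issue">4</biblScope>
"""

titles = extract_field(sample, "title")
scopes = extract_field(sample, "biblScope")
print(titles)
print(scopes)
```

In practice the text could come from the saved output of one of the `HfdSelect ... | SxmlIndent` pipelines; that step is left out here because it depends on a local Dilib installation.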

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Istex
   |étape=   Corpus
   |type=    RBID
   |clé=     ISTEX:4148AFEA65A3D5C80768F42F0DA929541A59586F
   |texte=   Performance evaluation for document analysis
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024