OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

PINK PANTHER: A COMPLETE ENVIRONMENT FOR GROUND-TRUTHING AND BENCHMARKING DOCUMENT PAGE SEGMENTATION

Internal identifier: 000270 (Istex/Curation); previous: 000269; next: 000271

Authors: Berrin A. Yanikoglu [United States]; Luc Vincent [United States]

Source: Pattern Recognition, vol. 31, no. 9, pp. 1191-1204

RBID : ISTEX:A9B3D95815AB77E4F32DD782C6F48B935A34A24B

Abstract

We describe a new approach for the automatic evaluation of document page segmentation algorithms. Unlike techniques that rely on OCR output, our method is region-based: segmentation quality is assessed by comparing the segmentation output, described as a set of regions, to the corresponding ground-truth. Error maps are used to keep track of all the errors associated with each pixel, regardless of the document complexity. Misclassifications, splitting, and merging of regions are among the errors detected by the system. Each error can be weighted individually and the system can be customized to benchmark virtually any type of segmentation task.
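As an illustration of the region-based idea summarized in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes axis-aligned rectangular regions, a hypothetical region tuple layout `(region_id, label, x0, y0, x1, y1)`, and hypothetical error names and weights; none of these details come from the paper. It builds a per-pixel error map and flags misclassification, splitting, and merging.

```python
from collections import defaultdict

# Hypothetical region format: (region_id, label, x0, y0, x1, y1), x1/y1 exclusive.

def rasterize(regions, width, height):
    """Map each pixel to (region_id, label), or None for background."""
    grid = [[None] * width for _ in range(height)]
    for rid, label, x0, y0, x1, y1 in regions:
        for y in range(y0, y1):
            for x in range(x0, x1):
                grid[y][x] = (rid, label)
    return grid

def evaluate(truth, output, width, height, weights):
    """Return (error_map, score); error_map[y][x] is a set of error names."""
    gt = rasterize(truth, width, height)
    seg = rasterize(output, width, height)
    error_map = [[set() for _ in range(width)] for _ in range(height)]
    gt_to_seg = defaultdict(set)   # ground-truth region -> overlapping output regions
    seg_to_gt = defaultdict(set)   # output region -> overlapping ground-truth regions

    # First pass: per-pixel errors and region overlap bookkeeping.
    for y in range(height):
        for x in range(width):
            g, s = gt[y][x], seg[y][x]
            if g is not None and s is not None:
                gt_to_seg[g[0]].add(s[0])
                seg_to_gt[s[0]].add(g[0])
                if g[1] != s[1]:
                    error_map[y][x].add("misclassified")
            elif g is not None:
                error_map[y][x].add("missed")
            elif s is not None:
                error_map[y][x].add("false_alarm")

    # Second pass: mark pixels of regions that were split or merged.
    for y in range(height):
        for x in range(width):
            g, s = gt[y][x], seg[y][x]
            if g is not None and len(gt_to_seg[g[0]]) > 1:
                error_map[y][x].add("split")
            if s is not None and len(seg_to_gt[s[0]]) > 1:
                error_map[y][x].add("merge")

    # Each error type carries its own weight; unknown types count as zero.
    score = sum(weights.get(e, 0) for row in error_map for cell in row for e in cell)
    return error_map, score

# One ground-truth text region, segmented as a text half and an image half.
truth = [("g1", "text", 0, 0, 4, 2)]
output = [("s1", "text", 0, 0, 2, 2), ("s2", "image", 2, 0, 4, 2)]
weights = {"split": 1.0, "misclassified": 2.0}
error_map, score = evaluate(truth, output, 4, 2, weights)
print(score)  # 8 split pixels * 1.0 + 4 misclassified pixels * 2.0 = 16.0
```

Keeping a set of errors per pixel lets several error types coexist on the same pixel, and changing the `weights` dictionary is one simple way to adapt the score to a different segmentation task.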

Url:
DOI: 10.1016/S0031-3203(97)00137-4

Links toward previous steps (curation, corpus...)


Links to Exploration step

ISTEX:A9B3D95815AB77E4F32DD782C6F48B935A34A24B

Curation

No country items

Berrin A. Yanikoglu
<affiliation>
<mods:affiliation>E-mail: berrin@almaden.ibm.com</mods:affiliation>
<wicri:noCountry code="no comma">E-mail: berrin@almaden.ibm.com</wicri:noCountry>
</affiliation>
<affiliation wicri:level="1">
<mods:affiliation>IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120, USA</mods:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120</wicri:regionArea>
</affiliation>

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title>PINK PANTHER: A COMPLETE ENVIRONMENT FOR GROUND-TRUTHING AND BENCHMARKING DOCUMENT PAGE SEGMENTATION</title>
<author>
<name sortKey="Yanikoglu, Berrin A" sort="Yanikoglu, Berrin A" uniqKey="Yanikoglu B" first="Berrin A." last="Yanikoglu">Berrin A. Yanikoglu</name>
<affiliation>
<mods:affiliation>E-mail: berrin@almaden.ibm.com</mods:affiliation>
<wicri:noCountry code="no comma">E-mail: berrin@almaden.ibm.com</wicri:noCountry>
</affiliation>
<affiliation wicri:level="1">
<mods:affiliation>IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120, USA</mods:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Vincent, Luc" sort="Vincent, Luc" uniqKey="Vincent L" first="Luc" last="Vincent">Luc Vincent</name>
<affiliation wicri:level="1">
<mods:affiliation>Xerox Desktop Document Systems, 3400 Hillview Avenue, Palo Alto, CA 94304, USA</mods:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Xerox Desktop Document Systems, 3400 Hillview Avenue, Palo Alto, CA 94304</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:A9B3D95815AB77E4F32DD782C6F48B935A34A24B</idno>
<date when="1998" year="1998">1998</date>
<idno type="doi">10.1016/S0031-3203(97)00137-4</idno>
<idno type="url">https://api.istex.fr/document/A9B3D95815AB77E4F32DD782C6F48B935A34A24B/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000275</idno>
<idno type="wicri:Area/Istex/Curation">000270</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a">PINK PANTHER: A COMPLETE ENVIRONMENT FOR GROUND-TRUTHING AND BENCHMARKING DOCUMENT PAGE SEGMENTATION</title>
<author>
<name sortKey="Yanikoglu, Berrin A" sort="Yanikoglu, Berrin A" uniqKey="Yanikoglu B" first="Berrin A." last="Yanikoglu">Berrin A. Yanikoglu</name>
<affiliation>
<mods:affiliation>E-mail: berrin@almaden.ibm.com</mods:affiliation>
<wicri:noCountry code="no comma">E-mail: berrin@almaden.ibm.com</wicri:noCountry>
</affiliation>
<affiliation wicri:level="1">
<mods:affiliation>IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120, USA</mods:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120</wicri:regionArea>
</affiliation>
</author>
<author>
<name sortKey="Vincent, Luc" sort="Vincent, Luc" uniqKey="Vincent L" first="Luc" last="Vincent">Luc Vincent</name>
<affiliation wicri:level="1">
<mods:affiliation>Xerox Desktop Document Systems, 3400 Hillview Avenue, Palo Alto, CA 94304, USA</mods:affiliation>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Xerox Desktop Document Systems, 3400 Hillview Avenue, Palo Alto, CA 94304</wicri:regionArea>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">Pattern Recognition</title>
<title level="j" type="abbrev">PR</title>
<idno type="ISSN">0031-3203</idno>
<imprint>
<publisher>ELSEVIER</publisher>
<date type="published" when="1997">1997</date>
<biblScope unit="volume">31</biblScope>
<biblScope unit="issue">9</biblScope>
<biblScope unit="page" from="1191">1191</biblScope>
<biblScope unit="page" to="1204">1204</biblScope>
</imprint>
<idno type="ISSN">0031-3203</idno>
</series>
<idno type="istex">A9B3D95815AB77E4F32DD782C6F48B935A34A24B</idno>
<idno type="DOI">10.1016/S0031-3203(97)00137-4</idno>
<idno type="PII">S0031-3203(97)00137-4</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0031-3203</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">We describe a new approach for the automatic evaluation of document page segmentation algorithms. Unlike techniques that rely on OCR output, our method is region-based: segmentation quality is assessed by comparing the segmentation output, described as a set of regions, to the corresponding ground-truth. Error maps are used to keep track of all the errors associated with each pixel, regardless of the document complexity. Misclassifications, splitting, and merging of regions are among the errors detected by the system. Each error can be weighted individually and the system can be customized to benchmark virtually any type of segmentation task.</div>
</front>
</TEI>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Istex/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000270 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Istex/Curation/biblio.hfd -nk 000270 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Istex
   |étape=   Curation
   |type=    RBID
   |clé=     ISTEX:A9B3D95815AB77E4F32DD782C6F48B935A34A24B
   |texte=   PINK PANTHER: A COMPLETE ENVIRONMENT FOR GROUND-TRUTHING AND BENCHMARKING DOCUMENT PAGE SEGMENTATION
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024