OcrV1, Main, Merge, bibRecord, 002142

Scale space technique for word segmentation in handwritten documents

Internal identifier: 002142 (Main/Merge); previous: 002141; next: 002143

Authors: R. Manmatha [United States]; N. Srimal [United States]

Source:

Lecture Notes in Computer Science [0302-9743]; 1999.

RBID : Pascal:99-0517474

French descriptors

Pascal (Inist)
- Segmentation, Ecriture, Mot, Caractère manuscrit, Reconnaissance optique caractère, Image niveau gris, Analyse documentaire, Image binaire, Segmentation ligne, Segmentation mot.

English descriptors

KwdEn :
- Binary image, Document analysis, Grey level image, Hand writing, Manuscript character, Optical character recognition, Segmentation, Word.

Abstract

Indexing large archives of historical manuscripts, like the papers of George Washington, is required to allow rapid perusal by scholars and researchers who wish to consult the original manuscripts. Presently, such large archives are indexed manually. Since optical character recognition (OCR) works poorly with handwriting, a scheme based on matching word images, called word spotting, has been suggested previously for indexing such documents. The important steps in this scheme are segmentation of a document page into words and creation of lists containing instances of the same word by word image matching. We have developed a novel methodology for segmenting handwritten document images by analyzing the extent of "blobs" in a scale space representation of the image. We believe this is the first application of scale space to this problem. The algorithm has been applied to around 30 grey level images randomly picked from different sections of the George Washington corpus of 6,400 handwritten document images. An accuracy of 77 to 96 percent was observed, with an average accuracy of around 87 percent. The algorithm works well in the presence of noise, shine-through and other artifacts which may arise due to aging and degradation of the page over a couple of centuries or through the man-made processes of photocopying and scanning.

Links to previous steps (curation, corpus...)

to stream PascalFrancis, to step Corpus: 000795
to stream PascalFrancis, to step Curation: 000B99
to stream PascalFrancis, to step Checkpoint: 000761

Links to Exploration step

Pascal:99-0517474

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Scale space technique for word segmentation in handwritten documents</title>
<author><name sortKey="Manmatha, R" sort="Manmatha, R" uniqKey="Manmatha R" first="R." last="Manmatha">R. Manmatha</name>
<affiliation wicri:level="4"><inist:fA14 i1="01"><s1>Computer Science Department, University of Massachusetts</s1>
<s2>Amherst MA 01003</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName><settlement type="city">Amherst (Massachusetts)</settlement>
<region type="state">Massachusetts</region>
</placeName>
<orgName type="university">Université du Massachusetts</orgName>
</affiliation>
</author>
<author><name sortKey="Srimal, N" sort="Srimal, N" uniqKey="Srimal N" first="N." last="Srimal">N. Srimal</name>
<affiliation wicri:level="4"><inist:fA14 i1="01"><s1>Computer Science Department, University of Massachusetts</s1>
<s2>Amherst MA 01003</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName><settlement type="city">Amherst (Massachusetts)</settlement>
<region type="state">Massachusetts</region>
</placeName>
<orgName type="university">Université du Massachusetts</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">99-0517474</idno>
<date when="1999">1999</date>
<idno type="stanalyst">PASCAL 99-0517474 INIST</idno>
<idno type="RBID">Pascal:99-0517474</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000795</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000B99</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000761</idno>
<idno type="wicri:doubleKey">0302-9743:1999:Manmatha R:scale:space:technique</idno>
<idno type="wicri:Area/Main/Merge">002142</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Scale space technique for word segmentation in handwritten documents</title>
<author><name sortKey="Manmatha, R" sort="Manmatha, R" uniqKey="Manmatha R" first="R." last="Manmatha">R. Manmatha</name>
<affiliation wicri:level="4"><inist:fA14 i1="01"><s1>Computer Science Department, University of Massachusetts</s1>
<s2>Amherst MA 01003</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName><settlement type="city">Amherst (Massachusetts)</settlement>
<region type="state">Massachusetts</region>
</placeName>
<orgName type="university">Université du Massachusetts</orgName>
</affiliation>
</author>
<author><name sortKey="Srimal, N" sort="Srimal, N" uniqKey="Srimal N" first="N." last="Srimal">N. Srimal</name>
<affiliation wicri:level="4"><inist:fA14 i1="01"><s1>Computer Science Department, University of Massachusetts</s1>
<s2>Amherst MA 01003</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName><settlement type="city">Amherst (Massachusetts)</settlement>
<region type="state">Massachusetts</region>
</placeName>
<orgName type="university">Université du Massachusetts</orgName>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Lecture notes in computer science</title>
<idno type="ISSN">0302-9743</idno>
<imprint><date when="1999">1999</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Lecture notes in computer science</title>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Binary image</term>
<term>Document analysis</term>
<term>Grey level image</term>
<term>Hand writing</term>
<term>Manuscript character</term>
<term>Optical character recognition</term>
<term>Segmentation</term>
<term>Word</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Segmentation</term>
<term>Ecriture</term>
<term>Mot</term>
<term>Caractère manuscrit</term>
<term>Reconnaissance optique caractère</term>
<term>Image niveau gris</term>
<term>Analyse documentaire</term>
<term>Image binaire</term>
<term>Segmentation ligne</term>
<term>Segmentation mot</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Indexing large archives of historical manuscripts, like the papers of George Washington, is required to allow rapid perusal by scholars and researchers who wish to consult the original manuscripts. Presently, such large archives are indexed manually. Since optical character recognition (OCR) works poorly with handwriting, a scheme based on matching word images, called word spotting, has been suggested previously for indexing such documents. The important steps in this scheme are segmentation of a document page into words and creation of lists containing instances of the same word by word image matching. We have developed a novel methodology for segmenting handwritten document images by analyzing the extent of "blobs" in a scale space representation of the image. We believe this is the first application of scale space to this problem. The algorithm has been applied to around 30 grey level images randomly picked from different sections of the George Washington corpus of 6,400 handwritten document images. An accuracy of 77 to 96 percent was observed, with an average accuracy of around 87 percent. The algorithm works well in the presence of noise, shine-through and other artifacts which may arise due to aging and degradation of the page over a couple of centuries or through the man-made processes of photocopying and scanning.</div>
</front>
</TEI>
<affiliations><list><country><li>États-Unis</li>
</country>
<region><li>Massachusetts</li>
</region>
<settlement><li>Amherst (Massachusetts)</li>
</settlement>
<orgName><li>Université du Massachusetts</li>
</orgName>
</list>
<tree><country name="États-Unis"><region name="Massachusetts"><name sortKey="Manmatha, R" sort="Manmatha, R" uniqKey="Manmatha R" first="R." last="Manmatha">R. Manmatha</name>
</region>
<name sortKey="Srimal, N" sort="Srimal, N" uniqKey="Srimal N" first="N." last="Srimal">N. Srimal</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Merge

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002142 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 002142 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Merge
   |type=    RBID
   |clé=     Pascal:99-0517474
   |texte=   Scale space technique for word segmentation in handwritten documents
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

	Exploration server on OCR
	Warning: this site is under development! Warning: this site was generated automatically from raw corpora. The information has therefore not been validated.
