OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

On OCR of Degraded Documents Using Fuzzy Multifactorial Analysis

Internal identifier: 001820 (Main/Curation); previous: 001819; next: 001821

Authors: U. Garain [India]; Bidyut Baran Chaudhuri [India]

RBID : ISTEX:67B2D947CE71BAE7AE954A7AB4101782A775EFF9

Abstract

Abstract: Optical Character Recognition (OCR) systems show poor performance while processing documents like old books or newspapers, Xerox materials, faxed documents, etc. Such documents are considered as degraded documents. One of the important reasons for poor recognition rate for degraded documents is existence of touching or connected characters, which create a major problem for designing an effective character segmentation procedure. In this paper, a new technique is proposed for segmentation of touching characters. The technique is based on fuzzy multifactorial analysis. A predictive algorithm is developed for effectively selecting cut-points to segment touching characters. Initially, our proposed method has been applied for segmenting touching characters that appear in Devnagari (Hindi) and Bangla, two major scripts in Indian sub-continent. The results obtained from a test-set of considerable size show that a high recognition rate can be achieved with a reasonable amount of computations.
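The abstract does not list the factors or the aggregation rule used to select cut-points, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes two hypothetical factors per pixel column of a touching-character blob (how thin the stroke is, and how close the column is to the middle of the blob), maps each to a membership value in [0, 1], and combines them with a weighted average. The function names, the factors, and the weights are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def column_memberships(blob: Sequence[Sequence[int]]) -> List[Tuple[float, float]]:
    """Return (thinness, centrality) memberships in [0, 1] for each column.

    `blob` is a binary image given as rows of 0/1 values (1 = ink).
    Both factors are illustrative assumptions, not taken from the paper.
    """
    height = len(blob)
    width = len(blob[0])
    centre = (width - 1) / 2.0
    half = max(centre, 1.0)  # avoid division by zero for one-column blobs
    memberships = []
    for x in range(width):
        ink = sum(1 for row in blob if row[x])
        thinness = 1.0 - ink / height             # little ink: plausible cut
        centrality = 1.0 - abs(x - centre) / half  # near the middle: plausible cut
        memberships.append((thinness, centrality))
    return memberships


def select_cut_point(
    blob: Sequence[Sequence[int]],
    weights: Sequence[float] = (0.7, 0.3),  # hypothetical factor weights
) -> Tuple[int, List[float]]:
    """Score every column by a weighted average of its memberships and
    return the index of the best-scoring column together with all scores."""
    total = sum(weights)
    scores = [
        sum(w * m for w, m in zip(weights, ms)) / total
        for ms in column_memberships(blob)
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Two 3-pixel-wide glyphs joined by a one-pixel bridge in column 3.
BLOB = [
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
]

cut, scores = select_cut_point(BLOB)
print(cut)  # prints 3
```

In this toy example the bridge column scores highest because it is both thin and central. A real system would need script-specific factors (for Devnagari and Bangla, the headline joining characters would presumably have to be handled) and a recognizer to confirm each candidate cut.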

DOI: 10.1007/3-540-45631-7_52

Links to Exploration step

ISTEX:67B2D947CE71BAE7AE954A7AB4101782A775EFF9

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct:series">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">On OCR of Degraded Documents Using Fuzzy Multifactorial Analysis</title>
<author>
<name sortKey="Garain, U" sort="Garain, U" uniqKey="Garain U" first="U." last="Garain">U. Garain</name>
</author>
<author>
<name sortKey="Chaudhuri, B" sort="Chaudhuri, B" uniqKey="Chaudhuri B" first="B." last="Chaudhuri">Bidyut Baran Chaudhuri</name>
<affiliation>
<country>Inde</country>
<placeName>
<settlement type="city">Calcutta</settlement>
<region type="province">Bengale-Occidental</region>
</placeName>
<orgName type="lab" n="5">Institut indien de statistiques</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:67B2D947CE71BAE7AE954A7AB4101782A775EFF9</idno>
<date when="2002" year="2002">2002</date>
<idno type="doi">10.1007/3-540-45631-7_52</idno>
<idno type="url">https://api.istex.fr/document/67B2D947CE71BAE7AE954A7AB4101782A775EFF9/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000671</idno>
<idno type="wicri:Area/Istex/Curation">000663</idno>
<idno type="wicri:Area/Istex/Checkpoint">000F34</idno>
<idno type="wicri:doubleKey">0302-9743:2002:Garain U:on:ocr:of</idno>
<idno type="wicri:Area/Main/Merge">001900</idno>
<idno type="wicri:source">INIST</idno>
<idno type="RBID">Pascal:02-0280432</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000665</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000127</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000601</idno>
<idno type="wicri:doubleKey">0302-9743:2002:Garain U:on:ocr:of</idno>
<idno type="wicri:Area/Main/Merge">001A54</idno>
<idno type="wicri:Area/Main/Curation">001820</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">On OCR of Degraded Documents Using Fuzzy Multifactorial Analysis</title>
<author>
<name sortKey="Garain, U" sort="Garain, U" uniqKey="Garain U" first="U." last="Garain">U. Garain</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Inde</country>
<wicri:regionArea>Computer Vision &amp; Pattern Recognition Unit, Indian Statistical Institute, 203, B. T. Road, 700 035, Kolkata</wicri:regionArea>
<wicri:noRegion>Kolkata</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Inde</country>
</affiliation>
</author>
<author>
<name sortKey="Chaudhuri, B" sort="Chaudhuri, B" uniqKey="Chaudhuri B" first="B." last="Chaudhuri">Bidyut Baran Chaudhuri</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Inde</country>
<wicri:regionArea>Computer Vision &amp; Pattern Recognition Unit, Indian Statistical Institute, 203, B. T. Road, 700 035, Kolkata</wicri:regionArea>
<wicri:noRegion>Kolkata</wicri:noRegion>
<placeName>
<settlement type="city">Calcutta</settlement>
<region type="province">Bengale-Occidental</region>
</placeName>
<orgName type="lab" n="5">Institut indien de statistiques</orgName>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Inde</country>
<placeName>
<settlement type="city">Calcutta</settlement>
<region type="province">Bengale-Occidental</region>
</placeName>
<orgName type="lab" n="5">Institut indien de statistiques</orgName>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2002</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">67B2D947CE71BAE7AE954A7AB4101782A775EFF9</idno>
<idno type="DOI">10.1007/3-540-45631-7_52</idno>
<idno type="ChapterID">52</idno>
<idno type="ChapterID">Chap52</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Character recognition</term>
<term>Decision making</term>
<term>Document processing</term>
<term>Fuzzy analysis</term>
<term>Fuzzy decision</term>
<term>Fuzzy logic</term>
<term>Optical character recognition</term>
<term>Pattern recognition</term>
<term>System performance</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Analyse floue</term>
<term>Analyse multifactorielle</term>
<term>Décision floue</term>
<term>Logique floue</term>
<term>Performance système</term>
<term>Prise décision</term>
<term>Reconnaissance caractère</term>
<term>Reconnaissance forme</term>
<term>Reconnaissance optique caractère</term>
<term>Traitement document</term>
</keywords>
</textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: Optical Character Recognition (OCR) systems show poor performance while processing documents like old books or newspapers, Xerox materials, faxed documents, etc. Such documents are considered as degraded documents. One of the important reasons for poor recognition rate for degraded documents is existence of touching or connected characters, which create a major problem for designing an effective character segmentation procedure. In this paper, a new technique is proposed for segmentation of touching characters. The technique is based on fuzzy multifactorial analysis. A predictive algorithm is developed for effectively selecting cut-points to segment touching characters. Initially, our proposed method has been applied for segmenting touching characters that appear in Devnagari (Hindi) and Bangla, two major scripts in Indian sub-continent. The results obtained from a test-set of considerable size show that a high recognition rate can be achieved with a reasonable amount of computations.</div>
</front>
</TEI>
<double idat="0302-9743:2002:Garain U:on:ocr:of">
<INIST>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">On OCR of degraded documents using fuzzy multifactorial analysis</title>
<author>
<name sortKey="Garain, U" sort="Garain, U" uniqKey="Garain U" first="U." last="Garain">U. Garain</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Computer Vision &amp; Pattern Recognition Unit, Indian Statistical Institute, 203, B. T. Road</s1>
<s2>Kolkata 700 035</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Kolkata 700 035</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Chaudhuri, B B" sort="Chaudhuri, B B" uniqKey="Chaudhuri B" first="B. B." last="Chaudhuri">Bidyut Baran Chaudhuri</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Computer Vision &amp; Pattern Recognition Unit, Indian Statistical Institute, 203, B. T. Road</s1>
<s2>Kolkata 700 035</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Kolkata 700 035</wicri:noRegion>
<placeName>
<settlement type="city">Calcutta</settlement>
<region type="province">Bengale-Occidental</region>
</placeName>
<orgName type="lab" n="5">Institut indien de statistiques</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">02-0280432</idno>
<date when="2002">2002</date>
<idno type="stanalyst">PASCAL 02-0280432 INIST</idno>
<idno type="RBID">Pascal:02-0280432</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000665</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000127</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000601</idno>
<idno type="wicri:doubleKey">0302-9743:2002:Garain U:on:ocr:of</idno>
<idno type="wicri:Area/Main/Merge">001A54</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">On OCR of degraded documents using fuzzy multifactorial analysis</title>
<author>
<name sortKey="Garain, U" sort="Garain, U" uniqKey="Garain U" first="U." last="Garain">U. Garain</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Computer Vision &amp; Pattern Recognition Unit, Indian Statistical Institute, 203, B. T. Road</s1>
<s2>Kolkata 700 035</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Kolkata 700 035</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Chaudhuri, B B" sort="Chaudhuri, B B" uniqKey="Chaudhuri B" first="B. B." last="Chaudhuri">Bidyut Baran Chaudhuri</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Computer Vision &amp; Pattern Recognition Unit, Indian Statistical Institute, 203, B. T. Road</s1>
<s2>Kolkata 700 035</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Kolkata 700 035</wicri:noRegion>
<placeName>
<settlement type="city">Calcutta</settlement>
<region type="province">Bengale-Occidental</region>
</placeName>
<orgName type="lab" n="5">Institut indien de statistiques</orgName>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Lecture notes in computer science</title>
<idno type="ISSN">0302-9743</idno>
<imprint>
<date when="2002">2002</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Lecture notes in computer science</title>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Character recognition</term>
<term>Decision making</term>
<term>Document processing</term>
<term>Fuzzy analysis</term>
<term>Fuzzy decision</term>
<term>Fuzzy logic</term>
<term>Optical character recognition</term>
<term>Pattern recognition</term>
<term>System performance</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Analyse floue</term>
<term>Prise décision</term>
<term>Reconnaissance caractère</term>
<term>Reconnaissance optique caractère</term>
<term>Performance système</term>
<term>Traitement document</term>
<term>Décision floue</term>
<term>Logique floue</term>
<term>Reconnaissance forme</term>
<term>Analyse multifactorielle</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Optical Character Recognition (OCR) systems show poor performance while processing documents like old books or newspapers, Xerox materials, faxed documents, etc. Such documents are considered as degraded documents. One of the important reasons for poor recognition rate for degraded documents is existence of touching or connected characters, which create a major problem for designing an effective character segmentation procedure. In this paper, a new technique is proposed for segmentation of touching characters. The technique is based on fuzzy multifactorial analysis. A predictive algorithm is developed for effectively selecting cut-points to segment touching characters. Initially, our proposed method has been applied for segmenting touching characters that appear in Devnagari (Hindi) and Bangla, two major scripts in Indian sub-continent. The results obtained from a test-set of considerable size show that a high recognition rate can be achieved with a reasonable amount of computations.</div>
</front>
</TEI>
</INIST>
<ISTEX>
<TEI wicri:istexFullTextTei="biblStruct:series">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">On OCR of Degraded Documents Using Fuzzy Multifactorial Analysis</title>
<author>
<name sortKey="Garain, U" sort="Garain, U" uniqKey="Garain U" first="U." last="Garain">U. Garain</name>
</author>
<author>
<name sortKey="Chaudhuri, B" sort="Chaudhuri, B" uniqKey="Chaudhuri B" first="B." last="Chaudhuri">Bidyut Baran Chaudhuri</name>
<affiliation>
<country>Inde</country>
<placeName>
<settlement type="city">Calcutta</settlement>
<region type="province">Bengale-Occidental</region>
</placeName>
<orgName type="lab" n="5">Institut indien de statistiques</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:67B2D947CE71BAE7AE954A7AB4101782A775EFF9</idno>
<date when="2002" year="2002">2002</date>
<idno type="doi">10.1007/3-540-45631-7_52</idno>
<idno type="url">https://api.istex.fr/document/67B2D947CE71BAE7AE954A7AB4101782A775EFF9/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000671</idno>
<idno type="wicri:Area/Istex/Curation">000663</idno>
<idno type="wicri:Area/Istex/Checkpoint">000F34</idno>
<idno type="wicri:doubleKey">0302-9743:2002:Garain U:on:ocr:of</idno>
<idno type="wicri:Area/Main/Merge">001900</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">On OCR of Degraded Documents Using Fuzzy Multifactorial Analysis</title>
<author>
<name sortKey="Garain, U" sort="Garain, U" uniqKey="Garain U" first="U." last="Garain">U. Garain</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Inde</country>
<wicri:regionArea>Computer Vision &amp; Pattern Recognition Unit, Indian Statistical Institute, 203, B. T. Road, 700 035, Kolkata</wicri:regionArea>
<wicri:noRegion>Kolkata</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Inde</country>
</affiliation>
</author>
<author>
<name sortKey="Chaudhuri, B" sort="Chaudhuri, B" uniqKey="Chaudhuri B" first="B." last="Chaudhuri">Bidyut Baran Chaudhuri</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Inde</country>
<wicri:regionArea>Computer Vision &amp; Pattern Recognition Unit, Indian Statistical Institute, 203, B. T. Road, 700 035, Kolkata</wicri:regionArea>
<wicri:noRegion>Kolkata</wicri:noRegion>
<placeName>
<settlement type="city">Calcutta</settlement>
<region type="province">Bengale-Occidental</region>
</placeName>
<orgName type="lab" n="5">Institut indien de statistiques</orgName>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Inde</country>
<placeName>
<settlement type="city">Calcutta</settlement>
<region type="province">Bengale-Occidental</region>
</placeName>
<orgName type="lab" n="5">Institut indien de statistiques</orgName>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2002</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">67B2D947CE71BAE7AE954A7AB4101782A775EFF9</idno>
<idno type="DOI">10.1007/3-540-45631-7_52</idno>
<idno type="ChapterID">52</idno>
<idno type="ChapterID">Chap52</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: Optical Character Recognition (OCR) systems show poor performance while processing documents like old books or newspapers, Xerox materials, faxed documents, etc. Such documents are considered as degraded documents. One of the important reasons for poor recognition rate for degraded documents is existence of touching or connected characters, which create a major problem for designing an effective character segmentation procedure. In this paper, a new technique is proposed for segmentation of touching characters. The technique is based on fuzzy multifactorial analysis. A predictive algorithm is developed for effectively selecting cut-points to segment touching characters. Initially, our proposed method has been applied for segmenting touching characters that appear in Devnagari (Hindi) and Bangla, two major scripts in Indian sub-continent. The results obtained from a test-set of considerable size show that a high recognition rate can be achieved with a reasonable amount of computations.</div>
</front>
</TEI>
</ISTEX>
</double>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Curation
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001820 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Curation/biblio.hfd -nk 001820 | SxmlIndent | more

To link to this page from within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Curation
   |type=    RBID
   |clé=     ISTEX:67B2D947CE71BAE7AE954A7AB4101782A775EFF9
   |texte=   On OCR of Degraded Documents Using Fuzzy Multifactorial Analysis
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024