Devanagari OCR using a recognition driven segmentation framework and stochastic language models

Internal identifier: 000194 (PascalFrancis/Corpus); previous: 000193; next: 000195

Authors: Suryaprakash Kompalli; Srirangaraj Setlur; Venu Govindaraju

Source:

International journal on document analysis and recognition (Print), ISSN 1433-2833; 2009; vol. 12, no. 2, pp. 123-138.

RBID : Pascal:10-0180818

French descriptors

Pascal (Inist)
- Reconnaissance caractère, Reconnaissance optique caractère, Concordance forme, Classification, Mot, Langage naturel, Automate stochastique, Automate fini, Machine état fini, Linguistique, Reconnaissance forme, Traitement image, Segmentation, Approche probabiliste, Modélisation, Méthode graphe, Théorie graphe, Appariement image, Modèle n gramme.

English descriptors

KwdEn :
- Character recognition, Classification, Finite automaton, Finite state machine, Graph method, Graph theory, Image matching, Image processing, Linguistics, Modeling, N gram model, Natural language, Optical character recognition, Pattern matching, Pattern recognition, Probabilistic approach, Segmentation, Stochastic automaton, Word.

Abstract

This paper describes a novel recognition-driven segmentation methodology for Devanagari Optical Character Recognition. Prior approaches have used sequential rules to segment characters, followed by template matching for classification. Our method uses a graph representation to segment characters. This allows us to segment horizontally or vertically overlapping characters, as well as those connected along non-linear boundaries, into finer primitive components. The components are then processed by a classifier, and the classifier score is used to determine whether the components need to be further segmented. Multiple hypotheses are obtained for each composite character by considering all possible combinations of the classifier results for the primitive components. Word recognition is performed using a stochastic finite state automaton (SFSA) that takes into account both classifier scores and character frequencies. A novel feature of our approach is that we use sub-character primitive components in the classification stage to reduce the number of classes, whereas we use an n-gram language model based on linguistic character units for word recognition.
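The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: `segment`, `character_hypotheses`, `decode_word`, the confidence threshold, the toy transliterated labels, and the bigram probabilities are all hypothetical, and a bigram Viterbi decoder stands in for the paper's SFSA, whose exact states and weighting the abstract does not specify. The sketch assumes the classifier returns (label, score) pairs with scores in (0, 1] and that a composite character's label is the concatenation of its component labels.

```python
# Hypothetical sketch of recognition-driven segmentation plus n-gram decoding;
# not the authors' code. Labels, threshold and probabilities are illustrative.
import itertools
import math
from typing import Any, Callable, Dict, List, Tuple

# One classifier result for a primitive component: (label, score), score in (0, 1].
Candidate = Tuple[str, float]


def segment(
    component: Any,
    classify: Callable[[Any], List[Candidate]],
    split: Callable[[Any], List[Any]],
    threshold: float = 0.5,
    depth: int = 3,
) -> List[List[Candidate]]:
    """Keep a component when its best classifier score reaches `threshold`;
    otherwise split it into finer pieces and recurse, at most `depth` times.
    `classify` must return a non-empty list. Returns one candidate list per
    final primitive component."""
    candidates = classify(component)
    best = max(score for _, score in candidates)
    pieces = split(component) if depth > 0 and best < threshold else []
    if not pieces:  # confident, depth exhausted, or cannot be split further
        return [candidates]
    result: List[List[Candidate]] = []
    for piece in pieces:
        result.extend(segment(piece, classify, split, threshold, depth - 1))
    return result


def character_hypotheses(components: List[List[Candidate]]) -> Dict[str, float]:
    """Combine every choice of component label into a composite-character
    hypothesis scored by the product of component scores. If two combinations
    spell the same label, the higher score is kept."""
    hyps: Dict[str, float] = {}
    for combo in itertools.product(*components):
        label = "".join(lbl for lbl, _ in combo)
        score = math.prod(s for _, s in combo)
        hyps[label] = max(score, hyps.get(label, 0.0))
    return hyps


def decode_word(
    chars: List[Dict[str, float]],
    bigram: Dict[Tuple[str, str], float],
    lm_weight: float = 1.0,
    floor: float = 1e-6,
) -> Tuple[List[str], float]:
    """Viterbi decoding over per-character hypotheses with a bigram model.
    The state is the previous character, so this is a small stochastic
    finite-state decoder. Path score = sum of log classifier scores plus
    lm_weight * log bigram probabilities; "<s>" marks the word start and
    unseen bigrams fall back to `floor`."""
    best: Dict[str, Tuple[float, List[str]]] = {"<s>": (0.0, [])}
    for hyps in chars:
        nxt: Dict[str, Tuple[float, List[str]]] = {}
        for label, cls_score in hyps.items():
            options = []
            for prev, (prev_score, path) in best.items():
                lm = bigram.get((prev, label), floor)
                total = prev_score + math.log(cls_score) + lm_weight * math.log(lm)
                options.append((total, path + [label]))
            nxt[label] = max(options, key=lambda option: option[0])
        best = nxt
    score, path = max(best.values(), key=lambda option: option[0])
    return path, score


# Toy usage with hypothetical labels: the classifier prefers "ka" for the
# first character, but the bigram model favours the sequence "pha la".
first = character_hypotheses([[("k", 0.6), ("ph", 0.4)], [("a", 1.0)]])
second = {"la": 0.6, "na": 0.4}
bigram = {("<s>", "ka"): 0.5, ("<s>", "pha"): 0.5, ("pha", "la"): 0.9}
path, score = decode_word([first, second], bigram)
print(path)  # ['pha', 'la']
```

Setting `lm_weight=0.0` in the example makes the decoder follow the classifier alone and return `['ka', 'la']`, which shows how the language model can overturn a locally stronger classifier score. Keeping classification at the level of primitive components and deferring character-level choices to the decoder mirrors the split the abstract describes between a small classifier label set and a language model over linguistic character units.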

Record in standard format (ISO 2709)

For details, see the documentation on the Inist Standard format.

A01	`01`	`1`		`@0 1433-2833`
A03		`1`		`@0 Int. j. doc. anal. recognit. : (Print)`
A05				`@2 12`
A06				`@2 2`
A08	`01`	`1`	`ENG`	`@1 Devanagari OCR using a recognition driven segmentation framework and stochastic language models`
A11	`01`	`1`		`@1 KOMPALLI (Suryaprakash)`
A11	`02`	`1`		`@1 SETLUR (Srirangaraj)`
A11	`03`	`1`		`@1 GOVINDARAJU (Venu)`
A14	`01`			`@1 Department of Computer Science and Engineering, University at Buffalo, State University of New York @2 Buffalo @3 USA @Z 1 aut. @Z 2 aut. @Z 3 aut.`
A20				`@1 123-138`
A21				`@1 2009`
A23	`01`			`@0 ENG`
A43	`01`			`@1 INIST @2 26790 @5 354000170255530050`
A44				`@0 0000 @1 © 2010 INIST-CNRS. All rights reserved.`
A45				`@0 47 ref.`
A47	`01`	`1`		`@0 10-0180818`
A60				`@1 P`
A61				`@0 A`
A64	`01`	`1`		`@0 International journal on document analysis and recognition : (Print)`
A66	`01`			`@0 DEU`
C01	`01`		`ENG`	@0 This paper describes a novel recognition driven segmentation methodology for Devanagari Optical Character Recognition. Prior approaches have used sequential rules to segment characters followed by template matching for classification. Our method uses a graph representation to segment characters. This method allows us to segment horizontally or vertically overlapping characters as well as those connected along non-linear boundaries into finer primitive components. The components are then processed by a classifier and the classifier score is used to determine if the components need to be further segmented. Multiple hypotheses are obtained for each composite character by considering all possible combinations of the classifier results for the primitive components. Word recognition is performed by designing a stochastic finite state automaton (SFSA) that takes into account both classifier scores as well as character frequencies. A novel feature of our approach is that we use sub-character primitive components in the classification stage in order to reduce the number of classes whereas we use an n-gram language model based on the linguistic character units for word recognition.
C02	`01`	`X`		`@0 001D02C03`
C03	`01`	`X`	`FRE`	`@0 Reconnaissance caractère @5 06`
C03	`01`	`X`	`ENG`	`@0 Character recognition @5 06`
C03	`01`	`X`	`SPA`	`@0 Reconocimiento carácter @5 06`
C03	`02`	`X`	`FRE`	`@0 Reconnaissance optique caractère @5 07`
C03	`02`	`X`	`ENG`	`@0 Optical character recognition @5 07`
C03	`02`	`X`	`SPA`	`@0 Reconocimento óptico de caracteres @5 07`
C03	`03`	`X`	`FRE`	`@0 Concordance forme @5 08`
C03	`03`	`X`	`ENG`	`@0 Pattern matching @5 08`
C03	`04`	`X`	`FRE`	`@0 Classification @5 09`
C03	`04`	`X`	`ENG`	`@0 Classification @5 09`
C03	`04`	`X`	`SPA`	`@0 Clasificación @5 09`
C03	`05`	`X`	`FRE`	`@0 Mot @5 10`
C03	`05`	`X`	`ENG`	`@0 Word @5 10`
C03	`05`	`X`	`SPA`	`@0 Palabra @5 10`
C03	`06`	`X`	`FRE`	`@0 Langage naturel @5 11`
C03	`06`	`X`	`ENG`	`@0 Natural language @5 11`
C03	`06`	`X`	`SPA`	`@0 Lenguaje natural @5 11`
C03	`07`	`X`	`FRE`	`@0 Automate stochastique @5 12`
C03	`07`	`X`	`ENG`	`@0 Stochastic automaton @5 12`
C03	`07`	`X`	`SPA`	`@0 Autómata estocástico @5 12`
C03	`08`	`X`	`FRE`	`@0 Automate fini @5 13`
C03	`08`	`X`	`ENG`	`@0 Finite automaton @5 13`
C03	`08`	`X`	`SPA`	`@0 Autómata estado finito @5 13`
C03	`09`	`X`	`FRE`	`@0 Machine état fini @5 14`
C03	`09`	`X`	`ENG`	`@0 Finite state machine @5 14`
C03	`09`	`X`	`SPA`	`@0 Máquina estado finito @5 14`
C03	`10`	`X`	`FRE`	`@0 Linguistique @5 15`
C03	`10`	`X`	`ENG`	`@0 Linguistics @5 15`
C03	`10`	`X`	`SPA`	`@0 Linguística @5 15`
C03	`11`	`X`	`FRE`	`@0 Reconnaissance forme @5 16`
C03	`11`	`X`	`ENG`	`@0 Pattern recognition @5 16`
C03	`11`	`X`	`SPA`	`@0 Reconocimiento patrón @5 16`
C03	`12`	`X`	`FRE`	`@0 Traitement image @5 17`
C03	`12`	`X`	`ENG`	`@0 Image processing @5 17`
C03	`12`	`X`	`SPA`	`@0 Procesamiento imagen @5 17`
C03	`13`	`X`	`FRE`	`@0 Segmentation @5 23`
C03	`13`	`X`	`ENG`	`@0 Segmentation @5 23`
C03	`13`	`X`	`SPA`	`@0 Segmentación @5 23`
C03	`14`	`X`	`FRE`	`@0 Approche probabiliste @5 24`
C03	`14`	`X`	`ENG`	`@0 Probabilistic approach @5 24`
C03	`14`	`X`	`SPA`	`@0 Enfoque probabilista @5 24`
C03	`15`	`X`	`FRE`	`@0 Modélisation @5 25`
C03	`15`	`X`	`ENG`	`@0 Modeling @5 25`
C03	`15`	`X`	`SPA`	`@0 Modelización @5 25`
C03	`16`	`X`	`FRE`	`@0 Méthode graphe @5 26`
C03	`16`	`X`	`ENG`	`@0 Graph method @5 26`
C03	`16`	`X`	`SPA`	`@0 Método grafo @5 26`
C03	`17`	`X`	`FRE`	`@0 Théorie graphe @5 27`
C03	`17`	`X`	`ENG`	`@0 Graph theory @5 27`
C03	`17`	`X`	`SPA`	`@0 Teoría grafo @5 27`
C03	`18`	`X`	`FRE`	`@0 . @4 INC @5 82`
C03	`19`	`X`	`FRE`	`@0 Appariement image @4 CD @5 96`
C03	`19`	`X`	`ENG`	`@0 Image matching @4 CD @5 96`
C03	`19`	`X`	`SPA`	`@0 reconocimiento de patrones en imágenes @4 CD @5 96`
C03	`20`	`X`	`FRE`	`@0 Modèle n gramme @4 CD @5 97`
C03	`20`	`X`	`ENG`	`@0 N gram model @4 CD @5 97`
C03	`20`	`X`	`SPA`	`@0 Modelo n grama @4 CD @5 97`
N21				`@1 123`
N44	`01`			`@1 OTO`
N82				`@1 OTO`

Inist format (server)

NO :	PASCAL 10-0180818 INIST
ET :	Devanagari OCR using a recognition driven segmentation framework and stochastic language models
AU :	KOMPALLI (Suryaprakash); SETLUR (Srirangaraj); GOVINDARAJU (Venu)
AF :	Department of Computer Science and Engineering, University at Buffalo, State University of New York/Buffalo/Etats-Unis (1 aut., 2 aut., 3 aut.)
DT :	Publication en série; Niveau analytique
SO :	International journal on document analysis and recognition : (Print); ISSN 1433-2833; Allemagne; Da. 2009; Vol. 12; No. 2; Pp. 123-138; Bibl. 47 ref.
LA :	Anglais
EA :	This paper describes a novel recognition driven segmentation methodology for Devanagari Optical Character Recognition. Prior approaches have used sequential rules to segment characters followed by template matching for classification. Our method uses a graph representation to segment characters. This method allows us to segment horizontally or vertically overlapping characters as well as those connected along non-linear boundaries into finer primitive components. The components are then processed by a classifier and the classifier score is used to determine if the components need to be further segmented. Multiple hypotheses are obtained for each composite character by considering all possible combinations of the classifier results for the primitive components. Word recognition is performed by designing a stochastic finite state automaton (SFSA) that takes into account both classifier scores as well as character frequencies. A novel feature of our approach is that we use sub-character primitive components in the classification stage in order to reduce the number of classes whereas we use an n-gram language model based on the linguistic character units for word recognition.
CC :	001D02C03
FD :	Reconnaissance caractère; Reconnaissance optique caractère; Concordance forme; Classification; Mot; Langage naturel; Automate stochastique; Automate fini; Machine état fini; Linguistique; Reconnaissance forme; Traitement image; Segmentation; Approche probabiliste; Modélisation; Méthode graphe; Théorie graphe; .; Appariement image; Modèle n gramme
ED :	Character recognition; Optical character recognition; Pattern matching; Classification; Word; Natural language; Stochastic automaton; Finite automaton; Finite state machine; Linguistics; Pattern recognition; Image processing; Segmentation; Probabilistic approach; Modeling; Graph method; Graph theory; Image matching; N gram model
SD :	Reconocimiento carácter; Reconocimento óptico de caracteres; Clasificación; Palabra; Lenguaje natural; Autómata estocástico; Autómata estado finito; Máquina estado finito; Linguística; Reconocimiento patrón; Procesamiento imagen; Segmentación; Enfoque probabilista; Modelización; Método grafo; Teoría grafo; reconocimiento de patrones en imágenes; Modelo n grama
LO :	INIST-26790.354000170255530050
ID :	10-0180818

Links to Exploration step

Pascal:10-0180818

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Devanagari OCR using a recognition driven segmentation framework and stochastic language models</title>
<author><name sortKey="Kompalli, Suryaprakash" sort="Kompalli, Suryaprakash" uniqKey="Kompalli S" first="Suryaprakash" last="Kompalli">Suryaprakash Kompalli</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Computer Science and Engineering, University at Buffalo, State University of New York</s1>
<s2>Buffalo</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Setlur, Srirangaraj" sort="Setlur, Srirangaraj" uniqKey="Setlur S" first="Srirangaraj" last="Setlur">Srirangaraj Setlur</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Computer Science and Engineering, University at Buffalo, State University of New York</s1>
<s2>Buffalo</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Govindaraju, Venu" sort="Govindaraju, Venu" uniqKey="Govindaraju V" first="Venu" last="Govindaraju">Venu Govindaraju</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Computer Science and Engineering, University at Buffalo, State University of New York</s1>
<s2>Buffalo</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">10-0180818</idno>
<date when="2009">2009</date>
<idno type="stanalyst">PASCAL 10-0180818 INIST</idno>
<idno type="RBID">Pascal:10-0180818</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000194</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Devanagari OCR using a recognition driven segmentation framework and stochastic language models</title>
<author><name sortKey="Kompalli, Suryaprakash" sort="Kompalli, Suryaprakash" uniqKey="Kompalli S" first="Suryaprakash" last="Kompalli">Suryaprakash Kompalli</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Computer Science and Engineering, University at Buffalo, State University of New York</s1>
<s2>Buffalo</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Setlur, Srirangaraj" sort="Setlur, Srirangaraj" uniqKey="Setlur S" first="Srirangaraj" last="Setlur">Srirangaraj Setlur</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Computer Science and Engineering, University at Buffalo, State University of New York</s1>
<s2>Buffalo</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Govindaraju, Venu" sort="Govindaraju, Venu" uniqKey="Govindaraju V" first="Venu" last="Govindaraju">Venu Govindaraju</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Computer Science and Engineering, University at Buffalo, State University of New York</s1>
<s2>Buffalo</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">International journal on document analysis and recognition : (Print)</title>
<title level="j" type="abbreviated">Int. j. doc. anal. recognit. : (Print)</title>
<idno type="ISSN">1433-2833</idno>
<imprint><date when="2009">2009</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">International journal on document analysis and recognition : (Print)</title>
<title level="j" type="abbreviated">Int. j. doc. anal. recognit. : (Print)</title>
<idno type="ISSN">1433-2833</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Character recognition</term>
<term>Classification</term>
<term>Finite automaton</term>
<term>Finite state machine</term>
<term>Graph method</term>
<term>Graph theory</term>
<term>Image matching</term>
<term>Image processing</term>
<term>Linguistics</term>
<term>Modeling</term>
<term>N gram model</term>
<term>Natural language</term>
<term>Optical character recognition</term>
<term>Pattern matching</term>
<term>Pattern recognition</term>
<term>Probabilistic approach</term>
<term>Segmentation</term>
<term>Stochastic automaton</term>
<term>Word</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Reconnaissance caractère</term>
<term>Reconnaissance optique caractère</term>
<term>Concordance forme</term>
<term>Classification</term>
<term>Mot</term>
<term>Langage naturel</term>
<term>Automate stochastique</term>
<term>Automate fini</term>
<term>Machine état fini</term>
<term>Linguistique</term>
<term>Reconnaissance forme</term>
<term>Traitement image</term>
<term>Segmentation</term>
<term>Approche probabiliste</term>
<term>Modélisation</term>
<term>Méthode graphe</term>
<term>Théorie graphe</term>
<term>.</term>
<term>Appariement image</term>
<term>Modèle n gramme</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">This paper describes a novel recognition driven segmentation methodology for Devanagari Optical Character Recognition. Prior approaches have used sequential rules to segment characters followed by template matching for classification. Our method uses a graph representation to segment characters. This method allows us to segment horizontally or vertically overlapping characters as well as those connected along non-linear boundaries into finer primitive components. The components are then processed by a classifier and the classifier score is used to determine if the components need to be further segmented. Multiple hypotheses are obtained for each composite character by considering all possible combinations of the classifier results for the primitive components. Word recognition is performed by designing a stochastic finite state automaton (SFSA) that takes into account both classifier scores as well as character frequencies. A novel feature of our approach is that we use sub-character primitive components in the classification stage in order to reduce the number of classes whereas we use an n-gram language model based on the linguistic character units for word recognition.</div>
</front>
</TEI>
<inist><standard h6="B"><pA><fA01 i1="01" i2="1"><s0>1433-2833</s0>
</fA01>
<fA03 i2="1"><s0>Int. j. doc. anal. recognit. : (Print)</s0>
</fA03>
<fA05><s2>12</s2>
</fA05>
<fA06><s2>2</s2>
</fA06>
<fA08 i1="01" i2="1" l="ENG"><s1>Devanagari OCR using a recognition driven segmentation framework and stochastic language models</s1>
</fA08>
<fA11 i1="01" i2="1"><s1>KOMPALLI (Suryaprakash)</s1>
</fA11>
<fA11 i1="02" i2="1"><s1>SETLUR (Srirangaraj)</s1>
</fA11>
<fA11 i1="03" i2="1"><s1>GOVINDARAJU (Venu)</s1>
</fA11>
<fA14 i1="01"><s1>Department of Computer Science and Engineering, University at Buffalo, State University of New York</s1>
<s2>Buffalo</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</fA14>
<fA20><s1>123-138</s1>
</fA20>
<fA21><s1>2009</s1>
</fA21>
<fA23 i1="01"><s0>ENG</s0>
</fA23>
<fA43 i1="01"><s1>INIST</s1>
<s2>26790</s2>
<s5>354000170255530050</s5>
</fA43>
<fA44><s0>0000</s0>
<s1>© 2010 INIST-CNRS. All rights reserved.</s1>
</fA44>
<fA45><s0>47 ref.</s0>
</fA45>
<fA47 i1="01" i2="1"><s0>10-0180818</s0>
</fA47>
<fA60><s1>P</s1>
</fA60>
<fA61><s0>A</s0>
</fA61>
<fA64 i1="01" i2="1"><s0>International journal on document analysis and recognition : (Print)</s0>
</fA64>
<fA66 i1="01"><s0>DEU</s0>
</fA66>
<fC01 i1="01" l="ENG"><s0>This paper describes a novel recognition driven segmentation methodology for Devanagari Optical Character Recognition. Prior approaches have used sequential rules to segment characters followed by template matching for classification. Our method uses a graph representation to segment characters. This method allows us to segment horizontally or vertically overlapping characters as well as those connected along non-linear boundaries into finer primitive components. The components are then processed by a classifier and the classifier score is used to determine if the components need to be further segmented. Multiple hypotheses are obtained for each composite character by considering all possible combinations of the classifier results for the primitive components. Word recognition is performed by designing a stochastic finite state automaton (SFSA) that takes into account both classifier scores as well as character frequencies. A novel feature of our approach is that we use sub-character primitive components in the classification stage in order to reduce the number of classes whereas we use an n-gram language model based on the linguistic character units for word recognition.</s0>
</fC01>
<fC02 i1="01" i2="X"><s0>001D02C03</s0>
</fC02>
<fC03 i1="01" i2="X" l="FRE"><s0>Reconnaissance caractère</s0>
<s5>06</s5>
</fC03>
<fC03 i1="01" i2="X" l="ENG"><s0>Character recognition</s0>
<s5>06</s5>
</fC03>
<fC03 i1="01" i2="X" l="SPA"><s0>Reconocimiento carácter</s0>
<s5>06</s5>
</fC03>
<fC03 i1="02" i2="X" l="FRE"><s0>Reconnaissance optique caractère</s0>
<s5>07</s5>
</fC03>
<fC03 i1="02" i2="X" l="ENG"><s0>Optical character recognition</s0>
<s5>07</s5>
</fC03>
<fC03 i1="02" i2="X" l="SPA"><s0>Reconocimento óptico de caracteres</s0>
<s5>07</s5>
</fC03>
<fC03 i1="03" i2="X" l="FRE"><s0>Concordance forme</s0>
<s5>08</s5>
</fC03>
<fC03 i1="03" i2="X" l="ENG"><s0>Pattern matching</s0>
<s5>08</s5>
</fC03>
<fC03 i1="04" i2="X" l="FRE"><s0>Classification</s0>
<s5>09</s5>
</fC03>
<fC03 i1="04" i2="X" l="ENG"><s0>Classification</s0>
<s5>09</s5>
</fC03>
<fC03 i1="04" i2="X" l="SPA"><s0>Clasificación</s0>
<s5>09</s5>
</fC03>
<fC03 i1="05" i2="X" l="FRE"><s0>Mot</s0>
<s5>10</s5>
</fC03>
<fC03 i1="05" i2="X" l="ENG"><s0>Word</s0>
<s5>10</s5>
</fC03>
<fC03 i1="05" i2="X" l="SPA"><s0>Palabra</s0>
<s5>10</s5>
</fC03>
<fC03 i1="06" i2="X" l="FRE"><s0>Langage naturel</s0>
<s5>11</s5>
</fC03>
<fC03 i1="06" i2="X" l="ENG"><s0>Natural language</s0>
<s5>11</s5>
</fC03>
<fC03 i1="06" i2="X" l="SPA"><s0>Lenguaje natural</s0>
<s5>11</s5>
</fC03>
<fC03 i1="07" i2="X" l="FRE"><s0>Automate stochastique</s0>
<s5>12</s5>
</fC03>
<fC03 i1="07" i2="X" l="ENG"><s0>Stochastic automaton</s0>
<s5>12</s5>
</fC03>
<fC03 i1="07" i2="X" l="SPA"><s0>Autómata estocástico</s0>
<s5>12</s5>
</fC03>
<fC03 i1="08" i2="X" l="FRE"><s0>Automate fini</s0>
<s5>13</s5>
</fC03>
<fC03 i1="08" i2="X" l="ENG"><s0>Finite automaton</s0>
<s5>13</s5>
</fC03>
<fC03 i1="08" i2="X" l="SPA"><s0>Autómata estado finito</s0>
<s5>13</s5>
</fC03>
<fC03 i1="09" i2="X" l="FRE"><s0>Machine état fini</s0>
<s5>14</s5>
</fC03>
<fC03 i1="09" i2="X" l="ENG"><s0>Finite state machine</s0>
<s5>14</s5>
</fC03>
<fC03 i1="09" i2="X" l="SPA"><s0>Máquina estado finito</s0>
<s5>14</s5>
</fC03>
<fC03 i1="10" i2="X" l="FRE"><s0>Linguistique</s0>
<s5>15</s5>
</fC03>
<fC03 i1="10" i2="X" l="ENG"><s0>Linguistics</s0>
<s5>15</s5>
</fC03>
<fC03 i1="10" i2="X" l="SPA"><s0>Linguística</s0>
<s5>15</s5>
</fC03>
<fC03 i1="11" i2="X" l="FRE"><s0>Reconnaissance forme</s0>
<s5>16</s5>
</fC03>
<fC03 i1="11" i2="X" l="ENG"><s0>Pattern recognition</s0>
<s5>16</s5>
</fC03>
<fC03 i1="11" i2="X" l="SPA"><s0>Reconocimiento patrón</s0>
<s5>16</s5>
</fC03>
<fC03 i1="12" i2="X" l="FRE"><s0>Traitement image</s0>
<s5>17</s5>
</fC03>
<fC03 i1="12" i2="X" l="ENG"><s0>Image processing</s0>
<s5>17</s5>
</fC03>
<fC03 i1="12" i2="X" l="SPA"><s0>Procesamiento imagen</s0>
<s5>17</s5>
</fC03>
<fC03 i1="13" i2="X" l="FRE"><s0>Segmentation</s0>
<s5>23</s5>
</fC03>
<fC03 i1="13" i2="X" l="ENG"><s0>Segmentation</s0>
<s5>23</s5>
</fC03>
<fC03 i1="13" i2="X" l="SPA"><s0>Segmentación</s0>
<s5>23</s5>
</fC03>
<fC03 i1="14" i2="X" l="FRE"><s0>Approche probabiliste</s0>
<s5>24</s5>
</fC03>
<fC03 i1="14" i2="X" l="ENG"><s0>Probabilistic approach</s0>
<s5>24</s5>
</fC03>
<fC03 i1="14" i2="X" l="SPA"><s0>Enfoque probabilista</s0>
<s5>24</s5>
</fC03>
<fC03 i1="15" i2="X" l="FRE"><s0>Modélisation</s0>
<s5>25</s5>
</fC03>
<fC03 i1="15" i2="X" l="ENG"><s0>Modeling</s0>
<s5>25</s5>
</fC03>
<fC03 i1="15" i2="X" l="SPA"><s0>Modelización</s0>
<s5>25</s5>
</fC03>
<fC03 i1="16" i2="X" l="FRE"><s0>Méthode graphe</s0>
<s5>26</s5>
</fC03>
<fC03 i1="16" i2="X" l="ENG"><s0>Graph method</s0>
<s5>26</s5>
</fC03>
<fC03 i1="16" i2="X" l="SPA"><s0>Método grafo</s0>
<s5>26</s5>
</fC03>
<fC03 i1="17" i2="X" l="FRE"><s0>Théorie graphe</s0>
<s5>27</s5>
</fC03>
<fC03 i1="17" i2="X" l="ENG"><s0>Graph theory</s0>
<s5>27</s5>
</fC03>
<fC03 i1="17" i2="X" l="SPA"><s0>Teoría grafo</s0>
<s5>27</s5>
</fC03>
<fC03 i1="18" i2="X" l="FRE"><s0>.</s0>
<s4>INC</s4>
<s5>82</s5>
</fC03>
<fC03 i1="19" i2="X" l="FRE"><s0>Appariement image</s0>
<s4>CD</s4>
<s5>96</s5>
</fC03>
<fC03 i1="19" i2="X" l="ENG"><s0>Image matching</s0>
<s4>CD</s4>
<s5>96</s5>
</fC03>
<fC03 i1="19" i2="X" l="SPA"><s0>reconocimiento de patrones en imágenes</s0>
<s4>CD</s4>
<s5>96</s5>
</fC03>
<fC03 i1="20" i2="X" l="FRE"><s0>Modèle n gramme</s0>
<s4>CD</s4>
<s5>97</s5>
</fC03>
<fC03 i1="20" i2="X" l="ENG"><s0>N gram model</s0>
<s4>CD</s4>
<s5>97</s5>
</fC03>
<fC03 i1="20" i2="X" l="SPA"><s0>Modelo n grama</s0>
<s4>CD</s4>
<s5>97</s5>
</fC03>
<fN21><s1>123</s1>
</fN21>
<fN44 i1="01"><s1>OTO</s1>
</fN44>
<fN82><s1>OTO</s1>
</fN82>
</pA>
</standard>
<server><NO>PASCAL 10-0180818 INIST</NO>
<ET>Devanagari OCR using a recognition driven segmentation framework and stochastic language models</ET>
<AU>KOMPALLI (Suryaprakash); SETLUR (Srirangaraj); GOVINDARAJU (Venu)</AU>
<AF>Department of Computer Science and Engineering, University at Buffalo, State University of New York/Buffalo/Etats-Unis (1 aut., 2 aut., 3 aut.)</AF>
<DT>Publication en série; Niveau analytique</DT>
<SO>International journal on document analysis and recognition : (Print); ISSN 1433-2833; Allemagne; Da. 2009; Vol. 12; No. 2; Pp. 123-138; Bibl. 47 ref.</SO>
<LA>Anglais</LA>
<EA>This paper describes a novel recognition driven segmentation methodology for Devanagari Optical Character Recognition. Prior approaches have used sequential rules to segment characters followed by template matching for classification. Our method uses a graph representation to segment characters. This method allows us to segment horizontally or vertically overlapping characters as well as those connected along non-linear boundaries into finer primitive components. The components are then processed by a classifier and the classifier score is used to determine if the components need to be further segmented. Multiple hypotheses are obtained for each composite character by considering all possible combinations of the classifier results for the primitive components. Word recognition is performed by designing a stochastic finite state automaton (SFSA) that takes into account both classifier scores as well as character frequencies. A novel feature of our approach is that we use sub-character primitive components in the classification stage in order to reduce the number of classes whereas we use an n-gram language model based on the linguistic character units for word recognition.</EA>
<CC>001D02C03</CC>
<FD>Reconnaissance caractère; Reconnaissance optique caractère; Concordance forme; Classification; Mot; Langage naturel; Automate stochastique; Automate fini; Machine état fini; Linguistique; Reconnaissance forme; Traitement image; Segmentation; Approche probabiliste; Modélisation; Méthode graphe; Théorie graphe; .; Appariement image; Modèle n gramme</FD>
<ED>Character recognition; Optical character recognition; Pattern matching; Classification; Word; Natural language; Stochastic automaton; Finite automaton; Finite state machine; Linguistics; Pattern recognition; Image processing; Segmentation; Probabilistic approach; Modeling; Graph method; Graph theory; Image matching; N gram model</ED>
<SD>Reconocimiento carácter; Reconocimento óptico de caracteres; Clasificación; Palabra; Lenguaje natural; Autómata estocástico; Autómata estado finito; Máquina estado finito; Linguística; Reconocimiento patrón; Procesamiento imagen; Segmentación; Enfoque probabilista; Modelización; Método grafo; Teoría grafo; reconocimiento de patrones en imágenes; Modelo n grama</SD>
<LO>INIST-26790.354000170255530050</LO>
<ID>10-0180818</ID>
</server>
</inist>
</record>

Handling this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/PascalFrancis/Corpus

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000194 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/PascalFrancis/Corpus/biblio.hfd -nk 000194 | SxmlIndent | more

Linking to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    PascalFrancis
   |étape=   Corpus
   |type=    RBID
   |clé=     Pascal:10-0180818
   |texte=   Devanagari OCR using a recognition driven segmentation framework and stochastic language models
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

	OCR exploration server
	Warning: this site is under development! Warning: this site was generated automatically from raw corpora, so the information has not been validated.
