Exploration server on OCR

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information it contains has therefore not been validated.

Devanagari OCR using a recognition driven segmentation framework and stochastic language models

Internal identifier: 000194 (PascalFrancis/Corpus)

Authors: Suryaprakash Kompalli; Srirangaraj Setlur; Venu Govindaraju

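Source: International journal on document analysis and recognition (Print), vol. 12, no. 2, pp. 123-138, 2009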

RBID : Pascal:10-0180818

Abstract

This paper describes a novel recognition-driven segmentation methodology for Devanagari Optical Character Recognition. Prior approaches used sequential rules to segment characters, followed by template matching for classification. Our method uses a graph representation to segment characters. It can split horizontally or vertically overlapping characters, as well as characters connected along non-linear boundaries, into finer primitive components. The components are then processed by a classifier, and the classifier score is used to determine whether the components need to be segmented further. Multiple hypotheses are obtained for each composite character by considering all possible combinations of the classifier results for its primitive components. Word recognition is performed by a stochastic finite state automaton (SFSA) that takes into account both classifier scores and character frequencies. A novel feature of our approach is that we use sub-character primitive components in the classification stage to reduce the number of classes, whereas for word recognition we use an n-gram language model based on linguistic character units.
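The abstract describes two stages: enumerating every combination of classifier results for a composite character's primitive components, then choosing a word with an automaton that weighs classifier scores against character-unit n-gram statistics. The Python sketch below illustrates that idea under loudly stated assumptions: it is not the authors' SFSA; the component labels, probabilities and bigram table are invented toy values; the helper names (`character_hypotheses`, `decode_word`) are hypothetical; primitive labels are simply concatenated into a character unit; and a bigram model with a fixed probability floor stands in for whatever n-gram order and smoothing the paper actually uses. Decoding is a standard Viterbi search in the log domain.

```python
import itertools
import math

# Hypothetical types: a primitive component is a list of (label, probability)
# candidates returned by a classifier; a composite character is a list of components.
Component = list[tuple[str, float]]
CompositeChar = list[Component]


def character_hypotheses(components: CompositeChar) -> list[tuple[str, float]]:
    """Enumerate every combination of classifier results for the primitive
    components of one composite character; return (unit, log_score) pairs."""
    hypotheses = []
    for combo in itertools.product(*components):
        unit = "".join(label for label, _ in combo)  # assumption: units = joined labels
        log_score = sum(math.log(p) for _, p in combo)
        hypotheses.append((unit, log_score))
    return hypotheses


def bigram_log_prob(prev: str, cur: str, bigrams: dict, floor: float = 1e-6) -> float:
    """Log P(cur | prev) from a toy bigram table over character units; the
    fixed floor for unseen pairs is a crude stand-in for real smoothing."""
    return math.log(bigrams.get((prev, cur), floor))


def decode_word(word: list[CompositeChar], bigrams: dict, lm_weight: float = 1.0):
    """Viterbi search: states are character units, emissions are classifier
    log scores, transitions are (weighted) bigram log probabilities."""
    best = {"<s>": (0.0, [])}  # unit -> (best log score so far, best path)
    for components in word:
        next_best = {}
        for unit, cls_score in character_hypotheses(components):
            for prev, (score, path) in best.items():
                total = score + cls_score + lm_weight * bigram_log_prob(prev, unit, bigrams)
                if unit not in next_best or total > next_best[unit][0]:
                    next_best[unit] = (total, path + [unit])
        best = next_best
    # Close the word with an end-of-word transition and keep the best path.
    final_score, final_path = max(
        (score + lm_weight * bigram_log_prob(prev, "</s>", bigrams), path)
        for prev, (score, path) in best.items()
    )
    return final_path, final_score


# Toy word: the first composite character has two primitive components,
# the second has one. All labels and numbers are hypothetical.
toy_word = [
    [[("k", 0.6), ("p", 0.4)], [("a", 0.9), ("i", 0.1)]],
    [[("m", 0.5), ("l", 0.5)]],
]
toy_bigrams = {("<s>", "ka"): 0.2, ("<s>", "pa"): 0.3, ("ka", "m"): 0.1,
               ("pa", "l"): 0.5, ("m", "</s>"): 0.5, ("l", "</s>"): 0.5}

path, score = decode_word(toy_word, toy_bigrams)
print(path)  # ['pa', 'l']
```

In this toy case the classifier alone prefers `ka` for the first character (0.54 against 0.36), but the bigram statistics make `pa` followed by `l` the better-scoring word. `lm_weight` is a hypothetical knob for trading classifier confidence against language-model evidence.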

Record in standard format (ISO 2709)

See the documentation on the Inist Standard format.

pA  
A01 01  1    @0 1433-2833
A03   1    @0 Int. j. doc. anal. recognit. : (Print)
A05       @2 12
A06       @2 2
A08 01  1  ENG  @1 Devanagari OCR using a recognition driven segmentation framework and stochastic language models
A11 01  1    @1 KOMPALLI (Suryaprakash)
A11 02  1    @1 SETLUR (Srirangaraj)
A11 03  1    @1 GOVINDARAJU (Venu)
A14 01      @1 Department of Computer Science and Engineering, University at Buffalo, State University of New York @2 Buffalo @3 USA @Z 1 aut. @Z 2 aut. @Z 3 aut.
A20       @1 123-138
A21       @1 2009
A23 01      @0 ENG
A43 01      @1 INIST @2 26790 @5 354000170255530050
A44       @0 0000 @1 © 2010 INIST-CNRS. All rights reserved.
A45       @0 47 ref.
A47 01  1    @0 10-0180818
A60       @1 P
A61       @0 A
A64 01  1    @0 International journal on document analysis and recognition : (Print)
A66 01      @0 DEU
C01 01    ENG  @0 This paper describes a novel recognition driven segmentation methodology for Devanagari Optical Character Recognition. Prior approaches have used sequential rules to segment characters followed by template matching for classification. Our method uses a graph representation to segment characters. This method allows us to segment horizontally or vertically overlapping characters as well as those connected along non-linear boundaries into finer primitive components. The components are then processed by a classifier and the classifier score is used to determine if the components need to be further segmented. Multiple hypotheses are obtained for each composite character by considering all possible combinations of the classifier results for the primitive components. Word recognition is performed by designing a stochastic finite state automaton (SFSA) that takes into account both classifier scores as well as character frequencies. A novel feature of our approach is that we use sub-character primitive components in the classification stage in order to reduce the number of classes whereas we use an n-gram language model based on the linguistic character units for word recognition.
C02 01  X    @0 001D02C03
C03 01  X  FRE  @0 Reconnaissance caractère @5 06
C03 01  X  ENG  @0 Character recognition @5 06
C03 01  X  SPA  @0 Reconocimiento carácter @5 06
C03 02  X  FRE  @0 Reconnaissance optique caractère @5 07
C03 02  X  ENG  @0 Optical character recognition @5 07
C03 02  X  SPA  @0 Reconocimento óptico de caracteres @5 07
C03 03  X  FRE  @0 Concordance forme @5 08
C03 03  X  ENG  @0 Pattern matching @5 08
C03 04  X  FRE  @0 Classification @5 09
C03 04  X  ENG  @0 Classification @5 09
C03 04  X  SPA  @0 Clasificación @5 09
C03 05  X  FRE  @0 Mot @5 10
C03 05  X  ENG  @0 Word @5 10
C03 05  X  SPA  @0 Palabra @5 10
C03 06  X  FRE  @0 Langage naturel @5 11
C03 06  X  ENG  @0 Natural language @5 11
C03 06  X  SPA  @0 Lenguaje natural @5 11
C03 07  X  FRE  @0 Automate stochastique @5 12
C03 07  X  ENG  @0 Stochastic automaton @5 12
C03 07  X  SPA  @0 Autómata estocástico @5 12
C03 08  X  FRE  @0 Automate fini @5 13
C03 08  X  ENG  @0 Finite automaton @5 13
C03 08  X  SPA  @0 Autómata estado finito @5 13
C03 09  X  FRE  @0 Machine état fini @5 14
C03 09  X  ENG  @0 Finite state machine @5 14
C03 09  X  SPA  @0 Máquina estado finito @5 14
C03 10  X  FRE  @0 Linguistique @5 15
C03 10  X  ENG  @0 Linguistics @5 15
C03 10  X  SPA  @0 Linguística @5 15
C03 11  X  FRE  @0 Reconnaissance forme @5 16
C03 11  X  ENG  @0 Pattern recognition @5 16
C03 11  X  SPA  @0 Reconocimiento patrón @5 16
C03 12  X  FRE  @0 Traitement image @5 17
C03 12  X  ENG  @0 Image processing @5 17
C03 12  X  SPA  @0 Procesamiento imagen @5 17
C03 13  X  FRE  @0 Segmentation @5 23
C03 13  X  ENG  @0 Segmentation @5 23
C03 13  X  SPA  @0 Segmentación @5 23
C03 14  X  FRE  @0 Approche probabiliste @5 24
C03 14  X  ENG  @0 Probabilistic approach @5 24
C03 14  X  SPA  @0 Enfoque probabilista @5 24
C03 15  X  FRE  @0 Modélisation @5 25
C03 15  X  ENG  @0 Modeling @5 25
C03 15  X  SPA  @0 Modelización @5 25
C03 16  X  FRE  @0 Méthode graphe @5 26
C03 16  X  ENG  @0 Graph method @5 26
C03 16  X  SPA  @0 Método grafo @5 26
C03 17  X  FRE  @0 Théorie graphe @5 27
C03 17  X  ENG  @0 Graph theory @5 27
C03 17  X  SPA  @0 Teoría grafo @5 27
C03 18  X  FRE  @0 . @4 INC @5 82
C03 19  X  FRE  @0 Appariement image @4 CD @5 96
C03 19  X  ENG  @0 Image matching @4 CD @5 96
C03 19  X  SPA  @0 reconocimiento de patrones en imágenes @4 CD @5 96
C03 20  X  FRE  @0 Modèle n gramme @4 CD @5 97
C03 20  X  ENG  @0 N gram model @4 CD @5 97
C03 20  X  SPA  @0 Modelo n grama @4 CD @5 97
N21       @1 123
N44 01      @1 OTO
N82       @1 OTO
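The notice above follows a visible pattern: a three-character field tag, optional indicator columns, an optional three-letter language code (ENG, FRE, SPA), and subfields introduced by `@` plus a one-character code, some of which repeat (such as `@Z` in A14). The Python sketch below splits one displayed line along that pattern. It is inferred only from the lines shown here, not from the ISO 2709 specification or Inist's format documentation, and `parse_notice_line` is a hypothetical helper.

```python
import re

# Assumption: a subfield is "@" + one code character + a space + its value, and
# values never contain " @<code> " themselves (true for the lines in this notice).
SUBFIELD = re.compile(r"@(\w) (.*?)(?= @\w |$)")


def parse_notice_line(line: str):
    """Split one displayed notice line into tag, indicators, language and
    subfields. Returns None for lines without subfields (e.g. the 'pA' marker)."""
    head, sep, body = line.partition("@")
    if not sep:
        return None
    tokens = head.split()
    tag, rest = tokens[0], tokens[1:]
    lang = None
    # Inferred, not specified: a trailing three-letter alphabetic token is a language code.
    if rest and len(rest[-1]) == 3 and rest[-1].isalpha():
        lang = rest.pop()
    # Repeated codes (e.g. several @Z author markers) are kept, in order.
    subfields = [(m.group(1), m.group(2).strip()) for m in SUBFIELD.finditer("@" + body)]
    return {"tag": tag, "indicators": rest, "lang": lang, "subfields": subfields}


entry = parse_notice_line("C03 01  X  FRE  @0 Reconnaissance caractère @5 06")
print(entry)
# {'tag': 'C03', 'indicators': ['01', 'X'], 'lang': 'FRE', 'subfields': [('0', 'Reconnaissance caractère'), ('5', '06')]}
```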

Inist format (server)

NO : PASCAL 10-0180818 INIST
ET : Devanagari OCR using a recognition driven segmentation framework and stochastic language models
AU : KOMPALLI (Suryaprakash); SETLUR (Srirangaraj); GOVINDARAJU (Venu)
AF : Department of Computer Science and Engineering, University at Buffalo, State University of New York/Buffalo/USA (1 aut., 2 aut., 3 aut.)
DT : Serial publication; Analytic level
SO : International journal on document analysis and recognition : (Print); ISSN 1433-2833; Germany; Da. 2009; Vol. 12; No. 2; Pp. 123-138; Bibl. 47 ref.
LA : English
EA : This paper describes a novel recognition driven segmentation methodology for Devanagari Optical Character Recognition. Prior approaches have used sequential rules to segment characters followed by template matching for classification. Our method uses a graph representation to segment characters. This method allows us to segment horizontally or vertically overlapping characters as well as those connected along non-linear boundaries into finer primitive components. The components are then processed by a classifier and the classifier score is used to determine if the components need to be further segmented. Multiple hypotheses are obtained for each composite character by considering all possible combinations of the classifier results for the primitive components. Word recognition is performed by designing a stochastic finite state automaton (SFSA) that takes into account both classifier scores as well as character frequencies. A novel feature of our approach is that we use sub-character primitive components in the classification stage in order to reduce the number of classes whereas we use an n-gram language model based on the linguistic character units for word recognition.
CC : 001D02C03
FD : Reconnaissance caractère; Reconnaissance optique caractère; Concordance forme; Classification; Mot; Langage naturel; Automate stochastique; Automate fini; Machine état fini; Linguistique; Reconnaissance forme; Traitement image; Segmentation; Approche probabiliste; Modélisation; Méthode graphe; Théorie graphe; .; Appariement image; Modèle n gramme
ED : Character recognition; Optical character recognition; Pattern matching; Classification; Word; Natural language; Stochastic automaton; Finite automaton; Finite state machine; Linguistics; Pattern recognition; Image processing; Segmentation; Probabilistic approach; Modeling; Graph method; Graph theory; Image matching; N gram model
SD : Reconocimiento carácter; Reconocimento óptico de caracteres; Clasificación; Palabra; Lenguaje natural; Autómata estocástico; Autómata estado finito; Máquina estado finito; Linguística; Reconocimiento patrón; Procesamiento imagen; Segmentación; Enfoque probabilista; Modelización; Método grafo; Teoría grafo; reconocimiento de patrones en imágenes; Modelo n grama
LO : INIST-26790.354000170255530050
ID : 10-0180818
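In the server format each line is a two-letter field code, ` : `, and a value. Fields such as AU, FD, ED and SD hold several values separated by semicolons, and SO packs the date, volume, issue and pages into one string. The Python sketch below reads a trimmed sample of such lines. The helper names (`parse_server_record`, `split_multi`, `parse_source`) and the SO regular expressions are assumptions drawn from this single record and may not cover other records.

```python
import re


def parse_server_record(text: str) -> dict[str, str]:
    """Map each two-letter field code (NO, ET, AU, SO, ...) to its raw value."""
    record = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")  # first " : " only; SO contains another
        if sep and len(key) == 2 and key.isalpha() and key.isupper():
            record[key] = value.strip()
    return record


def split_multi(value: str) -> list[str]:
    """Split a semicolon-separated field such as AU, FD, ED or SD."""
    return [item.strip() for item in value.split(";") if item.strip()]


def parse_source(so: str) -> dict[str, str]:
    """Pull year, volume, issue and pages out of an SO field
    (patterns assumed from this single record)."""
    patterns = {"year": r"Da\. (\d{4})", "volume": r"Vol\. (\d+)",
                "issue": r"No\. (\d+)", "pages": r"Pp\. (\d+-\d+)"}
    out = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, so)
        if match:
            out[name] = match.group(1)
    return out


sample = """NO : PASCAL 10-0180818 INIST
AU : KOMPALLI (Suryaprakash); SETLUR (Srirangaraj); GOVINDARAJU (Venu)
SO : International journal on document analysis and recognition : (Print); ISSN 1433-2833; Germany; Da. 2009; Vol. 12; No. 2; Pp. 123-138; Bibl. 47 ref."""

rec = parse_server_record(sample)
print(split_multi(rec["AU"]))   # ['KOMPALLI (Suryaprakash)', 'SETLUR (Srirangaraj)', 'GOVINDARAJU (Venu)']
print(parse_source(rec["SO"]))  # {'year': '2009', 'volume': '12', 'issue': '2', 'pages': '123-138'}
```

Splitting on the first ` : ` matters because the SO value itself contains ` : (Print)`.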

Links to Exploration step

Pascal:10-0180818

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Devanagari OCR using a recognition driven segmentation framework and stochastic language models</title>
<author>
<name sortKey="Kompalli, Suryaprakash" sort="Kompalli, Suryaprakash" uniqKey="Kompalli S" first="Suryaprakash" last="Kompalli">Suryaprakash Kompalli</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Department of Computer Science and Engineering, University at Buffalo, State University of New York</s1>
<s2>Buffalo</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Setlur, Srirangaraj" sort="Setlur, Srirangaraj" uniqKey="Setlur S" first="Srirangaraj" last="Setlur">Srirangaraj Setlur</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Department of Computer Science and Engineering, University at Buffalo, State University of New York</s1>
<s2>Buffalo</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Govindaraju, Venu" sort="Govindaraju, Venu" uniqKey="Govindaraju V" first="Venu" last="Govindaraju">Venu Govindaraju</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Department of Computer Science and Engineering, University at Buffalo, State University of New York</s1>
<s2>Buffalo</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">10-0180818</idno>
<date when="2009">2009</date>
<idno type="stanalyst">PASCAL 10-0180818 INIST</idno>
<idno type="RBID">Pascal:10-0180818</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000194</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Devanagari OCR using a recognition driven segmentation framework and stochastic language models</title>
<author>
<name sortKey="Kompalli, Suryaprakash" sort="Kompalli, Suryaprakash" uniqKey="Kompalli S" first="Suryaprakash" last="Kompalli">Suryaprakash Kompalli</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Department of Computer Science and Engineering, University at Buffalo, State University of New York</s1>
<s2>Buffalo</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Setlur, Srirangaraj" sort="Setlur, Srirangaraj" uniqKey="Setlur S" first="Srirangaraj" last="Setlur">Srirangaraj Setlur</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Department of Computer Science and Engineering, University at Buffalo, State University of New York</s1>
<s2>Buffalo</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Govindaraju, Venu" sort="Govindaraju, Venu" uniqKey="Govindaraju V" first="Venu" last="Govindaraju">Venu Govindaraju</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Department of Computer Science and Engineering, University at Buffalo, State University of New York</s1>
<s2>Buffalo</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">International journal on document analysis and recognition : (Print)</title>
<title level="j" type="abbreviated">Int. j. doc. anal. recognit. : (Print)</title>
<idno type="ISSN">1433-2833</idno>
<imprint>
<date when="2009">2009</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">International journal on document analysis and recognition : (Print)</title>
<title level="j" type="abbreviated">Int. j. doc. anal. recognit. : (Print)</title>
<idno type="ISSN">1433-2833</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Character recognition</term>
<term>Classification</term>
<term>Finite automaton</term>
<term>Finite state machine</term>
<term>Graph method</term>
<term>Graph theory</term>
<term>Image matching</term>
<term>Image processing</term>
<term>Linguistics</term>
<term>Modeling</term>
<term>N gram model</term>
<term>Natural language</term>
<term>Optical character recognition</term>
<term>Pattern matching</term>
<term>Pattern recognition</term>
<term>Probabilistic approach</term>
<term>Segmentation</term>
<term>Stochastic automaton</term>
<term>Word</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Reconnaissance caractère</term>
<term>Reconnaissance optique caractère</term>
<term>Concordance forme</term>
<term>Classification</term>
<term>Mot</term>
<term>Langage naturel</term>
<term>Automate stochastique</term>
<term>Automate fini</term>
<term>Machine état fini</term>
<term>Linguistique</term>
<term>Reconnaissance forme</term>
<term>Traitement image</term>
<term>Segmentation</term>
<term>Approche probabiliste</term>
<term>Modélisation</term>
<term>Méthode graphe</term>
<term>Théorie graphe</term>
<term>.</term>
<term>Appariement image</term>
<term>Modèle n gramme</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">This paper describes a novel recognition driven segmentation methodology for Devanagari Optical Character Recognition. Prior approaches have used sequential rules to segment characters followed by template matching for classification. Our method uses a graph representation to segment characters. This method allows us to segment horizontally or vertically overlapping characters as well as those connected along non-linear boundaries into finer primitive components. The components are then processed by a classifier and the classifier score is used to determine if the components need to be further segmented. Multiple hypotheses are obtained for each composite character by considering all possible combinations of the classifier results for the primitive components. Word recognition is performed by designing a stochastic finite state automaton (SFSA) that takes into account both classifier scores as well as character frequencies. A novel feature of our approach is that we use sub-character primitive components in the classification stage in order to reduce the number of classes whereas we use an n-gram language model based on the linguistic character units for word recognition.</div>
</front>
</TEI>
<inist>
<standard h6="B">
<pA>
<fA01 i1="01" i2="1">
<s0>1433-2833</s0>
</fA01>
<fA03 i2="1">
<s0>Int. j. doc. anal. recognit. : (Print)</s0>
</fA03>
<fA05>
<s2>12</s2>
</fA05>
<fA06>
<s2>2</s2>
</fA06>
<fA08 i1="01" i2="1" l="ENG">
<s1>Devanagari OCR using a recognition driven segmentation framework and stochastic language models</s1>
</fA08>
<fA11 i1="01" i2="1">
<s1>KOMPALLI (Suryaprakash)</s1>
</fA11>
<fA11 i1="02" i2="1">
<s1>SETLUR (Srirangaraj)</s1>
</fA11>
<fA11 i1="03" i2="1">
<s1>GOVINDARAJU (Venu)</s1>
</fA11>
<fA14 i1="01">
<s1>Department of Computer Science and Engineering, University at Buffalo, State University of New York</s1>
<s2>Buffalo</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</fA14>
<fA20>
<s1>123-138</s1>
</fA20>
<fA21>
<s1>2009</s1>
</fA21>
<fA23 i1="01">
<s0>ENG</s0>
</fA23>
<fA43 i1="01">
<s1>INIST</s1>
<s2>26790</s2>
<s5>354000170255530050</s5>
</fA43>
<fA44>
<s0>0000</s0>
<s1>© 2010 INIST-CNRS. All rights reserved.</s1>
</fA44>
<fA45>
<s0>47 ref.</s0>
</fA45>
<fA47 i1="01" i2="1">
<s0>10-0180818</s0>
</fA47>
<fA60>
<s1>P</s1>
</fA60>
<fA61>
<s0>A</s0>
</fA61>
<fA64 i1="01" i2="1">
<s0>International journal on document analysis and recognition : (Print)</s0>
</fA64>
<fA66 i1="01">
<s0>DEU</s0>
</fA66>
<fC01 i1="01" l="ENG">
<s0>This paper describes a novel recognition driven segmentation methodology for Devanagari Optical Character Recognition. Prior approaches have used sequential rules to segment characters followed by template matching for classification. Our method uses a graph representation to segment characters. This method allows us to segment horizontally or vertically overlapping characters as well as those connected along non-linear boundaries into finer primitive components. The components are then processed by a classifier and the classifier score is used to determine if the components need to be further segmented. Multiple hypotheses are obtained for each composite character by considering all possible combinations of the classifier results for the primitive components. Word recognition is performed by designing a stochastic finite state automaton (SFSA) that takes into account both classifier scores as well as character frequencies. A novel feature of our approach is that we use sub-character primitive components in the classification stage in order to reduce the number of classes whereas we use an n-gram language model based on the linguistic character units for word recognition.</s0>
</fC01>
<fC02 i1="01" i2="X">
<s0>001D02C03</s0>
</fC02>
<fC03 i1="01" i2="X" l="FRE">
<s0>Reconnaissance caractère</s0>
<s5>06</s5>
</fC03>
<fC03 i1="01" i2="X" l="ENG">
<s0>Character recognition</s0>
<s5>06</s5>
</fC03>
<fC03 i1="01" i2="X" l="SPA">
<s0>Reconocimiento carácter</s0>
<s5>06</s5>
</fC03>
<fC03 i1="02" i2="X" l="FRE">
<s0>Reconnaissance optique caractère</s0>
<s5>07</s5>
</fC03>
<fC03 i1="02" i2="X" l="ENG">
<s0>Optical character recognition</s0>
<s5>07</s5>
</fC03>
<fC03 i1="02" i2="X" l="SPA">
<s0>Reconocimento óptico de caracteres</s0>
<s5>07</s5>
</fC03>
<fC03 i1="03" i2="X" l="FRE">
<s0>Concordance forme</s0>
<s5>08</s5>
</fC03>
<fC03 i1="03" i2="X" l="ENG">
<s0>Pattern matching</s0>
<s5>08</s5>
</fC03>
<fC03 i1="04" i2="X" l="FRE">
<s0>Classification</s0>
<s5>09</s5>
</fC03>
<fC03 i1="04" i2="X" l="ENG">
<s0>Classification</s0>
<s5>09</s5>
</fC03>
<fC03 i1="04" i2="X" l="SPA">
<s0>Clasificación</s0>
<s5>09</s5>
</fC03>
<fC03 i1="05" i2="X" l="FRE">
<s0>Mot</s0>
<s5>10</s5>
</fC03>
<fC03 i1="05" i2="X" l="ENG">
<s0>Word</s0>
<s5>10</s5>
</fC03>
<fC03 i1="05" i2="X" l="SPA">
<s0>Palabra</s0>
<s5>10</s5>
</fC03>
<fC03 i1="06" i2="X" l="FRE">
<s0>Langage naturel</s0>
<s5>11</s5>
</fC03>
<fC03 i1="06" i2="X" l="ENG">
<s0>Natural language</s0>
<s5>11</s5>
</fC03>
<fC03 i1="06" i2="X" l="SPA">
<s0>Lenguaje natural</s0>
<s5>11</s5>
</fC03>
<fC03 i1="07" i2="X" l="FRE">
<s0>Automate stochastique</s0>
<s5>12</s5>
</fC03>
<fC03 i1="07" i2="X" l="ENG">
<s0>Stochastic automaton</s0>
<s5>12</s5>
</fC03>
<fC03 i1="07" i2="X" l="SPA">
<s0>Autómata estocástico</s0>
<s5>12</s5>
</fC03>
<fC03 i1="08" i2="X" l="FRE">
<s0>Automate fini</s0>
<s5>13</s5>
</fC03>
<fC03 i1="08" i2="X" l="ENG">
<s0>Finite automaton</s0>
<s5>13</s5>
</fC03>
<fC03 i1="08" i2="X" l="SPA">
<s0>Autómata estado finito</s0>
<s5>13</s5>
</fC03>
<fC03 i1="09" i2="X" l="FRE">
<s0>Machine état fini</s0>
<s5>14</s5>
</fC03>
<fC03 i1="09" i2="X" l="ENG">
<s0>Finite state machine</s0>
<s5>14</s5>
</fC03>
<fC03 i1="09" i2="X" l="SPA">
<s0>Máquina estado finito</s0>
<s5>14</s5>
</fC03>
<fC03 i1="10" i2="X" l="FRE">
<s0>Linguistique</s0>
<s5>15</s5>
</fC03>
<fC03 i1="10" i2="X" l="ENG">
<s0>Linguistics</s0>
<s5>15</s5>
</fC03>
<fC03 i1="10" i2="X" l="SPA">
<s0>Linguística</s0>
<s5>15</s5>
</fC03>
<fC03 i1="11" i2="X" l="FRE">
<s0>Reconnaissance forme</s0>
<s5>16</s5>
</fC03>
<fC03 i1="11" i2="X" l="ENG">
<s0>Pattern recognition</s0>
<s5>16</s5>
</fC03>
<fC03 i1="11" i2="X" l="SPA">
<s0>Reconocimiento patrón</s0>
<s5>16</s5>
</fC03>
<fC03 i1="12" i2="X" l="FRE">
<s0>Traitement image</s0>
<s5>17</s5>
</fC03>
<fC03 i1="12" i2="X" l="ENG">
<s0>Image processing</s0>
<s5>17</s5>
</fC03>
<fC03 i1="12" i2="X" l="SPA">
<s0>Procesamiento imagen</s0>
<s5>17</s5>
</fC03>
<fC03 i1="13" i2="X" l="FRE">
<s0>Segmentation</s0>
<s5>23</s5>
</fC03>
<fC03 i1="13" i2="X" l="ENG">
<s0>Segmentation</s0>
<s5>23</s5>
</fC03>
<fC03 i1="13" i2="X" l="SPA">
<s0>Segmentación</s0>
<s5>23</s5>
</fC03>
<fC03 i1="14" i2="X" l="FRE">
<s0>Approche probabiliste</s0>
<s5>24</s5>
</fC03>
<fC03 i1="14" i2="X" l="ENG">
<s0>Probabilistic approach</s0>
<s5>24</s5>
</fC03>
<fC03 i1="14" i2="X" l="SPA">
<s0>Enfoque probabilista</s0>
<s5>24</s5>
</fC03>
<fC03 i1="15" i2="X" l="FRE">
<s0>Modélisation</s0>
<s5>25</s5>
</fC03>
<fC03 i1="15" i2="X" l="ENG">
<s0>Modeling</s0>
<s5>25</s5>
</fC03>
<fC03 i1="15" i2="X" l="SPA">
<s0>Modelización</s0>
<s5>25</s5>
</fC03>
<fC03 i1="16" i2="X" l="FRE">
<s0>Méthode graphe</s0>
<s5>26</s5>
</fC03>
<fC03 i1="16" i2="X" l="ENG">
<s0>Graph method</s0>
<s5>26</s5>
</fC03>
<fC03 i1="16" i2="X" l="SPA">
<s0>Método grafo</s0>
<s5>26</s5>
</fC03>
<fC03 i1="17" i2="X" l="FRE">
<s0>Théorie graphe</s0>
<s5>27</s5>
</fC03>
<fC03 i1="17" i2="X" l="ENG">
<s0>Graph theory</s0>
<s5>27</s5>
</fC03>
<fC03 i1="17" i2="X" l="SPA">
<s0>Teoría grafo</s0>
<s5>27</s5>
</fC03>
<fC03 i1="18" i2="X" l="FRE">
<s0>.</s0>
<s4>INC</s4>
<s5>82</s5>
</fC03>
<fC03 i1="19" i2="X" l="FRE">
<s0>Appariement image</s0>
<s4>CD</s4>
<s5>96</s5>
</fC03>
<fC03 i1="19" i2="X" l="ENG">
<s0>Image matching</s0>
<s4>CD</s4>
<s5>96</s5>
</fC03>
<fC03 i1="19" i2="X" l="SPA">
<s0>reconocimiento de patrones en imágenes</s0>
<s4>CD</s4>
<s5>96</s5>
</fC03>
<fC03 i1="20" i2="X" l="FRE">
<s0>Modèle n gramme</s0>
<s4>CD</s4>
<s5>97</s5>
</fC03>
<fC03 i1="20" i2="X" l="ENG">
<s0>N gram model</s0>
<s4>CD</s4>
<s5>97</s5>
</fC03>
<fC03 i1="20" i2="X" l="SPA">
<s0>Modelo n grama</s0>
<s4>CD</s4>
<s5>97</s5>
</fC03>
<fN21>
<s1>123</s1>
</fN21>
<fN44 i1="01">
<s1>OTO</s1>
</fN44>
<fN82>
<s1>OTO</s1>
</fN82>
</pA>
</standard>
<server>
<NO>PASCAL 10-0180818 INIST</NO>
<ET>Devanagari OCR using a recognition driven segmentation framework and stochastic language models</ET>
<AU>KOMPALLI (Suryaprakash); SETLUR (Srirangaraj); GOVINDARAJU (Venu)</AU>
<AF>Department of Computer Science and Engineering, University at Buffalo, State University of New York/Buffalo/Etats-Unis (1 aut., 2 aut., 3 aut.)</AF>
<DT>Publication en série; Niveau analytique</DT>
<SO>International journal on document analysis and recognition : (Print); ISSN 1433-2833; Allemagne; Da. 2009; Vol. 12; No. 2; Pp. 123-138; Bibl. 47 ref.</SO>
<LA>Anglais</LA>
<EA>This paper describes a novel recognition driven segmentation methodology for Devanagari Optical Character Recognition. Prior approaches have used sequential rules to segment characters followed by template matching for classification. Our method uses a graph representation to segment characters. This method allows us to segment horizontally or vertically overlapping characters as well as those connected along non-linear boundaries into finer primitive components. The components are then processed by a classifier and the classifier score is used to determine if the components need to be further segmented. Multiple hypotheses are obtained for each composite character by considering all possible combinations of the classifier results for the primitive components. Word recognition is performed by designing a stochastic finite state automaton (SFSA) that takes into account both classifier scores as well as character frequencies. A novel feature of our approach is that we use sub-character primitive components in the classification stage in order to reduce the number of classes whereas we use an n-gram language model based on the linguistic character units for word recognition.</EA>
<CC>001D02C03</CC>
<FD>Reconnaissance caractère; Reconnaissance optique caractère; Concordance forme; Classification; Mot; Langage naturel; Automate stochastique; Automate fini; Machine état fini; Linguistique; Reconnaissance forme; Traitement image; Segmentation; Approche probabiliste; Modélisation; Méthode graphe; Théorie graphe; .; Appariement image; Modèle n gramme</FD>
<ED>Character recognition; Optical character recognition; Pattern matching; Classification; Word; Natural language; Stochastic automaton; Finite automaton; Finite state machine; Linguistics; Pattern recognition; Image processing; Segmentation; Probabilistic approach; Modeling; Graph method; Graph theory; Image matching; N gram model</ED>
<SD>Reconocimiento carácter; Reconocimento óptico de caracteres; Clasificación; Palabra; Lenguaje natural; Autómata estocástico; Autómata estado finito; Máquina estado finito; Linguística; Reconocimiento patrón; Procesamiento imagen; Segmentación; Enfoque probabilista; Modelización; Método grafo; Teoría grafo; reconocimiento de patrones en imágenes; Modelo n grama</SD>
<LO>INIST-26790.354000170255530050</LO>
<ID>10-0180818</ID>
</server>
</inist>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/PascalFrancis/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000194 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PascalFrancis/Corpus/biblio.hfd -nk 000194 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    PascalFrancis
   |étape=   Corpus
   |type=    RBID
   |clé=     Pascal:10-0180818
   |texte=   Devanagari OCR using a recognition driven segmentation framework and stochastic language models
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024