OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Multi-font printed Mongolian document recognition system

Internal identifier: 000216 (PascalFrancis/Corpus); previous: 000215; next: 000217

Authors: LIANGRUI PENG; CHANGSONG LIU; XIAOQING DING; HUA WANG; JIANMING JIN

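Source: Proceedings of SPIE, the International Society for Optical Engineering; ISSN 0277-786X; Vol. 7247; 2009; 72470J.1-72470J.7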

RBID : Pascal:09-0372245

Abstract

Mongolian is one of the major ethnic languages in China. Large amount of Mongolian printed documents need to be digitized in digital library and various applications. Traditional Mongolian script has unique writing style and multi-font-type variations, which bring challenges to Mongolian OCR research. As traditional Mongolian script has some characteristics, for example, one character may be part of another character, we define the character set for recognition according to the segmented components, and the components are combined into characters by rule-based post-processing module. For character recognition, a method based on visual directional feature and multi-level classifiers is presented. For character segmentation, a scheme is used to find the segmentation point by analyzing the properties of projection and connected components. As Mongolian has different font-types which are categorized into two major groups, the parameter of segmentation is adjusted for each group. A font-type classification method for the two font-type group is introduced. For recognition of Mongolian text mixed with Chinese and English, language identification and relevant character recognition kernels are integrated. Experiments show that the presented methods are effective. The text recognition rate is 96.9% on the test samples from practical documents with multi-font-types and mixed scripts.
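
The abstract mentions locating segmentation points from projection properties. As a rough illustration only (this is not the authors' scheme, whose details are not given in this record), the sketch below computes a projection profile of a binarized image and reports the center of each low-ink run as a candidate cut. Rows are summed because traditional Mongolian is written vertically; the function names, the `max_ink` threshold, and the toy image are all assumptions made for this example.

```python
from typing import List

# Hypothetical toy image: 1 = ink, 0 = background, one list per pixel row.
TOY_IMAGE = [
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],  # thin stem joining two glyph components
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],  # thin stem
    [0, 0, 1, 0, 0],  # thin stem
    [0, 1, 1, 0, 0],
]

def projection_profile(image: List[List[int]]) -> List[int]:
    """Count ink pixels in each row of a binarized image."""
    return [sum(row) for row in image]

def candidate_cuts(profile: List[int], max_ink: int = 1) -> List[int]:
    """Return the center index of every run of rows with ink count <= max_ink.

    Runs touching the image border are reported as well.
    """
    cuts = []
    start = None
    for i, ink in enumerate(profile):
        if ink <= max_ink:
            if start is None:
                start = i  # a low-ink run begins here
        elif start is not None:
            cuts.append((start + i - 1) // 2)  # run ended at row i - 1
            start = None
    if start is not None:
        cuts.append((start + len(profile) - 1) // 2)
    return cuts

if __name__ == "__main__":
    print(candidate_cuts(projection_profile(TOY_IMAGE)))
```

A real system would combine such candidates with connected-component analysis and font-group-specific parameters, as the abstract describes; none of that is modeled here.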

Record in standard format (ISO 2709)

See the documentation on the Inist Standard format.

pA  
A01 01  1    @0 0277-786X
A02 01      @0 PSISDG
A03   1    @0 Proc. SPIE Int. Soc. Opt. Eng.
A05       @2 7247
A08 01  1  ENG  @1 Multi-font printed Mongolian document recognition system
A09 01  1  ENG  @1 Document recognition and retrieval XVI : 20-22 January 2009, San Jose, California, USA
A11 01  1    @1 LIANGRUI PENG
A11 02  1    @1 CHANGSONG LIU
A11 03  1    @1 XIAOQING DING
A11 04  1    @1 HUA WANG
A11 05  1    @1 JIANMING JIN
A12 01  1    @1 BERKNER (Kathrin) @9 ed.
A12 02  1    @1 LIKFORMAN-SULEM (Laurence) @9 ed.
A14 01      @1 Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems @2 Beijing, 100084 @3 CHN @Z 1 aut. @Z 2 aut. @Z 3 aut. @Z 4 aut. @Z 5 aut.
A18 01  1    @1 IS & T--the Society for Imaging Science and Technology @3 USA @9 org-cong.
A18 02  1    @1 SPIE @3 USA @9 org-cong.
A18 03  1    @1 Ricoh Innovations @3 INC @9 org-cong.
A20       @2 72470J.1-72470J.7
A21       @1 2009
A23 01      @0 ENG
A25 01      @1 SPIE @2 Bellingham WA
A25 02      @1 IS&T @2 Springfield VA
A26 01      @0 978-0-8194-7497-1
A26 02      @0 0-8194-7497-5
A43 01      @1 INIST @2 21760 @5 354000172953960180
A44       @0 0000 @1 © 2009 INIST-CNRS. All rights reserved.
A45       @0 5 ref.
A47 01  1    @0 09-0372245
A60       @1 P @2 C
A61       @0 A
A64 01  1    @0 Proceedings of SPIE, the International Society for Optical Engineering
A66 01      @0 USA
C01 01    ENG  @0 Mongolian is one of the major ethnic languages in China. Large amount of Mongolian printed documents need to be digitized in digital library and various applications. Traditional Mongolian script has unique writing style and multi-font-type variations, which bring challenges to Mongolian OCR research. As traditional Mongolian script has some characteristics, for example, one character may be part of another character, we define the character set for recognition according to the segmented components, and the components are combined into characters by rule-based post-processing module. For character recognition, a method based on visual directional feature and multi-level classifiers is presented. For character segmentation, a scheme is used to find the segmentation point by analyzing the properties of projection and connected components. As Mongolian has different font-types which are categorized into two major groups, the parameter of segmentation is adjusted for each group. A font-type classification method for the two font-type group is introduced. For recognition of Mongolian text mixed with Chinese and English, language identification and relevant character recognition kernels are integrated. Experiments show that the presented methods are effective. The text recognition rate is 96.9% on the test samples from practical documents with multi-font-types and mixed scripts.
C02 01  3    @0 001B00A30C
C02 02  3    @0 001B40B30V
C03 01  X  FRE  @0 Document imprimé @5 61
C03 01  X  ENG  @0 Printed document @5 61
C03 01  X  SPA  @0 Documento impreso @5 61
C03 02  3  FRE  @0 Chine @2 NG @5 62
C03 02  3  ENG  @0 China @2 NG @5 62
C03 03  3  FRE  @0 Bibliothèque électronique @5 63
C03 03  3  ENG  @0 Digital libraries @5 63
C03 04  3  FRE  @0 Reconnaissance optique caractère @5 64
C03 04  3  ENG  @0 Optical character recognition @5 64
C03 05  3  FRE  @0 Jeu caractère @5 65
C03 05  3  ENG  @0 Character sets @5 65
C03 06  3  FRE  @0 Système expert @5 66
C03 06  3  ENG  @0 Expert systems @5 66
C03 07  3  FRE  @0 Reconnaissance caractère @5 67
C03 07  3  ENG  @0 Character recognition @5 67
C03 08  X  FRE  @0 Système n niveaux @5 68
C03 08  X  ENG  @0 Multilevel system @5 68
C03 08  X  SPA  @0 Sistema n niveles @5 68
C03 09  X  FRE  @0 Classification automatique @5 69
C03 09  X  ENG  @0 Automatic classification @5 69
C03 09  X  SPA  @0 Clasificación automática @5 69
C03 10  3  FRE  @0 0130C @4 INC @5 83
C03 11  3  FRE  @0 Traitement image @4 INC @5 84
C03 12  3  FRE  @0 4230V @4 INC @5 91
N21       @1 264
N44 01      @1 OTO
N82       @1 OTO
pR  
A30 01  1  ENG  @1 Document recognition and retrieval @2 16 @3 San Jose CA USA @4 2009
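
In the listing above, each line starts with a field tag (for example `A25`) and carries its values in subfields introduced by `@` plus a one-character code, which correspond to the `<s1>`, `<s2>`, ... elements of the XML record. The following is a minimal sketch of how such a line could be split, assuming this textual layout holds for every line; the function name `parse_field` is hypothetical, and the indicator columns (such as `01`, `1`, `ENG`) are deliberately ignored.

```python
import re
from typing import List, Tuple

# "@x value" pairs: a value runs until the next " @x " marker or end of line.
SUBFIELD = re.compile(r"@(\w)\s+(.*?)(?=\s+@\w\s|$)")

def parse_field(line: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split one field line into its tag and a list of (code, value) subfields."""
    tag = line.split()[0]
    subfields = [(code, value.strip()) for code, value in SUBFIELD.findall(line)]
    return tag, subfields

if __name__ == "__main__":
    print(parse_field("A25 01      @1 SPIE @2 Bellingham WA"))
```

Repeated codes, such as the several `@Z` subfields of field `A14`, are kept in order because the result is a list of pairs rather than a dictionary.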

Inist format (server)

NO : PASCAL 09-0372245 INIST
ET : Multi-font printed Mongolian document recognition system
AU : LIANGRUI PENG; CHANGSONG LIU; XIAOQING DING; HUA WANG; JIANMING JIN; BERKNER (Kathrin); LIKFORMAN-SULEM (Laurence)
AF : Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems/Beijing, 100084/Chine (1 aut., 2 aut., 3 aut., 4 aut., 5 aut.)
DT : Publication en série; Congrès; Niveau analytique
SO : Proceedings of SPIE, the International Society for Optical Engineering; ISSN 0277-786X; Coden PSISDG; Etats-Unis; Da. 2009; Vol. 7247; 72470J.1-72470J.7; Bibl. 5 ref.
LA : Anglais
EA : Mongolian is one of the major ethnic languages in China. Large amount of Mongolian printed documents need to be digitized in digital library and various applications. Traditional Mongolian script has unique writing style and multi-font-type variations, which bring challenges to Mongolian OCR research. As traditional Mongolian script has some characteristics, for example, one character may be part of another character, we define the character set for recognition according to the segmented components, and the components are combined into characters by rule-based post-processing module. For character recognition, a method based on visual directional feature and multi-level classifiers is presented. For character segmentation, a scheme is used to find the segmentation point by analyzing the properties of projection and connected components. As Mongolian has different font-types which are categorized into two major groups, the parameter of segmentation is adjusted for each group. A font-type classification method for the two font-type group is introduced. For recognition of Mongolian text mixed with Chinese and English, language identification and relevant character recognition kernels are integrated. Experiments show that the presented methods are effective. The text recognition rate is 96.9% on the test samples from practical documents with multi-font-types and mixed scripts.
CC : 001B00A30C; 001B40B30V
FD : Document imprimé; Chine; Bibliothèque électronique; Reconnaissance optique caractère; Jeu caractère; Système expert; Reconnaissance caractère; Système n niveaux; Classification automatique; 0130C; Traitement image; 4230V
ED : Printed document; China; Digital libraries; Optical character recognition; Character sets; Expert systems; Character recognition; Multilevel system; Automatic classification
SD : Documento impreso; Sistema n niveles; Clasificación automática
LO : INIST-21760.354000172953960180
ID : 09-0372245
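
The server listing above uses one `TAG : value` pair per line, with semicolons separating the items of list-like fields. A minimal sketch of reading it into a dictionary is given below. The function name and the `MULTI_VALUED` set are assumptions made for this example (the set is inferred from the listing, not taken from any Inist documentation); other fields are kept as a single string so that free text such as the abstract is never split.

```python
from typing import Dict, List

# Assumption: tags whose values are semicolon-separated lists in this listing.
MULTI_VALUED = {"AU", "DT", "CC", "FD", "ED", "SD"}

SAMPLE = """NO : PASCAL 09-0372245 INIST
ET : Multi-font printed Mongolian document recognition system
AU : LIANGRUI PENG; CHANGSONG LIU; XIAOQING DING; HUA WANG; JIANMING JIN; BERKNER (Kathrin); LIKFORMAN-SULEM (Laurence)
LA : Anglais
CC : 001B00A30C; 001B40B30V"""

def parse_server_record(text: str) -> Dict[str, List[str]]:
    """Map each tag to a list of values (one element for single-valued tags)."""
    record: Dict[str, List[str]] = {}
    for line in text.splitlines():
        tag, sep, value = line.partition(" : ")
        if not sep:
            continue  # skip lines that are not "TAG : value"
        tag = tag.strip()
        if tag in MULTI_VALUED:
            record[tag] = [item.strip() for item in value.split(";") if item.strip()]
        else:
            record[tag] = [value.strip()]
    return record

if __name__ == "__main__":
    for tag, values in parse_server_record(SAMPLE).items():
        print(tag, values)
```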

Links to Exploration step

Pascal:09-0372245

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Multi-font printed Mongolian document recognition system</title>
<author>
<name sortKey="Liangrui Peng" sort="Liangrui Peng" uniqKey="Liangrui Peng" last="Liangrui Peng">LIANGRUI PENG</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Changsong Liu" sort="Changsong Liu" uniqKey="Changsong Liu" last="Changsong Liu">CHANGSONG LIU</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Xiaoqing Ding" sort="Xiaoqing Ding" uniqKey="Xiaoqing Ding" last="Xiaoqing Ding">XIAOQING DING</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Hua Wang" sort="Hua Wang" uniqKey="Hua Wang" last="Hua Wang">HUA WANG</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Jianming Jin" sort="Jianming Jin" uniqKey="Jianming Jin" last="Jianming Jin">JIANMING JIN</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">09-0372245</idno>
<date when="2009">2009</date>
<idno type="stanalyst">PASCAL 09-0372245 INIST</idno>
<idno type="RBID">Pascal:09-0372245</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000216</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Multi-font printed Mongolian document recognition system</title>
<author>
<name sortKey="Liangrui Peng" sort="Liangrui Peng" uniqKey="Liangrui Peng" last="Liangrui Peng">LIANGRUI PENG</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Changsong Liu" sort="Changsong Liu" uniqKey="Changsong Liu" last="Changsong Liu">CHANGSONG LIU</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Xiaoqing Ding" sort="Xiaoqing Ding" uniqKey="Xiaoqing Ding" last="Xiaoqing Ding">XIAOQING DING</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Hua Wang" sort="Hua Wang" uniqKey="Hua Wang" last="Hua Wang">HUA WANG</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author>
<name sortKey="Jianming Jin" sort="Jianming Jin" uniqKey="Jianming Jin" last="Jianming Jin">JIANMING JIN</name>
<affiliation>
<inist:fA14 i1="01">
<s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<title level="j" type="abbreviated">Proc. SPIE Int. Soc. Opt. Eng.</title>
<idno type="ISSN">0277-786X</idno>
<imprint>
<date when="2009">2009</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<title level="j" type="abbreviated">Proc. SPIE Int. Soc. Opt. Eng.</title>
<idno type="ISSN">0277-786X</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Automatic classification</term>
<term>Character recognition</term>
<term>Character sets</term>
<term>China</term>
<term>Digital libraries</term>
<term>Expert systems</term>
<term>Multilevel system</term>
<term>Optical character recognition</term>
<term>Printed document</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Document imprimé</term>
<term>Chine</term>
<term>Bibliothèque électronique</term>
<term>Reconnaissance optique caractère</term>
<term>Jeu caractère</term>
<term>Système expert</term>
<term>Reconnaissance caractère</term>
<term>Système n niveaux</term>
<term>Classification automatique</term>
<term>0130C</term>
<term>Traitement image</term>
<term>4230V</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Mongolian is one of the major ethnic languages in China. Large amount of Mongolian printed documents need to be digitized in digital library and various applications. Traditional Mongolian script has unique writing style and multi-font-type variations, which bring challenges to Mongolian OCR research. As traditional Mongolian script has some characteristics, for example, one character may be part of another character, we define the character set for recognition according to the segmented components, and the components are combined into characters by rule-based post-processing module. For character recognition, a method based on visual directional feature and multi-level classifiers is presented. For character segmentation, a scheme is used to find the segmentation point by analyzing the properties of projection and connected components. As Mongolian has different font-types which are categorized into two major groups, the parameter of segmentation is adjusted for each group. A font-type classification method for the two font-type group is introduced. For recognition of Mongolian text mixed with Chinese and English, language identification and relevant character recognition kernels are integrated. Experiments show that the presented methods are effective. The text recognition rate is 96.9% on the test samples from practical documents with multi-font-types and mixed scripts.</div>
</front>
</TEI>
<inist>
<standard h6="B">
<pA>
<fA01 i1="01" i2="1">
<s0>0277-786X</s0>
</fA01>
<fA02 i1="01">
<s0>PSISDG</s0>
</fA02>
<fA03 i2="1">
<s0>Proc. SPIE Int. Soc. Opt. Eng.</s0>
</fA03>
<fA05>
<s2>7247</s2>
</fA05>
<fA08 i1="01" i2="1" l="ENG">
<s1>Multi-font printed Mongolian document recognition system</s1>
</fA08>
<fA09 i1="01" i2="1" l="ENG">
<s1>Document recognition and retrieval XVI : 20-22 January 2009, San Jose, California, USA</s1>
</fA09>
<fA11 i1="01" i2="1">
<s1>LIANGRUI PENG</s1>
</fA11>
<fA11 i1="02" i2="1">
<s1>CHANGSONG LIU</s1>
</fA11>
<fA11 i1="03" i2="1">
<s1>XIAOQING DING</s1>
</fA11>
<fA11 i1="04" i2="1">
<s1>HUA WANG</s1>
</fA11>
<fA11 i1="05" i2="1">
<s1>JIANMING JIN</s1>
</fA11>
<fA12 i1="01" i2="1">
<s1>BERKNER (Kathrin)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="02" i2="1">
<s1>LIKFORMAN-SULEM (Laurence)</s1>
<s9>ed.</s9>
</fA12>
<fA14 i1="01">
<s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</fA14>
<fA18 i1="01" i2="1">
<s1>IS &amp; T--the Society for Imaging Science and Technology</s1>
<s3>USA</s3>
<s9>org-cong.</s9>
</fA18>
<fA18 i1="02" i2="1">
<s1>SPIE</s1>
<s3>USA</s3>
<s9>org-cong.</s9>
</fA18>
<fA18 i1="03" i2="1">
<s1>Ricoh Innovations</s1>
<s3>INC</s3>
<s9>org-cong.</s9>
</fA18>
<fA20>
<s2>72470J.1-72470J.7</s2>
</fA20>
<fA21>
<s1>2009</s1>
</fA21>
<fA23 i1="01">
<s0>ENG</s0>
</fA23>
<fA25 i1="01">
<s1>SPIE</s1>
<s2>Bellingham WA</s2>
</fA25>
<fA25 i1="02">
<s1>IS&amp;T</s1>
<s2>Springfield VA</s2>
</fA25>
<fA26 i1="01">
<s0>978-0-8194-7497-1</s0>
</fA26>
<fA26 i1="02">
<s0>0-8194-7497-5</s0>
</fA26>
<fA43 i1="01">
<s1>INIST</s1>
<s2>21760</s2>
<s5>354000172953960180</s5>
</fA43>
<fA44>
<s0>0000</s0>
<s1>© 2009 INIST-CNRS. All rights reserved.</s1>
</fA44>
<fA45>
<s0>5 ref.</s0>
</fA45>
<fA47 i1="01" i2="1">
<s0>09-0372245</s0>
</fA47>
<fA60>
<s1>P</s1>
<s2>C</s2>
</fA60>
<fA61>
<s0>A</s0>
</fA61>
<fA64 i1="01" i2="1">
<s0>Proceedings of SPIE, the International Society for Optical Engineering</s0>
</fA64>
<fA66 i1="01">
<s0>USA</s0>
</fA66>
<fC01 i1="01" l="ENG">
<s0>Mongolian is one of the major ethnic languages in China. Large amount of Mongolian printed documents need to be digitized in digital library and various applications. Traditional Mongolian script has unique writing style and multi-font-type variations, which bring challenges to Mongolian OCR research. As traditional Mongolian script has some characteristics, for example, one character may be part of another character, we define the character set for recognition according to the segmented components, and the components are combined into characters by rule-based post-processing module. For character recognition, a method based on visual directional feature and multi-level classifiers is presented. For character segmentation, a scheme is used to find the segmentation point by analyzing the properties of projection and connected components. As Mongolian has different font-types which are categorized into two major groups, the parameter of segmentation is adjusted for each group. A font-type classification method for the two font-type group is introduced. For recognition of Mongolian text mixed with Chinese and English, language identification and relevant character recognition kernels are integrated. Experiments show that the presented methods are effective. The text recognition rate is 96.9% on the test samples from practical documents with multi-font-types and mixed scripts.</s0>
</fC01>
<fC02 i1="01" i2="3">
<s0>001B00A30C</s0>
</fC02>
<fC02 i1="02" i2="3">
<s0>001B40B30V</s0>
</fC02>
<fC03 i1="01" i2="X" l="FRE">
<s0>Document imprimé</s0>
<s5>61</s5>
</fC03>
<fC03 i1="01" i2="X" l="ENG">
<s0>Printed document</s0>
<s5>61</s5>
</fC03>
<fC03 i1="01" i2="X" l="SPA">
<s0>Documento impreso</s0>
<s5>61</s5>
</fC03>
<fC03 i1="02" i2="3" l="FRE">
<s0>Chine</s0>
<s2>NG</s2>
<s5>62</s5>
</fC03>
<fC03 i1="02" i2="3" l="ENG">
<s0>China</s0>
<s2>NG</s2>
<s5>62</s5>
</fC03>
<fC03 i1="03" i2="3" l="FRE">
<s0>Bibliothèque électronique</s0>
<s5>63</s5>
</fC03>
<fC03 i1="03" i2="3" l="ENG">
<s0>Digital libraries</s0>
<s5>63</s5>
</fC03>
<fC03 i1="04" i2="3" l="FRE">
<s0>Reconnaissance optique caractère</s0>
<s5>64</s5>
</fC03>
<fC03 i1="04" i2="3" l="ENG">
<s0>Optical character recognition</s0>
<s5>64</s5>
</fC03>
<fC03 i1="05" i2="3" l="FRE">
<s0>Jeu caractère</s0>
<s5>65</s5>
</fC03>
<fC03 i1="05" i2="3" l="ENG">
<s0>Character sets</s0>
<s5>65</s5>
</fC03>
<fC03 i1="06" i2="3" l="FRE">
<s0>Système expert</s0>
<s5>66</s5>
</fC03>
<fC03 i1="06" i2="3" l="ENG">
<s0>Expert systems</s0>
<s5>66</s5>
</fC03>
<fC03 i1="07" i2="3" l="FRE">
<s0>Reconnaissance caractère</s0>
<s5>67</s5>
</fC03>
<fC03 i1="07" i2="3" l="ENG">
<s0>Character recognition</s0>
<s5>67</s5>
</fC03>
<fC03 i1="08" i2="X" l="FRE">
<s0>Système n niveaux</s0>
<s5>68</s5>
</fC03>
<fC03 i1="08" i2="X" l="ENG">
<s0>Multilevel system</s0>
<s5>68</s5>
</fC03>
<fC03 i1="08" i2="X" l="SPA">
<s0>Sistema n niveles</s0>
<s5>68</s5>
</fC03>
<fC03 i1="09" i2="X" l="FRE">
<s0>Classification automatique</s0>
<s5>69</s5>
</fC03>
<fC03 i1="09" i2="X" l="ENG">
<s0>Automatic classification</s0>
<s5>69</s5>
</fC03>
<fC03 i1="09" i2="X" l="SPA">
<s0>Clasificación automática</s0>
<s5>69</s5>
</fC03>
<fC03 i1="10" i2="3" l="FRE">
<s0>0130C</s0>
<s4>INC</s4>
<s5>83</s5>
</fC03>
<fC03 i1="11" i2="3" l="FRE">
<s0>Traitement image</s0>
<s4>INC</s4>
<s5>84</s5>
</fC03>
<fC03 i1="12" i2="3" l="FRE">
<s0>4230V</s0>
<s4>INC</s4>
<s5>91</s5>
</fC03>
<fN21>
<s1>264</s1>
</fN21>
<fN44 i1="01">
<s1>OTO</s1>
</fN44>
<fN82>
<s1>OTO</s1>
</fN82>
</pA>
<pR>
<fA30 i1="01" i2="1" l="ENG">
<s1>Document recognition and retrieval</s1>
<s2>16</s2>
<s3>San Jose CA USA</s3>
<s4>2009</s4>
</fA30>
</pR>
</standard>
<server>
<NO>PASCAL 09-0372245 INIST</NO>
<ET>Multi-font printed Mongolian document recognition system</ET>
<AU>LIANGRUI PENG; CHANGSONG LIU; XIAOQING DING; HUA WANG; JIANMING JIN; BERKNER (Kathrin); LIKFORMAN-SULEM (Laurence)</AU>
<AF>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems/Beijing, 100084/Chine (1 aut., 2 aut., 3 aut., 4 aut., 5 aut.)</AF>
<DT>Publication en série; Congrès; Niveau analytique</DT>
<SO>Proceedings of SPIE, the International Society for Optical Engineering; ISSN 0277-786X; Coden PSISDG; Etats-Unis; Da. 2009; Vol. 7247; 72470J.1-72470J.7; Bibl. 5 ref.</SO>
<LA>Anglais</LA>
<EA>Mongolian is one of the major ethnic languages in China. Large amount of Mongolian printed documents need to be digitized in digital library and various applications. Traditional Mongolian script has unique writing style and multi-font-type variations, which bring challenges to Mongolian OCR research. As traditional Mongolian script has some characteristics, for example, one character may be part of another character, we define the character set for recognition according to the segmented components, and the components are combined into characters by rule-based post-processing module. For character recognition, a method based on visual directional feature and multi-level classifiers is presented. For character segmentation, a scheme is used to find the segmentation point by analyzing the properties of projection and connected components. As Mongolian has different font-types which are categorized into two major groups, the parameter of segmentation is adjusted for each group. A font-type classification method for the two font-type group is introduced. For recognition of Mongolian text mixed with Chinese and English, language identification and relevant character recognition kernels are integrated. Experiments show that the presented methods are effective. The text recognition rate is 96.9% on the test samples from practical documents with multi-font-types and mixed scripts.</EA>
<CC>001B00A30C; 001B40B30V</CC>
<FD>Document imprimé; Chine; Bibliothèque électronique; Reconnaissance optique caractère; Jeu caractère; Système expert; Reconnaissance caractère; Système n niveaux; Classification automatique; 0130C; Traitement image; 4230V</FD>
<ED>Printed document; China; Digital libraries; Optical character recognition; Character sets; Expert systems; Character recognition; Multilevel system; Automatic classification</ED>
<SD>Documento impreso; Sistema n niveles; Clasificación automática</SD>
<LO>INIST-21760.354000172953960180</LO>
<ID>09-0372245</ID>
</server>
</inist>
</record>
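
The keyword lists in the TEI header of the record above can be read with a standard XML parser. The sketch below does this with Python's `xml.etree.ElementTree` on a small fragment copied from the record. It works on a fragment rather than the full listing because the listing uses an `inist:` prefix without showing its namespace declaration, which a strict parser would reject. The function name `terms_by_scheme` is an assumption made for this example.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Fragment shortened from the <textClass> element of the record.
FRAGMENT = """<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Automatic classification</term>
<term>Character recognition</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Document imprimé</term>
</keywords>
</textClass>"""

def terms_by_scheme(xml_text: str) -> Dict[str, List[str]]:
    """Group <term> texts under the scheme attribute of their <keywords> parent."""
    root = ET.fromstring(xml_text)
    return {
        keywords.get("scheme"): [term.text for term in keywords.findall("term")]
        for keywords in root.findall("keywords")
    }

if __name__ == "__main__":
    print(terms_by_scheme(FRAGMENT))
```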

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/PascalFrancis/Corpus
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000216 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PascalFrancis/Corpus/biblio.hfd -nk 000216 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    PascalFrancis
   |étape=   Corpus
   |type=    RBID
   |clé=     Pascal:09-0372245
   |texte=   Multi-font printed Mongolian document recognition system
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024