Multi-font printed Mongolian document recognition system

Internal identifier: 000216 (PascalFrancis/Corpus); previous: 000215; next: 000217

Authors: LIANGRUI PENG; CHANGSONG LIU; XIAOQING DING; HUA WANG; JIANMING JIN

Source:

Proceedings of SPIE, the International Society for Optical Engineering [ 0277-786X ] ; 2009.

RBID : Pascal:09-0372245

French descriptors

Pascal (Inist)
- Document imprimé, Chine, Bibliothèque électronique, Reconnaissance optique caractère, Jeu caractère, Système expert, Reconnaissance caractère, Système n niveaux, Classification automatique, 0130C, Traitement image, 4230V.

English descriptors

KwdEn :
- Automatic classification, Character recognition, Character sets, China, Digital libraries, Expert systems, Multilevel system, Optical character recognition, Printed document.

Abstract

Mongolian is one of the major ethnic languages in China. A large number of printed Mongolian documents need to be digitized for digital libraries and various other applications. Traditional Mongolian script has a unique writing style and multi-font-type variations, which pose challenges for Mongolian OCR research. Because traditional Mongolian script has particular characteristics (for example, one character may be part of another character), we define the character set for recognition according to the segmented components, and a rule-based post-processing module combines the components into characters. For character recognition, a method based on visual directional features and multi-level classifiers is presented. For character segmentation, segmentation points are found by analyzing the properties of the projection and of connected components. Because Mongolian font types are categorized into two major groups, the segmentation parameter is adjusted for each group. A font-type classification method for the two font-type groups is introduced. For recognition of Mongolian text mixed with Chinese and English, language identification and the relevant character recognition kernels are integrated. Experiments show that the presented methods are effective: the text recognition rate is 96.9% on test samples from practical documents with multiple font types and mixed scripts.
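The abstract describes the segmentation step only in outline. The Python sketch below shows one common way to derive candidate cut points from the projection profile of a vertically written word, with a parameter set chosen per font-type group. It is not the authors' method: it assumes a binarized word image whose rows follow the top-to-bottom writing direction, estimates the stem width as the most frequent non-zero row count, and uses a hypothetical parameter table (`FONT_GROUP_PARAMS`) whose names and values are invented for illustration. The connected-component analysis mentioned in the abstract is omitted.

```python
import numpy as np

# Hypothetical parameters for the two font-type groups (illustrative assumption:
# the abstract only says a segmentation parameter is adjusted per group).
FONT_GROUP_PARAMS = {
    "group_1": {"stem_ratio": 1.0, "min_run": 2},
    "group_2": {"stem_ratio": 1.5, "min_run": 3},
}


def stem_width(profile: np.ndarray) -> int:
    """Most frequent non-zero row count, used as a rough stem-width estimate."""
    nonzero = profile[profile > 0]
    if nonzero.size == 0:
        return 0
    values, counts = np.unique(nonzero, return_counts=True)
    return int(values[np.argmax(counts)])


def candidate_cuts(word: np.ndarray, font_group: str) -> list[int]:
    """Return candidate segmentation rows for one vertically written word.

    word: 2-D binary array (1 = ink); rows follow the top-to-bottom writing
    direction, columns span the width of the word.
    """
    params = FONT_GROUP_PARAMS[font_group]
    profile = word.sum(axis=1)  # projection profile: ink pixels per row
    threshold = stem_width(profile) * params["stem_ratio"]
    thin = (profile > 0) & (profile <= threshold)  # rows holding little more than the stem

    cuts, run_start = [], None
    for row, is_thin in enumerate(list(thin) + [False]):  # sentinel closes the last run
        if is_thin and run_start is None:
            run_start = row
        elif not is_thin and run_start is not None:
            if row - run_start >= params["min_run"]:
                cuts.append((run_start + row - 1) // 2)  # cut at the middle of the run
            run_start = None
    return cuts
```

Rows where only the thin stem carries ink are natural places to cut. Raising `stem_ratio` makes the cutter more permissive and raising `min_run` makes it more conservative, which is one plausible way of tuning segmentation separately for the two font-type groups.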

Record in standard format (ISO 2709)

See the documentation on the Inist Standard format.

A01	`01`	`1`		`@0 0277-786X`
A02	`01`			`@0 PSISDG`
A03		`1`		`@0 Proc. SPIE Int. Soc. Opt. Eng.`
A05				`@2 7247`
A08	`01`	`1`	`ENG`	`@1 Multi-font printed Mongolian document recognition system`
A09	`01`	`1`	`ENG`	`@1 Document recognition and retrieval XVI : 20-22 January 2009, San Jose, California, USA`
A11	`01`	`1`		`@1 LIANGRUI PENG`
A11	`02`	`1`		`@1 CHANGSONG LIU`
A11	`03`	`1`		`@1 XIAOQING DING`
A11	`04`	`1`		`@1 HUA WANG`
A11	`05`	`1`		`@1 JIANMING JIN`
A12	`01`	`1`		`@1 BERKNER (Kathrin) @9 ed.`
A12	`02`	`1`		`@1 LIKFORMAN-SULEM (Laurence) @9 ed.`
A14	`01`			`@1 Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems @2 Beijing, 100084 @3 CHN @Z 1 aut. @Z 2 aut. @Z 3 aut. @Z 4 aut. @Z 5 aut.`
A18	`01`	`1`		`@1 IS & T--the Society for Imaging Science and Technology @3 USA @9 org-cong.`
A18	`02`	`1`		`@1 SPIE @3 USA @9 org-cong.`
A18	`03`	`1`		`@1 Ricoh Innovations @3 INC @9 org-cong.`
A20				`@2 72470J.1-72470J.7`
A21				`@1 2009`
A23	`01`			`@0 ENG`
A25	`01`			`@1 SPIE @2 Bellingham WA`
A25	`02`			`@1 IS&T @2 Springfield VA`
A26	`01`			`@0 978-0-8194-7497-1`
A26	`02`			`@0 0-8194-7497-5`
A43	`01`			`@1 INIST @2 21760 @5 354000172953960180`
A44				`@0 0000 @1 © 2009 INIST-CNRS. All rights reserved.`
A45				`@0 5 ref.`
A47	`01`	`1`		`@0 09-0372245`
A60				`@1 P @2 C`
A61				`@0 A`
A64	`01`	`1`		`@0 Proceedings of SPIE, the International Society for Optical Engineering`
A66	`01`			`@0 USA`
C01	`01`		`ENG`	@0 Mongolian is one of the major ethnic languages in China. A large number of printed Mongolian documents need to be digitized for digital libraries and various other applications. Traditional Mongolian script has a unique writing style and multi-font-type variations, which pose challenges for Mongolian OCR research. Because traditional Mongolian script has particular characteristics (for example, one character may be part of another character), we define the character set for recognition according to the segmented components, and a rule-based post-processing module combines the components into characters. For character recognition, a method based on visual directional features and multi-level classifiers is presented. For character segmentation, segmentation points are found by analyzing the properties of the projection and of connected components. Because Mongolian font types are categorized into two major groups, the segmentation parameter is adjusted for each group. A font-type classification method for the two font-type groups is introduced. For recognition of Mongolian text mixed with Chinese and English, language identification and the relevant character recognition kernels are integrated. Experiments show that the presented methods are effective: the text recognition rate is 96.9% on test samples from practical documents with multiple font types and mixed scripts.
C02	`01`	`3`		`@0 001B00A30C`
C02	`02`	`3`		`@0 001B40B30V`
C03	`01`	`X`	`FRE`	`@0 Document imprimé @5 61`
C03	`01`	`X`	`ENG`	`@0 Printed document @5 61`
C03	`01`	`X`	`SPA`	`@0 Documento impreso @5 61`
C03	`02`	`3`	`FRE`	`@0 Chine @2 NG @5 62`
C03	`02`	`3`	`ENG`	`@0 China @2 NG @5 62`
C03	`03`	`3`	`FRE`	`@0 Bibliothèque électronique @5 63`
C03	`03`	`3`	`ENG`	`@0 Digital libraries @5 63`
C03	`04`	`3`	`FRE`	`@0 Reconnaissance optique caractère @5 64`
C03	`04`	`3`	`ENG`	`@0 Optical character recognition @5 64`
C03	`05`	`3`	`FRE`	`@0 Jeu caractère @5 65`
C03	`05`	`3`	`ENG`	`@0 Character sets @5 65`
C03	`06`	`3`	`FRE`	`@0 Système expert @5 66`
C03	`06`	`3`	`ENG`	`@0 Expert systems @5 66`
C03	`07`	`3`	`FRE`	`@0 Reconnaissance caractère @5 67`
C03	`07`	`3`	`ENG`	`@0 Character recognition @5 67`
C03	`08`	`X`	`FRE`	`@0 Système n niveaux @5 68`
C03	`08`	`X`	`ENG`	`@0 Multilevel system @5 68`
C03	`08`	`X`	`SPA`	`@0 Sistema n niveles @5 68`
C03	`09`	`X`	`FRE`	`@0 Classification automatique @5 69`
C03	`09`	`X`	`ENG`	`@0 Automatic classification @5 69`
C03	`09`	`X`	`SPA`	`@0 Clasificación automática @5 69`
C03	`10`	`3`	`FRE`	`@0 0130C @4 INC @5 83`
C03	`11`	`3`	`FRE`	`@0 Traitement image @4 INC @5 84`
C03	`12`	`3`	`FRE`	`@0 4230V @4 INC @5 91`
N21				`@1 264`
N44	`01`			`@1 OTO`
N82				`@1 OTO`

A30	`01`	`1`	`ENG`	`@1 Document recognition and retrieval @2 16 @3 San Jose CA USA @4 2009`
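In the ISO 2709 display above, each field value is divided into subfields introduced by `@` and a one-character code (`@0`, `@1`, `@Z`, and so on), and a code may repeat, as the five `@Z` author markers in field A14 do. A minimal Python sketch for splitting such a line, assuming the columns are tab-separated as displayed; the helper names are hypothetical and not part of any Inist tool:

```python
import re
from collections import defaultdict

# "@" followed by a one-character subfield code and a space, e.g. "@1 " or "@Z ".
SUBFIELD = re.compile(r"@([0-9A-Za-z]) ")


def parse_subfields(value: str) -> dict[str, list[str]]:
    """Split a value such as '@1 SPIE @2 Bellingham WA' into {code: [texts]}."""
    # re.split with one capturing group returns [prefix, code1, text1, code2, text2, ...]
    parts = SUBFIELD.split(value.strip().strip("`"))
    subfields: dict[str, list[str]] = defaultdict(list)
    for code, text in zip(parts[1::2], parts[2::2]):
        subfields[code].append(text.strip())  # repeated codes keep their order
    return dict(subfields)


def parse_line(line: str) -> tuple[str, list[str], dict[str, list[str]]]:
    """Split one tab-separated display line into (tag, indicator columns, subfields)."""
    tag, *columns, value = line.split("\t")
    return tag, [c.strip("`") for c in columns], parse_subfields(value)
```

Applied to the A14 value, `parse_subfields` collects all five `@Z` entries, in order, under the key `"Z"`.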

Inist format (server)

NO :	PASCAL 09-0372245 INIST
ET :	Multi-font printed Mongolian document recognition system
AU :	LIANGRUI PENG; CHANGSONG LIU; XIAOQING DING; HUA WANG; JIANMING JIN; BERKNER (Kathrin); LIKFORMAN-SULEM (Laurence)
AF :	Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems/Beijing, 100084/China (1 aut., 2 aut., 3 aut., 4 aut., 5 aut.)
DT :	Serial publication; Conference; Analytic level
SO :	Proceedings of SPIE, the International Society for Optical Engineering; ISSN 0277-786X; Coden PSISDG; United States; Da. 2009; Vol. 7247; 72470J.1-72470J.7; Bibl. 5 ref.
LA :	English
EA :	Mongolian is one of the major ethnic languages in China. A large number of printed Mongolian documents need to be digitized for digital libraries and various other applications. Traditional Mongolian script has a unique writing style and multi-font-type variations, which pose challenges for Mongolian OCR research. Because traditional Mongolian script has particular characteristics (for example, one character may be part of another character), we define the character set for recognition according to the segmented components, and a rule-based post-processing module combines the components into characters. For character recognition, a method based on visual directional features and multi-level classifiers is presented. For character segmentation, segmentation points are found by analyzing the properties of the projection and of connected components. Because Mongolian font types are categorized into two major groups, the segmentation parameter is adjusted for each group. A font-type classification method for the two font-type groups is introduced. For recognition of Mongolian text mixed with Chinese and English, language identification and the relevant character recognition kernels are integrated. Experiments show that the presented methods are effective: the text recognition rate is 96.9% on test samples from practical documents with multiple font types and mixed scripts.
CC :	001B00A30C; 001B40B30V
FD :	Document imprimé; Chine; Bibliothèque électronique; Reconnaissance optique caractère; Jeu caractère; Système expert; Reconnaissance caractère; Système n niveaux; Classification automatique; 0130C; Traitement image; 4230V
ED :	Printed document; China; Digital libraries; Optical character recognition; Character sets; Expert systems; Character recognition; Multilevel system; Automatic classification
SD :	Documento impreso; Sistema n niveles; Clasificación automática
LO :	INIST-21760.354000172953960180
ID :	09-0372245
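The server format above uses one `TAG :` line per field, and several fields (AU, DT, CC, FD, ED, SD) hold lists separated by semicolons. A small Python sketch that reads such a display into a dictionary; the set of list-valued tags is inferred from this record alone and is an assumption, as is the function name:

```python
import re

# One field per line: a two-letter tag, " :", then the value (tab-separated in the display).
TAG_LINE = re.compile(r"^([A-Z]{2}) :\s*(.*)$")
# Tags whose values are semicolon-separated lists in this record (assumption).
LIST_FIELDS = {"AU", "DT", "CC", "FD", "ED", "SD"}


def parse_server_record(text: str) -> dict:
    """Map each two-letter tag to its value, splitting list-valued fields on ';'."""
    record = {}
    for line in text.splitlines():
        match = TAG_LINE.match(line)
        if match is None:
            continue  # blank lines and headings are not fields
        tag, value = match.group(1), match.group(2).strip()
        if tag in LIST_FIELDS:
            record[tag] = [item.strip() for item in value.split(";") if item.strip()]
        else:
            record[tag] = value
    return record
```

Single-valued fields such as NO or SO are kept as plain strings, so the internal structure of SO (ISSN, volume, pages) is left for a separate step.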

Links to Exploration step

Pascal:09-0372245

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Multi-font printed Mongolian document recognition system</title>
<author><name sortKey="Liangrui Peng" sort="Liangrui Peng" uniqKey="Liangrui Peng" last="Liangrui Peng">LIANGRUI PENG</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Changsong Liu" sort="Changsong Liu" uniqKey="Changsong Liu" last="Changsong Liu">CHANGSONG LIU</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Xiaoqing Ding" sort="Xiaoqing Ding" uniqKey="Xiaoqing Ding" last="Xiaoqing Ding">XIAOQING DING</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Hua Wang" sort="Hua Wang" uniqKey="Hua Wang" last="Hua Wang">HUA WANG</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Jianming Jin" sort="Jianming Jin" uniqKey="Jianming Jin" last="Jianming Jin">JIANMING JIN</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">09-0372245</idno>
<date when="2009">2009</date>
<idno type="stanalyst">PASCAL 09-0372245 INIST</idno>
<idno type="RBID">Pascal:09-0372245</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000216</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Multi-font printed Mongolian document recognition system</title>
<author><name sortKey="Liangrui Peng" sort="Liangrui Peng" uniqKey="Liangrui Peng" last="Liangrui Peng">LIANGRUI PENG</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Changsong Liu" sort="Changsong Liu" uniqKey="Changsong Liu" last="Changsong Liu">CHANGSONG LIU</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Xiaoqing Ding" sort="Xiaoqing Ding" uniqKey="Xiaoqing Ding" last="Xiaoqing Ding">XIAOQING DING</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Hua Wang" sort="Hua Wang" uniqKey="Hua Wang" last="Hua Wang">HUA WANG</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
<author><name sortKey="Jianming Jin" sort="Jianming Jin" uniqKey="Jianming Jin" last="Jianming Jin">JIANMING JIN</name>
<affiliation><inist:fA14 i1="01"><s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<title level="j" type="abbreviated">Proc. SPIE Int. Soc. Opt. Eng.</title>
<idno type="ISSN">0277-786X</idno>
<imprint><date when="2009">2009</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<title level="j" type="abbreviated">Proc. SPIE Int. Soc. Opt. Eng.</title>
<idno type="ISSN">0277-786X</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Automatic classification</term>
<term>Character recognition</term>
<term>Character sets</term>
<term>China</term>
<term>Digital libraries</term>
<term>Expert systems</term>
<term>Multilevel system</term>
<term>Optical character recognition</term>
<term>Printed document</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Document imprimé</term>
<term>Chine</term>
<term>Bibliothèque électronique</term>
<term>Reconnaissance optique caractère</term>
<term>Jeu caractère</term>
<term>Système expert</term>
<term>Reconnaissance caractère</term>
<term>Système n niveaux</term>
<term>Classification automatique</term>
<term>0130C</term>
<term>Traitement image</term>
<term>4230V</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Mongolian is one of the major ethnic languages in China. A large number of printed Mongolian documents need to be digitized for digital libraries and various other applications. Traditional Mongolian script has a unique writing style and multi-font-type variations, which pose challenges for Mongolian OCR research. Because traditional Mongolian script has particular characteristics (for example, one character may be part of another character), we define the character set for recognition according to the segmented components, and a rule-based post-processing module combines the components into characters. For character recognition, a method based on visual directional features and multi-level classifiers is presented. For character segmentation, segmentation points are found by analyzing the properties of the projection and of connected components. Because Mongolian font types are categorized into two major groups, the segmentation parameter is adjusted for each group. A font-type classification method for the two font-type groups is introduced. For recognition of Mongolian text mixed with Chinese and English, language identification and the relevant character recognition kernels are integrated. Experiments show that the presented methods are effective: the text recognition rate is 96.9% on test samples from practical documents with multiple font types and mixed scripts.</div>
</front>
</TEI>
<inist><standard h6="B"><pA><fA01 i1="01" i2="1"><s0>0277-786X</s0>
</fA01>
<fA02 i1="01"><s0>PSISDG</s0>
</fA02>
<fA03 i2="1"><s0>Proc. SPIE Int. Soc. Opt. Eng.</s0>
</fA03>
<fA05><s2>7247</s2>
</fA05>
<fA08 i1="01" i2="1" l="ENG"><s1>Multi-font printed Mongolian document recognition system</s1>
</fA08>
<fA09 i1="01" i2="1" l="ENG"><s1>Document recognition and retrieval XVI : 20-22 January 2009, San Jose, California, USA</s1>
</fA09>
<fA11 i1="01" i2="1"><s1>LIANGRUI PENG</s1>
</fA11>
<fA11 i1="02" i2="1"><s1>CHANGSONG LIU</s1>
</fA11>
<fA11 i1="03" i2="1"><s1>XIAOQING DING</s1>
</fA11>
<fA11 i1="04" i2="1"><s1>HUA WANG</s1>
</fA11>
<fA11 i1="05" i2="1"><s1>JIANMING JIN</s1>
</fA11>
<fA12 i1="01" i2="1"><s1>BERKNER (Kathrin)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="02" i2="1"><s1>LIKFORMAN-SULEM (Laurence)</s1>
<s9>ed.</s9>
</fA12>
<fA14 i1="01"><s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</fA14>
<fA18 i1="01" i2="1"><s1>IS & T--the Society for Imaging Science and Technology</s1>
<s3>USA</s3>
<s9>org-cong.</s9>
</fA18>
<fA18 i1="02" i2="1"><s1>SPIE</s1>
<s3>USA</s3>
<s9>org-cong.</s9>
</fA18>
<fA18 i1="03" i2="1"><s1>Ricoh Innovations</s1>
<s3>INC</s3>
<s9>org-cong.</s9>
</fA18>
<fA20><s2>72470J.1-72470J.7</s2>
</fA20>
<fA21><s1>2009</s1>
</fA21>
<fA23 i1="01"><s0>ENG</s0>
</fA23>
<fA25 i1="01"><s1>SPIE</s1>
<s2>Bellingham WA</s2>
</fA25>
<fA25 i1="02"><s1>IS&T</s1>
<s2>Springfield VA</s2>
</fA25>
<fA26 i1="01"><s0>978-0-8194-7497-1</s0>
</fA26>
<fA26 i1="02"><s0>0-8194-7497-5</s0>
</fA26>
<fA43 i1="01"><s1>INIST</s1>
<s2>21760</s2>
<s5>354000172953960180</s5>
</fA43>
<fA44><s0>0000</s0>
<s1>© 2009 INIST-CNRS. All rights reserved.</s1>
</fA44>
<fA45><s0>5 ref.</s0>
</fA45>
<fA47 i1="01" i2="1"><s0>09-0372245</s0>
</fA47>
<fA60><s1>P</s1>
<s2>C</s2>
</fA60>
<fA61><s0>A</s0>
</fA61>
<fA64 i1="01" i2="1"><s0>Proceedings of SPIE, the International Society for Optical Engineering</s0>
</fA64>
<fA66 i1="01"><s0>USA</s0>
</fA66>
<fC01 i1="01" l="ENG"><s0>Mongolian is one of the major ethnic languages in China. A large number of printed Mongolian documents need to be digitized for digital libraries and various other applications. Traditional Mongolian script has a unique writing style and multi-font-type variations, which pose challenges for Mongolian OCR research. Because traditional Mongolian script has particular characteristics (for example, one character may be part of another character), we define the character set for recognition according to the segmented components, and a rule-based post-processing module combines the components into characters. For character recognition, a method based on visual directional features and multi-level classifiers is presented. For character segmentation, segmentation points are found by analyzing the properties of the projection and of connected components. Because Mongolian font types are categorized into two major groups, the segmentation parameter is adjusted for each group. A font-type classification method for the two font-type groups is introduced. For recognition of Mongolian text mixed with Chinese and English, language identification and the relevant character recognition kernels are integrated. Experiments show that the presented methods are effective: the text recognition rate is 96.9% on test samples from practical documents with multiple font types and mixed scripts.</s0>
</fC01>
<fC02 i1="01" i2="3"><s0>001B00A30C</s0>
</fC02>
<fC02 i1="02" i2="3"><s0>001B40B30V</s0>
</fC02>
<fC03 i1="01" i2="X" l="FRE"><s0>Document imprimé</s0>
<s5>61</s5>
</fC03>
<fC03 i1="01" i2="X" l="ENG"><s0>Printed document</s0>
<s5>61</s5>
</fC03>
<fC03 i1="01" i2="X" l="SPA"><s0>Documento impreso</s0>
<s5>61</s5>
</fC03>
<fC03 i1="02" i2="3" l="FRE"><s0>Chine</s0>
<s2>NG</s2>
<s5>62</s5>
</fC03>
<fC03 i1="02" i2="3" l="ENG"><s0>China</s0>
<s2>NG</s2>
<s5>62</s5>
</fC03>
<fC03 i1="03" i2="3" l="FRE"><s0>Bibliothèque électronique</s0>
<s5>63</s5>
</fC03>
<fC03 i1="03" i2="3" l="ENG"><s0>Digital libraries</s0>
<s5>63</s5>
</fC03>
<fC03 i1="04" i2="3" l="FRE"><s0>Reconnaissance optique caractère</s0>
<s5>64</s5>
</fC03>
<fC03 i1="04" i2="3" l="ENG"><s0>Optical character recognition</s0>
<s5>64</s5>
</fC03>
<fC03 i1="05" i2="3" l="FRE"><s0>Jeu caractère</s0>
<s5>65</s5>
</fC03>
<fC03 i1="05" i2="3" l="ENG"><s0>Character sets</s0>
<s5>65</s5>
</fC03>
<fC03 i1="06" i2="3" l="FRE"><s0>Système expert</s0>
<s5>66</s5>
</fC03>
<fC03 i1="06" i2="3" l="ENG"><s0>Expert systems</s0>
<s5>66</s5>
</fC03>
<fC03 i1="07" i2="3" l="FRE"><s0>Reconnaissance caractère</s0>
<s5>67</s5>
</fC03>
<fC03 i1="07" i2="3" l="ENG"><s0>Character recognition</s0>
<s5>67</s5>
</fC03>
<fC03 i1="08" i2="X" l="FRE"><s0>Système n niveaux</s0>
<s5>68</s5>
</fC03>
<fC03 i1="08" i2="X" l="ENG"><s0>Multilevel system</s0>
<s5>68</s5>
</fC03>
<fC03 i1="08" i2="X" l="SPA"><s0>Sistema n niveles</s0>
<s5>68</s5>
</fC03>
<fC03 i1="09" i2="X" l="FRE"><s0>Classification automatique</s0>
<s5>69</s5>
</fC03>
<fC03 i1="09" i2="X" l="ENG"><s0>Automatic classification</s0>
<s5>69</s5>
</fC03>
<fC03 i1="09" i2="X" l="SPA"><s0>Clasificación automática</s0>
<s5>69</s5>
</fC03>
<fC03 i1="10" i2="3" l="FRE"><s0>0130C</s0>
<s4>INC</s4>
<s5>83</s5>
</fC03>
<fC03 i1="11" i2="3" l="FRE"><s0>Traitement image</s0>
<s4>INC</s4>
<s5>84</s5>
</fC03>
<fC03 i1="12" i2="3" l="FRE"><s0>4230V</s0>
<s4>INC</s4>
<s5>91</s5>
</fC03>
<fN21><s1>264</s1>
</fN21>
<fN44 i1="01"><s1>OTO</s1>
</fN44>
<fN82><s1>OTO</s1>
</fN82>
</pA>
<pR><fA30 i1="01" i2="1" l="ENG"><s1>Document recognition and retrieval</s1>
<s2>16</s2>
<s3>San Jose CA USA</s3>
<s4>2009</s4>
</fA30>
</pR>
</standard>
<server><NO>PASCAL 09-0372245 INIST</NO>
<ET>Multi-font printed Mongolian document recognition system</ET>
<AU>LIANGRUI PENG; CHANGSONG LIU; XIAOQING DING; HUA WANG; JIANMING JIN; BERKNER (Kathrin); LIKFORMAN-SULEM (Laurence)</AU>
<AF>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems/Beijing, 100084/China (1 aut., 2 aut., 3 aut., 4 aut., 5 aut.)</AF>
<DT>Serial publication; Conference; Analytic level</DT>
<SO>Proceedings of SPIE, the International Society for Optical Engineering; ISSN 0277-786X; Coden PSISDG; United States; Da. 2009; Vol. 7247; 72470J.1-72470J.7; Bibl. 5 ref.</SO>
<LA>English</LA>
<EA>Mongolian is one of the major ethnic languages in China. A large number of printed Mongolian documents need to be digitized for digital libraries and various other applications. Traditional Mongolian script has a unique writing style and multi-font-type variations, which pose challenges for Mongolian OCR research. Because traditional Mongolian script has particular characteristics (for example, one character may be part of another character), we define the character set for recognition according to the segmented components, and a rule-based post-processing module combines the components into characters. For character recognition, a method based on visual directional features and multi-level classifiers is presented. For character segmentation, segmentation points are found by analyzing the properties of the projection and of connected components. Because Mongolian font types are categorized into two major groups, the segmentation parameter is adjusted for each group. A font-type classification method for the two font-type groups is introduced. For recognition of Mongolian text mixed with Chinese and English, language identification and the relevant character recognition kernels are integrated. Experiments show that the presented methods are effective: the text recognition rate is 96.9% on test samples from practical documents with multiple font types and mixed scripts.</EA>
<CC>001B00A30C; 001B40B30V</CC>
<FD>Document imprimé; Chine; Bibliothèque électronique; Reconnaissance optique caractère; Jeu caractère; Système expert; Reconnaissance caractère; Système n niveaux; Classification automatique; 0130C; Traitement image; 4230V</FD>
<ED>Printed document; China; Digital libraries; Optical character recognition; Character sets; Expert systems; Character recognition; Multilevel system; Automatic classification</ED>
<SD>Documento impreso; Sistema n niveles; Clasificación automática</SD>
<LO>INIST-21760.354000172953960180</LO>
<ID>09-0372245</ID>
</server>
</inist>
</record>

To process this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/PascalFrancis/Corpus

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000216 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/PascalFrancis/Corpus/biblio.hfd -nk 000216 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    PascalFrancis
   |étape=   Corpus
   |type=    RBID
   |clé=     Pascal:09-0372245
   |texte=   Multi-font printed Mongolian document recognition system
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

	OCR exploration server
	Warning: this site is under development! Warning: this site was generated automatically from raw corpora, so the information has not been validated.