OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Multi-font printed Mongolian document recognition system

Internal identifier: 000190 (PascalFrancis/Checkpoint); previous: 000189; next: 000191

Authors: LIANGRUI PENG [People's Republic of China]; CHANGSONG LIU [People's Republic of China]; XIAOQING DING [People's Republic of China]; HUA WANG [People's Republic of China]; JIANMING JIN [People's Republic of China]

RBID : Pascal:09-0372245

Abstract

Mongolian is one of the major ethnic languages in China. Large amount of Mongolian printed documents need to be digitized in digital library and various applications. Traditional Mongolian script has unique writing style and multi-font-type variations, which bring challenges to Mongolian OCR research. As traditional Mongolian script has some characteristics, for example, one character may be part of another character, we define the character set for recognition according to the segmented components, and the components are combined into characters by rule-based post-processing module. For character recognition, a method based on visual directional feature and multi-level classifiers is presented. For character segmentation, a scheme is used to find the segmentation point by analyzing the properties of projection and connected components. As Mongolian has different font-types which are categorized into two major groups, the parameter of segmentation is adjusted for each group. A font-type classification method for the two font-type group is introduced. For recognition of Mongolian text mixed with Chinese and English, language identification and relevant character recognition kernels are integrated. Experiments show that the presented methods are effective. The text recognition rate is 96.9% on the test samples from practical documents with multi-font-types and mixed scripts.
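
The projection-based segmentation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a binarized image of one vertically written word stored as a list of rows (1 = ink), and it assumes that rows where only the connecting stem is inked are candidate cut points. The names `row_projection` and `candidate_cuts` and the `stem_width` parameter are hypothetical, and the connected-component analysis and per-font-group parameter tuning described in the abstract are left out.

```python
from typing import List


def row_projection(image: List[List[int]]) -> List[int]:
    """Count ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def candidate_cuts(image: List[List[int]], stem_width: int = 1) -> List[int]:
    """Return the middle row of every run of rows whose ink count is
    between 1 and stem_width, i.e. rows where only the stem is present
    (an assumption of this sketch, not a rule taken from the paper)."""
    profile = row_projection(image)
    cuts: List[int] = []
    run_start = None
    # A sentinel value wider than the stem closes a run that reaches the last row.
    for y, count in enumerate(profile + [stem_width + 1]):
        thin = 0 < count <= stem_width
        if thin and run_start is None:
            run_start = y
        elif not thin and run_start is not None:
            cuts.append((run_start + y - 1) // 2)
            run_start = None
    return cuts


# Toy 9x5 "word": three glyph bodies joined by a one-pixel-wide stem.
word = [
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 1],
]

print(candidate_cuts(word))  # [3, 7]
```

In the toy image, rows 2 to 4 and row 7 contain only the stem, so the sketch proposes one cut in the middle of each run. A real system would presumably need a different `stem_width` for each font-type group, which is the kind of adjustment the abstract describes.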

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Multi-font printed Mongolian document recognition system</title>
<author>
<name sortKey="Liangrui Peng" sort="Liangrui Peng" uniqKey="Liangrui Peng" last="Liangrui Peng">LIANGRUI PENG</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Changsong Liu" sort="Changsong Liu" uniqKey="Changsong Liu" last="Changsong Liu">CHANGSONG LIU</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Xiaoqing Ding" sort="Xiaoqing Ding" uniqKey="Xiaoqing Ding" last="Xiaoqing Ding">XIAOQING DING</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Hua Wang" sort="Hua Wang" uniqKey="Hua Wang" last="Hua Wang">HUA WANG</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Jianming Jin" sort="Jianming Jin" uniqKey="Jianming Jin" last="Jianming Jin">JIANMING JIN</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">09-0372245</idno>
<date when="2009">2009</date>
<idno type="stanalyst">PASCAL 09-0372245 INIST</idno>
<idno type="RBID">Pascal:09-0372245</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000216</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000563</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000190</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Multi-font printed Mongolian document recognition system</title>
<author>
<name sortKey="Liangrui Peng" sort="Liangrui Peng" uniqKey="Liangrui Peng" last="Liangrui Peng">LIANGRUI PENG</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Changsong Liu" sort="Changsong Liu" uniqKey="Changsong Liu" last="Changsong Liu">CHANGSONG LIU</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Xiaoqing Ding" sort="Xiaoqing Ding" uniqKey="Xiaoqing Ding" last="Xiaoqing Ding">XIAOQING DING</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Hua Wang" sort="Hua Wang" uniqKey="Hua Wang" last="Hua Wang">HUA WANG</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Jianming Jin" sort="Jianming Jin" uniqKey="Jianming Jin" last="Jianming Jin">JIANMING JIN</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<title level="j" type="abbreviated">Proc. SPIE Int. Soc. Opt. Eng.</title>
<idno type="ISSN">0277-786X</idno>
<imprint>
<date when="2009">2009</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<title level="j" type="abbreviated">Proc. SPIE Int. Soc. Opt. Eng.</title>
<idno type="ISSN">0277-786X</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Automatic classification</term>
<term>Character recognition</term>
<term>Character sets</term>
<term>China</term>
<term>Digital libraries</term>
<term>Expert systems</term>
<term>Multilevel system</term>
<term>Optical character recognition</term>
<term>Printed document</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Document imprimé</term>
<term>Chine</term>
<term>Bibliothèque électronique</term>
<term>Reconnaissance optique caractère</term>
<term>Jeu caractère</term>
<term>Système expert</term>
<term>Reconnaissance caractère</term>
<term>Système n niveaux</term>
<term>Classification automatique</term>
<term>0130C</term>
<term>Traitement image</term>
<term>4230V</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Mongolian is one of the major ethnic languages in China. Large amount of Mongolian printed documents need to be digitized in digital library and various applications. Traditional Mongolian script has unique writing style and multi-font-type variations, which bring challenges to Mongolian OCR research. As traditional Mongolian script has some characteristics, for example, one character may be part of another character, we define the character set for recognition according to the segmented components, and the components are combined into characters by rule-based post-processing module. For character recognition, a method based on visual directional feature and multi-level classifiers is presented. For character segmentation, a scheme is used to find the segmentation point by analyzing the properties of projection and connected components. As Mongolian has different font-types which are categorized into two major groups, the parameter of segmentation is adjusted for each group. A font-type classification method for the two font-type group is introduced. For recognition of Mongolian text mixed with Chinese and English, language identification and relevant character recognition kernels are integrated. Experiments show that the presented methods are effective. The text recognition rate is 96.9% on the test samples from practical documents with multi-font-types and mixed scripts.</div>
</front>
</TEI>
<inist>
<standard h6="B">
<pA>
<fA01 i1="01" i2="1">
<s0>0277-786X</s0>
</fA01>
<fA02 i1="01">
<s0>PSISDG</s0>
</fA02>
<fA03 i2="1">
<s0>Proc. SPIE Int. Soc. Opt. Eng.</s0>
</fA03>
<fA05>
<s2>7247</s2>
</fA05>
<fA08 i1="01" i2="1" l="ENG">
<s1>Multi-font printed Mongolian document recognition system</s1>
</fA08>
<fA09 i1="01" i2="1" l="ENG">
<s1>Document recognition and retrieval XVI : 20-22 January 2009, San Jose, California, USA</s1>
</fA09>
<fA11 i1="01" i2="1">
<s1>LIANGRUI PENG</s1>
</fA11>
<fA11 i1="02" i2="1">
<s1>CHANGSONG LIU</s1>
</fA11>
<fA11 i1="03" i2="1">
<s1>XIAOQING DING</s1>
</fA11>
<fA11 i1="04" i2="1">
<s1>HUA WANG</s1>
</fA11>
<fA11 i1="05" i2="1">
<s1>JIANMING JIN</s1>
</fA11>
<fA12 i1="01" i2="1">
<s1>BERKNER (Kathrin)</s1>
<s9>ed.</s9>
</fA12>
<fA12 i1="02" i2="1">
<s1>LIKFORMAN-SULEM (Laurence)</s1>
<s9>ed.</s9>
</fA12>
<fA14 i1="01">
<s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</fA14>
<fA18 i1="01" i2="1">
<s1>IS &amp; T--the Society for Imaging Science and Technology</s1>
<s3>USA</s3>
<s9>org-cong.</s9>
</fA18>
<fA18 i1="02" i2="1">
<s1>SPIE</s1>
<s3>USA</s3>
<s9>org-cong.</s9>
</fA18>
<fA18 i1="03" i2="1">
<s1>Ricoh Innovations</s1>
<s3>INC</s3>
<s9>org-cong.</s9>
</fA18>
<fA20>
<s2>72470J.1-72470J.7</s2>
</fA20>
<fA21>
<s1>2009</s1>
</fA21>
<fA23 i1="01">
<s0>ENG</s0>
</fA23>
<fA25 i1="01">
<s1>SPIE</s1>
<s2>Bellingham WA</s2>
</fA25>
<fA25 i1="02">
<s1>IS&amp;T</s1>
<s2>Springfield VA</s2>
</fA25>
<fA26 i1="01">
<s0>978-0-8194-7497-1</s0>
</fA26>
<fA26 i1="02">
<s0>0-8194-7497-5</s0>
</fA26>
<fA43 i1="01">
<s1>INIST</s1>
<s2>21760</s2>
<s5>354000172953960180</s5>
</fA43>
<fA44>
<s0>0000</s0>
<s1>© 2009 INIST-CNRS. All rights reserved.</s1>
</fA44>
<fA45>
<s0>5 ref.</s0>
</fA45>
<fA47 i1="01" i2="1">
<s0>09-0372245</s0>
</fA47>
<fA60>
<s1>P</s1>
<s2>C</s2>
</fA60>
<fA61>
<s0>A</s0>
</fA61>
<fA64 i1="01" i2="1">
<s0>Proceedings of SPIE, the International Society for Optical Engineering</s0>
</fA64>
<fA66 i1="01">
<s0>USA</s0>
</fA66>
<fC01 i1="01" l="ENG">
<s0>Mongolian is one of the major ethnic languages in China. Large amount of Mongolian printed documents need to be digitized in digital library and various applications. Traditional Mongolian script has unique writing style and multi-font-type variations, which bring challenges to Mongolian OCR research. As traditional Mongolian script has some characteristics, for example, one character may be part of another character, we define the character set for recognition according to the segmented components, and the components are combined into characters by rule-based post-processing module. For character recognition, a method based on visual directional feature and multi-level classifiers is presented. For character segmentation, a scheme is used to find the segmentation point by analyzing the properties of projection and connected components. As Mongolian has different font-types which are categorized into two major groups, the parameter of segmentation is adjusted for each group. A font-type classification method for the two font-type group is introduced. For recognition of Mongolian text mixed with Chinese and English, language identification and relevant character recognition kernels are integrated. Experiments show that the presented methods are effective. The text recognition rate is 96.9% on the test samples from practical documents with multi-font-types and mixed scripts.</s0>
</fC01>
<fC02 i1="01" i2="3">
<s0>001B00A30C</s0>
</fC02>
<fC02 i1="02" i2="3">
<s0>001B40B30V</s0>
</fC02>
<fC03 i1="01" i2="X" l="FRE">
<s0>Document imprimé</s0>
<s5>61</s5>
</fC03>
<fC03 i1="01" i2="X" l="ENG">
<s0>Printed document</s0>
<s5>61</s5>
</fC03>
<fC03 i1="01" i2="X" l="SPA">
<s0>Documento impreso</s0>
<s5>61</s5>
</fC03>
<fC03 i1="02" i2="3" l="FRE">
<s0>Chine</s0>
<s2>NG</s2>
<s5>62</s5>
</fC03>
<fC03 i1="02" i2="3" l="ENG">
<s0>China</s0>
<s2>NG</s2>
<s5>62</s5>
</fC03>
<fC03 i1="03" i2="3" l="FRE">
<s0>Bibliothèque électronique</s0>
<s5>63</s5>
</fC03>
<fC03 i1="03" i2="3" l="ENG">
<s0>Digital libraries</s0>
<s5>63</s5>
</fC03>
<fC03 i1="04" i2="3" l="FRE">
<s0>Reconnaissance optique caractère</s0>
<s5>64</s5>
</fC03>
<fC03 i1="04" i2="3" l="ENG">
<s0>Optical character recognition</s0>
<s5>64</s5>
</fC03>
<fC03 i1="05" i2="3" l="FRE">
<s0>Jeu caractère</s0>
<s5>65</s5>
</fC03>
<fC03 i1="05" i2="3" l="ENG">
<s0>Character sets</s0>
<s5>65</s5>
</fC03>
<fC03 i1="06" i2="3" l="FRE">
<s0>Système expert</s0>
<s5>66</s5>
</fC03>
<fC03 i1="06" i2="3" l="ENG">
<s0>Expert systems</s0>
<s5>66</s5>
</fC03>
<fC03 i1="07" i2="3" l="FRE">
<s0>Reconnaissance caractère</s0>
<s5>67</s5>
</fC03>
<fC03 i1="07" i2="3" l="ENG">
<s0>Character recognition</s0>
<s5>67</s5>
</fC03>
<fC03 i1="08" i2="X" l="FRE">
<s0>Système n niveaux</s0>
<s5>68</s5>
</fC03>
<fC03 i1="08" i2="X" l="ENG">
<s0>Multilevel system</s0>
<s5>68</s5>
</fC03>
<fC03 i1="08" i2="X" l="SPA">
<s0>Sistema n niveles</s0>
<s5>68</s5>
</fC03>
<fC03 i1="09" i2="X" l="FRE">
<s0>Classification automatique</s0>
<s5>69</s5>
</fC03>
<fC03 i1="09" i2="X" l="ENG">
<s0>Automatic classification</s0>
<s5>69</s5>
</fC03>
<fC03 i1="09" i2="X" l="SPA">
<s0>Clasificación automática</s0>
<s5>69</s5>
</fC03>
<fC03 i1="10" i2="3" l="FRE">
<s0>0130C</s0>
<s4>INC</s4>
<s5>83</s5>
</fC03>
<fC03 i1="11" i2="3" l="FRE">
<s0>Traitement image</s0>
<s4>INC</s4>
<s5>84</s5>
</fC03>
<fC03 i1="12" i2="3" l="FRE">
<s0>4230V</s0>
<s4>INC</s4>
<s5>91</s5>
</fC03>
<fN21>
<s1>264</s1>
</fN21>
<fN44 i1="01">
<s1>OTO</s1>
</fN44>
<fN82>
<s1>OTO</s1>
</fN82>
</pA>
<pR>
<fA30 i1="01" i2="1" l="ENG">
<s1>Document recognition and retrieval</s1>
<s2>16</s2>
<s3>San Jose CA USA</s3>
<s4>2009</s4>
</fA30>
</pR>
</standard>
</inist>
<affiliations>
<list>
<country>
<li>République populaire de Chine</li>
</country>
</list>
<tree>
<country name="République populaire de Chine">
<noRegion>
<name sortKey="Liangrui Peng" sort="Liangrui Peng" uniqKey="Liangrui Peng" last="Liangrui Peng">LIANGRUI PENG</name>
</noRegion>
<name sortKey="Changsong Liu" sort="Changsong Liu" uniqKey="Changsong Liu" last="Changsong Liu">CHANGSONG LIU</name>
<name sortKey="Hua Wang" sort="Hua Wang" uniqKey="Hua Wang" last="Hua Wang">HUA WANG</name>
<name sortKey="Jianming Jin" sort="Jianming Jin" uniqKey="Jianming Jin" last="Jianming Jin">JIANMING JIN</name>
<name sortKey="Xiaoqing Ding" sort="Xiaoqing Ding" uniqKey="Xiaoqing Ding" last="Xiaoqing Ding">XIAOQING DING</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/PascalFrancis/Checkpoint
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000190 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/PascalFrancis/Checkpoint/biblio.hfd -nk 000190 | SxmlIndent | more
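
Either command prints the XML record shown above. As a hedged sketch of further processing in Python (not part of Dilib), the snippet below parses such a record with the standard library and lists its title and authors. It assumes the record text is available as a string and is well formed (for example, any literal `&` is written as `&amp;`). Because the record uses the `wicri:` and `inist:` prefixes without declaring them, the sketch binds them to placeholder URIs; these URIs are assumptions, not the real namespace names. `parse_record` and `title_and_authors` are hypothetical helper names, and `SAMPLE` is a shortened excerpt of the record.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumed); the record itself does not declare them.
PLACEHOLDER_NS = {"wicri": "urn:example:wicri", "inist": "urn:example:inist"}


def parse_record(xml_text: str) -> ET.Element:
    """Declare the undeclared prefixes on <record>, then parse the text."""
    decl = " ".join(f'xmlns:{p}="{u}"' for p, u in PLACEHOLDER_NS.items())
    patched = xml_text.replace("<record>", f"<record {decl}>", 1)
    return ET.fromstring(patched)


def title_and_authors(record: ET.Element):
    """Read the title and the author names from the first titleStmt."""
    stmt = record.find(".//titleStmt")
    title = stmt.findtext("title")
    authors = [name.text for name in stmt.findall("author/name")]
    return title, authors


# Shortened excerpt of the record, kept small for illustration.
SAMPLE = """<record><TEI><teiHeader><fileDesc><titleStmt>
<title xml:lang="en" level="a">Multi-font printed Mongolian document recognition system</title>
<author>
<name sortKey="Liangrui Peng">LIANGRUI PENG</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s3>CHN</s3></inist:fA14></affiliation>
</author>
<author>
<name sortKey="Changsong Liu">CHANGSONG LIU</name>
</author>
</titleStmt></fileDesc></teiHeader></TEI></record>"""

title, authors = title_and_authors(parse_record(SAMPLE))
print(title)
print(authors)
```

Patching the root tag keeps the sketch short; if the real namespace URIs are known, declaring those would be the cleaner choice.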

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    PascalFrancis
   |étape=   Checkpoint
   |type=    RBID
   |clé=     Pascal:09-0372245
   |texte=   Multi-font printed Mongolian document recognition system
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024