OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Multi-font printed Mongolian document recognition system

Internal identifier: 000A75 (Main/Exploration); previous: 000A74; next: 000A76

Authors: LIANGRUI PENG [People's Republic of China]; CHANGSONG LIU [People's Republic of China]; XIAOQING DING [People's Republic of China]; HUA WANG [People's Republic of China]; JIANMING JIN [People's Republic of China]

Source: Proceedings of SPIE, the International Society for Optical Engineering (ISSN 0277-786X), 2009

RBID : Pascal:09-0372245

Abstract

Mongolian is one of the major ethnic languages in China. A large number of printed Mongolian documents need to be digitized for digital libraries and various applications. Traditional Mongolian script has a unique writing style and multi-font-type variations, which bring challenges to Mongolian OCR research. Because traditional Mongolian script has some special characteristics (for example, one character may be part of another character), we define the character set for recognition according to the segmented components, and the components are combined into characters by a rule-based post-processing module. For character recognition, a method based on visual directional features and multi-level classifiers is presented. For character segmentation, a scheme is used to find the segmentation points by analyzing the properties of the projection and connected components. As Mongolian has different font types, which are categorized into two major groups, the segmentation parameter is adjusted for each group. A font-type classification method for the two font-type groups is introduced. For recognition of Mongolian text mixed with Chinese and English, language identification and the relevant character recognition kernels are integrated. Experiments show that the presented methods are effective. The text recognition rate is 96.9% on test samples from practical documents with multiple font types and mixed scripts.
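
The abstract mentions finding segmentation points from the projection of a word image. As a rough illustration only, and not the authors' implementation, the Python sketch below computes a row-wise projection profile of a small binary image and proposes cut rows where the ink count drops to a threshold. It assumes a vertically written word whose glyphs are joined by a thin stem, as in traditional Mongolian script. The function names, the `max_ink` parameter (standing in for the per-font-group segmentation parameter) and the toy image `WORD` are all hypothetical, and the connected-component analysis is not covered.

```python
from typing import List, Tuple

# Toy binary image (1 = ink, 0 = background); purely illustrative data.
WORD = [
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 1],
]


def projection_profile(image: List[List[int]]) -> List[int]:
    """Count the ink pixels in each row of a binary image."""
    return [sum(row) for row in image]


def low_ink_runs(profile: List[int], max_ink: int) -> List[Tuple[int, int]]:
    """Return (start, end) runs of rows with at most max_ink ink pixels.

    The end index is exclusive.
    """
    runs = []
    start = None
    for i, count in enumerate(profile):
        if count <= max_ink:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segmentation_points(image: List[List[int]], max_ink: int) -> List[int]:
    """Propose one cut row per low-ink run that lies inside the word.

    max_ink is a hypothetical tuning parameter (roughly the stem width);
    runs touching the top or bottom edge are treated as margins and skipped.
    """
    profile = projection_profile(image)
    points = []
    for start, end in low_ink_runs(profile, max_ink):
        if start == 0 or end == len(profile):
            continue
        points.append((start + end - 1) // 2)  # middle row of the run
    return points


if __name__ == "__main__":
    print(projection_profile(WORD))
    print(segmentation_points(WORD, max_ink=1))
```

In this sketch, a different `max_ink` value could be chosen for each font-type group, mirroring the abstract's remark that the segmentation parameter is adjusted per group.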



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Multi-font printed Mongolian document recognition system</title>
<author>
<name sortKey="Liangrui Peng" sort="Liangrui Peng" uniqKey="Liangrui Peng" last="Liangrui Peng">LIANGRUI PENG</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Changsong Liu" sort="Changsong Liu" uniqKey="Changsong Liu" last="Changsong Liu">CHANGSONG LIU</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Xiaoqing Ding" sort="Xiaoqing Ding" uniqKey="Xiaoqing Ding" last="Xiaoqing Ding">XIAOQING DING</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Hua Wang" sort="Hua Wang" uniqKey="Hua Wang" last="Hua Wang">HUA WANG</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Jianming Jin" sort="Jianming Jin" uniqKey="Jianming Jin" last="Jianming Jin">JIANMING JIN</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">09-0372245</idno>
<date when="2009">2009</date>
<idno type="stanalyst">PASCAL 09-0372245 INIST</idno>
<idno type="RBID">Pascal:09-0372245</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000216</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000563</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000190</idno>
<idno type="wicri:doubleKey">0277-786X:2009:Liangrui Peng:multi:font:printed</idno>
<idno type="wicri:Area/Main/Merge">000A84</idno>
<idno type="wicri:Area/Main/Curation">000A75</idno>
<idno type="wicri:Area/Main/Exploration">000A75</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Multi-font printed Mongolian document recognition system</title>
<author>
<name sortKey="Liangrui Peng" sort="Liangrui Peng" uniqKey="Liangrui Peng" last="Liangrui Peng">LIANGRUI PENG</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Changsong Liu" sort="Changsong Liu" uniqKey="Changsong Liu" last="Changsong Liu">CHANGSONG LIU</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Xiaoqing Ding" sort="Xiaoqing Ding" uniqKey="Xiaoqing Ding" last="Xiaoqing Ding">XIAOQING DING</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Hua Wang" sort="Hua Wang" uniqKey="Hua Wang" last="Hua Wang">HUA WANG</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Jianming Jin" sort="Jianming Jin" uniqKey="Jianming Jin" last="Jianming Jin">JIANMING JIN</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<title level="j" type="abbreviated">Proc. SPIE Int. Soc. Opt. Eng.</title>
<idno type="ISSN">0277-786X</idno>
<imprint>
<date when="2009">2009</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<title level="j" type="abbreviated">Proc. SPIE Int. Soc. Opt. Eng.</title>
<idno type="ISSN">0277-786X</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Automatic classification</term>
<term>Character recognition</term>
<term>Character sets</term>
<term>China</term>
<term>Digital libraries</term>
<term>Expert systems</term>
<term>Multilevel system</term>
<term>Optical character recognition</term>
<term>Printed document</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Document imprimé</term>
<term>Chine</term>
<term>Bibliothèque électronique</term>
<term>Reconnaissance optique caractère</term>
<term>Jeu caractère</term>
<term>Système expert</term>
<term>Reconnaissance caractère</term>
<term>Système n niveaux</term>
<term>Classification automatique</term>
<term>0130C</term>
<term>Traitement image</term>
<term>4230V</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Mongolian is one of the major ethnic languages in China. Large amount of Mongolian printed documents need to be digitized in digital library and various applications. Traditional Mongolian script has unique writing style and multi-font-type variations, which bring challenges to Mongolian OCR research. As traditional Mongolian script has some characteristics, for example, one character may be part of another character, we define the character set for recognition according to the segmented components, and the components are combined into characters by rule-based post-processing module. For character recognition, a method based on visual directional feature and multi-level classifiers is presented. For character segmentation, a scheme is used to find the segmentation point by analyzing the properties of projection and connected components. As Mongolian has different font-types which are categorized into two major groups, the parameter of segmentation is adjusted for each group. A font-type classification method for the two font-type group is introduced. For recognition of Mongolian text mixed with Chinese and English, language identification and relevant character recognition kernels are integrated. Experiments show that the presented methods are effective. The text recognition rate is 96.9% on the test samples from practical documents with multi-font-types and mixed scripts.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>République populaire de Chine</li>
</country>
</list>
<tree>
<country name="République populaire de Chine">
<noRegion>
<name sortKey="Liangrui Peng" sort="Liangrui Peng" uniqKey="Liangrui Peng" last="Liangrui Peng">LIANGRUI PENG</name>
</noRegion>
<name sortKey="Changsong Liu" sort="Changsong Liu" uniqKey="Changsong Liu" last="Changsong Liu">CHANGSONG LIU</name>
<name sortKey="Hua Wang" sort="Hua Wang" uniqKey="Hua Wang" last="Hua Wang">HUA WANG</name>
<name sortKey="Jianming Jin" sort="Jianming Jin" uniqKey="Jianming Jin" last="Jianming Jin">JIANMING JIN</name>
<name sortKey="Xiaoqing Ding" sort="Xiaoqing Ding" uniqKey="Xiaoqing Ding" last="Xiaoqing Ding">XIAOQING DING</name>
</country>
</tree>
</affiliations>
</record>
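
As shown on this page, the record uses the `wicri:` and `inist:` prefixes without declaring them, so a namespace-aware parser such as Python's `xml.etree.ElementTree` rejects it with an unbound-prefix error. The sketch below shows one possible workaround: wrapping the text in an element that declares the two prefixes. The wrapper element, the `urn:example:...` URIs and the helper names are assumptions made for illustration and are not part of Dilib or Wicri. `RECORD` is a trimmed copy of the record above with two of the five authors.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the record above (two of the five authors, short affiliations).
RECORD = """<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Multi-font printed Mongolian document recognition system</title>
<author>
<name sortKey="Liangrui Peng" sort="Liangrui Peng" uniqKey="Liangrui Peng" last="Liangrui Peng">LIANGRUI PENG</name>
<affiliation wicri:level="1">
<country>République populaire de Chine</country>
</affiliation>
</author>
<author>
<name sortKey="Changsong Liu" sort="Changsong Liu" uniqKey="Changsong Liu" last="Changsong Liu">CHANGSONG LIU</name>
<affiliation wicri:level="1">
<country>République populaire de Chine</country>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="inist">09-0372245</idno>
<idno type="RBID">Pascal:09-0372245</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""

# Placeholder namespace URIs: assumptions, not the real Wicri/INIST namespaces.
WRAPPER_OPEN = (
    '<wrapper xmlns:wicri="urn:example:wicri" '
    'xmlns:inist="urn:example:inist">'
)
WRAPPER_CLOSE = "</wrapper>"


def parse_record(text: str) -> ET.Element:
    """Parse a record after declaring the otherwise unbound prefixes."""
    return ET.fromstring(WRAPPER_OPEN + text + WRAPPER_CLOSE)


def summarize(root: ET.Element) -> dict:
    """Collect the title, author names and RBID from a parsed record."""
    return {
        "title": root.findtext(".//titleStmt/title"),
        "authors": [n.text for n in root.findall(".//titleStmt/author/name")],
        "rbid": root.findtext(".//publicationStmt/idno[@type='RBID']"),
    }


if __name__ == "__main__":
    print(summarize(parse_record(RECORD)))
```

Restricting the queries to `titleStmt` avoids counting each author twice, since the full record repeats the author list under `sourceDesc`.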

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000A75 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000A75 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:09-0372245
   |texte=   Multi-font printed Mongolian document recognition system
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024