Multi-font printed Mongolian document recognition system
Internal identifier: 000A84 (Main/Merge); previous: 000A83; next: 000A85
Authors: LIANGRUI PENG [People's Republic of China]; CHANGSONG LIU [People's Republic of China]; XIAOQING DING [People's Republic of China]; HUA WANG [People's Republic of China]; JIANMING JIN [People's Republic of China]
Source:
- Proceedings of SPIE, the International Society for Optical Engineering [0277-786X]; 2009.
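French descriptors
- Pascal (Inist): Document imprimé, Chine, Bibliothèque électronique, Reconnaissance optique caractère, Jeu caractère, Système expert, Reconnaissance caractère, Système n niveaux, Classification automatique, 0130C, Traitement image, 4230V
English descriptors
- KwdEn: Automatic classification, Character recognition, Character sets, China, Digital libraries, Expert systems, Multilevel system, Optical character recognition, Printed document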
Abstract
Mongolian is one of the major ethnic languages in China. A large number of printed Mongolian documents need to be digitized for digital libraries and various other applications. Traditional Mongolian script has a unique writing style and multi-font-type variations, which pose challenges for Mongolian OCR research. Because traditional Mongolian script has distinctive characteristics (for example, one character may be part of another character), we define the character set for recognition according to the segmented components, and the components are combined into characters by a rule-based post-processing module. For character recognition, a method based on a visual directional feature and multi-level classifiers is presented. For character segmentation, a scheme is used to find the segmentation point by analyzing the properties of projection and connected components. As Mongolian font types are categorized into two major groups, the segmentation parameter is adjusted for each group, and a font-type classification method for the two font-type groups is introduced. For recognition of Mongolian text mixed with Chinese and English, language identification and the relevant character recognition kernels are integrated. Experiments show that the presented methods are effective: the text recognition rate is 96.9% on test samples from practical documents with multiple font types and mixed scripts.
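The abstract does not give the details of the segmentation scheme, so the following is only a minimal sketch of one common projection-based idea, not the authors' method. It assumes a binary word image (1 = ink) oriented so that the writing direction runs down the rows, as in vertically written traditional Mongolian, and it marks rows where only a thin stroke remains as candidate segmentation points. The function names and the `max_ink` threshold are hypothetical; in the spirit of the abstract, such a threshold would be chosen separately for each font-type group.

```python
from typing import List


def row_projection(image: List[List[int]]) -> List[int]:
    """Count ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def candidate_cuts(profile: List[int], max_ink: int) -> List[int]:
    """Return one candidate cut row per run of thin rows.

    A row is "thin" when it contains ink but no more than max_ink pixels.
    For each maximal run of consecutive thin rows, the middle row is kept.
    max_ink is a hypothetical tuning parameter, not a value from the paper.
    """
    cuts: List[int] = []
    start = None
    # A sentinel that is not thin closes a run reaching the last row.
    for y, ink in enumerate(profile + [max_ink + 1]):
        thin = 0 < ink <= max_ink
        if thin and start is None:
            start = y
        elif not thin and start is not None:
            cuts.append((start + y - 1) // 2)
            start = None
    return cuts


# Toy word image: wider components joined by a one-pixel stem.
word = [
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 0, 0],
]

profile = row_projection(word)
print(profile)
print(candidate_cuts(profile, max_ink=1))
```

A projection profile alone cannot tell a joining stroke from a thin part of a character, which is presumably why the abstract pairs projection with connected-component analysis; that second step is not sketched here.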
Links toward previous steps (curation, corpus...)
- to stream PascalFrancis, to step Corpus: 000216
- to stream PascalFrancis, to step Curation: 000563
- to stream PascalFrancis, to step Checkpoint: 000190
Links to Exploration step
Pascal:09-0372245
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Multi-font printed Mongolian document recognition system</title>
<author><name sortKey="Liangrui Peng" sort="Liangrui Peng" uniqKey="Liangrui Peng" last="Liangrui Peng">LIANGRUI PENG</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Changsong Liu" sort="Changsong Liu" uniqKey="Changsong Liu" last="Changsong Liu">CHANGSONG LIU</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Xiaoqing Ding" sort="Xiaoqing Ding" uniqKey="Xiaoqing Ding" last="Xiaoqing Ding">XIAOQING DING</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Hua Wang" sort="Hua Wang" uniqKey="Hua Wang" last="Hua Wang">HUA WANG</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Jianming Jin" sort="Jianming Jin" uniqKey="Jianming Jin" last="Jianming Jin">JIANMING JIN</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">09-0372245</idno>
<date when="2009">2009</date>
<idno type="stanalyst">PASCAL 09-0372245 INIST</idno>
<idno type="RBID">Pascal:09-0372245</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000216</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000563</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000190</idno>
<idno type="wicri:doubleKey">0277-786X:2009:Liangrui Peng:multi:font:printed</idno>
<idno type="wicri:Area/Main/Merge">000A84</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Multi-font printed Mongolian document recognition system</title>
<author><name sortKey="Liangrui Peng" sort="Liangrui Peng" uniqKey="Liangrui Peng" last="Liangrui Peng">LIANGRUI PENG</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Changsong Liu" sort="Changsong Liu" uniqKey="Changsong Liu" last="Changsong Liu">CHANGSONG LIU</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Xiaoqing Ding" sort="Xiaoqing Ding" uniqKey="Xiaoqing Ding" last="Xiaoqing Ding">XIAOQING DING</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Hua Wang" sort="Hua Wang" uniqKey="Hua Wang" last="Hua Wang">HUA WANG</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Jianming Jin" sort="Jianming Jin" uniqKey="Jianming Jin" last="Jianming Jin">JIANMING JIN</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Electronic Engineering, Tsinghua University, Tsinghua National Laboratory for Information Science and Technology State Key Laboratory of Intelligent Technology and Systems</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<title level="j" type="abbreviated">Proc. SPIE Int. Soc. Opt. Eng.</title>
<idno type="ISSN">0277-786X</idno>
<imprint><date when="2009">2009</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<title level="j" type="abbreviated">Proc. SPIE Int. Soc. Opt. Eng.</title>
<idno type="ISSN">0277-786X</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Automatic classification</term>
<term>Character recognition</term>
<term>Character sets</term>
<term>China</term>
<term>Digital libraries</term>
<term>Expert systems</term>
<term>Multilevel system</term>
<term>Optical character recognition</term>
<term>Printed document</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Document imprimé</term>
<term>Chine</term>
<term>Bibliothèque électronique</term>
<term>Reconnaissance optique caractère</term>
<term>Jeu caractère</term>
<term>Système expert</term>
<term>Reconnaissance caractère</term>
<term>Système n niveaux</term>
<term>Classification automatique</term>
<term>0130C</term>
<term>Traitement image</term>
<term>4230V</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Mongolian is one of the major ethnic languages in China. A large number of printed Mongolian documents need to be digitized for digital libraries and various other applications. Traditional Mongolian script has a unique writing style and multi-font-type variations, which pose challenges for Mongolian OCR research. Because traditional Mongolian script has distinctive characteristics (for example, one character may be part of another character), we define the character set for recognition according to the segmented components, and the components are combined into characters by a rule-based post-processing module. For character recognition, a method based on a visual directional feature and multi-level classifiers is presented. For character segmentation, a scheme is used to find the segmentation point by analyzing the properties of projection and connected components. As Mongolian font types are categorized into two major groups, the segmentation parameter is adjusted for each group, and a font-type classification method for the two font-type groups is introduced. For recognition of Mongolian text mixed with Chinese and English, language identification and the relevant character recognition kernels are integrated. Experiments show that the presented methods are effective: the text recognition rate is 96.9% on test samples from practical documents with multiple font types and mixed scripts.</div>
</front>
</TEI>
<affiliations><list><country><li>République populaire de Chine</li>
</country>
</list>
<tree><country name="République populaire de Chine"><noRegion><name sortKey="Liangrui Peng" sort="Liangrui Peng" uniqKey="Liangrui Peng" last="Liangrui Peng">LIANGRUI PENG</name>
</noRegion>
<name sortKey="Changsong Liu" sort="Changsong Liu" uniqKey="Changsong Liu" last="Changsong Liu">CHANGSONG LIU</name>
<name sortKey="Hua Wang" sort="Hua Wang" uniqKey="Hua Wang" last="Hua Wang">HUA WANG</name>
<name sortKey="Jianming Jin" sort="Jianming Jin" uniqKey="Jianming Jin" last="Jianming Jin">JIANMING JIN</name>
<name sortKey="Xiaoqing Ding" sort="Xiaoqing Ding" uniqKey="Xiaoqing Ding" last="Xiaoqing Ding">XIAOQING DING</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000A84 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 000A84 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Merge |type= RBID |clé= Pascal:09-0372245 |texte= Multi-font printed Mongolian document recognition system }}
This area was generated with Dilib version V0.6.32.