OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

A language model based on semantically clustered words in a Chinese character recognition system

Internal identifier: 002581 (Main/Merge); previous: 002580; next: 002582

Authors: Hsi-Jian Lee [People's Republic of China, Taiwan]; Cheng-Huang Tung [People's Republic of China]

Source:

RBID: ISTEX:BC725FD2E89F5F577C2A555911A311F69C4F9DEC

Abstract

This paper presents a new method for clustering the words in a dictionary into word groups. A Chinese character recognition system can then use these groups in a language model to improve the recognition accuracy. In the language model, the number of parameters we must train beforehand can be kept to a reasonable value. The Chinese synonym dictionary Tong2yi4ci2 ci2lin2 providing the semantic features is used to calculate the weights of the semantic attributes of the character-based word classes. The weights of the semantic attributes are next updated according to the words of the Behavior dictionary, which has a rather complete word set. Then, the word classes are clustered to m groups according to the semantic measurement by a greedy method. The words in the Behavior dictionary can finally be assigned to the m groups. The parameter space for the bigram contextual information of the character recognition system is m². From the experimental results, the recognition system with the proposed model has shown better performance than that of a character-based bigram language model.
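To make the m² parameter count concrete, here is a minimal Python sketch of a group-based bigram. It is not the authors' implementation: the toy vocabulary, the hand-written group assignment, the add-one smoothing, and the names `train_group_bigram` and `best_sequence` are all assumptions made for illustration, and the semantic clustering step described in the abstract is not reproduced.

```python
from collections import Counter
from itertools import product


def train_group_bigram(sentences, word_to_group, m):
    """Estimate P(next group | current group); the table has m * m cells."""
    counts = Counter()
    for words in sentences:
        groups = [word_to_group[w] for w in words]
        counts.update(zip(groups, groups[1:]))
    table = {}
    for g1 in range(m):
        # Add-one smoothing (an assumption of this sketch) keeps unseen pairs nonzero.
        row_total = sum(counts[(g1, g2)] for g2 in range(m)) + m
        for g2 in range(m):
            table[(g1, g2)] = (counts[(g1, g2)] + 1) / row_total
    return table


def best_sequence(candidates, word_to_group, table):
    """Return the candidate sequence with the highest group-bigram score."""
    def score(seq):
        groups = [word_to_group[w] for w in seq]
        p = 1.0
        for pair in zip(groups, groups[1:]):
            p *= table[pair]
        return p
    return max(product(*candidates), key=score)


# Hypothetical toy data: five words assigned by hand to m = 3 groups.
word_to_group = {"dog": 0, "cat": 0, "runs": 1, "sleeps": 1, "quickly": 2}
m = 3
sentences = [
    ["dog", "runs", "quickly"],
    ["cat", "sleeps"],
    ["dog", "sleeps", "quickly"],
]
table = train_group_bigram(sentences, word_to_group, m)

# A recognizer that is unsure about the second word offers two candidates.
candidates = [["cat"], ["quickly", "runs"]]
best = best_sequence(candidates, word_to_group, table)
```

In this toy run the pair "cat runs" never occurs in the training sentences, yet it is preferred because its groups were observed together. The table holds m² = 9 entries however large the vocabulary grows.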

Url:
DOI: 10.1016/S0031-3203(96)00154-9

Links to Exploration step

ISTEX:BC725FD2E89F5F577C2A555911A311F69C4F9DEC

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title>A language model based on semantically clustered words in a Chinese character recognition system</title>
<author>
<name sortKey="Lee, Hsi Jian" sort="Lee, Hsi Jian" uniqKey="Lee H" first="Hsi-Jian" last="Lee">Hsi-Jian Lee</name>
</author>
<author>
<name sortKey="Tung, Cheng Huang" sort="Tung, Cheng Huang" uniqKey="Tung C" first="Cheng-Huang" last="Tung">Cheng-Huang Tung</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:BC725FD2E89F5F577C2A555911A311F69C4F9DEC</idno>
<date when="1997" year="1997">1997</date>
<idno type="doi">10.1016/S0031-3203(96)00154-9</idno>
<idno type="url">https://api.istex.fr/document/BC725FD2E89F5F577C2A555911A311F69C4F9DEC/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001C34</idno>
<idno type="wicri:Area/Istex/Curation">001B20</idno>
<idno type="wicri:Area/Istex/Checkpoint">001902</idno>
<idno type="wicri:doubleKey">0031-3203:1997:Lee H:a:language:model</idno>
<idno type="wicri:Area/Main/Merge">002581</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a">A language model based on semantically clustered words in a Chinese character recognition system</title>
<author>
<name sortKey="Lee, Hsi Jian" sort="Lee, Hsi Jian" uniqKey="Lee H" first="Hsi-Jian" last="Lee">Hsi-Jian Lee</name>
<affiliation wicri:level="1">
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Department of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan 30050</wicri:regionArea>
<wicri:noRegion>Taiwan 30050</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Taïwan</country>
</affiliation>
</author>
<author>
<name sortKey="Tung, Cheng Huang" sort="Tung, Cheng Huang" uniqKey="Tung C" first="Cheng-Huang" last="Tung">Cheng-Huang Tung</name>
<affiliation wicri:level="1">
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Department of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan 30050</wicri:regionArea>
<wicri:noRegion>Taiwan 30050</wicri:noRegion>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">Pattern Recognition</title>
<title level="j" type="abbrev">PR</title>
<idno type="ISSN">0031-3203</idno>
<imprint>
<publisher>ELSEVIER</publisher>
<date type="published" when="1996">1996</date>
<biblScope unit="volume">30</biblScope>
<biblScope unit="issue">8</biblScope>
<biblScope unit="page" from="1339">1339</biblScope>
<biblScope unit="page" to="1346">1346</biblScope>
</imprint>
<idno type="ISSN">0031-3203</idno>
</series>
<idno type="istex">BC725FD2E89F5F577C2A555911A311F69C4F9DEC</idno>
<idno type="DOI">10.1016/S0031-3203(96)00154-9</idno>
<idno type="PII">S0031-3203(96)00154-9</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0031-3203</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">This paper presents a new method for clustering the words in a dictionary into word groups. A Chinese character recognition system can then use these groups in a language model to improve the recognition accuracy. In the language model, the number of parameters we must train beforehand can be kept to a reasonable value. The Chinese synonym dictionary Tong2yi4ci2 ci2lin2 providing the semantic features is used to calculate the weights of the semantic attributes of the character-based word classes. The weights of the semantic attributes are next updated according to the words of the Behavior dictionary, which has a rather complete word set. Then, the word classes are clustered to m groups according to the semantic measurement by a greedy method. The words in the Behavior dictionary can finally be assigned to the m groups. The parameter space for the bigram contextual information of the character recognition system is m². From the experimental results, the recognition system with the proposed model has shown better performance than that of a character-based bigram language model.</div>
</front>
</TEI>
</record>
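As an illustration of how a record like this might be read outside Dilib, here is a minimal Python sketch using the standard `xml.etree.ElementTree` module. The record uses the `wicri:` prefix without declaring it, which a namespace-aware parser rejects, so the sketch injects a declaration first. The URI `urn:example:wicri`, the function name `parse_record`, and the shortened `RECORD` string are assumptions of this sketch, not part of the Wicri tooling.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the record; only a few fields are kept for the example.
RECORD = """<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title>A language model based on semantically clustered words in a Chinese character recognition system</title>
<author>
<name sortKey="Lee, Hsi Jian" first="Hsi-Jian" last="Lee">Hsi-Jian Lee</name>
</author>
<author>
<name sortKey="Tung, Cheng Huang" first="Cheng-Huang" last="Tung">Cheng-Huang Tung</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="RBID">ISTEX:BC725FD2E89F5F577C2A555911A311F69C4F9DEC</idno>
<idno type="doi">10.1016/S0031-3203(96)00154-9</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""

# Placeholder namespace URI invented for this sketch; the real one is not
# given in the record.
WICRI_NS = "urn:example:wicri"


def parse_record(text):
    """Declare the wicri prefix on the root element, then pull out key fields."""
    patched = text.replace("<record>", f'<record xmlns:wicri="{WICRI_NS}">', 1)
    root = ET.fromstring(patched)
    stmt = root.find("./TEI/teiHeader/fileDesc/titleStmt")
    pub = root.find("./TEI/teiHeader/fileDesc/publicationStmt")
    return {
        "title": stmt.findtext("title"),
        "authors": [name.text for name in stmt.findall("author/name")],
        "ids": {idno.get("type"): idno.text for idno in pub.findall("idno")},
    }


info = parse_record(RECORD)
```

After this runs, `info["authors"]` holds the two author names and `info["ids"]` maps each `idno` type (here `RBID` and `doi`) to its value.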

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002581 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 002581 | SxmlIndent | more

To link to this page from within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Merge
   |type=    RBID
   |clé=     ISTEX:BC725FD2E89F5F577C2A555911A311F69C4F9DEC
   |texte=   A language model based on semantically clustered words in a Chinese character recognition system
}}
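When many records need such a link, the template call can be generated rather than typed by hand. The following Python sketch reproduces the layout shown above. The function name `explor_link` and its argument names are hypothetical, and the parameter names (`étape`, `clé`, `texte`) are copied from the template as displayed.

```python
def explor_link(wiki, area, flux, step, key_type, key, text):
    """Render an 'Explor lien' template call with values aligned in one column."""
    fields = [
        ("wiki", wiki),
        ("area", area),
        ("flux", flux),
        ("étape", step),
        ("type", key_type),
        ("clé", key),
        ("texte", text),
    ]
    lines = ["{{Explor lien"]
    for name, value in fields:
        # Pad "   |name=" to 13 characters so every value starts in the same column.
        lines.append(f"   |{name}=".ljust(13) + value)
    lines.append("}}")
    return "\n".join(lines)


link = explor_link(
    wiki="Ticri/CIDE",
    area="OcrV1",
    flux="Main",
    step="Merge",
    key_type="RBID",
    key="ISTEX:BC725FD2E89F5F577C2A555911A311F69C4F9DEC",
    text="A language model based on semantically clustered words "
         "in a Chinese character recognition system",
)
```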

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024