A language model based on semantically clustered words in a Chinese character recognition system
Internal identifier: 002581 (Main/Merge); previous: 002580; next: 002582
Authors: Hsi-Jian Lee [People's Republic of China, Taiwan]; Cheng-Huang Tung [People's Republic of China]
Source:
- Pattern Recognition [ 0031-3203 ] ; 1996.
Abstract
This paper presents a new method for clustering the words in a dictionary into word groups. A Chinese character recognition system can then use these groups in a language model to improve recognition accuracy, while the number of parameters that must be trained beforehand is kept to a reasonable value. The Chinese synonym dictionary Tong2yi4ci2 ci2lin2, which provides the semantic features, is used to calculate the weights of the semantic attributes of the character-based word classes. These weights are then updated according to the words of the Behavior dictionary, which has a rather complete word set. Next, the word classes are clustered into m groups by a greedy method according to the semantic measurement. Finally, the words in the Behavior dictionary are assigned to the m groups. The parameter space for the bigram contextual information of the character recognition system is m^2. In the experiments, the recognition system with the proposed model performed better than one using a character-based bigram language model.
DOI: 10.1016/S0031-3203(96)00154-9
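The abstract does not specify the similarity measure or the exact greedy procedure, so the following is only a minimal sketch of the general idea, under stated assumptions: each word class is represented by a vector of semantic-attribute weights, similarity is taken to be cosine similarity, and the greedy step merges the two most similar groups until m groups remain. The function names, the toy word classes and the weight values are all hypothetical and are not taken from the paper. The last lines show why the bigram parameter space is m^2: one parameter per ordered pair of groups.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_cluster(class_weights, m):
    """Merge word classes until only m groups remain (illustrative only).

    class_weights maps a hypothetical word-class name to its vector of
    semantic-attribute weights. At each step the two most similar groups
    are merged and their weight vectors are summed.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    groups = [([name], list(vec)) for name, vec in class_weights.items()]
    while len(groups) > m:
        # Greedy choice: the pair of groups with the highest similarity.
        i, j = max(
            combinations(range(len(groups)), 2),
            key=lambda p: cosine(groups[p[0]][1], groups[p[1]][1]),
        )
        names = groups[i][0] + groups[j][0]
        vec = [a + b for a, b in zip(groups[i][1], groups[j][1])]
        groups = [g for k, g in enumerate(groups) if k not in (i, j)]
        groups.append((names, vec))
    return [sorted(names) for names, _ in groups]


# Toy data (assumed): four word classes, three semantic attributes.
toy = {
    "A": [1.0, 0.0, 0.0],
    "B": [0.9, 0.1, 0.0],
    "C": [0.0, 1.0, 0.0],
    "D": [0.0, 0.8, 0.2],
}
m = 2
groups = greedy_cluster(toy, m)
bigram_parameters = m * m  # one parameter per ordered pair of groups
print(groups, bigram_parameters)
```

With a word-level bigram the table would grow with the square of the vocabulary size, whereas here it grows only with the square of the number of groups, which is the reduction in trainable parameters that the abstract refers to.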
Links toward previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 001C34
- to stream Istex, to step Curation: 001B20
- to stream Istex, to step Checkpoint: 001902
Links to Exploration step
ISTEX:BC725FD2E89F5F577C2A555911A311F69C4F9DEC
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title>A language model based on semantically clustered words in a Chinese character recognition system</title>
<author><name sortKey="Lee, Hsi Jian" sort="Lee, Hsi Jian" uniqKey="Lee H" first="Hsi-Jian" last="Lee">Hsi-Jian Lee</name>
</author>
<author><name sortKey="Tung, Cheng Huang" sort="Tung, Cheng Huang" uniqKey="Tung C" first="Cheng-Huang" last="Tung">Cheng-Huang Tung</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:BC725FD2E89F5F577C2A555911A311F69C4F9DEC</idno>
<date when="1997" year="1997">1997</date>
<idno type="doi">10.1016/S0031-3203(96)00154-9</idno>
<idno type="url">https://api.istex.fr/document/BC725FD2E89F5F577C2A555911A311F69C4F9DEC/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001C34</idno>
<idno type="wicri:Area/Istex/Curation">001B20</idno>
<idno type="wicri:Area/Istex/Checkpoint">001902</idno>
<idno type="wicri:doubleKey">0031-3203:1997:Lee H:a:language:model</idno>
<idno type="wicri:Area/Main/Merge">002581</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a">A language model based on semantically clustered words in a Chinese character recognition system</title>
<author><name sortKey="Lee, Hsi Jian" sort="Lee, Hsi Jian" uniqKey="Lee H" first="Hsi-Jian" last="Lee">Hsi-Jian Lee</name>
<affiliation wicri:level="1"><country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Department of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan 30050</wicri:regionArea>
<wicri:noRegion>Taiwan 30050</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Taïwan</country>
</affiliation>
</author>
<author><name sortKey="Tung, Cheng Huang" sort="Tung, Cheng Huang" uniqKey="Tung C" first="Cheng-Huang" last="Tung">Cheng-Huang Tung</name>
<affiliation wicri:level="1"><country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Department of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan 30050</wicri:regionArea>
<wicri:noRegion>Taiwan 30050</wicri:noRegion>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="j">Pattern Recognition</title>
<title level="j" type="abbrev">PR</title>
<idno type="ISSN">0031-3203</idno>
<imprint><publisher>ELSEVIER</publisher>
<date type="published" when="1996">1996</date>
<biblScope unit="volume">30</biblScope>
<biblScope unit="issue">8</biblScope>
<biblScope unit="page" from="1339">1339</biblScope>
<biblScope unit="page" to="1346">1346</biblScope>
</imprint>
<idno type="ISSN">0031-3203</idno>
</series>
<idno type="istex">BC725FD2E89F5F577C2A555911A311F69C4F9DEC</idno>
<idno type="DOI">10.1016/S0031-3203(96)00154-9</idno>
<idno type="PII">S0031-3203(96)00154-9</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0031-3203</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">This paper presents a new method for clustering the words in a dictionary into word groups. A Chinese character recognition system can then use these groups in a language model to improve the recognition accuracy. In the language model, the number of parameters we must train beforehand can be kept to a reasonable value. The Chinese synonym dictionary Tong2yi4ci2 ci2lin2 providing the semantic features is used to calculate the weights of the semantic attributes of the character-based word classes. The weights of the semantic attributes are next updated according to the words of the Behavior dictionary, which has a rather complete word set. Then, the word classes are clustered to m groups according to the semantic measurement by a greedy method. The words in the Behavior dictionary can finally be assigned to the m groups. The parameter space for the bigram contextual information of the character recognition system is m2. From the experimental results, the recognition system with the proposed model has shown better performance than that of a character-based bigram language model.</div>
</front>
</TEI>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Merge
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 002581 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Merge/biblio.hfd -nk 002581 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Merge |type= RBID |clé= ISTEX:BC725FD2E89F5F577C2A555911A311F69C4F9DEC |texte= A language model based on semantically clustered words in a Chinese character recognition system }}
This area was generated with Dilib version V0.6.32.