Exploration server for computer science research in Lorraine

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

A lexicon for Vietnamese language processing

Internal identifier: 005628 (Main/Exploration); previous: 005627; next: 005629

Authors: Thi Minh Huyen Nguyen [Vietnam]; Laurent Romary [France]; Mathias Rossignol [Vietnam]; Xuan Luong Vu [Vietnam]

Source: Language Resources and Evaluation (ISSN 1574-020X), 2006

RBID : Francis:09-0057661

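French descriptors: Traitement automatique des langues naturelles; Lexique; Ressources linguistiques; Vietnamien

English descriptors: Lexicon; Linguistic resources; Natural language processing; Vietnamese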

Abstract

Only very recently have Vietnamese researchers begun to be involved in the domain of Natural Language Processing (NLP). As there does not exist any published work in formal linguistics nor any recognizable standard for Vietnamese word definition and word categories, the fundamental tasks for automatic Vietnamese language processing, such as part-of-speech tagging, parsing, etc., are very difficult tasks for computer scientists. The fact that all necessary linguistic resources have to be built from scratch by each research team is a real obstacle to the development of Vietnamese language processing. The aim of our projects is thus to build a common linguistic database that is freely and easily exploitable for the automatic processing of Vietnamese. In this paper, we present our work on creating a Vietnamese lexicon for NLP applications. We emphasize the standardization aspect of the lexicon representation. We especially propose an extensible set of Vietnamese syntactic descriptions that can be used for tagset definition and morphosyntactic analysis. These descriptors are established in such a way as to be a reference set proposal for Vietnamese in the context of ISO subcommittee TC 37/SC 4 (Language Resource Management).


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">A lexicon for Vietnamese language processing</title>
<author>
<name sortKey="Thi Minh Huyen Nguyen" sort="Thi Minh Huyen Nguyen" uniqKey="Thi Minh Huyen Nguyen" last="Thi Minh Huyen Nguyen">THI MINH HUYEN NGUYEN</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Faculty of Mathematics, Mechanics and Informatics, Hanoi University of Science, 334 Nguyen Trai</s1>
<s2>Hanoi, 10000</s2>
<s3>VNM</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>Viêt Nam</country>
<wicri:noRegion>Hanoi, 10000</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Romary, Laurent" sort="Romary, Laurent" uniqKey="Romary L" first="Laurent" last="Romary">Laurent Romary</name>
<affiliation wicri:level="3">
<inist:fA14 i1="02">
<s1>LORIA</s1>
<s2>Nancy</s2>
<s3>FRA</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region">Grand Est</region>
<region type="old region">Lorraine (région)</region>
<settlement type="city">Nancy</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Rossignol, Mathias" sort="Rossignol, Mathias" uniqKey="Rossignol M" first="Mathias" last="Rossignol">Mathias Rossignol</name>
<affiliation wicri:level="1">
<inist:fA14 i1="03">
<s1>International Research Center MICA</s1>
<s2>Hanoi</s2>
<s3>VNM</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Viêt Nam</country>
<wicri:noRegion>International Research Center MICA</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Xuan Luong Vu" sort="Xuan Luong Vu" uniqKey="Xuan Luong Vu" last="Xuan Luong Vu">XUAN LUONG VU</name>
<affiliation wicri:level="1">
<inist:fA14 i1="04">
<s1>Vietnam Lexicography Center</s1>
<s2>Hanoi</s2>
<s3>VNM</s3>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Viêt Nam</country>
<wicri:noRegion>Vietnam Lexicography Center</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">09-0057661</idno>
<date when="2006">2006</date>
<idno type="stanalyst">FRANCIS 09-0057661 INIST</idno>
<idno type="RBID">Francis:09-0057661</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000294</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000740</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000419</idno>
<idno type="wicri:explorRef" wicri:stream="PascalFrancis" wicri:step="Checkpoint">000419</idno>
<idno type="wicri:doubleKey">1574-020X:2006:Thi Minh Huyen Nguyen:a:lexicon:for</idno>
<idno type="wicri:Area/Main/Merge">005824</idno>
<idno type="wicri:Area/Main/Curation">005628</idno>
<idno type="wicri:Area/Main/Exploration">005628</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">A lexicon for Vietnamese language processing</title>
<author>
<name sortKey="Thi Minh Huyen Nguyen" sort="Thi Minh Huyen Nguyen" uniqKey="Thi Minh Huyen Nguyen" last="Thi Minh Huyen Nguyen">THI MINH HUYEN NGUYEN</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Faculty of Mathematics, Mechanics and Informatics, Hanoi University of Science, 334 Nguyen Trai</s1>
<s2>Hanoi, 10000</s2>
<s3>VNM</s3>
<sZ>1 aut.</sZ>
</inist:fA14>
<country>Viêt Nam</country>
<wicri:noRegion>Hanoi, 10000</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Romary, Laurent" sort="Romary, Laurent" uniqKey="Romary L" first="Laurent" last="Romary">Laurent Romary</name>
<affiliation wicri:level="3">
<inist:fA14 i1="02">
<s1>LORIA</s1>
<s2>Nancy</s2>
<s3>FRA</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>France</country>
<placeName>
<region type="region">Grand Est</region>
<region type="old region">Lorraine (région)</region>
<settlement type="city">Nancy</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Rossignol, Mathias" sort="Rossignol, Mathias" uniqKey="Rossignol M" first="Mathias" last="Rossignol">Mathias Rossignol</name>
<affiliation wicri:level="1">
<inist:fA14 i1="03">
<s1>International Research Center MICA</s1>
<s2>Hanoi</s2>
<s3>VNM</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Viêt Nam</country>
<wicri:noRegion>International Research Center MICA</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Xuan Luong Vu" sort="Xuan Luong Vu" uniqKey="Xuan Luong Vu" last="Xuan Luong Vu">XUAN LUONG VU</name>
<affiliation wicri:level="1">
<inist:fA14 i1="04">
<s1>Vietnam Lexicography Center</s1>
<s2>Hanoi</s2>
<s3>VNM</s3>
<sZ>4 aut.</sZ>
</inist:fA14>
<country>Viêt Nam</country>
<wicri:noRegion>Vietnam Lexicography Center</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Language resources and evaluation </title>
<idno type="ISSN">1574-020X</idno>
<imprint>
<date when="2006">2006</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Language resources and evaluation </title>
<idno type="ISSN">1574-020X</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Lexicon</term>
<term>Linguistic resources</term>
<term>Natural language processing</term>
<term>Vietnamese</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Traitement automatique des langues naturelles</term>
<term>Lexique</term>
<term>Ressources linguistiques</term>
<term>Vietnamien</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Only very recently have Vietnamese researchers begun to be involved in the domain of Natural Language Processing (NLP). As there does not exist any published work in formal linguistics nor any recognizable standard for Vietnamese word definition and word categories, the fundamental tasks for automatic Vietnamese language processing, such as part-of-speech tagging, parsing, etc., are very difficult tasks for computer scientists. The fact that all necessary linguistic resources have to be built from scratch by each research team is a real obstacle to the development of Vietnamese language processing. The aim of our projects is thus to build a common linguistic database that is freely and easily exploitable for the automatic processing of Vietnamese. In this paper, we present our work on creating a Vietnamese lexicon for NLP applications. We emphasize the standardization aspect of the lexicon representation. We especially propose an extensible set of Vietnamese syntactic descriptions that can be used for tagset definition and morphosyntactic analysis. These descriptors are established in such a way as to be a reference set proposal for Vietnamese in the context of ISO subcommittee TC 37/SC 4 (Language Resource Management).</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>France</li>
<li>Viêt Nam</li>
</country>
<region>
<li>Grand Est</li>
<li>Lorraine (région)</li>
</region>
<settlement>
<li>Nancy</li>
</settlement>
</list>
<tree>
<country name="Viêt Nam">
<noRegion>
<name sortKey="Thi Minh Huyen Nguyen" sort="Thi Minh Huyen Nguyen" uniqKey="Thi Minh Huyen Nguyen" last="Thi Minh Huyen Nguyen">THI MINH HUYEN NGUYEN</name>
</noRegion>
<name sortKey="Rossignol, Mathias" sort="Rossignol, Mathias" uniqKey="Rossignol M" first="Mathias" last="Rossignol">Mathias Rossignol</name>
<name sortKey="Xuan Luong Vu" sort="Xuan Luong Vu" uniqKey="Xuan Luong Vu" last="Xuan Luong Vu">XUAN LUONG VU</name>
</country>
<country name="France">
<region name="Grand Est">
<name sortKey="Romary, Laurent" sort="Romary, Laurent" uniqKey="Romary L" first="Laurent" last="Romary">Laurent Romary</name>
</region>
</country>
</tree>
</affiliations>
</record>
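
As a minimal sketch of reading the record above outside Dilib, the following Python snippet uses the standard library's xml.etree.ElementTree to list each author with the country of the affiliation. It works on a trimmed copy of the record (two of the four authors). The xmlns declarations on <record> are placeholder URIs assumed for this example only: the record as displayed uses the wicri: and inist: prefixes without declaring them, which a namespace-aware parser would reject as unbound prefixes. The function name authors_with_countries is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the record shown above (two authors only).
# ASSUMPTION: the two xmlns URIs are placeholders invented for this sketch;
# the displayed record does not declare the wicri: and inist: prefixes.
RECORD = """<record xmlns:wicri="urn:example:wicri" xmlns:inist="urn:example:inist">
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">A lexicon for Vietnamese language processing</title>
<author>
<name sortKey="Thi Minh Huyen Nguyen">THI MINH HUYEN NGUYEN</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s2>Hanoi, 10000</s2>
<s3>VNM</s3>
</inist:fA14>
<country>Viêt Nam</country>
</affiliation>
</author>
<author>
<name sortKey="Romary, Laurent" first="Laurent" last="Romary">Laurent Romary</name>
<affiliation wicri:level="3">
<inist:fA14 i1="02">
<s1>LORIA</s1>
<s2>Nancy</s2>
<s3>FRA</s3>
</inist:fA14>
<country>France</country>
</affiliation>
</author>
</titleStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""


def authors_with_countries(record_xml):
    """Return (author name, country) pairs from the titleStmt of a record."""
    root = ET.fromstring(record_xml)
    # Unprefixed elements carry no namespace here, so plain tag paths work.
    title_stmt = root.find("./TEI/teiHeader/fileDesc/titleStmt")
    pairs = []
    for author in title_stmt.findall("author"):
        name = author.findtext("name")
        country = author.findtext("affiliation/country")
        pairs.append((name, country))
    return pairs


if __name__ == "__main__":
    for name, country in authors_with_countries(RECORD):
        print(f"{name} [{country}]")
```

Reading only titleStmt avoids counting each author twice, since the full record repeats the same author entries under sourceDesc.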

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 005628 | SxmlIndent | more

Or, assuming EXPLOR_AREA is set to the area root ($WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4):

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 005628 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Lorraine
   |area=    InforLorV4
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Francis:09-0057661
   |texte=   A lexicon for Vietnamese language processing
}}

This area was generated with Dilib version V0.6.33.
Data generation: Mon Jun 10 21:56:28 2019. Site generation: Fri Feb 25 15:29:27 2022