Exploration server for computer science research in Lorraine

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Comparison of Topic Identification methods for Arabic Language

Internal identifier: 005E17 (Main/Exploration); previous: 005E16; next: 005E18

Authors: Mourad Abbas; Kamel Smaïli

Source :

RBID : CRIN:abbas05a

English descriptors: svm; tfidf; topic identification

Abstract

In this paper we present two well-known methods for topic identification. The first is a TFIDF classifier, and the second is a machine-learning approach, Support Vector Machines (SVM). To our knowledge, few works address Arabic topic identification, which is why we investigate it in this article. The corpus we used is extracted from the daily Arabic newspaper Akhbar Al Khaleej; it includes 5120 news articles, corresponding to 2,855,069 words, covering four topics: sport, local news, international news and economy. In our experiments, the results are encouraging for both the SVM and the TFIDF classifier; however, the SVM classifier proved superior, with a high capability to distinguish topics.
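To make the first of the two methods concrete, here is a minimal sketch of a TFIDF classifier: each topic is represented by the sum of the TFIDF vectors of its training documents, and a new text is assigned to the topic whose vector is closest by cosine similarity. This is only an illustration under stated assumptions, not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the function names (`train`, `classify`) and the tiny English training set are all hypothetical, and the abstract does not specify any of these details.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization (no Arabic-specific processing).
    return text.lower().split()


def train(labelled_docs):
    """Build IDF weights and one summed TFIDF vector per topic.

    labelled_docs is a list of (topic, text) pairs.
    """
    n_docs = len(labelled_docs)
    doc_freq = Counter()
    for _, text in labelled_docs:
        doc_freq.update(set(tokenize(text)))
    # Assumption: idf = log(N / df) + 1, one common variant among several.
    idf = {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}

    topic_vectors = {}
    for topic, text in labelled_docs:
        vec = topic_vectors.setdefault(topic, Counter())
        for term, tf in Counter(tokenize(text)).items():
            vec[term] += tf * idf[term]
    return idf, topic_vectors


def cosine(a, b):
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, idf, topic_vectors):
    # Terms unseen during training carry no weight and are ignored.
    counts = Counter(tokenize(text))
    vec = {term: tf * idf[term] for term, tf in counts.items() if term in idf}
    return max(topic_vectors, key=lambda topic: cosine(vec, topic_vectors[topic]))


# Hypothetical toy data, standing in for the newspaper corpus.
TRAIN = [
    ("sport", "the team won the football match"),
    ("sport", "the player scored a goal in the match"),
    ("economy", "the bank raised interest rates"),
    ("economy", "the market and the bank reported growth"),
]

idf, topic_vectors = train(TRAIN)
print(classify("the team scored a late goal", idf, topic_vectors))
print(classify("the bank reported interest rates growth", idf, topic_vectors))
```

An SVM would replace the nearest-topic rule with a learned decision boundary over the same kind of weighted term vectors; that part is left out here because it needs a third-party library.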


Affiliations:



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" wicri:score="398">Comparison of Topic Identification methods for Arabic Language</title>
</titleStmt>
<publicationStmt>
<idno type="RBID">CRIN:abbas05a</idno>
<date when="2005" year="2005">2005</date>
<idno type="wicri:Area/Crin/Corpus">004223</idno>
<idno type="wicri:Area/Crin/Curation">004223</idno>
<idno type="wicri:explorRef" wicri:stream="Crin" wicri:step="Curation">004223</idno>
<idno type="wicri:Area/Crin/Checkpoint">000265</idno>
<idno type="wicri:explorRef" wicri:stream="Crin" wicri:step="Checkpoint">000265</idno>
<idno type="wicri:Area/Main/Merge">006040</idno>
<idno type="wicri:Area/Main/Curation">005E17</idno>
<idno type="wicri:Area/Main/Exploration">005E17</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Comparison of Topic Identification methods for Arabic Language</title>
<author>
<name sortKey="Abbas, Mourad" sort="Abbas, Mourad" uniqKey="Abbas M" first="Mourad" last="Abbas">Mourad Abbas</name>
</author>
<author>
<name sortKey="Smaili, Kamel" sort="Smaili, Kamel" uniqKey="Smaili K" first="Kamel" last="Smaïli">Kamel Smaïli</name>
</author>
</analytic>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>svm</term>
<term>tfidf</term>
<term>topic identification</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en" wicri:score="2520">In this paper we present two well-known methods for topic identification. The first is a TFIDF classifier, and the second is a machine-learning approach, Support Vector Machines (SVM). To our knowledge, few works address Arabic topic identification, which is why we investigate it in this article. The corpus we used is extracted from the daily Arabic newspaper Akhbar Al Khaleej; it includes 5120 news articles, corresponding to 2,855,069 words, covering four topics: sport, local news, international news and economy. In our experiments, the results are encouraging for both the SVM and the TFIDF classifier; however, the SVM classifier proved superior, with a high capability to distinguish topics.</div>
</front>
</TEI>
<affiliations>
<list></list>
<tree>
<noCountry>
<name sortKey="Abbas, Mourad" sort="Abbas, Mourad" uniqKey="Abbas M" first="Mourad" last="Abbas">Mourad Abbas</name>
<name sortKey="Smaili, Kamel" sort="Smaili, Kamel" uniqKey="Smaili K" first="Kamel" last="Smaïli">Kamel Smaïli</name>
</noCountry>
</tree>
</affiliations>
</record>
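
As an alternative to the Dilib tools, a record of this shape can be read with a general-purpose XML parser. The sketch below pulls the title, authors and keywords out of a trimmed copy of the record using Python's standard `xml.etree.ElementTree`. Two things are assumptions rather than facts about the source: the record as shown does not declare the `wicri:` prefix, so a placeholder namespace URI (`urn:example:wicri`) is added on the root to make the fragment parseable, and the helper name `summarize` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the record. The xmlns:wicri declaration is a placeholder
# (assumption): the prefix must be bound for a namespace-aware parser.
RECORD = """<record xmlns:wicri="urn:example:wicri">
  <TEI>
    <teiHeader>
      <fileDesc>
        <titleStmt>
          <title xml:lang="en" wicri:score="398">Comparison of Topic Identification methods for Arabic Language</title>
        </titleStmt>
        <sourceDesc>
          <biblStruct>
            <analytic>
              <author><name first="Mourad" last="Abbas">Mourad Abbas</name></author>
              <author><name first="Kamel" last="Smaïli">Kamel Smaïli</name></author>
            </analytic>
          </biblStruct>
        </sourceDesc>
      </fileDesc>
      <profileDesc>
        <textClass>
          <keywords scheme="KwdEn" xml:lang="en">
            <term>svm</term>
            <term>tfidf</term>
            <term>topic identification</term>
          </keywords>
        </textClass>
      </profileDesc>
    </teiHeader>
  </TEI>
</record>"""


def summarize(record_xml):
    """Return the title, author names and keywords of one record."""
    root = ET.fromstring(record_xml)
    return {
        "title": root.findtext(".//titleStmt/title"),
        "authors": [name.text for name in root.findall(".//analytic/author/name")],
        "keywords": [term.text for term in root.findall(".//keywords/term")],
    }


summary = summarize(RECORD)
print(summary["title"])
print("; ".join(summary["authors"]))
print(", ".join(summary["keywords"]))
```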

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 005E17 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 005E17 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Lorraine
   |area=    InforLorV4
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     CRIN:abbas05a
   |texte=   Comparison of Topic Identification methods for Arabic Language
}}

This area was generated with Dilib version V0.6.33.
Data generation: Mon Jun 10 21:56:28 2019. Site generation: Fri Feb 25 15:29:27 2022