MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

A study of spam filtering using support vector machines

Internal identifier: 000621 (Istex/Checkpoint); previous: 000620; next: 000622

Authors: Ola Amayri [Canada]; Nizar Bouguila [Canada]

Source: Artificial Intelligence Review, vol. 34, no. 1, pp. 73-108 (Springer Netherlands, 2010)

RBID : ISTEX:48793E6D30029B98D091A5E7073D9106295DE2B8

Abstract: Electronic mail is a major revolution over traditional communication systems, owing to its convenient, economical, fast, and easy-to-use nature. A major bottleneck in electronic communications is the enormous dissemination of unwanted, harmful emails known as spam. A major concern is the development of suitable filters that can adequately capture those emails and achieve a high performance rate. Machine learning (ML) researchers have developed many approaches to tackle this problem. Within machine learning, support vector machines (SVM) have made a large contribution to the development of spam email filtering. Based on SVM, different schemes have been proposed through text classification (TC) approaches. A crucial problem when using SVM is the choice of kernels, as they directly affect the separation of emails in the feature space. This paper presents a thorough investigation of several distance-based kernels and specifies spam filtering behaviors using SVM. The majority of kernels used in recent studies concern continuous data and neglect the structure of the text. In contrast to classical kernels, we propose the use of various string kernels for spam filtering. We show how effectively string kernels suit the spam filtering problem. In addition, data preprocessing is a vital part of text classification, where the objective is to generate feature vectors usable by SVM kernels. We detail feature mapping variants in TC that yield improved performance for the standard SVM in the filtering task. Furthermore, to cope with real-time scenarios, we propose an online active framework for spam filtering. We present empirical results from an extensive study of online, transductive, and online active methods for classifying spam emails in real time. We show that the active online method using string kernels achieves higher precision and recall rates.

Url: https://api.istex.fr/ark:/67375/VQC-R6JXGQ2W-P/fulltext.pdf
DOI: 10.1007/s10462-010-9166-x


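Affiliations:
Ola Amayri: Electrical and Computer Engineering, Concordia University, Montreal, QC, Canada
Nizar Bouguila: Concordia Institute for Information Systems Engineering, Concordia University, Montreal, QC, Canada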


The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">A study of spam filtering using support vector machines</title>
<author>
<name sortKey="Amayri, Ola" sort="Amayri, Ola" uniqKey="Amayri O" first="Ola" last="Amayri">Ola Amayri</name>
</author>
<author>
<name sortKey="Bouguila, Nizar" sort="Bouguila, Nizar" uniqKey="Bouguila N" first="Nizar" last="Bouguila">Nizar Bouguila</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:48793E6D30029B98D091A5E7073D9106295DE2B8</idno>
<date when="2010" year="2010">2010</date>
<idno type="doi">10.1007/s10462-010-9166-x</idno>
<idno type="url">https://api.istex.fr/ark:/67375/VQC-R6JXGQ2W-P/fulltext.pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000537</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">000537</idno>
<idno type="wicri:Area/Istex/Curation">000537</idno>
<idno type="wicri:Area/Istex/Checkpoint">000621</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">000621</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">A study of spam filtering using support vector machines</title>
<author>
<name sortKey="Amayri, Ola" sort="Amayri, Ola" uniqKey="Amayri O" first="Ola" last="Amayri">Ola Amayri</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Canada</country>
<wicri:regionArea>Electrical and Computer Engineering, Concordia University, Montreal, QC</wicri:regionArea>
<wicri:noRegion>QC</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Canada</country>
</affiliation>
</author>
<author>
<name sortKey="Bouguila, Nizar" sort="Bouguila, Nizar" uniqKey="Bouguila N" first="Nizar" last="Bouguila">Nizar Bouguila</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Canada</country>
<wicri:regionArea>Concordia Institute for Information Systems Engineering, Concordia University, Montreal, QC</wicri:regionArea>
<wicri:noRegion>QC</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Canada</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">Artificial Intelligence Review</title>
<title level="j" type="sub">An International Science and Engineering Journal</title>
<title level="j" type="abbrev">Artif Intell Rev</title>
<idno type="ISSN">0269-2821</idno>
<idno type="eISSN">1573-7462</idno>
<imprint>
<publisher>Springer Netherlands</publisher>
<pubPlace>Dordrecht</pubPlace>
<date type="published" when="2010-06-01">2010-06-01</date>
<biblScope unit="volume">34</biblScope>
<biblScope unit="issue">1</biblScope>
<biblScope unit="page" from="73">73</biblScope>
<biblScope unit="page" to="108">108</biblScope>
</imprint>
<idno type="ISSN">0269-2821</idno>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0269-2821</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Feature mapping</term>
<term>Online active</term>
<term>Spam filtering</term>
<term>String kernels</term>
<term>Support vector machines</term>
</keywords>
</textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: Electronic mail is a major revolution taking place over traditional communication systems due to its convenient, economical, fast, and easy to use nature. A major bottleneck in electronic communications is the enormous dissemination of unwanted, harmful emails known as spam emails. A major concern is the developing of suitable filters that can adequately capture those emails and achieve high performance rate. Machine learning (ML) researchers have developed many approaches in order to tackle this problem. Within the context of machine learning, support vector machines (SVM) have made a large contribution to the development of spam email filtering. Based on SVM, different schemes have been proposed through text classification approaches (TC). A crucial problem when using SVM is the choice of kernels as they directly affect the separation of emails in the feature space. This paper presents thorough investigation of several distance-based kernels and specify spam filtering behaviors using SVM. The majority of used kernels in recent studies concern continuous data and neglect the structure of the text. In contrast to classical kernels, we propose the use of various string kernels for spam filtering. We show how effectively string kernels suit spam filtering problem. On the other hand, data preprocessing is a vital part of text classification where the objective is to generate feature vectors usable by SVM kernels. We detail a feature mapping variants in TC that yield improved performance for the standard SVM in filtering task. Furthermore, to cope for realtime scenarios we propose an online active framework for spam filtering. We present empirical results from an extensive study of online, transductive, and online active methods for classifying spam emails in real time. We show that active online method using string kernels achieves higher precision and recall rates.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Canada</li>
</country>
</list>
<tree>
<country name="Canada">
<noRegion>
<name sortKey="Amayri, Ola" sort="Amayri, Ola" uniqKey="Amayri O" first="Ola" last="Amayri">Ola Amayri</name>
</noRegion>
<name sortKey="Amayri, Ola" sort="Amayri, Ola" uniqKey="Amayri O" first="Ola" last="Amayri">Ola Amayri</name>
<name sortKey="Bouguila, Nizar" sort="Bouguila, Nizar" uniqKey="Bouguila N" first="Nizar" last="Bouguila">Nizar Bouguila</name>
<name sortKey="Bouguila, Nizar" sort="Bouguila, Nizar" uniqKey="Bouguila N" first="Nizar" last="Bouguila">Nizar Bouguila</name>
</country>
</tree>
</affiliations>
</record>

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Istex/Checkpoint
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000621 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Istex/Checkpoint/biblio.hfd -nk 000621 | SxmlIndent | more
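
Outside Dilib, a record like the one shown above can also be read with a general-purpose XML parser. The following is a minimal Python sketch, not part of the Dilib toolchain. It assumes the record has been saved as text and carries no XML declaration. Because the record uses the `wicri:` prefix without declaring it, a strict parser would reject it, so the sketch wraps the fragment in a hypothetical `<wrapper>` element that binds the prefix to a placeholder URI (`http://example.org/wicri` is an assumption, not an official namespace). The function name `parse_record` and the shortened `SAMPLE` record are also illustrative.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI (assumption): the record never declares wicri:,
# so any URI is enough to make the fragment parseable.
WICRI_NS = "http://example.org/wicri"


def parse_record(record_xml: str) -> dict:
    """Extract title, authors and DOI from a Wicri <record> fragment."""
    # Wrap the fragment so the undeclared wicri: prefix becomes bound.
    # The xml: prefix (as in xml:lang) is predefined and needs no declaration.
    wrapped = f'<wrapper xmlns:wicri="{WICRI_NS}">{record_xml}</wrapper>'
    root = ET.fromstring(wrapped)

    title_stmt = root.find(".//titleStmt")
    title = title_stmt.findtext("title")
    authors = [name.text for name in title_stmt.findall("author/name")]
    doi = root.findtext(".//publicationStmt/idno[@type='doi']")
    return {"title": title, "authors": authors, "doi": doi}


# Shortened copy of the record above, kept only for illustration.
SAMPLE = """<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">A study of spam filtering using support vector machines</title>
<author>
<name sortKey="Amayri, Ola">Ola Amayri</name>
</author>
<author>
<name sortKey="Bouguila, Nizar">Nizar Bouguila</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="doi">10.1007/s10462-010-9166-x</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""

info = parse_record(SAMPLE)
print(info["title"])
print("; ".join(info["authors"]))
print(info["doi"])
```

The wrapper approach leaves the record text itself untouched. Attribute values such as `type="wicri:source"` are plain strings and are not affected by the namespace binding.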

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Istex
   |étape=   Checkpoint
   |type=    RBID
   |clé=     ISTEX:48793E6D30029B98D091A5E7073D9106295DE2B8
   |texte=   A study of spam filtering using support vector machines
}}

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021