Detecting Spam by Weighting Message Words

Salman, Nael; Abdoh, Mousa; Musa, Mohammad

Please use this identifier to cite or link to this item: https://scholar.ptuk.edu.ps/handle/123456789/547

License: CC BY

Title:	Detecting Spam by Weighting Message Words
Authors:	Salman, Nael; Abdoh, Mousa; Musa, Mohammad
Keywords:	Spam; Word frequency; Word weight; Classification
Issue Date:	Aug-2009
Publisher:	Çankaya Üniversitesi Fen-Edebiyat Fakültesi, Journal of Arts and Sciences
Citation:	M. Abdoh, M. Musa, N. Salman, "Detecting Spam by Weighting Message Words," Çankaya Üniversitesi, Journal of Arts and Sciences, Vol. 11, 2009, pp. 1-14
Series/Report no.:	11
Abstract:	The huge number of spam e-mails that arrive daily in users' accounts has made an automated spam filter necessary to detect and remove these unwanted messages. Most existing spam filters are based on naïve Bayesian methods. This paper introduces a new automated filter that is also based on the naïve Bayesian method. The basic idea of the filter is to give each word that appears in e-mails a weight based on its frequency in both spam and legitimate mail. This weight indicates how likely the word is to belong to spam or to legitimate mail. The proposed filter has a preprocessing component that removes all common words. In the training phase, a set of 1300 e-mails (legitimate and spam) was used to assign weights to the non-common words. The classifier then uses the weight table generated in the training phase to classify a given e-mail as spam or legitimate. For testing we used 400 e-mails, 200 spam and 200 legitimate, on which the proposed algorithm achieved 95% accuracy.
ISSN:	1309-6788 | e-ISSN 2564-7954
Appears in Collections:	Engineering and Technology Faculty
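
The abstract describes a three-stage pipeline: remove common words, build a per-word weight table from word frequencies in spam and legitimate training mail, then classify new messages with that table. The paper's exact weighting formula and decision rule are not given in this record, so the following is only a minimal Python sketch under stated assumptions. The names `COMMON_WORDS`, `preprocess`, `train` and `classify`, the tiny stop-word list, the add-one smoothing, and the log-odds decision rule (a standard naïve Bayes combination with equal class priors) are all hypothetical illustrations, not the authors' method.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for the paper's list of "common words" removed in preprocessing.
COMMON_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "for", "your", "you"}


def preprocess(text):
    """Lower-case the message, split it into words, and drop common words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in COMMON_WORDS]


def train(spam_msgs, ham_msgs):
    """Return a weight table mapping each non-common word to a value in (0, 1).

    Assumed weighting: the word's smoothed frequency in spam divided by the sum
    of its smoothed frequencies in spam and in legitimate mail. Values near 1
    suggest spam; values near 0 suggest legitimate mail.
    """
    spam_counts = Counter(w for m in spam_msgs for w in preprocess(m))
    ham_counts = Counter(w for m in ham_msgs for w in preprocess(m))
    vocab = set(spam_counts) | set(ham_counts)
    spam_total = sum(spam_counts.values()) + len(vocab)  # add-one smoothing
    ham_total = sum(ham_counts.values()) + len(vocab)
    weights = {}
    for word in vocab:
        p_spam = (spam_counts[word] + 1) / spam_total
        p_ham = (ham_counts[word] + 1) / ham_total
        weights[word] = p_spam / (p_spam + p_ham)
    return weights


def classify(text, weights):
    """Return True if the message is classified as spam.

    Since w / (1 - w) equals p_spam / p_ham, summing log(w / (1 - w)) over the
    message's known words gives the naive Bayes log-likelihood ratio (equal
    class priors assumed). Words missing from the table are ignored.
    """
    score = sum(
        math.log(weights[w] / (1 - weights[w]))
        for w in preprocess(text)
        if w in weights
    )
    return score > 0
```

For example, after `weights = train(["win free money now", "free prize win cash", "claim your free money"], ["meeting agenda for monday", "project report attached", "lunch meeting with the team"])`, the call `classify("free money prize", weights)` returns `True` and `classify("project meeting", weights)` returns `False`. Summing log-ratios rather than averaging raw weights keeps the decision consistent with a naïve Bayesian model, but the paper may combine its weights differently.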

Files in This Item:

File	Size	Format
DetectingSpamWeightingMessageWords2009.pdf	899.45 kB	Adobe PDF


The Digital Repository of Palestine Technical University