Title: Detecting Spam by Weighting Message Words
Authors: Salman, Nael
Abdoh, Mousa
Musa, Mohammad
Keywords: Spam
Word frequency
Word weight
Issue Date: Aug-2009
Publisher: Çankaya Üniversitesi Fen-Edebiyat Fakültesi, Journal of Arts and Sciences
Citation: M. Abdoh, M. Musa, N. Salman, "Detecting Spam by Weighting Message Words", Çankaya Üniversitesi, Journal of Arts and Sciences, Vol 11, 2009, pp. 1-14
Series/Report no.: 11;
Abstract: The huge number of spam e-mails received daily in users' accounts has made some form of automated spam filter necessary to detect and remove these unwanted messages. Most existing spam filters are based on naïve Bayesian methods. The work presented in this paper introduces a new automated filter based on the naïve Bayesian method. The basic idea of this filter is to give each word that appears in e-mails a weight based on its frequency in both spam and legitimate mails. This weight indicates whether the word more probably belongs to spam or to legitimate mail. The proposed filter has a preprocessing component that removes all common words. In the training phase, a set of 1300 e-mails (legitimate and spam) was used to assign weights to non-common words. The classifier uses the weight table generated in the training phase to classify a given e-mail as spam or legitimate. Testing used 400 e-mails, 200 spam and 200 legitimate, on which the proposed algorithm achieved 95% accuracy.
ISSN: 1309-6788 | e-ISSN: 2564-7954
Appears in Collections: Engineering and Technology Faculty
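
Illustration: the pipeline described in the abstract (remove common words, assign each remaining word a weight from its frequency in spam and legitimate mail, then classify using the weight table) might be sketched roughly as follows. This is not the authors' implementation. The abstract does not give the weight formula, the common-word list, or the decision rule, so the relative-frequency ratio, the COMMON_WORDS set, the averaging step, and the 0.5 threshold below are all assumptions made for illustration, as are the function names.

```python
import re
from collections import Counter

# Hypothetical list of "common words"; the paper's actual list is not given in the abstract.
COMMON_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "for", "on", "at"}


def tokenize(text):
    """Preprocessing: lowercase, split into words, drop common words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in COMMON_WORDS]


def train(spam_mails, legit_mails):
    """Build a weight table from labeled mails.

    Assumed formula: weight = f_spam / (f_spam + f_legit), where f_* is the
    word's relative frequency in each class. 1.0 means the word was seen
    only in spam, 0.0 only in legitimate mail.
    """
    spam_counts = Counter(w for m in spam_mails for w in tokenize(m))
    legit_counts = Counter(w for m in legit_mails for w in tokenize(m))
    spam_total = sum(spam_counts.values()) or 1    # avoid division by zero
    legit_total = sum(legit_counts.values()) or 1
    weights = {}
    for word in set(spam_counts) | set(legit_counts):
        f_spam = spam_counts[word] / spam_total
        f_legit = legit_counts[word] / legit_total
        weights[word] = f_spam / (f_spam + f_legit)
    return weights


def classify(text, weights, threshold=0.5):
    """Assumed decision rule: average the weights of known words and compare to a threshold."""
    known = [weights[w] for w in tokenize(text) if w in weights]
    if not known:
        return "legitimate"  # no evidence either way; default chosen for this sketch
    score = sum(known) / len(known)
    return "spam" if score > threshold else "legitimate"
```

Under these assumptions, train would be called once on the labeled training mails, and classify would then be applied to each unseen message with the resulting weight table.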

Files in This Item:
File: DetectingSpamWeightingMessageWords2009.pdf
Size: 899.45 kB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.