Please use this identifier to cite or link to this item:
https://scholar.ptuk.edu.ps/handle/123456789/547
cc-by
Title: | Detecting Spam by Weighting Message Words |
Authors: | Salman, Nael; Abdoh, Mousa; Musa, Mohammad |
Keywords: | Spam;Word frequency;Word weight;Classification |
Issue Date: | Aug-2009 |
Publisher: | Çankaya Üniversitesi Fen-Edebiyat Fakültesi, Journal of Arts and Sciences |
Citation: | M. Abdoh, M. Musa, N. Salman, "Detecting Spam by Weighting Message Words", Çankaya Üniversitesi, Journal of Arts and Sciences, Vol 11, 2009, pp. 1-14 |
Series/Report no.: | 11 |
Abstract: | The huge volume of spam e-mail that users receive daily makes some form of automated spam filter necessary to detect and remove these unwanted messages. Most existing spam filters are based on naïve Bayesian methods. This paper introduces a new automated filter based on the naïve Bayesian method. The basic idea is to give each word that appears in e-mails a weight based on its frequency in both spam and legitimate mail; the weight indicates whether the word more probably belongs to spam or to legitimate mail. The proposed filter has a preprocessing component that removes all common words. In the training phase, a set of 1300 e-mails (legitimate and spam) was used to assign weights to non-common words. The classifier uses the weight table generated in the training phase to classify a given e-mail as spam or legitimate. In testing on 400 e-mails (200 spam and 200 legitimate), the proposed algorithm achieved 95% accuracy. |
URI: | https://scholar.ptuk.edu.ps/handle/123456789/547 |
ISSN: | ISSN 1309-6788 | e-ISSN 2564-7954 |
Appears in Collections: | Engineering and Technology Faculty |
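The pipeline summarized in the abstract (remove common words, build a weight table from word frequencies in spam and legitimate mail, then classify with that table) can be illustrated with a minimal sketch. This record does not give the paper's actual weighting formula, stop-word list, or decision rule, so everything below is an assumption made for illustration: the `COMMON_WORDS` set, the function names `tokenize`, `train`, and `classify`, the weight defined as a word's relative spam frequency divided by the sum of its relative spam and legitimate frequencies, and the rule of averaging weights against a 0.5 threshold. It is not the authors' implementation.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the paper's own list of common words
# is not given in this record.
COMMON_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is",
                "you", "for", "your"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop common words."""
    return [w for w in re.findall(r"[a-z']+", text.lower())
            if w not in COMMON_WORDS]


def train(spam_mails, legit_mails):
    """Build a word -> weight table from labelled training mail.

    Assumed formula: weight = spam_rate / (spam_rate + legit_rate),
    so 1.0 means the word was seen only in spam and 0.0 only in
    legitimate mail.
    """
    spam_counts = Counter(w for m in spam_mails for w in tokenize(m))
    legit_counts = Counter(w for m in legit_mails for w in tokenize(m))
    spam_total = sum(spam_counts.values()) or 1
    legit_total = sum(legit_counts.values()) or 1

    weights = {}
    for word in set(spam_counts) | set(legit_counts):
        spam_rate = spam_counts[word] / spam_total
        legit_rate = legit_counts[word] / legit_total
        weights[word] = spam_rate / (spam_rate + legit_rate)
    return weights


def classify(mail, weights, threshold=0.5):
    """Label a mail by the mean weight of its known words (assumed rule)."""
    known = [weights[w] for w in tokenize(mail) if w in weights]
    if not known:
        # No evidence either way; default to legitimate.
        return "legitimate"
    score = sum(known) / len(known)
    return "spam" if score > threshold else "legitimate"
```

In use, `train` would be called once on the labelled training set and the resulting table passed to `classify` for each incoming message. Averaging the weights is a deliberate simplification; a naïve Bayesian classifier would instead combine per-word probabilities multiplicatively, usually in log space.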
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
DetectingSpamWeightingMessageWords2009.pdf | | 899.45 kB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.