Abstract:
The growing problem of unsolicited bulk e-mail, also known as 'spam', has created a need for reliable anti-spam e-mail filters. We introduce seven filtering algorithms: Naive Bayesian (NB), Decision Tree (DT), AdaBoost, Artificial Neural Network (ANN), Support Vector Machine (SVM), Vector Space Model (VSM) and k-Nearest Neighbor (KNN). We discuss design considerations and implementation issues for these filters, such as how to make NB, SVM, VSM and KNN cost-sensitive. Using two relatively large corpora of real personal e-mail, we conducted a comprehensive comparative study of the seven filters based on a cost-sensitive measure that we adopted. The study also examines the effects of feature-subset size and training-corpus distribution, issues that previous experiments have not explored. The results show that the cost-sensitive filters (NB, SVM, VSM and KNN) misclassify fewer legitimate messages when their parameters, the feature-subset size and the training-set distribution are chosen appropriately.
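The abstract does not say how NB was made cost-sensitive or which cost-sensitive measure was used. The sketch below is therefore only an illustration, and it assumes a formulation common in earlier anti-spam work. A message is labelled spam only if P(spam|x) > λ·P(legit|x). Filters are scored by a weighted accuracy in which each legitimate message counts λ times. The function names `train_nb`, `classify` and `weighted_accuracy`, and the parameter `lam`, are hypothetical and do not come from the paper.

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Multinomial Naive Bayes with Laplace smoothing (illustrative sketch).
    docs: list of token lists; labels: parallel list of 'spam'/'legit'.
    Assumes both classes occur at least once in the training data."""
    vocab = {t for d in docs for t in d}
    priors, counts, totals = {}, {}, {}
    for c in ("spam", "legit"):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        priors[c] = len(class_docs) / len(docs)
        counts[c] = Counter(t for d in class_docs for t in d)
        totals[c] = sum(counts[c].values())
    return vocab, priors, counts, totals

def log_posterior(model, doc, c):
    """Unnormalised log P(c | doc); tokens unseen in training are ignored."""
    vocab, priors, counts, totals = model
    score = math.log(priors[c])
    for t in doc:
        if t in vocab:
            score += math.log((counts[c][t] + 1) / (totals[c] + len(vocab)))
    return score

def classify(model, doc, lam=9.0):
    # Assumed cost-sensitive rule: predict spam only if
    # P(spam|x) > lam * P(legit|x), so blocking a legitimate
    # message is treated as lam times as costly as letting spam through.
    s = log_posterior(model, doc, "spam")
    l = log_posterior(model, doc, "legit")
    return "spam" if s - l > math.log(lam) else "legit"

def weighted_accuracy(y_true, y_pred, lam=9.0):
    # Assumed measure: each legitimate message is weighted lam, each spam 1.
    n_l = sum(1 for y in y_true if y == "legit")
    n_s = len(y_true) - n_l
    l_ok = sum(1 for t, p in zip(y_true, y_pred) if t == p == "legit")
    s_ok = sum(1 for t, p in zip(y_true, y_pred) if t == p == "spam")
    return (lam * l_ok + s_ok) / (lam * n_l + n_s)

docs = [["cheap", "pills", "buy"], ["buy", "now", "cheap"],
        ["meeting", "tomorrow", "agenda"], ["project", "meeting", "notes"]]
labels = ["spam", "spam", "legit", "legit"]
model = train_nb(docs, labels)
print(classify(model, ["cheap", "buy"], lam=1.0))   # spam
print(classify(model, ["cheap", "buy"], lam=99.0))  # legit
```

On this toy corpus the posterior odds for `["cheap", "buy"]` are 9:1 in favour of spam. With λ = 1 the message is therefore filtered. With λ = 99 it is delivered, which shows how raising λ trades missed spam for fewer misclassified legitimate messages.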
Year: 2005
Page: 325-334
Language: English
WoS CC Cited Count: 0
ESI Highly Cited Papers on the List: 0