Hierarchical E-mail Spam Filtering Using AI & Data Mining Techniques

Book by Ismail M Khater
Email spam continues to be a major problem on the Internet. With the spread of malware combined with the power of botnets, spammers can now launch large-scale spam campaigns covering a wide range of topics, causing major traffic increases and enormous economic losses. Great research effort has gone into combating email spam. However, a major problem with most email spam filters is that they may also filter out some legitimate emails (false positives). Such errors can be prohibitively expensive in practice, especially if the misclassified email is of great importance to the recipient. For this reason, it is important to build an email spam filter that filters spam efficiently while minimizing collateral damage. Since header-based and content-based filtering are the two main approaches to email spam filtering, we propose to combine them in a way that achieves the best of both worlds, that is, to build a fast, efficient, and highly accurate email spam classifier. In particular, we propose a Hierarchical E-mail Spam Filtering (HESF) system that is composed of two main phases.