A spam corpus is a collection of real email messages, both spam and ham (legitimate mail), used for testing, researching, and developing spam filters. This new corpus was created for the Text REtrieval Conference (TREC) 2005 Spam Evaluation Track. It contains approximately 92,000 messages, of which 42,000 are ham and 50,000 are spam. The corpus is intended primarily for academic research and development of anti-spam filters, and its use is subject to significant restrictions. It matters because it provides a standardized collection for testing and comparing spam filters in both academic and commercial contexts.

An academic paper, "Spam Corpus Creation for TREC," describes how the corpus was created. Gordon Cormack and Thomas Lynam presented it at the Second Conference on Email and Anti-Spam (CEAS 2005). Papers presented at the Spam Track of TREC 2005 that use the public corpus are available here.

Ben Gross (editor: Richi Jennings)


  1. Tarek rashed

    I am a graduate student, and my master's thesis is on neural networks and genetic algorithms for spam filtering. I use Windows on my PC and have downloaded the Spam 2005 corpus. Does the corpus work under Windows, and how can I uncompress it? Please help.


