Analyzing Words in Spam Emails
Updated on November 02, 2021
By Pete Freitag
We recently analyzed our Bayesian spam filter corpus (the SpamAssassin token database) and came up with a list of words with a high spam/ham ratio.
By ranking on the spam/ham ratio rather than the raw spam count, we came up with a better list of words to avoid. Most lists would have you avoid words like "click" and "here", but those words are used so much in legitimate email that they have a low spam/ham ratio.
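To make the difference between the two rankings concrete, here is a minimal sketch in Python. It is not the analysis we ran against the token database: the word counts are invented for illustration, and the function name `spam_ham_ratios`, the `min_total` cutoff, and the add-one smoothing are all assumptions made for this example.

```python
def spam_ham_ratios(spam_counts, ham_counts, min_total=5):
    """Return {word: spam/ham ratio} for words seen at least min_total times.

    Add-one smoothing (an assumption of this sketch) keeps a word that
    never appears in ham from causing a division by zero.
    """
    ratios = {}
    for word in set(spam_counts) | set(ham_counts):
        spam = spam_counts.get(word, 0)
        ham = ham_counts.get(word, 0)
        if spam + ham < min_total:
            continue  # too rare to say anything meaningful
        ratios[word] = (spam + 1) / (ham + 1)
    return ratios


# Hypothetical counts, invented for illustration only.
spam_counts = {"click": 900, "here": 850, "mortgage": 300, "refinance": 120}
ham_counts = {"click": 700, "here": 1200, "mortgage": 1, "refinance": 3}

ratios = spam_ham_ratios(spam_counts, ham_counts)

# Ranking by raw spam count puts common words first.
by_spam_count = sorted(spam_counts, key=spam_counts.get, reverse=True)

# Ranking by ratio puts words that are rare in legitimate mail first.
by_ratio = sorted(ratios, key=ratios.get, reverse=True)

print("by spam count:", by_spam_count)
print("by ratio:     ", by_ratio)
```

With these made-up numbers, "click" and "here" top the list sorted by spam count but fall to the bottom once the words are sorted by ratio, because they show up about as often in legitimate mail as in spam.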
Analyzing Words in Spam Emails was first published on August 03, 2005.
If you like reading about spam, email, bayesian, deliverability, or semantics then you might also like:
- Battling Comment Spam
- Trick or Treat - Web 2.0 Goodies for ColdFusion
- Spammers now using ASCII Art
- ReturnPath aquires BondedSender