Bayesian Spam Filter Trainer

by Viper 20. June 2009 20:11

Download SpamTrainer Binaries

Download SpamTrainer Source

As more and more people tweet, spam is growing along with them. Every time I search for a topic, almost half of the messages seem to fall into one of the following categories:

  • Somebody is trying to sell something
  • Somebody is posting links to lure you to affiliate web sites to make some money
  • Job agencies are posting jobs
  • .... and more

This week I decided to try a Bayesian spam filter, the kind most email servers use to filter spam, on Twitter messages. While searching around I found Bayesian Spam Filter for C#, which gave me a good starting point. Without making any changes or training it on any additional corpus, I got very good filtering results: close to 90% spam detection. I studied the spam messages that fell through the cracks as well as the false positives. Based on these observations, I concluded that the issue is the very limited context of 140 characters in a Twitter message: a lot of good and spam messages look pretty much the same. So the key to improving the results was to train the filter on Twitter messages rather than rely only on a corpus taken from email and similar sources. I therefore decided to build an application that I could use to generate a corpus of Twitter messages classified as spam and good.
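To make the idea of training on Twitter messages concrete, here is a minimal sketch of a naive Bayes text classifier. It is written in Python for brevity and is not the code of the C# library mentioned above; the class name `NaiveBayesFilter`, the tokenizer, and the add-one smoothing are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Hypothetical two-class (spam/good) naive Bayes filter."""

    def __init__(self):
        self.counts = {"spam": Counter(), "good": Counter()}
        self.docs = {"spam": 0, "good": 0}

    def train(self, text, label):
        """Add one labeled message ('spam' or 'good') to the corpus."""
        self.counts[label].update(tokenize(text))
        self.docs[label] += 1

    def spam_probability(self, text):
        """Return the estimated probability that the text is spam."""
        vocab = set(self.counts["spam"]) | set(self.counts["good"])
        total_docs = self.docs["spam"] + self.docs["good"]
        log_score = {}
        for label in ("spam", "good"):
            # Class prior, smoothed so an untrained filter returns 0.5.
            score = math.log((self.docs[label] + 1) / (total_docs + 2))
            total_tokens = sum(self.counts[label].values())
            for token in tokenize(text):
                # Add-one smoothing; the extra +1 leaves room for unseen tokens.
                seen = self.counts[label][token]
                score += math.log((seen + 1) / (total_tokens + len(vocab) + 1))
            log_score[label] = score
        # Normalize in log space to avoid underflow on longer messages.
        top = max(log_score.values())
        weight = {k: math.exp(v - top) for k, v in log_score.items()}
        return weight["spam"] / (weight["spam"] + weight["good"])
```

With only 140 characters per message, each message contributes few tokens, so the word counts learned from the training corpus carry most of the weight. That is why a corpus of labeled tweets should help more than one built from email.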

How does it work

  • Start the application.
  • Enter a search term and click the "More Data" button.
  • The application performs an initial classification of the messages. All spam messages are displayed in orange or light blue.
  • Double-click any message to change its classification.
  • Once you are satisfied with the results, click the "Accept" button; the results are saved in the appropriate good and spam files.
  • You can load the new corpus results by clicking on "Reload Corpus".
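The review-and-accept steps above can be sketched roughly as follows. This is an illustration in Python, not the application's actual code: the function names `toggle` and `accept`, the one-message-per-line file layout, and the file paths passed in by the caller are all assumptions.

```python
def toggle(label):
    """Flip a classification, as a double-click on a message would."""
    return "good" if label == "spam" else "spam"


def accept(reviewed, good_path, spam_path):
    """Append each reviewed (text, label) pair to the file for its label.

    Assumes a hypothetical corpus layout of one message per line.
    """
    with open(good_path, "a", encoding="utf-8") as good, \
         open(spam_path, "a", encoding="utf-8") as spam:
        for text, label in reviewed:
            target = spam if label == "spam" else good
            # Collapse newlines so each message stays on a single line.
            target.write(text.replace("\n", " ") + "\n")
```

Appending rather than overwriting means each accepted batch grows the corpus, so the next reload trains the filter on everything labeled so far.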

Spam Filter Service

I have created a service that you can use to classify your text if you do not want to build a filter of your own. The following link provides more details about the service.

Spam Filter Service


.Net | Twitter
