Thursday, August 16, 2007

The downside to popularity

As many people reading Planet IM know, I am a developer on the Guifications project and the Purple Plugin Pack, and a "Crazy Patch Writer" for Pidgin. Over the course of my time with these projects, we've seen our fair share of...well, pretty much everything. New developers; departing developers; people stepping back due to frustrations, busy schedules, etc.; and the biggest of them all, spam.

Not too long ago, Guifications and the Plugin Pack were hosted at SourceForge. Back then, things were pretty good--our website was pretty well static, although it was done in PHP so we could do cool stuff like pull in our SourceForge project news feed to use as our home page and have extensible menus with XML and whatnot. We also operated under one project at SourceForge. Over time, we (Gary most of all) became displeased with and irritated by things that we saw happen and things we experienced while there, and we eventually decided to leave SourceForge for self-hosting.

When we began self-hosting, Gary started renting a virtual private server from Steadfast Networks while waiting for a dedicated server to become available. It was a bad time for us to be VPS customers, though, as performance suffered from overloading. We were finally able to move to a dedicated box and couldn't be happier with it. I personally rented a VPS from them as well, and initially had similar frustrations. In the 9+ months since we moved Guifications and the Plugin Pack to the dedicated server, the VPS performance issues have been resolved. The VPS I have is fast enough that I can barely tell the difference between it and a real box.

With the move to self-hosting, we needed something that could replace SourceForge's trackers. The solution was simple--we'd heard quite a few good things about Trac and tried it out. We've been with it ever since, and Gary even went to the trouble of updating the SourceForge-to-Trac migration script so we could migrate our tracker items into tickets.

Over time, we discovered issues on one of the two Trac environments we were running--spammers were attacking us. At first it was comment bombing, where the spammers would flood us with links in comments on tickets--primarily closed tickets, but any ticket was a target. After the third attack, we cracked down and locked our Tracs to the point that only developers could open tickets or edit the wiki. This caused some friction in our user community, because no one could report the bugs we had or make feature requests. Our Apache logs showed a continued series of attack attempts, growing along with our popularity among Pidgin users.

Thankfully, several people pointed us toward the Spam Filter plugin. It's a great plugin because now we can have authenticated users making ticket submissions, comments, wiki changes, etc., but at the same time we have regular expression blacklist filtering, IP throttling, evaluation via Akismet, and external link filtering. Since installing and configuring this plugin we've had a whopping three successful spams. The monitoring feature is useful for adding regexes to the blacklist, as well.
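For anyone who hasn't seen regex blacklisting before, here is a rough sketch of the idea in Python. This is not the Spam Filter plugin's code or API; the patterns and the `matching_patterns` helper are made up purely for illustration.

```python
import re

# Hypothetical blacklist entries; a real list grows out of the spam
# you actually see in your own logs.
BLACKLIST = [
    r"cheap\s+pills",
    r"casino",
]

# Compile once, case-insensitively, so every submission is checked cheaply.
COMPILED = [re.compile(pattern, re.IGNORECASE) for pattern in BLACKLIST]


def matching_patterns(text):
    """Return the blacklist patterns that match anywhere in the text."""
    return [rx.pattern for rx in COMPILED if rx.search(text)]
```

A submission that matches one or more patterns would then be penalized or rejected, while an ordinary bug report matches nothing and passes through.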

The only problem with the spam filter is that it requires work. That is, it's not perfect. On occasion it will have false positives, although this has happened significantly less often since we started requiring that newly registered users supply a first name, a last name, and an email address at registration. Also, on occasion, we have to teach it about new bad expressions by editing the BadContent wiki page. It is sometimes funny, though, to see the karma scores assigned to some of the spam attempts. The karma can go wildly negative, especially if multiple filters are invoked multiple times on the same message. Currently the record for our Tracs is -41, due mostly to external links subtracting 35 points from the post's initial karma.
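To show how a score like that adds up, here is a small Python sketch of karma accumulation. It is only an illustration, not how the plugin is actually implemented: the function names, the allowance of two free links, the five points per extra link, and the starting karma of 0 are all assumptions. Only the -35 and the -41 come from our record; the "everything else" figure is just the remainder under those assumptions.

```python
import re

# Very rough URL matcher, good enough for counting links in a comment.
URL_RE = re.compile(r"https?://\S+")


def external_link_penalty(text, allowed=2, per_link=5):
    """Hypothetical rule: each link beyond `allowed` costs `per_link` points."""
    extra = max(0, len(URL_RE.findall(text)) - allowed)
    return -per_link * extra


def total_karma(initial, adjustments):
    """Add every filter's (usually negative) adjustment to the initial karma."""
    return initial + sum(adjustments.values())


# A comment stuffed with nine links, under the assumed rule above.
spam = " ".join(f"http://example.com/{i}" for i in range(9))
adjustments = {
    "external links": external_link_penalty(spam),
    "everything else": -6,  # remainder, assuming the post started at 0
}
print(total_karma(0, adjustments))
```

Because each filter simply adds its own adjustment, a message that trips several filters at once sinks quickly, which is why the scores on obvious spam look so extreme.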

And this, my friends, is the downside of popularity--the spammers are out and ready to strike, and we have to fight them at every step of the way. Thankfully, for every group of spammers that don't deserve the privilege of using a computer, much less the privilege of an internet connection, we also have at least one talented programmer creating a tool to help in the battle against spam. It does make you wonder at times, though, what the spammers can possibly gain from attacking us.