Tuesday, October 7, 2008

The Hosting Game

Well, by now probably everyone's aware that our (Pidgin's) sites were down for several days. Here's the basic rundown of what's happened with our servers over the last couple of years.

Prior to our going public with our rename, Luke Schierer secured a donation of a virtual server from DVLabs. This server was intended to enable us to migrate away from SourceForge's web platform and tracker. It allowed us to have our own domains, and thus host and control our own e-mail services, mailing lists, and Trac. This server did its job for us quite well, even though we did occasionally push the server to or beyond its limits. (Note that in these instances real hardware would not have made a noticeable difference, as its resources would have been exhausted just as they were in the virtual machine.)

Even though we were extremely grateful to DVLabs for providing us a server, we did realize that having only a single server presented us with some limitations, the most important of which is the lack of "graceful" failure of core services if the server goes down. It also presented a single point of failure in that the web server getting pounded (as happens when a story is posted to Slashdot) could potentially cause all our services, including monotone and mail, to grind to a halt. Because of this we had tossed around the idea of finding another host two or three times.

When we most recently thought about hosting, Evan Schoenberg from the Adium project (who also happens to be a libpurple developer) put us in contact with NetworkRedux, who generously provides Adium's hosting. Thanks to Evan's intervention, we spoke with NetworkRedux about our hosting, and they were willing to donate not just one, but two servers and a ton of bandwidth to our project. With this generous offer, we're getting the potential to have a lot more raw computing power at our disposal (including more memory to help increase caches, thus hopefully improving speed), as well as the ability to spread our services out a little so that a heavy hit to the website or to Trac won't have a detrimental effect on all our services.

We were still discussing details of migrating to these two new servers when the server at DVLabs unexpectedly went down. It turns out that some miscommunication and bad timing caused the guys at DVLabs, who were migrating their own stuff, to take down the server hosting our virtual machine, thinking we were no longer using it. We were, of course, confused at first, but once we determined what was happening we were much more at ease. None of our data had been lost. DVLabs were kind enough to supply us with the raw VM image for our own use to recover our data and complete our migration.

From these events we have learned a lesson--redundancy is not only good, it's pretty much a must-have. In the interests of redundancy and graceful failover, we've taken some steps to help prevent future long-term outages. One of those steps is that Gary Kramlich and I decided that we would help out by providing some service redundancy on our server, guifications.org. The biggest, and most important, service we are helping with is monotone. If for some reason the monotone server is down, guifications.org can seamlessly stand in for the mtn.pidgin.im server through the magic of DNS. We're also looking at potentially acting as a backup for more services.

So all in all, the commotion about our sites being down was really just a lot of noise about nothing truly significant. Yes, our sites were down, but this wasn't the end of the world. Google's massive cache had pretty much all the critical information anyone could need from our wiki. Yes, monotone was down, but guifications.org was ready to help--I had been running a read-only mirror for nearly a year, which was ready and able to stand in as a production service if needed. Our mailing lists were down as well, but again, it wasn't the end of the world. Overall, we have weathered a prolonged service outage pretty well and taken some lessons from it to help us in the future.

In closing, I would like to extend my deepest, most heartfelt thanks on behalf of all Pidgin developers to DVLabs for their hosting services over the past two years and to NetworkRedux for the services they are now providing for us. We truly appreciate these donations!