Thursday, July 31, 2008

The Holy War of Tool Choice

Tools are vital to any open-source project. Because of that, tool choice is critical. You really don't want to be switching tools frequently, because doing so is more work than you'll ever be able to put into the project. By that same token, you don't want to pick tools no one in the project wants to use. This leads to the problem--tool choice is often a "Holy War", so to speak, because everyone has their pet tool and thinks everything else to be inferior.

Let's use Pidgin as a practical example. Prior to our most recent legal problems (which prompted our rename), we had been evaluating new version control software for some time. At that time, we were using CVS. Almost everyone who has developed on a project as large as Pidgin will agree that CVS sucks. You can't rename files unless you rename them in the repository itself, branching and merging are a pain, etc. I wasn't involved with the project at this point in time, but I do know that every distributed VCS in existence at the time was evaluated. After the evaluation, Monotone was the preferred choice. Keep in mind that this was over two years ago, more likely closer to three.

We continued to use CVS, and later made an ill-advised switch to Subversion. Sticking with these tools was mainly due to our complete reliance on SourceForge, who offered only CVS and SVN. Finally, when it came time to rename our project to Pidgin, we had a donated virtual server on which we could run (essentially) whatever we wanted. Ethan Blanton used tailor to convert our SVN repository into a monotone database.

We also set up Trac for bug, patch, and feature request tracking. The built-in wiki was a bonus. We ended up installing a number of plugins for Trac to customize it and provide some additional features, such as restricting some ticket fields from those who we feel should not be able to modify them.

All was well initially. We had a few complaints because we chose to use monotone instead of sticking with SVN, but these were users who had no intention of contributing patches, so we were (mostly) happy to see them angrily go back to using our "blessed" releases. We did, however, have a lot of confusion initially about what monotone was and how it related to Pidgin. A few hundred explanations later and those questions faded too.

Trac proved to have some scalability and performance issues for us on numerous occasions. Eventually we were able to narrow the biggest issues down to their causes and implement fixes. Some of this involved a LOT of poking and prodding, as well as some assistance from Trac developers, who we thank profusely for their time in resolving the issues. I am quite surprised, overall, that we haven't had any real complaints about Trac. For a while it was frequently completely unusable, and of course we saw a number of complaints related to that, but since the issues have been resolved we've seen no real complaints.

As for monotone, fast-forward a year from the announcement of the rename. We consistently get complaints about having to download a "huge" database just to get the latest development source for Pidgin. We do also get a few complaints that it's hard to use, but almost all of these are solved simply by pointing people at the UsingPidginMonotone wiki page. As for the huge database, we do acknowledge that it is inconvenient for some people, but "shallow-pull" support is coming to monotone. A shallow pull would be similar to an svn checkout.

I don't mind the database, myself. I have 11 working copies (checkouts) from my single pidgin database (8 distinct branches, plus duplicates of the last three branches I worked on or tested with). Each clean checkout (that is, a checkout prior to running autogen.sh and building) is approximately 61 MB. If this were SVN, each working copy would be approximately 122 MB due to svn keeping a pristine copy of every file to facilitate 'svn diff' and 'svn revert' without needing to contact the server the working copy was pulled from. Now, let's add that up. For SVN, I would have 11 times 122 MB, or 1342 MB, just in working copies. For monotone, I have 11 times 61 MB for the working copies (671 MB), plus 229 MB for the database, for a grand total of 900 MB. For me, this is an excellent bargain, as I save 442 MB of disk space thanks to the monotone model. For another compelling comparison that's sure to ruffle a few feathers, let's compare to git. If I clone the git mirror of our monotone repository, I find a checkout size of 148 MB after git-repack--running git-gc also increased the size by 2 MB, but I'll stick with the initial checkout size for fairness. If I multiply this by my 11 checkouts, I will have 1628 MB. This is even more compelling for me, as I now save 728 MB of disk space with monotone.

Richard Laager also took the time to point out an interesting, but as yet unexploited feature of monotone--the use of certs allow us to do a number of useful things while being backed by the cryptographic signing of certs. For one, we could implement something similar to the kernel's Signed-Off-By markers. We could also have an automated test suite run compiles of each revision and add a cert indicating whether or not the test succeeded. We could also include a test suite using check (we do already have some tests) and use more certs to indicate success or failure. The extensibility of monotone using lua and the certificates is truly mind-boggling.

I think I've picked on version control enough here, but before I move on, I'd like to say that I don't specifically hate any one version control system. Ok, maybe I hate CVS and Subversion, but I'm certainly not alone in that. I don't, however, hate git, bzr, hg, etc. In fact, I think hg looks quite promising--the Adium folks have decided to switch to it, and are currently in the process of trying to convert from SVN. However, given that for us monotone was the best choice when we made our decision, and given the extensibility of monotone, I think it would be foolish of us to choose another tool at this point. Perhaps in a few more years when all the DVCSes have had time to mature further it will make sense to revisit this decision.

Another tool choice involves communication between developers and users. Specifically, forums. We have elected not to have forums for Pidgin, although as a legacy of our continued involvement with Sourceforge for download hosting we are stuck with the forums they provided us. We are quite frequently belittled for our choice to forego widely visible forums. At least four of us that I am aware of have a strong distaste for forums. I find them to be worse than the ticket system for attracting duplicated questions which most users won't search beforehand. All the useless extra features, such as emoticons, formatting, etc., are a waste, as well.

It's also been my experience with forums that one or two people will start answering others' questions with misinformation that will sound legitimate to those receiving the bad answers. I've seen this numerous times, but it comes to mind so quickly as I've seen it quite recently on Ubuntu's Launchpad "Questions" forum or tracker or whatever you call it for Pidgin.

Instead of forums, we have chosen to use mailing lists. We receive complaints about this too. These complaints run the gamut from "mailing lists are hard to use" to "but you can't search the mailing list!" In reality, the mailing lists are searchable--Google indexes our mailing lists, as do a number of other search engines. Perhaps we could provide a nice search box or something that does the necessary google magic (make the search query "terms site:pidgin.im") to make it a bit friendlier, but searchability exists. Not that it matters much in my experience, but hey, I'm just a developer; what do I know?

I'd also argue that mailing lists are easier to use than forums, as all you have to do is, y'know, send an e-mail. With a forum, you have to register, log in, find the right forum if there is more than one, then post your question, and check back for answers. With the mailing list, those genuinely interested in providing assistance will helpfully use the "Reply All" feature of their mail client to provide the answer both to the user requesting help and the mailing list's archive, thus providing an answer that is both helpful and searchable.

Even so, all the choices made during the evolution of a project will result in people trying to start a crusade, holy war, revolt, ..., over the chosen tool. Unfortunately, it's a fact of life. The best we can do is take it in stride, justify our choices, and move on.

Note: It's come to my attention that I had missed the ability to share a git database across multiple working copies. In that scenario, the total size of the database and 11 working copies is slightly under 750 MB, and thus a space savings in the neighborhood of 150 MB over monotone. It had been my understanding that I needed a copy of the database per working copy. I stand corrected. I don't use git on a daily basis, as the projects I work with currently use CVS, SVN, or monotone, so I am bound to miss finer details of git here and there. There are other reasons I prefer to stick with monotone, but I won't get into them here, as they're not important to the point of this post.

Sunday, July 13, 2008

"State of the MSN Plugin Address"

Yes, the title is intended to be half joking. Just a few days ago I made a comment to a fellow developer that "someone needs to make a 'state of the msn plugin' address," meaning that someone needs to let people know what's happening. Well, looks like that someone's going to be me. So here's the outsider's summary of what's happened.

In current Pidgin releases, the default MSN plugin uses the old MSNP9 protocol. This protocol doesn't support Personal Status Messages (PSM, also called "personal messages"), nor does it support offline IM, current media (a field that shows what music you're listening to or what video you're watching), or messaging while invisible.

Two years ago, for the 2006 Summer of Code, we had a student whose project was to update our MSN code to MSNP13. MSNP13 supported the PSM and current media, although at the time libpurple didn't support the status attributes for "now playing" or "current media" or whatever you prefer to call it. MSNP13 may also have supported messaging while offline, but I can't recall for certain and I don't, at the moment, feel like looking it up in the documentation we have online. Later this was updated to MSNP14 for Yahoo! interoperability support. At the end of the summer, the code wasn't of a quality we were ready to release. The code sat.

Last year, we had another student take on MSN protocol updating. This time the student updated the code to support MSNP14, starting from the previous work. This supported offline messaging for sure, and there were a lot of commits that dealt with the OIM features. Again, at the end of the summer, the code wasn't release-ready in our opinion, so again the code sat.

Earlier this year, one of our developers started working on the MSNP14 code in a side branch, and eventually merged it to the im.pidgin.pidgin branch in our monotone repository. Because the code still wasn't release-ready, we disabled it and forced the MSNP9 plugin to build by default. The more clever users were still able to figure out how to get MSNP14 going, but there weren't too many until we introduced an argument to the configure script to enable the MSNP14 plugin.

A new "crazy patch writer" appeared and started to work on MSNP15 support. We eventually gave him write access to the central monotone repository at pidgin.im. This has paid off for us very well. Elliott has done a LOT of work, including some necessary cleanup, stability improvements, etc. MSNP15 was finally merged to im.pidgin.pidgin this morning. Currently it is also the default MSN plugin, and I hope it stays that way, but it might be changed if anyone objects to it.

This would not have been possible without the work of Elliott, the two Summer of Code students, one other crazy patch writer (known as Masca on #pidgin), our XMPP Voice and Video Summer of Code student for this year (known as Maiku on #pidgin), and Dimmuxx and other Adium users' testing. I'm sure there are a number of other people I've forgotten to mention who have made contributions to this progress, and I apologize. My memory isn't what it used to be. To all these people who have helped with the long road of updated MSN protocol support, I give well-deserved thanks.

So what does all this mean? Well, simply, a lot of the feature requests we've been getting are finally resolved. Ideally we also will stop seeing people asking why they don't receive offline MSN messages, why they can't message while invisible, and why they can't see personal messages. For these reasons alone, I can't wait for the release of Pidgin 2.5.0!

Friday, July 4, 2008

Pidgin 2.4.3 and ICQ

Well, this week has certainly been interesting with Pidgin development. On July 1, AOL made a small change to the OSCAR servers that affected our ICQ users. A lot of misinformation was spread in #pidgin about what really happened, so here's my best attempt to explain it.

Libpurple (and thus Pidgin and Finch) identified itself as ICQ basic 14.34.3000. Of course, we sent a version string that included our real identification ("Purple/2.4.2" for example). This has worked for us for years. We send the version bytes from official clients to the server because if we don't mimic an official client, we get locked out of some features. Which features, I'm not really sure, since I don't really know or care about the internals of the OSCAR protocol.

Suddenly we started seeing reports about ICQ no longer working in #pidgin. Upon investigation we discovered that apparently the client ID has been "retired", and any logins from that client ID are immediately met with a message from the server demanding an upgrade to ICQ6. The server then forcibly disconnects us. We catch this error and display a somewhat different message instructing our users to go to pidgin.im to get an update.

Stu Tomlinson found the version identification information for the most current official ICQ client and changed our identification. Magically logging in and holding a conversation works. This patch became the reason for the release of Pidgin 2.4.3.

So to summarize, this was NOT a protocol change, merely a tweak on the server side to disallow logins from "old" clients. I suspect this is to try to weed out older, buggier clients and get the users on current software.