Posts from November 2008
Oh no, another post within the same month! Horsemen spotted in the sky, flying pigs imminent!
Anyway, where was I...
Right, there's now a way to get commits from Google Code into CIA.vc that doesn't go through the hackish SVN poller. Read on for details.
As we all know, there's a new player on the block of public open-source hosting: Google Code. Judging by how many people are using it, they do a decent job: SVN, wiki, bug tracking, downloads, the works.
Wait, no CIA.vc hooks? Nope. Back in the day with SourceForge and CVS, you could install your own hooks, so things worked. When SourceForge added SVN, people could no longer add their own hooks, but I gather the web config interface has a "CIA.vc hook" checkbox. Without that kind of support from the hosting provider, SVN users are out of luck when it comes to custom hooks.
That's why we added the SVN repository poller. It wakes up a few times an hour (or whenever mail arrives at a special address) and scans the configured repository, checking whether there have been any new revisions since it last looked. If there are, it enters a new commit with the right data into the system. So usually you'll set it up with the default poll delay of 15 minutes (anything shorter than what the poller can actually manage, roughly one poll every 20-30 minutes with our load, gets silently upped to that interval) or, even better, subscribe the ping email address to your commit mailing list.
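As a rough sketch of that polling logic in Python (not the actual CIA.vc poller): the function names are made up for illustration, and the 20-minute floor is an assumption taken from the low end of the 20-30 minute estimate above.

```python
# Hypothetical floor: the shortest interval the poller can realistically keep up with.
ASSUMED_FLOOR_MINUTES = 20


def effective_interval(requested_minutes, floor=ASSUMED_FLOOR_MINUTES):
    """Silently raise any requested poll delay below the floor up to the floor."""
    return max(requested_minutes, floor)


def new_revisions(last_seen, head):
    """Return the revision numbers that appeared since the last poll.

    last_seen is the newest revision already entered into the system,
    head is the newest revision the repository reports right now.
    """
    return list(range(last_seen + 1, head + 1))
```

With these definitions, a project that asks for the 15-minute default would effectively be polled every 20 minutes, and a poll that finds the repository at revision 13 after last seeing revision 10 would enter revisions 11, 12 and 13.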
That works well enough, and produces excellent XML commit data, but it's a bit hackish - the polling is horridly inefficient, and I need to run yet another service on our poor machine. Also, it's sometimes a tad slow.
So I figured "If we already get the commit data from the mailing list, can't we use that somehow?" and resurrected an old set of scripts. What we can do now is pick up mail sent out by Google Code (via the "Activity Notifications/all subversion commits" field on the "project summary" pane of the "administration" tab) and try to parse the commit.
The results are not as fancy as with repository polling; those mails were meant for humans to read, not machines, after all. So I can't always figure out filenames, and since there's no unambiguous "end of log message" tag, I currently cut log messages at the first empty line.
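The cut-at-the-first-empty-line rule is simple enough to sketch in Python. This is only an illustration, not the real parser: cut_log_message is a made-up name, and it assumes its input already starts at the first line of the log message.

```python
def cut_log_message(text):
    """Return the text up to (not including) the first empty line."""
    kept = []
    for line in text.splitlines():
        if not line.strip():
            # First empty (or whitespace-only) line: treat it as the end of the log.
            break
        kept.append(line.rstrip())
    return "\n".join(kept)
```

The downside is visible right away: a log message that contains a blank line of its own loses everything after that line.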
But I think it's slightly faster than the polling method, and it's certainly more elegant. It's still not instantaneous, mostly because Google's email machinery seems to take a minute or two. But if you want, use it! Simply change the mail settings (or mailing list) to send to "cia+googlecode@" instead of "ping+whatever@". (And turn off periodic polling, if you have it enabled, or you'll end up getting your commits twice.)
Feedback is always welcome! (the nick I sign my posts with, at this domain; or just comment on the blog) We'll probably have a few corner cases I didn't catch, but with some work we should be able to turn this into yet another Good Way to get commits into CIA.vc!
Note to any Google employee (especially the Google Code folks) reading this: I'm sure that if we work together, we can do better. Drop me an email and we'll work something out. I'm a geek, so I can handle pretty much any method you come up with to pipe commits. It's always nice to see your commit show up on IRC while your finger is still hanging over the enter key, and we should be able to make that happen.
Umm, yeah, that's right, I'm still here.
Sorry if we've seemed a bit unstable lately - we've had a couple of unexpected (and partly unexplainable) problems crop up, and while I take care of things whenever I see them come up, that seeing part could be improved somewhat 8)
Things should be looking better soon, though. I've worked a bit on infrastructure that'll allow me to notice problems earlier and better, as well as made sure I get (and read) information from all the pieces of the system when stuff goes boom.
On the hosting side, we have a very interesting lead that may see significant improvements to our system. I don't want to give out any details as long as I don't have anything solid, so I don't make anyone look bad by accident, but stay tuned to this channel for more information when we have it.
What else have I been up to?
I've gone and scrubbed our blog comment spam. I already disabled links in comments a while ago, to cut down on all those people who saw "cool, a blog with open comments" but didn't see the rel="nofollow" we put on all comment links. We still got a lot of idiotic comments, though, which I can only guess must be some kind of "Hey guys, here's a blog with open comments" magic strings in the spam community. Or something. Well, gone now.
So I'll just have to periodically scrub the comments. I don't want to take direct steps against automated comments just yet - if you've got an RSS reader that allows you to directly post comments, I applaud your ingenuity.
On the subject of spam, it seems someone got the idea to use the project pages for spamming. Let's hope that trend doesn't continue. I'd hate to have to set up a wikipedia-like army of "recent changes" monitors.
In what's probably our most important component, the IRC bots, I've tuned the freenode settings, so they should connect much faster now when stuff is restarted, and I've fixed a bug that prevented them from properly connecting to EFnet. I hope I've also increased the general connect speed to any network, but we'll have to see how that goes next time it needs a restart.
What will I be working on?
Obviously, the hosting change I've hinted at above is going to take some working-out.
Also, we've had some interesting bugs with Unicode / UTF-8 in commits. I very much hope we're at the point where commits get through no matter the charset, but I'm afraid we currently replace all 8-bit characters with '?'. Pieces of the core don't work happily with Unicode, and we'll need to fix that.
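To make the problem concrete, here is a small Python sketch. The first function mimics the lossy behaviour described above; the second shows one possible alternative (try UTF-8, then fall back to Latin-1). Both names are hypothetical, and the fallback is an assumption for illustration, not what the CIA.vc core does.

```python
def lossy_ascii(raw):
    """Mimic the current behaviour: every 8-bit byte becomes '?'."""
    return "".join(chr(b) if b < 128 else "?" for b in raw)


def decode_commit_text(raw):
    """Decode commit bytes as UTF-8, falling back to Latin-1.

    Latin-1 maps every byte to a character, so the fallback never fails,
    although it may guess wrong for other 8-bit charsets.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")
```

Note that a single accented character sent as UTF-8 occupies two bytes, so the lossy version turns it into two question marks rather than one.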
The advent of distributed SCMs like git has brought an (almost) entirely new problem: currently, most hooks take each commit pushed into the central repository and send out a notification about it. Normally, that's exactly what you want.
Things get a bit interesting if someone checks in after a vacation to deliver the 100 commits they wrote while away, and those take a machine-gun march through the system. Or occasionally I see someone merging branches, and the hook script sending a notification for every merged commit. We'll have to change the hook scripts to detect this kind of thing and just say "push of 100 commits" or something. Stay posted: when I get such a script, I'll put it up and tell everyone to use it ;)
I guess an alternative that might work for some people would be this: instead of putting an on-push hook on the central repository, put an on-commit hook on each developer's repository. That way, you'll get instant notification of what everyone is working on, and can ask them to push it up when it looks interesting.
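The collapsing idea could look roughly like this in Python. It is only a sketch of the decision a hook script would make, not an actual hook: the function name and the threshold of 5 are assumptions, and a real script would still need to collect the commit messages from the SCM.

```python
def notifications_for_push(commit_messages, threshold=5):
    """Return the notifications to send for one push.

    Pushes with more commits than the (hypothetical) threshold are
    collapsed into a single summary line; smaller pushes are passed
    through one notification per commit.
    """
    count = len(commit_messages)
    if count > threshold:
        return ["push of %d commits" % count]
    return list(commit_messages)
```

A push of three commits would still produce three notifications, while the 100-commit vacation push would produce a single "push of 100 commits" line.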
Oh whee, that got much longer than I intended. I'd better stop and get some actual work done again now ;-)
