Posts from March 2007
Last Friday, March 23rd, this site changed domains from cia.navi.cx to cia.vc. This was achieved by sending an HTTP 301 Moved Permanently response to most types of requests made on the cia.navi.cx domain.
The resulting data surprised me:
There seems to be a noticeable difference between the ability of Google and Yahoo to properly deal with these redirects. I'm not just talking about their ability to follow the redirects themselves. I'm referring to their ability to quickly and accurately update their index and page rankings accordingly.
It's possible that the hits from Google will be back to normal after Google has finished fully re-indexing the site, and it's also possible that this is a statistical fluke. However, I would still assert that Yahoo is handling the move much more gracefully. First let's try a Google search for "cia.vc":
Now let's try Yahoo:

As of today, all pages on the CIA site should have an interactive search box at the top, just left of the navigation links.
The quality of the search results will be somewhat mediocre. Currently it's doing a simple substring search on stats paths and titles. Sorry, it can't search the actual commit messages yet.
Browser compatibility is also limited at the moment. It should work on IE6+, Firefox, and Safari, but there are sure to be bugs.
The result quality and the accessibility will be improved over time. If you have any bug reports or suggestions, feel free to leave a comment.
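To make the current behavior concrete, here is a minimal sketch of a substring search over stats paths and titles. It is only an illustration, not the code running on the site: the `search` function, the sample `TARGETS` data, the result limit, and the case-insensitive comparison are all assumptions.

```python
from typing import List, Tuple

# Hypothetical sample data: (stats path, title) pairs.
TARGETS: List[Tuple[str, str]] = [
    ("project/python", "Python"),
    ("project/navi-misc", "navi-misc"),
    ("author/alice", "Alice"),
]


def search(query: str,
           targets: List[Tuple[str, str]] = TARGETS,
           limit: int = 10) -> List[Tuple[str, str]]:
    """Return entries whose path or title contains the query as a substring."""
    needle = query.strip().lower()
    if not needle:
        return []  # an empty query matches nothing rather than everything
    hits = [
        (path, title)
        for path, title in targets
        if needle in path.lower() or needle in title.lower()
    ]
    return hits[:limit]
```

A search like this has no notion of relevance, which is one reason the results are mediocre: every entry containing the text is an equally good match.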
The site rebranding just went live. The CIA service, formerly at http://cia.navi.cx, now has a new domain and new formal name: CIA.vc. This new name emphasizes that we're about Version Control tracking, and it makes it easier to speak about CIA unambiguously when you need to.
The web site should be handing out 301 Moved Permanently redirects to point folks to the new domain. Compatibility- and latency-sensitive content (RSS feeds, XML-RPC, raw XML feeds) will work from any domain without redirecting. Additionally:
- If you're afraid of domains with strange TLDs, you can use http://cia-vc.org
- https://cia.vc now has an SSL certificate from Thawte, so you shouldn't get nagged by your browser if you want to log in to CIA securely. Because of this, I'll probably make secure login the default soon.
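The redirect policy described in this post can be sketched roughly as follows. This is a hedged illustration rather than the server's actual code: the function name `redirect_target` and the path patterns used to recognize feeds and XML-RPC requests are hypothetical.

```python
from typing import Optional

CANONICAL_HOST = "cia.vc"

# Hypothetical patterns for content served from any domain without
# redirecting (RSS feeds, XML-RPC, raw XML feeds). The real paths may differ.
NO_REDIRECT_PREFIXES = ("/RPC2",)
NO_REDIRECT_SUFFIXES = (".rss", ".xml")


def redirect_target(host: str, path: str) -> Optional[str]:
    """Return the Location for a 301 response, or None to serve the request."""
    if host.lower() == CANONICAL_HOST:
        return None  # already on the new domain
    if path.startswith(NO_REDIRECT_PREFIXES) or path.endswith(NO_REDIRECT_SUFFIXES):
        return None  # feeds and RPC keep working on the old domain
    return f"http://{CANONICAL_HOST}{path}"
```

For example, `redirect_target("cia.navi.cx", "/stats/project/python")` yields the same path on cia.vc, while a request for the hypothetical `/RPC2` endpoint is served in place.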
At the risk of sounding like I'm changing my mind every 3 seconds...
Sean brought up some good points regarding the naming of CIA. The name CIA, despite being somewhat confusing, has quite a bit of community recognition. Additionally, the name (somewhat indirectly) represents CIA's purpose pretty well, and the modicum of ambiguity and confusion produced by the name does actually help spread the word.
It still remains that the name CIA isn't even remotely unique. This causes problems when picking a domain name, or talking about the service around an audience that's unfamiliar with it. The solution? Keep the CIA, but qualify it in a globally unique way. People are already doing this by referring to the service as cia.navi.cx, but navi.cx doesn't really have anything to do with CIA: it's just my personal domain.
Well, cia.org is taken and cia-project.org is kinda long. What I really want is a domain name that qualifies CIA within the realm of version control.

Thank you, Saint Vincent. I propose making the service's official domain and name CIA.vc, which can be handily abbreviated CIA. The bots will keep their current names, and of course all the existing URIs and e-mail addresses will continue to function.
Like it? Hate it? Tired of all these silly blog posts? Leave a comment and voice your opinion.
CIA's message delivery component had about 15 minutes of unexpected downtime today, followed by a restart of all IRC bots. This was caused by some thrashing that resulted from a still-unresolved slow memory leak in the RPC and Bot servers. Sorry for the disruption.
P.S. I've received both positive and negative feedback over the name "Diffcast" which I suggested to replace "CIA". If you have any opinions on the matter, don't hesitate to leave a comment.
P.P.S. I've been experimenting with XMPP, XMPP-over-BOSH, and XMPP publish/subscribe lately. I think that the XMPP pub/sub specification itself (particularly its support for content-based subscriptions) matches CIA's needs very well, but no current implementation provides that level of support. I might have to hack on ejabberd's mod_pubsub in order to add support for anonymous (temporary lease) subscriptions as well as filtering/formatting using rulesets.
There are a few big problems with CIA that I've had my eye on recently:
The old web interface (The root page, and everything under /stats) sucks.
I want to redo this so that the entire site is running on the new codebase with the new look, but I'd like to make some changes to the way project/author stats are displayed. You can see a prototype of the new stats interface, but there's still much work left to be done. New commits should appear in real-time as they happen. Historical commit data should automatically load as you collapse the more recent items. Each item should have contextual links in the right-hand margin for related projects, authors, and version control systems. I'd also like to finish the graphing functionality that's been half-implemented for ages...
The bots are hard to maintain, and lack key features.
I've been working on rewriting the CIA bot daemon itself in Erlang lately, to solve these problems.
The messaging hub is a closed system. Want to implement your own IRC bot or commit ticker, or syndicate the real-time data in a new way? Tough.
I've been thinking about replacing CIA's message hub with an XMPP server to solve the third problem (the closed messaging hub) and help out a great deal with the first (the old web interface) and maybe the second (the bots).
Well, I just came across XEP-0124: Bidirectional Streams over Synchronous HTTP. Better yet, this was already implemented by Mabber in ejabberd. Hurray! This means that XMPP on its own might just provide most of the infrastructure necessary for CIA:
- Routing messages from projects to various syndication systems
- Providing an open API that anyone can use to get real-time messages
- Enabling real-time Comet-style updates
- The existing input mechanisms (XML-RPC, email, repository poller) should be able to simply deliver messages via an internal XMPP node
- The existing syndication mechanisms (IRC, Web stats) can also simply be XMPP nodes that subscribe to feeds
- Scalability across server clusters
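The routing idea in the list above can be pictured with a toy in-process publish/subscribe hub. This is only a sketch of the pattern, not XMPP and not ejabberd's API: the `Hub` class, its method names, and the message fields are hypothetical stand-ins.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Message = Dict[str, str]
Handler = Callable[[Message], None]


class Hub:
    """Toy stand-in for a pub/sub service with one node per project feed."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, node: str, handler: Handler) -> None:
        """Register an output (an IRC bot, the web stats, ...) on a feed."""
        self._subscribers[node].append(handler)

    def publish(self, node: str, message: Message) -> None:
        """Called by an input (XML-RPC, email, poller) when a commit arrives."""
        for handler in self._subscribers[node]:
            handler(message)


# Usage: two outputs subscribe to the same project feed.
hub = Hub()
irc_bot: List[Message] = []
web_stats: List[Message] = []
hub.subscribe("python", irc_bot.append)
hub.subscribe("python", web_stats.append)
hub.publish("python", {"author": "alice", "log": "Fix a typo"})
```

The point of the design is that inputs and outputs only know about feed names, so a third party could attach its own subscriber without touching the rest of the system.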
There are some problems that, as far as I know, will still be somewhat tough to solve even with XMPP:
Message filtering.
Traditionally, CIA's hub has been dealing with a single all-encompassing stream of commit messages that then get filtered using rulesets. With Jabber, each project's feed would probably become a discrete resource. If you want all the projects a particular person works on, those feeds would have to be explicitly aggregated. This has benefits and drawbacks.
Message formatting.
As far as I know, without a protocol extension there wouldn't be a way to directly ask the XMPP server to send out pre-formatted commit messages. Each client (including those written in JavaScript, unfortunately) would need its own code to format the commit messages.
Message storage and retrieval.
The XMPP server would only broadcast events when new commits are delivered. There would still need to be a separate system which subscribes to all new messages, records them to disk, indexes them in various ways, then provides a service for querying that data.
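The separate storage system described above might look something like the following sketch. It is an in-memory stand-in under stated assumptions: a real service would write to disk, and the `Commit` fields, the `CommitArchive` class, and its method names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Commit:
    project: str
    author: str
    log: str


class CommitArchive:
    """Records every message it receives and indexes it for later queries."""

    def __init__(self) -> None:
        self._log: List[Commit] = []  # stand-in for on-disk storage
        self._by_project: Dict[str, List[int]] = defaultdict(list)
        self._by_author: Dict[str, List[int]] = defaultdict(list)

    def on_message(self, commit: Commit) -> None:
        """Callback for a (hypothetical) subscription to all new messages."""
        position = len(self._log)
        self._log.append(commit)
        self._by_project[commit.project].append(position)
        self._by_author[commit.author].append(position)

    def _recent(self, positions: List[int], count: int) -> List[Commit]:
        if count <= 0:
            return []
        return [self._log[i] for i in reversed(positions[-count:])]

    def recent_by_project(self, project: str, count: int = 1) -> List[Commit]:
        """Newest-first commits for one project."""
        return self._recent(self._by_project.get(project, []), count)

    def recent_by_author(self, author: str, count: int = 1) -> List[Commit]:
        """Newest-first commits by one author, across all projects."""
        return self._recent(self._by_author.get(author, []), count)
```

Keeping the archive as just another subscriber means the broadcast path stays simple, while queries such as "the most recent commit by a particular developer" are answered from the indexes.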
I'm still quite an XMPP newbie, so I'd be very interested in hearing opinions from anyone who's more familiar with the technology.
I came across this today: a post made about a week ago by Olav Vitters, of GNOME fame, announcing a new Bugzilla IRC bot written by Max Kanat-Alexander.
It's actually a plugin for Supybot, and it has some cool features. You can query the Bugzilla database directly from IRC. It gets notifications by parsing bugmail, but it can also make queries directly to the Bugzilla server.
I'm glad to see such a bot become generally available. David Trowbridge was hacking on Bugzilla integration for CIA long ago, but it was never completed. CIA was actually designed with support for sending more than just commit notifications, but these other message formats never gained critical mass. There are a few projects that shoehorn their bug or automated test messages into CIA's commit format, but it's been too hard for others to contribute new formatting modules to be installed on the CIA server.
It's also nice to see bidirectional Bugzilla integration. It makes announcements, but you can also ask it about a bug. I'd love to have similar features in CIA: ask it for details (or a URL) on the most recent message, or ask it for the most recent commit made by a particular developer.
Unfortunately, such functionality isn't easily built into the current CIA bot server. The current bot server can't be restarted without causing over 500 IRC channels to be annoyed by join/part messages. This makes it hard to upgrade the bots. I've been thinking about rewriting the bots (and eventually maybe the message hub) in Erlang in order to improve their scalability and maintainability.
If you use svnmerge and you have a CIA bot on your IRC channel, you may have noticed that the merge-related log messages are long. Really long. Also, kinda ugly.
CIA's filtering capabilities to the rescue. Open up your CIA account and head over to your bot's settings page. This example will assume you're using Basic Filtering. If you're already using advanced filtering, you can use that filter as a starting point.
The web interface can't yet help you out if you're using Basic Filtering and you want to transition to Advanced, but it's pretty easy to do once you know the steps.
Let's say your bot is set to "Filter by Project", and "Prefix each commit with its project name" is checked. Your project list might look something like:
python navi-misc asterisk.org
Click the "Advanced Filtering" tab, and write a filter like the following:
<or>
    <match path="project"> python </match>
    <match path="project"> navi-misc </match>
    <match path="project"> asterisk.org </match>
</or>
<formatter medium="irc"/>
<formatter name="IRCProjectName"/>
If you don't want the project name prefixes, you can omit that last line. Also, if you only have a single project you don't need the <or> and </or> tags.
For the final step, add a new <rule> section between the projects and the "irc" formatter. The final ruleset should look like the following:
<or>
    <match path="project"> python </match>
    <match path="project"> navi-misc </match>
    <match path="project"> asterisk.org </match>
</or>

Cut down on svnmerge spam...
<rule>
    <find path="log">svnmerge</find>
    <formatter medium="irc">
        <lineLimit>2</lineLimit>
    </formatter>
    <formatter name="IRCProjectName"/>
    <break/>
</rule>

<formatter medium="irc"/>
<formatter name="IRCProjectName"/>
Notice that you can write comments as plain text in most circumstances. This <rule> element is like a miniature sub-filter embedded in your larger filter. If it doesn't find the text "svnmerge" somewhere in your log message, it stops running the sub-filter. Otherwise, it will apply a tweaked version of the usual formatting: this time with at most two lines of the log message. After applying the new formatting, it uses <break/> to terminate execution of the entire ruleset, to avoid running the IRC formatter twice.
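If it helps to see the control flow spelled out, here is a rough Python model of how the final ruleset above behaves. It is not CIA's ruleset engine: `run_ruleset` and `format_commit` are hypothetical names, the formatting is a simplified stand-in for the real IRC formatter, and exact, case-sensitive matching is an assumption.

```python
from typing import Optional

PROJECTS = {"python", "navi-misc", "asterisk.org"}


def format_commit(project: str, log: str,
                  line_limit: Optional[int] = None) -> str:
    """Simplified stand-in for the IRC formatter plus IRCProjectName."""
    lines = log.strip().splitlines()
    if line_limit is not None:
        lines = lines[:line_limit]  # models <lineLimit>
    return f"{project}: " + "\n".join(lines)


def run_ruleset(project: str, log: str) -> Optional[str]:
    """Return the formatted message, or None if the commit is filtered out."""
    # <or> of <match path="project">: stop unless a listed project matches.
    if project.strip() not in PROJECTS:
        return None
    # <rule>: the sub-filter only continues if the log mentions svnmerge.
    if "svnmerge" in log:
        # Tweaked formatter with at most two log lines, then <break/>,
        # which ends the whole ruleset so the default formatter never runs.
        return format_commit(project, log, line_limit=2)
    # Default formatters at the end of the ruleset.
    return format_commit(project, log)
```

The early `return` inside the svnmerge branch plays the role of `<break/>`: without it, the default formatter at the bottom would run as well.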
For more tips on using CIA's advanced filtering to customize your IRC bot, take a look at the filtering reference manual.
A lot has changed since CIA first became a service that anyone could use, back in early June of 2003. It gained a web site with per-project statistics and real-time commit feeds. CIA's users wrote client scripts that now enable it to receive data from nearly a dozen different version control systems. The infrastructure was rewritten and tweaked for improved scalability and responsiveness.
At the time of this writing, CIA hosts commit notifications for 2240 projects. There are 67 IRC bots running on 41 unique IRC networks, posting commits to 562 IRC channels. As you can imagine, the 20-line shell script running on my apartment's DSL line will no longer suffice.
Despite all of this growth and server-side complexity, CIA still stands for doing one thing and doing it well: making it easy for everyone to syndicate source code changes in real-time.
There are now several services on the web which have at least superficial similarities to CIA. Most of these are newer than CIA, which has been online for nearly four years.
It may also be worth noting that, for better or for worse, CIA is the only one of these services which is funded and run by an individual rather than a company.
I don't know if Freshmeat was the first service to syndicate news about open source projects, but it's certainly the most ubiquitous. Perhaps more important than its project release syndication is its endeavor to build a complete index of projects.
There are a few important similarities between CIA and Freshmeat. Both are primarily services rather than re-deployable projects. Both CIA and Freshmeat have developed a community of contributed tools: CIA client scripts and Freshmeat submission scripts. Both web sites display summary information on each project.
The important differences are of course related to the type of information being syndicated. CIA delivers notifications on every single source code change, whereas Freshmeat only notifies you of releases. This means that while both services are of complementary use to a project's users, only CIA is useful as a development tool.
SWiK is a relatively recent service, first coming online in mid-2005. It is a taggable wiki for project information and news syndication. It's almost like a Web 2.0 take on Freshmeat, with AJAX, folksonomies, and such.
There are a few similarities with CIA. Both services effectively provide a catalog of open source projects; however, CIA's catalog is more of a side-effect, whereas it's central to SWiK's mission. Both services, while designed to be primarily centralized, are open source. As with CIA, you can set up your own SWiK server if you want to: it just isn't designed to be easy.
The syndication aspects of SWiK, however, bear little resemblance to CIA.
Ohloh is the service on this list which bears the most similarity to CIA. So much so, in fact, that there was recently an Ohloh vs. CIA thread on their forums.
According to Ohloh's About page, their service was founded in 2004, not more than a year after CIA first went online. I first discovered it this January, and even created a page for CIA on Ohloh. (It claims that CIA is mostly written in JavaScript, but only because there happens to be a release of YUI in the repository.)
Both CIA and Ohloh connect to your project's revision control system and track the changes you're making. Both services have a database of projects and authors, and they can both track the relationship between each project and its contributors.
A lot of the parts that make up Ohloh and CIA are similar, but the end goals are different. CIA delivers instant change notification. Ohloh does an in-depth analysis of your code repository, looking at things like the number of comments and the amount of code written over time. Neither Ohloh nor CIA would be well-suited to doing the other's job. CIA does not inspect a project's full repository history, and Ohloh does not have support for receiving and syndicating immediate change notifications.
It's clear that there are other established services on the 'net with the goal of cataloging open source projects, performing metrics on their code, or providing a Web 2.0-flavored view of the open source ecosystem. I believe CIA needs to stay true to what it does best, and what it was originally designed for.
CIA should be all about getting changes from your repository to your developers and users as quickly and easily as possible. This means:
The web interface needs to continue improving, but it is really just a means to an end. Any new feature on the web site is of limited use if it doesn't deliver instant commit notifications or make CIA easier to use.
The IRC support needs to improve. The bots need more power, security, and scalability.
CIA needs to branch out into other real-time delivery mechanisms.
In fact, CIA has long supported RSS 2.0 publish/subscribe, based on XML-RPC. Unfortunately, this standard never got off the ground. The only RSS aggregator to implement this flavor of publish/subscribe was Radio Userland, and it was impossible to use this technology over NAT. Subscription support was recently disabled for performance reasons.
I'd like to roll out XMPP publish/subscribe support, as well as Comet-style server-push HTTP. I want to provide primitives that allow other developers to build services on top of CIA without sacrificing immediacy. You should be able to write your own IRC bot, if you wanted to, that can subscribe to a CIA feed and deliver commits nearly as fast as the built-in bots.
CIA needs to identify what it stands for. Other projects are monitoring open source projects, but CIA is about monitoring changes and doing it instantly.
This is one reason I've been planning to rename the project...
Out with "CIA Open Source Notification System", in with something new. Diffcast is the name I've been considering. It highlights the fact that CIA is about collecting changes (diffs) and broadcasting them in various ways. It also happens to be a cute domain hack: http://diffca.st/
Yes, the "cast" namespace is overused. This is unfortunate, because I'm much more interested in conveying the notion of "broadcasting" than I am in jumping on the podcast bandwagon.
I plan to change the service's name shortly after the conversion to Django is complete. The web site URL as well as the name used in all documentation will change; however, existing URLs and e-mail addresses will always work. You will not need to change your CIA client scripts. I probably won't change the name of the IRC bots themselves.
Whether you like the name Diffcast or think it's unbearably stupid, comments are welcome!
Welcome to the new blog for CIA, the open source notification system. This will be a place to announce new features, as well as to discuss current and upcoming developments with the CIA service.
I'll be writing some more detailed posts later, but I'll kick this off with a short summary of the recent changes to CIA:
CIA now has a web interface for maintaining your own projects and IRC bots. This is a huge improvement over the old "email me and wait 6 months" system.
Thanks to all the developers who have contributed client scripts over the last several years, there is now official support for Arch, Bazaar, Bazaar-NG, BitKeeper, CVS, Darcs, Git, Subversion and Mercurial.
It's now easier than ever to get started with CIA. In addition to installing a client script, there are two other ways to get your project hooked up:
If you host your project with SourceForge.net or Gna!, it's already CIA-ready.
All projects on Gna! are set up with CIA by default. If you use SourceForge.net, setting up CIA is as easy as selecting a pre-installed client script from a menu.
Now you don't have to call CIA. CIA can call you.
If your project has a public Subversion repository, you can have CIA connect directly and download the latest commits. For immediate notification, it can use any e-mail message as a hint to check for new revisions.
This is compatible with Google Code.
The web site is getting a much-needed facelift. The site's design is quite dated, and it's missing some crucial features, like search. All of the peripheral portions of the site, such as documentation, have already been converted.
CIA's advanced message filtering functionality, the Ruleset language, has finally been documented. You can optionally write your own filters in this language when you set up an IRC bot.
It's now slightly easier to set up a private CIA server, for development purposes or for private organizations. This is still completely unsupported, but you can download a virtual machine image with a pre-configured CIA instance.
