Development
Usability improvements
Now that it's finally easy to get started with CIA and easy to administer your projects and bots, the biggest user-visible problem is the decrepit state of the actual stats pages. For a great example of how much the current UI sucks, see the KDE or GNOME pages.
These pages are hard to navigate due to the clumsy 'catalog' with no search or pagination. The page layout does not put the most important information first. Top priorities should be:
Port the stats target page to the new site. This can probably be done in stages, starting with the stats pages themselves and moving the RSS feeds and per-message pages later. URLs must remain backward-compatible, though the new pages should make an effort to strip unfriendly characters out of them.
- This has already started, at /stats-experimental
Search-based interface for locating stats targets, both on a site-wide level and within children of the current target.
- Unified search. A single real-time search widget should be able to find child targets and messages.
Expand/collapse with automatic loading and purging
- The stats page itself, on the client side, should be seen as a cache for server-side data. It is statically populated with a small default dataset, which also serves as static content for non-javascript browsers. The client needs to be able to load new data for date ranges that have just been expanded. It should also be able to purge data for date ranges that have not been opened recently.
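The load-and-purge behaviour described above can be sketched as a small least-recently-opened cache. The real client would be JavaScript; Python is used here only to illustrate the idea. The names `DateRangeCache`, `fetch`, and `max_ranges` are hypothetical, and the purge policy (keep only the N most recently opened ranges) is an assumption.

```python
from collections import OrderedDict


class DateRangeCache:
    """Keeps data for recently opened date ranges and purges the rest."""

    def __init__(self, fetch, max_ranges=3):
        self._fetch = fetch            # hypothetical callable: range key -> messages
        self._max_ranges = max_ranges  # assumed purge policy: keep N newest ranges
        self._ranges = OrderedDict()   # range key -> messages, least recent first

    def expand(self, key):
        """Return the data for a date range, loading it on first use."""
        if key in self._ranges:
            # Already cached: just mark it as recently opened.
            self._ranges.move_to_end(key)
        else:
            self._ranges[key] = self._fetch(key)
            # Purge ranges that have not been opened recently.
            while len(self._ranges) > self._max_ranges:
                self._ranges.popitem(last=False)
        return self._ranges[key]

    def loaded(self):
        """List the range keys currently held, least recently opened first."""
        return list(self._ranges)
```

The statically populated default dataset would simply be the initial contents of the cache, so non-JavaScript browsers still see it.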
Real-time updates. I was planning to write a lengthy section on how this should be implemented. It turns out that this has not only already been invented, it has already been given a corny name. See cometd and mod_mailbox.
Extended details about a message should be visible in-line. By default, the log message should be truncated and the file list summarized. An expander can reveal the full log message and file list.
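A minimal sketch of the default (collapsed) view follows. The function name, the limits, and the choice to show only the first line of the log are all assumptions, not existing CIA behaviour.

```python
def summarize_message(log, files, max_chars=80, max_files=3):
    """Build the collapsed view of a message (hypothetical helper)."""
    full = log.strip()
    first_line = full.split("\n", 1)[0]
    log_cut = first_line != full
    if len(first_line) > max_chars:
        first_line = first_line[:max_chars].rstrip() + "..."
        log_cut = True
    if len(files) <= max_files:
        file_summary = ", ".join(files)
    else:
        # Too many files to list: show only a count until expanded.
        file_summary = "%d files" % len(files)
    return {
        "log": first_line,
        "files": file_summary,
        # True when the expander has something extra to reveal.
        "expandable": log_cut or len(files) > max_files,
    }
```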
Next-generation stats relations
Each visible message has an icon or text advertising each other stats target that received the same message. This allows each message to act as a link between projects, VCSes, authors...
These links/icons can show up in the right margin of the stats page.
This usage model supports the idea of adding an SQL index alongside the current message archives: each row would include a message ID (archive # and message offset), a timestamp, and a target ID.
- Test the efficiency of such an SQL table
- Implement BSAX in C, in order to quickly index existing messages.
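As a starting point for the efficiency test, the index could be prototyped in SQLite. This is only a sketch: the table and column names are hypothetical, and it assumes one row per (message, target) pair so that a message delivered to several targets shares one message ID.

```python
import sqlite3


def create_index(conn):
    """Create the hypothetical message index table."""
    conn.execute("""
        CREATE TABLE IF NOT EXISTS message_index (
            archive_id INTEGER NOT NULL,  -- archive number
            msg_offset INTEGER NOT NULL,  -- message offset within the archive
            timestamp  INTEGER NOT NULL,  -- seconds since the epoch
            target_id  INTEGER NOT NULL,  -- stats target that received it
            PRIMARY KEY (archive_id, msg_offset, target_id)
        )""")
    # Supports "recent messages for this target" lookups.
    conn.execute("""
        CREATE INDEX IF NOT EXISTS message_index_by_target
        ON message_index (target_id, timestamp)""")


def related_targets(conn, archive_id, msg_offset, current_target):
    """Other stats targets that received the same message."""
    rows = conn.execute(
        "SELECT target_id FROM message_index "
        "WHERE archive_id = ? AND msg_offset = ? AND target_id != ? "
        "ORDER BY target_id",
        (archive_id, msg_offset, current_target))
    return [row[0] for row in rows]
```

`related_targets` is the query that would feed the icons in the right margin of the stats page.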
In addition to messages, collapsed date headings should also show links to other stats targets. This will make it natural to see who was working on a particular project last month, for example.
For efficiency, this will require a separate database table relating multi-resolution time periods (years, months, days) to stats targets. Each relation should include strength/freshness.
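One possible shape for that table, again prototyped in SQLite. Everything here is an assumption made for illustration: the names, the use of a message count as "strength", and the latest timestamp as "freshness".

```python
import sqlite3
from datetime import datetime, timezone


def period_keys(ts):
    """Year, month, and day keys (UTC) for a Unix timestamp."""
    d = datetime.fromtimestamp(ts, tz=timezone.utc)
    return [("year", d.strftime("%Y")),
            ("month", d.strftime("%Y-%m")),
            ("day", d.strftime("%Y-%m-%d"))]


def create_relations(conn):
    conn.execute("""
        CREATE TABLE IF NOT EXISTS period_relation (
            target_id  INTEGER NOT NULL,
            resolution TEXT    NOT NULL,  -- 'year', 'month', or 'day'
            period     TEXT    NOT NULL,  -- e.g. '2007-03'
            related_id INTEGER NOT NULL,
            strength   INTEGER NOT NULL,  -- assumed: shared message count
            freshness  INTEGER NOT NULL,  -- assumed: latest shared timestamp
            PRIMARY KEY (target_id, resolution, period, related_id)
        )""")


def record(conn, target_id, related_id, ts):
    """Note that two targets shared a message at time ts."""
    for resolution, period in period_keys(ts):
        key = (target_id, resolution, period, related_id)
        cur = conn.execute(
            "UPDATE period_relation "
            "SET strength = strength + 1, freshness = MAX(freshness, ?) "
            "WHERE target_id = ? AND resolution = ? "
            "AND period = ? AND related_id = ?", (ts,) + key)
        if cur.rowcount == 0:
            # First shared message in this period: create the relation.
            conn.execute(
                "INSERT INTO period_relation VALUES (?, ?, ?, ?, 1, ?)",
                key + (ts,))
```

A collapsed month heading would then need a single lookup on (target_id, 'month', period) to list the related targets.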
Sparklines and interactive graphs, produced with fidtool. Prerequisites:
There is a new web interface under development, using Django. It will eventually replace the old web interface entirely, but it's currently only used for managing user accounts and assets (such as bots, projects, and authors).
Abuse prevention
The ability to revert changes from the change history page
More tools to easily close accounts, lock assets, and revert all of a user's changes
IP- and cookie-based bans. This will be a quick fix to silence the less persistent troublemakers.
IP- and login-based sandboxing. Let the malicious users log in and change settings, but keep those changes in a private sandbox.
Fix the IRC channel redirect bugs
Enforce network uniqueness. Currently, users may create multiple IRC "networks" which actually refer to the same physical network. This can happen by accident, or it might be abused maliciously to cause multiple CIA bots to join a single channel or to introduce bots into channels where they aren't wanted.
This is a difficult problem to solve. Two servers may refer to the same network because they are multiple DNS names for the same IP, multiple IPs on the same server, or multiple servers linked together on a single IRC network.
Various methods could be used to determine server uniqueness at network creation time:
Use the NETWORK= portion of the 005 (server capabilities) message. This is simple and straightforward, and it's supported by most networks I've tested. The biggest exception is Freenode.
Networks that don't send a NETWORK= field will still work with CIA; they'll just have to be installed by an administrator first. The "add bot" page will conduct a network identification test when the user submits the page with a network of "Other...".
Note that this means we'll no longer ask the user for a network description. Either it comes from the NETWORK= field, or we need to involve an admin.
Some test code for this method is in tools/irc-tester.py
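For illustration, extracting the NETWORK= token from a raw 005 line might look like the sketch below. This is not the code in tools/irc-tester.py; the function name is hypothetical, and escaped characters in the value are not decoded.

```python
def parse_isupport_network(line):
    """Return the NETWORK= value from a raw 005 line, or None if absent."""
    parts = line.strip().split(" ")
    if parts and parts[0].startswith(":"):
        parts = parts[1:]  # drop the ":server.name" prefix
    # Expect: 005 <nick> <token> <token> ... :trailing text
    if len(parts) < 3 or parts[0] != "005":
        return None
    for token in parts[2:]:
        if token.startswith(":"):
            break  # start of the human-readable trailing text
        if token.startswith("NETWORK="):
            return token[len("NETWORK="):] or None
    return None
```

For example, `parse_isupport_network(":irc.example.net 005 cia-bot CHANTYPES=# NETWORK=ExampleNet :are supported by this server")` returns `"ExampleNet"`. A result of `None` is the case that would need an administrator.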
Create a 'fuzzy' hash of the network's identity by examining the names and topics of the most popular IRC channels on a server. This wouldn't be guaranteed against false positives, but false negatives (which can be abused) would never occur. This approach has the advantage of requiring no extra IRC connections aside from a single connect while adding new networks.
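One way such a fuzzy comparison could work is sketched below. It swaps in a plain set-overlap (Jaccard) score over hashed channel entries, which is an assumption rather than a settled design; the function names, `top_n`, and the threshold are hypothetical and would need tuning against real networks. Each channel is a (name, user count, topic) tuple, as might be gathered from a LIST reply.

```python
import hashlib


def fingerprint(channels, top_n=20):
    """Hash the names and topics of the most popular channels."""
    popular = sorted(channels, key=lambda c: (-c[1], c[0].lower()))[:top_n]
    return frozenset(
        hashlib.sha1(("%s\n%s" % (name.lower(), topic)).encode("utf-8")).hexdigest()
        for name, _users, topic in popular)


def similarity(a, b):
    """Jaccard overlap between two fingerprints, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def same_network(a, b, threshold=0.5):
    """Guess whether two fingerprints describe the same network."""
    return similarity(a, b) >= threshold
```

Comparing overlap rather than a single digest means one changed topic between two connections lowers the score without flipping the answer.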
Look for other CIA bots, and verify the identity of those bots. This would be very reliable, but it requires changes to the bot daemon and it requires that we maintain at least one bot on all networks in order to perform these checks.
Verifying networks at creation time solves most problems, but it isn't quite a complete solution. Servers may at some later time switch networks. As a result, we may have to merge entries that we had previously treated as separate "networks". This will be harder to solve, and will definitely require support from the bot daemon.
Require the consent of IRC channel operators in order to install new IRC bots. In theory this is a fairly simple thing to do, but it requires changes in the bot daemon, which makes it fairly difficult at the moment. See below.
Ruleset editor improvements
Image uploader improvements
The wiki pages linked above are really just my own notes on the subject; they are a little disorganized, and subject to change. If you'd like to contribute to any of these projects or you're curious how they work, please contact Micah directly.
Bots currently rejoin when kicked from a channel. Instead, they need to remove or disable their irc:// ruleset.
This will become important once the web interface for IRC ruleset editing is written: if someone invites a CIA bot into a channel where it isn't wanted, the channel's ops need to be able to ask it to leave. As it is now, the botnet tries a little too hard and the bot will just immediately rejoin.
Track bandwidth averages per-request and per-bot. This could be used to load balance the bots by rearranging requests among them, and it would make a nice indicator for the IRC status page.
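A rough sketch of how these averages might be kept, using an exponential moving average per request and summing per bot. The class, its method names, and the smoothing factor are all hypothetical; nothing like this exists in the bot daemon yet.

```python
class BandwidthTracker:
    """Smoothed bandwidth per request, aggregated per bot."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha     # assumed smoothing factor, 0 < alpha <= 1
        self.request_avg = {}  # request id -> smoothed bytes/sec
        self.request_bot = {}  # request id -> bot currently serving it

    def assign(self, request_id, bot_id):
        """Record which bot serves a request."""
        self.request_bot[request_id] = bot_id
        self.request_avg.setdefault(request_id, 0.0)

    def sample(self, request_id, bytes_per_sec):
        """Fold a new measurement into the request's moving average."""
        old = self.request_avg.get(request_id, 0.0)
        self.request_avg[request_id] = old + self.alpha * (bytes_per_sec - old)

    def bot_load(self, bot_id):
        """Total smoothed bandwidth of the requests assigned to a bot."""
        return sum(avg for rid, avg in self.request_avg.items()
                   if self.request_bot.get(rid) == bot_id)

    def least_loaded(self, bot_ids):
        """Candidate bot for the next request or a rebalancing move."""
        return min(bot_ids, key=self.bot_load)
```

`bot_load` is also the number that could be shown on the IRC status page.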
Support SSL connections to IRC servers. Some (mostly-private) networks only allow SSL connections.
