This is an unordered compendium of projects and project ideas.
We want to improve the client libraries for the major languages (Ruby, Python, C++, etc.). Some of these languages don't yet have a 0.8-compatible library available; for others, the existing client is somewhat limited and could be improved.
LinkedIn has an "audit" application that checks the correctness of the data pipeline by comparing published and consumed messages. It would be nice to get this open sourced and to make a number of improvements to it.
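As a rough illustration of the idea behind such an audit, the following sketch compares per-topic message counts on the publishing and consuming sides. It is not LinkedIn's implementation. The function name `audit`, the `(topic, timestamp)` event format, and the ten-minute time buckets are all assumptions made for the example.

```python
from collections import Counter


def audit(published, consumed, bucket_seconds=600):
    """Return the buckets whose published and consumed counts differ.

    Both arguments are iterables of (topic, timestamp_seconds) pairs.
    This event format and the bucket size are assumptions for the sketch.
    """
    def count(events):
        # Group events by topic and by the time bucket they fall into.
        return Counter((topic, int(ts) // bucket_seconds) for topic, ts in events)

    pub, con = count(published), count(consumed)
    # A Counter returns 0 for missing keys, so a bucket seen on only
    # one side is reported as a mismatch as well.
    return {
        key: (pub[key], con[key])
        for key in pub.keys() | con.keys()
        if pub[key] != con[key]
    }
```

An empty result means every bucket matched; a non-empty result maps each `(topic, bucket)` key to its `(published, consumed)` counts.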
There are a number of projects that fall under the general bucket of performance improvements that aren't called out elsewhere:
Now that we have replication, it would be possible to implement exactly-once producer semantics.
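One way this might be approached is to have the log discard retried sends that it has already appended. The sketch below is a toy in-memory model of that idea and does not describe a design from this page. The class `DedupLog`, the producer ids, and the per-producer sequence numbers are hypothetical.

```python
class DedupLog:
    """Toy in-memory log that ignores duplicate appends from a producer.

    Assumes each producer tags its messages with an increasing sequence
    number and resends a message with the same number when it retries.
    """

    def __init__(self):
        self.messages = []
        self.last_seq = {}  # producer_id -> highest sequence number appended

    def append(self, producer_id, seq, payload):
        # A sequence number at or below the last one appended for this
        # producer means the message is a retry of one already stored.
        if seq <= self.last_seq.get(producer_id, -1):
            return False
        self.messages.append(payload)
        self.last_seq[producer_id] = seq
        return True
```

With this model a producer that never received an acknowledgement can resend safely: the second `append` with the same sequence number returns `False` and the log holds a single copy.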
For people who want to publish Kafka feeds for existing applications that produce log files, it might be nice to have something more sophisticated than the console-producer. This would be a process that runs in the background, tails log directories, and reads and publishes formatted messages.
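The core loop of such a process could look roughly like the sketch below, which reads any complete lines added to one file since the last pass and hands each to a callback. The name `tail_once` and the `publish` callback are hypothetical; the callback stands in for whatever producer client is used. Log rotation, directory scanning, and persisting the offset are left out.

```python
def tail_once(path, offset, publish):
    """Publish complete lines appended to `path` after byte `offset`.

    Returns the new offset to pass in on the next call. `publish` is a
    hypothetical callback that takes one decoded line.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()

    # Consume only up to the last newline, so that a line the
    # application is still writing is picked up on a later pass.
    end = data.rfind(b"\n") + 1
    for raw in data[:end].split(b"\n")[:-1]:
        publish(raw.decode("utf-8", errors="replace"))
    return offset + end
```

A background process would call this in a loop for each file it watches, sleeping briefly between passes and remembering the returned offset.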
It would be nice to have a simple web app that showed the state of the cluster: which brokers are up, which topics and partitions they replicate and lead, how much data they have, etc.