Cassandra Use Cases
This summary of a mailing-list survey briefly describes how several organizations (Rackspace, Cisco, OneSpot, and more) are using Cassandra; more detail is in this mailing-list thread.
The sections below give simple usage patterns and example implementations in high-level code.
If you've got more simple examples along the lines of those below, please add them.
Twissandra, a Twitter clone using Cassandra
Available at twissandra.com.
A Simple Capped Log
Please help complete
- Adapt, for example, this Redis implementation to Cassandra.
- This mailing-list thread gives an overview of building a production-grade windowed time-series store in Cassandra.
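A capped log keeps only the newest N entries, the Cassandra analogue of Redis LPUSH followed by LTRIM. The following is a minimal sketch of that idea, assuming one row per log whose columns are ordered by name. It does not talk to Cassandra: the `CappedLog` class is hypothetical, a Hash stands in for the row, and a simple sequence number stands in for the TimeUUID column names you would likely use in practice.

```ruby
# Hypothetical in-memory stand-in for one Cassandra row whose columns are
# ordered by name (as with a time-ordered comparator). Not a Cassandra API.
class CappedLog
  def initialize(cap)
    @cap = cap
    @columns = {} # column name (sequence number) => log line
    @seq = 0      # assumption: stands in for a TimeUUID column name
  end

  # Append a line, then delete the oldest columns beyond the cap
  # (LPUSH + LTRIM in Redis terms).
  def append(line)
    @seq += 1
    @columns[@seq] = line
    excess = @columns.size - @cap
    @columns.keys.sort.first(excess).each { |k| @columns.delete(k) } if excess > 0
  end

  # Newest-first read, like a reversed column slice with a count limit.
  def recent(count = @cap)
    @columns.keys.sort.reverse.first(count).map { |k| @columns[k] }
  end
end

log = CappedLog.new(3)
%w[a b c d e].each { |line| log.append(line) }
log.recent # => ["e", "d", "c"]
```

On a real cluster, trimming could be done either by deleting old columns as above or by writing each column with a TTL and letting old entries expire.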
Inverted Index for Document Search
Please help complete
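One common layout for an inverted index is one row per term, with one column per document that contains the term (the column value can hold, say, the term count). An AND query then intersects the column names of the query terms' rows. The sketch below assumes that layout; `InvertedIndex` and its methods are hypothetical, and a Hash of Hashes stands in for the column family.

```ruby
require 'set'

# Hypothetical in-memory model of an inverted-index column family:
# row key = term, column name = document id, column value = term count.
class InvertedIndex
  def initialize
    @rows = Hash.new { |h, term| h[term] = {} }
  end

  # Index a document: one column write per distinct term.
  def add(doc_id, text)
    counts = tokenize(text).each_with_object(Hash.new(0)) { |t, h| h[t] += 1 }
    counts.each { |term, count| @rows[term][doc_id] = count }
  end

  # AND query: intersect the column names (doc ids) of each term's row.
  def search(query)
    terms = tokenize(query).uniq
    return [] if terms.empty?
    terms.map { |t| @rows.fetch(t, {}).keys.to_set }.reduce(:&).to_a.sort
  end

  private

  # Naive tokenizer (assumption): lowercase alphanumeric runs.
  def tokenize(text)
    text.downcase.scan(/[a-z0-9]+/)
  end
end

idx = InvertedIndex.new
idx.add('doc1', 'Cassandra stores columns')
idx.add('doc2', 'Redis stores lists')
idx.search('stores')           # => ["doc1", "doc2"]
idx.search('cassandra stores') # => ["doc1"]
```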
A distributed Priority Job Queue
Please help complete
Use Cassandra to enqueue jobs with a priority and an optional delay. On each request, the broker assigns the ready job with the highest priority.
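The broker rule described above (skip jobs whose delay hasn't elapsed, then pick the highest priority) can be sketched as follows. This shows only the selection logic over an in-memory list; `PriorityJobQueue`, `enqueue`, and `reserve` are hypothetical names, ties are broken by enqueue order as an assumption, and a real distributed version would also need some way to stop two brokers from claiming the same job, which this sketch does not address.

```ruby
# Hypothetical sketch of the broker's selection logic only; not a Cassandra API.
Job = Struct.new(:id, :priority, :ready_at, :seq)

class PriorityJobQueue
  def initialize
    @jobs = []
    @seq = 0
  end

  # Enqueue a job; a higher priority number wins. Delay is in seconds.
  def enqueue(id, priority:, delay: 0, now: Time.now.to_f)
    @seq += 1
    @jobs << Job.new(id, priority, now + delay, @seq)
  end

  # Assign (and remove) the ready job with the highest priority; ties go
  # to the job enqueued first. Returns nil when no job is ready.
  def reserve(now: Time.now.to_f)
    ready = @jobs.select { |j| j.ready_at <= now }
    return nil if ready.empty?
    job = ready.min_by { |j| [-j.priority, j.seq] }
    @jobs.delete(job)
    job.id
  end
end

q = PriorityJobQueue.new
q.enqueue('email', priority: 1, now: 0)
q.enqueue('report', priority: 5, delay: 60, now: 0)
q.enqueue('resize', priority: 3, now: 0)
q.reserve(now: 10) # => "resize" (report is still delayed)
q.reserve(now: 10) # => "email"
q.reserve(now: 61) # => "report"
q.reserve(now: 61) # => nil
```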
Consistent Vote Counting
From a conversation on the #cassandra IRC channel, here's a way to implement consistent vote counting with Cassandra that doesn't depend on vector clocks or an atomic increment operation.
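One common way to get those properties, which may or may not match the IRC recipe exactly, is to store each vote as its own column (row key = the item, column name = the voter) and derive the tally at read time. Writing the same column twice just overwrites it, so client retries cannot double-count, and no read-modify-write is needed. In the sketch below, `VoteStore` is hypothetical and a Hash stands in for the column family.

```ruby
# Hypothetical in-memory stand-in for a column family where
# row key = item being voted on, column name = voter id, value = vote.
class VoteStore
  def initialize
    @rows = Hash.new { |h, item| h[item] = {} }
  end

  # A vote is a plain column write; repeating it (e.g. a client retry)
  # overwrites the same column instead of incrementing a counter.
  def vote(item, voter_id, value = 1)
    @rows[item][voter_id] = value
  end

  # Retract a vote by deleting the voter's column.
  def unvote(item, voter_id)
    @rows[item].delete(voter_id)
  end

  # The tally is computed at read time by summing the row's columns.
  def count(item)
    @rows.fetch(item, {}).values.sum
  end
end

votes = VoteStore.new
votes.vote('post42', 'alice')
votes.vote('post42', 'bob')
votes.vote('post42', 'alice') # retry: still one column for alice
votes.count('post42')         # => 2
votes.unvote('post42', 'bob')
votes.count('post42')         # => 1
```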
Uniq a large dataset using simple key-value columns
We have to batch-process a massive dataset with frequent duplicates that we'd like to skip.
Here is Ruby code that uses Cassandra as a simple key-value store to skip duplicates. A real working version is in the Wukong example code; it is used to batch-process terabyte-scale data on a 30-machine cluster with Hadoop and Cassandra.
require 'cassandra' # the older Thrift-based `cassandra` gem (Cassandra.new(keyspace), #insert, #exists?, #remove)

class CassandraConditionalOutputter
  CASSANDRA_KEYSPACE = 'Foo'

  # Batch parse a raw stream into parsed objects. The parsed objects may have
  # many duplicates which we'd like to reject.
  #
  # Records respond to #key (only one record for a given key will be output)
  # and #timestamp (which can be, say, '0' if the record has no meaningful timestamp).
  def process(raw_records)
    raw_records.parse do |record|
      if should_emit?(record)
        track! record
        puts record
      end
    end
  end

  # Emit only if the record's key isn't already in the key column family
  def should_emit?(record)
    !key_cache.exists?(key_column, record.key)
  end

  # Register the key in the key_cache
  def track!(record)
    key_cache.insert(key_column, record.key, 't' => record.timestamp)
  end

  # Remove the key from the key_cache
  def remove(record)
    key_cache.remove(key_column, record.key)
  end

  # Cassandra client bound to the keyspace used for key lookups
  def key_cache
    @key_cache ||= Cassandra.new(CASSANDRA_KEYSPACE)
  end

  # Name the key column family after the class
  def key_column
    self.class.to_s + 'Keys'
  end
end
Simple time-series with roll-ups
Cloudkick stores time-series data at one-second resolution and rolls it up into coarser intervals.
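Cloudkick's actual schema isn't described here, so the following is only one plausible shape for second-level data with roll-ups: raw samples bucketed by whole second, periodically aggregated into per-minute statistics that would live in a separate column family. `RollupSeries`, `record`, and `roll_up` are hypothetical names, and Hashes stand in for the column families.

```ruby
# Hypothetical sketch: raw samples keyed by whole epoch second, rolled up
# into per-minute aggregates. Layout and names are assumptions.
class RollupSeries
  attr_reader :seconds, :minutes

  def initialize
    @seconds = Hash.new { |h, k| h[k] = [] } # epoch second => samples
    @minutes = {}                            # minute start => aggregate stats
  end

  # Store a raw sample under its whole second.
  def record(timestamp, value)
    @seconds[timestamp.to_i] << value
  end

  # Aggregate every second in [minute_start, minute_start + 60).
  # Returns nil (and stores nothing) if the minute has no samples.
  def roll_up(minute_start)
    samples = (minute_start...minute_start + 60).flat_map { |s| @seconds.fetch(s, []) }
    return nil if samples.empty?
    @minutes[minute_start] = {
      count: samples.size,
      min: samples.min,
      max: samples.max,
      avg: samples.sum.to_f / samples.size
    }
  end
end

ts = RollupSeries.new
ts.record(120, 10)
ts.record(121.5, 20)
ts.record(179, 30)
ts.record(180, 99) # belongs to the next minute
ts.roll_up(120)    # count 3, min 10, max 30, avg 20.0
```

Once a minute is rolled up, the raw per-second columns could be given a TTL so only the coarser data is kept long-term.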
An implementation of some DBMS rules written in python using pycassa
We have created a DBMS layer that handles references to other column families (foreign keys), automatic reverse linking, required fields in column families, and datatypes (long and datetime). It wraps the get, get_range, insert, and remove functions of pycassa's column families. At this time it supports only ON DELETE CASCADE and positive long numbers, but this could change if there is enough interest. It suits our project.
ThomasBoose dbms implementation
Based on this article
ThomasBoose EERD model components to Cassandra column families
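The linked implementation is Python on top of pycassa; the following is not that code, only a Ruby sketch of three of the rules described above (required fields, foreign-key checks with automatic reverse links, and ON DELETE CASCADE) over in-memory Hashes standing in for column families. `RuleStore` and its schema format are hypothetical, and for simplicity it is insert-only: updating an existing row is rejected rather than handled.

```ruby
# Hypothetical sketch over in-memory "column families" (row key => columns).
# Not pycassa's API. schema: { cf => { required: [cols], refs: { col => target_cf } } }
class RuleStore
  def initialize(schema)
    @schema = schema
    @cfs = Hash.new { |h, cf| h[cf] = {} }
    @backlinks = Hash.new { |h, k| h[k] = [] } # [cf, key] => [[child_cf, child_key], ...]
  end

  def insert(cf, key, columns)
    rules = @schema.fetch(cf)
    raise ArgumentError, "#{cf}: row #{key} exists (updates not handled)" if @cfs[cf].key?(key)
    missing = rules.fetch(:required, []) - columns.keys
    raise ArgumentError, "#{cf}: missing #{missing.join(', ')}" unless missing.empty?
    refs = rules.fetch(:refs, {})
    refs.each do |col, target|
      ref = columns[col]
      next if ref.nil?
      raise ArgumentError, "#{cf}.#{col}: no #{target} row #{ref}" unless @cfs[target].key?(ref)
    end
    # All checks passed: record automatic reverse links, then store the row.
    refs.each { |col, target| @backlinks[[target, columns[col]]] << [cf, key] if columns[col] }
    @cfs[cf][key] = columns
  end

  def get(cf, key)
    @cfs[cf][key]
  end

  # ON DELETE CASCADE: delete the row, unlink it from its parents,
  # then recursively delete every row that referenced it.
  def remove(cf, key)
    row = @cfs[cf].delete(key) or return
    @schema.fetch(cf).fetch(:refs, {}).each do |col, target|
      @backlinks[[target, row[col]]].delete([cf, key]) if row[col]
    end
    @backlinks.delete([cf, key]).to_a.each { |child_cf, child_key| remove(child_cf, child_key) }
  end
end

db = RuleStore.new(
  'users' => { required: ['name'] },
  'posts' => { required: ['title', 'user_id'], refs: { 'user_id' => 'users' } }
)
db.insert('users', 'u1', 'name' => 'Ann')
db.insert('posts', 'p1', 'title' => 'Hi', 'user_id' => 'u1')
db.remove('users', 'u1')
db.get('posts', 'p1') # => nil (deleted by the cascade)
```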