Cassandra Data Model and Operations
This page was created by someone trying to understand Cassandra. Until it has been reviewed and blessed by someone who really knows Cassandra, read it at your own risk.
This page is an alternate attempt at capturing the Cassandra data model and its operations. The descriptions below show the original Thrift API (as of 0.3) as well as a simplified notation borrowed from the [Bright Yellow Cow blog entry|http://www.brightyellowcow.com/blog/Evaluating-the-API-of-Cassandra-BigTable-.html], i.e. using [] to mean 'list of' and ( , ) for tuple construction.
Simple Column families
A column family has a name and an arbitrary number of columns. Each column is a (name, value, timestamp) tuple. Columns may be sorted by name or by time, which affects range operations on them. In pseudo-notation:
No Format |
---|
family -> [(name, value, timestamp)] |
Since each (top-level) row has an arbitrary set of columns in each column family, we can really think of this as a two-dimensional map, where key1 is the row key and key2 is the column name:
No Format |
---|
family -> [(key1, key2, value, timestamp)] |
In the Thrift API all this is defined as:
No Format |
---|
struct column_t { 1: string columnName, 2: binary value, 3: i64 timestamp, } |
typedef map< string, list<column_t> > column_family_map |
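As a rough illustration of the two-dimensional view, the sketch below models a column family as nested Python dictionaries. The `Column` tuple mirrors `column_t`. The `put` and `get` helpers and the sample data are made up for this example and are not part of Cassandra.

```python
from collections import namedtuple

# Mirrors column_t from the Thrift definition above.
Column = namedtuple("Column", ["name", "value", "timestamp"])

# family -> key1 (row key) -> key2 (column name) -> Column
store = {}

def put(family, key1, key2, value, timestamp):
    """Hypothetical helper: store one column under (family, key1, key2)."""
    row = store.setdefault(family, {}).setdefault(key1, {})
    row[key2] = Column(key2, value, timestamp)

def get(family, key1, key2):
    """Hypothetical helper: look up one column, or None if it is absent."""
    return store.get(family, {}).get(key1, {}).get(key2)

put("users", "alice", "email", b"alice@example.com", 1)
put("users", "alice", "city", b"Zurich", 2)
put("users", "bob", "email", b"bob@example.com", 3)

print(get("users", "alice", "city"))
print(sorted(store["users"]["alice"]))
```

Each row can hold a different set of column names, which is what makes the second dimension "arbitrary".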
insert
Insert a column.
No Format |
---|
insert(family, key1, key2, value, timestamp) |
I believe the block_for parameter specifies how many replicas must acknowledge the write before the call returns. From the Thrift API:
No Format |
---|
void insert(1:string tablename, 2:string key, 3:string columnFamily_column, 4:binary cellData, 5:i64 timestamp, 6:i32 block_for=0) throws (1: InvalidRequestException ire, 2: UnavailableException ue), |
remove
Remove a column.
No Format |
---|
remove(family, key1, key2, timestamp) |
The timestamp specifies exactly which insertion is removed (the column could have been re-inserted "later"). From the Thrift API:
No Format |
---|
void remove(1:string tablename, 2:string key, 3:string columnFamily_column, 4:i64 timestamp, 5:i32 block_for=0) throws (1: InvalidRequestException ire, 2: UnavailableException ue), |
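To make the role of the timestamp in insert and remove more concrete, here is a small in-memory sketch. It assumes that a write or removal only takes effect when its timestamp is newer than what is already stored. That rule is an assumption made for illustration and is not confirmed by this page. The `cells` dictionary and the way removals are recorded are likewise invented for the example.

```python
# (family, key1, key2) -> (value, timestamp); a value of None marks a removal.
cells = {}

def insert(family, key1, key2, value, timestamp):
    """Apply the write only if it is newer than what is stored (assumed rule)."""
    current = cells.get((family, key1, key2))
    if current is None or timestamp > current[1]:
        cells[(family, key1, key2)] = (value, timestamp)

def remove(family, key1, key2, timestamp):
    """Record a removal as of `timestamp`; newer insertions survive it."""
    insert(family, key1, key2, None, timestamp)

def get_column(family, key1, key2):
    """Return (key2, value, timestamp), or None if absent or removed."""
    current = cells.get((family, key1, key2))
    if current is None or current[0] is None:
        return None
    return (key2, current[0], current[1])

insert("users", "alice", "email", b"old@example.com", 10)
insert("users", "alice", "email", b"new@example.com", 30)  # re-inserted "later"
remove("users", "alice", "email", 20)                       # targets the older insertion
print(get_column("users", "alice", "email"))
```

Under these assumptions the removal at timestamp 20 leaves the insertion at timestamp 30 in place, while a removal at timestamp 40 would hide the column.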
get_column
Retrieve a specific column for a key.
No Format |
---|
get_column(family, key1, key2) -> (key2, value, timestamp) |
From the Thrift API:
No Format |
---|
column_t get_column(1:string tablename, 2:string key, 3:string columnFamily_column) throws (1: InvalidRequestException ire, 2: NotFoundException nfe), |
get_slice
Retrieve all columns for a key:
No Format |
---|
get_slice(family, key1) -> [(key2, value, timestamp)] |
The optional start/count parameters allow pagination of the results. From the Thrift API:
No Format |
---|
list<column_t> get_slice(1:string tablename, 2:string key, 3:string columnFamily_column, 4:i32 start=-1, 5:i32 count=-1) throws (1: InvalidRequestException ire, 2: NotFoundException nfe), |
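The pagination idea behind start/count can be sketched over a plain Python list. The reading of -1 as "from the beginning" and "no limit" is an assumption based on the default values in the signature, and the `pages` helper is hypothetical.

```python
def get_slice(columns, start=-1, count=-1):
    """Return a window of `columns`, a list of (key2, value, timestamp)
    tuples that is already in the column family's sort order.

    Assumption: start=-1 means 'from the beginning', count=-1 means 'no limit'.
    """
    begin = 0 if start < 0 else start
    end = len(columns) if count < 0 else begin + count
    return columns[begin:end]

def pages(columns, page_size):
    """Hypothetical helper: walk through all columns one page at a time."""
    start = 0
    while True:
        page = get_slice(columns, start, page_size)
        if not page:
            return
        yield page
        start += len(page)

row = [("a", b"1", 100), ("b", b"2", 101), ("c", b"3", 102),
       ("d", b"4", 103), ("e", b"5", 104)]

for page in pages(row, 2):
    print([name for name, _, _ in page])
```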
get_slice_by_name_range
Retrieve a range of columns for a key:
No Format |
---|
get_slice_by_name_range(family, key1, key2_start, key2_end) -> [(key2, value, timestamp)] |
An optional count parameter limits the number of results. From the Thrift API:
No Format |
---|
list<column_t> get_slice_by_name_range(1:string tablename, 2:string key, 3:string columnFamily, 4:string start, 5:string end, 6:i32 count=-1) throws (1: InvalidRequestException ire, 2: NotFoundException nfe), |
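For a name-sorted column family, a name range lookup amounts to two binary searches. The sketch below assumes both bounds are inclusive and that count=-1 means "no limit". Neither detail is stated on this page, so treat them as illustrative choices.

```python
from bisect import bisect_left, bisect_right

def get_slice_by_name_range(columns, start, end, count=-1):
    """Return the columns whose name lies between `start` and `end`.

    `columns` is a list of (key2, value, timestamp) tuples sorted by key2.
    Assumptions: both bounds are inclusive; count=-1 means no limit.
    """
    names = [column[0] for column in columns]
    lo = bisect_left(names, start)   # first name >= start
    hi = bisect_right(names, end)    # one past the last name <= end
    selected = columns[lo:hi]
    return selected if count < 0 else selected[:count]

row = [("apple", b"1", 100), ("banana", b"2", 101),
       ("cherry", b"3", 102), ("damson", b"4", 103)]

print(get_slice_by_name_range(row, "b", "cherry"))
```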
get_slice_by_names
Retrieve a specific set of columns for a key:
No Format |
---|
get_slice_by_names(family, key1, [key2_1, key2_2, ..., key2_N]) -> [(key2, value, timestamp)] |
From the Thrift API:
No Format |
---|
list<column_t> get_slice_by_names(1:string tablename, 2:string key, 3:string columnFamily, 4:list<string> columnNames) throws (1: InvalidRequestException ire, 2: NotFoundException nfe), |
get_slice_from
Retrieve columns for a key starting from a specific column.
No Format |
---|
get_slice_from(family, key1, key2_start) -> [(key2, value, timestamp)] |
An ascending/descending flag and a count determine the direction and limit of the enumeration. From the Thrift API:
No Format |
---|
list<column_t> get_slice_from(1:string tablename, 2:string key, 3:string columnFamily_column, 4:bool isAscending, 5:i32 count) throws (1: InvalidRequestException ire, 2: NotFoundException nfe), |
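One way to picture the direction flag and count is the sketch below. It assumes a name-sorted column family and that the starting column itself is included in the result. Both are assumptions for illustration only.

```python
def get_slice_from(columns, start_name, is_ascending, count):
    """Enumerate up to `count` columns beginning at `start_name`.

    `columns` is a list of (key2, value, timestamp) tuples.
    Assumptions: columns are ordered by name and the start column is included.
    """
    ordered = sorted(columns, key=lambda column: column[0],
                     reverse=not is_ascending)
    if is_ascending:
        matching = [c for c in ordered if c[0] >= start_name]
    else:
        matching = [c for c in ordered if c[0] <= start_name]
    return matching[:count]

row = [("a", b"1", 100), ("b", b"2", 101), ("c", b"3", 102),
       ("d", b"4", 103), ("e", b"5", 104)]

print([c[0] for c in get_slice_from(row, "c", True, 2)])
print([c[0] for c in get_slice_from(row, "c", False, 2)])
```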
get_columns_since
Retrieve columns for a key starting from a specific timestamp.
No Format |
---|
get_columns_since(family, key1, key2, timestamp) -> [(key2, value, timestamp)] |
From the Thrift API:
No Format |
---|
list<column_t> get_columns_since(1:string tablename, 2:string key, 3:string columnFamily_column, 4:i64 timeStamp) throws (1: InvalidRequestException ire, 2: NotFoundException nfe), |
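A minimal sketch of the timestamp filter follows. It assumes the given timestamp is inclusive and that results come back newest first. This page does not specify either behaviour.

```python
def get_columns_since(columns, timestamp):
    """Return the columns written at or after `timestamp`, newest first.

    `columns` is a list of (key2, value, timestamp) tuples.
    Assumptions: the bound is inclusive and results are ordered by
    descending timestamp.
    """
    recent = [column for column in columns if column[2] >= timestamp]
    return sorted(recent, key=lambda column: column[2], reverse=True)

row = [("a", b"1", 100), ("b", b"2", 250), ("c", b"3", 175)]

print(get_columns_since(row, 175))
```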
get_column_count
Return the number of columns for a key.
No Format |
---|
get_column_count(family, key1, key2) -> count |
From the Thrift API:
No Format |
---|
i32 get_column_count(1:string tablename, 2:string key, 3:string columnFamily_column) throws (1: InvalidRequestException ire), |
batch_insert
Insert a batch of columns for a key.
No Format |
---|
batch_insert(family, key1, [(key2, value, timestamp)]) |
From the Thrift API:
No Format |
---|
struct batch_mutation_t { 1: string table, 2: string key, 3: column_family_map cfmap, } |
void batch_insert(1: batch_mutation_t batchMutation, 2:i32 block_for=0) throws (1: InvalidRequestException ire, 2: UnavailableException ue), |
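The shape of a batch mutation may be easier to see with data filled in. The sketch below uses plain Python tuples that mirror `column_t` and `batch_mutation_t`, and applies them to an in-memory dictionary. It is not the generated Thrift client code, and the table, key, and family names are invented.

```python
from collections import namedtuple

# Plain-Python stand-ins for the Thrift structs above.
Column = namedtuple("Column", ["name", "value", "timestamp"])
BatchMutation = namedtuple("BatchMutation", ["table", "key", "cfmap"])

def batch_insert(store, mutation):
    """Apply every column in the mutation to the in-memory `store`."""
    row = store.setdefault(mutation.table, {}).setdefault(mutation.key, {})
    for family, columns in mutation.cfmap.items():
        family_columns = row.setdefault(family, {})
        for column in columns:
            family_columns[column.name] = column

store = {}
mutation = BatchMutation(
    table="Table1",
    key="alice",
    # cfmap: column family name -> list of columns, as in column_family_map
    cfmap={
        "users": [Column("email", b"alice@example.com", 1),
                  Column("city", b"Zurich", 1)],
        "logins": [Column("last", b"2009-05-01", 1)],
    },
)
batch_insert(store, mutation)
print(sorted(store["Table1"]["alice"]))
```

Note that a single mutation covers one key but may touch several column families at once.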
Super Column
A super column family has a name and an arbitrary number of super columns. Each super column has an arbitrary number of columns. Currently, super columns are always name-sorted and their subcolumns are always time-sorted. In pseudo-notation:
No Format |
---|
super_family -> [(super_column, [(column_name, value, timestamp)])] |
It is tempting but inaccurate to think of this as a three-dimensional map:
No Format |
---|
super_family -> [(key1, key2, key3, value, timestamp)] |
What's more accurate is to continue thinking of this as a two-dimensional map, just like regular column families, but where the values are really sets of name-value pairs (plus timestamps to be accurate). So it's really like this:
No Format |
---|
Simple column families: column_family -> [(key1, key2, value, timestamp)] |
Super column families: column_family -> [(key1, key2, [(key3, value, timestamp)])] |
In the Thrift API all this is defined as:
No Format |
---|
struct superColumn_t { 1: string name, 2: list<column_t> columns, } |
typedef map< string, list<superColumn_t> > superColumn_family_map |
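The "two-dimensional map whose values are sets of columns" view can be sketched with nested dictionaries. The tuples mirror `column_t` and `superColumn_t`. The `get_super_column` helper and the sample data are hypothetical.

```python
from collections import namedtuple

# Plain-Python stand-ins for the Thrift structs above.
Column = namedtuple("Column", ["name", "value", "timestamp"])
SuperColumn = namedtuple("SuperColumn", ["name", "columns"])

# super_family -> key1 (row key) -> key2 (super column name) -> {key3 -> Column}
store = {
    "friends": {
        "alice": {
            "bob": {
                "since": Column("since", b"2008", 1),
                "note": Column("note", b"met at work", 2),
            },
            "carol": {
                "since": Column("since", b"2009", 3),
            },
        },
    },
}

def get_super_column(family, key1, key2):
    """Hypothetical helper: return one super column with all its subcolumns."""
    subcolumns = store[family][key1][key2]
    return SuperColumn(key2, list(subcolumns.values()))

result = get_super_column("friends", "alice", "bob")
print(result.name, [column.name for column in result.columns])
```

The lookup is still by two keys (key1, key2). The third level comes back as a whole, which matches the description above.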
get_superColumn
Retrieve a super column from a column family for a key.
No Format |
---|
get_superColumn(super_family, key1, key2) -> (key2, [(key3, value, timestamp)]) |
From the Thrift API:
No Format |
---|
superColumn_t get_superColumn(1:string tablename, 2:string key, 3:string columnFamily) throws (1: InvalidRequestException ire, 2: NotFoundException nfe), |
Note that the 3rd argument should really be called columnFamily_superColumnName.
get_slice_super
Retrieve the super columns in a super column family for a key.
No Format |
---|
get_slice_super(super_family, key1) -> [(key2, [(key3, value, timestamp)])] |
The optional start/count parameters allow pagination of the results. From the Thrift API:
No Format |
---|
list<superColumn_t> get_slice_super(1:string tablename, 2:string key, 3:string columnFamily_superColumnName, 4:i32 start=-1, 5:i32 count=-1) throws (1: InvalidRequestException ire), |
Note that the 3rd argument should really be called columnFamily.
get_slice_super_by_names
Retrieve a set of super columns in a super column family.
No Format |
---|
get_slice_super_by_names(family, key1, [key2_1, key2_2, ..., key2_N]) -> [(key2, [(key3, value, timestamp)])] |
From the Thrift API:
No Format |
---|
list<superColumn_t> get_slice_super_by_names(1:string tablename, 2:string key, 3:string columnFamily, 4:list<string> superColumnNames) throws (1: InvalidRequestException ire), |
batch_insert_superColumn
Insert a super column.
No Format |
---|
batch_insert_superColumn(family, key1, key2, [(key3, value, timestamp)]) |
From the Thrift API:
No Format |
---|
struct batch_mutation_super_t { 1: string table, 2: string key, 3: superColumn_family_map cfmap, } |
void batch_insert_superColumn(1:batch_mutation_super_t batchMutationSuper, 2:i32 block_for=0) throws (1: InvalidRequestException ire, 2: UnavailableException ue), |
Other operations
get_key_range
Retrieve the list of keys that exist in a range. A key exists if at least one column in one column family exists for the key. A list of column families can be passed into the call to restrict the search to columns in those families.
No Format |
---|
get_key_range([family_1, family_2, ..., family_N], key1_start, key1_end) -> [key1_1, key1_2, ..., key1_M] |
From the Thrift API:
No Format |
---|
# range query: returns matching keys |
list<string> get_key_range(1:string tablename, 2:list<string> columnFamilies=[], 3:string startWith="", 4:string stopAt="", 5:i32 maxResults=1000) throws (1: InvalidRequestException ire), |
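The "key exists" rule and the column family filter can be sketched as follows. The code assumes keys are compared as plain strings, that an empty startWith or stopAt means "unbounded", and that stopAt is inclusive. None of these details are confirmed on this page.

```python
def get_key_range(table, column_families=(), start_with="", stop_at="",
                  max_results=1000):
    """Return keys in [start_with, stop_at] that have at least one column.

    `table` maps key1 -> {family -> {column name -> value}}.
    Assumptions: lexical key order, empty bounds mean unbounded,
    stop_at is inclusive, an empty family list means all families.
    """
    matches = []
    for key in sorted(table):
        if len(matches) >= max_results:
            break
        if key < start_with:
            continue
        if stop_at and key > stop_at:
            break
        families = column_families or table[key].keys()
        # A key exists if any of the considered families holds a column.
        if any(table[key].get(family) for family in families):
            matches.append(key)
    return matches

table = {
    "alice": {"users": {"email": b"alice@example.com"}},
    "bob": {"users": {}, "logs": {"last": b"2009-05-01"}},
    "carol": {"users": {"email": b"carol@example.com"}},
}

print(get_key_range(table))
print(get_key_range(table, ["users"]))
```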
touch
Intended to force index information for the key into the cache, but it is buggy and slated for deprecation.
No Format |
---|
touch(key1) |
From the Thrift API:
No Format |
---|
oneway void touch(1:string key, 2:bool fData), |