Audience: All Cassandra Users and Developers
User Impact: Support for fast general purpose transactions
Whitepaper: Accord
GitHub: https://github.com/apache/cassandra-accord
Status
Current state: Accepted
Discussion thread: https://lists.apache.org/thread/xgj4sym0d3vox3dzg8xc8dnx4c8jb4d5 , https://lists.apache.org/thread/j402lzzf7m699zc2vk23vgfxz8wwtlyl
JIRA:
Motivation
Users must expend significant effort to modify their database consistently while maintaining scalability. Even simple transactions involving more than one partition may become complex and error-prone, as a distributed state machine must be built atop the database. Conversely, packing all of the state into one partition is not scalable.
Performance also remains an issue, despite recent Paxos improvements: latency is still twice its theoretical minimum over the wide area network, and suffers particularly badly under contention.
This work aims to improve Cassandra to support fast general purpose transactions: those that may operate atomically over any set of keys in the database, modifying their contents at once, with any action conditional on the existing contents of any key.
...
Execution
The coordinator C derives the union of all dependencies received during consensus, then disseminates t via Commit and simultaneously issues a Read to a member of each participating shard (preferably in the same DC), attaching those dependencies known to participate in that shard. The replica receiving the Read waits for all of these dependencies to be committed, then filters out those that are assigned a later t. It waits for the remaining dependencies to execute and for their results to be applied locally, and only then evaluates the read and returns it to the coordinator. C combines these responses to compute an update and a client response, which are disseminated by Apply to all replicas and returned to the client, respectively.
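As a rough illustration of the coordinator's first step, the sketch below takes the union of the dependency sets returned during consensus and selects, for each participating shard, the dependencies that also participate in that shard. The function names `union_deps` and `deps_for_shard`, the string ids, and the `participants` map from a transaction to the shards it touches are all assumptions made for this example; they are not taken from the Accord codebase.

```python
from typing import Dict, FrozenSet, Iterable, Set

TxnId = str   # hypothetical transaction identifier
Shard = str   # hypothetical shard identifier


def union_deps(replies: Iterable[Set[TxnId]]) -> Set[TxnId]:
    """Union of every dependency set received during consensus."""
    merged: Set[TxnId] = set()
    for deps in replies:
        merged |= deps
    return merged


def deps_for_shard(deps: Set[TxnId],
                   participants: Dict[TxnId, FrozenSet[Shard]],
                   shard: Shard) -> Set[TxnId]:
    """Keep only the dependencies known to participate in the given shard."""
    return {d for d in deps if shard in participants[d]}


# Illustrative data only: two consensus replies and three prior transactions.
replies = [{"A", "B"}, {"B", "C"}]
participants = {
    "A": frozenset({"s1"}),
    "B": frozenset({"s1", "s2"}),
    "C": frozenset({"s2"}),
}

deps = union_deps(replies)
# These per-shard sets are what would be attached to the Read sent to each shard.
per_shard = {s: deps_for_shard(deps, participants, s) for s in ("s1", "s2")}
```

Attaching only the per-shard subset keeps each Read small: a replica has no need to wait on dependencies that never touch its shard.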
Code Block |
---|
Execution
Replica R receiving Commit(X, deps):
    Committed[X] = true
Coordinator C:
    send a read to one or more (preferably local) replicas of each shard
        (containing those deps that apply on the shard)
Replica R receiving Read(X, t, deps):
    Wait for deps to be committed
    Wait for deps with a lower t to be applied locally
    Reply with result of read
Coordinator C (with a response from each shard):
    result = execute(read responses)
    send Apply(result) to all replicas of each shard
    send result to client
Replica R receiving Apply(X, t, deps, result):
    Wait for deps to be committed
    Wait for deps with a lower t to be applied locally
    Apply result locally
    Applied[X] = true |
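The replica-side waiting rules above can be modelled in a few lines. The following is a minimal single-threaded sketch, not the Accord implementation: the `Replica` class and its method names are hypothetical, timestamps are plain integers rather than Accord's unique timestamps, "waiting" is modelled by queueing an Apply until its conditions hold, and it assumes that a replica learns t for a transaction from both Commit and Apply.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Replica:
    committed: Dict[str, int] = field(default_factory=dict)  # txn id -> t
    applied: Set[str] = field(default_factory=set)
    store: Dict[str, str] = field(default_factory=dict)      # key -> value
    pending: List[Tuple[str, int, Set[str], Dict[str, str]]] = field(default_factory=list)

    def ready(self, t: int, deps: Set[str]) -> bool:
        # 1. Every dependency must be committed, so that its t is known.
        if any(d not in self.committed for d in deps):
            return False
        # 2. Dependencies with a later t are filtered out; the rest must be applied.
        return all(d in self.applied for d in deps if self.committed[d] < t)

    def on_commit(self, x: str, t: int) -> None:
        self.committed[x] = t
        self._drain()

    def on_apply(self, x: str, t: int, deps: Set[str], result: Dict[str, str]) -> None:
        self.committed[x] = t  # assumption: Apply carries t, so it implies commit
        self.pending.append((x, t, deps, result))
        self._drain()

    def _drain(self) -> None:
        # Apply every queued transaction whose conditions now hold; repeat
        # until no further progress, since one apply may unblock another.
        progressed = True
        while progressed:
            progressed = False
            for entry in list(self.pending):
                x, t, deps, result = entry
                if self.ready(t, deps):
                    self.store.update(result)
                    self.applied.add(x)
                    self.pending.remove(entry)
                    progressed = True


r = Replica()
r.on_commit("A", 1)
r.on_commit("B", 2)
r.on_commit("C", 3)
r.on_apply("B", 2, {"A", "C"}, {"k": "from-B"})  # blocked: A has a lower t and is not applied
blocked = "B" not in r.applied
r.on_apply("A", 1, set(), {"k": "from-A"})        # A applies, which then unblocks B
```

In this run B never waits for C, because C was committed with a later t and is therefore filtered out of B's wait set. A Read would be gated by the same `ready` check before being evaluated.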
...