Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

We define a new concept in the controller called a record metadata transaction. A transaction consists of a BeginMarker, a number of records, and either an EndMarker or AbortMarker. These records can be committed atomically to the Raft layer in multiple batches. By allowing a set of records to span across several record batches, we overcome the current batch size limitation and further decouple the controller from Raft layer implementation details.

...

  • Only a single transaction may exist at a time
  • A transaction cannot be interleaved with records outside the transaction
  • All scheduled writes to Raft for a transaction must be atomic, including the two marker records

...

By implementing transactions in the "application layer" (as opposed to the storage layer, i.e., Raft), we can be efficient about buffering handling the incomplete records. For example, instead of actually buffering the incomplete records in-memory, we can apply them to the controller's timeline data structures optimistically. If the transaction is aborted, we can easily reset the state of these data structures. 

...

The most interesting aspect of this design to test at a system level is controller failover. We can use a system test to continuously create a topics with a large number of partitions during a failover event. This will allow us to observe and verify the recovery behavior on the new controller.

We will need good unit test coverage within the QuorumController since this design will be the first time we are rolling back committed records. Normally, the only time the controller resets its state is when we are handling a snapshot.

Rejected Alternatives

Raft Transactions

...