...

Each partition of a cache has a corresponding file in the page store directory (a particular node may not own all partitions). 

 

P1, P2, ..., PN: cache partition pages.

A special partition, 65535, is used for SQL indexes and is saved to index.bin.

Each cache has a corresponding folder in the page store, and every partition the node owns (as primary or backup) has its own file there.
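The layout described above (one folder per cache, one file per locally owned partition, index.bin for partition 65535) can be sketched as a path-mapping function. The folder and file name patterns `cache-<name>` and `part-<N>.bin` and the helper names are hypothetical and chosen for illustration. Only index.bin and the partition id 65535 come from the text.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical naming scheme for illustration; only index.bin and the
# special partition id 65535 come from the text.
INDEX_PARTITION = 65535  # special partition that stores SQL indexes


def partition_file(store_root: Path, cache_name: str, partition_id: int) -> Path:
    """Return the page file for one partition of one cache."""
    cache_dir = store_root / f"cache-{cache_name}"  # one folder per cache
    if partition_id == INDEX_PARTITION:
        return cache_dir / "index.bin"
    return cache_dir / f"part-{partition_id}.bin"


def local_partition_files(store_root: Path, cache_name: str, owned: set[int]) -> list[Path]:
    """Files a node keeps: one per partition it owns as primary or backup."""
    return [partition_file(store_root, cache_name, p) for p in sorted(owned)]


# A node that owns only partitions 0, 2 and 5 of cache "orders".
files = local_partition_files(Path("/store"), "orders", {5, 0, 2})
print([f.name for f in files])  # ['part-0.bin', 'part-2.bin', 'part-5.bin']
```

A node that does not own a partition simply has no file for it. This is why the directory contents can differ from node to node.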

Persistence and Crash Recovery

...

  • The percentage of dirty pages triggers a checkpoint (e.g. 75%).
  • A timeout is also a trigger: a checkpoint is performed every N seconds.
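The two triggers above can be combined into one check: start a checkpoint when the dirty-page ratio reaches the threshold or when the timeout has elapsed. The following is a minimal sketch. The class and method names are hypothetical, and the 60-second interval is an assumed value standing in for "N seconds". An injectable clock keeps the example deterministic.

```python
import time


class CheckpointTrigger:
    """Hypothetical helper combining the two checkpoint triggers listed above."""

    def __init__(self, total_pages, dirty_ratio=0.75, interval_sec=60.0, clock=time.monotonic):
        self.total_pages = total_pages    # pages in the in-memory region
        self.dirty_ratio = dirty_ratio    # e.g. 0.75 for 75% dirty pages
        self.interval_sec = interval_sec  # the "every N seconds" timeout (assumed value)
        self.clock = clock
        self.last_checkpoint = clock()

    def should_checkpoint(self, dirty_pages):
        too_dirty = dirty_pages / self.total_pages >= self.dirty_ratio
        timed_out = self.clock() - self.last_checkpoint >= self.interval_sec
        return too_dirty or timed_out

    def checkpoint_finished(self):
        self.last_checkpoint = self.clock()  # restart the timeout window


# Usage with a fake clock so the example is deterministic.
now = [0.0]
trigger = CheckpointTrigger(total_pages=1000, interval_sec=60.0, clock=lambda: now[0])
print(trigger.should_checkpoint(100))  # False: 10% dirty, 0 s elapsed
print(trigger.should_checkpoint(750))  # True: 75% dirty
now[0] = 61.0
print(trigger.should_checkpoint(0))    # True: timeout expired
```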

 

WAL

 

We cannot control the moment when a node crashes. 

 

Suppose we have saved the tree leaves but not the tree root (pages may be written in a different order than they were updated, because page allocation is multithreaded). In this case all updates will be lost.
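The failure above can be reproduced with a toy two-level tree. A root page lists leaf page ids, and inserting a key splits a leaf, which dirties three pages. This is a hypothetical, heavily simplified model and not the real B+ tree page format. The point it shows is that if both leaves reach the disk but the root does not, the old root still points only at L1. Everything moved to the new leaf becomes unreachable, including keys that were already on disk before the update.

```python
def reachable_keys(disk):
    """Collect the keys visible by walking the tree from the root page."""
    keys = set()
    for leaf_id in disk["root"]:
        keys |= disk[leaf_id]
    return keys


# State already on disk before the update.
disk = {"root": ["L1"], "L1": {1, 2, 3, 4}}

# In memory, inserting key 5 splits L1 and dirties three pages.
dirty = {"L1": {1, 2}, "L2": {3, 4, 5}, "root": ["L1", "L2"]}

# Pages may reach the disk in any order; here the leaves go first
# and the node crashes before the root is written.
write_order = ["L2", "L1", "root"]
crash_after = 2
for page_id in write_order[:crash_after]:
    disk[page_id] = dirty[page_id]

print(sorted(reachable_keys(disk)))  # [1, 2]: keys 3, 4 and 5 are unreachable
```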

At the same time, we cannot write every in-memory page update to disk as it happens: it is too slow. 

The technique that solves this is called write-ahead logging. Before doing the actual update, we append information about the planned change to a cyclic file called the WAL log (the operation is called WAL append or WAL log).
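The ordering rule (log first, then change the page) is sketched below. This is a minimal sketch under stated assumptions. The JSON-lines record format and the names `WriteAheadLog` and `update_page` are hypothetical. The file is a plain append-only log rather than the cyclic file described above. `os.fsync` makes each record durable before the in-memory page is touched.

```python
import json
import os
import tempfile


class WriteAheadLog:
    """Minimal append-only WAL: one JSON record per line (hypothetical format)."""

    def __init__(self, path):
        self.path = path
        self.file = open(path, "a", encoding="utf-8")

    def append(self, record):
        self.file.write(json.dumps(record) + "\n")
        self.file.flush()
        os.fsync(self.file.fileno())  # the record is durable before the page changes

    def close(self):
        self.file.close()


def update_page(wal, pages, page_id, key, value):
    # 1. WAL append: describe the planned change first.
    wal.append({"page": page_id, "key": key, "value": value})
    # 2. Only then change the in-memory page; it reaches the page store later.
    pages.setdefault(page_id, {})[key] = value


with tempfile.TemporaryDirectory() as tmp:
    wal = WriteAheadLog(os.path.join(tmp, "wal.log"))
    pages = {}
    update_page(wal, pages, "P1", "a", 1)
    wal.close()
    with open(wal.path, encoding="utf-8") as f:
        logged = [json.loads(line) for line in f]

print(logged)  # [{'page': 'P1', 'key': 'a', 'value': 1}]
print(pages)   # {'P1': {'a': 1}}
```

Appending a small record sequentially is much cheaper than writing every modified page in place. This is what makes the scheme faster than writing each page update straight to disk.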

 

After a crash, we can read the WAL and replay it on top of the set of pages that was already saved. This restores the last committed state of the crashed process. Recovery is therefore based on the page store plus the WAL.
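Recovery can be sketched as a redo pass. We start from the pages found in the page store and re-apply the WAL updates of every transaction whose commit record made it into the log. Updates of a transaction cut off by the crash are dropped. The record layout (`type`, `tx`, `page`, `key`, `value`) and the function name are hypothetical. The sketch also assumes the saved pages contain only committed data, so no undo step is needed.

```python
def recover(saved_pages, wal_records):
    """Redo committed WAL records on top of the pages found in the page store."""
    pages = {pid: dict(content) for pid, content in saved_pages.items()}
    pending = {}  # tx id -> updates seen but not yet committed
    for rec in wal_records:
        if rec["type"] == "update":
            pending.setdefault(rec["tx"], []).append(rec)
        elif rec["type"] == "commit":
            for upd in pending.pop(rec["tx"], []):
                pages.setdefault(upd["page"], {})[upd["key"]] = upd["value"]
    # Whatever is left in `pending` was cut off by the crash and is dropped.
    return pages


saved = {"P1": {"a": 1}}  # pages already written to the page store
wal = [
    {"type": "update", "tx": 1, "page": "P1", "key": "a", "value": 2},
    {"type": "update", "tx": 1, "page": "P2", "key": "b", "value": 3},
    {"type": "commit", "tx": 1},
    {"type": "update", "tx": 2, "page": "P1", "key": "a", "value": 99},  # crash before commit
]
print(recover(saved, wal))  # {'P1': {'a': 2}, 'P2': {'b': 3}}
```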

 

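In practice, we cannot replay the WAL from the beginning of time, because the disk is smaller than the full WAL would be (Volume(HDD) < Volume(full WAL)). We therefore need a procedure that throws out the oldest part of the changes in the WAL. This is done during checkpointing: once a checkpoint has written the dirty pages to the page store, the WAL records covering those changes are no longer needed for recovery. 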
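The toy store below ties the pieces together. It shows how a checkpoint lets the old prefix of the WAL be discarded, and how recovery then needs both the page store and the remaining WAL records. It is a hypothetical, single-threaded model. The class name and the LSN field are assumptions. The checkpoint runs with no concurrent updates, and the WAL is a growing list rather than a fixed-size cyclic file.

```python
class CheckpointedStore:
    """Toy page store + WAL; checkpoints let the old WAL prefix be dropped."""

    def __init__(self):
        self.page_store = {}  # pages persisted on disk
        self.memory = {}      # current in-memory pages
        self.dirty = set()    # pages changed since the last checkpoint
        self.wal = []         # records (lsn, page_id, key, value)
        self.next_lsn = 0

    def update(self, page_id, key, value):
        self.wal.append((self.next_lsn, page_id, key, value))  # WAL append first
        self.next_lsn += 1
        self.memory.setdefault(page_id, {})[key] = value
        self.dirty.add(page_id)

    def checkpoint(self):
        mark = self.next_lsn  # every record below this LSN is already in memory
        for page_id in self.dirty:
            self.page_store[page_id] = dict(self.memory[page_id])
        self.dirty.clear()
        # Those records are now reflected in the page store, so they can go.
        self.wal = [rec for rec in self.wal if rec[0] >= mark]

    def recover(self):
        # Consistent state = page store + WAL records written after the checkpoint.
        pages = {pid: dict(content) for pid, content in self.page_store.items()}
        for _, page_id, key, value in self.wal:
            pages.setdefault(page_id, {})[key] = value
        return pages


store = CheckpointedStore()
store.update("P1", "a", 1)
store.update("P1", "b", 2)
store.checkpoint()       # WAL shrinks to nothing here
store.update("P2", "c", 3)
print(len(store.wal))    # 1
print(store.recover())   # {'P1': {'a': 1, 'b': 2}, 'P2': {'c': 3}}
```

Neither half is enough on its own. Here the page store alone lacks P2, and the remaining WAL alone lacks P1.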

A consistent state can only be obtained from the pair of the WAL and the page store.

...