Ignite Persistent Store

File types

The following file types are used for persisting data: cache pages (the page store), checkpoint markers, and WAL segments.

  • Write Ahead Log (WAL) segments - constant-size files (WAL work directory 0...9.wal; WAL archive 0.wal…)
  • CP markers (UUID-Begin.bin, UUID-End.bin)
  • Page store (uses file per partition: cache-(cache_name)\part1,2,3.bin, and index.bin)

Folders structure

Ignite with persistence enabled uses the following folder structure:

[Image: folder structure]

Consistent ID may be configured using IgniteConfiguration; by default it is generated from the set of local IPs.

Page store

Each partition of a cache has a corresponding file in the page store directory (a particular node may not own all partitions).

The special partition 65535 is used for SQL indexes and is saved to index.bin.
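The mapping from partition to file can be illustrated with a minimal sketch. The class and method names are hypothetical and not part of Ignite, and the "part<N>.bin" pattern is an assumption taken from the file listing above; actual file names may differ.

```java
// Hypothetical helper for illustration only; not part of the Ignite API.
final class PartitionFileNames {
    /** Special partition ID reserved for SQL indexes, as stated in the text. */
    static final int INDEX_PARTITION = 65535;

    /** Returns the page store file name for the given partition of a cache. */
    static String fileNameFor(int partitionId) {
        if (partitionId < 0 || partitionId > INDEX_PARTITION)
            throw new IllegalArgumentException("Partition out of range: " + partitionId);

        // Assumption: the "part<N>.bin" pattern mirrors the listing in this page.
        return partitionId == INDEX_PARTITION ? "index.bin" : "part" + partitionId + ".bin";
    }
}
```

Calling PartitionFileNames.fileNameFor(65535) yields the index file name, while any other valid partition ID yields a per-partition file name.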

Persistence and Crash Recovery

...

In practice we can’t replay the WAL from the beginning of time, because Volume(HDD) < Volume(full WAL), so we need a procedure to discard the oldest part of the changes in the WAL.

A consistent state comes only from the pair of WAL and page store.

This procedure is named checkpointing.

...

Implemented: Sharp Checkpoint; F.C.: to be done in future releases.

To achieve consistency, a checkpoint read-write lock is used (see GridCacheDatabaseSharedManager#checkpointLock).

...

Dirty pages form a set: when a page changes from non-dirty to dirty, it is added to this set.

The collection of pages (GridCacheDatabaseSharedManager.Checkpoint#cpPages) allows us to collect and then write the pages that were changed since the last checkpoint.
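The interaction between the checkpoint lock and the dirty pages set can be sketched as follows. This is a simplified illustration, not the code of GridCacheDatabaseSharedManager: the class DirtyPageTracker and its methods are hypothetical, and it assumes that updating threads take the read side of the lock while the checkpointer takes the write side only to swap the set.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch; names are illustrative and not Ignite classes.
final class DirtyPageTracker {
    private final ReentrantReadWriteLock checkpointLock = new ReentrantReadWriteLock();

    /** Pages changed since the last checkpoint began. */
    private volatile Set<Long> dirtyPages = ConcurrentHashMap.newKeySet();

    /**
     * Called by an updating thread. Many updaters may hold the read lock at once.
     * Returns true only when the page goes from non-dirty to dirty.
     */
    boolean markDirty(long pageId) {
        checkpointLock.readLock().lock();
        try {
            return dirtyPages.add(pageId);
        } finally {
            checkpointLock.readLock().unlock();
        }
    }

    /**
     * Called by the checkpointer. The write lock excludes all updaters while the
     * current set is captured and replaced with a fresh one.
     */
    Set<Long> beginCheckpoint() {
        checkpointLock.writeLock().lock();
        try {
            Set<Long> cpPages = dirtyPages;
            dirtyPages = ConcurrentHashMap.newKeySet();
            return cpPages;
        } finally {
            checkpointLock.writeLock().unlock();
        }
    }
}
```

The write lock is held only for the swap, so in this sketch updates are blocked briefly at checkpoint begin and can continue while the captured pages are written out.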

...

  • The percentage of dirty pages is a trigger for checkpointing (e.g. 75%).
  • A timeout is also a trigger: do a checkpoint every N seconds.
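The two triggers above can be combined into a single decision, as in the following minimal sketch. The class CheckpointTrigger, its parameters, and the way the dirty ratio is computed are assumptions made for illustration; they do not describe the actual Ignite implementation.

```java
// Hypothetical sketch of the trigger decision; not an Ignite class.
final class CheckpointTrigger {
    private final double dirtyThreshold; // fraction of dirty pages, e.g. 0.75
    private final long timeoutMillis;    // maximum time between checkpoints

    CheckpointTrigger(double dirtyThreshold, long timeoutMillis) {
        this.dirtyThreshold = dirtyThreshold;
        this.timeoutMillis = timeoutMillis;
    }

    /** Returns true if either the dirty-page ratio or the timeout calls for a checkpoint. */
    boolean shouldCheckpoint(long dirtyPages, long totalPages, long millisSinceLastCheckpoint) {
        boolean tooManyDirty = totalPages > 0
            && (double) dirtyPages / totalPages >= dirtyThreshold;
        boolean timedOut = millisSinceLastCheckpoint >= timeoutMillis;

        return tooManyDirty || timedOut;
    }
}
```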

WAL structure

...

...

WAL

...

WAL records for recovery

Crash recovery involves the following records written to the WAL. They are of 2 main types:

  1. Logical records
    1. Operation description - which operation we want to do. Contains the operation type (put, remove) and (Key, Value, Version) (DataRecord)
    2. Transactional record - a marker of transaction begin, prepare, commit, and rollback (TxRecord)
    3. Checkpoint record - a marker of checkpoint begin (CheckpointRecord)
  2. Physical records
    1. Full page snapshot - issued for the first update of a page after a successful checkpoint, that is, when the page state changes from 'clean' to 'dirty' (PageSnapshot)
    2. Delta record - describes a change to a memory region (a page change). Subclass of PageDeltaRecord. Contains the bytes changed in the page, e.g. bytes 5-10 were changed to [...,]. Relatively small records for B+tree records
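How the two physical record types relate can be shown with a minimal sketch: a full page snapshot restores the whole page image, and a delta then overwrites only the changed byte range. The class PhysicalRecords and its methods are hypothetical; real PageSnapshot and PageDeltaRecord classes carry more information than a raw byte range.

```java
// Hypothetical sketch of applying physical records to a page; not Ignite code.
final class PhysicalRecords {
    /** Restores a page from a full page snapshot by copying the whole image. */
    static byte[] applySnapshot(byte[] snapshot) {
        return snapshot.clone();
    }

    /** Applies a delta: overwrites the page bytes starting at the given offset. */
    static void applyDelta(byte[] page, int offset, byte[] changedBytes) {
        if (offset < 0 || offset + changedBytes.length > page.length)
            throw new IllegalArgumentException("Delta does not fit into the page");

        System.arraycopy(changedBytes, 0, page, offset, changedBytes.length);
    }
}
```

For example, a delta with offset 5 and six changed bytes rewrites bytes 5-10 of the page and leaves the rest of the snapshot image untouched.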

For a particular cache entry update, we log records in the following order:

...

Possible future optimisation: refer from PageDeltaRecord to the data in the logical record, which would avoid storing the updated bytes twice. There is a file WAL pointer, a pointer to a record counted from the beginning of time; this reference may be used.

 

WAL structure

 

WAL file segments and rotation structure

[Image: WAL file segments and rotation]

 

See also the WAL history size section below.
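Segment rotation can be illustrated with a small sketch that maps an ever-growing segment index to a file in the work directory and a file in the archive. It assumes ten work segments, inferred from the "0...9.wal" naming used in this page, and simplified file names; the class WalSegmentNames is hypothetical and actual Ignite file names may differ.

```java
// Hypothetical sketch of WAL segment naming; not Ignite code.
final class WalSegmentNames {
    /** Assumption: ten work files, inferred from the "0...9.wal" naming. */
    static final int WORK_SEGMENTS = 10;

    /** Work files are reused in a cycle, so the absolute index wraps around. */
    static String workFile(long absoluteIndex) {
        requireNonNegative(absoluteIndex);
        return (absoluteIndex % WORK_SEGMENTS) + ".wal";
    }

    /** Archived segments keep their absolute index. */
    static String archiveFile(long absoluteIndex) {
        requireNonNegative(absoluteIndex);
        return absoluteIndex + ".wal";
    }

    private static void requireNonNegative(long absoluteIndex) {
        if (absoluteIndex < 0)
            throw new IllegalArgumentException("Negative segment index: " + absoluteIndex);
    }
}
```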


Local Recovery Process

Let’s assume the node start process is running with existing files.

...

If a transaction begin record has no corresponding end record, the changes of that transaction are not applied.
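The rule for unfinished transactions can be sketched as a two-pass replay over logical records. This is an illustration under simplifying assumptions: the record layout, the class TxReplay, and the treatment of a commit marker as the only "end" that makes changes visible are all hypothetical and do not reflect the actual recovery code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of logical replay; not Ignite code.
final class TxReplay {
    enum Type { BEGIN, PUT, COMMIT, ROLLBACK }

    /** Simplified logical record: a marker or a put belonging to a transaction. */
    static final class Rec {
        final Type type;
        final long txId;
        final String key;
        final String val;

        Rec(Type type, long txId, String key, String val) {
            this.type = type;
            this.txId = txId;
            this.key = key;
            this.val = val;
        }
    }

    /** Replays the log and applies only updates of transactions that have a commit marker. */
    static Map<String, String> replay(List<Rec> wal) {
        // Pass 1: collect transactions that reached commit.
        Set<Long> committed = new HashSet<>();
        for (Rec r : wal)
            if (r.type == Type.COMMIT)
                committed.add(r.txId);

        // Pass 2: apply updates in log order, skipping unfinished or rolled back transactions.
        Map<String, String> state = new HashMap<>();
        for (Rec r : wal)
            if (r.type == Type.PUT && committed.contains(r.txId))
                state.put(r.key, r.val);

        return state;
    }
}
```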

Summary, limitations and performance

Limitations

Because checkpoints are consistent, we can’t start the next checkpoint until the previous one has completed.

...

There are several levels of guarantees (WALMode):

| WALMode | Implementation | Warranties |
|---|---|---|
| DEFAULT | fsync() on each commit | Any crashes (OS and process crash) |
| LOG_ONLY | write() on commit; synchronisation is the responsibility of the OS | Process kill, but not an OS failure |
| BACKGROUND | Nothing on commit (records are accumulated in memory); write() on timeout | kill -9 may cause the loss of several latest updates |
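The difference between the three modes can be sketched with plain java.nio calls. This is not the Ignite WAL writer: the class WalWriterSketch, its methods, and its Mode enum are hypothetical and only mirror the behaviour described for each mode.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch of the three durability levels; not Ignite code.
final class WalWriterSketch implements AutoCloseable {
    enum Mode { DEFAULT, LOG_ONLY, BACKGROUND }

    private final Mode mode;
    private final FileChannel channel;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    WalWriterSketch(Path file, Mode mode) throws IOException {
        this.mode = mode;
        this.channel = FileChannel.open(file,
            StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    /** Called when a transaction commits. */
    synchronized void onCommit(byte[] record) throws IOException {
        switch (mode) {
            case DEFAULT:
                writeFully(record);
                channel.force(false); // ask the OS to flush file content to the device
                break;
            case LOG_ONLY:
                writeFully(record);   // data reaches the OS; flushing is left to the OS
                break;
            case BACKGROUND:
                buffer.write(record, 0, record.length); // kept in memory until the timeout
                break;
        }
    }

    /** Called periodically; only BACKGROUND mode accumulates anything to write here. */
    synchronized void onTimeout() throws IOException {
        if (buffer.size() > 0) {
            writeFully(buffer.toByteArray());
            buffer.reset();
        }
    }

    private void writeFully(byte[] bytes) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        while (buf.hasRemaining())
            channel.write(buf);
    }

    @Override public synchronized void close() throws IOException {
        channel.close();
    }
}
```

In this sketch a process kill in BACKGROUND mode loses whatever is still in the in-memory buffer, which corresponds to the loss of several latest updates noted in the table.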

 

...