...

write-waiting-time-(avg|total)

As explained for flush and compaction above, from time to time RocksDB flushes data from the memtable to disk and reorganises data on disk with compactions. Flushes are performed by a background process when the memtable is almost full. During flushes and compactions, a write to the database might need to wait until these processes finish. These metrics measure the average and total time that writes wait for flush and compaction to finish.

If flush and compaction happen too often, this time may increase and signal a bottleneck. Users can then take action by, e.g., increasing the size of the memtable to decrease the rate of flushes or changing the compaction settings.
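As a rough illustration of how the two values relate, the sketch below derives a total and an average from a list of per-write waiting times and compares the average with a threshold. It assumes that the average is the total divided by the number of recorded waits within one recording window; the class WriteWaitStats, its methods, and the threshold parameter are hypothetical and not part of Kafka Streams or RocksDB.

```java
import java.util.List;

// Hypothetical helper for illustration only; not part of Kafka Streams or RocksDB.
final class WriteWaitStats {
    private WriteWaitStats() {}

    // Total time (ms) that writes spent waiting within one recording window.
    static long totalMs(List<Long> waitsMs) {
        long total = 0L;
        for (long w : waitsMs) {
            total += w;
        }
        return total;
    }

    // Average wait (ms) per recorded wait; 0.0 when nothing was recorded.
    // Assumption: avg = total / number of recorded waits.
    static double avgMs(List<Long> waitsMs) {
        return waitsMs.isEmpty() ? 0.0 : (double) totalMs(waitsMs) / waitsMs.size();
    }

    // Flags a possible bottleneck when the average exceeds a user-chosen threshold.
    static boolean looksLikeBottleneck(List<Long> waitsMs, double thresholdMs) {
        return avgMs(waitsMs) > thresholdMs;
    }

    public static void main(String[] args) {
        List<Long> waitsMs = List.of(10L, 20L, 30L);
        System.out.println("total=" + totalMs(waitsMs) + " ms, avg=" + avgMs(waitsMs) + " ms");
        System.out.println("bottleneck=" + looksLikeBottleneck(waitsMs, 15.0));
    }
}
```

A sensible threshold depends on the workload, so in practice it would be chosen by watching the metric under normal load first.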

num-open-files and num-file-errors-total

Part of the data in RocksDB is kept in files. These files need to be opened and closed. Metric num-open-files measures the number of currently open files and metric num-file-errors-total measures the total number of file errors. Both metrics may help to find issues connected to the OS and the file system.


Compatibility, Deprecation, and Migration Plan


Since metrics are added and no other metrics are modified, this KIP should not

  • affect backward-compatibility
  • deprecate public interfaces
  • require a migration plan other than adding the new metrics to users' own monitoring components

Rejected Alternatives

  • Metrics bytes-read-compaction-total and bytes-written-compaction-total did not seem useful to me since they would measure bytes moved between memory and disk due to compaction. The metric bytes-flushed-total at least gives a sense of the size of the persisted data in the RocksDB instance.
  • The number of timed-out writes would