When Geode was first developed as a technology storage was fairly straightforward--Disk was the persistent store. Today, however, we see an increasingly diverse storage environment. Distributed stores, such as HDFS, HBase, Cassandra, and others are increasingly used to persist data for analysis and recovery. New persistent memory technologies are adding a new type of non-volatile storage to the mix. Having the core Geode team address every possible use case or store type that may be desired does not scale, nor should it be a gating function on the user community for enabling a new persistence layer. Instead we propose that by developing a properly defined pluggable interface and specification for pluggable persistent stores the Geode persistence layer becomes easily extensible and allow user and the community to add and contribute pluggable use case specific stores. Whether the same applies to over-flow stores is an open question for the community, as the requirements here are different, so we leave that as an open question.
Short list of initial requirements
- Disk store remains as is; all other persistent stores become pluggable
- Overflow is not changed- uses disk store always (open question on this requirement; should overflow be included?)
- A cache store manager layer is created to manage persistence stores
- Cache stores implement a well defined interface for read/write and store information/statistics exposure
- Cache store interface includes methods for getting details on cache store and for statistics
- Categories for store types will be defined, for example node-local or distributed stores to allow Geode logic to correctly handle persistence (e.g., avoid copy multiplication in the case of multi-coopy distributed stores)
- Cache store interface supports standard configuration method interface for a given store type type
- Cache stores are added in configuration via classpath and loaded and validated by the cache storage manager
- Storage manager expose JMX interfaces such that gfsh and dashboards can: list storage types available, list storage configuration parameters for a give store, list details and statistics for a given store.
Overflow has different potential requirements hence it is an open question as to whether is should be included in the pluggable interface. While one can see storage class memory (SCM) used for overflow this type of memory is likely to be exposed as a file system interface similar to any other disk. It is difficult to imagine other types of storage being used for overflow due to performance and latency requirements. The main use case seems to be for those using proprietary SCM interfaces to provide very fast low latency overflow. The open questions are (1) should overflow be included? (2) what is the incremental increase to the difficulty of implementation of pluggable stores by including overflow? (3) are the requirements similar enough to combine persistence and overflow, or should overflow, if done, be a separate proposal?
Support for batch operations should be provided, and the cache store interface should have a method to check if the operations are supported natively (via the store's characteristics). A cache store may support batch operations for writes by collecting operations and using a timed and amount based approach to batch then transparently to the underlying store for performance and/or by supporting putAll(), getAll(), and deleteAll() operations.
Basic putAll(), getAll(), and deleteAll() functionality should be supported, and the store's characteristics should indicate of these are supported natively by the underlying store.
When multiple store types are supported it is important that applications be able walk the list of available options and select the store that most closely matches its needs. Cache stores should support:
Proposed required characteristics:
- redundancy type (none, local-redundant, distributed-redundant)
- effective redundancy (if type is not none)
- available capacity
- batch operations supported natively (boolean)
Proposed optional characteristics:
- redundancy detail type (none, multi-disk, multi-machine, multi-rack, multi-datacenter)
Proposed measured characteristics:
- average latency
- average throughput
- expected reliability
Measured characteristics should be maintained by the storage manager.
Scan and Query support
It is possible to eventually support a scan of data in persistent store for all items matching a given filter. Query support would extend this to OQL.
Expiration and Eviction
It has been proposed in GEODE-1209 that expiration and eviction operations be propagated to underlying cache stores. In the case of an eviction or expiration that is intended only to conserve memory space by removing least used keys we do not see a reason to support propagation of these events to the underlying cache store. In the case where data has a time to live that applies globally (both in memory and in persistent storage), and that TTL expires, it is desirable to support a method for the expiration event to be propagated to the cache store.
Technical proposal is added at
Open Questions on requirements & scope
- Should multiple cache stores per region be a goal? If yes, the following question naturally come up:
- Should write-only cache stores be supported (use case- write data for batch analytics)?
- Only one recovery store can be used when Geode needs to initialize a region- should this be decided by a priority list (order of trying read/write stores to recover)?
- Data filters- should data filters be supported for write?
- How exactly is transition handled for distributed cache store case; possibility- maintain a last N entries queue and seed the async write queue with this during transition. Corner cases here a concern- loss of events is primary concern.
- Can cache stores be added during region operation or only at start/restart?
- Scan with filter support in initial implementation?
- OQL support in initial cut?
- Adding new cache stores types/purposes may require behavior on cache store unavailable (write fail or get fail) to be configurable, should such configuration be supported?
- Should data event change register be supported for cache stores that support generating such notifications?
- Should pluggable cache stores be usable for overflow or limited to data persistence and export (if write-only supported)?