Current state: Accepted
Discussion thread: here
Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).
Today a Kafka Streams application will implicitly create state. This state is used for storing intermediate data such as aggregation results. The state is also used to store KTable’s data when they are materialized. The problem this document addresses is that this state is hidden from application developers and they cannot access it directly. The DSL allows users to make a copy of the data (using the through operator) but this leads to a doubling in the amount of state that is kept. In addition, this leads to extra IOs to external databases/key value stores that could potentially slow down the entire pipeline.
Here is a simple example that illustrates the problem:
1 KTable<String, Long> wordCounts = textLine
2 .flatMapValues(value ->Arrays.asList(value.toLowerCase().split("\\W+")))
3 .map((key, word) -> new KeyValue<>(word, word)
5 wordCounts.to(Serdes.String(), Serdes.Long(), "streams-wordcount-output");
In line 4, the aggregation already maintains state in a store called StoreName, however that store cannot be directly queried by the developer. Instead, the developer makes a copy of the data in that store into a topic called streams-wordcount-output. Subsequently, the developer might instantiate its own database after reading the data from that topic (this step is not shown above). This is shown in illustration (a):
We propose to make the Kafka Streams internal state stores visible to external queries. The internal state stores include all processor node stores used implicitly (through DSL) or explicitly (through low level Processor API). Benefits of this approach include:
Avoid duplicating data
Avoid extra IOs
Fewer moving pieces in the end-to-end architecture by avoiding unneeded state stores
Ability to colocate data and processing (e.g., in situations where many rows are scanned per operation).
The proposal can be thought as having two distinct steps as described below.
Step 1 in proposal: expose state store names to DSL and local queries
The stream store namespace is local to a KStreams instance, i.e., it is part of the same process that the KStreams instance is in. Conceptually the code to access such a store would look like this:
KafkaStreams streams = new KafkaStreams(..);
ReadOnlyKeyValueStore store = streams.store("storeName", QueryableStoreTypes.keyValueStore());
The state store is discovered by querying the KafkaStreams instance. The query operations will be read-only, i.e., no updates. The query method is calling methods of the StateStore object.
Step 2 in proposal: global discovery of state stores
The second step adds to the first by having the namespace be global. The figure below shows two KStream instances on potentially different servers and two state stores each backed up by their own Kafka topic. An additional proposed component in the diagram is a logical discovery component. Conceptually this component provides a lookup API that keeps track of the mapping between a state store name and the KafkaStreams instance that owns that state store.
Hence, the discovery API is part of the KafkaStreams instance. The API will provide four methods:
- Collection<StreamsMetadata> KafkaStreams.allMetadata()) where StreamsMetadata has fields such as list of assigned partitions, list of state store names and HostInfo that includes hostname / port, etc. The port is the listening port of a user-defined listener that users provide to listen for queries (e.g., using REST APIs). More on the user-defined agent below.
Collection<StreamsMetadata> KafkaStreams.allMetadataForStore(String /* storeName */) would return only the StreamsMetadata that include the given store name.
StreamsMetadata KafkaStreams.metadataWithKey(String storeName, K key, Serializer<K> serializer) would return the StreamsMetadata that has the store that might have the key (the key might not exist in any store). The default StreamPartitioner will be used for key partitioning.
StreamsMetadata KafkaStreams.metadataForKey(String storeName, K key, StreamPartitioner<K, ?> partitioner) same as above but will use the provided StreamPartitioner
An additional configuration parameter, StreamsConfig.APPLICATION_SERVER_CONFIG, will be added. This is a host:port pair supplied by the streams developer and should map to a Server running in the same instance of the KafkaStreams application. The supplied host:port pair will form part of the StreamsMetadata returned from the the above mentioned API calls.
The individual KafkaStreams instances will know about all mappings between state store names, keys and KafkaStream instances. We propose that they keep track of this information by piggybacking on the consumer group membership protocol. In particular, we propose to add the above mapping Map<HostInfo, Set<TopicPartition>> to StreamPartitionAssignor so that each consumer knows about all the other tasks metadata and host states in the system.
Bootstrapping. To bootstrap the discovery, a user can simply query the discovery instance of any one of the KafkaStream instances she operates, i.e., bootstrap happens within a single KafkaStream instance.
User-defined listener. This design option simply provides a discovery API and the ability to find a state store. Once a state store is found, then the store is queried using the APIs in step 1. However, this KIP stops short of providing a listener on each KafkaStream instance that actually listens for remote queries (e.g., through REST). It is assumed that the user will write such a listener themselves. So the scope for Apache Kafka is the discovery API (through existing consumer membership protocol) and the queries as specified in the step 1.
The focus is on querying state stores, not updating them. It is not clear what it would mean to update a state store from outside the stream processing framework. Such updates are likely to introduce undefined behavior to the framework. This KIP does not support updates.
The operations of the state store will be the query-only operations of whatever state store the user has plugged in. Some examples (full API in a subsequent section):
If the plugged in state store is a RocksDB key-value state store: V get(K key);
If the plugged in state store is a RocksDB-backed window store: WindowStoreIterator<V> fetch(K key, long timeFrom, long timeTo);
Tasks, and thus StateStores, can be re-assigned to threads anytime. This could happen after a failure of a server on which KafkaStreams is running, or for load balancing. During any re-assignment of state stores, they will not be available for querying until they are fully restored and ready to be used by the KafkaStreams instance.
The implication is that the user’s code will not have a direct reference to the underlying StateStores, but rather an Object that knows how to locate and query them. Whenever operations are made against this object it must first check if the underlying StateStore is valid. If the StateStore is not valid the operation needs to return an exception InvalidStateStore. On catching this exception the user can try again. If the rebalance has finished and the StateStores are open, subsequent operations will be successful.
State store names
Today, some state store names are hidden from the developer and are internal to Kafka Streams. This makes it difficult for a developer to query them. Hence, as part of this proposal we add the need to expose all relevant state store names to the user. This means three things:
Explicitly require names for all operations that create a state store, like aggregate.
Explicitly require names for all created KTables (thus their changelog state stores)
Materialize all KTables (today some KTables are materialized and some are not)
We propose adding an additional method to the KafkaStreams public API:
The QueryableStoreType interface, below, can be used to ‘plug-in’ different StateStore implementations to Queryable State. Developers using the Processor API and supplying custom StateStore’s can get access to their StateStores with the same API method as above. Implementations of this interface for the StateStores that are part of the KafkaStreams library will be provided by this KIP.
A class that provides implementations of the QueryableStoreTypes that are part of KafkaStreams,i.e.,
Two new interfaces to restrict StateStore access to Read Only (note this only applies to implementations that are part of Kafka Streams)
We can then use the above API to get access to the stores like so:
Exposed APIs from Kafka Streams:
Below is an example of how a developer might use the Discover API
Developer Guide for Custom Stores
If you want to make a customized store Queryable you'll need to implement the QueryableStoreType interface. Below are example implementations for KeyValueStore and WindowStore. You may also want to provide an interface to restrict operations to read-only and a Composite type for providing a faćade over the potentially many instances of the underlying store (see example CompositeReadOnlyKeyValueStore below).
Proposed implementation outline
The implementation has three logical parts. First, expose all state store names to the DSL. Note that we don’t need to do this for the processor API since the state stores are already explicit there. Second, implement the interfaces that query a state store.
For the first part, an implementation outline follows:
Expose stream names to the DSL: see KAFKA-3870 PR.
Always enforce KTable materialization upon creation (i.e. even for "builder.table(..).to()" we also materialize it into a state store; this will indeed incur unnecessary overhead but should be very rare, and it helps for consistency of materialization).
We must also document the valid characters for a state store name. Since the name is directly used to name underlying changelog topics, for example, only such characters are currently valid that are also valid characters to name a Kafka topic.
For the second part, an implementation outline follows:
Implement read-only interface and QueryableStoreType for key value state store
Implement read-only interface and QueryableStoreType for windowed state store
Ensure store operations are thread-safe. I.e., a single state store could have concurrent access, e.g. one stream worker thread updating the store, and one user thread querying the store.
Ensure failure and rebalancing is handled correctly. I.e., because the "read-only" state store interface could actually wrap multiple state stores underneath, since one instance could contain multiple tasks and hence multiple state stores with the same names. Rebalance could happen migrating state stores in / out while the user is interacting with the interface.
See Expose Local Stores for how this could be done
For the third part:
Update group membership data to include discovery endpoints
Compatibility, Deprecation, and Migration Plan
- The query and discovery APIs will not affect existing users.
- However, exposing state store names to the API will affect existing users, since the current interfaces will change. The new interfaces will not be backwards compatible.
- As we now need to handle concurrent access to the State Stores, this may incur some overhead on streams applications while querying is on-going. We will measure the overhead in benchmarks.
Querying directly from Kafka topic. We considered allowing users to query directly from a Kafka topic, instead of a state store. The namespace would be global in this case, but instead of worrying about the StateStore namespace, we would be interested in the topic names instead (each state store is often backed into a Kafka topic). A user would create a RocksDb StateStore and instruct it to cache data from the desired topic. From then on, queries are performed on that local state store using the query method of direct calls, just like the above KIP proposal. Discovery with this option is implicitly done by the consumer that reads from the topic.
One potential drawback of this method is lack of consistency. In particular, the latest records for a topic might be residing on a remote StateStore (that acts like a cache for that topic). Hence, the user might not see the latest value for their data. Another limitation is that it might not be possible to run this on a single machine. A large keyspace could result in too much data to fit in memory and/or on local disk. It might be hard to provision the downstream machine correctly. This would then require another ‘cluster’ to host the Rocks DB caches. Furthermore, if you have N applications that need to query the state, you need to duplicate the state N times. Another limitation is that state stores that do not persist to Kafka (e.g., an in-memory state store) cannot be queried using this design.