ID | IEP-102 |
Author | |
Sponsor | |
Created |
|
Status | IN PROGRESS |
Ignite needs an API to process an incoming stream of external data in a distributed fashion. The processing may or may not involve updating one or more Ignite tables.
Data streamer API accepts a stream of data on the client side, distributes data entries in some way across server nodes, where the processing takes place.
The following two use cases should be supported:
Uncaught exception in receiver aborts the streamer. The exception is sent back to the client and passed to the CompletableFuture returned from streamData API.
To handle non-critical errors while processing individual items, user code can return arbitrary data from the receiver (see p2-c above).
Receiver class should be deployed on the cluster prior to data streamer usage - see IEP-103: Code Deployment.
Java 9 Reactive Streams is a standard API for asynchronous stream processing (initially inspired by Akka Streams). Our API will accept Flow.Publisher from the user and pull the data from there as needed, at the speed of server-side ingestion and processing.
public interface StreamerTarget<T> { /** * Streams data into the table. * * @param publisher Producer. * @return Future that will be completed when the stream is finished. */ CompletableFuture<Void> streamData( Flow.Publisher<T> publisher, @Nullable DataStreamerOptions options); /** * Streams data into the cluster with a receiver. * * @param publisher Producer. * @param keyAccessor Key accessor. Required to determine target node from the entry key. * @param receiver Stream receiver. Will be invoked on the target node. * @param resultSubscriber Optional stream result subscriber. Will be invoked on the current client * for every non-null result returned by the receiver. * @return Future that will be completed when the stream is finished. * @param <S> Source item type. * @param <R> Result item type. */ <S, R> CompletableFuture<Void> streamData( Flow.Publisher<S> publisher, Function<S, T> keyAccessor, StreamReceiver<S, R> receiver, @Nullable Flow.Subscriber<R> resultSubscriber, @Nullable DataStreamerOptions options); }
Similar to Compute, server-side code (receiver) requires a pre-deployed Java class, which is addressed by a fully qualified name.
Asynchronous stream with consumer-controlled flow (pull-based) is represented in .NET by IAsyncEnumerable interface. Both producer and consumer are async (non-blocking).
public interface IStreamerTarget<in T> { Task StreamDataAsync(IAsyncEnumerable<T> stream, StreamerOptions? options = null); Task StreamDataAsync<TItem, TResult>( IAsyncEnumerable<TItem> stream, Func<TItem, T> keySelector, string receiverClassName, IStreamerResultListener<TResult>? resultListener, StreamerOptions? options = null); }
The client can be connected to a subset of nodes from the cluster (or even just 1 of many). Batches will include the target node id, and the receiving server node should reroute them to the corresponding node if necessary.
Some of the per-node batches may be less “popular” than others, filling up slowly or not filling at all, containing only a few entries. Use a configurable timeout to flush those batches.
Discussion Links
Key | Summary | T | Created | Updated | Due | Assignee | Reporter | Priority | Priority | Priority | Priority | P | Status | Resolution |
---|