ID | IEP-68 |
Author | |
Sponsor | |
Created | |
Status | |
Motivation
Thin clients need an efficient way to stream large amounts of data into the cluster.
Description
Add DataStreamer operations to the Thin Client protocol: OP_DATA_STREAMER_START and OP_DATA_STREAMER_ADD_DATA.
There are multiple options for the client-side implementation with this approach, from simple to more efficient:
- Stateless -- all data goes through a single server node; only OP_DATA_STREAMER_START is used, so each request writes one batch and closes the streamer, and a new streamer is started when the next batch is ready
- Stateful -- all data goes through a single server node, and the streamer is kept open
- Partition-aware stateless -- data is grouped by primary node and each batch is sent to its primary; a new streamer is used for every batch
- Partition-aware stateful -- data is grouped by primary node and each batch is sent to its primary; one streamer per node is kept open
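The grouping step used by the two partition-aware options could look roughly like the sketch below. It is illustrative only: `NodeBatcherSketch`, `Entry`, and the `primaryNodeOf` lookup are hypothetical names that are not part of this proposal, and a real client would resolve the primary node from the cache's partition mapping rather than from a caller-supplied function.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Groups pending entries into per-node batches (hypothetical helper). */
final class NodeBatcherSketch {
    /** Hypothetical key/value pair; a null value means "remove the key". */
    static final class Entry {
        final Object key;
        final Object val;

        Entry(Object key, Object val) {
            this.key = key;
            this.val = val;
        }
    }

    /**
     * @param entries       pending entries in insertion order
     * @param primaryNodeOf assumed lookup from a key to the id of its primary node
     * @return one batch per primary node, each preserving insertion order
     */
    static Map<String, List<Entry>> groupByPrimary(List<Entry> entries,
                                                   Function<Object, String> primaryNodeOf) {
        Map<String, List<Entry>> batches = new HashMap<>();

        for (Entry e : entries)
            batches.computeIfAbsent(primaryNodeOf.apply(e.key), n -> new ArrayList<>()).add(e);

        return batches;
    }
}
```

Each resulting batch would then be sent to its node, either through a fresh streamer (stateless) or through the streamer already open for that node (stateful).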
The streamer can be closed either with OP_RESOURCE_CLOSE or with the Close flag, depending on the use case:
- Cancel and close - use OP_RESOURCE_CLOSE
- Flush and close - use OP_DATA_STREAMER_ADD_DATA with Close flag (to avoid an extra OP_RESOURCE_CLOSE call)
OP_DATA_STREAMER_START = 8000
The initial operation combines the streamer options with the first batch of entries.
Type | Request field |
---|---|
int | cacheId |
byte | flags (allowOverwrite, skipStore, keepBinary, flush, close) |
int | perNodeBufferSize, -1 for server default |
int | perThreadBufferSize, -1 for server default |
BinaryObject | receiver (stream receiver, may be null) |
byte | receiverPlatform, when receiver is not null (1 = Java, 2 = .NET, 3 = C++) |
int | entryCount |
n*(Object, Object) | entries (add when value is not null, remove otherwise) |
Type | Response field |
---|---|
long | resourceId (0 when close flag is set) |
Details
- The Close flag can be set when there is only a single batch, so that an additional close request is not necessary
- The Flush flag should be set when client-side user code calls the Flush method, and cleared otherwise
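As a rough illustration of the request layout above, the sketch below serializes the payload fields in table order. Several things in it are assumptions rather than part of this proposal: the flag bit positions (taken from the order in which the flags are listed), the single-byte null marker, little-endian byte order, and the class name `StreamerStartPayloadSketch`. The receiver, keys, and values are passed in as already-serialized byte arrays so that the sketch does not depend on any particular object encoding, and the message header is left out.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.List;

/** Serializes the OP_DATA_STREAMER_START payload fields in table order (sketch). */
final class StreamerStartPayloadSketch {
    // Assumed bit positions, following the order in which the flags are listed.
    static final byte ALLOW_OVERWRITE = 0x01;
    static final byte SKIP_STORE = 0x02;
    static final byte KEEP_BINARY = 0x04;
    static final byte FLUSH = 0x08;
    static final byte CLOSE = 0x10;

    // Assumed single-byte marker for a null object.
    static final byte NULL_MARKER = 101;

    /**
     * @param receiver pre-serialized stream receiver, or null when none is used
     * @param entries  pairs of {serializedKey, serializedValue}; a null value means "remove"
     */
    static byte[] encode(int cacheId, byte flags, int perNodeBufSize, int perThreadBufSize,
                         byte[] receiver, byte receiverPlatform, List<byte[][]> entries) {
        // cacheId + flags + two buffer sizes + receiver (+ platform byte) + entryCount.
        int size = 4 + 1 + 4 + 4 + (receiver == null ? 1 : receiver.length + 1) + 4;

        for (byte[][] e : entries)
            size += e[0].length + (e[1] == null ? 1 : e[1].length);

        ByteBuffer buf = ByteBuffer.allocate(size).order(ByteOrder.LITTLE_ENDIAN);

        buf.putInt(cacheId).put(flags).putInt(perNodeBufSize).putInt(perThreadBufSize);

        if (receiver == null)
            buf.put(NULL_MARKER); // receiverPlatform is written only for a non-null receiver
        else
            buf.put(receiver).put(receiverPlatform);

        buf.putInt(entries.size());

        for (byte[][] e : entries) {
            buf.put(e[0]);

            if (e[1] == null)
                buf.put(NULL_MARKER); // null value: remove the key
            else
                buf.put(e[1]);
        }

        return buf.array();
    }
}
```

Passing -1 for both buffer sizes asks the server to use its defaults, as described in the table.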
OP_DATA_STREAMER_ADD_DATA = 8001
Adds data to an existing streamer identified by its resource ID, and optionally flushes and/or closes the streamer.
Type | Request field |
---|---|
long | resourceId |
byte | flags (flush, close) |
int | entryCount |
n*(Object, Object) | entries (add when value is not null, remove otherwise) |
Type | Response field |
---|---|
long | resourceId (0 when close flag is set) |
Details
- The Close flag can be set for the last batch, so that an additional close request is not necessary
- The Flush flag should be set when client-side user code calls the Flush method, and cleared otherwise
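Taken together, the two operations and OP_RESOURCE_CLOSE suggest a small amount of client-side state for a stateful streamer. The sketch below is one possible way to track it, not a prescribed design: `StreamerSessionSketch` and its members are hypothetical names, and a resource id of 0 is treated as "no server-side streamer open", which matches the value the responses return when the Close flag is set.

```java
import java.util.EnumSet;

/** Tracks which operation a stateful client streamer should send next (sketch). */
final class StreamerSessionSketch {
    enum Op { START, ADD_DATA, RESOURCE_CLOSE, NONE }

    enum Flag { FLUSH, CLOSE }

    /** 0 means no server-side streamer is currently open. */
    private long resourceId;

    /** The first batch opens the streamer; later batches reuse its resource id. */
    Op opForBatch() {
        return resourceId == 0 ? Op.START : Op.ADD_DATA;
    }

    /** Flush only when user code asked for it; close together with the last batch. */
    static EnumSet<Flag> flagsFor(boolean userCalledFlush, boolean lastBatch) {
        EnumSet<Flag> flags = EnumSet.noneOf(Flag.class);

        if (userCalledFlush)
            flags.add(Flag.FLUSH);

        if (lastBatch)
            flags.add(Flag.CLOSE);

        return flags;
    }

    /** Cancel and close: only needed when a streamer is actually open. */
    Op opForCancel() {
        return resourceId == 0 ? Op.NONE : Op.RESOURCE_CLOSE;
    }

    /** Records the resource id from a START or ADD_DATA response (0 after close). */
    void onResponse(long returnedResourceId) {
        resourceId = returnedResourceId;
    }

    /** Call after OP_RESOURCE_CLOSE succeeds. */
    void onCancelled() {
        resourceId = 0;
    }
}
```

With this shape, a "flush and close" costs a single request carrying both flags, while "cancel and close" is a separate OP_RESOURCE_CLOSE call that sends no further data.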
Risks and Assumptions
- Unlike the existing thick streamer API, the thin client streamer will not allow changing options (allowOverwrite, etc.) after the start, because changing them mid-stream seems confusing. Every client-side implementation can decide on its own API, but it makes sense to remove the setters from the DataStreamer interface, move all options to a separate type such as DataStreamerOptions, and pass it once to igniteClient.dataStreamer(cacheName, options).
- Buffer sizes on the client and server sides can match or differ.
- Example 1: the per-node buffer size is the same on a partition-aware client and on the server. When the client flushes its buffer, the data gets flushed on the server right away.
- Example 2: the client-side buffer is small due to resource constraints, while the server-side buffer is bigger for better batching and performance.
- The client API can expose both server-side and client-side buffer sizes as configuration parameters, or hide them for simplicity.
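An immutable options type along the lines suggested above might look like the following sketch. The class name, the `with...` methods, and the default values are assumptions made for illustration; the only values taken from this proposal are the option names and the use of -1 to mean "server default" for the buffer sizes.

```java
/** Immutable streamer options, fixed once the streamer is started (sketch). */
final class DataStreamerOptionsSketch {
    /** Sent as -1 on the wire to let the server choose the buffer size. */
    static final int SERVER_DEFAULT = -1;

    final boolean allowOverwrite;
    final boolean skipStore;
    final boolean keepBinary;
    final int perNodeBufferSize;
    final int perThreadBufferSize;

    private DataStreamerOptionsSketch(boolean allowOverwrite, boolean skipStore, boolean keepBinary,
                                      int perNodeBufferSize, int perThreadBufferSize) {
        this.allowOverwrite = allowOverwrite;
        this.skipStore = skipStore;
        this.keepBinary = keepBinary;
        this.perNodeBufferSize = perNodeBufferSize;
        this.perThreadBufferSize = perThreadBufferSize;
    }

    /** Assumed defaults: all flags off, buffer sizes left to the server. */
    static DataStreamerOptionsSketch defaults() {
        return new DataStreamerOptionsSketch(false, false, false, SERVER_DEFAULT, SERVER_DEFAULT);
    }

    // Each "with" method returns a modified copy, so an options object that has
    // already been passed to a streamer can never change underneath it.
    DataStreamerOptionsSketch withAllowOverwrite(boolean v) {
        return new DataStreamerOptionsSketch(v, skipStore, keepBinary, perNodeBufferSize, perThreadBufferSize);
    }

    DataStreamerOptionsSketch withSkipStore(boolean v) {
        return new DataStreamerOptionsSketch(allowOverwrite, v, keepBinary, perNodeBufferSize, perThreadBufferSize);
    }

    DataStreamerOptionsSketch withKeepBinary(boolean v) {
        return new DataStreamerOptionsSketch(allowOverwrite, skipStore, v, perNodeBufferSize, perThreadBufferSize);
    }

    DataStreamerOptionsSketch withPerNodeBufferSize(int v) {
        return new DataStreamerOptionsSketch(allowOverwrite, skipStore, keepBinary, v, perThreadBufferSize);
    }

    DataStreamerOptionsSketch withPerThreadBufferSize(int v) {
        return new DataStreamerOptionsSketch(allowOverwrite, skipStore, keepBinary, perNodeBufferSize, v);
    }
}
```

A client that also wants a client-side buffer size could add it to the same type, or keep it hidden, as the last point above notes.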
Discussion Links
http://apache-ignite-developers.2346864.n4.nabble.com/IEP-68-Thin-Client-Data-Streamer-td51622.html
Reference Links
PoC: https://github.com/apache/ignite/pull/8847
Tickets