ID: IEP-68
Author:
Sponsor:
Created:
Status: DRAFT


Motivation

Thin clients need an efficient way to stream large amounts of data into the cluster.

Description

Add DataStreamer operations to the Thin Client protocol: OP_DATA_STREAMER_START, OP_DATA_STREAMER_ADD_DATA

There are multiple options for the client-side implementation with this approach, from simple to more efficient:

  • Stateless -- all data goes through a single server node; only OP_DATA_STREAMER_START is used, writing each batch and closing the streamer until the next batch is ready
  • Stateful -- all data goes through a single server node, and the streamer is kept open
  • Partition-aware stateless -- data is grouped by node and each batch is sent to its primary node; a new streamer is used for every batch
  • Partition-aware stateful -- data is grouped by node and each batch is sent to its primary node; one streamer per node is kept open
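
The grouping step shared by the two partition-aware options can be sketched as follows. This is a minimal illustration, not the proposed implementation: `PartitionAwareBatcher`, `partition`, and the `partitionToNode` array are hypothetical names, and the modulo-of-hash mapping is only a stand-in for the real cache affinity function.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PartitionAwareBatcher {
    /** Hypothetical stand-in for the cache affinity function: key -> partition. */
    static int partition(Object key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    /**
     * Groups entries by the node that owns each key's partition.
     * partitionToNode[i] is an assumed id of the primary node for partition i.
     */
    static <K, V> Map<String, List<Map.Entry<K, V>>> groupByNode(
            Map<K, V> entries, String[] partitionToNode) {
        Map<String, List<Map.Entry<K, V>>> batches = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : entries.entrySet()) {
            String node = partitionToNode[partition(e.getKey(), partitionToNode.length)];
            batches.computeIfAbsent(node, n -> new ArrayList<>()).add(e);
        }
        return batches;
    }

    public static void main(String[] args) {
        String[] partitionToNode = {"nodeA", "nodeB", "nodeA", "nodeB"};
        Map<Integer, String> data = new LinkedHashMap<>();
        for (int i = 0; i < 6; i++) {
            data.put(i, "v" + i);
        }
        // Each resulting batch would be sent to its node in a separate request.
        groupByNode(data, partitionToNode)
            .forEach((node, batch) -> System.out.println(node + " <- " + batch));
    }
}
```

In the stateless variant each batch would open and close its own streamer; in the stateful variant the client would additionally keep one resource id per node.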


The streamer can be closed either with OP_RESOURCE_CLOSE or with the Close flag, depending on the use case:

  • Cancel and close - use OP_RESOURCE_CLOSE
  • Flush and close - use OP_DATA_STREAMER_ADD_DATA with Close flag (to avoid an extra OP_RESOURCE_CLOSE call)

OP_DATA_STREAMER_START = 8000

Initial operation combines streamer options and the first batch of entries.

Request
int                  cacheId
byte                 flags (allowOverwrite, skipStore, keepBinary, flush, close)
int                  perNodeBufferSize, -1 for server default
int                  perThreadBufferSize, -1 for server default
BinaryObject         Stream receiver
byte                 receiverPlatform, when receiver is not null (1 = Java, 2 = .NET, 3 = C++)
int                  entryCount
n*(Object, Object)   entries (add when value is not null, remove otherwise)


Response
long                 resourceId (0 when close flag is set)

Details

  • Close flag can be true when there is only a single batch, so an additional close request is not necessary
  • Flush flag should be true when client-side user code calls Flush method, and false otherwise
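
As an illustration of the fixed-size part of the OP_DATA_STREAMER_START request, the sketch below packs the flags into one byte and writes cacheId, flags, and both buffer sizes. Several things here are assumptions rather than part of the proposal: the bit positions simply follow the order in which the flags are listed, little-endian byte order is carried over from the existing thin client protocol, and deriving cacheId from the Java hash code of the cache name mirrors existing cache operations. The class and method names are hypothetical, and the receiver and entries are omitted.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class StreamerStartHeader {
    // Assumed bit positions, in the order the flags are listed; the proposal does not fix them.
    static final int ALLOW_OVERWRITE = 0x01;
    static final int SKIP_STORE = 0x02;
    static final int KEEP_BINARY = 0x04;
    static final int FLUSH = 0x08;
    static final int CLOSE = 0x10;

    /** Packs the five boolean options into the single flags byte. */
    static byte flags(boolean allowOverwrite, boolean skipStore, boolean keepBinary,
                      boolean flush, boolean close) {
        int f = 0;
        if (allowOverwrite) f |= ALLOW_OVERWRITE;
        if (skipStore) f |= SKIP_STORE;
        if (keepBinary) f |= KEEP_BINARY;
        if (flush) f |= FLUSH;
        if (close) f |= CLOSE;
        return (byte) f;
    }

    /** Writes cacheId, flags and both buffer sizes (13 bytes); receiver and entries would follow. */
    static ByteBuffer writeFixedFields(int cacheId, byte flags,
                                       int perNodeBufferSize, int perThreadBufferSize) {
        ByteBuffer buf = ByteBuffer.allocate(4 + 1 + 4 + 4).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(cacheId).put(flags).putInt(perNodeBufferSize).putInt(perThreadBufferSize);
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        // Single-batch case: the close flag is set, so no separate close request is needed.
        byte f = flags(true, false, false, false, true);
        ByteBuffer buf = writeFixedFields("myCache".hashCode(), f, -1, -1);
        System.out.println("flags=" + f + ", fixed part length=" + buf.remaining());
    }
}
```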

OP_DATA_STREAMER_ADD_DATA = 8001

Add data to the existing streamer by a resource id, optionally flush and/or close the streamer.


Request
long                 resourceId
byte                 flags (flush, close)
int                  entryCount
n*(Object, Object)   entries (add when value is not null, remove otherwise)


Response
long                 resourceId (0 when close flag is set)

Details

  • Close flag can be true for the last batch, so an additional close request is not necessary
  • Flush flag should be true when client-side user code calls Flush method, and false otherwise
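
The interplay of the two operations in a stateful client can be sketched as below: the first batch goes out with OP_DATA_STREAMER_START, later batches with OP_DATA_STREAMER_ADD_DATA, the last batch carries the close flag, and cancellation uses OP_RESOURCE_CLOSE. Everything here is hypothetical scaffolding: the `Transport` interface, the `RecordingTransport` fake, and the string-pair entries are assumptions made to keep the example self-contained, not part of the proposed API.

```java
import java.util.ArrayList;
import java.util.List;

final class StatefulStreamerSketch {
    /** Hypothetical transport; each method stands for one request/response round trip. */
    interface Transport {
        long start(List<String[]> batch, boolean flush, boolean close);             // OP_DATA_STREAMER_START
        long addData(long resId, List<String[]> batch, boolean flush, boolean close); // OP_DATA_STREAMER_ADD_DATA
        void resourceClose(long resId);                                             // OP_RESOURCE_CLOSE
    }

    /** In-memory fake used only for demonstration; it records which operation was sent. */
    static final class RecordingTransport implements Transport {
        final List<String> log = new ArrayList<>();

        public long start(List<String[]> batch, boolean flush, boolean close) {
            log.add("START n=" + batch.size() + (close ? " close" : ""));
            return close ? 0 : 42; // 42 is an arbitrary resource id
        }

        public long addData(long resId, List<String[]> batch, boolean flush, boolean close) {
            log.add("ADD_DATA n=" + batch.size() + (close ? " close" : ""));
            return close ? 0 : resId;
        }

        public void resourceClose(long resId) {
            log.add("RESOURCE_CLOSE");
        }
    }

    private final Transport transport;
    private final int batchSize;
    private final List<String[]> buffer = new ArrayList<>();
    private long resourceId; // 0 means no server-side streamer is open

    StatefulStreamerSketch(Transport transport, int batchSize) {
        this.transport = transport;
        this.batchSize = batchSize;
    }

    void add(String key, String value) {
        buffer.add(new String[] {key, value});
        if (buffer.size() >= batchSize) send(false, false);
    }

    /** User-initiated flush: the only case where the flush flag is set. */
    void flush() { send(true, false); }

    /** Flush and close: the last batch carries the close flag, so no OP_RESOURCE_CLOSE is needed. */
    void close() { send(false, true); }

    /** Cancel and close: drop buffered data and release the server-side streamer. */
    void cancel() {
        buffer.clear();
        if (resourceId != 0) {
            transport.resourceClose(resourceId);
            resourceId = 0;
        }
    }

    private void send(boolean flush, boolean close) {
        if (buffer.isEmpty() && resourceId == 0) return; // nothing buffered, nothing open
        List<String[]> batch = new ArrayList<>(buffer);
        buffer.clear();
        // The response resource id is 0 when the close flag was set.
        resourceId = resourceId == 0
            ? transport.start(batch, flush, close)
            : transport.addData(resourceId, batch, flush, close);
    }

    public static void main(String[] args) {
        RecordingTransport t = new RecordingTransport();
        StatefulStreamerSketch s = new StatefulStreamerSketch(t, 2);
        for (int i = 0; i < 5; i++) s.add("k" + i, "v" + i);
        s.close();
        System.out.println(t.log);
    }
}
```

When everything fits into a single batch, the same code sends just one OP_DATA_STREAMER_START with the close flag set, which matches the stateless option described earlier.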

Risks and Assumptions

  • Unlike the existing thick streamer API, we are not going to allow changing options (allowOverwrite, etc.) after the start, because that behavior seems confusing. Every client-side implementation can decide on its own API, but it makes sense to remove setters from the DataStreamer interface, move all the options to a separate type such as DataStreamerOptions, and pass it once to igniteClient.dataStreamer(cacheName, options).
  • Buffer sizes on the client and server sides can match or differ.
    • Example 1: the per-node buffer size is the same on a partition-aware client and on the server. When the client flushes its buffer, the data is flushed on the server right away.
    • Example 2: the client-side buffer is small due to resource constraints, while the server-side buffer is bigger for better batching and performance.
  • The client API can expose both server-side and client-side buffer sizes as configuration parameters, or hide them for simplicity.
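
One possible shape for the separate options type mentioned above is an immutable value object that is fixed once the streamer starts. This is only a sketch: the field set mirrors the OP_DATA_STREAMER_START request, while the `with...` method names, the `SERVER_DEFAULT` constant, and the rule that buffer sizes must be positive or -1 are assumptions, and client-side buffer sizes are left out.

```java
final class DataStreamerOptions {
    /** -1 means "use the server default", matching the request fields. */
    static final int SERVER_DEFAULT = -1;

    private final boolean allowOverwrite;
    private final boolean skipStore;
    private final boolean keepBinary;
    private final int perNodeBufferSize;
    private final int perThreadBufferSize;

    private DataStreamerOptions(boolean allowOverwrite, boolean skipStore, boolean keepBinary,
                                int perNodeBufferSize, int perThreadBufferSize) {
        this.allowOverwrite = allowOverwrite;
        this.skipStore = skipStore;
        this.keepBinary = keepBinary;
        this.perNodeBufferSize = checkSize(perNodeBufferSize);
        this.perThreadBufferSize = checkSize(perThreadBufferSize);
    }

    // Assumed validation rule: a size is either positive or the server-default marker.
    private static int checkSize(int size) {
        if (size != SERVER_DEFAULT && size <= 0) {
            throw new IllegalArgumentException("Buffer size must be positive or -1: " + size);
        }
        return size;
    }

    static DataStreamerOptions defaults() {
        return new DataStreamerOptions(false, false, false, SERVER_DEFAULT, SERVER_DEFAULT);
    }

    // Each "with" method returns a new instance, so options cannot change after the start.
    DataStreamerOptions withAllowOverwrite(boolean v) {
        return new DataStreamerOptions(v, skipStore, keepBinary, perNodeBufferSize, perThreadBufferSize);
    }

    DataStreamerOptions withSkipStore(boolean v) {
        return new DataStreamerOptions(allowOverwrite, v, keepBinary, perNodeBufferSize, perThreadBufferSize);
    }

    DataStreamerOptions withKeepBinary(boolean v) {
        return new DataStreamerOptions(allowOverwrite, skipStore, v, perNodeBufferSize, perThreadBufferSize);
    }

    DataStreamerOptions withPerNodeBufferSize(int v) {
        return new DataStreamerOptions(allowOverwrite, skipStore, keepBinary, v, perThreadBufferSize);
    }

    DataStreamerOptions withPerThreadBufferSize(int v) {
        return new DataStreamerOptions(allowOverwrite, skipStore, keepBinary, perNodeBufferSize, v);
    }

    boolean allowOverwrite() { return allowOverwrite; }
    boolean skipStore() { return skipStore; }
    boolean keepBinary() { return keepBinary; }
    int perNodeBufferSize() { return perNodeBufferSize; }
    int perThreadBufferSize() { return perThreadBufferSize; }

    public static void main(String[] args) {
        DataStreamerOptions opts = DataStreamerOptions.defaults()
            .withAllowOverwrite(true)
            .withPerNodeBufferSize(1024);
        System.out.println(opts.allowOverwrite() + " " + opts.perNodeBufferSize()
            + " " + opts.perThreadBufferSize());
    }
}
```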

Discussion Links

http://apache-ignite-developers.2346864.n4.nabble.com/IEP-68-Thin-Client-Data-Streamer-td51622.html

Reference Links

PoC: https://github.com/apache/ignite/pull/8847
