4. Shard-2 is split into Shard-6 and Shard-7. ShardPartitionMapper unmaps Shard-2, and since there are no free ssp slots left for the resulting new shards, it maps one of them to an ssp that is already mapped to another shard. When multiple shards share an ssp, the offset of a record read from any of those shards must carry the sequence numbers of the latest records inserted into BEM for every other shard on that ssp, in addition to the sequence number of the current shard's record being inserted into BEM. See Task1:ssp1 below, where a record for Shard-3 arrives first, followed by a record for Shard-6.

[Figure: Untitled presentation (3).png]
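A minimal sketch of that composite-offset bookkeeping. The class name and the `shardId:seqNum` encoding below are hypothetical illustrations, not the actual ShardPartitionMapper API; the point is only that each record's offset carries the latest sequence number seen for every shard sharing the ssp.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for an ssp shared by multiple shards: it remembers the
// latest sequence number per shard and encodes all of them into the offset
// attached to each record inserted into BEM.
class SharedSspOffsets {
    private final Map<String, String> latestSeqByShard = new LinkedHashMap<>();

    // Record the latest sequence number for the given shard, then return the
    // composite offset covering every shard currently mapped to this ssp.
    String offsetFor(String shardId, String seqNum) {
        latestSeqByShard.put(shardId, seqNum);
        return latestSeqByShard.entrySet().stream()
            .map(e -> e.getKey() + ":" + e.getValue())
            .collect(Collectors.joining(","));
    }
}
```

In the Task1:ssp1 scenario, the Shard-3 record's offset would contain only Shard-3's sequence number, while the later Shard-6 record's offset would contain both Shard-3's and Shard-6's latest sequence numbers.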


5. When a job is restarted, the number of ssps in each task becomes equal to the shard count, and the mapping between ssps and shards will again be 1:1.

[Figure: Untitled presentation (4).png]
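The restart behavior in step 5 can be sketched as follows. The class and method names are hypothetical; the sketch only shows that one fresh ssp is allocated per open shard, restoring the 1:1 mapping.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: on job restart, the ssp count is derived from the
// current open-shard count, so every shard gets its own ssp.
class RestartMapper {
    static Map<String, Integer> rebuild(List<String> openShards) {
        Map<String, Integer> shardToSsp = new LinkedHashMap<>();
        int ssp = 0;
        for (String shard : openShards) {
            shardToSsp.put(shard, ssp++); // one fresh ssp per open shard
        }
        return shardToSsp;
    }
}
```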

6. Let’s take the scenario where we add a container to keep up with the Kinesis stream throughput. On Yarn, one needs to change the yarn container count config and restart the job with the new config, which results in the job model being recalculated. In stand-alone mode, adding a container causes the JobCoordinator leader to regenerate the job model.

[Figure: Kinesis System Consumer - presentation (1).png]
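For the Yarn case in step 6, the change is a config update followed by a job restart. A sketch, assuming a typical Samza job properties file (the exact property name may differ across Samza versions, so verify against your deployment):

```properties
# Increase the container count to keep up with the Kinesis stream
# throughput; restarting with this config recalculates the job model.
yarn.container.count=2
```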

7. Shard-5 and Shard-6 are merged into Shard-8. ShardPartitionMapper unmaps Shard-5 and Shard-6 and maps the new shard to a free ssp, as shown below.

[Figure: Kinesis System Consumer - presentation (2).png]
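The merge handling in step 7 can be sketched as below. The class and method names are hypothetical, not the real ShardPartitionMapper interface; the sketch shows the two parent shards freeing their ssp slots and the child shard taking one of the freed slots.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch of merge handling: unmapping the two parent shards
// frees their ssp slots, and the merged child shard takes a freed slot.
class MergeMapper {
    private final Map<String, Integer> shardToSsp = new LinkedHashMap<>();
    private final TreeSet<Integer> freeSsps = new TreeSet<>();

    MergeMapper(Map<String, Integer> initial) {
        shardToSsp.putAll(initial);
    }

    void onMerge(String parentA, String parentB, String child) {
        freeSsps.add(shardToSsp.remove(parentA));
        freeSsps.add(shardToSsp.remove(parentB));
        shardToSsp.put(child, freeSsps.pollFirst()); // lowest free ssp slot
    }

    Integer sspFor(String shard) {
        return shardToSsp.get(shard);
    }
}
```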


The following limitations apply to Samza jobs consuming from Kinesis streams using the proposed consumer: