
Feature | Apache Storm 0.11.0-SNAPSHOT (and expected pull requests) | JStorm-2.1.0/JStorm2.0.4 | Notes | JIRA to port JStorm feature
Scheduler | Even, Default, Isolation, Multi-tenant, RAS (work in progress)
  1. Evenly distribute a component's tasks across the nodes in the cluster.
  2. Balance the number of tasks in a worker.
  3. Try to assign two tasks that exchange messages directly to the same worker to reduce network cost.
  4. Support user-defined assignment and reuse of the result of the last assignment. The solution differs between Storm and JStorm.
The scheduler interface is pluggable, so we should be able to support both schedulers if needed.
JIRA: STORM-1320
Nimbus HA | Support for a pool of nimbus servers. Once Blobstore is merged in, "leader election" and "state storage" will be separate. | Support for configuring more than one backup nimbus. When the master nimbus is down, the most appropriate spare nimbus (the one whose topologies on disk most closely match the records in ZooKeeper) will be chosen for promotion. | Need to evaluate the strengths and weaknesses of each and decide on updates to Storm, if any.
JIRA: STORM-1321
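The JStorm promotion rule in the Nimbus HA row can be sketched as follows. This is only an illustration: the comparison does not say how "most closely match" is measured, so the mismatch count used here (the symmetric difference between topology id sets) is an assumption, and `pick_spare_nimbus` and the sample names are hypothetical.

```python
def pick_spare_nimbus(zk_topologies, spares):
    """Return the name of the spare whose on-disk topologies best match ZooKeeper.

    zk_topologies: set of topology ids recorded in ZooKeeper.
    spares: dict mapping a spare nimbus name to the set of topology ids on its disk.
    """
    def mismatch(name):
        # Assumed metric: count topologies that are missing on disk plus
        # topologies on disk that ZooKeeper no longer knows about.
        return len(zk_topologies ^ spares[name])

    # Iterate over sorted names so that ties are broken deterministically.
    return min(sorted(spares), key=mismatch)


zk = {"wordcount-1", "etl-7", "alerts-3"}
spares = {
    "nimbus-b": {"wordcount-1", "etl-7"},
    "nimbus-c": {"wordcount-1", "etl-7", "alerts-3"},
    "nimbus-d": {"wordcount-1", "old-topology-9"},
}
chosen = pick_spare_nimbus(zk, spares)
```

Here `nimbus-c` is chosen because its disk state matches ZooKeeper exactly.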
Topology Structure | worker 1->* executor 1->* task | worker 1-> task | Need to evaluate whether removing executors adds enough benefit to developers/performance that we can drop them from the architecture. Probably need resource-aware rebalancing or JStorm rebalancing that can support changing parallelism before this can happen.
JIRA: STORM-1322
Topology Master | None exactly. The Heartbeat Server (currently under review) reduces load on ZK, but it is not really the same and may be a complement to TopologyMaster. | New system bolt "topology master" was added, which is responsible for collecting task heartbeat info of all tasks and reporting the info to nimbus. Besides task heartbeat info, it can also be used to dispatch control messages within the topology. Topology master significantly reduces the amount of read/write to ZooKeeper. Before this change, ZooKeeper was the bottleneck for deploying big clusters and topologies. | Need to evaluate how this impacts Storm architecture, especially around security.
JIRA: STORM-1323
Backpressure | Bang-Bang controller on Disruptor queue capacity using ZooKeeper for broadcast.
  1. Implement backpressure using "topology master" (TM). TM is responsible for processing the trigger message and sending the flow control request to relevant spouts. "Flow control" in JStorm doesn't completely stop the spout from emitting tuples, but instead just slows down the tuple sending.
  2. User can update the configuration of backpressure dynamically without restarting the topology, e.g. enable/disable backpressure, high/low watermark, etc.
 
JIRA: STORM-1324
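The two backpressure styles in this row can be contrasted with a small sketch. It assumes a bang-bang (two-state) controller with high/low watermarks on queue fill, and models the JStorm-style "slow down rather than stop" behaviour as a per-emit delay. The class and function names, the watermark values, and the delay are all illustrative assumptions. The ZooKeeper broadcast and the topology master messaging are left out.

```python
class BangBangThrottle:
    """Two-state controller with hysteresis on the queue fill ratio."""

    def __init__(self, high=0.9, low=0.4):
        if not 0.0 <= low < high <= 1.0:
            raise ValueError("need 0 <= low < high <= 1")
        self.high = high
        self.low = low
        self.throttled = False

    def update(self, queue_size, capacity):
        fill = queue_size / capacity
        if fill >= self.high:
            self.throttled = True    # queue nearly full: switch throttling on
        elif fill <= self.low:
            self.throttled = False   # queue drained enough: switch it off
        # Between the watermarks the previous state is kept (hysteresis).
        return self.throttled


def spout_delay_ms(throttled, slow_down_ms=10):
    """Slow-down variant: a throttled spout emits later instead of stopping."""
    return slow_down_ms if throttled else 0


controller = BangBangThrottle()
sizes = [10, 50, 95, 70, 45, 30, 60]
states = [controller.update(size, capacity=100) for size in sizes]
delays = [spout_delay_ms(state) for state in states]
```

With these sample queue sizes the controller switches on at 95, stays on through 70 and 45 because they are still above the low watermark, and switches off at 30.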
Monitoring of task execute thread | Potential Pull Request, but none right now. | Monitors the status of the execute thread of tasks. It is effective for finding slow bolts in a topology, and can potentially uncover deadlocks as well. | Yes, we should do this.
JIRA: STORM-1325
Message processing

Deserialization happens on the netty thread

Serialization happens after the send queue when batching is happening.

  1. Add receiving and transferring queue/thread for each task to make deserialization and serialization asynchronous
  2. Remove receiving and transferring thread on worker level to avoid unnecessary locks and to shorten the message processing phase
The two sound equivalent now, but we should talk to see if there are other optimizations needed.
JIRA: STORM-1326
Tuple Batching | Batching in DisruptorQueue | Batch tuples before sending them to the transfer queue, and support adjusting the batch size dynamically according to samples of the actual batch sizes sent out over past intervals. | Should evaluate both implementations, see which is better for performance, and possibly whether we can/should move some of the dynamic batching logic into Disruptor.
JIRA: STORM-1327
Grouping

Load-aware balancing in shuffle grouping

  1. Has a "localfirst" grouping that causes tuples to be sent to the tasks in the same worker by default. But if the load of all local tasks is high, the tuples will be sent out to remote tasks.
  2. Improve localOrShuffle grouping from Storm. In Storm's localOrShuffle grouping, "local" means within the same worker process: if there is a bolt instance that the component can send to in the current worker process, it will send the tuples there. If there is not one, it will do round robin between all of the instances of that bolt, no matter which hosts they are on. JStorm has extended that so that other workers/JVMs on the same host are considered "local" as well, taking into account the load of the network connections on the local worker.
We should look at combining both of these to have shuffle look at both distance and load to decide where to send a tuple.
JIRA: STORM-1328
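A minimal sketch of a grouping that looks at both distance and load, as the note in this row suggests. It assumes three distance tiers (same worker, same host, remote) and a single load threshold. The `Task` type, `choose_target`, and the `load_limit` value are hypothetical and are not taken from either code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    task_id: int
    host: str
    worker: str
    load: float  # 0.0 (idle) to 1.0 (saturated)


def choose_target(tasks, my_host, my_worker, load_limit=0.8):
    """Prefer the nearest tier that still has a task under the load limit."""

    def distance(task):
        if task.host == my_host and task.worker == my_worker:
            return 0  # same worker process
        if task.host == my_host:
            return 1  # another worker/JVM on the same host
        return 2      # remote host

    for tier in (0, 1, 2):
        candidates = [t for t in tasks
                      if distance(t) == tier and t.load < load_limit]
        if candidates:
            # Least loaded first; task_id makes ties deterministic.
            return min(candidates, key=lambda t: (t.load, t.task_id))
    # Every task is above the limit: fall back to the least loaded overall.
    return min(tasks, key=lambda t: (t.load, t.task_id))


tasks = [
    Task(1, "h1", "w1", 0.95),  # local to the sender but overloaded
    Task(2, "h1", "w2", 0.30),  # same host, different worker
    Task(3, "h2", "w3", 0.10),  # remote
]
picked = choose_target(tasks, my_host="h1", my_worker="w1")
```

Task 2 is picked here: the in-worker task is over the limit, so the choice moves out one tier to the same host before any remote task is considered. A real shuffle grouping would probably spread tuples across the candidates rather than always taking the least loaded one.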
web-UI | Yes | Different | Does someone know a good UI designer that we can use? I don't really like either of them (Bobby), but that is just me.
JIRA: STORM-1339
metric system | IMetric and IMetricConsumer
  1. All levels of metrics, including stream metrics, task metrics, component metrics, topology metrics, even cluster metrics, are sampled & calculated. Some metrics, e.g. "tuple life cycle", are very useful for debugging and finding the hotspots of a topology.
  2. Support full metrics data. The previous metric system can only display the mean value of meters/histograms; the new metric system can display m1, m5, m15 of meters, and common percentiles of histograms.
  3. Use new metric windows. The minimum metric window is 1 minute, so we can see the metrics data for every single minute.
  4. Supplies a metric uploader interface, so third-party companies can easily build their own metric systems based on the historic metric data.
Ideally we should have a way to display most/system metrics in the UI. IMetric is too generic to make this happen, but we cannot completely drop support for it. Perhaps we need to deprecate it if the JStorm metrics are much better.
JIRA: STORM-1329
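For readers unfamiliar with the m1, m5, m15 values of a meter: in the common Dropwizard-style convention they are exponentially weighted moving average rates over roughly 1, 5, and 15 minutes. The sketch below assumes that convention and a 5-second tick. The `Ewma` class and the traffic pattern are illustrative and do not describe JStorm's actual implementation.

```python
import math

TICK_SECONDS = 5.0  # assumed interval between rate updates


class Ewma:
    """Exponentially weighted moving average of an event rate (events/second)."""

    def __init__(self, window_minutes):
        self.alpha = 1.0 - math.exp(-TICK_SECONDS / (60.0 * window_minutes))
        self.rate = None
        self.uncounted = 0

    def mark(self, n=1):
        self.uncounted += n

    def tick(self):
        instant = self.uncounted / TICK_SECONDS
        self.uncounted = 0
        if self.rate is None:
            self.rate = instant  # first sample seeds the average
        else:
            self.rate += self.alpha * (instant - self.rate)


m1, m5, m15 = Ewma(1), Ewma(5), Ewma(15)
meters = (m1, m5, m15)

# One minute of steady traffic (100 events per tick), then one idle minute.
for events_per_tick in [100] * 12 + [0] * 12:
    for meter in meters:
        meter.mark(events_per_tick)
        meter.tick()
```

After the idle minute, m1 has dropped the most and m15 the least. This spread between short and long windows is extra detail that a single mean value cannot show.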
rebalance command | Basic functionality of rebalance | Besides rebalance, scale-out/in by updating the number of workers, ackers, spouts & bolts dynamically without stopping the topology. Routing is updated dynamically within upstream components. | Dynamic routing with some groupings is difficult to get right when there is state. We need to be sure this is well documented, and might want to disallow it for some groupings without a force flag.
JIRA: STORM-1330
list command | List information of topologies | List information of all topologies, all supervisors, and JStorm version | More info is good, but we want it human readable too, perhaps with a machine-readable option as well.
JIRA: STORM-1331
zktool command | N/A | Supports some ZooKeeper operations, e.g. "list", "read"… | Will need to be evaluated for security
JIRA: STORM-1332
metricsMonitor command | N/A | Allows toggling on/off some metrics which may impact topology performance | Sounds great
JIRA: STORM-1333
restart command | N/A | Restart a topology. Besides restart, this command can also update the topology configuration. | Sounds great
JIRA: STORM-1334
update_topology command | N/A | Update jars and configuration dynamically for a topology, without stopping the topology. | Sounds great, should work well with blob-storm
JIRA: STORM-1335
cgroup | N/A | Supports controlling the upper limit of CPU core usage for a worker using cgroups | Sounds like a good start; it will be nice to integrate it with RAS requests too.
JIRA: STORM-1336
logging
  1. Supports user-defined log configuration via Log4j 2
  2. Supports dynamic changes to logging of a running topology
  3. Supports log4j and slf4j log APIs
  1. Supports user-defined configuration of log
  2. Supports both logback and log4j
Need to evaluate differences and see what we want long term
JIRA: STORM-1337
Worker classloader (isolation) | N/A - uses shading for most dependencies | The "worker classloader" avoids the problem of re-loading classes | This sounds great
JIRA: STORM-1338
Multi-thread of spout | N/A | There are two modes of spout in JStorm, "single-thread" and "multi-thread". The "single-thread" mode is similar to Storm, while the "multi-thread" mode separates the processing of ack/fail and nextTuple into two threads. This means we can stay in nextTuple for a long time without any side effect on ack/fail, which improves the responsiveness and throughput of the spout.
JIRA: STORM-1358
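The "multi-thread" spout mode can be pictured with a minimal sketch: one thread loops on nextTuple while a second thread drains ack/fail events from a queue, so a slow nextTuple no longer delays acknowledgements. Everything here is an assumption for illustration: the `SlowSpout` class, the event queue protocol, and the sleep that stands in for a slow read. None of it is JStorm's actual threading code.

```python
import queue
import threading
import time


class SlowSpout:
    """Hypothetical spout whose next_tuple() blocks for a while."""

    def __init__(self):
        self.acked = []
        self.failed = []
        self.emitted = 0

    def next_tuple(self):
        time.sleep(0.05)  # stands in for a slow external read
        self.emitted += 1

    def ack(self, msg_id):
        self.acked.append(msg_id)

    def fail(self, msg_id):
        self.failed.append(msg_id)


def start_multi_thread_spout(spout, events, stop):
    """Run next_tuple() and ack/fail handling on two separate threads."""

    def emit_loop():
        while not stop.is_set():
            spout.next_tuple()

    def ack_loop():
        while True:
            kind, msg_id = events.get()  # blocks until an event arrives
            if kind == "stop":
                return
            if kind == "ack":
                spout.ack(msg_id)
            else:
                spout.fail(msg_id)

    threads = [threading.Thread(target=emit_loop),
               threading.Thread(target=ack_loop)]
    for thread in threads:
        thread.start()
    return threads


spout = SlowSpout()
events = queue.Queue()
stop = threading.Event()
emitter, acker = start_multi_thread_spout(spout, events, stop)

for event in [("ack", 1), ("fail", 2), ("ack", 3), ("stop", None)]:
    events.put(event)

acker.join()    # ack/fail events drain without waiting for next_tuple()
stop.set()
emitter.join()
```

In the "single-thread" mode both loops would share one thread, so each ack would have to wait for the current next_tuple() call to return.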