Feature | Apache Storm 0.11.0-SNAPSHOT (and expected pull requests) | JStorm-2.1.0/JStorm2.0.4 | Notes | JIRA to port JStorm feature |
---|
Scheduler | Even, Default, Isolation, Multi-tenant, RAS(work in progress) | - Evenly distribute a component's tasks across the nodes in the cluster.
- Balanace the number of tasks in a worker.
- Try to assign two tasks which are transferring messages directly into the same worker to reduce network cost.
- Support user-defined assignment and using the result of the last assignment. Different solution between Storm and JStorm.
| The scheduler interface is pluggable, so we should be able to support both schedulers if needed. | Jira |
---|
server | ASF JIRA |
---|
serverId | 5aa69414-a9e9-3523-82ec-879b028fb15b |
---|
key | STORM-1320 |
---|
|
|
Nimbus HA | Support for a pool of nimbus servers. Once Blobstore is merged in, "leader election" and "state storage" will be separate. | Support for configuring more than one backup nimbus. When the master nimbus is down, the most appropriate spare nimbus (topologies on disk most closely match the records in ZooKeeper) will be chosen to be promoted. | Need to evaluate the strengths and weaknesses of each and decide on updates to storm if any. | Jira |
---|
server | ASF JIRA |
---|
serverId | 5aa69414-a9e9-3523-82ec-879b028fb15b |
---|
key | STORM-1321 |
---|
|
|
Topology Structure | worker 1->* executor 1->* task | worker 1-> task | Need to evaluate if removing executors will add enough benefit to developers/performance that we can drop it from architecture. Probably need resource aware re-balancing or Jstorm rebalancing that can support changing parallelism before this can happen. | Jira |
---|
server | ASF JIRA |
---|
serverId | 5aa69414-a9e9-3523-82ec-879b028fb15b |
---|
key | STORM-1322 |
---|
|
|
Topology Master | None exactly. The Heartbeat Server (Currently under review) reduces load on ZK, but it is not really the same and may be a complement to TopologyMaster | New system bolt "topology master" was added, which is responsible for collecting task heartbeat info of all tasks and reporting the info to nimbus. Besides task heartbeat info, it also can be used to dispatch control messages within the topology. Topology master significantly reduces the amount of read/write to ZooKeeper. Before this change, ZooKeeper was the bottleneck for deploying big clusters and topologies. | Need to evaluate how this impacts storm architecture especially around security. | Jira |
---|
server | ASF JIRA |
---|
serverId | 5aa69414-a9e9-3523-82ec-879b028fb15b |
---|
key | STORM-1323 |
---|
|
|
Backpressure | Bang-Bang controller on Disruptor queue capacity using ZooKeeper for broadcast. | - Implement backpressure using "topology master" (TM). TM is responsible for processing the trigger message and sending the flow control request to relevant spouts. "flow control" in JStorm doesn't complete stop the spout from emitting tuples, but instead just slows down the tuple sending.
- User can update the configuration of backpressure dynamically without restarting topology, e.g. enable/disable backpressue, high/low watermark, etc.
| | Jira |
---|
server | ASF JIRA |
---|
serverId | 5aa69414-a9e9-3523-82ec-879b028fb15b |
---|
key | STORM-1324 |
---|
|
|
Monitoring of task execute thread | Potential Pull Request, but none right now. | Monitors the status of the execute thread of tasks. It is effective to find the slow bolt in a topology, and potentially uncovers deadlock as well. | Yes we should do this | Jira |
---|
server | ASF JIRA |
---|
serverId | 5aa69414-a9e9-3523-82ec-879b028fb15b |
---|
key | STORM-1325 |
---|
|
|
Message processing | Deserialization happens on the netty thread Serialization happens after the send queue when batching is happening. | - Add receiving and transferring queue/thread for each task to make deserialization and serialization asynchronously
- Remove receiving and transferring thread on worker level to avoid unnecessary locks and to shorten the message processing phase
| The two sound equivalent now, but we should talk to see if there are other optimizations needed. | Jira |
---|
server | ASF JIRA |
---|
serverId | 5aa69414-a9e9-3523-82ec-879b028fb15b |
---|
key | STORM-1326 |
---|
|
|
Tuple Batching | Batching in DisruptorQueue | Do batch before sending tuple to transfer queue and support for adjusting the batch size dynamically according to samples of actual batch size sent out for past intervals. | Should evaluate both implementations, and see which is better for performance, and possible if we can/should move some of the dynamic batching logic into disruptor. | Jira |
---|
server | ASF JIRA |
---|
serverId | 5aa69414-a9e9-3523-82ec-879b028fb15b |
---|
key | STORM-1327 |
---|
|
|
Grouping | Load aware balancing in shuffle grouping | - Has a "localfirst" grouping that causes tuples to be sent to the tasks in the same worker by default. But if the load of all local tasks is high, the tuples will be sent out to remote tasks.
- Improve localOrShuffle grouping from Storm. In Storm's localOrShuffle grouping the definition of "local" is local within the same worker process. i.e., if there is a bolt that the component can send to in the current worker process it will send the tuples there. If there is not one, it will do round robin between all of the instances of that bolt no matter which hosts they are on. JStorm has extended that so that other workers/JVMs on the same host are considered "local" as well, taking into account the load of the network connections on the local worker.
| We should look at combining both of these to have shuffle look at both distance and load to decide where to send a tuple. | Jira |
---|
server | ASF JIRA |
---|
serverId | 5aa69414-a9e9-3523-82ec-879b028fb15b |
---|
key | STORM-1328 |
---|
|
|
web-UI | Yes | Different | Does someone know a good UI designer that we can use? I don't really like either of them (Bobby) but that is just me | Jira |
---|
server | ASF JIRA |
---|
serverId | 5aa69414-a9e9-3523-82ec-879b028fb15b |
---|
key | STORM-1339 |
---|
|
|
metric system | IMetric and IMetricConsumer | - All levels of metrics, including stream metrics, task metrics, component metrics, topology metrics, even cluster metrics, are sampled & calculated. Some metrics, e.g. ""tuple life cycle"", are very useful for debugging and finding the hotspots of a topology.
- Support full metrics data. Previous metric system can only display mean value of meters/histograms, the new metric system can display m1, m5, m15 of meters, and common percentiles of histograms.
- Use new metrics windows, the mininum metric window is 1 minute, thus we can see the metrics data every single minute.
- Supplies a metric uploader interface, third-party companies can easily build their own metric systems based on the historic metric data.
| Ideally we should have a way to display most/system metrics in the UI. IMetric is too generic to make this happen, but we cannot completely drop support for it. But perhaps we need to depricate it if the JStorm metrics are much better. | Jira |
---|
server | ASF JIRA |
---|
serverId | 5aa69414-a9e9-3523-82ec-879b028fb15b |
---|
key | STORM-1329 |
---|
|
|
rebalance command | Basic functionality of rebalance | Besides rebalance, scale-out/in by updating the number of workers, ackers, spouts & bolts dynamically without stopping topology. Routing is updated dynamically within upstream components. | dynamic routing with some groupings is difficult to get right when there is state, we need to be sure this is well documented, and might want to disallow it for some groupings without a force flag. | Jira |
---|
server | ASF JIRA |
---|
serverId | 5aa69414-a9e9-3523-82ec-879b028fb15b |
---|
key | STORM-1330 |
---|
|
|
list command | List information of topologies | List information of all topologies, all supervisors, and JStorm version | more info is good, but we want it human readable too. perhaps with a machine readable option too | Jira |
---|
server | ASF JIRA |
---|
serverId | 5aa69414-a9e9-3523-82ec-879b028fb15b |
---|
key | STORM-1331 |
---|
|
|
zktool command | N/A | Supports some ZooKeeper operations, e.g. "list", "read"… | Will need to be evaluated for security | Jira |
---|
server | ASF JIRA |
---|
serverId | 5aa69414-a9e9-3523-82ec-879b028fb15b |
---|
key | STORM-1332 |
---|
|
|
metricsMonitor command | N/A | Allows toggling on/off some metrics which may impact topology performance | Sounds great | Jira |
---|
server | ASF JIRA |
---|
serverId | 5aa69414-a9e9-3523-82ec-879b028fb15b |
---|
key | STORM-1333 |
---|
|
|
restart command | N/A | Restart a topology. Besides restart, this command can also update the topology configuration. | Sounds great | Jira |
---|
server | ASF JIRA |
---|
serverId | 5aa69414-a9e9-3523-82ec-879b028fb15b |
---|
key | STORM-1334 |
---|
|
|
update_topology command | N/A | Update jars and configuration dynamically for a topology, without stopping the topology. | Sounds great, should work will with blob-storm | Jira |
---|
server | ASF JIRA |
---|
serverId | 5aa69414-a9e9-3523-82ec-879b028fb15b |
---|
key | STORM-1335 |
---|
|
|
cgroup | N/A | Supports controlling the upper limit of CPU core usage for a worker using cgroups | Sounds like a good start, will be nice to integrate it with RAS requests too. | Jira |
---|
server | ASF JIRA |
---|
serverId | 5aa69414-a9e9-3523-82ec-879b028fb15b |
---|
key | STORM-1336 |
---|
|
|
logging | - Supports user-defined log configuration via Log4j 2
- Supports dynamic changes to logging of a running topology
- Supports log4j and slf4j log APIS
| - Supports user-defined configuration of log
- Supports both logback and log4j
| Need to evaluate differences and see what we want long term | Jira |
---|
server | ASF JIRA |
---|
serverId | 5aa69414-a9e9-3523-82ec-879b028fb15b |
---|
key | STORM-1337 |
---|
|
|
Worker classloader (isolation) | N/A - uses shading for most dependencies | The "worker classloader" avoids problem of re-loading classes | This sounds great | Jira |
---|
server | ASF JIRA |
---|
serverId | 5aa69414-a9e9-3523-82ec-879b028fb15b |
---|
key | STORM-1338 |
---|
|
|
Multi-thread of spout | NA | There are two modes of spout, "single-thread" and "multi-thread" in JStorm. The "single-thread" mode is simliar to Storm while the "multi-thread" mode separates the processing of ack/fail and nextTuple to two threads. It means we can stay in nextTuple for a long time without any side effect on ack/fail. | This improves the response and throughput of spout | Jira |
---|
server | ASF JIRA |
---|
serverId | 5aa69414-a9e9-3523-82ec-879b028fb15b |
---|
key | STORM-1358 |
---|
|
|