Feature

Apache Storm 0.11.0-SNAPSHOT (and expected pull requests)

JStorm-2.1.0/JStorm2.0.4

Notes

JIRA to port JStorm feature

Scheduler

Even, Default, Isolation, Multi-tenant, RAS (work in progress)

Evenly distribute a component's tasks across the nodes in the cluster.
Balance the number of tasks in a worker.
Try to assign two tasks that transfer messages directly to each other into the same worker, to reduce network cost.
Support user-defined assignment and reuse of the result of the last assignment. Storm and JStorm solve this differently.

The scheduler interface is pluggable, so we should be able to support both schedulers if needed.

STORM-1320
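As a rough illustration of the "balance the number of tasks in a worker" goal in this row, the sketch below spreads task ids over workers round robin. The class `EvenAssignment` and its method are hypothetical names invented for this example; they are not part of the Storm or JStorm scheduler interfaces, and a real scheduler would also consider nodes, slots, and message locality.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not a Storm or JStorm class. */
final class EvenAssignment {
    /**
     * Assigns task ids 0..taskCount-1 to workers round robin,
     * so that worker sizes differ by at most one task.
     */
    static List<List<Integer>> assign(int taskCount, int workerCount) {
        if (workerCount <= 0) {
            throw new IllegalArgumentException("workerCount must be positive");
        }
        List<List<Integer>> workers = new ArrayList<>();
        for (int w = 0; w < workerCount; w++) {
            workers.add(new ArrayList<>());
        }
        for (int t = 0; t < taskCount; t++) {
            workers.get(t % workerCount).add(t);
        }
        return workers;
    }
}
```

With 7 tasks and 3 workers this yields workers of size 3, 2, and 2.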

Nimbus HA

Support for a pool of nimbus servers. Once Blobstore is merged in, "leader election" and "state storage" will be separate.

Support for configuring more than one backup nimbus. When the master nimbus is down, the most appropriate spare nimbus (topologies on disk most closely match the records in ZooKeeper) will be chosen to be promoted.

Need to evaluate the strengths and weaknesses of each and decide on updates to Storm, if any.

STORM-1321
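One possible reading of "topologies on disk most closely match the records in ZooKeeper" is to promote the backup that holds the largest number of the topologies recorded in ZooKeeper. The sketch below assumes that reading; `BackupNimbusPicker`, its parameters, and the tie-breaking rule are hypothetical and are not taken from JStorm's code.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; the matching rule is an assumption, not JStorm's actual logic. */
final class BackupNimbusPicker {
    /**
     * Returns the backup nimbus whose on-disk topology ids cover the most
     * topology ids recorded in ZooKeeper, or null if there are no candidates.
     * Ties go to the candidate encountered first in the map's iteration order.
     */
    static String pick(Map<String, Set<String>> topologiesOnDisk, Set<String> topologiesInZk) {
        String best = null;
        int bestMatches = -1;
        for (Map.Entry<String, Set<String>> candidate : topologiesOnDisk.entrySet()) {
            int matches = 0;
            for (String id : topologiesInZk) {
                if (candidate.getValue().contains(id)) {
                    matches++;
                }
            }
            if (matches > bestMatches) {
                best = candidate.getKey();
                bestMatches = matches;
            }
        }
        return best;
    }
}
```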

Topology Structure

worker 1->* executor 1->* task

worker 1-> task

Need to evaluate whether removing executors would benefit developers/performance enough that we can drop them from the architecture. We probably need resource-aware rebalancing, or JStorm rebalancing that can support changing parallelism, before this can happen.

STORM-1322

Topology Master

None exactly. The Heartbeat Server (currently under review) reduces load on ZK, but it is not really the same and may be a complement to TopologyMaster.

A new system bolt, "topology master", was added. It is responsible for collecting the heartbeat info of all tasks and reporting it to nimbus. Besides task heartbeat info, it can also be used to dispatch control messages within the topology. Topology master significantly reduces the amount of read/write traffic to ZooKeeper. Before this change, ZooKeeper was the bottleneck for deploying big clusters and topologies.

Need to evaluate how this impacts Storm's architecture, especially around security.

STORM-1323

Backpressure

Bang-Bang controller on Disruptor queue capacity using ZooKeeper for broadcast.

Implement backpressure using "topology master" (TM). TM is responsible for processing the trigger message and sending the flow control request to the relevant spouts. "Flow control" in JStorm doesn't completely stop the spout from emitting tuples, but instead just slows down the tuple sending.
Users can update the backpressure configuration dynamically without restarting the topology, e.g. enable/disable backpressure, high/low watermark, etc.

STORM-1324
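To make the "bang-bang controller" and "high/low watermark" terms in this row concrete, here is a minimal sketch of an on/off controller with hysteresis on queue size. `BangBangThrottle` and its method names are hypothetical; the sketch leaves out how the throttle signal is broadcast (ZooKeeper in Storm, the topology master in JStorm) and does not model JStorm's "slow down rather than stop" behaviour.

```java
/** Hypothetical sketch; names are illustrative, not Storm or JStorm APIs. */
final class BangBangThrottle {
    private final int lowWatermark;
    private final int highWatermark;
    private boolean throttled = false;

    BangBangThrottle(int lowWatermark, int highWatermark) {
        if (lowWatermark < 0 || lowWatermark >= highWatermark) {
            throw new IllegalArgumentException("need 0 <= low < high");
        }
        this.lowWatermark = lowWatermark;
        this.highWatermark = highWatermark;
    }

    /**
     * Feeds the current queue population into the controller.
     * Switches on at the high watermark and off at the low watermark;
     * between the two it keeps its previous state.
     */
    boolean update(int queueSize) {
        if (!throttled && queueSize >= highWatermark) {
            throttled = true;
        } else if (throttled && queueSize <= lowWatermark) {
            throttled = false;
        }
        return throttled;
    }
}
```

The gap between the two watermarks keeps the signal from flapping when the queue size hovers around a single threshold.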

Monitoring of task execute thread

Potential Pull Request, but none right now.

Monitors the status of the execute thread of tasks. It is an effective way to find a slow bolt in a topology, and can potentially uncover deadlocks as well.

Yes, we should do this.

STORM-1325

Message processing

Deserialization happens on the netty thread.

Serialization happens after the send queue, when batching is happening.

Add a receiving and a transferring queue/thread for each task to make deserialization and serialization asynchronous.
Remove the receiving and transferring threads at the worker level to avoid unnecessary locks and to shorten the message processing phase.

The two sound equivalent now, but we should talk to see if there are other optimizations needed.

STORM-1326

Tuple Batching

Batching in DisruptorQueue

Batches tuples before sending them to the transfer queue, and supports adjusting the batch size dynamically according to samples of the actual batch sizes sent out over past intervals.

Should evaluate both implementations, see which is better for performance, and decide whether we can/should move some of the dynamic batching logic into the Disruptor.

STORM-1327
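The row above does not say how the batch size is derived from the samples. As one plausible sketch, the class below keeps a sliding window of recently observed batch sizes and uses their rounded-up average, clamped to configured bounds, as the next batch size. `AdaptiveBatchSize`, the averaging rule, and all parameters are assumptions made for illustration, not JStorm's implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch; the averaging policy is an assumption, not JStorm's. */
final class AdaptiveBatchSize {
    private final int min;
    private final int max;
    private final int window;
    private final Deque<Integer> samples = new ArrayDeque<>();
    private long sum = 0;

    AdaptiveBatchSize(int min, int max, int window) {
        if (min < 1 || max < min || window < 1) {
            throw new IllegalArgumentException("need 1 <= min <= max and window >= 1");
        }
        this.min = min;
        this.max = max;
        this.window = window;
    }

    /** Records the size of a batch that was actually sent and returns the batch size to use next. */
    int record(int actualBatchSize) {
        samples.addLast(actualBatchSize);
        sum += actualBatchSize;
        if (samples.size() > window) {
            sum -= samples.removeFirst(); // drop the oldest sample from the window
        }
        int average = (int) Math.ceil((double) sum / samples.size());
        return Math.max(min, Math.min(max, average));
    }
}
```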

Grouping

Load aware balancing in shuffle grouping

Has a "localfirst" grouping that sends tuples to tasks in the same worker by default. If the load of all local tasks is high, tuples are sent to remote tasks instead.
Improves on Storm's localOrShuffle grouping. In Storm's localOrShuffle grouping, "local" means within the same worker process: if the current worker process contains a bolt that the component can send to, the tuples go there. If it does not, tuples are distributed round robin across all instances of that bolt, regardless of host. JStorm extends this so that other workers/JVMs on the same host are also considered "local", taking into account the load of the network connections on the local worker.

We should look at combining both of these to have shuffle look at both distance and load to decide where to send a tuple.

STORM-1328
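The idea of having shuffle "look at both distance and load" could be sketched roughly as follows: prefer the least-loaded local task, and fall back to the least-loaded remote task only when every local task is at or above a load threshold. `LoadAwareLocalFirst`, the load map, and the threshold are hypothetical; neither project's grouping API is used here, and task ids are assumed to be non-negative.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch combining locality and load; not a Storm or JStorm grouping. */
final class LoadAwareLocalFirst {
    /**
     * Returns the chosen target task id, or -1 if there are no tasks at all.
     * A task missing from the load map is treated as having load 0.0.
     */
    static int choose(List<Integer> localTasks, List<Integer> remoteTasks,
                      Map<Integer, Double> load, double highLoad) {
        int bestLocal = leastLoaded(localTasks, load);
        if (bestLocal >= 0 && load.getOrDefault(bestLocal, 0.0) < highLoad) {
            return bestLocal; // some local task still has headroom
        }
        int bestRemote = leastLoaded(remoteTasks, load);
        return bestRemote >= 0 ? bestRemote : bestLocal;
    }

    private static int leastLoaded(List<Integer> tasks, Map<Integer, Double> load) {
        int best = -1;
        double bestLoad = Double.MAX_VALUE;
        for (int task : tasks) {
            double taskLoad = load.getOrDefault(task, 0.0);
            if (taskLoad < bestLoad) {
                best = task;
                bestLoad = taskLoad;
            }
        }
        return best;
    }
}
```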

web-UI

Yes

Different

Does someone know a good UI designer that we can use? I don't really like either of them (Bobby) but that is just me

STORM-1339

metric system

IMetric and IMetricConsumer

All levels of metrics, including stream metrics, task metrics, component metrics, topology metrics, and even cluster metrics, are sampled and calculated. Some metrics, e.g. "tuple life cycle", are very useful for debugging and finding the hotspots of a topology.
Supports full metrics data. The previous metric system could only display the mean value of meters/histograms; the new metric system can display m1, m5, and m15 of meters, and common percentiles of histograms.
Uses new metric windows. The minimum metric window is 1 minute, so metrics data can be seen for every single minute.
Supplies a metric uploader interface, so third parties can easily build their own metric systems based on the historical metric data.

Ideally we should have a way to display most/system metrics in the UI. IMetric is too generic to make this happen, but we cannot completely drop support for it. Perhaps we need to deprecate it if the JStorm metrics are much better.

STORM-1329
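The m1, m5, and m15 values of a meter are commonly computed as exponentially weighted moving averages of the event rate over 1, 5, and 15 minutes. The sketch below shows that general technique; whether JStorm computes them exactly this way is an assumption, and `EwmaRate` with its parameters is a hypothetical class written for this example.

```java
/** Hypothetical sketch of an exponentially weighted moving average rate; not a Storm or JStorm class. */
final class EwmaRate {
    private final double alpha;
    private final double tickSeconds;
    private long uncounted = 0;
    private double ratePerSecond = 0.0;
    private boolean initialized = false;

    /** windowMinutes is 1, 5, or 15 for m1, m5, m15; tickSeconds is how often tick() is called. */
    EwmaRate(double windowMinutes, double tickSeconds) {
        this.tickSeconds = tickSeconds;
        this.alpha = 1.0 - Math.exp(-tickSeconds / (windowMinutes * 60.0));
    }

    /** Counts n events since the last tick. */
    void mark(long n) {
        uncounted += n;
    }

    /** Folds the events seen since the last tick into the moving average. */
    void tick() {
        double instantRate = uncounted / tickSeconds;
        uncounted = 0;
        if (initialized) {
            ratePerSecond += alpha * (instantRate - ratePerSecond);
        } else {
            ratePerSecond = instantRate; // the first tick seeds the average
            initialized = true;
        }
    }

    double rate() {
        return ratePerSecond;
    }
}
```

A longer window gives a smaller `alpha`, so m15 reacts to bursts more slowly than m1.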

rebalance command

Basic functionality of rebalance

Besides rebalance, supports scale-out/in by updating the number of workers, ackers, spouts, and bolts dynamically without stopping the topology. Routing is updated dynamically within upstream components.

Dynamic routing with some groupings is difficult to get right when there is state. We need to be sure this is well documented, and might want to disallow it for some groupings without a force flag.

STORM-1330

list command

List information of topologies

List information of all topologies, all supervisors, and JStorm version

More info is good, but we want it to be human readable too, perhaps with a machine-readable option as well.

STORM-1331

zktool command

N/A

Supports some ZooKeeper operations, e.g. "list", "read"…

Will need to be evaluated for security

STORM-1332

metricsMonitor command

N/A

Allows toggling on/off some metrics which may impact topology performance

Sounds great

STORM-1333

restart command

N/A

Restart a topology. Besides restart, this command can also update the topology configuration.

Sounds great

STORM-1334

update_topology command

N/A

Update jars and configuration dynamically for a topology, without stopping the topology.

Sounds great, should work well with the blobstore.

STORM-1335

cgroup

N/A

Supports controlling the upper limit of CPU core usage for a worker using cgroups

Sounds like a good start, will be nice to integrate it with RAS requests too.

STORM-1336

logging

Supports user-defined log configuration via Log4j 2
Supports dynamic changes to logging of a running topology
Supports log4j and slf4j log APIs

Supports user-defined log configuration
Supports both logback and log4j

Need to evaluate differences and see what we want long term

STORM-1337

Worker classloader (isolation)

N/A - uses shading for most dependencies

The "worker classloader" avoids the problem of reloading classes

This sounds great

STORM-1338
