A. Design doc: adding jars and maven artifacts at submission

Note

This document borrows its format from KIP (Kafka Improvement Proposals). Please let me know if we would prefer to rearrange it into another format.

Motivation

To use “storm sql”, users currently have to find the dependencies themselves and add them to extlib, because “storm sql” only compiles SQL into classes and packages just those classes. Finding dependencies manually is already painful, and putting them in extlib also affects the classpath of every other worker, which can cause library version conflicts. The Redis-backed State feature has the same problem.

Thanks to the blobstore feature, and since the Zeppelin and Spark projects already handle this, I think we can provide a new feature: adding jars, and also Maven artifacts (with their transitive dependencies), at submission time, and launching workers with them. This design doc describes how to implement this feature in Storm.

Public Interfaces

“storm jar” and “storm sql” will each get two new options for this, both optional, and StormTopology will get an optional field ‘dependencies’.

Because every addition is optional, this change keeps backward compatibility.

Proposed Change

“storm jar” and “storm sql” will handle two options: “--jars” and “--packages”. Both options serve the same objective, adding jars to the worker classpath. The table below describes how each option works.

Option	Behavior
--jars	Adds local jar files.
--packages	Adds Maven artifacts along with their transitive dependencies.
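As a minimal sketch of how a submitter might pick these two options out of the command line, assuming (as Spark's spark-submit does) that each option takes a comma-separated list. The class DependencyOptions and its fields are hypothetical names for illustration, not part of Storm; the real “storm” command-line wrapper may parse arguments differently.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical holder for the two optional submission options.
 * The comma-separated value format is an assumption for illustration.
 */
class DependencyOptions {
    final List<String> jars = new ArrayList<>();      // local jar paths from --jars
    final List<String> packages = new ArrayList<>();  // Maven coordinates from --packages
    final List<String> remaining = new ArrayList<>(); // all other arguments, passed through unchanged

    /** Splits a comma-separated option value, trimming entries and dropping blanks. */
    private static List<String> splitList(String value) {
        List<String> out = new ArrayList<>();
        for (String s : value.split(",")) {
            String t = s.trim();
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /** Extracts --jars and --packages (both optional) and keeps every other argument in order. */
    static DependencyOptions parse(String[] args) {
        DependencyOptions opts = new DependencyOptions();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            boolean isOption = arg.equals("--jars") || arg.equals("--packages");
            if (isOption && i + 1 < args.length) {
                List<String> values = splitList(args[++i]); // consume the option's value
                if (arg.equals("--jars")) {
                    opts.jars.addAll(values);
                } else {
                    opts.packages.addAll(values);
                }
            } else {
                // Not one of our options (or an option with no value): leave it untouched.
                opts.remaining.add(arg);
            }
        }
        return opts;
    }
}
```

Arguments other than the two options are passed through unchanged, so existing “storm jar” invocations that use neither option behave as before.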

If “--packages” is specified, the submitter will resolve and download the jars of the requested artifacts and their transitive dependencies. For handling Maven artifacts we can use Eclipse Aether; Zeppelin already has the relevant logic, so we can borrow it. (I have heard that reusing code between ASF projects is possible.)
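Eclipse Aether performs the actual resolution (reading POMs from remote repositories, mediating version conflicts, applying scopes and exclusions). The sketch below only illustrates the core idea of “transitive dependencies” under a simplifying assumption: the dependency graph is given as an in-memory map instead of being read from POMs. TransitiveResolver is a hypothetical name; repositoryPath follows the standard Maven repository layout for a groupId:artifactId:version coordinate.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Illustrative stand-in for what Aether does for --packages: walk the
 * dependency graph from the requested artifacts and collect every
 * transitive dependency. The in-memory map replaces real POM lookups.
 */
class TransitiveResolver {
    /** Parses "groupId:artifactId:version" and returns its path in a Maven-layout repository. */
    static String repositoryPath(String coordinate) {
        String[] parts = coordinate.split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected groupId:artifactId:version, got " + coordinate);
        }
        String groupId = parts[0], artifactId = parts[1], version = parts[2];
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version + "/"
                + artifactId + "-" + version + ".jar";
    }

    /** Breadth-first closure over direct dependencies; each artifact is collected once. */
    static Set<String> resolve(List<String> roots, Map<String, List<String>> directDeps) {
        Set<String> resolved = new LinkedHashSet<>(); // keeps discovery order
        Deque<String> queue = new ArrayDeque<>(roots);
        while (!queue.isEmpty()) {
            String artifact = queue.poll();
            if (resolved.add(artifact)) { // false if already seen, so cycles terminate
                queue.addAll(directDeps.getOrDefault(artifact, Collections.emptyList()));
            }
        }
        return resolved;
    }
}
```

Because this walk ignores version mediation, two versions of the same artifact would both be kept here; Aether resolves such conflicts, which is one reason to reuse it rather than hand-roll resolution.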

After downloading all jars locally, the submitter will upload them to the blobstore so that they are available to the Storm cluster. To record which jars were added, StormTopology will have an optional field ‘dependencies’ (or ‘jars’), a List<String> of blobstore keys. When either option is specified, the submitter should add the blobstore keys of the uploaded jars to ‘dependencies’.
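The upload step could look roughly like the following. SimpleBlobStore and InMemoryBlobStore are hypothetical stand-ins rather than Storm's blobstore client API, and deriving each key from the jar's SHA-256 digest is an assumed naming scheme; the document only requires that the resulting keys end up in ‘dependencies’.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical minimal blob store interface; Storm's real client API differs. */
interface SimpleBlobStore {
    boolean exists(String key);
    void put(String key, byte[] data);
}

/** In-memory implementation, used only to make this sketch runnable. */
class InMemoryBlobStore implements SimpleBlobStore {
    final Map<String, byte[]> blobs = new HashMap<>();
    public boolean exists(String key) { return blobs.containsKey(key); }
    public void put(String key, byte[] data) { blobs.put(key, data.clone()); }
}

class DependencyUploader {
    /** Derives a key from the jar's SHA-256 digest plus its file name (assumed scheme). */
    static String keyFor(String fileName, byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            return "dependency-" + hex + "-" + fileName;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every JVM", e);
        }
    }

    /** Uploads each jar if absent and returns the keys to store in StormTopology's 'dependencies'. */
    static List<String> uploadAll(Map<String, byte[]> jars, SimpleBlobStore store) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, byte[]> jar : jars.entrySet()) {
            String key = keyFor(jar.getKey(), jar.getValue());
            if (!store.exists(key)) {
                store.put(key, jar.getValue());
            }
            keys.add(key);
        }
        return keys;
    }
}
```

A content-derived key makes re-uploading an identical jar a no-op, which may be convenient when several topologies share dependencies; whether Storm should name keys this way is an open design choice.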

Submitting a topology to Nimbus stays the same, but the way the supervisor launches workers will change. When the supervisor downloads the code of assigned topologies, it will also download the dependency jars from the blobstore into the topology code directory. If possible, these jars will be checked, just like the topology code, before launching workers, so that a worker is not launched at all (rather than launched and then crashing) when some jars are missing. Finally, the supervisor launches the worker with the downloaded jars added to the classpath. (We could even make the placement configurable: ‘before storm-core and libs’, or the normal way, ‘after storm-core and libs’.)
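A sketch of the two supervisor-side steps: checking for missing jars before launch, and building the classpath with the optional ordering. The class WorkerClasspath, the assumption that each blob is saved as <code dir>/<blob key>, and placing the topology jar last are all illustrative choices, not Storm's actual supervisor code.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical supervisor-side helpers; names and directory layout are assumptions. */
class WorkerClasspath {
    /** Returns the dependency keys whose jars are not present in the topology code directory. */
    static List<String> missingDependencies(Path codeDir, List<String> dependencyKeys) {
        List<String> missing = new ArrayList<>();
        for (String key : dependencyKeys) {
            // Assumes each blob was downloaded to <codeDir>/<blob key>.
            if (!Files.isRegularFile(codeDir.resolve(key))) {
                missing.add(key);
            }
        }
        return missing;
    }

    /**
     * Joins classpath entries, placing dependency jars either before or after
     * storm-core and its libs. The topology jar is appended last in this sketch.
     */
    static String build(List<String> stormLibs, String topologyJar,
                        List<String> dependencyJars, boolean dependenciesFirst) {
        List<String> entries = new ArrayList<>();
        if (dependenciesFirst) {
            entries.addAll(dependencyJars);
            entries.addAll(stormLibs);
        } else {
            entries.addAll(stormLibs);
            entries.addAll(dependencyJars);
        }
        entries.add(topologyJar);
        return String.join(File.pathSeparator, entries); // ':' on Unix, ';' on Windows
    }
}
```

If missingDependencies returns a non-empty list, the supervisor would hold off launching the worker instead of starting one that would fail on a missing class.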

Rejected Alternatives

I considered “worker bootstrapping” as a way of launching workers, but it feels like over-engineering to me, so I chose to handle this on the supervisor side.

Let me explain “worker bootstrapping” briefly. In this approach, the worker is launched with only bootstrap code and a mutable classloader; the bootstrap code loads storm-core and the relevant libs, downloads the jars from the blobstore, loads those as well, and finally invokes the worker's main entry point. This is complicated because it requires three phases of classloading, and we need to load the needed classes dynamically (e.g. via Class.forName) and invoke methods reflectively. The upside of this approach is that it doesn't add any workload to the supervisor.
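For comparison, the core mechanics of the rejected approach might look like this: a URLClassLoader subclass whose search path can grow after construction, and a reflective call of the worker's main entry point through Class.forName and Method.invoke. MutableClassLoader, WorkerBootstrap and DemoWorkerMain are hypothetical names, and the download step between the classloading phases (which would itself need storm-core classes called reflectively) is omitted.

```java
import java.lang.reflect.Method;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;

/** A class loader whose search path can grow after construction. */
class MutableClassLoader extends URLClassLoader {
    MutableClassLoader(ClassLoader parent) {
        super(new URL[0], parent);
    }

    /** Exposes the protected URLClassLoader#addURL so each phase can append jars. */
    void addJar(Path jar) throws MalformedURLException {
        addURL(jar.toUri().toURL());
    }
}

class WorkerBootstrap {
    /** Loads the entry class through the loader and invokes its static main(String[]) reflectively. */
    static void launch(MutableClassLoader loader, String mainClass, String[] args) throws Exception {
        Class<?> entry = Class.forName(mainClass, true, loader);
        Method main = entry.getMethod("main", String[].class);
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader); // libraries that use the context loader see added jars
        try {
            // Cast to Object so the array is passed as one argument, not spread as varargs.
            main.invoke(null, (Object) args);
        } finally {
            current.setContextClassLoader(previous);
        }
    }
}

/** Demo stand-in for the real worker entry class; it just records its arguments. */
class DemoWorkerMain {
    static String[] lastArgs;

    public static void main(String[] args) {
        lastArgs = args;
    }
}
```

Even this reduced version shows why the approach felt heavy: every call into storm-core before the final launch would have to go through reflection the way launch does.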

JIRAs

STORM-2016
