Generating the Samza API whitelist
In order to load the Samza API classes from the API classloader, we need to tell cytodynamics what those classes are. We can do this by providing a whitelist of packages/classes when building the cytodynamics classloader. Every public interface or class in samza-api should be considered an API class. One way to generate this whitelist is to use a Gradle task that finds all the classes in samza-api and writes that list to a file. Samza can then read that file when constructing the cytodynamics classloader. The Gradle task should also include classes from samza-kv.
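As an illustration of the consuming side, the sketch below reads such a whitelist and answers whether a class name is an API class. The file format (one fully qualified class name per line, with '#' comments) and the name ApiWhitelist are assumptions made for this example; how the result is handed to cytodynamics is not shown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: decides whether a class name is covered by the API whitelist. */
final class ApiWhitelist {
  private final Set<String> classNames = new HashSet<>();

  /** Assumed format: one fully qualified class name per line; blank lines and '#' comments are skipped. */
  ApiWhitelist(List<String> lines) {
    for (String line : lines) {
      String entry = line.trim();
      if (!entry.isEmpty() && !entry.startsWith("#")) {
        classNames.add(entry);
      }
    }
  }

  /** Reads the whitelist file that the Gradle task is assumed to have written. */
  static ApiWhitelist fromFile(Path whitelistFile) {
    try {
      return new ApiWhitelist(Files.readAllLines(whitelistFile));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  boolean isApiClass(String className) {
    // A nested class (Outer$Inner) is treated as part of its outer class.
    int dollar = className.indexOf('$');
    String outer = dollar < 0 ? className : className.substring(0, dollar);
    return classNames.contains(outer);
  }
}
```

Listing classes rather than package prefixes keeps the check exact, at the cost of regenerating the file whenever samza-api or samza-kv changes.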
The SamzaApplication.describe method needs to be able to delegate to the framework infrastructure classloader for certain concrete descriptor components (e.g. descriptors provided by the framework). Therefore, the application classloader from above can't be used, since it does not delegate to the infrastructure classloader. We will build an additional classloader just to load the SamzaApplication, and everything else will delegate to the existing infrastructure classloader. The infrastructure classloader still might delegate to the original application classloader, and that is good, because we want classes from the original application classloader to do the container processing (e.g. system descriptors, table functions). The framework descriptor components will be added to the framework API whitelist, which is checked when loading classes in the application classloader, so that the application classloader delegates to the framework API classloader for framework descriptors. The descriptors are used to generate configs through the descriptor API classes, so concrete framework descriptors and custom descriptors will both work.
Table functions get serialized into configs by the table descriptors that they are contained in. They only need to be deserialized for processing logic, so the job coordinator does not need to deserialize them. On the processing containers, they can get deserialized using the framework infrastructure classloader, so that they can access application classes (e.g. schemas) if necessary. The infrastructure classloader will not delegate to the API classloader for the concrete descriptors.
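To make the deserialization step concrete, here is a minimal sketch that resolves classes through a caller-supplied classloader while reading a serialized object. It assumes Java serialization with Base64 encoding as the config representation; that encoding and the name TableFunctionSerde are assumptions for illustration, not something this document specifies.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Base64;

/** Hypothetical serde: stores a function as a config string and restores it with a chosen classloader. */
final class TableFunctionSerde {
  private TableFunctionSerde() {}

  static String toConfigValue(Serializable function) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(function);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return Base64.getEncoder().encodeToString(bytes.toByteArray());
  }

  static Object fromConfigValue(String value, ClassLoader loader) {
    byte[] raw = Base64.getDecoder().decode(value);
    try (ObjectInputStream in = new LoaderAwareInputStream(new ByteArrayInputStream(raw), loader)) {
      return in.readObject();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException("Class not visible to the given classloader", e);
    }
  }

  /** Resolves every class in the stream through the supplied loader instead of the caller's loader. */
  private static final class LoaderAwareInputStream extends ObjectInputStream {
    private final ClassLoader loader;

    LoaderAwareInputStream(InputStream in, ClassLoader loader) throws IOException {
      super(in);
      this.loader = loader;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc) throws IOException, ClassNotFoundException {
      try {
        return Class.forName(desc.getName(), false, loader);
      } catch (ClassNotFoundException e) {
        return super.resolveClass(desc); // fallback covers primitive types such as int
      }
    }
  }
}
```

On a processing container, the loader argument would be the framework infrastructure classloader, so that application classes referenced by the function can be resolved through its delegation.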
Flow for loading a class from the additional application classloader for SamzaApplication.describe:
- If a class is a framework API class, load it from the framework API classloader.
- If a class is a framework descriptor class, load it from the framework API classloader.
- If a class is the implementation of SamzaApplication specified by "app.class", then load it from this describe classloader.
- Otherwise, load the class from the infrastructure classloader. The infrastructure classloader might do further delegation to other classloaders (e.g. the application classpath).
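The flow above can be sketched as a classloader with explicit delegation. This is a plain-Java illustration rather than the cytodynamics API: the class name DescribeClassLoader, its constructor parameters, and the use of two whitelist sets are all assumptions made for the example.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Set;

/** Hypothetical classloader used only for SamzaApplication.describe. */
final class DescribeClassLoader extends URLClassLoader {
  enum Source { FRAMEWORK_API, DESCRIBE, INFRASTRUCTURE }

  private final ClassLoader frameworkApiLoader;
  private final ClassLoader infrastructureLoader;
  private final Set<String> apiClasses;        // framework API whitelist
  private final Set<String> descriptorClasses; // framework descriptor classes
  private final String appClassName;           // value of "app.class"

  DescribeClassLoader(URL[] appUrls, ClassLoader frameworkApiLoader, ClassLoader infrastructureLoader,
      Set<String> apiClasses, Set<String> descriptorClasses, String appClassName) {
    super(appUrls, null); // no parent: all delegation is decided in loadClass
    this.frameworkApiLoader = frameworkApiLoader;
    this.infrastructureLoader = infrastructureLoader;
    this.apiClasses = apiClasses;
    this.descriptorClasses = descriptorClasses;
    this.appClassName = appClassName;
  }

  /** Mirrors the flow: API class, then framework descriptor, then app.class, then everything else. */
  Source sourceFor(String className) {
    if (apiClasses.contains(className)) {
      return Source.FRAMEWORK_API;
    }
    if (descriptorClasses.contains(className)) {
      return Source.FRAMEWORK_API;
    }
    if (className.equals(appClassName)) {
      return Source.DESCRIBE;
    }
    return Source.INFRASTRUCTURE;
  }

  @Override
  protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
    synchronized (getClassLoadingLock(name)) {
      Class<?> loaded = findLoadedClass(name);
      if (loaded == null) {
        switch (sourceFor(name)) {
          case FRAMEWORK_API:
            loaded = frameworkApiLoader.loadClass(name);
            break;
          case DESCRIBE:
            loaded = findClass(name); // defined by this loader from appUrls
            break;
          default:
            loaded = infrastructureLoader.loadClass(name);
        }
      }
      if (resolve) {
        resolveClass(loaded);
      }
      return loaded;
    }
  }
}
```

Because only the "app.class" implementation is defined by this loader, every class it references is requested through the same loadClass method and is routed by the same rules.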
By using the special classloader to instantiate the "main" class, any dependencies will then be loaded using that classloader. Then Java will automatically propagate the special classloader through the rest of Samza. We can modify the "main" method to use reflection to load the "main" class and then trigger the actual Samza startup.
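A minimal sketch of that reflective hand-off follows. IsolatingLauncher and DemoMain are hypothetical names; DemoMain stands in for the real Samza entry point. Setting the thread context classloader is an extra step assumed here for libraries that consult it, and is not something this document requires.

```java
import java.lang.reflect.Method;

/** Hypothetical launcher: loads the real "main" class through a given classloader and starts it. */
final class IsolatingLauncher {
  private IsolatingLauncher() {}

  static void launch(ClassLoader loader, String mainClassName, String[] args) {
    Thread current = Thread.currentThread();
    ClassLoader previous = current.getContextClassLoader();
    current.setContextClassLoader(loader); // assumption: helps code that reads the context classloader
    try {
      // Classes referenced by mainClass are then resolved through the same loader by the JVM.
      Class<?> mainClass = Class.forName(mainClassName, true, loader);
      Method main = mainClass.getMethod("main", String[].class);
      main.invoke(null, (Object) args); // cast so the array is passed as one argument
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Could not start " + mainClassName, e);
    } finally {
      current.setContextClassLoader(previous);
    }
  }
}

/** Stand-in for the actual startup class; records its first argument so the call can be observed. */
final class DemoMain {
  static volatile String lastArg;

  public static void main(String[] args) {
    lastArg = args.length > 0 ? args[0] : null;
  }
}
```

A call such as IsolatingLauncher.launch(specialLoader, "fully.qualified.MainClass", args) would replace the direct invocation in the existing "main" method, where specialLoader and the class name are placeholders.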