Ignite OSGi enablement involves at least two tasks:
For the inter-package dependencies (package export/import) and other bundle metadata, OSGi relies on a number of special entries in the jar's META-INF/MANIFEST.MF file. Those entries are usually generated semi-automatically by the Maven Bundle plugin during a Maven build. The OSGi Framework will refuse to load a jar without a valid OSGi manifest defined.
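As a rough illustration of the metadata involved, the sketch below parses manifest text with the JDK's java.util.jar.Manifest and checks for the Bundle-SymbolicName header. The class name OsgiManifestCheck and the header values in SAMPLE are made up for this example; in a real build the values would come from the Maven Bundle plugin.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Manifest;

final class OsgiManifestCheck {
    /** Illustrative manifest text; the header values are made up for this sketch. */
    static final String SAMPLE =
        "Manifest-Version: 1.0\n" +
        "Bundle-ManifestVersion: 2\n" +
        "Bundle-SymbolicName: org.example.ignite-core\n" +
        "Bundle-Version: 1.5.0\n" +
        "Export-Package: org.example.ignite;version=\"1.5.0\"\n";

    /** Parses manifest text into the JDK's Manifest representation. */
    static Manifest parse(String text) {
        try {
            return new Manifest(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True if the main section carries the header that identifies a bundle. */
    static boolean looksLikeBundle(Manifest mf) {
        return mf.getMainAttributes().getValue("Bundle-SymbolicName") != null;
    }
}
```

A jar whose manifest only contains Manifest-Version would fail this check, which mirrors (in a very simplified way) why a plain jar is not accepted as a bundle.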
Since Ignite's distribution includes a number of optional dependencies (such as ignite-indexing, ignite-log4j, etc.) in addition to the non-optional ignite-core, the proposal is to make the optional dependencies bundle fragments hosted by the ignite-core bundle. The fact that fragments share the class loader of the host bundle should simplify interoperation between the components.
At runtime the Ignite user installs all required bundles (including ignite-core, any optional Ignite bundles, as well as the application bundles) using either the standard mechanisms defined by the OSGi spec, or relying on the container's implementation-specific capabilities. For example, Apache Karaf (an OSGi implementation) offers a packaging/deployment concept called "Feature" which, roughly speaking, is a list of bundles to automatically deploy when the OSGi Framework starts.
[raul.kripalani]: I've already modified the POMs to generate and package the MANIFEST.MF with the appropriate OSGi headers. The code is pushed to the ignite-1527 branch.
We should provide a Feature Repository to make it easier for Apache Karaf users to install Ignite and, optionally, Ignite modules. There should be one feature per module that also installs all necessary library dependencies.
The main problem we need to solve in order to allow Ignite OSGi enablement is the marshalling. More specifically the issue is with deserialization of the classes that are provided by the bundles other than the JDK and the Ignite bundle itself.
When the Ignite transport layer receives a message it needs to figure out how to deserialize the bytes and for that it needs to know the bundle that provides the class to be deserialized. To make things more complex, the class may contain other classes that come from other bundles, and so on recursively. In general, what is needed then is a way to map an FQN of a class to its bundle (and hence to the class loader).
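The problem can be reproduced with plain JDK class loaders, without any OSGi container. The sketch below (FqnResolution is a hypothetical name) resolves the same FQN against two loaders: one that can see the class and an isolated one that delegates only to the bootstrap loader. The isolated loader plays the role of a bundle that does not import the package.

```java
final class FqnResolution {
    /** Returns true if the given loader can resolve the class name. */
    static boolean canLoad(String fqn, ClassLoader ldr) {
        try {
            // 'false' avoids running static initializers; we only test visibility.
            Class.forName(fqn, false, ldr);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** A loader with no parent: it sees JDK bootstrap classes only. */
    static ClassLoader isolatedLoader() {
        return new ClassLoader(null) {};
    }
}
```

The FQN is identical in both calls; only the choice of class loader decides whether the lookup succeeds, which is why the deserializer needs more than the class name.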
And this is where ClassLoaderCodec comes into play.
At a high level, the proposal is as follows:
During serialization, Ignite calls ClassLoaderCodec.encodeClassLoader(cls) with the class to be serialized as its only parameter. The implementation of the method may return an arbitrary object that in some way (up to the implementation) represents the class loader of the class. The encoded representation of the class loader will be serialized along with the rest of the message data. The returned object may be a primitive, a serializable POJO, or null.
During deserialization, the encoded class loader (as returned by the encodeClassLoader() call during serialization) as well as the FQN of the class being deserialized are passed into the ClassLoaderCodec.decodeClassLoader(fqn, encodedClassLoader) method. The implementation of the method is expected to decode and return an instance of the class loader to use for loading the class with the given FQN. It is the responsibility of the implementation to ensure that the encoded representation is sufficient to unambiguously identify the correct bundle during deserialization.
[raul.kripalani]: The naming is confusing. We are actually not transmitting classloaders. In fact, we cannot do so. What we'd like to do is transmit deserialisation "hints", that are used in whatever form the marshaller deems appropriate. So if anything, I would call this class a DeserialisationHintsCodec with methods: generateHints and computeClassLoaderFromHints.
[dmitriy setrakyan]: I am not sure I see the reason for removing the word classLoader on serialization part and keeping it on deserialization. I also think that the method names should be symmetric. With that in mind, "encodeClassLoader" and "decodeClassLoader" may not be the best names, but they are consistent with each other and symmetric. My vote would be to keep the naming.
The ClassLoaderCodec should be called for every Object during serialization and deserialization and should be part of the IgniteConfiguration:
    public interface ClassLoaderCodec {
        @Nullable public Object encodeClassLoader(Class<?> cls, ClassLoader clsLdr) throws IgniteException;

        public ClassLoader decodeClassLoader(String fqn, @Nullable Object encodedClsLdr) throws IgniteException;
    }
[raul.kripalani]: See my comment above.
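To make the call sequence concrete, here is a minimal sketch of how a marshaller might drive such a codec. It assumes a trimmed stand-in for the proposed interface (no Ignite types, so it compiles on its own); CodecRoundTrip, writeClassRef, readClassRef and contextCodec are hypothetical names, and the wire layout (class name followed by the hint) is only one possible choice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

/** Stand-in for the proposed interface, trimmed to compile without Ignite. */
interface ClassLoaderCodec {
    Object encodeClassLoader(Class<?> cls, ClassLoader clsLdr);
    ClassLoader decodeClassLoader(String fqn, Object encodedClsLdr);
}

final class CodecRoundTrip {
    /** Serializing side: writes the FQN followed by whatever hint the codec returns. */
    static byte[] writeClassRef(Class<?> cls, ClassLoaderCodec codec) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeUTF(cls.getName());
                out.writeObject(codec.encodeClassLoader(cls, cls.getClassLoader()));
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Deserializing side: reads FQN and hint, asks the codec for a loader, loads the class. */
    static Class<?> readClassRef(byte[] data, ClassLoaderCodec codec) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            String fqn = in.readUTF();
            Object hint = in.readObject();
            return Class.forName(fqn, false, codec.decodeClassLoader(fqn, hint));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Trivial codec for the demo: no hint on the wire, always decodes to this class's loader. */
    static ClassLoaderCodec contextCodec() {
        return new ClassLoaderCodec() {
            @Override public Object encodeClassLoader(Class<?> cls, ClassLoader clsLdr) {
                return null;
            }

            @Override public ClassLoader decodeClassLoader(String fqn, Object encodedClsLdr) {
                return CodecRoundTrip.class.getClassLoader();
            }
        };
    }
}
```

The point of the sketch is only the ordering: the hint travels next to the class name, and the decode call happens before any class is loaded.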
Ignite will come with 2 OSGi class loader codecs out of the box, pessimistic and optimistic, leaving users with the opportunity to provide their own custom class loader codecs as well (potentially for non-OSGi environments).
In general in OSGi, the same package may be exported by multiple bundles, and therefore an FQN may not be sufficient to look up the correct class loader. In such cases, the codec implementation must employ a pessimistic approach and encode enough information (for example, the bundle symbolic name plus the bundle version) for the deserializer to be able to resolve the FQN to the correct class loader. Such an implementation will work for all use cases, but it introduces some overhead and increases the size of the serialized messages.
However, for applications that can enforce a one-to-one mapping of packages to bundles, a simplified (optimistic) approach can be used instead. With this approach, no encoding of the class loader is required (encodeClassLoader() returns null), and only the FQN is used for decoding of the class loader.
[raul.kripalani]: I don't like transmitting bundle symbolic names over the wire, as it couples the serialising party with the deserialising party, forcing both to contain the class inside the same bundle. As I said in the mailing list, making this assumption would be a short-sighted strategy, as users may be sharing caches across applications across multiple containers, where classes live in different bundles in different containers.
I also don't think it's necessary. We just need the package name + package version. An OSGi container cannot expose the same package under the same version number twice, so the tuple (package name, package version) is enough to unambiguously locate the Bundle that exports our class.
Now, what we need to do is determine HOW we locate the Bundle. I have two ideas in mind:
With either of these approaches, I think we don't need pessimistic and/or optimistic strategies. Just a single strategy would be enough.
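One way to picture the (package name, package version) idea is a lookup table on the deserializing side, keyed by that tuple. The sketch below is an assumption-laden illustration, not the proposed implementation: PackageHint and PackageLoaderRegistry are hypothetical names, and in a real container the register() calls would be driven by whatever mechanism discovers exported packages.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical hint: the package of a class plus the exported package version. */
final class PackageHint {
    final String pkg;
    final String version;

    PackageHint(String pkg, String version) {
        this.pkg = pkg;
        this.version = version;
    }

    /** Package part of an FQN; nested classes use '$', so the last '.' is the boundary. */
    static String packageOf(String fqn) {
        int dot = fqn.lastIndexOf('.');
        return dot < 0 ? "" : fqn.substring(0, dot);
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof PackageHint)) return false;
        PackageHint other = (PackageHint) o;
        return pkg.equals(other.pkg) && version.equals(other.version);
    }

    @Override public int hashCode() {
        return Objects.hash(pkg, version);
    }
}

/** Deserializing-side table from (package, version) to the loader that exports it. */
final class PackageLoaderRegistry {
    private final ConcurrentMap<PackageHint, ClassLoader> loaders = new ConcurrentHashMap<>();

    void register(String pkg, String version, ClassLoader ldr) {
        loaders.put(new PackageHint(pkg, version), ldr);
    }

    ClassLoader resolve(String fqn, String version) {
        String pkg = PackageHint.packageOf(fqn);
        ClassLoader ldr = loaders.get(new PackageHint(pkg, version));
        if (ldr == null)
            throw new IllegalStateException("No exporter registered for " + pkg + ";version=" + version);
        return ldr;
    }
}
```

Because the key carries no bundle symbolic name, two containers could package the same class in differently named bundles and still resolve it, which is the decoupling argued for above.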
Here's what the pessimistic codec implementation might look like (in pseudo-code):
    public class ClassLoaderPessimisticCodec implements ClassLoaderCodec {
        public ClassLoaderPessimisticCodec() {}

        @Nullable public Object encodeClassLoader(Class<?> cls, ClassLoader clsLdr) throws IgniteException {
            // TODO
            return bundleName + bundleVersion;
        }

        public ClassLoader decodeClassLoader(String fqn, @Nullable Object encodedClsLdr) throws IgniteException {
            // TODO: get class loader for a bundle based on bundleName and version.
            ...
        }
    }
Here's what the optimistic (opportunistic :)))) codec implementation might look like:
    public class ClassLoaderOptimisticCodec implements ClassLoaderCodec {
        public ClassLoaderOptimisticCodec() {}

        @Nullable public Object encodeClassLoader(Class<?> cls, ClassLoader clsLdr) throws IgniteException {
            return null;
        }

        public ClassLoader decodeClassLoader(String fqn, @Nullable Object encodedClsLdr) throws IgniteException {
            // TODO:
            // Iterate through all the bundles and pick the first one
            // that can load the class. Once found, cache the class loader
            // for faster lookups going forward.
            ...
        }
    }
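The TODO in the optimistic decode step can be sketched without an OSGi container by treating plain class loaders as stand-ins for bundles. OptimisticResolver is a hypothetical name, and the candidate list is an assumption: in a real codec it would be derived from the installed bundles.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class OptimisticResolver {
    /** Stand-ins for the bundles to probe, in probing order. */
    private final List<ClassLoader> candidates;

    /** FQN -> first candidate that could load it. */
    private final ConcurrentMap<String, ClassLoader> cache = new ConcurrentHashMap<>();

    OptimisticResolver(List<ClassLoader> candidates) {
        this.candidates = new ArrayList<>(candidates);
    }

    /** Returns the cached loader for the FQN, probing the candidates on first use. */
    ClassLoader decodeClassLoader(String fqn) {
        return cache.computeIfAbsent(fqn, this::findLoader);
    }

    private ClassLoader findLoader(String fqn) {
        for (ClassLoader ldr : candidates) {
            try {
                // Visibility check only; static initializers are not run.
                Class.forName(fqn, false, ldr);
                return ldr;
            } catch (ClassNotFoundException ignored) {
                // Try the next candidate.
            }
        }
        throw new IllegalStateException("No candidate can load " + fqn);
    }
}
```

The first-match rule is exactly where the one-package-one-bundle assumption matters: if two candidates could load the same FQN, the result would depend on probing order.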
First of all, both approaches imply that your cluster is consistent and contains the same versions of the bundles on all the nodes. This can be seen as a valid assumption, since it ensures the consistency of your computation tasks. If you want to be able to work in a less deterministic way, then we have to introduce yet another strategy. But first, let us assume that the bundles are identical across the entire cluster.
TBD >= 5.0
On the write side, this approach requires you to capture the bundle symbolic name and its version. This is easy to do because in OSGi all class loaders except the system class loader implement BundleReference. The pessimistic codec can look like this:
    import java.io.Externalizable;
    import java.io.IOException;
    import java.io.ObjectInput;
    import java.io.ObjectOutput;

    import org.apache.ignite.IgniteException;
    import org.jetbrains.annotations.Nullable;
    import org.osgi.framework.Bundle;
    import org.osgi.framework.FrameworkUtil;
    import org.osgi.service.packageadmin.PackageAdmin;

    // Note: compared to the interface above, this listing takes clsLdr in decodeClassLoader
    // rather than in encodeClassLoader.
    public class ClassLoaderPessimisticCodec implements ClassLoaderCodec {
        private static final byte FRAMEWORK_CLASS_LOADER_ID = 0;
        private static final byte IGNITE_CLASS_LOADER_ID = 1;
        private static final byte BOOT_CLASS_LOADER_ID = 2;
        private static final byte BUNDLE_CLASS_LOADER_ID = 4;

        private static final ClassLoader FRAMEWORK_CLASS_LOADER = Bundle.class.getClassLoader();

        private final PackageAdmin packageAdmin;

        public ClassLoaderPessimisticCodec(PackageAdmin packageAdmin) {
            this.packageAdmin = packageAdmin;
        }

        @Nullable @Override public Object encodeClassLoader(Class<?> cls) throws IgniteException {
            ClassLoader classLoader = cls.getClassLoader();

            if (isIgniteClass(classLoader)) {
                return ClassLoaderDesc.newIgniteClassLoaderDesc();
            }
            if (isFrameworkClassLoader(classLoader)) {
                return ClassLoaderDesc.newFrameworkClassLoader();
            }

            Bundle bundle = FrameworkUtil.getBundle(cls);
            if (bundle != null) {
                return ClassLoaderDesc.newBundleClassLoaderDesc(bundle);
            }
            return ClassLoaderDesc.newBootClassLoader();
        }

        @Override public ClassLoader decodeClassLoader(String fqn, ClassLoader clsLdr, @Nullable Object encodedClsLdr)
            throws IgniteException {
            ClassLoaderDesc classLoaderDesc = (ClassLoaderDesc) encodedClsLdr;

            switch (classLoaderDesc.classLoaderId) {
                case BOOT_CLASS_LOADER_ID:
                    return clsLdr;
                case FRAMEWORK_CLASS_LOADER_ID:
                    return FRAMEWORK_CLASS_LOADER;
                case IGNITE_CLASS_LOADER_ID:
                    return ClassLoaderCodec.class.getClassLoader();
                case BUNDLE_CLASS_LOADER_ID:
                    // Strict version match, but we can think about a different strategy here,
                    // like a minor or micro version range.
                    Bundle[] bundles = packageAdmin.getBundles(classLoaderDesc.bsn, classLoaderDesc.version);
                    if (bundles == null) {
                        throw new IgniteException("No bundle found: " + classLoaderDesc.bsn + ":" + classLoaderDesc.version);
                    }
                    try {
                        // Highest ranking bundle.
                        return bundles[0].loadClass(fqn).getClassLoader();
                    } catch (ClassNotFoundException e) {
                        throw new IgniteException(e);
                    }
                default:
                    throw new IgniteException("Unsupported class loader description type: " + classLoaderDesc.classLoaderId);
            }
        }

        // The two helpers below were not part of the original listing; they are assumed
        // to be simple identity checks against the loaders used on the decode side.
        private static boolean isIgniteClass(ClassLoader classLoader) {
            return classLoader == ClassLoaderCodec.class.getClassLoader();
        }

        private static boolean isFrameworkClassLoader(ClassLoader classLoader) {
            return classLoader == FRAMEWORK_CLASS_LOADER;
        }

        static final class ClassLoaderDesc implements Externalizable {
            private String version;
            private String bsn;
            private byte classLoaderId;

            public ClassLoaderDesc() {}

            public ClassLoaderDesc(byte classLoaderId) {
                this.classLoaderId = classLoaderId;
            }

            public ClassLoaderDesc(Bundle bundle) {
                this.classLoaderId = BUNDLE_CLASS_LOADER_ID;
                this.bsn = bundle.getSymbolicName();
                this.version = bundle.getVersion().toString();
            }

            @Override public void writeExternal(ObjectOutput out) throws IOException {
                out.writeByte(classLoaderId);
                if (classLoaderId == BUNDLE_CLASS_LOADER_ID) {
                    out.writeUTF(bsn);
                    // Can be optimized.
                    out.writeUTF(version);
                }
            }

            @Override public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
                classLoaderId = in.readByte();
                if (classLoaderId == BUNDLE_CLASS_LOADER_ID) {
                    // Mirror writeExternal: symbolic name first, then version.
                    bsn = in.readUTF();
                    version = in.readUTF();
                }
            }

            static ClassLoaderDesc newIgniteClassLoaderDesc() {
                return new ClassLoaderDesc(IGNITE_CLASS_LOADER_ID);
            }

            public static ClassLoaderDesc newBundleClassLoaderDesc(Bundle bundle) {
                return new ClassLoaderDesc(bundle);
            }

            public static ClassLoaderDesc newFrameworkClassLoader() {
                return new ClassLoaderDesc(FRAMEWORK_CLASS_LOADER_ID);
            }

            public static ClassLoaderDesc newBootClassLoader() {
                return new ClassLoaderDesc(BOOT_CLASS_LOADER_ID);
            }
        }
    }
Disclaimer: this implementation is neither functional nor optimized; its purpose is to show how it can be done. It uses the PackageAdmin service, which is deprecated but keeps the demonstration simple.
It seems more useful to receive the ClassLoader clsLdr as a parameter of the decode method instead of the encode one.
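The wire format of such a descriptor can be exercised in isolation, since Externalizable only needs the JDK. The sketch below uses a simplified, hypothetical BundleDesc (a kind byte plus symbolic name and version for the bundle case) and round-trips it through standard Java serialization; the sample values in the usage note are invented.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

final class BundleDesc implements Externalizable {
    static final byte BOOT = 2;
    static final byte BUNDLE = 4;

    private byte kind;
    private String bsn;
    private String version;

    /** Public no-arg constructor required by Externalizable. */
    public BundleDesc() {}

    BundleDesc(byte kind, String bsn, String version) {
        this.kind = kind;
        this.bsn = bsn;
        this.version = version;
    }

    byte kind() { return kind; }
    String bsn() { return bsn; }
    String version() { return version; }

    @Override public void writeExternal(ObjectOutput out) throws IOException {
        out.writeByte(kind);
        if (kind == BUNDLE) {
            out.writeUTF(bsn);
            out.writeUTF(version);
        }
    }

    @Override public void readExternal(ObjectInput in) throws IOException {
        kind = in.readByte();
        if (kind == BUNDLE) {
            // Must read exactly what writeExternal wrote, in the same order.
            bsn = in.readUTF();
            version = in.readUTF();
        }
    }

    /** Serializes and deserializes the descriptor through an in-memory buffer. */
    static BundleDesc roundTrip(BundleDesc desc) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(desc);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (BundleDesc) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For example, BundleDesc.roundTrip(new BundleDesc(BundleDesc.BUNDLE, "com.example.pricing", "1.2.3")) should come back with the same symbolic name and version, while a BOOT descriptor carries only the kind byte.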
In this strategy we start with a more strict assumption: packages of all serialized classes come from one and only one bundle.
You may think this option is simpler, but it is not.
TODO
4 Comments
Andrey Kornev
1) The interface name: I feel the current name is to the point (ultimately it's about class loaders, not some hints) and doesn't suggest implementation. There is nothing in its name that implies actual transmission of class loaders between JVMs.
2) Encoding/Decoding strategies: The Optimistic strategy unlike any other proposed so far has zero overhead in terms of on-wire size as it does not require any data to be serialized in addition to what's already available (not even the package name and the version). In my application I can guarantee consistent deployment and would like to avoid the unnecessary serialization costs. Taking a step back however, I personally see this proposal as being mostly about providing a way to plug in any class loader codec implementation, and not so much about a particular codec implementation. Having said that, I think Raul's second approach would make a strong candidate as the default implementation shipped with Ignite.
Raúl Kripalani
1) We are talking classloaders because we're talking OSGi + default JVM serialisation/deserialisation. But there are dozens of serialisation technologies out there, and we're trying to build a generic solution that is not only applicable to OSGi. I could easily imagine other serialisation technologies requiring other assistive data, such as context names, data catalogues, schema IDs, etc. So our strategy should not be constrained to the immediate need, but to general pluggability for providing additional context-dependent data on the wire.
2) I also think the second technique is less intrusive and makes a fair candidate for an OOTB implementation. It does require some overhead, as we need to query OSGi PackageAdmin (or the superseding API) to find out the package version, but to avoid incurring this cost repeatedly, we could build a memory cache (ConcurrentHashMap) mapping classdefs to package versions on the serialising side, in addition to the (package name, package version) => Bundle cache on the deserialising side.
Dmitriy Setrakyan
1) Raul, I actually see your point, but it does not apply to the Ignite 1.5 release. As you know, we are introducing a new default "binary" format, and in that format we take care of all the serialization routines automatically. Also, users already have the "Binarylizable" interface in case custom serialization behavior is needed. The only thing that remains is class-loader detection, hence the name of the method.
2) I like your suggestion on encoding package name with a version.
I also think that the optimistic option should be the default, as it has no over-the-wire overhead and will provide better performance.
Romain Gilles
I agree with Dmitriy: here we are focusing on class loader resolution. If the end user wants to customize the serialization, they can do it at a different level with another SPI. So the single responsibility of this class is to resolve the class loader and make it resolvable. I also agree that a serialization SPI class should provide 2 symmetric methods to express the flow of the data in both directions. Maybe we can call the class ClassLoaderHintsCodec and the methods encodeHints(...) and decodeHints(...)?
I vote for the second approach as an OOTB implementation. I had the same idea for the optimistic impl. It will require a homogeneous cluster deployment, as explained in the assumptions section.
For the first approach, what I see is that one way or another you will have to configure your build system in order to add those new headers. Therefore it will make Ignite development for OSGi slightly different from the non-OSGi one. Here I would like to give an example use case. I see the point for the cache use case, but caching is not the only use case. Maybe now we see Ignite as a distributed caching solution, but it is still very interesting for distributed computing. In this use case I don't see a distributed computing cluster having different implementations of the computation units. For example, let's say you are using it to price deals stored in a partitioned cache. Then the bank will be quite disappointed to get different (inconsistent) pricing results across the cluster. Also, you don't want to export the computation logic unit because it is a private detail; therefore it will not be in an exported package. Does it make sense to you, or am I totally out of scope?
Finally, I still think that in some cases you may need a way to do some mapping. Therefore I would suggest introducing a way to be notified of the start / end of a serialization / deserialization of an object graph, or maybe providing a mapping capability as a method argument. Does it make sense?