One of the main issues when dealing with Hadoop is that the various different versions could have potential minor incompatibilities and to a greater extent, the absence/presence of a variety of features. For example, when using Tez, the main issue till date has been the handling of YARN Timeline Server. Hadoop 2.4 introduced YARN Timeline support. However, Security/ACLs support was only introduced in Hadoop 2.6 and required different APIs to be invoked.
In Tez, to work with different Hadoop versions, one must do the following:
For the most part, the features that are version dependent are usually plugins and they can be cleanly separated into different modules. The profiles above largely decide which modules will be built or not built based on the hadoop version. However, they are some features that require APIs that are a bit more invasive in nature and a plugin approach does not really work well. A couple of examples of this are:
To make use of this APIs, we have introduced a shim layer as part of TEZ-2924. The basic shim interface defines the various APIs whose implementations are version-specific and the version-specific shim implementation provides the runtime implementation to match the version of Hadoop. The Hadoop shims are discovered via ServiceLoaders (no configuration knobs required). The Tez framework will cycle through all providers that it finds on its classpath, query each one to see if it supports the current version of Hadoop and pick the first Shim that does.
To implement a distro-specific shim:
As mentioned earlier, if there are other shims that match the Hadoop version, your specific shim may not be picked up at runtime. To avoid this issue, you may need to remove the "hadoop-shim-2.*" jars from the tez tarball as needed.
For more questions/clarifications, please send questions to firstname.lastname@example.org