Child pages
  • Project Dependency Trees schema


...

Excerpt

The Project Dependency Trees artifact defines all the side artifacts of a project as well as each artifact's tree of dependencies. Consumers can use this to decide what their effective tree of dependencies is and to perform intelligent substitutions in the tree. By providing the entire tree we can reduce the number of requests a consumer needs to make in order to resolve all the artifacts it requires.

 

There are a number of issues with the current Project Object Model used by Maven:

...

To illustrate by example:

Consider a project that builds a Java Web application that can be run standalone or as part of an EAR. Under current best-practice we would advise separating the project into multiple modules:

  • A module to build the JAR file that contains the compiled code and corresponding resources
  • A module to build the WAR file for consumption as part of an EAR - this needs to be skinny as the common dependencies will be shared across all the modules within the EAR
  • A module to build the WAR file for standalone - this needs to be fat and is built from the skinny WAR by adding in the common dependencies

There are other ways to skin this cat, but what we really want to have is that there is a single project that produces:

  • A JAR of the compiled code - we may want to reuse this
  • A skinny WAR which exposes transitive dependencies of the common dependencies that are required to be present in the EAR
  • A fat WAR which does not expose any transitive dependencies - perhaps other than the servlet container and JVM level requirements
  • A test JAR that allows for re-use of the unit tests of the compiled code
  • An integration test JAR that allows for extending and running the WAR acceptance tests
  • A source JAR for the main JAR
  • A javadoc JAR for the main JAR
  • A source JAR for the test JAR
  • A javadoc JAR for the test JAR
  • A source JAR for the integration test JAR
  • A javadoc JAR for the integration test JAR

Each of these artifacts will have different effective dependencies, for example the test JAR will have a dependency on the main JAR and then a dependency on the test framework, etc.

The way the modelVersion 4.0.0 POM handled these different dependencies was via <scope> tags. The issue with scope tags is that the valid scopes become part of the model version, and the information about which scopes apply to which artifacts has been lost to the build process by the time the artifact is consumed by a consumer.

To solve this issue, the Project Dependency Trees will list the effective dependencies required to consume each artifact produced by the project.

...

The primary driver of non-atomic deployment is the production of platform specific artifacts for the project. In this regard, the Project Dependency Trees model assumes that deployments will be at least atomic per platform. "At least atomic per platform" means that the initial deployment may include multiple platforms and subsequent additional deployments will be atomic per platform. For example:

The foo project produces some non-platform specific artifacts as well as artifacts for the os-x, windows and linux platforms. A build on, say, os-x may be able to use cross-compiling tooling to produce artifacts for the linux platform (e.g. you can use rpmbuild on a mac, so you could create the RPM installer when building the project on some macs). Thus if we perform the release from a mac, our initial deployment will include the non-platform specific artifacts (e.g. the standalone WAR file for the web application) as well as some platform specific artifacts (e.g. the OS-X installer and perhaps the RPM & DEB installers for linux systems). At the com.example:foo::1.0 coordinates we would deploy:

  • a POM for modelVersion 4.0.0 compatibility
  • the Project Dependency Trees where the top level <project> tag does not have a platformId. There will be an <artifacts> tag as well as <artifacts platformId="os-x"> and <artifacts platformId="linux"> detailing the artifacts that were produced as part of the initial atomic deployment
  • the non-platform specific artifacts will be deployed in com.example:foo::1.0 as they are associated with the modelVersion 4.0.0 pom coordinates
  • a POM for modelVersion 4.0.0 compatibility will be deployed at com.example:foo:os-x:1.0 (which maps to GAV com.example:foo-os-x:1.0 in the modelVersion 4.0.0 coordinates) detailing the dependencies of the os-x artifacts (as different platforms are most likely to have the biggest differences in dependencies, it makes sense to give each platform its own modelVersion 4.0.0 POM to help legacy consumers get as close to the correct dependency tree as we can)
  • the os-x specific artifacts will be deployed at com.example:foo:os-x:1.0
  • a POM for modelVersion 4.0.0 compatibility will be deployed at com.example:foo:linux:1.0
  • the linux specific artifacts will be deployed at com.example:foo:linux:1.0
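
The coordinate mapping used in the list above can be sketched in a few lines of Java. This is only an illustration: the class name LegacyCoordinates and the method toLegacyGav are hypothetical, the groupId:artifactId:platformId:version string form is taken from the examples in this page, and mapping an empty platformId to the plain artifactId is an assumption based on the statement that the non-platform specific artifacts share the modelVersion 4.0.0 POM coordinates.

```java
class LegacyCoordinates {
    /**
     * Maps "groupId:artifactId:platformId:version" to the modelVersion 4.0.0
     * style "groupId:artifactId[-platformId]:version" (hypothetical helper).
     */
    static String toLegacyGav(String coordinates) {
        // A negative limit stops split from discarding trailing empty segments,
        // so the segment count can be checked reliably.
        String[] parts = coordinates.split(":", -1);
        if (parts.length != 4) {
            throw new IllegalArgumentException(
                    "Expected groupId:artifactId:platformId:version but got " + coordinates);
        }
        String groupId = parts[0];
        String artifactId = parts[1];
        String platformId = parts[2];
        String version = parts[3];
        // Assumption: an empty platformId means "non-platform specific" and
        // maps to the unmodified artifactId.
        String legacyArtifactId = platformId.isEmpty() ? artifactId : artifactId + "-" + platformId;
        return groupId + ":" + legacyArtifactId + ":" + version;
    }

    public static void main(String[] args) {
        System.out.println(toLegacyGav("com.example:foo::1.0"));     // prints com.example:foo:1.0
        System.out.println(toLegacyGav("com.example:foo:os-x:1.0")); // prints com.example:foo-os-x:1.0
    }
}
```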

Later, we perform a checkout of the tag from SCM on a windows machine and perform the build and deployment of the windows specific artifacts.

  • a POM for modelVersion 4.0.0 compatibility will be deployed at com.example:foo:windows:1.0 following the pattern from above
  • the windows specific artifacts will be deployed at com.example:foo:windows:1.0 again following the pattern from above
  • the original Project Dependency Trees cannot be redeployed, as that would break the atomic deployment as well as the write-once principle of Maven release repositories. Instead, a Project Dependency Trees file will be deployed at com.example:foo:windows:1.0. In this case the <project> tag must have a platformId, specifically <project platformId="windows">, and there must be one and only one <artifacts> tag contained within the <project> tag, i.e. <artifacts platformId="windows">. The metadata for either groupId or groupId:artifactId - which can be updated - will, in addition to detailing the available versions, detail the platformIds available for each version.
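
The structural rule for such a later, platform specific Project Dependency Trees file could be checked along the following lines. This is a minimal sketch using the JDK's built-in DOM parser: the class and method names are hypothetical, only the <project> and <artifacts> tags and the platformId attribute come from this page, and requiring the two platformId values to be equal is a reading of the example above rather than a stated rule.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class PlatformDeploymentCheck {
    /**
     * Returns true when the document has a <project> root with a platformId
     * and exactly one <artifacts> tag carrying that same platformId.
     */
    static boolean isValidPlatformDeployment(String xml) {
        Element project;
        try {
            project = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parseable XML document", e);
        }
        // getAttribute returns the empty string when the attribute is absent.
        String platformId = project.getAttribute("platformId");
        if (!"project".equals(project.getTagName()) || platformId.isEmpty()) {
            return false;
        }
        // Counts every <artifacts> descendant of <project>.
        NodeList artifacts = project.getElementsByTagName("artifacts");
        return artifacts.getLength() == 1
                && platformId.equals(((Element) artifacts.item(0)).getAttribute("platformId"));
    }
}
```

An initial deployment, where <project> has no platformId and several <artifacts> tags are present, would be rejected by this check, which is the intent: the rule only applies to the subsequent per-platform deployments.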

NOTE: as the platformId is the unit that separates atomically deployable components, it will be up to the tooling providers to agree on what values of individual platformIds mean for that specific tooling. The example above used high-level operating systems as platformIds, but without prejudice, we could equally have os-x-10.10, os-x-10.9, linux-fedora-25, linux-fedora-24, linux-rhel-6, linux-centos-6, linux-ubuntu-12.04, windows-server-2012, etc. Similarly we could have the platform differentiate in other ways, e.g. java7, java8, android, etc. Or perhaps the platform could differentiate artifacts that target different runtimes, such as tomcat, jetty, weblogic, geronimo, etc. where the major difference in those platform specific artifacts is the dependency trees.

NOTE: while different projects can follow different conventions for what the different platformIds are used for, as the dependency's platformId is part of the dependency tree, the project can perform the appropriate mapping of its transitive dependencies' platform identifiers into its own convention, so deviations in conventions between projects should not prove fatal.

Conflict resolution

The modelVersion 4.0.0 POM mixes build time dependency specification with consumption time dependency specification. This has the effect of significantly complicating the dependency model within the POM:

  • Dependencies can be specified in the POM directly
  • Dependencies can be inherited from parent POMs
  • Dependencies can be added via profiles
  • Transitive dependencies need to be traversed and processed and built
  • Versions can be specified in the <dependencies> section, in <dependencyManagement>, or imported by a <scope>import</scope> dependency in the dependency management.
  • Versions can be specified using a ${property}, which can cause confusion: depending on where the dependency comes from, it can be unclear which origins of the property are valid for property expansion.
  • Conflict resolution is by "POM" order where the "first" dependency wins... but also the "child" wins over the "parent"... but the parent's <dependencies> entries come before the child's!!!
  • If "POM" order fails, then conflict resolution is by tree depth, such that nearest to the root wins.
  • The author suspects that there is more unspecified (or perhaps weakly specified) behavioural madness...
  • The end result of dependency resolution is typically a flattened list of dependencies

The above set of "rules" makes it hard for other tooling to process the dependencies of a POM correctly, and consequently there are many examples of real world POMs where various hacks have been used to tame the effective dependency tree in order to produce the required transitive tree for consumers.

The Project Dependency Trees simplifies the work of consumers by explicitly providing the fully resolved tree that the project intends consumers to use. There is no requirement for a consumer of a project's artifacts to consult any transitive dependencies (a consumer that has a better understanding of a specific transitive dependency's modelVersion may choose to consult it, but it does not have to).

The consumer is then free to decide how to resolve conflicts, and because the tree has been provided, in the event that conflict resolution requires dependency substitution, the tree can be pruned safely (whereas with a flattened list, safe substitution would not be possible as we could end up retaining orphaned transitive dependencies).
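
To make the pruning argument concrete, here is a small Java sketch of substitution on a tree. The Node type, the substitute and flatten helpers, and the example coordinates are all hypothetical and are not part of the proposed schema. The point it illustrates is that replacing a node in a tree removes that node's old subtree with it, which a flattened list cannot do because it no longer records which entry brought in which transitive dependency.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class TreeSubstitution {
    /** A node in a dependency tree: a coordinate plus the dependencies it brings in. */
    static final class Node {
        final String coordinate;
        final List<Node> children = new ArrayList<>();

        Node(String coordinate, Node... children) {
            this.coordinate = coordinate;
            this.children.addAll(Arrays.asList(children));
        }
    }

    /** Returns a copy of the tree where each node matching target is replaced, subtree and all. */
    static Node substitute(Node node, String target, Node replacement) {
        if (node.coordinate.equals(target)) {
            return replacement;
        }
        Node copy = new Node(node.coordinate);
        for (Node child : node.children) {
            copy.children.add(substitute(child, target, replacement));
        }
        return copy;
    }

    /** Flattens a tree into the set of its coordinates, in depth-first order. */
    static Set<String> flatten(Node node) {
        Set<String> out = new LinkedHashSet<>();
        collect(node, out);
        return out;
    }

    private static void collect(Node node, Set<String> out) {
        out.add(node.coordinate);
        for (Node child : node.children) {
            collect(child, out);
        }
    }

    public static void main(String[] args) {
        Node tree = new Node("com.example:app::1.0",
                new Node("com.example:lib-a::1.0", new Node("com.example:old-logging::1.0")),
                new Node("com.example:lib-b::1.0"));
        Node replacement = new Node("com.example:lib-a-fork::2.0",
                new Node("com.example:new-logging::1.0"));
        // old-logging disappears together with lib-a; nothing is orphaned.
        System.out.println(flatten(substitute(tree, "com.example:lib-a::1.0", replacement)));
    }
}
```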

The consumer is also free to decide if conflicts need to be resolved at all. For example, an OSGi container can correctly manage multiple versions of the same module whereas the Java 9 modulepath can only have one version of any specific module. When a project produces a JAR artifact that contains both the OSGi module metadata as well as the Java 9 module info, it has no way of knowing whether the consumer will want to apply a "single version per module" rule or an "all versions of each module" rule, nor should it: only the consumer can know how conflicts should be resolved.

Conflict resolution is also related to the next issue.

Version ranges and reproducible builds

One of the main issues with version ranges in the modelVersion 4.0.0 POM is that they produce an irreproducible build, as the consumer will re-resolve the version range every time it builds the dependency tree, and as such may resolve a different version.

The utility of version ranges comes into play when performing conflict resolution. If a consumer has to pick a single version of each dependency, the range information allows that version selection to be performed safely... i.e. if I have transitive dependencies on com.example:foo::[1.0,), com.example:foo::[1.2,2.0) and com.example:foo::[1.1,1.4.5],[1.4.7,) then I can construct the effective safe range of [1.2,1.4.5],[1.4.7,2.0) and select the appropriate single version.
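
The construction of the effective safe range can be sketched as an intersection of interval sets. The following Java is an illustration under simplifying assumptions, not Maven's own range handling: versions are compared as purely numeric dotted strings, only the two-bound interval syntax used in the example is parsed (a single pinned version such as [1.0] is not handled), and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class RangeIntersection {
    /** One interval of a range; a null bound means unbounded on that side. */
    static final class Interval {
        final String lower, upper;
        final boolean lowerInclusive, upperInclusive;

        Interval(String lower, boolean lowerInclusive, String upper, boolean upperInclusive) {
            this.lower = lower;
            this.lowerInclusive = lowerInclusive;
            this.upper = upper;
            this.upperInclusive = upperInclusive;
        }

        boolean isEmpty() {
            if (lower == null || upper == null) return false;
            int c = compare(lower, upper);
            return c > 0 || (c == 0 && !(lowerInclusive && upperInclusive));
        }

        @Override
        public String toString() {
            return (lowerInclusive ? "[" : "(") + (lower == null ? "" : lower) + ","
                    + (upper == null ? "" : upper) + (upperInclusive ? "]" : ")");
        }
    }

    private static final Pattern INTERVAL =
            Pattern.compile("([\\[(])([^,\\[\\]()]*),([^,\\[\\]()]*)([\\])])");

    /** Compares dotted numeric versions segment by segment; missing segments count as 0. */
    static int compare(String a, String b) {
        String[] x = a.split("\\."), y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) return Integer.compare(xi, yi);
        }
        return 0;
    }

    /** Parses e.g. "[1.1,1.4.5],[1.4.7,)" into its intervals. */
    static List<Interval> parse(String range) {
        List<Interval> out = new ArrayList<>();
        Matcher m = INTERVAL.matcher(range);
        while (m.find()) {
            String lo = m.group(2).trim(), hi = m.group(3).trim();
            out.add(new Interval(lo.isEmpty() ? null : lo, m.group(1).equals("[") && !lo.isEmpty(),
                    hi.isEmpty() ? null : hi, m.group(4).equals("]") && !hi.isEmpty()));
        }
        return out;
    }

    /** Intersects two ranges pairwise, dropping interval pairs that do not overlap. */
    static List<Interval> intersect(List<Interval> xs, List<Interval> ys) {
        List<Interval> out = new ArrayList<>();
        for (Interval x : xs) {
            for (Interval y : ys) {
                Interval lo = tighterLower(x, y);
                Interval hi = tighterUpper(x, y);
                Interval r = new Interval(lo.lower, lo.lowerInclusive, hi.upper, hi.upperInclusive);
                if (!r.isEmpty()) out.add(r);
            }
        }
        return out;
    }

    /** The tighter lower bound is the higher one; on a tie an exclusive bound wins. */
    private static Interval tighterLower(Interval a, Interval b) {
        if (a.lower == null) return b;
        if (b.lower == null) return a;
        int c = compare(a.lower, b.lower);
        if (c != 0) return c > 0 ? a : b;
        return a.lowerInclusive ? b : a;
    }

    /** The tighter upper bound is the lower one; on a tie an exclusive bound wins. */
    private static Interval tighterUpper(Interval a, Interval b) {
        if (a.upper == null) return b;
        if (b.upper == null) return a;
        int c = compare(a.upper, b.upper);
        if (c != 0) return c < 0 ? a : b;
        return a.upperInclusive ? b : a;
    }

    static String format(List<Interval> range) {
        return range.stream().map(Interval::toString).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        List<Interval> safe = intersect(intersect(parse("[1.0,)"), parse("[1.2,2.0)")),
                parse("[1.1,1.4.5],[1.4.7,)"));
        System.out.println(format(safe)); // prints [1.2,1.4.5],[1.4.7,2.0)
    }
}
```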

The irreproducibility of version ranges is still somewhat of an issue though. We can resolve the irreproducibility of builds by recognising that it is really a trade-off choice that the consumer should make.

The consumer should be able to choose between:

 

  • Selecting the lowest matching version in the range - i.e. should be stable
  • Selecting the highest matching version in the range - i.e. to pick up bug fixes automatically
  • Selecting the lowest matching version in the range that was actually resolved by a dependency
  • Selecting the highest matching version in the range that was actually resolved by a dependency

The Project Dependency Trees model enables consumers to make this choice by providing not only the version range but also the resolved version of each dependency. The version range can then be used to guide conflict resolution and the resolved version information can be used as hints to pre-select the exact version to use if the consumer wants a reproducible build.
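
The four choices listed above could be expressed as a consumer-side strategy along these lines. This is a sketch only: the Strategy enum, the select method and the sample versions are hypothetical, "matching" stands for the versions the consumer found that fall inside the effective range, "resolved" stands for the resolved versions recorded in the dependency trees, and versions are assumed to be purely numeric dotted strings.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class VersionSelection {
    enum Strategy { LOWEST, HIGHEST, LOWEST_RESOLVED, HIGHEST_RESOLVED }

    /** Orders dotted numeric versions segment by segment; missing segments count as 0. */
    static final Comparator<String> BY_VERSION = (a, b) -> {
        String[] x = a.split("\\."), y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) return Integer.compare(xi, yi);
        }
        return 0;
    };

    /**
     * Picks a version according to the consumer's chosen trade-off.
     * The *_RESOLVED strategies only consider resolved versions that are
     * still inside the matching set.
     */
    static Optional<String> select(Strategy strategy, List<String> matching, List<String> resolved) {
        switch (strategy) {
            case LOWEST:
                return matching.stream().min(BY_VERSION);
            case HIGHEST:
                return matching.stream().max(BY_VERSION);
            case LOWEST_RESOLVED:
                return resolved.stream().filter(matching::contains).min(BY_VERSION);
            case HIGHEST_RESOLVED:
                return resolved.stream().filter(matching::contains).max(BY_VERSION);
            default:
                throw new IllegalArgumentException("Unknown strategy " + strategy);
        }
    }

    public static void main(String[] args) {
        List<String> matching = Arrays.asList("1.2", "1.3", "1.4.5", "1.4.7", "1.9");
        List<String> resolved = Arrays.asList("1.3", "1.4.7");
        for (Strategy s : Strategy.values()) {
            System.out.println(s + " -> " + select(s, matching, resolved).orElse("none"));
        }
    }
}
```

In this sketch the two resolved strategies depend only on information carried in the trees and the effective range, which is what would make them candidates for a reproducible build; the plain HIGHEST strategy depends on whatever versions exist at resolution time.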

Build time information

The Project Dependency Trees model removes all the build time information that was previously exposed from the modelVersion 4.0.0 POM, thus there are no <build>, <profiles> or <reporting> sections.

This points to a legitimate concern about how to handle project inheritance while moving the POM beyond modelVersion 4.0.0. The solution here is to define two classes of compatibility.

  • The modelVersion 4.0.0 POM will always be deployed (at least until such time as there are effectively no more modelVersion 4.0.0 consumers)
  • The Project Dependency Trees provides for "best effort" forward compatibility with newer modelVersions in order to ensure that older clients can at least consume artifacts from newer model trees (the older consumers may have to apply hacks such as <exclusions> or explicitly listing required dependencies in order to consume the dependency correctly... just as a modelVersion 4.0.0 POM consumer does today, but the artifacts can be consumed)
  • The build time information is only required from parent / mix-in projects. To use a parent / mix-in you must be building with a tool that understands the modelVersions up to and including the highest modelVersion of the parent project and any mix-in projects 

In other words:

  • Parent and Mix-In inheritance is backwards compatible but not forwards compatible
  • Dependency trees have backwards and forwards compatibility (though the forwards compatibility is with restrictions of what can be mapped)

Thus only projects that are intended to be consumed as either parent projects or as mix-in projects would deploy their newer modelVersion POM.

OPEN QUESTION: do we deploy the newer modelVersion POM as the groupId:artifactId::version::pom or as groupId:artifactId::version:build:pom? The first form ensures that the POM cannot be used as a parent by modelVersion 4.0.0 projects as they will blow up immediately. However, there has been an established practice of using <packaging>pom</packaging> for projects that produce non-standard artifacts and want to opt out of the standard lifecycle binding, and thus we would break consumption of those "side" artifacts by legacy clients. Perhaps the solution is to follow the second form (i.e. it gets deployed with <classifier>build</classifier>) and either put a Maven enforcer execution into the modelVersion 4.0.0 POM or use the <prerequisites> tag to try and at least alert that the parent is invalid.

 

<project modelVersion="..." groupId="..." artifactId="..." [platformId="..."] version="...">

...