Status: DRAFT
Version:
Issue(s):
Sources:
Developer(s):

Status

This RFC is currently in the DRAFT state. Nothing in this RFC has been agreed or confirmed.

Introduction


The remote repository layout defines how clients of the central repositories (Apache Maven as well as a non-trivial number of third-party clients) can access the artifacts that projects publish as versioned releases for other projects to depend on.


Overview


Projects

The basic unit of organisation in a repository is the project coordinates. The project coordinates consist of the pair groupId:artifactId.

Versions

Each project consists of at least one versioned set of artifacts. The full set of artifacts for any specific version is defined by the coordinates groupId:artifactId:version.

Artifacts

Any specific version of a project will have at least one artifact. The artifacts have the following information associated with them:

  • type (mandatory) - this represents the type of artifact
  • classifier (optional) - when absent, the artifact is the primary artifact of that type. When present, it is used to disambiguate additional artifacts. For example, with Java artifacts the main artifact is typically a jar file, and the javadocs are also typically packaged as a jar file. The main artifact (containing the .class files etc.) will have no classifier, while the Javadoc artifact will have a classifier of javadoc.
  • platformId (optional) - when absent, the artifact either is not tied to any specific platform or is associated with multiple platforms. When present, the artifact will only work on the specific platform. The platformId naming is established by convention based on the type of file / project. Some examples:
    • A project that builds Firefox binaries will need to produce different binaries targeting different systems. One such artifact might be, on disk, a file such as firefox-49.0-2.fc26.i686.rpm. The version would probably be 49.0-2 and the platformId could be fc26.i686 or some derivative (such as fedora-26.i686, fedora.i686 or fedora.x86), because that specific RPM may not work on other versions of Fedora, other CPU architectures or other Linux systems that support RPM packaging. Similarly, there may be a firefox_49.0+ubuntu0.12.04.1_i386.deb that may use a DEB-specific scheme for deciding the platformId, or it may suffice to use something like ubuntu.i386. It is expected that a convention will be established by the community of users of the repository.
    • A project that builds installers for, say, Apache Tomcat would probably end up producing an RPM that does not have a platformId (corresponding to the noarch RPMs). The Apache Tomcat connector RPMs, however, would have platformIds as they include platform-specific code.
    • The JFFI project produces a jar file that bundles the native libraries for implementing its foreign function interface SPI. The intent is that the JFFI project would not deploy its jar artifact with a platformId, as the jar targets multiple platforms with a single artifact.
    • Regular Java and .NET projects would be expected not to use the platformId, as the artifacts produced by such projects are typically independent of operating system (subject to the availability of their required common runtime), though there are cases where Java and .NET projects may end up producing artifacts that target specific platforms.
    • A Java artifact may target, for example, a specific JavaEE container. In those cases it may make sense to use the container as the platformId, e.g. there may be jboss and weblogic variants of the same version of the same project. It is expected in such cases that the major differences between such platform-specific artifacts would be their transitive dependencies.

Every artifact is thus uniquely identified by its coordinates: 

groupId:artifactId:platformId:version:classifier:type

For artifacts that do not have a platformId the preferred form of coordinates is:

groupId:artifactId::version:classifier:type

For artifacts that do not have a classifier, the preferred form of coordinates is:

groupId:artifactId:platformId:version::type

For artifacts that do not have either a platformId or a classifier, the preferred form of coordinates is:

groupId:artifactId::version::type

The intermediate :: characters are critical in order to disambiguate platform aware coordinates from the previous styles of coordinates:

groupId:artifactId:version:type

and

groupId:artifactId:version:classifier:type
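The disambiguation described above can be sketched in a few lines of Python. This is only an illustration, not part of the proposal: the names Coordinates, parse_coordinates and format_coordinates are hypothetical, and it assumes that the three styles are told apart purely by the number of colon-separated fields (six for platform-aware coordinates, five or four for the previous styles), with an empty field meaning "absent".

```python
from typing import NamedTuple, Optional


class Coordinates(NamedTuple):
    group_id: str
    artifact_id: str
    platform_id: Optional[str]
    version: str
    classifier: Optional[str]
    type: str


def parse_coordinates(text: str) -> Coordinates:
    """Parse a coordinate string, telling the styles apart by field count."""
    parts = text.split(":")
    if len(parts) == 6:    # platform-aware: g:a:p:v:c:t (empty p or c means absent)
        g, a, p, v, c, t = parts
    elif len(parts) == 5:  # previous style: g:a:v:c:t
        g, a, v, c, t = parts
        p = ""
    elif len(parts) == 4:  # previous style: g:a:v:t
        g, a, v, t = parts
        p = c = ""
    else:
        raise ValueError(f"unrecognised coordinates: {text!r}")
    if not all((g, a, v, t)):  # the mandatory fields may never be empty
        raise ValueError(f"missing mandatory field in: {text!r}")
    return Coordinates(g, a, p or None, v, c or None, t)


def format_coordinates(coords: Coordinates) -> str:
    """Render the preferred six-field form, keeping the empty '::' slots."""
    return ":".join([
        coords.group_id,
        coords.artifact_id,
        coords.platform_id or "",
        coords.version,
        coords.classifier or "",
        coords.type,
    ])
```

Under these assumptions, groupId:artifactId:version:type and groupId:artifactId::version::type parse to the same value, while a five-field string with an empty third field is rejected rather than guessed at.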

Repository artifact layout

There have been two previous layouts used for the repository: Maven 1 and Maven 2/3.

The migration from the Maven 1 layout to the current Maven 2/3 layout was problematic and caused a large amount of pain for users. Consequently, there is little appetite for a mass migration of artifacts to a new layout. Thus the new layout will be a superposition on top of the Maven 2/3 layout.

The Maven 2/3 layout mapped artifacts from a groupId:artifactId:version:classifier:type to a repository path using the following scheme:

${groupId.replace('.','/')}/${artifactId}/${version}/${artifactId}-${version}${classifier==null?'':'-'+classifier}.${type}

The new layout will mix the artifactId and platformId together. This scheme allows for better interoperability with older clients of the remote repository that do not understand the platformId concept.

Note: An alternative scheme would be to mix the platformId with either the classifier or the version. Both of these were rejected:

  • Mixing with the classifier: platform-specific artifacts are highly likely to have different dependencies. Legacy clients would not be able to consume that information, because in a Model Version 4.0.0 POM (which is all we can assume a legacy client can consume) the dependency tree of a classifier artifact is the same as the dependency tree of the main artifact.
  • Mixing with the version: there is a strong likelihood that some projects will want to depend on multiple platforms of the same project. For example, projects such as JFFI may want to depend on the native libraries compiled for each platform so that those artifacts can be embedded within the jar file. Legacy clients cannot depend on multiple versions of the same project under the Model Version 4.0.0 graph conflict resolution rules.

The new layout is thus:

${groupId.replace('.','/')}/${artifactId}${platformId==null?'':'-'+platformId}/${version}/${artifactId}${platformId==null?'':'-'+platformId}-${version}${classifier==null?'':'-'+classifier}.${type}

In other words, from the point of view of a legacy client, the platform-specific artifacts are available from a different project at groupId:artifactId-platformId. This mirrors the way that users of the central repository have already been handling platform-specific artifacts.
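As a rough sketch of the two path schemes, the following Python functions build the Maven 2/3 path and the proposed path. The function names are hypothetical, and the sketch assumes that an absent classifier or platformId is represented as None (an empty string is treated the same way).

```python
from typing import Optional


def legacy_path(group_id: str, artifact_id: str, version: str,
                classifier: Optional[str], type_: str) -> str:
    """Maven 2/3 layout: group/artifact/version/artifact-version[-classifier].type"""
    directory = f"{group_id.replace('.', '/')}/{artifact_id}/{version}"
    suffix = f"-{classifier}" if classifier else ""
    return f"{directory}/{artifact_id}-{version}{suffix}.{type_}"


def platform_path(group_id: str, artifact_id: str, platform_id: Optional[str],
                  version: str, classifier: Optional[str], type_: str) -> str:
    """Proposed layout: the platformId is folded into the artifactId segment."""
    project = f"{artifact_id}-{platform_id}" if platform_id else artifact_id
    return legacy_path(group_id, project, version, classifier, type_)
```

Writing platform_path in terms of legacy_path makes the compatibility argument explicit: a legacy client that asks for groupId:artifactId-platformId computes exactly the path a platform-aware client computes for groupId:artifactId:platformId.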

Repository metadata layout

For legacy clients, we need to maintain the current maven-metadata.xml files. For newer clients, however, we will provide a more flexible metadata index using JSON. In order to allow for metadata evolution, the JSON format will be subject to the following restrictions:

  • Consumers must ignore unknown keys
  • Producers must preserve unknown keys
  • Aggregating proxies must merge all keys. Where conflicts arise, the aggregating proxy will use a priority list of upstream sources to determine which value wins.

The basic format will be something like:

{
  "modified":"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", // always ISO 8601 extended format in UTC timezone
  "group":[ // present if there are any artifacts deployed using the repo path as groupId
    "artifactId",
    "artifactId:platformId",
    "artifactId",
    "artifactId:platformId",
    "artifactId:platformId"
  ],
  "artifact":[ // present if there are any artifacts deployed using the repo path as groupId:artifactId[:platformId]
    "version",
    "version",
    "version",
    "version"
  ],
  "org.apache.maven:plugins":[ // this is a Maven specific key, hence namespaced
    {
      "name":"...",
      "artifactId":"...",
      "prefix":"..."
    },
    {
      "name":"...",
      "artifactId":"...",
      "prefix":"..."
    }
  ]
}

Tool specific keys must be prefixed by the top level groupId of the tool to which they are scoped. Each tool is responsible for the structure within that key and how to handle evolution of that structure.

Some examples:

https://repo.maven.apache.org/maven2/io/github/stephenc/maven/repo-metadata.json would be:

{
  "modified":"2014-01-16T09:55:43.511Z",
  "group":[
    "rfmm-maven-plugin"
  ],
  "org.apache.maven:plugins":[
    {
      "name": "Release From My Machine Maven Plugin",
      "prefix": "rfmm",
      "artifactId": "rfmm-maven-plugin"
    }
  ]
}

https://repo.maven.apache.org/maven2/io/github/stephenc/maven/rfmm-maven-plugin/repo-metadata.json would be:

{
  "modified":"2014-01-16T09:55:49.243Z",
  "artifact":[
    "1.0"
  ]
}
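
The three evolution rules (consumers ignore unknown keys, producers preserve them, aggregating proxies merge by priority) could look roughly like this in Python. The function names are hypothetical. The sketch also assumes that a "conflict" means the same top-level key appearing in more than one upstream, and that the winning upstream's whole value is used; the RFC does not spell out either point.

```python
import json

# Keys this hypothetical client understands; everything else is "unknown" to it.
KNOWN_KEYS = {"modified", "group", "artifact"}


def read_known(document: str) -> dict:
    """Consumer: parse the metadata and ignore keys the client does not understand."""
    data = json.loads(document)
    return {key: value for key, value in data.items() if key in KNOWN_KEYS}


def add_version(document: str, version: str, modified: str) -> str:
    """Producer: record a new version while writing unknown keys back unchanged."""
    data = json.loads(document)
    versions = data.setdefault("artifact", [])
    if version not in versions:
        versions.append(version)
    data["modified"] = modified
    return json.dumps(data, indent=2)


def merge_by_priority(upstreams: list) -> dict:
    """Aggregating proxy: keep every key; on conflict the earliest upstream wins."""
    merged = {}
    for data in reversed(upstreams):  # lowest priority first, so later updates override
        merged.update(data)
    return merged
```

Here upstreams is a list of already-parsed metadata documents ordered from highest to lowest priority.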


TODO: consider a counter-proposal in which the top-level keys are the repository "id" and everything else is as before. This simplifies aggregating proxies and may assist with PDT Repositories, as we would then know the IDs of the content from aggregating proxies, e.g.

https://repo.maven.apache.org/maven2/io/github/stephenc/maven/repo-metadata.json would be:

{
  "central":{
    "modified":"2014-01-16T09:55:43.511Z",
    "group":[
      "rfmm-maven-plugin"
    ],
    "org.apache.maven:plugins":[
      {
        "name": "Release From My Machine Maven Plugin",
        "prefix": "rfmm",
        "artifactId": "rfmm-maven-plugin"
      }
    ]
  }
}

https://repo.maven.apache.org/maven2/io/github/stephenc/maven/rfmm-maven-plugin/repo-metadata.json would be:

{
  "central":{
    "modified":"2014-01-16T09:55:49.243Z",
    "artifact":[
      "1.0"
    ]
  }
}
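
To illustrate why the counter-proposal may simplify aggregating proxies, here is a hedged Python sketch. The proxy no longer resolves conflicts; it only places each upstream's content under its repository id, and the consuming client folds the entries together. The function names and the "snapshots" repository id in the usage are hypothetical, repository ids are assumed to be unique across upstreams, and the folding rules ("use latest" for modified, "merge as a unique set" for group and artifact) are the ones suggested in the comments on this page rather than anything agreed.

```python
def aggregate(upstream_documents: list) -> dict:
    """Proxy side: each upstream's content stays under its own repository id."""
    combined = {}
    for document in upstream_documents:
        combined.update(document)  # top-level keys are repository ids
    return combined


def effective_view(document: dict) -> dict:
    """Client side: fold the per-repository entries into one set of standard keys."""
    view = {"modified": "", "group": [], "artifact": []}
    for entry in document.values():
        # "use latest": same-format ISO 8601 UTC timestamps sort correctly as strings
        view["modified"] = max(view["modified"], entry.get("modified", ""))
        for key in ("group", "artifact"):
            # "merge as a unique set", keeping first-seen order
            for item in entry.get(key, []):
                if item not in view[key]:
                    view[key].append(item)
    return {key: value for key, value in view.items() if value}  # omit absent keys


central = {"central": {"modified": "2014-01-16T09:55:49.243Z", "artifact": ["1.0"]}}
snapshots = {"snapshots": {"modified": "2014-02-01T00:00:00.000Z", "artifact": ["1.0", "1.1"]}}
combined = aggregate([central, snapshots])
```

Tool-specific keys such as org.apache.maven:plugins are deliberately left out of effective_view, since merging them would be the concern of the tool that owns the key.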



3 Comments

  1. How about using '+' as separator in both repo layout and coordinate? The '+' is not a valid identifier character, so there won't be any collisions and it is immediately clear which part is the platformId

    i.e. groupId:artifactId+platformId:version:classifier:type and org/apache/maven/artifact+myos/1.2.3/artifact+myos.jar

    1. Robert Scholte, the critical part is that a 4.0.0 consumer must be able to get the platform-specific artifacts. If + is not valid in artifactId, then a 4.0.0 POM cannot depend on the platform-specific artifacts, which is a problem.

      If + is valid in artifactId, then it is no different from -, except that we don't even get to follow the existing convention people have been "following" on central (from what I can see).

  2. Considering my counter-proposal: we would still need to define merging strategies for the different keys, but tool-specific merging strategies then become a concern only of the tool that consumes the tool-specific key. That is, we only have to document how to merge the modified, group and artifact keys, which would be respectively "use latest", "merge as a unique set" and "merge as a unique set". The org.apache.maven:plugins namespaced key would be an internal Maven concern, though we would probably use something like "merge and overwrite values, in reverse order of the repository ids configured by the user" so that the "first" defined repository would always "win".

    I kind of like this counter-proposal