Status WIP
Version 
Issue(s) MCOMPILER-21
Sources trunk
Developer(s) Mark Struberg

As of version 3.0.4, Apache Maven doesn't support incremental builds very well. Because of that, most people run mvn clean compile instead of mvn compile, for example. This is pretty time-consuming and can be improved a lot.

The goal is to make mvn compile, mvn verify, mvn install, ... (all stuff without clean) usable for almost all situations.

In general it's better to unnecessarily force a bit more work than to miss a change, because in the latter case we would end up with broken artifacts and people would go back to using the clean goal.

Rationale - aka what is broken?

Check out the sample project:

git clone git://github.com/struberg/maventest.git

First please set the maven-compiler-plugin version back to 2.5 (the latest release). Then build the whole project with

mvn clean install

Now change the code of BeanA and rename getI() to getI_doesntexistanymore(). This means that BeanA2.java as well as BeanB.java should fail to compile!

But if we try to build the project with

mvn install

without any clean lifecycle, then we see 3 bugs:

  1. The Maven build still succeeds in compiling the project.
  2. Maven even generates a jar which contains broken classes.
  3. moduleB does not get recompiled and is thus broken as well.

And this is just the tip of the iceberg.

Solution

In Maven, 2 parts need to be tweaked for this.

A.) Incremental Module Builds

The following checks should be implemented in the maven-core reactor code (or what remains of it). If any of them indicates a change, we force a 'clean' on the module and on all dependent downstream modules.

  1. If an artifact in the Maven repository changed. The change of an artifact might get detected by its md5.
  2. If an activated profile or an evaluated property changed. We might store the hash (md5) of the evaluated profiles and properties in a file in target/. We do not store the evaluated properties themselves in a file for security reasons (passwords, etc.).
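
A minimal sketch of the profile/property check, assuming the activated profile ids and the evaluated properties are already available as plain collections. The class and method names (BuildConfigFingerprint, fingerprint) are made up for illustration and are not part of maven-core. Only the resulting md5 would be written to target/, never the property values themselves.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper for illustration; not an existing maven-core class.
class BuildConfigFingerprint {

    /** Returns a hex md5 over the sorted profile ids and the sorted key=value pairs. */
    static String fingerprint(List<String> activeProfileIds, Map<String, String> properties) {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
        // Sort both inputs so that iteration order cannot change the result.
        for (String id : new TreeSet<>(activeProfileIds)) {
            feed(md5, "profile:" + id);
        }
        for (Map.Entry<String, String> e : new TreeMap<>(properties).entrySet()) {
            feed(md5, "property:" + e.getKey() + "=" + e.getValue());
        }
        return String.format("%032x", new BigInteger(1, md5.digest()));
    }

    private static void feed(MessageDigest md5, String entry) {
        md5.update(entry.getBytes(StandardCharsets.UTF_8));
        md5.update((byte) 0); // separator so adjacent entries cannot run together
    }
}
```

The reactor could compare this value with the one written by the previous build and force a 'clean' on the module whenever the two differ.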

B.) Plugin support for Incremental Builds

Each plugin needs to check on its own whether it should perform its tasks or not. Not every plugin needs the full dependency graph, and other plugins (e.g. maven-ear-plugin) even have 'manual' dependencies which are not reflected in the dependency graph. Thus all plugins need to check for themselves whether they need to do something or not. The first plugin which kicks in after detecting a change will create a result, and this result might trigger work in another plugin/phase.

Strategies for change detection

A plugin has a few ways to detect that it needs to do some work

  1. If the input file has the same timestamp as the output file or a newer one.
  2. If the input file is newer than the timestamp when the build got started.
  3. Hash codes. A plugin might store the md5 of its dependencies or input files to detect a change.
  4. Additional pid or status files. E.g. the maven-shade-plugin could store the work state + md5 of the generated result in such a file.
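
The two timestamp-based strategies could look roughly like the sketch below. StalenessCheck and its method names are hypothetical and are not an existing Maven API. The logic works on plain modification times so that it stays easy to test.

```java
import java.io.File;

// Hypothetical helper for illustration; not an existing Maven API.
class StalenessCheck {

    /** Strategy 1: stale if the output is missing or the input is not older than it. */
    static boolean isStale(long inputModified, long outputModified) {
        // File.lastModified() reports 0 for a file that does not exist.
        return outputModified == 0L || inputModified >= outputModified;
    }

    /** Convenience overload that reads the modification times from the files. */
    static boolean isStale(File input, File output) {
        return isStale(input.lastModified(), output.lastModified());
    }

    /** Strategy 2: the input was modified after the current build started. */
    static boolean changedSinceBuildStart(long inputModified, long buildStartMillis) {
        return inputModified > buildStartMillis;
    }
}
```

Treating equal timestamps as stale follows the "rather do a bit more work than miss a change" rule, since many file systems only record modification times with coarse resolution.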

It's suggested that all this additional information gets stored in ./target/maven.status/${plugin-name}.status
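
A rough sketch of how a plugin might read and write such a status file, assuming the file simply maps input names to their md5 in java.util.Properties format. The file format, the class PluginStatusFile and its methods are assumptions made for illustration; the proposal only fixes the location.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper for illustration; file format and names are assumptions.
class PluginStatusFile {

    /** Resolves target/maven.status/<pluginName>.status below the module directory. */
    static Path locate(Path projectDir, String pluginName) {
        return projectDir.resolve("target").resolve("maven.status").resolve(pluginName + ".status");
    }

    /** Loads the hashes recorded by the previous build; empty if there is no status file. */
    static Map<String, String> load(Path statusFile) {
        Map<String, String> previous = new HashMap<>();
        if (!Files.exists(statusFile)) {
            return previous;
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(statusFile)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String name : props.stringPropertyNames()) {
            previous.put(name, props.getProperty(name));
        }
        return previous;
    }

    /** Writes the current hashes, creating target/maven.status/ if necessary. */
    static void store(Path statusFile, Map<String, String> hashes) {
        Properties props = new Properties();
        props.putAll(hashes);
        try {
            Files.createDirectories(statusFile.getParent());
            try (OutputStream out = Files.newOutputStream(statusFile)) {
                props.store(out, "input name -> md5");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True if an input was added, removed or has a different md5 than last time. */
    static boolean hasChanged(Path statusFile, Map<String, String> currentHashes) {
        return !load(statusFile).equals(currentHashes);
    }
}
```

A plugin would call hasChanged() before doing its work and store() after it has finished successfully, so that a failed run is retried on the next build.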

4 Comments

  1. Just to verify:
    it's all about classpath files and there are 3 types:
    1. project files (a single module, like a jar)
    2. reactor files
    3. dependencies

    It is very unexpected, that the first already causes trouble.
    The problem is actually the SourceMapping implementation. Right now we have a one-to-one implementation with the SuffixMapping and a many-to-one implementation with the SingleTargetSourceMapping.
    IIRC I've written another implementation of the SourceMapping
    based on a directory, making it a many-to-many implementation. That's what we need here (I'll need to search where I needed this implementation).
    And there's one feature which we need to add: a succeedFast-option. We're not interested in all the includes (files matching the criteria), if there's already one, then that's enough to recompile all the sources.

    For the second, Maven is capable of switching from the classes folder to the specific jar during a package. If such a file (the jar or one of the files under classes) is newer than the start of the Maven build, a full recompile is required.

    For the third you would need to check the timestamp of the end of the previous build. If a dependency is newer, a full recompile is required.

    1. 1-3 is ok if you mean 'dependencies' in whatever form. I'd rather name them 'build input'.

      But it's not only about classpath files. For the compiler plugin it might be, but other plugins might even query the database...

      > It is very unexpected, that the first already causes trouble.
      Even Jason verified yesterday that the m-compiler-plugin doesn't detect cross-class changes properly in 2.5. This works in m2eclipse just because the Eclipse JDT does the compile for it.

      Not sure about the SourceMapping. There can be so many cases. To analyze that we would need to build a complete graph of the whole source like Idea, Netbeans and Eclipse do internally. But it is way too expensive to regenerate that over and over again. Compilers are fast enough today to simply recompile once we hit the first dirty file.

      > For the third you would need to check the timestamp of the end of the previous build.
      My proposal is to store a list of all dependency names + their md5 hashes in a file in target/maven-status/. Whenever a dependency gets added/removed/changed (md5 different) then we trigger a 'clean' on this module. That way we can also automatically treat timestamped and even non-timestamped SNAPSHOT dependencies.

  2. I'm still not in favor of the auto-clean option. Suppose you added/changed a test, this would only modify the target/test-classes.
    Now it is really expensive if you would do a clean, since you haven't changed the main code. This could trigger a build of the complete multimodule project.

    Somehow the BuildContext must know its input files and output files. If there's a mismatch between the two, only the targeted output directory (normally either target/classes or target/test-classes) should be cleaned.

    That would mean that before generate-sources and generate-test-sources you need to calculate the compile-plan and test-compile-plan.

  3. Greetings, I'd like to share thoughts and experiences relating to an incremental build system that I have developed and have the go ahead with my company to share with Apache. I'll highlight some key points and see if there is interest from Maven contributors.

    Note, I shared the design approach with Apache Ivy and core contributors expressed interest especially for "large build projects".

    Please comment back if not clear. Here it is.

    1. Careful per module cleaning is critical to handling cases of deleted resources.
    Edit: Being careful includes cleaning using the prior build module's SCM workspace view, else you can't clean deleted resources correctly.

    2. Instead of eliminating the per module clean step, you eliminate the unnecessary transitive module rebuilds that are performed as a result of a known module change. The system should not allow the user to turn off the clean or rely on them enabling it. It has to happen.

    3. The system must analyze the semantic change and classify it in such a way that the subsequent transitive rebuilding can, in part, be driven by the nature of the change.

    4. The build system Dependency DAG traversal effectively starts in the changed leaf modules. A "delta" module is one that we are now visiting and the system calls out to a "classifier-indexer" engine before unwinding to consider if next transitive rebuild is needed.

    5. The classifier-indexer only gets called on delta modules.
    This is the jewel. The system uses an artifact introspector and comparator that, given a just produced artifact, say a jar file, uses BCEL to introspect all the class files and update an index that records all the details of what classes, methods, fields, etc. the module artifact produces. The comparator compares and then classifies the change as being a) insignificant or b) significant.

    6. After change classification and index updating, the classified change is captured in a context object and the traversal unwinds the DFS to the next, possibly, dependent module.

    7. The next module in the DAG traversal is now visited and the incremental engine is invoked, and control is passed to a chained plugin system that answers the question "does this module need rebuilding(context)?"

    8. This "is-rebuild-needed" system includes core supplied and user supplied implementations. One core impl considers the index for this module, updated in previous builds, and considers the context classification. If a change is "insignificant" then it returns "no", else it then determines whether the significant change, typically an API change, really impacts this module.

    9. Now, here's another clever step: some significant changes are common, like adding API or changing visibility, others are uncommon, like deletions. Before we rebuild the module because of a significant change we consult the trusted index to determine that the delta change actually does impact this candidate. We have the opportunity to avoid rebuilding blindly and paying for the costly clean+install.

    10. Recall the "is-rebuild-needed" is a chain. Even if one answers "no", another can answer "yes". An example would be one that said it must be rebuilt because it needs to package dependent resources, like a war file. There must be metadata on the module to help that plugin do its work. My system in Ivy uses configuration/scope to do this. Maven scope is not as rich as Ivy so another way of tagging a module artifact as being a packaging is needed, could be just by file extension like .war.

    11. That's the gist, and it sounds harder and slower than it really is. Remember the classification-indexer is only invoked on delta modules. In practice, it's millisecs on commodity systems.

    Sorry if this is not the best way to get the point across, but because I have yet to fold it into a Maven prototype I didn't want to propose it on its own.

    I hope I can be encouraged to work this through with motivated contributors.

    Rich