Friday, November 23, 2012

RfC: Improving Maven's Performance

I typically work on projects that are relatively complex, say one parent project and 20 modules or so. To handle the complexity, I have learned to use and appreciate Maven. OTOH, after 8 years or so with Maven, I still miss some aspects of Ant builds, in particular the speed. Maven does a good job when it comes to making build scripts understandable (the biggest problem of Ant), but it can be painfully slow. Why is that? I could name several reasons, but the most obvious seems to be that Maven always builds the whole project, whereas Ant allows you to implement logic like

   if (!module.isUpToDate()) {
     // Build it
   } else {
     // Ignore it
   }
Of course, Ant's syntax is completely different, but that's not the point, unless you are a fanatic XML hater and really believe that a Groovy or JSON syntax is faster by definition. (If so, stop reading, you picked the wrong posting!)
The absence of such an uptodate check isn't necessarily a problem. Most Maven plugins nowadays implement an uptodate check themselves. OTOH, if every plugin does its own uptodate check, and the module is possibly made up of other modules itself, then it adds up.
Apart from that, uptodate checks can be unnecessarily slow. Consider the following situation, which I encounter quite frequently:
A module contains an XML schema. JAXB is used to create Java classes from the schema. If the schema is complex, then the module might easily have several thousand Java source files.
This means that the Compiler plugin needs to check the timestamps of several thousand Java and .class files before it can detect that it is uptodate. Likewise, the Jar plugin will check the same thousands of .class files and compare them against the jar file before building it.
That's sad, because we could have a very easy and quick uptodate check by comparing the timestamps of the XML schema and the pom file (it does affect the build, doesn't it?) with that of the jar file. If we notice that the jar file is uptodate with regard to the other two, then we could ignore the module altogether: ignoring it would mean removing it from the reactor completely and not invoking the Compiler or Jar plugins at all. Okay, that would help, but how do we achieve that without breaking the complete logic of Maven? Well, here's my proposal:
  1. Introduce a new lifecycle phase into Maven, which comes before everything else. Let's call it "init". In other words, a typical Maven lifecycle would be "init, validate, compile, test, package, integration-test, verify, install, deploy" (see this document if you need to learn about these phases).
  2. Create a new project property called "uptodate" with a default value of false (upwards compatibility).
  3. Create a new Maven plugin called "maven-init-plugin" with a configuration like
       groupId: org.apache.maven.plugins
       artifactId: maven-init-plugin
       configuration:
         sourceResources:
           sourceResource:
             directory: src/main/schema
             includes:
               include: **/*.xsd
           sourceResource:
             directory: .
             includes:
               include: pom.xml
         targetResources:
           targetResource:
             directory: ${project.build.directory}
             includes:
               include: *.jar
        (Excuse the crude syntax, I have no idea how to display XML on blogspot.com!
         I hope you get the idea, though.)
        The plugin's purpose would be to perform an uptodate check by comparing source
        and target resources, and to set the "uptodate" flag accordingly.
      


  4. Modify the Maven core as follows: After the "init" phase, search for modules with isUptodate() == true and remove those modules from the reactor. Then run the other lifecycle phases.

That's it. Perfectly upwards compatible. Moderate changes. Much faster builds. How about that?
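
To make the idea a bit more concrete, here is a minimal sketch in plain Java of what the check and the reactor filtering could look like. Everything in it is an assumption made for illustration: the class UpToDateCheck and its methods are hypothetical, it does not use any Maven API, it hard-codes the directories from the configuration above (with "target" standing in for ${project.build.directory}), and it matches plain file names such as "*.xsd" instead of Ant-style "**/*.xsd" patterns.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical sketch; none of these names exist in Maven. */
final class UpToDateCheck {

    /** Collects regular files below dir (up to maxDepth levels) whose file name matches the glob. */
    static List<Path> find(Path dir, String fileNameGlob, int maxDepth) throws IOException {
        List<Path> result = new ArrayList<>();
        if (!Files.isDirectory(dir)) {
            return result; // a missing directory simply contributes no files
        }
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + fileNameGlob);
        try (Stream<Path> files = Files.walk(dir, maxDepth)) {
            for (Path p : (Iterable<Path>) files::iterator) {
                if (Files.isRegularFile(p) && matcher.matches(p.getFileName())) {
                    result.add(p);
                }
            }
        }
        return result;
    }

    /**
     * True if at least one target exists and no source is newer than the oldest target.
     * Equal timestamps count as uptodate here; a stricter check could rebuild in that case.
     */
    static boolean isUpToDate(List<Path> sources, List<Path> targets) throws IOException {
        if (targets.isEmpty()) {
            return false; // nothing has been built yet
        }
        long newestSource = Long.MIN_VALUE;
        for (Path source : sources) {
            newestSource = Math.max(newestSource, Files.getLastModifiedTime(source).toMillis());
        }
        long oldestTarget = Long.MAX_VALUE;
        for (Path target : targets) {
            oldestTarget = Math.min(oldestTarget, Files.getLastModifiedTime(target).toMillis());
        }
        return newestSource <= oldestTarget;
    }

    /** Step 4 of the proposal: keep only the modules that are not uptodate. */
    static List<Path> modulesToBuild(List<Path> moduleDirs) throws IOException {
        List<Path> remaining = new ArrayList<>();
        for (Path module : moduleDirs) {
            // Sources: all schema files plus the module's pom.xml.
            List<Path> sources = find(module.resolve("src/main/schema"), "*.xsd", Integer.MAX_VALUE);
            sources.add(module.resolve("pom.xml"));
            // Targets: jar files directly inside the build directory.
            List<Path> targets = find(module.resolve("target"), "*.jar", 1);
            if (!isUpToDate(sources, targets)) {
                remaining.add(module);
            }
        }
        return remaining;
    }
}
```

The point of the sketch is the cost: for the JAXB module described above, the check touches one schema file, one pom file and one jar file, instead of several thousand .java and .class files.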

    6 comments:

    Nick Stolwijk said...

    Try to break up your build into releasable components, e.g. your JAXB example. I presume your schema is fairly solid, so why not release your JAXB module in the same way as your schema? Remove it from the aggregator, move it to its own version control and depend on the release version.

    If you can do this with multiple parts of your build/project, it becomes much more manageable.

    Jochen Wiedmann said...

    Nick, while you are, of course, perfectly right and I would do so in any open source project, I don't do that in my professional work. I attempted to do so in the past and noticed that I traded build speed for a lot of other problems for my colleagues, who were unaware of the advantages and consequences.

    Kristian Rosenvold said...

    Technically you're looking for the ability of a mojo to halt the execution plan of a given project build without failing the build. Sounds like such a change should be attacked from that angle.

    Nick Stolwijk said...

    @Jochen, I also stumbled upon the "problems" with colleagues, but solved that by teaching them instead of adjusting the build.

    rfscholte said...

    Mark Struberg noticed some issues with the way plugins discover whether they should take action. He has started a wiki page about incremental builds, which describes some use cases and strategies for handling them. That would be the right place to share ideas.

    rfscholte said...

    Hyperlink is not visible, should be https://cwiki.apache.org/MAVEN/incremental-builds.html