Description
This is a meta-issue that provides a quick overview of what needs to be done in order to change dependency tracking from source file-based to class name-based.
Tasks
Class-based dependency tracking can be implemented in two phases. The third phase is dedicated to testing on large projects.
Invalidate by inheritance based on class dependencies (phase 1)
Tasks marked as completed are completed only in a prototype implementation. No changes have been merged into sbt yet.
- add tracking of declared classes in source files; we want to invalidate classes, but in the end we need a className -> srcFile relation because we can only recompile source files
- add dependency tracking of top-level (not owned by any other class) imports; we'll assign dependencies introduced by those imports to the first top level class/object/trait declared in the source file (see notes below for details)
- track dependencies at class level (and use `declaredClasses` to immediately map them back to files until the rest of the algorithm is implemented)
- add tracking of children of sealed parents; we should track this in the `API` object corresponding to the sealed parent; this will enable us to perform proper invalidation when a new case (class) is introduced to a sealed hierarchy
- add per (top level) class tracking of api hashes; tracking of api hashes per each (even nested) class separately will be done in phase 2
- handle dependencies coming from local and anonymous classes (see discussion below)
- switch invalidation of dependencies by inheritance to invalidate at class level instead of at source level (would fix #2320, "Adding a method to a class forces dependent (by inheritance) files to recompile")
- distinguish between source and binary class names. Introduce a new relation `binaryClassName: Relation[String, String]` and use it whenever the conversion is needed.
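To make the two relations from the list above concrete, here is a minimal sketch. It assumes a simplified `Rel` type (a hypothetical stand-in, not sbt's actual `Relation`), and the file and class names are made up for illustration. It shows how invalidated class names could be mapped back to the source files that must be recompiled, and how a binary class name could be translated back to a source class name.

```scala
// Hypothetical, simplified many-to-many relation; NOT sbt's actual Relation type.
final case class Rel[A, B](pairs: Set[(A, B)]) {
  def forward(a: A): Set[B] = pairs.filter(_._1 == a).map(_._2)
  def reverse(b: B): Set[A] = pairs.filter(_._2 == b).map(_._1)
}

object DeclaredClassesSketch {
  // srcFile -> source class names declared in that file (made-up example data).
  val declaredClasses: Rel[String, String] = Rel(Set(
    "A.scala" -> "pkg.A",
    "A.scala" -> "pkg.A.Inner",
    "B.scala" -> "pkg.B"
  ))

  // source class name -> binary class name (nested classes differ: '.' vs '$').
  val binaryClassName: Rel[String, String] = Rel(Set(
    "pkg.A" -> "pkg.A",
    "pkg.A.Inner" -> "pkg.A$Inner",
    "pkg.B" -> "pkg.B"
  ))

  // Invalidation is decided per class, but only whole source files can be recompiled.
  def filesToRecompile(invalidatedClasses: Set[String]): Set[String] =
    invalidatedClasses.flatMap(declaredClasses.reverse)

  // Translate a binary name (as seen in a class file) back to source class names.
  def sourceClassName(binaryName: String): Set[String] =
    binaryClassName.reverse(binaryName)
}
```

In this sketch, invalidating `pkg.A.Inner` and `pkg.B` yields both `A.scala` and `B.scala`, which is the "map back to files" step that phase 1 relies on until the rest of the algorithm works on class names directly.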
Track name hashes per class (phase 2)
- refactor tracking of APIs to be done at class level instead of at source level
- extract APIs of all classes (including inner ones) separately
- track name hashes at class level instead of at source level (see also #2319, "Adding/removing/renaming a class always forces dependent source file to recompile")
- implement the rest of the invalidation logic (member reference, name hashing) based on class names (get rid of mapping through `declaredClasses` in almost the entire algorithm) (would fix #2320, "Adding a method to a class forces dependent (by inheritance) files to recompile")
Only once the last bullet is merged will we see improvements to incremental compilation when a large number of nested classes is involved (e.g. in the Scala compiler's case).
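The difference between tracking name hashes per source file and per class can be sketched as follows. This is a simplified model, not the actual name hashing implementation: `NameHashes` maps a simple name to the set of signature hashes of members with that name, and all class names, member names and hash values are made up for illustration.

```scala
object NameHashSketch {
  // Simplified model: simple name -> hashes of the member signatures carrying that name.
  type NameHashes = Map[String, Set[Int]]

  // Names that were added, removed or changed between two API snapshots.
  def modifiedNames(before: NameHashes, after: NameHashes): Set[String] =
    (before.keySet ++ after.keySet).filter(n => before.get(n) != after.get(n))

  // A member-reference dependent is invalidated only if it uses a modified name.
  def invalidated(usedNames: Set[String], before: NameHashes, after: NameHashes): Boolean =
    (usedNames intersect modifiedNames(before, after)).nonEmpty

  // Source-level view: merge the hashes of all classes declared in one file, name by name.
  def combine(perClass: Map[String, NameHashes]): NameHashes =
    perClass.values.foldLeft(Map.empty[String, Set[Int]]) { (acc, hashes) =>
      hashes.foldLeft(acc) { case (m, (name, hs)) =>
        m.updated(name, m.getOrElse(name, Set.empty[Int]) ++ hs)
      }
    }

  // Made-up scenario: Square is added to the file that already declares Circle.
  val circle: NameHashes = Map("Circle" -> Set(10), "area" -> Set(1))
  val square: NameHashes = Map("Square" -> Set(20), "area" -> Set(2), "side" -> Set(3))
  val before: Map[String, NameHashes] = Map("Circle" -> circle)
  val after: Map[String, NameHashes] = before + ("Square" -> square)

  // Names used by a hypothetical client that only references Circle.area.
  val usedByClient: Set[String] = Set("Circle", "area")

  // Per source file: the merged entry for "area" changes, so the client is invalidated.
  val sourceLevel: Boolean = invalidated(usedByClient, combine(before), combine(after))
  // Per class: Circle's own hashes are unchanged, so the client is left alone.
  val classLevel: Boolean = invalidated(usedByClient, before("Circle"), after("Circle"))
}
```

Under these assumptions, `sourceLevel` is `true` and `classLevel` is `false`: introducing a new class no longer disturbs dependents of the classes that were already in the file.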
Testing, bug fixing and benchmarking (phase 3)
- test with the Scalatest repo
- test with the scala repo using the sbt build
- test with the specs2 repo
- fix handling of anonymous and local classes defined in Java; logic similar to the handling of local Scala classes will have to be implemented (see zinc#192, "Java Anonymous and local classes are not handled properly")
- index only classes declared (not inherited) in source files (see zinc#174, "zinc's handling of As Seen From")
- benchmark `;clean;compile` performance
- simplify classpath and Analysis instance lookup in incremental compiler (I think the number of classpath lookups can be reduced now)
Merge changes upstream, prepare for shipping (phase 4)
- determine where this work will be merged
Targeted sbt version
This will most likely ship with sbt 1.0-M1.
Benefits
The improved dependency tracking delivers up to 40x speedups of incremental compilation in the scenarios tested. See the benchmarking results in #1104 (comment).
The speedups are caused by fixing two main issues:
1. #2319 ("Adding/removing/renaming a class always forces dependent source file to recompile") would be fixed once name hashes are tracked per class. This way, introducing a new class and the members that come with it would not affect source files that depend (by member ref) on already existing classes.
2. Adding members (like methods) to a class would affect only classes that inherit from that class. At the moment, adding a member to a class that nobody inherits from can trigger invalidation of all descendants of all classes defined in the same source file (see #2320, "Adding a method to a class forces dependent (by inheritance) files to recompile", for a specific scenario).
Issue 1 is likely to be triggered by any code base that defines more than one class in a single source file. Issue 2 affects code bases with a large number of nested classes that are inherited.
One example is the Scala compiler itself. Even with name hashing, we invalidate too much and the code edit cycle becomes long whenever a new member is introduced.
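The over-invalidation by inheritance can be illustrated with a small sketch that contrasts the two strategies. It is a toy model under stated assumptions, not the incremental compiler's algorithm: the class names, file names and the `parents` map are all hypothetical. Classes `A` and `B` live in the same file, `C` extends `B`, `D` extends `C`, and nobody inherits from `A`.

```scala
object InheritanceInvalidationSketch {
  // className -> srcFile (made-up example data).
  val srcFileOf: Map[String, String] = Map(
    "A" -> "AB.scala",
    "B" -> "AB.scala",
    "C" -> "C.scala", // class C extends B
    "D" -> "D.scala"  // class D extends C
  )

  // child class -> its direct parents.
  val parents: Map[String, Set[String]] = Map("C" -> Set("B"), "D" -> Set("C"))

  // All transitive descendants of the given classes (the roots themselves excluded).
  def descendants(roots: Set[String]): Set[String] = {
    @annotation.tailrec
    def loop(frontier: Set[String], seen: Set[String]): Set[String] = {
      val next = parents.filter { case (_, ps) => (ps intersect frontier).nonEmpty }.keySet -- seen
      if (next.isEmpty) seen else loop(next, seen ++ next)
    }
    loop(roots, Set.empty)
  }

  // Class-level: only descendants of the changed class are invalidated.
  def classLevel(changedClass: String): Set[String] =
    descendants(Set(changedClass)).map(srcFileOf)

  // Source-level: the change is attributed to the whole file, so descendants
  // of every class declared in that file are invalidated.
  def sourceLevel(changedClass: String): Set[String] = {
    val file = srcFileOf(changedClass)
    val declaredInSameFile = srcFileOf.filter(_._2 == file).keySet
    descendants(declaredInSameFile).map(srcFileOf)
  }
}
```

In this model, adding a method to `A` invalidates nothing at class level, while the source-level strategy invalidates both `C.scala` and `D.scala` only because `B` happens to share a file with `A`.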
The work described in this issue is funded by Lightbend. I'm working on it as a contractor.