Description
Package component level builds
What's the point, and what is it?
Stack uses cabal "simple" (which is very close to the Setup.hs commands, except that it's a binary) to actually build packages.
That means that, for each package selected in the "Plan", it gathers all the info required by cabal simple and then calls it.
Currently, Stack uses cabal simple through package-level builds; that is, for each package it calls:
- "configure" without naming a component
- "build" with a set of components but it has no effect
- "copy" without naming a component
...etc
Component-level builds basically do the same as before,
but each cabal simple call is targeted at a single component of a package instead,
for instance:
- configure sublib
- build sublib
- register sublib
- configure exe1
- build exe1
- copy exe1
This is for a case where we have an exe1 depending on a sublib.
Note that in this case the intra-package dependency has to be handled by Stack,
whereas it's currently handled by cabal simple.
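As a rough illustration of the sequence above, here is a minimal sketch of how the per-component calls could be enumerated once the components are already in dependency order. The names `Component`, `Step`, `stepsFor` and `componentCalls` are made up for this example and are not Stack's types; the only thing taken from the text is that a library gets registered while an executable gets copied.

```haskell
-- | Hypothetical, simplified stand-in for a component of one package.
data Component
  = SubLib String
  | Exe String
  deriving (Eq, Show)

-- | The cabal simple commands named in the list above.
data Step = Configure | Build | Register | Copy
  deriving (Eq, Show)

componentName :: Component -> String
componentName (SubLib name) = name
componentName (Exe name) = name

-- | Libraries are registered after being built, executables are copied.
stepsFor :: Component -> [Step]
stepsFor (SubLib _) = [Configure, Build, Register]
stepsFor (Exe _) = [Configure, Build, Copy]

-- | One call per (step, component). The input is assumed to be in
-- dependency order already; this function does not sort anything.
componentCalls :: [Component] -> [(Step, String)]
componentCalls comps =
  [ (step, componentName comp) | comp <- comps, step <- stepsFor comp ]
```

Given `[SubLib "sublib", Exe "exe1"]`, `componentCalls` yields the six calls listed above, in the same order.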
Doing this in Stack land would probably resolve many issues with over-building stuff but, above all, it's a hard requirement for making Backpack work (Backpack cannot work with the current style of builds). I believe that's enough incentive to adopt this new style. Besides, it would also bring Stack closer to the cabal-install CLI.
Some architecture refactoring
In current Stack, we have many occurrences of "Set NamedComponent" or "Map StackUnqualCompName XX".
Given the requirements for component-based builds, we are going to use a lot more of those, in
even more distinct flavors than now, which I don't think will scale well.
We also have many occurrences of the Library or Executable constructors (see the Installed data type), which again is
redundant to some extent.
What I propose is that we replace all of these with a few datatypes, a phantom type and a type family,
which would encompass all use cases through the same constructors.
First, the core data structures:
data AllLibrary collection (useCase :: UseCase) = AllLibrary
  { dcLibrary :: !(Maybe (CompInfo useCase StackLibrary))
    -- ^ The main library target or information.
  , dcSubLibraries :: !(collection (CompInfo useCase StackLibrary))
    -- ^ The sublibraries target or information.
  }

-- | This subdivision makes sense because it represents "installable components".
data AllLibExe collection (useCase :: UseCase) = AllLibExe
  { icLibrary :: {-# UNPACK #-} !(AllLibrary collection useCase)
    -- ^ The main or sub library target or information.
  , icExecutables :: !(collection (CompInfo useCase StackExecutable))
    -- ^ The executables target or information.
  }

data AllTestBench collection (useCase :: UseCase) = AllTestBench
  { acTestSuites :: !(collection (CompInfo useCase StackTestSuite))
    -- ^ The test suites target or information.
  , acBenchmarks :: !(collection (CompInfo useCase StackBenchmark))
    -- ^ The benchmarks target or information.
  }

-- | A data structure to centralize all aspects of component collections.
-- Whether it's a Set, a Map or a CompCollection, or whether you only want component names,
-- it should all use the same data structure.
data AllComponent collection (useCase :: UseCase) = AllComponent
  { acForeignLibraries :: collection (CompInfo useCase StackForeignLibrary)
    -- ^ The foreign libraries target or information.
  , acTestBench :: {-# UNPACK #-} !(AllTestBench collection useCase)
    -- ^ The test suites and benchmarks target or information.
  , acAllLibExe :: {-# UNPACK #-} !(AllLibExe collection useCase)
    -- ^ The libraries and executables target or information.
  }
And then the use case type family:
-- | These are all the use cases for the AllComponent type.
-- This is only meant to be used as an input for the 'CompInfo' type family.
data UseCase
  = JustNames
    -- ^ Sometimes we only need the names of the components.
  | AllCompInfo
    -- ^ Or the entire cabal info that we keep, see the "Stack.Types.Component" module.
    -- In particular, package components are represented as "AllComponent CompCollection AllCompInfo".
  | MissingPresentGhcPkgId
    -- ^ When we construct the plan for building packages, we have to track what's
    -- been installed and what's missing, also at the component level.
  | InstalledGhcPkgIdWithLocation
    -- ^ When we retrieve the preexisting info from ghc's package database or the file system,
    -- we want to know for all packages the library data or executable path they have.
  | ModuleFileMap
    -- ^ In GHCi we have to keep track of the module files at the component level.
  | CabalFileMap
    -- ^ In GHCi we have to keep track of the cabal files at the component level.

type family CompInfo (useCase :: UseCase) compType where
  CompInfo JustNames _ = StackUnqualCompName
  CompInfo AllCompInfo compType = compType
  CompInfo MissingPresentGhcPkgId StackLibrary = GhcPkgId
  CompInfo MissingPresentGhcPkgId _ = ()
  CompInfo InstalledGhcPkgIdWithLocation StackLibrary = (InstallLocation, GhcPkgId)
  CompInfo InstalledGhcPkgIdWithLocation StackExecutable = InstallLocation
  CompInfo InstalledGhcPkgIdWithLocation _ = ()
  CompInfo ModuleFileMap _ = Map ModuleName (Path Abs File)
  CompInfo CabalFileMap _ = [DotCabalPath]
Now this may appear a bit complicated at first, but there are many benefits to this approach:
- In terms of documentation, we can see at first glance what it is that we do at the component level,
whereas it's kind of hard to search the code for all the Set NamedComponent/Map StackUnqualCompName places.
- The selection/targeting of components is easier this way; with the current design we have to check the type of NamedComponent before walking through its characteristics in the Package datatype.
- We have a finer set construction: it enables type-safe, component-restricted sets (like, give me all the libraries information == AllLibrary xx yy).
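To make the last point more concrete, here is a minimal, self-contained sketch reduced to two use cases and to the library part only. `StackUnqualCompName` and `StackLibrary` are simplified stand-ins for the real Stack types (which carry more information), and `libraryNames` is a hypothetical helper, not an existing function.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeFamilies #-}

import Data.Kind (Type)
import Data.Set (Set)
import qualified Data.Set as Set

-- Simplified stand-ins for Stack's real types (assumptions for this sketch).
newtype StackUnqualCompName = StackUnqualCompName String
  deriving (Eq, Ord, Show)

data StackLibrary = StackLibrary
  { libName :: StackUnqualCompName
  , libModules :: [String]
  }

-- Only two of the proposed use cases are kept here.
data UseCase = JustNames | AllCompInfo

type family CompInfo (useCase :: UseCase) (compType :: Type) :: Type where
  CompInfo JustNames _ = StackUnqualCompName
  CompInfo AllCompInfo compType = compType

data AllLibrary (collection :: Type -> Type) (useCase :: UseCase) = AllLibrary
  { dcLibrary :: !(Maybe (CompInfo useCase StackLibrary))
  , dcSubLibraries :: !(collection (CompInfo useCase StackLibrary))
  }

-- | Project the full library information down to a set of names.
-- The result type alone states that only library names are present.
libraryNames :: AllLibrary [] AllCompInfo -> AllLibrary Set JustNames
libraryNames (AllLibrary mainLib subLibs) = AllLibrary
  { dcLibrary = fmap libName mainLib
  , dcSubLibraries = Set.fromList (map libName subLibs)
  }
```

The same record constructor serves both the "full info" and the "names only" view, and the compiler rejects any attempt to store, say, an executable name in an `AllLibrary` value, because there is simply no field for it.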
Now let's look at a few examples to see what that would look like in practice:
-- | First, the Package datatype would be refactored to this; arguably we should unpack it:
packageComponents :: !(AllComponent CompCollection AllCompInfo)
-- And of course, we'd provide the equivalent selectors as before :
packageLibrary = dcLibrary . icLibrary . acAllLibExe . packageComponents
packageSubLibraries = dcSubLibraries . icLibrary . acAllLibExe . packageComponents
packageForeignLibraries = acForeignLibraries . packageComponents
packageTestSuites = acTestSuites . acTestBench . packageComponents
packageBenchmarks = acBenchmarks . acTestBench . packageComponents
packageExecutables = icExecutables . acAllLibExe . packageComponents
Now, what about package dependencies? In Cabal, a dependency on a package is a dependency on its main library and/or on a set of its sublibraries:
-- To represent this fact we currently have:
data DepLibrary = DepLibrary
  { dlMain :: !Bool
  , dlSublib :: Set StackUnqualCompName
  }
  deriving (Eq, Show)

data DepType
  = AsLibrary !DepLibrary
  | AsBuildTool
  deriving (Eq, Show)

-- That would become:
data DepType
  = AsLibrary !(AllLibrary Set JustNames)
  | AsBuildTool
  deriving (Eq, Show)
The source files are also mapped for GHCi through a Map keyed by NamedComponent:
-- before:
data PackageComponentFile = PackageComponentFile
  { modulePathMap :: Map NamedComponent (Map ModuleName (Path Abs File))
  , cabalFileMap :: !(Map NamedComponent [DotCabalPath])
    -- ... etc
  }

-- after:
data PackageComponentFile = PackageComponentFile
  { modulePathMap :: AllComponent (Map StackUnqualCompName) ModuleFileMap
  , cabalFileMap :: !(AllComponent (Map StackUnqualCompName) CabalFileMap)
    -- ... etc
  }
The InstalledMap datatype, which describes the things already installed in the GHC package database, is currently:
type InstalledMap = Map PackageName (InstallLocation, Installed)
-- Now things would be a bit finer grained, since components in a package can either
-- live in a snapshot or locally:
type InstalledMap = Map PackageName (AllLibExe (Map StackUnqualCompName) InstalledGhcPkgIdWithLocation)
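As an illustration of what the finer-grained InstalledMap would allow, here is a small sketch of a per-location query. Every type below is a simplified stand-in (plain strings instead of Stack's `PackageName`, `GhcPkgId` and so on, and a single use case), and `librariesAt` is a hypothetical helper introduced only for this example.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeFamilies #-}

import Data.Kind (Type)
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Maybe (maybeToList)

-- Simplified stand-ins for Stack's real types (assumptions for this sketch).
type PackageName = String
type StackUnqualCompName = String
type GhcPkgId = String

data InstallLocation = Snap | Local
  deriving (Eq, Show)

data StackLibrary
data StackExecutable

-- Only the use case needed for this example is kept.
data UseCase = InstalledGhcPkgIdWithLocation

type family CompInfo (useCase :: UseCase) (compType :: Type) :: Type where
  CompInfo InstalledGhcPkgIdWithLocation StackLibrary = (InstallLocation, GhcPkgId)
  CompInfo InstalledGhcPkgIdWithLocation StackExecutable = InstallLocation
  CompInfo InstalledGhcPkgIdWithLocation _ = ()

data AllLibrary collection (useCase :: UseCase) = AllLibrary
  { dcLibrary :: !(Maybe (CompInfo useCase StackLibrary))
  , dcSubLibraries :: !(collection (CompInfo useCase StackLibrary))
  }

data AllLibExe collection (useCase :: UseCase) = AllLibExe
  { icLibrary :: !(AllLibrary collection useCase)
  , icExecutables :: !(collection (CompInfo useCase StackExecutable))
  }

type InstalledMap =
  Map PackageName (AllLibExe (Map StackUnqualCompName) InstalledGhcPkgIdWithLocation)

-- | The GhcPkgIds of all libraries (main and sub) of a package that are
-- installed at the given location. Unknown packages give an empty list.
librariesAt :: InstallLocation -> PackageName -> InstalledMap -> [GhcPkgId]
librariesAt loc pkg installed =
  case Map.lookup pkg installed of
    Nothing -> []
    Just comps ->
      let libs = icLibrary comps
          allLibs = maybeToList (dcLibrary libs) ++ Map.elems (dcSubLibraries libs)
      in [ pkgId | (libLoc, pkgId) <- allLibs, libLoc == loc ]
```

With the current `(InstallLocation, Installed)` pair, such a query could only answer at the package level; here the location is attached to each library individually.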
Now you get it: the design would be more normalized and unified, for a small abstraction cost.
It's not strictly necessary for getting component-based builds, but I'd say it would make it significantly easier.
The idea is to bring in this datatype and then to refactor slowly, step by step, where it makes sense.
The actual task list for the component based builds
- Change ConstructPlan to account for component level installed versus to-install GhcPkgId (this would be quite significant).
- Resolve intra-package dependencies (for now we don't, we let cabal decide the order of component builds)
- Topologically sort (probably through an insertion sort though) the package components to build (probably only the library components for now) if more than one is required.
- Either subdivide Tasks into smaller parts or only refine task actions. I think for now a good step is to try to do component builds with the one-task-one-package scheme (note that we can already have two tasks per package in case of non-all-in-one builds with tests & benchmarks). That is to say, the first iteration would only bring component-build inside one package task, and then we'd enable a better datatype for task to account for component level aspects.
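For the ordering step in the list above, one possible sketch uses `stronglyConnComp` from the containers package instead of an insertion sort, mainly because it also reports dependency cycles. This is only an illustration under that assumption, not what Stack does today; `CompName` and `buildOrder` are hypothetical names, and the input is assumed to be a list of components paired with the names of the same-package components they depend on.

```haskell
import Data.Graph (SCC (..), stronglyConnComp)

-- | Hypothetical component name type; Stack has 'StackUnqualCompName'.
type CompName = String

-- | Order components so that each one comes after the components it
-- depends on. Returns 'Left' with the names involved if the intra-package
-- dependencies form a cycle. Dependencies that are not listed as
-- components themselves are ignored by 'stronglyConnComp'.
buildOrder :: [(CompName, [CompName])] -> Either [CompName] [CompName]
buildOrder deps = traverse acyclic (stronglyConnComp nodes)
  where
    -- Each node is (payload, key, keys it depends on).
    nodes = [ (name, name, ds) | (name, ds) <- deps ]
    acyclic (AcyclicSCC name) = Right name
    acyclic (CyclicSCC names) = Left names
```

For example, with exe1 depending on sublib2 and sublib2 depending on sublib1, `buildOrder` returns `Right ["sublib1", "sublib2", "exe1"]`.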
RFC @mpilgrem
Other issues relating to component-based builds
(EDIT by @mpilgrem) The issue/feature request of component-based builds has a long history at this repository. The following are related issues:
- Better component-based build system #834
- Do all-in-one package builds when there are no cyclic dependencies #1166
- Backpack for Stack #2540
- Internal libraries does not work if there's no main library #3787
- DISCUSS: Drop support for older Cabal library versions #4475
- Begin using component-based builds #4745
- Support public sublibraries #5256
- Support for packages that depend on public sublibraries #5318
- Irreproducibility: Code rebuilds only the first time I switch resolvers #5381
- Support sub-library dependencies #5659
- Support for public sublibrary dependencies #6343
- Deprecate, then remove, Stack's support of Cabal < 2.2 #6377
- stack rebuilds sublibraries that are not dependencies of the specified target #6569