TMI: Making rustpkg work
Aug. 28th, 2013 10:44 pmI know my daily posts are usually pretty low-level, so today I want to step back just a bit and explain a little bit of how rustpkg works. rustpkg is heavily modeled on the Go package manager.
rustpkg operates on workspaces. A workspace is a directory. It has several subdirectories: src, lib, bin, build. rustpkg creates the last three automatically if they don't exist.
If you are using rustpkg, you probably want to build and install Rust code that you wrote yourself. You probably also want to build and install code that other people wrote, living in version control repositories that are available online (for example, on Github).
When you check out a remote repository to build, you check it out into a workspace you already created. So if you want to use my library called "quux" (which doesn't really exist), you would do:
mkdir -p Workspace/src
cd Workspace/src
git clone http://github.com/catamorphism/quux
cd ..
rustpkg install quux
This would make a directory called Workspace/src/quux. Here, quux is a package ID. A workspace can contain several different packages with different IDs.
rustpkg actually makes this a little bit easier for you. You can also do:
mkdir -p Workspace/src
cd Workspace
rustpkg install github.com/catamorphism/quux
n.b. Next paragraph edited for clarity on 8/29/2013
This would make a directory called Workspace/src/github.com/catamorphism/quux. Since the source of the package is a remote repository, here the package ID is a URL-fragment: github.com/catamorphism/quux. Supposing quux is a library, it would be installed in Workspace/.rust/lib/libquux-HASH-VERSION.dylib, where HASH is computed by rustc and VERSION is quux's version (as per the most recent git tag). You might assume that it would be installed in Workspace/lib/..., but since the source code was fetched remotely, rustpkg uses the first workspace in the Rust path as the destination workspace. By default, the first workspace in the Rust path is CWD/.rust,, where CWD is the current working directory. You can change the Rust path by changing the RUST_PATH environment variable, but I'll get to that.
You can use one workspace for many different projects. The idea is that the sources for each project live in its own subdirectory under Workspace/src (or whatever you want to call your workspace). Or, you can make one workspace per project. Then the src/ directory would contain one subdirectory for the sources for your project, and one subdirectory for the sources for each of its dependencies.
The motivation here is to avoid so-called "dependency hell" by compartmentalizing the dependencies for different projects, rather than storing all libraries in one system directory. So if project A needs version 1.2 of library Z, and project B needs version 3.6 of library Z, no problem! The downside is potential multiple copies of libraries; we'll see how that works out in practice.
You do have a choice if you want to avoid multiple copies: you can use the RUST_PATH. RUST_PATH is an environment variable that rustpkg knows about; it's a colon-separated list of workspaces. Like:
declare -x RUST_PATH=/home/tjc/rust:/home/tjc/servo:/home/tjc/miscellaneous-useful-libraries
(if you use bash, anyway.)
It's assumed that if you add a directory to the RUST_PATH, that directory is a workspace: that is, it has a src/ directory with one or more project-specific subdirectories. If it doesn't have a src/ directory, rustpkg won't find any source files -- or installed libraries -- there.
After I implemented all of that, Jack began trying to port some of the Servo submodules to rustpkg. He ran into the issue that rustpkg always installed the build artifacts into the same workspace as the sources, assuming the sources are in a workspace. He wanted to be able to have the source in one workspace, and tell rustpkg to put the build artifacts into a different workspace. As he pointed out, RUST_PATH represents both "where to find binaries" and "where to find sources". This was by design, but that doesn't mean it's necessarily the best design.
Jack wanted to be able to support the Servo submodule scenario without too much restructing. Once ported to rustpkg, the Servo repository will look something like:
servo/src/bin.rs
servo/deps/src/foo/lib.rs
...
bin.rs is the main crate module for the Servo executable. foo is an example submodule of Servo; lib.rs is its crate module. There are many other submodules under servo/deps/src. We want to be able to build foo with rustpkg and put its build output in servo/build and servo/lib, so that they can be found when building Servo with rustpkg.
At this point, I said: why not just put the deps directory under servo/src? Jack's answer is that the Servo repository has many subprojects, so the developers want to organize them somewhat. And if the deps directory has subdirectories, the subdirectory names become part of the package ID. Since these subprojects all have their own git repositories, it starts looking a little strange for the local directory structure to affect how you refer to the package.
Then we considered whether or not to just use the extern mod foo = ... mechanism for everything. That is, don't manually store the subprojects locally; rather, write a directive in the Servo crate like extern mod foo = "github.com/mozilla/foo"; that tells rustpkg "fetch the sources in the repository named mozilla/foo on github.com, cache a local copy, and build them". That way it's up to rustpkg how to organize the local files, and the developers won't have to think about it. However, this solution does not work well if the Servo developers often make local uncommitted changes to the libraries they depend on, concurrently with developing Servo. In this scenario, rustpkg will use the revision it found in the remote repository, and not the locally modified version.
The solution we came up with is to change what's allowed to appear in the RUST_PATH. So when building Servo, if somebody wanted to build a particular submodule -- rust-geom -- first, their RUST_PATH would look like:
RUST_PATH=$HOME/servo:$HOME/servo/src/support/geom/rust-geom
and they would type rustpkg build rust-geom to install rust-geom into the $HOME/servo workspace.
I have a pull request in the queue to implement this change. I wasn't sure whether we wanted it to be transitional -- until, perhaps, the Servo repository gets restructured to make it conform to rustpkg's view of the world -- or permanent, so I put it behind a command-line flag, --rust-path-hack. Perhaps instead, rustpkg should conform to Servo's view of the world, since we are motivated by wanting to build Servo and rustc with rustpkg.
The main advantage of allowing the RUST_PATH to only contain workspaces seems to be simplicity. But for a project with many submodules that may be organized in an arbitrarily complicated way, perhaps making the RUST_PATH more flexible makes it simpler to build packages.
What am I doing wrong?
Date: 2013-09-16 05:38 pm (UTC)Re: What am I doing wrong?
Date: 2013-09-16 06:31 pm (UTC)As for the other four commands, those aren't supported yet but I should make the error messages better -- I'll update https://github.com/mozilla/rust/issues/6365
Re: What am I doing wrong?
Date: 2013-09-16 06:32 pm (UTC)Re: What am I doing wrong?
Date: 2013-09-17 09:48 pm (UTC)Re: What am I doing wrong?
Date: 2013-09-18 09:07 am (UTC)Re: What am I doing wrong?
Date: 2013-09-20 10:22 pm (UTC)Thanks!
Re: What am I doing wrong?
Date: 2013-10-16 05:12 am (UTC)