tim | Entries tagged with research

Six weeks short of what would have been my two-year anniversary as a Mozilla full-timer (January 1, 2014), I've decided to leave Mozilla to work at a startup, AlephCloud. I've worked at Mozilla for longer than I've worked at any other full-time job: first six months as an intern, and now a year and ten months as a Research Engineer.

Some people leave a job and say that they're ready to move on. I wasn't ready, and I still had a lot to contribute to Rust. I just didn't see any way that I could continue contributing to it as a Mozilla employee. I'm unlikely to elaborate on that further except in private over a beer or a cup of tea.

I still hope to contribute to Rust as a volunteer. I know it's BS to say "hope" when talking about work; if I end up feeling like I want to do it enough, I know that I will, and if I don't, I know that I won't. That said, if I find myself with time to contribute to any open source project at all, it'll be Rust. For sure, I'm committed to mentoring a Rust intern through GNOME OPW, and I'll still be committed to doing that even as a volunteer. Thus, I'll still be the contact person for anyone still interested in working on a Rust project through OPW. Larissa Shapiro will take over the role coordinator for this round of Mozilla's participation in OPW.

That said, I can't commit to continuing my work on rustpkg while a volunteer, so I'm hoping to be able to identify a new owner for it. I've talked with one person on the core team who would like to take over ownership, and will try to settle the question before my last day.

On a personal level, I've grown a lot over the past two years. I mean both that I've learned a lot about software, and that I've grown as a person. Moreover, I think I understand a lot more about how the software industry works than I did at the beginning of 2011. In general, my colleagues on the Mozilla Research team have set an example both for how to build software, and how to build a community. I wish nothing but success to the Rust project, and what's more, I intend to continue to be there to help the project advance, even if it'll be for a smaller percentage of my time.

My last day as a Mozilla employee will be Friday, November 15, and with just a weekend in between, I'm starting at AlephCloud on Monday, November 18, in the role of Principal Software Engineer. At AlephCloud, I will be programming in Haskell full-time. I wasn't exactly ever expecting to be active in the Haskell community again, but I look forward to doing work that is central to the mission of my company, and it's icing on the cake that the company's language of choice happens to be Haskell. Though I expect I'll be working from various locations, my new office will be in Sunnyvale, so I'm not going far. Especially since my two closest future colleagues both work remotely (outside the Bay Area), I'm hoping to experiment with living outside the Bay Area for a while (student loans don't pay themselves), but I'll only consider that after a couple of months from now.

I've mostly been quiet lately since a lot of what I've been doing is wrestling with Linux test bots that behave slightly differently from other Linux test bots, and that's not very useful to write about (plus, if I did, it would mostly be swearing). However, I do want to make an exciting announcement! Copied verbatim from the mailing list:

Hello,

This year, the Rust and Servo projects are participating in the GNOME
Outreach Program for Women (OPW). OPW is an opportunity for women to
take on a paid, full-time, three-month internship working on one of
several open-source projects. Interns can be located anywhere in the
world and will primarily work from home; some funding will be available
to travel to meet with the project mentors for a brief period of time.

This is the third time that Mozilla has participated as a source of
project mentors, but it's the first time that the Rust and Servo
projects specifically have participated. We're very excited about this
and encourage you to apply if you're a woman who's interested in Rust.
(OPW is an inclusive program: as per their information for applicants,
"This program is open to anyone who is a female assigned at birth and
anyone who identifies as a woman, genderqueer, genderfluid, or
genderfree regardless of gender presentation or assigned sex at birth.
Participants must be at least 18 years old at the start date of the
internship.") If you're not eligible, please spread the word to any
eligible individuals you know who have any interest in Rust or Servo.

No previous experience with Rust is required, but we do request some
experience with C and systems programming, and of course people who have
previously used Rust may make faster progress.

The deadline to apply is November 11.

The details are available at:

https://wiki.mozilla.org/GNOME_Outreach_December2013 (specific to Mozilla)
https://wiki.gnome.org/OutreachProgramForWomen (applying to OPW)

If you are not a woman, but interested in an internship working on Rust,
never fear! If you are currently enrolled in an undergraduate or
graduate program, you can apply for an internship at Mozilla Research
for the winter, spring, or summer term. (Note that OPW is open to both
students and non-students.) We will be announcing this program to the
mailing list as well as soon as the details are finalized.

If you have any questions, please join #opw on irc.mozilla.org and/or
contact Tim Chevalier (tchevalier at mozilla.com, tjc on IRC) (Rust contact
person) or Lars Bergstrom (lbergstrom at mozilla.com, lbergstrom on IRC)
(Servo contact person).

I'm coordinating Mozilla's involvement in OPW this year as well as any potential Rust projects (others may end up mentoring interns depending on the intern's preferred project, but the buck stops with me), so I'd be very grateful if readers would publicize this notice to relevant venues, as well as direct any questions to me (via email, IRC, or in a comment on this post).

The last five days have been one full day of travel, preceded by three full days of Mozilla Summit in Toronto, preceded by another full day of travel. It's been quite a time, with not much sleep involved, but (for me) a surprising amount of code getting written and interesting sessions being attended and pure fun being had (karaoke night Friday and dance party Sunday -- if any of the organizers were reading, both of those were really good ideas). Actually, watching "Code Rush" -- a documentary made in 2000 about the very early history of Mozilla -- with Mozilla folks was quite an experience as well. You can watch it for free online; it's only about an hour, and I encourage you to if you have any interest at all in where we came from. (Passive sexism and other kinds of isms aside.)

So far as sessions, one of the highlights was Jack's talk on the Servo strategy on Saturday -- featuring about 30 minutes of lecture followed by a full 80 minutes of Q&A. The questions were the most focused, on-topic, respectful set of questions I've heard after any talk I've been to in recent memory (and it wasn't just because Jack prepared a list of suggested questions to ask in advance!) The other highlight for me was the excellent "Designing your project for participation" session with David Eaves, Emma Irwin, and Jess Klein on Sunday. One of the reasons why this session was so great was that it was mostly small-group discussion, so everybody got to be more engaged than they might have been if most of the time had been sitting and listening. I came away with a better understanding that a lot of different people are thinking about how to make the volunteer experience better in their specific project, and there are lessons to learn about that that are transferable. For example: volunteers are much more likely to keep participating if they get a prompt review for their first bug; and, it might actually be okay to ask volunteers to commit to finishing a particular task by a deadline.

During downtime, I got a few pull requests in, some of which even landed. One of the bigger ones, though, was #9732, for making automatically checked-out source files read-only and overhauling where rustpkg keeps temporary files. (That one hasn't been reviewed yet.) #9736, making it easier to get started with rustpkg by not requiring a workspace to be created (it will build files in the current directory if there's a crate file with the right name) is also important to me -- alas, it bounced on the Linux bot, like so many of my pull requests have been doing lately, so I'll have to investigate that when I'm back on the office network and able to log in to that bot tomorrow. Finally, I was excited to be able to fix #9756, an error-message-quality bug that's been biting me a lot lately, while I was on the plane back to San Francisco, and surprised that the fix was so easy!

Finally, I request that anyone reading this who's part of the Rust community read Lindsey Kuper's blog post about the IRC channel and how we should be treating each other. Lindsey notes, and I agree, that the incident that she describes is an anomaly in an otherwise very respectful and decorous IRC channel. However, as the Rust community grows, it's my job, and the job of the other core Rust team members, to keep it that way. I know that communities don't stay healthy by themselves -- every community, no matter how small, exists within a kyriarchy that rewards everybody for exercising unearned power and privilege. Stopping people from following those patterns of non-consensual domination and submission -- whether in such a (seemingly) small way as inappropriately calling attention to another person's gender on IRC, or in larger ways like sexual assault at conferences -- requires active effort. I can't change the world as a whole on my own, but I do have a lot of influence over a very small part of the world -- the same is true of most other human beings. So, I'm still thinking about how to put in that effort for Rust. The same is true for community processes as for code: patches welcome.

Once #9654 lands (after ~ 24 hours in the queue, it finally started building only to fail because of an overly long line... and I thought I checked that every time I pushed a new version, but apparently not), we'll have completed the "build Servo with rustpkg" milestone. The next milestone is the first part of community adoption, which, roughly speaking, is the list of features we think rustpkg has to have in order for it to get widespread use in the Rust community (such as it is).

The first thing on that list, which I started working on today, is #6480: making locally-cached copies of source files that rustpkg automatically fetched from github (or, in the future, elsewhere) read-only. The intent behind this bug is to prevent a sad situation -- that really happened to a Go developer, and could happen again with a similar package manager -- where somebody makes local changes to packages that their projects depends on without realizing they're editing automatically-fetched files, and then loses those changes when they redistribute the code (without their local changes).

One solution is for rustpkg to make any source files that it automatically fetches read-only. Of course, anybody can just change the permissions, but that requires deliberate action and (with hope) will remind people that they might be making a poor life decision. That's easy enough to implement, and in fact, I implemented it today.

But I realized that this really goes along with another issue I filed, #9514. Servo depends on many packages that the Servo developers are also hacking on in tandem with hacking on Servo -- so if we started making everything read-only, that would make it hard to work with local (uncommitted) changes.

I think the right thing to do (and as per discussion in meetings, it sounds like it's the general consensus) is to make rustpkg do two things differently: (1) store fetched sources in a subdirectory of a workspace's build directory, not its src directly (it's less confusing if src is only things you created yourself); and (2) always search for sources under src before searching under build. That way, if you want to make local changes to a package that you refer to with an extern mod that points to a remote package, you can manually git clone the sources for it under a workspace's src directory, and then rustpkg will automatically find your locally-changed copy (which it will also never overwrite with newly fetched sources).

I looked at what Go does, and it looks like it does more or less what I want to do in rustpkg: "By default, get uses the network to check out missing packages but does not use it to look for updates to existing packages." So, the only case where rustpkg will pull a fresh copy of a repository off github is if there's no local copy of the sources, or installed version of the binaries for it, that can be found by searching the RUST_PATH.

It's been a while since my last Rust blog post. The latest thing that's happened is that I just submitted a pull request that gives rustpkg packages that have custom build scripts the ability to have C dependencies (or, in fact, dependencies on any sort of artifact that you can write a Rust program to build). There's no magic here -- in fact, for the most part, it's just a matter of writing an example to show what needs to be done. Someday there will be a higher-level interface to rustpkg where you can just declare some source file names and rustpkg will do the rest, but not today. In the meantime, the build script explicitly sets up a workcache context and declares foreign source files as inputs, and so on.

The thing I did have to change in rustpkg itself was changing the install_pkg function (in the API that build scripts can call) so as to take a vector of extra dependencies. Currently, the types of dependencies supported are files (any source files, which get checked for freshness based on hashing their contents and metadata) and binaries (any files at all, which get checked for freshness based on their modification times).

Completing this bug means that (once it's reviewed and merged) I'll have completed the "build all of Servo" milestone for rustpkg, and on time, no less (albeit with less than 30 minutes remaining). Jack has been steadily working on actually building Servo with rustpkg, discovering many issues in the process, and once this pull request is merged, we should be able to do the rest.

I've got tests passing for my fix for #7879, making dependencies work more sensibly. I was proud of my voluminous status update for the past week, so it was a bit sad to only work on one bug today. It was a hard one, though, and I finished it! Aside from fixing tidy errors and such, anyway.

Amusingly/frustratingly enough, I did the real work over the weekend and the work I did today was almost completely centered around testability. I realized I can't use timestamps from stat() to ensure that a given file wasn't rebuilt. Someone else discovered this too: time granularity on Unix is only 1 second, so it's easy for two files that get created in quick succession, but sequentially, to have equal timestamps. Next idea was to compare hashes (which is what workcache already does!), but that won't do since if no dependencies have changed, but a file gets rebuilt anyway, it's likely to be bitwise identical. Jack suggested making the test case set the output to be read-only so that in case the second invocation of rustpkg tries to overwrite it, that would be an error. This required a little bit of fiddling due to details about conditions and tasks that I fought with before (see #9001), but the basic approach works.

So I can get a pull request in for this tomorrow, and happily, it's likely to fix several other bugs without additional work (or with minimal work): #7420, #9112, #8892, #9240, and I'm probably missing some.

I landed four pull requests between middle-of-yesterday and now! See the schedule -- #6408, #8524, #7402, and #8672.

The remaining work that's necessary for building Servo for rustpkg is making dependencies between crates in the same package work right; building C libraries; and (added today) rustpkg test. I worked mostly on the first one today (but worked a little on making a test case for the second one).

A community member reported this bug about rustpkg going into an infinite loop when compiling a package with two crates in it, where one depends on the other. Digging in, I realized that various parts of rustpkg assume that a package contains either only one crate, or several crates that are all independent of each other. The key thing is that it was computing dependencies at the package level, not the crate level. So if you have a package P containing crates A and B, and B depends on A, rustpkg goes back and tries to compile all the crates in P first, rather than just compiling A.

So I'm working on that; it requires some refactoring, but nothing too complicated. I was thinking at first that I would have to topologically sort the dependencies, but after talking to Erick Tryzelaar on IRC, I realized that that approach (the declarative one) isn't necessary in the framework of workcache, where function calls express dependencies. The issue was just that I was treating packages, rather than crates, as the unit of recompilation.

This past week was the Servo work week, so today I also spent some time in person working with Jack to get more Servo packages to build with rustpkg. He found a bug in how extern mod works that I fixed, but only after he had to leave for the airport.

Today I was able to finish the first three items on the "Build all of Servo" milestone on the schedule:

Sub-package-IDs allow you to name a subdirectory of a package directory by its own package ID. For example, if your package foo has an extras directory with several subdirectories, and foo/extras/baz has a crate file in it, you can rustpkg build foo/extras/baz without building the other crates in the foo package. This pull request landed already.
Recursive dependencies: actually, this almost worked before, and it was a matter of fixing a bug that assumed that all of the dependencies for a package (except for system libraries) were in the same workspace. This is building on the bots and should land in another ten minutes or so.
Installing to RUST_PATH: previously, rustpkg would install a package to the same workspace its sources were in. Now, it installs the package to the first workspace in RUST_PATH, if you've set a RUST_PATH in the environment; if you didn't, it defaults to the old behavior. This is in the queue to build.

I'm working on putting build output in a target-specific subdirectory now, which is easy except for cleaning up all of my crufty test code that made assumptions in lots of different places about the directory structure. Along those lines, I'm also cleaning up some of the test code to make it less crufty.

I looked at #7879 (infinite loop compiling dependencies) too, and it's harder than I thought; right now, rustpkg assumes it's okay to build the crates in a multi-crate package in any order. That's not necessarily true (as the examples given in the original bug and in my comment show) and so rustpkg can end up telling workcache that there's a circular dependency between two files (when there's not). I think fixing this will require topologically sorting the crates in a package in dependency order, which in turn will actually require... writing a graph library in Rust, since there isn't one. So that will be fun, but I don't have time today.

Today's pull request: making rustpkg find dependencies recursively. It wouldn't be much of a build system if it didn't do that, and it actually almost did without deliberate effort -- the missing piece was making the code that infers dependencies and searches for them actually search different workspaces other than the workspace of the package that has the dependencies. Other than that, it was just a matter of adding a test case, and then #8524 can close.

I just submitted a pull request that implements most of the commonly-used rustc flags as flags for rustpkg. This took longer than it would have taken if I had taken the approach of just passing any "unknown" flag to rustc, but I think it's useful for rustpkg to do a bit of checking so as to tell you if you're passing nonsensical combinations of flag and command. For example, most rustc flags don't make sense unless you're either building or installing a package, so rustpkg errors out in those cases (for example, if you do rustpkg list --opt-level=3).

I didn't add a test case for every flag, and there's still some refactoring of boilerplate that could be done, but I prioritized getting this done on schedule over completeness.

Now that workcache is in the queue, I've moved on to teaching rustpkg how to pass command-line flags to rustc. At first, I thought I was going to implement a syntax like:

rustpkg +RUSTC --linker myspeciallinker --emit-llvm -RUSTC build foo

(ripped off GHC's +RTS/-RTS options), where +RUSTC and -RUSTC delimit arguments that rustpkg should just pass straight to rustc. But then I realized that Rust's getopts library doesn't support this kind of flag. I didn't want to change it, and also realized that many flags are only valid when you're actually building or installing a package. So, I decided to enumerate the flags you can use in rustpkg (a subset of rustc's flags), as well as the subset of flags that work only with build, or with build and install.

So far, this approach is working fine; I'm writing a test for each flag, as well as a test that the same flag gets rejected when in the wrong context. I don't expect any surprises with this one.

Short post, since I stayed up late getting the workcache pull request ready. But: the workcache pull request is ready! Once this lands, I won't feel like I have to use finger air quotes anymore when saying that I'm working on a build system.

I thought I might be able to come up with something profound to say about it, but apparently not.

Today, I finished the initial work to integrate rustpkg with the workcache library, so that it remembers dependencies and only recompiles sources when they or their dependencies have changed. I'll write about that in a later post, though; for now I want to talk about a bug, or infelicity, that I discovered today.

Conditions are Rust's answer to exceptions; they're a lot simpler and more predictable than exceptions, and the conditions tutorial is quite clear, so you should probably just go read that first.

I've tried to exercise conditions a fair bit in rustpkg, since they are a fairly new feature and it's good to be able to see how they work out. For testing purposes, I wrote one regression test that makes sure the right conditions get raised when you try to install a package that doesn't exist.

When I finished with workcache, that test failed. Which was weird at first, since avoiding recompilation shouldn't affect the behavior of a test that never compiles anything. Eventually, I realized the problem: conditions don't propagate to different tasks. Before adding workcache, rustpkg was totally single-task; but, when workcache does a unit-of-work-that-can-be-cached, it spawns a new task for it. That's great, because it means we get parallel compilation for free (at the crate level, anyway). But, it means that if the process of compiling one crate raises a condition, it will manifest as that entire task failing.

I'm not sure what to do about this in the long term, but for now I just changed that test case to use task::try, which catches task failure, and to test something weaker than it did before (that the task fails, rather than that it raises three particular conditions in sequence).

That solves my problem, but I'm not totally sure that this interaction between conditions and tasks was intentional, so I filed an issue. It seems to me like it's confusing to have to know whether a particular piece of code spawns multiple tasks in order to know whether the condition handlers you wrap around it will actually handle all the conditions that that code raises. That seems like an abstraction violation. On the other hand, maybe I'm misguided for thinking that conditions should be used for anything but signaling anomalous cases within a single task.

I'm continuing to make progress on workcache/rustpkg integration. Today I realized that I'd been doing things a bit wrong, and that it would be important for rustpkg to declare all the inputs (for a particular package) before starting to do actual work (building).

Let me back up a bit, though. workcache is a library for tracking cached function results. A function, in this case, is the process of generating build artifacts (libraries and/or executables) from inputs (source files). workcache knows about declared inputs, discovered inputs, and discovered outputs. In the case of rustpkg:

A declared input is a source file that you know about when you start building. For example, if the user wanted to build package foo in someworkspace, and someworkspace/src/foo/lib.rs exists, then you know that someworkspace/src/foo/lib.rs is a declared input.
A discovered input is a source file or binary file that you discover while searching for dependencies. In the previous example, if lib.rs contains extern mod bar, and rustpkg determines that bar lives in $HOME/.rust/lib/libbar-whatever-someversion.dylib, then $HOME/.rust/lib/libbar-whatever-someversion.dylib gets added as a discovered input.
A discovered output is what you're trying to build when you build the package. So, in this example, someworkspace/lib/libfoo-whatever-someversion.dylib would be a discovered output.

The code I'd written so far was interleaving declaring inputs with doing actual work (building, which produces discovered output). That won't do, because when you look up a cached function result, you're looking it up by the name of the function paired with the list of declared inputs. If you don't know all the inputs yet, you won't get the right function result, and you also won't save it in a way that lets it be found later.

So, I fixed that, and am putting it all back together now. It's slow going since workcache is not entirely documented (although it has more documentation than some Rust libraries do!) and I've had to add functionality to it.

I know my daily posts are usually pretty low-level, so today I want to step back just a bit and explain a little bit of how rustpkg works. rustpkg is heavily modeled on the Go package manager.

rustpkg operates on workspaces. A workspace is a directory. It has several subdirectories: src, lib, bin, build. rustpkg creates the last three automatically if they don't exist.

If you are using rustpkg, you probably want to build and install Rust code that you wrote yourself. You probably also want to build and install code that other people wrote, living in version control repositories that are available online (for example, on Github).

When you check out a remote repository to build, you check it out into a workspace you already created. So if you want to use my library called "quux" (which doesn't really exist), you would do:

mkdir -p Workspace/src cd Workspace/src git clone http://github.com/catamorphism/quux cd .. rustpkg install quux

This would make a directory called Workspace/src/quux. Here, quux is a package ID. A workspace can contain several different packages with different IDs.

rustpkg actually makes this a little bit easier for you. You can also do:

mkdir -p Workspace/src cd Workspace rustpkg install github.com/catamorphism/quux n.b. Next paragraph edited for clarity on 8/29/2013

This would make a directory called Workspace/src/github.com/catamorphism/quux. Since the source of the package is a remote repository, here the package ID is a URL-fragment: github.com/catamorphism/quux. Supposing quux is a library, it would be installed in Workspace/.rust/lib/libquux-HASH-VERSION.dylib, where HASH is computed by rustc and VERSION is quux's version (as per the most recent git tag). You might assume that it would be installed in Workspace/lib/..., but since the source code was fetched remotely, rustpkg uses the first workspace in the Rust path as the destination workspace. By default, the first workspace in the Rust path is CWD/.rust,, where CWD is the current working directory. You can change the Rust path by changing the RUST_PATH environment variable, but I'll get to that.

You can use one workspace for many different projects. The idea is that the sources for each project live in its own subdirectory under Workspace/src (or whatever you want to call your workspace). Or, you can make one workspace per project. Then the src/ directory would contain one subdirectory for the sources for your project, and one subdirectory for the sources for each of its dependencies.

The motivation here is to avoid so-called "dependency hell" by compartmentalizing the dependencies for different projects, rather than storing all libraries in one system directory. So if project A needs version 1.2 of library Z, and project B needs version 3.6 of library Z, no problem! The downside is potential multiple copies of libraries; we'll see how that works out in practice.

You do have a choice if you want to avoid multiple copies: you can use the RUST_PATH. RUST_PATH is an environment variable that rustpkg knows about; it's a colon-separated list of workspaces. Like:

declare -x RUST_PATH=/home/tjc/rust:/home/tjc/servo:/home/tjc/miscellaneous-useful-libraries

(if you use bash, anyway.)

It's assumed that if you add a directory to the RUST_PATH, that directory is a workspace: that is, it has a src/ directory with one or more project-specific subdirectories. If it doesn't have a src/ directory, rustpkg won't find any source files -- or installed libraries -- there.

After I implemented all of that, Jack began trying to port some of the Servo submodules to rustpkg. He ran into the issue that rustpkg always installed the build artifacts into the same workspace as the sources, assuming the sources are in a workspace. He wanted to be able to have the source in one workspace, and tell rustpkg to put the build artifacts into a different workspace. As he pointed out, RUST_PATH represents both "where to find binaries" and "where to find sources". This was by design, but that doesn't mean it's necessarily the best design.

Jack wanted to be able to support the Servo submodule scenario without too much restructing. Once ported to rustpkg, the Servo repository will look something like:

servo/src/bin.rs servo/deps/src/foo/lib.rs ...

bin.rs is the main crate module for the Servo executable. foo is an example submodule of Servo; lib.rs is its crate module. There are many other submodules under servo/deps/src. We want to be able to build foo with rustpkg and put its build output in servo/build and servo/lib, so that they can be found when building Servo with rustpkg.

At this point, I said: why not just put the deps directory under servo/src? Jack's answer is that the Servo repository has many subprojects, so the developers want to organize them somewhat. And if the deps directory has subdirectories, the subdirectory names become part of the package ID. Since these subprojects all have their own git repositories, it starts looking a little strange for the local directory structure to affect how you refer to the package.

Then we considered whether or not to just use the extern mod foo = ... mechanism for everything. That is, don't manually store the subprojects locally; rather, write a directive in the Servo crate like extern mod foo = "github.com/mozilla/foo"; that tells rustpkg "fetch the sources in the repository named mozilla/foo on github.com, cache a local copy, and build them". That way it's up to rustpkg how to organize the local files, and the developers won't have to think about it. However, this solution does not work well if the Servo developers often make local uncommitted changes to the libraries they depend on, concurrently with developing Servo. In this scenario, rustpkg will use the revision it found in the remote repository, and not the locally modified version.

The solution we came up with is to change what's allowed to appear in the RUST_PATH. So when building Servo, if somebody wanted to build a particular submodule -- rust-geom -- first, their RUST_PATH would look like:

RUST_PATH=$HOME/servo:$HOME/servo/src/support/geom/rust-geom

and they would type rustpkg build rust-geom to install rust-geom into the $HOME/servo workspace.

I have a pull request in the queue to implement this change. I wasn't sure whether we wanted it to be transitional -- until, perhaps, the Servo repository gets restructured to make it conform to rustpkg's view of the world -- or permanent, so I put it behind a command-line flag, --rust-path-hack. Perhaps instead, rustpkg should conform to Servo's view of the world, since we are motivated by wanting to build Servo and rustc with rustpkg.

The main advantage of allowing the RUST_PATH to only contain workspaces seems to be simplicity. But for a project with many submodules that may be organized in an arbitrarily complicated way, perhaps making the RUST_PATH more flexible makes it simpler to build packages.

Today, I learned something unpleasant about git. Specifically, that if you:

git clone somegarbage a/b/c

where somegarbage isn't a valid repo, it will happily create a/b anyway, and leave them sitting there, empty.

In the context of rustpkg, this was bad because it resulted in a src directory getting created, which resulted in a non-workspace directory getting assumed to be a workspace. Very confusing. There went several hours.

The right thing to do when we try to git clone a package, I think (Eridius on IRC suggested this first, and I initially rejected it) is to make a temporary directory, clone into that, and if the clone is successful, copy it into the actual destination.

Easy enough, right? No, because we don't have a "rename directory" function in the standard library yet; there's one in std::libc, but it should have a safe wrapper. Rather than taking that on today, though, I'm going home.

This was all in the context of implementing the RUST_PATH hack that I talked about yesterday. Jack says that when this patch lands, he'll be able to port 11 Servo packages to build with rustpkg!

Also, in the team meeting today, we discussed my new proposed schedule for rustpkg, and made some adjustments. I'm glad I mostly had the right idea about which bugs were high-priority.

I started off the day by blowing away my build directory in one of my workspaces because last Friday, I was getting a mysterious compiletest crash, sometimes complaining about "too many open files". Building from scratch did not make the crash go away, but asking Brian did; he said there would be a pull request fixing the problem soon, but in the meantime, I could run make check with RUST_THREADS=2, and that worked.

I landed #8712, which fixes #7241, which is making the scenario where copies of the same package exist in multiple workspaces work right. Normally, rustc errors out when it finds multiple crates that match a particular identifier. I changed filesearch so it treats searching directories in the RUST_PATH specially, and breaks out of the loop if it finds a matching crate in one of the RUST_PATH directories.

In the queue is #8773, which implements the --version flag for rustpkg. It's a little thing, but it's nice to have that work right, and it turned out to be just a matter of calling the right function in rustc-as-a-library.

I'm also progressing on #7075, using workcache. Last week I had the problem that I wanted to use borrowed pointers inside an @-function, which is verboten in Rust. I realized that the only reason I needed an @-function was that I was calling a function in libsyntax called each_view_item that I wrote, which uses the SimpleVisitor trait. Long story short, SimpleVisitor uses @-functions (probably because no one got around to rewriting that code), but Visitor is more general and takes &-functions (more or less any sort of closure). So I rewrote the code to use a Visitor, which also made it shorter, since Visitor is fully default-method-ized. Happiness! Now at least my code compiles, though the behavior isn't right yet.

I also finished a temporary change that Jack requested to make it easier to port Servo submodules to rustpkg; I changed the way that RUST_PATH works so you can list a package directory (that is, a subdirectory of a workspace) and not just a workspace in the RUST_PATH, and rustpkg will find source files there but install to a workspace. In addition, it doesn't behave this way unless you pass in the --rust-path-hack flag. I had Jack try it out, and he pointed out that one scenario worked, but one other scenario -- where the current working directory is not a workspace -- didn't. I guess it would be more consistent to allow the CWD to be a non-workspace when the hack is enabled, so I'll change that.

Finally, I did some work on #6408 (allowing package IDs to point into the middle of a repository), finishing the test case and figuring out where in the code I needed to start extending things. First of all, I can see that I need to fix find_and_install_dependencies, the function that scans through extern mods and infers dependencies, so that if it doesn't find an already-compiled library, it doesn't assume the sources for the library being imported live in the same workspace as the importer. I can't believe I didn't already have a test that would have found that, but that's how it goes.

Also, if you would like to read a much more pedagogical post about Rust that doesn't assume all kinds of obscure knowledge, you should read Lindsey Kuper's post on default methods!

Today: meetings, trying my best to get a miscellaneous pull request (collecting together small changes, including enabling some more rustpkg tests) to go through, working on workcache some. I changed the freshness function for files so that just changing the datestamp, even without changing the contents, counts as changing the file. But, I realized that I'm not declaring any discovered dependencies as inputs to workcache; only the files in the main package we're building. That won't do. Fixing it tomorrow.

I got rustpkg to use workcache to enough of a degree that one of the existing tests fails because it isn't rebuilding a file when it's not necessary to :-) The old test used "touch" to make a dependency look as if it had changed, then ran rustpkg and checked datestamps. Since workcache works based on a hash of the file contents, the test fails even though it works correctly.

So that's exciting! Despite Graydon explaining it to me, I hadn't quite grokked how workcache works till now. It's a quite neat idea: most build systems (including) make are declarative. You write a whole other program to explain how to build your program (albeit in a very weird programming language, usually). Workcache, and a system called fbuild that it's based on, are imperative. You express your dependencies in your usual programming language, specifying how to determine if a particular target needs to be rebuilt. That made more sense in my head than it did when I tried to write it down just now.

In any case, the missing piece that I filled in today was the freshness map, which is part of the state that you have to pass around for workcache to work. The freshness map is a hash table that maps "kinds" (just strings) onto functions. The "kinds" in this case are "source file", "executable or library file", "URL", and maybe a few other things. The function takes a name (for example, a file path) and a value (for example, a hash of the file's contents) and returns a boolean indicating whether the given name and hash are up-to-date. The "given" name and hash come from the workcache database (which is persistent) and the freshness function is the code that (in this case) reads in the contents of the file, hashes it, and compares the two hashes for equality.

Most of the time that took was understanding the code well enough to write the previous paragraph; once I understood that, the code was quite simple.

Next step is to change the test cases to reflect that rustpkg won't just rebuild if the timestamp changes[*], and make sure they change the contents of the file if they want to test behavior-if-a-file-changes. But I feel like things are moving now.

[*] Or maybe it should, I'm not sure, but in that case the freshness function would have to be changed so that the hash is a hash of both the file's contents and its metadata.

What I meant to do today: work on workcache.

What I did today: un-ignore most of the rustpkg tests -- except for the ones that depend on features that aren't implemented yet (workcache, and the do, test, and info commands). A bunch of the tests were xfailed because the tests weren't playing nicely with cross-compilation. That's fixed now, so they should work. But I spent most of today getting one test to work, which was one of the tests for custom build logic. The hard part had nothing to do with custom build logic, but rather to do with library names that contain dashes. Previously if you called your library foo-bar, rustpkg would generate a library called libfoo_bar-xxxxx-0.1.dylib (on Mac, anyway), where xxxxx is a hash. Now that we separate the name used as an identifier in Rust code from the actual package ID, we can call the library libfoo-bar-xxxxx-0.1.dylib, but since '-' also separates the name, hash, and version, the old parsing code didn't work anymore. Now it does! (I'm not completely sure that we should be naming libraries this way anyway, but hey, everything works now, or should.)

Running a try build now to make sure everything works on the bots where the host and target are potentially different, and then I'll submit a pull request. And, uh, tomorrow I'll work on workcache.

S	M	T	W	T	F	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30

Tim's journal

Taking metaphors too far since 1995

TMI: Changing Roles

TMI: Rust and Servo participating in the GNOME Outreach Program for Women this year

TMI: Summit and beyond

TMI: Bringing rustpkg into the real world

TMI: Building C dependencies in custom build scripts

TMI: The sweet taste of passing tests

TMI: Packages vs. crates

TMI: Busy day

TMI: Transitive dependencies

TMI: Finishing command-line arguments

TMI: Command-line arguments

TMI: workcache: finished!

TMI: Concurrent conditions

TMI: workcache progress

TMI: Making rustpkg work

TMI: What I learned about git today

TMI: I did all these things today!

TMI: Miscellaneous

TMI: Workcache!

TMI: While my try build runs...

Profile

November 2021

Links

Syndicate

Most Popular Tags

Page Summary

Style Credit

Expand Cut Tags