1. 18
  1.  

    1. 7

      I’ve also thought that a kind of stackful coroutine in which stack frames are effectively stored as a tree would be an interesting approach to concurrency. It’s not one Rust could pursue, but if I were designing a language that didn’t have Rust’s strong commitment to “the C runtime,” I would start with Rust’s async design from a type perspective, but use this sort of tree-shaped stackful coroutine so as to allow recursion and arbitrary-arity concurrency.

      1. 4

        What surprised me while writing the above is that the scheme where you split the “sync” and “async” parts of the stack actually seems to be zero-cost compatible with the C runtime! That is, just statically coloring the functions removes most of the Go machinery and allows direct calls from async to non-async land.

        You of course can’t seamlessly call from sync to async, but even that seems partially tractable: if you further color async functions into “has a static stack upper bound” and “doesn’t have a static upper bound”, you could use the former subset more or less exactly like Rust futures.

        1. 4

          Maybe. But a lot of this would have been challenging in Rust for a more mundane engineering reason: poor affordances in LLVM and limited ability in the Rust project to modify LLVM in the way that would be necessary to support such a thing. Designing async to be a “front-end only” feature, syntactic sugar for a library, was much more feasible with the skillset available to us.

          1. 2

            That’s most definitely true! In general, it’s amazing how much Rust got done with so little for async/await!

    2. 2

      Very interesting post, as one would expect when seeing that domain name :)

      Apologies for the long comment, I’m not sure how to condense it more.

      I’m not sold on concurrently, especially as a function, because it hides the repetition too much. I think we should build on the standard for and while, since those are so well known, and on top of AsyncIterator, which also makes sense coming from sync.
      Here’s a draft idea (error handling omitted):

      async fn root_task(group: &TaskGroup) {
          let listener = TcpListener::bind("0.0.0.0:0");
      
          // `max_concurrency` creates a new child group where the limit applies
          // `incoming` consumes the listener and yields each new stream
          group.max_concurrency(4).for_each(listener.incoming(), async |stream| {
              do_stuff(group, stream).await; // re-use the root group so client handling isn't bound by the limit of 4
          }).await;
      
          // Wait for all tasks
          // With error handling this would return all errors that occurred in the group
          group.join().await
      }
      

      Admittedly, for_each looks similar to concurrently, but I think it’s more obvious what happens. And by using a group we get a single point for extending the API, similar to how we have the Iterator trait where we can add methods.
      Though I’m not sure how to add sugar on top of group.for_each to make it look more like a standard for loop, since there’s no equivalent of group there.

      I think the key point is that concurrently and group.for_each are responsible for polling both what we loop over and the loop-body future(s).
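
      For comparison, today’s futures crate already has this shape: StreamExt::for_each_concurrent drives the stream and up to N in-flight loop bodies from a single future. A minimal sketch (do_stuff and the numbers are stand-ins, not the blogpost’s API):

      use futures::stream::{self, StreamExt};

      // Stand-in for per-connection work.
      async fn do_stuff(conn: u32) {
          let _ = conn;
      }

      async fn accept_loop() {
          // Pretend this is `listener.incoming()`: a stream of new connections.
          let incoming = stream::iter(0u32..16);

          // One future polls both the stream and up to 4 loop bodies at a time,
          // the same "single poller owns everything" shape as `group.for_each`.
          incoming
              .for_each_concurrent(4, |conn| async move {
                  do_stuff(conn).await;
              })
              .await;
      }

      fn main() {
          futures::executor::block_on(accept_loop());
      }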

      I’ve written something similar in Go: parcour’s (parallelism, concurrency) JobGroup. It’s not as extensive as I’d like, but it handles error propagation and ties cancellation to the group (you can still have finer-grained cancellation if you want, using a child group, plus escape hatches for compatibility with the wider ecosystem).
      The most relevant existing group types are:

      • cancel on error: one task failure triggers group cancellation, affecting all sibling and descendant jobs/tasks
      • max concurrency: limits how many tasks can run, just like the Semaphore example in the blogpost or the draft above (a rough sketch of both group types follows below)
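
      Sketched roughly with tokio’s JoinSet and Semaphore standing in for the group (an assumption-laden sketch, not parcour’s actual API; abort_all is abrupt rather than cooperative, so it only shows the “one failure cancels the siblings” and “at most 4 in flight” shapes):

      use std::sync::Arc;
      use tokio::{sync::Semaphore, task::JoinSet};

      #[tokio::main]
      async fn main() {
          let mut group: JoinSet<Result<(), String>> = JoinSet::new();
          let limit = Arc::new(Semaphore::new(4)); // "max concurrency" for this group

          for job in 0..16u32 {
              let limit = Arc::clone(&limit);
              group.spawn(async move {
                  // Holding a permit for the task's lifetime caps concurrency at 4.
                  let _permit = limit.acquire_owned().await.expect("semaphore closed");
                  if job == 7 {
                      return Err(format!("job {job} failed"));
                  }
                  Ok(())
              });
          }

          // "cancel on error": the first failure cancels all remaining siblings.
          while let Some(result) = group.join_next().await {
              match result {
                  Ok(Err(err)) => {
                      eprintln!("{err}; cancelling the rest of the group");
                      group.abort_all();
                  }
                  Ok(Ok(())) => {}
                  Err(join_err) if join_err.is_cancelled() => {} // an aborted sibling
                  Err(join_err) => panic!("task panicked: {join_err}"),
              }
          }
      }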

      That is, cancellation is not “I cancel thou”. Rather it is “I ask you to stop, and then I cooperatively wait until you do so”.

      YES!! PLEASE!!!1

      Go’s context package implements that idea, and IMO it’s the best feature of Go! I implore anyone thinking about cancellation as a request to look at the docs and more importantly how it’s used (which the docs unfortunately don’t show).

      Callers can request cancellation when they want (see WithCancel), and the API provides easy ways to set up timeouts/deadlines. On the callee side you have to check regularly whether the context is “done” (“user” code rarely needs to; blocking calls are generally in the stdlib).
      Cancellation is propagated up the stack via standard error handling.

      The most important part of the design is that it’s based on nesting, and this should really be copied by anyone making something similar: there are some root contexts (TODO or Background), and every other context is a child of another. That way cancellation is always chained: any parent task requesting cancellation ends up setting the flag that the current/leaf task sees. Of course, cancelling a child context doesn’t cancel the parent.

      The most unfortunate thing about this API is that it was invented after the initial Go releases, so it’s not integrated everywhere, including in the stdlib, where a lot of blocking APIs don’t make use of it (or do so via more obtuse means than a function parameter).

      To apply all that to the blogpost’s CancellationSource, it should be constructed as new(parent: &Self) so that if the parent is cancelled, the child is too; but you can cancel the child without affecting the parent.
      And you need a root source. I would explore letting only the runtime create it and pass it to the root task, so no one can break that chain by creating a second, unlinked root source.
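
      tokio-util’s CancellationToken already has exactly that parent/child shape, so a rough sketch of the two rules (cancelling a parent reaches every descendant; cancelling a child never touches the parent) could look like this, assuming the tokio and tokio-util crates:

      use std::time::Duration;
      use tokio_util::sync::CancellationToken;

      #[tokio::main]
      async fn main() {
          let root = CancellationToken::new(); // ideally only the runtime creates this one
          let child = root.child_token();      // the `new(parent: &Self)` relationship

          // Cancelling a child does NOT affect its parent...
          let sibling = root.child_token();
          sibling.cancel();
          assert!(sibling.is_cancelled());
          assert!(!root.is_cancelled());

          // ...but cancelling the parent is seen by every descendant.
          let worker = tokio::spawn({
              let token = child.clone();
              async move {
                  loop {
                      tokio::select! {
                          // Cooperative: the task notices the request and winds down itself.
                          _ = token.cancelled() => break,
                          _ = tokio::time::sleep(Duration::from_millis(50)) => {
                              // ... one unit of work between cancellation checks ...
                          }
                      }
                  }
              }
          });

          root.cancel();
          worker.await.unwrap();
          assert!(child.is_cancelled());
      }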

      Applying it to my draft from above, the TaskGroup would have a method resembling span(&self) -> &GroupSpan, where GroupSpan would be close to Go’s Context or the CancellationToken.
      All that should tie together structured concurrency and cancellation, making it easy for a task to control how its children run and for how long (assuming they’re considerate of the span).

      I guess I have to try making a crate with all this now :D

    3. 2

      The example of processing requests and updating a database and cache lends itself nicely to the kind of pipelining that LMAX Disruptors offer.

      It seems to me that people prefer, or at least put a lot more effort into, async-like solutions to parallel programming compared to pipelining.

      Does anyone have thoughts on why that might be the case?

      1. 2

        For parallel programming, pipelining gives you at most a constant-times speedup, where the constant is defined by the application topology and is usually small (smaller than the number of CPU cores a laptop has these days). It’s much more fruitful to re-phrase the problem in an embarrassingly parallel way, to get infinite speedup.

        For concurrent programming, io_uring is basically the disruptor pattern.

        1. 2

          I’m not sure I follow. If you don’t mind, can we build a bit further upon your example? Imagine we got an input queue of requests and we want to produce an output queue of responses.

          First, how do you bridge this gap with your async solution? Do you concurrently map your process function over the input queue (bounded by a worker pool of some size)?

          If so, is what you mean by “infinite speed up” that there will always be plenty of green threads around and the runtime will schedule them as best it can, and this scales with the number of cores (because the runtime does)?

          As opposed to pipelining which essentially looks like:

                                        +-- update db -----+
                                       /                    \
          input -> compute_response --+                      +--> output
                                       \                    /
                                        +-- update cache --+
          

          Where we only get a speedup by a factor of 3, because we only have 3 things we can run in parallel?

          1. 3

            Yes, if your problem is parallel processing, partitioning the data into independent subproblems and solving each subproblem sequentially is usually the faster way to get more speedup.
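
            For example, partitioning with rayon (process here is just a stand-in for the per-request work):

            use rayon::prelude::*;

            // Stand-in for compute_response plus the per-item db/cache updates.
            fn process(request: u64) -> u64 {
                request * 2
            }

            fn main() {
                let requests: Vec<u64> = (0..1_000_000).collect();

                // Embarrassingly parallel: rayon partitions the input across all cores,
                // each partition is processed sequentially, and output order is preserved.
                let responses: Vec<u64> = requests.par_iter().map(|&r| process(r)).collect();

                assert_eq!(responses.len(), requests.len());
            }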

            If your problem is concurrent processing, then it’s a different problem. All cool async/await implementations literally use the LMAX disruptor underneath, in the form of io_uring.

            1. 3

              Here are a couple of advantages I see with the pipeline approach:

              1. Determinism (the output preserves the order of the inputs); this will be harder with the async approach;

              2. Batching and backpressure can be trivially applied at any stage of the pipeline (in isolation, without affecting the rest of the pipeline), which will require more reworking in the async case (see the sketch after this list);

              3. We can apply what Jim Gray calls “partition parallelism” or what the Disruptor people call “sharding” at any stage of the pipeline or the whole pipeline. This effectively adds more cores to a stage without breaking determinism.
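
              A minimal sketch of points 1 and 2 with plain threads and bounded channels (everything here is a stand-in, not a Disruptor): the bounded queues give per-stage backpressure, and a single consumer per channel keeps the input order.

              use std::sync::mpsc::sync_channel;
              use std::thread;

              fn main() {
                  // Bounded queues between stages: a full queue blocks the producer (backpressure).
                  let (in_tx, in_rx) = sync_channel::<u64>(8);
                  let (mid_tx, mid_rx) = sync_channel::<u64>(8);

                  let compute = thread::spawn(move || {
                      for request in in_rx {
                          mid_tx.send(request * 2).unwrap(); // stand-in for compute_response
                      }
                  });
                  let output = thread::spawn(move || {
                      mid_rx.into_iter().collect::<Vec<_>>() // stand-in for the db/cache + output stage
                  });

                  for request in 0..100u64 {
                      in_tx.send(request).unwrap();
                  }
                  drop(in_tx); // closing the input drains and shuts down the pipeline

                  compute.join().unwrap();
                  let responses = output.join().unwrap();
                  assert!(responses.windows(2).all(|w| w[0] < w[1])); // determinism: order preserved
              }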

              It seems to me that by essentially moving some of the decisions the runtime makes to the user, we should be able to do a better job.

              Just throwing everything at the runtime and hoping for the best reminds me of Joe Armstrong’s anecdote about an unmodified Erlang program running only 33 times faster on a 64-core machine, rather than the 64 times faster the Ericsson higher-ups expected.

              What I’m curious about is: can we do better than Erlang by listening to Jim?

    4. 1

      And of course, in Rust, the killer feature of lazy futures is that you can just borrow data from the enclosing scope.

      Can someone unpack this for me? What is the borrowing mechanism implied here? What about them being lazy enables this?
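
      My best guess at a minimal illustration, using the futures crate (fetch_len is made up):

      use futures::future;

      // Stand-in for real async work that only *borrows* its input.
      async fn fetch_len(name: &str) -> usize {
          name.len()
      }

      async fn demo() {
          let names = vec!["db".to_string(), "cache".to_string()];

          // Both futures borrow `names` from this stack frame. That works because
          // they are lazy: they only run while `join` is polled right here, so the
          // borrows cannot outlive the data. An eagerly spawned task would instead
          // demand 'static data, i.e. an Arc or a clone.
          let (a, b) = future::join(fetch_len(&names[0]), fetch_len(&names[1])).await;
          assert_eq!(a + b, 7);
      }

      fn main() {
          futures::executor::block_on(demo());
      }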