I recently used RxJava while creating an add-on for Stash, the Git repository management tool from Atlassian. The plugin’s called “Who’s the Expert?” and it analyses commits to a repository to help answer two questions:
- “Who has contributed the most to this project over the past few years?” and
- “Who has made the most significant changes in recent months?”
A Problem Worthy of RxJava’s Attention
In order to achieve this, the plugin has to process a lot of data: it pulls all commits on the default branch for the last 2.5 years and analyses the content of every single one. I knew two things about this code in advance: first, there were going to be a lot of steps to go from a repository name to a leaderboard of the most influential committers, and second, I was pretty certain I’d need some multi-threading mojo in order to get it to perform in an acceptable time frame.
When I started, I’d just recently heard about RxJava at the YOW! 2013 conference in Sydney. Not only was I keen to try it out on something, but the problem I had at hand was the perfect fit for its promise of asynchronous, functional data flow processing with simplified threading.
RxJava in a Nutshell
In case you haven’t come across RxJava yet, it’s a Java port of .NET’s “Reactive Extensions”, and is described as “a library for composing asynchronous and event-based programs by using observable sequences”.
If you’re familiar with Futures, which abstract the delivery of a single value asynchronously in the future, and Iterables, for accessing a sequence of items which has an unknown length, then you can think of RxJava as essentially being a hybrid of the two: a mechanism for asynchronously receiving a stream of events. The power of RxJava comes in the large collection of operations available to be used on the data flow, such as map(), flatMap(), merge(), zip(), join() and many more, which abstract away the most common ways of transforming and combining streams of data.
RxJava Documentation: a Triumph and a Gap
RxJava has the best documentation of any open source library that I’ve ever seen. Not content to just hand on great written documentation of what they’ve produced, they’ve also described almost every operation available on Observables using marble diagrams, which are like visual legends that show how the values coming into an Observable are manipulated by it and what will be emitted on the other side.
While the docs are fantastic for most of the transforming and combining operations, when it came to the threading operations, I struggled to understand exactly how they worked. Based on what I thought they’d do, I forged ahead and attempted to slot them into the working, single-threaded code I already had, but the results were not at all what I expected to see.
The RxJava Thread Experiments
So, realising I had to understand the operations in isolation before I could apply them to my non-trivial data flow sequence, I built a set of experiments using minimal fixtures to test out each operator that I thought might be useful. Without too much effort I figured out enough about what was going on to get the threading behaviour I wanted in the plugin.
Then I realised that, in building these experiments, I’d created a bit of a “missing manual” for threading in RxJava. So I thought it would be worth sharing the experiments and their outcomes here in the hope that it can help others understand the operations quickly.
RxJava’s core API is written in (and can be used from) Java, and it also has “language adapters” for Scala, Clojure, Groovy, JRuby and Kotlin. The examples below are all in Scala, but I imagine they should translate very simply to any of those languages.
RxJava Threading Examples
All the examples below use a simple Observable[Int], which is usually created by accessing a value called ‘generator’. I’ve defined a couple of simple transforming functions that I use to operate on the sequence: shiftUp, which in all these examples just adds 10 to each value, and shiftDown which subtracts 10. I’ve also defined a new method on Observable (using Scala’s “Pimp my Library” pattern) called debug(), which, for each value passing through the Observable, prints out the value, a supplied string (usually the name of the operation just performed) and the thread that the Observable is executing on. It prints each value that’s moving through the flow in its own column so that things are easier to follow when multi-threading is going on.
No Threads
Let’s start with a simple RxJava example with no threading instructions. I have a ‘generator’ that simply produces a small range of numbers:
val maxNumber = 5 val generator: Observable[Int] = Observable.from(1 to maxNumber).debug("Generated")
and a simple program that maps those values twice while debugging, which looks like this:
val shiftedUp = generator.map(shiftUp).debug("Shifted Up") val shiftedDown = shiftedUp.map(shiftDown).debug("Shifted Down") shiftedDown.subscribe(debug("Received", _))
That example produces this output:
[main] Generated 1 [main] Shifted Up 11 [main] Shifted Down 1 [main] Received 1 [main] Generated 2 [main] Shifted Up 12 [main] Shifted Down 2 [main] Received 2 [main] Generated 3 ...
You should notice that all the code ran on the ‘main’ thread, and this happened because that was the thread that called subscribe() on the Observable at the end of the chain. Some people assume that, because RxJava is all about “asynchronous sequences”, everything will happen on different threads by default, but that’s not the case.
Make sure you understand everything that happened in that example, because the rest of the examples are slight variations on the above code.
Observable.subscribeOn()
So, how can we make things happen on a different thread? One of the easiest ways is to use the subscribeOn(Scheduler) function. This function produces a new Observable which will cause your subscription to occur on a thread retrieved from the specified scheduler instead of the thread from which subscribe() is called. You can just insert it into your call chain like this:
val shiftedUp = generator.subscribeOn(IOScheduler()) .map(shiftUp).debug("Shifted Up")
and you’ll get output like this:
[RxCachedThreadScheduler-1] Generated 1 [RxCachedThreadScheduler-1] Shifted Up 11 [RxCachedThreadScheduler-1] Shifted Down 1 [RxCachedThreadScheduler-1] Received 1 [RxCachedThreadScheduler-1] Generated 2 [RxCachedThreadScheduler-1] Shifted Up 12 [RxCachedThreadScheduler-1] Shifted Down 2 [RxCachedThreadScheduler-1] Received 2 [RxCachedThreadScheduler-1] Generated 3 ...
There are two things to note here. Firstly, because the subscription now occurs on a different thread, the subscribe() call will return control to the thread that called it almost immediately (the main thread, in the case of these examples), and the code in the closure passed to subscribe() will execute on another thread. In the case of my examples, where I’m just running a main thread, this meant the program exited as soon as subscribe() was called. To avoid this, I put a sleep after the subscribe() call as a simple way of ensuring the Observable completed before the application shut down. Assuming your application’s running in a web server or some other environment with at least one non-daemon thread, this shouldn’t cause you any concern, but you’ll want to be aware of it if you’re using threaded RxJava in something like a simple command-line app.
Secondly, it’s important to realise that RxJava makes heavy use of immutable objects and the decorator pattern, and that subscribeOn(), and all the other threading methods, are no exception. This means that calling subscribeOn() doesn’t modify the Observables defined up until that point, but instead makes a new Observable which, when called, adds behaviour to the Observables that it decorates, in this case changing the thread on which the subscription occurs.
To illustrate how this can go wrong if you don’t understand it, the following code:
val aGenerator = generator val subscribingOnIO = aGenerator.subscribeOn(IOScheduler()) val shiftedUp = aGenerator.map(shiftUp).debug("Shifted Up") val shiftedDown = shiftedUp.map(shiftDown).debug("Shifted Down") shiftedDown.subscribe(debug("Received", _))
will NOT execute on the IOScheduler, but on the main thread, because the subscribeOn() call created the new ‘subscribingOnIO’ Observable, but it was not used in the rest of the chain.
Observable.observeOn()
observeOn() is similar to subscribeOn(), except that it defines the Scheduler to use only from the point it is added to the chain onwards, whereas subscribeOn() defines the Scheduler used at the generation end of the chain.
Inserting it into the chain in the same place where we had subscribeOn() above:
val shiftedUp = generator.observeOn(IOScheduler()) .map(shiftUp).debug("Shifted Up")
produces this:
[main] Generated 1 [main] Generated 2 [main] Generated 3 [main] Generated 4 [main] Generated 5 [RxCachedThreadScheduler-1] Shifted Up 11 [RxCachedThreadScheduler-1] Shifted Down 1 [RxCachedThreadScheduler-1] Received 1 [RxCachedThreadScheduler-1] Shifted Up 12 [RxCachedThreadScheduler-1] Shifted Down 2 [RxCachedThreadScheduler-1] Received 2 [RxCachedThreadScheduler-1] Shifted Up 13 ...
You can see that the subscription to the source Observable occurs on the thread that called subscribe(), and then code from the observeOn() call onwards is executed using the Scheduler.
On my machine, you can see that the main thread generates all the values before the Scheduler picks up any of them, but once the Scheduler starts processing values, it handles them one at a time until they’re received at the Subscriber. We can see then that there appears to be some parallelism insofar as the main thread is operating in parallel to the IOScheduler thread. Running the example with a much larger sequence of values will show that these will actually run concurrently, i.e. the scheduler thread will begin processing values while the generating thread is still creating new values.
It’s also crucial to note that, even though we provided observeOn() with a Scheduler that is backed by a thread pool, it is not using multiple threads to process the values. This is key: observeOn() is not an instruction saying “process these values using this thread pool”, but only “process these values using one thread from this Scheduler”. We’ll see later how to achieve true parallelism.
Two Observable.observeOn() calls in one chain
observeOn() can can be used at multiple points throughout a flow to change the threads being used for different parts of processing. Here’s an example:
val shiftedUp = generator.observeOn(IOScheduler()) .map(shiftUp).debug("Shifted Up") val shiftedDown = shiftedUp.observeOn(ComputationScheduler()) .map(shiftDown).debug("Shifted Down")
and its output:
[main] Generated 1 [main] Generated 2 [main] Generated 3 [main] Generated 4 [main] Generated 5 [RxCachedThreadScheduler-1] Shifted Up 11 [RxComputationThreadPool-1] Shifted Down 1 [RxComputationThreadPool-1] Received 1 [RxCachedThreadScheduler-1] Shifted Up 12 [RxComputationThreadPool-1] Shifted Down 2 [RxComputationThreadPool-1] Received 2 [RxCachedThreadScheduler-1] Shifted Up 13 ...
Look carefully at the output. There are actually two threads operating (which annoyingly have the same length thread names). Something interesting here is that, even though there are two threads operating in the chain, the first scheduler appears to block waiting on the work in the downstream scheduler to deliver to the Subscriber before continuing. So there is coordination between these two threads. Increasing the source to generate a far longer sequence appears to confirm this is always the case. So there appears to be a queue for values coming into the first observeOn() call, but not for the second.
One example of where using multiple threads in the chain can be useful is for implementing background processing in GUI apps that have a dedicated “UI-Update Thread”, for example Swing’s Event Dispatcher Thread. Matthias Käppler (@mttkay) from SoundCloud has written about using RxJava in Android to try to solve some of the shortcomings of AsyncTask and they’ve contributed their scheduler for the Android message loop to the rxjava-android project.
Observable.subscribeOn() and Observable.observeOn() together
What happens if we put subscribeOn() and observeOn() together? That one is definitely not documented in the API docs! So let’s try it and see:
val shiftedUp = generator.subscribeOn(ComputationScheduler()) .map(shiftUp).debug("Shifted Up") val shiftedDown = shiftedUp.observeOn(IOScheduler()) .map(shiftDown).debug("Shifted Down")
And the output is…
[RxComputationThreadPool-1] Generated 1 [RxComputationThreadPool-1] Shifted Up 11 [RxComputationThreadPool-1] Generated 2 [RxCachedThreadScheduler-1] Shifted Down 1 [RxCachedThreadScheduler-1] Received 1 [RxComputationThreadPool-1] Shifted Up 12 [RxComputationThreadPool-1] Generated 3 [RxCachedThreadScheduler-1] Shifted Down 2 [RxCachedThreadScheduler-1] Received 2 [RxComputationThreadPool-1] Shifted Up 13
As you’d expect, the thread from the subscribeOn() Scheduler is used for all operations from the source Observable to the point where observeOn is called.
On my machine, this combination results in operations on different values being processed in parallel by the two threads. This suggests, in contrast to the double observeOn() example above, that a thread scheduled with subscribeOn() doesn’t block waiting on the work in downstream schedulers before generating and processing more values.
What if we put the subscribeOn() call after the observeOn() call instead of before it?
val shiftedUp = generator.map(shiftUp).debug("Shifted Up") val shiftedDown = shiftedUp.observeOn(IOScheduler()) .map(shiftDown).debug("Shifted Down") shiftedDown.subscribeOn(ComputationScheduler()) .subscribe(debug("Received", _))
This behaves exactly the same as the previous example. So, in a flow that doesn’t have any merge- or join-like operators (something I call a “single-path flow”), the position of a subscribeOn() call makes no difference – it will always dictate the thread used to subscribe to the source Observable.
What would happen, then, if we put two subscribeOn() calls in a single-path flow?
val shiftedUp = generator.subscribeOn(ComputationScheduler()) .map(shiftUp).debug("Shifted Up") val shiftedDown = shiftedUp.subscribeOn(IOScheduler()) .map(shiftDown).debug("Shifted Down") shiftedDown.subscribe(debug("Received", _))
This example is really a bit nonsensical, but my curiosity made me run it anyway. It just uses the Scheduler specified earliest in the flow, so the above example uses the ComputationScheduler and the IOScheduler call is just ignored, although it’s possible that under the hood it still grabbed a thread from that thread pool and just never used it.
Implicit Threading in Operators like Observable.delay()
All the threading we’ve done so far has been explicit, however it’s worth noting that some of RxJava’s operators will use threading internally to achieve their behaviour. As an example, using Observable.delay() like this:
val shiftedUp = generator.map(shiftUp).debug("Shifted Up") val delayed = shiftedUp.delay(Duration(10, MILLISECONDS)).debug("Delayed") val shiftedDown = delayed.map(shiftDown).debug("Shifted Down") shiftedDown.subscribe(debug("Received", _))
results in this kind of thing happening:
[main] Generated 1 [main] Shifted Up 11 [main] Generated 2 [RxComputationThreadPool-1] Delayed 11 [main] Shifted Up 12 [main] Generated 3 [RxComputationThreadPool-1] Shifted Down 1 [RxComputationThreadPool-1] Received 1 [RxComputationThreadPool-1] Delayed 12 [main] Shifted Up 13 [main] Generated 4 [RxComputationThreadPool-1] Shifted Down 2 [RxComputationThreadPool-1] Received 2 [RxComputationThreadPool-1] Delayed 13
This is really something to be aware of rather than concerned about: you should just know that some RxJava operators will result in your Subscriber getting called on a different thread to the one you used to call subscribe(), and you need to handle that.
I once heard Ben Christensen (@benjchristensen), one of the main authors of RxJava, gave some advice along the lines of:
it makes the most sense for Subscribers to always assume that values are delivered asynchronously, even though on some occasions they may be delivered synchronously.
Observable.parallel()
UPDATE: Observable.parallel() was removed from the core RxJava in October 2014 due to consistent misunderstanding around its usage.Its functionality has been moved into a new experimental project called RxJavaParallel, with the aim of merging the results back into the core if a suitable solution is found. Many thanks to Giampaolo Trapasso for pointing this out in the comments.
What if you want to explicitly get some parallel processing going? parallel() is the obvious choice to get the job done, but it results in code looking a little more gnarly than what we’ve looked at before now. Unlike the ‘par’ method on Scala’s collections or the parallelStream()s introduced in Java 8, each of which result in all following operations occurring in parallel, RxJava’s parallel() method has to encapsulate all the operations that should be performed in parallel.
To execute the same operations we’ve been doing above but in parallel, we need code that looks like this:
val shiftedUpAndDownInParallel = generator.parallel((f: Observable[Int]) => { val shiftedUp = f.map(shiftUp).debug("Shifted Up") val shiftedDown = shiftedUp.map(shiftDown).debug("Shifted Down") shiftedDown }) shiftedUpAndDownInParallel.subscribe(debug("Received", _))
Obviously, this needs a little explanation! The parallel() method on an Observable[T] takes one parameter: a function that accepts an Observable[T] and returns an Observable[R], which could emit the same type of the values that are coming into the parallel call, or any other type.
What I believe RxJava does under the hood is:
- create a number of Threads based on the parallelism of the provided Scheduler,
- create one Observable for each of those threads (let’s call these Source Observables),
- invoke your Observable-to-Observable function on each Source Observable, producing what we’ll call Result Observables,
- subscribe to each of the Result Observables,
- partition the incoming values into the Source Observables as they arrive,
- merge all the values emitted from each of the Result Observables into a single Observable[R], which is what was returned from the parallel() call itself.
So the Observable-to-Observable function you’re going to pass into parallel() will be called once for each thread that is going to be used, and needs to set up any operations which should be executed in parallel.
Here’s the output from running the above on my dual core machine, where you can clearly see that multiple values are being processed at the same time on different threads:
[main] Generated 1 [main] Generated 2 [main] Generated 3 [main] Generated 4 [main] Generated 5 [RxComputationThreadPool-1] Shifted Up 11 [RxComputationThreadPool-2] Shifted Up 12 [RxComputationThreadPool-1] Shifted Down 1 [RxComputationThreadPool-1] Received 1 [RxComputationThreadPool-2] Shifted Down 2 [RxComputationThreadPool-2] Received 2 [RxComputationThreadPool-2] Shifted Up 14 [RxComputationThreadPool-1] Shifted Up 13 [RxComputationThreadPool-1] Shifted Down 3 [RxComputationThreadPool-2] Shifted Down 4 [RxComputationThreadPool-1] Received 3 [RxComputationThreadPool-1] Received 4 [RxComputationThreadPool-1] Shifted Up 15 [RxComputationThreadPool-1] Shifted Down 5 [RxComputationThreadPool-1] Received 5
It’s worth noting here that processing is not handed back to the main thread after values are emitted from the Observable created by the parallel call (‘Received’ happens outside parallel()), but it continues on one of the threads from the Scheduler. In fact, both of the threads are used to deliver values to the Subscriber. There’s even something really interesting happening with value 4: notice that it is processed in parallel by thread 2, but is then delivered to the Subscriber by thread 1. So be warned: never make any assumptions about what thread your Subscriber will be called on!
Observable.parallel(), not actually doing anything in parallel
Perhaps you didn’t like the look of the code above using parallel(). A bit messy for you? Maybe you’ve thought, “Well, I can see it uses multiple threads, and they all continue to get used after the parallel() call, so I’ll just put my logic outside the parallel() call, because that looks nicer.” What would happen? So your code might look nice:
val shiftedUp = generator.parallel(f => f).map(shiftUp).debug("Shifted Up") val shiftedDown = shiftedUp.map(shiftDown).debug("Shifted Down") shiftedDown.subscribe(debug("Received", _))
But it’s not going to do anything in parallel:
[main] Generated 1 [main] Generated 2 [main] Generated 3 [main] Generated 4 [main] Generated 5 [RxComputationThreadPool-1] Shifted Up 11 [RxComputationThreadPool-1] Shifted Down 1 [RxComputationThreadPool-1] Received 1 [RxComputationThreadPool-1] Shifted Up 12 [RxComputationThreadPool-1] Shifted Down 2 [RxComputationThreadPool-1] Received 2 [RxComputationThreadPool-1] Shifted Up 14
It might do some things out of order, but it never processes values in parallel. In fact, on my machine, it never even uses more than one thread. I guess the logic inside parallel() figures out that there is never enough work to bother spinning up a second thread.
Parallel Processing When Combining Observables
All the examples we’ve looked at so far have been what I call a “single-path flow”: there has been one source Observable generating/emitting values, and a single path from it to the Subscriber. What if you have more than one source Observable and you want to combine them into single values before the Subscriber? This is quite a common scenario, for instance if you are using RxJava to do microservice orchestration, as Netflix does and as we’ve started doing at Tyro.
Below is a simple flow with two generators which are zipped together before the Subscriber (though I drop side of the tuple immediately after the zip). In an attempt to get some concurrency happening at the Observable end, I’ve inserted a subscribeOn() call after the two streams are zipped.
val generator1 = generator.map(shiftUp).debug("Shifted Up #1") val generator2 = generator.map(shiftUp).debug("Shifted Up #2") val zipped = generator1.zip(generator2).map(_._1).debug("Zipped") val shiftedDown = zipped.map(shiftDown).debug("Shifted Down") shiftedDown.subscribeOn(IOScheduler()).subscribe(debug("Received", _))
The order in which things are executed is interesting, but my attempts to get some multi-threading have failed – everything executes on a single thread from the IOScheduler:
[RxCachedThreadScheduler-1] Generated 1 [RxCachedThreadScheduler-1] Shifted Up #1 11 [RxCachedThreadScheduler-1] Generated 2 [RxCachedThreadScheduler-1] Shifted Up #1 12 [RxCachedThreadScheduler-1] Generated 3 [RxCachedThreadScheduler-1] Shifted Up #1 13 [RxCachedThreadScheduler-1] Generated 4 [RxCachedThreadScheduler-1] Shifted Up #1 14 [RxCachedThreadScheduler-1] Generated 5 [RxCachedThreadScheduler-1] Shifted Up #1 15 [RxCachedThreadScheduler-1] Generated 1 [RxCachedThreadScheduler-1] Shifted Up #2 11 [RxCachedThreadScheduler-1] Zipped 11 [RxCachedThreadScheduler-1] Shifted Down 1 [RxCachedThreadScheduler-1] Received 1 [RxCachedThreadScheduler-1] Generated 2 [RxCachedThreadScheduler-1] Shifted Up #2 12 [RxCachedThreadScheduler-1] Zipped 12 [RxCachedThreadScheduler-1] Shifted Down 2
We really want the two source Observables to execute in parallel, though. How can we make that happen? What if we put the subscribeOn() call before the zip() ?
val generatorSubscribed = generator.subscribeOn(IOScheduler()) val shiftedUp1 = generatorSubscribed.map(shiftUp).debug("Shifted Up #1") val shiftedUp2 = generatorSubscribed.map(shiftUp).debug("Shifted Up #2") val zipped = shiftedUp1.zip(shiftedUp2).map(_._1).debug("Zipped") val shiftedDown = zipped.map(shiftDown).debug("Shifted Down") shiftedDown.subscribe(debug("Received", _))
This gives us some more satisfying results:
[RxCachedThreadScheduler-2] Generated 1 [RxCachedThreadScheduler-1] Generated 1 [RxCachedThreadScheduler-2] Shifted Up #2 11 [RxCachedThreadScheduler-1] Shifted Up #1 11 [RxCachedThreadScheduler-2] Generated 2 [RxCachedThreadScheduler-1] Zipped 11 [RxCachedThreadScheduler-2] Shifted Up #2 12 [RxCachedThreadScheduler-2] Generated 3 [RxCachedThreadScheduler-1] Shifted Down 1 [RxCachedThreadScheduler-1] Received 1 [RxCachedThreadScheduler-1] Generated 2 [RxCachedThreadScheduler-2] Shifted Up #2 13 [RxCachedThreadScheduler-2] Generated 4 [RxCachedThreadScheduler-1] Shifted Up #1 12 [RxCachedThreadScheduler-1] Zipped 12 [RxCachedThreadScheduler-2] Shifted Up #2 14 [RxCachedThreadScheduler-2] Generated 5 [RxCachedThreadScheduler-1] Shifted Down 2 [RxCachedThreadScheduler-1] Received 2
We can clearly see two threads generating values. It seems that whichever thread delivers its value to the zip() second takes responsibility for doing the zipping and pushing the result all the way through the flow until it’s received at the Subscriber.
What can we understand about subscribeOn() from these two examples? I think it helps to think about what happens when you eventually call subscribe() on a flow: your subscription passes up through the flow until it reaches a generating Observable, at which point values start being emitted down. Now, if a subscription passes through a subscribeOn() Observable as it goes up, it seems to grabs a thread on the way through and, when it hits the source Observable, that is the thread used to send values down the chain. In the first combining example above, subscribeOn() was only passed through once, and we got one thread. In the second example, though, the subscription passed through subscribeOn() twice on the way up, once for each fork of the flow above the zip() call, so it grabbed one thread for each of the generators, and that’s how we got our parallel subscriptions.
This example demonstrates why it’s key when using RxJava to understand the difference between setting up a data flow and subscribing to it. The code above only calls subscribeOn() once, but that call creates an Observable that is subscribed to twice when we subscribe to the Observable at the end of the flow.
Trying to Get Parallel Subscriptions With Async.start()
The Async class in the separate rxjava-async-util module offers a suite of utility functions promising to assist you in achieving a range of common asynchronous tasks easily.
The first one on the wiki page – Async.start() – looks just great for something we’re doing at work: firing off a long-running call and reacting to it after it returns a single value some time later. Similar to the previous two examples, we’re doing this for multiple values concurrently and combining the results to give a single value to the Observer. The documentation for start() says “Pass the start( )
method a function that returns a value, and start( )
will execute that function asynchronously and return an Observable that will emit that value to any subsequent Subscribers.” Sounds perfect!
To test out Async.start(), I created this function for “slowly” generating a value:
def sleepThenGenerate(n: Int)() = { debug("Generating", n) Thread.sleep(800) debug("Sending", n) n }
And then used it in this little combining flow:
val asyncGenerator1: Observable[Int] = Async.start(sleepThenGenerate(1) _) val asyncGenerator2: Observable[Int] = Async.start(sleepThenGenerate(2) _) val shiftedUp1 = asyncGenerator1.debug("Generated") .map(shiftUp).debug("Shifted Up #1") val shiftedUp2 = asyncGenerator2.debug("Generated") .map(shiftUp).debug("Shifted Up #2") val zipped = shiftedUp1.zip(shiftedUp2).map(_._1).debug("Zipped") val shiftedDown = zipped.map(shiftDown).debug("Shifted Down") Thread.sleep(100) println("Subscribing...") shiftedDown.subscribe( onNext = debug("Received", _), onCompleted = () => {println("Completed")}, onError = (t) => {} )
I’m expecting when I run this to see:
- two sleepThenGenerate() calls get started in separate threads (like it says on the label) and immediately print “Generating”,
- then, about 100ms later, the main thread to print “Subscribing” right before it calls subscribe()
- then, around 700ms later, both values being generated (and “Sending” and then “Generated” printed),
- then those values being zipped and delivered to the Subscriber, with the corresponding debug messages being printed.
Want to see what happens when I run it?
[RxComputationThreadPool-1] Generating 1 [RxComputationThreadPool-2] Generating 2 Subscribing... [RxComputationThreadPool-1] Sending 1 [RxComputationThreadPool-2] Sending 2 [RxComputationThreadPool-1] Generated 1 [RxComputationThreadPool-2] Generated 2
And that’s it – no more output. Most of what I described above happened exactly as predicted, right down to the debug() call after the generator (which uses doOnNext() under the hood), but then nothing. map(shiftUp) never seems to get called, or at least the debug() after it is never called.
Why is this happening? I’m afraid I honestly don’t know. What I can tell you is that it seems to be some kind of race condition. If I remove the sleep(800) from sleepThenGenerate() so that it just generates a value and – most importantly – does that before the subscribe() call happens, it works just like I was expecting to:
[RxComputationThreadPool-1] Generating 1 [RxComputationThreadPool-2] Generating 2 [RxComputationThreadPool-1] Sending 1 [RxComputationThreadPool-2] Sending 2 Subscribing... [main] Generated 1 [main] Shifted Up #1 11 [main] Generated 2 [main] Shifted Up #2 12 [main] Zipped 11 [main] Shifted Down 1 [main] Received 1 Completed
That’s no good, though. The whole reason I wanted to use this mechanism was to kick off a bunch of slow things, subscribe() straight away, and then react when the results arrive. I feel there could be a bug here in the implementation of Async.start(), though I’m not expert enough to say for sure that I’m using start() correctly. I’m going to see if I can build a test case that shows it’s not behaving and get in touch with the RxJava devs about it (if they don’t stumble upon this blog first).
Achieving Parallel Subscriptions With Async.fromCallable()
Async.start() doesn’t do what I’m looking for, but there are other options in Async that seem to do similar things to what I thought start() did. The most straightforward of these appears to be Async.fromCallable(Callable), which says it will “convert a Callable into an Observable that invokes the callable and emits its result or exception when a Subscriber subscribes”. Does it work? Well, if we plug it into the previous example in place of Async.start():
val asyncGenerator1: Observable[Int] = Async.fromCallable(sleepThenGenerate(1) _) val asyncGenerator2: Observable[Int] = Async.fromCallable(sleepThenGenerate(2) _)
We get the following glorious output:
Subscribing... [RxComputationThreadPool-1] Generating 1 [RxComputationThreadPool-2] Generating 2 [RxComputationThreadPool-1] Sending 1 [RxComputationThreadPool-2] Sending 2 [RxComputationThreadPool-2] Generated 2 [RxComputationThreadPool-1] Generated 1 [RxComputationThreadPool-2] Shifted Up #2 12 [RxComputationThreadPool-1] Shifted Up #1 11 [RxComputationThreadPool-1] Zipped 11 [RxComputationThreadPool-1] Shifted Down 1 [RxComputationThreadPool-1] Received 1 Completed
Just what we expected. You’ll notice a slight but important difference to Async.start(), however (aside from the fact that fromCallable() actually delivers the values to the Subscriber): the lambdas passed into fromCallable() are not invoked until a subscribe() happens, whereas start() kicked them off immediately.
If you really want to achieve that behaviour promised by Async.start(), there are some fairly simple ways to achieve something similar. For example, you could submit your lambda to an ExecutorService and use Observable.from(Future) to convert it into an Observable. Watch out for nuances here around the blocking nature of such Observables, though.
Conclusion?
Well, that’s it for my experiments at the moment. I hope you’ve at least learned the following:
- There are a lot of different ways to control threads when using RxJava;
- almost all of them are incredibly simple to use in your code,
- most of them do pretty much what the documentation describes, but…
- understanding exactly how they’ll behave in your specific circumstances may require some careful analysis and/or isolated testing.
If you’ve got any questions or suggestions, please leave a comment below!
Big Thanks…
… to @del_n and @jazir1979 for reviewing this mega-blog. Though I fear you may not have made it this far. 😉
Want to learn more?
If you’d like to play around with some of my examples, you can clone all the code from this blog from my rxjava-threading-examples GitHub repository.
If you’d like to find out more about RxJava, watch Ben Christensen talk about it at GOTO Aarhus 2013.
If you want to learn who the experts are in your codebase, don’t forget to check out my Stash plugin, “Who’s the Expert?”.
And if you want to learn more about asynchronous data flow programming or using lambdas in functional programming, some of these books might hit the spot:
Image credit: ‘Filamental Light‘ by Evan Leeson
Brilliant post… ! Thank you for taking the time to document your observations!
Million thanks! This is one of the most valuable and MUST READ post for any serious rx java development for sure.
Great addition to the RxJava documentation! I too felt that this part was missing from the official docs. But then again, maybe this isn’t the core of ReactiveX’s mission.
Question (to you or anyone): What if you want *one* callback for a set of zipped or merged results? Is subscribing to onNext in Observable.buffer(n) the only option? I was looking for something like the .get() method for the java.util.concurrency.Future class and I found “toBlocking()”. The problem is that caused an entire new scheduling to happen.
If I modify your last example:
shiftedDown.subscribe(
onCompleted = () => {
println(“Completed. Values: “);
val values = shiftedDown.toBlocking()
println(values.first() + ” and ” + values.last())
},
)
..this didn’t work very well. It started a new execution. Can anyone explain why? Is there an alternative to toBlocking() which returns the already computed values?
Sometimes lambdas obscure important details about the operands if you don’t know the function signature. In the subscribe callback, you’re already operating on the results of the subscription. That means you shouldn’t reference observables from the same chain in that callback. toBlocking() or toList() should be chained before .subscribe() ever gets called.
In fact, you’re also making the same mistake the author warns against by referencing the observable outside of its chain as if each function modifies the observable. That’s not how functional chains work.
Maybe try this:
shiftedDown.toList().subscribe((Action1)(values) -> {
for (int value : values) {
println(value.first() + ” and ” + value.last())
}
})
Oops, just realized you were using .first and .last in the output. You can’t use functional operators like that outside of the chain. Just treat it as a list if that’s what you need.
parallel operator is not supported anymore in RxJava, as Ben Christensen said in http://stackoverflow.com/a/26250237/1360888
“The parallel operator proved to be a problem for almost all use cases and does not do what most expect from it, so it was removed in the 1.0.0.rc.4 release: https://github.com/ReactiveX/RxJava/pull/1716“
All that stuff has moved to https://github.com/ReactiveX/RxJavaParallel
btw, thank you for the great post!
Thanks heaps for this info, Giampaolo. I’ve put an update in the blog post about it.
Thanks for the post.
The async case is not working due to reasons detailed in this thread :
https://groups.google.com/forum/m/#!topic/rxjava/Bd_HwBvyvy4
If this were translated to actual java (maybe doing the additional debug() method using aspects) would be much more usefull for java developers not knowing scala 🙂
Hi,
Just to let you know that the number of the readers who did the reading to the very end just increased by one. It was very informatory and comprehensive. I wish I could find more articles/blog posts like that! A very brilliant one.
Jacek
Very good post Graham! By far the best I have come across related to RxJava and their Schedulers. I ported your scala code to Java and when I execute the “Two Observable.observeOn() calls in one chain” scenario, I do not see the same blocking behavior that you mentioned in your post. I do see that the two threads are running concurrently and that the first scheduler is not in any way waiting on the second. What I did, when running this scenario is add a 10 millisecond sleep/delay during my “up” transformation and a 20 millisecond sleep/delay during my “down” transformation, and when I then pump 200 items through the base observable, I see output like the below (leaving out the first 124 generated items for brevity). As you can see the Scheduler related to the “up” mapping is getting all its work done much quicker than the “down” Scheduler, which is to to be expected given their respective latencies. Thoughts?
FYI, the number after the pipe represents time in milliseconds since the program was executed.
main | 173 – Generated 125
main | 173 – Generated 126
main | 173 – Generated 127
main | 173 – Generated 128
RxCachedThreadScheduler-2 | 179 – Shifted Up 11
RxCachedThreadScheduler-2 | 189 – Shifted Up 12
RxCachedThreadScheduler-1 | 199 – Shifted Down 1
RxCachedThreadScheduler-1 | 199 – Received 1
RxCachedThreadScheduler-2 | 199 – Shifted Up 13
RxCachedThreadScheduler-2 | 209 – Shifted Up 14
RxCachedThreadScheduler-2 | 219 – Shifted Up 15
RxCachedThreadScheduler-1 | 219 – Shifted Down 2
RxCachedThreadScheduler-1 | 219 – Received 2
RxCachedThreadScheduler-2 | 229 – Shifted Up 16
RxCachedThreadScheduler-2 | 239 – Shifted Up 17
RxCachedThreadScheduler-1 | 239 – Shifted Down 3
RxCachedThreadScheduler-1 | 239 – Received 3
RxCachedThreadScheduler-2 | 250 – Shifted Up 18
RxCachedThreadScheduler-1 | 259 – Shifted Down 4
RxCachedThreadScheduler-1 | 259 – Received 4
RxCachedThreadScheduler-2 | 260 – Shifted Up 19
RxCachedThreadScheduler-2 | 270 – Shifted Up 20
RxCachedThreadScheduler-1 | 279 – Shifted Down 5
RxCachedThreadScheduler-1 | 279 – Received 5
…
That was exactly my thought when reading “Two Observable.observeOn() calls in one chain”. I doubt that rxJava has a mechanism that makes an observable to wait, be default, until the emmited value has reached the end of the processing chain. So, your results seem to be correct, at least to my understanding of how it’s supposed to work.
Pingback: Try Android | RxJava
Graham, thanks for your post. It’s a nice reading. I wondered also like you, why “things” are not executed “in parallel” by default.
Finally today, I read the sentence “Rx is single-threaded by default” on http://www.introtorx.com/Content/v1.0.10621.0
WOW! With that in mind things become much clearer for me.
So again: “Rx is single-threaded by default”.
And there is also stated: “While Rx has concurrency features, these should not be mistaken for a concurrency framework. Rx is designed for querying data, and as discussed in the first chapter, parallel computations or composition of asynchronous methods is more appropriate for other frameworks.”
Pingback: Реактивное программирование в Android - ppApp
what is shiftUp?
shiftUp is an Int => Int function that adds 10 to a number. shiftDown subtracts 10.
Pingback: Multithreading Made easy (Kotlin) | Josias Sena
Pingback: 反应性扩展和/或参与者模型如何简化流程编排? – FIXBBS