Can we relax Atomics.wait to allow waiting on the main thread? #177

@juj


In multithreaded applications, the Atomics.wait() primitive allows Worker threads to wait synchronously. The main thread is disallowed from calling .wait(), since that is a blocking operation. As a result, the main thread cannot participate in shared memory state the way Workers can. To remedy this, the .waitAsync() primitive is intended to fill that gap for the main thread.

In summary, there are four ways to interact with a lock or a CAS variable:

  1. try-lock and abort (i.e. poll CAS once or a few times, and do something else/yield if not successful, try again later)
  2. infinite try-lock (busy spin until successful)
  3. Atomics.wait (sleep the calling thread until next CAS attempt can be made)
  4. Atomics.waitAsync() (enqueue an event for when the next CAS attempt can be made)
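
As a rough illustration, the four styles above could be sketched as follows on a single lock word in shared memory. The helper names (tryLock, spinLock, waitLock, asyncLock, unlock) and the 0/1 lock encoding are assumptions made for this sketch, not Emscripten's actual implementation.

```javascript
// Hypothetical lock word encoding: 0 = unlocked, 1 = locked.
const UNLOCKED = 0;
const LOCKED = 1;

// 1) try-lock and abort: one CAS attempt, the caller decides what to do on failure.
function tryLock(i32, idx) {
  return Atomics.compareExchange(i32, idx, UNLOCKED, LOCKED) === UNLOCKED;
}

// 2) infinite try-lock: busy-spin until the CAS succeeds.
function spinLock(i32, idx) {
  while (!tryLock(i32, idx)) {
    // Burns CPU cycles until the lock is released.
  }
}

// 3) Atomics.wait: sleep the calling thread until notified.
// In browsers this throws a TypeError on the main thread.
function waitLock(i32, idx) {
  while (!tryLock(i32, idx)) {
    // Returns "not-equal" right away if the word is no longer LOCKED.
    Atomics.wait(i32, idx, LOCKED);
  }
}

// 4) Atomics.waitAsync: resume asynchronously once notified.
async function asyncLock(i32, idx) {
  while (!tryLock(i32, idx)) {
    const result = Atomics.waitAsync(i32, idx, LOCKED);
    if (result.async) await result.value;
  }
}

// Release the lock and wake one waiter, if any.
function unlock(i32, idx) {
  Atomics.store(i32, idx, UNLOCKED);
  Atomics.notify(i32, idx, 1);
}
```

Note that only 4) changes the shape of the calling code: the caller has to become asynchronous, which is what makes it hard to express behind a synchronous API such as pthread_mutex_lock().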

Emscripten currently implements the pthread API for multithreading. Unfortunately, that API does not lend itself to using 4) .waitAsync() above, but it can express 1)-3). The main thread is prevented from doing 3), so it is left with 1) and 2). In many applications' problem spaces, option 1) is not meaningful, which leaves option 2) as the only way to proceed.

To address the fact that pthreads cannot express .waitAsync(), we have been looking both at extending the pthreads API with a new pthread_mutex_lock_async() function family and at creating a new "web-style" Wasm Workers multithreading API designed from the ground up with the offered web primitives in mind.

In #176 we are discussing some of the technicalities of Atomics.waitAsync() that have prevented its adoption.

However, it is looking very clear that even if/when pthread_mutex_lock_async(), Wasm Workers and #176 are resolved, there will still exist a lot of code that cannot meaningfully be recast in an Atomics.waitAsync() pattern, and that code will need to continue busy-spinning on its locks ( 2) above). In most scenarios where the main threads of these applications busy-spin on locks, contention is zero most of the time (if not practically always), so the lock is obtained very quickly. In other scenarios there can be a lot of contention, but it is expected to be very short-lived (a multithreaded malloc() or filesystem access being prime examples).

So these applications busy-spin, but with a downside: the main thread currently cannot .wait() for a lock, no matter how short the expected wait time would be.
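
A minimal sketch of what this looks like in practice: a lock acquire that prefers sleeping, but falls back to busy-spinning when the environment refuses to block (Atomics.wait throws a TypeError when the calling agent cannot suspend, as on a browser main thread). The function name acquire, its waitFn parameter and its return shape are hypothetical and exist only for illustration.

```javascript
// Hypothetical helper: acquire a lock word (0 = unlocked, 1 = locked).
// waitFn defaults to Atomics.wait and is injectable only so that the
// fallback path can be exercised outside a browser main thread.
function acquire(i32, idx, waitFn = Atomics.wait) {
  let canSleep = true;
  let spins = 0;
  while (Atomics.compareExchange(i32, idx, 0, 1) !== 0) {
    if (canSleep) {
      try {
        waitFn(i32, idx, 1); // sleep until notified or until the value changes
        continue;
      } catch (e) {
        if (!(e instanceof TypeError)) throw e;
        canSleep = false; // blocking is not allowed here: spin from now on
      }
    }
    spins++; // busy-spin iteration: consumes CPU instead of sleeping
  }
  return { usedSpinFallback: !canSleep, spins };
}
```

On a Worker the loop sleeps between attempts; on the main thread every contended acquire ends up in the spinning branch.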

This restriction, however well-intentioned as a nudge for developers to write .waitAsync()-based code, seems to be hurting instead: rather than saving performance and responsiveness, programs need to resort to busy-spin-waiting and potentially consume more battery, the opposite of what was intended.

That raises the question for conversation: would it be possible to lift the restriction that Atomics.wait() cannot wait on the main thread?

The wait would be blocking, but the same application hang watchdog timers would apply. I.e. wait for 10 seconds and the slow script dialog would come up.

Or maybe the max wait period on the main thread would be reduced, to e.g. 1 second or 0.5 seconds, or similar (if it helps implementing a slow script watchdog in some browsers?)
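
To make the idea of a bounded wait concrete, here is a sketch that uses the existing timeout parameter of Atomics.wait to cap the total time spent waiting for a lock. The helper name lockWithBudget and the budget value are assumptions for illustration. Today this only runs where blocking is permitted (e.g. in a Worker); on a browser main thread the Atomics.wait call throws.

```javascript
// Hypothetical helper: try to acquire a lock word (0 = unlocked, 1 = locked),
// sleeping for at most budgetMs milliseconds in total.
function lockWithBudget(i32, idx, budgetMs) {
  const deadline = Date.now() + budgetMs;
  while (Atomics.compareExchange(i32, idx, 0, 1) !== 0) {
    const remaining = deadline - Date.now();
    if (remaining <= 0) return false; // budget exhausted, give up
    // Sleeps until notified, until the value changes, or until the timeout.
    Atomics.wait(i32, idx, 1, remaining);
  }
  return true;
}
```

A browser-imposed cap on main-thread waits would behave much like the budget here, except that the limit would be enforced by the engine rather than by the application.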

What this would win is that applications that need to wait for locks on the main thread would be able to actually save battery while doing so, instead of burning excess cycles in a busy-spin.

The worry that enabling wait on the main thread would invite more use of blocking on the main thread does not seem correct. Applications already need to wait on the main thread for some uses - malloc being a prime example - and it could happen either with a proper sleep construct in place, or without.

If there existed support for waiting on the main thread, browser DevTools would actually be able to detect and highlight this spent time specifically, and be able to show in profilers and DevTools timelines where such sleeps took place. Now without such support, those wait times are lost in an application-specific busy loop.

Also, if it were possible to wait on the main thread, the browser could be more aware of when it is intervening, and the slow script dialog would be able to highlight that the hang is due to multithreaded synchronization. That would directly hint at a programming error and point the programmer towards their shared data structure usage. In the current state, since there is no main thread wait support, people may not know where to look first when these programming errors come up.

To summarize, prohibiting Atomics.wait() on the main thread seems harmful rather than beneficial, since in the problem spaces that need it, those sleeps get replaced with busy for(;;) loops instead. We would rather give the main thread a breather, be able to detect and highlight in DevTools where synchronization-related waits occur, and improve battery usage.

What do you think?
