When an application captures a [=display-surface=], the user agent faces a decision: should the captured [=display-surface=] be brought to the forefront of the user's screen ("focused"), or should the capturing application retain focus? This document proposes a mechanism by which an application can influence this decision.
This document uses the definition of the following concepts from [[SCREEN-CAPTURE]]: display-surface, application [=display-surface=], browser [=display-surface=], window [=display-surface=] and monitor [=display-surface=].
Assume a web application that calls {{MediaDevices/getDisplayMedia()}} and the user chooses to capture a tab or a window. It is not currently specified whether the user agent should focus the captured [=display-surface=], or let the capturing application retain focus.
The user agent is mostly agnostic of the nature of the capturing and captured applications, and is therefore not well-positioned to make an informed decision with regard to focus.
In contrast, the capturing application is familiar with its own properties, and is better positioned to make this decision. Moreover, by reading {{MediaTrackConstraintSet/displaySurface}} and/or using Capture Handle, the capturing application can learn about the captured [=display-surface=], driving an even more informed decision.
For example, a video conferencing application may wish to:
The conditional-focus mechanism allows the capturing application to instruct the user agent to either switch focus to the captured [=display-surface=], or to avoid such a focus change.
The window of opportunity for the application to make the decision is defined. If the mechanism is not invoked within this window of opportunity, the user agent takes over and makes its own decision.
{{MediaDevices/getDisplayMedia()}} is currently defined such that it returns a {{Promise}}<{{MediaStream}}>. We extend this definition such that when {{MediaDevices/getDisplayMedia()}} is called, if the user elects to capture either an [=application=], [=browser=] or [=window=] [=display-surface=], the video track of the aforementioned {{MediaStream}} will be of type {{FocusableMediaStreamTrack}}.
{{MediaStreamTrack}} is subclassed as {{FocusableMediaStreamTrack}}.
[Exposed=Window]
interface FocusableMediaStreamTrack : MediaStreamTrack {
  undefined focus(CaptureStartFocusBehavior focus_behavior);
};

enum CaptureStartFocusBehavior {
  "focus-captured-surface",
  "no-focus-change"
};
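A user agent that does not implement this proposal would return a plain {{MediaStreamTrack}}, so an application would likely want to feature-detect before calling {{focus()}}. The following is a minimal sketch of such a check; the helper name `isFocusableTrack` is hypothetical and not defined by this document.

```javascript
// Hypothetical helper, not part of this proposal.
// Returns true if the given track exposes a callable focus() method.
function isFocusableTrack(track) {
  return Boolean(track) && typeof track.focus === "function";
}
```

The sketch tests for the method rather than using `instanceof FocusableMediaStreamTrack`, because referencing an interface name that the user agent does not define would throw a `ReferenceError`.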
Recall that the {{FocusableMediaStreamTrack}} object was instantiated in response to a call to {{MediaDevices/getDisplayMedia()}}. That call to {{MediaDevices/getDisplayMedia()}} returned a {{Promise}}<{{MediaStream}}> PRMS. Like any {{Promise}}, PRMS is settled on a microtask, which we will name MT.
When MT starts executing, a window of opportunity opens for the application to inform the user agent as to whether it wants the captured [=display-surface=] to be focused or not. Calls to {{focus()}} may only have an effect while this window of opportunity is open. It closes as soon as one of the following happens:

* MT finishes.
* One second has elapsed since the capture started.

When the window of opportunity closes, if an explicit decision was not made through calling {{focus()}}, then the user agent MUST make its own decision.
Therefore, when {{focus()}} is called, the user agent MUST run the following steps:
1. If {{focus()}} is called after MT, the user agent MUST have already made a decision, so raise an {{InvalidStateError}}. Otherwise, proceed.
2. The call was made during MT and within one second of the capture starting. Therefore, the user agent MUST NOT make its own decision with respect to focusing the captured [=display-surface=], but rather:
    * If focus_behavior is set to {{CaptureStartFocusBehavior/"focus-captured-surface"}}, then the user agent MUST focus the captured [=display-surface=].
    * If focus_behavior is set to {{CaptureStartFocusBehavior/"no-focus-change"}}, then the user agent MUST NOT focus the captured [=display-surface=].
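The steps above can be approximated by a small model of the user agent's side of the decision. This is only an illustrative sketch: the class `FocusDecisionModel`, its fields, and the explicit `nowMs` timestamps are all hypothetical, and repeated calls to {{focus()}} are not modelled. A call made during MT but more than one second after the capture started is treated as having no effect, which is an assumption drawn from the statement that calls only have an effect while the window of opportunity is open.

```javascript
// Toy model of the user agent's side of focus(). All names are hypothetical.
const WINDOW_MS = 1000; // "within one second of the capture starting"

class FocusDecisionModel {
  constructor(captureStartMs) {
    this.captureStartMs = captureStartMs;
    this.microtaskFinished = false; // Becomes true once MT has finished.
    this.focusCaptured = null; // null until a decision has been made.
  }

  // Invoked by the model's owner when MT finishes.
  finishMicrotask(defaultFocusCaptured) {
    this.microtaskFinished = true;
    if (this.focusCaptured === null) {
      // No explicit decision was made, so the user agent makes its own.
      this.focusCaptured = defaultFocusCaptured;
    }
  }

  focus(focusBehavior, nowMs) {
    if (this.microtaskFinished) {
      // A real user agent would raise a DOMException here.
      const error = new Error("focus() called after MT finished");
      error.name = "InvalidStateError";
      throw error;
    }
    if (nowMs - this.captureStartMs > WINDOW_MS) {
      // Assumption: the window has already closed, so the call has no effect.
      return;
    }
    this.focusCaptured = focusBehavior === "focus-captured-surface";
  }
}
```

In this model, an explicit decision made while the window is open is never overridden by the default passed to `finishMicrotask()`.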
All examples will assume a predicate named shouldFocus() which accepts a video {{MediaStreamTrack}} as input. It is a synchronous function returning either {{CaptureStartFocusBehavior/"no-focus-change"}} or {{CaptureStartFocusBehavior/"focus-captured-surface"}}.
function shouldFocus(mediaStreamTrack) {
  // Synchronous.
  // Returns "no-focus-change" or "focus-captured-surface".
  // Has access to Capture Handle.
}
Reasonable implementations of this predicate include:
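As an illustration only, the sketch below bases the decision on {{MediaTrackConstraintSet/displaySurface}}, read from the track's settings. The policy itself (leave focus unchanged when a tab is captured, focus the captured surface otherwise) is an assumption chosen for the example and is not a recommendation of this document.

```javascript
// Illustrative policy only; the choice of which surfaces to focus is an assumption.
function shouldFocus(mediaStreamTrack) {
  // getSettings() is synchronous, so the predicate stays synchronous too.
  const { displaySurface } = mediaStreamTrack.getSettings();
  if (displaySurface === "browser") {
    return "no-focus-change"; // A tab was captured; keep the capturing app focused.
  }
  return "focus-captured-surface";
}
```

A fuller implementation might also consult Capture Handle to learn more about the captured [=display-surface=] before deciding.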
const mediaStream = await navigator.mediaDevices.getDisplayMedia();
const [track] = mediaStream.getVideoTracks();
if (!!track.focus) {
  track.focus(shouldFocus(track));  // Correct.
}

const mediaStream = await navigator.mediaDevices.getDisplayMedia();
const [track] = mediaStream.getVideoTracks();
await someOtherFunction();  // Mistake: Allows MT to finish its execution.
if (!!track.focus) {
  track.focus(shouldFocus(track));
}

const mediaStream = await navigator.mediaDevices.getDisplayMedia();
const [track] = mediaStream.getVideoTracks();
setTimeout(() => {  // Mistake: Allows MT to finish its execution.
  if (!!track.focus) {
    track.focus(shouldFocus(track));
  }
}, 1);

const mediaStream = await navigator.mediaDevices.getDisplayMedia();
const [track] = mediaStream.getVideoTracks();
timeConsumingFunc();  // Mistake: Might take longer than 1s.
if (!!track.focus) {
  track.focus(shouldFocus(track));
}
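One way to avoid the mistakes above is to isolate the focus decision in a helper that performs only synchronous work, and to call it immediately after the {{Promise}} returned by {{MediaDevices/getDisplayMedia()}} resolves. The sketch below is a suggestion under that assumption; the helper name `decideFocusNow` is hypothetical.

```javascript
// Hypothetical helper: does only synchronous work, so calling it right after
// `await getDisplayMedia()` keeps the focus() call inside MT.
function decideFocusNow(mediaStream, shouldFocus) {
  const [track] = mediaStream.getVideoTracks();
  if (!track || typeof track.focus !== "function") {
    return false; // No video track, or the mechanism is unavailable.
  }
  track.focus(shouldFocus(track));
  return true;
}

// Usage (inside an async function, in a browser):
//   const mediaStream = await navigator.mediaDevices.getDisplayMedia();
//   decideFocusNow(mediaStream, shouldFocus);  // First, while MT is running.
//   await someOtherFunction();                 // Only afterwards.
```

The helper only helps if `shouldFocus()` itself returns quickly, since a slow predicate could still exceed the one-second limit.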