Copyright © 2024 World Wide Web Consortium. W3C® liability, trademark and permissive document license rules apply.
This document proposes a mechanism by which an application APP can opt in to exposing certain information to another application, CAPTR, if CAPTR is screen-capturing the tab in which APP is running. It describes a mechanism for tab capture only.
This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
This document is not complete.
This document was published by the Web Real-Time Communications Working Group as a Working Draft using the Recommendation track.
Publication as a Working Draft does not imply endorsement by W3C and its Members.
This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 03 November 2023 W3C Process Document.
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key words MUST and MUST NOT in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
Consider a web application, running in one tab, which we’ll name "main_app". Assume main_app calls getDisplayMedia and the user chooses to share another tab, where an application is running which we’ll call "captured_app".
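Note that:
main_app is not informed of the identity of captured_app.
captured_app is not informed that it is being captured, let alone by main_app.
Both these traits are desirable for the general case, but there exist legitimate use cases where the browser would want to allow applications to opt in to bridging that gap and establishing a connection.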
We wish to enable the legitimate use cases while keeping the general case as it was before.
Consider a presentation application and a video-conferencing (VC) application that collaborate. Assume the user is in a VC session and starts sharing a presentation. Both applications are interested in letting the VC application discover that it is capturing a slides session, which application is presenting it, and even which session it is, so that the VC application can expose controls to the user for flipping through slides. When the user clicks those controls, the VC application can send messages to the presentation application, requesting that it flip through slides, enter or leave presentation mode, etc.
The means for transmitting these messages are outside the scope of this document. Some options are:
Capturing applications often wish to gather statistics about which applications their users tend to capture. For example, VC applications would like to know how often their users share presentation applications from specific providers, Wikipedia, CNN, etc. Such information can be used to improve service for users, for example by introducing new collaborations such as the one described above.
Users sometimes choose to share the wrong tab. Sometimes they switch to sharing the wrong tab by clicking the share-this-tab-instead button by mistake. A benevolent application could try to protect the user by presenting an in-app dialog for re-confirmation if it suspects that the user may have made a mistake.
This use case is a special case of the previous one (detecting that the user may have shared the wrong tab), but deserves its own section due to its importance. The "Hall of Mirrors" effect occurs when users choose to share the tab in which the VC call takes place. When it detects self-capture, a VC application can avoid displaying the captured stream back to the user, thereby avoiding the dreaded effect.
The capture-handle mechanism consists of two main parts, one on the captured side and one on the capturing side:
Captured side: setCaptureHandleConfig.
Capturing side: CaptureHandle.
Applications are allowed to expose information to capturing applications. They would typically do so before knowing whether they are being captured at all. They do this by calling setCaptureHandleConfig with an appropriate CaptureHandleConfig.
The CaptureHandleConfig dictionary tells the user agent which information the captured application intends to expose, and to which applications it is willing to expose that information.
WebIDL
dictionary CaptureHandleConfig {
  boolean exposeOrigin = false;
  DOMString handle = "";
  sequence<DOMString> permittedOrigins = [];
};
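The following TypeScript sketch mirrors CaptureHandleConfig and fills in its WebIDL defaults, which may help readers who want to model the dictionary in code. It is an illustrative model only, not the user agent's dictionary-conversion algorithm. The helper name withDefaults and the example handle value are hypothetical.

```typescript
// Illustrative TypeScript mirror of the CaptureHandleConfig WebIDL dictionary.
interface CaptureHandleConfig {
  exposeOrigin: boolean;
  handle: string;
  permittedOrigins: string[];
}

// Hypothetical helper: applies the WebIDL default for every member the
// caller omitted (false, "" and [] respectively).
function withDefaults(init: Partial<CaptureHandleConfig> = {}): CaptureHandleConfig {
  return {
    exposeOrigin: init.exposeOrigin ?? false,
    handle: init.handle ?? "",
    // Copy the list so later changes by the caller do not leak in.
    permittedOrigins: (init.permittedOrigins ?? []).slice(),
  };
}

// Example: a presentation app exposing its origin and a (hypothetical)
// session identifier to a single VC origin.
const config = withDefaults({
  exposeOrigin: true,
  handle: "session-1234",
  permittedOrigins: ["https://vc.example"],
});
```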
exposeOrigin
If true, the user agent MUST expose the captured application's origin through the origin field of CaptureHandle. If false, the user agent MUST NOT expose the captured application's origin.
handle
The user agent MUST expose this value as the handle field of CaptureHandle.
Note: Values of this field are limited to 1024 16-bit characters. This limitation is specified further in setCaptureHandleConfig.
permittedOrigins
Valid values of this field include:
"*"
If permittedOrigins consists of the single item "*", then the CaptureHandle is observable by all capturers. Otherwise, the CaptureHandle is observable only by capturers whose origin is listed in permittedOrigins.
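The permittedOrigins rule can be sketched as a small predicate. This is a hypothetical helper that assumes origins are compared as exact serialized-origin strings, such as "https://vc.example". It follows the text literally, so an empty list (the default) permits no capturer.

```typescript
// Hypothetical helper: may a capturer at `capturerOrigin` observe the
// CaptureHandle, given the captured application's permittedOrigins?
// Assumes both sides are serialized origins compared as exact strings.
function isPermitted(permittedOrigins: readonly string[], capturerOrigin: string): boolean {
  // The single item "*" makes the CaptureHandle observable by all capturers.
  if (permittedOrigins.length === 1 && permittedOrigins[0] === "*") {
    return true;
  }
  // Otherwise only listed origins may observe it; an empty list permits none.
  return permittedOrigins.indexOf(capturerOrigin) !== -1;
}
```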
MediaDevices is extended with a method, setCaptureHandleConfig, which accepts a CaptureHandleConfig object. By calling this method, an application informs the user agent which information it permits capturing applications to observe.
There is no consensus yet on how setCaptureHandleConfig should behave if called more than once, due to concerns over it being misused as a cross-origin messaging channel itself. This is under discussion in issue #11.
WebIDL
partial interface MediaDevices {
  undefined setCaptureHandleConfig(optional CaptureHandleConfig config = {});
};
setCaptureHandleConfig
The user agent MUST run the following validations:
If handle is set to an invalid value, the user agent MUST reject by raising TypeError.
If permittedOrigins is set to an invalid value, the user agent MUST reject by raising NotSupportedError.
If the call to setCaptureHandleConfig() is not from the top-level browsing context, the user agent MUST reject by raising InvalidStateError.
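Below is a TypeScript sketch of these validations. The text above does not fully define which values are invalid, so the rules used here are assumptions. A handle is treated as invalid if it is longer than 1024 UTF-16 code units, taken from the note on handle. A permittedOrigins list is treated as invalid if it is neither the lone "*" nor a list of serialized origins. The isTopLevel parameter and the domError helper, which stands in for DOMException, are hypothetical.

```typescript
interface CaptureHandleConfig {
  exposeOrigin: boolean;
  handle: string;
  permittedOrigins: string[];
}

// Hypothetical stand-in for DOMException: an Error whose `name` carries
// the DOMException name the text requires.
function domError(name: string, message: string): Error {
  const err = new Error(message);
  err.name = name;
  return err;
}

// Assumption: a valid entry is an already-serialized origin such as
// "https://vc.example" (no path, no trailing slash, no default port).
function isSerializedOrigin(value: string): boolean {
  try {
    return new URL(value).origin === value;
  } catch {
    return false; // not parseable as a URL at all
  }
}

// Runs the three validations in the order the text lists them.
// `isTopLevel` is a hypothetical input meaning "called from the
// top-level browsing context".
function validateCaptureHandleConfig(config: CaptureHandleConfig, isTopLevel: boolean): void {
  // Assumption: JavaScript's `length` counts UTF-16 code units, i.e. the
  // "16-bit characters" of the 1024-character limit.
  if (config.handle.length > 1024) {
    throw new TypeError("handle exceeds 1024 16-bit characters");
  }
  const origins = config.permittedOrigins;
  const wildcardOnly = origins.length === 1 && origins[0] === "*";
  // Assumption: apart from the lone "*", every entry must be a serialized origin.
  if (!wildcardOnly && !origins.every(isSerializedOrigin)) {
    throw domError("NotSupportedError", "invalid entry in permittedOrigins");
  }
  if (!isTopLevel) {
    throw domError("InvalidStateError", "not called from the top-level browsing context");
  }
}
```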
If all validations pass, the user agent MUST accept the new config. The user agent MUST forget any previous call to setCaptureHandleConfig; from now on, the application's CaptureHandleConfig is config.
The observable CaptureHandle is re-evaluated for all capturing applications.
For each capturing application whose observable CaptureHandle is different from what it was prior to the call to setCaptureHandleConfig, the user agent MUST fire an event named capturehandlechange.
The user agent MUST report the newly observable CaptureHandle whenever getCaptureHandle is called.
Capturing applications that are permitted to observe a track's CaptureHandle have two ways of reading it:
Calling getCaptureHandle.
Listening for capturehandlechange events, e.g. through oncapturehandlechange.
The user agent exposes information about the captured application to the capturing application through the CaptureHandle dictionary. Note that a CaptureHandle object MUST NOT be given to a capturing application that is not permitted to observe it.
WebIDL
dictionary CaptureHandle {
  DOMString origin;
  DOMString handle;
};
origin
If the captured application opted in to exposing its origin (by setting exposeOrigin to true), then the user agent MUST set origin to the origin of the captured application. Otherwise, origin is not set.
handle
The user agent MUST set this field to the value which the captured application set in handle.
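The sketch below shows one way a capturing application could use a CaptureHandle: a VC application detecting the "Hall of Mirrors" case described earlier. It makes several assumptions. The VC application itself called setCaptureHandleConfig with exposeOrigin set to true, a per-session handle, and a permittedOrigins list that includes its own origin. The value `observed` is what getCaptureHandle returned, or null. isSelfCapture is a hypothetical helper and is not part of this specification.

```typescript
interface CaptureHandle {
  origin?: string; // present only if the captured app set exposeOrigin
  handle: string;
}

// Hypothetical capturing-side check: is the captured tab this very VC
// session? Compares the observed handle with the values this app is
// assumed to have exposed about itself.
function isSelfCapture(observed: CaptureHandle | null, ownOrigin: string, ownHandle: string): boolean {
  // null means nothing is observable (not permitted, no config, ended track...).
  if (observed === null) {
    return false;
  }
  return observed.origin === ownOrigin && observed.handle === ownHandle;
}
```

If the check returns true, the application can skip rendering the captured stream locally.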
Extend MediaStreamTrack with a method called getCaptureHandle.
When the MediaStreamTrack is a video track derived from screen capture, getCaptureHandle returns the latest observable CaptureHandle, or null if none is observable. Otherwise it returns null.
There is no consensus yet on whether getCaptureHandle belongs on MediaStreamTrack or on a dedicated controller object that is neither clonable nor transferable, to separate messaging affecting all tracks from consumption of a single track. This is under discussion in issue #12.
WebIDL
partial interface MediaStreamTrack {
  CaptureHandle? getCaptureHandle();
};
getCaptureHandle
If the track in question is not a video track, or does not represent a browser display surface, then the user agent MUST return null.
If the track is ended, then the user agent MUST return null.
If the captured application did not set a CaptureHandleConfig, or if it last set it to the empty CaptureHandleConfig, then the user agent MUST return null.
The user agent MUST compare the origin of the capturing document to those which the captured application listed in permittedOrigins. If the capturing origin is not permitted to observe the CaptureHandle, then the user agent MUST return null.
If all previous validations passed, then the user agent MUST return a CaptureHandle dictionary with the values derived from the last CaptureHandleConfig set by the captured application.
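The getCaptureHandle steps can be modeled as a pure function, shown below as a TypeScript sketch. TrackInfo is a hypothetical summary of the track state that the steps inspect. "browser", "window" and "monitor" are the display surface types defined by Screen Capture. Treating a config whose members all hold their default values as "the empty CaptureHandleConfig" is an assumption.

```typescript
interface CaptureHandleConfig {
  exposeOrigin: boolean;
  handle: string;
  permittedOrigins: string[];
}
interface CaptureHandle {
  origin?: string;
  handle: string;
}
// Hypothetical summary of the MediaStreamTrack state the steps inspect.
interface TrackInfo {
  kind: "audio" | "video";
  displaySurface: "browser" | "window" | "monitor";
  ended: boolean;
}

// Assumption: "empty" means every member still holds its default value.
function isEmptyConfig(c: CaptureHandleConfig): boolean {
  return !c.exposeOrigin && c.handle === "" && c.permittedOrigins.length === 0;
}

function getCaptureHandleModel(
  track: TrackInfo,
  config: CaptureHandleConfig | null, // null: the captured app never set one
  capturedOrigin: string,
  capturerOrigin: string,
): CaptureHandle | null {
  // Only video tracks of a browser display surface (a tab) qualify.
  if (track.kind !== "video" || track.displaySurface !== "browser") return null;
  // Ended tracks expose nothing.
  if (track.ended) return null;
  // No config, or the last config set was the empty one.
  if (config === null || isEmptyConfig(config)) return null;
  // The capturing origin must be permitted by permittedOrigins.
  const p = config.permittedOrigins;
  const permitted = (p.length === 1 && p[0] === "*") || p.indexOf(capturerOrigin) !== -1;
  if (!permitted) return null;
  // Derive the CaptureHandle; origin is present only if exposeOrigin is true.
  const result: CaptureHandle = { handle: config.handle };
  if (config.exposeOrigin) result.origin = capturedOrigin;
  return result;
}
```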
Whenever the observable CaptureHandle for a given capturing application changes, the user agent fires an event named capturehandlechange. This can happen in the following cases:
The captured application calls setCaptureHandleConfig() with a new CaptureHandleConfig. (Note that a new CaptureHandleConfig might or might not change the observable CaptureHandle for a given capturing application; for example, changing permittedOrigins may grant or revoke access for some capturers while leaving others unaffected.)
Events are not fired when the track ends, nor after it ends.
MediaStreamTrack is extended with an event handler attribute called oncapturehandlechange.
WebIDL
partial interface MediaStreamTrack {
  attribute EventHandler oncapturehandlechange;
};
oncapturehandlechange
The EventHandler for events named capturehandlechange.
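The following TypeScript sketch models a captured tab and the tracks of its capturers, to illustrate when capturehandlechange fires. All class and function names are hypothetical. The sketch assumes validation has already passed, and it simplifies the observability rule by omitting the track-kind and display-surface checks. A new CaptureHandleConfig fires the event only for capturers whose observable CaptureHandle actually changed, and never for ended tracks.

```typescript
interface CaptureHandleConfig {
  exposeOrigin: boolean;
  handle: string;
  permittedOrigins: string[];
}
interface CaptureHandle {
  origin?: string;
  handle: string;
}

// Simplified observability rule (track kind/surface checks omitted).
function observable(config: CaptureHandleConfig | null, capturedOrigin: string, capturerOrigin: string): CaptureHandle | null {
  if (config === null) return null;
  const p = config.permittedOrigins;
  if (!((p.length === 1 && p[0] === "*") || p.indexOf(capturerOrigin) !== -1)) return null;
  const result: CaptureHandle = { handle: config.handle };
  if (config.exposeOrigin) result.origin = capturedOrigin;
  return result;
}

// Two observable values are equal if both are null or all fields match.
function sameHandle(a: CaptureHandle | null, b: CaptureHandle | null): boolean {
  if (a === null || b === null) return a === b;
  return a.origin === b.origin && a.handle === b.handle;
}

// Hypothetical stand-in for a capturer's MediaStreamTrack.
class CapturerTrack {
  ended = false;
  oncapturehandlechange: (() => void) | null = null;
  constructor(readonly capturerOrigin: string) {}
}

// Hypothetical model of the captured tab and its current config.
class CapturedTab {
  private config: CaptureHandleConfig | null = null;
  constructor(readonly capturedOrigin: string, readonly tracks: CapturerTrack[]) {}

  // Assumes `config` already passed the setCaptureHandleConfig validations.
  setCaptureHandleConfig(config: CaptureHandleConfig): void {
    const before = this.tracks.map(t => observable(this.config, this.capturedOrigin, t.capturerOrigin));
    this.config = config; // any previous call is forgotten
    this.tracks.forEach((t, i) => {
      const after = observable(this.config, this.capturedOrigin, t.capturerOrigin);
      // Fire only for live tracks whose observable CaptureHandle changed.
      if (!t.ended && !sameHandle(before[i], after) && t.oncapturehandlechange !== null) {
        t.oncapturehandlechange();
      }
    });
  }
}
```

For instance, switching permittedOrigins from a single origin to "*" leaves an already-permitted capturer's observable CaptureHandle unchanged, so only newly permitted capturers receive the event.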