Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trusted Types - use setHTML instead of innerHTML internally #5094

Open
kkmuffme opened this issue Aug 21, 2022 · 21 comments
Open

Trusted Types - use setHTML instead of innerHTML internally #5094

kkmuffme opened this issue Aug 21, 2022 · 21 comments
Milestone

Comments

@kkmuffme
Copy link

Followup to #4409 and it's PR #4927 which were more like for handling TrustedTypes (TT-agnostic).

With Chrome shipping this API soon (v105 https://chromestatus.com/feature/5786893650231296, currently v104) and given the exploits with ".sanitize" (in DomPurify, but also in https://developer.mozilla.org/en-US/docs/Web/API/Sanitizer/sanitize, see https://web.dev/sanitizer/#api-surface), the TT-agnostic approach currently taken can be insufficient/insecure.

Suggested change:
Instead of using .innerHTML = , jQuery should use .setHTML( ... ) when available (https)/supported by the browser


Advantages:

Disadvantages:

  • breaking change for users that used insecure "onclick=",... in their HTML attributes inserted with .html()/.append(),... (which are incompatible with strict CSP policies anyway though) - this is the cost of XSS mitigation though.

Additional discussion needed:

  1. if a trusted type is passed to a jQuery function, should jQuery continue using .innerHTML (instead of .setHTML())?
  • Pro: the trusted type was sanitized according to the users needs (e.g. with certain elements allowed, that might not be allowed by default)
  • Con: can be insecure (as per above)
  1. how would a user pass Sanitizer options to setHtml when using jQuery .html()/.append()/...? Should there be an option for this at all? (if we would keep using .innerHTML for already Trusted Types, there wouldn't be a need for an option, since the user would just sanitize according to his/her needs before passing to jQuery)

Closing thoughts
The current way Trusted Types are handled in jQuery is potentially insecure and still allows for XSS attacks - running afoul of the purpose of Trusted Types.
Furthermore, this minor change, will help make the web safer for users and life easier for developers.

@mgol
Copy link
Member

mgol commented Aug 21, 2022

@koto What do you think?

@mgol
Copy link
Member

mgol commented Aug 21, 2022

@kkmuffme thanks for the ideas. That said, setHTML is experimental & Chromium-only at this point. It’s good to observe this space and maybe even experiment how possible usage within jQuery would look like but we’re unlikely to use this API until it’s available in at least two stable versions of browsers with different web engines.

@mgol mgol added the Discuss in Meeting Reserved for Issues and PRs that anyone would like to discuss in the weekly meeting. label Aug 21, 2022
@kkmuffme
Copy link
Author

Thanks for your feedback. I know it's a bit too early, it's more like something I wanted to put on the roadmap.
The required change is minimal and done quickly anyway, I just want to foster discussion for the 2 points outlined above to find a consensus and allow for quick implementation, once the native Sanitizer is available more widely.

Specifically, setHtml will be a non-flag feature in Chromium (Chrome, Edge, Chrome for Android) in 2 weeks and in Firefox until the end of the year (at latest), which means 75% of the browser market has support for this until then.
Which means, this can probably be added in a few months time.

@mgol
Copy link
Member

mgol commented Aug 21, 2022

That looks like good timing to have it considered then, thanks.

I’m on PTO now but I’ll have a closer look when I’m back.

@LifeIsStrange
Copy link

Sorry didn't read all and don't know how that fit in the discussion but since the sanitizer MVP API will be enabled by default in 8 days, maybe Jquery could leverage it, in addition to trusted types
https://chromestatus.com/feature/5786893650231296

@koto
Copy link

koto commented Aug 22, 2022

The feature is relatively new indeed; The advantages are already listed out in the bug, they are all true. For completeness sake however I'd like to point out potential disadvantages to consider:

  1. It will not be backwards compatible - the sanitizer obviously has an opinionated view as to what should be removed (it started with XSS, but perhaps this would expand to cover other vectors, e.g. Form hijacking with formaction, formmethod WICG/sanitizer-api#169).
  2. The API defaults, and its outputs will also change over time as new vectors emerge, so setHTML behavior can change between browser releases, potentially introducing breakages.
  3. We probably would need to depend on a polyfill, otherwise the resulting DOM and app behavior will be different across browsers (imagine developing an app on Safari that depends on scripts surviving html(), and deploying the app to production with Chrome users).

All that aside, I think switching the default to be secure against XSS would be a tremendous change. For TrustedHTML inputs, these could skip the setHTML and continue using innerHTML, as these values were already determined by the developer to be OK. That way protecting against XSS in jQuery apps is straightforward - use its regular DOM manipulating APIs with strings, and wherever you want to have JS values reach the DOM, produce the value using Trusted Types and pass it to jQuery. jQuery apps become safe against XSS by default.

You probably want to include config hooks that let one specify SanitizerConfig or a Sanitizer in case some configs need to be tweaked (e.g. you don't want forms in your application).

@mgol
Copy link
Member

mgol commented Aug 22, 2022

Thanks for the input, @koto.

The possibility of the sanitizer adding more elements to the blocklist sounds risky: it may break some apps and browsers try to avoid breaking changes.

I think the concerns presented & especially cross-browser issues will make this a 5.0.0 item. For context, we’re removing what we can from 4.0.0 pending issues & will plan a release within a few months. 4.x will still support IE 11 but 5.0.0 will drop all IE versions. If by the time 5.0.0 comes out Safari implements the API, at least we’d have cross-browser issues resolved.

@LifeIsStrange
Copy link

I see the value in skipping sanitization of trusted types however if current jquery doesn't sanitize at all, unconditionally, changing the policy to sanitize by default is a performance regression risk right?
I'm not an expert in sanitization but if it has to be run on each user supplied dom strings/elements, the performance, cpu, battery and ecological cost must be non-negligible right?
If so sanitization should be an optin-global jquery config. And maybe throw a warning until the end user has explicitly chosen a policy via a setter.

@koto
Copy link

koto commented Aug 22, 2022

@LifeIsStrange You could benchmark this, but I'd suspect the cost is mostly in the HTML parser, which the browser runs anyways whenever innerHTML is called , operating on the resulting tree (to sanitize it) should be quite efficient. Browser vendors are well positioned, and quite motivated to make setHTML as efficient as .innerHTML.

@kkmuffme
Copy link
Author

@mgol

The possibility of the sanitizer adding more elements to the blocklist sounds risky: it may break some apps and browsers try to avoid breaking changes.

The sanitize API is far beyond this stage (DomPurify which is the base for this exists since 8 years) that there are things added willynilly. Currently the Sanitize API is only updated to handle additional XSS vectors within what Sanitize does already (e.g. Firefox just had a security issue fixed very recently).

While it's true, that the sanitize API might remove additional insecure things in the future, these would be probably very limited use cases (since all major things are in it already now). Additionally, the things are removed only because of security concerns - which is a good thing (and if they break something: it's good they broke, since your js is probably unsafe)

@LifeIsStrange

I see the value in skipping sanitization of trusted types however if current jquery doesn't sanitize at all, unconditionally, changing the policy to sanitize by default is a performance regression risk right?

You are right. However, security always comes at a cost.

Here's a basic benchmark (please run in Chrome v105, I tested on Firefox only and there benchmarks are off due to security reasons/limits of performance.now, so this is just a rough estimate when done in Firefox)
Basic string: https://www.measurethat.net/Benchmarks/Show/20543/0/dompurify-vs-native-vs-none
Extremely long/complex string: https://www.measurethat.net/Benchmarks/Show/20544/0/dompurify-vs-native-vs-none-long

Shows that innerHTML is about 10 times faster than setHTML() in Firefox. Looks like much, right?
It's not. See .innerHTML is also 10 times faster than jQuery.html(): https://www.measurethat.net/Benchmarks/Show/18606/0/jquery-html-vs-elementinnerhtml-fixed
And nobody (well, react/angular users :D) make a fuss about it, since it's negligible in most real world use cases.

As koto correctly states:

Browser vendors are well positioned, and quite motivated to make setHTML as efficient as .innerHTML.
(Chrome's setHTML seems to be faster than Firefox's (can somebody please run the above benchmark in chrome 105?), so we get some more data)

@mgol
Copy link
Member

mgol commented Aug 22, 2022

A 10x slowdown is not insignificant. Remember the jQuery overhead would apply on top of this additional one.

I imagine hooking into the sanitizer should be optional anyway, though, so the default behavior would be still fast.

@kkmuffme
Copy link
Author

I think worrying about performance at the current time isn't really relevant - the native sanitizers will get faster over the next few months (the test I ran is not representative, as Firefox's performance tools are limited for security reasons. Somebody would need to run the above benchmark on Chrome 105 to get a reliable number)

The current implementation (TT-agnostic with prior sanitizing) is a 35x slowdown.

I'd say once Chrome 105 is released, we could create a PR for this simple change. Then run synthetic and real world benchmarks to see what the impact on performance really is - and then go from there.
Premature optimization and stuff :D

@kkmuffme
Copy link
Author

kkmuffme commented Aug 23, 2022

Just some thoughs on how to improve performance too in case it turns out this really is an issue:
jQuery could internally handle trusted types via an internal trustedTypes.createPolicy (which does nothing, just pass through). This would cover a high number of use cases.

// when the jQuery selector is a basic HTML tag without any attributes it's safe (technically even some attributes are safe, but that's up for discussion)
// addClass, attr, prop, text are safe, myHtml should be instanceof TrustedHTML 
let myHtml = jQuery( '<strong>' ).addClass( 'bar' ).attr( 'data-hello', worldVar ).text( someVar ).prop( 'checked', true );

// still TrustedHTML, as this is safe
myHtml.removeClass();

// there are no unsafe setters, since all unsafe setters use setHTML internally in jQuery:
myHtml.html( foo ); // still safe, since .html() uses .setHTML() internally, which is safe, so we still have TrustedHTML

@koto
Copy link

koto commented Aug 24, 2022

I am not sure how TrustedHTML got mixed up in here, but none of the code snippets above require this type, apart from myHtml.html(foo) if foo would trigger the path that innerHTMLs.

It wouldn't be secure for .text() to return TrustedHTML. textContent and other text getters return raw contents of the text nodes, and not HTML:

p = document.createElement("pre"); p.innerText = '<img src=x onerror=alert(1)>'; p.innerText
> '<img src=x onerror=alert(1)>'

I don't see a need for a passthrough policy inside jQuery, could you elaborate? More specifically, I don't know how Trusted Types would be able to workaround a performance bottleneck of a native sanitizer, if there is one. If you have a dynamic data that needs sanitizing, use a sanitizer. If you have a dynamic data that doesn't need sanitizing, use DOM createElement & friends APIs , or their jQuery wrappers - or create a TrustedHTML value if you want to pass the whole snippet to innerHTML.

@kkmuffme
Copy link
Author

I totally agree with you - my comment was in regards on the impact on performance (and I was wrong about .text() of course), as LifeIsStrange said this might be a problem for some people.

If you have a dynamic data that doesn't need sanitizing, use DOM createElement & friends APIs , or their jQuery wrappers - or create a TrustedHTML value if you want to pass the whole snippet to innerHTML.

Exactly, however this is not what the average jQuery codebase looks like nor what tons of developers do.
There are tons of codebases that use code like the above - an element is created and then passed to jQuery.html
Afaik jQuery will now call setHTML instead of innerHTML, which is much slower, even though it is not necessary here; since myHtml is safe already.

@koto
Copy link

koto commented Aug 26, 2022

jQuery.html(element) doesn't call innerHTML. The branch for elements through .html follows this path:

jquery/src/manipulation.js

Lines 398 to 400 in 016872f

if ( elem ) {
this.empty().append( value );
}

jquery/src/manipulation.js

Lines 304 to 311 in 016872f

append: function() {
return domManip( this, arguments, function( elem ) {
if ( this.nodeType === 1 || this.nodeType === 11 || this.nodeType === 9 ) {
var target = manipulationTarget( this, elem );
target.appendChild( elem );
}
} );
},

fragment = buildFragment( args, collection[ 0 ].ownerDocument, false, collection, ignored );

// Add nodes directly
if ( toType( elem ) === "object" && ( elem.nodeType || isArrayLike( elem ) ) ) {
jQuery.merge( nodes, elem.nodeType ? [ elem ] : elem );

Currently the .innerHTML is only done for jQuery.html(string) and jQuery.html(TrustedHTML), with the actual sink here:

tmp.innerHTML = jQuery.htmlPrefilter( elem );

So, if I understand correctly, we'd just have to make sure that any PR that introduces setHTML continues to use innerHTML for TrustedHTML-like values.

@kkmuffme
Copy link
Author

So, if I understand correctly, we'd just have to make sure that any PR that introduces setHTML continues to use innerHTML for TrustedHTML-like values.

Yes.

Additionally: anything that is from user input (non-static string?) must be sanitized before using even when used with a non-innerHTML method. e.g.
.append( jQuery('#foo').val() ) is not safe and must be sanitized with setHTML before appending.

@timmywil timmywil added Manipulation and removed Discuss in Meeting Reserved for Issues and PRs that anyone would like to discuss in the weekly meeting. labels Aug 29, 2022
@timmywil timmywil added this to the 5.0.0 milestone Aug 29, 2022
@kkmuffme
Copy link
Author

kkmuffme commented Sep 6, 2022

@koto @mgol with Chrome 105 (first Chrome with official support for setHTML) the benchmark already is twice as fast as the benchmark from a few weeks ago with Firefox:
https://www.measurethat.net/Benchmarks/ShowResult/331515
setHTML is about 5 times slower than no sanitizing, which on real world scenarios doesn't make a difference at all

@kkmuffme
Copy link
Author

@koto @mgol Chrome 107 setHTML is only about 4 times slower than no sanitizing, however at 45k/ops/sec (compared to 185k/ops/sec without setHTML) it's so fast, there won't be any feelable (measurable?) difference for the user.

I guess by the time this feature is implemented in jQuery, the performance aspect can be completely ignored already, as it's irrelevant by then, since the difference is small.

@SuperPat45
Copy link

Chrome since version 119 temporarily removed the HTML Sanitizer API:
https://developer.mozilla.org/en-US/docs/Web/API/Element/setHTML#browser_compatibility

The reason for the removal is that the specification is incomplete, which has changed significantly since the addition of Sanitizer to Chrome. Once the specification is ready, the API will be returned.

@lukewarlow
Copy link

Btw wrt to discussions about using setHTML or innerHTML depending on if the argument is already trusted. It would be better to use setHTMLUnsafe instead of innerHTML because the parsing between setHTML and setHTMLUnsafe will be more similar (thinking stuff like declarative shadow dom).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

7 participants