<h1>Yoav's blog thing</h1>
<p>Stringing together words about web performance and standards. Yoav Weiss, [email protected], <a href="https://blog.yoav.ws/">https://blog.yoav.ws/</a></p>
<h1><a href="https://blog.yoav.ws/posts/first_post_ever/">First post ever</a></h1>
<p>2011-02-24</p>
So, I've decided to try this blogging thing all the cool kids have been talking about.<br />
The last straw was <a href="http://www.quirksmode.org/">PPK's</a> reason for <a href="http://www.quirksmode.org/blog/archives/2011/01/no_comment.html">shutting down comments on his blog</a>: "The opinions and musings of the average blog commenter are just not very interesting. If they were, they’d have a blog of their own."<br />
I have interesting opinions and musings. I do! Therefore, I must have a blog. <br />
So I built this thing. Or more accurately, I took <a href="https://github.com/Arachnid/bloggart"> bloggart </a>, copied its theme, broke it and re-built it to be more flexible (I have little love for CSS grids)<br />
What can I tell about myself?<br />
I've been a developer dealing with web performance for the last 11 years. I live in the countryside in southern France with my wife and 3 kids. I'm a big fan of the open web and free software, but up until now, I never had enough time to participate beyond bug reports. I'm hoping that will change when the kids grow up a little (and the right project/itch comes along).<br />
This blog will be used mainly to voice technical opinions. So don't expect vacation notes and/or general bitchin' regarding the hardship of raising kids... <br />
Anyway, that's all for now.<br /><br />
P.S. This is just the initial version. I haven't yet tested everything on every browser, so there are some rough edges. Besides, I have no clue what I'm doing here... Please be gentle :)
<br />
<h1><a href="https://blog.yoav.ws/posts/ua_spoofers_have_feelings_too/">UA spoofers have feelings too!</a></h1>
<p>2011-03-07</p>
There's been a lot of noise recently in the web dev world regarding UA
sniffing vs. feature detection. It all started when Alex Russell wrote a <a href="http://infrequently.org/2011/01/cutting-the-interrogation-short/">post</a> suggesting that there are cases where
UA sniffing can be used, and where feature detection wastes precious time asking questions we already
know the answer to. As he predicted, that stirred up a lot of controversy. Nicholas Zakas backed him up (more or less), Faruk Ates gave a
<a href="http://farukat.es/journal/2011/02/499-lest-we-forget-or-how-i-learned-whats-so-bad-about-browser-sniffing">history lecture</a>, and the entire comment thread on Alex's post is very entertaining.<br /><br />
I agree with many of the points Alex makes, and detecting the UA on the server side has a *huge* advantage: We can avoid sending useless JS and image data to browsers/devices that will never use them. But, a couple of issues make good counter-arguments:
<ul>
<li>Writing *correct* UA sniffing code is hard</li>
<li>UA spoofers are left in the dark here. We would serve them content according to what they're pretending to be, rather than content according to their actual browser</li>
</ul>
<br />
The first problem can be solved by a reference project that does the actual detection for major server side languages. The second problem is more complicated. UA spoofing is a practice that came to be in order to circumvent badly written UA sniffing & UA based blocking. While unfortunate, this technique is necessary for minority browser users, as well as in other cases. I for
one have to use it when I'm using my phone for 3G tethering. My operator's network only allows phone UAs to go through
the phone APN, so I fake it. And when I then get mobile sites on my desktop browser... well, let's say it's
unfortunate.<br /><br />
What we have so far is:
<ul>
<li>Feature detection *all the time* slows things down</li>
<li>UA sniffing kills UA spoofing</li>
</ul>
<br />
So, there must be a third way.<br />
What if we could count on UA sniffing for major browsers UNLESS we detect spoofing is in place?<br />
I thought hard about a generic solution here, but failed miserably. We can't trust UA strings (neither the ones sent over the wire nor the ones exposed as window properties). We can't trust other window properties (such as vendor) to be 100% accurate either, since they may be spoofed as well.<br />
So, do we raise a big white flag? Give up on the idea that a reliable method can be used to detect browsers and avoid feature detection for every single feature we want to use?<br />
Unless...<br />
We can cover the most common use cases for UA spoofing and avoid messing them up. These cases are:
<ul>
<li>Browsers that pretend to be IE so they won't be blocked by backwards sites</li>
<li>Browsers that pretend to be mobile devices so they won't be blocked by DPI on their network</li>
</ul>
<div class="note">If anyone ever reads this and finds other use cases for UA spoofing, please leave a comment.</div>
<br />
With these use cases in mind we can do the following:
<ul>
<li> Detect UAs on the server side </li>
<li> If spoofing is suspected, add appropriate code snippet to the page's top</li>
<li> If UA unknown or spoofing detected, feature detect </li>
<li> Otherwise (UA is known), send JSON with known features </li>
</ul>
That way, if an IE UA is seen on the server side, we add a conditional comment to the page's top. If a recent mobile device UA is seen (iOS, Android), we can verify it by checking for touch events.
There might still be cases we haven't thought about that will keep getting content according to their advertised
UA, but hey, that's a risk you take when spoofing.
In most cases, that makes the UA string reliable. We can then serve a JSON feature set for everything we're absolutely sure
the browser supports, and leave feature detection for everything else.
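<p>To make the flow above more concrete, here's a minimal server-side sketch in Python. The UA matching, the feature table and the script URL are all made up for illustration; this is not a real library, just one way the steps could fit together.</p>
<pre>
import json

# Illustrative only: a crude UA "parser" and a pre-computed feature table.
# Real UA sniffing code needs to be far more careful than this.
KNOWN_FEATURES = {
    "msie9":  {"canvas": True, "touch": False},
    "iphone": {"canvas": True, "touch": True},
}
# Fallback script (hypothetical URL); assumed to skip any check that
# window.features already answers.
FEATURE_DETECT_SNIPPET = '<script src="/js/feature-detect.js"></script>'

def head_snippet(ua_string):
    ua = ua_string.lower()
    if "msie 9" in ua:
        # Conditional comments only execute in real IE, so a browser that merely
        # fakes the UA string gets no feature JSON and falls back to detection.
        return ('<!--[if IE 9]><script>window.features = '
                + json.dumps(KNOWN_FEATURES["msie9"]) + ';</script><![endif]-->'
                + FEATURE_DETECT_SNIPPET)
    if "iphone" in ua or "android" in ua:
        # A touch-events check catches desktop browsers spoofing a mobile UA.
        return ('<script>if ("ontouchstart" in window) window.features = '
                + json.dumps(KNOWN_FEATURES["iphone"]) + ';</script>'
                + FEATURE_DETECT_SNIPPET)
    return FEATURE_DETECT_SNIPPET  # unknown UA: feature detect everything

print(head_snippet("Mozilla/5.0 (iPhone; CPU iPhone OS 5_0 like Mac OS X)"))
</pre>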
<br /><br />
So, thoughts? Ideas? Irrational emotional responses? <br />
Bring it on...:)
<h1><a href="https://blog.yoav.ws/posts/putting_ie_to_sleep/">Putting IE to sleep</a></h1>
<p>2011-03-31</p>
<p>I had a twitter discussion today with <a href="http://twitter.com/robertnyman">Robert Nyman</a> regarding how we should treat old IEs (6, 7 and 8) when we develop. The trigger was his tweet about his <a href="http://robertnyman.com/2009/02/09/stop-developing-for-internet-explorer-6/">post</a> from two years back saying that we should stop developing for IE6. Since twitter is fairly limited for this kind of thing, here's my real, chatty opinion on the subject.</p>
<p>Stats below are taken from <a href="http://gs.statcounter.com/">statcounter</a> and regard EU and North America. Other markets' mileage may vary :)</p>
<h3>So, what's the problem?</h3>
<p>
While we certainly can simply ignore IE6 in most of the world (around 2% market share), IE7 still has 7%-11% in the western world. But then again, this is where IE6 was 2 years ago when Robert wrote his post. In any case IE7 can be ignored or simply nudged to upgrade, since it is by all means an obsolete browser.</p>
<p>
The real trouble starts with IE8. While IE8 is much better than its older brothers, it is still a piece of crap in comparison to today's browsers and it does not support any of the new APIs we need to make the web awesome. It has a market share of 26%-34% in the western world, and since IE9 is not available for XP, it is not going away anytime before the end of 2014, when XP is *finally* decommissioned. It will probably last a little while longer after that as well.</p>
<h3> What can we do about it?</h3>
<p>
There are a few approaches that web developers can use in order to drive people away from the old IEs into the modern web:
</p><ul>
<li><span class="bold">Advocate</span> - Campaigns like <a href="http://html5forxp.com/">HTML5forXP</a> are trying to get the users to upgrade through awareness</li>
<li><span class="bold">Nag</span> - Display in-site messages that notify the user that he would be getting a better experience if he'd upgrade to a modern browser or install Chrome Frame</li>
<li><span class="bold">Ignore</span> - Stop testing on old IEs and trying to create a similar experience for these users using various polyfills</li>
<li><span class="bold">Exclude</span> - Block out old IE users from sites until they upgrade</li>
</ul>
<p>
While ignoring is tempting, you probably don't want 40% of your users to have a shitty experience on your site, so my personal favorite is the "nag and ignore the non-essential parts" approach (kinda like Twitter with border-radius).
</p>
<p>
On the other hand, I can't help reflecting on the fact that Macromedia Flash (before it was bought by Adobe) gained 98% market share through Exclusion. "If you want to see this website - you MUST install Flash" was the paradigm that got it there.
</p>
<p>
The big question is "Who was the first to exclude users without Flash?" (If anyone knows, I'd love to hear about it). A bigger question is "Will one of the big guns on the web today (Google, Yahoo, Facebook, Bing) start excluding services from old IEs?". I know some of them don't support IE6, but who will be the first to not support IE8? I'm only guessing here, but it probably won't be Bing...
</p>
<p>
That's it for now.</p>
<p> Thoughts?
</p>
<br />
<p>
UPDATE: I found some <a href="http://replay.waybackmachine.org/200012150113/http://www.macromedia.com/software/player_census/flashplayer/version_penetration.html"> "Way back machine" stats</a> that indicate that Macromedia Flash made a final market share leap from 90% to 95% in the summer of 2000. Could not find stats before that though...
</p>
<h1><a href="https://blog.yoav.ws/posts/my_take_on_adaptive_images/">My take on adaptive images</a></h1>
<p>2011-05-31</p>
<h3>Proposal flood</h3>
<p>
In the last few days there have been a lot of proposals regarding ways to load lower resolution images for lower resolution devices.
There have been <a href="http://twitter.com/necolas">Nicolas Gallagher's</a> <a href="http://nicolasgallagher.com/responsive-images-using-css3/">Responsive images using CSS3</a>, a <a href="http://lists.w3.org/Archives/Public/public-html/2011May/0386.html">proposal</a> on the W3C list, and yesterday came <a href="http://twitter.com/robertnyman">Robert Nyman's</a> <a href="http://robertnyman.com/2011/05/30/discussing-alternatives-for-various-sizes-of-the-same-image-introducing-src-property-in-css-as-an-option/">proposal</a>.
</p>
<p> Obviously, I also have an opinion on the matter :)</p>
<h3>My contribution to the flood</h3>
<p>
While the proposals from Nicolas & Robert are interesting, they throw a huge maintenance burden on the CSS, and make it practically uncacheable in many fast-updating pages. In a CNN-like scenario, where the images are changing every 1-2 hours, the CSS (or at least one CSS file) will have to change along with them, which is bad for performance, and basically means dynamically generated CSS. I think it is bad news.
</p>
<p>
I think we are much better off with something along the lines of:
</p><pre>
img {
    src-prefix: "big_";
}

@media screen and (max-width: 600px) {
    img {
        src-prefix: "small_";
    }
}
</pre>
When an image defined as <img src="img.jpg"/> is requested, it will be fetched as "big_img.jpg" on devices wider than 600px, as "small_img.jpg" on devices that are 600px wide or narrower, and as "img.jpg" in browsers that do not support this new property.
<p></p>
<p>
To me this is a solution that requires very little maintenance, keeps all changing image content in the HTML where it belongs, and has an inherent fallback.
</p>
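<p>To make the proposed behavior tangible, here's a toy model of how a browser might resolve the URL (the <code>src-prefix</code> property is hypothetical, so this is just an illustration of the intent, not an implementation):</p>
<pre>
from posixpath import basename, dirname, join

# Toy model of the proposed behavior: pick a prefix based on the matching rule
# and prepend it to the file name from the src attribute.
def resolve_src(src, viewport_width, supports_src_prefix=True):
    if not supports_src_prefix:
        return src                       # old browsers: fetch src as-is
    prefix = "small_" if viewport_width <= 600 else "big_"
    return join(dirname(src), prefix + basename(src))

print(resolve_src("images/img.jpg", 1024))   # images/big_img.jpg
print(resolve_src("images/img.jpg", 480))    # images/small_img.jpg
print(resolve_src("images/img.jpg", 480, supports_src_prefix=False))   # images/img.jpg
</pre>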
<p>
Please leave a comment here or on <a href="http://twitter.com/yoavweiss">twitter</a> if you have some feedback on this.
</p>
<br />
<p>
Yoav
</p>
<br />
<p>
Update: As Nicolas Gallagher pointed out in the comments, his proposal does not put maintenance burden on the CSS. Extra URLs are written as data attributes in the HTML. I still think my proposal is better, though :)
</p>
<h1><a href="https://blog.yoav.ws/posts/responsive_images_hacks_wont_cut_it/">Responsive images - hacks won't cut it</a></h1>
<p>2011-07-23</p>
<h3>TL;DR</h3>
Responsive images are important for mobile performance. Hacks may solve the problem, but they come with their own performance penalty. Browser vendors must step up to create a standard, supported method.
<h3>What we have</h3>
<p>
There have been several techniques published lately that enable responsive images using various hacks:
</p>
<ul>
<li><a href="http://twitter.com/csswizardry">Harry Roberts</a> <a href="http://csswizardry.com/2011/07/responsive-images-right-now/">suggested</a> to use background images & media queries to deliver larger images to desktop browsers</li>
<li><a href="http://twitter.com/keithclarkcouk">Keith Clark</a> <a href="http://blog.keithclark.co.uk/responsive-images-using-cookies/">suggested</a> to use JS at the document head to plant cookies that will then send the device dimensions to the server with every image request. The server can then serve different image dimensions on the same image URL</li>
<li>Yet another approach is that of <a href="http://filamentgroup.com/lab/responsive_images_experimenting_with_context_aware_image_sizing/">the filament group</a> which is based on dynamically modifying the base tag according to device dimensions</li>
</ul>
<h3>Not good enough</h3>
<p>The reason we need responsive images in the first place is to avoid downloading excessively large images and avoid the performance penalty that such excessive download incurs. </p>
<p>
All current techniques avoid some of this performance penalty, but come with new performance issues. Using the same URL for images in different dimensions means that the images can't be cached by intermediary cache servers. Using only/mainly background images means that image downloads won't start until the CSS has been downloaded and parsed. It also means that the CSS that contains content-related image URLs cannot be long-term cacheable. Dynamically modifying the base tag is generally <a href="http://blogs.msdn.com/b/ieinternals/archive/2011/07/18/optimal-html-head-ordering-to-avoid-parser-restarts-redownloads-and-improve-performance.aspx">frowned upon</a> by browser vendors since it can mess up the <a href="http://gent.ilcore.com/2011/01/webkit-preloadscanner.html">preload scanner</a>, which loads external resources before all CSS and JS have been downloaded and run.
</p>
<p> All in all, since techniques that modify the URL prevent the preload scanner from working properly, and techniques that don't modify the URL prevent caching, I don't see how a responsive images hack can avoid a performance penalty of its own, which kinda misses the point. </p>
<h3>What we need</h3>
<p>
We need browser vendors to step up and propose (i.e. implement ☺) a standard method to do this. A standard method that will be supported by the preload scanner, and therefore, won't delay image download and won't have a performance penalty.
</p>
<p>
I made a <a href="http://blog.yoav.ws/2011/05/My-take-on-adaptive-images">proposal</a> a couple of months ago for such a method, in response to <a href="http://nicolasgallagher.com/responsive-images-using-css3/">Nicolas Gallagher</a>'s and <a href="http://robertnyman.com/2011/05/30/discussing-alternatives-for-various-sizes-of-the-same-image-introducing-src-property-in-css-as-an-option/">Robert Nyman</a>'s proposals, but basically any method that keeps the URL maintenance burden in the HTML, keeps both CSS & images cacheable, and has no performance penalty will be welcome.
</p>
<p>
Thoughts?
</p>
<h1><a href="https://blog.yoav.ws/posts/simpler_responsive_images_proposal/">Simpler responsive images proposal</a></h1>
<p>2011-07-25</p>
<h3>TL;DR</h3>
<p>
Adding a media attribute that supports queries to the base tag is all that's required to have responsive images with no performance penalty.</p><p>
</p><h3>The thread</h3>
After my last <a href="http://blog.yoav.ws/2011/07/Responsive-images---hacks-won-t-cut-it">post</a>, <a href="http://twitter.com/necolas">Nicolas Gallagher</a> pointed me towards a <a href="http://lists.w3.org/Archives/Public/public-html/2011May/0386.html">mail thread on the public-html mailing list</a> that discusses appropriate solutions to the responsive images problem.<sup>*</sup> <p></p>
<p>
There were a few suggested solutions there:
</p><ul>
<li>Each image tag will have child source tags with a media attribute in each</li>
<li>A new "image" format that will deliver the browser the real image URLs according to dimensions. The browser will then fetch the image it needs according to that info</li>
<li>Web authors will use a progressive image format and browsers will terminate the connection once they have enough data to properly reconstruct a downsized image</li>
<li>Allow media attribute on all elements</li>
<li>Add HTTP headers that will allow content negotiation</li>
</ul>
<p></p>
<p>
In my opinion, only the two solutions that involve the media attribute can resolve the problem with a front-end only solution, where content stays in the HTML (leaving the CSS cacheable independently from content) without any performance problems. The downside of both is that they add a lot of repeating code to the HTML. Each resource will have to be defined several times while adding a media query to each resource. A lot of copy-pasting...</p>
<h3>Eureka</h3>
<p>That got me thinking of a "conditional comment"-like media query syntax inside the HTML that would make it possible to define a different base tag according to dimensions. Then I realized that we don't need the fancy new syntax I just made up. All we need is a media attribute on the base tag that supports queries.</p>
<p>A base tag with a media attribute will enable us to set the base for relative URLs according to dimensions, so we would be able to simply have small images in one directory and larger images in another one, without having to specify that on a per-image basis.</p>
<p>Also, adding media attribute only to the base tag will probably be simpler to implement than adding it to all resources.</p>
<p>While that solution won't provide maximal flexibility in determining the different resolution URLs, I believe it is good enough to resolve the responsive images problem in a clean, pure front-end manner.</p>
<p>Thoughts?</p>
<br />
<p><sup>* I actually saw the initial suggestions there a couple of months ago, but missed the followup responses</sup></p>
<h1><a href="https://blog.yoav.ws/posts/preloaders_cookies_and_race_conditions/">Preloaders, cookies and race conditions</a></h1>
<p>2011-09-28</p>
<h3>Responsive Images and cookies</h3>
<p><a href="http://twitter.com/grigs">Jason Grigsby</a> wrote a <a href="http://www.cloudfour.com/responsive-imgs/">great post</a>
summarizing different approaches to responsive images and what sucks about them. Among other things, he discussed the problem of getting the
first page load right. After a short discussion with him on Twitter, I decided to take a deeper look into the <a href="https://github.com/filamentgroup/Responsive-Images/tree/cookie-driven">Filament group's cookie based method</a>.
</p><h3>Tests</h3>
Testing <a href="http://filamentgroup.com/examples/responsive-images-new/demos/A-Default/demo.html">their demo site</a> using Firefox showed
that the first browse brings in the smaller "mobile" image. The same test on Chrome downloaded the larger image. I became intrigued.
<p></p>
<p>
From their source code, I saw that they are using an external script that modifies the page's domain cookies. From sniffing on my traffic to
the demo site, I saw that on Firefox, the image's request was sent without cookies before the external script finished downloading.
</p><h3>Conclusions</h3>
After a bit of research and some asking around, I came up with:
<p></p>
<ul>
<li>Chrome's preloader <a href="http://www.ravelrumba.com/blog/script-downloading-chrome/comment-page-1/#comment-1455">blocks</a> image
download until all <head> scripts and stylesheets have been downloaded, for performance reasons.</li>
<li>Firefox's preloader <a href="http://twitter.com/hsivonen/status/118988338043695105">does not block</a> for external scripts by
design. It does block for <a href="http://stevesouders.com/cuzillion/?c0=hb0hfff0_2_f&c1=bi1hfff2_0_f&t=1317225267078">inlined scripts
</a>though.</li>
<li>Running this <a href="http://stevesouders.com/cuzillion/?c0=hb0hfff0_2_f&c1=hj1hfff2_0_f&c2=bi1hfff2_0_f&t=1317245802">test</a> in
IE9 shows that its preloader does not block for either external or inlined scripts.</li>
</ul>
<p>
It seems that each browser's preloader behavior is focused on performance rather than on sending cookies that may have been set by scripts.
Inlined scripts currently block preloading in both Firefox and Chrome, but not in IE9.
</p>
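<p>For reference, the server side of such a cookie-based scheme boils down to something like the sketch below (illustrative only, not the Filament Group's actual code). It also shows why the race matters: when the preloader wins, the cookie is absent and the default variant gets served on the first load.</p>
<pre>
# Illustrative sketch: a client-side script is supposed to set a "screen_width"
# cookie before any image request goes out; if the preloader fires the image
# requests first, the cookie is missing and the default (small) variant wins.
def pick_variant(cookies, image_name):
    try:
        width = int(cookies.get("screen_width", 0))
    except ValueError:
        width = 0
    if width >= 960:
        return "large/" + image_name
    if width >= 480:
        return "medium/" + image_name
    return "small/" + image_name         # default, and the first-load race result

print(pick_variant({}, "hero.jpg"))                         # small/hero.jpg (no cookie yet)
print(pick_variant({"screen_width": "1024"}, "hero.jpg"))   # large/hero.jpg
</pre>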
<h3>Bottom line</h3>
<p>Different browsers act differently with regard to which resources they download before/after the <head> scripts are done loading
and running. Furthermore, that behavior is not defined in any spec, and may change with every new release. We cannot and should not count on
it.</p>
<h1><a href="https://blog.yoav.ws/posts/unblocking_blocking_stylesheets/">Unblocking blocking stylesheets</a></h1>
<p>2011-10-07</p>
<h3>TL;DR</h3>
<p> <u>Spoiler:</u> If you have inline scripts that must be inside your page, adding an empty <div> before all stylesheet links will
avoid a CSS bottleneck!!!</p>
<h3>Do we really need Async CSS?</h3>
<p>
I've stumbled upon a post by <a href="http://twitter.com/guypod">Guy Podjarny</a> called <a href="http://www.blaze.io/technical/eliminating-the-css-bottleneck/">Eliminating the CSS bottleneck</a> which talks about using async
stylesheets in order to avoid external stylesheets from blocking the download of other resources. What bothered me about the entire post is
that according to <a href="http://www.browserscope.org/?category=network">BrowserScope</a> this issue is non-existent in all modern
browsers. None of the modern browsers tested there block parallel downloads of stylesheets and other resources. Even when an inline
script is present between the external stylesheet and other resources, most browsers (IE8 is the only exception) will continue to download
the resources.</p>
<p>I've <a href="http://www.blaze.io/technical/eliminating-the-css-bottleneck/#comment-327997385">commented</a> on that post, saying that
the only point I see for async stylesheet is to avoid <a href="http://www.stevesouders.com/blog/2010/06/01/frontend-spof/">Front-end
SPOF</a>. Guy responded that a stylesheet followed by an inlined script will still block resources in modern browsers, and provided a
sample page that proved his point. I rechecked the <a href="http://www.browserscope.org/network/tests/inline-script-after-stylesheet?t=1317925224">BrowserScope tests</a> and was baffled. Now
I have two tests that are supposed to test the same thing, but show different results...</p>
<h3>Not really</h3>
<p>So I did what every reasonable developer would do and started cutting off parts and changing both tests till they were almost identical.
The "almost" part was a <p> tag that was written in the BrowserScope test *before* the stylesheet link tag. Once I added a similiar
<p> tag to the blocking example, it stopped blocking!!! And an empty <div> had the same effect.</p>
<h3>Conclusions</h3>
<ul>
<li> A simple empty <div> tag before any stylesheet links can save you from performance issues if you must have inline scripts in your
page</li>
<li> The BrowserScope test is not good enough and should be modified to avoid the <p> tag before the stylesheet. </li>
<li> Here are the <a href="https://googledrive.com/host/0B-dJYYWy-xsrUUtNLWR1NWdaSkk/blocking.html">blocking</a> vs. <a href="https://googledrive.com/host/0B-dJYYWy-xsrUUtNLWR1NWdaSkk/nonblocking.html">non-blocking</a> examples </li>
</ul>
<p> If anyone has a reasonable explanation as to why a div before the stylesheet links releases them from blocking, please share in the
comments.</p>
<p><u>Update:</u>Guy updated his blog post with an explanation. It seems this trick causes Chrome & Firefox to start the body earlier, and
they simply don't block for body stylesheets.</p>
<p><u>Update 2:</u>Fixed broken links.</p>
<h1><a href="https://blog.yoav.ws/posts/responsive_image_format/">Responsive image format</a></h1>
<p>2012-05-07</p>
<h3>Can't be done?</h3>
<p>
All along the responsive images debate, several people have claimed that salvation would come in the form of a new image
format that would enable images that are automagically responsive.</p>
<p>
My response to these claims was always that it can't be done. </p>
<p>It can't be done since the browser needs to download the image in order for it to analyze which parts of the image it needs.
Yes, the browser can start to download the image and reset the connection once it has enough data to display the image properly,
but that will always download much more than actually necessary (not to mention that it's an extremely ugly solution).
</p>
<p>
Also, introducing new image formats to the web is less than trivial and extremely slow at best
(If you're not convinced, see <a href="http://muizelaar.blogspot.fr/2011/04/webp.html">Mozilla's response to WebP a year ago</a>.)</p>
<p>
And don't get me started on the lack of fallback mechanisms for new image formats :)
</p>
<p></p>
<p>So, in one of the latest twitter discussions, when the subject came up, I was about to make all the above claims once again. But then I
realized I was wrong all along.
It can be done, it can be done gracefully, and it can be done with current image formats</p>
<h3>HOW?!?!</h3>
The web already has a "responsive" format, which is progressive JPEG. The only issue at hand is getting the browsers to download only the
necessary bytes of the progressive JPEG. <p></p>
<p>Here's how we can do this (a rough sketch of the fetch flow follows the list):
</p><ul>
<li>The author will compress the progressive JPEG with <a href="https://github.com/yoavweiss/resp_prog_jpg_tests/blob/master/convert_to_prog.sh">multiple scans</a> </li>
<li>The browser would download an initial buffer of each image (10-20K), using the "Range" request header</li>
<li>This initial buffer will contain the image's dimensions and (optionally) a "scan info" JPEG comment that will state the byte
breakpoints of each one of the JPEG scans (slightly similar to the MP4 video format meta data)</li>
<li>If the image is not a progressive JPEG, the browser will download the rest of the image's byte range</li>
<li>When the scan info comment is present, the browser will download only the byte range that it actually needs, as soon as it knows the
image's presentation size.</li>
<li>When the scan info comment is not present, the browser can rely on dimension based heuristics and the "Content-Length" header to try
and guess how many bytes it needs to really download.</li>
</ul>
<p></p>
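<p>Here is that rough sketch of the fetch flow in Python. The URL and the byte offsets are made up, and a real client would parse the dimensions and the scan-info comment out of the initial buffer instead of hard-coding anything.</p>
<pre>
import urllib.request

URL = "https://example.com/photo_prog.jpg"     # illustrative image URL
INITIAL_BUFFER = 20 * 1024                     # header + first scans

def fetch_range(url, range_header):
    req = urllib.request.Request(url, headers={"Range": range_header})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

head = fetch_range(URL, "bytes=0-%d" % (INITIAL_BUFFER - 1))

# Pretend these were parsed out of the scan-info comment in `head`:
scan_end_offsets = [17408, 21504, 58368, 222208]    # last byte of each scan
needed_scans = 2                                    # derived from the display size

if needed_scans <= len(scan_end_offsets):
    last_byte = scan_end_offsets[needed_scans - 1]
    rest = (fetch_range(URL, "bytes=%d-%d" % (INITIAL_BUFFER, last_byte))
            if last_byte >= INITIAL_BUFFER else b"")
else:
    # No scan info (or all scans needed): grab the rest of the byte range.
    rest = fetch_range(URL, "bytes=%d-" % INITIAL_BUFFER)
image_bytes = head + rest
</pre>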
<h3>Advantages</h3>
<ul>
<li>DRY and easy to maintain - no need to sync the URLs with the correct resolution between the image storage and the HTML/CSS.
Only a single image must be stored on the server, which will significantly simplify authors' lives.</li>
<li>The image optimization can be easily automated.</li>
<li>Any progressive image used in a responsive design (or whose display dimensions are smaller than its real dimensions) can benefit
from this, even if the author is not aware of responsive images. </li>
</ul>
<h3>Downsides</h3>
<ul>
<li>
The optimization burden with this approach will lie on the shoulders of browser vendors. Browsers will have to come up with heuristics
that correlate between number of bits per scan and the "visually acceptable" output dimensions.
</li>
<li>
Two requests for every large image might have a negative effect on download speed & uplink bandwidth. Browser vendors will have to
make sure it won't negatively affect speed. SPDY can resolve the uplink bandwidth concerns.
</li>
<li>
It is not certain that savings using the "responsive progressive" method are identical to savings possible using resize methods. If it
proves to be an issue, it can probably be optimized in the encoder.
</li>
</ul>
<h3>Disclaimers</h3>
<p>This proposal does not claim that all the current <picture> tag efforts are not necessary. They are required to enable <a href="http://blog.cloudfour.com/a-framework-for-discussing-responsive-images-solutions/">"art direction responsiveness"</a> to images,
and give authors that need it more control over the actual images delivered to users.</p>
<p>With that said, most authors might not want to be bothered with the markup changes required. A new, complementary image convention (not
really a new format) that will provide most of the same benefits, and can be applied using automated tools can have a huge advantage.</p>
<p>It is also worth noting that I did not conduct a full byte-size comparison between the responsive progressive method and full
image resizing. See the example below for an anecdotal comparison using a single large image.</p>
<h3>Examples</h3>
<p>All of the images in the responsive progressive example are a single progressive JPEG that was truncated after several scans. </p>
<p>This is an attempt to simulate what a single progressive JPEG might look like at various resolutions when only a part of its scans are
used, and how much the browsers will have to download.</p>
<p> We can see here that the thumbnail image below is significantly larger as responsive progressive than it is as a resized thumbnail, and
the largest image is about the same size.</p>
<p> IMO, the responsive progressive images look significantly better than their resized counterparts, so there's probably room for
optimization here.</p>
<hr />
<p>
The original image is 1920x1280, weighs 217K and can be found
<a href="https://github.com/yoavweiss/resp_prog_jpg_tests/raw/master/Morning_Dew_by_Lars_Clausen_prog.jpg">here</a>
(It is part of Ubuntu's <a href="https://launchpad.net/ubuntu/+source/ubuntu-wallpapers">default wallpapers package</a>)
</p>
<hr />
<p>
240x160 - responsive progressive - 17K
</p>
<img src="https://github.com/yoavweiss/resp_prog_jpg_tests/raw/master/morning01.jpg" width="240" height="160" />
<hr />
<p>
240x160 - resize - 5.2K
</p>
<img src="https://github.com/yoavweiss/resp_prog_jpg_tests/raw/master/morning_240x160.jpg" width="240" height="160" />
<hr />
<p>
480x320 - responsive progressive - 21K
</p>
<img src="https://github.com/yoavweiss/resp_prog_jpg_tests/raw/master/morning05.jpg" width="480" height="320" />
<hr />
<p>
480x320 - resize - 15K
</p>
<img src="https://github.com/yoavweiss/resp_prog_jpg_tests/raw/master/morning_480x320.jpg" width="480" height="320" />
<hr />
<p>
960x640 - responsive progressive - 57K
</p>
<img src="https://github.com/yoavweiss/resp_prog_jpg_tests/raw/master/morning11.jpg" width="960" height="640" />
<hr />
<p>
960x640 - resize - 59K
</p>
<img src="https://github.com/yoavweiss/resp_prog_jpg_tests/raw/master/morning_960x640.jpg" width="960" height="640" />
<hr />
<p>
Update: I just saw a slightly similar proposal <a href="http://opensores.za.net/2012/responsive-images/">here</a>. My main problem with it
is that a new format will take too long to implement and deploy, and will have no fallback for older browsers.
</p>
<h1><a href="https://blog.yoav.ws/posts/images_can_we_have_less/">Images. Can we have less?</a></h1>
<p>2012-07-24</p>
<h2 id="summary-for-the-impatient"><a class="direct-link" href="https://blog.yoav.ws/posts/images_can_we_have_less/#summary-for-the-impatient">#</a> Summary for the impatient</h2>
<p>Lossless compression with current formats can reduce image size on the web by 12.5%.</p>
<p>PNG24 images with an alpha channel comprise 14% of image traffic on the web. We can cut
their size by 80% using WebP.</p>
<p>Savings from lossless optimization can save 1.5% of <em>overall</em> <a href="http://en.wikipedia.org/wiki/Internet_traffic#Global_Internet_traffic">Internet traffic</a>!*</p>
<p>Savings from conversion of PNG24 with an alpha channel to WebP can save
2.1% of overall <a href="http://en.wikipedia.org/wiki/Internet_traffic#Global_Internet_traffic">Internet traffic</a>!!!</p>
<p>That's 2.8 Tbps!!! That's over 10 million kitty photos per
second**!!! Save bandwidth for the kittens!</p>
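<p>For the curious, the kitty figure is just arithmetic on the numbers above:</p>
<pre>
# Back-of-the-envelope check of the kitty figure.
saved_bits_per_second = 2.8e12             # 2.8 Tbps
kitty_photo_bits = 35 * 1000 * 8           # ~35KB per kitty photo
print(saved_bits_per_second / kitty_photo_bits)   # ~10 million kitty photos per second
</pre>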
<h2 id="how-it-began"><a class="direct-link" href="https://blog.yoav.ws/posts/images_can_we_have_less/#how-it-began">#</a> How it began</h2>
<p>A couple of months ago, I attended (the awesome) <a href="http://mobilism.nl/">Mobilism</a> Conference in
Amsterdam, and got introduced to the legendary <a href="http://souders.org/">Steve Souders</a>. We got
talking about images and the possibilities in lossless compression.
He suggested sending me the image URLs for the top 200K Alexa sites
so that I could run some lossless compression analysis on them.
How can you say no to that?</p>
<p>So, 5.8M image URLs later, I started downloading, analyzing and optimizing images. That took ages, mostly because
<code>ls</code> is slow with 5.8M files in a single directory. (I found a <a href="http://www.olark.com/spw/2011/08/you-can-list-a-directory-with-8-million-files-but-not-with-ls/">solution</a> to that since)</p>
<h2 id="results"><a class="direct-link" href="https://blog.yoav.ws/posts/images_can_we_have_less/#results">#</a> Results</h2>
<h4 id="bytes-distribution-according-to-type%3A"><a class="direct-link" href="https://blog.yoav.ws/posts/images_can_we_have_less/#bytes-distribution-according-to-type%3A">#</a> Bytes distribution according to type:</h4>
<table>
<thead>
<tr> <th>Image format</th><th>% of overall image bytes</th> </tr>
</thead>
<tbody>
<tr> <td>JPG</td><td>65.7%</td> </tr>
<tr> <td>GIF</td><td>11.5%</td> </tr>
<tr> <td>PNG8</td><td>1.3%</td> </tr>
<tr> <td>PNG24</td><td>5.1%</td> </tr>
<tr> <td>PNG24α</td><td>14%</td> </tr>
<tr> <td>other</td><td>2.4%</td> </tr>
</tbody>
</table>
<h4 id="lossless-optimizations%3A"><a class="direct-link" href="https://blog.yoav.ws/posts/images_can_we_have_less/#lossless-optimizations%3A">#</a> Lossless optimizations:</h4>
<table>
<thead>
<tr> <th>Optimization</th><th>% savings</th> </tr>
</thead>
<tbody>
<tr> <td>JPEG EXIF removal</td><td>6.6%</td> </tr>
<tr> <td>JPEG EXIF removal & optimized Huffman</td><td>13.3%</td> </tr>
<tr> <td>JPEG EXIF removal, optimized Huffman & Convert to progressive</td><td>15.1%</td> </tr>
<tr> <td>PNG8 pngcrush</td><td>2.6%</td> </tr>
<tr> <td>PNG24 pngcrush</td><td>11%</td> </tr>
<tr> <td>PNG24α pngcrush</td><td>14.4%</td> </tr>
</tbody>
</table>
<p>Overall it seems that with these lossless optimization techniques, about
12.5% of image data can be saved.</p>
<h4 id="notes%3A"><a class="direct-link" href="https://blog.yoav.ws/posts/images_can_we_have_less/#notes%3A">#</a> Notes:</h4>
<ul>
<li>I used <code>jpegtran</code> for the JPEG optimization and <code>pngcrush</code> for the PNG optimization (a rough sketch of such a measurement follows the list).</li>
<li>In order to speed things up, I did the optimization experiments over a sample of 100K random images from each type.</li>
</ul>
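<p>As a rough illustration of what such a measurement looks like, here's a small sketch (my guess at it, not the actual script behind the numbers above; it assumes <code>jpegtran</code> and <code>pngcrush</code> are installed and uses their common lossless flags):</p>
<pre>
import os
import subprocess

def optimized_size(path):
    out = path + ".opt"
    if path.lower().endswith((".jpg", ".jpeg")):
        # EXIF removal, optimized Huffman tables and conversion to progressive.
        cmd = ["jpegtran", "-copy", "none", "-optimize", "-progressive",
               "-outfile", out, path]
    else:
        cmd = ["pngcrush", path, out]
    subprocess.check_call(cmd)
    return os.path.getsize(out)

def savings(paths):
    before = sum(os.path.getsize(p) for p in paths)
    after = sum(optimized_size(p) for p in paths)
    return 100.0 * (before - after) / before

print("%.1f%% saved" % savings(["photo.jpg", "icon.png"]))   # sample file names
</pre>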
<h2 id="png24%CE%B1"><a class="direct-link" href="https://blog.yoav.ws/posts/images_can_we_have_less/#png24%CE%B1">#</a> PNG24α</h2>
<p>PNG24 images with an alpha channel (PNG color type 6) are the only way
to put high quality real life images with an alpha channel on the web
today. This is the reason they comprise 14% of overall image traffic on the web.
What distinguishes them from other image formats is that in most cases,
they are the wrong format for the job. JPEGs represent real life images
with significantly smaller byte sizes. The only reason they are used is
their alpha channel.
That's where WebP fits in.</p>
<h2 id="webp"><a class="direct-link" href="https://blog.yoav.ws/posts/images_can_we_have_less/#webp">#</a> WebP</h2>
<p><a href="https://developers.google.com/speed/webp/">WebP</a> is a new(ish) image format from Google. It is a derivative of their
VP8 video codec, and provides significant image savings.
One of the killer features of their latest release is an alpha channel.
It means that PNG24α images can be converted to WebP (in its lossy
variant) with minimal quality losses and <em>huge</em> savings.</p>
<h2 id="png24%CE%B1-%3D%3E-webp"><a class="direct-link" href="https://blog.yoav.ws/posts/images_can_we_have_less/#png24%CE%B1-%3D%3E-webp">#</a> PNG24α => WebP</h2>
<p>I ran that conversion on the set of 100K PNG24α images. What I got was
an 80% size reduction on average for these images. From looking at Google's
latest research, even if they don't say it out loud, they get similar
results in their <a href="https://developers.google.com/speed/webp/docs/webp_lossless_alpha_study">latest study</a>. (0.6 bits per pixel for lossy WebP vs. 3.6 bits per pixel for PNG)</p>
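<p>The conversion itself is a one-liner per file with the <code>cwebp</code> tool from libwebp; a sketch of measuring the savings might look like this (not the exact settings behind the 80% figure; cwebp is lossy by default and keeps the alpha channel):</p>
<pre>
import os
import subprocess

def png_to_webp_savings(png_path, quality=80):
    webp_path = os.path.splitext(png_path)[0] + ".webp"
    subprocess.check_call(["cwebp", "-q", str(quality), png_path, "-o", webp_path])
    before, after = os.path.getsize(png_path), os.path.getsize(webp_path)
    return 100.0 * (before - after) / before

print("%.1f%% smaller" % png_to_webp_savings("logo_alpha.png"))   # sample file name
</pre>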
<h2 id="what's-the-catch%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/images_can_we_have_less/#what's-the-catch%3F">#</a> What's the catch?</h2>
<p>There are 2 problems with deploying WebP today:</p>
<ul>
<li>Browser support
<ul>
<li>WebP's previous version is currently only supported by Chrome,
Android & Opera. WebP's current version will probably be supported in
Chrome in 3-6 months.</li>
<li>Firefox has refused to implement the format in its previous
incarnation for various reasons. Let's hope they will
reconsider the format in its current version.</li>
<li>Microsoft and Apple have not made any public comments.</li>
</ul>
</li>
<li>Lack of fallback mechanisms for the <img> element.
<ul>
<li>That means that implementing WebP requires server side logic,
and caching that varies according to User-Agent. </li>
<li>The <a href="http://www.w3.org/community/respimg/wiki/Picture_Element_Proposal">proposed <picture> element</a>
does not include such a mechanism either. It probably should.
</li>
</ul>
</li>
</ul>
<h2 id="what-i-did-not-yet-test%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/images_can_we_have_less/#what-i-did-not-yet-test%3F">#</a> What I did not yet test?</h2>
<p>I did not yet test WebP's benefits for lossy images, which <a href="https://developers.google.com/speed/webp/docs/webp_study">Google claim
to be around
30%</a>. These
savings are likely to make WebP even more attractive.</p>
<h2 id="conclusions"><a class="direct-link" href="https://blog.yoav.ws/posts/images_can_we_have_less/#conclusions">#</a> Conclusions</h2>
<ul>
<li>Better lossless image compression using current formats by web authors
can provide 12.5% savings of image data. Web authors should start
using the free tools that do that, and should start doing this today! No excuses!</li>
<li>WebP in its latest incarnation can provide dramatically higher savings,
especially in the use case of real-life alpha channel photos. It would
increase potential image data savings to at least 21.7%.</li>
<li>We need browsers to either support WebP or offer a better alternative.
Current file formats are not good enough, especially for the use case of
real-life photos with an alpha channel.</li>
<li>We need a fallback mechanism in HTML that will enable browsers and
authors to experiment with new file formats without cache-busting server
side hacks.</li>
</ul>
<p>* Assuming that images comprise 15% of overall Internet traffic, which is a conservative assumption</p>
<p>** Assuming 35KB per kitty photo, similar to this one: <img src="https://github.com/yoavweiss/resp_prog_jpg_tests/raw/master/kitten_35k_real.jpg" title="Yes. It is the world's ugliest cat" /></p>
<h1><a href="https://blog.yoav.ws/posts/fetching_responsive_images_format/">Fetching responsive image format</a></h1>
<p>2012-08-27</p>
<p>I just read <a href="https://twitter.com/grigs">Jason Grigsby</a>'s <a href="http://blog.cloudfour.com/what-a-holy-grail-image-format-would-mean-for-the-browsers-lookahead-pre-parser/">post</a>, and tried to answer it in the
comments, but saw that my response passed the limits of a reasonable
comment. So here I am.</p>
<p>This post is a proposal for a file structure that will enable browsers to fetch
images encoded using a responsive image format.</p>
<h2 id="but-which-format%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/fetching_responsive_images_format/#but-which-format%3F">#</a> But which format?</h2>
<p>Regardless of the image format that will eventually be used, a large
part of the problem is coming up with a way to download only the
required parts of the responsive image, without downloading unneeded
image data and without resetting the TCP connection.</p>
<p>In any case, the format itself should be constructed in layers, where the first layer contains the image's lowest
resolution, and each further layer adds more detail.
An example of such layers are JPEG's progressive mode scans.</p>
<h2 id="earlier-proposals"><a class="direct-link" href="https://blog.yoav.ws/posts/fetching_responsive_images_format/#earlier-proposals">#</a> Earlier proposals</h2>
<p>In a recent discussion, Jason linked me to a <a href="http://fremycompany.com/BG/2012/Responsive-Image-Protocol-proposal-908/">proposal</a> for a responsive image
format. While I didn't find the proposal practical because of its use of JPEG-XR, I did like the way it
suggested to handle fetching of the different layers (for different resolutions).
Actually, I liked it more than I liked <a href="http://blog.yoav.ws/2012/05/Responsive-image-format">my own proposal</a> to use ranges.</p>
<p>The main disadvantage of this method is that it may cost up to a full
round-trip time (RTT) per layer to fetch an image. If you have more than a
simple low/high resolution layer split, the delay might quickly add up.</p>
<h2 id="responsive-image-file-structure"><a class="direct-link" href="https://blog.yoav.ws/posts/fetching_responsive_images_format/#responsive-image-file-structure">#</a> Responsive image file structure</h2>
<ul>
<li>The image will be split into two or more files</li>
<li>Each one of these files will have its own URL</li>
<li>The image's header and the first (lowest resolution) layer will be
in a single file. This file's URL will be part of the HTML and will
trigger fetching of the image.</li>
<li>Other files may contain one or more layers</li>
<li>If a file contains more than a single layer, the layers must be in ascending order, from lower
resolution to higher.</li>
<li> The first layer should contain meta data that includes the number
of files, which layers each file contains and the byte offset of each
layer inside each file.</li>
<li> The HTTP response headers of the first layer should contain a list of
the files holding the followup layers.</li>
</ul>
<h2 id="image-loading-process"><a class="direct-link" href="https://blog.yoav.ws/posts/fetching_responsive_images_format/#image-loading-process">#</a> Image loading process</h2>
<p>The browser will fetch the image's first layer file, as part of the page's
loading process, using the lookahead pre-parser. That first layer will
provide the browser with all the information it needs to further download more
layers (which might be in one or more further files) as it sees fit.
Fetching more layers will be based on the file structure. Files that only
contain needed layers will be fetched in their entirety. For files that
also contain unneeded layers, "Range" requests will be used.</p>
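<p>To make this concrete, here's an illustrative sketch of the kind of metadata the first layer could carry and how a client might turn it into requests. All the field names, file names and numbers are made up.</p>
<pre>
# Illustrative only: metadata describing which files hold which layers, and the
# byte offset of each layer inside each file.
first_layer_meta = {
    "files": [
        {"url": "photo_layer0.rimg", "layers": [0]},                  # fetched via the HTML
        {"url": "photo_layers1-3.rimg", "layers": [1, 2, 3],
         "layer_offsets": [0, 40960, 122880]},
    ],
}

def requests_for(meta, highest_needed_layer):
    reqs = []
    for f in meta["files"][1:]:                  # the first file was already fetched
        wanted = [l for l in f["layers"] if l <= highest_needed_layer]
        if not wanted:
            continue
        if wanted == f["layers"]:
            reqs.append((f["url"], None))        # whole file, no Range needed
        else:
            end = f["layer_offsets"][len(wanted)] - 1
            reqs.append((f["url"], "bytes=0-%d" % end))
    return reqs

print(requests_for(first_layer_meta, 2))   # [('photo_layers1-3.rimg', 'bytes=0-122879')]
</pre>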
<h2 id="advantages"><a class="direct-link" href="https://blog.yoav.ws/posts/fetching_responsive_images_format/#advantages">#</a> Advantages</h2>
<p>That file structure will give the author enough flexibility to arrange the
image's layers in an optimal way. In case the author knows that his
server and front-end cache support the HTTP "Range" header, he can use a
single file to serve all the layers beyond the first layer.
If this is not the case, the author can serve each layer in a file of
its own.</p>
<p>From the browser's perspective, this structure enables it to fetch
additional layers as soon as it knows the dimensions of the image to be
displayed. Additional layers can be fetched using "Range" (where
supported) or using separate HTTP requests. In case separate HTTP
requests are used, the browser can also parallelize them, since it has all
the URLs for the layers it needs once it has the first layer.
The requests for the different layers can also be pipelined in this
case.</p>
<p>By definition, the browser needs to wait for the layout phase in
order to be absolutely sure it needs to download followup layers. If that would
prove to be a performance bottleneck, the browser can
heuristically download followup layers before it is certain they are
needed (based on viewport size, image dimensions, etc).</p>
<p>Another advantage is that for "non-responsive"
images, the browser simply downloads the image itself. There's no need
to declare in the markup if an image is responsive or not.</p>
<h2 id="disadvantages"><a class="direct-link" href="https://blog.yoav.ws/posts/fetching_responsive_images_format/#disadvantages">#</a> Disadvantages</h2>
<p>When compared to simple image fetching, image fetching with the technique described
above may suffer up to a single RTT delay, when "Range" is supported. If
"Range" is not supported, the delay per image may go up, even though it
is not likely that it will reach the maximal "RTT per layer" performance cost.
This disadvantage is probably negligible compared to the time
savings that will result from fewer bytes passing over the wire.</p>
<p>On the other hand, for retina display devices that download all the image's layers, this
delay may be noticeable.</p>
<p>Thoughts?</p>
<h1><a href="https://blog.yoav.ws/posts/how_big_is_art_direction/">How Big Is Art-Direction?</a></h1>
<p>2013-05-13</p>
<p>For a while now, the <a href="http://usecases.responsiveimages.org/#art-direction">art-direction
use-case</a> has been
treated by browser vendors as
<a href="http://usecases.responsiveimages.org/#resolution-switching">resolution-switching</a>'s
imaginary friend.</p>
<p>When talking to people who work for browser vendors about that use-case,
I've heard phrases like "that's a really obscure use-case" and "No one
is really doing art-direction".</p>
<p>This got me wondering — how big is that use-case? How many
Web developers & designers are willing to go the extra mile, optimize
their images (from a UI perspective), and adapt them so that they'd be a
perfect fit to the layout they're in?</p>
<h2 id="methodology"><a class="direct-link" href="https://blog.yoav.ws/posts/how_big_is_art_direction/#methodology">#</a> Methodology</h2>
<p>With the lack of solid data on the subject, I had to go get some :)</p>
<p>Arguably, one of the easiest ways for Web developers to implement
art-direction
today is to use <a href="https://github.com/scottjehl/picturefill">picturefill</a>
— the framework that polyfills the
<a href="http://picture.responsiveimages.org/">picture</a> element's syntax. So all
I had to do is find sites using
picturefill and see which ones use the framework for art-direction
rather than simple resolution-switching.</p>
<p>I've used the <a href="https://github.com/Webdevdata/webdevdata.org">WebDevData</a>
scripts to get a hold of Alexa's top 50K
websites' HTML. Then I grepped through those HTML files to find pages
that
contain "data-picture" (the data attribute used by picturefill),
downloaded the images and (manually)
went through the results to find which sites art-direct their images.
Not very scalable, but it works for a small number of sites.</p>
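<p>The grep step itself is simple; something along these lines (a sketch, not the exact commands I ran), assuming the WebDevData HTML dump sits in a local directory:</p>
<pre>
import glob
import re

# Scan downloaded HTML files for picturefill's data-picture attribute and pull
# out candidate image URLs (picturefill's data-src attributes) for review.
matches = {}
for path in glob.glob("webdevdata/*.html"):      # assumed location of the HTML dump
    with open(path, encoding="utf-8", errors="ignore") as f:
        html = f.read()
    if "data-picture" in html:
        matches[path] = re.findall(r'data-src=["\']([^"\']+)', html)

print(len(matches), "pages use picturefill")
</pre>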
<h2 id="results"><a class="direct-link" href="https://blog.yoav.ws/posts/how_big_is_art_direction/#results">#</a> Results</h2>
<p>The <a href="https://gist.github.com/yoavweiss/5564796">results</a> showed that 24%
(7 out of 29) of the sites that use picturefill, use it
to implement art-direction. While a larger sample would be better, this
is a strong indication that the art-direction use-case is an
important use-case for responsive images.</p>
<h3 id="update"><a class="direct-link" href="https://blog.yoav.ws/posts/how_big_is_art_direction/#update">#</a> Update</h3>
<p>Embedding the Gist with the results:</p>
<script src="https://gist.github.com/yoavweiss/5564796.js"></script>
<h1><a href="https://blog.yoav.ws/posts/who_is_sizer_soze/">Who Is Sizer Soze?</a></h1>
<p>2013-06-17</p>
<h2 id="for-the-impatient"><a class="direct-link" href="https://blog.yoav.ws/posts/who_is_sizer_soze/#for-the-impatient">#</a> For the impatient</h2>
<p><a href="https://github.com/yoavweiss/Sizer-Soze">Sizer-Soze</a> is a utility that
enables you to evaluate how much you could
save by properly resizing your images to match their display size on
various viewports.</p>
<p>Basically it shows you how much image data you could save if you
deployed an ideal responsive images solution. If you already have a
responsive images solution in place, it enables you to see how far it is
from that ideal, and improve it accordingly.</p>
<h2 id="how-it-started"><a class="direct-link" href="https://blog.yoav.ws/posts/who_is_sizer_soze/#how-it-started">#</a> How it started</h2>
<p>One Saturday morning a few weeks back, I read a
<a href="http://blog.cloudfour.com/sensible-jumps-in-responsive-image-file-sizes/">blog
post</a>
by
<a href="https://twitter.com/grigs">Jason Grigsby</a> that
<del>pissed me off</del> inspired me, like his posts often do :)</p>
<p>He wrote about responsive images and how we should calculate their
breakpoints according to our performance budget. (If you haven't read it
yet, you probably should).</p>
<p>He also wrote that this approach would be
difficult to use with proposed viewport-based responsive image solutions
such as <code><picture></code>and <code>srcset</code>. This is the part where we disagree.</p>
<p>I believe that for most cases, a simple build step can be used to make
the translation
from layout to viewport based breakpoints easy for Web developers. Since
the alternative to viewport-based solutions are layout-based solutions,
which have inherent performance issues, I think this is our best way
forward with responsive images.</p>
<p>The discussion of a "Responsive images performance budget" got me
thinking that we lack tools that can calculate the size
difference between the images we serve our users and their actual
display size in responsive designs, tools that can be used as part of a
build system.</p>
<p>Currently, we have no clue how much image data we are sending for
nothing! You cannot handle a budget if you don't know how much you're
spending.</p>
<p>A couple of hours later, I had an initial version of Sizer-Soze up and
running.</p>
<h2 id="how-it-works"><a class="direct-link" href="https://blog.yoav.ws/posts/who_is_sizer_soze/#how-it-works">#</a> How it works</h2>
<p>Sizer-Soze is comprised of several scripts. It started with mostly bash
scripts, but I completely re-wrote it in Python a few days back,
since I found that I had reached the limits of the previous architecture,
and needed more speed, parallelism and stability.</p>
<p>I'll try to describe the roles of the various scripts, without going
into specifics that may change in the future.</p>
<h3 id="getimagedimensions"><a class="direct-link" href="https://blog.yoav.ws/posts/who_is_sizer_soze/#getimagedimensions">#</a> getImageDimensions</h3>
<p>This is a <a href="http://phantomjs.org/">PhantomJS</a> script, that gets a single
Web page URL as its input, as well as the viewport dimensions it needs
to
examine.</p>
<p>It then outputs the various image resources that it picked up from the
Web page and the display dimensions of each. Since PhantomJS is a real
headless browser, it picks up both the images that are defined directly
in the HTML, and the images that are added dynamically using scripts.</p>
<p>The script detects only content images (i.e. CSS background images are
not detected). It may also miss images that are added later on in the
Web page's lifecycle, such as lazy loaded images the are added following
a user scroll, or another user action. Data URIs are also ignored for
the time being.</p>
<p>When "display: none" images are present in the page, the script can take
quite a while to run (up to
25 seconds), since it waits to see if these images are a part of
a carousel (and are displayed eventually) or simply hidden for this
particular breakpoint.</p>
<h3 id="downlodr"><a class="direct-link" href="https://blog.yoav.ws/posts/who_is_sizer_soze/#downlodr">#</a> downlodr</h3>
<p>Downloads the page's image resources in a parallel fashion.</p>
<h3 id="resizebenefits"><a class="direct-link" href="https://blog.yoav.ws/posts/who_is_sizer_soze/#resizebenefits">#</a> resizeBenefits</h3>
<p>Performs lossless optimization on the image resources (using
<a href="https://github.com/toy/image_optim">image_optim</a>), and resizes
them to their displayed size (using
<a href="http://www.imagemagick.org/">ImageMagick</a> followed by
<a href="https://github.com/toy/image_optim">image_optim</a>).
It then outputs the optimization savings for each image, as well as the
resize savings.</p>
<p>The results are written (by the <code>sizer</code> script we'll
discuss later) to the output directory (set in the <code>settings</code> file)
under a directory with the site's slugified name as
<code>results_<VIEWPORT>.txt</code> (e.g.
<code>/tmp/http-microsoft-com/results_360.txt</code>)</p>
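<p>The core of the resize-savings measurement boils down to something like this simplified sketch (not the actual Sizer-Soze code; it only calls ImageMagick and skips the image_optim pass that the real script also runs):</p>
<pre>
import os
import subprocess

def resize_savings(image_path, display_width, display_height):
    resized = "resized_" + os.path.basename(image_path)
    subprocess.check_call(["convert", image_path,
                           "-resize", "%dx%d" % (display_width, display_height),
                           resized])
    original, smaller = os.path.getsize(image_path), os.path.getsize(resized)
    return 100.0 * (original - smaller) / original

print("%.1f%% could be saved" % resize_savings("hero.jpg", 480, 320))   # sample values
</pre>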
<h3 id="sizer"><a class="direct-link" href="https://blog.yoav.ws/posts/who_is_sizer_soze/#sizer">#</a> sizer</h3>
<p>Binds the above mentioned scripts together, by iterating
over them for various predefined viewports, and outputting the summary
of
their results (i.e. how much can be saved for each viewport by
optimization and by resizing) to standard output, so to the screen by
default.</p>
<h3 id="bulksizer"><a class="direct-link" href="https://blog.yoav.ws/posts/who_is_sizer_soze/#bulksizer">#</a> bulkSizer</h3>
<p>This script simply takes in a text file full of URLs and runs sizer on
these URLs in a multi-process manner, which makes sure that the long time
each one of these runs takes doesn't accumulate, so that running
the script on a bulk of Web sites doesn't take forever.</p>
<h2 id="how-can-it-be-used%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/who_is_sizer_soze/#how-can-it-be-used%3F">#</a> How can it be used?</h2>
<p>Well, <a href="https://twitter.com/tkadlec">Tim Kadlec</a> wrote a post recently
about his findings using Sizer-Soze to evaluate how much image data is
wasted in responsive designs from the
<a href="http://mediaqueri.es/">mediaqueri.es</a> collection. He came up with
staggering 72% image data savings for
small viewports and 41% savings for relatively large viewports.</p>
<p>I ran a similar test on a <a href="https://gist.github.com/yoavweiss/5793495">list of responsive Web
sites</a> from the
<a href="https://twitter.com/RWD">RWD</a> twitter stream, and came up with similar
results for small viewports, and even worse results for 1260px wide
viewports. (53% image data savings can be achieved by resizing).</p>
<p>If the authors of these sites had run these same tests before
shipping them, I'm
guessing the situation would've been significantly different.</p>
<p>While using Sizer-Soze to test other people's sites can be fun, you
should probably use Sizer-Soze to check the responsive sites you're
building, and how well the images you're serving your users fit their
displayed size.</p>
<p>For that purpose, you can integrate Sizer-Soze into your build scripts
and use its output to trigger alerts (or fail the build) if your
performance budget is exceeded.</p>
<p>One more thing — the RICG have started a repo called
<a href="https://github.com/ResponsiveImagesCG/Sizer-Soze-frontend">Sizer-Soze-frontend</a>.
For now the project is in its very early stages, but hopefully it will
soon become something developers can rely on, and use for occasional
testing, to see if their sites are kept in check without installing
anything on their local machine.</p>
<h2 id="the-future"><a class="direct-link" href="https://blog.yoav.ws/posts/who_is_sizer_soze/#the-future">#</a> The future</h2>
<p>I'm planning to keep evolving this utility to make it faster, more
maintainable and easier to use. Better build process integration is a
high priority for me.
I also intend to improve it, and make sure that it covers all content
images.</p>
<p>You guys can help by using it, finding bugs & possible improvements,
filing issues, and sending pull-requests :)</p>
<p>Both the back-end and the front-end projects can use some more helping
hands. If you want to contribute, feel free to hop on the <code>#sizer-soze</code>
channel on Freenode's IRC server.</p>
<h2 id="why-sizer-soze%3F%3F%3F%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/who_is_sizer_soze/#why-sizer-soze%3F%3F%3F%3F">#</a> Why Sizer-Soze????</h2>
<p>I named the initial script <code>sizer</code> as a temporary name. It was all
downhill from there.</p>
<h1><a href="https://blog.yoav.ws/posts/responsive_image_container/">Responsive Image Container</a></h1>
<p>2013-09-09</p>
<p>It's been a year since <a href="http://blog.yoav.ws/2012/08/Fetching-responsive-image-format">I last wrote about
it</a>, but
the dream of the "magical image format" that will solve world hunger
and/or the responsive images problem (whichever one comes first) lives
on.</p>
<p>A few weeks back I started wondering if such an image format can be used
to solve both the
<a href="http://usecases.responsiveimages.org/#art-direction">art-direction</a> and
<a href="http://usecases.responsiveimages.org/#resolution-switching">resolution-switching</a>
use-cases.</p>
<p>I had a few ideas on how this can be done, so I created a prototype to
prove that it's feasible. This prototype is <a href="https://github.com/yoavweiss/Responsive-Image-Container">now
available</a>,
ready to be tinkered with.</p>
<p>In this post I'll try to explain what this prototype does, what it
cannot do, how it works, and its advantages and disadvantages over
markup solutions. I'll also try to de-unicorn the responsive image
format concept, and make it more tangible and less magical.</p>
<h2 id="you've-got-something-against-markup-solutions%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/responsive_image_container/#you've-got-something-against-markup-solutions%3F">#</a> You've got something against markup solutions?</h2>
<p>No, I don't! Honest! Some of my best friends are markup solutions.</p>
<p>I've been part of the RICG for a while now, prototyping, promoting and
presenting markup solutions.
Current markup solutions (picture <em>and</em> srcset) are great and can cover
all the important use cases for responsive images, and if it was up to
me, I'd vote for shipping both picture and srcset (in its resolution
switching version) in all browsers tomorrow.</p>
<p><em>But</em> the overall markup based solution has some flaws.</p>
<p>Here's some of the criticism I've been hearing for the last year or so
when talking responsive images markup solutions.</p>
<h3 id="too-verbose"><a class="direct-link" href="https://blog.yoav.ws/posts/responsive_image_container/#too-verbose">#</a> Too verbose</h3>
<p>Markup solutions are by definition verbose, since they must enumerate all
the various resources. When art-direction is involved, they must also
state the breakpoints, which adds to that verbosity.</p>
<h3 id="mixing-presentation-and-content"><a class="direct-link" href="https://blog.yoav.ws/posts/responsive_image_container/#mixing-presentation-and-content">#</a> Mixing presentation and content</h3>
<p>Art-direction markup solutions need to keep layout breakpoints in the
markup. That mixes presentation and content, and means that layout
changes will force markup changes.</p>
<p>There have been <a href="http://lists.w3.org/Archives/Public/www-style/2013May/0638.html">constructive
discussions</a>
on how this can be resolved, by bringing back the MQ definitions into
CSS, but it's not certain when any of this will be defined and
implemented.</p>
<h3 id="define-viewport-based-breakpoints"><a class="direct-link" href="https://blog.yoav.ws/posts/responsive_image_container/#define-viewport-based-breakpoints">#</a> Define viewport based breakpoints</h3>
<p>This one is heard often from developers. For performance reasons, markup
based solutions are based on the viewport size, rather than on the
image's dimensions.
Since the images' layout dimensions are not yet known to the browser by
the time it starts fetching images, it cannot rely on them to decide
which resource to fetch.</p>
<p>For developers, that means that some sort of "viewport=>dimensions"
table needs to be created on the server-side/build-step or inside the
developer's head in order to properly create images that are ideally
sized for certain viewport dimensions and layout.</p>
<p>While a build step can resolve that issue in many cases, it can get
complicated in cases where a single component is used across multiple
pages, with varying dimensions in each.</p>
<h3 id="result-in-excessive-download-in-some-cases"><a class="direct-link" href="https://blog.yoav.ws/posts/responsive_image_container/#result-in-excessive-download-in-some-cases">#</a> Result in excessive download in some cases</h3>
<p>OK, this one is something I hear mostly in my head (and from other Web
performance freaks on occasion).</p>
<p>From a performance perspective, any solution that's based on separate
resources for different screen sizes/dimensions requires re-downloading
of the entire image if the screen size or dimensions change to a higher
resolution than before. Since it's highly possible that most of that
image data is already in the browser's memory or cache, re-downloading
everything from scratch makes me sad.</p>
<p>All of the above made me wonder (again) how wonderful life would be if
we had a file format based solution, that can address these concerns.</p>
<h2 id="why-would-a-file-format-do-better%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/responsive_image_container/#why-would-a-file-format-do-better%3F">#</a> Why would a file format do better?</h2>
<ul>
<li>The burden is put on the image encoder. The markup stays identical to
what it is today. A single tag with a single resource.</li>
<li>Automated conversion of sites to such a responsive images solution may
be easier, since the automation layer would just focus on the images
themselves rather than the page's markup and layout.</li>
<li>Image layout changes (following viewport dimension changes) can be
handled by downloading only the difference between the current image and
the higher resolution one, without re-downloading the data that the
browser already has in its memory.</li>
<li>Web developers will not need to maintain multiple versions of each
image resource, even though they would have to keep a non-responsive
version of the image, for content negotiation purposes.</li>
</ul>
<p>This is my attempt at a simpler, file format based solution that will
let Web developers do much less grunt work, avoid downloading useless
image data (even when conditions change), while keeping preloaders
working.</p>
<h2 id="why-not-progressive-jpeg%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/responsive_image_container/#why-not-progressive-jpeg%3F">#</a> Why not progressive JPEG?</h2>
<p>Progressive JPEG can <a href="http://blog.yoav.ws/2012/05/Responsive-image-format">fill this
role</a> for the
resolution switching case, but it's extremely rigid.</p>
<p>There are strict limits on the lowest image quality, and from what I've
seen, it is often too data-heavy. The minimal difference between
resolutions is also limited, and doesn't give enough control to encoders
that want to do better.</p>
<p>Furthermore, progressive JPEG cannot do art-direction at all.</p>
<h2 id="how-would-it-look-like%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/responsive_image_container/#how-would-it-look-like%3F">#</a> How would it look like?</h2>
<p>A responsive image container, containing internal layers that can be
either WebP, JPEG-XR, or any future format. It uses resizing and crop
operations to cover both the resolution switching and the art direction
use cases.</p>
<p>The decoder (e.g. the browser) will then be able to download just the
number of layers it needs (and their bytes) in order to show a certain
image. Each layer provides an enhancement over the layer before it,
giving the decoder the data it needs to show the image properly at a
higher resolution.</p>
<h2 id="how-does-it-work%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/responsive_image_container/#how-does-it-work%3F">#</a> How does it work?</h2>
<ul>
<li>The encoder takes the original image, along with a description of the
required output resolutions and optionally art-direction directives.</li>
<li>It then outputs a layer per resolution that the final image should be
perfectly rendered in.</li>
<li>Each layer represents the difference in image data between the
previous layer, when "stretched" on the current layer's canvas, and
the current layer's "original" image. That way, the decoder can
construct the layers one by one, each time using the previous layer to
recreate the current one, creating a higher resolution image as it goes.</li>
</ul>
<p>Support for resolution switching is obvious in this case, but
art-direction can also be supported by positioning the previous layer on
the current one and giving it specific dimensions.</p>
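<p>To make that reconstruction step a bit more concrete, here's a rough
sketch of what a decoder could do for each layer, using canvas. It's purely
illustrative: it assumes the diff pixels are stored as a per-channel offset
around 128 and it ignores the art-direction positioning, so the actual
prototype's encoding details may well differ:</p>
<pre><code>// Illustrative sketch only - not the prototype's actual decoding code.
// Assumes each diff pixel is stored as (original - upscaled_previous + 128).
function reconstructLayer(previousCanvas, diffImage, width, height) {
  var canvas = document.createElement('canvas');
  canvas.width = width;
  canvas.height = height;
  var ctx = canvas.getContext('2d');

  // "Stretch" the previous, lower resolution layer onto the current canvas
  ctx.drawImage(previousCanvas, 0, 0, width, height);
  var base = ctx.getImageData(0, 0, width, height);

  // Draw the current layer's diff image so we can read its pixels
  ctx.drawImage(diffImage, 0, 0, width, height);
  var diff = ctx.getImageData(0, 0, width, height);

  // Add the diff on top of the upscaled previous layer, channel by channel
  for (var i = 0; i < base.data.length; i += 4) {
    for (var c = 0; c < 3; c++) {
      var value = base.data[i + c] + diff.data[i + c] - 128;
      base.data[i + c] = Math.max(0, Math.min(255, value));
    }
    base.data[i + 3] = 255; // opaque
  }
  ctx.putImageData(base, 0, 0);
  return canvas;
}
</code></pre>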
<p>Let's look at some examples:</p>
<h3 id="art-direction"><a class="direct-link" href="https://blog.yoav.ws/posts/responsive_image_container/#art-direction">#</a> Art-direction</h3>
<p>Here's a photo that's often used in discussions of the art-direction
use-case (I've been too lazy to search for a new one):</p>
<p><img src="https://raw.github.com/yoavweiss/Responsive-Image-Container/blog_post/samples/crop.jpg" alt="Obama in a jeep factory - original withcontext" /></p>
<p>Let's take a look at what the smallest layer would look like:</p>
<p><img src="https://raw.github.com/yoavweiss/Responsive-Image-Container/blog_post/samples/test_results/crop.jpg_layer1.webp.png" alt="Obama in a jeep factory - cropped to show onlyObama" /></p>
<p>That's just a cropped version of the original - nothing special.</p>
<p>Now one layer above that:</p>
<p><img src="https://raw.github.com/yoavweiss/Responsive-Image-Container/blog_post/samples/test_results/crop.jpg_layer2.webp.png" alt="Obama in a jeep factory - some context + diff from previouslayer" /></p>
<p>You can see that pixels that don't appear in the previous layer are
shown normally, while pixels that do appear contain only the difference
between them and their equivalents in the previous layer.</p>
<p>And the third, final layer:</p>
<p><img src="https://raw.github.com/yoavweiss/Responsive-Image-Container/blog_post/samples/test_results/crop.jpg_layer3.webp.png" alt="Obama in a jeep factory - full context + diff from previouslayer" /></p>
<h3 id="resolution-switching"><a class="direct-link" href="https://blog.yoav.ws/posts/responsive_image_container/#resolution-switching">#</a> Resolution switching</h3>
<p>A high resolution photo of a fruit:</p>
<p><img src="https://raw.github.com/yoavweiss/Responsive-Image-Container/blog_post/samples/res_switch_shrink.png" alt="iPhone - originalresolution" /></p>
<p>The first layer - showing a significantly downsized version</p>
<p><img src="https://raw.github.com/yoavweiss/Responsive-Image-Container/blog_post/samples/test_results/res_switch.png_layer1.webp.png" alt="iPhone - significantlydownsized" /></p>
<p>The second layer - A diff between a medium sized version and the
"stretched" previous layer</p>
<p><img src="https://raw.github.com/yoavweiss/Responsive-Image-Container/blog_post/samples/test_results/res_switch.png_layer2.webp.png" alt="iPhone - medium sizeddiff" /></p>
<p>And the third layer - containing a diff between the original and the
"stretched" previous layer</p>
<p><img src="https://raw.github.com/yoavweiss/Responsive-Image-Container/blog_post/samples/test_results/res_switch.png_layer3.webp.png" alt="iPhone - full sizeddiff" /></p>
<p>If you're interested in more details you can go to the
<a href="https://github.com/yoavweiss/Responsive-Image-Container">repo</a>. More
details on the <a href="https://github.com/yoavweiss/Responsive-Image-Container/blob/master/container.md">container's
structure</a>
are also there.</p>
<h3 id="but-i-need-more-from-art-direction"><a class="direct-link" href="https://blog.yoav.ws/posts/responsive_image_container/#but-i-need-more-from-art-direction">#</a> But I need more from art-direction</h3>
<p>I've seen art-direction cases where rotation and image repositioning are
required. It was usually in order to add a logo/slogan at
different locations around the image itself, depending on the viewport
dimensions.</p>
<p>This use-case is probably better served by CSS. CSS transforms can
handle rotation and CSS positioning, along with media specific
background images, can probably handle the rest.</p>
<p>If your art-direction case is special, and can't be handled by either
one of those, I'd love to hear about it.</p>
<h2 id="how-will-it-be-fetched%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/responsive_image_container/#how-will-it-be-fetched%3F">#</a> How will it be fetched?</h2>
<p>That's where things get tricky. A special fetching mechanism must be
created in order to fetch this type of images. I can't say that I have
that part all figured out, but here's my rough idea on how it may work.</p>
<p>My proposed mechanism relies on HTTP ranges, similar to the fetching
mechanisms of the <code><video></code> element, when seeks are involved.</p>
<p>More specifically:</p>
<ul>
<li>Resources that should be fetched progressively should be flagged as
such. One possibility is to add a <code>progressive</code> attribute on the
element describing the resource.</li>
<li>Once the browser detects an image resource with a <code>progressive</code>
attribute on it, it picks the initial requested range for that
resource. The initial range request can be any one of:
<ul>
<li>A relatively small fixed range for all images (e.g. 8KB)</li>
<li>Specified by the author (e.g. as a value of the <code>progressive</code>
attribute)</li>
<li>Some heuristic</li>
<li>Based on a manifest (we'll get to that later)</li>
</ul>
</li>
<li>The browser can fetch this initial range at the same time it would request
the entire resource today, or even sooner, since the chances of
starving critical path resources (e.g. CSS & JS) are slimmer once the
payloads are of known size.</li>
<li>Once the browser has downloaded the image's initial range, it has the
file's offset table box, which links byte offset to resolution. That
means that once the browser has calculated the page's layout, it'd know
exactly which byte range it needs in order to display the image
correctly.</li>
<li>Assuming the browser sees fit, it can heuristically fetch follow-up
layers (i.e. higher resolutions), even before it knows for certain that
they are needed.</li>
<li>Once the browser has the page's layout, it can complete fetching of
all the required image layers.</li>
</ul>
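<p>To illustrate the idea (and only to illustrate it; in practice this
logic would live inside the browser, not in page script), a range-based
fetch could look roughly like the sketch below. <code>parseOffsetTable</code> and
<code>decodeLayers</code> are hypothetical helpers standing in for the container's
offset table parsing and layer decoding:</p>
<pre><code>// Purely illustrative sketch of the range-based fetching idea
function fetchProgressiveImage(url, neededWidth) {
  // 1. Fetch a small, fixed initial range that should contain the offset table
  return fetch(url, { headers: { 'Range': 'bytes=0-8191' } })
    .then(function(response) { return response.arrayBuffer(); })
    .then(function(initialBytes) {
      // 2. The offset table maps resolutions to byte offsets
      var offsetTable = parseOffsetTable(initialBytes);
      var endOffset = offsetTable.offsetForWidth(neededWidth);
      // 3. Once layout is known, fetch the rest of the bytes up to the needed layer
      return fetch(url, { headers: { 'Range': 'bytes=8192-' + (endOffset - 1) } })
        .then(function(response) { return response.arrayBuffer(); })
        .then(function(restBytes) {
          return decodeLayers(initialBytes, restBytes);
        });
    });
}
</code></pre>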
<p>The above mechanism will increase the number of HTTP requests, which in
an HTTP/1.1 world will probably introduce some delay in many cases.</p>
<p>That mechanism can be optimized by defining a manifest that would
describe the image resources' byte ranges to the browser.
The idea for adding a manifest was proposed by <a href="https://twitter.com/cconcolato">Cyril
Concolato</a> at last year's TPAC, and it
makes a lot of sense, borrowing from our collective experience with
video streaming. It can enable browsers to avoid fetching an arbitrary
initial range (at least once the manifest itself has been downloaded).</p>
<p>Adding a manifest will prevent these extra requests for everything
requested after layout, and may help to prevent them (using heuristics)
even before layout.</p>
<p>Creating a manifest can be easily delegated to either build tools or the
server side layer, so devs don't have to manually deal with these image
specific details.</p>
<h3 id="can't-we-simply-reset-the-connection%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/responsive_image_container/#can't-we-simply-reset-the-connection%3F">#</a> Can't we simply reset the connection?</h3>
<p>In theory we could address this by fetching the entire image and resetting
the connection once the browser has all the necessary data, but that
would most likely introduce serious performance issues.</p>
<p>The problems with resetting a TCP connection during a browsing session
are:</p>
<ul>
<li>It terminates an already connected, warmed-up TCP connection, whose
setup had a significant performance cost, and which could have been
re-used for future resources.</li>
<li>It sends at least an RTT's worth of data down the pipe during the time it
takes for the browser's reset to reach the server. That data is never
read by the browser, which means wasted bandwidth and slower load
times.</li>
</ul>
<h2 id="downsides-of-this-approach%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/responsive_image_container/#downsides-of-this-approach%3F">#</a> Downsides of this approach?</h2>
<ul>
<li>It involves touching and modifying many pieces of the browser stack,
which means that standardization and implementation may be painful and
take a while.</li>
<li>The <a href="http://usecases.responsiveimages.org/#matching-media-features-and-media-types">monochrome/print use
case</a>
cannot be addressed by this type of a solution.</li>
<li>The decoding algorithm involves a per-layer upscaling, which may be
processing heavy. Therefore, decoding performance may be an issue.
Moving this to the GPU may help, but I don't know that area well enough
to be the judge of that. If you have an opinion on the subject, I'd
appreciate your comments.</li>
<li>Introducing a new file format is a long process. As we have seen with
the introduction of past image formats, the lack of a
client-side mechanism makes this a painful process for Web developers.
Since new file formats start out being supported in some browsers but
not others, a server-side mechanism must be used (hopefully based on the
Accept header, rather than on UA). I'm hoping that the fact that this
new file format is very simple and relies on other file formats to do
the heavy lifting may help here, but I'm not sure it would.</li>
<li>As discussed above, it's likely to increase the number of requests,
and may introduce some delay in HTTP/1.1.</li>
<li>This solution cannot answer the need for "pixel perfect" images, which
is mainly about improving decoding speed. Even if it could, it's not
certain that decoding speed would benefit from it.</li>
<li>Relying on HTTP ranges for the fetching mechanism can result in
problems with intermediate cache servers that don't support them.</li>
</ul>
<h2 id="so%2C-should-we-dump-markup-solutions%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/responsive_image_container/#so%2C-should-we-dump-markup-solutions%3F">#</a> So, should we dump markup solutions?</h2>
<p>Not at all. This is a prototype, showing how most of the responsive
images use-cases would have been solved by such a container.</p>
<p>Reaching consensus on this solution, defining it in detail and
implementing it in an interoperable way may be a long process. The
performance implications on HTTP/1.1 sites and decoding speed still
need to be explored.</p>
<p>I believe this may be a way to simplify responsive images in the future,
but I don't think we should wait for the ideal solution.</p>
<h2 id="to-sum-it-up"><a class="direct-link" href="https://blog.yoav.ws/posts/responsive_image_container/#to-sum-it-up">#</a> To sum it up</h2>
<p>If you just skipped here, that's OK. It's a long post.</p>
<p>Just to sum it up, I've demonstrated (along with a prototype) how a
responsive image format can work, and can resolve most of the responsive
images use cases. I also went into some detail about which other bits
would have to be added to the platform in order to make it a viable
solution.</p>
<p>I consider this to be a long term solution since some key issues need to
be addressed before this solution can be practical.<br />
IMO, the main issue is decoding performance, with download performance
impact on HTTP/1.1 being a close second.</p>
<p>I think it's worthwhile to continue to explore this option, but not to
wait for it. Responsive images need an in-the-browser, real-life
solution <del>two years ago</del> today, not two years from now.</p>
Long Overdue2015-05-08T00:00:00Zhttps://blog.yoav.ws/posts/long_overdue/<p>I owe you fine folks a blog post.</p>
<p>I recently tidied up my blog and moved it to a new backend and in the
process realized that I haven't blogged in over 18 months. (!!!)</p>
<h2 id="where-did-i-disappear%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/long_overdue/#where-did-i-disappear%3F">#</a> Where did I disappear?</h2>
<p>A <em>lot</em> has happened during that time:</p>
<p>The markup based responsive images solutions, which seemed to be facing
a dead-end in September 2013, were revived and significantly improved.
In order to defuse <a href="https://lists.w3.org/Archives/Public/public-whatwg-archive/2014Jan/0002.html">initial
resistance</a> from the Blink project, I started
implementing required infrastructure in order to implement the features
there.</p>
<p>After a while, I realized that this would take me a long time unless
I did it full time, so I went to the community (with the help of the
wonderful RICG folks - particular shout-out to <a href="https://twitter.com/wilto">Mat</a> and <a href="https://twitter.com/hellogeri">Geri</a>) and <a href="https://www.indiegogo.com/projects/picture-element-implementation-in-blink">asked for your
help</a>.</p>
<p>The <a href="https://www.indiegogo.com/projects/picture-element-implementation-in-blink#pledges">contributions from the
community</a> did not disappoint, and I started
what I thought would be a 1-2 month project, but it actually turned out to
be more than that. The features landed in Blink around 3 months after
the campaign started, and shipped in the fall.</p>
<p>The work in WebKit was next, and srcset and sizes made it in, but as far
as picture goes, there was some missing infrastructure that made it
difficult to implement. I'm still working on it.</p>
<p>On top of that, I wrote
<a href="https://dev.opera.com/articles/native-responsive-images/">articles</a>, gave <a href="https://vimeo.com/117250453">talks</a> and even
wrote a chapter for the upcoming (and totally awesome!) <a href="http://www.smashingmagazine.com/2015/03/31/real-life-responsive-web-design-smashing-book-5/">Smashing Book</a>.
But I just didn't blog, partly because my writing energy was spent on the above
commitments, and partly because of dumb technical reasons - my blog was
in poor shape, and I wanted to overhaul it, but didn't get to it. So I
procrastinated. Writing code is easier :)</p>
<h2 id="employment"><a class="direct-link" href="https://blog.yoav.ws/posts/long_overdue/#employment">#</a> Employment</h2>
<p>As some of you may know, I started working at Akamai at the beginning of
the year. I wish I did a blog post announcement on that at the time, but
for all the <del>poor excuses</del> reasons I stated above, I didn't. So
this is my attempt at making up for it.</p>
<p>What do I do at Akamai, you ask?</p>
<p>I'm a Principal Architect at the Front End Optimization team, with two
distinct responsibilities. On the one hand, I make sure that our
front-end optimizations are as awesome as they can be, and squeeze
everything they can from customer sites' performance.
On the other hand, I'm still doing the same thing I did as part of the
RICG - Making sure that performance sensitive standards are promoted and
implemented in browsers.</p>
<p>I'm now over four months in, and I couldn't be happier. I'm working with an
awesome bunch of smart people, pushing some great features both inside
the organization and as part of the Web platform. And yes, I'm still
remote :)</p>
<h2 id="what's-next"><a class="direct-link" href="https://blog.yoav.ws/posts/long_overdue/#what's-next">#</a> What's next</h2>
<p>As far as my Web platform work goes, I'm currently working on three different issues:</p>
<ul>
<li>Responsive images - push <code><picture></code> in WebKit and improve the
implementation in Blink</li>
<li><a href="http://igrigorik.github.io/http-client-hints/">Client Hints</a> - Enable a server-side based responsive images solution</li>
<li><a href="https://w3c.github.io/resource-hints/">Resource Hints</a> - Enable Web sites to clue the browser in on things
they <em>know</em> are going to happen in the near future</li>
</ul>
<p>Work is under way for all the above subjects, and I'm hoping that some
of it will be able to ship soon. Stay tuned.</p>
<h2 id="so%2C-i'm-back!"><a class="direct-link" href="https://blog.yoav.ws/posts/long_overdue/#so%2C-i'm-back!">#</a> So, I'm back!</h2>
<p>Sorry again for disappearing. I'll do my best to make sure that future posts
won't be 18 months from now :D</p>
<p>In fact, I have a few blog posts I've been itching to write for a
while. So</p>
<picture>
<source type="image/webp" sizes="(min-width: 850px) 760px, 90vw" srcset="https://blog.yoav.ws/img/long_overdue/brace_yourselves_blog_posts_are_coming_400.webp 400w, https://blog.yoav.ws/img/long_overdue/brace_yourselves_blog_posts_are_coming_800.webp 800w" />
<img src="https://blog.yoav.ws/img/long_overdue/brace_yourselves_blog_posts_are_coming_800.jpg" sizes="(min-width: 850px) 760px, 90vw" srcset="https://blog.yoav.ws/img/long_overdue/brace_yourselves_blog_posts_are_coming_400_min.jpg 400w, https://blog.yoav.ws/img/long_overdue/brace_yourselves_blog_posts_are_coming_800_min.jpg 800w" alt="Brace yourselves. Blog posts are coming" />
</picture>
Deprecating HTTP2015-05-11T00:00:00Zhttps://blog.yoav.ws/posts/deprecating_http/<p>Mozilla have recently
<a href="https://blog.mozilla.org/security/2015/04/30/deprecating-non-secure-http/">announced</a> that they are planning to deprecate
insecure-HTTP, which includes denying new features from sites that are
served over HTTP
connections. I believe that is a mistake.</p>
<p>I <a href="https://twitter.com/yoavweiss/status/593997011491418112">tweeted</a> about it, but a longer form is in order, so here goes.</p>
<h2 id="why-https-everywhere-is-important"><a class="direct-link" href="https://blog.yoav.ws/posts/deprecating_http/#why-https-everywhere-is-important">#</a> Why HTTPS everywhere is important</h2>
<p>Let me start by saying that I strongly believe that the Web should move
to HTTPS, and serving content over plain-text HTTP is a mistake. (And
yes, this blog is still over HTTP. Apologies. A bug to be fixed soon.
<a href="http://www.zeldman.com/2011/11/18/it-is-not-ironic/">Not ironic</a>
though.)</p>
<p>Now, why do I think HTTPS is a must?</p>
<p>Well, even if you don't think your content is worth securing for the sake of your users
(AKA "so someone will know they browsed my nyan cat collection. Big deal"),
not serving it over HTTPS opens your users to various risks.
An attacker (which may be their ISP or local coffee shop
wifi) can inject their own ads,
poison your user's cache or just serve them with the
wrong information.</p>
<p>On top of that, if your site includes any form of login, the user's credentials can be
stolen by anyone on their network.
<a href="http://en.wikipedia.org/wiki/Firesheep">Anyone</a>.</p>
<p>So, I believe that eventual deprecation of HTTP and forcing HTTPS
everywhere is a Good Thing™.</p>
<p>What don't I like about Mozilla's plan, then?</p>
<h2 id="%22deprecating-insecure-http%22-!%3D-%22https-everywhere%22"><a class="direct-link" href="https://blog.yoav.ws/posts/deprecating_http/#%22deprecating-insecure-http%22-!%3D-%22https-everywhere%22">#</a> "Deprecating insecure HTTP" != "HTTPS everywhere"</h2>
<p>Mozilla are pushing for something called "<a href="http://bitsup.blogspot.it/2015/03/opportunistic-encryption-for-firefox.html">opportunistic encryption</a>" as a
replacement for insecure-HTTP. The problem is that opportunistic
encryption is a misleading name, but I get why it was picked.
"Easily circumvented HTTPS" doesn't quite have the same ring to it.</p>
<p>What is this "opportunistic encryption", you ask? Well, In order for you to be certain that the
encrypted TLS connection you established is with the server you think
you established it with, the TLS connection keys are signed by a
certificate, which is guarantied to be issued only by the entity you
think you're talking to. So a signed certificate means your encrypted
connection goes all the way through to the origin server.</p>
<p>With "opportunistic encryption" OTOH, the certificate can be
self-signed, which means that there are no guaranties regarding who signed it, and
the encrypted connection can be terminated by any router along the way.
In other words, there's no guarantied end-to-end encryption, and
intercepting, looking into and changing packets being sent to the user
is <em>trivial</em>.</p>
<p>Since network operators are not afraid to perform <a href="https://www.eff.org/deeplinks/2014/11/starttls-downgrade-attacks">downgrade
attacks</a>, there's no
reason to believe this neutered form of TLS will stop the bad actors
among them from actively intercepting the user's traffic and changing
it, in order to add their own ads, super-cookies or worse.</p>
<p>The only promise of "opportunistic encryption" is that
<em>passive</em> attacks would be more difficult (but <a href="http://www.frontporch.com/technology/">not
impossible</a>).</p>
<p>Yet, Mozilla are pushing for that as a "cheap" replacement for HTTP, at
the expense of actually secure HTTPS.</p>
<h2 id="free-certs!!!"><a class="direct-link" href="https://blog.yoav.ws/posts/deprecating_http/#free-certs!!!">#</a> Free certs!!!</h2>
<p>One more reason why certificate cost is soon to be a non-issue is an extremely cool new initiative called "<a href="https://letsencrypt.org/">Let's
encrypt</a>", that would provide free and easy to
install certificates to anyone.</p>
<p>That initiative, driven by both Mozilla and Akamai, will make the "opportunistic
encryption" approach even less relevant.</p>
<p>(Disclaimer: I work for Akamai. I'm also not involved in the
Let's Encrypt effort in any way)</p>
<h2 id="applying-pressure-in-the-wrong-place"><a class="direct-link" href="https://blog.yoav.ws/posts/deprecating_http/#applying-pressure-in-the-wrong-place">#</a> Applying pressure in the wrong place</h2>
<p>Now going back to Mozilla's deprecation plans, the part that I dislike the
most is denying new features from HTTP sites, regardless of the
features' security sensitivity.</p>
<p>The Chrome security team have long restricted new security-sensitive
(AKA "powerful") features to HTTPS only.
The most famous case is <a href="https://github.com/slightlyoff/ServiceWorker/issues/199">Service Workers</a>.</p>
<p>It has arguably hurt adoption of those features, but
serving these features over HTTP would have meant significantly compromising the
users' security.
The limitation to HTTPS in these cases had real, security-based reasons.</p>
<p>Mozilla's proposal is significantly different. Mozilla wants to limit
<em>all</em> new features, with the hope that developers would then fall in
line and implement HTTPS on their sites.</p>
<p>In my view this type of thinking shows a lack of understanding of <em>why</em> developers
haven't moved to HTTPS yet.</p>
<p>Switching a large site to HTTPS requires a lot of work and can cost a
lot of money.
It may require time investment from developers
that management needs to approve. That same management often doesn't care about
new browser features, and disabling new features on HTTP is not likely
to make a huge impact on their decisions.</p>
<p>So, in order to justify dedicating time and effort to moving to
HTTPS, you need to convince the business people that this is the right
thing to do from a business perspective (i.e. that it would cost them real-life money long-term if they don't
switch). Unfortunately, keeping the users safe is not always a
convincing argument.</p>
<p>Having free certificates is awesome for the long tail of independent
developers, but for large sites, that's hardly the main issue.</p>
<p>From what I hear, in many cases there's also another blocker.
Many sites include business-essential 3rd party widgets, often ads.
If these widgets cannot be served over HTTPS, that's a big issue
preventing serving the entire site over HTTPS.</p>
<p>Since you cannot mix HTTP content inside your HTTPS site (for good
reason, the chain is as strong as its weakest link), such a site cannot
move to HTTPS without suffering mixed-content blocking (or warnings, in
the best case).</p>
<h2 id="what-would-may-work"><a class="direct-link" href="https://blog.yoav.ws/posts/deprecating_http/#what-would-may-work">#</a> What <delete style="text-decoration:line-through">would</delete> may work</h2>
<p>We've established that we need to convince the business folks, rather
than the developers. So, how can we do that?</p>
<p>The first and obvious way is SEO. Google have
<a href="http://googlewebmastercentral.blogspot.fr/2014/08/https-as-ranking-signal.html">announced</a> last August that with all other things
being equal, HTTPS sites will get higher search ranking than HTTP sites.
Since SEO is language that business folks understand well, I think that
this is a good first step in motivating businesses to move to HTTPS. The
next step would be to increase the importance of that ranking signal, and
penalize HTTP sites' ranking.</p>
<p>Next, Mozilla's Henri Sivonen had an <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=1160368">interesting idea</a>: limit cookie persistency over
HTTP, rather than keeping innocent features hostage.
While I'm not 100% certain this won't have side-effects on unsuspecting
Web developers, and won't render some currently working <a href="http://adaptive-images.com/">legitimate use
cases</a> worthless over HTTP, that method does apply pressure in the right place.</p>
<p>3rd party widgets often rely on cookie persistency in order to <del>track
users across sites</del> provide their users with a personally adapted experience.
Providing that persistency only on HTTPS is a sure-fire way to get their
attention, move their widgets to work on HTTPS, and perhaps even to get
them pushing the content sites to adopt HTTPS (by giving them better ad rates, etc).</p>
<h2 id="in-conclusion"><a class="direct-link" href="https://blog.yoav.ws/posts/deprecating_http/#in-conclusion">#</a> In conclusion</h2>
<p>Moving the Web to HTTPS is important in order to keep our users safe and
maintain their long term trust in the Web.
The way to do that is by convincing the business folks that they have
to.</p>
<p>So, while HTTPS everywhere is a noble goal, denying new features from HTTP will only alienate developers
and hamper new feature adoption without doing much to convince the people that need convincing.</p>
<p id="update" style="font-weight:bold;text-decoration:underline">Update:</p>
<p>In the comments, <a href="https://twitter.com/mcmanusducksong">Patrick McManus</a>
clarified that Mozilla does not plan to consider opportunistic
encryption as a secure context. Therefore, as part of the
deprecation plan, new features will be denied from "opportunistically
encrypted" sites as well as regular HTTP sites.</p>
<p>I guess my misunderstanding stems from the term "insecure HTTP",
which I assumed means that opportunistic encryption would be considered "secure HTTP".
But you know what they say about assuming. So I was wrong about that and
I apologize.</p>
<p>I still think opportunistic encryption is a bad idea that will at best be a distraction on
our way to securing the Web, but apparently it is not related to
Mozilla's deprecation plans.</p>
By the people2015-07-10T00:00:00Zhttps://blog.yoav.ws/posts/by_the_people/<p>The Web platform is a wonderful thing. Its reach is unparalleled in
human history. It enables people all over the
world to access vital information, education and entertainment. People
old and young, rich and poor. It makes their lives better than they
would have been without it. It also enables commerce, banking, and
improved supply chains. The world's economy would have been <em>very</em>
different without the Web platform.</p>
<p>The Web platform has over <a href="http://www.statista.com/statistics/273018/number-of-internet-users-worldwide/">3 billion users</a> and that number keeps climbing.
It also has <a href="http://stackoverflow.com/research/developer-survey-2015#tech">millions of developers</a>. But, would you
care to take a guess about the number of people that are actually
working on the platform itself? Making sure that it will continue to grow,
and that it would do so in the right direction? Fixing bugs,
evolving features and coming up with new ideas that will make the
developers' lives easier, and the users' lives better?</p>
<p>Well, how many are they? If you're not a browser developer or involved
with the Web standards community, you may have thought to yourself that there are
tens of thousands of engineers working on it. After
all, the world's economy depends on it, and browsers are built by huge
corporations.</p>
<p>The truth is, the actual number is significantly lower than that. I'd
estimate the people working on browsers and standards to be only a few
hundred engineers, and if we count only the people working on the
core platform, it's probably even fewer than that.
That means that the people working on the platform have a <em>lot</em> of work
on their hands, and have to laser-focus on the things that matter to
them the most. That is also the answer to "Why does it take that long to
get something into browsers?"</p>
<p>So, I'm afraid we're heading towards...</p>
<h2 id="a-tragedy-of-the-commons"><a class="direct-link" href="https://blog.yoav.ws/posts/by_the_people/#a-tragedy-of-the-commons">#</a> A tragedy of the commons</h2>
<p>Yeah, the Web is like the common grazing field of our global village, that
everyone enjoys, but very few people actually care for.</p>
<p>How can we avoid it? How can we make sure that the people that benefit
from the platform can continue to do so in the long term? How can we
direct a small percentage of these huge financial gains that the Web
provides back towards developing the platform?</p>
<p>Well, what we need is more developers involved in the Web standards and
browser communities. We need them to
help spec out the features that they <em>need</em> and think are missing, and eventually,
we need them to push for these features inside of the open-source browsers,
since the hard truth is no one will do it for them.</p>
<p>That's what I realized after I joined the <a href="http://responsiveimages.org/">RICG</a> back in 2012. After
blabbing on about responsive images for a while, I realized that if we
didn't make it happen, it just wouldn't. Not because of the malevolence of
browser vendors, but simply because it wasn't their main focus, and
since the issue was fairly complex, that meant it wasn't going to
get worked on unless we did it ourselves.</p>
<p>Since I'm (also) a C++ developer and had some WebKit
experience (read: I built WebKit once in 2008), I took on the job of
prototyping <code><picture></code> in WebKit, in order to prove the feature's
feasibility.</p>
<p>Later on, I started working on the features themselves, both in Blink
and in WebKit, and when I saw it was going to take a long while if I
continued to do this during my evenings, I fired my client and started
doing it as my full-time job.
In order to finance some of that effort, the RICG started an <a href="https://www.indiegogo.com/projects/picture-element-implementation-in-blink">Indiegogo
campaign</a>. It really helped me to get through this period without having my
bank manager calling me twice a week, but more than that, it helped to
raise awareness of the fact that <em>this is a thing people can do</em>. Regular Joes, not
employed by one of the browser vendors, can just start working on browsers
and make things happen.</p>
<p>After the campaign ended, I was able to find a full-time job with an
(awesome) employer that enables me to contribute to the platform. Even if
Akamai is not a browser vendor, a large chunk of my job is now
dedicated to working on the Web platform and on browsers,
pushing forward the features that Akamai care about.</p>
<p>And this is what we need. We need software organizations to give their employees
time to work on the Web features that matter to them and that would help
drive their business. Not as charity, but as a competitive advantage.</p>
<p>The main hurdle is that getting started working on the platform is
hard. When I started as part of the RICG, it took me a couple of months
of my spare evenings to get everything set up and figure out all the
bits of WebKit code that I needed to mess around with in order to prototype
the old <code><picture></code> in WebKit. I guess I'm more stubborn than most, so I
got through that phase, but we cannot reasonably expect everyone to go through
the same hurdles.</p>
<p>Enter the...</p>
<h2 id="wicg"><a class="direct-link" href="https://blog.yoav.ws/posts/by_the_people/#wicg">#</a> WICG</h2>
<p>Which is why I was super excited when I was approached to
co-chair the <a href="https://www.w3.org/community/wicg/">Web Platform Incubator Community Group</a>, a community group
dedicated to helping people get into the Web standards world. The role
of the WICG is to create a framework and a community that would be there
to make it as easy as possible to go from "I think the Web is missing
this thing" to a serious, well-thought-out use-case document and
proposal that you can take to the relevant standards body and argue for,
and turn into a Web specification yourself either as part of a W3C
community group, working group, or on your own GitHub repo.</p>
<p>Personally, I will also do my best to help anyone that's interested in
getting started hashing out their feature in actual browsers. Prototyping
can go a long way toward proving that a certain approach is viable, and actually
working on a feature can go a long way toward making sure that it makes it
into browsers sooner rather than later.</p>
<h2 id="cool.-now-what%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/by_the_people/#cool.-now-what%3F">#</a> Cool. Now what?</h2>
<p>Join the <a href="https://www.w3.org/community/wicg/">WICG</a>. Spread the word. And then, we can start building
the Web together, one spec and one feature at a time.</p>
<p>It is time to democratize the way the Web platform evolves. And I truly
believe that doing that will ensure a better Web for everyone.</p>
Adapting without assumptions2015-09-28T00:00:00Zhttps://blog.yoav.ws/posts/adapting_without_assumptions/<p>There has been a lot of talk recently about the <a href="http://w3c.github.io/netinfo/">Network Info API</a>.</p>
<p><a href="https://twitter.com/Paul_Kinlan">Paul Kinlan</a> published an <a href="https://paul.kinlan.me/using-service-worker-server-side-adaption-based-on-network-type/">article</a> about using Service Worker along
with the Network Info API to send network information up to the server
and let the server adapt its responses to these network info bits. There
is also an <a href="https://groups.google.com/a/chromium.org/d/msg/blink-dev/tU_Hqqytx8g/BeB5MsxbAwAJ">intent to implement</a> the <code>downlinkMax</code> attribute in the Blink
rendering engine.</p>
<p>Since I have Opinions™ on the matter, and <a href="https://twitter.com/yoavweiss/status/640903954109480960">Twitter</a> and <a href="https://groups.google.com/a/chromium.org/d/msg/blink-dev/tU_Hqqytx8g/xok7kFumAwAJ">mailing lists</a>
aren't always the ideal medium, I wrote them down here.</p>
<p>This is a lengthy post, so if you don't have time to read
through it all, its claims are:</p>
<ul>
<li>Current NetInfo API is not providing useful info.</li>
<li>We should improve current API (<a href="https://blog.yoav.ws/posts/adapting_without_assumptions/#proposal">proposal</a>).</li>
<li>We should improve our overall capabilities to adapt content based on
user conditions, beyond just network conditions.</li>
</ul>
<h2 id="current-netinfo-api-doesn't-expose-what-devs-need"><a class="direct-link" href="https://blog.yoav.ws/posts/adapting_without_assumptions/#current-netinfo-api-doesn't-expose-what-devs-need">#</a> Current NetInfo API doesn't expose what devs need</h2>
<p>The current API is built around the following attributes:</p>
<ul>
<li><code>type</code> - indicates the "type" of network, with rather coarse
granularity. e.g. "cellular" vs. "wifi".</li>
<li><code>downlinkMax</code> - indicates the maximum downlink speed of the underlying
first-hop technology or an estimate of it. It has finer granularity, but
has a certain duality to it, where the developer is not sure if they are getting a value based
on a <a href="https://w3c.github.io/netinfo/#max-downlink-table">set of predefined values</a> or a bandwidth estimate
which is more likely to be related to reality.</li>
<li><code>onchange</code> - An event handler that indicates that the network has
changed, so that the app can somehow change behavior as a result.</li>
</ul>
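<p>For reference, here's roughly what reading the current API looks like
from script, in browsers that implement it (a quick sketch; property and
prefix availability varies between implementations):</p>
<pre><code>var connection = navigator.connection ||
                 navigator.mozConnection ||
                 navigator.webkitConnection;

if (connection) {
  console.log(connection.type);        // e.g. "wifi" or "cellular"
  console.log(connection.downlinkMax); // theoretical first-hop maximum, in Mbps
  connection.addEventListener('change', function() {
    // The "network" changed - react somehow
  });
}
</code></pre>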
<p>The problem with the above is that it rarely provides Web developers
with useful and actionable data without them having to make huge (and
often false) assumptions about what that info means for the things
they actually care about (and which are, for most cases, not the things
this API exposes).</p>
<p>If you take a closer look at the <code>downlinkMax</code> table you can see that the
info you get from it is dubious at best. If your user is on an Edge network, you
would be led to think that their available download speed is 384 kbps.
While they most probably don't have that much bandwidth at their
disposal, you can use that in order to figure out that they are on a
not-so-great network, and change the resources you serve them
accordingly.</p>
<p>But, what if they are WiFi-tethering over their 2G phone? In
that case, you'd be led to think that the connection type is "WiFi" and
the speed is capped at 11 Mbps. Not too shabby.</p>
<p>Except that the user would be experiencing even worse network
conditions in the latter case than in the former one, without the
developer knowing anything about it.</p>
<p>There are many other cases where looking at <code>downlinkMax</code> will lead you
to the wrong conclusions. For example, take the case where your users
are on an extremely lossy WiFi network (AKA: "hotel/conference WiFi") where their
effective bandwidth is very low. Or the case where they are on an HSDPA network which in theory can reach 14.3Mbps, but in reality,
they are sharing a cell with thousands of other users, all trying to
download cat-based entertainment, since they are all waiting for the
bus/train/plane,
which means the cell's bandwidth is thinly divided between all those
users, and the cell's backhaul network (which is fetching those cats
from the landline internet) is saturated as well.</p>
<p>In fact, the <em>only</em> case where <code>downlinkMax</code> is useful is in the "user is
on an Edge network" case. For everything else, you're out of luck: bad or tethered WiFi, 3G
with poor coverage, poor backhaul, etc. will all present themselves as
pretty good networks. That means that we could effectively replace <code>downlinkMax</code> with an
<code>isUserOnEdge</code> boolean.</p>
<p>Even if we look at possible downlinkMax improvements using a bandwidth
estimate of some sort, according to the current spec:</p>
<ul>
<li>That estimate would be of the first hop, which means it cannot take
into account backhaul congestion, tethering and other similar
scenarios.</li>
<li>There's no way for developers to distinguish between a first-hop
bandwidth estimate, and a theoretical maximum bandwidth which will
never be reached.</li>
</ul>
<p>All of which leads me to believe that <code>downlinkMax</code> is not providing the
information that developers actually need, and makes me worry that the
info will be abused by developers (due to a lack of better network information) if we expose it.</p>
<h2 id="so%2C-what-do-developers-need%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/adapting_without_assumptions/#so%2C-what-do-developers-need%3F">#</a> So, what do developers need?</h2>
<p>The general use-case that developers are trying to tackle
here is that of content adaptation to the user's condition. I'd claim that the main use-case
would be to serve rich content to devices that can handle it right now, while
providing decent and fast experience to devices that can't handle the
rich content, due to <em>some</em> constraints.</p>
<p>Some of the specific use-cases I heard people mention are:</p>
<ul>
<li>Download smaller/other resources when network conditions are bad.
<ul>
<li>That is the use-case most often cited. While the "smaller
resource" part of that can be partly resolved with <code>srcset</code> and
progressive video loading, that often means serving physically smaller
resources, when what the developer actually wants is to apply
harsher compression, at the expense of quality, which would still be
better than serving smaller resources and upscaling them.
There can also be cases where we would want to
serve different content based on network conditions (e.g. replace video
ads with static ads).</li>
</ul>
</li>
<li>Download smaller/other resources when a low-end device can't handle the load.</li>
</ul>
<ul>
<li>Low-end devices with very little memory and processing power can't always handle the load
of rendering a full Web site with all its images, videos and scripts.
In some cases developers need to detect that and send a simplified
version.</li>
<li>See <a href="https://twitter.com/tkadlec">Tim Kadlec</a>'s excellent <a href="https://www.youtube.com/watch?v=kylciFbrwcY">"Reaching everyone,
fast"</a> talk for more details on that use-case.</li>
</ul>
<ul>
<li>Avoid syncing/downloading large chunks of data.</li>
</ul>
<ul>
<li>Some Web apps need to sync or download a lot of data, which may be
costly, drain the battery or clog the device's storage, depending on
the user's conditions and their device. Developers
need a way to know when the user is in conditions where they are likely to
get pissed at them for starting such a costly operation.</li>
</ul>
<ul>
<li>Warn users before heavy downloads</li>
</ul>
<ul>
<li>Related to the last use-case, having a standard way to let users know
that a large download is about to take place, and allowing them to
avoid it, would enable the browser to handle that "permission", and may
be used to avoid bugging the user about it in the future.</li>
</ul>
<p>Now, if we take these use-cases into consideration, what are the constraints that we need
to expose to developers that would enable them to successfully tackle
these use cases?</p>
<p>I think the list would include:</p>
<ul>
<li>Actual network conditions</li>
<li>User preference - Does the user prefer fast delivery over heavy but
fancy one?</li>
<li>Device capabilities - Can the device handle the resources I'm sending
its way, or will it crash and burn on them?</li>
<li>Battery - If battery is scarce, maybe the user doesn't need that fancy
animation, and they just want the address to get where they want to?</li>
<li>Monetary cost of traffic (and if the user considers that cost expensive)</li>
</ul>
<p>Let's dive into each one of those.</p>
<h3 id="network-conditions"><a class="direct-link" href="https://blog.yoav.ws/posts/adapting_without_assumptions/#network-conditions">#</a> Network conditions</h3>
<p>The current NetInfo API talks about exposing network information, basically mimicking the
Android APIs that can give an app developer the same info. So, as we've
seen, this info gives the developer the rough network type and the
theoretical bandwidth limits of the network the user is on.</p>
<p>But as a developer, I don't much care about which first-hop radio technology is used,
nor what is its theoretical limit. What I want to know is "Is the end-to-end network fast enough to deliver all
those rich (read: heavy) resources in time for them to provide a pleasant
user experience rather than a burdensome one?"</p>
<p>So, we don't need to expose information about the network, as much as we
need to expose the product of the overall end-to-end network conditions.</p>
<p>What developers need to know is the
network conditions that the user is experiencing, and in most cases,
what is their <em>effective</em> bandwidth.</p>
<p>While that's hard to deliver (and
I once wrote why <a href="http://www.smashingmagazine.com/2013/01/bandwidth-media-queries-we-dont-need-em/#measuring-bandwidth-is-hard">measuring bandwidth is
hard</a>),
the good folks of Google Chrome net stack are
<a href="https://docs.google.com/document/d/1eBix6HvKSXihGhSbhd3Ld7AGTMXGfqXleu1XKSFtKWQ/edit#heading=h.kbjigi83rd0q">working</a> to prove
that hard != impossible. So, it looks like having an in-the-browser
end-to-end network estimation is no longer a pipe dream.</p>
<p>Now, once we've estimated the network conditions, should we expose the
raw values?</p>
<p>I believe we shouldn't, at least not as a high-level "your bandwidth is
X" single number.</p>
<p>The raw network information of incoming effective bandwidth and
round-trip-times can be overwhelming, and the potential for misuse is
too high. It's also very likely to change rapidly, causing
non-deterministic code behavior if exposed through script, and huge
variance if exposed through Client-Hints.</p>
<p><a name="proposal"></a>
What I believe we need to expose is a set of actionable, discrete values, and browsers would
"translate" the stream of raw network data into one of those values.
That would also enable browsers to start with rough bandwidth estimations, and iterate on them, making sure they're more accurate over time.</p>
<p>As far as the values themselves, I propose something like <code>unusable</code>, <code>bad</code>, <code>decent</code>, <code>good</code> and <code>excellent</code>, because naming is hard.</p>
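<p>To make that concrete, a usage sketch might look something like the
following. Everything here is hypothetical: neither the attribute nor the
values exist anywhere today, and the names are made up:</p>
<pre><code>// Hypothetical - nothing like this is currently specified or implemented
var quality = navigator.connection && navigator.connection.quality;
var imageSuffix;

if (quality === 'unusable' || quality === 'bad') {
  imageSuffix = '_lowquality.jpg'; // harsher compression, same dimensions
} else if (quality === 'decent') {
  imageSuffix = '_regular.jpg';
} else {
  imageSuffix = '_fancy.jpg';      // "good", "excellent", or no information at all
}
</code></pre>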
<p>Having discrete and imprecise values also has the advantage of enabling browsers to evolve what these values mean over time,
since today's "decent" may very well be tomorrow's "bad". We already
have a Web platform precedent for similar discrete values as part of the
<a href="https://drafts.csswg.org/mediaqueries-4/#update-frequency">update-frequency</a> Media Query.</p>
<p>As a bonus, imprecise values would significantly decrease the <a href="https://groups.google.com/a/chromium.org/d/msg/blink-dev/tU_Hqqytx8g/HTJebzVHBAAJ">privacy
concerns</a> that exposing the raw bandwidth would raise.</p>
<h3 id="user-preferences"><a class="direct-link" href="https://blog.yoav.ws/posts/adapting_without_assumptions/#user-preferences">#</a> User preferences</h3>
<p>We already have a proposal for this one. It's called the
<a href="http://igrigorik.github.io/http-client-hints/#the-save-data-hint">Save-Data</a> header that is part of the Client-Hints specification. It might be a
good idea to also expose that to JavaScript.</p>
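<p>On the server side, honoring the hint is fairly straightforward. Here's a
minimal Node.js sketch (assuming the <code>Save-Data: on</code> request header defined
by Client-Hints; the resource names are made up):</p>
<pre><code>var http = require('http');

http.createServer(function(req, res) {
  // Client-Hints sends "Save-Data: on" when the user opted into data savings
  var saveData = (req.headers['save-data'] === 'on');
  res.writeHead(200, { 'Content-Type': 'text/html', 'Vary': 'Save-Data' });
  res.end(saveData ?
    '<img src="hero_small.jpg">' :
    '<img src="hero_large.jpg">');
}).listen(8080);
</code></pre>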
<p>The main question that remains here is how we get the user's
preferences. As far as I understand, the idea in Chrome is to take advantage of a user's opt-in to
their compression proxy as an indication that they are interested in
data savings in general.</p>
<p>That's probably a good start, but we can evolve that to be so much
smarter over time, depending on many other factors that the browser has
about the user. (e.g. geography, data saving settings at the OS level,
etc.)</p>
<h3 id="device-capabilities"><a class="direct-link" href="https://blog.yoav.ws/posts/adapting_without_assumptions/#device-capabilities">#</a> Device capabilities</h3>
<p>The current state of the art at detecting old and busted devices and
avoiding sending them resources that they would choke on (due to
constrained CPU and memory) is dubbed <a href="http://responsivenews.co.uk/post/18948466399/cutting-the-mustard">"cutting the mustard"</a>.
While a good effort to make do with what we have today, it is (again) making a
bunch of potentially false assumptions.</p>
<p>The "cutting the mustard" method means detecting the presence of modern APIs and
concluding from their absence that the device in question is old and
busted. While their absence can indicate that, their presence doesn't
mean that the device is full-powered high-end smartphone. There are many
low-end devices out there today with shiny new FirefoxOS installations.
Any Android 4 phone may have an always-up-to-date Chrome, regardless of
its memory and CPU (which can be extremely low).</p>
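<p>For reference, the classic "cutting the mustard" test (as described in
the BBC post linked above) looks roughly like this:</p>
<pre><code>// Tells you the browser is reasonably modern - but says nothing about
// the CPU and memory it's running on
if ('querySelector' in document &&
    'localStorage' in window &&
    'addEventListener' in window) {
  var script = document.createElement('script');
  script.src = 'enhanced.js'; // whatever your enhanced bundle is called
  document.head.appendChild(script);
}
</code></pre>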
<p>Bottom line is: we cannot assume the state of the user's hardware
from the state of their software.</p>
<p>On the other hand, exposing all the different metrics that determine the
device's capacity is tricky. Do we expose raw CPU cycles? Raw memory?
What should happen when CPU or memory are busy with a different app?</p>
<p>The solution to that is not very different from the one for network conditions. We can expose a set of
discrete and actionable values, that can evolve over time.</p>
<p>The browsers can estimate the state of current hardware and current
available processing power and memory, and "translate"
that into a "rank" which would give developers an idea of what they are
dealing with, and allow them to adapt their sites accordingly.</p>
<p>Lacking better names, the values could be <code>minimal</code>, <code>low</code>, <code>mid</code> and <code>high</code>.</p>
<h3 id="battery-state"><a class="direct-link" href="https://blog.yoav.ws/posts/adapting_without_assumptions/#battery-state">#</a> Battery state</h3>
<p>That's easy, we already have that! The <a href="https://dvcs.w3.org/hg/dap/raw-file/default/battery/Overview.html">Battery status API</a>
is a candidate recommendation specification, and is fully <a href="http://caniuse.com/#feat=battery-status">supported</a> in Chrome/Opera
and partially supported in Firefox. All that's left is to hope that
support in other modern browsers will arrive soon.</p>
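<p>Usage is simple (in browsers that support it):</p>
<pre><code>navigator.getBattery().then(function(battery) {
  if (!battery.charging && battery.level < 0.15) {
    // Battery is scarce - maybe skip the fancy animations
  }
  battery.addEventListener('levelchange', function() {
    // Re-evaluate when the battery level changes
  });
});
</code></pre>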
<h3 id="monetary-cost"><a class="direct-link" href="https://blog.yoav.ws/posts/adapting_without_assumptions/#monetary-cost">#</a> Monetary cost</h3>
<p>That part is tricky since browsers don't actually have info regarding
the data costs, and in many cases (such as tethered WiFi) our
assumptions about the cost implications of network type are wrong.</p>
<p>I think that the only way out of this puzzle is asking the user.
Browsers need to expose an interface asking the user for their
preference regarding cost (e.g. enable them to mark certain WiFi networks
as expensive, mark roaming as expensive, etc.).</p>
<p>Another option is to expose a way for developers to ask the user's
permission to perform large downloads (e.g. message synchronization,
video download, etc.), and the browser can remember that preference for
the current network, across multiple sites.</p>
<p>What we definitely shouldn't do is tell developers that they should
deduce cost from the network type being WiFi. Even if this is a pattern
often used in the native apps world, it is blatantly wrong: it ignores tethering as well as
the fact that many cellular plans have unlimited data (which brings
back memories of me trying to sync music albums over unlimited 4G before going on a 12
hour flight, and the native app in question telling me "we'll sync as
soon as you're on WiFi". Argh!).</p>
<h2 id="why-not-progressive-enhancement%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/adapting_without_assumptions/#why-not-progressive-enhancement%3F">#</a> Why not progressive enhancement?</h2>
<p>Why do we need to expose all that info at all? Why can't we just build
our Web sites to progressively enhance, so that the content downloads
progressively and users get the basic content before all the fancy
stuff arrives? If their network conditions are bad, they simply get the
basic parts.</p>
<p>Well, progressive enhancement is great for many things, but cannot
support some cases of content adaptation without adding extra delays.</p>
<ul>
<li>The use-case of adapting resource byte-size to network conditions
cannot be fully addressed with progressive enhancement, since it gives
us no control over the compression quality of the resources we're
serving our users. While dimensions can be controlled through <code>srcset</code>
and progressive video loading, they can often be crude instruments for that purpose,
since upscaling smaller resolution resources would often have worse
quality than a heavily compressed resource.</li>
<li>There are cases in which developers would want
to tailor the site to the network conditions, e.g. sending a single
decent-quality image instead of multiple low-resolution images,
or replacing video ads with static ads.</li>
<li>Progressive enhancement can't take into account the user's monetary
cost of the network or the user's preference, and will continue to
download the "fancy" bits even if the user prefers they won't be
downloaded.</li>
<li>Progressive enhancement can't "go easy" on devices that would download
all the site's images, scripts and fonts only to choke on them later on,
due to lack of memory and CPU.
In order to properly support low-end devices as well as high-end ones
without adding unnecessary delays to the common case, developers need an
indication of device capabilities (ideally as a Client-Hint on the
navigational request) in order to serve a simplified version of the site
to devices that can't handle more than that.</li>
</ul>
<h2 id="what-happens-when-multiple-paths-are-used%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/adapting_without_assumptions/#what-happens-when-multiple-paths-are-used%3F">#</a> What happens when multiple paths are used?</h2>
<p>As <a href="https://twitter.com/sleevi_/status/640941120130707456">pointed</a> out by <a href="https://twitter.com/sleevi_">Ryan Sleevi</a> of Chrome networking fame, multi-path would
break any attempts to expose either the available or theoretical network
bandwidth. That is absolutely true, and yet another reason why we don't
want to expose the raw bandwidth, but a discrete and abstract value
instead. The browser can then expose the <em>overall</em> effective bandwidth
it sees (aggregated from all network connections), even in a multipath world.</p>
<h2 id="how-do-we-prevent-it-being-the-new-user-agent-string%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/adapting_without_assumptions/#how-do-we-prevent-it-being-the-new-user-agent-string%3F">#</a> How do we prevent it being the new User Agent string?</h2>
<p>Another <a href="https://twitter.com/sleevi_/status/640922533852647425">concern</a> that was raised is that exposing network information would result in worse
user experience (due to developer abuse of the exposed data), and would
therefore result in browsers lying about the actual conditions the user
is in.</p>
<p>In my view, the doom of the User-Agent string as an API was that it required
developers to make assumptions about what that string means regarding other
things that actually matter to them (e.g. feature support).</p>
<p>While I agree with those concerns regarding <code>downlinkMax</code> and
<code>type</code>, I believe that as long as we keep the assumptions that developers have to make to a
minimum, there's no reason developers would abuse APIs and
harm their users' experience while doing so. That also means that there
would be no reason for browsers to eventually lie, and provide false API
values.</p>
<h2 id="what-about-the-extensible-web%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/adapting_without_assumptions/#what-about-the-extensible-web%3F">#</a> What about the extensible Web?</h2>
<p>Doesn't the concept of exposing a high-level value rather than the raw
data stand at odds with the <a href="https://extensiblewebmanifesto.org/">Extensible Web manifesto</a>?</p>
<p>I don't think it does, as long as we also strive to expose the raw data
eventually. But exposing the full breadth of network info or device
capabilities info is not trivial. It would most probably require an API based on the
<a href="https://w3c.github.io/performance-timeline/">Performance Timeline</a> and I suspect it would have some
privacy gotchas, since exposing the user's detailed network, CPU and
memory usage patterns smells like something that would have interesting
privacy implications.</p>
<p>So, we should totally try to expose a low-level API, but I don't think we should hold off on exposing the high-level info
(which I suspect would satisfy most use-cases) until we have figured out
how to safely do that.</p>
<h2 id="to-sum-it-up"><a class="direct-link" href="https://blog.yoav.ws/posts/adapting_without_assumptions/#to-sum-it-up">#</a> To sum it up</h2>
<p>I strongly believe that exposing network conditions as well as other factors about the user's environment would
provide a solid foundation for developers to better adapt the sites
they serve to the user's conditions. We need to be careful about what we
expose though, and make sure that it will not result in assumptions,
abuse and lies.</p>
<p><small>Thanks to <a href="https://twitter.com/Paul_Kinlan">Paul Kinlan</a>, <a href="https://twitter.com/tkadlec">Tim Kadlec</a> and <a href="https://twitter.com/jaffathecake">Jake Archibald</a> for
reviewing and commenting on an early draft of this post.</small></p>
Being Pushy2016-08-02T00:00:00Zhttps://blog.yoav.ws/posts/being_pushy/<p>I spent a few days last week in Stockholm attending the <a href="https://httpworkshop.github.io/">HTTP
Workshop</a>, and took part in many fascinating discussions. One of them
revolved around HTTP push, its advantages, disadvantages and the
results we see from early experiments on that front.</p>
<p>The general attitude towards push was skeptical, due to the
not-so-great results presented from early deployments, so I'd like to
share my slightly-more-optimistic opinion.</p>
<h2 id="what-can-push-do-that-preload-can't"><a class="direct-link" href="https://blog.yoav.ws/posts/being_pushy/#what-can-push-do-that-preload-can't">#</a> What can push do that preload can't</h2>
<p>A recurring theme from the skeptics was "push is only saving 1 RTT in comparison to
preload". That is often not true in practice, as there is one major use case
that push enables and preload cannot.</p>
<h3 id="utilizing-server-think-time"><a class="direct-link" href="https://blog.yoav.ws/posts/being_pushy/#utilizing-server-think-time">#</a> Utilizing server think-time</h3>
<p>HTML responses are rarely static resources nowadays. They are often
dynamically generated using a higher-level language (which may be
slightly on the slower side) while gathering the info needed for their
creation from
a database. While the back-end's response time is something you can and
should optimize, response times in the order of hundreds of milliseconds
are not uncommon.</p>
<p>There's common advice to "<a href="https://www.stevesouders.com/blog/2009/05/18/flushing-the-document-early/">flush early</a>" your HTML, and
start sending the first chunks of your HTML in parallel to querying the
database and constructing its dynamic parts.
However, not all server-side architectures make it easy to implement early flushing.</p>
<p>Another factor that makes early flushing harder than it should be is the
fact that at the time we need to start sending data down to the browser, we're
not yet sure that the response construction will complete successfully.
In case something in the response creation logic goes wrong (e.g.
database error or server-side code failing to run), we need to build a
way into our application logic to "roll back" the already-sent response
and display an error message instead.</p>
<p>While it's certainly possible to do that (even
<a href="https://blogs.akamai.com/2013/12/speed-up-time-to-first-byte-with-edgestart.html">automatically</a>),
there's no generic way to do that today as part of the protocol.</p>
<p>So, the common scenario is one where the Web server is waiting a few
hundred milliseconds for the back-end to construct the page, and only
then starts to send it down. This is the point where we hit <a href="https://www.igvita.com/2011/10/20/faster-web-vs-tcp-slow-start/">slow
start</a>, so we can only send around 14KB in our first RTT,
28KB in the second, etc. Therefore, it takes us think-time + slow-start time in order to deliver our HTML. And during that think-time the browser has no idea what
resources will be needed next, so it doesn't send any requests for
the critical-path resources.</p>
<p>And even if we're trying to be smart and add <a href="https://www.smashingmagazine.com/2016/02/preload-what-is-it-good-for/#headers">preload headers</a>
for those resources, they do nothing to utilize that think-time if we
don't early-flush the document's start.</p>
<p>Now, compare that to what we can do with H2 push. The server can use the think-time
in order to push required critical resources - typically CSS and JS ones.
So, by the time think-time is over, there's a good chance we already sent all the required critical resources to the browser.</p>
<p>For extra credit, these
resources also warm up our TCP connection and increase its congestion
window, making sure that on the first RTT after the think-time the HTML
could be sent using a congestion window of 28KB, 56KB or even more (depending on think-time and how much we pushed during it).</p>
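<p>As a rough sketch of the idea, here's what pushing critical resources during think-time could look like with Node's <code>http2</code> module. The file paths, certificates and the simulated 300ms think-time are all made up for the example:</p>
<pre><code>// A rough sketch of pushing critical resources during server think-time,
// using Node's http2 module. Paths, certificates and the simulated
// "think-time" are made up for the example.
const fs = require('fs');
const http2 = require('http2');

const server = http2.createSecureServer({
  key: fs.readFileSync('key.pem'),
  cert: fs.readFileSync('cert.pem'),
});

server.on('stream', (stream, headers) => {
  const path = headers[':path'];
  if (path !== '/') {
    // Serve static files (whether pushed or requested) directly.
    stream.respondWithFile('.' + path);
    return;
  }

  // Start pushing the critical CSS and JS right away...
  for (const asset of ['/critical.css', '/critical.js']) {
    stream.pushStream({ ':path': asset }, (err, pushStream) => {
      if (err) return; // e.g. the client disabled push
      pushStream.respondWithFile('.' + asset);
    });
  }

  // ...while the back-end "thinks" about the HTML response.
  generateHTML().then((html) => {
    stream.respond({ ':status': 200, 'content-type': 'text/html' });
    stream.end(html);
  });
});

// Stand-in for the slow, database-backed HTML generation.
function generateHTML() {
  return new Promise((resolve) =>
    setTimeout(() => resolve('<!doctype html>...'), 300));
}

server.listen(8443);
</code></pre>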
<p>Let's take a look at a concrete example: how would the loading of a
120KB HTML page with 24KB of critical CSS and 74KB of critical JS look over a network with
an RTT of 100ms and infinite bandwidth?</p>
<p>Without push, we wait 300ms for HTML generation, then 4 RTTs to send the
HTML, due to slow-start, and another RTT for the requests for JS and CSS
to come in and send their responses. Overall 800ms for first render.</p>
<p><img src="https://blog.yoav.ws/img/being_pushy/page_loading_nopush.svg" alt="Page loading without push" /></p>
<p>With push, the CSS and JS are sent as soon as the request for the HTML
arrives. It takes them 3 RTTs to be sent (again, due to slow start) and
they bump up the CWND to ~128KB, so when the HTML is ready to be sent, it
can be sent down within a single RTT. Overall time for first render:
400ms.</p>
<p><img src="https://blog.yoav.ws/img/being_pushy/page_loading_push.svg" alt="Page loading with push" /></p>
<p>That's a 50% speedup to first render! Not too shabby...</p>
<h2 id="where-push-is-not-so-great"><a class="direct-link" href="https://blog.yoav.ws/posts/being_pushy/#where-push-is-not-so-great">#</a> Where push is not-so-great</h2>
<p>One of the reasons I believe people are Using It Wrong™ when it comes
to push is that they're using it in scenarios where it doesn't provide
much benefit, or where it even causes actual harm.</p>
<h3 id="blindly-pushing-static-resources"><a class="direct-link" href="https://blog.yoav.ws/posts/being_pushy/#blindly-pushing-static-resources">#</a> Blindly pushing static resources</h3>
<p>One of the major things you can do wrong with push is saying to
yourself: "Hey, I have these static resources that all my pages need, I'll
just configure them to be pushed on all pages".</p>
<p>The main reason this is a bad idea is caching. These resources are
likely to be in the browser's cache after the user visits the first
page, yet you keep pushing them to no end. You could argue that it's no
worse than inlining all those resources and you'd be right, but I'd
argue back that inlining all those resources would also be a bad idea :)</p>
<p>So, if you are blindly pushing resources that way, make sure that it's
<em>only</em> stuff you would have inlined, which is basically your critical CSS.
Otherwise, you run a risk of making repeat visits significantly slower.</p>
<p>You may think that stream resets will save you from wasting too
much bandwidth and time on pushing already-cached resources. You'd be
wrong. Apparently, not all browsers check their caches and terminate the push
streams of cached resources.
And even if they do, you're still sending the resource data for a full
RTT before the stream reset reaches the server. Especially if you're
doing that for multiple resources, that may end up as a lot of wasted
data.</p>
<h3 id="getting-stuff-into-the-browser's-cache"><a class="direct-link" href="https://blog.yoav.ws/posts/being_pushy/#getting-stuff-into-the-browser's-cache">#</a> Getting stuff into the browser's cache</h3>
<p>You may think that push gets stuff into the browser's cache and can be
used to e.g. invalidate current resources. At least at the moment, that
is not the case.
One of the topics of discussion in the workshop revolved
around the fact that we may need to change current push behavior to
support direct interaction with the browser's cache, but
right now, push is simply not doing that. Pushed responses go into this
special push-only cache, and they go into the HTTP cache only when
there's an actual request for them.</p>
<p>So if you're pushing resources in hope that they'd be used in some
future navigation, the browser may throw them out of the push cache way
before they'd actually be needed.</p>
<p>At least that's the way the implementations work today.</p>
<h3 id="filling-the-pipe-after-the-html-was-sent-down"><a class="direct-link" href="https://blog.yoav.ws/posts/being_pushy/#filling-the-pipe-after-the-html-was-sent-down">#</a> Filling the pipe after the HTML was sent down</h3>
<p>Often in the page's download cycle there are gaps in the utilized
bandwidth, meaning that we're not sending down the required resources as
fast as we could be, usually due to late discovery of those resources by
the browser.</p>
<p>While you should try to fill in these gaps by sending down
resources that the page needs, it is often better to do that with
preload rather than push. As preload takes caching, cookies and content negotiation into
account, it doesn't run the risks of over-sending or sending the wrong
resource that push does. For filling in these gaps, there's no advantage
for push, only disadvantages. So it's significantly better not to use
push for that purpose, but use preload instead.</p>
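<p>For example, instead of pushing a late-discovered resource, the server can announce it with a preload <code>Link</code> header and let the browser fetch it through its regular machinery. A sketch with Node (the file names are made up):</p>
<pre><code>// A sketch: announcing a late-discovered resource (e.g. a font referenced
// from CSS) with a preload Link header instead of pushing it, so the browser
// can apply its cache, cookies and content negotiation. File names are made up.
const http = require('http');

http.createServer((req, res) => {
  res.setHeader('Link', '</fonts/body.woff2>; rel=preload; as=font; crossorigin');
  res.setHeader('Content-Type', 'text/html');
  res.end('<!doctype html><link rel="stylesheet" href="/style.css">...');
}).listen(8080);
</code></pre>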
<h2 id="cache-digests"><a class="direct-link" href="https://blog.yoav.ws/posts/being_pushy/#cache-digests">#</a> Cache Digests</h2>
<p>We saw that one of push's big disadvantages is that the server is not
necessarily aware of the browser's cache state and therefore when
pushing we run a risk of pushing something that's already in the cache.</p>
<p>There's a proposed standard extension, called
<a href="https://tools.ietf.org/html/draft-ietf-httpbis-cache-digest-00">cache digests</a>, that would resolve that. The basic idea is that the browser would
send a digest to the server when the HTTP/2 connection is initialized,
and the server can then estimate with high accuracy if a resource is in
the browser's cache before sending it down.</p>
<p>It's still early days for that proposal and it may have to be somewhat
simplified in order to make its implementation less expensive, but I'd
argue that currently H2 push is only half a feature without it.</p>
<h2 id="to-sum-it-up"><a class="direct-link" href="https://blog.yoav.ws/posts/being_pushy/#to-sum-it-up">#</a> To sum it up</h2>
<p>H2 push can be used to significantly improve loading performance, and
when used right can speed up the very first critical path loading,
resulting in improved performance metrics all across the board.</p>
<p>Push is still very much new technology, and like all new tools, it may take a while before we figure out the
optimal way to use it. Often that means one or two sore thumbs along the
way.</p>
<p>So, initial results from early experiments may not be everything that we
hoped for, but let's treat those results as an
indication that we need to get smarter about the way we use push, rather
than concluding it's not a useful feature.</p>
<div class="li-article-date">
Thanks to
<a href="https://twitter.com/tkadlec">Tim Kadlec</a>
and
<a href="https://twitter.com/marcosc">Marcos Caceres</a>
for reviewing this post. (and special thanks to Tim for a prototype of the RTT diagrams)
</div>
TPAC 2016 report2016-10-04T00:00:00Zhttps://blog.yoav.ws/posts/tpac_2016/<p>A couple of weeks ago I attended <a href="https://www.w3.org/2016/09/TPAC/">TPAC</a>, the annual week-long W3C
<del>festivities</del> meeting, and I'd like to share my notes and
impressions from a few of the sessions we ran.</p>
<h1 id="responsive-images"><a class="direct-link" href="https://blog.yoav.ws/posts/tpac_2016/#responsive-images">#</a> Responsive Images</h1>
<p>While most days at TPAC are split into Working-Group-specific meetings, Wednesday is traditionally filled with breakout sessions that aren't necessarily affiliated with any WG in particular.
As part of that, we ran a session about responsive images and their
aspect ratios.</p>
<p>The subject was actually triggered by an <a href="https://lists.w3.org/Archives/Public/public-respimg/2016Sep/0008.html">email</a> sent by <a href="https://twitter.com/grigs">Jason Grigsby</a> that very morning
to the public-respimg mailing list pointing out that aspect-ratio info
is becoming an issue in real-life deployments of responsive images.
The email stated that more and more frameworks and blog posts are
advocating for developers to include explicit fixed width and height
attributes on their images, in order to avoid content re-layout once the images are
downloaded and their dimensions become known. Such re-layouts cause the content to "jump around", and are
rightfully considered bad UX on mobile.</p>
<p>While the original email advocated for adding aspect-ratio info into
<code>sizes</code>, it is not really necessary, as the browser needs to know the
aspect-ratio info at initial layout time and not before that. Therefore,
just adding an explicit <a href="https://jonathankingston.github.io/logical-sizing-properties/index.html?1#aspect-ratio-property">aspect-ratio</a> to CSS might be enough to resolve
the problem.</p>
<p>At the same time, there are current CSS methods to achieve that, as
<a href="https://twitter.com/zcorpan">Simon Pieters</a> pointed out during the meeting (specifying
explicit dimensions per image breakpoint, or using padding percentage
based <a href="https://css-tricks.com/snippets/sass/maintain-aspect-ratio-mixin/">hacks</a>). Based on that, if we want to
encourage authors to define aspect ratios, we may need to add a markup
equivalent. In some scenarios (e.g. CMS), the people adding the images
may have little control over CSS, and adding HTML controls equivalent
to <code>height</code>/<code>width</code> may be helpful.</p>
<p>One slightly tangent, but extremely relevant point that was raised
during the meeting was that MPEG recently standardized a new image
format container called <a href="https://en.wikipedia.org/wiki/High_Efficiency_Image_File_Format">HEIF</a>, which enables (in more or less the
same way) many of the ideas I prototyped when I discussed a <a href="https://www.smashingmagazine.com/2013/09/responsive-image-container/">Responsive
Image Container</a>. Having that in standard form (and
in a codec agnostic container) may increase the chances of such an image
format getting implemented in browsers. That would enable a single image
resource to serve multiple resolutions and cropped versions, with the
browser downloading only the required bytes. Exciting stuff!!</p>
<p>Afterwards, the meeting went on to discuss the future <code>h</code> descriptor,
and use cases for it.</p>
<p>We concluded that we need to gather up use cases for aspect-ratio
definition on elements, on images (and the impact on their loading) as
well as use cases for height definition in <code>sizes</code>.</p>
<p>Full <a href="http://www.w3.org/2016/09/21-respimg-minutes.html">minutes</a> for the meeting.</p>
<h1 id="wicg"><a class="direct-link" href="https://blog.yoav.ws/posts/tpac_2016/#wicg">#</a> WICG</h1>
<p>Later that day, we ran a session about the WICG and the process around
it. The session followed some <a href="https://twitter.com/leaverou/status/778182677040074753">contention</a> the day before, so we spent
a large part of it explaining the goals of the WICG: getting more people involved in
Web standards, and using a bare-minimum-red-tape process in order to
facilitate that.</p>
<p>We talked about the need to get early feedback on standards from outside
of the standards echo-chamber, which is the main motivation of passing
standard proposals through an incubation phase, even if we're pretty
certain they'd end up being worked on in a certain WG.</p>
<p>We discussed getting more people involved, and the
fact that any proposal needs to be pushed and championed. No one is
sitting around waiting for your proposals, and if you want to see new
features and capabilities get implemented, you need to find the right
people and get them interested. This is also something that the chairs
can help with, by making sure that the right people are on the discourse proposal
threads.</p>
<p>Another point raised was that in order to push a proposal through
incubation you need to get positive feedback from the community, whereas
WG work just requires lack of negative feedback, so once a proposal gets
through incubation, you know that there's interest for it, rather than just lack of
opposition. That helps to make sure that the right things are being
worked on.</p>
<p>We talked about the meaning of "incubation" and how it can be done
within the WICG or outside of it. <a href="https://twitter.com/cwilso">Chris Wilson</a> defined it as:</p>
<ul>
<li>Ability to fail gracefully, rather than continue to work on a proposal
because work started and it's in the charter</li>
<li>Open process which can include many people from many organizations.</li>
</ul>
<p>Finally, we agreed we need hard metrics to make sure we measure success
properly, and achieve the WICG's goals of wider participation, better
feedback loop, and mature proposals by the time they graduate.</p>
<p>Meeting <a href="http://pastebin.com/jMyKQWVk">minutes</a>.</p>
<h1 id="webperfwg"><a class="direct-link" href="https://blog.yoav.ws/posts/tpac_2016/#webperfwg">#</a> WebPerfWG</h1>
<p>The 2 days of the Web Performance Working Group meetings were extremely exciting and
filled with new proposals for things that can improve our capabilities
to monitor and accelerate our sites. The meetings were already
<a href="https://docs.google.com/document/d/1RigHs7a4yRcVUt4fPswOZLKY_-tDlPiz8hPokBixHEk/edit">summarized</a> in detail elsewhere, so I'll just give a brief overview of
the different exciting <a href="https://docs.google.com/document/d/1LdVBqrf_Mlwbo5NkZvV5tH8LLph9k3_JvxCulLndJrU/edit#">proposals</a>:</p>
<ul>
<li><a href="https://github.com/spanicker/longtasks">Long Task Observer</a> is a new proposed mechanism that would enable you
to know when the main thread was too busy with a long task - a
long-running script, heavy layout operations, etc. That can give you a
good indication of your app's responsiveness to user input, and can
indicate a problem in the wild with certain user interactions that you
won't necessarily see in the testing lab (see the sketch after this list).</li>
<li><a href="https://github.com/tdresser/time-to-first-paint">First paint metrics</a> is a proposed API that will finally expose a
standard way to know when the user started seeing content on screen.
Current RUM performance metrics lack any indication of visual metrics,
so this proposal is a huge step forward.</li>
<li>The <a href="https://docs.google.com/document/d/1yRYfYR1DnHtgwC4HRR04ipVVhT1h5gkI6yPmKCgJkyQ/edit">Hero Element API</a> and <a href="https://github.com/w3c/user-timing/issues/17">declarative User Timing
marks</a> would both give us a way to report the
"First Meaningful Paint" and answer the question "when was the page at a
stage where the user saw meaningful content?". One of the downsides of
current User Timing API is that it only allows you to mark milestones that
have a JS event attached to them. These proposals would enable us to
apply the user timing concept to markup based elements, which could
significantly increase its usage.</li>
<li>The <a href="https://github.com/mounirlamouri/memory-pressure-api/blob/master/explainer.md">Memory Pressure API</a> would enable sites to know that the device is
running low on memory and is likely to evict their tab soon. That can
enable sites to reduce the memory usage in such dire times, in order to
try and avoid being evicted. Related to that, we also discussed crash
reporting for cases where the site ended up being evicted from memory,
which is an event that's currently largely invisible to developers.</li>
</ul>
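<p>As a taste of the Long Task proposal mentioned in the list above, observing long tasks from script looks roughly like this (the entry type and attribution details were still subject to change at the time):</p>
<pre><code>// A sketch of the proposed Long Tasks API, as discussed at the time -
// the entry type and attribution details may still change.
var observer = new PerformanceObserver(function (list) {
  list.getEntries().forEach(function (entry) {
    // Anything over 50ms kept the main thread busy long enough to be felt.
    console.log('Long task of', Math.round(entry.duration), 'ms (' + entry.name + ')');
  });
});
observer.observe({ entryTypes: ['longtask'] });
</code></pre>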
<h1 id="to-sum-it-up"><a class="direct-link" href="https://blog.yoav.ws/posts/tpac_2016/#to-sum-it-up">#</a> To sum it up</h1>
<p>TPAC this year was fun as always, and was filled with exciting new
developments.
It was great to discuss the next steps on the responsive images front, and seeing
that a file format solution might not be that far off.</p>
<p>The WICG discussions
left me optimistic. I believe we'd be able to unite the standards
community around the concept of incubations and would be able to get
more people involved, resulting in better Web standards.</p>
<p>And finally, it was exciting to see so many proposals for new
performance standards that would help making the web significantly (and
measurably) faster.</p>
<p>Till next year! :)</p>
A Tale of Four Caches2017-01-11T00:00:00Zhttps://blog.yoav.ws/posts/tale-of-four-caches/<p><em>This is a republication of my Perf Calendar <a href="http://calendar.perfplanet.com/2016/a-tale-of-four-caches/">post</a>, because I really like it and wanted it to be on my blog. Own your content and all that...</em></p>
<p>There's a lot of talk these days about browser caches in relation to <a href="https://www.smashingmagazine.com/2016/02/preload-what-is-it-good-for/">preload</a>,
<a href="https://blog.yoav.ws/being_pushy/">HTTP/2 push</a> and <a href="https://developers.google.com/web/fundamentals/getting-started/primers/service-workers">Service workers</a>, but also a lot of confusion.</p>
<p>So, I'd like to tell you a story about one request's journey to fulfill
its destiny and find
a matching resource.</p>
<aside>
The following story is based on Chromium's terms and concepts, but
other browsers are not inherently different.
</aside>
<h2 id="questy's-journey"><a class="direct-link" href="https://blog.yoav.ws/posts/tale-of-four-caches/#questy's-journey">#</a> Questy's Journey</h2>
<p>Questy was a request. It was created inside the rendering engine (also called "renderer" to keep things shorter), with one burning
desire: to find a resource that would make its existence complete and to
live together happily ever after, at least until the current document is
detached when the tab is closed.</p>
<figure>
<picture>
<source sizes="(min-width: 800px) 400px, 50vw" srcset="https://blog.yoav.ws/img/caches/questy_1304.webp 1304w, https://blog.yoav.ws/img/caches/questy_400.webp 400w, https://blog.yoav.ws/img/caches/questy_800.webp 800w" type="image/webp" />
<img sizes="(min-width: 800px) 400px, 50vw" srcset="https://blog.yoav.ws/img/caches/questy_1304.png 1304w, https://blog.yoav.ws/img/caches/questy_400.png 400w, https://blog.yoav.ws/img/caches/questy_800.png 800w" />
</picture>
<figcaption>Questy, dreaming of its resource</figcaption>
</figure>
<p>So Questy started its journey in its pursuit for happiness. But where
would it find a resource that would be just the right one for it?</p>
<p>The closest place to look for one was at the...</p>
<h2 id="memory-cache"><a class="direct-link" href="https://blog.yoav.ws/posts/tale-of-four-caches/#memory-cache">#</a> Memory Cache</h2>
<p>The Memory Cache had a large container full of resources. It
contained all the resources that the renderer fetched as part of the current
document and kept during the document's lifetime.
That means that if the resource Questy is looking for was already
fetched elsewhere in the current document, that resource will be found
in the Memory Cache.</p>
<p>But a name like "the short term memory cache" might
have been more appropriate: the memory cache keeps resources around only
until the end of their navigation, and in some cases, even less than
that.</p>
<figure>
<picture>
<source sizes="(min-width: 800px) 400px, 50vw" srcset="https://blog.yoav.ws/img/caches/memorycache_1377.webp 1377w, https://blog.yoav.ws/img/caches/memorycache_400.webp 400w, https://blog.yoav.ws/img/caches/memorycache_800.webp 800w" type="image/webp" />
<img sizes="(min-width: 800px) 400px, 50vw" srcset="https://blog.yoav.ws/img/caches/memorycache_1377.png 1377w, https://blog.yoav.ws/img/caches/memorycache_400.png 400w, https://blog.yoav.ws/img/caches/memorycache_800.png 800w" />
</picture>
<figcaption>The short term memory cache and its container</figcaption>
</figure>
<p>There are many potential reasons why the resource Questy is looking for was
already fetched.</p>
<p>The <a href="http://calendar.perfplanet.com/2013/big-bad-preloader/">preloader</a> is probably the biggest one. If Questy was
created as a result of a DOM node creation by the HTML parser, there's a good chance that
the resource it needs was already fetched earlier on, during the
HTML tokenization phase by the preloader.</p>
<p>Explicit <a href="https://www.smashingmagazine.com/2016/02/preload-what-is-it-good-for/">preload</a> directives (<code><link rel=preload></code>) are another big case where the preloaded
resources are stored in the Memory Cache.</p>
<p>Otherwise, it's also possible that a previous DOM node or CSS rule triggered a fetch for the same
resource. For example, a page can contain multiple <code><img></code> elements all
with the same <code>src</code> attribute, which fetch only a single resource. The
mechanism enabling those multiple elements to fetch only a single
resource is the Memory Cache.</p>
<p>But, the Memory Cache would not give requests a matching resource that
easily. Obviously, in order for a request and a resource to match, they must
have matching URLs. But, that's not sufficient. They must also have
a matching resource type (so a resource fetched as a script
cannot match a request for an image), CORS mode and a few other characteristics.</p>
<picture>
<source sizes="(min-width: 800px) 800px, 100vw" srcset="https://blog.yoav.ws/img/caches/mismatch_1600.webp 1600w, https://blog.yoav.ws/img/caches/mismatch_2114.webp 2114w, https://blog.yoav.ws/img/caches/mismatch_400.webp 400w, https://blog.yoav.ws/img/caches/mismatch_800.webp 800w" type="image/webp" />
<img sizes="(min-width: 800px) 800px, 100vw" srcset="https://blog.yoav.ws/img/caches/mismatch_1600.png 1600w, https://blog.yoav.ws/img/caches/mismatch_2114.png 2114w, https://blog.yoav.ws/img/caches/mismatch_400.png 400w, https://blog.yoav.ws/img/caches/mismatch_800.png 800w" />
</picture>
<aside>
The matching characteristics for requests from the Memory Cache are not
well-defined in specifications, and therefore may slightly vary between
browser implementations. Bleh.
</aside>
<p>One thing that Memory Cache doesn't care about is HTTP semantics. If the
resource stored in it has <code>max-age=0</code> or <code>no-cache</code> <code>Cache-Control</code> headers, that's not something that Memory Cache cares about.
Since it's allowing the reuse of the resource in the current navigation,
HTTP semantics are not that important here.</p>
<picture>
<source sizes="(min-width: 800px) 800px, 100vw" srcset="https://blog.yoav.ws/img/caches/non_cacheable_1600.webp 1600w, https://blog.yoav.ws/img/caches/non_cacheable_2059.webp 2059w, https://blog.yoav.ws/img/caches/non_cacheable_400.webp 400w, https://blog.yoav.ws/img/caches/non_cacheable_800.webp 800w" type="image/webp" />
<img sizes="(min-width: 800px) 800px, 100vw" srcset="https://blog.yoav.ws/img/caches/non_cacheable_1600.png 1600w, https://blog.yoav.ws/img/caches/non_cacheable_2059.png 2059w, https://blog.yoav.ws/img/caches/non_cacheable_400.png 400w, https://blog.yoav.ws/img/caches/non_cacheable_800.png 800w" />
</picture>
<p>The only exception to that is <code>no-store</code> directives which the memory
cache does respect in certain situations (for example, when the resource
is reused by a separate node).</p>
<p>So, Questy went ahead and asked the Memory Cache for a matching
resource. Alas, one was not to be found.</p>
<p>Questy did not give up. It got past the Resource Timing and DevTools
network registration point, where it registered as a request looking for a resource (which meant it would now show up in DevTools as well as in Resource Timing, assuming it eventually found its resource).</p>
<p>After that administrative part was done, it relentlessly continued towards the...</p>
<h2 id="service-worker-cache"><a class="direct-link" href="https://blog.yoav.ws/posts/tale-of-four-caches/#service-worker-cache">#</a> Service Worker Cache</h2>
<p>Unlike the Memory Cache, the <a href="https://developers.google.com/web/fundamentals/getting-started/primers/service-workers">Service Worker</a> doesn't follow any conventional rules.
It is, in a way, unpredictable, abiding only by what its master, the
Web developer, tells it.</p>
<figure>
<picture>
<source sizes="(min-width: 800px) 400px, 50vw" srcset="https://blog.yoav.ws/img/caches/service_worker_400.webp 400w, https://blog.yoav.ws/img/caches/service_worker_800.webp 800w, https://blog.yoav.ws/img/caches/service_worker_975.webp 975w" type="image/webp" />
<img sizes="(min-width: 800px) 400px, 50vw" srcset="https://blog.yoav.ws/img/caches/service_worker_400.png 400w, https://blog.yoav.ws/img/caches/service_worker_800.png 800w, https://blog.yoav.ws/img/caches/service_worker_975.png 975w" />
</picture>
<figcaption>A hard-working service worker</figcaption>
</figure>
<p>First of all, it only exists if a Service Worker was installed by the page.
And since its logic is defined by the Web developer using JavaScript, rather than built into the browser,
Questy had no idea if it would find a resource for it, and even if it would, would that resource be everything it dreamed of?
Would it be a matching resource, stored in its cache? Or just a crafted
response, created by the twisted logic of the Service Worker's master?</p>
<p>No one can tell. Since Service Workers are given their own logic,
matching requests to potential resources (wrapped in a Response object)
can be done any way they see fit.</p>
<p>Service Worker has a cache API, which enables it to keep resources
around. One major difference between it and the Memory Cache is that it
is persistent. Resources stored in that cache are kept around, even if
the tab closes or the browser restarts. One case where they get evicted from the
cache is when the developer explicitly evicts them (using
<code>cache.delete(resource)</code>). Another case happens if the browser runs out of storage space, and in that case, the
<em>entire</em> Service Worker cache gets nuked, along with all other origin storage, such as indexedDB, localStorage, etc. That way, the Service Worker can know that
the resources in that cache are in sync among themselves and with other
origin storage.</p>
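<p>To make that concrete, a minimal Service Worker that fills its cache at install time and answers matching requests from it later could look something like this (the cache name and URL list are made up):</p>
<pre><code>// sw.js - a minimal sketch. The cache name and the URL list are made up.
self.addEventListener('install', function (event) {
  event.waitUntil(
    caches.open('my-cache-v1').then(function (cache) {
      return cache.addAll(['/styles.css', '/app.js', '/logo.png']);
    })
  );
});

self.addEventListener('fetch', function (event) {
  event.respondWith(
    caches.match(event.request).then(function (cached) {
      // Serve from the Service Worker cache if we have it,
      // otherwise continue on to the network stack.
      return cached || fetch(event.request);
    })
  );
});
</code></pre>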
<p>The Service Worker is responsible for a certain scope, which at most,
is limited to a single host.
Service Workers can therefore only serve responses to requests requested from a document inside that scope.</p>
<p>Questy went up to the Service Worker and asked it if it has a resource
for it. But the Service Worker had never seen that resource coming from
that scope before and
therefore had no corresponding resource to give Questy. So
Service Worker sent Questy to carry on (using a <code>fetch()</code> call), and continue searching for a
resource in the treacherous lands of the network stack.</p>
<p>And once in the network stack, the best place to look for a resource was
the...</p>
<h2 id="http-cache"><a class="direct-link" href="https://blog.yoav.ws/posts/tale-of-four-caches/#http-cache">#</a> HTTP Cache</h2>
<p>The HTTP cache (also sometimes called "Disk cache" among its friends) is
quite different from the caches Questy had seen before it.</p>
<p>On the one hand, it is persistent, allowing resources to be reused between sessions and
even across sites. If a resource was cached by one site, there's no
problem for the HTTP cache to allow its reuse by other sites.</p>
<p>At the same time, the HTTP cache abides by HTTP semantics (the name
kinda gives that part away).
It will happily serve resources that it
considers "fresh" (based on caching lifetime, indicated by their
response's caching headers), revalidate resources that need
<a href="https://www.mnot.net/cache_docs/#VALIDATE">revalidation</a>, and refuse to store resources that it shouldn't store.</p>
<figure>
<picture>
<source sizes="(min-width: 800px) 400px, 50vw" srcset="https://blog.yoav.ws/img/caches/http_cache_400.webp 400w, https://blog.yoav.ws/img/caches/http_cache_765.webp 765w" type="image/webp" />
<img sizes="(min-width: 800px) 400px, 50vw" srcset="https://blog.yoav.ws/img/caches/http_cache_400.png 400w, https://blog.yoav.ws/img/caches/http_cache_765.png 765w" />
</picture>
<figcaption>An overly strict HTTP cache</figcaption>
</figure>
<p>Since it's a persistent cache, it also needs to evict resources, but
unlike the Service Worker cache, resources can be evicted one by one,
whenever the cache feels like it needs the space to store more
important or more popular resources.</p>
<p>The HTTP cache has a memory based component, where resource matching is
being done for requests coming in. But if it actually finds a matching
resource, it needs to fetch the resource contents from disk, which can
be an expensive operation.</p>
<aside>
<p>We mentioned before that the HTTP Cache respects HTTP semantics. That's
almost entirely true. There is one exception to that, when the HTTP
cache stores resources for a limited amount of time. Browsers have the
ability to prefetch resources for the next navigation. That can be done
with explicit hints (<code><link rel=prefetch></code>) or with the browser's internal
heuristics. Those prefetched
resources need to be kept around until next navigation, even if they are
not cacheable. So when such a prefetched resource arrives at the HTTP
cache, it is cached (and served without revalidation) for a period of 5
minutes.</p>
</aside>
<p>The HTTP cache seemed rather strict, but Questy built up the courage to
ask it if it has a matching resource for it. The response was negative
:/</p>
<p>It will have to continue on towards the network. The journey over the
network is scary and unpredictable, but Questy knew that it must find
its resource no matter what. So it carried on. It found a
corresponding HTTP/2 session, and was well on its way to be sent over
the network, when suddenly it saw the...</p>
<h2 id="push-%22cache%22"><a class="direct-link" href="https://blog.yoav.ws/posts/tale-of-four-caches/#push-%22cache%22">#</a> Push "Cache"</h2>
<p>The Push cache (better described as the "unclaimed push streams
container", but that's less catchy as names go) is where HTTP/2 push resources are stored. They are stored
as part of an HTTP/2 session, which has several implications.</p>
<figure>
<picture>
<source sizes="(min-width: 800px) 400px, 50vw" srcset="https://blog.yoav.ws/img/caches/push_cache_400.webp 400w, https://blog.yoav.ws/img/caches/push_cache_800.webp 800w, https://blog.yoav.ws/img/caches/push_cache_890.webp 890w" type="image/webp" />
<img sizes="(min-width: 800px) 400px, 50vw" srcset="https://blog.yoav.ws/img/caches/push_cache_400.png 400w, https://blog.yoav.ws/img/caches/push_cache_800.png 800w, https://blog.yoav.ws/img/caches/push_cache_890.png 890w" />
</picture>
<figcaption>The unclaimed push stream container AKA the push cache</figcaption>
</figure>
<p>The container is in no way persistent.
If the session is terminated, all the resources which weren't claimed
(i.e. were not matched with a request for them) are gone. If a resource
is fetched using a different HTTP/2 session, it <a href="https://bugs.chromium.org/p/chromium/issues/detail?id=669515">won't get
matched</a>.
On top of that, resources are kept around in the push cache container only for a
limited amount of time. (~5 minutes in Chromium-based browsers)</p>
<p>The push cache matches a request to a resource according to its URL, as well as its
various request headers, but it does not apply strict HTTP semantics.</p>
<aside>
The push cache is also not well-defined in specs and implementations may
vary between browsers, operating systems and other HTTP/2 clients.
</aside>
<p>Questy had little faith, but still it asked the push cache if it has a
matching resource. And to its surprise, it did!!
Questy adopted the resource (which meant it removed the HTTP/2 stream from the unclaimed container) and was happy as a clam. Now it can start
making its way back to the renderer with its resource.</p>
<p>On their way back, they went across the HTTP cache, which stopped them
along the way to take a copy of the resource and store it in case future
requests would need it.</p>
<p>Once they made it out of the net stack and back in Service Worker land,
the Service Worker also stored a copy of the resource in its cache,
before sending both back to the renderer.</p>
<p>And finally, once they got back to the renderer, Memory Cache kept a
reference of the resource (rather than a copy), that it can use to
assign the same resource to future requests in that same navigation
session that may need it.</p>
<picture>
<source sizes="(min-width: 800px) 800px, 100vw" srcset="https://blog.yoav.ws/img/caches/happilly_ever_after_1600.webp 1600w, https://blog.yoav.ws/img/caches/happilly_ever_after_2450.webp 2450w, https://blog.yoav.ws/img/caches/happilly_ever_after_400.webp 400w, https://blog.yoav.ws/img/caches/happilly_ever_after_800.webp 800w" type="image/webp" />
<img sizes="(min-width: 800px) 800px, 100vw" srcset="https://blog.yoav.ws/img/caches/happilly_ever_after_1600.png 1600w, https://blog.yoav.ws/img/caches/happilly_ever_after_2450.png 2450w, https://blog.yoav.ws/img/caches/happilly_ever_after_400.png 400w, https://blog.yoav.ws/img/caches/happilly_ever_after_800.png 800w" />
</picture>
<p>And they lived happily ever after, until the document got detached and
both got to meet the Garbage Collector.</p>
<picture>
<source sizes="(min-width: 800px) 800px, 100vw" srcset="https://blog.yoav.ws/img/caches/garbage_collector_400.webp 400w, https://blog.yoav.ws/img/caches/garbage_collector_725.webp 725w" type="image/webp" />
<img sizes="(min-width: 800px) 800px, 100vw" srcset="https://blog.yoav.ws/img/caches/garbage_collector_400.png 400w, https://blog.yoav.ws/img/caches/garbage_collector_725.png 725w" />
</picture>
<p>But that's a story for another day.</p>
<h2 id="takeaways"><a class="direct-link" href="https://blog.yoav.ws/posts/tale-of-four-caches/#takeaways">#</a> Takeaways</h2>
<p>So, what can we learn from Questy's journey?</p>
<ul>
<li>Different requests can get matched by resources in different caches of
the browser.</li>
<li>The cache from which the request got matched can have an impact on the
way this request is represented in DevTools and Resource Timing.</li>
<li>Pushed resources are not stored persistently unless their stream got adopted
by a request.</li>
<li>Non-cacheable preloaded resources won't be around for the next
navigation. That's one of the major differences between preload and
prefetch.</li>
<li>There are many underspecified areas here where observable behavior may differ between
browser implementations. We need to fix that.</li>
</ul>
<p>All in all, if you're using preload, H2 push, Service Worker or other
advanced techniques when trying to speed up your site, you may notice
a few cases where the internal cache implementation is showing. Being
aware of these internal caches and how they operate might help you to better
understand what is going on and hopefully help to avoid unnecessary
frustrations.</p>
<p><em>Thanks to Tim Kadlec and Jake Archibald for reviewing an early version
of this article. And huge thanks to Yaara Weiss for this article's illustrations
and for being an awesome kid in general.</em></p>
Google2018-09-07T00:00:00Zhttps://blog.yoav.ws/posts/google/<p>One of the biggest advantages of working on the web has always been the
people. Ever since I started contributing to the web platform, back in 2013, working with browser engineers was always a great experience. An amazing collection of super smart, thoughtful and kind folks.</p>
<p>That’s what kept me invested in the Chromium project throughout the years since then as an external contributor. First when I was working for myself, as a nighttime hobby, then as part of the Responsive Images Community Group, and finally working on browsers and standards as part of my job at Akamai.</p>
<p>Therefore, I’m beyond thrilled to announce that I’m joining Google as part of Chrome’s Developer Relations team, to work on browsers and standards as my full time job! I’ll be focusing on Web performance and the web ecosystem, so making the web faster would be my literal job description!! \o/</p>
<p>My new role will enable me to focus 100% of my time on standards, as well as make sure that all parts of the ecosystem are part of the solution rather than part of the problem. So while my main focus will be standards and browser features, I’ll happily help fight slowness wherever it makes most sense to do so, be it frameworks, CMS, build systems, CDNs, or elsewhere.</p>
<p>I’m extremely thankful to Akamai for enabling me to work on standards during my wonderful (almost) 4 years there. I had the privilege of working there with extremely talented people, and feel that together we harnessed the power of standards and browser work to make our users’ lives better. I’m grateful for that, confident that I’m leaving that work in great hands, and looking forward to continuing to collaborate with the Akamai team as part of my new role at Google.</p>
<p>Today is my last day at Akamai. Next week, I’ll be traveling to SmashingConf. After that, I will take a couple of weeks off to unwind, before starting at Google on October 1st, and to be perfectly honest, I can’t wait!! I’m super excited about working side-by-side with this new-yet-familiar team, in order to solve this web performance thing once and for all!!</p>
WebPerfWG F2F summary - June 20192019-07-30T00:00:00Zhttps://blog.yoav.ws/posts/webperfwg_f2f_2019/<p>Last month the Web Performance WG had a face to face meeting (or F2F, for short), and I wanted to write down something about it ever since. Now I’m stuck on an airplane, so here goes! :)</p>
<p>The F2F was extremely fun and productive. While most of the <s>usual suspects</s> group’s regular attendees were there, we also had a lot of industry folks that recently joined, which made the event extremely valuable. Nothing like browser and industry folks interaction to get a better understanding of each other’s worlds and constraints.</p>
<p>If you’re curious, <a href="https://docs.google.com/document/d/e/2PACX-1vQ8mmZftJCz6SE9Fa26Zqi2KGwtWYr0HTiH1_XxHEE50Gz4NdYpMvoKQs2Gqb3iE-CKv32vmaSQ3kDt/pub">detailed minutes</a> as well as the <a href="https://youtu.be/eyAW4FuSgyE">video</a> from the meeting are available. But for the sake of those of you who don’t have over 6.5 hours to kill, here’s a summary!</p>
<h2 id="highlights"><a class="direct-link" href="https://blog.yoav.ws/posts/webperfwg_f2f_2019/#highlights">#</a> Highlights</h2>
<ul>
<li>~31 attendees (physical and remote) from all major browser vendors, but also from analytics providers (SpeedCurve and Akamai), content platforms (Salesforce, Shopify), large web properties (Facebook, Wikipedia, Microsoft Excel), as well as ad networks (Google Ads and Microsoft News).</li>
<li>We spent the morning with a series of 7 presentations from industry folks, getting an overview of how they are using the various APIs that the group has developed, what works well, and most importantly, where the gaps are.</li>
<li>Then we spent the afternoon diving into those gaps, and worked together to better define the problem space and how we’d go about tackling it. As part of that we talked about
<ul>
<li>Memory reporting APIs</li>
<li>Scheduling APIs</li>
<li>CPU reporting</li>
<li>Single page apps metrics</li>
</ul>
</li>
</ul>
<h2 id="industry-sessions"><a class="direct-link" href="https://blog.yoav.ws/posts/webperfwg_f2f_2019/#industry-sessions">#</a> Industry sessions</h2>
<p>We heard presentations from Microsoft Excel, Akamai, Wikimedia, Salesforce, Shopify, Microsoft News and Google Ads. My main takeaways from these sessions were:</p>
<h3 id="high-level-themes"><a class="direct-link" href="https://blog.yoav.ws/posts/webperfwg_f2f_2019/#high-level-themes">#</a> High-level themes</h3>
<ul>
<li>Customers of analytics vendors are not performance experts, so automatic solutions that require no extra work from them are likely to get significantly more adoption.</li>
<li>Measuring performance entries using PerformanceObserver currently requires developers to run their scripts early, or deal with missing entries. Buffering performance entries by default will help avoid that anti-pattern (see the sketch after this list).</li>
<li>When debugging performance issues only seen in the wild based on RUM data, better attribution can help significantly. We need to improve the current state on that front.</li>
<li>Image compression and asset management is hard.
<ul>
<li>Client Hints will help</li>
<li>Image related Feature Policy instructions can also provide early development-time warnings for oversized images.</li>
</ul>
</li>
<li>Origin Trials are great for web properties, but CMSes and analytics vendors cannot really opt into them on behalf of their customers. Those systems don’t always control the site’s headers, and beyond that, managing those trials adds a lot of complexity.</li>
</ul>
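<p>The buffering issue mentioned in the list above refers to the pattern below: registering a <code>PerformanceObserver</code> late, with the <code>buffered</code> flag (still rolling out across browsers at the time), so that entries dispatched before registration are still delivered. The proposal is to make that buffering the default:</p>
<pre><code>// Registering a PerformanceObserver late, with the `buffered` flag, so that
// entries dispatched before registration are still delivered. The proposal
// discussed above is about making this buffering the default behavior.
const po = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    console.log(entry.entryType, Math.round(entry.startTime), Math.round(entry.duration));
  }
});
po.observe({ type: 'largest-contentful-paint', buffered: true });
po.observe({ type: 'first-input', buffered: true });
</code></pre>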
<h3 id="missing-apis"><a class="direct-link" href="https://blog.yoav.ws/posts/webperfwg_f2f_2019/#missing-apis">#</a> Missing APIs</h3>
<h4 id="single-page-apps-measurement"><a class="direct-link" href="https://blog.yoav.ws/posts/webperfwg_f2f_2019/#single-page-apps-measurement">#</a> Single page apps measurement</h4>
<ul>
<li>Single Page apps don’t currently have any standard methods to measure their performance. It became clear that the group needs to build standards methods to address that. See related <a href="https://blog.yoav.ws/posts/webperfwg_f2f_2019/#single-page-app-metrics">discussion</a>.</li>
</ul>
<h4 id="runtime-performance-measurement"><a class="direct-link" href="https://blog.yoav.ws/posts/webperfwg_f2f_2019/#runtime-performance-measurement">#</a> Runtime performance measurement</h4>
<ul>
<li><a href="https://wicg.github.io/frame-timing/">Frame timing</a>
<ul>
<li>Sites want to keep track of long frames that reflect performance issues in the wild. Currently there’s no performance-friendly way to do that, so they end up (ab)using rAF for that purpose (see the sketch after this list), draining users’ batteries.</li>
</ul>
</li>
<li>More metrics regarding the main thread and what is keeping it busy</li>
</ul>
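<p>The rAF (ab)use mentioned in the list above looks roughly like the sketch below, which is exactly the kind of battery-draining polling a dedicated frame timing API would make unnecessary:</p>
<pre><code>// The requestAnimationFrame polling described above: measuring the gap
// between consecutive frames to spot long ones. It works, but it keeps the
// page permanently "animating", which is bad for battery life.
let lastFrameTime = performance.now();
function onFrame(now) {
  const delta = now - lastFrameTime;
  if (delta > 50) {
    console.log('Long frame:', Math.round(delta), 'ms');
  }
  lastFrameTime = now;
  requestAnimationFrame(onFrame);
}
requestAnimationFrame(onFrame);
</code></pre>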
<h4 id="scheduling"><a class="direct-link" href="https://blog.yoav.ws/posts/webperfwg_f2f_2019/#scheduling">#</a> Scheduling</h4>
<ul>
<li>A rendering signal to tell the browser that it should stop focusing on processing DOM nodes and render what it has.</li>
<li>Isolation of same-origin iframes, to prevent main-thread interference between them and the main content</li>
<li>Lazy loading JS components can often mean that other page components will hog the main thread. Better scheduling APIs and <a href="https://wicg.github.io/is-input-pending/"><code>isInputPending</code></a>/<code>isPaintPending</code> APIs can help with that.</li>
<li>There are inherent trade-offs between the user’s need to see the content, the advertiser’s needs to capture user’s attention and the publisher’s needs to make money. Having some explicit signals from the publisher on the preferred trade-off can enable the browser to better align with the publisher’s needs. (but can also result in worse user experience)</li>
</ul>
<h4 id="backend-performance-measurement"><a class="direct-link" href="https://blog.yoav.ws/posts/webperfwg_f2f_2019/#backend-performance-measurement">#</a> Backend performance measurement</h4>
<ul>
<li>Different CMSes suffer from similar issues around app and theme profiling. Would be good to define a common convention for that. <a href="https://w3c.github.io/server-timing/">Server-Timing</a> can help (see the sketch after this list).</li>
</ul>
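<p>As a sketch of what such a convention could build on, a back-end can already report its timing breakdown via the <code>Server-Timing</code> header, and RUM scripts can read it back from the <code>serverTiming</code> field of navigation and resource timing entries. The metric names and durations below are made up:</p>
<pre><code>// A sketch of reporting back-end timing via the Server-Timing header.
// The metric names and durations are made up.
const http = require('http');

http.createServer((req, res) => {
  res.setHeader('Server-Timing', 'db;dur=53, theme;dur=47.2, total;dur=112');
  res.end('<!doctype html>...');
}).listen(8080);

// On the client, these entries show up as:
//   performance.getEntriesByType('navigation')[0].serverTiming
</code></pre>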
<h4 id="device-information"><a class="direct-link" href="https://blog.yoav.ws/posts/webperfwg_f2f_2019/#device-information">#</a> Device information</h4>
<ul>
<li>CPU reporting as well as “browser health”, which came back as a recurrent theme, so we decided to dedicate an afternoon session to hashing out that problem. See discussion <a href="https://blog.yoav.ws/posts/webperfwg_f2f_2019/#cpu-reporting">below</a>.</li>
</ul>
<h4 id="memory-reporting"><a class="direct-link" href="https://blog.yoav.ws/posts/webperfwg_f2f_2019/#memory-reporting">#</a> Memory reporting</h4>
<ul>
<li>Memory leak detection would be really helpful. We discussed this one later as well.</li>
</ul>
<h3 id="gaps-in-current-apis"><a class="direct-link" href="https://blog.yoav.ws/posts/webperfwg_f2f_2019/#gaps-in-current-apis">#</a> Gaps in current APIs</h3>
<ul>
<li><a href="https://w3c.github.io/longtasks/">Long Tasks</a>’ lack of attribution is a major reason why people haven’t adopted the API</li>
<li><a href="https://w3c.github.io/resource-timing/">Resource Timing</a> needs some long-awaited improvements: better initiator, non-200 responses, insight into in-flight requests, “fetched from cache” indication, and resource processing time.</li>
<li>Making metrics from same-origin iframes available to the parent would simplify analytics libraries.</li>
<li>Current API implementations suffer from bugs which make it hard to tell signal from noise. Implementations need to do better.</li>
</ul>
<h2 id="deep-dives"><a class="direct-link" href="https://blog.yoav.ws/posts/webperfwg_f2f_2019/#deep-dives">#</a> Deep Dives</h2>
<h3 id="js-memory-api"><a class="direct-link" href="https://blog.yoav.ws/posts/webperfwg_f2f_2019/#js-memory-api">#</a> JS Memory API</h3>
<p><a href="https://github.com/ulan">Ulan Degenbaev</a> from Google’s Chrome team <a href="https://docs.google.com/presentation/d/1o8KpUJclVBDKfnOczvlUJzWulh6dsn1AJtgRTULxuhY/edit">presented</a> his work on a Javascript heap memory accounting API. The main use-case he’s trying to tackle is one of regression detection - getting memory accounting for their apps will enable developers to see when things changed significantly, and catch memory regressions quickly.</p>
<p>This is not the first proposal on that front - but previous proposals to tackle memory reporting have run into trouble when it comes to security. This proposal internalized that lesson and comes with significant cross-origin protections from the start.</p>
<p>After his presentation, a discussion ensued about the security characteristics of the proposal and reporting of various scenarios (e.g. where should objects created in one iframe and moved to another be reported?)</p>
<p>Afterwards the discussion drifted towards reporting of live memory vs. GCed memory, and trying to figure out if there’s a way to report GCed memory (which is a much cleaner signal for memory regression detection), without actually exposing GC times, which can pose a security risk.</p>
<p>Then we discussed the Memory Pressure API - another older proposal that was never implemented, but that now sees renewed interest from multiple browser vendors.</p>
<h3 id="scheduling-apis"><a class="direct-link" href="https://blog.yoav.ws/posts/webperfwg_f2f_2019/#scheduling-apis">#</a> Scheduling APIs</h3>
<p><a href="https://twitter.com/scotthaseley">Scott Haseley</a> from the Chrome team <a href="https://docs.google.com/presentation/d/1GUB081FTpvFEwEkfePagFEkiqcLKKnIHkhym-I8tTd8/edit?usp=sharing">talked about</a> his work on main thread scheduling APIs, in order to help developers break-up long tasks. Many frameworks have their own scheduler, but without an underlying platform primitive, they cannot coordinate tasks between themselves and between them and the browser. Notifying the browser of the various tasks and their priority will help bridge that gap. The presentation was followed by a discussion on cases where this would be useful (e.g. rendering/animating while waiting for user input, cross-framework collaboration), and whether <a href="https://en.wikipedia.org/wiki/Priority_inversion">priority inversion</a> should necessarily be resolved by the initial design.</p>
<h3 id="cpu-reporting"><a class="direct-link" href="https://blog.yoav.ws/posts/webperfwg_f2f_2019/#cpu-reporting">#</a> CPU reporting</h3>
<p>Many folks in the room wanted answers to two questions: how much theoretical processing power does the user’s device have, and how much of that power is currently available to me?</p>
<p>The use cases for this range from debugging user complaints and normalizing performance results to blocking some 3rd party content on low-powered devices, or serving content which requires less CPU (e.g. replacing video ads with still-image ones).</p>
<p>Currently these use-cases are somewhat tackled by User-Agent string based profiling, but that’s inaccurate and cumbersome.</p>
<p>There’s also an obvious tension in such an API, as it’s likely to expose a lot of fingerprintable entropy, so whatever we come up with needs to expose as few bits as possible.</p>
<p>Finally, we managed to form a task force to gather up all the <a href="https://docs.google.com/document/d/10vO0eRqLjY6SSWNurVwW_pf8g0hg1e63OUoxiEQDZiQ/edit#">use cases</a> so that we can outline what a solution may look like.</p>
<h3 id="single-page-app-metrics"><a class="direct-link" href="https://blog.yoav.ws/posts/webperfwg_f2f_2019/#single-page-app-metrics">#</a> Single Page App metrics</h3>
<p>Many of our metrics are focused around page load: <a href="https://w3c.github.io/navigation-timing/">navigation timing</a>, <a href="https://w3c.github.io/paint-timing/">paint timing</a>, <a href="https://wicg.github.io/event-timing/#sec-performance-event-timing">first-input timing</a>, <a href="https://wicg.github.io/largest-contentful-paint/">largest-contentful-paint</a> and maybe others. There’s no way for developers or frameworks to declare a “soft navigation”, which will reset the clock on those metrics, notify the browser when a new navigation starts, enable by-default buffering of entries related to this soft navigation and also potentially help terminate any in-flight requests that are related to the previous soft navigation.</p>
<p>Analytics providers use a bunch of heuristics to detect soft-navigations, but it’s tricky, fragile and inaccurate. An explicit signal would’ve been significantly better.</p>
<p>During that session we discussed current heuristic methods (e.g. watching pushState usage) and whether they can be a good fit for in-browser heuristics, or if an explicit signal is likely to be more successful.</p>
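<p>For the sake of concreteness, here's a minimal sketch of the pushState-watching heuristic described above, roughly the kind of thing analytics libraries do today (the <code>markSoftNavigationStart()</code> helper is illustrative, not a real API):</p>
<pre class="language-javascript"><code class="language-javascript">// Rough heuristic: treat URL changes as the start of a new "soft navigation".
let softNavStart = performance.now();

function markSoftNavigationStart() {
  softNavStart = performance.now();
  // An analytics library would reset its per-"page" metrics here.
}

const origPushState = history.pushState.bind(history);
history.pushState = (...args) => {
  origPushState(...args);
  markSoftNavigationStart();
};
window.addEventListener('popstate', markSoftNavigationStart);
</code></pre>
<p>That kind of wrapping works for the common case, but it's exactly the fragility discussed above: it misses <code>replaceState</code>-based routers, and says nothing about when the new content actually rendered.</p>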
<p>We also had a side discussion about “component navigations” in apps where each one of the components can have a separate lifecycle.</p>
<p>Finally, we agreed that a dedicated task force should gather up the use-cases that will enable us to discuss a solution in further depth.</p>
<h2 id="wg-process"><a class="direct-link" href="https://blog.yoav.ws/posts/webperfwg_f2f_2019/#wg-process">#</a> WG Process</h2>
<p>Finally, we discussed the WG’s process and few points came up:</p>
<ul>
<li>Would be great to have an IM solution to augment the team’s calls.</li>
<li>Transcripts are useful, but scribing is hard. We should try to improve that process.</li>
<li>An onboarding guide can be useful to help new folks get up to speed</li>
<li>Separating issue discussion from design calls helps folks who are not involved in the old bugs’ details. We could also create topic-specific task forces that can have their own dedicated calls.</li>
</ul>
<h2 id="feedback"><a class="direct-link" href="https://blog.yoav.ws/posts/webperfwg_f2f_2019/#feedback">#</a> Feedback</h2>
<p>Personally, I was looking forward to the F2F, and it didn’t disappoint. It felt like a true gathering of the performance community, where browser folks were outnumbered by folks that lead the web performance work in their companies, and work with the group’s APIs on a daily basis.</p>
<p>I was also happy to see that I’m not alone in feeling that the day was extremely productive. One browser vendor representative told me that the event was full of “great presentations with so much actionable info. I feel like every WG should run such a session”, which was a huge compliment.</p>
<p>Another person, from the industry side of things, for whom this was the first WG meeting they attended, said it was “a great conference” and that they “learned a ton”.</p>
<p>And the post-F2F survey results showed a 4.78 (out of 5) score for the question “was the day useful?”, with 78.9% of attendees saying they will definitely attend again and 21.1% saying they will try to make it.</p>
<picture>
<source srcset="https://blog.yoav.ws/img/webperfwg_2019/webperfwg_2019_survey_satisfaction.webp" type="image/webp" />
<img src="https://blog.yoav.ws/img/webperfwg_2019/webperfwg_2019_survey_satisfaction.png" alt="Survey results regarding satisfaction from the F2F: "was the day useful from your perspective?" 19 responses, 15 answered 5/5 and 4 answered 4/5. "How likely are you to attend again?" 20 responses, 80% answered "I am there!!", 20% answered "I will try to make it"" />
</picture>
<p>The only downside is that it seems remote attendance wasn’t as smooth as it should’ve been, scoring 4/5 on average. Folks also wished there was a social event following the work day, which we should totally plan for next time.</p>
<picture>
<source srcset="https://blog.yoav.ws/img/webperfwg_2019/webperfwg_2019_survey_remote.webp" type="image/webp" />
<img src="https://blog.yoav.ws/img/webperfwg_2019/webperfwg_2019_survey_remote.png" alt="Survey results regarding remote participation: 3 results of 3, 4, and 5." />
</picture>
<p>And yes, given the positive feedback the day got, I think we’ll definitely aim to repeat it next year.</p>
<h2 id="hackathon"><a class="direct-link" href="https://blog.yoav.ws/posts/webperfwg_f2f_2019/#hackathon">#</a> Hackathon</h2>
<p>The F2F meeting was followed by a full-day hackathon where we managed to lock a bunch of WG members in a room and hash out various long-standing issues. As a result we had a <a href="https://lists.w3.org/Archives/Public/public-web-perf/2019Jun/0007.html">flurry of activity</a> on the WG’s Github repos:</p>
<ul>
<li>Navigation Timing
<ul>
<li>Made progress on a significant issue around cross-origin reporting of navigation timing, which resulted in a <a href="https://github.com/w3c/navigation-timing/pull/108">PR</a> that since landed.</li>
<li>Otherwise, landed 2 more cleanup PRs, and made some progress on tests.</li>
</ul>
</li>
<li>Performance Timeline
<ul>
<li>Moved supportedEntryTypes to <a href="https://github.com/w3c/performance-timeline/pull/133">use a registry</a>, cleaned up specifications that relied on the previous definition and backported those changes to L2.</li>
</ul>
</li>
<li>requestIdleCallback
<ul>
<li>Had a fruitful discussion on the one remaining <a href="https://github.com/w3c/requestidlecallback/issues/71">thorny issue</a> that’s blocking the spec from shipping, resulting in a clear path towards resolving it.</li>
</ul>
</li>
<li>Page Visibility
<ul>
<li>3 PRs that <a href="https://github.com/w3c/page-visibility/pull/43">removed the “prerender” value</a>, <a href="https://github.com/w3c/page-visibility/pull/46">better defined the behavior when the window is obscured</a>, and <a href="https://github.com/w3c/page-visibility/pull/45">improved the cross-references</a>. Once those land, we’d be very close to shipping the spec.</li>
</ul>
</li>
<li>LongTasks saw 6 closed issues, 3 newly opened ones, 24 comments on 11 more issues, and 4 new cleanup PRs.</li>
<li>Paint Timing got a couple of cleanup PRs.</li>
</ul>
<h2 id="summary"><a class="direct-link" href="https://blog.yoav.ws/posts/webperfwg_f2f_2019/#summary">#</a> Summary</h2>
<p>For me, the main goal of this F2F meeting was to make sure the real-life use cases for performance APIs are clear to everyone working on them, to prevent us going full speed ahead in the wrong direction. I think that goal was achieved.</p>
<p>Beyond that, we made huge progress on subjects that the WG has been noodling on for the last few years: Single-Page-App reporting, CPU reporting and FrameTiming. We’ve discussed them, we have a clear path forward, and we’ve assigned task forces to further investigate those areas and come back with clear use-case documents. I hope we’ll be able to dedicate some WG time at TPAC to discussing potential designs to resolve those use-cases.</p>
<p>Finally, the fact that we were able to see so many new faces in the group was truly encouraging. There’s nothing like having the API’s “customers” in the room to make sure we stay on track and develop solutions that will solve real-life problems, and significantly improve user experience on the web.</p>
<p><em>Thanks to <a href="https://twitter.com/addyosmani">Addy Osmani</a>, <a href="https://twitter.com/kristoferbaxter">Kris Baxter</a>, and <a href="https://twitter.com/igrigorik">Ilya Grigorik</a> for reviewing!</em></p>
So, you don't like a web platform proposal2023-07-20T00:00:00Zhttps://blog.yoav.ws/posts/web_platform_change_you_do_not_like/<p><a href="https://media.tenor.com/4j_WMYRWtSAAAAAd/has-this-ever-happened-to-you-experienced.gif">Has this ever happened to you</a>?</p>
<p>You wake up one morning, scrolling the feeds while sipping your coffee, when all of a sudden you land on a post related to a web platform proposal that you really don't like. Worse, one that you believe would have significant negative consequences for the web if shipped?</p>
<p>At that point, you may feel that your insights and experience can be valuable to help steer the platform from making what you're sure is a huge mistake. That's great!! Getting involved in web platform discussions is essential to ensure it's built <a href="https://blog.yoav.ws/posts/by_the_people/">for and by everyone</a>.</p>
<p>At the same time, there are some pitfalls you may want to avoid when engaging in those discussions.</p>
<p>Given that the above has certainly <a href="https://groups.google.com/a/chromium.org/g/blink-dev/c/u9nGlmrm0Dg/m/wyJ4W5gagTgJ">happened to me</a>, here are some lessons I learned in my years working on the web platform, both before and after I was employed by a browser vendor.</p>
<h2 id="things-to-bear-in-mind"><a class="direct-link" href="https://blog.yoav.ws/posts/web_platform_change_you_do_not_like/#things-to-bear-in-mind">#</a> Things to bear in mind</h2>
<h3 id="don't-assume-consensus-nor-finished-state"><a class="direct-link" href="https://blog.yoav.ws/posts/web_platform_change_you_do_not_like/#don't-assume-consensus-nor-finished-state">#</a> Don't assume consensus nor finished state</h3>
<p>Often a proposal is just that - someone trying to solve a problem by proposing technical means to address it.
Having a proposal sent out to public forums doesn't necessarily imply that the sender's employer is determined on pushing that proposal as is.</p>
<p>It also doesn't mean that the proposal is "done" and the proposal authors won't appreciate constructive suggestions for improvement.
Different proposals may be in <a href="https://blog.chromium.org/2019/11/intent-to-explain-demystifying-blink.html#:~:text=Step%201%20%2D%20Initial%20research">different stages</a> of their development, and early stage proposals are often extremely malleable.</p>
<p>All that means is that with the <em>right</em> kind of feedback at the right time you can raise concerns early, and significantly increase the chance they would be properly addressed and mitigated.</p>
<h3 id="don't-assume-a-hidden-agenda"><a class="direct-link" href="https://blog.yoav.ws/posts/web_platform_change_you_do_not_like/#don't-assume-a-hidden-agenda">#</a> Don't assume a hidden agenda</h3>
<p>When thinking about a new proposal, it's often safe to assume that <a href="https://en.wikipedia.org/wiki/Occam%27s_razor">Occam's razor</a> is applicable and the reason it is being proposed is that the team proposing it is trying to tackle the use cases the proposal handles.
While the full set of organizational motivations behind supporting certain use cases may not always be public (e.g. a new device type not yet announced, legal obligations, etc), the use cases themselves should give a clear enough picture of what is being solved.</p>
<h3 id="avoid-legal-language"><a class="direct-link" href="https://blog.yoav.ws/posts/web_platform_change_you_do_not_like/#avoid-legal-language">#</a> Avoid legal language</h3>
<p>The fastest way to get someone working for a large corporation to disengage from a discussion is by using legal or quasi-legal language. Such language will prevent them from replying to your claims without talking to their corporate legal counsel, which will probably mean they will <em>not</em> reply to your claims. If you want to have a productive exchange with the folks making the proposal, it's best to not pretend you're a lawyer. (and if you are one, may be best to pretend you're not)</p>
<h3 id="we're-all-humans"><a class="direct-link" href="https://blog.yoav.ws/posts/web_platform_change_you_do_not_like/#we're-all-humans">#</a> We're all humans</h3>
<p>Everyone working on the web platform is a human being, with human feelings, who's trying to do their job. Even if you disagree with their choice of employment, their technical decisions or their conclusions, that doesn't change that fact.</p>
<p>To be more concrete and clear, personal attacks or threats addressed at the folks working on the platform are <em>not OK</em>. That's not how you get your voice heard, that's how you get yourself banned!</p>
<h2 id="what-should-i-do-then%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/web_platform_change_you_do_not_like/#what-should-i-do-then%3F">#</a> What should I do then?</h2>
<h3 id="be-the-signal%2C-not-the-noise"><a class="direct-link" href="https://blog.yoav.ws/posts/web_platform_change_you_do_not_like/#be-the-signal%2C-not-the-noise">#</a> Be the signal, not the noise</h3>
<p>When it comes to controversial browser proposals (or lack of adoption for features folks want, which is a related, but different, subject), it's not uncommon to see issues with dozens or even hundreds of comments from presumably well-intentioned folks, trying to influence the team working on the feature to change their minds.</p>
<p>In the many years I've been working on the web platform, I've yet to see this work. Not even once.</p>
<p>On the receiving end, this creates a deluge of emails that's very hard to sort out. While some of those may be full of technical insights, it's very hard to find them in that pile and distinguish them from the other forms of commentary. So while it may feel good to join a good old-fashioned internet pile-on, it's very unlikely to lead to the outcomes you actually want.</p>
<p>You should instead try to provide meaningful technical feedback (more on that in the next section), and do that in places where that signal is less likely to drown in the noise.</p>
<h3 id="provide-technical-arguments"><a class="direct-link" href="https://blog.yoav.ws/posts/web_platform_change_you_do_not_like/#provide-technical-arguments">#</a> Provide technical arguments</h3>
<p>There are a few things you want to focus on when debating technical proposals.</p>
<h4 id="use-cases"><a class="direct-link" href="https://blog.yoav.ws/posts/web_platform_change_you_do_not_like/#use-cases">#</a> Use cases</h4>
<p>The use cases the proposal tackles are typically the core of the problem the team pushing the proposal is trying to solve. Everything else flows from that.
Focusing on use cases would enable you to distill the essence of the proposal, and potentially propose alternatives that still address them without the bits you find harmful or risky.</p>
<p>In some cases, you may consider the use cases themselves to be ones you think shouldn't be supported on the web. If that's the case, if I'm being honest, you're in for an uphill battle. But you can still make your case by building a solid argument as to why these use cases shouldn't be supported on the web, while considering the different trade-offs that support for them or lack thereof would entail. At the very least, that would help you establish a common language with the feature's proponents and have a frank discussion regarding the trade-offs.</p>
<p>In other cases, adjacent use cases you may care about are not covered by the proposal. Raising issues on that front can help expand the proposal to cover those use cases or at the very least ensure that it can be expanded in the future.</p>
<h4 id="risks"><a class="direct-link" href="https://blog.yoav.ws/posts/web_platform_change_you_do_not_like/#risks">#</a> Risks</h4>
<p>If the proposal contains risks in terms of <a href="https://docs.google.com/document/d/1RC-pBBvsazYfCNNUSkPqAVpSpNJ96U8trhNkfV0v9fk/edit">compatibility</a>, <a href="https://docs.google.com/document/d/1romO1kHzpcwTwPsCzrkNWlh0bBXBwHfUsCt98-CNzkY/edit#heading=h.mxgkfxtgzxqs">interoperability</a>, or any other risks to the open and safe nature of the web platform, that's something worth pointing out.</p>
<p>Any such risks need to be addressed by the proposal and properly mitigated before that feature is shipped. That doesn't mean that any claim for risks would be taken at face value, but if your arguments about the risk are sound, you can expect the proposal owners to respond to them.</p>
<h4 id="considered-alternatives"><a class="direct-link" href="https://blog.yoav.ws/posts/web_platform_change_you_do_not_like/#considered-alternatives">#</a> Considered alternatives</h4>
<p>Another area to focus on is what an alternative proposal that addresses the use cases may look like. In many cases, such alternatives are already outlined in the proposal's explainer, with their trade-offs spelled out. But it's also possible that some reasonable alternative was not considered, and could be an improvement on the current proposal.
If such an alternative comes to mind, that could be good feedback to the team working on the feature, so that they can consider it and potentially change course.</p>
<h3 id="use-professional-language-and-be-kind"><a class="direct-link" href="https://blog.yoav.ws/posts/web_platform_change_you_do_not_like/#use-professional-language-and-be-kind">#</a> Use professional language and be kind</h3>
<p>This should go without saying, but... people are less likely to understand and internalize your constructive feedback when it's littered with distracting and unprofessional language.</p>
<p>Beyond that, you should remember that on the other end of the keyboard there are humans that are trying to do their job to the best that they can. They are most likely stressed out about engaging publicly regarding their project and how it'd be received. Even if you disagree with them or even the premise of their work, providing your feedback with kindness and empathy has literally no downsides. You can deliver the exact same message without the sarcasm.</p>
<h2 id="so%2C-get-(constructively)-involved!"><a class="direct-link" href="https://blog.yoav.ws/posts/web_platform_change_you_do_not_like/#so%2C-get-(constructively)-involved!">#</a> So, get (constructively) involved!</h2>
<p>Obviously, the above doesn't guarantee that the next point of feedback you provide on a proposal would be accepted and integrated. But at the same time, I think these guidelines can increase your chances of being heard and impacting the outcome of the discussions you're involved in. And after all, that's the point of getting involved, right?</p>
<sub>
Thanks to <a href="https://johannh.me/">Johann Hofmann</a> for reviewing an early draft of this post!
</sub>
Task Attribution Semantics2023-11-21T00:00:00Zhttps://blog.yoav.ws/posts/task_attribution_semantics/<p>I’ve been thinking a lot lately about task attribution semantics and learning a ton about that subject, so I thought I’d document the ways in which I’ve come to think about it.</p>
<p>This may not be a subject everyone would find fascinating, so it’s mostly aimed at other like-minded folks who’ve been thinking about this space a lot, or that one person that thought about this a lot and then got distracted and forgot everything. (AKA - future me)</p>
<h1 id="what%E2%80%99s-task-attribution-again%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/task_attribution_semantics/#what%E2%80%99s-task-attribution-again%3F">#</a> What’s Task attribution again?</h1>
<p>When I’m talking about task attribution, I’m talking about the browser’s ability to track why it’s running the current task, and its ability to attribute that task to some party on the document, or to past tasks.</p>
<p>This is an important capability because it enables us to create causality related heuristics and algorithms. If action X triggered action Y, then do Z. That’s an extremely powerful primitive that the web has been <a href="https://webkit.org/blog/13862/the-user-activation-api/#:~:text=Firstly%2C%20consider%20the%20following%20case%2C%20where%20a%20file%20takes%20too%20long%20to%20download%20and%20the%20transient%20activation%20timer%20runs%20out">missing</a> for a long while.</p>
<p>We now have that in Chromium (and use it for <a href="https://github.com/WICG/soft-navigations#soft-navigations">Soft Navigation heuristics</a> and <a href="https://github.com/w3c/resource-timing/issues/380">ResourceTiming initiators</a>, both still experimental). I’ve put together a <a href="https://wicg.github.io/soft-navigations/#sec-task-attribution-algorithms">spec</a> for it, and am hoping that other engines follow.</p>
<p>I <a href="https://docs.google.com/document/d/1_m-h9_KgDMddTS2OFP0CShr4zjU-C-up64DwCrCfBo4/edit#heading=h.pny1oyazzdg0">wrote</a> more about it a while ago, but I should really update that with a proper post. Maybe soon!</p>
<h1 id="different-semantics-of-task-attribution"><a class="direct-link" href="https://blog.yoav.ws/posts/task_attribution_semantics/#different-semantics-of-task-attribution">#</a> Different semantics of task attribution</h1>
<p>What am I talking about when I’m saying “semantics”? Essentially, there are multiple different ways in which we can attribute a certain task to other tasks or to the party that initiated it.</p>
<p>Those different ways are not always obvious, because for the straightforward case, they all kinda give you the same answer.
As a result, it took me a while to wrap my head around their differences, and I’m hoping to save someone some time by outlining them.</p>
<p>So, what different types of task attribution we may have?</p>
<h2 id="provenance"><a class="direct-link" href="https://blog.yoav.ws/posts/task_attribution_semantics/#provenance">#</a> Provenance</h2>
<p>That's a fancy word for saying "where is this thing coming from?". That can be done by inspecting the JS stack and picking the top frame.
It can tell us which JS file loaded the current function, and in many cases, that can be enough.</p>
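<p>As a rough sketch of that idea (real implementations do this inside the engine, and stack formats differ between engines, so treat this as illustrative only):</p>
<pre class="language-javascript"><code class="language-javascript">// Provenance attribution in userland: peek at the JS stack and extract the
// URL of the script the currently-running function came from.
function currentScriptUrl() {
  const frames = new Error().stack.split('\n');
  // In V8's format, frames[2] is this helper's caller; its URL tells us which
  // JS file the running code was loaded from.
  const match = frames[2]?.match(/(https?:\/\/[^\s)]+):\d+:\d+/);
  return match ? match[1] : '(unknown)';
}
</code></pre>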
<h2 id="registration"><a class="direct-link" href="https://blog.yoav.ws/posts/task_attribution_semantics/#registration">#</a> Registration</h2>
<p>A slightly more complex example is one where we attribute a task to the party or task that registered the code. (See <a href="https://blog.yoav.ws/posts/task_attribution_semantics/#is-%22task%22-the-right-abstraction%3F">below</a> for a caveat on that)</p>
<p>Basically:</p>
<pre class="language-javascript"><code class="language-javascript"><span class="highlight-line"><span class="token comment">// This is Task A</span></span><br /><span class="highlight-line">document<span class="token punctuation">.</span><span class="token function">addEventListener</span><span class="token punctuation">(</span>“load”<span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">=></span> <span class="token punctuation">{</span></span><br /><span class="highlight-line"> <span class="token comment">// When task B runs, it knows it was registered by Task A</span></span><br /><span class="highlight-line"><span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span></span></code></pre>
<p>You can already sorta kinda achieve that today by wrapping <code>addEventListener</code> and other functions that queue callbacks, and e.g. annotate the callbacks that are passed to it with information about the task that registered them. Angular’s <a href="https://github.com/angular/angular/tree/main/packages/zone.js#whats-a-zone">zone.js</a> does something very similar.</p>
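<p>A minimal sketch of that wrapping approach (with <code>currentTask</code> as an illustrative global, not a real API) could look something like this:</p>
<pre class="language-javascript"><code class="language-javascript">// Userland "registration" attribution, in the spirit of zone.js: wrap
// addEventListener so every callback remembers which task registered it.
let currentTask = { id: 'initial-script' };

const origAddEventListener = EventTarget.prototype.addEventListener;
EventTarget.prototype.addEventListener = function (type, listener, options) {
  const registeringTask = currentTask;
  const wrapped = function (event) {
    const previous = currentTask;
    // While the callback runs, attribute its work to the registering task.
    currentTask = registeringTask;
    try {
      return listener.call(this, event);
    } finally {
      currentTask = previous;
    }
  };
  return origAddEventListener.call(this, type, wrapped, options);
};
// A real implementation would also keep a listener-to-wrapper map so that
// removeEventListener() keeps working.
</code></pre>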
<h2 id="caller"><a class="direct-link" href="https://blog.yoav.ws/posts/task_attribution_semantics/#caller">#</a> Caller</h2>
<p>The slightly more complex semantic you may want is <strong><em>caller attribution</em></strong> - you want to attribute the task to the party or task that called it.</p>
<p>How is that different?</p>
<p>Let’s say you have a custom element called <code><my-element></code>.</p>
<p>Your main script defines it, and registers its <code><a href="https://developer.mozilla.org/en-US/docs/Web/API/Web_components/Using_custom_elements#:~:text=connectedCallback()%3A%20called%20each%20time%20the%20element%20is%20added%20to%20the%20document.%20The%20specification%20recommends%20that%2C%20as%20far%20as%20possible%2C%20developers%20should%20implement%20custom%20element%20setup%20in%20this%20callback%20rather%20than%20the%20constructor">connectedCallback()</a></code>.</p>
<p>Then some other script, running as a result of an unrelated task, creates a <code><my-element></code> and connects it.</p>
<p>Which task would you say caused the <code>connectedCallback()</code> to run? The main script that registered it, or the other script task that caused it to be called?</p>
<p>While registration semantics would say the former, caller semantics require the latter.</p>
<p>So, with this semantic, in those scenarios, you want to be able to know which task initiated the call, not which task registered it.</p>
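<p>Sketching out that <code>&lt;my-element&gt;</code> scenario (the element and the button here are made up for illustration):</p>
<pre class="language-javascript"><code class="language-javascript">// Main script, at load time - registration happens here.
customElements.define('my-element', class extends HTMLElement {
  connectedCallback() {
    // Registration semantics attribute this to the main script's task;
    // caller semantics attribute it to whatever task connected the element.
  }
});

// Some unrelated script, running later inside a click task.
document.querySelector('#add-button').addEventListener('click', () => {
  document.body.appendChild(document.createElement('my-element'));
});
</code></pre>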
<p>The tricky part about caller semantics is that they need to be applied on every callback type separately. Different callbacks get called when different things happen in the browser, and in order to maintain caller semantics we need to know what those things are for each API.</p>
<p>Caller semantics get even more complicated when we consider promises, and how we want to attribute related continuations.</p>
<h3 id="wait%2C-what%E2%80%99s-a-continuation%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/task_attribution_semantics/#wait%2C-what%E2%80%99s-a-continuation%3F">#</a> Wait, what’s a continuation?</h3>
<p>Let’s say we have the following code:</p>
<pre class="language-javascript"><code class="language-javascript"><span class="highlight-line"><span class="token punctuation">(</span><span class="token keyword">async</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">=></span> <span class="token punctuation">{</span></span><br /><span class="highlight-line"> <span class="token function">DoSomeThings</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span></span><br /><span class="highlight-line"> <span class="token keyword">await</span> <span class="token keyword">new</span> <span class="token class-name">Promise</span><span class="token punctuation">(</span><span class="token parameter">r</span> <span class="token operator">=></span> <span class="token function">AsyncWork</span><span class="token punctuation">(</span>r<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span></span><br /><span class="highlight-line"> <span class="token function">DoSomeMoreThings</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span></span><br /><span class="highlight-line"><span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span></span></code></pre>
<p><code>DoSomeMoreThings()</code> is a continuation of the <code>async</code> function that awaited on the <code>AsyncWork() Promise</code>. The same applies if we were to write the code without any <code>await</code> syntax:</p>
<pre class="language-javascript"><code class="language-javascript"><span class="highlight-line"><span class="token punctuation">(</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">=></span> <span class="token punctuation">{</span> </span><br /><span class="highlight-line"> <span class="token function">DoSomeThings</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span></span><br /><span class="highlight-line"> <span class="token punctuation">(</span><span class="token keyword">new</span> <span class="token class-name">Promise</span><span class="token punctuation">(</span><span class="token parameter">r</span> <span class="token operator">=></span><span class="token function">AsyncWork</span><span class="token punctuation">(</span>r<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">then</span><span class="token punctuation">(</span><span class="token function">DoSomeMoreThings</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span></span><br /><span class="highlight-line"><span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span></span></code></pre>
<p>So, what task do we want to attribute <code>DoSomeMoreThings()</code> to? The one that ran this code initially (and then it’d be in the same conceptual task as <code>DoSomeThings()</code>)? Or the one that resolved <code>AsyncWork()</code>?</p>
<p>We have two options here:</p>
<h4 id="continuation-resolver"><a class="direct-link" href="https://blog.yoav.ws/posts/task_attribution_semantics/#continuation-resolver">#</a> Continuation resolver</h4>
<p>With this option, we’re going to attribute that task to whatever task resolved <code>AsyncWork()</code>. That’s the approach that e.g. <a href="https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/web_tests/external/wpt/long-animation-frame/tentative/loaf-promise.html;l=1">LongAnimationFrames is taking</a>.</p>
<p>That’s a legitimate approach, that enables you to know why a certain task is running now, rather than earlier or later.</p>
<h4 id="continuation-registration"><a class="direct-link" href="https://blog.yoav.ws/posts/task_attribution_semantics/#continuation-registration">#</a> Continuation registration</h4>
<p>A different approach would be to treat the awaited <code>Promise</code> as a distraction, and attribute the task to the original task that awaited on that <code>Promise</code>. In that case you would consider <code>DoSomeMoreThings()</code> the same task as the one that called DoSomeThings(), with the task that resolved <code>AsyncWork()</code> being a completely different one.</p>
<h1 id="but..-but-why%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/task_attribution_semantics/#but..-but-why%3F">#</a> But.. but why?</h1>
<p>I know the above can be a bit confusing and a lot to take in in one sitting. It took me many months of working on this to fully realize all the complexity and the different cases that apply to each kind of semantic.</p>
<p>But we need all these different kinds of semantics when we want task attribution to answer different questions for us. Each of them answers a subtly different question:</p>
<h2 id="who's-code-is-it%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/task_attribution_semantics/#who's-code-is-it%3F">#</a> Who's code is it?</h2>
<p>That's a straightforward question with a straightforward answer. Provenance semantics (AKA - looking at the top of the stack) can help us answer it relatively easily, by pointing at the JS file from which the current function was loaded. While this can be trickier at times (e.g. one script <code>eval</code>ing a string that arrived from somewhere else), the common case is relatively simple.</p>
<h2 id="who-wanted-this-code-to-run%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/task_attribution_semantics/#who-wanted-this-code-to-run%3F">#</a> Who wanted this code to run?</h2>
<p>That’s a question that’s answered by <strong><em>registration semantics</em></strong>. It’s very similar to observing the JS stack in devtools or in errors, and trying to find the “guilty” party up the stack, only that the “stack” now expands to the task that registered that callback, and the tasks that initiated it.</p>
<p>As a prime example of that, consider a library that runs a <code>DoLotsOfWork()</code> function whenever the page calls its async <code>scheduleWork()</code> function. When we’re trying to attribute the work that <code>DoLotsOfWork()</code> is doing, we need to go beyond the superficial and understand who is calling the code that schedules it. That can be helpful in order to tell them to quit it.</p>
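<p>In code, that example could look something like this (the function names come from the prose above; their bodies are made up):</p>
<pre class="language-javascript"><code class="language-javascript">// A library whose DoLotsOfWork() runs in a later task. Registration semantics
// attribute that later task back to whoever called scheduleWork().
function DoLotsOfWork() {
  // ...lots of synchronous work here...
}

const library = {
  scheduleWork() {
    // The callback is queued now but runs in a future task; that future task's
    // "parent" is the task that called scheduleWork().
    setTimeout(DoLotsOfWork, 0);
  },
};

// The page code below is the "guilty" party registration semantics would surface.
library.scheduleWork();
</code></pre>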
<p>In many cases, that kind of task attribution is enough. And it is the simplest.</p>
<h2 id="who-is-running-this-code%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/task_attribution_semantics/#who-is-running-this-code%3F">#</a> Who is running this code?</h2>
<p>That’s a question that is subtly different from the above one.</p>
<p>A few examples of that difference:</p>
<p>With <a href="https://developer.mozilla.org/en-US/docs/Web/API/Web_components/Using_custom_elements#custom_element_lifecycle_callbacks">web components lifecycle callbacks</a>, you can have the page register these callbacks at load time, but have them be triggered by different code (e.g. click handlers) that create and attach these custom elements to the DOM.</p>
<p>Similarly, in coding patterns such as React’s <a href="https://react.dev/reference/react/useEffect">useEffect</a>, the setup callback would run whenever a component is added to the page or gets re-rendered, which can again happen by asynchronous code.</p>
<p>Another example could be e.g. <a href="https://developer.mozilla.org/en-US/docs/Web/API/Performance_API/User_timing">User Timing</a> observers that are registered by one party (e.g. your RUM provider), but triggered by another (e.g. your code that wants to time certain milestones).</p>
<p>In these examples, we want to be able to say <strong>who is running that code</strong>. Who attached a new component to the DOM? Who created a user timing measurement?</p>
<p>This question is answered by <strong><em>caller semantics</em></strong>, and specifically their <strong><em>continuation registration</em></strong> variant. That enables us to know which task is running the code, even if along the way we awaited data or something else to happen.</p>
<p>This is the semantics that Chromium’s TaskAttribution is implementing and that enables us to e.g. attribute DOM changes to a <code>click</code> event handler in code that looks something like:</p>
<pre class="language-javascript"><code class="language-javascript"><span class="highlight-line"><span class="token keyword">let</span> prefetch_promise<span class="token punctuation">;</span></span><br /><span class="highlight-line">link<span class="token punctuation">.</span><span class="token function">addEventListener</span><span class="token punctuation">(</span>“hover”<span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">=></span> <span class="token punctuation">{</span></span><br /><span class="highlight-line"> prefetch_promise <span class="token operator">=</span> <span class="token keyword">new</span> <span class="token class-name">Promise</span><span class="token punctuation">(</span><span class="token parameter">r</span> <span class="token operator">=></span> <span class="token punctuation">{</span></span><br /><span class="highlight-line"> <span class="token function">fetch</span><span class="token punctuation">(</span>next_route<span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">then</span><span class="token punctuation">(</span>r<span class="token punctuation">)</span><span class="token punctuation">;</span></span><br /><span class="highlight-line"> <span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span> </span><br /><span class="highlight-line"><span class="token punctuation">}</span><span class="token punctuation">;</span></span><br /><span class="highlight-line"></span><br /><span class="highlight-line">link<span class="token punctuation">.</span><span class="token function">addEventListener</span><span class="token punctuation">(</span>“click”<span class="token punctuation">,</span> <span class="token keyword">async</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">=></span> <span class="token punctuation">{</span></span><br /><span class="highlight-line"> <span class="token function">TurnOnSpinner</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span></span><br /><span class="highlight-line"> <span class="token keyword">await</span> prefetch_promise<span class="token punctuation">;</span></span><br /><span class="highlight-line"> <span class="token function">AddElementsToDOM</span><span class="token punctuation">(</span>response<span class="token punctuation">)</span><span class="token punctuation">;</span></span><br /><span class="highlight-line"><span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span></span></code></pre>
<p>Due to continuation registration caller semantics, we can attribute the DOM changes in <code>AddElementsToDOM()</code> to the click event handler task, and not to the prefetch one.</p>
<h2 id="why-is-it-running-now%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/task_attribution_semantics/#why-is-it-running-now%3F">#</a> Why is it running now?</h2>
<p>But, we could be interested in yet another question, to which the answer is <strong><em>continuation resolver</em></strong> semantics. That answer is slightly simpler than the continuation registration one, as it doesn’t require us to maintain state on continuations created by the JS engine, and we can keep all the state on the web platform side of things.</p>
<p>This could’ve been the right answer to e.g. a use case like <a href="https://github.com/WICG/soft-navigations#soft-navigations">Soft Navigation Heuristics</a> if a pattern like the following would’ve been common:</p>
<pre class="language-javascript"><code class="language-javascript"><span class="highlight-line"><span class="token keyword">let</span> click_promise <span class="token operator">=</span> <span class="token keyword">new</span> <span class="token class-name">Promise</span> <span class="token punctuation">(</span><span class="token parameter">r</span> <span class="token operator">=></span> <span class="token punctuation">{</span></span><br /><span class="highlight-line"> link<span class="token punctuation">.</span><span class="token function">addEventListener</span><span class="token punctuation">(</span>“click”<span class="token punctuation">,</span> r<span class="token punctuation">)</span><span class="token punctuation">;</span></span><br /><span class="highlight-line"><span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span></span><br /><span class="highlight-line">link<span class="token punctuation">.</span><span class="token function">addEventListener</span><span class="token punctuation">(</span>“hover”<span class="token punctuation">,</span> <span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">=></span> <span class="token punctuation">{</span></span><br /><span class="highlight-line"> prefetch_promise <span class="token operator">=</span> <span class="token keyword">new</span> <span class="token class-name">Promise</span><span class="token punctuation">(</span><span class="token keyword">async</span> <span class="token parameter">r</span> <span class="token operator">=></span> <span class="token punctuation">{</span></span><br /><span class="highlight-line"> <span class="token keyword">const</span> response <span class="token operator">=</span> <span class="token keyword">await</span> <span class="token function">fetch</span><span class="token punctuation">(</span>next_route<span class="token punctuation">)</span><span class="token punctuation">;</span></span><br /><span class="highlight-line"> <span class="token keyword">await</span> click_promise<span class="token punctuation">;</span></span><br /><span class="highlight-line"> <span class="token function">AddElementsToDOM</span><span class="token punctuation">(</span>response<span class="token punctuation">)</span><span class="token punctuation">;</span></span><br /><span class="highlight-line"> <span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span> </span><br /><span class="highlight-line"><span class="token punctuation">}</span><span class="token punctuation">;</span></span></code></pre>
<p>Here, if we wanted to attribute the DOM element addition to the click event, we’d need the “why is it running now?” answer, so that would require continuation resolver semantics.</p>
<p>Because this is not a common pattern we’ve seen in the wild, that’s not what we ended up doing with Chromium’s TaskAttribution, at least for now.</p>
<p>Continuation resolver semantics can get slightly more complex in cases where we’re waiting on multiple promises.</p>
<p>Let’s say you wanted to do the following (which no one should ever do, for multiple reasons):</p>
<pre class="language-javascript"><code class="language-javascript"><span class="highlight-line"><span class="token keyword">const</span> carousel <span class="token operator">=</span> <span class="token keyword">new</span> <span class="token class-name">Promise</span><span class="token punctuation">(</span><span class="token parameter">r</span> <span class="token operator">=></span> <span class="token function">loadCarousel</span><span class="token punctuation">(</span>r<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span></span><br /><span class="highlight-line"><span class="token keyword">const</span> menu <span class="token operator">=</span> <span class="token keyword">new</span> <span class="token class-name">Promise</span><span class="token punctuation">(</span><span class="token parameter">r</span> <span class="token operator">=></span> <span class="token function">renderMenu</span><span class="token punctuation">(</span>r<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span></span><br /><span class="highlight-line"><span class="token keyword">const</span> content <span class="token operator">=</span> <span class="token keyword">new</span> <span class="token class-name">Promise</span><span class="token punctuation">(</span><span class="token parameter">r</span> <span class="token operator">=></span> <span class="token function">renderContent</span><span class="token punctuation">(</span>r<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span></span><br /><span class="highlight-line"></span><br /><span class="highlight-line">Promise<span class="token punctuation">.</span><span class="token function">all</span><span class="token punctuation">(</span><span class="token punctuation">[</span>carousel<span class="token punctuation">,</span> menu<span class="token punctuation">,</span> content<span class="token punctuation">]</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">then</span><span class="token punctuation">(</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">=></span> <span class="token punctuation">{</span></span><br /><span class="highlight-line"> <span class="token function">loadSubscribeToNewsletterPopup</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span></span><br /><span class="highlight-line"><span class="token punctuation">}</span><span class="token punctuation">;</span></span></code></pre>
<p>In that case, continuation resolver caller semantics enable us to assign the responsibility for the <em>timing</em> in which the newsletter popup appeared to the <strong><em>last</em></strong> of the promises that were resolved.</p>
<h2 id="is-%22task%22-the-right-abstraction%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/task_attribution_semantics/#is-%22task%22-the-right-abstraction%3F">#</a> Is "task" the right abstraction?</h2>
<p>In <a href="https://blog.yoav.ws/posts/task_attribution_semantics/#registration">registration semantics</a> we talked about callbacks having their registration task as their parent task. Reality is actually slightly more complicated than that, as we can have multiple tasks register callbacks that all run as part of the same event loop <a href="https://html.spec.whatwg.org/multipage/webappapis.html#concept-task">task</a>.</p>
<p>In Chromium’s Task Attribution implementation, each one of these callbacks would have its own task ID.</p>
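<p>A concrete (illustrative) example of that decoupling:</p>
<pre class="language-javascript"><code class="language-javascript">// Two different tasks register click listeners on the same button.
const button = document.querySelector('button');

setTimeout(() => {              // Task A
  button.addEventListener('click', () => {
    // Attributed to a context derived from Task A.
  });
}, 0);

setTimeout(() => {              // Task B
  button.addEventListener('click', () => {
    // Attributed to a context derived from Task B.
  });
}, 0);

// A single click later dispatches both listeners within one HTML event-loop
// task, yet attribution needs to keep them apart.
</code></pre>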
<p>In that sense, one can say that tasks in the context of attribution are somewhat decoupled from HTML’s tasks, and a rename may be in order. (E.g. to <a href="https://github.com/tc39/proposal-async-context">Context</a>)</p>
<p>Other implementation-motivated changes like <a href="https://chromium-review.googlesource.com/c/chromium/src/+/5040698">trimming of long task-chains</a> also indicate that it might be worthwhile to compress Tasks into Contexts.</p>
<h1 id="in-summary"><a class="direct-link" href="https://blog.yoav.ws/posts/task_attribution_semantics/#in-summary">#</a> In summary</h1>
<p>Tasks on the web are <a href="https://github.com/WICG/scheduling-apis/blob/main/misc/userspace-task-models.md">complex</a>, and attributing those tasks is no different.</p>
<p>There are many different ways to think about tasks and their attribution: who defined them? Who scheduled them to run? Who runs them? And why are they running now?</p>
<p>The concepts I outlined above are intended to be web specification concepts.
I hope they can be used by different high level features (e.g. Soft Navigation Heuristics) to Do The Right Thing™.</p>
<p>I similarly hope that these are not concepts web developers would <em>ever</em> have to think about. Browsers should make sure that high-level APIs use the right semantics for them, so that developers don’t have to care.</p>
<sub>
Thanks to <a href="https://twitter.com/mmocny/">Michal Mocny</a>, <a href="https://twitter.com/anniesullie">Annie Sullivan</a> and <a href="https://twitter.com/scotthaseley">Scott Haseley</a> for their reviews and insights on an early draft of this post!
</sub>
Why Chromium cares about standards2023-12-07T00:00:00Zhttps://blog.yoav.ws/posts/why_chromium_cares_about_standards/<p><sub>I wrote a Google-internal version of this post on the train back from W3C TPAC in Seville, but thought this could be useful to the broader Chromium community. At the same time, this is my <strong><em>own personal opinion</em></strong>. I speak for no one other than myself.</sub></p>
<p><sub>Here goes!</sub></p>
<p><a href="https://www.w3.org/2023/09/TPAC/">TPAC</a> was an amazing week, full of great folks from a large variety of companies, all working together to build the open web. But at least a few folks, mostly coming from organizations that haven’t been traditionally contributing to the web, seemed to not fully understand why standards are important.</p>
<p>Such folks working on Chromium, while going through the Blink process, run a risk of getting discouraged. Process without understanding of its purpose sure seems a lot like pointless bureaucracy.</p>
<p>So this is my attempt to right that wrong and explain...</p>
<h2 id="why%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/why_chromium_cares_about_standards/#why%3F">#</a> Why?</h2>
<p>Chromium is an <strong><em>implementation of the open web platform</em></strong>. That's a fundamental fact that we should keep in mind when working on browsers that rely on it.</p>
<p>Given that fundamental fact, the goal of APIs we’re shipping in Chromium is to <strong><em>move the entire web platform forward, not just Chromium browsers</em></strong>. We want to drive these features into the web’s <a href="https://developer.mozilla.org/en-US/blog/baseline-unified-view-stable-web-features/">baseline</a> and need to bring other stakeholders with us in order to make that happen.</p>
<p>On top of that, we have to be careful in the APIs we expose to the open web, as there's a good chance that these APIs would be used by some pages, more or less forever.</p>
<p>Removals on the web can be extremely hard and costly, and tbh, they are not fun. At the same time, no one wants to maintain a feature that is known to be a bad idea. So we need to try and make sure that the APIs that we ship on the web have a reasonable quality bar.</p>
<p>We want these APIs to be more-or-less consistent with the rest of the platform. To be ergonomic. To be compatible with existing content, interoperable with other web platform implementations (i.e. other engines, such as WebKit, Gecko and <a href="https://ladybird.dev/">Ladybird</a>) as well as with deployed network components (e.g. CDNs and other intermediaries).</p>
<p>In short, there are a lot of considerations that go into shipping features on the web, in order to make sure that the cost of shipping them (on the platform and on web developers) is minimal and will be outweighed by their usefulness.</p>
<p>This is the reason we can't just design our feature over some internal design doc and treat the standards and Blink process as a checkbox that needs to be checked.</p>
<p>We can’t just go:</p>
<ul>
<li><u>Explainer</u>? <em>None</em></li>
<li><u>TAG review</u>? <em>N/A</em></li>
<li><u>Vendor positions</u>? <em>No signal</em></li>
<li><u>Developer signal</u>? <em>None</em></li>
<li><u>WPTs</u>? <em>Nope!</em></li>
</ul>
<p>We need to fill in those fields with meaningful links. But we need to go beyond that and make a genuine effort at achieving the goals that those fields represent.</p>
<p>So let’s talk about the reasons we are filling those fields in the first place!</p>
<h3 id="eventual-interoperability"><a class="direct-link" href="https://blog.yoav.ws/posts/why_chromium_cares_about_standards/#eventual-interoperability">#</a> Eventual Interoperability</h3>
<p>The web's superpower is in its reach. It's a ubiquitous platform and web pages can run on a huge variety of device form factors, operating systems and browsers.</p>
<p>When web developers write web sites, they do that for the broader web, not just for Chromium browsers, at least not when they're doing it right. And if web developers would start writing Chromium-only web sites, they'd give up a lot of that reach. That'd be a shame for them, but more importantly would also erode user trust in the web, resulting in the entire platform losing prominence.</p>
<p>So, we want web developers to write interoperable sites. They can only do that if the changes we introduce to the platform are interoperable. There are cases where we're introducing capabilities that other implementations do not support, in which case these capabilities won't be immediately interoperable.</p>
<p>In cases where there’s a reasonable fallback or a polyfill, we should ensure that:</p>
<ol>
<li>Other implementations don't break as a result of us shipping the feature, and web developers use feature detection in order to ensure that.</li>
<li>When other implementations are interested in adopting that capability, they could easily do that in an interoperable way.</li>
</ol>
<p>We ensure (1) happens through API design, documentation and code examples.</p>
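<p>As a small illustration of what (1) looks like in practice (using <code>scheduler.postTask()</code> purely as an example of a capability with uneven support):</p>
<pre class="language-javascript"><code class="language-javascript">// Feature-detect the capability and fall back gracefully where it's missing,
// so content keeps working in engines that haven't shipped it yet.
function schedule(callback) {
  if (globalThis.scheduler?.postTask) {
    scheduler.postTask(callback, { priority: 'background' });
  } else {
    setTimeout(callback, 0);
  }
}
</code></pre>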
<p>(2) is the reason we run things through the standards process, vendor positions, developer signals, and web platform tests.</p>
<p>More specifically:</p>
<ul>
<li>The standards process, through filing WG issues, working collaboratively on standard PRs, or in-the-open incubations, enables us to get other vendors up to speed and do our best to bring them onboard.
<ul>
<li>That process also helps provide <a href="https://en.wikipedia.org/wiki/Intellectual_property#Rights">IPR</a> guarantees and enable cross-company collaborations, and reassure non-Chromium browser vendors that they can safely follow.</li>
</ul>
</li>
<li><a href="https://bit.ly/blink-signals">vendor positions</a> enable us to surface proposals to other implementers' internal teams, and gauge their level of interest in the problem space and proposed solution. It also enables us to try and tease out technical feedback as early as possible, so that we can respond to it and potentially adapt our proposal to increase probability for interoperability.
<ul>
<li>As a concrete example of the above, Pointer Events are a feature where Chromium integrated Apple’s technical feedback despite the fact that they had no plans to ship it. Years later, when Apple wanted to support the relevant use cases, it was easy for them to adopt Pointer Events, resulting in increased interoperability.</li>
</ul>
</li>
<li><a href="https://goo.gle/developer-signals">Developer signals</a> help us get a sense of the usefulness of the feature we’re trying to ship, and its potential for adoption by developers. For features that developers want badly, they are likely to demand them from other vendors, increasing the likelihood of multiple implementations.</li>
<li><a href="https://web-platform-tests.org/">Web platform tests</a> enable us to ensure that when other vendors catch up, their implementations would match ours and the platform would be predictable and interoperable. It also makes it cheaper for other vendors to catch up, increasing the likelihood of them following in our footsteps which leads to improved interoperability.</li>
</ul>
<p>In cases where there’s no reasonable fallback for developers, the feature we’re shipping may not be as useful as it can be until it’s supported ubiquitously. (at least not without giving up on reach)</p>
<p>In those cases, we should have an explicit plan for adoption with other vendors. In such cases, the above becomes even more important.</p>
<h4 id="eventual%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/why_chromium_cares_about_standards/#eventual%3F">#</a> Eventual?</h4>
<p>It’s entirely understandable that the “eventual” part is frustrating. When there isn’t active support and engagement from other implementers, doing all that work now in favor of future nebulous benefits can feel like a wasted investment. But it’s important to understand that not doing that will effectively make it impossible for other implementations to (eventually) catch up in an interoperable way, and will result in a forked platform. That in turn would result in diminished reach for developers, causing them to invest their efforts elsewhere.</p>
<p>Even if both the TAG and other vendors <em>currently</em> oppose a certain feature, taking in feedback about it helps remove future opposition, once they see value in the use case or get enough developer demand for the feature.</p>
<h3 id="consistency"><a class="direct-link" href="https://blog.yoav.ws/posts/why_chromium_cares_about_standards/#consistency">#</a> Consistency</h3>
<p>Beyond interop, the process aims to ensure that the APIs we ship are ergonomic, easy for web developers to use, and generally consistent with the rest of the platform.</p>
<p>This is where the TAG review comes in. The W3C TAG is composed of web platform experts that represent the broader industry - browser engineers, web developers, privacy advocates and more.</p>
<p>Their design reviews aim to give shipping APIs the quality attributes we're after. Beyond that, the TAG is an influential stakeholder that we’d like to get on board with the feature, and reviews help with that as well.</p>
<p>But reviews only do what they're intended to do if we go into them <strong><em>actively seeking feedback</em></strong>. If the TAG review is a checkbox, and we go into it ready to justify our initial design choices or agree to disagree, that's just an unpleasant experience for all involved.</p>
<p>Hence it’s important to engage with the TAG early, respond to their feedback whenever it’s actionable and integrate it into our API design.</p>
<h3 id="transparency"><a class="direct-link" href="https://blog.yoav.ws/posts/why_chromium_cares_about_standards/#transparency">#</a> Transparency</h3>
<p>Another major goal of the process is transparency in what we're working on. We can't be transparent while using industry-specific jargon, or while hiding what the actual changes mean behind walls of discussion text and processing models.</p>
<p>This is where explainers come into play. When folks look at a passing intent, an explainer helps them better understand what the feature is all about and how web developers are supposed to use it.</p>
<p>That's true for web developers coming in from the <a href="https://botsin.space/@intenttoship">Intent To Ship bot on social media</a>, folks from the web community who want to understand what we're shipping as well as to API owners who review dozens of intents in a typical week.</p>
<p>For all of these cases, you don’t want to force people to jump through hoops like reading through complex Github discussions and/or algorithmic pseudo-code in order to understand what your feature is, what it does, and what they'd need to do in order to use it. As a feature owner, it’s probably a good investment on your part to minimize the time it takes to review your feature, which in turn reduces the time the overall process takes. And you definitely don’t want people to have to be subject-matter experts in order to understand what we’re trying to ship.</p>
<p>While explainers don't replace either specifications or the final developer facing documentation, they sit somewhere in between and serve a crucial role in the process’ transparency. They enable us to bring in the broader community along for the ride.</p>
<h2 id="in-closing"><a class="direct-link" href="https://blog.yoav.ws/posts/why_chromium_cares_about_standards/#in-closing">#</a> In closing</h2>
<p>I know what you're thinking. The above sounds like a lot of work. And it is. Creating all the related artifacts takes time.</p>
<p>Taking feedback into account can delay shipping timelines, and requires re-opening an implementation you thought was already done. It can be tedious. I <em>get</em> it.</p>
<p>At the same time, it's also critically important and an essential part of working on the web platform. Investing work in the Blink process early in a feature’s lifetime can go a long way toward reducing its cost on the platform over the next 30 years or more. By ensuring the feature is properly reviewed, specified and documented, we’re minimizing costs for the millions of web developers that would use it, and for the future engineers that would maintain and improve on it.</p>
<p>We're here to create an interoperable and capable web that web developers love developing for and that belongs to everyone.</p>
<p>And the process is crucial to achieve that.</p>
<p>So I’m hoping that having the end goal we're trying to achieve in mind can make the occasional frustration and extra work along the way more bearable, as well as help us ship better web platform features and capabilities.</p>
<p><sub>Huge thanks to <a href="https://twitter.com/RickByers">Rick Byers</a>, <a href="https://mastodon.social/@chrishtr">Chris Harrelson</a> and <a href="https://mastodon.social/@cdub">Chris Wilson</a> for providing great feedback on an earlier version of this post.</sub></p>
Skiing2024-02-10T00:00:00Zhttps://blog.yoav.ws/posts/skiing/<aside>This is a non-technical post. If you're just here for the tech, feel free to skip this one.</aside>
<img src="https://blog.yoav.ws/img/skiing/larosiere.webp" width="2040px" height="1536px" alt="A snowy mountain top under blue skies. A few rocky peaks peeking underneath the snow. On the left hand side, traces of an avalanche" />
<p>When I was 13 years old, a friend invited me to join them ice skating. An ice skating rink had recently opened and was all the rage.
I had never really been ice skating before, but thought to myself: "how bad can it be?"</p>
<p>Pretty bad, turns out.
While others around me who were also ice skating for the first time were more or less stable, that was not the case for me. I kept falling and had a hard time leaving the safety of the rink's outer wall.
After 20 minutes or so on the ice, I fell and twisted my left ankle. It hurt like hell, it was swollen and blue-ish and I couldn't put any weight on that leg. At all.</p>
<p>A visit to the doctor's office and an X-ray later, I was diagnosed with a severely sprained ankle. I had a hard time moving at all for a week, and after that, I was still unable to put any weight on it for 3 weeks.
That meant I had to hop around on one leg if I wanted to get anywhere.</p>
<p>Eventually, my ankle recovered. But my main takeaway from this experience was that I'm just not good at these sorts of things.</p>
<p>"These sorts of things" included everything that required balance and coordination. I had a hard time learning to ride a bike only a couple of years earlier, after taking a stab at it at the age of 6, falling, hurting myself and deciding that it was not for me.. (You may detect a pattern here)</p>
<p>And while I managed to ride a bike, that didn't transfer to other domains which required balance.
I never set foot in that ice rink again. And friends who tried to convince me I should learn to rollerblade got a weird look in return. The same went for skateboarding, as well as surfing (which was a popular thing where I grew up, a 5-minute walk from the beach).</p>
<p>So, "I'm not good at these things" became my thing. And it prevented me from ever trying anything that I felt required any of the skills I believed I inherently lacked.</p>
<p>Fast forward 20+ years, and a friend suggested I join him and a few others on their weekly mountain biking rides. I live in an area that has some world-class trails, so, atypically for me, I decided to give it a try.</p>
<p>I was <em>bad</em>! I wasn't in good shape for the climbs, and I was even (relatively) slower going downhill. My balance on the bike was completely off, and I made everyone wait for me again and again.
The easiest, natural thing for me to do would've been to give up and go do something else instead.</p>
<p>But for some reason I stuck with it. The folks I rode with were cool and non-judgemental, and I felt like I was improving from one bike ride to the next.
Which got me thinking - if I can improve my balance and coordination on the bike, maybe I'm not a lost cause?</p>
<p>That brings us to skiing. I live about a 90-minute ride away from the Alps, and as a result had been intrigued by skiing for a long while. But it was always something others could do, and I just inherently couldn't.
Plus, getting into skiing requires a significant investment. Ski resorts are expensive, especially when the kids are off school. There's also a lot of equipment involved. What if we spent all that money and ended up hating it? Or worse, what if everyone loved it but me, and I ended up dragging the rest of the family down?</p>
<p>But at some point a couple of years back, my curiosity overcame my fear. I took the plunge, rented a place for the family for a week, rented equipment and took a week's worth of ski lessons for beginners.</p>
<p>I was <em>bad</em>. I could barely stay upright with my skis on. I kept falling down, again and again and again. I managed to ski down very gentle slopes, by sheer brute force. At the end of the third day, I started having a throbbing pain on my left side between my abs and ribs, and by the end of the fourth it started feeling as if I were being repeatedly stabbed there. A visit to the doctor's revealed that I had strained the ligaments that hold my rib cage together (or something) and should avoid any physical activity for the next 6 weeks.</p>
<p>The main difference between this experience and the ice rink one at 13 was that in the few instances I managed to ski down the slope without falling, I realized I <em>loved</em> it, and I was determined to <em>become</em> good at this thing.</p>
<p>So, 6 weeks later once my ribs healed, I was back on the slopes, getting better and better at each run. I still fell, but a bit less. I was still unstable and using waaaay too much force in order to turn. But I was consistently more stable than I previously was.</p>
<p>Eventually I ended up taking more lessons to improve my technique, to the point where I can essentially go anywhere on the mountain and have the technique and stability to do it safely.
This year, on most winter weekends, I catch a 6am bus that goes to a different ski resort for the day each time. It is GLORIOUS!</p>
<img src="https://blog.yoav.ws/img/skiing/selfie.webp" width="1824px" height="1368px" alt="My happy face, with the snowy mountain range of the Maurienne in the background." />
<p>Beyond that, I found out that practicing my balance in one sport improves my skills in transferable ways. Getting better at skiing improved my mountain biking. And getting better at staying upright on a balance board improved my skiing. Getting out of my comfort zone in one area increases my ability in adjacent ones.</p>
<p>So, I guess there are a few life lessons to extract from this:</p>
<ul>
<li>If you think you can't do something, you're guaranteeing you won't be able to do that.</li>
<li>Persistence and hard work can compensate for lack of talent. Being bad at something shouldn't prevent you from doing it, and improving at it.</li>
<li>A thing you're "not good at" may just become your favorite thing if you keep at it.</li>
</ul>
<p>This all sounds so obvious in retrospect. I just wish it didn't take me that long to figure it out..</p>
On resource bundling and compression dictionaries2024-02-23T00:00:00Zhttps://blog.yoav.ws/posts/on_resource_bundling_and_compression_dictionaries/<h2 id="not-a-bundle-of-joy%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/on_resource_bundling_and_compression_dictionaries/#not-a-bundle-of-joy%3F">#</a> Not a bundle of joy?</h2>
<p>Folks in the web development community have recently <a href="https://world.hey.com/dhh/you-can-t-get-faster-than-no-build-7a44131c">started</a> <a href="https://syntax.fm/show/719/fullstack-typescript-apps-with-no-build-step-with-brian-leroux">talking</a> about trying to get rid of bundling as part of the deployment process.</p>
<p>It’s almost as if people don’t like to add extra layers of complexity to their build processes.</p>
<p>At least some developers seem to want to just Use The Platform™ and ship ES modules to the browser, the same way they write them. And those developers are right. That would be neat!! And it’s great to see that it’s working in some cases.</p>
<p>Unfortunately, despite progress on that front, shipping individual ES modules to the browser is still very likely to be slower than bundling them.</p>
<h2 id="why-bundle%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/on_resource_bundling_and_compression_dictionaries/#why-bundle%3F">#</a> Why bundle?</h2>
<p>In the beginning, we had HTTP/1.0 and HTTP/1.1.</p>
<p>We had a limited number of connections over which requests could be sent, which made page loads latency-bound: the more resources your web page had, the slower it was to load. <a href="https://www.igvita.com/2012/07/19/latency-the-new-web-performance-bottleneck/">Latency, rather than bandwidth</a>, was the limiting factor.</p>
<p>The web performance community's solution to that was to bundle resources. JS and CSS were <a href="https://docs.wpvip.com/vip-go-mu-plugins/file-concatenation-and-minification/">concatenated</a> into larger JS and CSS files. Images were thrown onto <a href="https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_images/Implementing_image_sprites_in_CSS">image sprites</a>. All in order to reduce the number of resources.</p>
<p>And then we got SPDY and HTTP/2 (and HTTP/3), which promised to fix all that. The protocol runs on a single connection, and the limit on the number of requests that can be in flight at once is very high (about 100 by default). Problem solved!!</p>
<p>As the title of this section may imply, the problem was not solved..</p>
<p>HTTP/2+ solved the network latency issues that multiple resources were causing us, but that didn’t solve everything.</p>
<p>In particular:</p>
<ul>
<li>HTTP/2 doesn’t help us with discovery. If a certain resource is late-discovered by the browser, the browser will request it later.</li>
<li>HTTP/2+ doesn’t enable us to extend a compression window beyond a single resource. What that means in practice is that the compression ratio of a lot of small files is significantly worse than the compression ratio of fewer, larger files (see the sketch right after this list).</li>
<li>There still remains an inherent <a href="https://docs.google.com/document/d/1ds9EEkcDGnt-iR8SAN-_7nsOfw7gsMfhZjzZ_QAIyjM/edit#heading=h.b5h7tg74oejx">per-request cost</a> in browsers. Each request adds a certain amount of delay. (typically on the order of a few milliseconds)</li>
</ul>
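<p>To get a feel for that second point, here's a quick Node sketch (the file names are made up) that compares compressing a few small files separately against compressing them concatenated. The concatenated version typically compresses noticeably better, because the compressor can reuse redundancy across file boundaries:</p>
<pre><code>// Illustrative only: per-file compression vs. one shared compression window.
import { readFileSync } from 'node:fs';
import { brotliCompressSync } from 'node:zlib';

const files = ['a.mjs', 'b.mjs', 'c.mjs'].map((f) => readFileSync(f));

// Sum of compressing each file on its own.
const separate = files
  .map((buf) => brotliCompressSync(buf).length)
  .reduce((a, b) => a + b, 0);

// Compressing all of them as a single concatenated buffer.
const together = brotliCompressSync(Buffer.concat(files)).length;

console.log({ separate, together });
</code></pre>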
<p>At the same time, bundling in its current form is also suboptimal. The browser can’t start executing <em>any</em> of the content the bundle contains until the entire bundle has been downloaded.</p>
<p>For example, let’s say you have two different widgets on your page. One is responsible for the interactivity of what you want users to be able to do (e.g. a “buy now” button). The other is responsible for some auxiliary functionality (e.g. analytics or prefetching resources for future navigations).</p>
<p>We can probably agree that user-facing interactivity is a higher priority than the auxiliary functionality.</p>
<p>But bundling those two widgets together will <em>inherently</em> slow down the higher priority widget. Even if we load both with high priority, the interactivity widget will have to wait until the auxiliary one also finished downloading and parsing before it can start executing.</p>
<p>In theory, moving to ES modules can avoid this issue, as long as the modules don’t depend on each other. For modules that do depend on each other, we could have solved this by <a href="https://docs.google.com/document/d/1o3qgHBx5_T0cV6kFU3HzSp8-bBOJFEtlRJKh59-ljgc/edit#heading=h.owv0z3ma3v66">having leaf modules start execution while the modules that depend on them are still loading</a>. In practice, that’s <a href="https://docs.google.com/document/d/1MJK0zigKbH4WFCKcHsWwFAzpU_DZppEAOpYJlIW7M7E/edit#">not how ES module loading works</a>, and they need to all be fully loaded and parsed before any of them runs. We would also need to <a href="https://github.com/whatwg/html/issues/4400">enable microtasks to run between modules</a> if we wanted them to not create responsiveness issues. But I digress..</p>
<p>Another issue that bundling introduces is that it harms caching granularity. Multiple small resources that may change at different times are all bundled together, with a single caching lifetime. And as soon as any of the smaller modules changes, the entire bundle gets invalidated.</p>
<h2 id="how-to-bundle%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/on_resource_bundling_and_compression_dictionaries/#how-to-bundle%3F">#</a> How to bundle?</h2>
<p>Let’s say we have a website with 3 different pages, each one of them relying on different JS modules.</p>
<iframe src="https://blog.yoav.ws/img/on_resource_bundling_and_compression_dictionaries/modules.svg" width="100%" style="aspect-ratio: 2945/1193;border: none" aria-label="A graph showing a module tree of a web site, with 3 different entry points. Each entry point has a different base color, and modules that the entry point depends on have that base color mixed into their color. E.g. a module that both the "red" and "blue" entry points depends on is purple. Third party modules have a gray background.">
</iframe>
<p>Each JS module is imported from different entry points, or from modules that are imported by the different entry points. We represent that in the graph by including three base colors for the entry points, and representing the dependencies on each module by a (rough) combination of these base colors.</p>
<p>We also have modules with gray backgrounds, to represent third-party modules, that are unlikely to change very often.</p>
<p>What’s the best way to split these different modules into different bundles?</p>
<ul>
<li>For first load performance, it’d be best if each module's dependencies were in a single bundle, but a small number of bundles is also fine, especially if we flatten the discovery process (e.g. using <code>&lt;link rel=modulepreload&gt;</code>).</li>
<li>For caching benefits across pages, it’d be best if each color was in a separate bundle</li>
<li>For caching over time, it’d be good to cluster the bundles up according to change frequency (e.g. active development code vs. stable library code)</li>
<li>Priorities can also change what we bundle. For example, dynamically imported modules that are loaded later on in the page may need to be split apart.</li>
<li>We may want to impose a minimum size for bundles and avoid very small bundles (for compression ratio reasons), at some cost to caching granularity, and while risking loading them unnecessarily on some pages.</li>
</ul>
<p>Taking some of the above principles into account gives us the following split:</p>
<iframe src="https://blog.yoav.ws/img/on_resource_bundling_and_compression_dictionaries/modules_bundled.svg" width="100%" style="aspect-ratio: 1540/596;border: none" aria-label="A graph showing the same modules as above, with bundle boundaries that match their colors and background.">
</iframe>
<p>Now what would happen if <code>images.mjs</code> added an import to <code>popular_library.mjs</code>? That would move that library from the "green" category to the “black” one (the modules that all pages rely on), and change the semantics of the relevant bundle.</p>
<iframe src="https://blog.yoav.ws/img/on_resource_bundling_and_compression_dictionaries/modules_rearranged.svg" width="100%" style="aspect-ratio: 1540/600;border: none" aria-label="A graph showing the same modules as above, but where one module was added as a dependency on a new entry point, and therefore changed its color and its bundle boundaries.">
</iframe>
<p>Now imagine that the typical complex web app has hundreds if not thousands of module dependencies with dozens of entry points, and imagine what kinds of semantic drift can happen over time.</p>
<h3 id="what%E2%80%99s-%E2%80%9Csemantic-drift%E2%80%9D%3F%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/on_resource_bundling_and_compression_dictionaries/#what%E2%80%99s-%E2%80%9Csemantic-drift%E2%80%9D%3F%3F">#</a> What’s “semantic drift”??</h3>
<p>We’ll define “bundle semantic drift” as the phenomenon we described above - where a web app’s bundles can change their semantics over time, and essentially come to represent different groups of modules.</p>
<p>We can imagine two kinds of drift.</p>
<h4 id="lateral-semantic-drift"><a class="direct-link" href="https://blog.yoav.ws/posts/on_resource_bundling_and_compression_dictionaries/#lateral-semantic-drift">#</a> Lateral semantic drift</h4>
<p>That kind of drift happens when dependencies between modules change, resulting in a module “changing color”. (e.g. <code>popular_library.mjs</code> in the example above)</p>
<p>The implications are that if the user has two cached bundles that contain all the modules of a certain set, a single module moving from one bundle to another (e.g. because another module started depending on it) would invalidate <em>both</em> bundles, and cause users to redownload both of them, with all the modules they contain.</p>
<h4 id="vertical-semantic-drift"><a class="direct-link" href="https://blog.yoav.ws/posts/on_resource_bundling_and_compression_dictionaries/#vertical-semantic-drift">#</a> Vertical semantic drift</h4>
<p>That happens when a new entry point is added to the graph, effectively “adding another color”, and hence changing the dependency graph. That can change the modules contained in many of the existing bundles, as well as create new ones.</p>
<h3 id="didn%E2%80%99t-you-have-%E2%80%9Ccompression-dictionaries%E2%80%9D-in-the-title%3F%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/on_resource_bundling_and_compression_dictionaries/#didn%E2%80%99t-you-have-%E2%80%9Ccompression-dictionaries%E2%80%9D-in-the-title%3F%3F">#</a> Didn’t you have “compression dictionaries” in the title??</h3>
<p>OK, so how’s all that related to <a href="https://github.com/WICG/compression-dictionary-transport?tab=readme-ov-file#compression-dictionary-transport">compression dictionaries</a>?</p>
<p>Compression dictionaries’ <a href="https://github.com/WICG/compression-dictionary-transport?tab=readme-ov-file#static-resources-flow">static resource flow</a> enables us to only deliver a certain bundle once, and then when the user fetches the same bundle in the future, use the previous version as a dictionary in order to compress the current version.</p>
<p>That effectively means we only deliver the difference between the two versions!! The result is <a href="https://github.com/WICG/compression-dictionary-transport/blob/main/examples.md#cnn-header-bundle">ridiculously impressive compression ratios</a>. This approach also means that delivering the dictionaries costs us nothing, as they are resources that the user needs anyway.</p>
<p>It also means we can ignore caching-lifetime heuristics when drawing bundle boundaries, because they matter significantly less, at least from a delivery perspective. If some of the resources in the bundle change while others don’t, our users only “pay” the download price for the bits that changed.</p>
<p>So in our example, we could draw the bundle boundaries as something like that:</p>
<iframe src="https://blog.yoav.ws/img/on_resource_bundling_and_compression_dictionaries/modules_with_compression_dicts.svg" width="100%" style="aspect-ratio: 1560/608;border: none" aria-label="A graph showing the same modules as above, but where the gray backgrounds were removed from third party modules, signifying that we no longer care about that for bundle boundaries.">
</iframe>
<p>Where that theory fails is when we’re talking about code caching in the browser, which currently works at a per-resource granularity. But one can imagine that changing as browsers improve their code caching granularity (and e.g. make it per-function or per-module, if we had <a href="https://github.com/tc39/proposal-module-declarations">a way to have a bundle of modules</a>).</p>
<h3 id="compression-dictionaries-and-semantic-drift"><a class="direct-link" href="https://blog.yoav.ws/posts/on_resource_bundling_and_compression_dictionaries/#compression-dictionaries-and-semantic-drift">#</a> Compression dictionaries and semantic drift</h3>
<p>The problem is that compression dictionaries are extremely vulnerable to semantic drift, especially when the bundle’s name is related to which modules it contains.</p>
<p>When that’s the case (which is often), any semantic drift results in a bundle name change.</p>
<p>That has significant (negative) implications for compression dictionaries due to their matching mechanism, which is <a href="https://developer.mozilla.org/en-US/docs/Web/API/URLPattern">URLPattern</a> based. If the URL changes in unpredictable ways, currently served bundles will no longer match the browser’s future requests for the equivalent bundles. That means the browser would not advertise them as available dictionaries, so we won’t be able to use them for compression.</p>
<p>All that to say that in order to properly use compression dictionaries with JS bundles, we’d need (roughly) stable bundle semantics, and stable bundle URLs over time. That is, if we have a bundle filename that looks roughly like <code>bundlename-contenthash.js</code>, the <code>bundlename</code> part would need to remain stable over time, even if the hash changes on every release.</p>
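<p>To make that concrete, here's a rough sketch of the static resource flow at the HTTP level (header names per the Compression Dictionary Transport proposal; the bundle names and hash are made up). Note that the stable <code>bundlename</code> part is what the <code>match</code> pattern keys off of:</p>
<pre><code># Release 1: the server marks the bundle as a dictionary for future versions of itself.
GET /js/checkout-3f2a91.js
200 OK
Use-As-Dictionary: match="/js/checkout-*.js"

# Release 2: the browser advertises the cached bundle as an available dictionary...
GET /js/checkout-8c41d7.js
Available-Dictionary: :pZGm1Av0IEBKARczz7exkNYsZb8LzaMrV7J32a2fFG4=:

# ...and the server responds with (roughly) just the delta, dictionary-compressed.
200 OK
Content-Encoding: dcb
</code></pre>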
<h3 id="stable-naming-patterns-over-time"><a class="direct-link" href="https://blog.yoav.ws/posts/on_resource_bundling_and_compression_dictionaries/#stable-naming-patterns-over-time">#</a> Stable naming patterns over time</h3>
<p>So we need bundles that are “more or less the same” to maintain the same name pattern over time, despite a slight drift in their semantics.</p>
<p>When we name bundles based on the modules they contain, that’s not what happens. We get a drift in the names every time there’s a slight drift in the bundle semantics.</p>
<p>A different approach that eliminates that issue with lateral drift is to name bundles based on the entry points that depend on them. Essentially in our examples, we would name the bundles based on their color.</p>
<p>Now if a module moved from one color to another, each bundle’s content would change (and we’d have to redeliver that specific module’s bytes), but neither bundle’s filename would change, so most of the bytes can still be derived from the cached versions of those past bundles, and won’t be re-delivered.</p>
<p>That’s great, but doesn’t really help us with vertical drift. If a new entry point was added to the mix, that could “change the color” of a lot of modules and cause a significant drift for many bundles.</p>
<p>One potential solution there could be to only consider a subset of the entry points when determining bundle names, and by default when new entry points are added, we don’t consider them for naming existing bundles (that already have other entry points that depend on them).</p>
<p>As adding and removing entry points is presumably relatively rare, it could take quite a while before bundles accumulate significant dependencies on entry points that aren’t considered for naming.</p>
<p>If and when that happens, we can expand the subset of entry points considered for naming. That would result in lower compression ratios the next time users visit.</p>
<h2 id="conclusion"><a class="direct-link" href="https://blog.yoav.ws/posts/on_resource_bundling_and_compression_dictionaries/#conclusion">#</a> Conclusion</h2>
<p>Compression dictionaries have huge potential for reducing network overhead, but in order to be able to successfully deploy them with JS bundles, we need stable bundle naming patterns over time.</p>
<p>Luckily for us, modern bundlers provide the <a href="https://rollupjs.org/configuration-options/#output-manualchunks">necessary controls to achieve that</a>. We have to make use of these controls, in order to ensure that dictionary compression remains effective as our web apps evolve and their bundle semantics drift.</p>
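<p>As a rough, illustrative sketch (not a drop-in config), here's what entry-point-based chunk naming could look like with Rollup's <code>manualChunks</code>. The entry names, paths and naming policy are all made up for the example, and a real setup would need to handle plenty of details this glosses over:</p>
<pre><code>// rollup.config.js (sketch)
// Only this fixed subset of entry points is considered for naming, so adding a
// new entry point later doesn't rename existing bundles (limiting vertical drift).
const NAMED_ENTRIES = ['home', 'product', 'checkout'];

// Walk up the importer graph and collect which named entry points reach this module.
function namedEntriesFor(id, getModuleInfo, seen = new Set()) {
  if (seen.has(id)) return new Set();
  seen.add(id);
  const info = getModuleInfo(id);
  if (!info) return new Set();
  if (info.isEntry) {
    const name = id.match(/pages\/(\w+)\./)?.[1];
    return NAMED_ENTRIES.includes(name) ? new Set([name]) : new Set();
  }
  const result = new Set();
  for (const importer of info.importers) {
    for (const entry of namedEntriesFor(importer, getModuleInfo, seen)) {
      result.add(entry);
    }
  }
  return result;
}

export default {
  input: {
    home: 'src/pages/home.mjs',
    product: 'src/pages/product.mjs',
    checkout: 'src/pages/checkout.mjs',
  },
  output: {
    dir: 'dist',
    // The name part stays stable across releases; only the hash changes.
    chunkFileNames: '[name]-[hash].js',
    manualChunks(id, { getModuleInfo }) {
      // Name the chunk after the set of entry points that depend on the module
      // (its "color"), so lateral drift moves bytes between bundles without
      // renaming them.
      const entries = [...namedEntriesFor(id, getModuleInfo)].sort();
      return entries.length ? entries.join('-') : undefined;
    },
  },
};
</code></pre>
<p>With something like that in place, a module "changing color" moves it to a different bundle, but the bundle URLs keep matching the dictionaries the browser already has cached.</p>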
<p><sub>Huge thanks to <a href="https://twitter.com/patmeenan">Pat Meenan</a>, <a href="https://twitter.com/RyanTownsend">Ryan Townsend</a> and <a href="https://twitter.com/jaffathecake">Jake Archibald</a> for reviewing and providing great feedback on this post. (And extra thanks to Jake for helping me figure out how to load the SVG diagrams. This stuff shouldn't be that hard..)</sub></p>
How Chromium's cookies get evicted2024-06-18T00:00:00Zhttps://blog.yoav.ws/posts/how_chromium_cookies_get_evicted/<p>I was recently asked if I knew how Chromium’s cookies were evicted/garbage-collected. My answer was that I don’t know, but I can find out. This post is what I discovered while digging through the code.
Hopefully it will be useful to someone else (or at the very least, to future me).</p>
<p>There's not a lot of prose in this one, mostly code pointers and references..</p>
<p>TL;DR - If you want some of your cookies to stick around over others, make them high priority and secure. If you want to explicitly delete some cookies, override them with an expired cookie that matches their "domain" and "path".</p>
<h2 id="the-cookie-monster%E2%80%99s-limits"><a class="direct-link" href="https://blog.yoav.ws/posts/how_chromium_cookies_get_evicted/#the-cookie-monster%E2%80%99s-limits">#</a> The Cookie Monster’s limits</h2>
<p>As it turns out, Chromium has a <a href="https://source.chromium.org/chromium/chromium/src/+/main:net/cookies/cookie_monster.h;l=8?q=cookie_mo&ss=chromium">cookie monster</a> in its network stack. But even monsters have their limits. In this particular case, there’s a limit to the amount of cookies the cookie monster can hold.</p>
<p>It has <a href="https://source.chromium.org/chromium/chromium/src/+/main:net/cookies/cookie_monster.cc;l=215;drc=0561e2c98b7e4033b14c97dd34d4768306be2f29;bpv=1;bpt=1">limits</a> on the maximum overall number of cookies (3300), as well as on the maximum number of cookies per domain (180). Once those limits are hit, a certain number of cookies gets purged in order to make room for others - 300 overall, and 30 per domain.</p>
<p>Given that cookies are being partitioned, there are also limits per partitioned domain, but currently at least they are identical to the per-domain ones (so, 180 cookies).</p>
<p>Another interesting constant - cookies accessed in the last 30 days are <a href="https://source.chromium.org/chromium/chromium/src/+/main:net/cookies/cookie_monster.cc;l=215;drc=0561e2c98b7e4033b14c97dd34d4768306be2f29;bpv=1;bpt=1">safe from global purge</a>. That is, other websites can’t cause your recent cookies to be purged, but can cause your older ones to go away if the overall cookie limit is exceeded.</p>
<h2 id="eviction"><a class="direct-link" href="https://blog.yoav.ws/posts/how_chromium_cookies_get_evicted/#eviction">#</a> Eviction</h2>
<p>The general eviction mechanism used in Chromium’s Cookie Monster is a variant of Least-Recently-Used (LRU).</p>
<p>Cookies are <a href="https://source.chromium.org/chromium/chromium/src/+/main:net/cookies/cookie_monster.cc;drc=64d81c6bd0aa2132c451b8a0e861e56d1ce3a221;l=2060">sorted</a> based on access recency and the least recently accessed ones get evicted first.</p>
<p>Access time is defined as either <a href="https://source.chromium.org/chromium/chromium/src/+/main:net/cookies/canonical_cookie.cc;l=465;drc=64d81c6bd0aa2132c451b8a0e861e56d1ce3a221;bpv=1;bpt=1">the creation time</a> of the cookie, or the last time the cookie was <a href="https://source.chromium.org/chromium/chromium/src/+/main:net/cookies/cookie_monster.cc;l=1453;drc=0561e2c98b7e4033b14c97dd34d4768306be2f29;bpv=1;bpt=1">used</a> (e.g. on a request header), <a href="https://source.chromium.org/chromium/chromium/src/+/main:net/cookies/cookie_monster.cc;l=1900;drc=0561e2c98b7e4033b14c97dd34d4768306be2f29;bpv=1;bpt=1">give or take a minute</a>.</p>
<p>When a new cookie is set, that <a href="https://source.chromium.org/chromium/chromium/src/+/main:net/cookies/cookie_monster.cc;l=1816;drc=0561e2c98b7e4033b14c97dd34d4768306be2f29;bpv=1;bpt=1">kicks off a garbage collection process</a>, which <a href="https://source.chromium.org/chromium/chromium/src/+/main:net/cookies/cookie_monster.cc;l=2126;drc=0561e2c98b7e4033b14c97dd34d4768306be2f29;bpv=1;bpt=1">purges the least recently accessed ones</a> for the domain, as well as the overall <a href="https://source.chromium.org/chromium/chromium/src/+/main:net/cookies/cookie_monster.cc;l=2165;drc=0561e2c98b7e4033b14c97dd34d4768306be2f29;bpv=1;bpt=1">least recently accessed cookies</a>.</p>
<p>The cookie monster determines its <a href="https://source.chromium.org/chromium/chromium/src/+/main:net/cookies/cookie_monster.cc;l=2055;drc=1adf79d145d0cdb5d6c301f0cf4b2874491c4212;bpv=1;bpt=1">“purge goal”</a> - how many cookies will be deleted in order to make room for more.</p>
<p>Initially, all <a href="https://source.chromium.org/chromium/chromium/src/+/main:net/cookies/cookie_monster.cc;drc=0561e2c98b7e4033b14c97dd34d4768306be2f29;bpv=1;bpt=1;l=2386">expired cookies are deleted</a>. Then the rest of the purge happens in <a href="https://source.chromium.org/chromium/chromium/src/+/main:net/cookies/cookie_monster.cc;l=2082;drc=64d81c6bd0aa2132c451b8a0e861e56d1ce3a221;bpv=1;bpt=1">rounds</a> until the purge goal is met.</p>
<p>The first round targets non-secure, low-priority cookies, and then secure, low-priority ones. At a minimum, <a href="https://source.chromium.org/chromium/chromium/src/+/main:net/cookies/cookie_monster.cc;l=225;drc=64d81c6bd0aa2132c451b8a0e861e56d1ce3a221">30</a> of those will be preserved, with priority given to the secure ones.</p>
<p>Then non-secure medium-priority cookies go on the chopping block, followed by non-secure high-priority ones.</p>
<p>Finally, secure medium-priority and then high-priority are considered.</p>
<p>From the medium-priority ones, at least <a href="https://source.chromium.org/chromium/chromium/src/+/main:net/cookies/cookie_monster.cc;l=226;drc=64d81c6bd0aa2132c451b8a0e861e56d1ce3a221">50</a> will be preserved (with priority given to the secure ones), and at least <a href="https://source.chromium.org/chromium/chromium/src/+/main:net/cookies/cookie_monster.cc;l=226;drc=64d81c6bd0aa2132c451b8a0e861e56d1ce3a221;bpv=1;bpt=1">100</a> high-priority ones will be as well.</p>
<p>Separately, if you want to explicitly delete some older cookies in order to make room for the ones you care about, setting an expired cookie deletes the previous copy and <a href="https://source.chromium.org/chromium/chromium/src/+/main:net/cookies/cookie_monster.cc;drc=0561e2c98b7e4033b14c97dd34d4768306be2f29;bpv=1;bpt=1;l=1777">doesn’t add it back</a>.
For the cookie to match, you want its <a href="https://source.chromium.org/chromium/chromium/src/+/main:net/cookies/canonical_cookie.h;l=226;drc=90cac1911508d3d682a67c97aa62483eb712f69a">"domain" and "path" to be identical</a> to those of the cookie you're aiming to delete.</p>
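<p>From JavaScript, that could look something like this (the cookie name, domain and path here are made up for illustration):</p>
<pre><code>// Overwrite the cookie with an already-expired copy; the "domain" and "path"
// attributes have to match the cookie you're trying to delete.
document.cookie =
  'old_pref=; expires=Thu, 01 Jan 1970 00:00:00 GMT; domain=example.com; path=/settings';
</code></pre>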
<p><em>Note:</em> Cookie priority is a proprietary Chromium feature. It does seem useful, so it’s a shame it wasn’t standardized. By default, cookies are of <a href="https://source.chromium.org/chromium/chromium/src/+/main:net/cookies/canonical_cookie.h;drc=0561e2c98b7e4033b14c97dd34d4768306be2f29;bpv=1;bpt=1;l=431">medium priority</a>.</p>
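<p>If you do want to take advantage of it in Chromium, it's just another cookie attribute. A header sketch (the cookie itself is made up):</p>
<pre><code>Set-Cookie: session=abc123; Secure; HttpOnly; Path=/; Max-Age=2592000; Priority=High
</code></pre>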
<h2 id="summary"><a class="direct-link" href="https://blog.yoav.ws/posts/how_chromium_cookies_get_evicted/#summary">#</a> Summary</h2>
<p>Again, I hope the above is somewhat useful to someone. If you want to make sure specific cookies survive over others, or you want to delete old ones, you now know how to do that (at least in Chromium).</p>
<p>I haven't dug into the way other engines handle cookies, but I suspect that for Safari that happens at the (closed-source) OS level.</p>
<p>Finally, I wish cookie priority was standardized. It seems like a useful feature, and one that I wasn't aware of before digging into the code.</p>
<p><em>Update:</em> <a href="https://twitter.com/mikewest">Mike West</a> tells me that there was a <a href="https://datatracker.ietf.org/doc/html/draft-west-cookie-priority-00">draft to standardize "priority"</a>, but that it
<a href="https://datatracker.ietf.org/doc/minutes-96-httpbis/#:~:text=%23%23%23%23%20Cookie%20Priorities">didn't get Working Group adoption</a>.
Also, the priority bump for "secure" cookies is part of <a href="https://httpwg.org/http-extensions/draft-ietf-httpbis-rfc6265bis.html#section-5.7-9.2.1:~:text=Cookies%20whose%20secure%2Donly%2Dflag%20is%20false%2C%20and%20which%20share%20a%20domain%20field%20with%20more%20than%20a%20predetermined%20number%20of%20other%20cookies.%C2%B6">RFC6265bis</a>.</p>
Improving language negotiation2024-08-20T00:00:00Zhttps://blog.yoav.ws/posts/improving_language_negotiation/<p>Being a multilingual user on the web can sometimes be a frustrating experience. Depending on the device you’re using and the language it happens to be configured to, some websites will try to accommodate you, and give you a translated version of their content, even though you may prefer the non-translated original.</p>
<p>That is why a recent <a href="https://groups.google.com/a/chromium.org/g/blink-dev/c/YTn8pqJDVBg/m/Va4JZR24BgAJ">Intent to experiment</a> around language negotiation caught my eye, as it seemed like a good opportunity to improve on the status quo. It also stirred up some <a href="https://x.com/LeaVerou/status/1822762691355668686">controversy</a> around the tradeoff it is making between user agency, internationalization and privacy.</p>
<p>In this post, I’ll try to cover what the intent is trying to solve, what’s currently broken with language content negotiation on the web, and how I think we can do better.</p>
<h2 id="status-quo"><a class="direct-link" href="https://blog.yoav.ws/posts/improving_language_negotiation/#status-quo">#</a> Status quo</h2>
<p>The <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language">Accept-Language</a> header is sent with every request and contains a priority-ordered list of languages that the user prefers. The server can then choose to use it to reply with one of the preferred languages, and mark that using the <code>Content-Language</code> header.</p>
<aside>
Where does the browser get the user’s preferred languages you ask? That’s a great question!!
<p>Some browsers have settings that enable users to configure their preferred languages, but that’s pretty buried and something I suspect most users aren’t aware of.
What browsers do when users don’t explicitly provide their preferred languages is to generate that list based on a bunch of heuristics: the OS language, OS input languages, user actions around automatic translation and probably more.</p>
<p>The result is that multilingual users end up with a list of languages that may or may not reflect their preferences, and may vary wildly between different devices they use.</p>
</aside>
<p>Sending out an ordered list of preferred languages is not a problem for monolingual users of fairly popular languages, especially if these users’ IP addresses are coming from a country that speaks that language. But for users who speak multiple languages and browse the web from countries where those languages are not dominant, that list can reveal a lot of information.</p>
<p>Combined with the IP address the user’s request is coming from, and with other bits of information, the list of languages can create a user-specific fingerprint that can identify the user across the web. (regardless of their cookie state)</p>
<h2 id="chrome%E2%80%99s-proposal"><a class="direct-link" href="https://blog.yoav.ws/posts/improving_language_negotiation/#chrome%E2%80%99s-proposal">#</a> Chrome’s Proposal</h2>
<p>The Chrome team <a href="https://github.com/explainers-by-googlers/reduce-accept-language?tab=readme-ov-file#a-proposal">wants to fix that</a> by limiting the information sent with requests to the user’s single highest priority language. That’s an approach Safari also broadly <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language#:~:text=such%20as%20in-,Safari%20(always),-and%20Chrome%27s%20incognito">takes</a> (even if it’s <a href="https://github.com/WebKit/standards-positions/issues/338#issuecomment-2096446488">slightly more complex</a> than that), so there’s some precedent there.</p>
<p>Where the Chrome team’s approach differs is by allowing sites to reply with an <a href="https://mnot.github.io/I-D/draft-nottingham-http-availability-hints.html#name-content-language"><code>Avail-Language</code></a> header that contains a list of languages the content is available in. Then if the <code>Content-Language</code> value is one that the user doesn’t understand, the browser can retry the request using a language that is on both the user’s list and the <code>Avail-Language</code> list (if one exists).</p>
<p>To demonstrate the above with an example, let’s say our user understands both French and English, with French being the highest priority language. If that user goes to a site aimed mainly at a Spanish-speaking audience, the request to that site would be sent with an <code>Accept-Language: fr</code> header, and the response may have a <code>Content-Language: es-ES</code> header as well as an <code>Avail-Language: es, en</code> one.</p>
<p>The browser would then understand that the response is in a language the user doesn’t understand, but that a variant of the content that the user could understand exists. It would then send another request with an <code>Accept-Language: en</code> to get the site’s English variant and present it to the user.</p>
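<p>Sketching just the relevant headers, that exchange would look roughly like this:</p>
<pre><code>GET / HTTP/1.1
Accept-Language: fr

HTTP/1.1 200 OK
Content-Language: es-ES
Avail-Language: es, en

(the browser spots the mismatch and retries)

GET / HTTP/1.1
Accept-Language: en

HTTP/1.1 200 OK
Content-Language: en
</code></pre>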
<h3 id="cost"><a class="direct-link" href="https://blog.yoav.ws/posts/improving_language_negotiation/#cost">#</a> Cost</h3>
<p>In theory the above proposal would work similarly to today’s status quo and the user would eventually end up with the language they prefer in front of them.</p>
<p>But that doesn’t come for free.</p>
<p>On the client-side, the users would get their content with a significant delay, due to the request’s retry.</p>
<p>As far as servers go, they would end up sending two responses instead of one in those cases. That can result in some server overload, or at the very least, wasted CPU cycles.</p>
<p>And if I understand the proposal correctly, that cost would be borne every time the user browses to the site, because the site’s <code>Avail-Language</code> response will not be cached and used in future navigations to make better decisions.</p>
<p>In practice, I suspect things would be even worse..</p>
<p>The proposal counts on sites actively adopting <code>Avail-Language</code> as part of their language negotiation, in order to give users content in the language they expect. And while that may happen over time, in the meantime multilingual users will end up seeing sites in languages they do not understand!!</p>
<p>Beyond that, the proposal doesn’t take advantage of the opportunity to improve language negotiation on the web and improve the experience of multilingual users.</p>
<h2 id="a-better-path%3F"><a class="direct-link" href="https://blog.yoav.ws/posts/improving_language_negotiation/#a-better-path%3F">#</a> A better path?</h2>
<h3 id="original-language"><a class="direct-link" href="https://blog.yoav.ws/posts/improving_language_negotiation/#original-language">#</a> Original language</h3>
<p>In my opinion as a tri-lingual user, the above proposal carries over one fundamental mistake from the current language negotiation protocol. It assumes that users have a single “preferred language”.</p>
<p>My theory is that multilingual users have a favorite language <strong><em>per site</em>, which is the site’s original language</strong>, assuming it is one they are fluent in.</p>
<p>It also kinda makes sense - because translation is lossy, as a user you’d want the highest fidelity content you can decipher, which is the content in its original language.</p>
<p>Therefore, I believe that we can avoid the tradeoff between privacy and internationalization/performance altogether! It's perfectly fine for the browser to send only a single language as part of the <code>Accept-Language</code> list! But that language needs to be influenced by the site’s original language.</p>
<p>Taking into account the site’s original language doesn’t reveal any extra information about the user, and hence is perfectly safe from a privacy perspective.</p>
<p>Browsers can know the site’s original language through any (or all) of the following:</p>
<ul>
<li>We could define a new header (e.g. <code>Original-Language</code>) that browser-affiliated crawlers could use to get that information and then distribute it to browsers as a configuration.
<ul>
<li>Maybe we don’t even need a new header and crawlers can detect the default language of the site when a bogus Accept-Language value is sent in the request.</li>
</ul>
</li>
<li>Browsers could use the site’s domain as a heuristic that indicates its original language.</li>
<li>We could generate open source lists mapping sites (or a popular subset of them) to their original languages that various browsers could use. We could generate those lists using the above heuristics and/or site-provided headers.</li>
</ul>
<p>Assuming that the original language is known, the Accept-Language algorithm can be as minimal as the following (a code sketch follows the list):</p>
<ul>
<li>If the site’s original language is included in the user’s preferred languages, send that language in the <code>Accept-Language</code> header</li>
<li>Otherwise, send the user’s highest priority language.</li>
</ul>
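<p>A minimal sketch of that logic (the function name and inputs are made up for illustration):</p>
<pre><code>// userLanguages: the user's priority-ordered preference list, e.g. ['fr', 'en']
// siteOriginalLanguage: whatever the browser learned about the site, or null if unknown
function acceptLanguageFor(userLanguages, siteOriginalLanguage) {
  if (userLanguages.includes(siteOriginalLanguage)) {
    return siteOriginalLanguage;
  }
  return userLanguages[0];
}

acceptLanguageFor(['fr', 'en'], 'en'); // 'en' - the site's original language
acceptLanguageFor(['fr', 'en'], 'es'); // 'fr' - the user's highest priority language
</code></pre>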
<h3 id="passive-resource-requests"><a class="direct-link" href="https://blog.yoav.ws/posts/improving_language_negotiation/#passive-resource-requests">#</a> Passive resource requests</h3>
<p>Beyond the above, I also think we can safely freeze the values of <code>Accept-Language</code> headers on requests for passive subresources (e.g. images, fonts). By “freeze”, I mean we can change the value of these headers to <code>en</code> for all users, regardless of their language preference.</p>
<p>While in theory sites could use language negotiation to e.g. serve different images or fonts to certain users, I suspect that (almost) never happens in practice. We could validate those assumptions by adding <a href="https://chromium.googlesource.com/chromium/src.git/+/HEAD/docs/use_counter_wiki.md">usecounters</a> for resource responses that include a <code>Content-Language</code> header, which indicates the server did some negotiation, and then further investigate these cases.</p>
<p>If freezing of these header values works out, we might be able to even try and remove these headers entirely in the future for passive resources, if we see that no servers rely on them.</p>
<h2 id="conclusion"><a class="direct-link" href="https://blog.yoav.ws/posts/improving_language_negotiation/#conclusion">#</a> Conclusion</h2>
<p>While I believe that Chrome’s proposal is a step in the right direction for user privacy, I think it makes some unnecessary tradeoffs that aren’t ideal for performance or (at least in the practical short term) for internationalization.</p>
<p>The alternative I propose here does not come for free, as browsers would need to maintain “original language” lists or heuristics. But I believe it would provide a better user experience, while giving users better privacy.</p>