Big performance wins can be found by taking a step back and tweaking what you already have. Today’s The Fast and the Curious post explores how we improved the scrolling experience of Chrome on Android, ultimately reducing slow scrolling jank by 2x. Read on to see how we discovered and evaluated the problem, and how that has helped us design a better browser experience going forward.


When measuring the performance of a browser, one might typically think of page load speed or Web Vitals. On mobile, where touch interactions are common, we also prioritize making your interaction with Chrome smooth and responsive, including on new form factors like foldables. A significant recent focus has been reducing jank while you scroll.


We recently improved the scrolling experience of Chrome on Android by 2x by filtering noise and reducing visual jumps in the content presented on screen. To get this result, we had to take a step back and figure out why Chrome on Android was lagging behind Chrome on iOS.


As we compared Chrome across platforms, one observation stood out: scrolling in Chrome on iOS was smooth and consistent, whereas on Android, Chrome’s scrolling didn’t follow your finger as closely. Our metrics, however, were telling us that while jank occurred occasionally, it wasn’t nearly as common as our perception suggested when comparing against Chrome on iOS. So we had ourselves a mystery that needed some investigation.


Investigating input to output rate

Our metrics flagged that we often received input at an inconsistent rate; but since the input rate was greater than the display’s frame rate, there was usually at least one input event to trigger the production of each frame. However, each frame might consume a different number of input events, which could result in inconsistent shifting of content on screen even while scrolling at a fixed speed.


This mismatch between input rate and frame rate is a problem Chrome has had to address before. Internally, we resample input to predict/extrapolate where the finger was at a consistent point in time relative to the frame we want to produce. This should result in each frame representing a consistent amount of time, and should mean smooth scrolling regardless of noise in the input events. The ideal scenario is illustrated in the following diagram, where blue dots are real input events, green dots are resampled input events, and the displayed scroll deltas would fluctuate if you were to use the real input events rather than resampling.





Okay, so we already do resampling. What’s the problem?


A tale of woe and reimplementation

Input resampling inside Chrome (and Android) was added back in 2019 as 90Hz devices emerged and the problem above became more apparent (oscillating between 2 and 1 input events per frame, rather than the consistent 2 input events per frame we usually see on 60Hz devices). Android implemented multiple resampling algorithms (Kalman, linear, etc.) and concluded that linear resampling (drawing a line between two points to compute a velocity, then extrapolating to the given timestamp) was good enough for its use cases. This fixed the problem for most Android apps, but didn’t address Chrome.
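To make the idea concrete, here is a minimal sketch of linear resampling: take the two most recent input events, estimate a velocity, and extrapolate to the frame’s sample time. The names, the InputEvent struct, and the sample numbers are illustrative only, not Chrome’s actual classes or data.

```cpp
// A minimal sketch of linear resampling: estimate a velocity from the two
// most recent input events and extrapolate the finger position to the
// frame's sample time. Names and the InputEvent struct are illustrative,
// not Chrome's actual classes.
#include <cstdint>
#include <cstdio>

struct InputEvent {
  int64_t time_ns;  // event timestamp in nanoseconds
  float y;          // vertical finger position in pixels
};

// Extrapolates (or interpolates) the position at |sample_time_ns|.
float ResampleLinear(const InputEvent& prev, const InputEvent& last,
                     int64_t sample_time_ns) {
  if (last.time_ns == prev.time_ns)
    return last.y;  // no velocity information; avoid division by zero
  const float velocity =  // pixels per nanosecond
      (last.y - prev.y) / static_cast<float>(last.time_ns - prev.time_ns);
  return last.y + velocity * static_cast<float>(sample_time_ns - last.time_ns);
}

int main() {
  // Two events ~8.3 ms apart (roughly a 120Hz digitizer), 30 px apart.
  const InputEvent prev{0, 0.0f};
  const InputEvent last{8'333'333, 30.0f};
  // Ask where the finger should be at the frame's sample time, ~16.7 ms.
  std::printf("resampled y = %.1f px\n",
              ResampleLinear(prev, last, 16'666'667));  // ~60 px
  return 0;
}
```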


For historical reasons, and because web specs require access to raw input, Chrome uses unbuffered input. So as devices appeared whose display refresh rates didn’t line up with their input sampling rates, Chrome had to implement its own version of resampling. Below we see that each frame (measured from input to display) consumes a different number of input events (2 for the first, 3 for the second, and 1 for the third). If we assume input arrives consistently and each event represents exactly 30 pixels of displacement, then ideally we should smooth it out to 60 pixels per frame, as seen below:





However, while investigating the original mystery we discovered that reality was very different from the ideal situation pictured above. The actual input movement on the screen was quite spiky and inconsistent (more than we expected), and our predictor was improving things, but not as much as desired. On the left is the real finger displacement on a screen (each point is an input event); on the right is the content offset produced by our predictor after smoothing (each point is a frame).





Frames are being presented consistently on the right, but the displacement from one frame to the next isn’t consistent (-50 to -40 followed by another -52 being especially drastic). Human fingers don’t move this discretely at frame-level precision; rather, they slide and flex, speeding up or slowing down gradually. So we knew we had a problem here. We dug deeper and found some fundamental differences between Chrome’s implementation and Android’s (which it was supposedly a copy of).


1. Android uses the native C++ MotionEvent timestamp (with nanosecond precision), but Chrome uses the Java MotionEvent.getEventTime & MotionEvent.getHistoricalEventTime methods (millisecond precision). Unfortunately, nanosecond precision was not part of the public API, and rounding to milliseconds can introduce error into our predictor when it computes velocity between event timestamps (see the sketch after this list).

2. Android’s implementation takes care when selecting the two input events, so resampling uses the most relevant ones. Chrome, however, uses a simple FIFO queue of input events, which in rare cases on high refresh rate devices can result in using future events to predict velocity in the past.
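To give a feel for the first point, here is a small illustrative calculation (hypothetical numbers, not Chrome code) showing how millisecond rounding skews the velocity used for extrapolation when events arrive roughly every 8.3 ms:

```cpp
// Illustrative only (hypothetical numbers, not Chrome code): how rounding
// event timestamps to milliseconds skews the velocity used for extrapolation.
#include <cstdio>

int main() {
  const double dy_px = 30.0;         // displacement between two events
  const double dt_true_ms = 8.333;   // ~120Hz digitizer spacing
  const double dt_rounded_ms = 8.0;  // what millisecond-precision timestamps report

  const double v_true = dy_px / dt_true_ms;        // ~3.60 px/ms
  const double v_rounded = dy_px / dt_rounded_ms;  // ~3.75 px/ms

  // Extrapolating 8 ms ahead, the rounded velocity overshoots by ~1.2 px,
  // and the error changes direction depending on which way the rounding went.
  std::printf("true: %.2f px/ms, rounded: %.2f px/ms, error over 8 ms: %.2f px\n",
              v_true, v_rounded, (v_rounded - v_true) * 8.0);
  return 0;
}
```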


We prototyped using Android’s resampling in Chrome, but found it still wasn’t a perfect fit for Chrome’s architecture and resulted in some jank. To improve on it, we experimented with different algorithms, using automation to replay the same input over and over and evaluating the resulting screen displacement curves. After tuning, we landed on an implementation of the 1€ filter that visibly and drastically improved the scrolling experience. With this filter, the screen tracks your finger closely and websites scroll smoothly, preventing jank caused by inconsistent input events. The improvement is visible in our manual validation, on both high-end and low-end devices (here's a Redmi 9A video example).
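For context, the 1€ filter (Casiez et al.) is an adaptive low-pass filter whose cutoff frequency rises with the estimated speed of the signal, so it smooths heavily when the finger moves slowly (where noise is most visible) and follows closely during fast movement (where lag would be most visible). Below is a minimal sketch of the filter applied to one-dimensional scroll positions; the parameter values and sample data are illustrative, not the ones Chrome ships.

```cpp
// A minimal sketch of the 1€ filter (Casiez et al.) applied to 1-D scroll
// positions. Parameter values and sample data are illustrative defaults,
// not the parameters Chrome ships.
#include <cmath>
#include <cstdio>

class OneEuroFilter {
 public:
  OneEuroFilter(double min_cutoff, double beta, double d_cutoff = 1.0)
      : min_cutoff_(min_cutoff), beta_(beta), d_cutoff_(d_cutoff) {}

  // |x| is the raw position in pixels, |t| the event timestamp in seconds.
  double Filter(double x, double t) {
    if (!initialized_) {
      initialized_ = true;
      prev_x_raw_ = prev_x_hat_ = x;
      prev_t_ = t;
      return x;
    }
    const double dt = t - prev_t_;
    prev_t_ = t;

    // Filter the speed first, then adapt the cutoff to it: slow movement gets
    // a low cutoff (more smoothing), fast movement a high cutoff (less lag).
    const double dx = (x - prev_x_raw_) / dt;
    const double dx_hat = Lerp(prev_dx_hat_, dx, Alpha(d_cutoff_, dt));
    const double cutoff = min_cutoff_ + beta_ * std::abs(dx_hat);
    const double x_hat = Lerp(prev_x_hat_, x, Alpha(cutoff, dt));

    prev_x_raw_ = x;
    prev_dx_hat_ = dx_hat;
    prev_x_hat_ = x_hat;
    return x_hat;
  }

 private:
  static double Alpha(double cutoff, double dt) {
    constexpr double kPi = 3.14159265358979323846;
    const double tau = 1.0 / (2.0 * kPi * cutoff);
    return 1.0 / (1.0 + tau / dt);
  }
  static double Lerp(double prev, double next, double alpha) {
    return alpha * next + (1.0 - alpha) * prev;
  }

  const double min_cutoff_, beta_, d_cutoff_;
  bool initialized_ = false;
  double prev_x_raw_ = 0, prev_x_hat_ = 0, prev_dx_hat_ = 0, prev_t_ = 0;
};

int main() {
  OneEuroFilter filter(/*min_cutoff=*/1.0, /*beta=*/0.01);
  // Noisy scroll positions sampled every ~8.3 ms.
  const double samples[] = {0, 28, 62, 88, 124, 150};
  for (int i = 0; i < 6; ++i)
    std::printf("%.1f ", filter.Filter(samples[i], i * 0.00833));
  std::printf("\n");
  return 0;
}
```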


Going forward!

In Android 14, the nanosecond API for Java MotionEvents will be publicly exposed in the SDK, so Chrome (and other apps with unbuffered input) will be able to call it. We also developed new metrics that track the quality of the scroll predictor's frames, by creating a test app which introduced pixel-level differences between frames (and no other form of jank) and running experiments to see what people would notice. This analysis can be read about here and will be used going forward for more exciting performance wins and to make this a visible area to track against regressions. In the end, after tuning and enabling the 1€ filter, our metrics show a 2x reduction in visible jank while scrolling slowly! This improvement is going live as the default in M116, will be launched all the way back to M110, and brings Chrome on Android on par with Chrome on iOS!


The moral of the story: sometimes metrics don’t cover all the cases, and taking a step back, investigating from the top down, and getting the lay of the land can end with a superior scrolling experience for users.


Post by: Stephen Nusko, Chrome Software Engineer




Since the beginning of Chrome, benchmarks have been a key way by which we drive performance optimizations that benefit users. The most relevant web benchmarks today are Speedometer, MotionMark, and JetStream. Over the last year Chrome has invested in optimizing against these specific benchmarks and has just achieved our highest scores across all three. These gains came from a combination of large projects and small improvements. In today’s The Fast and the Curious post, we want to share just some of the ways we drove these improvements in Chrome.

Announcing our brand new mid-tier compiler: Maglev 

We’re bringing a new mid-tier compiler to Chrome. Maglev is a just-in-time compiler that can quickly generate performant machine code for all relevant functions within the first one-hundredth of a second. It reduces overall CPU time spent compiling code while also saving battery life. Our measurements show Maglev has provided a 7.5 percent improvement on JetStream and a 5 percent improvement on Speedometer. Maglev will start rolling out in Chrome version 114, which begins release on June 5.

Speedometer 

Speedometer measures the responsiveness of websites by putting various JavaScript UI frameworks through their paces. Just over a year ago we shared details about how we increased our score from 100 to over 300 from Chrome version 40 to version 101. Since then, across 13 Chrome releases, we’ve achieved our new highest Speedometer score of 491. In addition to Maglev, the V8 team has achieved this score through both small adjustments, such as optimized function calls, and major, multi-quarter projects. 

A speedometer visual shows a 491 score for the Speedometer browser benchmark, which measures the responsiveness of websites. This is up from a score of 330 in the past year for Chrome.
Chrome 116.0.5803.2 running on an M2 MacBook Air with Maglev enabled


MotionMark

MotionMark is designed to test how much browser graphics systems can render at high frame rates. Chrome’s graphics and rendering teams have tracked over 20 optimizations since the start of the year, and more than half are available today. Together, these optimizations have almost tripled performance. Some highlights include improvements to Canvas performance, profile-guided optimization, GPU task scheduling, and layer compositing. We also created a novel algorithm for dynamic multisample anti-aliasing and out-of-process 2D canvas rasterization for improved parallelism.

A speedometer visual shows a 4821.30 score for the MotionMark browser benchmark, which tests browser graphics systems. This marks a nearly 3X improvement in the last year for Chrome.
Chrome M115.0.5773.4 running on a 13” M2 MacBook Pro

Jetstream 

JetStream is a JavaScript and WebAssembly benchmark suite focused on advanced web applications. Many of the updates that we made for Speedometer also drove significant improvements on Jetstream as we optimized the V8 engine. In addition to these enhancements, Maglev drove the biggest gains in this benchmark. 

A speedometer visual shows a 330.939 score for the JetStream 2 browser benchmark, which focuses on advanced web applications. This improvement is largely driven by Maglev, a new just-in-time compiler in Chrome.
Chrome 116.0.5803.2 running on an M2 MacBook Air with Maglev enabled


Looking ahead


Because we’re optimizing against these benchmarks, it’s essential that these improvements translate to real user benefits, which is why we’re investing, along with other browsers, in creating the next generation of benchmarks. This has been an ongoing collaboration, and we’re excited to turn our efforts toward this new target in the coming year.


We hope you all enjoy a faster Chrome! 




Posted by Thomas Nattestad, Product Manager



From the beginning, we designed Chrome to be efficient. Being efficient is not just about loading pages as fast as possible; it’s also about doing so with the fewest resources possible. Today’s The Fast and the Curious post explores how we improved Chrome to maximize battery life on Mac, so you can enjoy browsing and watching videos longer than ever before.


With the latest release of Chrome, we’ve made it possible to do more on your MacBook on a single charge thanks to a ton of optimizations under the hood. In our testing, we found that you can browse for 17 hours or watch YouTube for 18 hours on a MacBook Pro (13", M2, 2022). And with Chrome’s Energy Saver mode enabled, you can browse for an additional 30 minutes on battery(1). Of course, we care deeply about all our users, not just those with the latest hardware. That’s why you’ll see performance gains on older models as well.


Here’s a closer look at some of the changes we made:
 
Fine-tuning iframes

We realized that many iframes live for just a few seconds. As a result, we fine-tuned the garbage collection and memory compression heuristics for recently created iframes. This means less energy is spent reducing short-term memory usage (with no impact on long-term memory usage).



Tweaking timers 

JavaScript timers were introduced at the beginning of the Web’s history. Since then, Web developers have gained access to more efficient APIs that achieve the same (or better!) results. But JavaScript timers still drive a large proportion of a Web page’s power consumption. As a result, we tweaked the way they fire in Chrome so the CPU wakes up less often.


Similarly, we identified opportunities to cancel internal timers when they’re no longer needed, reducing the number of times that the CPU is woken up. 


Streamlining data structures

We identified data structures in which there were frequent accesses with the same key and optimized their access pattern.
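As a purely illustrative sketch of this kind of change (hypothetical types and names, not Chrome’s actual data structures), one common pattern is to memoize the most recent lookup in front of a hash map when consecutive accesses tend to repeat the same key:

```cpp
// Illustrative sketch: a tiny "last lookup" cache in front of a hash map,
// useful when consecutive accesses frequently repeat the same key.
// Hypothetical types and names; not Chrome code.
#include <cstdio>
#include <string>
#include <unordered_map>

class CachedRegistry {
 public:
  void Set(const std::string& key, int value) {
    map_[key] = value;
    last_key_.clear();  // invalidate the one-entry cache
    last_value_ = nullptr;
  }

  // Returns nullptr if the key is absent.
  const int* Get(const std::string& key) {
    if (last_value_ && key == last_key_)
      return last_value_;  // hit: skip the hash lookup entirely
    auto it = map_.find(key);
    if (it == map_.end())
      return nullptr;
    last_key_ = key;
    last_value_ = &it->second;
    return last_value_;
  }

 private:
  std::unordered_map<std::string, int> map_;
  std::string last_key_;
  const int* last_value_ = nullptr;
};

int main() {
  CachedRegistry registry;
  registry.Set("tab-42", 7);
  // Repeated accesses with the same key hit the one-entry cache.
  for (int i = 0; i < 3; ++i)
    if (const int* v = registry.Get("tab-42")) std::printf("%d\n", *v);
  return 0;
}
```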



Eliminating unnecessary redraws

We navigated real-world sites with a bot and identified Document Object Model (DOM) change patterns that don’t affect pixels on the screen. We modified Chrome to detect those early and bypass the unnecessary style, layout, paint, raster and GPU steps. We implemented similar optimizations for changes to the Chrome UI.

There’s always more work to be done. With our open-source benchmark suite, we’ll be able to tap the broader community of developers to help us improve Chrome’s power consumption in 2023 and beyond.


Posted by François Doray, Software Developer, Chrome

___
1 Testing conducted in February 2023 using Chrome 110.0.5481.100 on a MacBook Pro (13”, M2, 2022 with 8 GB RAM running MacOS Ventura 13.2.1) and measured using our open-source benchmark suite.





Last week we released a blog post about our improvements in Chrome speed over the past year culminating with the M99 release of Chrome. We wanted to follow up by going in depth on how we achieved this milestone in browser performance.

Since the launch of Chrome in 2008, one of our core principles has been to build the fastest browser, whether you're on your phone or laptop. We have never strayed from our performance mission, and are always analyzing and optimizing every part of Chrome. We're proud to announce that Chrome scores over 300 on Apple’s Speedometer 2.0 benchmark suite on the M1 MacBook, the highest score we’ve ever seen. In this The Fast and the Curious post we'll go behind the scenes to share all the work that went into making Chrome blazingly fast.

“If you can’t measure it you can’t improve it” – this sentiment has driven a large part of our work to improve Chrome’s performance since the early days. For measuring browser performance, there is a long history of benchmarks that aim to provide test workloads browsers can use to track their performance. Making these benchmarks both reflective of the real, ever-changing world and consistent over time is a challenge. Chrome uses a combination of internal benchmarking infrastructure and public, industry-standard benchmarks to continuously measure Chrome’s performance. For comparing browsers’ JavaScript performance, Apple’s Speedometer 2.0 benchmark is the most reflective of the real world, and the most broadly used today.

We’ve been tracking our performance on Speedometer 2.0 ever since it came out:




Beginning with the M87 release, Chrome shipped on M1-based Macs, and we began measuring Chrome’s speed on the new CPU, reflected in the red line above.

Since 2015, we’ve been measuring Chrome’s Speedometer scores on a 13-inch MacBook. In the graph above, you can see just some of the many projects that have helped make a dramatic improvement in performance. You can learn about fast lookups, the Ignition + TurboFan compilers, blazingly fast parsing, faster JS calls, Spectre, Pointer Compression, Short builtins, Sparkplug and much more on V8.dev. You’ll notice some projects actually decrease our Speedometer score, as building an entire browser is about managing trade-offs. For example, with pointer compression, we were willing to take a small performance hit for the large memory savings it provided. Similarly, when the Spectre CPU vulnerability hit, we traded off performance to help guarantee the safety of our users.

The result of years of work has been an 83% improvement in Speedometer score, a dramatic improvement we are happy to deliver to our users. With Apple’s introduction of the M1 CPU, combined with Sparkplug and LTO+PGO, Chrome now scores over 300 - the highest score any browser has ever achieved \o/.

We are excited to achieve this milestone in performance and look forward to delivering even more performance improvements with each release. Stay tuned to this blog to stay up to date on all things speed.

Posted by Thomas Nattestad, Chrome Product Manager

Footnote: Data source for M1 MacBook statistics: Speedometer 2.0 comparing Chrome 99.0.4812.0 --enable-features=CanvasOopRasterization --use-cmd-decoder=passthrough vs. Safari 15.2 17612.3.6.1.6 on a MacBook Pro (14", 2021), Apple M1 Max, 10 cores (8 performance, 2 efficiency), 32 GPU cores, 64 GB device.




Every day, billions of people around the world turn to Chrome to get things done quickly on their devices, whether shopping for a new pair of headphones or pulling together a sales report for work. Nothing is more frustrating than having a slow experience while browsing the web. That’s why Chrome has always been focused on building the fastest possible browser since its launch in 2008, without compromising on feature functionality or security. In our first The Fast and the Curious post of 2022, we are thrilled to celebrate how in the M99 release of Chrome we were able to substantially increase the speed of Chrome across all major platforms.

We go deep on every platform where Chrome runs to provide the fastest possible experience. We’re excited to announce that in M99, Chrome on Mac has achieved the highest score to date of any browser – 300 – in Apple’s Speedometer browser responsiveness benchmark.

Building on many performance changes over the last year, we enabled ThinLTO in M99, a build optimization technique that inlines speed-critical parts of the code base, even when they span multiple files or libraries. The result? An additional across-the-board speed bump that makes Chrome 7% faster than current builds of Safari. Combined with recent graphics optimizations (namely, pass-through decoder and out-of-process rasterization), our tests have also shown Chrome’s graphics performance to be 15% faster than Safari. Overall, since launching Chrome on M1-based Macs in late 2020, Chrome is now 43% faster than it was just 17 months ago!




Two of the other recent major contributors to Chrome’s speed are the V8 Sparkplug compiler and short builtin calls. Sparkplug is a new mid-tier JavaScript compiler for V8 that generates efficient code with low compilation overhead. Short builtin calls are used by the V8 JavaScript engine to optimize the placement of generated code inside the device’s memory. This technique boosts performance by avoiding indirect jumps when calling functions and makes a substantial difference on Apple M1-based Macs.

Chrome continues to get faster on Android as well. Loading a page now takes 15% less time, thanks to prioritizing critical navigation moments on the browser user interface thread. Last year we also reduced startup time for Chrome on Android by 13% using Freeze-Dried Tabs. This approach conserves resources across the board by using a lightweight version of tabs on load, while the actual tab loads in the background. Finally, we were able to improve speed and memory usage using Isolated Splits, which improved startup time by preloading the majority of the browser process code on a background thread.

We know that benchmarks are just one of many ways of measuring the speed of a browser. At the end of the day, what matters most is that Chrome is actually faster and more efficient in everyday usage, so we’ll continue to invest in innovative performance improvements that push the envelope of what’s possible in modern computing.

Posted by Max Christoff, Senior Director, Chrome Engineering


Data source for Mac statistics: Speedometer 2.0 comparing Chrome 99.0.4812.0 --enable-features=CanvasOopRasterization --use-cmd-decoder=passthrough vs. Safari 15.2 17612.3.6.1.6 on a MacBook Pro (14", 2021), Apple M1 Max, 10 cores (8 performance, 2 efficiency), 32 GPU cores, 64 GB device connected to power.

Data source for Android statistics: Real-world data anonymously aggregated from Chrome clients.




Whether you prefer organizing your browser with tab groups, naming your windows, tab search, or another method, you have lots of features that help you get to the tabs you want. In this The Fast and the Curious post, we describe how we use information about which windows are visible to you to optimize Chrome, leading to 25.8% faster startup and 4.5% fewer crashes.


Background

For several years, to improve the user experience, Chrome has lowered the priority of background tabs[1]. For example, JavaScript is throttled in background tabs, and these tabs don’t render web content. This reduces CPU, GPU and memory usage, which leaves more memory, CPU and GPU for foreground tabs that the user actually sees. However, the logic was limited to tabs that weren't focused in their window, or windows that were minimized or otherwise moved offscreen.

Through experiments, we found that nearly 20% of Chrome windows are completely covered by other windows, i.e., occluded. If these occluded windows were treated like background tabs, our hypothesis was that we would see significant performance benefits. So, around three years ago, we started working on a project to track the occlusion state of each Chrome window in real time, and lower the priority of tabs in occluded windows. We called this project Native Window Occlusion, because we had to know about the location of native, non-Chrome windows on the user’s screen. (The location information is discarded immediately after it is used in the occlusion calculation.)

Calculating Native Window Occlusion

The Windows OS doesn’t provide a direct way to find out if a window is completely covered by other windows, so Chrome has to figure it out on its own. If we only had to worry about other Chrome windows, this would be simple because we know where Chrome windows are, but we have to consider all the non-Chrome windows a user might have open, and know about anything that happens that might change whether Chrome windows are occluded or not.

There are two main pieces to keeping track of which Chrome windows are occluded. The first is the occlusion calculation, which consists of iterating over the open windows on the desktop in z-order (front to back) and checking whether the windows in front of a Chrome window completely cover it. The second piece is deciding when to do the occlusion calculation.

Calculating Occlusion

In theory, figuring out which windows are occluded is fairly simple. In practice, however, there are lots of complications, such as multi-monitor setups, virtual desktops, non-opaque windows, and even cloaked windows(!). This needs to be done with great care, because if we decide that a window is occluded when in fact it is visible to the user, then the area where the user expects to see web contents will be white. We also don’t want to block the UI thread while doing the occlusion calculation, because that could reduce the responsiveness of Chrome and degrade the user experience. So, we compute occlusion on a separate thread, as follows:
  1. Ignore minimized windows, since they’re not visible.
  2. Mark Chrome windows on a different virtual desktop as occluded.
  3. Compute the virtual screen rectangle, which combines the display monitors. This is the unoccluded screen rectangle.
  4. Iterate over the open windows on the desktop from front to back, ignoring invisible windows, transparent windows, floating windows (windows with style WS_EX_TOOLWINDOW), cloaked windows, windows on other virtual desktops, non-rectangular windows[2], etc. Ignoring these kinds of windows may cause some occluded windows to be considered visible (false negatives), but importantly it won’t lead to treating visible windows as occluded (false positives). For each window:
  • Subtract the window’s area from the unoccluded screen rectangle.
  • If the window is a Chrome window, check whether its area overlapped the unoccluded area. If it didn’t, the Chrome window is completely covered by the windows in front of it, so it is occluded.
  • Keep iterating until all Chrome windows have been processed. At this point, any Chrome window that we haven’t marked occluded is visible, and we’re done computing occlusion. Whew! Now we post a task to the UI thread to update the visibility of the Chrome windows.

This is all done without synchronization locks, so the occlusion calculation has minimal effect on the UI thread, i.e., it will never block the UI thread and degrade the user experience. A simplified sketch of the core loop appears below; for more detailed implementation information, see the documentation.
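As a rough sketch of steps 3 and 4 (heavily simplified and illustrative only; it ignores virtual desktops, cloaked windows, transparency, and the other special cases above, and it is not Chromium’s actual implementation), the core loop can be expressed with Win32 window enumeration and GDI region subtraction:

```cpp
// Heavily simplified sketch of the occlusion walk: subtract each opaque
// window's rect from the unoccluded screen region, front to back, and mark
// any Chrome window whose rect no longer intersects that region as occluded.
// Real Chromium also handles virtual desktops, cloaked windows, transparency,
// non-rectangular windows, etc.
#include <windows.h>
#include <set>
#include <vector>

struct OcclusionState {
  HRGN unoccluded;                // screen area not yet covered by any window
  std::set<HWND> chrome_windows;  // windows we care about (assumed known)
  std::vector<HWND> occluded;     // result
};

// EnumWindows hands back top-level windows; in practice they arrive front to
// back in z-order.
static BOOL CALLBACK OnWindow(HWND hwnd, LPARAM lparam) {
  auto* state = reinterpret_cast<OcclusionState*>(lparam);
  if (!IsWindowVisible(hwnd) || IsIconic(hwnd))
    return TRUE;  // skip hidden and minimized windows

  RECT rect;
  if (!GetWindowRect(hwnd, &rect))
    return TRUE;
  HRGN window_rgn = CreateRectRgnIndirect(&rect);

  if (state->chrome_windows.count(hwnd)) {
    // If this Chrome window doesn't intersect what's still unoccluded,
    // the windows in front of it already cover it completely.
    HRGN intersection = CreateRectRgn(0, 0, 0, 0);
    if (CombineRgn(intersection, window_rgn, state->unoccluded, RGN_AND) ==
        NULLREGION) {
      state->occluded.push_back(hwnd);
    }
    DeleteObject(intersection);
  }

  // Every window in front covers part of the screen for the windows behind it.
  CombineRgn(state->unoccluded, state->unoccluded, window_rgn, RGN_DIFF);
  DeleteObject(window_rgn);
  return TRUE;  // keep enumerating
}

std::vector<HWND> ComputeOccludedChromeWindows(std::set<HWND> chrome_windows) {
  // Step 3: start from the virtual screen rectangle spanning all monitors.
  const int x = GetSystemMetrics(SM_XVIRTUALSCREEN);
  const int y = GetSystemMetrics(SM_YVIRTUALSCREEN);
  const int w = GetSystemMetrics(SM_CXVIRTUALSCREEN);
  const int h = GetSystemMetrics(SM_CYVIRTUALSCREEN);

  OcclusionState state{CreateRectRgn(x, y, x + w, y + h),
                       std::move(chrome_windows), {}};
  EnumWindows(&OnWindow, reinterpret_cast<LPARAM>(&state));
  DeleteObject(state.unoccluded);
  return state.occluded;
}
```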


Deciding When to Calculate Occlusion

We don’t want to continuously calculate occlusion because it would degrade the performance of Chrome, so we need to know when a window might become visible or occluded. Fortunately, Windows lets you track various system events, like windows moving or getting resized/maximized/minimized. The occlusion-calculation thread tells Windows that it wants to track those events, and when notified of an event, it examines the event to decide whether to do a new occlusion calculation. Because we may get several events in a very short time, we don’t calculate occlusion more than once every 16 milliseconds, which corresponds to the time a single frame is displayed, assuming a frame rate of 60 frames per second (fps).
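A minimal sketch of that kind of event coalescing (illustrative only, not Chromium’s actual scheduling code) might look like this:

```cpp
// Illustrative sketch (not Chromium's actual code): coalesce bursts of window
// events so occlusion is recomputed at most once every 16 ms, while making
// sure the last event in a burst still triggers a trailing recomputation.
#include <chrono>

class OcclusionThrottle {
 public:
  using Clock = std::chrono::steady_clock;

  // Called for every relevant window event. Returns true if the caller should
  // recompute occlusion now; otherwise the request is remembered as pending.
  bool OnWindowEvent() {
    const auto now = Clock::now();
    if (now - last_compute_ >= kMinInterval) {
      last_compute_ = now;
      pending_ = false;
      return true;
    }
    pending_ = true;  // recompute later, e.g. from a ~16 ms delayed task
    return false;
  }

  // Called by that delayed task so the final event of a burst isn't lost.
  bool TakePendingRequest() {
    const bool had_pending = pending_;
    pending_ = false;
    if (had_pending)
      last_compute_ = Clock::now();
    return had_pending;
  }

 private:
  static constexpr std::chrono::milliseconds kMinInterval{16};
  Clock::time_point last_compute_{};  // default epoch: first event always fires
  bool pending_ = false;
};
```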

Some of the events we listen for are windows getting activated or deactivated, windows moving or resizing, the user locking or unlocking the screen, turning off the monitor, etc. We don’t want to calculate occlusion more than necessary, but we don’t want to miss an event that causes a window to become visible, because if we do, the user will see a white area where their web contents should be. It’s a delicate balance[3].

The events we listen for are focused on whether a Chrome window is occluded. For example, moving the mouse generates a lot of events, and cursors generate an event for every blink, so we ignore events that aren’t for window objects. We also ignore events for most popup windows, so that tooltips getting shown doesn’t trigger an occlusion calculation.

The occlusion thread tells Windows that it wants to know about various Windows events. The UI thread tells Windows that it wants to know when there are major state changes, e.g., the monitor is powered off, or the user locks the screen.





Results

This feature was developed behind an experiment to measure its effect and rolled out to 100% of Chrome users on Windows in October 2020 as part of the M86 release. Our metrics show significant performance benefits with the feature turned on, including 25.8% faster startup and 4.5% fewer crashes.
One reason for the startup and first contentful paint improvements is that when Chrome restores two or more full-screen windows at startup, one of the windows is likely to be occluded. Chrome will now skip much of the work for that window, thus saving resources for the more important foreground window.

Posted by David Bienvenu, Chrome Developer

Data source for all statistics: Real-world data anonymously aggregated from Chrome clients.
[1] Note that certain tabs are exempt from having their priority lowered, e.g., tabs playing audio or video.
[2] Non-rectangular windows complicate the calculations and were thought to be rare, but it turns out non-rectangular windows are common on Windows 7, due to some quirks of the default Windows 7 theme.
[3] When this was initially launched, we quickly discovered that Citrix users were getting white windows whenever another user locked their screen, due to Windows sending us session changed notifications for sessions that were not the current session. For the details, look here.