Big performance wins can be found by taking a step back and tweaking what you already have. Today’s The Fast and the Curious post explores how we improved the scrolling experience of Chrome on Android, ultimately reducing slow scrolling jank by 2x. Read on to see how we discovered and evaluated the problem, and how that has helped us design a better browser experience going forward.


When measuring the performance of a browser, one might typically think of page load speed or Web Vitals. On mobile, where touch interactions are common, we also prioritize making your interaction with Chrome smooth and responsive, including on new form factors like foldables. A significant recent focus has been reducing jank while you scroll.


We recently improved the scrolling experience of Chrome on Android by 2x by filtering noise and reducing visual jumps in the content presented on screen. To get this result, we had to take a step back and figure out why Chrome on Android was lagging behind Chrome on iOS.


As we compared Chrome across platforms, one observation stood out: scrolling in Chrome on iOS was smooth and consistent, whereas on Android, Chrome’s scrolling didn’t follow your finger as closely. Our metrics, however, were telling us that while jank occurred occasionally, it wasn’t nearly as common as our perception suggested when comparing against Chrome on iOS. So we had ourselves a mystery that needed some investigation.


Investigating input to output rate

Our metrics flagged that we often received input at an inconsistent rate; but since the input rate was greater than the display’s frame rate, there was usually at least one input event to trigger the production of each frame. However, each frame might consume a different number of input events, which could result in inconsistent shifting of content on screen even while scrolling at a fixed speed.


This mismatch between input rate and frame rate is a problem Chrome has had to address before. Internally, we resample input to predict/extrapolate where the finger was at a consistent point in time relative to the frame we want to produce. This should result in each frame representing a consistent amount of time, and should mean smooth scrolling regardless of noise in the input events. The ideal scenario is illustrated in the following diagram, where blue dots are real input events, green dots are resampled input events, and the displayed scroll deltas would fluctuate if you were to use the real input events rather than resampling.





Okay, so we already do resampling. What’s the problem?


A tale of woe and reimplementation

Input resampling inside Chrome (and Android) was added back in 2019 as 90Hz devices emerged and the problem above became more apparent (oscillating between 2 and 1 input events per frame, rather than the consistent 2 input events per frame we usually see on 60Hz devices). Android implemented multiple resampling algorithms (Kalman, linear, etc.) and concluded that linear resampling (drawing a line between two points to compute a velocity, then extrapolating to the given timestamp) was good enough for its use cases. This fixed the problem for most Android apps, but didn’t address Chrome.
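To make the idea concrete, here is a minimal sketch of linear resampling: take the two most recent input events, estimate a velocity, and extrapolate to the frame’s sample time. The names, the InputEvent struct, and the sample numbers are illustrative only, not Chrome’s actual classes or data.

```cpp
// A minimal sketch of linear resampling: estimate a velocity from the two
// most recent input events and extrapolate the finger position to the
// frame's sample time. Names and the InputEvent struct are illustrative,
// not Chrome's actual classes.
#include <cstdint>
#include <cstdio>

struct InputEvent {
  int64_t time_ns;  // event timestamp in nanoseconds
  float y;          // vertical finger position in pixels
};

// Extrapolates (or interpolates) the position at |sample_time_ns|.
float ResampleLinear(const InputEvent& prev, const InputEvent& last,
                     int64_t sample_time_ns) {
  if (last.time_ns == prev.time_ns)
    return last.y;  // no velocity information; avoid division by zero
  const float velocity =  // pixels per nanosecond
      (last.y - prev.y) / static_cast<float>(last.time_ns - prev.time_ns);
  return last.y + velocity * static_cast<float>(sample_time_ns - last.time_ns);
}

int main() {
  // Two events ~8.3 ms apart (roughly a 120Hz digitizer), 30 px apart.
  const InputEvent prev{0, 0.0f};
  const InputEvent last{8'333'333, 30.0f};
  // Ask where the finger should be at the frame's sample time, ~16.7 ms.
  std::printf("resampled y = %.1f px\n",
              ResampleLinear(prev, last, 16'666'667));  // ~60 px
  return 0;
}
```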


For historical reasons, and because web specs require access to raw input, Chrome uses unbuffered input. So as devices appeared whose display refresh rates didn’t line up with their input sampling rates, Chrome had to implement its own version of resampling. Below we see that each frame (measured from input to display) consumes a different number of input events (2 for the first, 3 for the second, and 1 for the third). If we assume input arrives consistently and each event represents exactly 30 pixels of displacement, then ideally we should smooth it out to 60 pixels per frame, as seen below:





However, while investigating the original mystery we discovered that reality was very different from the ideal situation pictured above. The actual input movement on the screen was quite spiky and inconsistent (more than we expected), and our predictor was improving things, but not as much as desired. On the left is the real finger displacement on a screen (each point is an input event); on the right is the content offset produced by our predictor after smoothing (each point is a frame).





Frames are being presented consistently on the right, but the displacement from one frame to the next isn’t consistent (-50 to -40 followed by another -52 being especially drastic). Human fingers don’t move this discretely at frame-level precision; rather, they slide and flex, speeding up or slowing down gradually. So we knew we had a problem here. We dug deeper and found some fundamental differences between Chrome’s implementation and Android’s (which it was supposedly a copy of).


1. Android uses the native C++ MotionEvent timestamp (with nanosecond precision), but Chrome uses the Java MotionEvent.getEventTime & MotionEvent.getHistoricalEventTime methods (millisecond precision). Unfortunately, nanosecond precision was not part of the public API, and rounding to milliseconds can introduce error into our predictor when it computes velocity between event timestamps (see the sketch after this list).

2. Android’s implementation takes care when selecting the two input events, so resampling uses the most relevant ones. Chrome, however, uses a simple FIFO queue of input events, which in rare cases on high refresh rate devices can result in using future events to predict velocity in the past.
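To give a feel for the first point, here is a small illustrative calculation (hypothetical numbers, not Chrome code) showing how millisecond rounding skews the velocity used for extrapolation when events arrive roughly every 8.3 ms:

```cpp
// Illustrative only (hypothetical numbers, not Chrome code): how rounding
// event timestamps to milliseconds skews the velocity used for extrapolation.
#include <cstdio>

int main() {
  const double dy_px = 30.0;         // displacement between two events
  const double dt_true_ms = 8.333;   // ~120Hz digitizer spacing
  const double dt_rounded_ms = 8.0;  // what millisecond-precision timestamps report

  const double v_true = dy_px / dt_true_ms;        // ~3.60 px/ms
  const double v_rounded = dy_px / dt_rounded_ms;  // ~3.75 px/ms

  // Extrapolating 8 ms ahead, the rounded velocity overshoots by ~1.2 px,
  // and the error changes direction depending on which way the rounding went.
  std::printf("true: %.2f px/ms, rounded: %.2f px/ms, error over 8 ms: %.2f px\n",
              v_true, v_rounded, (v_rounded - v_true) * 8.0);
  return 0;
}
```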


We prototyped using Android’s resampling in Chrome, but found it still wasn’t a perfect fit for Chrome’s architecture and resulted in some jank. To improve on it, we experimented with different algorithms, using automation to replay the same input over and over and evaluating the resulting screen displacement curves. After tuning, we landed on an implementation of the 1€ filter that visibly and drastically improved the scrolling experience. With this filter, the screen tracks your finger closely and websites scroll smoothly, preventing jank caused by inconsistent input events. The improvement is visible in our manual validation, on both high-end and low-end devices (here's a Redmi 9A video example).
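For context, the 1€ filter (Casiez et al.) is an adaptive low-pass filter whose cutoff frequency rises with the estimated speed of the signal, so it smooths heavily when the finger moves slowly (where noise is most visible) and follows closely during fast movement (where lag would be most visible). Below is a minimal sketch of the filter applied to one-dimensional scroll positions; the parameter values and sample data are illustrative, not the ones Chrome ships.

```cpp
// A minimal sketch of the 1€ filter (Casiez et al.) applied to 1-D scroll
// positions. Parameter values and sample data are illustrative defaults,
// not the parameters Chrome ships.
#include <cmath>
#include <cstdio>

class OneEuroFilter {
 public:
  OneEuroFilter(double min_cutoff, double beta, double d_cutoff = 1.0)
      : min_cutoff_(min_cutoff), beta_(beta), d_cutoff_(d_cutoff) {}

  // |x| is the raw position in pixels, |t| the event timestamp in seconds.
  double Filter(double x, double t) {
    if (!initialized_) {
      initialized_ = true;
      prev_x_raw_ = prev_x_hat_ = x;
      prev_t_ = t;
      return x;
    }
    const double dt = t - prev_t_;
    prev_t_ = t;

    // Filter the speed first, then adapt the cutoff to it: slow movement gets
    // a low cutoff (more smoothing), fast movement a high cutoff (less lag).
    const double dx = (x - prev_x_raw_) / dt;
    const double dx_hat = Lerp(prev_dx_hat_, dx, Alpha(d_cutoff_, dt));
    const double cutoff = min_cutoff_ + beta_ * std::abs(dx_hat);
    const double x_hat = Lerp(prev_x_hat_, x, Alpha(cutoff, dt));

    prev_x_raw_ = x;
    prev_dx_hat_ = dx_hat;
    prev_x_hat_ = x_hat;
    return x_hat;
  }

 private:
  static double Alpha(double cutoff, double dt) {
    constexpr double kPi = 3.14159265358979323846;
    const double tau = 1.0 / (2.0 * kPi * cutoff);
    return 1.0 / (1.0 + tau / dt);
  }
  static double Lerp(double prev, double next, double alpha) {
    return alpha * next + (1.0 - alpha) * prev;
  }

  const double min_cutoff_, beta_, d_cutoff_;
  bool initialized_ = false;
  double prev_x_raw_ = 0, prev_x_hat_ = 0, prev_dx_hat_ = 0, prev_t_ = 0;
};

int main() {
  OneEuroFilter filter(/*min_cutoff=*/1.0, /*beta=*/0.01);
  // Noisy scroll positions sampled every ~8.3 ms.
  const double samples[] = {0, 28, 62, 88, 124, 150};
  for (int i = 0; i < 6; ++i)
    std::printf("%.1f ", filter.Filter(samples[i], i * 0.00833));
  std::printf("\n");
  return 0;
}
```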


Going forward!

In Android 14, the nanosecond API for Java MotionEvents will be publicly exposed in the SDK, so Chrome (and other apps with unbuffered input) will be able to call it. We also developed new metrics that track the quality of the scroll predictor's frames, by creating a test app which introduced pixel-level differences between frames (and no other form of jank) and running experiments to see what people would notice. This analysis can be read about here and will be used going forward for more exciting performance wins and to make this a visible area to track against regressions. In the end, after tuning and enabling the 1€ filter, our metrics show a 2x reduction in visible jank while scrolling slowly! This improvement is going live as the default in M116, will be launched all the way back to M110, and brings Chrome on Android on par with Chrome on iOS!


The moral of the story: sometimes metrics don’t cover all the cases, and taking a step back, investigating from the top down, and getting the lay of the land can end with a superior scrolling experience for users.


Post by: Stephen Nusko, Chrome Software Engineer




Since the beginning of Chrome, benchmarks have been a key way by which we drive performance optimizations that benefit users. The most relevant web benchmarks today are Speedometer, MotionMark, and JetStream. Over the last year Chrome has invested in optimizing against these specific benchmarks and has just achieved our highest scores across all three. These gains came from a combination of large projects and small improvements. In today’s The Fast and the Curious post, we want to share just some of the ways we drove these improvements in Chrome.

Announcing our brand new mid-tier compiler: Maglev 

We’re bringing a new mid-tier compiler to Chrome. Maglev is a just-in-time compiler that can quickly generate performant machine code for all relevant functions within the first one-hundredth of a second. It reduces overall CPU time spent compiling code while also saving battery life. Our measurements show Maglev has provided a 7.5 percent improvement on JetStream and a 5 percent improvement on Speedometer. Maglev will start rolling out in Chrome version 114, which begins release on June 5.

Speedometer 

Speedometer measures the responsiveness of websites by putting various JavaScript UI frameworks through their paces. Just over a year ago we shared details about how we increased our score from 100 to over 300 from Chrome version 40 to version 101. Since then, across 13 Chrome releases, we’ve achieved our new highest Speedometer score of 491. In addition to Maglev, the V8 team has achieved this score through both small adjustments, such as optimized function calls, and major, multi-quarter projects. 

A speedometer visual shows a 491 score for the Speedometer browser benchmark, which measures the responsiveness of websites. This is up from a score of 330 in the past year for Chrome.
Chrome 116.0.5803.2 running on an M2 MacBook Air with Maglev enabled


MotionMark

MotionMark is designed to test how much browser graphics systems can render at high frame rates. Chrome’s graphics and rendering teams have tracked over 20 optimizations since the start of the year, and more than half are available today. Together, these optimizations have almost tripled performance. Some highlights include improvements to Canvas performance, profile-guided optimization, GPU task scheduling, and layer compositing. We also created a novel algorithm for dynamic multisample anti-aliasing and out-of-process 2D canvas rasterization for improved parallelism.

A speedometer visual shows a 4821.30 score for the MotionMark browser benchmark, which tests browser graphics systems. This marks a nearly 3X improvement in the last year for Chrome.
Chrome M115.0.5773.4 running on a 13” M2 MacBook Pro

Jetstream 

JetStream is a JavaScript and WebAssembly benchmark suite focused on advanced web applications. Many of the updates that we made for Speedometer also drove significant improvements on Jetstream as we optimized the V8 engine. In addition to these enhancements, Maglev drove the biggest gains in this benchmark. 

A speedometer visual shows a 330.939 score for the JetStream 2 browser benchmark, which focuses on advanced web applications. This improvement is largely driven by Maglev, a new just-in-time compiler in Chrome.
Chrome 116.0.5803.2 running on an M2 MacBook Air with Maglev enabled


Looking ahead


Because we’re optimizing against these benchmarks, it’s essential that these improvements translate to real user benefits, which is why we’re investing, along with other browsers, in creating the next generation of benchmarks. This has been an ongoing collaboration, and we’re excited to turn our efforts toward this new target in the coming year.


We hope you all enjoy a faster Chrome! 




Posted by Thomas Nattestad, Product Manager



From the beginning, we designed Chrome to be efficient. Being efficient is not just about loading pages as fast as possible; it’s also about doing so with the fewest resources possible. Today’s The Fast and the Curious post explores how we improved Chrome to maximize battery life on Mac, so you can enjoy browsing and watching videos longer than ever before.


With the latest release of Chrome, we’ve made it possible to do more on your MacBook on a single charge thanks to a ton of optimizations under the hood. In our testing, we found that you can browse for 17 hours or watch YouTube for 18 hours on a MacBook Pro (13", M2, 2022). And with Chrome’s Energy Saver mode enabled, you can browse for an additional 30 minutes on battery(1). Of course, we care deeply about all our users, not just those with the latest hardware. That’s why you’ll see performance gains on older models as well.


Here’s a closer look at some of the changes we made:
 
Fine-tuning iframes

We realized that many iframes live for just a few seconds. As a result, we fine-tuned the garbage collection and memory compression heuristics for recently created iframes. This means less energy is spent reducing short-term memory usage (with no impact on long-term memory usage).



Tweaking timers 

JavaScript timers were introduced at the beginning of the Web’s history. Since then, Web developers have gained access to more efficient APIs that achieve the same (or better!) results. But JavaScript timers still drive a large proportion of a Web page’s power consumption. As a result, we tweaked the way they fire in Chrome so the CPU wakes up less often.


Similarly, we identified opportunities to cancel internal timers when they’re no longer needed, reducing the number of times that the CPU is woken up. 


Streamlining data structures

We identified data structures in which there were frequent accesses with the same key and optimized their access pattern.
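As a purely illustrative sketch of this kind of change (hypothetical types and names, not Chrome’s actual data structures), one common pattern is to memoize the most recent lookup in front of a hash map when consecutive accesses tend to repeat the same key:

```cpp
// Illustrative sketch: a tiny "last lookup" cache in front of a hash map,
// useful when consecutive accesses frequently repeat the same key.
// Hypothetical types and names; not Chrome code.
#include <cstdio>
#include <string>
#include <unordered_map>

class CachedRegistry {
 public:
  void Set(const std::string& key, int value) {
    map_[key] = value;
    last_key_.clear();  // invalidate the one-entry cache
    last_value_ = nullptr;
  }

  // Returns nullptr if the key is absent.
  const int* Get(const std::string& key) {
    if (last_value_ && key == last_key_)
      return last_value_;  // hit: skip the hash lookup entirely
    auto it = map_.find(key);
    if (it == map_.end())
      return nullptr;
    last_key_ = key;
    last_value_ = &it->second;
    return last_value_;
  }

 private:
  std::unordered_map<std::string, int> map_;
  std::string last_key_;
  const int* last_value_ = nullptr;
};

int main() {
  CachedRegistry registry;
  registry.Set("tab-42", 7);
  // Repeated accesses with the same key hit the one-entry cache.
  for (int i = 0; i < 3; ++i)
    if (const int* v = registry.Get("tab-42")) std::printf("%d\n", *v);
  return 0;
}
```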



Eliminating unnecessary redraws

We navigated real-world sites with a bot and identified Document Object Model (DOM) change patterns that don’t affect pixels on the screen. We modified Chrome to detect those early and bypass the unnecessary style, layout, paint, raster and GPU steps. We implemented similar optimizations for changes to the Chrome UI.

There’s always more work to be done. With our open-source benchmark suite, we’ll be able to tap the broader community of developers to help us improve Chrome’s power consumption in 2023 and beyond.


Posted by François Doray, Software Developer, Chrome

___
1 Testing conducted in February 2023 using Chrome 110.0.5481.100 on a MacBook Pro (13”, M2, 2022 with 8 GB RAM running MacOS Ventura 13.2.1) and measured using our open-source benchmark suite.





Last week we released a blog post about our improvements in Chrome speed over the past year culminating with the M99 release of Chrome. We wanted to follow up by going in depth on how we achieved this milestone in browser performance.

Since the launch of Chrome in 2008, one of our core principles has been to build the fastest browser, whether you're on your phone or laptop. We have never strayed from our performance mission, and are always analyzing and optimizing every part of Chrome. We're proud to announce that Chrome scores over 300 on Apple’s Speedometer 2.0 benchmark suite on the M1 MacBook, the highest score we’ve ever seen. In this The Fast and the Curious post we'll go behind the scenes to share all the work that went into making Chrome blazingly fast.

“If you can’t measure it you can’t improve it” – this sentiment has driven a large part of our work to improve Chrome’s performance since the early days. For measuring browser performance, there is a long history of benchmarks that aim to provide test workloads browsers can use to track their performance. Making these benchmarks both reflective of the real, ever-changing world and consistent over time is a challenge. Chrome uses a combination of internal benchmarking infrastructure and public, industry-standard benchmarks to continuously measure Chrome’s performance. For comparing browsers’ JavaScript performance, Apple’s Speedometer 2.0 benchmark is the most reflective of the real world, and the most broadly used today.

We’ve been tracking our performance on Speedometer 2.0 ever since it came out:




Beginning with the M87 release, Chrome shipped on M1-based Macs, and we began measuring Chrome’s speed on the new CPU, reflected in the red line above.

Since 2015, we’ve been measuring Chrome’s Speedometer scores on a 13-inch MacBook. In the graph above, you can see just some of the many projects that have helped make a dramatic improvement in performance. You can learn about fast lookups, the Ignition + TurboFan compilers, blazingly fast parsing, faster JS calls, Spectre, Pointer Compression, Short builtins, Sparkplug and much more on V8.dev. You’ll notice some projects actually decrease our Speedometer score, as building an entire browser is about managing trade-offs. For example, with pointer compression, we were willing to take a small performance hit for the large memory savings it provided. Similarly, when the Spectre CPU vulnerability hit, we traded off performance to help guarantee the safety of our users.

The result of years of work has been an 83% improvement in Speedometer score, a dramatic improvement we are happy to deliver to our users. With Apple’s introduction of the M1 CPU, combined with Sparkplug and LTO+PGO, Chrome now scores over 300 - the highest score any browser has ever achieved \o/.

We are excited to achieve this milestone in performance and look forward to delivering even more performance improvements with each release. Stay tuned to this blog to stay up to date on all things speed.

Posted by Thomas Nattestad, Chrome Product Manager

Footnote: Data source for M1 MacBook statistics: Speedometer 2.0 comparing Chrome 99.0.4812.0 --enable-features=CanvasOopRasterization --use-cmd-decoder=passthrough vs. Safari 15.2 17612.3.6.1.6 on a MacBook Pro (14", 2021), Apple M1 Max, 10 cores (8 performance, 2 efficiency), 32 GPU cores, 64 GB device.




Every day, billions of people around the world turn to Chrome to get things done quickly on their devices, whether shopping for a new pair of headphones or pulling together a sales report for work. Nothing is more frustrating than having a slow experience while browsing the web. That’s why Chrome has always been focused on building the fastest possible browser since its launch in 2008, without compromising on feature functionality or security. In our first The Fast and the Curious post of 2022, we are thrilled to celebrate how in the M99 release of Chrome we were able to substantially increase the speed of Chrome across all major platforms.

We go deep on every platform where Chrome runs to provide the fastest possible experience. We’re excited to announce that in M99, Chrome on Mac has achieved the highest score to date of any browser – 300 – in Apple’s Speedometer browser responsiveness benchmark.

Building on many performance changes over the last year, we enabled ThinLTO in M99, a build optimization technique that inlines speed-critical parts of the code base, even when they span multiple files or libraries. The result? An additional across-the-board speed bump that makes Chrome 7% faster than current builds of Safari. Combined with recent graphics optimizations (namely, pass-through decoder and out-of-process rasterization), our tests have also shown Chrome’s graphics performance to be 15% faster than Safari. Overall, since launching Chrome on M1-based Macs in late 2020, Chrome is now 43% faster than it was just 17 months ago!




Two of the other recent major contributors to Chrome’s speed are the V8 Sparkplug compiler and short builtin calls. Sparkplug is a new mid-tier JavaScript compiler for V8 that generates efficient code with low compilation overhead. Short builtin calls are used by the V8 JavaScript engine to optimize the placement of generated code inside the device’s memory. This technique boosts performance by avoiding indirect jumps when calling functions and makes a substantial difference on Apple M1-based Macs.

Chrome continues to get faster on Android as well. Loading a page now takes 15% less time, thanks to prioritizing critical navigation moments on the browser user interface thread. Last year we also reduced startup time for Chrome on Android by 13% using Freeze-Dried Tabs. This approach conserves resources across the board by using a lightweight version of tabs on load, while the actual tab loads in the background. Finally, we were able to improve speed and memory usage using Isolated Splits, which improved startup time by preloading the majority of the browser process code on a background thread.

We know that benchmarks are just one of many ways of measuring the speed of a browser. At the end of the day, what matters most is that Chrome is actually faster and more efficient in everyday usage, so we’ll continue to invest in innovative performance improvements that push the envelope of what’s possible in modern computing.

Posted by Max Christoff, Senior Director, Chrome Engineering


Data source for Mac statistics: Speedometer 2.0 comparing Chrome 99.0.4812.0 --enable-features=CanvasOopRasterization --use-cmd-decoder=passthrough vs. Safari 15.2 17612.3.6.1.6 on a MacBook Pro (14", 2021), Apple M1 Max, 10 cores (8 performance, 2 efficiency), 32 GPU cores, 64 GB device connected to power.

Data source for Android statistics: Real-world data anonymously aggregated from Chrome clients.




Whether you prefer organizing your browser with tab groups, naming your windows, tab search, or another method, you have lots of features that help you get to the tabs you want. In this The Fast and the Curious post, we describe how we use information about which windows are visible to you to optimize Chrome, leading to 25.8% faster startup and 4.5% fewer crashes.


Background

For several years, to improve the user experience, Chrome has lowered the priority of background tabs[1]. For example, JavaScript is throttled in background tabs, and these tabs don’t render web content. This reduces CPU, GPU and memory usage, which leaves more memory, CPU and GPU for foreground tabs that the user actually sees. However, the logic was limited to tabs that weren't focused in their window, or windows that were minimized or otherwise moved offscreen.

Through experiments, we found that nearly 20% of Chrome windows are completely covered by other windows, i.e., occluded. If these occluded windows were treated like background tabs, our hypothesis was that we would see significant performance benefits. So, around three years ago, we started working on a project to track the occlusion state of each Chrome window in real time, and lower the priority of tabs in occluded windows. We called this project Native Window Occlusion, because we had to know about the location of native, non-Chrome windows on the user’s screen. (The location information is discarded immediately after it is used in the occlusion calculation.)

Calculating Native Window Occlusion

The Windows OS doesn’t provide a direct way to find out if a window is completely covered by other windows, so Chrome has to figure it out on its own. If we only had to worry about other Chrome windows, this would be simple because we know where Chrome windows are, but we have to consider all the non-Chrome windows a user might have open, and know about anything that happens that might change whether Chrome windows are occluded or not.

There are two main pieces to keeping track of which Chrome windows are occluded. The first is the occlusion calculation, which consists of iterating over the open windows on the desktop in z-order (front to back) and checking whether the windows in front of a Chrome window completely cover it. The second piece is deciding when to do the occlusion calculation.

Calculating Occlusion

In theory, figuring out which windows are occluded is fairly simple. In practice, however, there are lots of complications, such as multi-monitor setups, virtual desktops, non-opaque windows, and even cloaked windows(!). This needs to be done with great care, because if we decide that a window is occluded when in fact it is visible to the user, then the area where the user expects to see web contents will be white. We also don’t want to block the UI thread while doing the occlusion calculation, because that could reduce the responsiveness of Chrome and degrade the user experience. So, we compute occlusion on a separate thread, as follows:
  1. Ignore minimized windows, since they’re not visible.
  2. Mark Chrome windows on a different virtual desktop as occluded.
  3. Compute the virtual screen rectangle, which combines the display monitors. This is the unoccluded screen rectangle.
  4. Iterate over the open windows on the desktop from front to back, ignoring invisible windows, transparent windows, floating windows (windows with style WS_EX_TOOLWINDOW), cloaked windows, windows on other virtual desktops, non-rectangular windows[2], etc. Ignoring these kinds of windows may cause some occluded windows to be considered visible (false negatives), but importantly it won’t lead to treating visible windows as occluded (false positives). For each window:
  • Subtract the window’s area from the unoccluded screen rectangle.
  • If the window is a Chrome window, check whether its area overlapped the unoccluded area. If it didn’t, the Chrome window is completely covered by the windows in front of it, so it is occluded.
  • Keep iterating until all Chrome windows have been processed. At this point, any Chrome window that we haven’t marked occluded is visible, and we’re done computing occlusion. Whew! Now we post a task to the UI thread to update the visibility of the Chrome windows.

This is all done without synchronization locks, so the occlusion calculation has minimal effect on the UI thread, i.e., it will never block the UI thread and degrade the user experience. A simplified sketch of the core loop appears below; for more detailed implementation information, see the documentation.
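As a rough sketch of steps 3 and 4 (heavily simplified and illustrative only; it ignores virtual desktops, cloaked windows, transparency, and the other special cases above, and it is not Chromium’s actual implementation), the core loop can be expressed with Win32 window enumeration and GDI region subtraction:

```cpp
// Heavily simplified sketch of the occlusion walk: subtract each opaque
// window's rect from the unoccluded screen region, front to back, and mark
// any Chrome window whose rect no longer intersects that region as occluded.
// Real Chromium also handles virtual desktops, cloaked windows, transparency,
// non-rectangular windows, etc.
#include <windows.h>
#include <set>
#include <vector>

struct OcclusionState {
  HRGN unoccluded;                // screen area not yet covered by any window
  std::set<HWND> chrome_windows;  // windows we care about (assumed known)
  std::vector<HWND> occluded;     // result
};

// EnumWindows hands back top-level windows; in practice they arrive front to
// back in z-order.
static BOOL CALLBACK OnWindow(HWND hwnd, LPARAM lparam) {
  auto* state = reinterpret_cast<OcclusionState*>(lparam);
  if (!IsWindowVisible(hwnd) || IsIconic(hwnd))
    return TRUE;  // skip hidden and minimized windows

  RECT rect;
  if (!GetWindowRect(hwnd, &rect))
    return TRUE;
  HRGN window_rgn = CreateRectRgnIndirect(&rect);

  if (state->chrome_windows.count(hwnd)) {
    // If this Chrome window doesn't intersect what's still unoccluded,
    // the windows in front of it already cover it completely.
    HRGN intersection = CreateRectRgn(0, 0, 0, 0);
    if (CombineRgn(intersection, window_rgn, state->unoccluded, RGN_AND) ==
        NULLREGION) {
      state->occluded.push_back(hwnd);
    }
    DeleteObject(intersection);
  }

  // Every window in front covers part of the screen for the windows behind it.
  CombineRgn(state->unoccluded, state->unoccluded, window_rgn, RGN_DIFF);
  DeleteObject(window_rgn);
  return TRUE;  // keep enumerating
}

std::vector<HWND> ComputeOccludedChromeWindows(std::set<HWND> chrome_windows) {
  // Step 3: start from the virtual screen rectangle spanning all monitors.
  const int x = GetSystemMetrics(SM_XVIRTUALSCREEN);
  const int y = GetSystemMetrics(SM_YVIRTUALSCREEN);
  const int w = GetSystemMetrics(SM_CXVIRTUALSCREEN);
  const int h = GetSystemMetrics(SM_CYVIRTUALSCREEN);

  OcclusionState state{CreateRectRgn(x, y, x + w, y + h),
                       std::move(chrome_windows), {}};
  EnumWindows(&OnWindow, reinterpret_cast<LPARAM>(&state));
  DeleteObject(state.unoccluded);
  return state.occluded;
}
```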


Deciding When to Calculate Occlusion

We don’t want to continuously calculate occlusion because it would degrade the performance of Chrome, so we need to know when a window might become visible or occluded. Fortunately, Windows lets you track various system events, like windows moving or getting resized/maximized/minimized. The occlusion-calculation thread tells Windows that it wants to track those events, and when notified of an event, it examines the event to decide whether to do a new occlusion calculation. Because we may get several events in a very short time, we don’t calculate occlusion more than once every 16 milliseconds, which corresponds to the time a single frame is displayed, assuming a frame rate of 60 frames per second (fps).
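A minimal sketch of that kind of event coalescing (illustrative only, not Chromium’s actual scheduling code) might look like this:

```cpp
// Illustrative sketch (not Chromium's actual code): coalesce bursts of window
// events so occlusion is recomputed at most once every 16 ms, while making
// sure the last event in a burst still triggers a trailing recomputation.
#include <chrono>

class OcclusionThrottle {
 public:
  using Clock = std::chrono::steady_clock;

  // Called for every relevant window event. Returns true if the caller should
  // recompute occlusion now; otherwise the request is remembered as pending.
  bool OnWindowEvent() {
    const auto now = Clock::now();
    if (now - last_compute_ >= kMinInterval) {
      last_compute_ = now;
      pending_ = false;
      return true;
    }
    pending_ = true;  // recompute later, e.g. from a ~16 ms delayed task
    return false;
  }

  // Called by that delayed task so the final event of a burst isn't lost.
  bool TakePendingRequest() {
    const bool had_pending = pending_;
    pending_ = false;
    if (had_pending)
      last_compute_ = Clock::now();
    return had_pending;
  }

 private:
  static constexpr std::chrono::milliseconds kMinInterval{16};
  Clock::time_point last_compute_{};  // default epoch: first event always fires
  bool pending_ = false;
};
```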

Some of the events we listen for are windows getting activated or deactivated, windows moving or resizing, the user locking or unlocking the screen, turning off the monitor, etc. We don’t want to calculate occlusion more than necessary, but we don’t want to miss an event that causes a window to become visible, because if we do, the user will see a white area where their web contents should be. It’s a delicate balance[3].

The events we listen for are focused on whether a Chrome window is occluded. For example, moving the mouse generates a lot of events, and cursors generate an event for every blink, so we ignore events that aren’t for window objects. We also ignore events for most popup windows, so that tooltips getting shown doesn’t trigger an occlusion calculation.

The occlusion thread tells Windows that it wants to know about various Windows events. The UI thread tells Windows that it wants to know when there are major state changes, e.g., the monitor is powered off, or the user locks the screen.





Results

This feature was developed behind an experiment to measure its effect and rolled out to 100% of Chrome users on Windows in October 2020 as part of the M86 release. Our metrics show significant performance benefits with the feature turned on, including 25.8% faster startup and 4.5% fewer crashes.
One reason for the startup and first contentful paint improvements is that when Chrome restores two or more full-screen windows at startup, one of the windows is likely to be occluded. Chrome will now skip much of the work for that window, thus saving resources for the more important foreground window.

Posted by David Bienvenu, Chrome Developer

Data source for all statistics: Real-world data anonymously aggregated from Chrome clients.
[1] Note that certain tabs are exempt from having their priority lowered, e.g., tabs playing audio or video.
[2] Non-rectangular windows complicate the calculations and were thought to be rare, but it turns out non-rectangular windows are common on Windows 7, due to some quirks of the default Windows 7 theme.
[3] When this was initially launched, we quickly discovered that Citrix users were getting white windows whenever another user locked their screen, due to Windows sending us session changed notifications for sessions that were not the current session. For the details, look here.