Interop 2025 continues the mission to make the web more consistent across browsers, building on 2024’s 95% interoperability score. This year, 19 focus areas target key developer needs and long-standing issues, including WebRTC improvements, Storage Access API, and CSS Zoom.
The post Launching Interop 2025 appeared first on Mozilla Hacks - the Web developer blog.
The Interop Project is a collaboration between browser vendors and other platform implementors to provide users and web developers with high quality implementations of the web platform.
Each year we select a set of focus areas representing key areas where we want to improve interoperability. Encouraging all browser engines to prioritize common features ensures they become usable for web developers as quickly as possible.
Progress in each engine and the overall Interop score are measured by tracking the pass rate of a set of web-platform tests for each focus area using the Interop dashboard.
Before introducing the new focus areas for this year, we should look at the successes of Interop 2024.
The Interop score, measuring the percentage of tests that pass in all of the major browser engines, has reached 95% in the latest browser releases, up from only 46% at the start of the year. In pre-release browsers it’s even higher — over 97%. This is a huge win that shows how effective Interop can be at aligning browsers with the specifications and each other.
Each browser engine individually achieved a test pass score of 98% in stable browser releases and 99% in pre-release, with Firefox finishing slightly ahead at 98.8% in release and 99.1% in Nightly.
For users, this means features such as requestVideoFrameCallback, Declarative Shadow DOM, and Popover, which a year ago only had limited availability, are now implemented interoperably in all browsers.
Building on Interop 2024’s success, we are excited to continue the project into 2025. This year we have 19 focus areas: 17 new and 2 carried over from previous years. A full description of all the focus areas is available in the Interop repository.
From 2024 we’re carrying forward Layout (really “Flexbox and Grid”), and Pointer and Mouse Events. These are important platform primitives where the Interop project has already led to significant interoperability improvements. However, with technologies that are so fundamental to the modern web we think it’s important to set ambitious goals and continue to prioritize these areas, creating rock solid foundations for developers to build on.
The new focus areas represent a broad cross section of the platform. Many of them — like Anchor Positioning and View Transitions — have been identified from clear developer demand in surveys such as State of HTML and State of CSS. Inclusion in Interop will ensure they’re usable as soon as possible.
In addition to these high profile new features, we’d like to highlight some lesser-known focus areas and explain why we’re pleased to see them in Interop.
At Mozilla, user privacy is a core principle. One of the most common methods for tracking across the web is via third-party cookies. When sites request data from external services, the service can store data that’s re-sent when another site uses the same service. Thus the service can follow the user’s browsing across the web.
To counter this, Firefox’s “Total Cookie Protection” partitions storage so that third parties receive different cookie data per site and thus reduces tracking. Other browsers have similar policies, either by default or in private browsing modes.
However, in some cases, non-tracking workflows such as SSO authentication depend on third party cookies. Storage partitioning can break these workflows, and browsers currently have to ship site-specific workarounds. The Storage Access API solves this by letting sites request access to the unpartitioned cookies. Interop here will allow browsers to advance privacy protections without breaking critical functionality.
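As a concrete illustration, here is a minimal sketch of how an embedded third-party frame might use the Storage Access API; error handling and the surrounding UI flow are omitted, and the request has to happen in response to a user gesture such as a click.

// Minimal sketch: request access to unpartitioned cookies from inside a cross-site iframe.
async function ensureCookieAccess() {
  if (await document.hasStorageAccess()) {
    return true; // Unpartitioned cookies are already available.
  }
  try {
    // Must be called from a user gesture, e.g. a click on a "Sign in" button.
    await document.requestStorageAccess();
    return true;
  } catch (err) {
    // The user or the browser declined the request.
    return false;
  }
}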
The Web Compat focus area is unique in Interop. It isn’t about one specific standard, but focuses on browser bugs known to break sites. These are often in older parts of the platform with long-standing inconsistencies. Addressing these requires either aligning implementations with the standard or, where that would break sites, updating the standard itself.
One feature in the Web Compat focus area for 2025 is CSS Zoom. Originally a proprietary feature in Internet Explorer, it allowed scaling layout by adjusting the computed dimensions of elements, at a time before CSS transforms existed. WebKit reverse-engineered it, and Blink later inherited that implementation, but Gecko never implemented it, due to the lack of a specification and the complexities it created in layout calculations.
Unfortunately, a feature not being standardised doesn’t prevent developers from using it. Use of CSS Zoom led to layout issues on some sites in Firefox, especially on mobile. We tried various workarounds and have had success using interventions to replace zoom with CSS transforms on some affected sites, but an attempt to implement the same approach directly in Gecko broke more sites than it fixed and was abandoned.
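To give a flavour of what such an intervention does, here is a hypothetical sketch. The real interventions are site-specific scripts, and scale() is only an approximation of zoom (transforms don’t affect layout), which is part of why doing this engine-wide proved so risky.

// Hypothetical sketch: replace simple numeric inline zoom values with an equivalent transform.
for (const el of document.querySelectorAll("[style*='zoom']")) {
  const factor = parseFloat(el.style.zoom); // assumes values like "1.5", not "150%"
  if (!Number.isNaN(factor) && factor !== 1) {
    el.style.zoom = "";
    el.style.transform = `scale(${factor})`;
    el.style.transformOrigin = "0 0";
  }
}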
The situation seemed to be at an impasse until 2023 when Google investigated removing CSS Zoom from Chromium. Unfortunately, it turned out that some use cases, such as Microsoft Excel Online’s worksheet zoom, depended on the specific behaviour of CSS Zoom, so removal was not feasible. However, having clarified the use cases, the Chromium team was able to propose a standardized model for CSS Zoom that was easier to implement without compromising compatibility. This proposal was accepted by the CSS WG and led to the first implementation of CSS Zoom in Firefox 126, 24 years after it was first released in Internet Explorer.
With Interop 2025, we hope to bring the story of CSS Zoom to a close with all engines finally converging on the same behaviour, backed by a real open standard.
Video conferencing is now an essential feature of modern life, and in-browser video conferencing offers both ease of use and high security, as users are not required to download a native binary. Most web-based video conferencing relies on the WebRTC API, which offers high level tools for implementing real time communications. However, WebRTC has long suffered from interoperability issues, with implementations deviating from the standards and requiring nonstandard extensions for key features. This resulted in confusion and frustration for users and undermined trust in the web as a reliable alternative to native apps.
Given this history, we’re excited to see WebRTC in Interop for the first time. The main part of the focus area is the RTCRtpScriptTransform API, which enables cross browser end-to-end encryption. Although there’s more to be done in the future, we believe Interop 2025 will be a big step towards making WebRTC a truly interoperable web standard.
The focus area for Removing Mutation Events is the first time Interop has been used to coordinate the removal of a feature. Mutation events fire when the DOM changes, meaning the event handlers run on the critical path for DOM manipulation, causing major performance issues and significant implementation complexity. Although they have been implemented in all engines, they’re so problematic that they were never standardised. Instead, mutation observers were developed as a standard solution for the use cases of mutation events without their complexity or performance problems. Almost immediately after mutation observers were implemented, a Gecko bug was filed:
“We now have mutation observers, and we’d really like to kill support for mutation events at some point in the future. Probably not for a while yet.”
That was in 2012. The difficulty is the web’s core commitment to backwards compatibility. Removing features that people rely on is unacceptable. However, last year Chromium determined that use of mutation events had dropped low enough to allow a “deprecation trial”, disabling mutation events by default, but allowing specific sites to re-enable them for a limited time.
This is good news, but long-running deprecation trials can create problems for other browsers. Disabling the feature entirely can break sites that rely on the opt-out. On the other hand we know from experience that some sites actually function better in a browser with mutation events disabled (for example, because they are used for non-critical features, but impact performance).
By including this removal in Interop 2025, we can ensure that mutation events are fully removed in 2025 and end the year with reduced platform complexity and improved web performance.
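For developers who still have mutation event listeners in their code, the migration is usually mechanical. Here is a minimal sketch of replacing a DOMNodeInserted handler with a MutationObserver, where handleInsert stands in for whatever work the old listener did.

// Before: a mutation event listener that runs synchronously on every insertion.
// document.addEventListener("DOMNodeInserted", event => handleInsert(event.target));

// After: batched, asynchronous notifications via MutationObserver.
const observer = new MutationObserver(mutations => {
  for (const mutation of mutations) {
    for (const node of mutation.addedNodes) {
      handleInsert(node);
    }
  }
});
observer.observe(document.documentElement, { childList: true, subtree: true });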
As well as focus areas, the Interop project also runs investigations aimed at long-term interoperability improvements in areas where we can’t measure progress using test pass rates. For example, investigations can look to add new test capabilities or increase the test coverage of platform features.
The accessibility testing investigation started as part of Interop 2023. It has added APIs for testing accessible name and computed role, as well as more than 1000 new tests. Those tests formed the Accessibility focus area in Interop 2024, which achieved an Interop score of 99.7%.
In 2025 the focus will be on expanding the testability of accessibility features. Mozilla is working on a prototype of AccessibleNode, an API that enables verifying the shape of the accessibility tree, along with its states and properties. This will allow us to test the effect of features like CSS display: contents or ::before/::after on the accessibility tree.
Today, all Interop focus areas are scored in desktop browsers. However, some features are mobile-specific or have interoperability challenges unique to mobile.
Improving mobile testing has been part of Interop since 2023, and in that time we’ve made significant progress standing up mobile browsers in web-platform-tests CI systems. Today we have reliable runs of Chrome and Firefox Nightly on Android, and Safari runs on iOS are expected soon. However, some parts of our test framework were written with desktop-specific assumptions in the design, so the focus for 2025 will be on bringing mobile testing to parity with desktop. The goal is to allow mobile-specific focus areas in future Interop projects, helping improve interoperability across all device types.
The unique and distinguishing feature of the web platform is its basis in open standards, providing multiple implementations and user choice. Through the Interop project, web platform implementors collaborate to ensure that these core strengths are matched by a seamless user experience across browsers.
With focus areas covering some of the most important new and existing areas of the modern web, Interop 2025 is set to deliver some of the biggest interoperability wins of the project so far. We are confident that Firefox and other browsers will rise to the challenge, providing users and developers with a more consistent and reliable web platform.
Mozilla and Filament have introduced Uniffi for React Native, a tool that allows developers to leverage the safety and performance benefits of Rust in cross-platform React Native apps.
The post Introducing Uniffi for React Native: Rust-Powered Turbo Modules appeared first on Mozilla Hacks - the Web developer blog.
Today Mozilla and Filament are releasing Uniffi for React Native, a new tool we’ve been using to build React Native Turbo Modules in Rust, under an open source license. This allows millions of developers writing cross-platform React Native apps to use Rust – a modern programming language known for its safety and performance benefits – to build single implementations of their app’s core logic that work seamlessly across iOS and Android.
This is a big win for us and for Filament, who co-developed the library with Mozilla, with James Hugman as lead developer. We think it will be awesome for many other developers too. Less code is good. Memory safety is good. Performance is good. We get all three, plus the joy of using a language we love in more places.
For those familiar with React Native, it’s a great framework for creating cross-platform apps, but it has its challenges. React Native apps rely on a single JavaScript thread, which can slow things down when handling complex tasks. Developers have traditionally worked around this by writing code twice – once for iOS and once for Android – or by using C++, which can be difficult to manage. Uniffi for React Native offers a better solution by enabling developers to offload heavy tasks to Rust, which is now easy to integrate with React Native. As a result, you’ve got faster, smoother apps and a streamlined development process.
Uniffi for React Native is a UniFFI bindings generator for using Rust from React Native via Turbo Modules. It lets us work at an abstraction level high enough to stay focused on our application’s needs rather than getting lost in the gory technical details of bespoke native cross-platform development, and it provides the tooling to generate the necessary bindings.
We’re stoked about this work continuing. In 2020, we started with Uniffi as a modern day ‘write once; run anywhere’ toolset for Rust. Uniffi has come a long way since we developed the technology as a bit of a hack to get us a single implementation of Firefox Sync’s core (in Rust) that we could then deploy to both our Android and iOS apps! Since then Mozilla has used uniffi-rs to successfully deploy Rust in mobile and desktop products used by hundreds of millions of users. This Rust code runs important subsystems such as bookmarks and history sync, Firefox Suggest, telemetry and experimentation. Beyond Mozilla, Uniffi is used in Android (in AOSP), high-profile security products and some complex libraries familiar to the community.
Currently the Uniffi for React Native project is an early release. We don’t have a cool landing page or examples in the repo (coming!), but open source contributor Johannes Marbach has already been sponsored by Unomed to use Uniffi for React Native to create a React Native library for the Matrix SDK.
Need an idea on how you might give it a whirl? I’ve got two uses that we’re very excited about:
1) Use Rust to offload computationally heavy code to a multi-threaded/memory-safe subsystem to escape single-threaded JS performance bottlenecks in React Native. If you know, you know.
2) Leverage the incredible library of Rust crates in your React Native app. One of the Filament devs showed how powerful this is recently: with a rudimentary knowledge of Rust, they were able to find a fast blurhashing library on crates.io to replace a slow TypeScript implementation and get it running the same day (the sketch below gives a feel for the JavaScript side of that kind of integration). We’re hoping we can really improve the tooling even more to make this kind of optimization as easy as possible.
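To make that concrete, here is a purely illustrative sketch of calling a Rust function from React Native once the bindings have been generated. Every name below is hypothetical rather than the actual generated API; it is only meant to show that the JavaScript side ends up looking like any other module import.

// Hypothetical: a Turbo Module generated by Uniffi from a Rust crate exporting `blurhashEncode`.
import { blurhashEncode } from "react-native-rust-blurhash";

// Dummy RGBA pixel data, just to make the example self-contained.
const width = 4, height = 4;
const pixels = new Uint8Array(width * height * 4);

// The call reads like ordinary JavaScript, but the work is done by compiled Rust code.
const hash = blurhashEncode(pixels, width, height);
console.log(`blurhash: ${hash}`);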
Uniffi represents a step forward in cross-platform development, combining the power of Rust with the flexibility of React Native to unlock new possibilities for app developers.
We’re excited to have the community explore what’s possible. Please check out the library on GitHub and jump into the conversation on Matrix.
Disclosure: in addition to this collaboration, Mozilla Ventures is an investor in Filament.
Discover the latest release of Llamafile 0.8.14, an open-source AI tool by Mozilla Builders. With a new command-line chat interface, enhanced performance, and support for powerful models, Llamafile makes it easy to run large language models (LLMs) on your own hardware. Learn more about the updates and how to get involved with this cutting-edge project.
The post Llamafile v0.8.14: a new UI, performance gains, and more appeared first on Mozilla Hacks - the Web developer blog.
We’ve just released Llamafile 0.8.14, the latest version of our popular open source AI tool. A Mozilla Builders project, Llamafile turns model weights into fast, convenient executables that run on most computers, making it easy for anyone to get the most out of open LLMs using the hardware they already have.
The key feature of this new release is our colorful new command-line chat interface. When you launch a Llamafile, we now automatically open this new chat UI for you, right there in the terminal. This new interface is fast, easy to use, and an all-around simpler experience than the Web-based interface we previously launched by default. (That interface, which our project inherits from the upstream llama.cpp project, is still available and supports a range of features, including image uploads. Simply point your browser at port 8080 on localhost.)
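That same local server also exposes an OpenAI-compatible HTTP API, so you can script against a running llamafile from any language. A quick sketch in JavaScript, assuming a llamafile is serving on its default port 8080:

// Sketch: query a locally running llamafile via its OpenAI-compatible endpoint.
const response = await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "local", // the local server accepts a placeholder model name
    messages: [{ role: "user", content: "Say hello in one sentence." }],
  }),
});
const data = await response.json();
console.log(data.choices[0].message.content);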
This new chat UI is just the tip of the iceberg. In the months since our last blog post here, lead developer Justine Tunney has been busy shipping a slew of new releases, each of which have moved the project forward in important ways. Here are just a few of the highlights:
Llamafiler: We’re building our own clean sheet OpenAI-compatible API server, called Llamafiler. This new server will be more reliable, stable, and most of all faster than the one it replaces. We’ve already shipped the embeddings endpoint, which runs three times as fast as the one in llama.cpp. Justine is currently working on the completions endpoint, at which point Llamafiler will become the default API server for Llamafile.
Performance improvements: With the help of open source contributors like k-quant inventor @Kawrakow, Llamafile has enjoyed a series of dramatic speed boosts over the last few months. In particular, pre-fill (prompt evaluation) speed has improved dramatically on a variety of architectures.
When combined with the new high-speed embedding server described above, Llamafile has become one of the fastest ways to run complex local AI applications that use methods like retrieval augmented generation (RAG).
Support for powerful new models: Llamafile continues to keep pace with progress in open LLMs, adding support for dozens of new models and architectures, ranging in size from 405 billion parameters all the way down to 1 billion. Many of the new Llamafiles are available for download on Hugging Face.
Whisperfile, speech-to-text in a single file: Thanks to contributions from community member @cjpais, we’ve created Whisperfile, which does for whisper.cpp what Llamafile did for llama.cpp: that is, turns it into a multi-platform executable that runs nearly everywhere. Whisperfile thus makes it easy to use OpenAI’s Whisper technology to efficiently convert speech into text, no matter which kind of hardware you have.
Our goal is for Llamafile to become a rock-solid foundation for building sophisticated locally-running AI applications. Justine’s work on the new Llamafiler server is a big part of that equation, but so is the ongoing work of supporting new models and optimizing inference performance for as many users as possible. We’re proud and grateful that some of the project’s biggest breakthroughs in these areas, and others, have come from the community, with contributors like @Kawrakow, @cjpais, @mofosyne, and @Djip007 routinely leaving their mark.
We invite you to join them, and us. We welcome issues and PRs in our GitHub repo. And we welcome you to become a member of Mozilla’s AI Discord server, which has a dedicated channel just for Llamafile where you can get direct access to the project team. Hope to see you there!
The post 0Din: A GenAI Bug Bounty Program – Securing Tomorrow’s AI Together appeared first on Mozilla Hacks - the Web developer blog.
As AI continues to evolve, so do the threats against it. As these GenAI systems become more sophisticated and widely adopted, ensuring their security and ethical use becomes paramount. 0Din is a groundbreaking GenAI bug bounty program dedicated specifically to help secure GenAI systems and beyond. In this blog, you’ll learn about 0Din, how it works, and how you can participate and make a difference in securing our AI future.
0Din is an innovative GenAI bug bounty program that seeks to identify and mitigate vulnerabilities in AI systems. By harnessing the collective expertise of the global security community, 0Din aims to build a more secure AI landscape. The program rewards individuals who discover and report security flaws, ensuring that AI systems remain robust and trustworthy.
Participating in the 0Din bug bounty program is straightforward: you discover a vulnerability, report it through the program, and receive a reward based on its severity.
For detailed information on the vulnerability scope and processing policy, visit the 0Din Policy Page.
0Din covers a broad range of vulnerabilities in GenAI systems.
Each type of vulnerability has a specific reward based on its severity, ranging from low to high. The Disclosure Mappings Guideline provides a comprehensive list of vulnerabilities and their corresponding rewards.
0Din welcomes participants from around the world. Here’s who can participate:
– Security Researchers: Professionals dedicated to discovering and mitigating security risks.
– Developers: Individuals with a strong understanding of AI and its underlying technologies.
– Tech Enthusiasts: Anyone with a keen interest in AI security and the technical skills to identify vulnerabilities.
To ensure a fair and effective program, participants must adhere to 0Din’s Vulnerability Processing and Disclosure Policy. This policy outlines the proper procedures for reporting vulnerabilities and ensures that all submissions are handled with integrity and respect.
0Din’s vulnerability processing and disclosure policy is designed to ensure transparency and fairness.
For a detailed policy overview, refer to the 0Din Policy Page.
In an era where AI plays an increasingly vital role in our lives, ensuring its security is paramount. 0Din offers a unique opportunity to contribute to this critical field while being rewarded for your expertise. By participating in the 0Din bug bounty program, you can help build a safer and more secure AI future. Join us today and make a difference in the world of GenAI security.
The post Announcing Official Puppeteer Support for Firefox appeared first on Mozilla Hacks - the Web developer blog.
We’re pleased to announce that, as of version 23, the Puppeteer browser automation library now has first-class support for Firefox. This means that it’s now easy to write automation and perform end-to-end testing using Puppeteer, and run against both Chrome and Firefox.
To get started, simply set the browser option to "firefox" when starting Puppeteer:
import puppeteer from "puppeteer";
const browser = await puppeteer.launch({
browser: "firefox"
});
const page = await browser.newPage();
// ...
await browser.close();
As with Chrome, Puppeteer is able to download and launch the latest stable version of Firefox, so running against either browser should offer the same developer experience that Puppeteer users have come to expect.
Whilst the features offered by Puppeteer won’t be a surprise, bringing support to multiple browsers has been a significant undertaking. The Firefox support is not based on a Firefox-specific automation protocol, but on WebDriver BiDi, a cross-browser protocol that’s undergoing standardization at the W3C and currently has implementations in both Gecko and Chromium. This use of a cross-browser protocol should make it much easier to support many different browsers going forward.
Later in this post we’ll dive into some of the more technical background behind WebDriver BiDi. But first we’d like to call out how today’s announcement is a great demonstration of how productive collaboration can advance the state of the art on the web. Developing a new browser automation protocol is a lot of work, and great thanks goes to the Puppeteer team and the other members of the W3C Browser Testing and Tools Working Group, for all their efforts in getting us to this point.
You can also check out the Puppeteer team’s post about making WebDriver BiDi production ready.
For long-time Puppeteer users, the features available are familiar. However for people in other automation and testing ecosystems — particularly those that until recently relied entirely on HTTP-based WebDriver — this section outlines some of the new functionality that WebDriver BiDi makes possible to implement in a cross-browser manner.
A common requirement when testing web apps is to ensure that there are no unexpected errors reported to the console. This is also a case where an event-based protocol shines, since it avoids the need to poll the browser for new log messages.
import puppeteer from "puppeteer";
const browser = await puppeteer.launch({
browser: "firefox"
});
const page = await browser.newPage();
page.on('console', msg => {
console.log(`[console] ${msg.type()}: ${msg.text()}`);
});
await page.evaluate(() => console.debug('Some Info'));
await browser.close();
Output:
[console] debug: Some Info
Often when testing a reactive layout it’s useful to be able to ensure that the layout works well at multiple screen dimensions, and device pixel ratios. This can be done by using a real mobile browser, either on a device, or on an emulator. However for simplicity it can be useful to perform the testing on a desktop set up to mimic the viewport of a mobile device. The example below shows loading a page with Firefox configured to emulate the viewport size and device pixel ratio of a Pixel 5 phone.
import puppeteer from "puppeteer";
const device = puppeteer.KnownDevices["Pixel 5"];
const browser = await puppeteer.launch({
browser: "firefox"
});
const page = await browser.newPage();
await page.emulate(device);
const viewport = page.viewport();
console.log(
`[emulate] Pixel 5: ${viewport.width}x${viewport.height}` +
` (dpr=${viewport.deviceScaleFactor}, mobile=${viewport.isMobile})`
);
await page.goto("https://www.mozilla.org");
await browser.close();
Output:
[emulate] Pixel 5: 393x851 (dpr=3, mobile=true)
A common requirement for testing is to be able to track and intercept network requests. Interception is especially useful for avoiding requests to third party services during tests, and providing mock response data. It can also be used to handle HTTP authentication dialogs, and override parts of the request and response, for example adding or removing headers. In the example below we use network request interception to block all requests to web fonts on a page, which might be useful to ensure that these fonts failing to load doesn’t break the site layout.
import puppeteer from "puppeteer";
const browser = await puppeteer.launch({
browser: 'firefox'
});
const page = await browser.newPage();
await page.setRequestInterception(true);
page.on("request", request => {
if (request.url().includes(".woff2")) {
// Block requests to custom user fonts.
console.log(`[intercept] Request aborted: ${request.url()}`);
request.abort();
} else {
request.continue();
}
});
const response = await page.goto("https://support.mozilla.org");
console.log(
`[navigate] status=${response.status()} url=${response.url()}`
);
await browser.close();
Output:
[intercept] Request aborted: https://assets-prod.sumo.prod.webservices.mozgcp.net/static/Inter-Bold.3717db0be15085ac.woff2
[navigate] status=200 url=https://support.mozilla.org/en-US/
Often automation tooling wants to provide custom functionality that can be implemented in JavaScript. Whilst WebDriver has always allowed injecting scripts, it wasn’t possible to ensure that an injected script was always run before the page started loading, making it impossible to avoid races between the page scripts and the injected script.
WebDriver BiDi provides “preload” scripts which can be run before a page is loaded. It also provides a means to emit custom events from scripts. This can be used, for example, to avoid polling for expected elements, but instead using a mutation observer that fires as soon as the element is available. In the example below we wait for the <title> element to appear on the page, and log its contents.
import puppeteer from "puppeteer";
const browser = await puppeteer.launch({
browser: 'firefox',
});
const page = await browser.newPage();
const gotMessage = new Promise(resolve =>
page.exposeFunction("sendMessage", async message => {
console.log(`[script] Message from pre-load script: ${message}`);
resolve();
})
);
await page.evaluateOnNewDocument(() => {
const observer = new MutationObserver(mutationList => {
for (const mutation of mutationList) {
if (mutation.type === "childList") {
for (const node of mutation.addedNodes) {
if (node.tagName === "TITLE") {
sendMessage(node.textContent);
}
}
}
}
});
observer.observe(document.documentElement, {
subtree: true,
childList: true,
});
});
await page.goto("https://support.mozilla.org");
await gotMessage;
await browser.close();
Output:
[script] Message from pre-load script: Mozilla Support
Until recently, people wishing to automate browsers had two main choices: drive them using the cross-browser “classic” WebDriver protocol, or use a browser-specific protocol such as the Chrome DevTools Protocol (CDP).
Unfortunately both of those options come with significant tradeoffs. The “classic” WebDriver API is HTTP-based, and its model involves automation sending a command to the browser and waiting for a response. That works well for automation scenarios where you load a page and then verify, for example, that some element is displayed, but the inability to get events — e.g. console logs — back from the browser, or run multiple commands concurrently, makes the API a poor fit for more advanced use cases.
By contrast, browser-specific APIs have generally been designed around supporting the complex use cases of in-browser devtools. This has given them a feature set far in advance of what’s possible using WebDriver, as they need to support use cases such as recording console logs, or network requests.
Therefore, browser automation clients have been forced to make the choice between supporting many browsers using a single protocol and providing a limited feature set, or providing a richer feature set but having to implement multiple protocols to provide functionality separately for each supported browser. This obviously increased the cost and complexity of creating great cross-browser automation, which isn’t a good situation, especially when developers commonly cite cross-browser testing as one of the main pain points in developing for the web.
Long time developers might notice the analogy here to the situation with editors before the development of Language Server Protocol (LSP). At that time each text editor or IDE had to implement bespoke support for each different programming language. That made it hard to get support for a new language into all the tools that developers were using. The advent of LSP changed that by providing a common protocol that could be supported by any combination of editor and programming language. For a new programming language like TypeScript to be supported across all editors it no longer needs to get them to add support one-by-one; it only needs to provide an LSP server and it will automatically be supported across any LSP-supporting editor. The advent of this common protocol has also enabled things that were hard to imagine before, for example specific libraries like Tailwind getting their own LSP implementations to enable bespoke editor functionality.
So to improve cross-browser automation we’ve taken a similar approach: developing WebDriver BiDi, which brings the automation featureset previously limited to browser-specific protocols to a standardized protocol that can be implemented by any browser and used by any automation tooling in any programming language.
At Mozilla we see this strategy of standardizing protocols in order to remove barriers to entry, allow a diverse ecosystem of interoperable implementations to flourish, and enable users to choose those best suited to their needs as a key part of our manifesto and web vision.
For more details about the design of WebDriver BiDi and how it relates to classic WebDriver, please see our earlier posts.
As part of our early work on improving cross-browser testing, we shipped a partial implementation of CDP, limited to a few commands and events needed to support testing use cases. This was previously the basis of experimental support for Firefox in Puppeteer. However, once it became clear that this was not the way forward for cross-browser automation, effort on this was stopped. As a result it is unmaintained and doesn’t work with modern Firefox features such as site isolation. Therefore support is scheduled to be removed at the end of 2024.
If you are currently using CDP with Firefox, and don’t know how to transition to WebDriver BiDi, please reach out using one of the channels listed at the bottom of this post, and we will discuss your requirements.
Although Firefox is now officially supported in Puppeteer, and has enough functionality to cover many automation and testing scenarios, there are still some APIs that remain unsupported. These broadly fall into three categories; consult the Puppeteer documentation for a full list.
We expect to fill these gaps going forward. To help prioritize, we’re interested in your feedback: please try running your Puppeteer tests in Firefox! If you’re unable to get them working in Firefox because of a bug or missing feature, please let us know using one of the channels below so that we can take it into account when planning our future standards and implementation work.
Process separation remains one of the most important parts of the Firefox security model and securing our IPC (Inter-Process Communication) interfaces is crucial to keep privileges in the different processes separated. We take a more detailed look at our newest tool for finding vulnerabilities in these interfaces – snapshot fuzzing.
The post Snapshots for IPC Fuzzing appeared first on Mozilla Hacks - the Web developer blog.
Process separation is one of the cornerstones of the Firefox security model. Instead of running Firefox as a single process, multiple processes with different privileges communicate with each other via Inter-Process Communication (IPC). For example: loading a website, processing its resources, and rendering it is done by an isolated Content Process with a very restrictive sandbox, whereas critical operations such as file system access are only allowed to be executed in the Parent Process.
By running potentially harmful code with lower privileges, the impact of a potential code execution vulnerability is mitigated. In order to gain full control, the attacker now needs to find a second vulnerability that allows bypassing these privilege restrictions – which is colloquially known as a “sandbox escape”.
In order to achieve a sandbox escape, an attacker essentially has two options: The first one is to directly attack the underlying operating system from within the compromised content process. Since every process needs to interact with the operating system for various tasks, an attacker can focus on finding bugs in these interfaces to elevate privileges.
Since we have already deployed changes to Firefox that severely limit the OS interfaces exposed to low-privilege processes, the second attack option becomes more interesting: Exploiting bugs in privileged IPC endpoints. Since low privilege content processes need to interact with the privileged parent process, the parent needs to expose certain interfaces.
If these interfaces do not perform the necessary security checks or contain memory safety errors, the content process might be able to exploit them and perform actions with higher privileges, possibly leading to an entire parent process takeover.
Traditionally, fuzzing has had multiple success stories in the history of Mozilla and has allowed us to find all sorts of problems, including security vulnerabilities in our code. However, applying fuzzing to our critical IPC interfaces has historically been difficult. This is primarily because IPC interfaces cannot be tested in isolation, i.e. they require the full browser for testing, and because incorrect usage of IPC interfaces can force browser restarts, which introduce a prohibitive amount of latency between iterations.
To find a solution to this challenge, we engaged with the research community to apply a new method of rewinding application state during fuzzing. We saw our first results with this approach in 2021 using an experimental prototype that would later become the open source snapshot fuzzing tool called “Nyx”.
As of 2024, we are happy to announce that we are now running various snapshot fuzzing targets for IPC in production. Snapshot fuzzing is a new technology that has become more popular in recent years and we are proud of our role in bringing it from concept to practicality.
Using this technology we have already been able to identify and fix a number of potential problems in our IPC layer and we will continue to improve our testing to provide you with the most secure version of Firefox.
If you’d like to know more, or even consider contributing to Mozilla, check out our post on the security blog explaining the technical architecture behind this new tool.
The post Sponsoring sqlite-vec to enable more powerful Local AI applications appeared first on Mozilla Hacks - the Web developer blog.
Mozilla’s recently announced Builders program supports projects that advance the cause of open source AI. Our inaugural theme is “Local AI”: AI-powered applications that can run entirely locally on consumer devices like desktops, laptops, and smartphones. We are keenly interested in this area because it fosters greater privacy and control by putting AI technology directly into the hands of users. It also democratizes AI development by reducing costs, making powerful tools accessible to individual developers and small communities.
As a part of Mozilla Builders, we’ve launched an accelerator that developers can apply to join, but in parallel we have also been proactively recruiting specific open source projects that we feel have the potential to move AI forward and would benefit from Mozilla’s investment, expertise, and support. Our first such Builders project is llamafile, led by open source developer Justine Tunney. llamafile makes open LLMs run fast on everyday consumer hardware while also making open source AI dramatically more accessible and usable.
Today we’re proud to announce the next Mozilla Builders project: sqlite-vec. Led by independent developer Alex Garcia, this project brings vector search functionality to the beloved SQLite embedded database.
Alex has been working on this problem for a while, and we think his latest approach will have a great impact by providing application developers with a powerful new tool for building Local AI applications.
“I’m very excited for sqlite-vec to be a Mozilla Builders project”, said Alex Garcia. “I care a lot about building software that is easy to get started with and works everywhere, a trait obviously shared by other Builders projects like llamafile. AI tools are no exception — a vector database that runs everywhere means more equitable access for everyone.”
Vector databases are emerging as a key component of the AI application stack, supporting uses like retrieval augmented generation (RAG) and semantic search. But few of today’s available databases are designed for on-device use, making it harder to offer functionality like RAG in Local AI apps. SQLite is a mature and widely-deployed embedded database – in fact, it’s even built-into Mozilla’s own Firefox web browser.
The prospect of a vector-enabled SQLite opens up many new possibilities for locally-running AI applications. For example, imagine a chatbot that can answer questions about your personal data without letting a single byte of that data leave the privacy and safety of your laptop.
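As a rough sketch of the kind of building block this enables, here is what storing and querying embeddings could look like from Node.js. The loading path and query syntax follow the project’s published examples and may differ between sqlite-vec versions, and better-sqlite3 is just one of many ways to load a SQLite extension from JavaScript.

// Rough sketch: a tiny vector table queried by nearest-neighbour distance.
const Database = require("better-sqlite3");

const db = new Database("local-ai.db");
db.loadExtension("./vec0"); // path to a locally built sqlite-vec extension

// Toy 4-dimensional embeddings; real models produce hundreds of dimensions.
db.exec("CREATE VIRTUAL TABLE IF NOT EXISTS notes USING vec0(embedding float[4])");

const insert = db.prepare("INSERT INTO notes(rowid, embedding) VALUES (?, ?)");
insert.run(1, JSON.stringify([0.1, 0.2, 0.3, 0.4]));
insert.run(2, JSON.stringify([0.9, 0.8, 0.7, 0.6]));

// Find the stored embeddings closest to a query embedding.
const rows = db
  .prepare("SELECT rowid, distance FROM notes WHERE embedding MATCH ? ORDER BY distance LIMIT 2")
  .all(JSON.stringify([0.1, 0.2, 0.3, 0.5]));
console.log(rows);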
We’re excited to be working with Alex and supporting his efforts on sqlite-vec. We encourage you to follow the project’s progress, and Alex welcomes your contributions. And Mozilla’s Discord server is a great place to connect with Alex, the Mozilla Builders team, and everyone else in our growing community of open source practitioners. Please stop by and introduce yourself.
The post Experimenting with local alt text generation in Firefox Nightly appeared first on Mozilla Hacks - the Web developer blog.
As discussed on Mozilla Connect, Firefox 130 will introduce an experimental new capability to automatically generate alt-text for images using a fully private on-device AI model. The feature will be available as part of Firefox’s built-in PDF editor, and our end goal is to make it available in general browsing for users with screen readers.
Web pages have a fundamentally simple structure, with semantics that allow the browser to interpret the same content differently for different people based on their own needs and preferences. This is a big part of what we think makes the Web special, and what enables the browser to act as a user agent, responsible for making the Web work for people.
This is particularly useful for assistive technology such as screen readers, which are able to work alongside browser features to reduce obstacles for people to access and exchange information. For static web pages, this generally can be accomplished with very little interaction from the site, and this access has been enormously beneficial to many people.
But even for a simple static page there are certain types of information, like alternative text for images, that must be provided by the author to provide an understandable experience for people using assistive technology (as required by the spec). Unfortunately, many authors don’t do this: the Web Almanac reported in 2022 that nearly half of images were missing alt text.
Until recently it hasn’t been feasible for the browser to infer reasonably high quality alt text for images without sending potentially sensitive data to a remote server. However, the latest developments in AI have enabled this type of image analysis to happen efficiently, even on a CPU.
We are adding a feature within the PDF editor in Firefox Nightly to validate this approach. As we develop it further and learn from the deployment, our goal is to offer it for users who’d like to use it when browsing to help them better understand images which would otherwise be inaccessible.
We are using Transformer-based machine learning models to describe images. These models are getting good at describing the contents of an image, yet are compact enough to operate on devices with limited resources. While they can’t outperform a large language model like GPT-4 Turbo with Vision, or LLaVA, they are sufficiently accurate to provide valuable insights on-device across a diversity of hardware.
Model architectures like BLIP or even ViT that were trained on datasets like COCO (Common Objects in Context) or Flickr30k are good at identifying objects in an image. When combined with a text decoder like OpenAI’s GPT-2, they can produce alternative text with 200M or fewer parameters. Once quantized, these models can be under 200MB on disk, and run in a couple of seconds on a laptop – a big reduction compared to the gigabytes and resources an LLM requires.
The image below (pulled from the COCO dataset) was described by a baseline model, by our fine-tuned Firefox model, and by a person.
Both small models lose accuracy compared to the description provided by a person, and the baseline model is confused by the hands position. The Firefox model is doing slightly better in that case, and captures what is important.
What matters can be subjective in any case. Notice how the person did not write about the office setting or the cherries on the cake, and specified that the candles were long.
If we run the same image on a model like GPT-4o, the results are extremely detailed:
The image depicts a group of people gathered around a cake with lit candles. The focus is on the cake, which has a red jelly topping and a couple of cherries. There are several lit candles in the foreground. In the background, there is a woman smiling, wearing a gray turtleneck sweater, and a few other people can be seen, likely in an office or indoor setting. The image conveys a celebratory atmosphere, possibly a birthday or a special occasion.
But such level of detail in alt text is overwhelming and doesn’t prioritize the most important information. Brevity is not the only goal, but it’s a helpful starting point, and pithy accuracy in a first draft allows content creators to focus their edits on missing context and details.
So if we ask the LLM for a one-sentence description, we get:
A group of people in an office celebrates with a lit birthday cake in the foreground and a smiling woman in the background.
This has more detail than our small model, but can’t be run locally without sending your image to a server.
Running inference locally with small models offers many advantages.
Firefox Translations uses the Bergamot project powered by the Marian C++ inference runtime. The runtime is compiled into WASM, and there’s a model file for each translation task.
For example, if you run Firefox in French and visit an English page, Firefox will ask if you want to translate it to French and download the English-to-French model (~20MiB) alongside the inference runtime. This is a one-shot download: translations will happen completely offline once those files are on disk.
The WASM runtime and models are both stored in the Firefox Remote Settings service, which allows us to distribute them at scale and manage versions.
The inference task runs in a separate process, which prevents the browser or one of its tabs from crashing if the inference runtime crashes.
We’ve decided to embed the ONNX runtime in Firefox Nightly along with the Transformers.js library to extend the translation architecture to perform different inference work.
Like Bergamot, the ONNX runtime has a WASM distribution and can run directly in the browser. The ONNX project has recently introduced WebGPU support, which will eventually be activated in Firefox Nightly for this feature.
Transformers.js provides a Javascript layer on top of the ONNX inference runtime, making it easy to add inference for a huge list of model architectures. The API mimics the very popular Python library. It does all the tedious work of preparing the data that is passed to the runtime and converting the output back to a usable result. It also deals with downloading models from Hugging Face and caching them.
From the project’s documentation, this is how you can run a sentiment analysis model on a text:
import { pipeline } from '@xenova/transformers';

// Allocate a pipeline for sentiment-analysis
let pipe = await pipeline('sentiment-analysis');

let out = await pipe('I love transformers!');
// [{'label': 'POSITIVE', 'score': 0.999817686}]
Using Transformers.js gives us confidence when trying out a new model with ONNX. If its architecture is listed in the Transformers.js documentation, that’s a good indication it will work for us.
To vendor it into Firefox Nightly, we’ve slightly changed its release to distribute ONNX separately from Transformers.js, dropped Node.js-related pieces, and fixed those annoying eval() calls the ONNX library ships with. You can find the build script here which was used to populate that vendor directory.
From there, we reused the Translation architecture to run the ONNX runtime inside its own process, and have Transformers.js run with a custom model cache system.
The Transformers.js project can use local and remote models and has a caching mechanism using the browser cache. Since we are running inference in an isolated web worker, we don’t want to provide access to the file system or store models inside the browser cache. We also don’t want to use Hugging Face as the model hub in Firefox, and want to serve model files from our own servers.
Since Transformers.js provides a callback for a custom cache, we have implemented a specific model caching layer that downloads files from our own servers and caches them in IndexedDB.
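In simplified form, that caching layer looks something like the sketch below. It assumes the custom-cache hooks exposed by Transformers.js (env.useCustomCache and env.customCache), and idbGet/idbPut stand in for small IndexedDB helper functions; the real implementation also handles things like versioning and download progress.

// Simplified sketch: route Transformers.js model downloads through our own cache.
import { env } from "@xenova/transformers";

env.useBrowserCache = false; // don't use the regular browser HTTP cache
env.useCustomCache = true;
env.customCache = {
  // Return a cached Response for this model file, or undefined on a cache miss.
  async match(key) {
    const bytes = await idbGet(key);
    return bytes ? new Response(bytes) : undefined;
  },
  // Store a downloaded model file in IndexedDB, keyed by its URL.
  async put(key, response) {
    await idbPut(key, await response.clone().arrayBuffer());
  },
};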
As the project grows, we anticipate the browser will store more models, which can take up significant space on disk. We plan to add an interface in Firefox to manage downloaded models so our users can list them and remove some if needed.
Ankur Kumar released a popular model on Hugging Face to generate alt text for images and blogged about it. This model was also published as ONNX weights by Joshua Lochner so it could be used in Transformers.js; see https://huggingface.co/Xenova/vit-gpt2-image-captioning
The model is doing a good job – even if in some cases we had better results with https://huggingface.co/microsoft/git-base-coco – but the GIT architecture is not yet supported in ONNX converters, and with less than 200M params, most of the accuracy is obtained by focusing on good training data. So we have picked ViT for our first model.
Ankur used the google/vit-base-patch16-224-in21k image encoder and the GPT-2 text decoder and fine-tuned them using the COCO dataset, which is a dataset of over 120k labeled images.
In order to reduce the model size and speed it up a little bit, we’ve decided to replace GPT-2 with DistilGPT-2 — which is 2 times faster and 33% smaller according to its documentation.
Using that model in Transformers.js gave good results (see the training code at GitHub – mozilla/distilvit: image-to-text model for PDF.js).
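For illustration, the same Transformers.js pipeline API shown earlier can run this kind of image-captioning model. A minimal sketch using the ONNX checkpoint linked above; the image URL and the caption in the comment are just placeholders.

import { pipeline } from "@xenova/transformers";

// Allocate an image-to-text pipeline with the ViT + GPT-2 captioning model.
const captioner = await pipeline("image-to-text", "Xenova/vit-gpt2-image-captioning");

const output = await captioner("https://example.com/photo.jpg");
// e.g. [{ generated_text: "a cat sitting on top of a wooden table" }]
console.log(output[0].generated_text);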
We further improved the model for our use case with an updated training dataset and some supervised learning to simplify the output and mitigate some of the biases common in image to text models.
Firefox is able to add an image in a PDF using our popular open source pdf.js library.
Starting in Firefox 130, we will automatically generate an alt text and let the user validate it. So every time an image is added, we get an array of pixels that we pass to the ML engine, and a few seconds later we get a string corresponding to a description of this image (see the code).
The first time the user adds an image, they’ll have to wait a bit for the model to download (which can take up to a few minutes depending on your connection), but subsequent uses will be much faster since the model will be stored locally.
In the future, we want to be able to provide an alt text for any existing image in PDFs, except images which just contain text (it’s usually the case for PDFs containing scanned books).
Our alt text generator is far from perfect, but we want to take an iterative approach and improve it in the open. The inference engine has already landed in Firefox Nightly as a new ml component along with an initial documentation page.
We are currently working on improving the image-to-text datasets and model with what we’ve described in this blog post, which will be continuously updated on our Hugging Face page.
The code that produces the model lives on GitHub at https://github.com/mozilla/distilvit and the web application we’re building for our team to improve the model is located at https://github.com/mozilla/checkvite. We want to make sure the models and datasets we build, and all the code used, are made available to the community.
Once the alt text feature in PDF.js has matured and proven to work well, we hope to make the feature available in general browsing for users with screen readers.
The post Llamafile’s progress, four months in appeared first on Mozilla Hacks - the Web developer blog.
When Mozilla’s Innovation group first launched the llamafile project late last year, we were thrilled by the immediate positive response from open source AI developers. It’s become one of Mozilla’s top three most-favorited repositories on GitHub, attracting a number of contributors, some excellent PRs, and a growing community on our Discord server.
Through it all, lead developer and project visionary Justine Tunney has remained hard at work on a wide variety of fundamental improvements to the project. Just last night, Justine shipped the v0.8 release of llamafile, which includes not only support for the very latest open models, but also a number of big performance improvements for CPU inference.
As a result of Justine’s work, today llamafile is both the easiest and fastest way to run a wide range of open large language models on your own hardware. See for yourself: with llamafile, you can run Meta’s just-released LLaMA 3 model–which rivals the very best models available in its size class–on an everyday MacBook.
How did we do it? To explain that, let’s take a step back and tell you about everything that’s changed since v0.1.
tinyBLAS: democratizing GPU support for NVIDIA and AMD
llamafile is built atop the now-legendary llama.cpp project. llama.cpp supports GPU-accelerated inference for NVIDIA processors via the cuBLAS linear algebra library, but that requires users to install NVIDIA’s CUDA SDK. We felt uncomfortable with that fact, because it conflicts with our project goal of building a fully open-source and transparent AI stack that anyone can run on commodity hardware. And besides, getting CUDA set up correctly can be a bear on some systems. There had to be a better way.
With the community’s help (here’s looking at you, @ahgamut and @mrdomino!), we created our own solution: it’s called tinyBLAS, and it’s llamafile’s brand-new and highly efficient linear algebra library. tinyBLAS makes NVIDIA acceleration simple and seamless for llamafile users. On Windows, you don’t even need to install CUDA at all; all you need is the display driver you’ve probably already installed.
But tinyBLAS is about more than just NVIDIA: it supports AMD GPUs, as well. This is no small feat. While AMD commands a respectable 20% of today’s GPU market, poor software and driver support have historically made them a secondary player in the machine learning space. That’s a shame, given that AMD’s GPUs offer high performance, are price competitive, and are widely available.
One of llamafile’s goals is to democratize access to open source AI technology, and that means getting AMD a seat at the table. That’s exactly what we’ve done: with llamafile’s tinyBLAS, you can now easily make full use of your AMD GPU to accelerate local inference. And, as with CUDA, if you’re a Windows user you don’t even have to install AMD’s ROCm SDK.
All of this means that, for many users, llamafile will automatically use your GPU right out of the box, with little to no effort on your part.
CPU performance gains for faster local AI
Here at Mozilla, we are keenly interested in the promise of “local AI,” in which AI models and applications run directly on end-user hardware instead of in the cloud. Local AI is exciting because it opens up the possibility of more user control over these systems and greater privacy and security for users.
But many consumer devices lack the high-end GPUs that are often required for inference tasks. llama.cpp has been a game-changer in this regard because it makes local inference both possible and usably performant on CPUs instead of just GPUs.
Justine’s recent work on llamafile has now pushed the state of the art even further. As documented in her detailed blog post on the subject, by writing 84 new matrix multiplication kernels she was able to increase llamafile’s prompt evaluation performance by an astonishing 10x compared to our previous release. This is a substantial and impactful step forward in the quest to make local AI viable on consumer hardware.
This work is also a great example of our commitment to the open source AI community. After completing this work we immediately submitted a PR to upstream these performance improvements to llama.cpp. This was just the latest of a number of enhancements we’ve contributed back to llama.cpp, a practice we plan to continue.
Raspberry Pi performance gains
Speaking of consumer hardware, there are few examples that are both more interesting and more humble than the beloved Raspberry Pi. For a bargain basement price, you get a full-featured computer running Linux with plenty of computing power for typical desktop uses. It’s an impressive package, but historically it hasn’t been considered a viable platform for AI applications.
Not any more. llamafile has now been optimized for the latest model (the Raspberry Pi 5), and the result is that a number of small LLMs–such as Rocket-3B (download), TinyLLaMA-1.5B (download), and Phi-2 (download)–run at usable speeds on one of the least expensive computers available today. We’ve seen prompt evaluation speeds of up to 80 tokens/sec in some cases!
Keeping up with the latest models
The pace of progress in the open model space has been stunningly fast. Over the past few months, hundreds of models have been released or updated via fine-tuning. Along the way, there has been a clear trend of ever-increasing model performance and ever-smaller model sizes.
The llama.cpp project has been doing an excellent job of keeping up with all of these new models, frequently rolling out support for new architectures and model features within days of their release.
For our part we’ve been keeping llamafile closely synced with llama.cpp so that we can support all the same models. Given the complexity of both projects, this has been no small feat, so we’re lucky to have Justine on the case.
Today, you can use the very latest and most capable open models with llamafile thanks to her hard work. For example, we were able to roll out llamafiles for Meta's newest LLaMA 3 models, 8B-Instruct and 70B-Instruct, within a day of their release. With yesterday's 0.8 release, llamafile can also run Grok, Mixtral 8x22B, and Command-R.
Creating your own llamafiles
Since the day that llamafile shipped, people have wanted to create their own llamafiles. Previously, this required a number of steps, but today you can do it with a single command, e.g.:
llamafile-convert [model.gguf]
In just moments, this will produce a “model.llamafile” file that is ready for immediate use. Our thanks to community member @chan1012 for contributing this helpful improvement.
In a related development, Hugging Face recently added official support for llamafile within their model hub. This means you can now search and filter Hugging Face specifically for llamafiles created and distributed by other people in the open source community.
OpenAI-compatible API server
Since it’s built on top of llama.cpp, llamafile inherits that project’s server component, which provides OpenAI-compatible API endpoints. This enables developers who are building on top of OpenAI to switch to using open models instead. At Mozilla we very much want to support this kind of future: one where open-source AI is a viable alternative to centralized, closed, commercial offerings.
While open models do not yet fully rival the capabilities of closed models, they’re making rapid progress. We believe that making it easier to pivot existing code over to executing against open models will increase demand and further fuel this progress.
Over the past few months, we’ve invested effort in extending these endpoints, both to increase functionality and improve compatibility. Today, llamafile can serve as a drop-in replacement for OpenAI in a wide variety of use cases.
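To make that concrete, here is a hedged sketch (not taken from the llamafile docs) of querying a locally running llamafile from Rust through its OpenAI-compatible chat completions endpoint. It assumes the server is listening on the default http://localhost:8080 and uses the reqwest (blocking + json features) and serde_json crates.

// A hedged sketch: query a local llamafile through its OpenAI-compatible API.
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let body = json!({
        // The model field is part of the OpenAI API shape; llamafile serves
        // whatever model it was started with.
        "model": "local",
        "messages": [
            { "role": "user", "content": "Say hello in one sentence." }
        ]
    });

    let response: serde_json::Value = reqwest::blocking::Client::new()
        .post("http://localhost:8080/v1/chat/completions")
        .json(&body)
        .send()?
        .json()?;

    // Print the assistant's reply, if the response has the expected shape.
    if let Some(content) = response["choices"][0]["message"]["content"].as_str() {
        println!("{content}");
    }
    Ok(())
}

In practice, existing OpenAI client libraries can typically be pointed at the local server simply by overriding their base URL.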
We want to further extend our API server’s capabilities, and we’re eager to hear what developers want and need. What’s holding you back from using open models? What features, capabilities, or tools do you need? Let us know!
Integrations with other open source AI projects
Finally, it’s been a delight to see llamafile adopted by independent developers and integrated into leading open source AI projects (like Open Interpreter). Kudos in particular to our own Kate Silverstein who landed PRs that add llamafile support to LangChain and LlamaIndex (with AutoGPT coming soon).
If you’re a maintainer or contributor to an open source AI project that you feel would benefit from llamafile integration, let us know how we can help.
Join us!
The llamafile project is just getting started, and it’s also only the first step in a major new initiative on Mozilla’s part to contribute to and participate in the open source AI community. We’ll have more to share about that soon, but for now: I invite you to join us on the llamafile project!
The best place to connect with both the llamafile team at Mozilla and the overall llamafile community is over at our Discord server, which has a dedicated channel just for llamafile. And of course, your enhancement requests, issues, and PRs are always welcome over at our GitHub repo.
I hope you’ll join us. The next few months are going to be even more interesting and unexpected than the last, both for llamafile and for open source AI itself.
The post Llamafile’s progress, four months in appeared first on Mozilla Hacks - the Web developer blog.
]]>In this blog post, we delve into the motivations for choosing Rust for our crash reporter, outline the unique challenges of designing an application that operates when the main browser has failed, and discuss the new architecture we've implemented. We also share insights into the technical nuances of the implementation, demonstrating how Rust's features are leveraged to handle crashes more effectively and securely.
The post Porting a cross-platform GUI application to Rust appeared first on Mozilla Hacks - the Web developer blog.
]]>Firefox’s crash reporter is hopefully not something that most users experience often. However, it is still a very important component of Firefox, as it is integral in providing insight into the most visible bugs: those which crash the main process. These bugs offer the worst user experience (since the entire application must close), so fixing them is a very high priority. Other types of crashes, such as content (tab) crashes, can be handled by the browser and reported gracefully, sometimes without the user being aware that an issue occurred at all. But when the main browser process comes to a halt, we need another separate application to gather information about the crash and interact with the user.
This post details the approach we have taken to rewrite the crash reporter in Rust. We discuss the reasoning behind this rewrite, what makes the crash reporter a unique application, the architecture we used, and some details of the implementation.
Even though it is important to properly handle main process crashes, the crash reporter hasn’t received significant development in a while (aside from development to ensure that crash reports and telemetry continue to reliably be delivered)! It has long been stuck in a local maximum of “good enough” and “scary to maintain”: it features 3 individual GUI implementations (for Windows, GTK+ for Linux, and macOS), glue code abstracting a few things (mostly in C++, and Objective-C for macOS), a binary blob produced by obsoleted Apple development tools, and no test suite. Because of this, there is a backlog of features and improvements which haven’t been acted on.
We’ve recently had a number of successful pushes to decrease crash rates (including both big leaps and many small bug fixes), and the crash reporter has functioned well enough for our needs during this time. However, we’ve reached an inflection point where improving the crash reporter would provide valuable insight to enable us to decrease the crash rate even further. For the reasons previously mentioned, improving the current codebase is difficult and error-prone, so we deemed it appropriate to rewrite the application so we can more easily act on the feature backlog and improve crash reports.
Like many components of Firefox, we decided to use Rust for this rewrite to produce a more reliable and maintainable program. Besides the often-touted memory safety built into Rust, its type system and standard library make reasoning about code, handling errors, and developing cross-platform applications far more robust and comprehensive.
There are a number of features of the crash reporter which make it quite unique, especially compared to other components which have been ported to Rust. For one thing, it is a standalone, individual program; basically no other components of Firefox are used in this way. Firefox itself launches many processes as a means of sandboxing and insulating against crashes; however, these processes all talk to one another and have access to the same code base.
The crash reporter has a very unique requirement: it must use as little as possible of the Firefox code base, ideally none! We don’t want it to rely on code which may be buggy and cause the reporter itself to crash. Using a completely independent implementation ensures that when a main process crash does occur, the cause of that crash won’t affect the reporter’s functionality as well.
The crash reporter also necessarily has a GUI. This alone may not separate it from other Firefox components, but we can’t leverage any of the cross-platform rendering goodness that Firefox provides! So we need to implement a cross-platform GUI independent of Firefox as well. You might think we could reach for an existing cross-platform GUI crate, however we have a few reasons not to do so.
So all of this is to say that third-party cross-platform GUI libraries aren’t a favorable option.
These requirements significantly narrow the approach that can be used.
In order to make the crash reporter more maintainable (and make it easier to add new features in the future), we want to have as minimal and generic platform-specific code as possible. We can achieve this by using a simple UI model that can be converted into native GUI code for each platform. Each UI implementation will need to provide two methods (over arbitrary platform-specific &self data):
/// Run a UI loop, displaying all windows of the application until it terminates.
fn run_loop(&self, app: model::Application)
/// Invoke a function asynchronously on the UI loop thread.
fn invoke(&self, f: model::InvokeFn)
The run_loop function is pretty self-explanatory: the UI implementation takes an Application model (which we’ll discuss shortly) and runs the application, blocking until the application is complete. Conveniently, our target platforms generally have similar assumptions around threading: the UI runs in a single thread and typically runs an event loop which blocks on new events until an event signaling the end of the application is received.
There are some cases where we’ll need to run a function on the UI thread asynchronously (like displaying a window, updating a text field, etc). Since run_loop blocks, we need the invoke method to define how to do this. This threading model will make it easy to use the platform GUI frameworks: everything calling native functions will occur on a single thread (the main thread in fact) for the duration of the program.
This is a good time to be a bit more specific about exactly what each UI implementation will look like. We'll discuss pain points for each later on. There are 4 UI implementations: a GTK implementation used on Linux, a Win32 implementation used on Windows, an AppKit implementation used on macOS, and a testing implementation used by our unit tests.
One last detail before we dive in: the crash reporter (at least right now) has a pretty simple GUI. Because of this, an explicit non-goal of the development was to create a separate Rust GUI crate. We wanted to create just enough of an abstraction to cover the cases we needed in the crash reporter. If we need more controls in the future, we can add them to the abstraction, but we avoided spending extra cycles to fill out every GUI use case.
Likewise, we tried to avoid unnecessary development by allowing some tolerance for hacks and built-in edge cases. For example, our model defines a Button as an element which contains an arbitrary element, but actually supporting that with Win32 or AppKit would have required a lot of custom code, so we special case on a Button containing a Label (which is all we need right now, and an easy primitive available to us). I’m happy to say there aren’t really many special cases like that at all, but we are comfortable with the few that were needed.
Our model is a declarative structuring of concepts mostly present in GTK. Since GTK is a mature library with proven high-level UI concepts, this made it appropriate for our abstraction and made the GTK implementation pretty simple. For instance, the simplest way that GTK does layout (using container GUI elements and per-element margins/alignments) is good enough for our GUI, so we use similar definitions in the model. Notably, this “simple” layout definition is actually somewhat high-level and complicates the macOS and Windows implementations a bit (but this tradeoff is worth the ease of creating UI models).
The top-level type of our UI model is Application. This is pretty simple: we define an Application as a set of top-level Windows (though our application only has one) and whether the current locale is right-to-left. We inspect Firefox resources to use the same locale that Firefox would, so we don’t rely on the native GUI’s locale settings.
As you might expect, each Window contains a single root element. The rest of the model is made up of a handful of typical container and primitive GUI elements:
The crash reporter only needs 8 types of GUI elements! And really, Progress is used as a spinner rather than indicating any real progress as of right now, so it’s not strictly necessary (but nice to show).
Rust does not explicitly support the object-oriented concept of inheritance, so you might be wondering how each GUI element “extends” Element. The relationship represented in the picture is somewhat abstract; the implemented Element looks like:
pub struct Element {
    pub style: ElementStyle,
    pub element_type: ElementType
}
where ElementStyle contains all the common properties of elements (alignment, size, margin, visibility, and enabled state), and ElementType is an enum containing each of the specific GUI elements as variants.
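For illustration, the split can be pictured roughly as follows; the field and variant details here are our assumptions, not the crate's exact definitions.

// Illustrative sketch of the style/type split; the real crate has more fields
// and more variants than shown here.
#[derive(Clone, Copy)]
pub enum Alignment {
    Start,
    Center,
    End,
    Fill,
}

#[derive(Clone, Copy)]
pub struct Margin {
    pub top: f32,
    pub right: f32,
    pub bottom: f32,
    pub left: f32,
}

pub struct ElementStyle {
    pub halign: Alignment,
    pub valign: Alignment,
    pub hsize: Option<u32>,
    pub vsize: Option<u32>,
    pub margin: Margin,
    pub visible: bool,
    pub enabled: bool,
}

pub enum ElementType {
    Label { text: String },
    TextBox { content: String },
    Progress,
    // ...one variant per remaining element type; container variants such as
    // VBox and Scroll hold child Element values in the real definition.
}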
The model elements are all intended to be consumed by the UI implementations; as such, almost all of the fields have public visibility. However, as a means of having a separate interface for building elements, we define an ElementBuilder<T> type. This type has methods that maintain assertions and provide convenience setters. For instance, many methods accept parameters that are impl Into<MemberType>, some methods like margin() set multiple values (but you can be more specific with margin_top()), etc.
There is a general impl<T> ElementBuilder<T> which provides setters for the various ElementStyle properties, and then each specific element type can also provide their own impl ElementBuilder<SpecificElement> with additional properties unique to the element type.
We combine ElementBuilder<T> with the final piece of the puzzle: a ui! macro. This macro allows us to write our UI in a declarative manner. For example, it allows us to write:
let details_window = ui! {
    Window title("Crash Details") visible(show_details) modal(true) hsize(600) vsize(400)
        halign(Alignment::Fill) valign(Alignment::Fill)
    {
        VBox margin(10) spacing(10) halign(Alignment::Fill) valign(Alignment::Fill) {
            Scroll halign(Alignment::Fill) valign(Alignment::Fill) {
                TextBox content(details) halign(Alignment::Fill) valign(Alignment::Fill)
            },
            Button halign(Alignment::End) on_click(move || *show_details.borrow_mut() = false)
            {
                Label text("Ok")
            }
        }
    }
};
The implementation of ui! is fairly simple. The first identifier provides the element type and an ElementBuilder<T> is created. After that, the remaining method-call-like syntax forms are called on the builder (which is mutable).
Optionally, a final set of curly braces indicate that the element has children. In that case, the macro is recursively called to create them, and add_child is called on the builder with the result (so we just need to make sure a builder has an add_child method). Ultimately the syntax transformation is pretty simple, but I believe that this macro is a little bit more than just syntax sugar: it makes reading and editing the UI a fair bit clearer, since the hierarchy of elements is represented in the syntax. Unfortunately a downside is that there’s no way to support automatic formatting of such macro DSLs, so developers will need to maintain a sane formatting.
So now we have a model defined and a declarative way of building it. But we haven’t discussed any dynamic runtime behaviors here. In the above example, we see an on_click handler being set on a Button. We also see things like the Window’s visible property being set to a show_details value which is changed when on_click is pressed. We hook into this declarative UI to change or react to events at runtime using a set of simple data binding primitives with which UI implementations can interact.
Many GUI frameworks nowadays (both for Rust and other languages) have been built with the “diffing element trees” architecture (think React), where your code is (at least mostly) functional and side-effect-free and produces the GUI view as a function of the current state. This approach has its tradeoffs: for instance, it makes complicated, stateful alterations of the layout very simple to write, understand, and maintain, and encourages a clean separation of model and view! However since we aren’t writing a framework, and our application is and will remain fairly simple, the benefits of such an architecture were not worth the additional development burden. Our implementation is more similar to the MVVM architecture:
There are a few types which we use to declare dynamic (runtime-changeable) values. In our UI, we needed to support a few different behaviors:
On-demand values are used to get textbox contents rather than using a synchronized value, in an effort to avoid implementing debouncing in each UI. It may not be terribly difficult to do so, but it also wasn’t difficult to support the on-demand implementation.
As a means of convenience, we created a Property type which encompasses the value-oriented fields as well. A Property<T> can be set to either a static value (T), a synchronized value (Synchronized<T>), or an on-demand value (OnDemand<T>). It supports an impl From for each of these, so that builder methods can look like fn my_method(&mut self, value: impl Into<Property<T>>) allowing any supported value to be passed in a UI declaration.
We won’t discuss the implementation in depth (it’s what you’d expect), but it’s worth noting that these are all Clone to easily share the data bindings: they use Rc (we don’t need thread safety) and RefCell as necessary to access callbacks.
In the example from the last section, show_details is a Synchronized<bool> value. When it changes, the UI implementations change the associated window visibility. The Button on_click callback sets the synchronized value to false, hiding the window (note that the details window used in this example is never closed, it is just shown and hidden).
In a former iteration, data binding types had a lifetime parameter which specified the lifetime for which event callbacks were valid. While we were able to make this work, it greatly complicated the code, especially because there’s no way to communicate the correct covariance of the lifetime to the compiler, so there was additional unsafe code transmuting lifetimes (though it was contained as an implementation detail). These lifetimes were also infectious, requiring some of the complicated semantics regarding their safety to be propagated into the model types which stored Property fields.
Much of this was to avoid cloning values into the callbacks, but changing these types to all be Clone and store static-lifetime callbacks was worth making the code far more maintainable.
The careful reader might remember that we discussed how our threading model involves interacting with the UI implementations only on the main thread. This includes updating the data bindings, since the UI implementations might have registered callbacks on them! While we could run everything in the main thread, it’s generally a much better experience to do as much off of the UI thread as possible, even if we don’t do much that’s blocking (though we will be blocking when we send crash reports). We want our business logic to default to being off of the main thread so that the UI doesn’t ever freeze. We can guarantee this with some careful design.
The simplest way to guarantee this behavior is to put all of the business logic in one (non-Clone, non-Sync) type (let’s call it Logic) and construct the UI and UI state (like Property values) in another type (let’s call it UI). We can then move the Logic value into a separate thread to guarantee that UI can’t interact with Logic directly, and vice versa. Of course we do need to communicate sometimes! But we want to ensure that this communication will always be delegated to the thread which owns the values (rather than the values directly interacting with each other).
We can accomplish this by creating an enqueuing function for each type and storing that in the opposite type. Such a function will be passed boxed functions to run on the owning thread that get a reference to the owned type (e.g., Box<dyn FnOnce(&T) + Send + 'static>). This is simple to create: for the UI thread, it is just the UI implementation’s invoke method which we briefly discussed previously. The Logic thread does nothing but run a loop which will get these functions and run them on the owned value (we just enqueue and pass them using an mpsc::channel). Now each type can asynchronously call methods on the other with the guarantee that they’ll be run on the correct thread.
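Here is a small, self-contained sketch of that delegation pattern using an mpsc channel; the Logic type and its field are invented purely for illustration.

use std::cell::Cell;
use std::sync::mpsc;
use std::thread;

// Each task receives a shared reference to the thread-owned value, mirroring
// the Box<dyn FnOnce(&T) + Send + 'static> shape described above.
type LogicTask = Box<dyn FnOnce(&Logic) + Send + 'static>;

// Invented for illustration; the real Logic type holds the crash data, etc.
struct Logic {
    reports_sent: Cell<u32>,
}

fn main() {
    let (to_logic, logic_queue) = mpsc::channel::<LogicTask>();

    // The Logic thread: own the value and run every enqueued function on it.
    let logic_thread = thread::spawn(move || {
        let logic = Logic { reports_sent: Cell::new(0) };
        while let Ok(task) = logic_queue.recv() {
            task(&logic);
        }
    });

    // The UI side: delegate work to the Logic thread without touching it directly.
    to_logic
        .send(Box::new(|logic: &Logic| {
            logic.reports_sent.set(logic.reports_sent.get() + 1);
            println!("sent {} crash report(s)", logic.reports_sent.get());
        }))
        .unwrap();

    drop(to_logic); // closing the channel lets the logic loop exit
    logic_thread.join().unwrap();
}

Delegation in the other direction works the same way, except that the enqueuing function is the UI implementation's invoke method rather than a channel.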
In a former iteration, a more complicated scheme was used with thread-local storage and a central type which was responsible for both creating threads and delegating the functions. But with such a basic use case as two threads delegating between each other, we were able to distill this to the essential aspects needed, greatly simplifying the code.
One nice benefit of this rewrite is that we could bring the localization of the crash reporter up to speed with our modern tooling. In almost every other part of Firefox, we use fluent to handle localization. Using fluent in the crash reporter makes the experience of localizers more uniform and predictable; they do not need to understand more than one localization system (the crash reporter was one of the last holdouts of the old system). It was very easy to use in the new code, with just a bit of extra code to extract the localization files from the Firefox installation when the crash reporter is run. In the worst case scenario where we can’t find or access these files, we have the en-US definitions directly bundled in the crash reporter binary.
We won’t go into much detail about the implementations, but it’s worth talking about each a bit.
The GTK implementation is probably the most straightforward and succinct. We use bindgen to generate Rust bindings to the GTK functions we need (avoiding vendoring any external crates). Then we simply call all of the corresponding GTK functions to set up the GTK widgets as described in the model (remember, the model was made to mirror some of the GTK concepts).
Since GTK is somewhat modern and meant to be written by humans (not automated tools like some of the other platforms), there weren’t really any pain points or unusual behaviors that needed to be addressed.
We have a handful of nice features to improve memory safety and correctness. A set of traits makes it easy to attach owned data to GObjects (ensuring data remains valid and is properly dropped when the GObject is destroyed), and a few macros set up the glue code between GTK signals and our data binding types.
The Windows implementation may have been the most difficult to write, since Win32 GUIs are very rarely written nowadays and the API shows its age. We use the windows-sys crate to access bindings to the API (which was already vendored in the codebase for many other Windows API uses). This crate is generated directly from Windows function metadata (by Microsoft), but otherwise its bindings aren’t terribly different from what bindgen might have produced (though they are likely a bit more accurate).
There were a number of hurdles to overcome. For one thing, the Win32 API doesn't provide any layout primitives, so the high-level layout concepts we use (which allow graceful resize/repositioning) had to be implemented manually. There are also quite a few extra API calls just to get to a GUI that looks somewhat decent (correct window colors, font smoothing, high DPI handling, etc). Even the default font ends up being a terrible-looking bitmapped font rather than the more modern system default; we needed to manually retrieve the system default and set it as the font to use, which was a bit surprising!
We have a set of traits to facilitate creating custom window classes and managing associated window data of class instances. We also have wrapper types to properly manage the lifetimes of handles and perform type conversions (mainly String to null-terminated wide strings and back) as an extra layer of safety around the API.
The macOS implementation had its tricky parts, as overwhelmingly macOS GUIs are written with XCode and there’s a lot of automated and generated portions (such as nibs). We again use bindgen to generate Rust bindings, this time for the Objective-C APIs in macOS framework headers.
Unlike Windows and GTK, you don’t get keyboard shortcuts like Cmd-C, Cmd-Q, etc, for free if creating a GUI without e.g. XCode (which generates it for you as part of a new project template). To have these typical shortcuts that users expect, we needed to manually implement the application main menu (which is what governs keyboard shortcuts). We also had to handle runtime setup like creating Objective-C autorelease pools, bringing the window and application (which are separate concepts) to the foreground, etc. Even implementing invoke to call a function on the main thread had its nuances, since modal windows use a nested event loop which would not call queued functions under the default NSRunLoop mode.
We wrote some simple helper types and a macro to make it easy to implement, register, and create Objective-C classes from Rust code. We used this for creating delegate classes as well as subclassing some controls for the implementation (like NSButton); it made it easy to safely manage the memory of Rust values underlying the classes and correctly register class method selectors.
We’ll discuss testing in the next section. Our testing UI is very simple. It doesn’t create a GUI, but allows us to interact directly with the model. The ui! macro supports an extra piece of syntax when tests are enabled to optionally set a string identifier for each element. We use these strings in unit tests to access and interact with the UI. The data binding types also support a few additional methods in tests to easily manipulate values. This UI allows us to simulate button presses, field entry, etc, to ensure that other UI state changes as expected as well as simulating the system side effects.
An important goal of our rewrite was to add tests to the crash reporter; our old code was sorely lacking them (in part because unit testing GUIs is notoriously difficult).
In the new code, we can mock the crash reporter regardless of whether we are running tests or not (though it is always mocked for tests). This is important because mocking allows us to (manually) run the GUI in various states to check that the GUI implementations are correct and render well. Our mocking not only mocks the inputs to the crash reporter (environment variables, command line parameters, etc), it also mocks all side-effectful std functions.
We accomplish this by having a std module in the crate, and using crate::std throughout the rest of the code. When mocking is disabled, crate::std is simply the same as ::std. But when it is enabled, a bunch of functions that we have written are used instead. These mock the filesystem, environment, launching external commands, and other side effects. Importantly, only the minimal amount to mock the existing functions is implemented, so that if e.g. some new functions from std::fs, std::net, etc. are used, the crate will fail to compile with mocking enabled (so that we don’t miss any side effects). This might sound like a lot of effort, but you might be surprised at how little of std really needed to be mocked, and most implementations were pretty straightforward.
Now that we have our code using different mocked functions, we need to have a way of injecting the desired mock data (both in tests and in our normal mocked operation). For example, we have the ability to return some data when a File is read, but we need to be able to set that data differently for tests. Without going into too much detail, we accomplish this using a thread-local store of mock data. This way, we don’t need to change any code to accommodate the mock data; we only need to make changes where we set and retrieve it. The programming language enthusiasts out there may recognize this as a form of dynamic scoping. The implementation allows our mock data to be set with code like
mock::builder()
    .set(
        crate::std::env::MockCurrentExe,
        "work_dir/crashreporter".into(),
    )
    .run(|| crash_reporter_main())
in tests, and
pub fn current_exe() -> std::io::Result<PathBuf> {
    Ok(MockCurrentExe.get(|r| r.clone()))
}
in our crate::std::env implementation.
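To make the dynamic-scoping idea concrete, here is a small, self-contained sketch of how a thread-local mock store can be wired up; it is not the crash reporter's actual implementation (its get takes a closure, for example), just the general shape.

use std::any::{Any, TypeId};
use std::cell::RefCell;
use std::collections::HashMap;

thread_local! {
    // One mock store per thread: this is what gives us dynamic scoping.
    static MOCK_DATA: RefCell<HashMap<TypeId, Box<dyn Any>>> =
        RefCell::new(HashMap::new());
}

// A marker key type identifies each piece of mock data; the stored value
// type is given by the associated Value type.
pub trait MockKey: 'static {
    type Value: Clone + 'static;

    fn set(&self, value: Self::Value) {
        MOCK_DATA.with(|data| {
            data.borrow_mut()
                .insert(TypeId::of::<Self>(), Box::new(value));
        });
    }

    fn get(&self) -> Self::Value {
        MOCK_DATA.with(|data| {
            data.borrow()
                .get(&TypeId::of::<Self>())
                .and_then(|value| value.downcast_ref::<Self::Value>())
                .expect("mock data not set for this key")
                .clone()
        })
    }
}

// Example key mirroring the MockCurrentExe used above.
pub struct MockCurrentExe;
impl MockKey for MockCurrentExe {
    type Value = std::path::PathBuf;
}

fn main() {
    MockCurrentExe.set("work_dir/crashreporter".into());
    println!("{}", MockCurrentExe.get().display());
}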
With our mocking setup and test UI, we are able to extensively test the behavior of the crash reporter. The “last mile” of this testing which we can’t automate easily is whether each UI implementation faithfully represents the UI model. We manually test this with a mocked GUI for each platform.
Besides that, we are able to automatically test how arbitrary UI interactions cause the crash reporter to affect its own UI state and the environment (checking which programs are invoked and network connections are made, what happens if they fail, succeed, or timeout, etc). We also set up a mock filesystem and add assertions in various scenarios over the precise resulting filesystem state once the crash reporter completes. This greatly increases our confidence in the current behaviors and ensures that future changes will not alter them, which is of the utmost importance for such an essential component of our crash reporting pipeline.
Of course we can’t get away with writing all of this without a picture of the crash reporter! This is what it looks like on Linux using GTK. The other GUI implementations look the same but styled with a native look and feel.
Note that, for now, we wanted to keep it looking exactly the same as it previously did. So if you are unfortunate enough to see it, it shouldn’t appear as if anything has changed!
With a new, cleaned up crash reporter, we can finally unblock a number of feature requests and bug reports, such as:
We are excited to iterate and improve further on crash reporter functionality. But ultimately it’d be wonderful if you never see or use it, and we are constantly working toward that goal!
The post Porting a cross-platform GUI application to Rust appeared first on Mozilla Hacks - the Web developer blog.
]]>In the fast-paced world of generative AI, staying ahead means moving swiftly and smartly. That's why we've embraced Gradio, the low-code prototyping toolkit from Hugging Face, as our go-to for bringing new ideas to life.
The post Prototype even faster with the Gradio UI for Figma component library appeared first on Mozilla Hacks - the Web developer blog.
]]>As an industry, generative AI is moving quickly, and so requires teams exploring new ideas and technologies to move quickly as well. To do so, we have been using Gradio, a low-code prototyping toolkit from Hugging Face, to spin up experiments and experiences. Gradio has allowed us to validate concepts through prototyping without large investments of time, effort, or infrastructure.
Although Gradio has made the development phase of prototyping easier, the design phase has been largely the same. Even with Gradio, designers have had to create components in Figma, outline expected user flows and behaviors, and hand off designs for developers in the same way they have always done. While working on a recent exploration, we realized something was needed: a set of Figma components based on Gradio that enabled designers to create wireframes quickly.
Today, we are releasing our library of design components for Gradio for others to use. The components are based on version 4.23.0 of Gradio and will be available through our Figma profile: Mozilla Innovation Projects, https://www.figma.com/@futureatmozilla. We hope these components help teams accelerate their discovery and experimentation with ML and generative AI.
You can find out more about Gradio at https://www.gradio.app/ and more about innovation at Mozilla at https://future.mozilla.org
Thanks to Amy Chiu and Anais Ron who created the components and to the Gradio team for their work. Happy designing!
Because Gradio is an ever-changing prototyping kit, current components are based on version 4.23.0 of Gradio. We selected components based on their wide array of potential uses. Here is a list of the components inside the kit:
Small Components:
Big Components:
To start using the library, follow these simple steps:
The post Prototype even faster with the Gradio UI for Figma component library appeared first on Mozilla Hacks - the Web developer blog.
]]>In collaboration with the other major browser engine developers, Mozilla is thrilled to announce Speedometer 3 today. Like previous versions of Speedometer, this benchmark measures what we think matters most for performance online: responsiveness. But today’s release is more open and more challenging than before, and is the best tool for driving browser performance improvements that we’ve ever seen.
The post Improving Performance in Firefox and Across the Web with Speedometer 3 appeared first on Mozilla Hacks - the Web developer blog.
]]>In collaboration with the other major browser engine developers, Mozilla is thrilled to announce Speedometer 3 today. Like previous versions of Speedometer, this benchmark measures what we think matters most for performance online: responsiveness. But today’s release is more open and more challenging than before, and is the best tool for driving browser performance improvements that we’ve ever seen.
This fulfills the vision set out in December 2022 to bring experts across the industry together in order to rethink how we measure browser performance, guided by a shared goal to reflect the real-world Web as much as possible. This is the first time the Speedometer benchmark, or any major browser benchmark, has been developed through a cross-industry collaboration supported by each major browser engine: Blink, Gecko, and WebKit. Working together means we can build a shared understanding of what matters to optimize, and facilitates broad review of the benchmark itself: both of which make it a stronger lever for improving the Web as a whole.
And we’re seeing results: Firefox got faster for real users in 2023 as a direct result of optimizing for Speedometer 3. This took a coordinated effort from many teams: understanding real-world websites, building new tools to drive optimizations, and making a huge number of improvements inside Gecko to make web pages run more smoothly for Firefox users. In the process, we’ve shipped hundreds of bug fixes across JS, DOM, Layout, CSS, Graphics, frontend, memory allocation, profile-guided optimization, and more.
We’re happy to see core optimizations in all the major browser engines turning into improved responsiveness for real users, and are looking forward to continuing to work together to build performance tests that improve the Web.
The post Improving Performance in Firefox and Across the Web with Speedometer 3 appeared first on Mozilla Hacks - the Web developer blog.
]]>Following the success of Interop 2023, we are pleased to confirm that the project will continue in 2024 with a new selection of focus areas, representing areas of the web platform where we think we can have the biggest positive impact on users and web developers.
The post Announcing Interop 2024 appeared first on Mozilla Hacks - the Web developer blog.
]]>The Interop Project has become one of the key ways that browser vendors come together to improve the web platform. By working to identify and improve key areas where differences between browser engines are impacting users and web developers, Interop is a critical tool in ensuring the long-term health of the open web.
The web platform is built on interoperability based on common standards. This offers users a degree of choice and control that sets the web apart from proprietary platforms defined by a single implementation. A commitment to ensuring that the web remains open and interoperable forms a fundamental part of Mozilla’s manifesto and web vision, and is why we’re so committed to shipping Firefox with our own Gecko engine.
However interoperability requires care and attention to maintain. When implementations ship with differences between the standard and each other, this creates a pain point for web authors; they have to choose between avoiding the problematic feature entirely and coding to specific implementation quirks. Over time if enough authors produce implementation-specific content then interoperability is lost, and along with it user agency.
This is the problem that the Interop Project is designed to address. By bringing browser vendors together to focus on interoperability, the project allows identifying areas where interoperability issues are causing problems, or may do so in the near future. Tracking progress on those issues with a public metric provides accountability to the broader web community on addressing the problems.
The project works by identifying a set of high-priority focus areas: parts of the web platform where everyone agrees that making interoperability improvements will be of high value. These can be existing features where we know browsers have slightly different behaviors that are causing problems for authors, or they can be new features which web developer feedback shows is in high demand and which we want to launch across multiple implementations with high interoperability from the start. For each focus area a set of web-platform-tests is selected to cover that area, and the score is computed from the pass rate of these tests.
The Interop 2023 project covered high profile features like the new :has() selector, and web-codecs, as well as areas of historically poor interoperability such as pointer events.
The results of the project speak for themselves: every browser ended the year with scores in excess of 97% for the prerelease versions of their browsers. Moreover, the overall Interoperability score — that is the fraction of focus area tests that pass in all participating browser engines — increased from 59% at the start of the year to 95% now. This result represents a huge improvement in the consistency and reliability of the web platform. For users this will result in a more seamless experience, with sites behaving reliably in whichever browser they prefer.
For the :has() selector — which we know from author feedback has been one of the most in-demand CSS features for a long time — every implementation is now passing 100% of the web-platform-tests selected for the focus area. Launching a major new platform feature with this level of interoperability demonstrates the power of the Interop project to progress the platform without compromising on implementation diversity, developer experience, or user choice.
As well as focus areas, the Interop project also has “investigations”. These are areas where we know that we need to improve interoperability, but aren’t at the stage of having specific tests which can be used to measure that improvement. In 2023 we had two investigations. The first was for accessibility, which covered writing many more tests for ARIA computed role and accessible name, and ensuring they could be run in different browsers. The second was for mobile testing, which has resulted in both Mobile Firefox and Chrome for Android having their initial results in wpt.fyi.
Following the success of Interop 2023, we are pleased to confirm that the project will continue in 2024 with a new selection of focus areas, representing areas of the web platform where we think we can have the biggest positive impact on users and web developers.
New focus areas for 2024 include, among other things:
The full list of focus areas is available in the project README.
In addition to the new focus areas, we will carry over some of the 2023 focus areas where there’s still more work to be done. Of particular interest is the Layout focus area, which will combine the previous Flexbox, Grid and Subgrid focus area into one area covering all the most important layout primitives for the modern web. On top of that the Custom Properties, URL and Mouse and Pointer Events focus areas will be carried over. These represent cases where, even though we’ve already seen large improvements in Interoperability, we believe that users and web authors will benefit from even greater convergence between implementations.
As well as focus areas, Interop 2024 will also feature a new investigation into improving the integration of WebAssembly testing into web-platform-tests. This will open up the possibility of including WASM features in future Interop projects. In addition we will extend the Accessibility and Mobile Testing investigations, as there is more work to be done to make those aspects of the platform fully testable across different implementations.
The post Announcing Interop 2024 appeared first on Mozilla Hacks - the Web developer blog.
]]>During the Firefox 120 beta cycle, a new crash signature appeared on our radars with significant volume. Engineers working on Firefox explore the subtle pitfalls of combining compiler flags.
The post Option Soup: the subtle pitfalls of combining compiler flags appeared first on Mozilla Hacks - the Web developer blog.
]]>Firefox development uncovers many cross-platform differences and unique features of its combination of dependencies. Engineers working on Firefox regularly overcome these challenges and while we can’t detail all of them, we think you’ll enjoy hearing about some so here’s a sample of a recent technical investigation.
During the Firefox 120 beta cycle, a new crash signature appeared on our radars with significant volume.
At that time, the distribution across operating systems revealed that more than 50% of the crash volume originates from Ubuntu 18.04 LTS users.
The main process crashes in a CanvasRenderer thread, with the following call stack:
0  firefox    std::locale::operator=
1  firefox    std::ios_base::imbue
2  firefox    std::basic_ios<char, std::char_traits<char> >::imbue
3  libxul.so  sh::InitializeStream<std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> > >  /build/firefox-ZwAdKm/firefox-120.0~b2+build1/gfx/angle/checkout/src/compiler/translator/Common.h:238
3  libxul.so  sh::TCompiler::setResourceString  /build/firefox-ZwAdKm/firefox-120.0~b2+build1/gfx/angle/checkout/src/compiler/translator/Compiler.cpp:1294
4  libxul.so  sh::TCompiler::Init  /build/firefox-ZwAdKm/firefox-120.0~b2+build1/gfx/angle/checkout/src/compiler/translator/Compiler.cpp:407
5  libxul.so  sh::ConstructCompiler  /build/firefox-ZwAdKm/firefox-120.0~b2+build1/gfx/angle/checkout/src/compiler/translator/ShaderLang.cpp:368
6  libxul.so  mozilla::webgl::ShaderValidator::Create  /build/firefox-ZwAdKm/firefox-120.0~b2+build1/dom/canvas/WebGLShaderValidator.cpp:215
6  libxul.so  mozilla::WebGLContext::CreateShaderValidator const  /build/firefox-ZwAdKm/firefox-120.0~b2+build1/dom/canvas/WebGLShaderValidator.cpp:196
7  libxul.so  mozilla::WebGLShader::CompileShader  /build/firefox-ZwAdKm/firefox-120.0~b2+build1/dom/canvas/WebGLShader.cpp:98
At first glance, we want to blame WebGL. The C++ standard library functions cannot be at fault, right?
But when looking at the WebGL code, the crash occurs in the perfectly valid lines of C++ summarized below:
std::ostringstream stream;
stream.imbue(std::locale::classic());
This code should never crash, and yet it does. In fact, taking a closer look at the stack gives a first lead for investigation:
Although we crash into functions that belong to the C++ standard library, these functions appear to live in the firefox binary.
This is an unusual situation that never occurs with official builds of Firefox.
It is however very common for distributions to change the configuration settings and apply downstream patches to an upstream source, no worries about that.
Moreover, there is only a single build of Firefox Beta that is causing this crash.
We know this thanks to a unique identifier associated with any ELF binary.
Here, if we choose any specific version of Firefox 120 Beta (such as 120b9), the crashes all embed the same unique identifier for firefox.
Now, how can we guess what build produces this weird binary?
A useful user comment mentions that they regularly experience this crash since updating to 120.0~b2+build1-0ubuntu0.18.04.1.
And by looking for this build identifier, we quickly reach the Firefox Beta PPA.
Then indeed, we are able to reproduce the crash by installing it in a Ubuntu 18.04 LTS virtual machine: it occurs when loading any WebGL page!
With the binary now at hand, running nm -D ./firefox confirms the presence of several symbols related to libstdc++ that live in the text section (T marker).
Templated and inline symbols from libstdc++ usually appear as weak (W marker), so there is only one explanation for this situation: firefox has been statically linked with libstdc++, probably through -static-libstdc++.
Fortunately, the build logs are available for all Ubuntu packages.
After some digging, we find the logs for the 120b9 build, which indeed contain references to -static-libstdc++.
But why?
Again, everything is well documented, and thanks to well trained digging skills we reach a bug report that provides interesting insights.
Firefox requires a modern C++ compiler, and hence a modern libstdc++, which is unavailable on old systems like Ubuntu 18.04 LTS.
The build uses -static-libstdc++ to close this gap.
This just explains the weird setup though.
What about the crash?
Since we can now reproduce it, we can launch Firefox in a debugger and continue our investigation.
When inspecting the crash site, we seem to crash because std::locale::classic() is not properly initialized.
Let's take a peek at the implementation.
const locale&
locale::classic()
{
  _S_initialize();
  return *(const locale*)c_locale;
}
_S_initialize() is in charge of making sure that c_locale will be properly initialized before we return a reference to it.
To achieve this, _S_initialize() calls another function, _S_initialize_once().
void
locale::_S_initialize()
{
#ifdef __GTHREADS
  if (!__gnu_cxx::__is_single_threaded())
    __gthread_once(&_S_once, _S_initialize_once);
#endif
  if (__builtin_expect(!_S_classic, 0))
    _S_initialize_once();
}
In _S_initialize(), we first go through a wrapper for pthread_once(): the first thread that reaches this code consumes _S_once and calls _S_initialize_once(), whereas other threads (if any) are stuck waiting for _S_initialize_once() to complete.
This looks rather fail-proof, right?
There is even an extra direct call to _S_initialize_once() if _S_classic is still uninitialized after that.
Now, _S_initialize_once() itself is rather straightforward: it allocates _S_classic and puts it within c_locale.
void
locale::_S_initialize_once() throw()
{
  // Need to check this because we could get called once from _S_initialize()
  // when the program is single-threaded, and then again (via __gthread_once)
  // when it's multi-threaded.
  if (_S_classic)
    return;

  // 2 references.
  // One reference for _S_classic, one for _S_global
  _S_classic = new (&c_locale_impl) _Impl(2);
  _S_global = _S_classic;
  new (&c_locale) locale(_S_classic);
}
The crash looks as if we never went through _S_initialize_once(), so let's put a breakpoint there and see what happens.
And just by doing this, we already notice something suspicious.
We do reach _S_initialize_once(), but not within the firefox binary: instead, we only ever reach the version exported by liblgpllibs.so.
In fact, liblgpllibs.so is also statically linked with libstdc++, such that firefox and liblgpllibs.so both embed and export their own _S_initialize_once() function.
By default, symbol interposition applies, and _S_initialize_once() should always be called through the procedure linkage table (PLT), so that every module ends up calling the same version of the function.
If symbol interposition were happening here, we would expect that liblgpllibs.so would reach the version of _S_initialize_once() exported by firefox rather than its own, because firefox was loaded first.
So maybe there is no symbol interposition.
This can occur when using -fno-semantic-interposition.
Each version of the standard library would live on its own, independent from the other versions.
But neither the Firefox build system nor the Ubuntu maintainer seems to pass this flag to the compiler.
However, by looking at the disassembly for _S_initialize() and _S_initialize_once(), we can see that the exported global variables (_S_once, _S_classic, _S_global) are subject to symbol interposition:
These accesses all go through the global offset table (GOT), so that every module ends up accessing the same version of the variable.
This seems strange given what we said earlier about _S_initialize_once().
Non-exported global variables (c_locale, c_locale_impl), however, are accessed directly without symbol interposition, as expected.
We now have enough information to explain the crash.
When we reach _S_initialize() in liblgpllibs.so, we actually consume the _S_once that lives in firefox, and initialize the _S_classic and _S_global that live in firefox.
But we initialize them with pointers to well initialized variables c_locale_impl and c_locale that live in liblgpllibs.so!
The variables c_locale_impl and c_locale that live in firefox, however, remain uninitialized.
So if we later reach _S_initialize() in firefox, everything looks as if initialization has happened.
But then we return a reference to the version of c_locale that lives in firefox, and this version has never been initialized.
Boom!
Now the main question is: why do we see interposition occur for _S_once but not for _S_initialize_once()?
If we step back for a minute, there is a fundamental distinction between these symbols: one is a function symbol, the other is a variable symbol.
And indeed, the Firefox build system uses the -Bsymbolic-functions flag!
The ld man page describes it as follows:
-Bsymbolic-functions
    When creating a shared library, bind references to global function symbols to
    the definition within the shared library, if any. This option is only meaningful
    on ELF platforms which support shared libraries.
As opposed to:
-Bsymbolic
    When creating a shared library, bind references to global symbols to the
    definition within the shared library, if any. Normally, it is possible for a
    program linked against a shared library to override the definition within the
    shared library. This option is only meaningful on ELF platforms which support
    shared libraries.
Nailed it!
The crash occurs because this flag makes us use a weird variant of symbol interposition, where symbol interposition happens for variable symbols like _S_once and _S_classic but not for function symbols like _S_initialize_once().
This results in a mismatch regarding how we access global variables: exported global variables are unique thanks to interposition, whereas every non-interposed function will access its own version of any non-exported global variable.
With all the knowledge that we have now gathered, it is easy to write a reproducer that does not involve any Firefox code:
/* main.cc */
#include <iostream>

extern void pain();

int main() {
  pain();
  std::cout << "[main] " << std::locale::classic().name() << "\n";
  return 0;
}

/* pain.cc */
#include <iostream>

void pain() {
  std::cout << "[pain] " << std::locale::classic().name() << "\n";
}

# Makefile
all:
	$(CXX) pain.cc -fPIC -shared -o libpain.so -static-libstdc++ -Wl,-Bsymbolic-functions
	$(CXX) main.cc -fPIC -c -o main.o
	$(CC) main.o -fPIC -o main /usr/lib/gcc/x86_64-redhat-linux/13/libstdc++.a -L. -Wl,-rpath=. -lpain -Wl,-Bsymbolic-functions
	./main

clean:
	$(RM) libpain.so main
Understanding the bug is one step, and solving it is yet another story.
Should it be considered a libstdc++ bug that the code for locales is not compatible with -static-libstdc++ -Bsymbolic-functions?
It feels like combining these flags is a very nice way to dig our own grave, and that seems to be the opinion of the libstdc++ maintainers indeed.
Overall, perhaps the strangest part of this story is that this combination did not cause any trouble up until now.
Therefore, we suggested to the maintainer of the package to stop using -static-libstdc++.
There are other ways to use a different libstdc++ than the one available on the system, such as using dynamic linking and setting an RPATH to link with a bundled version.
Doing that allowed them to successfully deploy a fixed version of the package.
A few days after that, with the official release of Firefox 120, we noticed a very significant bump in volume for the same crash signature. Not again!
This time the volume was coming exclusively from users of NixOS 23.05, and it was huge!
After we shared the conclusions from our beta investigation with them, the maintainers of NixOS were able to quickly associate the crash with an issue that had not yet been backported for 23.05 and was causing the compiler to behave like -static-libstdc++.
To avoid such a mess in the future, we added detection for this particular setup in Firefox's configure.
We are grateful to the people who have helped fix this issue, in particular:
The post Option Soup: the subtle pitfalls of combining compiler flags appeared first on Mozilla Hacks - the Web developer blog.
]]>Puppeteer now supports the next-generation, cross-browser WebDriver BiDi standard. This new protocol makes it easy for web developers to write automated tests that work across multiple browser engines.
The post Puppeteer Support for the Cross-Browser WebDriver BiDi Standard appeared first on Mozilla Hacks - the Web developer blog.
]]>We are pleased to share that Puppeteer now supports the next-generation, cross-browser WebDriver BiDi standard. This new protocol makes it easy for web developers to write automated tests that work across multiple browser engines.
The WebDriver BiDi protocol is supported starting with Puppeteer v21.6.0. When calling puppeteer.launch, pass in "firefox" as the product option, and "webDriverBiDi" as the protocol option:
const browser = await puppeteer.launch({
product: 'firefox',
protocol: 'webDriverBiDi',
})
You can also use the "webDriverBiDi" protocol when testing in Chrome, reflecting the fact that WebDriver BiDi offers a single standard for modern cross-browser automation.
In the future we expect "webDriverBiDi" to become the default protocol when using Firefox in Puppeteer.
Puppeteer has had experimental support for Firefox based on a partial re-implementation of the proprietary Chrome DevTools Protocol (CDP). This approach had the advantage that it worked without significant changes to the existing Puppeteer code. However the CDP implementation in Firefox is incomplete and has significant technical limitations. In addition, the CDP protocol itself is not designed to be cross browser, and undergoes frequent breaking changes, making it unsuitable as a long-term solution for cross-browser automation.
To overcome these problems, we’ve worked with the WebDriver Working Group at the W3C to create a standard automation protocol that meets the needs of modern browser automation clients: this is WebDriver BiDi. For more details on the protocol design and how it compares to the classic HTTP-based WebDriver protocol, see our earlier posts.
As the standardization process has progressed, the Puppeteer team has added a WebDriver BiDi backend in Puppeteer, and provided feedback on the specification to ensure that it meets the needs of Puppeteer users, and that the protocol design enables existing CDP-based tooling to easily transition to WebDriver BiDi. The result is a single protocol based on open standards that can drive both Chrome and Firefox in Puppeteer.
Not yet; WebDriver BiDi is still a work in progress, and doesn’t yet cover the full feature set of Puppeteer.
Compared to the Chrome+CDP implementation, there are some feature gaps, including support for accessing the cookie store, network request interception, some emulation features, and permissions. These features are actively being standardized and will be integrated as soon as they become available. For Firefox, the only missing feature compared to the Firefox+CDP implementation is cookie access. In addition, WebDriver BiDi already offers improvements, including better support for multi-process Firefox, which is essential for testing some websites. More information on the complete set of supported APIs can be found in the Puppeteer documentation, and as new WebDriver-BiDi features are enabled in Gecko we’ll publish details on the Firefox Developer Experience blog.
Nevertheless, we believe that the WebDriver-based Firefox support in Puppeteer has reached a level of quality which makes it suitable for many real automation scenarios. For example at Mozilla we have successfully ported our Puppeteer tests for pdf.js from Firefox+CDP to Firefox+WebDriver BiDi.
We currently don’t have a specific timeline for removing CDP support. However, maintaining multiple protocols is not a good use of our resources, and we expect WebDriver BiDi to be the future of remote automation in Firefox. If you are using the CDP support outside of the context of Puppeteer, we’d love to hear from you (see below), so that we can understand your use cases, and help transition to WebDriver BiDi.
For any issues you experience when porting Puppeteer tests to BiDi, please open issues in the Puppeteer issue tracker, unless you can verify the bug is in the Firefox implementation, in which case please file a bug on Bugzilla.
If you are currently using CDP with Firefox, please join the #webdriver matrix channel so that we can discuss your use case and requirements, and help you solve any problems you encounter porting your code to WebDriver BiDi.
Update: The Puppeteer team have published “Harness the Power of WebDriver BiDi: Chrome and Firefox Automation with Puppeteer”.
The post Puppeteer Support for the Cross-Browser WebDriver BiDi Standard appeared first on Mozilla Hacks - the Web developer blog.
]]>A month ago, we introduced our Nightly package for Debian-based Linux distributions. Today, we are proud to announce we made our .deb package available for Developer Edition and Beta!
The post Firefox Developer Edition and Beta: Try out Mozilla’s .deb package! appeared first on Mozilla Hacks - the Web developer blog.
]]>A month ago, we introduced our Nightly package for Debian-based Linux distributions. Today, we are proud to announce we made our .deb package available for Developer Edition and Beta!
We’ve set up a new APT repository for you to install Firefox as a .deb package. These packages are compatible with the same Debian and Ubuntu versions as our traditional binaries.
Your feedback is invaluable, so don’t hesitate to report any issues you encounter to help us improve the overall experience.
Adopting Mozilla’s Firefox .deb package offers multiple benefits; for one, the .deb is integrated into Firefox’s release process, so updates arrive through APT as soon as we publish them.
To install the .deb package, simply follow these steps:
# Create a directory to store APT repository keys if it doesn't exist:
sudo install -d -m 0755 /etc/apt/keyrings
# Import the Mozilla APT repository signing key:
wget -q https://packages.mozilla.org/apt/repo-signing-key.gpg -O- | sudo tee /etc/apt/keyrings/packages.mozilla.org.asc > /dev/null
# The fingerprint should be 35BAA0B33E9EB396F59CA838C0BA5CE6DC6315A3
gpg -n -q --import --import-options import-show /etc/apt/keyrings/packages.mozilla.org.asc | awk '/pub/{getline; gsub(/^ +| +$/,""); print "\n"$0"\n"}'
# Next, add the Mozilla APT repository to your sources list:
echo "deb [signed-by=/etc/apt/keyrings/packages.mozilla.org.asc] <a class="c-link" href="https://packages.mozilla.org/apt" target="_blank" rel="noopener noreferrer" data-stringify-link="https://packages.mozilla.org/apt" data-sk="tooltip_parent">https://packages.mozilla.org/apt</a> mozilla main" | sudo tee -a /etc/apt/sources.list.d/mozilla.list > /dev/null
# Update your package list and install the Firefox .deb package:
sudo apt-get update && sudo apt-get install firefox-beta # Replace "beta" by "devedition" for Developer Edition
Firefox language packs are also available as .deb packages. To install one, replace fr in the example below with the desired language code:
sudo apt-get install firefox-beta-l10n-fr
To list the available language packs, run the following after sudo apt-get update:
apt-cache search firefox-beta-l10n
The post Firefox Developer Edition and Beta: Try out Mozilla’s .deb package! appeared first on Mozilla Hacks - the Web developer blog.
]]>We're thrilled to announce the first release of llamafile, inviting the open source community to join this groundbreaking project.
With llamafile, you can effortlessly convert large language model (LLM) weights into executables. Imagine transforming a 4GB file of LLM weights into a binary that runs smoothly on six different operating systems, without requiring installation.
The post Introducing llamafile appeared first on Mozilla Hacks - the Web developer blog.
]]>A special thanks to Justine Tunney of the Mozilla Internet Ecosystem (MIECO), who co-authored this blog post.
Today we’re announcing the first release of llamafile and inviting the open source community to participate in this new project.
llamafile lets you turn large language model (LLM) weights into executables.
Say you have a set of LLM weights in the form of a 4GB file (in the commonly-used GGUF format). With llamafile you can transform that 4GB file into a binary that runs on six OSes without needing to be installed.
This makes it dramatically easier to distribute and run LLMs. It also means that as models and their weight formats continue to evolve over time, llamafile gives you a way to ensure that a given set of weights will remain usable and perform consistently and reproducibly, forever.
We achieved all this by combining two projects that we love: llama.cpp (a leading open source LLM chatbot framework) with Cosmopolitan Libc (an open source project that enables C programs to be compiled and run on a large number of platforms and architectures). It also required solving several interesting and juicy problems along the way, such as adding GPU and dlopen() support to Cosmopolitan; you can read more about it in the project’s README.
This first release of llamafile is a product of Mozilla’s innovation group and was developed by Justine Tunney, the creator of Cosmopolitan. Justine has recently been collaborating with Mozilla via MIECO, and through that program Mozilla funded her work on the 3.0 release (Hacker News discussion) of Cosmopolitan. With llamafile, Justine is excited to be contributing more directly to Mozilla projects, and we’re happy to have her involved.
llamafile is licensed Apache 2.0, and we encourage contributions. Our changes to llama.cpp itself are licensed MIT (the same license used by llama.cpp itself) so as to facilitate any potential future upstreaming. We’re all big fans of llama.cpp around here; llamafile wouldn’t have been possible without it and Cosmopolitan.
We hope llamafile is useful to you and look forward to your feedback.
The post Introducing llamafile appeared first on Mozilla Hacks - the Web developer blog.
]]>Mozilla has just launched the AI Guide, a collaborative hub for developers to join forces, inspire each other, and lead the way in groundbreaking generative AI advancements. The AI Guide’s initial focus begins with language models and the aim is to become a collaborative community-driven resource covering other types of models.
The post Mozilla AI Guide Launch with Summarization Code Example appeared first on Mozilla Hacks - the Web developer blog.
]]>The Mozilla AI Guide has launched, and we welcome you to read through it and get acquainted. You can access it here.
Our vision is for the AI Guide to be the starting point for every new developer to the space and a place to revisit for clarity and inspiration, ensuring that AI innovations enrich everyday life. The AI Guide’s initial focus begins with language models and the aim is to become a collaborative community-driven resource covering other types of models.
To start, the first few sections in the Mozilla AI Guide go in-depth on the most frequently asked questions about Large Language Models (LLMs). AI Basics covers the concepts of AI, ML, and LLMs: what they mean and how they are related. This section also breaks down the pros and cons of using an LLM. Language Models 101 builds on that foundation and dives deeper into language models themselves. It answers questions such as “What does ‘training’ an ML model mean?” or “What is a ‘human in the loop’ approach?”
We will jump ahead to the last section, Choosing ML Models, and demonstrate in code below what open source models can do to summarize a piece of text. You can access the Colab Notebook here or continue reading:
Unlike other guides, this one is designed to help you pick the right model for whatever it is you’re trying to do. We’re going to home in on “text summarization” as our first task.
Most available LLMs worth their salt can do many tasks, including summarization, but not all of them are good at the specific thing you want them to do. We should figure out how to evaluate whether they actually are.
Also, many of the current popular LLMs are not open, are trained on undisclosed data and exhibit biases. Responsible AI use requires careful choices, and we’re here to help you make them.
Finally, most large LLMs require powerful GPU compute to use. While there are many models that you can use as a service, most of them cost money per API call. That’s unnecessary when some of the more common tasks can be done at good quality with already available open models and off-the-shelf hardware.
Over the last few decades, engineers have been blessed with being able to onboard by starting with open source projects, and eventually shipping open source to production. This default state is now at risk.
Yes, there are many open models available that do a great job. However, most guides don’t discuss how to get started with them using simple steps and instead bias towards existing closed APIs.
Funding is flowing to commercial AI projects, which have larger budgets than open source contributors to market their work. This inevitably leads to engineers starting with closed source projects and shipping expensive closed projects to production.
We’re going to find a standard dataset and the metrics used to benchmark the summarization task, use those to shortlist candidate models, and then run one of them locally on a sample text.
For simplicity’s sake, let’s grab Mozilla’s Trustworthy AI Guidelines in string form.
Note that in the real world, you will likely have to use other libraries to extract content for any particular file type.
import textwrap
content = """Mozilla's "Trustworthy AI" Thinking Points:
PRIVACY: How is data collected, stored, and shared? Our personal data powers everything from traffic maps to targeted advertising. Trustworthy AI should enable people to decide how their data is used and what decisions are made with it.
FAIRNESS: We’ve seen time and again how bias shows up in computational models, data, and frameworks behind automated decision making. The values and goals of a system should be power aware and seek to minimize harm. Further, AI systems that depend on human workers should protect people from exploitation and overwork.
TRUST: People should have agency and control over their data and algorithmic outputs, especially considering the high stakes for individuals and societies. For instance, when online recommendation systems push people towards extreme, misleading content, potentially misinforming or radicalizing them.
SAFETY: AI systems can carry high risk for exploitation by bad actors. Developers need to implement strong measures to protect our data and personal security. Further, excessive energy consumption and extraction of natural resources for computing and machine learning accelerates the climate crisis.
TRANSPARENCY: Automated decisions can have huge personal impacts, yet the reasons for decisions are often opaque. We need to mandate transparency so that we can fully understand these systems and their potential for harm."""
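A quick aside on the note above about other file types: as a purely illustrative sketch (the library choice and the file name are our own assumptions, not part of the guide), extracting plain text from a PDF with the pypdf library might look something like this:
# %pip install pypdf
from pypdf import PdfReader

# "guidelines.pdf" is a hypothetical local file standing in for whatever you need to summarize
reader = PdfReader("guidelines.pdf")
content = "\n".join(page.extract_text() or "" for page in reader.pages)
print(content[:500])  # sanity-check the first few hundred characters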
Great. Now we’re ready to start summarizing.
The AI space is moving so fast that it requires a tremendous amount of catching up on scientific papers each week to understand the lay of the land and the state of the art.
It takes real effort for an engineer who is brand new to AI to catch up and make sense of it all.
For the working engineer on a deadline, this is problematic. There’s not much centralized discourse on working with open source AI models. Instead there are fragmented X (formerly Twitter) threads, random private groups and lots of word-of-mouth transfer.
However, once we have a workflow to address all of the above, you will have the means to stay on the bleeding edge of published AI research.
For now, we recommend Huggingface and their large directory of open models broken down by task. This is a great starting point. Note that larger LLMs are also included in these lists, so we will have to filter.
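If you prefer to skim that directory from code rather than the website, here is an optional sketch of our own using the huggingface_hub client library. The exact argument and attribute names may vary between library versions, so treat it as a starting point rather than a definitive recipe:
# %pip install huggingface_hub
from huggingface_hub import HfApi

api = HfApi()
# List a handful of summarization models, most-downloaded first
models = api.list_models(filter="summarization", sort="downloads", direction=-1, limit=5)
for m in models:
    print(m.id)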
In this huge list of summarization models, which ones do we choose?
We don’t know what any of these models are trained on. For example, a summarizer trained on news articles vs Reddit posts will perform better on news articles.
What we need is a set of metrics and benchmarks that we can use to do apples-to-apples comparisons of these models.
The steps below can be used to evaluate any available model for any task. It requires hopping between a few sources of data for now, but we will be making this a lot easier moving forward.
The first step is to find a standard, widely used dataset for the task.
The easiest way to do this is using Papers With Code, an excellent resource for finding the latest scientific papers by task that also have code repositories attached.
First, filter Papers With Code’s “Text Summarization” datasets by most cited text-based English datasets.
Let’s pick (as of this writing) the most cited dataset — the “CNN/DailyMail” dataset. Usually most cited is one marker of popularity.
Now, you don’t need to download this dataset. But we’re going to review the info Papers With Code have provided to learn more about it for the next step. This dataset is also available on Huggingface.
You want to check 3 things:
First, check the license. In this case, it’s MIT licensed, which means it can be used for both commercial and personal projects.
Next, see if the papers using this dataset are recent. You can do this by sorting Papers in descending order. This particular dataset has many papers from 2023 – great!
Finally, let’s check whether the data is from a credible source. In this case, the dataset was generated by IBM in partnership with the University of Montréal. Great.
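If you’d like to eyeball a few examples yourself, the dataset can also be loaded with the Hugging Face datasets library. This is an optional detour we’re adding here; the dataset id and config name below are what they were at the time of writing and may have moved since:
# %pip install datasets
from datasets import load_dataset

# "3.0.0" is the non-anonymized configuration of the CNN/DailyMail dataset
ds = load_dataset("cnn_dailymail", "3.0.0", split="validation")
example = ds[0]
print(example["article"][:300])  # the source news article
print(example["highlights"])     # the human-written reference summary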
Now, let’s dig into how we can evaluate models that use this dataset.
Next, we look for measured metrics that are common across datasets for the summarization task. BUT, if you’re not familiar with the literature on summarization, you have no idea what those are.
To find out, pick a “Subtask” that’s close to what you’d like to do. We’d like to summarize the text we pulled in above, so let’s choose “Abstractive Text Summarization”.
Now we’re in business! This page contains a significant amount of new information.
There are mentions of three new terms: ROUGE-1, ROUGE-2 and ROUGE-L. These are the metrics that are used to measure summarization performance.
There is also a list of models and their scores on these three metrics – this is exactly what we’re looking for.
Assuming we’re looking at ROUGE-1 as our metric, we now have the top 3 models that we can evaluate in more detail. All 3 are close to 50, which is a promising ROUGE score (read up on ROUGE).
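To get an intuition for what those numbers mean before trusting a leaderboard, you can compute ROUGE yourself. Here is a minimal sketch of our own using the rouge-score package (one common implementation among several); the reference and candidate strings are made-up examples:
# %pip install rouge-score
from rouge_score import rouge_scorer

reference = "Mozilla lists privacy, fairness, trust, safety and transparency as pillars of trustworthy AI."
candidate = "Trustworthy AI should protect privacy and mandate transparency, Mozilla says."

# ROUGE measures n-gram overlap between a candidate summary and a reference summary
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = scorer.score(reference, candidate)
for name, score in scores.items():
    print(f"{name}: precision={score.precision:.2f} recall={score.recall:.2f} f1={score.fmeasure:.2f}")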
OK, we have a few candidates, so let’s pick a model that will run on our local machines. Many models get their best performance when running on GPUs, but there are many that also generate summaries fast on CPUs. Let’s pick one of those to start – Google’s Pegasus.
# first we install huggingface's transformers library
%pip install transformers sentencepiece
Then we find Pegasus on Huggingface. Note that the datasets Pegasus was trained on include CNN/DailyMail, which bodes well for our summarization task. Interestingly, there’s a variant of Pegasus from Google that’s trained only on our dataset of choice, so we should use that.
from transformers import PegasusForConditionalGeneration, PegasusTokenizer
import torch
# Set the seed, this will help reproduce results. Changing the seed will
# generate new results
from transformers import set_seed
set_seed(248602)
# We're using the version of Pegasus specifically trained for summarization
# using the CNN/DailyMail dataset
model_name = "google/pegasus-cnn_dailymail"
# If you're following along in Colab, switch your runtime to a
# T4 GPU or other CUDA-compliant device for a speedup
device = "cuda" if torch.cuda.is_available() else "cpu"
# Load the tokenizer
tokenizer = PegasusTokenizer.from_pretrained(model_name)
# Load the model
model = PegasusForConditionalGeneration.from_pretrained(model_name).to(device)
# Tokenize the entire content
batch = tokenizer(content, padding="longest", return_tensors="pt").to(device)
# Generate the summary as tokens
summarized = model.generate(**batch)
# Decode the tokens back into text
summarized_decoded = tokenizer.batch_decode(summarized, skip_special_tokens=True)
summarized_text = summarized_decoded[0]
# Compare
def compare(original, summarized_text):
print(f"Article text length: {len(original)}\n")
print(textwrap.fill(summarized_text, 100))
print()
print(f"Summarized length: {len(summarized_text)}")
compare(content, summarized_text)
Article text length: 1427

Trustworthy AI should enable people to decide how their data is used.<n>values and goals of a system should be power aware and seek to minimize harm.<n>People should have agency and control over their data and algorithmic outputs.<n>Developers need to implement strong measures to protect our data and personal security.

Summarized length: 320
Alright, we got something! Kind of short though. Let’s see if we can make the summary longer…
set_seed(860912)
# Generate the summary as tokens, with a max_new_tokens
summarized = model.generate(**batch, max_new_tokens=800)
summarized_decoded = tokenizer.batch_decode(summarized, skip_special_tokens=True)
summarized_text = summarized_decoded[0]
compare(content, summarized_text)
Article text length: 1427

Trustworthy AI should enable people to decide how their data is used.<n>values and goals of a system should be power aware and seek to minimize harm.<n>People should have agency and control over their data and algorithmic outputs.<n>Developers need to implement strong measures to protect our data and personal security.

Summarized length: 320
Well, that didn’t really work. Let’s try a different approach called ‘sampling’. This allows the model to pick the next word according to its conditional probability distribution (specifically, the probability that said word follows the word before).
We’ll also be setting the ‘temperature’. This variable works to control the levels of randomness and creativity in the generated output.
set_seed(118511)
summarized = model.generate(**batch, do_sample=True, temperature=0.8, top_k=0)
summarized_decoded = tokenizer.batch_decode(summarized, skip_special_tokens=True)
summarized_text = summarized_decoded[0]
compare(content, summarized_text)
Article text length: 1427

Mozilla's "Trustworthy AI" Thinking Points:.<n>People should have agency and control over their data and algorithmic outputs.<n>Developers need to implement strong measures to protect our data.

Summarized length: 193
Shorter, but the quality is higher. Adjusting the temperature up will likely help.
set_seed(108814)
summarized = model.generate(**batch, do_sample=True, temperature=1.0, top_k=0)
summarized_decoded = tokenizer.batch_decode(summarized, skip_special_tokens=True)
summarized_text = summarized_decoded[0]
compare(content, summarized_text)
Article text length: 1427

Mozilla's "Trustworthy AI" Thinking Points:.<n>People should have agency and control over their data and algorithmic outputs.<n>Developers need to implement strong measures to protect our data and personal security.<n>We need to mandate transparency so that we can fully understand these systems and their potential for harm.

Summarized length: 325
Now let’s play with one other generation approach called top_k sampling — instead of considering all possible next words in the vocabulary, the model only considers the top ‘k’ most probable next words.
This technique helps to focus the model on likely continuations and reduces the chances of generating irrelevant or nonsensical text.
It strikes a balance between creativity and coherence by limiting the pool of next-word choices, but not so much that the output becomes deterministic.
set_seed(226012)
summarized = model.generate(**batch, do_sample=True, top_k=50)
summarized_decoded = tokenizer.batch_decode(summarized, skip_special_tokens=True)
summarized_text = summarized_decoded[0]
compare(content, summarized_text)
Article text length: 1427

Mozilla's "Trustworthy AI" Thinking Points look at ethical issues surrounding automated decision making.<n>values and goals of a system should be power aware and seek to minimize harm.People should have agency and control over their data and algorithmic outputs.<n>Developers need to implement strong measures to protect our data and personal security.

Summarized length: 355
Finally, let’s try top_p sampling, also known as nucleus sampling. This is a strategy where the model considers only the smallest set of top words whose cumulative probability exceeds a threshold ‘p’.
Unlike top_k which considers a fixed number of words, top_p adapts based on the distribution of probabilities for the next word. This makes it more dynamic and flexible. It helps create diverse and sensible text by allowing less probable words to be selected when the most probable ones don’t add up to ‘p’.
set_seed(21420041)
summarized = model.generate(**batch, do_sample=True, top_p=0.9, top_k=50)
summarized_decoded = tokenizer.batch_decode(summarized, skip_special_tokens=True)
summarized_text = summarized_decoded[0]
compare(content, summarized_text)
# saving this for later.
pegasus_summarized_text = summarized_text
Article text length: 1427

Mozilla's "Trustworthy AI" Thinking Points:.<n>People should have agency and control over their data and algorithmic outputs.<n>Developers need to implement strong measures to protect our data and personal security.<n>We need to mandate transparency so that we can fully understand these systems and their potential for harm.

Summarized length: 325
To continue with the code example, see a test with another model, and learn how to evaluate ML model results (a whole other section of its own), click here to view the Python Notebook and click “Open in Colab” to experiment with your own custom code.
Note this guide will be constantly updated and new sections on Data Retrieval, Image Generation, and Fine Tuning will be coming next.
Shortly after today’s launch of the Mozilla AI Guide, we will be publishing our community contribution guidelines. They will provide guidance on the type of content developers can contribute and how it can be shared. Get ready to share any great open source AI projects, implementations, and video and audio models.
Together, we can bring together a cohesive, collaborative and responsible AI community.
A special thanks to Kevin Li and Pradeep Elankumaran who pulled this great blog post together.
The post Mozilla AI Guide Launch with Summarization Code Example appeared first on Mozilla Hacks - the Web developer blog.
]]>To deliver against our vision and enable a better online experience for everyone, we’ve been working hard on making Firefox even faster. We’re extremely happy to report that this has resulted in a significant improvement in speed over the past year.
The post Down and to the Right: Firefox Got Faster for Real Users in 2023 appeared first on Mozilla Hacks - the Web developer blog.
]]>One of the biggest challenges for any software is to determine how changes impact user experience in the real world. Whether it’s the processing speed of video editing software or the smoothness of a browsing experience, there’s only so much you can tell from testing in a controlled lab environment. While local experiments can provide plenty of metrics, improvements to those metrics may not translate to a better user experience.
This can be especially challenging with complex client software running third-party code like Firefox, and it’s a big reason why we’ve undertaken the Speedometer 3 effort alongside other web browsers. Our goal is to build performance tests that simulate real-world user experiences so that browsers have better tools to drive improvements for real users on real webpages. While it’s easy to see that benchmarks have improved in Firefox throughout the year as a result of this work, what we really care about is how much those wins are being felt by our users.
In order to measure the user experience, Firefox collects a wide range of anonymized timing metrics related to page load, responsiveness, startup and other aspects of browser performance. Collecting data while holding ourselves to the highest standards of privacy can be challenging. For example, because we rely on aggregated metrics, we lack the ability to pinpoint data from any particular website. But perhaps even more challenging is analyzing the data once collected and drawing actionable conclusions. In the future we’ll talk more about these challenges and how we’re addressing them, but in this post we’d like to share how some of the metrics that are fundamental to how our users experience the browser have improved throughout the year.
Let’s start with page load. First Contentful Paint (FCP) is a better metric for felt performance than the `onload` event. We’re tracking the time it takes between receiving the first byte from the network to FCP. This tells us how much faster we are giving feedback to the user that the page is successfully loading, so it’s a critical metric for understanding the user experience. While much of this is up to web pages themselves, if the browser improves performance across the board, we expect this number to go down.
We can see that this time improved from roughly 250ms at the start of the year to 215ms in October. This means that a user receives feedback on page loads almost 15% faster than they did at the start of the year. And it’s important to note that this is all the result of optimization work that didn’t even explicitly target pageload.
In order to understand where this improvement is coming from, let’s look at another piece of timing data: the amount of time that was spent executing JavaScript code during a pageload. Here we are going to look at the 95th percentile, representing the most JS heavy pages and highlighting a big opportunity for us to remove friction for users.
This shows the 95th percentile improving from ~1560ms at the beginning of the year, to ~1260ms in October. This represents a considerable improvement of 300ms, or almost 20%, and is likely responsible for a significant portion of the reduced FCP times. This makes sense, since Speedometer 3 work has led to significant optimizations to the SpiderMonkey JavaScript engine (a story for another post).
We’d also like to know how responsive pages are after they are loaded. For example, how smooth is the response when typing on the keyboard as I write this blogpost! The primary metric we collect here is the “keypress present latency”; the time between a key being pressed on the keyboard and its result being presented onto the screen. Rendering some text to the screen may sound simple, but there’s a lot going on to make that happen – especially when web pages run main thread JavaScript to respond to the keypress event. Most typing is snappy and primarily limited by hardware (e.g. the refresh rate of the monitor), but it’s extremely disruptive when it’s not. This means it’s important to mitigate the worst cases, so we’ll again look at the 95th percentile.
Once again we see a measurable improvement. The 95th percentile hovered around 65ms for most of the year and dropped to under 59ms after the Firefox 116 and 117 releases in August. A 10% improvement to the slowest keypresses means users are experiencing more instantaneous feedback and fewer disruptions while typing.
We’ve been motivated by the improvements we’re seeing in our telemetry data, and we’re convinced that our efforts this year are having a positive effect on Firefox users. We have many more optimizations in the pipeline and will share more details about those and our overall progress in future posts.
The post Down and to the Right: Firefox Got Faster for Real Users in 2023 appeared first on Mozilla Hacks - the Web developer blog.
]]>Protecting user privacy is a core element of Mozilla’s vision for the web and the internet at large. In pursuit of this vision, we’re pleased to announce new partnerships with Fastly and Divvi Up to deploy privacy-preserving technology in Firefox.
The post Built for Privacy: Partnering to Deploy Oblivious HTTP and Prio in Firefox appeared first on Mozilla Hacks - the Web developer blog.
]]>Protecting user privacy is a core element of Mozilla’s vision for the web and the internet at large. In pursuit of this vision, we’re pleased to announce new partnerships with Fastly and Divvi Up to deploy privacy-preserving technology in Firefox.
Mozilla builds a number of tools that help people defend their privacy online, but the need for these tools reflects a world where companies view invasive data collection as necessary for building good products and making money. A zero-sum game between privacy and business interests is not a healthy state of affairs. Therefore, we dedicate considerable effort to developing and advancing new technologies that enable businesses to achieve their goals without compromising peoples’ privacy. This is a focus of our work on web standards, as well as in how we build Firefox itself.
Building an excellent browser while maintaining a high standard for privacy sometimes requires this kind of new technology. For example: we put a lot of effort into keeping Firefox fast. This involves extensive automated testing, but also monitoring how it’s performing for real users. Firefox currently reports generic performance metrics like page-load time but does not associate those metrics with specific sites, because doing so would reveal peoples’ browsing history. These internet-wide averages are somewhat informative but not particularly actionable. Sites are constantly deploying code changes and occasionally those changes can trigger performance bugs in browsers. If we knew that a specific site got much slower overnight, we could likely isolate the cause and fix it. Unfortunately, we lack that visibility today, which hinders our ability to make Firefox great.
This is a classic problem in data collection: We want aggregate data, but the naive way to get it involves collecting sensitive information about individual people. The solution is to develop technology that delivers the same insights while keeping information about any individual person verifiably private.
In recent years, Mozilla has worked with others to advance two such technologies — Oblivious HTTP and the Prio-based Distributed Aggregation Protocol (DAP) — towards being proper internet standards that are practical to deploy in production. Oblivious HTTP works by routing encrypted data through an intermediary to conceal its source, whereas DAP/Prio splits the data into two shares and sends each share to a different server [1]. Despite their different shapes, both technologies rely on a similar principle: By processing the data jointly across two independent parties, they ensure neither party holds the information required to reveal sensitive information about someone.
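To make the split-into-shares idea concrete, here is a toy Python sketch of additive secret sharing, the core trick behind Prio and DAP. This is purely illustrative and is not Mozilla's implementation or the actual protocol, which additionally uses zero-knowledge proofs so the servers can verify that each submission is well formed:
import secrets

P = 2**61 - 1  # a toy prime modulus; the real protocol fixes its own field

def split_into_shares(value):
    """Split one measurement into two additive shares mod P."""
    share_a = secrets.randbelow(P)
    share_b = (value - share_a) % P
    return share_a, share_b  # each share on its own looks like random noise

# Each client splits its (sensitive) measurement and sends one share to each server.
measurements = [321, 87, 455, 129]  # hypothetical per-user values
shares = [split_into_shares(m) for m in measurements]

# Server A and server B each only ever see their own column of shares.
sum_a = sum(s[0] for s in shares) % P
sum_b = sum(s[1] for s in shares) % P

# Only the combined aggregates reveal the total, never any individual value.
assert (sum_a + sum_b) % P == sum(measurements) % P
print("aggregate:", (sum_a + sum_b) % P)
Neither server can recover any single user's measurement from its column alone, yet adding the two aggregate sums yields the overall total, which mirrors the "neither party holds the information" property described above.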
We therefore need to partner with another independent and trustworthy organization to deploy each technology in Firefox. Having worked for some time to develop and validate both technologies in staging environments, we’ve now taken the next step to engage Fastly to operate an OHTTP relay and Divvi Up to operate a DAP aggregator. Both Fastly and ISRG (the nonprofit behind Divvi Up and Let’s Encrypt) have excellent reputations for acting with integrity, and they have staked those reputations on the faithful operation of these services. So even in a mirror universe where we tried to persuade them to cheat, they have a strong incentive to hold the line.
Our objective at Mozilla is to develop viable alternatives to the things that are wrong with the internet today and move the entire industry by demonstrating that it’s possible to do better. In the short term, these technologies will help us keep Firefox competitive while adhering to our longstanding principles around sensitive data. Over the long term, we want to see these kinds of strong privacy guarantees become the norm, and we will continue to work towards such a future.
Footnotes:
[1] Each approach is best-suited to different scenarios, which is why we’re investing in both. Oblivious HTTP is more flexible and can be used in interactive contexts, whereas DAP/Prio can be used in situations where the payload itself might be identifying.
The post Built for Privacy: Partnering to Deploy Oblivious HTTP and Prio in Firefox appeared first on Mozilla Hacks - the Web developer blog.
]]>