There are various threats a user faces when browsing the web. Users may be tricked into sharing sensitive information like their passwords with a misleading or fake website, also called phishing. They may also be led into installing malicious software on their machines, called malware, which can collect personal data and also hold it for ransom. Google Chrome, henceforth called Chrome, enables its users to protect themselves from such threats on the internet. When Chrome users browse the web with Safe Browsing protections, Chrome uses the Safe Browsing service from Google to identify and ward off various threats.

Safe Browsing works in different ways depending on the user's preferences. In the most common case, Chrome uses the privacy-conscious Update API (Application Programming Interface) from the Safe Browsing service. This API was developed with user privacy in mind and ensures Google gets as little information about the user's browsing history as possible. If the user has opted-in to "Enhanced Protection" (covered in an earlier post) or "Make Searches and Browsing Better", Chrome shares limited additional data with Safe Browsing only to further improve user protection.

This post describes how Chrome implements the Update API, with appropriate pointers to the technical implementation and details about the privacy-conscious aspects of the Update API. This should be useful for users to understand how Safe Browsing protects them, and for interested developers to browse through and understand the implementation. We will cover the APIs used for Enhanced Protection users in a future post.

Threats on the Internet

When a user navigates to a webpage on the internet, their browser fetches objects hosted on the internet. These objects include the structure of the webpage (HTML), the styling (CSS), dynamic behavior in the browser (Javascript), images, downloads initiated by the navigation, and other webpages embedded in the main webpage. These objects, also called resources, have a web address which is called their URL (Uniform Resource Locator). Further, URLs may redirect to other URLs when being loaded. Each of these URLs can potentially host threats such as phishing websites, malware, unwanted downloads, malicious software, unfair billing practices, and more. Chrome with Safe Browsing checks all URLs, redirects or included resources, to identify such threats and protect users.

Safe Browsing Lists

Safe Browsing provides a list for each threat it protects users against on the internet. A full catalog of lists that are used in Chrome can be found by visiting chrome://safe-browsing/#tab-db-manager on desktop platforms.

A list does not contain unsafe web addresses, also referred to as URLs, in entirety; it would be prohibitively expensive to keep all of them in a device’s limited memory. Instead it maps a URL, which can be very long, through a cryptographic hash function (SHA-256), to a unique fixed size string. This distinct fixed size string, called a hash, allows a list to be stored efficiently in limited memory. The Update API handles URLs only in the form of hashes and is also called hash-based API in this post.

Further, a list does not store hashes in entirety either, as even that would be too memory intensive. Instead, barring a case where data is not shared with Google and the list is small, it contains prefixes of the hashes. We refer to the original hash as a full hash, and a hash prefix as a partial hash.

A list is updated following the Update API’s
request frequency section. Chrome also follows a back-off mode in case of an unsuccessful response. These updates happen roughly every 30 minutes, following the minimum wait duration set by the server in the list update response.

For those interested in browsing relevant source code, here’s where to look:

Source Code
  1. GetListInfos() contains all the lists, along with their associated threat types, the platforms they are used on, and their file names on disk.
  2. HashPrefixMap shows how the lists are stored and maintained. They are grouped by the size of prefixes, and appended together to allow quick binary search based lookups.

    How is hash-based URL lookup done

    As an example of a Safe Browsing list, let's say that we have one for malware, containing partial hashes of URLs known to host malware. These partial hashes are generally 4 bytes long, but for illustrative purposes, we show only 2 bytes.

    ['036b', '1a02', 'bac8', 'bb90']
    
    
    
    

    Whenever Chrome needs to check the reputation of a resource with the Update API, for example when navigating to a URL, it does not share the raw URL (or any piece of it) with Safe Browsing to perform the lookup. Instead, Chrome uses full hashes of the URL (and some combinations) to look up the partial hashes in the locally maintained Safe Browsing list. Chrome sends only these matched partial hashes to the Safe Browsing service. This ensures that Chrome provides these protections while respecting the user’s privacy. This hash-based lookup happens in three steps in Chrome:

    Step 1: Generate URL Combinations and Full Hashes

    When Google blocks URLs that host potentially unsafe resources by placing them on a Safe Browsing list, the malicious actor can host the resource on a different URL. A malicious actor can cycle through various subdomains to generate new URLs. Safe Browsing uses host suffixes to identify malicious domains that host malware in their subdomains. Similarly, malicious actors can also cycle through various subpaths to generate new URLs. So Safe Browsing also uses path prefixes to identify websites that host malware at various subpaths. This prevents malicious actors from cycling through subdomains or paths for new malicious URLs, allowing robust and efficient identification of threats.

    To incorporate these host suffixes and path prefixes, Chrome first computes the full hashes of the URL and some patterns derived from the URL. Following Safe Browsing API's URLs and Hashing specification, Chrome computes the full hashes of URL combinations by following these steps:

    1. First, Chrome converts the URL into a canonical format, as defined in the specification.
    2. Then, Chrome generates up to 5 host suffixes/variants for the URL.
    3. Then, Chrome generates up to 6 path prefixes/variants for the URL.
    4. Then, for the combined 30 host suffixes and path prefixes combinations, Chrome generates the full hash for each combination.

      Source Code
      1. V4LocalDatabaseManager::CheckBrowseURL is an example which performs a hash-based lookup.
      2. V4ProtocolManagerUtil::UrlToFullHashes creates the various URL combinations for a URL, and computes their full hashes.

        Example

        For instance, let's say that a user is trying to visit https://evil.example.com/blah#frag. The canonical url is https://evil.example.com/blah. The host suffixes to be tried are evil.example.com, and example.com. The path prefixes are / and /blah. The four combined URL combinations are evil.example.com/, evil.example.com/blah, example.com/, and example.com/blah.

        url_combinations = ["evil.example.com/", "evil.example.com/blah","example.com/", "example.com/blah"]
        full_hashes = ['1a02…28', 'bb90…9f', '7a9e…67', 'bac8…fa']
        
        
        
        

        Step 2: Search Partial Hashes in Local Lists

        Chrome then checks the full hashes of the URL combinations against the locally maintained Safe Browsing lists. These lists, which contain partial hashes, do not provide a decisive malicious verdict, but can quickly identify if the URL is considered not malicious. If the full hash of the URL does not match any of the partial hashes from the local lists, the URL is considered safe and Chrome proceeds to load it. This happens for more than 99% of the URLs checked.

        Source Code
        1. V4LocalDatabaseManager::GetPrefixMatches gets the matching partial hashes for the full hashes of the URL and its combinations.

          Example

          Chrome finds that three full hashes 1a02…28, bb90…9f, and bac8…fa match local partial hashes. We note that this is for demonstration purposes, and a match here is rare.

          Source Code
          1. V4GetHashProtocolManager::GetFullHashes performs the lookup for the full hashes for the matched partial hashes.

            Example

            Chrome sends the matched partial hashes 1a02, bb90, and bac8 to fetch the full hashes. The server returns full hashes that match these partial hashes, 1a02…28, bb90…ce, and bac8…01. Chrome finds that one of the full hashes matches with the full hash of the URL combination being checked, and identifies the malicious URL as hosting malware.

            Conclusion

            Safe Browsing protects Chrome users from various malicious threats on the internet. While providing these protections, Chrome faces challenges such as constraints in memory capacity, network bandwidth usage, and a dynamic threat landscape. Chrome is also mindful of the users’ privacy choices, and shares little data with Google.

            In a follow up post, we will cover the more advanced protections Chrome provides to its users who have opted in to “Enhanced Protection”.

            There are various threats a user faces when browsing the web. Users may be tricked into sharing sensitive information like their passwords with a misleading or fake website, also called phishing. They may also be led into installing malicious software on their machines, called malware, which can collect personal data and also hold it for ransom. Google Chrome, henceforth called Chrome, enables its users to protect themselves from such threats on the internet. When Chrome users browse the web with Safe Browsing protections, Chrome uses the Safe Browsing service from Google to identify and ward off various threats.

            Safe Browsing works in different ways depending on the user's preferences. In the most common case, Chrome uses the privacy-conscious Update API (Application Programming Interface) from the Safe Browsing service. This API was developed with user privacy in mind and ensures Google gets as little information about the user's browsing history as possible. If the user has opted-in to "Enhanced Protection" (covered in an earlier post) or "Make Searches and Browsing Better", Chrome shares limited additional data with Safe Browsing only to further improve user protection.

            This post describes how Chrome implements the Update API, with appropriate pointers to the technical implementation and details about the privacy-conscious aspects of the Update API. This should be useful for users to understand how Safe Browsing protects them, and for interested developers to browse through and understand the implementation. We will cover the APIs used for Enhanced Protection users in a future post.

            Threats on the Internet

            When a user navigates to a webpage on the internet, their browser fetches objects hosted on the internet. These objects include the structure of the webpage (HTML), the styling (CSS), dynamic behavior in the browser (Javascript), images, downloads initiated by the navigation, and other webpages embedded in the main webpage. These objects, also called resources, have a web address which is called their URL (Uniform Resource Locator). Further, URLs may redirect to other URLs when being loaded. Each of these URLs can potentially host threats such as phishing websites, malware, unwanted downloads, malicious software, unfair billing practices, and more. Chrome with Safe Browsing checks all URLs, redirects or included resources, to identify such threats and protect users.

            Safe Browsing Lists

            Safe Browsing provides a list for each threat it protects users against on the internet. A full catalog of lists that are used in Chrome can be found by visiting chrome://safe-browsing/#tab-db-manager on desktop platforms.

            A list does not contain unsafe web addresses, also referred to as URLs, in entirety; it would be prohibitively expensive to keep all of them in a device’s limited memory. Instead it maps a URL, which can be very long, through a cryptographic hash function (SHA-256), to a unique fixed size string. This distinct fixed size string, called a hash, allows a list to be stored efficiently in limited memory. The Update API handles URLs only in the form of hashes and is also called hash-based API in this post.

            Further, a list does not store hashes in entirety either, as even that would be too memory intensive. Instead, barring a case where data is not shared with Google and the list is small, it contains prefixes of the hashes. We refer to the original hash as a full hash, and a hash prefix as a partial hash.

            A list is updated following the Update API’s request frequency section. Chrome also follows a back-off mode in case of an unsuccessful response. These updates happen roughly every 30 minutes, following the minimum wait duration set by the server in the list update response.

            For those interested in browsing relevant source code, here’s where to look:

            Source Code

            1. GetListInfos() contains all the lists, along with their associated threat types, the platforms they are used on, and their file names on disk.
            2. HashPrefixMap shows how the lists are stored and maintained. They are grouped by the size of prefixes, and appended together to allow quick binary search based lookups.

            How is hash-based URL lookup done

            As an example of a Safe Browsing list, let's say that we have one for malware, containing partial hashes of URLs known to host malware. These partial hashes are generally 4 bytes long, but for illustrative purposes, we show only 2 bytes.

            ['036b', '1a02', 'bac8', 'bb90']
            

            Whenever Chrome needs to check the reputation of a resource with the Update API, for example when navigating to a URL, it does not share the raw URL (or any piece of it) with Safe Browsing to perform the lookup. Instead, Chrome uses full hashes of the URL (and some combinations) to look up the partial hashes in the locally maintained Safe Browsing list. Chrome sends only these matched partial hashes to the Safe Browsing service. This ensures that Chrome provides these protections while respecting the user’s privacy. This hash-based lookup happens in three steps in Chrome:

            Step 1: Generate URL Combinations and Full Hashes

            When Google blocks URLs that host potentially unsafe resources by placing them on a Safe Browsing list, the malicious actor can host the resource on a different URL. A malicious actor can cycle through various subdomains to generate new URLs. Safe Browsing uses host suffixes to identify malicious domains that host malware in their subdomains. Similarly, malicious actors can also cycle through various subpaths to generate new URLs. So Safe Browsing also uses path prefixes to identify websites that host malware at various subpaths. This prevents malicious actors from cycling through subdomains or paths for new malicious URLs, allowing robust and efficient identification of threats.

            To incorporate these host suffixes and path prefixes, Chrome first computes the full hashes of the URL and some patterns derived from the URL. Following Safe Browsing API's URLs and Hashing specification, Chrome computes the full hashes of URL combinations by following these steps:

            1. First, Chrome converts the URL into a canonical format, as defined in the specification.
            2. Then, Chrome generates up to 5 host suffixes/variants for the URL.
            3. Then, Chrome generates up to 6 path prefixes/variants for the URL.
            4. Then, for the combined 30 host suffixes and path prefixes combinations, Chrome generates the full hash for each combination.

            Source Code

            1. V4LocalDatabaseManager::CheckBrowseURL is an example which performs a hash-based lookup.
            2. V4ProtocolManagerUtil::UrlToFullHashes creates the various URL combinations for a URL, and computes their full hashes.

            Example

            For instance, let's say that a user is trying to visit https://evil.example.com/blah#frag. The canonical url is https://evil.example.com/blah. The host suffixes to be tried are evil.example.com, and example.com. The path prefixes are / and /blah. The four combined URL combinations are evil.example.com/, evil.example.com/blah, example.com/, and example.com/blah.

            url_combinations = ["evil.example.com/", "evil.example.com/blah","example.com/", "example.com/blah"]
            full_hashes = ['1a02…28', 'bb90…9f', '7a9e…67', 'bac8…fa']
            

            Step 2: Search Partial Hashes in Local Lists

            Chrome then checks the full hashes of the URL combinations against the locally maintained Safe Browsing lists. These lists, which contain partial hashes, do not provide a decisive malicious verdict, but can quickly identify if the URL is considered not malicious. If the full hash of the URL does not match any of the partial hashes from the local lists, the URL is considered safe and Chrome proceeds to load it. This happens for more than 99% of the URLs checked.

            Source Code

            1. V4LocalDatabaseManager::GetPrefixMatches gets the matching partial hashes for the full hashes of the URL and its combinations.

            Example

            Chrome finds that three full hashes 1a02…28, bb90…9f, and bac8…fa match local partial hashes. We note that this is for demonstration purposes, and a match here is rare.

            Step 3: Fetch Matching Full Hashes

            Next, Chrome sends only the matching partial hash (not the full URL or any particular part of the URL, or even their full hashes), to the Safe Browsing service's fullHashes.find method. In response, it receives the full hashes of all malicious URLs for which the full hash begins with one of the partial hashes sent by Chrome. Chrome checks the fetched full hashes with the generated full hashes of the URL combinations. If any match is found, it identifies the URL with various threats and their severities inferred from the matched full hashes.

            Source Code

            1. V4GetHashProtocolManager::GetFullHashes performs the lookup for the full hashes for the matched partial hashes.

            Example

            Chrome sends the matched partial hashes 1a02, bb90, and bac8 to fetch the full hashes. The server returns full hashes that match these partial hashes, 1a02…28, bb90…ce, and bac8…01. Chrome finds that one of the full hashes matches with the full hash of the URL combination being checked, and identifies the malicious URL as hosting malware.

            Conclusion

            Safe Browsing protects Chrome users from various malicious threats on the internet. While providing these protections, Chrome faces challenges such as constraints in memory capacity, network bandwidth usage, and a dynamic threat landscape. Chrome is also mindful of the users’ privacy choices, and shares little data with Google.

            In a follow up post, we will cover the more advanced protections Chrome provides to its users who have opted in to “Enhanced Protection”.

          Step 3: Fetch Matching Full Hashes

          Next, Chrome sends only the matching partial hash (not the full URL or any particular part of the URL, or even their full hashes), to the Safe Browsing service's fullHashes.find method. In response, it receives the full hashes of all malicious URLs for which the full hash begins with one of the partial hashes sent by Chrome. Chrome checks the fetched full hashes with the generated full hashes of the URL combinations. If any match is found, it identifies the URL with various threats and their severities inferred from the matched full hashes.

In an effort to showcase the breadth and depth of Black+ contributions to security and privacy fields, we’ve launched a profile series that aims to elevate and celebrate the Black+ voices in security and privacy we have here at Google.



Brooke Pearson manages the Privacy Sandbox program at Google, and her team's mission is to, “Create a thriving web ecosystem that is respectful of users and private by default.” Brooke lives this mission and it is what makes her an invaluable asset to the Chrome team and Google. 

In addition to her work advancing the fields of security and privacy, she is a fierce advocate for women in the workplace and for elevating the voices of her fellow Black+ practitioners in security and privacy. She has participated and supported the #ShareTheMicInCyber campaign since its inception.

Brooke is passionate about delivering privacy solutions that work and making browsing the web an inherently more private experience for users around the world.Why do you work in security or privacy?

I work in security and privacy to protect people and their personal information. It’s that simple. Security and privacy are two issues that are core to shaping the future of technology and how we interact with each other over the Internet. The challenges are immense, and yet the ability to impact positive change is what drew me to the field.

Tell us a little bit about your career journey to Google

My career journey into privacy does not involve traditional educational training in the field. In fact, my background is in public policy and communications, but when I transitioned to the technology industry, I realized that the most pressing policy issues for companies like Google surround the nascent field of privacy and the growing field of security.

After I graduated from college at Azusa Pacific University, I was the recipient of a Fulbright scholarship to Macau, where I spent one year studying Chinese and teaching English. I then moved to Washington D.C. where I initially worked for the State Department while finishing my graduate degree in International Public Policy at George Washington University. I had an amazing experience in that role and it afforded me some incredible networking opportunities and the chance to travel the world, as I worked in Afghanistan and Central Asia.

After about five years in the public sector, I joined Facebook as a Program Manager for the Global Public Policy team, initially focused on social good programs like Safety Check and Charitable Giving. Over time, I could see that the security team at Facebook was focused on fighting the proliferation of misinformation, and this called to me as an area where I could put my expertise in communication and geopolitical policy to work. So I switched teams and I've been in the security and privacy field ever since, eventually for Uber and now with Google's Chrome team.

At Google, privacy and security are at the heart of everything we do. Chrome is tackling some of the world's biggest security and privacy problems, and everyday my work impacts billions of people around the world. Most days, that's pretty daunting, but every day it's humbling and inspiring.

What is your security or privacy "soapbox"?

If we want to encourage people to engage in more secure behavior, we have to make it easy to understand and easy to act on. Every day we strive to make our users safer with Google by implementing security and privacy controls that are effective and easy for our users to use and understand.

As a program manager, I’ve learned that it is almost always more effective to offer a carrot than a stick, when it comes to security and privacy hygiene. I encourage all of our users to visit our Safety Center to learn all the ways Google helps you stay safe online, every day.

If you are interested in following Brooke’s work here at Google and beyond, please follow her on Twitter @brookelenet. We will be bringing you more profiles over the coming weeks and we hope you will engage with and share these with your network.

If you are interested in participating or learning more about #ShareTheMicInCyber, click here.


Why do you work in security or privacy?

I have been in the cyber world long enough to know how important it is for security and privacy to be top of mind and focus for organizations of all shapes and sizes. My passion lies in keeping users and Googlers safe. One of the main reasons I joined Google is its commitment to security and privacy.


Tell us a little bit about your career journey to Google...

I was fortunate to begin my cybersecurity career in the United States Government working at the Department of Energy, FBI, and the Intelligence Community. I transitioned to the private sector in 2017 and have been fortunate to lead talented security teams at Cardinal Health and Ford Motor Company.

My journey into cybersecurity was not traditional. I studied Political Science at Washington University in St. Louis, completed graduate education at George Mason University and Carnegie Mellon University. I honed my skills and expertise in this space through hands on experience and with the support of many amazing mentors. It has been the ride of a lifetime and I look forward to what is next.

To those thinking about making a career change or are just starting to get into security, my advice is don’t be afraid to ask for help.


What is your security or privacy "soapbox"?

At Google, we implement a model known as Federated Security, where our security teams partner across our Product Areas to enable security program maturity Google wide. Our Federated Security team believes in harnessing the power of relationship, engagement, and community to drive maturity into every product. Security and privacy are team sports – it takes business leaders and security leaders working together to secure and protect our digital and physical worlds.

If you are interested in following Rob’s work here at Google and beyond, please follow him on Twitter @RobDuhart. We will be bringing you more profiles over the coming weeks and we hope you will engage with and share these with your network.

If you are interested in participating or learning more about #ShareTheMicInCyber, click here.

If you are interested in participating or learning more about #ShareTheMicInCyber, click here.

Additionally, federated learning has enabled Gboard to train intelligent input models across many devices while keeping everything individual users type on their device. Today, the emoji is as common as punctuation - and have become the way for our users to express themselves in messaging. Our users want a way to have fresh and diversified emojis to better express their thoughts in messaging apps. Recently, we launched new on-device transformer models that are fine-tuned with federated learning in Gboard, to produce more contextual emoji predictions for English, Spanish and Portuguese.

Furthermore, following the success of privacy-preserving machine learning techniques, Gboard continues to leverage federated analytics to understand how Gboard is used from decentralized data. What we've learned from privacy-preserving analysis has let us make better decisions in our product.

When a user shares an emoji in a conversation, their phone keeps an ongoing count of which emojis are used. Later, when the phone is idle, plugged in, and connected to WiFi, Google’s federated analytics server invites the device to join a “round” of federated analytics data computation with hundreds of other participating phones. Every device involved in one round will compute the emoji share frequency, encrypt the result and send it a federated analytics server. Although the server can’t decrypt the data individually, the final tally of total emoji counts can be decrypted when combining encrypted data across devices. The aggregated data shows that the most popular emoji is 😂 in Whatsapp, 😭 in Roblox(gaming), and ✔ in Google Docs. Emoji 😷 moved up from 119th to 42nd in terms of frequency during COVID-19.

Gboard always has a strong commitment to Google’s Privacy Principles. Gboard strives to build privacy-preserving effortless input products for users to freely express their thoughts in 900+ languages while safeguarding user data. We will keep pushing the state of the art in smart input technologies on Android while safeguarding user data. Stay tuned!