
3 ways we are building equity into our health work

Illustration showing a woman wearing glasses at a computer, a doctor with a young patient and two people talking while one holds a tablet.

The goal of health equity is to ensure everyone has a fair and just opportunity to attain their highest level of health. The reality is the opposite for too many people — including people of color, women, those in rural communities and other historically marginalized populations. As Google’s Chief Health Equity Officer, I lead a team committed to making sure we build AI-powered health tools responsibly and equitably.

At our annual health event, The Check Up, we unveiled three ways we are helping deliver a more equitable future.

Our recent research on identifying and mitigating biases

As medical AI rapidly evolves, it’s critical we develop tools and resources that can be used to identify and mitigate biases that could negatively impact health outcomes. Our new research paper, “A Toolbox for Surfacing Health Equity Harms and Biases in Large Language Models,” is a step in this direction. The paper offers a framework for assessing whether medical large language models (LLMs) perpetuate historical biases, along with a collection of seven adversarial testing datasets, called “EquityMedQA,” to serve as a guidepost.

These tools are based on literature regarding health inequities, actual model failures and participatory input from equity experts. We used these tools to evaluate our own large language models, and now they’re available to the research community and beyond.
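
To make the adversarial-testing idea concrete, here is a minimal sketch of how a team might run a model over an adversarial question set and tally how often a rater flags an answer for an equity-related harm. The file name, the “question” column and the two stub functions are hypothetical placeholders for illustration, not the paper’s actual tooling or the EquityMedQA schema.

```python
# Minimal sketch: score a model's answers to an adversarial question set.
# "adversarial_questions.csv", its "question" column and the two stub
# functions are hypothetical placeholders, not the actual EquityMedQA
# tooling or schema.
import csv

def generate_answer(question: str) -> str:
    # Stand-in for a call to the medical LLM under evaluation.
    return "Model answer to: " + question

def flags_equity_harm(question: str, answer: str) -> bool:
    # Stand-in for a human or model-based rater that flags answers with
    # equity-related harms (e.g., unsupported claims tied to race or sex).
    return "all patients of this group" in answer.lower()

def harm_rate(dataset_path: str) -> float:
    """Fraction of adversarial questions whose answers were flagged."""
    flagged = total = 0
    with open(dataset_path, newline="") as f:
        for row in csv.DictReader(f):
            answer = generate_answer(row["question"])
            flagged += flags_equity_harm(row["question"], answer)
            total += 1
    return flagged / total if total else 0.0

# Example usage: print(f"{harm_rate('adversarial_questions.csv'):.1%}")
```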

A new framework to measure health equity within AI models

A group of health equity researchers, social scientists, clinicians, bioethicists, statisticians, and AI researchers came together across Google to develop a framework for building AI that avoids creating and reinforcing unfair bias.

This framework, which is called HEAL (Health Equity Assessment of Machine Learning performance), is designed to assess whether an AI technology is likely to perform equitably and to help prevent the deployment of AI models that could make disparities worse, especially for groups that already experience poorer health outcomes on average. The four-step process, illustrated in the sketch after this list, includes:

  1. Determining factors associated with health inequities and defining AI performance metrics.
  2. Identifying and quantifying pre-existing health outcome disparities.
  3. Measuring the performance of the AI tool for each subpopulation.
  4. Assessing the likelihood that the AI tool prioritizes performance for the subpopulations facing the greatest pre-existing health disparities.
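
To make these steps concrete, here is a minimal numerical sketch of the kind of check the framework performs: comparing per-subgroup model performance (step 3) against pre-existing health burden (step 2) and asking whether the tool performs best where the need is greatest (step 4). The subgroup names, the numbers and the rank-correlation choice are illustrative assumptions, not the published HEAL metric.

```python
# Minimal numerical sketch of a HEAL-style check: does the model perform
# best for the subgroups that carry the largest pre-existing health burden?
# The subgroup names, numbers and the rank-correlation choice are
# illustrative assumptions, not the published HEAL metric.

def ranks(values):
    """Rank values from 1 (smallest) upward; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation between two equal-length sequences."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Step 2 (hypothetical): pre-existing health burden per age subgroup,
# where higher means worse outcomes today.
health_burden = {"18-39": 0.10, "40-69": 0.25, "70+": 0.45}
# Step 3 (hypothetical): the AI tool's accuracy measured per subgroup.
model_accuracy = {"18-39": 0.92, "40-69": 0.90, "70+": 0.81}

groups = list(health_burden)
rho = spearman([health_burden[g] for g in groups],
               [model_accuracy[g] for g in groups])

# Step 4: a positive correlation suggests the tool performs best for the
# groups with the greatest need; a negative one (as here) flags a risk of
# widening the disparity.
print(f"correlation between health burden and accuracy: {rho:+.2f}")
```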

Already, we’ve used this framework to test a dermatology AI model. The results showed that while the model performed equitably across race, ethnicity and sex subgroups, there was room to improve its performance for older age groups. The framework found that for cancerous conditions, like melanoma, the model performed equitably across age groups, but for non-cancer conditions, like eczema, it did not perform as well for the 70-and-older age group.

We’ll continue to apply the framework to healthcare AI models in the future, and we’ll evolve and refine the framework in the process.

A more representative dataset to advance dermatology

Today, many dermatology datasets are not representative of the population, which limits developers’ ability to build equitable AI models. Current dataset images are often captured in a clinical setting and may not reflect different parts of the body, varying levels of severity of a condition or diverse skin tones, ages, genders and more. And they’re primarily focused on severe issues — like skin cancer — rather than more common issues like allergic, inflammatory or infectious conditions.

To create a more representative dataset of images, we partnered with Stanford Medicine on the Skin Condition Image Network (SCIN). Thousands of people contributed more than 10,000 real-world dermatology images to create this open-access dataset. Dermatologists and research teams then helped identify a diagnosis for each image and labeled the images using two skin-tone scales, so the dataset includes an expansive collection of conditions and skin types.

Scientists and doctors can now use the SCIN dataset to help them develop tools to identify dermatological concerns, conduct dermatology-related research, and expose health professional students to more examples of skin conditions and their manifestations across different skin types.
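
As a minimal sketch of how a researcher might start working with such a dataset, the snippet below tallies an assumed labels table by condition and skin-tone label to spot underrepresented groups before training or evaluating a model. The file name, the column names and the count threshold are hypothetical placeholders, not the actual SCIN schema.

```python
# Hedged sketch: tally a labels table by condition and skin-tone label to
# spot underrepresented groups before training or evaluating a model.
# "scin_labels.csv", its column names and the threshold of 20 are
# hypothetical placeholders, not the actual SCIN schema.
import csv
from collections import Counter

counts = Counter()
with open("scin_labels.csv", newline="") as f:
    for row in csv.DictReader(f):
        counts[(row["condition"], row["skin_tone"])] += 1

for (condition, tone), n in sorted(counts.items()):
    note = "  <- underrepresented" if n < 20 else ""
    print(f"{condition:<25} {tone:<12} {n:>5}{note}")
```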

We’re early in this journey, but we’re committed to making a difference. We believe that working with partners and sharing our learnings can help build a healthier future for everyone, regardless of their background or location.