Demographic Bias of Expert-Level Vision-Language Foundation Models in Medical Imaging


[Paper]

Summary: Advances in artificial intelligence (AI) have achieved expert-level performance in medical imaging applications. Notably, self-supervised vision-language foundation models can detect a broad spectrum of pathologies without relying on explicit training annotations. However, it is crucial to ensure that these AI models do not mirror or amplify human biases, thereby disadvantaging historically marginalized groups such as females or Black patients. The manifestation of such biases could systematically delay essential medical care for certain patient subgroups. In this study, we investigate the algorithmic fairness of state-of-the-art vision-language foundation models in chest X-ray diagnosis across five globally sourced datasets. Our findings reveal that, compared to board-certified radiologists, these foundation models consistently underdiagnose marginalized groups, with even higher rates seen in intersectional subgroups such as Black female patients. Such demographic biases persist across a wide range of pathologies and demographic attributes. Further analysis of the model embeddings reveals that they significantly encode demographic information. Deploying AI systems with these biases in medical imaging can intensify pre-existing care disparities, posing potential challenges to equitable healthcare access and raising ethical questions about their clinical application.

Dataset

To download all the datasets used in this study, please follow the instructions in DataSources.md.

Because the original image files are often high resolution, we cache downsampled copies of the images to speed up training for certain datasets. To do so, run

python -m scripts.cache_cxr --data_path <data_path> --dataset <dataset>

where <dataset> can be mimic or vindr. This step is required for vindr and optional for the remaining datasets. A concrete invocation is shown below.
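For example, to cache the VinDr dataset (the path below is a placeholder; substitute your own download location):

python -m scripts.cache_cxr --data_path /path/to/vindr --dataset vindr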

Model Checkpoints

This repo uses CheXzero as the driving example of a vision-language model. Download the CheXzero model checkpoints and save them in the ./checkpoints directory.
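To confirm the download succeeded, a quick sanity check is to load the checkpoint with PyTorch. This is a minimal sketch, assuming the checkpoint is a standard .pt file; the filename below is a placeholder for whichever CheXzero checkpoint you saved.

import torch

# Placeholder filename; use the actual CheXzero checkpoint you downloaded.
ckpt_path = "./checkpoints/chexzero_weights.pt"
state_dict = torch.load(ckpt_path, map_location="cpu")
print(f"Loaded checkpoint with {len(state_dict)} entries")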

Zero-Shot Evaluation

python -m zero_shot \
       --dataset <dataset> \
       --split <split> \
       --template <name_of_your_prompt_template> \
       --data_dir <data_path> \
       --model_dir <model_path> \
       --predictions_dir <output_path>
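For example, to run zero-shot evaluation on the MIMIC test split (the argument values below are placeholders; substitute your own paths and prompt template):

python -m zero_shot \
       --dataset mimic \
       --split test \
       --template my_prompt_template \
       --data_dir /path/to/mimic \
       --model_dir ./checkpoints \
       --predictions_dir ./predictions

Once predictions are saved, per-group disparities can be inspected offline. The following is a minimal sketch, not the repository's evaluation code: it assumes the model's "No Finding" scores have been merged with ground-truth labels and demographic metadata into a single CSV, and the file name and column names are hypothetical.

import pandas as pd

# Hypothetical merged file and column names: "no_finding_prob" is the model's
# zero-shot score for "No Finding", "no_finding_label" is the ground truth
# (1 = no pathology), and "sex" is a demographic attribute.
df = pd.read_csv("predictions_with_metadata.csv")

threshold = 0.5  # example operating point
for group, sub in df.groupby("sex"):
    diseased = sub[sub["no_finding_label"] == 0]
    # Underdiagnosis rate: fraction of patients with findings that the model labels "No Finding".
    rate = (diseased["no_finding_prob"] > threshold).mean()
    print(f"{group}: underdiagnosis rate = {rate:.3f}")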

Acknowledgements

This code is partly based on the open-source implementations from CheXzero and SubpopBench.

Citation

If you find this code or idea useful, please cite our work:

@article{yang2024demographic,
  title={Demographic Bias of Expert-Level Vision-Language Foundation Models in Medical Imaging},
  author={Yuzhe Yang and Yujia Liu and Xin Liu and Avanti Gulhane and Domenico Mastrodicasa and Wei Wu and Edward J Wang and Dushyant W Sahani and Shwetak Patel},
  journal={arXiv preprint arXiv:2402.14815},
  year={2024}
}

Contact

If you have any questions, feel free to contact us through email ([email protected]) or GitHub issues. Enjoy!