Extended Data Fig. 5: Quantitative evaluation of reader and AI system performance with a 12-month follow-up interval for ground-truth cancer-positive status.
From: International evaluation of an AI system for breast cancer screening
![Extended Data Fig. 5](http://media.springernature.com/full/springer-static/esm/art%3A10.1038%2Fs41586-019-1799-6/MediaObjects/41586_2019_1799_Fig9_ESM.jpg)
Because a 12-month follow-up interval is unlikely to encompass a subsequent screening exam in either country, reader–model comparisons on retrospective clinical data may be skewed by the gatekeeper effect (Extended Data Fig. 4). See Fig. 2 for comparison with longer time intervals. a, Performance of the AI system on UK data. This plot was derived from a total of 25,717 eligible examples, including 274 positives. The AI system achieved an AUC of 0.966 (95% CI 0.954, 0.977). b, Performance of the AI system on US data. This plot was derived from a total of 2,770 eligible examples, including 359 positives. The AI system achieved an AUC of 0.883 (95% CI 0.859, 0.903). c, Reader performance. When computing reader metrics, we excluded cases for which the reader recommended repeat mammography to address technical issues. In the US data, the performance of radiologists could only be assessed on the subset of cases for which a BI-RADS grade was available.
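For readers who want to see how an AUC and an accompanying 95% CI of the kind reported in panels a and b can be obtained from per-case scores and binary cancer labels, the following is a minimal sketch. It is not the authors' code: the caption does not say how the intervals were computed, so the percentile bootstrap used here is an assumption, and the function names `auc` and `bootstrap_ci`, along with the toy data, are hypothetical.

```python
import random


def auc(scores, labels):
    """AUC via the Mann-Whitney U statistic; tied scores share an average rank."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(lab for _, lab in pairs[i:j])
        i = j
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def bootstrap_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        sample_labels = [labels[k] for k in idx]
        # Skip resamples that contain only one class; AUC is undefined there.
        if 0 < sum(sample_labels) < n:
            stats.append(auc([scores[k] for k in idx], sample_labels))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Synthetic toy data for illustration only; not drawn from the study.
    toy_scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.70, 0.90]
    toy_labels = [0, 0, 0, 1, 0, 0, 1, 1]
    print("AUC:", auc(toy_scores, toy_labels))
    print("95% CI:", bootstrap_ci(toy_scores, toy_labels))
```

With strongly imbalanced data such as the UK set (274 positives among 25,717 examples), the width of the interval is driven largely by the number of positives, so a resampling scheme that discards single-class resamples, as above, matters little in practice but keeps the sketch well defined on small inputs.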