Abstract
Skin conditions affect 1.9 billion people. Because of a shortage of dermatologists, most cases are seen instead by general practitioners with lower diagnostic accuracy. We present a deep learning system (DLS) to provide a differential diagnosis of skin conditions using 16,114 de-identified cases (photographs and clinical data) from a teledermatology practice serving 17 sites. The DLS distinguishes between 26 common skin conditions, representing 80% of cases seen in primary care, while also providing a secondary prediction covering 419 skin conditions. On 963 validation cases, where a rotating panel of three board-certified dermatologists defined the reference standard, the DLS was non-inferior to six other dermatologists and superior to six primary care physicians (PCPs) and six nurse practitioners (NPs) (top-1 accuracy: 0.66 DLS, 0.63 dermatologists, 0.44 PCPs and 0.40 NPs). These results highlight the potential of the DLS to assist general practitioners in diagnosing skin conditions.
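The abstract compares readers by top-1 accuracy over a ranked differential diagnosis, and the extended data also report top-3 accuracy. Below is a minimal sketch of that metric. It assumes each case has a single reference condition label and that each reader (or the DLS) supplies an ordered list of conditions. In the study the reference standard came from a panel of three dermatologists, and how their differentials were combined into a label is not described in this section. The function name and the toy condition lists are hypothetical.

```python
from typing import Sequence


def top_k_accuracy(differentials: Sequence[Sequence[str]],
                   references: Sequence[str],
                   k: int) -> float:
    """Fraction of cases whose reference label appears among the first k
    entries of the ranked differential (hypothetical helper)."""
    if len(differentials) != len(references):
        raise ValueError("need exactly one differential per reference label")
    if not references:
        raise ValueError("no cases to score")
    hits = sum(ref in ddx[:k] for ddx, ref in zip(differentials, references))
    return hits / len(references)


# Toy data (invented): one ranked differential and one reference label per case.
ddx = [["eczema", "psoriasis", "tinea"],
       ["acne", "rosacea", "folliculitis"],
       ["psoriasis", "eczema", "lichen planus"]]
ref = ["eczema", "folliculitis", "eczema"]

print(top_k_accuracy(ddx, ref, k=1))  # only case 1 is correct at rank 1
print(top_k_accuracy(ddx, ref, k=3))  # every reference label is in its top 3
```

Top-k accuracy credits a reader whose correct answer appears anywhere in the first k positions. This is why top-3 accuracy is always at least as high as top-1 accuracy.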
Data availability
The de-identified teledermatology data used in this study are not publicly available due to restrictions in the data-sharing agreement.
Code availability
The deep learning framework (TensorFlow) used in this study is available at https://www.tensorflow.org/. The training framework (Estimator) is available at https://www.tensorflow.org/guide/estimators. The deep learning architecture (Inception-v4) is available at https://github.com/tensorflow/models/blob/master/research/slim/nets/inception_v4.py.
Acknowledgements
We thank W. Chen, J. Yoshimi, X. Ji and Q. Duong for software infrastructure support for data collection. Thanks also go to G. Foti, K. Su, T. Saensuksopa, D. Wang, Y. Gao and L. Tran. We also thank C. Chen, M. Howell and A. Paller for their feedback on the manuscript. Last, but not least, this work would not have been possible without the participation of the dermatologists, primary care physicians and nurse practitioners who reviewed cases for this study, and S. Bis, who helped to establish the skin condition mapping.
Author information
Contributions
Yuan Liu, A.J., C.E., D.H.W., K.L. and D.C. prepared the dataset. S.J.H., K.K. and R.H.-W. provided clinical expertise and guidance for the study. Yuan Liu, A.J., C.E., K.L., P.B., G.d.O.M., J.G., D.A., S.J.H. and K.K. worked on the technical, logistical and quality control aspects of label collection. S.J.H. and K.K. established the skin condition mapping. Yuan Liu, K.L., V.G. and D.C. developed the model. Yuan Liu, A.J., N.S. and V.N. performed the statistical and additional analyses. Yun Liu guided the study design, the analysis of the results and the statistical analysis. S.G. studied the potential utility of the model. R.C.D. and D.C. initiated the project and led the overall development, with strategic guidance and executive support from G.S.C., L.H.P. and D.R.W. Yuan Liu, Yun Liu and S.J.H. prepared the manuscript with assistance and feedback from all other co-authors. K.K. and S.J.H. performed the work at Google Health via Advanced Clinical. G.d.O.M. performed the work at Google Health via Adecco Staffing. N.S. performed the work at Google Health.
Ethics declarations
Competing interests
K.K. and S.J.H. were consultants of Google LLC. R.H.-W. is an employee of the Medical University of Graz. G.d.O.M. is an employee of Adecco Staffing supporting Google LLC. This study was funded by Google LLC. The remaining authors are employees of Google LLC and own Alphabet stock as part of the standard compensation package. Yuan Liu, A.J., C.E., D.H.W., K.L., P.B., J.G., V.G., D.A., Yun Liu, R.C.D. and D.C. are inventors on a filed patent related to this work. The authors declare no other competing interests.
Additional information
Peer review information Javier Carmona was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Performance of the deep learning system (DLS) and clinicians, broken down for each of the 26 categories of skin conditions and ‘other’.
a, Top-1 and top-3 sensitivity of the DLS on validation set A (n=3,756). b, Top-1 and top-3 sensitivity of the DLS and of three types of clinicians, dermatologists (Derm), primary care physicians (PCP) and nurse practitioners (NP), on validation set B (n=963). Numbers in parentheses on the x axes indicate the number of cases. A detailed breakdown of each clinician's performance, and of the DLS's performance on the subset of cases graded by that clinician, is given in Supplementary Table 8. Error bars indicate 95% CI (see Statistical Analysis).
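Extended Data Fig. 1 reports per-category top-1 and top-3 sensitivity with 95% CIs. The sketch below shows one way to compute these quantities under two assumptions. First, sensitivity for a category is taken to be the fraction of cases with that reference label whose ranked differential contains the label within the first k entries. Second, the interval is a percentile bootstrap over cases with 1,000 resamples. The Statistical Analysis section cited in the caption is not part of this excerpt, so this CI method is an illustrative choice and not necessarily the one used in the study. The helper names and toy data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def category_hits(differentials: Sequence[Sequence[str]],
                  references: Sequence[str],
                  category: str,
                  k: int) -> List[bool]:
    """For each case whose reference label is `category`, record whether
    `category` appears within the first k entries of its differential."""
    return [category in ddx[:k]
            for ddx, ref in zip(differentials, references)
            if ref == category]


def sensitivity_with_ci(hits: Sequence[bool],
                        n_boot: int = 1000,
                        alpha: float = 0.05,
                        seed: int = 0) -> Tuple[float, float, float]:
    """Point estimate and percentile-bootstrap (1 - alpha) interval,
    resampling cases with replacement (an assumed CI method)."""
    if not hits:
        raise ValueError("category has no cases")
    n = len(hits)
    point = sum(hits) / n
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    boot = sorted(sum(rng.choices(hits, k=n)) / n for _ in range(n_boot))
    lower = boot[round(n_boot * alpha / 2)]
    upper = boot[round(n_boot * (1 - alpha / 2)) - 1]
    return point, lower, upper


# Toy data (invented).
ddx = [["eczema", "psoriasis"],
       ["psoriasis", "eczema"],
       ["eczema", "tinea"],
       ["tinea", "eczema"]]
ref = ["eczema", "eczema", "eczema", "tinea"]

top1 = category_hits(ddx, ref, "eczema", k=1)  # [True, False, True]
top3 = category_hits(ddx, ref, "eczema", k=3)  # [True, True, True]
print(sensitivity_with_ci(top1))
print(sensitivity_with_ci(top3))  # (1.0, 1.0, 1.0): nothing varies when resampled
```

Categories with few cases yield wide bootstrap intervals. This is one reason per-category error bars in a figure like this can differ greatly in width.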
Extended Data Fig. 2 Performance of the deep learning system (DLS) and the clinicians on the 419-way classification: dermatologists (Derm), primary care physicians (PCP), and nurse practitioners (NP) on validation set A (n=3,756) and validation set B (n=963).
a, Top-1 and top-3 accuracy for the DLS and clinicians across all cases and 419 categories of skin conditions. b, Average overlap (to assess the full differential diagnosis) of the DLS and clinicians. Error bars indicate 95% confidence intervals (see Statistical Analysis).
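Average overlap compares two ranked lists by averaging, over depths 1 to D, the fraction of items shared by their top-d prefixes. Agreement near the top of the lists is therefore counted at several depths, and agreement further down is counted at fewer depths. Below is a minimal sketch. Several settings are assumptions, because the exact configuration used for Extended Data Fig. 2b is not given here: the depth D, the comparison of a reader's differential against a single reference differential, and the choice to let a list shorter than d simply contribute fewer shared items. The function and variable names are hypothetical.

```python
from typing import Sequence


def average_overlap(a: Sequence[str], b: Sequence[str], depth: int) -> float:
    """Average overlap of two ranked lists: the mean over d = 1..depth of
    |top-d(a) ∩ top-d(b)| / d. Lists shorter than d simply contribute
    fewer shared items at that depth."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    total = 0.0
    for d in range(1, depth + 1):
        shared = len(set(a[:d]) & set(b[:d]))
        total += shared / d
    return total / depth


reader = ["eczema", "psoriasis", "tinea"]
reference = ["psoriasis", "eczema", "lichen planus"]
print(average_overlap(reader, reference, depth=3))  # (0 + 2/2 + 2/3) / 3, about 0.556
```

In this example the two lists disagree at rank 1 but share both of their top-2 items. The score is therefore well above zero even though top-1 agreement fails, which illustrates why the figure pairs this measure with top-k accuracy when assessing the full differential.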
Supplementary information
Supplementary Information
Supplementary Methods, Figs. 1–10 and Tables 1–13.
Cite this article
Liu, Y., Jain, A., Eng, C. et al. A deep learning system for differential diagnosis of skin diseases. Nat Med 26, 900–908 (2020). https://doi.org/10.1038/s41591-020-0842-3
DOI: https://doi.org/10.1038/s41591-020-0842-3