Automated abnormality detection in lower extremity radiographs using deep learning

Varma, Maya; Lu, Mandy; Gardner, Rachel; Dunnmon, Jared; Khandwala, Nishith; Rajpurkar, Pranav; Long, Jin; Beaulieu, Christopher; Shpanskaya, Katie; Fei-Fei, Li; Lungren, Matthew P.; Patel, Bhavik N.

doi:10.1038/s42256-019-0126-0

Article
Published: 09 December 2019

Maya Varma¹,
Mandy Lu¹,
Rachel Gardner¹,
Jared Dunnmon¹,
Nishith Khandwala¹,
Pranav Rajpurkar¹,
Jin Long²,
Christopher Beaulieu³,
Katie Shpanskaya³,
Li Fei-Fei¹,
Matthew P. Lungren³ &
Bhavik N. Patel³ (ORCID: orcid.org/0000-0001-5157-9903)

Nature Machine Intelligence volume 1, pages 578–583 (2019)

Abstract

Musculoskeletal disorders are a major healthcare challenge around the world. We investigate the utility of convolutional neural networks (CNNs) in performing generalized abnormality detection on lower extremity radiographs. We also explore the effect of pretraining, dataset size and model architecture on model performance to provide recommendations for future deep learning analyses on extremity radiographs, especially when access to large datasets is challenging. We collected a large dataset of 93,455 lower extremity radiographs of multiple body parts, with each exam labelled as normal or abnormal. A 161-layer densely connected, pretrained CNN achieved an AUC-ROC of 0.880 (sensitivity = 0.714, specificity = 0.961) on this abnormality classification task. Our findings show that a single CNN model can be used effectively to identify diverse abnormalities in highly variable radiographs of multiple body parts, a result that holds potential for improving patient triage and assisting with diagnostics in resource-limited settings.
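The abstract reports performance with three numbers: AUC-ROC, which does not depend on a decision threshold, and sensitivity and specificity, which do. The sketch below shows how these quantities are conventionally computed from per-exam abnormality scores; it is not the authors' evaluation code. The function names, the toy scores and the 0.5 threshold are illustrative assumptions, since the abstract does not state how the reported operating point was chosen.

```python
from typing import Sequence, Tuple


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a randomly chosen abnormal exam (label 1) receives a
    higher score than a randomly chosen normal exam (label 0); ties count 1/2.
    This equals the area under the ROC curve (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one abnormal and one normal exam")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Sensitivity (true-positive rate) and specificity (true-negative rate)
    when exams scoring >= threshold are called abnormal.
    The default threshold of 0.5 is an illustrative assumption."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy example: 1 = abnormal, 0 = normal; scores are made-up model outputs.
    labels = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.6, 0.3, 0.4, 0.2, 0.1]
    print(round(auc_roc(labels, scores), 3))        # 0.889 (8 of 9 pairs ranked correctly)
    print(sensitivity_specificity(labels, scores))  # (0.6666666666666666, 1.0)
```

The pairwise loop is quadratic in the number of exams and is written for readability; a rank-based computation gives the same value in O(n log n) for large test sets.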

**Fig. 1: Categorization of patients in training, validation and test sets.**
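Fig. 1 concerns how patients, rather than individual images, were assigned to the training, validation and test sets. A common reason to split at the patient level is to keep radiographs of the same patient from appearing in more than one set, which would otherwise inflate test performance. The sketch below illustrates such a split under stated assumptions: the `(patient_id, exam_id)` input format, the function name `split_by_patient` and the 80/10/10 fractions are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def split_by_patient(
    exams: Sequence[Tuple[Hashable, str]],
    train_frac: float = 0.8,  # hypothetical fractions, not from the paper
    val_frac: float = 0.1,
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Assign exam IDs to train/val/test so that every exam of a given patient
    lands in the same split. Fractions are applied to the number of patients;
    whatever remains after train and val becomes the test set."""
    exams_by_patient: Dict[Hashable, List[str]] = defaultdict(list)
    for patient_id, exam_id in exams:
        exams_by_patient[patient_id].append(exam_id)

    # Sort before shuffling so the result depends only on the seed.
    patients = sorted(exams_by_patient, key=str)
    random.Random(seed).shuffle(patients)

    n_train = round(train_frac * len(patients))
    n_val = round(val_frac * len(patients))
    patient_groups = {
        "train": patients[:n_train],
        "val": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],
    }
    return {
        split: [exam for p in group for exam in exams_by_patient[p]]
        for split, group in patient_groups.items()
    }


if __name__ == "__main__":
    # Toy data: 10 hypothetical patients with 3 exams each.
    toy = [(f"p{i}", f"p{i}_e{j}") for i in range(10) for j in range(3)]
    splits = split_by_patient(toy)
    print({name: len(ids) for name, ids in splits.items()})  # {'train': 24, 'val': 3, 'test': 3}
```

Splitting by patient means the exam counts per set only approximate the requested fractions when patients have different numbers of exams.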

**Fig. 3: Grad-CAM visualizations for abnormal lower extremities.**
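Fig. 3 uses Grad-CAM to highlight the image regions that drive an abnormal prediction. In Grad-CAM, each feature map A_k of a chosen convolutional layer is weighted by the spatial average of the gradient of the class score with respect to that map, and the heatmap is the ReLU of the weighted sum of the maps. The sketch below implements only that arithmetic in plain Python; it assumes the activations and gradients have already been captured from the network (for example with framework hooks), and it does not reflect which layer or post-processing the authors used. The function name `grad_cam` and the nested-list input format are assumptions for illustration.

```python
from typing import List

Map2D = List[List[float]]


def grad_cam(activations: List[Map2D], gradients: List[Map2D]) -> Map2D:
    """Grad-CAM heatmap for one convolutional layer.

    activations[k][i][j]: value of feature map k at spatial position (i, j).
    gradients[k][i][j]:   d(abnormality score) / d activations[k][i][j].
    Each map is weighted by its spatially averaged gradient; the weighted sum
    is passed through ReLU so only evidence for the class remains.
    """
    if len(activations) != len(gradients):
        raise ValueError("activations and gradients must have the same channels")
    h, w = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for a_k, g_k in zip(activations, gradients):
        alpha_k = sum(sum(row) for row in g_k) / (h * w)  # global-average-pooled gradient
        for i in range(h):
            for j in range(w):
                cam[i][j] += alpha_k * a_k[i][j]
    cam = [[max(v, 0.0) for v in row] for row in cam]  # ReLU
    peak = max(max(row) for row in cam)
    if peak > 0:  # scale to [0, 1] for display; all-zero maps are left as-is
        cam = [[v / peak for v in row] for row in cam]
    return cam


if __name__ == "__main__":
    # Two toy 2x2 feature maps: channel 0 supports the class, channel 1 opposes it.
    acts = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]]]
    grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
    print(grad_cam(acts, grads))  # [[1.0, 0.0], [0.0, 0.0]]
```

In practice the resulting map has the spatial resolution of the chosen layer and is upsampled to the radiograph's size before being overlaid.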

Data availability

We are releasing our de-identified test set as part of this manuscript. This dataset includes radiographs from 182 patients and is balanced across normal and abnormal labels as well as across the four lower extremity body parts (foot, hip, knee and ankle). In addition, two board-certified radiologists manually refined all labels to ensure a high level of label accuracy. The dataset is available at https://aimi.stanford.edu/lera-lower-extremity-radiographs-2.

Code availability

Our deep learning training framework is available at: https://github.com/maya124/MSK-LE.


Acknowledgements

This study was supported by the Stanford Center for Artificial Intelligence in Medicine and Imaging (AIMI). The research reported in this publication was supported by the National Library of Medicine of the National Institutes of Health under award no. R01LM012966 and by the Stanford Child Health Research Institute (Stanford NIH-NCATS-CTSA grant #UL1 TR001085). This research used data or services provided by STARR (STAnford medicine Research data Repository), a clinical data warehouse made possible by the Stanford School of Medicine Research Office.

Author information

These authors contributed equally: Mandy Lu, Rachel Gardner, Matthew P. Lungren, Bhavik N. Patel.

Authors and Affiliations

Department of Computer Science, Stanford University, Stanford, CA, USA
Maya Varma, Mandy Lu, Rachel Gardner, Jared Dunnmon, Nishith Khandwala, Pranav Rajpurkar & Li Fei-Fei
Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
Jin Long
Department of Radiology, Stanford University School of Medicine, Stanford, CA, USA
Christopher Beaulieu, Katie Shpanskaya, Matthew P. Lungren & Bhavik N. Patel

Contributions

All authors contributed extensively to this work. M.V., M.L. and R.G. designed the methodology and algorithms, implemented models, analysed results and wrote the manuscript. B.N.P. and M.P.L. oversaw the entire project and helped with study design, methodology development and manuscript writing. N.K. and P.R. provided technical advice and manuscript feedback. J.D. and J.L. contributed to statistical analyses and writing the manuscript. C.B. and K.S. assisted with data collection and labelling. L.F.-F. provided resources and advice.

Corresponding author

Correspondence to Bhavik N. Patel.

Ethics declarations

Competing interests

There was no industry support or other funding for this work. There are no conflicts of interest that pertain specifically to this work. However, some of the authors are consultants for the medical industry. M.P.L. is supported by the National Library of Medicine of the NIH (R01LM012966). B.N.P. has grant support from GE. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or GE. M.P.L.'s activities not related to this Article include positions as shareholder and advisory board member for Segmed Inc., Nines.ai and Bunker Hill. M.V., R.G., M.L., N.K., P.R., J.L. and K.S. are not employees of or consultants for industry and had control of the data and the analysis.

Additional information

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary tables and figures.

Reporting summary

About this article

Cite this article

Varma, M., Lu, M., Gardner, R. et al. Automated abnormality detection in lower extremity radiographs using deep learning. Nat Mach Intell 1, 578–583 (2019). https://doi.org/10.1038/s42256-019-0126-0

Received: 13 March 2019
Accepted: 01 November 2019
Published: 09 December 2019
Issue Date: December 2019
DOI: https://doi.org/10.1038/s42256-019-0126-0