J Med Internet Res. 2021 Jul 12;23(7):e26151.
doi: 10.2196/26151.

Clinically Applicable Segmentation of Head and Neck Anatomy for Radiotherapy: Deep Learning Algorithm Development and Validation Study

Stanislav Nikolov et al. J Med Internet Res.

Abstract

Background: Over half a million individuals are diagnosed with head and neck cancer each year globally. Radiotherapy is an important curative treatment for this disease, but it requires time-consuming manual delineation of radiosensitive organs at risk. This planning process can delay treatment while also introducing interoperator variability, resulting in downstream radiation dose differences. Although auto-segmentation algorithms offer a potentially time-saving solution, challenges remain in defining, quantifying, and achieving expert performance.

Objective: Adopting a deep learning approach, we aim to demonstrate a 3D U-Net architecture that achieves expert-level performance in delineating 21 distinct head and neck organs at risk commonly segmented in clinical practice.

Methods: The model was trained on a data set of 663 deidentified computed tomography scans acquired in routine clinical practice. The segmentations were either taken from clinical practice or created by experienced radiographers as part of this research, all in accordance with consensus organ at risk definitions.

Results: We demonstrated the model's clinical applicability by assessing its performance on a test set of 21 computed tomography scans from clinical practice, each with 21 organs at risk segmented by 2 independent experts. We also introduced the surface Dice similarity coefficient, a new metric for the comparison of organ delineation, to quantify the deviation between organ at risk surface contours rather than volumes, better reflecting the clinical task of correcting errors in automated organ segmentations. The model's generalizability was then demonstrated on 2 distinct open-source data sets from centers and countries different from those used for model training.

Conclusions: Deep learning is an effective and clinically applicable technique for the segmentation of the head and neck anatomy for radiotherapy. With appropriate validation studies and regulatory approvals, this system could improve the efficiency, consistency, and safety of radiotherapy pathways.

Keywords: UNet; artificial intelligence; contouring; convolutional neural networks; machine learning; radiotherapy; segmentation; surface DSC.

Conflict of interest statement

Conflicts of Interest: GR, HM, CK, COH, and DC were paid contractors of DeepMind and Google Health.

Figures

Figure 1
A typical clinical pathway for radiotherapy. After a patient is diagnosed and the decision is made to treat with radiotherapy, a defined workflow aims to provide treatment that is both safe and effective. In the United Kingdom, the time delay between decision to treat and treatment delivery should be no greater than 31 days. Time-intensive manual segmentation and dose optimization steps can introduce delays to treatment.
Figure 2
Case selection from the University College London Hospitals and The Cancer Imaging Archive computed tomography data sets. A CONSORT-style diagram demonstrating the application of inclusion and exclusion criteria to select the training, validation, and test sets used in this work. CT: computed tomography; HN_C: Head and Neck Carcinoma; N/A: not applicable; TCIA: The Cancer Imaging Archive; TCGA: The Cancer Genome Atlas; UCLH: University College London Hospitals; Val: validation.
Figure 3
Process for the segmentation of ground truth and radiographer organ at risk volumes. The flowchart illustrates how the ground truth segmentations were created and compared with independent radiographer segmentations and the model. For the ground truth, each computed tomography scan in The Cancer Imaging Archive test set was segmented first by a radiographer and peer reviewed by a second radiographer. The segmentation then went through one or more iterations of review and editing with a specialist oncologist to create the ground truth, which was compared with the segmentations produced by both the model and an additional radiographer. CT: computed tomography.
Figure 4
3D U-Net model architecture. (a) At training time, the model receives 21 contiguous computed tomography slices, which are processed through a series of “down” blocks, a fully connected block, and a series of “up” blocks to create a segmentation prediction. (b) A detailed view of the convolutional residual down and up blocks and the residual fully connected block.
Figure 5
Illustrations of masks, surfaces, border regions, and the “overlapping” surface at tolerance τ.
Figure 6
2D illustration of the implementation of the surface Dice similarity coefficient. (a) A binary mask displayed as an image. The origin of the image raster is (0,0). (b) The surface points (red circles) are located in a raster that is shifted half of the raster spacing on each axis. Each surface point has 4 neighbors in 2D (8 neighbors in 3D). The local contour (blue line) assigned to each surface point (red circle) depends on the neighbor constellation.
Figure 7
Example results. Computed tomography (CT) image: axial slices at 5 representative levels from the raw CT scan of a male patient aged 55-59 years were selected from the University College London Hospitals data set (patient 20). The levels shown as 2D slices were selected to demonstrate all 21 organs at risk included in this study. The window leveling has been adjusted for each to best display the anatomy present. Oncologist contour: the ground truth segmentation, as defined by experienced radiographers and arbitrated by a head and neck specialist oncologist. Model contour: segmentations produced by our model. Contour comparison: contoured by oncologist only (green region) or model only (yellow region). Best viewed on a display. CT: computed tomography.
Figure 8
Surface Dice similarity coefficient performance metric. (a) Illustration of the computation of the surface Dice similarity coefficient. Continuous line: predicted surface. Dashed line: ground truth surface. Black arrow: the maximum margin of deviation that may be tolerated without penalty, hereafter referred to as τ. Note that in our use case each organ at risk has an independently calculated value for τ. Green: acceptable surface parts (distance between surfaces ≤τ). Pink: unacceptable regions of the surfaces (distance between surfaces >τ). The proposed surface Dice similarity coefficient metric reports the good surface parts compared with the total surface (sum of predicted surface area and ground truth surface area). (b) Illustration of the determination of the organ-specific tolerance. Green: segmentation of an organ by oncologist A. Black: segmentation by oncologist B. Red: distances between the surfaces.
Figure 9
University College London Hospitals (UCLH) test set: quantitative performance of the model in comparison with radiographers. (a) The model achieves a surface Dice similarity coefficient similar to humans in all 21 organs at risk (on the UCLH held out test set) when compared with the gold standard for each organ at an organ-specific tolerance τ. Blue: our model; green: radiographers. (b) Performance difference between the model and the radiographers. Each blue dot represents a model-radiographer pair. The gray area highlights nonsubstantial differences (−5% to +5%). The box extends from the lower to upper quartile values of the data, with a line at the median. The whiskers indicate the most extreme nonoutlier data points. Data points lying more than 1.5 times the IQR beyond the quartiles are shown as circular fliers. The notches represent the 95% CI around the median. DSC: Dice similarity coefficient; UCLH: University College London Hospitals.
Figure 10
Model generalizability to an independent test set from The Cancer Imaging Archive (TCIA). Quantitative performance of the model on the TCIA test set in comparison with radiographers. (a) Surface Dice similarity coefficient (on the TCIA open-source test set) for the segmentations compared with the gold standard for each organ at an organ-specific tolerance τ. Blue: our model; green: radiographers. (b) Performance difference between the model and the radiographers. Each blue dot represents a model-radiographer pair. Red lines show the mean difference. The gray area highlights nonsubstantial differences (−5% to +5%). The box extends from the lower to upper quartile values of the data, with a line at the median. The whiskers indicate the most extreme nonoutlier data points. Data points lying more than 1.5 times the IQR beyond the quartiles are shown as circular fliers. The notches represent the 95% CI around the median. DSC: Dice similarity coefficient; TCIA: The Cancer Imaging Archive.
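
The surface Dice similarity coefficient described in the Figure 8 caption can be illustrated with a minimal 2D sketch in Python. This is not the authors' implementation. It makes several simplifying assumptions: it counts boundary pixels instead of measuring surface area, it treats a foreground pixel as a surface point when any of its 4 neighbors is background or lies outside the image, and it finds nearest distances by brute force. The names surface_points, count_within, surface_dice, and rectangle, as well as the spacing parameter, are hypothetical and chosen only for this illustration.

```python
from math import hypot


def surface_points(mask):
    """Return foreground pixels that touch background (or the image edge)."""
    rows, cols = len(mask), len(mask[0])
    points = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < rows and 0 <= nc < cols)
                if outside or not mask[nr][nc]:
                    points.append((r, c))
                    break
    return points


def count_within(src, dst, tau, spacing):
    """Count points in src whose nearest point in dst is at distance <= tau."""
    sy, sx = spacing
    count = 0
    for r1, c1 in src:
        nearest = min(hypot((r1 - r2) * sy, (c1 - c2) * sx) for r2, c2 in dst)
        if nearest <= tau:
            count += 1
    return count


def surface_dice(pred, truth, tau, spacing=(1.0, 1.0)):
    """Fraction of both surfaces lying within tolerance tau of the other surface."""
    sp = surface_points(pred)
    sg = surface_points(truth)
    if not sp and not sg:
        return float("nan")  # undefined when both masks are empty
    if not sp or not sg:
        return 0.0  # one mask is empty, so no surface part can be acceptable
    good = count_within(sp, sg, tau, spacing) + count_within(sg, sp, tau, spacing)
    return good / (len(sp) + len(sg))


def rectangle(rows, cols, r0, r1, c0, c1):
    """Binary mask with a filled rectangle covering rows r0..r1 and columns c0..c1."""
    return [[1 if r0 <= r <= r1 and c0 <= c <= c1 else 0 for c in range(cols)]
            for r in range(rows)]


# Toy example: the prediction is the ground truth shifted by one pixel.
truth = rectangle(7, 7, 1, 4, 1, 4)
pred = rectangle(7, 7, 1, 4, 2, 5)
print(surface_dice(pred, truth, tau=0.0))  # 0.5
print(surface_dice(pred, truth, tau=1.0))  # 1.0
```

In this toy case, a one-pixel shift halves the score at τ=0 but is fully tolerated at τ=1, which illustrates why the choice of an organ-specific tolerance matters when comparing contours.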
