Abstract
Since nail psoriasis restricts the patient’s daily activities, therapeutic intervention based on reliable and reproducible evaluation is critical. The Nail Psoriasis Severity Index (NAPSI) is a validated scoring tool, but its usefulness is limited by interobserver variability. This study aimed to develop a reliable and accurate NAPSI scoring tool using deep learning. The tool, the “NAPSI calculator”, comprises two parts: nail detection from images and NAPSI scoring. NAPSI was annotated by nine nail experts, all board-certified dermatologists with sufficient experience in a specialized clinic for nail diseases. In the final test set, the “NAPSI calculator” correctly located 137/138 nails and scored NAPSI with higher accuracy than six non-board-certified residents (83.9% vs 65.7%; P = 0.008) and four board-certified dermatologists who were not nail experts (83.9% vs 73.0%; P = 0.005). The “NAPSI calculator” can be readily used in a clinical situation, contributing to raising the level of medical practice for nail psoriasis.
Introduction
Psoriasis is a common inflammatory skin disease in which 10–80% of patients suffer from nail lesions called nail psoriasis1. Nail psoriasis causes a disfiguring cosmetic problem and restricts daily activities, severely impairing the patient’s quality of life2. Furthermore, it is a risk factor for the development of psoriatic arthritis3. Accordingly, early therapeutic intervention based on reliable and reproducible evaluation is critical in medical practice.
Nail psoriasis shows various nail changes via inflammation of the nail matrix and bed; the former causes pitting, leukonychia, red spots in the lunula, and crumbling, and the latter causes oil drop discoloration, onycholysis, nail bed hyperkeratosis, and splinter hemorrhage (Fig. 1a). Focusing on these eight representative nail findings of psoriasis, the Nail Psoriasis Severity Index (NAPSI) was proposed as a scoring tool for nail psoriasis severity. NAPSI scoring divides the nail into four quadrants with horizontal and longitudinal lines. In each quadrant, nail matrix psoriasis and nail bed psoriasis are each evaluated for the presence of any of their four findings. NAPSI is then calculated by summing these quadrant scores (Fig. 1b)4.
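The scoring rule described above can be illustrated with a minimal sketch. It assumes that each nail is described by two lists of four presence flags, one per quadrant, for the nail matrix and the nail bed; the function name `napsi_score` and the example nail are hypothetical and serve only as illustration.

```python
from typing import Sequence


def napsi_score(matrix_quadrants: Sequence[bool], bed_quadrants: Sequence[bool]) -> int:
    """Return the NAPSI (0-8) of one nail from per-quadrant presence flags.

    A flag is True when at least one of the four findings of that category
    (nail matrix or nail bed) is present in the quadrant.
    """
    if len(matrix_quadrants) != 4 or len(bed_quadrants) != 4:
        raise ValueError("each nail must be described by exactly four quadrants")
    matrix_score = sum(bool(q) for q in matrix_quadrants)  # nail matrix score, 0-4
    bed_score = sum(bool(q) for q in bed_quadrants)        # nail bed score, 0-4
    return matrix_score + bed_score


# Hypothetical nail: a matrix finding in two quadrants, a bed finding in one quadrant.
example = napsi_score([True, True, False, False], [False, False, True, False])
print(example)  # 3
```

Because only presence per quadrant is counted, the type of finding within a category does not change the score.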
However, the usefulness of NAPSI is limited by interobserver variability and by the effort required to learn and become proficient in scoring2,5. The criteria for judging whether a patient’s nail has psoriasis findings are not fully quantitative and are partially subjective, which causes interobserver variability among individual dermatologists and makes the scoring skill difficult to acquire. In fact, although NAPSI takes only nine values (0–8), the dermatologists’ scores for the images in the original article that proposed NAPSI were spread over a range of five points or more4. This study developed an easy-to-use deep learning-based calculating tool that enables reliable and accurate NAPSI scoring.
Results
For accurate NAPSI scoring, the tool must first be able to recognize the nail accurately. Accordingly, we divided the process into two steps. The first step is nail detection from images (step 1). The second step is NAPSI scoring of the detected nails (step 2). First, training and performance evaluation of each step were conducted. Then, the final performance evaluation of the integrated tool (step 3: the “NAPSI calculator”) was conducted. All code for constructing the system and for the statistical analyses was written in Python version 3.7.7 with PyTorch version 1.10.2.
Step 1. Nail detection
In step 1, the nails in the testing images were detected with a mean average precision (mAP) of 93.8% on average. The performance was stable, with only minute fluctuations throughout the 10 trials (95% CI, 93.4–93.9%) (Fig. 2a). Nails partially cut off at the edges of the image were also detected correctly. Although nails that were small relative to the image size, such as the nail of the little toe, were sometimes undetected (Fig. 2c), the model generally demonstrated good detection accuracy (Fig. 2a, b).
Step 2. NAPSI calculation
The distributions of the number of images for each annotated NAPSI in step 2 are shown in Table 1. On average, NAPSI was calculated with 82.8% (95% CI, 81.7–83.9%) micro average accuracy (Fig. 3a). The proportion of images with zero-point (no error) and one-point error was 43.1% (95% CI, 42.1–44.1%) and 39.6% (95% CI, 38.3–40.9%), respectively. Among the incorrect images, the percentage of those with two-point and more than two-point errors was 12.4% (95% CI, 11.3–13.5%) and 4.8% (95% CI, 4.1–5.6%), respectively (Fig. 3b). In 10 trials, there was little variability in these percentages. The macro average accuracy with error within one point reached 82.7% (95% CI, 81.6–83.8%) (Fig. 3a). The aggregate results of 10 trials showed that the NAPSIs calculated by the deep learning model were concentrated within a one-point error range for each of the annotated NAPSI scores (Fig. 3c). The images analyzed by this model are demonstrated in Fig. 4, which suggests that it could adapt to a mix of a variety of nail matrix and bed psoriasis findings or non-psoriasis findings, such as melanonychia.
Step 3. The “NAPSI calculator”
The distribution of image counts for each annotated NAPSI in step 3 is shown in Table 2. The “NAPSI calculator” detected nails with 99.3% accuracy (137/138 nails). The average micro accuracy over 10 trials was 83.9% (95% CI, 81.6–86.2%) (Fig. 5a). The proportion of images with zero-point error (no error) was 47.5% (95% CI, 46.1–48.9%), and that with one-point error was 36.4% (95% CI, 33.8–39.1%). In contrast, for the six non-board-certified dermatology residents, the average accuracy was 65.7% (95% CI, 54.5–76.9%), with zero-point error (no error) in 32.9% (95% CI, 26.0–39.7%) and one-point error in 32.9% (95% CI, 28.2–37.5%) of images. For the four board-certified dermatologists who were not nail experts, the average accuracy was 73.0% (95% CI, 66.8–79.2%), with zero-point error (no error) in 39.1% (95% CI, 33.9–44.4%) and one-point error in 33.9% (95% CI, 28.9–38.9%) of images. The accuracy of the “NAPSI calculator” was significantly higher than that of the six non-board-certified dermatology residents (p-value: 0.008) and that of the four board-certified dermatologists (p-value: 0.005). In particular, the percentage of images with zero-point error was higher with this tool (Fig. 5a). The heatmap of the aggregate results of 10 trials (Fig. 5b) shows that, as in step 2, the NAPSI scored by the “NAPSI calculator” was concentrated within a one-point error range for each annotated NAPSI score.
In one patient for whom the time course of NAPSI could be obtained, the “NAPSI calculator” assessed NAPSI at each time point similarly to the annotated NAPSI, and correctly indicated a transient exacerbation and a final improvement of the disease status (Fig. 6). The time required for nail detection in one hand image and for NAPSI calculation of one nail image was approximately 0.85 and 0.95 s, respectively, on a laptop with an Intel® Core™ i7 processor (quad-core) and 16 GB of memory.
Grad-CAM was performed to visualize which parts of the image the “NAPSI calculator” focuses on to calculate NAPSI6. In many images, the results suggested that the “NAPSI calculator” focused on the nail psoriasis findings scored in NAPSI (Supplementary Fig. 1a). However, in some images, the model focused on areas without findings, and the results were not sufficient to conclude that the model had adequately learned the nail psoriasis findings (Supplementary Fig. 1b).
Discussion
Deep learning has been applied in various areas of medicine. In dermatology, research using deep learning has progressed in diagnosing malignant tumors, such as melanoma7, and recent studies have used it to diagnose various skin diseases8. Deep learning is also expected to be applied to the various scoring tools used in the medical field, as many of them require examination by a specialist. In the medical management of psoriasis, NAPSI was developed as a valuable tool for evaluating the severity of nail psoriasis. However, its scoring criteria are not fully quantitative and are partially subjective, which causes interobserver variability and requires effort to learn and become proficient2,4. Considering this, scoring NAPSI by deep learning instead of by humans could contribute to reliable scoring, making NAPSI more useful and effective for daily clinical practice.
To date, four studies on NAPSI using deep learning systems have been reported. Each study has its own approach, but issues remain regarding the need for nail image preparation and the verification of NAPSI accuracy. One study achieved excellent accuracy in recognizing each nail psoriasis finding, but it requires a dedicated camera to photograph each nail individually, and the accuracy of the final NAPSI was evaluated by comparison with a single dermatologist on only 10 images9. A second study introduced a model that divides the nail into four quadrants in preparation for detecting nail lesions according to NAPSI, but the accuracy of lesion detection was not sufficient and the final NAPSI was not evaluated10. Another study developed models that use key point detection to identify fingernails and score the modified NAPSI (mNAPSI). However, it requires photographs suited to key point detection, and the accuracy of detection, such as how well it copes with various hand shapes, has not been validated. Moreover, although mNAPSI can reach a maximum of 14 points, the data in that study are skewed, with approximately 80% of the data showing zero or one point and a maximum of six points11. The most recent study identifies all eight NAPSI findings on fingernails one by one using eight object detection models, but cropping of the nail region requires manual work in advance, such as classification and rotation of the images depending on the shape of the hand12. In this study, we aimed to construct a deep learning tool that can easily and accurately calculate NAPSI from clinical photographs taken under general conditions without any special equipment or preparation. It is also available to any doctor, including non-dermatologists, with a simple computer setup for running Python and PyTorch.
We also trained our model using nail images of sufficient amount and disease severity and validated the accuracy of the calculated NAPSI in comparison with four board-certified dermatologists and six non-board-certified residents in dermatology.
Our deep learning system can automatically detect each nail within an image before calculating NAPSI. This is an important process because if this part is missing, each nail area must be manually cropped from the images. Furthermore, cropping out regions other than nails might reduce the risk of overfitting and interference from information other than nail findings. Our tool was able to recognize nails in an image with sufficient accuracy (99.3%, 137/138 nails). This high performance is attributed to the fact that images of the training set collected from Google search were rich in variation such as manicured nails, nail cartoons, and nails on strangely posed fingers, which could increase its generalizability for nail detection.
Because NAPSI takes discrete values from zero to eight, NAPSI calculation was handled as a classification problem, where each class number is not only a class name but also the NAPSI points. However, the cross-entropy loss function does not consider the degree of inaccuracy among classes. Therefore, we multiplied the cross-entropy loss by the squared difference between the calculated and annotated NAPSI. As a result of this learning strategy, the “NAPSI calculator” generally scored NAPSI within a one-point error for every annotated NAPSI score, and it reached higher accuracy with lower variance than the six non-board-certified residents in dermatology (83.9% vs 65.7%) and the four board-certified dermatologists (83.9% vs 73.0%), indicating that the “NAPSI calculator” is not only highly valid but also reproducible and reliable (Fig. 5a, b). The “NAPSI calculator” also correctly evaluated a 16-month NAPSI course and suggested improved disease severity in one patient (Fig. 6). While the dermatologists who annotated the NAPSI were nail experts, the six and four doctors used for comparison were non-board-certified dermatology residents and board-certified dermatologists who were not nail experts, respectively. Therefore, the superiority of the “NAPSI calculator” could be attributed to the difference in experience in managing nail psoriasis.
Overall, the performance of the “NAPSI calculator” was reasonably good, but it has several limitations. First, images with an error of two or more points between the annotated and calculated NAPSI accounted for 17.2% of all images. These included challenging cases in which NAPSI scoring was difficult because the psoriasis findings were centrally located despite their small distribution area, or because the nail matrix findings were too strong to assess the nail bed findings appropriately. This suggests that cases that are difficult for dermatologists are also difficult for deep learning. Second, there is a bias in the variation of nail psoriasis images. The ratio of severe cases among our patients with nail psoriasis may be higher than that in the general psoriasis patient population. These sampling biases might have caused overfitting to our hospital’s nail psoriasis images (Table 1). Third, our model is designed to differentiate between the nail matrix and nail bed features that make up NAPSI, but it is not intended to identify each feature in detail. Ideally, NAPSI scoring requires a precise assessment of all eight findings. However, although NAPSI is calculated by summing the nail matrix and the nail bed scores, these two scores are determined by the distribution of findings rather than by the specific type of each feature (Fig. 1). Given the high accuracy achieved by our model, it may not be necessary for the model to differentiate each of the eight findings in detail to achieve accurate NAPSI calculations. Fourth, the Grad-CAM results suggest that the “NAPSI calculator” may not yet have fully learned the findings of nail psoriasis and NAPSI, leaving room for improvement in accuracy (Supplementary Fig. 1).
This study showed that our deep learning tool, the “NAPSI calculator”, enables reliable and accurate NAPSI scoring, which overcomes the interobserver variability of NAPSI and the effort required to learn and become proficient in it. It can be readily used in a clinical situation and may enhance the usefulness of NAPSI and promote its use, contributing to raising the general level of medical practice for nail psoriasis.
Methods
Step 1
Images and annotation
In total, 995 hand and foot images were collected from Google searches. Search keywords were adopted from the names of body parts (“fingers”, “fingernails”, and “hands”), nail diseases considered to be relatively frequent (“onychomycosis”, “Beau’s lines”, “melanonychia”, “nail dystrophy”, “nail lichen”, “nail psoriasis”, “onychomadesis”, “twenty nail dystrophy”, and “yellow nail syndrome”), and several Japanese words meaning nails or nail diseases (“指” (“yubi”): fingers, “爪” (“tsume”): nails, “手” (“te”): hands, “爪白癬” (“tsume-hakusen”): onychomycosis, “爪乾癬” (“tsume-kansen”): nail psoriasis, “爪扁平苔癬” (“tsume-henpeitaisen”): nail lichen, “爪甲色素線条” (“soukou-sikisosenjou”): melanonychia, and “爪甲剥離” (“soukou-hakuri”): onycholysis). For each model construction, these images were randomly divided into 700 (70%) and 295 (30%) images for the training and validation sets, respectively. For the test set, we used 881 hand and foot images collected from 78 of the 90 patients in our institute (Fig. 7). These 78 were patients with nail psoriasis who had visited our institute between November 2012 and May 2020. After color normalization, flipping, rotation, and color perturbation were performed on the training data to enhance the versatility of the models. The nail areas were annotated as the coordinates of the vertices of the rectangle (bounding box) surrounding each nail.
Model architecture and loss function
A Single Shot MultiBox Detector with an ImageNet pre-trained VGG16 as its backbone was used13,14. The candidate rectangles for nail regions were selected using the Jaccard index, which here measures the similarity to the annotated bounding boxes surrounding the nails, with the threshold set at 0.5. The smooth L1 loss, a loss function frequently used in object detection tasks, was used to calculate the regression loss for the locations between correct and predicted rectangles14. The cross-entropy loss function, a loss function commonly used for classification tasks, was adopted to classify whether the detected object is a nail. The batch size was set to 16, the learning rate to 0.001, the momentum to 0.9, and the weight decay to 0.0005.
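As an illustration of the two quantities named above, the following sketch computes the Jaccard index (intersection over union) of two boxes and the smooth L1 loss for a single coordinate. It is a generic textbook formulation rather than the code used in this study; the `(xmin, ymin, xmax, ymax)` box format, the function names, and the example boxes are assumptions made for the example.

```python
def jaccard_index(box_a, box_b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def smooth_l1(predicted, target, beta=1.0):
    """Smooth L1 loss for one coordinate: quadratic near zero, linear elsewhere."""
    diff = abs(predicted - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta


# Hypothetical annotated nail box and candidate box (pixel coordinates).
annotated = (10, 10, 50, 50)
candidate = (20, 20, 60, 60)
overlap = jaccard_index(annotated, candidate)
is_match = overlap >= 0.5  # threshold of 0.5, as stated in the text
print(round(overlap, 3), is_match)
```

In this example the overlap falls below the 0.5 threshold, so the candidate would not be treated as matching the annotated nail.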
Evaluation metric for the performance
The performance was evaluated using the mean average precision (mAP) which is calculated as the area under the precision-recall curve15,16. In Step 1, the model was evaluated 10 times with 500 epochs of training, using a newly split training/validation set and an initialized network parameter each time, and the mean of mAPs was calculated with 95% confidence intervals.
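For a single object class such as “nail”, the mAP reduces to the average precision (AP) of that class. The sketch below shows one common way to compute AP as the area under the precision-recall curve, using all-point interpolation. The text does not state which interpolation was used, so this choice, the function name, and the toy detections are assumptions.

```python
def average_precision(detections, num_ground_truth):
    """Area under the precision-recall curve for one class.

    detections: list of (confidence, is_true_positive) pairs, one per detected box.
    num_ground_truth: number of annotated boxes in the evaluated images.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    true_pos = false_pos = 0
    recalls, precisions = [], []
    for _, is_true_positive in ranked:
        if is_true_positive:
            true_pos += 1
        else:
            false_pos += 1
        recalls.append(true_pos / num_ground_truth)
        precisions.append(true_pos / (true_pos + false_pos))
    # Replace each precision by the maximum precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    area, previous_recall = 0.0, 0.0
    for recall, precision in zip(recalls, precisions):
        area += (recall - previous_recall) * precision
        previous_recall = recall
    return area


# Toy example: three detections, two annotated nails.
toy_ap = average_precision([(0.9, True), (0.8, False), (0.7, True)], num_ground_truth=2)
print(round(toy_ap, 4))  # 0.8333
```

Whether a detection counts as a true positive would be decided beforehand, for example with an overlap threshold on the Jaccard index.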
Step 2
Images and annotation
First, 3783 nail images were manually cropped out from the 881 hand and foot images that were used for the test set in step 1. Among them, toe images were excluded because mechanical stress easily deforms toenails, and it was challenging even for trained dermatologists to score NAPSI correctly. Extremely low-quality images with a resolution of less than 250×250 pixels were also excluded. Finally, we prepared 2939 nail images. These were divided into the “trainval” (i.e., training and validation) set and the testing set, with no patient appearing in both. For each model construction, the “trainval” images were randomly divided into the training and validation sets, resulting in three datasets at a ratio of approximately 70%, 15%, and 15% for training, validation, and testing, respectively (Fig. 7). The images in the training set were resized to 256×256 and then flipped horizontally with a 50% probability and rotated by a maximum of 45 degrees. The color was normalized by the following parameters: mean = (0.485, 0.456, 0.406), and standard deviation = (0.229, 0.224, 0.225). For the validation set, only resizing and color normalization were performed. NAPSI was annotated by nine board-certified dermatologists who have also been in charge of a specialized clinic for nail diseases for at least two years and have sufficient experience in managing nail psoriasis. For each image, nail matrix NAPSI, nail bed NAPSI, and NAPSI (the sum of the two) were annotated.
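Keeping all images of a patient on the same side of the split, as described above, can be sketched as follows. This is only one possible way to do it, not the procedure used in this study; the function name, the greedy filling strategy, and the synthetic image and patient identifiers are assumptions.

```python
import random
from collections import defaultdict


def split_by_patient(image_to_patient, test_fraction=0.15, seed=0):
    """Split image ids into (trainval, test) so that no patient appears in both."""
    images_by_patient = defaultdict(list)
    for image_id, patient_id in image_to_patient.items():
        images_by_patient[patient_id].append(image_id)
    patients = sorted(images_by_patient)
    random.Random(seed).shuffle(patients)
    target_test_size = test_fraction * len(image_to_patient)
    trainval, test = [], []
    for patient_id in patients:
        # Fill the test set patient by patient until it reaches the target size.
        bucket = test if len(test) < target_test_size else trainval
        bucket.extend(images_by_patient[patient_id])
    return trainval, test


# Synthetic example: 20 hypothetical patients with 5 nail images each.
images = {f"p{p:02d}_nail{n}": f"p{p:02d}" for p in range(20) for n in range(5)}
trainval_ids, test_ids = split_by_patient(images)
print(len(trainval_ids), len(test_ids))  # 85 15
```

The “trainval” part could then be split again at random into training and validation sets for each model construction.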
Model architecture and loss function
In NAPSI scoring, nail matrix and nail bed psoriasis are evaluated separately and the points for each are summed. Thus, two models for nail matrix and nail bed were constructed using the VGG16 pre-trained with ImageNet13. These models classified nail images into five classes (0–4). The numbers of the two classes were then summed as NAPSI. We handled NAPSI calculation as a nine-level severity classification problem (0–8). In a multi-class classification problem, the cross-entropy loss function is usually used. However, it deals with each class equally and does not distinguish an incorrect value that is close to the annotated NAPSI from another value that is far from it. To make the loss large when the calculated NAPSI is far from the annotated NAPSI, we multiplied the cross-entropy loss by the squared difference between them. The batch size, learning rate, and momentum were set to 16, 0.0001, and 0.9, respectively.
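The distance weighting described above can be sketched for a single sample as follows. This is one literal reading of the description, not the authors' implementation: it assumes that the “calculated” score is the class with the highest logit and that the weight is applied per sample. Under this reading a correct prediction receives a weight of zero; the text does not say whether an offset was added, so that detail remains an open assumption. The function name and the logits are hypothetical.

```python
import math


def distance_weighted_cross_entropy(logits, target):
    """Cross-entropy for one sample, scaled by (predicted class - target class) ** 2."""
    max_logit = max(logits)
    # Numerically stable log-sum-exp for the softmax normalizer.
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    cross_entropy = log_sum_exp - logits[target]
    predicted = max(range(len(logits)), key=lambda k: logits[k])
    return (predicted - target) ** 2 * cross_entropy


# Hypothetical logits of a five-class (0-4) model that predicts class 2.
logits = [0.1, 0.2, 3.0, 0.1, 0.0]
near_miss = distance_weighted_cross_entropy(logits, target=3)  # off by one point
far_miss = distance_weighted_cross_entropy(logits, target=0)   # off by two points
print(near_miss < far_miss)  # True
```

In this example the plain cross-entropy is identical for both targets, so the two-point error is penalized exactly four times as much as the one-point error.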
Evaluation metric for the performance
Because the number of images for each score was not evenly distributed, both micro and macro average accuracy were used to evaluate step 2. The micro average accuracy was calculated as the number of correctly scored cases divided by the total number of cases in the dataset. The macro average accuracy was calculated as the mean of the accuracy of each class. In this model, when the error between the calculated and annotated NAPSI was within one point, we handled it as accurate. Model construction with 200 epochs of training was repeated 10 times, and the mean of the accuracies was calculated with 95% confidence intervals. In each construction, the training/validation set was newly split and the network parameters were initialized.
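The two averages defined above, with the one-point tolerance, can be sketched as follows. The function name and the toy scores are hypothetical and are not data from this study.

```python
from collections import defaultdict


def micro_macro_accuracy(annotated, calculated, tolerance=1):
    """Return (micro, macro) accuracy, counting errors within `tolerance` as correct."""
    hits = defaultdict(int)    # correct cases per annotated score
    totals = defaultdict(int)  # all cases per annotated score
    for truth, prediction in zip(annotated, calculated):
        totals[truth] += 1
        if abs(prediction - truth) <= tolerance:
            hits[truth] += 1
    micro = sum(hits.values()) / sum(totals.values())
    macro = sum(hits[score] / totals[score] for score in totals) / len(totals)
    return micro, macro


# Toy example with an over-represented score of 0.
annotated_scores = [0, 0, 0, 0, 4, 8]
calculated_scores = [0, 1, 0, 2, 4, 5]
micro, macro = micro_macro_accuracy(annotated_scores, calculated_scores)
print(round(micro, 3), round(macro, 3))  # 0.667 0.583
```

The micro average is dominated by the frequent scores, whereas the macro average gives every annotated score the same weight, which is why both are informative for an unevenly distributed dataset.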
Step 3: the “NAPSI calculator”
Images and annotation
We prepared 27 images of hands and two images of fingers from 12 of the 90 patients with nail psoriasis. Some patients had been photographed at multiple time points of different disease severity. These 29 images included 138 nails. These patients were diagnosed with nail psoriasis between June 2020 and November 2021 and were not involved in the training and validation set of steps 1 and 2 (Fig. 7). NAPSI was annotated by nine board-certified dermatologists with the same procedure as step 2.
Evaluation metric for the model’s performance
We evaluated 10 “NAPSI calculators”, each combining the nail-locating model with the highest performance in step 1 with one of the 10 models obtained after 200 epochs of training in step 2. Six non-board-certified residents in dermatology and four board-certified dermatologists, different from the nine nail experts who annotated NAPSI, also scored NAPSI for comparison. The mean micro accuracy of the “NAPSI calculator” was compared with that of each group of doctors (the six residents and the four board-certified dermatologists) using Welch’s t-test. A p-value < 0.05 was considered significant.
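For reference, the statistic behind Welch’s t-test can be sketched with the standard library alone. The sketch returns the t statistic and the Welch-Satterthwaite degrees of freedom; the function name is hypothetical and the accuracy values below are invented for illustration, not data from this study.

```python
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for Welch's unequal-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_mean_a = variance(sample_a) / n_a  # squared standard error of mean A
    var_mean_b = variance(sample_b) / n_b  # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / (var_mean_a + var_mean_b) ** 0.5
    df = (var_mean_a + var_mean_b) ** 2 / (
        var_mean_a ** 2 / (n_a - 1) + var_mean_b ** 2 / (n_b - 1)
    )
    return t, df


# Invented accuracies: 10 model runs versus 6 human raters.
tool_accuracy = [0.84, 0.82, 0.86, 0.83, 0.85, 0.84, 0.81, 0.86, 0.85, 0.83]
rater_accuracy = [0.60, 0.72, 0.55, 0.70, 0.66, 0.71]
t_stat, dof = welch_t(tool_accuracy, rater_accuracy)
print(t_stat > 0, dof < len(tool_accuracy) + len(rater_accuracy) - 2)
```

Turning the statistic into a p-value requires the Student t distribution; if SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the whole test in one call.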
Data availability
The datasets generated and/or analyzed in Step 1 during the current study are available in the KDerm/NAPSI_calculator/nail_detection repository, https://github.com/KDerm/NAPSI_calculator/tree/main/nail_detection. The datasets generated and/or analyzed in Steps 2 and 3 are not publicly available due to the form of consent but are available from the corresponding author on reasonable request.
Code availability
The underlying code for this study is not publicly available but may be made available to qualified researchers on reasonable request from the corresponding author.
References
Klaassen, K. M., van de Kerkhof, P. C. & Pasch, M. C. Nail psoriasis: a questionnaire-based survey. Br. J. Dermatol. 169, 314–319 (2013).
Klaassen, K. M. et al. Scoring nail psoriasis. J. Am. Acad. Dermatol. 70, 1061–1066 (2014).
Sobolewski, P., Walecka, I. & Dopytalska, K. Nail involvement in psoriatic arthritis. Reumatologia 55, 131–135 (2017).
Rich, P. & Scher, R. K. Nail psoriasis severity index: a useful tool for evaluation of nail psoriasis. J. Am. Acad. Dermatol. 49, 206–212 (2003).
Aktan, S., Ilknur, T., Akin, C. & Ozkan, S. Interobserver reliability of the Nail Psoriasis Severity Index. Clin. Exp. Dermatol. 32, 141–144 (2007).
Selvaraju, R. R. et al. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. Preprint at https://arxiv.org/abs/1610.02391 (2019).
Esteva, A. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 115–118 (2017).
Liu, Y. et al. A deep learning system for differential diagnosis of skin diseases. Nat. Med. 26, 900–908 (2020).
Hsieh, K. Y. et al. A mask R-CNN based automatic assessment system for nail psoriasis severity. Comput. Biol. Med. 143, 105300 (2022).
Ji, B., Wang, Y. & Zuo, D. Automatic detection and evaluation of nail psoriasis based on deep learning: a preliminary application and exploration. Int. Conf. Computer Application Inf. Security 12260, 311–317 (2022).
Folle, L. et al. DeepNAPSI multi-reader nail psoriasis prediction using deep learning. Sci. Rep. 13, 5329 (2023).
Paik, K., Kim, B. R. & Youn, S. W. Automatic evaluation of Nail Psoriasis Severity Index using deep learning algorithm. J. Dermatol. https://doi.org/10.1111/1346-8138.17313 (2024).
Simonyan, K. & Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. Preprint at https://arxiv.org/abs/1409.1556 (2014).
Liu, W. et al. SSD: Single Shot MultiBox Detector. Preprint at https://arxiv.org/abs/1512.02325 (2016).
Padilla, R., Passos, W. L., Dias, T. L. B., Netto, S. L. & da Silva, E. A. B. A comparative analysis of object detection metrics with a companion open-source toolkit. Electronics 10, 279 (2021).
Padilla, R., Netto, S. L. & da Silva, E. A. B. A survey on performance metrics for object-detection algorithms. In International Conference on Systems, Signals and Image Processing (IWSSIP). 237–242 (IWSSIP, 2020).
Acknowledgements
We would like to thank R.H., J.M., K.M., R.K., H.M., Y.K., Y.M., M.T., G.T., K.H., T.M., K.N., K.K., S.T., Y.A., and U.T. for scoring NAPSI. All of the persons listed here are dermatologists working at Keio University Hospital. No funding was granted for this study.
Author information
Contributions
H.H. and M.S. conceived and planned the study. M.S. supervised the study. H.H. wrote the manuscript. K.T., N.N., J.S., and M.A. discussed the results. All authors reviewed and approved the final manuscript.
Ethics declarations
Competing interests
Authors H.H., K.T., N.N., M.A., and M.S. declare no financial or non-financial competing interests. Author J.S. serves as an associate editor of this journal and had no role in the peer-review or decision to publish this manuscript. Author J.S. declares no financial competing interests.
Ethics approval and consent to participate
Reviewed and approved by Keio University School of Medicine Ethics Committee; approval #20150326. This study uses clinical information and images based on the opt-out consent and does not include recognizable patient photographs or other identifiable material.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Horikawa, H., Tanese, K., Nonaka, N. et al. Reliable and easy-to-use calculating tool for the Nail Psoriasis Severity Index using deep learning. npj Syst Biol Appl 10, 130 (2024). https://doi.org/10.1038/s41540-024-00458-x
DOI: https://doi.org/10.1038/s41540-024-00458-x