Introduction

Psoriasis is a common inflammatory skin disease, and 10–80% of patients develop nail involvement, termed nail psoriasis1. Nail psoriasis is a disfiguring cosmetic problem and restricts daily activities, severely impairing the patient’s quality of life2. Furthermore, it is a risk factor for the development of psoriatic arthritis3. Accordingly, early therapeutic intervention based on reliable and reproducible evaluation is critical in medical practice.

Nail psoriasis produces various nail changes through inflammation of the nail matrix and nail bed; the former causes pitting, leukonychia, red spots in the lunula, and crumbling, and the latter causes oil drop discoloration, onycholysis, nail bed hyperkeratosis, and splinter hemorrhage (Fig. 1a). Based on these eight representative findings, the Nail Psoriasis Severity Index (NAPSI) was proposed as a scoring tool for nail psoriasis severity. NAPSI scoring divides the nail into four quadrants with a horizontal and a longitudinal line. In each quadrant, nail matrix and nail bed psoriasis are each evaluated for the presence of any of the four corresponding findings, and NAPSI is calculated by summing these points (Fig. 1b)4.
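To make the rule concrete, the following hypothetical sketch (not part of the tool described later) computes a nail’s NAPSI from the findings observed in its four quadrants; the finding names are illustrative labels only.

```python
from typing import List, Set

# Hypothetical illustration of the NAPSI scoring rule described above (not part of the
# tool itself). Each of the four quadrants contributes one nail matrix point if it shows
# any matrix finding and one nail bed point if it shows any bed finding.
MATRIX_FINDINGS = {"pitting", "leukonychia", "red_spots_in_lunula", "crumbling"}
BED_FINDINGS = {"oil_drop", "onycholysis", "hyperkeratosis", "splinter_hemorrhage"}

def napsi(quadrant_findings: List[Set[str]]) -> int:
    """quadrant_findings: the findings observed in each of the four quadrants of one nail."""
    matrix_score = sum(1 for quadrant in quadrant_findings if quadrant & MATRIX_FINDINGS)
    bed_score = sum(1 for quadrant in quadrant_findings if quadrant & BED_FINDINGS)
    return matrix_score + bed_score  # 0-8 per nail

# Example: pitting in three quadrants and onycholysis in one quadrant gives 3 + 1 = 4.
print(napsi([{"pitting"}, {"pitting"}, {"pitting", "onycholysis"}, set()]))
```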

Fig. 1: Findings of nail psoriasis and example of NAPSI scoring.
figure 1

a Eight findings of nail psoriasis. The upper and lower four findings are nail matrix and nail bed psoriasis, respectively. b An example of NAPSI scoring. In each quadrant of the nail, nail matrix and nail bed psoriasis are evaluated for the presence of any of the findings. Note that multiple nail matrix findings (or nail bed findings) in one quadrant count as one point for the nail matrix NAPSI (or nail bed NAPSI), regardless of their type or number.

However, the usefulness of NAPSI is limited by interobserver variability and by the effort required to learn and apply the score2,5. The criteria for judging whether a nail shows psoriasis findings are only partly quantitative and partially subjective, causing variability among individual dermatologists and making the score difficult to master. In fact, although NAPSI takes only nine values (0–8), the dermatologists’ scores for the images in the original article that proposed NAPSI were spread over a range of five points or more4. In this study, we developed an easy-to-use deep learning-based tool that enables reliable and accurate NAPSI scoring.

Results

For accurate NAPSI scoring, the tool must first recognize the nails accurately. Accordingly, we divided the process into two steps: nail detection from images (step 1) and NAPSI scoring of the detected nails (step 2). Training and performance evaluation were first conducted for each step separately, and the final performance of the integrated tool (step 3: the “NAPSI calculator”) was then evaluated. All code for constructing the system and for the statistical analyses was written in Python version 3.7.7 and PyTorch version 1.10.2.
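Conceptually, the integrated tool simply chains these two steps. The sketch below illustrates the composition; `detect_nails` and `score_napsi` are hypothetical stand-ins for the step 1 detector and the step 2 classifier, not functions from the published code.

```python
from typing import Callable, List, Tuple

from PIL import Image

# Hypothetical composition of the two steps. `detect_nails` stands in for the step 1
# object detector (returns bounding boxes of nails) and `score_napsi` for the step 2
# classifier (returns a NAPSI score of 0-8 for one cropped nail image).
def calculate_napsi(
    image: Image.Image,
    detect_nails: Callable[[Image.Image], List[Tuple[int, int, int, int]]],
    score_napsi: Callable[[Image.Image], int],
) -> List[int]:
    boxes = detect_nails(image)                   # step 1: locate every nail
    crops = [image.crop(box) for box in boxes]    # crop each detected nail
    return [score_napsi(crop) for crop in crops]  # step 2: NAPSI per nail
```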

Step 1. Nail detection

In step 1, the nails in the test images were detected with a mean average precision (mAP) of 93.8% on average. The performance was stable across the 10 trials, with only minute fluctuations (95% CI, 93.4–93.9%) (Fig. 2a). Nails partially out of the frame at the edges of the image were also detected correctly. Although nails that were small relative to the image size, such as the toenails of the little toe, were sometimes missed (Fig. 2c), the model generally demonstrated good detection accuracy (Fig. 2a, b).

Fig. 2: Performance in step 1.
figure 2

a Mean average precisions in step 1. b Representative result images in step 1. Nails, including those partially out of the frame at the edges of the image, were detected correctly. c Representative images in which little toenails were not detected. The number on each detection box indicates the predicted probability that the region is a nail. mAP, mean average precision.

Step 2. NAPSI calculation

The distribution of the number of images for each annotated NAPSI score in step 2 is shown in Table 1. On average, NAPSI was calculated with a micro average accuracy of 82.8% (95% CI, 81.7–83.9%) (Fig. 3a). The proportions of images with zero-point (no error) and one-point errors were 43.1% (95% CI, 42.1–44.1%) and 39.6% (95% CI, 38.3–40.9%), respectively. Among the incorrect images, the percentages with two-point and more than two-point errors were 12.4% (95% CI, 11.3–13.5%) and 4.8% (95% CI, 4.1–5.6%), respectively (Fig. 3b). These percentages varied little across the 10 trials. The macro average accuracy with errors within one point reached 82.7% (95% CI, 81.6–83.8%) (Fig. 3a). The aggregate results of the 10 trials showed that the NAPSI scores calculated by the deep learning model were concentrated within a one-point error range for each annotated NAPSI score (Fig. 3c). Images analyzed by this model are shown in Fig. 4, suggesting that it could handle mixtures of various nail matrix and nail bed psoriasis findings as well as non-psoriasis findings, such as melanonychia.

Table 1 Distribution of the number of images for each NAPSI score (step 2)
Fig. 3: Performance in step 2.
figure 3

a Micro and macro average accuracies in step 2. b Proportion of each error point group in the micro average. Error bars indicate 95% confidence intervals. c Heatmap of the aggregated results of 10 trials. DLM, NAPSI proposed by the deep learning model; Label, annotated NAPSI.

Fig. 4: Representative result images in step 2.
figure 4

Representative images proposed by one of the models in step 2. DLM, NAPSI proposed by the deep learning model; Label, annotated NAPSI.

Step 3. The “NAPSI calculator”

The distribution of the number of images for each annotated NAPSI score in step 3 is shown in Table 2. The “NAPSI calculator” detected nails with 99.3% accuracy (137/138 nails). The average micro accuracy over 10 trials was 83.9% (95% CI, 81.6–86.2%) (Fig. 5a). The proportion of images with a zero-point error (no error) was 47.5% (95% CI, 46.1–48.9%), and that with a one-point error was 36.4% (95% CI, 33.8–39.1%). In comparison, the average accuracy of NAPSI scored by the six non-board-certified dermatology residents was 65.7% (95% CI, 54.5–76.9%), with zero-point errors in 32.9% (95% CI, 26.0–39.7%) and one-point errors in 32.9% (95% CI, 28.2–37.5%) of images. For the four board-certified dermatologists who were not nail experts, the average accuracy was 73.0% (95% CI, 66.8–79.2%), with zero-point errors in 39.1% (95% CI, 33.9–44.4%) and one-point errors in 33.9% (95% CI, 28.9–38.9%) of images. The accuracy of the “NAPSI calculator” was significantly higher than that of the six non-board-certified dermatology residents (p = 0.008) and the four board-certified dermatologists (p = 0.005). In particular, the percentage of images with zero-point errors was higher for this tool (Fig. 5a). The heatmap of the aggregate results of the 10 trials is shown in Fig. 5b; as in step 2, the NAPSI scores calculated by the “NAPSI calculator” were concentrated within a one-point error range for each annotated NAPSI score.

Table 2 Distribution of the number of images for each NAPSI score (step 3)
Fig. 5: Performance of the “NAPSI calculator”.
figure 5

a Accuracies of NAPSI proposed by the “NAPSI calculator”, the four board-certified dermatologists, and the six non-board-certified dermatology residents. b Heatmap of aggregated results of 10 trials. DLM, NAPSI calculated by the “NAPSI calculator”; BCD, NAPSI calculated by the board-certified dermatologists; non-BCD, NAPSI calculated by the non-board-certified dermatology residents; Label, annotated NAPSI.

In one patient for whom a time course of NAPSI was available, the “NAPSI calculator” assessed NAPSI at each time point similarly to the annotated NAPSI and correctly captured a transient exacerbation followed by a final improvement in disease status (Fig. 6). The time required for nail detection in one hand image and for NAPSI calculation of one nail image was approximately 0.85 and 0.95 s, respectively, on a laptop with an Intel® Core™ i7 processor (quad-core) and 16 GB of memory.

Fig. 6: Representative images of the “NAPSI calculator”.
figure 6

Analysis results of the “NAPSI calculator” for one patient’s left hand. DLM, NAPSI calculated by the “NAPSI calculator”; Label, annotated NAPSI.

Grad-CAM was performed to visualize which parts of each image the “NAPSI calculator” focuses on when calculating NAPSI6. In many images, the “NAPSI calculator” appeared to focus on the nail psoriasis findings that constitute NAPSI (Supplementary Fig. 1a). However, in some images the model focused on areas lacking findings, so the results were not sufficient to conclude that the model had adequately learned the nail psoriasis findings (Supplementary Fig. 1b).
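For reference, Grad-CAM can be reproduced with standard PyTorch hooks on the last convolutional layer of a VGG16 classifier. The sketch below is illustrative only and is not the exact implementation used in this study; the 5-class head mirrors the step 2 models described in the Methods.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Minimal Grad-CAM sketch for a VGG16-based classifier (illustrative only).
model = models.vgg16(pretrained=True)
model.classifier[6] = torch.nn.Linear(4096, 5)  # 5-class head (scores 0-4), as in step 2
model.eval()

activations, gradients = {}, {}
target_layer = model.features[28]  # last convolutional layer of VGG16
target_layer.register_forward_hook(lambda m, i, o: activations.update(a=o))
target_layer.register_full_backward_hook(lambda m, gi, go: gradients.update(g=go[0]))

def grad_cam(image_tensor: torch.Tensor, target_class: int) -> torch.Tensor:
    """image_tensor: normalized (1, 3, H, W) nail image; returns an (H, W) heatmap in [0, 1]."""
    logits = model(image_tensor)
    model.zero_grad()
    logits[0, target_class].backward()
    weights = gradients["g"].mean(dim=(2, 3), keepdim=True)  # global-average-pooled gradients
    cam = F.relu((weights * activations["a"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image_tensor.shape[2:], mode="bilinear", align_corners=False)
    return (cam / (cam.max() + 1e-8)).squeeze()
```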

Discussion

Deep learning has been applied in various areas of medicine. In dermatology, research using deep learning has progressed in diagnosing malignant tumors such as melanoma7, and recent studies have used it to diagnose various skin diseases8. Deep learning is also expected to support the many clinical scoring tools that currently require professional expertise to apply. In the medical management of psoriasis, NAPSI was developed as a valuable tool for evaluating the severity of nail psoriasis. However, its scoring criteria are only partly quantitative and partially subjective, causing interobserver variability and requiring effort to learn and apply2,4. Considering this, scoring NAPSI with deep learning instead of human raters could contribute to reliable scoring, making NAPSI more useful and effective for daily clinical practice.

To date, four studies on NAPSI using deep learning systems have been reported. Each has its unique approach, but issues remain regarding the preparation of nail images and the verification of NAPSI accuracy. One study achieved excellent accuracy in recognizing each nail psoriasis finding, but it requires a dedicated camera to photograph each nail individually, and the accuracy of the final NAPSI was evaluated against a single dermatologist on only 10 images9. A second study proposed a model for dividing the nail into four quadrants, assuming subsequent detection of nail lesions according to NAPSI, but the accuracy of lesion detection was insufficient and the final NAPSI was not evaluated10. Another study developed models that use key point detection to identify fingernails and score the modified NAPSI (mNAPSI). However, photography tailored to the key point detection is required, the detection accuracy (for example, how well it handles various hand shapes) has not been validated, and although mNAPSI can reach a maximum of 14 points, the data in that study were skewed, with approximately 80% of the images scoring zero or one point and a maximum of six points11. The most recent study identifies all eight NAPSI findings on fingernails one by one using eight object detection models, but cropping of the nail regions requires manual work beforehand, such as classification and rotation of the images depending on the shape of the hand12. In this study, we aimed to construct a deep learning tool that can easily and accurately calculate NAPSI from clinical photographs taken under routine conditions, without any special equipment or preparation. It is also available to any doctor, including non-dermatologists, with a simple computer setup running Python and PyTorch. We trained our model on a sufficient number of nail images spanning a range of disease severity and validated the accuracy of the calculated NAPSI against four board-certified dermatologists and six non-board-certified dermatology residents.

Our deep learning system automatically detects each nail in an image before calculating NAPSI. This is an important step because without it, each nail area would have to be cropped manually from the images. Furthermore, cropping away regions other than nails might reduce the risk of overfitting and of interference from information unrelated to nail findings. Our tool recognized nails in an image with sufficient accuracy (99.3%, 137/138 nails). We attribute this high performance to the training set collected from Google searches, which was rich in variation, such as manicured nails, nail cartoons, and nails on unusually posed fingers, and which could increase generalizability for nail detection.

Because NAPSI takes discrete values from zero to eight, we handled NAPSI calculation as a classification problem in which each class label is also the NAPSI score. However, the cross-entropy loss function does not consider how far an incorrect class is from the correct one. Therefore, we multiplied the cross-entropy loss by the squared difference between the calculated and annotated NAPSI. With this learning strategy, the “NAPSI calculator” generally scored NAPSI within a one-point error for every annotated score and reached higher accuracy, with lower variance, than the six non-board-certified dermatology residents (83.9% vs 65.7%) and the four board-certified dermatologists (83.9% vs 73.0%), indicating that the “NAPSI calculator” is not only highly valid but also reproducible and reliable (Fig. 5a, b). The “NAPSI calculator” also correctly evaluated a 16-month NAPSI course and indicated improved disease severity in one patient (Fig. 6). Whereas the dermatologists who annotated the NAPSI were nail experts, the compared raters were non-board-certified dermatology residents and board-certified dermatologists who were not nail experts. Therefore, the superiority of the “NAPSI calculator” could be attributed to differences in experience in managing nail psoriasis.

Overall, the performance of the “NAPSI calculator” was reasonably good, but several limitations remain. First, images with an error of two or more points between the annotated and calculated NAPSI accounted for 17.2% of all images. These included challenging cases in which NAPSI scoring was difficult because the psoriasis findings were centrally located despite a small distribution area, or the nail matrix findings were too pronounced to assess the nail bed findings appropriately. This suggests that cases that are difficult for dermatologists are also difficult for deep learning. Second, there is a bias in the variation of nail psoriasis images: the proportion of severe cases among our patients may be higher than in the general psoriasis population. This sampling bias might have caused overfitting to our hospital’s nail psoriasis images (Table 1). Third, our model is designed to distinguish the nail matrix findings from the nail bed findings that make up NAPSI, but it is not intended to identify each finding individually. Ideally, NAPSI scoring requires a precise assessment of all eight findings. However, because NAPSI is calculated by summing the nail matrix and nail bed scores, these two scores are determined by the distribution of findings rather than by the specific type of each finding (Fig. 1). Given the high accuracy achieved by our model, differentiating each of the eight findings in detail may not be necessary for accurate NAPSI calculation. Fourth, the Grad-CAM results suggest that the “NAPSI calculator” may not yet fully capture the findings of nail psoriasis and NAPSI, leaving room for improvement in accuracy (Supplementary Fig. 1).

This study showed that our deep learning tool, the “NAPSI calculator”, enables reliable and accurate NAPSI scoring, overcoming the interobserver variability and the effort required for learning and applying the score. It can be used readily in clinical settings and may enhance the usefulness of NAPSI and promote its use, contributing to a higher general standard of medical practice for nail psoriasis.

Methods

Step 1

Images and annotation

In total, 995 hand and foot images were collected from Google searches. The search keywords were names of body parts (“fingers”, “fingernails”, and “hands”), relatively frequent nail diseases (“onychomycosis”, “Beau’s lines”, “melanonychia”, “nail dystrophy”, “nail lichen”, “nail psoriasis”, “onychomadesis”, “twenty nail dystrophy”, and “yellow nail syndrome”), and several Japanese words meaning nails or nail diseases (“指” (“yubi”): fingers, “爪” (“tsume”): nails, “手” (“te”): hands, “爪白癬” (“tsume-hakusen”): onychomycosis, “爪乾癬” (“tsume-kansen”): nail psoriasis, “爪扁平苔癬” (“tsume-henpeitaisen”): nail lichen planus, “爪甲色素線条” (“soukou-sikisosenjou”): melanonychia, and “爪甲剥離” (“soukou-hakuri”): onycholysis). For each model construction, these images were randomly divided into 700 (70%) training and 295 (30%) validation images. For the test set, we used 881 hand and foot images collected from 78 of the 90 patients in our institute (Fig. 7). These 78 patients had nail psoriasis and had visited our institute between November 2012 and May 2020. After color normalization, flipping, rotation, and color perturbation were applied to the training data to improve the generalizability of the models. Nail areas were annotated as the vertex coordinates of the rectangles (bounding boxes) surrounding the nails.

Fig. 7: Diagram of data aggregation.
figure 7

The dataset includes 995 hand or foot images collected by Google search and 881 hand or foot images from 78 patients with nail psoriasis at Keio University Hospital. For the training in step 1, the 995 images were divided into 700 and 295 images for the training and validation sets, respectively, and the 881 patient images were used as the test set. In step 2, 2939 fingernail images were prepared and divided into 2000, 470, and 469 images for the training, validation, and test sets, respectively. For the verification of the “NAPSI calculator”, 29 hand images including 138 nails were prepared from 12 patients with nail psoriasis. NAPSI, Nail Psoriasis Severity Index.

Model architecture and loss function

A Single Shot MultiBox Detector (SSD) with an ImageNet-pretrained VGG16 backbone was used13,14. Candidate rectangles for nail regions were selected using the Jaccard index, here measuring the similarity to the annotated bounding boxes surrounding the nails, with the threshold set at 0.5. The smooth L1 loss, a loss function frequently used in object detection tasks, was used for the regression loss between the locations of the correct and predicted rectangles14. The cross-entropy loss function, commonly used for classification tasks, was adopted to classify whether a detected object is a nail. The batch size was set to 16, the learning rate to 0.001, the momentum to 0.9, and the weight decay to 0.0005.
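As an illustration, an SSD detector with a VGG16 backbone and the hyperparameters stated above can be set up as follows with torchvision 0.11 (the release paired with PyTorch 1.10). This is a sketch under those assumptions, not the exact implementation used in this study.

```python
import torch
from torchvision.models.detection import ssd300_vgg16

# Sketch of an SSD detector with a VGG16 backbone for single-class ("nail") detection,
# using the hyperparameters stated above. Details may differ from the published tool.
model = ssd300_vgg16(pretrained=False, pretrained_backbone=True,
                     num_classes=2)  # background + nail
optimizer = torch.optim.SGD(model.parameters(), lr=0.001,
                            momentum=0.9, weight_decay=0.0005)

# In training mode, torchvision's SSD returns the localization (smooth L1) and
# classification (cross-entropy) losses given images and box/label targets.
images = [torch.rand(3, 300, 300)]
targets = [{"boxes": torch.tensor([[30.0, 40.0, 120.0, 160.0]]),
            "labels": torch.tensor([1])}]
model.train()
losses = model(images, targets)  # {"bbox_regression": ..., "classification": ...}
loss = sum(losses.values())
loss.backward()
optimizer.step()
```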

Evaluation metric for the performance

The performance was evaluated using the mean average precision (mAP), calculated as the area under the precision-recall curve15,16. In step 1, the model was trained and evaluated 10 times with 500 epochs of training each, using a newly split training/validation set and re-initialized network parameters each time, and the mean mAP was calculated with a 95% confidence interval.
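For illustration, mAP at an IoU threshold of 0.5 can be computed with, for example, the torchmetrics library; this is shown only as a sketch and is not necessarily the evaluation code used in this study.

```python
import torch
from torchmetrics.detection.mean_ap import MeanAveragePrecision

# Example of computing mAP at an IoU threshold of 0.5 with torchmetrics
# (illustration only; values below are placeholders for one predicted and one true box).
metric = MeanAveragePrecision(iou_thresholds=[0.5])
preds = [{"boxes": torch.tensor([[30.0, 40.0, 120.0, 160.0]]),
          "scores": torch.tensor([0.93]),
          "labels": torch.tensor([1])}]
targets = [{"boxes": torch.tensor([[28.0, 42.0, 118.0, 158.0]]),
            "labels": torch.tensor([1])}]
metric.update(preds, targets)
print(metric.compute()["map"])  # mean average precision over the detections
```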

Step 2

Images and annotation

First, 3783 nail images were manually cropped from the 881 hand and foot images used as the test set in step 1. Toe images were excluded because mechanical stress easily deforms toenails, making it challenging even for trained dermatologists to score NAPSI correctly. Extremely low-quality images with a resolution of less than 250×250 pixels were also excluded. Finally, we prepared 2939 nail images. These were divided into “trainval” (i.e., training and validation) and test sets with no patient appearing in both. For each model construction, the “trainval” images were randomly divided into training and validation sets, yielding three datasets at a ratio of approximately 70%, 15%, and 15% for training, validation, and testing, respectively (Fig. 7). The images in the training set were resized to 256×256, flipped horizontally with a 50% probability, and rotated by a maximum of 45 degrees. The color was normalized with the following parameters: mean = (0.485, 0.456, 0.406) and standard deviation = (0.229, 0.224, 0.225). For the validation set, only resizing and color normalization were performed. NAPSI was annotated by nine board-certified dermatologists, each of whom had also run a specialized nail disease clinic for at least two years and had sufficient experience in managing nail psoriasis. For each image, the nail matrix NAPSI, nail bed NAPSI, and total NAPSI (the sum of the two) were annotated.
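The preprocessing described above can be expressed with torchvision transforms as in the following sketch; the conversion to a tensor is implied by the normalization step and is added here, and the actual code may differ in detail.

```python
from torchvision import transforms

# Training-time preprocessing as described above: resize to 256x256, horizontal flip
# with 50% probability, rotation of up to 45 degrees, and color normalization.
train_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=45),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

# Validation images receive only resizing and color normalization.
val_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
```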

Model architecture and loss function

In NAPSI scoring, nail matrix and nail bed psoriasis are evaluated separately and their points are summed. Thus, two models, one for the nail matrix and one for the nail bed, were constructed using VGG16 pre-trained on ImageNet13. Each model classified nail images into five classes (0–4), and the two class numbers were summed to obtain NAPSI. We handled NAPSI calculation as a nine-level severity classification problem (0–8). In a multi-class classification problem, the cross-entropy loss function is usually used; however, it treats all classes equally and does not distinguish an incorrect value close to the annotated NAPSI from one far from it. To enlarge the loss when the calculated NAPSI is far from the annotated NAPSI, we multiplied the cross-entropy loss by the squared difference between them. The batch size, learning rate, and momentum were set to 16, 0.0001, and 0.9, respectively.
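A minimal sketch of this setup is shown below: two ImageNet-pretrained VGG16 classifiers with 5-class heads, and a cross-entropy loss weighted by the squared difference between the predicted and annotated score. As an assumption, the weighting is applied here at the level of a single model’s 0–4 score; the published code may instead weight by the summed NAPSI.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Two ImageNet-pretrained VGG16 classifiers (nail matrix and nail bed), each with a
# 5-class head (scores 0-4). Sketch only; details may differ from the published code.
def build_classifier(num_classes: int = 5) -> torch.nn.Module:
    model = models.vgg16(pretrained=True)
    model.classifier[6] = torch.nn.Linear(4096, num_classes)
    return model

def distance_weighted_ce(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Cross-entropy multiplied by the squared difference between the predicted and
    annotated score, so predictions far from the annotation are penalized more heavily
    (one literal reading of the description above)."""
    ce = F.cross_entropy(logits, target, reduction="none")
    predicted = logits.argmax(dim=1)
    weight = (predicted - target).float() ** 2  # 0 when the prediction is exactly correct
    return (ce * weight).mean()

def napsi_from_image(matrix_model, bed_model, nail_image: torch.Tensor) -> int:
    """Sum the 5-class (0-4) predictions of the matrix and bed models into NAPSI (0-8)."""
    with torch.no_grad():
        matrix_score = matrix_model(nail_image).argmax(dim=1)
        bed_score = bed_model(nail_image).argmax(dim=1)
    return int((matrix_score + bed_score).item())
```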

Evaluation metric for the performance

Because the number of images for each score was unevenly distributed, both micro and macro average accuracies were used to evaluate step 2. The micro average accuracy was calculated as the proportion of correctly scored cases among the whole dataset, and the macro average accuracy as the mean of the per-class accuracies. A calculated NAPSI was considered accurate when its error from the annotated NAPSI was within one point. In total, the model was constructed 10 times with 200 epochs of training each, and the mean accuracy was calculated with a 95% confidence interval. In each construction, the training/validation split was regenerated and the network parameters were re-initialized.
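For clarity, the two accuracy metrics with the one-point tolerance can be written as the following sketch; this is illustrative and not necessarily the code used in this study.

```python
import numpy as np

# Sketch of the evaluation metrics described above. A prediction counts as accurate
# when it is within one point of the annotated NAPSI.
def micro_accuracy(predicted: np.ndarray, annotated: np.ndarray, tolerance: int = 1) -> float:
    return float(np.mean(np.abs(predicted - annotated) <= tolerance))

def macro_accuracy(predicted: np.ndarray, annotated: np.ndarray, tolerance: int = 1) -> float:
    per_class = [np.mean(np.abs(predicted[annotated == score] - score) <= tolerance)
                 for score in np.unique(annotated)]
    return float(np.mean(per_class))
```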

Step 3: The “NAPSI calculator”

Images and annotation

We prepared 27 images of hands and two images of fingers from 12 of the 90 patients with nail psoriasis; some patients had been photographed at multiple time points with different disease severity. These 29 images included 138 nails. These patients were diagnosed with nail psoriasis between June 2020 and November 2021 and were not included in the training and validation sets of steps 1 and 2 (Fig. 7). NAPSI was annotated by nine board-certified dermatologists using the same procedure as in step 2.

Evaluation metric for the model’s performance

We evaluated 10 “NAPSI calculators”, each combining the nail detection model with the highest performance in step 1 with one of the 10 step 2 models (parameters obtained after 200 epochs of training). Six non-board-certified dermatology residents and four board-certified dermatologists, none of whom were among the nine nail experts who annotated NAPSI, also scored NAPSI for comparison. The mean micro accuracy of the “NAPSI calculator” was compared with that of each rater group using Welch’s t-test. A p-value < 0.05 was considered significant.
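For reference, Welch’s t-test (unequal variances) can be run with SciPy as in the sketch below; the accuracy values shown are placeholders, not the study data.

```python
from scipy import stats

# Welch's t-test comparing the micro accuracies of the "NAPSI calculator" trials
# against a group of human raters. The values below are placeholders for illustration.
calculator_accuracies = [0.84, 0.83, 0.85, 0.83, 0.84, 0.84, 0.85, 0.83, 0.84, 0.84]
resident_accuracies = [0.62, 0.70, 0.58, 0.66, 0.71, 0.67]
t_stat, p_value = stats.ttest_ind(calculator_accuracies, resident_accuracies, equal_var=False)
print(p_value < 0.05)  # significance threshold used in this study
```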