Implementation of backdoor watermarking methods for deep classification models, featuring:
- Unambiguous backdoor trigger sets with high-fidelity embedding [1];
- Realization of deep fidelity for backdoor watermarking deep classification models [2].
The repo contains a pretrained ResNet18 for MNIST for demonstration. Note that the visualization of the training data feature distribution and the prototype vectors (which correspond to the decision boundaries) is only available for the ResNet18 whose penultimate layer is reduced to 2 neurons. All pretrained models (reference models) can be downloaded from here. To use the code with your own classification model, apply the following modifications (a minimal sketch follows the list):
- Set the bias of the last layer (a linear layer) to "False";
- Rename the last layer to "out";
- Modify the model output to the tuple (feature, output).
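For concreteness, here is a minimal sketch of these three modifications applied to torchvision's ResNet18 (the wrapper class, `feat_dim`, and the grayscale `conv1` replacement are illustrative assumptions, not code from this repo):

```python
import torch.nn as nn
from torchvision.models import resnet18


class WatermarkReadyResNet18(nn.Module):
    """Illustrative wrapper, not the repo's model definition."""

    def __init__(self, num_classes=10, feat_dim=2):
        super().__init__()
        backbone = resnet18(num_classes=num_classes)
        # MNIST is grayscale, so accept 1 input channel (assumption).
        backbone.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2,
                                   padding=3, bias=False)
        # Strip the original classifier; the backbone now yields 512-D features.
        backbone.fc = nn.Identity()
        self.backbone = backbone
        # Penultimate layer reduced to feat_dim neurons (2 for visualization).
        self.penultimate = nn.Linear(512, feat_dim)
        # Modification 1: bias=False; modification 2: layer named "out".
        self.out = nn.Linear(feat_dim, num_classes, bias=False)

    def forward(self, x):
        feature = self.penultimate(self.backbone(x))
        output = self.out(feature)
        # Modification 3: return the tuple (feature, output).
        return feature, output
```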
We propose that, in general, watermarks for deep learning models should preserve deep fidelity: not only the fidelity measured by extrinsic metrics (e.g., learning accuracy), but also the intrinsic mechanism of the model to be protected.
For deep classification models, this means that deep fidelity concerns not only good learning accuracies (training and validation accuracies) but also the preservation of the model's intrinsic mechanism, i.e., feature learning and classification. Therefore, to achieve deep fidelity, one should preserve both the training data feature distribution and the decision boundary of the unwatermarked model. The figure below illustrates how deep fidelity is achieved.
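As a sketch of how this can be turned into a training objective (illustrative only, not the exact loss of [1] or [2]; `ref_model` is a frozen copy of the unwatermarked model, and both models return `(feature, output)` as above):

```python
import torch
import torch.nn.functional as F


def deep_fidelity_loss(model, ref_model, x_clean, y_clean,
                       x_trigger, y_trigger, lam_feat=1.0, lam_bound=1.0):
    # Task term: keep clean accuracy. Watermark term: embed the trigger set.
    feature, logits = model(x_clean)
    _, trigger_logits = model(x_trigger)
    loss_task = F.cross_entropy(logits, y_clean)
    loss_wm = F.cross_entropy(trigger_logits, y_trigger)

    with torch.no_grad():
        ref_feature, _ = ref_model(x_clean)

    # Deep-fidelity term (a): preserve the training data feature distribution.
    loss_feat = F.mse_loss(feature, ref_feature)
    # Deep-fidelity term (b): preserve the decision boundary, i.e., the
    # prototype vectors (rows of the bias-free last layer "out").
    loss_bound = F.mse_loss(model.out.weight, ref_model.out.weight)

    return loss_task + loss_wm + lam_feat * loss_feat + lam_bound * loss_bound
```

The weights `lam_feat` and `lam_bound` trade watermark strength against deep fidelity; they are illustrative hyperparameters, not values from the papers.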
We provide a visualization module to show how deep fidelity is achieved when backdoor watermarking ResNet18 for MNIST classification. The task is simple enough that the penultimate layer of ResNet18 can be reduced to 2 neurons without performance degradation (see the figure below). Note that the feature and prototype visualization depicts the exact 2-D features and prototypes, not a dimensionality-reduced approximation. This visualization idea is inspired by Wen et al., ECCV 2016.
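A sketch of this exact (non-approximated) visualization, assuming the 2-neuron penultimate layer and the `(feature, output)` interface above (function and variable names are illustrative):

```python
import matplotlib.pyplot as plt
import torch


@torch.no_grad()
def plot_features_and_prototypes(model, loader):
    model.eval()
    feats, labels = [], []
    for x, y in loader:
        feature, _ = model(x)          # exact 2-D penultimate features
        feats.append(feature)
        labels.append(y)
    feats, labels = torch.cat(feats), torch.cat(labels)

    # Scatter the training data feature distribution, one color per class.
    for c in labels.unique():
        pts = feats[labels == c]
        plt.scatter(pts[:, 0], pts[:, 1], s=2, label=f"class {c.item()}")

    # With a bias-free "out" layer, each row of its weight matrix is a class
    # prototype; logits are dot products with these rows, so the prototypes
    # delineate the decision boundaries.
    for w in model.out.weight:
        plt.arrow(0.0, 0.0, w[0].item(), w[1].item(), head_width=0.1)

    plt.legend(markerscale=4)
    plt.show()
```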
[1] G. Hua, A. B. J. Teoh, Y. Xiang, and H. Jiang, "Unambiguous and high-fidelity backdoor watermarking for deep neural networks," IEEE Transactions on Neural Networks and Learning Systems, 2023. link
[2] G. Hua and A. B. J. Teoh, "Deep fidelity in DNN watermarking: A study of backdoor watermarking for classification models," Pattern Recognition, 2023. link