An illustrative implementation of a privacy backdoor "data trap" on a small MNIST MLP, based on Privacy Backdoors: Stealing Data with Corrupted Pretrained Models by Feng and Tramèr.
It accompanies an article describing the attack and its limitations, which you can find on my blog.
Tested only on macOS 13.7.3 and Python 3.10.14.
git clone https://github.com/hkscy/MLPdatatrap-example.git
cd MLPdatatrap-example
conda create --name datatraps --file requirements.txt
The steps below download MNIST, train an MLP on the dataset, backdoor a copy of the model, and then finetune both the corrupted and uncorrupted models with each of the SGD and Adam optimisers. Plots of the activations, loss gradients, and weight updates are output for all four runs, along with the underlying plot data and the recovered (or not) finetuning data. The weight-update recovery trick these plots illustrate is sketched after the commands.
conda activate datatraps
python datatrap_plot_sgd_adam.py
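
To give a feel for why the weight updates leak data: for a fully-connected first layer z = Wx + b, a single-example SGD step updates row i as ΔW_i = −η·g_i·x and Δb_i = −η·g_i, where g_i is the loss gradient at pre-activation i. So any neuron with a non-zero bias update reveals the input exactly via x = ΔW_i / Δb_i; the data trap wires the first layer so that individual neurons fire for (roughly) one finetuning example each. The following is a minimal sketch of just the division trick on a toy layer, not the repository's code; the layer sizes and loss are arbitrary stand-ins:

```python
import torch

torch.manual_seed(0)

# Toy first layer: 4 hidden units over 8-dim inputs (stand-ins for 784-dim MNIST).
lin = torch.nn.Linear(8, 4)
opt = torch.optim.SGD(lin.parameters(), lr=0.1)

x = torch.rand(8)                      # the "victim" finetuning example
w_before = lin.weight.detach().clone()
b_before = lin.bias.detach().clone()

# One single-example SGD step through a ReLU and an arbitrary scalar loss.
loss = torch.relu(lin(x)).sum()
opt.zero_grad()
loss.backward()
opt.step()

dW = lin.weight.detach() - w_before    # row i: -lr * g_i * x
db = lin.bias.detach() - b_before      # entry i: -lr * g_i

# Any neuron whose bias moved reveals x exactly.
assert db.abs().max() > 0, "no neuron fired for this seed; try another"
i = int(db.abs().argmax())
x_recovered = dW[i] / db[i]
print(torch.allclose(x_recovered, x, atol=1e-5))  # True
```

Adam's per-coordinate rescaling breaks this exact ΔW/Δb ratio, which is one reason the script fine-tunes with both optimisers and compares the recovered data.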
