This PyTorch codebase implements efficient training of differentially private (DP) vision neural networks (CNNs, including convolutional Vision Transformers), using mixed ghost per-sample gradient clipping.
Several DP libraries turn the regular non-private training of neural networks into privacy-preserving training. Examples include Opacus, FastGradClip, private-transformers, and tensorflow-privacy.
However, they are not suitable for DP training of large CNNs, because they are either not generalizable or computationally inefficient, e.g. requiring more than 20 times the memory or more than 5 times the running time of regular training.
This codebase implements a new technique, mixed ghost clipping, for the convolutional layers, which substantially reduces the space and time complexity of DP deep learning.
- We implement a mixed ghost clipping technique for the Conv1d/Conv2d/Conv3d layers that trains DP CNNs with almost the same memory footprint as regular training (0.1%-10% memory overhead). This allows us to train with an 18 times larger batch size on VGG19 and CIFAR10 than Opacus, and to train efficiently on ImageNet (224x224) or larger images, which easily cause out-of-memory errors with private-transformers.
- A larger batch size can improve the throughput of mixed ghost clipping, making it 3 times faster than existing DP training methods. On all models we tested, DP training is at most 2 times slower than regular training.
- We support general optimizers and clipping functions. With vision models loaded from codebases such as timm and torchvision, our method can privately train VGG, ResNet, Wide ResNet, ResNeXt, etc. with a few additional lines of code.
- We demonstrate DP training of convolutional Vision Transformers (up to 300 million parameters, again with 10% memory overhead and less than 200% slowdown relative to non-private training). We improve the previous SOTA from 67.4% to 83.0% accuracy at eps=1 on CIFAR100, and reach 96.7% accuracy at eps=1 on CIFAR10.
To train models with DP on CIFAR10 and CIFAR100, one can run
python -m cifar_DP --lr 0.001 --epochs 3 --model beit_large_patch16_224
Arguments:
- --lr: learning rate, default is 0.001
- --epochs: number of epochs, default is 1
- --model: name of models in timm, default is resnet18; see supported models below
- --cifar_data: dataset to train, CIFAR10 (default) or CIFAR100
- --eps: privacy budget, default is 2
- --grad_norm: per-sample gradient clipping norm, default is 0.1
- --mode: which DP clipping algorithm to use, one of ghost_mixed (default; the mixed ghost clipping), ghost (the ghost clipping), non-ghost (the Opacus approach), non-private (standard non-DP training)
- --bs: logical batch size that determines the convergence and accuracy, but not the memory nor speed; default is 1000
- --mini_bs: virtual or physical batch size for the gradient accumulation, which determines the memory and speed of training; default is 50
- --pretrained: whether to use pretrained model from timm, default is True
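The relation between --bs and --mini_bs can be made concrete with a little arithmetic. The sketch below is illustrative only: the helper name is hypothetical and not part of the codebase, it assumes the logical batch size is an exact multiple of the physical one, and it uses 50,000 as the size of the CIFAR training set.

```python
def accumulation_schedule(bs: int, mini_bs: int, sample_size: int) -> dict:
    """Relate the logical batch size (--bs) to the physical one (--mini_bs).

    Hypothetical helper for illustration; it is not part of the codebase.
    """
    if bs <= 0 or mini_bs <= 0 or sample_size <= 0:
        raise ValueError("all arguments must be positive")
    if bs % mini_bs != 0:
        # Assumption made here for simplicity, not a documented requirement.
        raise ValueError("bs is assumed to be a multiple of mini_bs")
    return {
        # mini-batches whose gradients are accumulated before one update
        "accumulation_steps": bs // mini_bs,
        # forward/backward passes needed to see the dataset once
        "mini_batches_per_epoch": sample_size // mini_bs,
        # parameter updates per epoch (full logical batches only)
        "updates_per_epoch": sample_size // bs,
    }


# Defaults from the argument list, on the 50,000 CIFAR training images.
schedule = accumulation_schedule(bs=1000, mini_bs=50, sample_size=50_000)
print(schedule)
```

With the defaults, each parameter update accumulates 20 mini-batches of 50 images. Changing --mini_bs alters how many images are processed at once, but not the number of updates per epoch.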
Privately training vision models is simple:
- Create the model and any optimizer
- Attach this optimizer to our PrivacyEngine (this essentially adds PyTorch hooks for per-sample clipping)
- Compute per-example losses (setting reduction='none') for a mini-batch of data
- Pass the loss to optimizer.step or optimizer.virtual_step without calling the backward function (this is implicitly called inside PrivacyEngine)
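To make "per-sample clipping" concrete, the following plain-Python sketch shows the standard DP-SGD aggregation that such hooks enable: each per-sample gradient is rescaled to have norm at most the clipping norm, the clipped gradients are summed, and Gaussian noise is added. It is a toy illustration, not the PrivacyEngine implementation; the function name, the particular clipping function min(1, R/norm), and the noise_multiplier argument are assumptions made for the example.

```python
import math
import random


def private_gradient(per_sample_grads, max_grad_norm, noise_multiplier, rng=None):
    """Clip each per-sample gradient, sum, add Gaussian noise, and average.

    Toy DP-SGD aggregation on plain lists of floats (illustration only).
    """
    rng = rng or random.Random()
    dim = len(per_sample_grads[0])
    total = [0.0] * dim
    for grad in per_sample_grads:
        norm = math.sqrt(sum(v * v for v in grad))
        # Scale down only gradients whose norm exceeds the clipping norm.
        factor = min(1.0, max_grad_norm / norm) if norm > 0 else 1.0
        for j, v in enumerate(grad):
            total[j] += factor * v
    # The noise standard deviation is proportional to the clipping norm.
    std = noise_multiplier * max_grad_norm
    noisy = [t + rng.gauss(0.0, std) for t in total]
    return [v / len(per_sample_grads) for v in noisy]


grads = [[3.0, 4.0], [0.03, 0.04]]  # norms of about 5.0 and 0.05
# With the noise switched off, only the effect of clipping is visible.
clipped_mean = private_gradient(grads, max_grad_norm=0.1, noise_multiplier=0.0)
print(clipped_mean)
```

The first gradient is shrunk to norm 0.1 while the second is left untouched, so no single sample can dominate the update.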
Below is a quick example of using our codebase for training CNN models with mixed ghost clipping:
import torchvision, torch, opacus
import torch.nn.functional as F
from private_vision import PrivacyEngine

model = torchvision.models.resnet18()
# replace BatchNorm by GroupNorm or LayerNorm
model = opacus.validators.ModuleValidator.fix(model)
optimizer = torch.optim.Adam(params=model.parameters(), lr=1e-4)
privacy_engine = PrivacyEngine(
    model,
    batch_size=256,
    sample_size=50000,
    epochs=3,
    max_grad_norm=0.1,
    target_epsilon=3,
    ghost_clipping=True,
    mixed=True,
)
privacy_engine.attach(optimizer)

# Same training procedure, e.g. data loading, forward pass, logits...
loss = F.cross_entropy(model(batch), labels, reduction="none")
# do not use loss.backward()
optimizer.step(loss=loss)
In the above PrivacyEngine, which shares the design of Opacus v0.15 (see how to use keywords like max_grad_norm and target_epsilon in https://github.com/pytorch/opacus/blob/v0.15.0/opacus/privacy_engine.py),
- setting ghost_clipping=True, mixed=True implements the best method, mixed ghost clipping;
- setting ghost_clipping=True, mixed=False implements the ghost clipping, which is very memory-costly for large images (e.g. failing to fit a single 400x400 image into ResNet18 with a 16GB GPU);
- setting ghost_clipping=False implements a similar approach to Opacus, which needs to instantiate the per-sample gradients and is therefore very memory-costly.
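The difference between these settings comes down to how each layer obtains the per-sample gradient norm. The sketch below, in plain Python on nested lists, compares the two routes for a single linear layer and a single sample: instantiating the per-sample gradient, versus the ghost norm, which only needs two T x T Gram matrices. The layerwise rule in prefer_ghost (use the ghost norm when 2T^2 < p*d) is stated here as an assumption about how the mixed method chooses; all function names are hypothetical and none of this is the codebase's actual implementation.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def transpose(x):
    return [list(col) for col in zip(*x)]


def instantiated_norm_sq(a, g):
    """Squared norm via the per-sample weight gradient a^T g (d x p)."""
    grad = matmul(transpose(a), g)
    return sum(v * v for row in grad for v in row)


def ghost_norm_sq(a, g):
    """Same squared norm from the Gram matrices a a^T and g g^T (T x T)."""
    aat = matmul(a, transpose(a))
    ggt = matmul(g, transpose(g))
    return sum(u * v for ra, rg in zip(aat, ggt) for u, v in zip(ra, rg))


def prefer_ghost(T, d, p):
    """Assumed layerwise rule: the ghost norm is cheaper when 2*T^2 < p*d."""
    return 2 * T * T < p * d


# One sample: activations a (T=3 positions, d=2) and output gradients g (p=2).
a = [[1, 2], [3, 4], [5, 6]]
g = [[1, 0], [0, 1], [1, 1]]
print(instantiated_norm_sq(a, g), ghost_norm_sq(a, g))

# Early conv layer on a 224x224 image: many positions, few weights.
print(prefer_ghost(T=224 * 224, d=3 * 3 * 3, p=64))
# Late layer: 7x7 positions, many weights.
print(prefer_ghost(T=7 * 7, d=512 * 3 * 3, p=512))
```

In this sketch, T stands for the number of output positions of a convolution and d for its unfolded input size (input channels times kernel size). Under that reading, the Gram matrices grow quickly with image size, which is consistent with ghost clipping alone being memory-costly on large images.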
A special use of our privacy engine is gradient accumulation, which is achieved with the virtual_step function.
import torchvision, torch, opacus
import torch.nn.functional as F
from private_vision import PrivacyEngine

# Logical batch size / physical batch size: take an update once every this many iterations
gradient_accumulation_steps = 10

model = torchvision.models.resnet18()
model = opacus.validators.ModuleValidator.fix(model)
optimizer = torch.optim.Adam(model.parameters())
privacy_engine = PrivacyEngine(...)
privacy_engine.attach(optimizer)

for i, (batch, labels) in enumerate(dataloader):
    loss = F.cross_entropy(model(batch), labels, reduction="none")
    if (i + 1) % gradient_accumulation_steps == 0:
        # last mini-batch of the logical batch: update the parameters
        optimizer.step(loss=loss)
        optimizer.zero_grad()
    else:
        # accumulate gradients without updating the parameters
        optimizer.virtual_step(loss=loss)
The following modules are currently supported:
- nn.Linear (2D; Ian Goodfellow)
- nn.Linear (3D; Xuechen et al.)
- nn.LayerNorm (Opacus)
- nn.GroupNorm (Opacus)
- nn.Embedding (Xuechen et al.)
- nn.Conv1d (this work)
- nn.Conv2d (this work)
- nn.Conv3d (this work)
- nn.Linear (4D; this work)
For unsupported modules, an error message will print the module names, and you need to manually freeze them so that they do not require gradients.
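Freezing can be done with standard PyTorch calls. The helper below is a hypothetical sketch (it is not part of the codebase): given the module names reported in the error message, it disables gradients for all of their parameters. It relies only on named_modules(), parameters() and requires_grad_(), which torch.nn.Module and its parameters provide, so it is written without importing torch.

```python
def freeze_by_name(model, module_names):
    """Disable gradients for every parameter of the named submodules.

    `model` is expected to be a torch.nn.Module. The names must match those
    yielded by `model.named_modules()`. Returns the names that were frozen.
    Hypothetical helper for illustration only.
    """
    wanted = set(module_names)
    frozen = []
    for name, module in model.named_modules():
        if name in wanted:
            for param in module.parameters():
                # In-place switch; the parameter no longer requires gradients.
                param.requires_grad_(False)
            frozen.append(name)
    missing = wanted - set(frozen)
    if missing:
        raise KeyError(f"no such submodules: {sorted(missing)}")
    return frozen


# Usage with a real model (the module name is a placeholder):
# freeze_by_name(model, ["name.from.error.message"])
```

Calling such a helper right after the model is created, before the optimizer and PrivacyEngine are set up, is likely the safest order.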
As a consequence, we can privately train most of the models from timm (this list is non-exhaustive):
beit_base_patch16_224, beit_large_patch16_224, cait_s24_224, cait_xxs24_224, convit_base, convit_small, convit_tiny, convnext_base, convnext_large, crossvit_9_240, crossvit_15_240, crossvit_18_240, crossvit_base_240, crossvit_small_240, crossvit_tiny_240, deit3_base_patch16_224, deit_small_patch16_224, deit_tiny_patch16_224,
dla34, dla102, dla169, ecaresnet50d, ecaresnet269d, gluon_resnet18_v1b, gluon_resnet50_v1b, gluon_resnet152_v1b, gluon_resnet152_v1d, gluon_resnet152_v1s, hrnet_w18, hrnet_w48, ig_resnext101_32x8d, inception_v3, jx_nest_base, legacy_senet154, legacy_seresnet18, legacy_seresnet152, mixer_b16_224, mixer_l16_224, pit_b_224, pvt_v2_b1,
resnet18, resnet34, resnet50, resnet101, resnet152, resnext50_32x4d, res2net50_14w_8s, res2next50, resnest50d, seresnet50, seresnext50_32x4d, ssl_resnet50, ssl_resnext50_32x4d, swsl_resnet50, swsl_resnext50_32x4d, tv_resnet152, tv_resnext50_32x4d, twins_pcpvt_base, twins_pcpvt_large, twins_svt_base, twins_svt_large,
vgg11, vgg11_bn, vgg13, vgg16, vgg19, visformer_small, vit_base_patch16_224, vit_base_patch32_224, vit_large_patch16_224, vit_small_patch16_224, vit_tiny_patch16_224, volo_d1_224, wide_resnet50_2, wide_resnet101_2, xception, xcit_large_24_p16_224, xcit_medium_24_p16_224, xcit_small_24_p16_224, xcit_tiny_24_p16_224
We also support models in torchvision and other vision libraries, e.g. densenet121, densenet161, densenet201.
Please cite our paper if you use PrivateVision in your papers, as follows:
@article{bu2022scalable,
title={Scalable and efficient training of large convolutional neural networks with differential privacy},
author={Bu, Zhiqi and Mao, Jialin and Xu, Shiyun},
journal={Advances in Neural Information Processing Systems},
volume={35},
pages={38305--38318},
year={2022}
}
This code is largely based on https://github.com/lxuechen/private-transformers (v0.1.0) and https://github.com/pytorch/opacus (v0.15).