Lessons Learned from a Unifying Empirical Study of Parameter-Efficient Transfer Learning (PETL) in Visual Recognition
- ✅ [2024/09/25] Codebase is created.
source env_setup.sh
You can put all the data in one folder and pass its path to the `--data_path` argument.
We provide two ways to prepare the VTAB-1k dataset (see the example commands after this list):
- TFDS Version
  - Follow VPT's instructions to download the data through TFDS.
  - When you pass the dataset name to the `--data` argument, add `tfds_vtab-` before each VTAB dataset name, e.g., `tfds_vtab-cifar(num_classes=100)`.
  - You can find all the dataset names of the TFDS version of VTAB-1k in `TFDS_DATASETS` in `utils/global_var.py`.
- Processed Version
  - Download the processed version from here.
  - When you pass the dataset name to the `--data` argument, add `processed_vtab-` before each VTAB dataset name, e.g., `processed_vtab-cifar`.
  - You can find all the dataset names of the processed version of VTAB-1k in `VTAB_DATASETS` in `utils/global_var.py`.
Both ways apply the same transforms to the dataset: Resize, ToTensor and optionally Normalize with mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225].
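To illustrate the naming above, the same CIFAR task can be loaded through either pipeline. The `--data_path` values below are placeholders for wherever you store the data, and LoRA is used only as a sample method, as in the commands further down:

CUDA_VISIBLE_DEVICES=0 python main.py --lora_bottleneck 8 --data "tfds_vtab-cifar(num_classes=100)" --data_path data_folder/vtab_tfds --debug
CUDA_VISIBLE_DEVICES=0 python main.py --lora_bottleneck 8 --data processed_vtab-cifar --data_path data_folder/vtab_processed --debug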
- Download ImageNet from the official website.
- When you pass the dataset name to the `--data` argument, use `fs-imagenet` (see the example command after this list).
- Download ImageNet-Sketch from https://github.com/HaohanWang/ImageNet-Sketch
- Download ImageNet-A from https://github.com/hendrycks/natural-adv-examples
- Download ImageNet-R from https://github.com/hendrycks/imagenet-r
- ImageNetV2 (https://github.com/modestyachts/ImageNetV2) will be downloaded automatically when you run the code.
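A minimal sketch of a few-shot ImageNet training run (the data path is a placeholder and LoRA is used only as a sample method; full training and OOD evaluation commands are given further down):

CUDA_VISIBLE_DEVICES=0 python main.py --data fs-imagenet --data_path data_folder/imagenet/images --lora_bottleneck 8 --debug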
Download the pretrained weights from the following links and put them in the `pretrained_weights` folder.
- ViT-B-Sup-21k: rename it as `ViT-B_16_in21k.npz`
- ViT-B-CLIP: rename it as `ViT-B_16_clip.bin`
- ViT-B-DINOV2: rename it as `ViT-B_14_dinov2.pth`
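After downloading and renaming, listing the folder should show the three checkpoints:

ls pretrained_weights
# ViT-B_14_dinov2.pth  ViT-B_16_clip.bin  ViT-B_16_in21k.npz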
Example commands for implemented methods:
### SSF
CUDA_VISIBLE_DEVICES=0 python main.py --ssf --data processed_vtab-dtd --debug
### VPT
CUDA_VISIBLE_DEVICES=0 python main.py --vpt_mode deep --vpt_num 10 --data processed_vtab-dtd --debug
### AdaptFormer
CUDA_VISIBLE_DEVICES=0 python main.py --ft_mlp_module adapter --ft_mlp_mode parallel --ft_mlp_ln before --adapter_bottleneck 64 --adapter_init lora_kaiming --adapter_scaler 0.1 --data processed_vtab-dtd --debug --data_path data_folder/vtab_processed
### Convpass
CUDA_VISIBLE_DEVICES=0 python main.py --ft_attn_module convpass --ft_attn_mode parallel --ft_attn_ln after --ft_mlp_module convpass --ft_mlp_mode parallel --ft_mlp_ln after --convpass_scaler 0.1 --data processed_vtab-dtd --debug --optimizer adamw --data_path data_folder/vtab_processed
### VPT Version (zero-init) Adapter
CUDA_VISIBLE_DEVICES=0 python main.py --ft_mlp_module adapter --ft_mlp_mode sequential_after --adapter_bottleneck 8 --adapter_init zero --adapter_scaler 1 --data processed_vtab-dtd --debug --data_path data_folder/vtab_processed
### LoRA-init Pfeiffer Adapter
CUDA_VISIBLE_DEVICES=0 python main.py --ft_mlp_module adapter --ft_mlp_mode sequential_after --adapter_bottleneck 8 --adapter_init lora_xavier --adapter_scaler 1 --data processed_vtab-dtd --debug --data_path data_folder/vtab_processed
### Random-init Pfeiffer Adapter
CUDA_VISIBLE_DEVICES=0 python main.py --ft_mlp_module adapter --ft_mlp_mode sequential_after --adapter_bottleneck 8 --adapter_init xavier --adapter_scaler 1 --data processed_vtab-dtd --debug --data_path data_folder/vtab_processed
### Random-init Houlsby Adapter
CUDA_VISIBLE_DEVICES=0 python main.py --ft_attn_module adapter --ft_attn_mode sequential_after --ft_mlp_module adapter --ft_mlp_mode sequential_after --adapter_bottleneck 8 --adapter_init xavier --adapter_scaler 1 --data processed_vtab-dtd --debug --data_path data_folder/vtab_processed
### LoRA
CUDA_VISIBLE_DEVICES=0 python main.py --lora_bottleneck 8 --data processed_vtab-dtd --debug --data_path data_folder/vtab_processed
### FacT TK
CUDA_VISIBLE_DEVICES=0 python main.py --data processed_vtab-dtd --optimizer adamw --fact_type tk --drop_path_rate 0.1 --debug --fact_dim 32 --fact_scaler 0.1 --data_path data_folder/vtab_processed
### FacT TT
CUDA_VISIBLE_DEVICES=0 python main.py --data processed_vtab-dtd --optimizer adamw --fact_type tt --drop_path_rate 0.1 --data_path data_folder/vtab_processed
### RepAdapter
CUDA_VISIBLE_DEVICES=0 python main.py --ft_attn_module repadapter --ft_attn_mode sequential_before --ft_mlp_module repadapter --ft_mlp_mode sequential_before --repadapter_bottleneck 8 --repadapter_scaler 10 --data processed_vtab-smallnorb_azi --optimizer adamw --debug
### BitFit
CUDA_VISIBLE_DEVICES=0 python main.py --bitfit --data processed_vtab-clevr_count --debug --data_path data_folder/vtab_processed
### VQT
CUDA_VISIBLE_DEVICES=0 python main.py --vqt_num 10 --data_path data_folder/vtab_processed
### LayerNorm
CUDA_VISIBLE_DEVICES=0 python main.py --ln --data processed_vtab-clevr_count --debug --data_path data_folder/vtab_processed
### DiffFit
CUDA_VISIBLE_DEVICES=0 python main.py --difffit --data processed_vtab-clevr_count --debug --data_path data_folder/vtab_processed
### Attention
CUDA_VISIBLE_DEVICES=0 python main.py --attention_index 9 10 11 --attention_type qkv --data_path data_folder/vtab_processed
### MLP
CUDA_VISIBLE_DEVICES=0 python main.py --mlp_index 9 10 11 --mlp_type full --data_path data_folder/vtab_processed
### Block
CUDA_VISIBLE_DEVICES=0 python main.py --block_index 9 10 11 --data_path data_folder/vtab_processed
### Full
CUDA_VISIBLE_DEVICES=0 python main.py --full --eval_freq 2 --data_path data_folder/vtab_processed
For hyperparameter tuning, here is an example command (LoRA on the processed VTAB-1k Caltech101 task):
CUDA_VISIBLE_DEVICES=0 python main_tune.py --data processed_vtab-caltech101 --default experiment/config/default_vtab_processed.yml --tune experiment/config/method/lora.yml --lrwd experiment/config/lr_wd_vtab_processed.yml
All the config files can be found in the `experiment/config` folder. The `default` config file sets the default hyperparameters for training, the `tune` config file sets the tunable hyperparameters for the method, and the `lrwd` config file sets the learning rate and weight decay for training.
We provide the tuning results `tune_summary.csv` and the final run results `final_result.json` in the `tune_output` folder. If you want to rerun the final run using the best tuning parameters, you can specify a new output name for the final run with the `--final_output_name` option.
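For example, to rerun the final run of the LoRA tuning command above under a new output name (the name `lora_rerun` below is only an illustration):

CUDA_VISIBLE_DEVICES=0 python main_tune.py --data processed_vtab-caltech101 --default experiment/config/default_vtab_processed.yml --tune experiment/config/method/lora.yml --lrwd experiment/config/lr_wd_vtab_processed.yml --final_output_name lora_rerun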
To tune all methods for one VTAB-1K dataset, here is an example command:
dataset='processed_vtab-dsprites_loc'
for METHOD in lora_p_adapter repadapter rand_h_adapter adaptformer convpass fact_tk fact_tt lora difffit full linear ssf bitfit ln vpt_shallow vpt_deep
do
CUDA_VISIBLE_DEVICES=0 python main_tune.py --data ${dataset} --default experiment/config/default_vtab_processed.yml --tune experiment/config/method/$METHOD.yml --lrwd experiment/config/lr_wd_vtab_processed.yml
done
We use the CLIP ViT-B/16 model and add an FC layer as the prediction head, with a zero-initialized bias and weights initialized from the class-label text embeddings produced by the text encoder. Subsequently, we discard the text encoder and apply PETL methods to the visual encoder, fine-tuning only the PETL modules and the head.
The code to generate the prediction head for CLIP can be found in `build_clip_zs_classifier.py`.
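A minimal sketch of this workflow, assuming `build_clip_zs_classifier.py` can be invoked directly (any required arguments, e.g., data or output paths, are repository-specific and not shown here):

# generate the zero-shot CLIP prediction head once, then fine-tune with main.py as shown below
python build_clip_zs_classifier.py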
This is an example command to run a PETL method for the 100-shot ImageNet dataset:
CUDA_VISIBLE_DEVICES=0 python main.py --data fs-imagenet --data_path data_folder/imagenet/images --warmup_lr_init 1.0e-7 --lr 0.00003 --wd 0.005 --eval_freq 1 --store_ckp --lora_bottleneck 32 --batch_size 256 --final_acc_hp --early_patience 10
This is an example command to tune a few methods for the 100-shot ImageNet dataset:
for METHOD in rand_h_adapter_64 lora_16 ssf
do
for dataset in fs-imagenet
do
CUDA_VISIBLE_DEVICES=0 python main_tune.py --data ${dataset} --test_bs 2048 --bs 256 --default experiment/config/clip_fs_imagenet.yml --tune experiment/config/method-imagenet/$METHOD.yml --lrwd experiment/config/lr_wd_clip_imagenet.yml
done
done
To evaluate the performance of fine-tuned models on OOD data, here is an example command:
for METHOD in rand_h_adapter_64 lora_16 ssf
do
for dataset in fs-imagenet eval_imagenet-v2 eval_imagenet-r eval_imagenet-a eval_imagenet-s
do
CUDA_VISIBLE_DEVICES=0 python main_evaluate.py --test_data ${dataset} --bs 2048 --default experiment/config/clip_fs_imagenet.yml --tune experiment/config/method-imagenet/$METHOD.yml --data_path /research/nfs_chao_209/zheda
done
done
When applying WiSE to PETL methods, there are two types of PETL methods: methods that insert additional parameters into the model, such as Adapter, and methods that directly fine-tune existing parameters, such as BitFit. For the former, we use `merge_petl.py`, and for the latter, we use `merge_model.py`.
Example commands for each type:
CUDA_VISIBLE_DEVICES=0 python merge_model.py --bs 1024 --default experiment/config/clip_fs_imagenet.yml --tune experiment/config/method-imagenet/ln.yml
CUDA_VISIBLE_DEVICES=0 python merge_petl.py --bs 1024 --default experiment/config/clip_fs_imagenet.yml --tune experiment/config/method-imagenet/fact_tk_64.yml
To get the WiSE curve plots, you can use `WiSE_PETL.ipynb`.
We use three datasets for many-shot experiments: CIFAR100, Clevr-distance and RESISC.
Here is an example command to run the PETL methods for the Clevr-distance dataset. Config files for CIFAR and RESISC can be found in the `experiment/config` folder.
for METHOD in rand_h_adapter_8 lora_8 fact_tk_8
do
for dataset in clevr
do
CUDA_VISIBLE_DEVICES=0 python main_tune.py --data ${dataset} --default experiment/config/default_clevr.yml --tune experiment/config/method_clevr/$METHOD.yml --lrwd experiment/config/lr_wd_clevr.yml
done
done
We provide the tuning results `tune_summary.csv`, the final run results `final_result.json` and other results for all the experiments in the `tune_output` folder.
To add a new PETL method:
- Add a new module file in `./model/`.
- Add the module accordingly in `block.py`, `mlp.py`, `patch_embed.py`, `vision_transformer.py` and `attention.py`.
- Add the name of the new module to `TUNE_MODULES` and modify `get_model()` in `./experiment/build_model.py` accordingly.
- Add the required arguments in `main.py`.
To add a new dataset:
- Add a new dataset file in `/data/dataset`.
- Modify `build_loader.py` to include the new dataset.
- Modify `get_base_model()` in `build_model.py`.
To set up a new experiment:
- Add a general config file and a learning rate and weight decay config file in `./experiment/config/`.
- Add new method config files. You can create a new folder in `./experiment/config/` for your experiment (see the example command below).
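Once your config files are in place, you can launch tuning with the same interface as above (all dataset and file names below are placeholders for your own):

CUDA_VISIBLE_DEVICES=0 python main_tune.py --data your_dataset --default experiment/config/your_default.yml --tune experiment/config/your_experiment/your_method.yml --lrwd experiment/config/your_lr_wd.yml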
Use `main_collect_prediction.py`.
If you use this paper/code in your research, please consider citing us:
@article{mai2024lessons,
title={Lessons Learned from a Unifying Empirical Study of Parameter-Efficient Transfer Learning (PETL) in Visual Recognition},
author={Mai, Zheda and Zhang, Ping and Tu, Cheng-Hao and Chen, Hong-You and Zhang, Li and Chao, Wei-Lun},
journal={arXiv preprint arXiv:2409.16434},
year={2024}
}
- SSF: https://github.com/dongzelian/SSF
- AdaptFormer: https://github.com/ShoufaChen/AdaptFormer
- ConvPass: https://github.com/JieShibo/PETL-ViT
- LoRA: https://github.com/ZhangYuanhan-AI/NOAH
- Adapter: https://github.com/ZhangYuanhan-AI/NOAH, https://arxiv.org/pdf/2007.07779.pdf, https://arxiv.org/pdf/1902.00751.pdf
- VPT: https://github.com/KMnP/vpt
- FacT: https://github.com/JieShibo/PETL-ViT
- RepAdapter: https://github.com/luogen1996/RepAdapter
- VQT: https://openaccess.thecvf.com/content/CVPR2023/papers/Tu_Visual_Query_Tuning_Towards_Effective_Usage_of_Intermediate_Representations_for_CVPR_2023_paper.pdf
- BitFit: https://arxiv.org/pdf/2106.10199.pdf
- DiffFit: https://arxiv.org/pdf/2304.06648.pdf