PvGaLM is a novel privacy-preserving pipeline for relational learning. It pairs decoupled sampling of training relations with efficient tuple-level gradient clipping, so that large pretrained models can be fine-tuned on graph data via DP-SGD with rigorous privacy guarantees.
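At its core, DP-SGD with tuple-level clipping bounds each training relation's influence on a parameter update before adding calibrated Gaussian noise. Below is a minimal, deliberately naive sketch of one such step (the released code vectorizes per-tuple gradients via Opacus; the model, loss, and all names here are illustrative stand-ins, not the repository's actual implementation):

```python
import torch

def dp_sgd_step(model, batch, clip_norm, noise_scale, lr):
    # One DP-SGD step with tuple-level clipping, written as a naive loop for clarity.
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]
    for x, y in batch:  # each (x, y) is a single training tuple (relation)
        model.zero_grad()
        torch.nn.functional.mse_loss(model(x), y).backward()
        # Clip this tuple's gradient to L2 norm <= clip_norm.
        norm = torch.sqrt(sum((p.grad ** 2).sum() for p in params))
        scale = min(1.0, clip_norm / (norm.item() + 1e-6))
        for s, p in zip(summed, params):
            s.add_(p.grad, alpha=scale)
    with torch.no_grad():
        for s, p in zip(summed, params):
            # Gaussian noise calibrated to the clipping norm, then an averaged update.
            noise = noise_scale * clip_norm * torch.randn_like(p)
            p.add_((s + noise) / len(batch), alpha=-lr)
    model.zero_grad()

# Toy usage with a linear model standing in for the language model.
model = torch.nn.Linear(4, 1)
batch = [(torch.randn(4), torch.randn(1)) for _ in range(8)]
dp_sgd_step(model, batch, clip_norm=1.0, noise_scale=0.5, lr=0.1)
```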
We consider two specific use cases for private relational learning:
- Cross-category recommendation: when launching a new product line, RecSys models often lack historical data for prediction tasks such as co-purchase. This can be alleviated by leveraging user purchase histories from complementary categories, but those co-purchase relations encode sensitive user behavior.
- Cross-regional model deployment: financial institutions operate in multiple locations, and their service models (e.g., fraud detection) are typically trained on transaction data collected from major markets and then fine-tuned for deployment in other regions, a practice often challenged by regional data-protection regulations.
Models (HuggingFace): BERT.base, BERT.large, Llama2-7B, Llama2-13B
Tasks: Relation Prediction, Entity Classification
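These backbones are fine-tuned with parameter-efficient adapters (note `peft` in the requirements below). A minimal sketch of attaching LoRA to a BERT encoder, assuming the standard HuggingFace checkpoint name and generic LoRA hyperparameters (the exact configuration used by the training scripts may differ):

```python
from transformers import AutoModel, AutoTokenizer
from peft import LoraConfig, get_peft_model

# "bert-base-uncased" is illustrative; see the training scripts for the exact checkpoints.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Attach LoRA adapters so only a small set of weights is trained (and clipped under DP).
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["query", "value"], lora_dropout=0.1)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
```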
Requirements (other versions may work but are untested):
- python >= 3.11
- pytorch >= 2.1.2
- transformers >= 4.23.0
- peft == 0.10.0
- opacus == 1.4.1
- Update conda:
conda update -n base -c defaults conda
- Install basic dependencies to the virtual environment and activate it:
conda env create -f environment_pyvacy.yml
conda activate pyvacy
- Alternatively, create and activate a fresh environment, then install the dependencies manually:
conda create -n pyvacy python=3.11
conda activate pyvacy
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
pip install transformers==4.39.3
pip install sentencepiece
pip install peft
pip install datasets
pip install evaluate
pip install opacus==1.4.1
pip install wandb
pip install pandas
pip install scikit-learn
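To sanity-check the installation, the pinned libraries can be imported and their versions printed (a quick check we suggest, not a step from the original setup):
python -c "import torch, transformers, peft, opacus; print(torch.__version__, transformers.__version__, peft.__version__, opacus.__version__)"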
- To use Jupyter Notebook, register the environment as a kernel:
conda install ipykernel
ipython kernel install --user --name=pyvacy
Non-private fine-tuning on relational data:
bash lp_train_galm.sh $GPU_ID $DATASET $BATCH_SIZE $MODEL
bash lp_train_galm.sh 0 sports 128 large
Fine-tuning on relational data with DP:
bash lp_train_pvgalm.sh $GPU_ID $DATASET $EPSILON $NOISE_SCALE $CLIPPING_NORM $BATCH_SIZE
bash lp_train_pvgalm.sh 0 mag_us -1 0.433 1 64
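Under the hood, a DP run wires the model, optimizer, and relation loader through Opacus. Below is a minimal sketch of that wiring with the hyperparameters above, using a toy model as a stand-in for the PEFT-wrapped LM; reading `$EPSILON = -1` as "derive ε from the noise scale via the accountant rather than targeting a fixed ε" is our assumption, not something the scripts document:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

# Toy stand-ins for the PEFT-wrapped LM and the relation dataset.
model = torch.nn.Linear(16, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data = TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,)))
train_loader = DataLoader(data, batch_size=64)

privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    noise_multiplier=0.433,  # $NOISE_SCALE
    max_grad_norm=1.0,       # $CLIPPING_NORM
)
# After training, query the accountant for the privacy spent at a chosen delta.
epsilon = privacy_engine.get_epsilon(delta=1e-5)
```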
Few-shot relation prediction:
bash lp_train_pvgalm_fewshot.sh $GPU_ID $DATASET $BATCH_SIZE
bash lp_train_pvgalm_fewshot.sh 0 cloth 16
Few-shot entity classification:
bash nc_train_galm.sh $GPU_ID $DATASET $BATCH_SIZE
bash nc_train_galm.sh 0 mag_cn 128
Please follow the description in Appendix C of the paper to obtain the MAG and AMAZ datasets.
Please cite our paper if you find this work useful.
@inproceedings{yin2024privately,
  title={Privately Learning from Graphs with Applications in Fine-tuning Large Language Models},
  author={Haoteng Yin and Rongzhe Wei and Eli Chien and Pan Li},
  booktitle={Statistical Foundations of LLMs and Foundation Models (NeurIPS 2024 Workshop)},
  year={2024}
}