## Prompt Tuning for Generative Multimodal Pretrained Models
### Overview
This is the code for **"Prompt Tuning for Generative Multimodal Pretrained Models"** ([paper on arXiv](https://arxiv.org/abs/2208.02532)). The paper explores prompt tuning for generative multimodal pretrained models rather than contrastive learning models. We focus specifically on the unified sequence-to-sequence learning framework and implement prompt tuning on our OFA models.
### Requirements
* python 3.7.4
* pytorch 1.8.1
* torchvision 0.9.1
* Java 1.8 (for COCO evaluation)
### Installation
```bash
pip install -r requirements.txt
```
### Datasets and Checkpoints
See [datasets.md](datasets.md) and [checkpoints.md](checkpoints.md).
### Training
We provide a demo script (`run_scripts/refcoco/train_refcoco_prefix.sh`) that contains everything needed for training.
```sh
sh ./run_scripts/refcoco/train_refcoco_prefix.sh
```
A few options of note:
* `--encoder-prompt` :: whether to insert prompts into the encoder
* `--decoder-prompt` :: whether to insert prompts into the decoder
* `--encoder-prompt-length` :: length of the encoder prompt
* `--decoder-prompt-length` :: length of the decoder prompt
* `--bitfit` :: whether to use BitFit
* `--adapter` :: whether to use adapters
* `--adapter-dim` :: adapter projection dimension
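
As a rough illustration of how the prompt options above might be combined, the sketch below assembles an argument fragment and prints it. The prompt lengths are hypothetical values, and treating `--encoder-prompt` and `--decoder-prompt` as switches that take no value is an assumption; check the demo script for the exact form it uses.

```sh
# Hypothetical values: these prompt lengths are illustrative, not recommendations.
encoder_prompt_length=100
decoder_prompt_length=100

# Assumption: --encoder-prompt and --decoder-prompt are on/off switches without a value.
prompt_args="--encoder-prompt --encoder-prompt-length=${encoder_prompt_length}"
prompt_args="${prompt_args} --decoder-prompt --decoder-prompt-length=${decoder_prompt_length}"

# Print the fragment so it can be compared with the arguments in the demo script.
echo "${prompt_args}"
```

The fragment is only printed here; to use it, compare it against the training command in `run_scripts/refcoco/train_refcoco_prefix.sh` and adjust that script accordingly.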
We recommend organizing your workspace directory like this:
```
OFA/
├── checkpoints/
│   ├── ofa_base.pt
│   ├── ofa_large.pt
│   └── ...
├── criterions/
├── data/
├── dataset/
│   ├── caption_data/
│   ├── refcoco_data/
│   └── ...
├── fairseq/
├── models/
├── run_scripts/
├── tasks/
├── train.py
├── trainer.py
└── utils/
```
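
A quick way to compare a workspace against the recommended layout is a small shell check like the sketch below. The function name `check_layout` and the `OFA_ROOT` variable are hypothetical helpers introduced for illustration; they are not part of the repository. Only the top-level entries named in the tree are checked.

```sh
# check_layout ROOT: print each expected top-level entry that is missing
# under ROOT and return non-zero if any entry is missing.
# (Hypothetical helper, not part of the repository.)
check_layout() {
    root="$1"
    missing=0
    for entry in checkpoints criterions data dataset fairseq models \
                 run_scripts tasks utils train.py trainer.py; do
        if [ ! -e "${root}/${entry}" ]; then
            echo "missing: ${root}/${entry}"
            missing=1
        fi
    done
    return "${missing}"
}

# Check ./OFA by default; set OFA_ROOT to point at a different location.
check_layout "${OFA_ROOT:-OFA}" || echo "workspace layout is incomplete"
```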