Code for the SIGIR 2022 short paper "Relation Extraction as Open-book Examination: Retrieval-enhanced Prompt Tuning"
- Our method is related to Decoupling Knowledge from Memorization: Retrieval-augmented Prompt Learning that has been accepted by NeurIPS2022. The project address is visible at RetroPrompt.
- Our method is based on KnowPrompt: Knowledge-aware Prompt-tuning with Synergistic Optimization for Relation Extraction that has been accepted by WWW2022. The project address is visible at KnowPrompt.
- Our paper is also related to PTR: Prompt Tuning with Rules for Text Classification.
The inference procedure of our RetrievalRE.
Step1 Download the basic code
git clone --depth 1 https://github.com/zjunlp/PromptKG.gitStep2 Create a virtual environment using Anaconda and enter it.
conda create -n retrievalre python=3.8
conda activate retrievalreStep3 Enter the task directory
cd PrompKG/research/RetrievalRE
pip install -r requirements.txtTo evaluate RetrieveRE, we conduct experiments on 5 RE datasets, include SEMEVAL, DialogRE, TACRED-Revisit, Re-TACRE and TACRED. You can download them via this link.
The expected structure of files is:
RetrieveRE
|-- dataset
| |-- semeval
| | |-- train.txt
| | |-- dev.txt
| | |-- test.txt
| | |-- temp.txt
| | |-- rel2id.json
| |-- dialogue
| | |-- train.json
| | |-- dev.json
| | |-- test.json
| | |-- rel2id.json
| |-- tacred
| | |-- ...
| |-- tacrev
| | |-- ...
| |-- retacred
| | |-- ...
|-- scripts
| |-- knn_scripts
| | |-- get_knn_semeval.sh
| | |-- combine_knn_semeval.sh
| |-- semeval.sh
| |-- dialogue.sh
| |-- ...
|-- data
|-- lit_models
|-- models
|-- get_label_word.py
|-- main.py
Use the comand below to get the answer words to use in the training.
python get_label_word.py --model_name_or_path bert-large-uncased --dataset_name semevalThe {answer_words}.ptwill be saved in the dataset, you need to assign the model_name_or_path and dataset_name in the get_label_word.py.
Download the data first, and put it to dataset folder. Run the comand below, and get the few shot dataset.
python generate_k_shot.py --data_dir ./dataset --k 8 --dataset semeval
cd dataset
cd semeval
cp rel2id.json val.txt test.txt ./k-shot/8-1You need to modify the k and dataset to assign k-shot and dataset. Here we default seed as 1,2,3,4,5 to split each k-shot, you can revise it in the generate_k_shot.py
There are three steps to train RetrieveRE model:
- Train the re model with robert-base.
- Use the trained re model to get knn data.
- Inference with the trained re model and knn data.
Train the Retrieve model on SEMEVAL with the following command:
bash scripts/semeval.sh
bash scripts/knn_scripts/get_knn_semeval.sh
bash scripts/knn_scripts/combine_knn_semeval.shAs the scripts for TACRED-Revist, Re-TACRED, Wiki80 included in our paper are also provided, you just need to run it like above example.
As the data format of DialogRE is very different from other dataset, Class of processor is also different. Train the RetieveRE model on DialogRE with the following command:
bash scripts/dialogue.sh
bash scripts/knn_scripts/get_knn_dialogue.sh
bash scripts/knn_scripts/combine_knn_dialogue.shIf you use or extend our work, please cite the paper as follows:
@inproceedings{DBLP:conf/sigir/ChenLZTHSC22,
author = {Xiang Chen and
Lei Li and
Ningyu Zhang and
Chuanqi Tan and
Fei Huang and
Luo Si and
Huajun Chen},
editor = {Enrique Amig{\'{o}} and
Pablo Castells and
Julio Gonzalo and
Ben Carterette and
J. Shane Culpepper and
Gabriella Kazai},
title = {Relation Extraction as Open-book Examination: Retrieval-enhanced Prompt
Tuning},
booktitle = {{SIGIR} '22: The 45th International {ACM} {SIGIR} Conference on Research
and Development in Information Retrieval, Madrid, Spain, July 11 -
15, 2022},
pages = {2443--2448},
publisher = {{ACM}},
year = {2022},
url = {https://doi.org/10.1145/3477495.3531746},
doi = {10.1145/3477495.3531746},
timestamp = {Mon, 11 Jul 2022 12:19:25 +0200},
biburl = {https://dblp.org/rec/conf/sigir/ChenLZTHSC22.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}