We support two built-in multi-hop knowledge graph question answering (KGQA) datasets:

- `webqsp`
- `cwq`
We first pre-compute and cache entity and relation embeddings for all samples to save time during later retriever training and inference.

We use `gte-large-en-v1.5` as the text encoder, hence the environment name.
```bash
conda create -n gte_large_en_v1-5 python=3.10 -y
conda activate gte_large_en_v1-5
pip install -r requirements/gte_large_en_v1-5.txt
pip install -U xformers --index-url https://download.pytorch.org/whl/cu121
```
```bash
python emb.py -d D
```

where `D` should be a dataset listed in "Supported Datasets".
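For intuition, here is a minimal sketch of what this pre-computation step amounts to, assuming the encoder is loaded through `sentence-transformers` as the public `Alibaba-NLP/gte-large-en-v1.5` checkpoint; the helper function and cache paths below are illustrative assumptions, not the actual interface of `emb.py`.

```python
# Illustrative sketch only: encode entity/relation texts once and cache them.
# The model ID is the public gte-large-en-v1.5 checkpoint; everything else
# (function name, cache layout, text sources) is a hypothetical example.
import torch
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("Alibaba-NLP/gte-large-en-v1.5",
                              trust_remote_code=True)

def cache_embeddings(texts, out_path):
    # Batch-encode and save as one tensor; later stages can simply torch.load it.
    emb = encoder.encode(texts, batch_size=256, convert_to_tensor=True,
                         show_progress_bar=True)
    torch.save(emb.cpu(), out_path)

# entity_texts / relation_texts would come from the dataset loader, e.g.:
# cache_embeddings(entity_texts, "cache/webqsp_entity_emb.pth")
# cache_embeddings(relation_texts, "cache/webqsp_relation_emb.pth")
```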
We now train a retriever, employ it for retrieval (inference), and evaluate the retrieval results.
```bash
conda create -n retriever python=3.10 -y
conda activate retriever
pip install -r requirements/retriever.txt
pip install torch==2.1.0 --index-url https://download.pytorch.org/whl/cu121
pip install torch_geometric==2.5.3
pip install pyg_lib==0.3.1 torch_scatter==2.1.2 torch_sparse==0.6.18 -f https://data.pyg.org/whl/torch-2.1.0+cu121.html
```
```bash
python train.py -d D
```

where `D` should be a dataset listed in "Supported Datasets".
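To give a sense of what the retriever learns, below is a heavily simplified sketch of a GNN-based node-scoring retriever built with `torch_geometric`. The actual architecture, loss, and data fields used by `train.py` will differ; every name here (`TinyRetriever`, `data.y`, the hidden size) is an assumption for illustration only.

```python
# Simplified sketch, not the real model in train.py: score each subgraph node
# for relevance to the question using two GCN layers over cached embeddings.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class TinyRetriever(torch.nn.Module):
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hid_dim)
        self.conv2 = GCNConv(hid_dim, hid_dim)
        self.scorer = torch.nn.Linear(hid_dim, 1)

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        h = self.conv2(h, edge_index)
        return self.scorer(h).squeeze(-1)  # one relevance score per node

model = TinyRetriever(in_dim=1024, hid_dim=256)  # 1024 = gte-large-en-v1.5 dim
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(data):
    # Assumed fields: data.x = cached node embeddings, data.edge_index = subgraph
    # edges, data.y = binary labels for nodes on gold reasoning paths.
    model.train()
    opt.zero_grad()
    scores = model(data.x, data.edge_index)
    loss = F.binary_cross_entropy_with_logits(scores, data.y.float())
    loss.backward()
    opt.step()
    return loss.item()
```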
For logged learning curves, check the corresponding wandb run.
Once training finishes, a folder of the form `{dataset}_{time}` (e.g., `webqsp_Nov08-01:14:47/`) will be created in the current directory; it stores the trained model checkpoint `cpt.pth`.
```bash
python inference.py -p P
```

where `P` is the path to a saved model checkpoint. The predicted retrieval result will be stored in the same folder as the model checkpoint. For example, if `P` is `webqsp_Nov08-01:14:47/cpt.pth`, then the retrieval result will be saved as `webqsp_Nov08-01:14:47/retrieval_result.pth`.
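Both files are standard PyTorch serializations, so they can be inspected with `torch.load` if needed; the exact contents depend on the training and inference code, and the snippet below only peeks at whatever happens to be stored.

```python
# Quick inspection of the saved artifacts; the structure of these objects is
# determined by train.py / inference.py, so we only print generic info.
import torch

ckpt = torch.load("webqsp_Nov08-01:14:47/cpt.pth", map_location="cpu")
result = torch.load("webqsp_Nov08-01:14:47/retrieval_result.pth", map_location="cpu")

print(type(ckpt), type(result))
if isinstance(result, dict):
    print(list(result.keys())[:5])  # peek at the first few entries
```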
```bash
python eval.py -d D -p P
```

where `D` should be a dataset listed in "Supported Datasets" and `P` is the path to the inference result, e.g., `webqsp_Nov08-01:14:47/retrieval_result.pth`.
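As a rough illustration of what retrieval evaluation measures, the sketch below computes answer coverage, i.e., the fraction of questions whose retrieved entities contain a gold answer. The data format here (question IDs mapping to entity sets) is a hypothetical assumption; `eval.py` defines the actual metrics and result format.

```python
# Toy answer-coverage metric over a hypothetical retrieval-result format.
def answer_recall(retrieved, gold_answers):
    # A question counts as a hit if any gold answer appears in its retrieved set.
    hits = sum(1 for qid, answers in gold_answers.items()
               if answers & retrieved.get(qid, set()))
    return hits / max(len(gold_answers), 1)

retrieved = {"q1": {"e_42", "e_7"}, "q2": {"e_3"}}
gold = {"q1": {"e_7"}, "q2": {"e_99"}}
print(answer_recall(retrieved, gold))  # 0.5
```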