We support two built-in multi-hop knowledge graph question answering (KGQA) datasets:

- `webqsp`
- `cwq`
We first pre-compute and cache entity and relation embeddings for all samples to save time during later retriever training and inference.

We use `gte-large-en-v1.5` as the text encoder, hence the environment name.
```bash
conda create -n gte_large_en_v1-5 python=3.10 -y
conda activate gte_large_en_v1-5
pip install -r requirements/gte_large_en_v1-5.txt
pip install -U xformers --index-url https://download.pytorch.org/whl/cu121
```
```bash
python emb.py -d D
```

where `D` should be a dataset listed in "Supported Datasets".
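For intuition, here is a minimal sketch of what this pre-computation step amounts to, assuming the encoder is loaded through `sentence-transformers` as the public `Alibaba-NLP/gte-large-en-v1.5` checkpoint; the helper function and cache paths below are illustrative assumptions, not the actual interface of `emb.py`.

```python
# Illustrative sketch only: encode entity/relation texts once and cache them.
# The model ID is the public gte-large-en-v1.5 checkpoint; everything else
# (function name, cache layout, text sources) is a hypothetical example.
import torch
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("Alibaba-NLP/gte-large-en-v1.5",
                              trust_remote_code=True)

def cache_embeddings(texts, out_path):
    # Batch-encode and save as one tensor; later stages can simply torch.load it.
    emb = encoder.encode(texts, batch_size=256, convert_to_tensor=True,
                         show_progress_bar=True)
    torch.save(emb.cpu(), out_path)

# entity_texts / relation_texts would come from the dataset loader, e.g.:
# cache_embeddings(entity_texts, "cache/webqsp_entity_emb.pth")
# cache_embeddings(relation_texts, "cache/webqsp_relation_emb.pth")
```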
We now train a retriever, employ it for retrieval (inference), and evaluate the retrieval results.
```bash
conda create -n retriever python=3.10 -y
conda activate retriever
pip install -r requirements/retriever.txt
pip install torch==2.1.0 --index-url https://download.pytorch.org/whl/cu121
pip install torch_geometric==2.5.3
pip install pyg_lib==0.3.1 torch_scatter==2.1.2 torch_sparse==0.6.18 -f https://data.pyg.org/whl/torch-2.1.0+cu121.html
```
```bash
python train.py -d D
```

where `D` should be a dataset listed in "Supported Datasets".
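To give a sense of what the retriever learns, below is a heavily simplified sketch of a GNN-based node-scoring retriever built with `torch_geometric`. The actual architecture, loss, and data fields used by `train.py` will differ; every name here (`TinyRetriever`, `data.y`, the hidden size) is an assumption for illustration only.

```python
# Simplified sketch, not the real model in train.py: score each subgraph node
# for relevance to the question using two GCN layers over cached embeddings.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class TinyRetriever(torch.nn.Module):
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hid_dim)
        self.conv2 = GCNConv(hid_dim, hid_dim)
        self.scorer = torch.nn.Linear(hid_dim, 1)

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        h = self.conv2(h, edge_index)
        return self.scorer(h).squeeze(-1)  # one relevance score per node

model = TinyRetriever(in_dim=1024, hid_dim=256)  # 1024 = gte-large-en-v1.5 dim
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(data):
    # Assumed fields: data.x = cached node embeddings, data.edge_index = subgraph
    # edges, data.y = binary labels for nodes on gold reasoning paths.
    model.train()
    opt.zero_grad()
    scores = model(data.x, data.edge_index)
    loss = F.binary_cross_entropy_with_logits(scores, data.y.float())
    loss.backward()
    opt.step()
    return loss.item()
```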
For logged learning curves, check the corresponding wandb run.
Once training finishes, a folder of the form `{dataset}_{time}` (e.g., `webqsp_Nov08-01:14:47/`) will be created in the current directory; it stores the trained model checkpoint `cpt.pth`.
```bash
python inference.py -p P
```

where `P` is the path to a saved model checkpoint. The predicted retrieval result will be stored in the same folder as the model checkpoint. For example, if `P` is `webqsp_Nov08-01:14:47/cpt.pth`, then the retrieval result will be saved as `webqsp_Nov08-01:14:47/retrieval_result.pth`.
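Both files are standard PyTorch serializations, so they can be inspected with `torch.load` if needed; the exact contents depend on the training and inference code, and the snippet below only peeks at whatever happens to be stored.

```python
# Quick inspection of the saved artifacts; the structure of these objects is
# determined by train.py / inference.py, so we only print generic info.
import torch

ckpt = torch.load("webqsp_Nov08-01:14:47/cpt.pth", map_location="cpu")
result = torch.load("webqsp_Nov08-01:14:47/retrieval_result.pth", map_location="cpu")

print(type(ckpt), type(result))
if isinstance(result, dict):
    print(list(result.keys())[:5])  # peek at the first few entries
```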
```bash
python eval.py -d D -p P
```

where `D` should be a dataset listed in "Supported Datasets" and `P` is the path to the inference result, e.g., `webqsp_Nov08-01:14:47/retrieval_result.pth`.
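As a rough illustration of what retrieval evaluation measures, the sketch below computes answer coverage, i.e., the fraction of questions whose retrieved entities contain a gold answer. The data format here (question IDs mapping to entity sets) is a hypothetical assumption; `eval.py` defines the actual metrics and result format.

```python
# Toy answer-coverage metric over a hypothetical retrieval-result format.
def answer_recall(retrieved, gold_answers):
    # A question counts as a hit if any gold answer appears in its retrieved set.
    hits = sum(1 for qid, answers in gold_answers.items()
               if answers & retrieved.get(qid, set()))
    return hits / max(len(gold_answers), 1)

retrieved = {"q1": {"e_42", "e_7"}, "q2": {"e_3"}}
gold = {"q1": {"e_7"}, "q2": {"e_99"}}
print(answer_recall(retrieved, gold))  # 0.5
```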