This repository contains code which accompanies the paper Capacity and Bias of Learned Geometric Embeddings for Directed Graphs (Boratko et al. 2021).
Code for the following papers will also be added shortly:
- Modeling Transitivity and Cyclicity in Directed Graphs via Binary Code Box Embeddings (Zhang et al. 2022)
- Learning Representations for Hierarchies with Minimal Support (Rozonoyer et al. 2024)
This code includes implementations of many geometric embedding methods:
- Vector Similarity and Distance
- Bilinear Vector Model (Nickel et al. 2011)
- ComplEx Embeddings (Trouillon et al. 2016)
- Order Embeddings (Vendrov et al. 2015) and Probabilistic Order Embeddings (Lai and Hockenmaier 2017)
- Hyperbolic Embeddings, including:
  - "Lorentzian" - uses the squared Lorentzian distance on the Hyperboloid as in (Law et al. 2019); trains on the graph as undirected, but uses the asymmetric score function from (Nickel and Kiela 2017) to determine edge direction at inference
  - "Lorentzian Score" - uses the asymmetric score above directly in the training loss
  - "Lorentzian Distance" - a hyperbolic model for directed graphs, as described in section 2.3 of (Boratko et al. 2021)
- Hyperbolic Entailment Cones (Ganea et al. 2018)
- Gumbel Box Embeddings (Dasgupta et al. 2020)
- t-Box model as described in section 3 of (Boratko et al. 2021)
It also provides a general-purpose pipeline to explore the correlation between graph characteristics and how well each model can learn them.
This repository makes use of submodules; to clone them, use the `--recurse-submodules` flag, e.g.

```bash
git clone <repo-url> --recurse-submodules
```
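If you already cloned the repository without that flag, the standard Git command below pulls in the submodules after the fact:

```bash
# Fetch and initialize any missing submodules in an existing clone
git submodule update --init --recursive
```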
After cloning the repo, you should create an environment and install PyTorch. For example,

```bash
conda create -n graph-modeling python=3.8
conda activate graph-modeling
conda install -c pytorch cudatoolkit=11.3 pytorch
```
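As an optional sanity check (assuming PyTorch was installed into the active environment and you are on a CUDA-capable machine), you can confirm the install before proceeding:

```bash
# Prints the installed PyTorch version and whether CUDA is available
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```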
You can then run `make all` to install the remaining modules and their dependencies. Note:
- This will install Python modules, so you should run this command with the virtual environment created above activated.
- Certain graph generation methods (Kronecker and Price Network) require additional dependencies to be compiled. In particular, Price requires that you use conda. If you are not interested in generating Kronecker or Price graphs, you can skip these by running `make base` instead of `make all`, as shown below.
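For example, run one of the following from the repository root with your environment activated; which one you choose depends on whether you need the Kronecker/Price generators:

```bash
# Full install, including the compiled dependencies for Kronecker and Price graph generation
make all

# Lighter install that skips those compiled dependencies
make base
```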
This module provides a command line interface available with `graph_modeling`. Run `graph_modeling --help` to see available options.
To generate a graph, run `graph_modeling generate <graph_type>`, e.g. `graph_modeling generate scale-free-network`.

`graph_modeling generate --help` provides a list of available graphs that can be generated, and `graph_modeling generate <graph_type> --help` provides a list of parameters for generation.
By default, graphs will be output in `data/graphs`, using a subfolder for their graph type and parameter settings. You can override this with the `--outdir` parameter.
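Putting this together, a typical generation run might look like the sketch below; the graph type is the one mentioned above, and the `--outdir` value is only an illustration of the override:

```bash
# List available graph types, then the parameters for a specific type
graph_modeling generate --help
graph_modeling generate scale-free-network --help

# Generate a scale-free network, overriding the default output location (data/graphs)
graph_modeling generate scale-free-network --outdir data/my-graphs
```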
You can train graph representations using the `graph_modeling train` command; run `graph_modeling train --help` to see the available options. The only required parameter is `--data_path`, which specifies either a specific graph file or a folder, in which case a graph from the folder is chosen uniformly at random. The `--model` option selects among the different embedding models. Most other options apply to every model (e.g. `--dim`) or to training in general (e.g. `--log_batch_size`). Model-specific options are prefaced with the model name (e.g. `--box_intersection_temp`). Please see the help text for more details, and submit an issue if anything is unclear.
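As a rough sketch (the placeholders and flag values here are assumptions, not recommendations; check `graph_modeling train --help` for the actual model names and defaults), a training run might look like:

```bash
# <path-to-graph-or-folder> and <model_name> are placeholders;
# the --dim and --log_batch_size values are purely illustrative
graph_modeling train --data_path <path-to-graph-or-folder> --model <model_name> --dim 64 --log_batch_size 9
```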
If you found the code contained in this repository helpful in your research, please cite the following paper:
```bibtex
@inproceedings{boratko2021capacity,
  title={Capacity and Bias of Learned Geometric Embeddings for Directed Graphs},
  author={Boratko, Michael and Zhang, Dongxu and Monath, Nicholas and Vilnis, Luke and Clarkson, Kenneth L and McCallum, Andrew},
  booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
  year={2021}
}
```