Source Code and Datasets for "Datasets, Tasks, and Training Methods for Benchmarking Large-scale Hypergraph Learning."

Publication Information

  • Title: Datasets, Tasks, and Training Methods for Benchmarking Large-scale Hypergraph Learning.
  • Authors: Sunwoo Kim, Dongjin Lee, Yul Kim, Jungho Lee, Taeho Hwang, and Kijung Shin.
  • Venue: Data Mining and Knowledge Discovery 2023. (+ Journal Track of ECML-PKDD 2023)

Dataset Description

We provide hypergraph datasets at the links below.

Specifically, we provide:

  • Each dataset's node features, node labels, original hyperedge information, and split hyperedge information (for Task 1)
  • Each dataset's partitioned hypergraph (number of partitions $|P|$: DBLP 4, Trivago 32, OGBN_MAG 128, AMINER and MAG 256)
  • For DBLP, Trivago, and OGBN_MAG, we also provide the P-IOS-partitioned hypergraph.

Refer to the README_DATA.txt file for more details regarding the datasets.

Code Description

Overview

In this repository, we provide source code for

  • Obtaining performance on the proposed Task 1 (hyperedge disambiguation)
  • Obtaining performance on the proposed Task 2 (local clustering)

Dataset

To run the code on a dataset, the files of the corresponding dataset from the links above should be placed in the src/data folder. For example:

src
  |_ data
      |_ aminer
            |_ aminer_X_vec.pt
            |_ aminer_y.pt
            |_ aminer_E.pt
            |_ ...
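
As a rough illustration (not part of the provided code), the .pt files can be inspected with torch.load, assuming they are standard PyTorch objects saved via torch.save:

import torch

# Paths follow the example layout above; adjust the dataset name as needed.
X = torch.load("src/data/aminer/aminer_X_vec.pt")  # node feature matrix
y = torch.load("src/data/aminer/aminer_y.pt")      # node labels
E = torch.load("src/data/aminer/aminer_E.pt")      # hyperedge information

print(X.shape, y.shape, type(E))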

Hyperparameters

We provide the hyperparameter combinations used to reproduce the experimental results.
Refer to the best_hyperparameter directory, where the best hyperparameters for each dataset-model combination are saved as .json files.
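
A loaded file can then be used to set the command-line arguments described below; a minimal sketch, assuming a hypothetical filename such as best_hyperparameter/dblp_HCL.json:

import json

# Hypothetical filename; use the actual file names in the best_hyperparameter directory.
with open("best_hyperparameter/dblp_HCL.json") as f:
    best_hp = json.load(f)

print(best_hp)  # e.g., a dictionary of argument values such as the learning rate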

How to run the code

One can run the code in the src folder as follows.
For Task 1 (hyperedge disambiguation):

python experiment1 -data "data" -model "learning-method" -device "GPU-device" -lr 0.001 -seed 0 

For Task 2 (local clustering):

python experiment2 -data "data" -model "learning-method" -device "GPU-device" -lr 0.001 -seed 0 

For contrastive learning methods, additional arguments can be given:

python experiment1 -data "data" -model "learning-method" -device "GPU-device" -lr 0.001 -seed 0 -n_neg 1 -d_rate 0.3 -ep 25

The arguments are listed below; a full example command follows the list.

  • -data : Dataset on which to run the experiment. (Possible options: "dblp", "trivago", "ogbn_mag", "aminer", "mag")
  • -model : Training method to use. (Possible options: "mlp", "full_ssl", "full_cl", "part_ssl", "HCL", "HCL_PINS", "HCL_PIOS")
  • -device : GPU device. (e.g., "cuda:0")
  • -lr : Learning rate. (float / e.g., 0.001)
  • -n_neg : Applicable to CL methods. Number of negative samples used for CL training. (int / e.g., 2)
  • -d_rate : Applicable to CL methods. Feature & incidence matrix dropping probability. (float / e.g., 0.3)
  • -ep : Applicable to CL methods. Checkpoint epoch of the contrastive encoder used for contrastive training. (int / e.g., 25)
  • -seed : Dataset & model initialization seed. (int / e.g., 0)
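
Putting these together, a full contrastive-learning run on DBLP for Task 1 might look like the following (the values are illustrative, not necessarily the best hyperparameters):

python experiment1 -data "dblp" -model "HCL" -device "cuda:0" -lr 0.001 -seed 0 -n_neg 1 -d_rate 0.3 -ep 25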
