This repository supports our paper "Rethinking the Effectiveness of Graph Classification Datasets in Benchmarks for Assessing GNNs," accepted at IJCAI 2024. You can access the paper and its BibTeX entry here.
If any code or the datasets are useful in your research, please cite the following paper:
    @inproceedings{ijcai2024p237,
      title     = {Rethinking the Effectiveness of Graph Classification Datasets in Benchmarks for Assessing GNNs},
      author    = {Li, Zhengdao and Cao, Yong and Shuai, Kefan and Miao, Yiming and Hwang, Kai},
      booktitle = {Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, {IJCAI-24}},
      publisher = {International Joint Conferences on Artificial Intelligence Organization},
      editor    = {Kate Larson},
      pages     = {2144--2152},
      year      = {2024},
      month     = {8},
      note      = {Main Track},
      doi       = {10.24963/ijcai.2024/237},
      url       = {https://doi.org/10.24963/ijcai.2024/237},
    }
- Create a directory for the data:
  `mkdir DATA`
- Navigate to the `gnn_comparison` directory:
  `cd gnn_comparison/`
- Set the dataset name in `gnn_comparison/run_real_experiment.sh`, then execute:
  `bash run_real_experiment.sh`
  This script automatically downloads the dataset into the `./DATA` directory and runs the benchmark. If you only need to download the dataset, run:
  `bash prepare_experiment.sh`
- NOTE: Some bash parameters need to be set in `run_real_experiment.sh`, such as `dats` (dataset names) and `model_set` (models to run, each corresponding to a run config file `*.yml`). You can run experiments in parallel by setting `dats` and `model_set` to multiple values, but we suggest running them one at a time to avoid memory issues. A sketch of this parameter block is shown below.
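The exact contents of `run_real_experiment.sh` are not reproduced in this README, so the following is only a minimal sketch of what the parameter block might look like; the variable names `dats` and `model_set` come from the note above, while the example values and layout are assumptions to adapt to the actual script.

```bash
# Hypothetical parameter block for gnn_comparison/run_real_experiment.sh.
# Example values are placeholders -- replace them with the dataset names and
# model/config identifiers you actually want to benchmark.
dats="PROTEINS"   # dataset name(s); listing several values runs them in parallel
model_set="GIN"   # model(s) to run; each value corresponds to a *.yml run config

# With the values above, `bash run_real_experiment.sh` downloads the dataset
# into ./DATA and runs the benchmark for that single dataset/model pair.
```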
- All configuration files are located in the `gnn_comparison/` directory (`gnn_comparison/*.yml`).
- Specify parameters such as the model name, batch size, learning rate, feature types, etc., in the `gnn_comparison/*.yml` files.
- For simplicity, we have separated each main configuration into its own config file, such as `config_GIN_attr.yml`, `config_GCN_degree.yml`, etc. (see the listing sketch below).
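A quick way to see which configurations are available and what they expose; only the two file names above are taken from this README, and your checkout may contain more:

```bash
# List the per-model / per-feature config files shipped in gnn_comparison/.
ls gnn_comparison/config_*.yml

# Inspect the tunable parameters (model name, batch size, learning rate,
# feature type, ...) of one config before running it.
cat gnn_comparison/config_GIN_attr.yml
```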
- Specify the config file name in `run_real_experiment.sh` for real-world datasets or in `run_syn_experiment.sh` for synthetic datasets.
- Set the config file parameters in `config_Baseline_[xxxx].yml`, `config_GIN_[xxxx].yml`, `config_GCN_[xxxx].yml`, etc. Check the details in `gnn_comparison/*.yml`.
- Run the benchmark:
  `bash gnn_comparison/run_real_experiment.sh` or `bash gnn_comparison/run_syn_experiment.sh`
- NOTE: All log and result locations are specified in `run_real_experiment.sh` and `run_syn_experiment.sh`. The results are saved in the `./results/` folder for further performance analysis. The folder name is used when extracting statistics in `plot_performance_gaps.ipynb` and `plot_statistics.ipynb`.
- The results presented in the paper were generated with `plot_performance_gaps.ipynb`.
- Some statistics of the datasets can be found in `plot_statistics.ipynb`.
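If you prefer to execute these notebooks from the command line rather than interactively (this is not part of the documented workflow and assumes Jupyter is installed), `nbconvert` can run them headlessly:

```bash
# Execute the analysis notebooks in place once ./results/ has been populated.
jupyter nbconvert --to notebook --execute --inplace plot_performance_gaps.ipynb
jupyter nbconvert --to notebook --execute --inplace plot_statistics.ipynb
```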
- All kernels are implemented in `kernel_baseline.ipynb`.
- For parallel processing, run:
  `bash run_kernel_baseline.sh`
- Generate the datasets using `generate_regression_datasets.py`.
- Run the regression analysis in `regressor.ipynb`.
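A possible end-to-end invocation for this step (the script's command-line arguments, if any, are not documented here, so treat this as a sketch and check the script itself first):

```bash
# Build the regression datasets, then execute the analysis notebook headlessly.
python generate_regression_datasets.py
jupyter nbconvert --to notebook --execute --inplace regressor.ipynb
```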
For further details, please refer to our paper here.