Rethinking the Effectiveness of Graph Classification Datasets in Benchmarks for Assessing GNNs

This repository supports our paper "Rethinking the Effectiveness of Graph Classification Datasets in Benchmarks for Assessing GNNs", accepted at IJCAI 2024 (main track). The paper is available at https://doi.org/10.24963/ijcai.2024/237; the BibTeX entry is given below.

If this code or the datasets are useful in your research, please cite the following paper:

@inproceedings{ijcai2024p237,
  title     = {Rethinking the Effectiveness of Graph Classification Datasets in Benchmarks for Assessing GNNs},
  author    = {Li, Zhengdao and Cao, Yong and Shuai, Kefan and Miao, Yiming and Hwang, Kai},
  booktitle = {Proceedings of the Thirty-Third International Joint Conference on
               Artificial Intelligence, {IJCAI-24}},
  publisher = {International Joint Conferences on Artificial Intelligence Organization},
  editor    = {Kate Larson},
  pages     = {2144--2152},
  year      = {2024},
  month     = {8},
  note      = {Main Track},
  doi       = {10.24963/ijcai.2024/237},
  url       = {https://doi.org/10.24963/ijcai.2024/237},
}

Prepare Datasets

  1. Create a directory for data:
    mkdir DATA
  2. Navigate to the gnn_comparison directory:
    cd gnn_comparison/
  3. Set the dataset name in gnn_comparison/run_real_experiment.sh, then execute:
    bash run_real_experiment.sh
    This script will automatically download the dataset into the ./DATA directory and run the benchmark. If you only need to download the dataset, run:
    bash prepare_experiment.sh
  4. NOTE: Several bash variables must be set in run_real_experiment.sh, such as dats (dataset names) and model_set (models to run, each corresponding to a run config file *.yml). You can run experiments in parallel by giving dats and model_set multiple values, but we suggest running them one by one to avoid memory issues; see the sketch below.
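
A minimal sketch of what those settings might look like inside run_real_experiment.sh (the variable syntax and the example dataset/model names are assumptions for illustration; check the script for the actual names and accepted values):

    # Hedged sketch of the bash variables mentioned above; the script defines
    # its own names and supported values.
    dats="PROTEINS"       # dataset name(s) to download and benchmark (illustrative value)
    model_set="GIN"       # model(s) to run; each maps to a *.yml config in gnn_comparison/
    # Setting one dataset/model pair at a time avoids memory pressure.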

Configuration of Benchmark

All configuration files are located in the gnn_comparison/ directory (gnn_comparison/*.yml).

  • Specify parameters including model name, batch size, learning rate, feature types, etc., in the gnn_comparison/*.yml files.
  • For simplicity, each main configuration is kept in a separate config file, such as config_GIN_attr.yml, config_GCN_degree.yml, etc.
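
For example, to see which per-model configs exist and what a given one sets (only the file naming pattern comes from this README; the exact keys inside each file should be checked directly):

    # List the available per-model / per-feature-type configs
    ls gnn_comparison/*.yml
    # Inspect one config before editing it; per the description above it covers
    # the model name, batch size, learning rate, feature type, etc.
    cat gnn_comparison/config_GIN_attr.yml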

Run Benchmark

  1. Specify the config file name in run_real_experiment.sh for real-world datasets or run_syn_experiment.sh for synthetic datasets.
  2. Set config file parameters in config_Baseline_[xxxx].yml, config_GIN_[xxxx].yml, config_GCN_[xxxx].yml, etc. Check the details in gnn_comparison/*.yml.
  3. Run the benchmark:
    bash gnn_comparison/run_real_experiment.sh
    or
    bash gnn_comparison/run_syn_experiment.sh
  4. NOTE: All log and result locations are specified in run_real_experiment.sh and run_syn_experiment.sh. The results are saved in the ./results/ folder for further performance analysis; the result folder name is then used to extract statistics in plot_performance_gaps.ipynb and plot_statistics.ipynb. See the sketch below.
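
A hedged end-to-end example from the repository root (the exact result sub-folder layout is an assumption; the scripts themselves define the actual log and output paths):

    # Real-world datasets
    bash gnn_comparison/run_real_experiment.sh
    # Synthetic datasets
    bash gnn_comparison/run_syn_experiment.sh
    # Inspect the result folders consumed by the plotting notebooks
    ls ./results/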

Generate Performance Gaps and Benchmark Effectiveness Results

  • The results presented in the paper were generated in plot_performance_gaps.ipynb.
  • Some statistics of datasets can be found in plot_statistics.ipynb.
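
To reproduce the analysis locally (assuming a Jupyter installation; this is not part of the repository's scripts):

    jupyter notebook plot_performance_gaps.ipynb   # performance-gap / effectiveness figures
    jupyter notebook plot_statistics.ipynb         # dataset statistics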

Graph Kernel Baselines

  • All kernels are implemented in kernel_baseline.ipynb.
  • For parallel processing, run:
    bash run_kernel_baseline.sh
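
For a long run it can help to detach the kernel baseline job and keep a log (the log file below is an assumption, not something the script creates itself):

    nohup bash run_kernel_baseline.sh > kernel_baseline.log 2>&1 &
    tail -f kernel_baseline.log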

Regression

  • Generate datasets using generate_regression_datasets.py.
  • Run regression analysis in regressor.ipynb.
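
A hedged usage sketch (the generator's command-line arguments, if any, are unknown; inspect the script before running):

    python generate_regression_datasets.py
    jupyter notebook regressor.ipynb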

For further details, please refer to our paper at https://doi.org/10.24963/ijcai.2024/237.
