This repository provides reference implementations of RiWalk as described in the paper:
RiWalk: Fast Structural Node Embedding via Role Identification.
Xuewei Ma, Geng Qin, Zhiyang Qiu, Mingxin Zheng, Zhe Wang.
IEEE International Conference on Data Mining, ICDM, 2019.
The RiWalk algorithm learns continuous representations for nodes in graphs. The learned representations capture structural similarities between nodes.
RiWalk decouples the structural embedding problem into a role identification procedure and a network embedding procedure.
The following picture illustrates the key idea of RiWalk.
Two nodes a and u that reside far apart in a network have similar local topologies but entirely different context nodes. After the role identification procedure, however, they have similar contexts and are indirectly densely connected, so typical network embedding methods can be applied directly to learn structural embeddings.
This repository provides several different implementations of RiWalk:
- src/RiWalk: a Python implementation of RiWalk-SP and RiWalk-WL as described in the paper.
- src/RiWalk-RW: a Python implementation of RiWalk-RW-SP and RiWalk-RW-WL, two variations of RiWalk-SP and RiWalk-WL. Instead of exploring the local subgraph by breadth-first search, RiWalk-RW traverses a subgraph induced by random walk sequences.
- src/RiWalk-C: a C implementation of RiWalk-SP and RiWalk-RW-SP.
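The difference between the breadth-first and random-walk variants can be sketched as follows. This is only an illustration of the two ways of collecting a node's local subgraph, not the code in this repository: the function names and the parameters `max_hops`, `num_walks`, and `walk_length` are hypothetical, and the graph is assumed to be an undirected adjacency map.

```python
import random
from collections import deque


def bfs_nodes(adj, root, max_hops):
    """Collect every node within max_hops of root by breadth-first search."""
    hops = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if hops[u] == max_hops:
            continue  # do not expand beyond the hop limit
        for v in adj[u]:
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    return set(hops)


def random_walk_nodes(adj, root, num_walks, walk_length, rng):
    """Collect the nodes visited by num_walks random walks started at root."""
    visited = {root}
    for _ in range(num_walks):
        cur = root
        for _ in range(walk_length):
            neighbors = sorted(adj[cur])  # sorted for reproducibility
            if not neighbors:
                break
            cur = rng.choice(neighbors)
            visited.add(cur)
    return visited


# Toy path graph 0 - 1 - 2 - 3 - 4 (made up for illustration).
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}

local = bfs_nodes(adj, root=0, max_hops=2)
sampled = random_walk_nodes(adj, root=0, num_walks=5, walk_length=3,
                            rng=random.Random(0))
print(sorted(local), sorted(sampled))
```

The breadth-first version always visits the whole neighborhood, while the random-walk version visits a sampled subset whose size is bounded by `num_walks * walk_length`, which is the trade-off the RW variants make.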
The full list of command-line options is available with:
# RiWalk
python3 src/RiWalk/RiWalk.py --help
# RiWalk-RW
python3 src/RiWalk-RW/RiWalk-RW.py --help
# RiWalk-C
gcc -Ofast -march=native -Wall -ffast-math -Wno-unused-result -pthread src/RiWalk-C/RiWalk.c -o src/RiWalk-C/RiWalk -lm
src/RiWalk-C/RiWalk
We provide an example script that runs RiWalk on the actor data set in train_actor.sh.
The supported input format is an edgelist:
node1_id_int node2_id_int
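As a minimal sketch of how such a file can be read, the following parses whitespace-separated integer pairs into an adjacency map. The function name `read_edgelist` is hypothetical, the sample edges are made up, and treating the graph as undirected is an assumption; the repository's own loader may differ.

```python
import io
from collections import defaultdict


def read_edgelist(lines):
    """Parse 'node1_id_int node2_id_int' lines into an adjacency map.

    Assumption: edges are undirected, so each edge is stored in both directions.
    """
    adj = defaultdict(set)
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        u, v = (int(tok) for tok in line.split()[:2])
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


# In practice, pass an open file handle instead of this in-memory sample.
sample = io.StringIO("0 1\n1 2\n2 0\n2 3\n")
graph = read_edgelist(sample)
print(graph)
```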
The output file has n+1 lines for a graph with n vertices. The first line has the following format:
num_of_nodes dim_of_representation
The next n lines are as follows:
node_id dim1 dim2 ... dimd
where dim1, ..., dimd are the components of the d-dimensional representation learned by RiWalk.
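A file in this layout can be loaded with a few lines of Python. The sketch below is illustrative only: `load_embeddings` is a hypothetical helper, the sample values are made up, and node ids are assumed to be integers as in the input edgelist.

```python
import io


def load_embeddings(handle):
    """Read 'num_of_nodes dim' followed by one 'node_id dim1 ... dimd' line per node."""
    header = handle.readline().split()
    num_nodes, dim = int(header[0]), int(header[1])
    vectors = {}
    for line in handle:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        node_id = int(parts[0])  # assumption: integer node ids
        values = [float(x) for x in parts[1:]]
        if len(values) != dim:
            raise ValueError(f"node {node_id}: expected {dim} values, got {len(values)}")
        vectors[node_id] = values
    if len(vectors) != num_nodes:
        raise ValueError(f"expected {num_nodes} nodes, got {len(vectors)}")
    return vectors, dim


# Made-up sample with 2 nodes and 2 dimensions; pass a real file handle in practice.
sample_output = io.StringIO("2 2\n7 0.5 -1.0\n3 0.25 0.75\n")
embeddings, dim = load_embeddings(sample_output)
print(dim, embeddings)
```

Checking the header against the body catches truncated output files early, before the vectors are passed to a downstream task.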
We would like to thank the authors of node2vec, struc2vec, GraphWave and LINE for making the implementations of their methods openly available.
- Please send any questions you might have about the code and/or the algorithm to [email protected].