Presentation slides for the paper 'Structural Patterns and Generative Models of Real-world Hypergraphs', published at KDD 2020 (ACM SIGKDD International Conference on Knowledge Discovery and Data Mining).
Recurrent Neural Networks have proven to be very powerful models, as they can propagate context over several time steps. This makes them effective for several problems in Natural Language Processing, such as language modelling, tagging problems, and speech recognition. In this presentation we introduce the basic RNN model and discuss the vanishing gradient problem. We describe LSTM (Long Short-Term Memory) and Gated Recurrent Units (GRU). We also discuss bidirectional RNNs with an example. RNN architectures can be considered deep learning systems in which the number of time steps serves as the depth of the network. It is also possible to build an RNN with multiple hidden layers, each having recurrent connections from the previous time steps, representing abstraction in both time and space.
This document provides an overview of graph neural networks (GNNs). GNNs are a type of neural network that can operate on graph-structured data like molecules or social networks. GNNs learn representations of nodes by propagating information between connected nodes over many layers. They are useful when relationships between objects are important. Examples of applications include predicting drug properties from molecular graphs and program understanding by modeling code as graphs. The document explains how GNNs differ from RNNs and provides examples of GNN variations, datasets, and frameworks.
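As a rough illustration of that propagation step, here is a minimal NumPy sketch of one GNN-style layer; the graph, features, and weight matrix are hypothetical, and the sum aggregation is a simplification rather than any particular published variant.

```python
# A minimal sketch of one round of neighborhood message passing.
import numpy as np

adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]], dtype=float)  # hypothetical 3-node graph
X = np.array([[1.0], [2.0], [3.0]])       # one feature per node
W = np.array([[0.5]])                     # hypothetical learned weight

# Each node sums its neighbors' features, then applies a shared linear
# transform and a ReLU nonlinearity: H = ReLU(A X W).
H = np.maximum(adj @ X @ W, 0.0)
print(H)
```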
The document summarizes sampling methods from Chapter 11 of Bishop's PRML book. It introduces basic sampling algorithms like rejection sampling, importance sampling, and SIR. It then discusses Markov chain Monte Carlo (MCMC) methods which allow sampling from complex distributions using a Markov chain. Specific MCMC methods covered include the Metropolis algorithm, Gibbs sampling, and estimating the partition function using the IP algorithm.
Recurrent Neural Networks. Part 1: Theory (Andrii Gakhov)
The document provides an overview of recurrent neural networks (RNNs) and their advantages over feedforward neural networks. It describes the basic structure and training of RNNs using backpropagation through time. RNNs can process sequential data of variable lengths, unlike feedforward networks. However, RNNs are difficult to train due to vanishing and exploding gradients. More advanced RNN architectures like LSTMs and GRUs address this by introducing gating mechanisms that allow the network to better control the flow of information.
- The document introduces Deep Counterfactual Regret Minimization (Deep CFR), a new algorithm proposed by Noam Brown et al. in ICML 2019 that incorporates deep neural networks into Counterfactual Regret Minimization (CFR) for solving large imperfect-information games.
- CFR is an algorithm for computing Nash equilibria in two-player zero-sum games by minimizing cumulative counterfactual regret. It scales poorly to very large games that require abstraction of the game tree.
- Deep CFR removes the need for abstraction by using a neural network to generalize the strategy across the game tree, allowing it to solve previously intractable games like no-limit poker.
NICE: Non-linear Independent Components Estimation Laurent Dinh, David Krueger, Yoshua Bengio. 2014.
Density estimation using Real NVP
Laurent Dinh, Jascha Sohl-Dickstein, Samy Bengio. 2017.
Glow: Generative Flow with Invertible 1x1 Convolutions
Diederik P. Kingma, Prafulla Dhariwal. 2018.
Paper review material.
The document discusses neural networks, including human neural networks and artificial neural networks (ANNs). It provides details on the key components of ANNs, such as the perceptron and backpropagation algorithm. ANNs are inspired by biological neural systems and are used for applications like pattern recognition, time series prediction, and control systems. The document also outlines some current uses of neural networks in areas like signal processing, anomaly detection, and soft sensors.
Part 1 of the Deep Learning Fundamentals series, this session discusses the use cases and scenarios surrounding deep learning and AI; reviews the fundamentals of artificial neural networks (ANNs) and perceptrons; discusses the basics of optimization, beginning with the cost function, gradient descent, and backpropagation; and covers activation functions (including sigmoid, tanh, and ReLU). The demos included in these slides run on Keras with a TensorFlow backend on Databricks.
This document provides an overview of artificial neural networks and their application as a model of the human brain. It discusses the biological neuron, different types of neural networks including feedforward, feedback, time delay, and recurrent networks. It also covers topics like learning in perceptrons, training algorithms, applications of neural networks, and references key concepts like connectionism, associative memory, and massive parallelism in the brain.
[DL Reading Group] NeRF-VAE: A Geometry Aware 3D Scene Generative Model (Deep Learning JP)
NeRF-VAE is a 3D scene generative model that combines Neural Radiance Fields (NeRF) and Generative Query Networks (GQN) with a variational autoencoder (VAE). It uses a NeRF decoder to generate novel views conditioned on a latent code. An encoder extracts latent codes from input views. During training, it maximizes the evidence lower bound to learn the latent space of scenes and allow for novel view synthesis. NeRF-VAE aims to generate photorealistic novel views of scenes by leveraging NeRF's view synthesis abilities within a generative model framework.
This document discusses feature selection concepts and methods. It defines features as attributes that determine which class an instance belongs to. Feature selection aims to select a relevant subset of features by removing irrelevant, redundant and unnecessary data. This improves learning accuracy, model performance and interpretability. The document categorizes feature selection algorithms as filter, wrapper or embedded methods based on how they evaluate feature subsets. It also discusses concepts like feature relevance, search strategies, successor generation and evaluation measures used in feature selection algorithms.
The document provides information about multi-layer perceptrons (MLPs) and backpropagation. It begins with definitions of perceptrons and MLP architecture. It then describes backpropagation, including the backpropagation training algorithm and cycle. Examples are provided, such as using an MLP to solve the exclusive OR (XOR) problem. Applications of backpropagation neural networks and options like momentum, batch vs sequential training, and adaptive learning rates are also discussed.
The document summarizes imitation learning techniques. It introduces behavioral cloning, which frames imitation learning as a supervised learning problem by learning to mimic expert demonstrations. However, behavioral cloning has limitations as it does not allow for recovery from mistakes. Alternative approaches involve direct policy learning using an interactive expert or inverse reinforcement learning, which aims to learn a reward function that explains the expert's behavior. The document outlines different types of imitation learning problems and algorithms for interactive direct policy learning, including data aggregation and policy aggregation methods.
The document discusses hypergraph motifs, which describe connectivity patterns between three connected hyperedges in a hypergraph. It proposes MoCHy, a family of parallel algorithms for counting instances of hypergraph motifs in large hypergraphs. Experimental results on real-world hypergraphs from different domains show that their motif distributions differ significantly from randomized hypergraphs, and MoCHy can efficiently count motifs in large hypergraphs.
This document discusses precision and recall, which are metrics used to evaluate the performance of classification models. Precision measures the proportion of predicted positive instances that are actually positive, while recall measures the proportion of actual positive instances that are correctly predicted to be positive. The document also presents formulas for calculating precision, recall, and the harmonic mean of precision and recall.
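As a minimal illustration of the formulas that summary describes, the following Python sketch computes precision, recall, and their harmonic mean (F1) on hypothetical labels:

```python
# Hypothetical ground-truth and predicted labels (1 = positive).
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))        # true positives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives

precision = tp / (tp + fp)  # predicted positives that are truly positive
recall = tp / (tp + fn)     # actual positives that were predicted positive
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
print(precision, recall, f1)
```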
A Diffusion Wavelet Approach for 3D Model Matching (rafi)
The document presents a novel diffusion wavelet approach for 3D model matching. It combines diffusion maps with wavelets to extract multi-scale shape features from 3D models. Fisher's discriminant ratio is used to select discriminative wavelet coefficients for model representation. Models are retrieved by comparing their representation vectors at different wavelet scales. Results show the diffusion wavelet approach outperforms spherical harmonics and wavelets for 3D model retrieval.
This document discusses influenceability estimation in social networks. It describes the independent cascade model of influence diffusion, where each node has an independent probability of influencing its neighbors. The problem is to estimate the expected number of nodes reachable from a given seed node. The document presents the naive Monte Carlo (NMC) approach, which samples possible graphs and averages the number of reachable nodes over the samples. While NMC provides an unbiased estimator, it has high variance. The document aims to reduce the variance to improve estimation accuracy.
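As a small illustration of the NMC estimator described above (a sketch on a hypothetical toy graph with made-up influence probabilities, not the document's experimental setup): sample a possible graph by keeping each edge independently with its probability, count the nodes reachable from the seed, and average over samples.

```python
# A minimal sketch of naive Monte Carlo (NMC) influence estimation
# under the independent cascade model.
import random

random.seed(0)
# Hypothetical edges with independent influence probabilities: (u, v, p).
edges = [(0, 1, 0.5), (0, 2, 0.3), (1, 3, 0.4), (2, 3, 0.7)]

def sample_spread(seed, edges):
    """Sample one possible graph (keep each edge with probability p)
    and count the nodes reachable from the seed."""
    live = [(u, v) for u, v, p in edges if random.random() < p]
    adj = {}
    for u, v in live:
        adj.setdefault(u, []).append(v)
    reached, stack = {seed}, [seed]
    while stack:
        u = stack.pop()
        for v in adj.get(u, []):
            if v not in reached:
                reached.add(v)
                stack.append(v)
    return len(reached)

samples = [sample_spread(0, edges) for _ in range(10_000)]
print(sum(samples) / len(samples))  # unbiased, but high-variance, estimate
```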
The document describes a study that tested whether feedback could help everyday people better interpret data visualizations. Participants were asked to compare sections of pie charts and bar charts and estimate percentages. Some participants received feedback on their estimates of pie charts. Those who received feedback improved at interpreting pie charts over the course of the experiment compared to those who did not receive feedback, suggesting feedback can help develop data literacy skills.
Reinforced concrete (RC) shear walls are among the most widely adopted earthquake-resisting structural elements. Accurate prediction of the capacity curves of RC shear walls is of significant importance, since these curves convey important information about progressive damage states, the degree of energy absorption, and the maximum strength. Decades of experimental effort by the research community have established a systematic database of capacity curves, but efforts to productively utilize the accumulated data are still in their infancy. In the hope of adding a new dimension to earthquake engineering, this study provides a machine learning (ML) approach to predicting capacity curves of RC shear walls based on a multi-target prediction model and fundamental statistics. This paper harnesses bootstrapping for uncertainty quantification and affirms the robustness of the proposed method against erroneous data. Results and validations using more than 200 rectangular RC shear walls show promising performance and suggest future research directions toward data- and ML-driven earthquake engineering.
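As a rough illustration of the bootstrapping step the abstract mentions (a sketch on hypothetical capacity values, not the paper's actual pipeline or data):

```python
# A minimal sketch of bootstrapping for uncertainty quantification:
# resample with replacement and collect the statistic of interest.
import numpy as np

rng = np.random.default_rng(0)
capacities = np.array([310., 295., 330., 305., 288., 342., 317.])  # hypothetical

boot_means = [rng.choice(capacities, size=len(capacities), replace=True).mean()
              for _ in range(5_000)]
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {capacities.mean():.1f}, 95% CI = ({lo:.1f}, {hi:.1f})")
```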
The document discusses ensemble clustering methods. It begins by comparing classification and clustering, noting that clustering differs in that ground truth labels are not known beforehand. It then discusses how ensemble clustering can improve upon single clustering algorithms by generating multiple partitions and combining them. The key steps are: 1) generating an ensemble of initial partitions from clustering the data multiple times, 2) aligning the initial partitions into metaclusters, and 3) voting to determine a final clustering assignment. This approach provides benefits of scalability and robustness over single clustering algorithms.
This document summarizes a research paper that proposes using kernel learning methods to detect rumors in microblog posts by modeling how information spreads as propagation trees. It introduces a propagation tree kernel (PTK) that calculates similarity between propagation trees by counting common subtrees. It also proposes a context-sensitive extension of PTK (cPTK) that considers propagation paths from the root node to subtrees. An evaluation on two Twitter datasets shows cPTK achieves the best rumor detection performance compared to other baselines.
Simplicial closure and higher-order link prediction (Austin Benson)
This document summarizes research on simplicial closure and higher-order link prediction in network science. It finds that groups of nodes often interact through complex trajectories before reaching "simplicial closure" where all nodes are jointly present in a simplex. Predicting these closed simplices is framed as a higher-order link prediction problem. Various score functions are proposed based on edge weights, node neighborhoods, and similarity measures. Scores combining local edge weight information consistently perform well, outperforming classical link prediction approaches. The results provide insights into higher-order structure and a framework for evaluating models of complex relational data.
Graph and language embeddings were used to analyze user data from Reddit to predict whether authors would post in the SuicideWatch subreddit. Metapath2vec was used to generate graph embeddings from subreddit and author relationships. Doc2vec was used to generate document embeddings based on language similarity between submissions and subreddits. Combining the graph and document embeddings in a logistic regression achieved 90% accuracy in predicting SuicideWatch posters, reducing both false positives and false negatives compared to using the embeddings separately. Next steps proposed using the embeddings to better understand similarities between related subreddits and predict risk factors in posts.
This document summarizes a lecture on statistical inference and exploratory data analysis. It includes announcements about the class, an overview of the data science workflow and statistical inference. The lecture covers modeling data and uncertainty, populations and samples, probability distributions and fitting models. It concludes with an introduction to exploratory data analysis and an activity to perform EDA in a Jupyter notebook.
A Longitudinal Perspective on the Relationship between Hypermedia Structure... (Pierre Fastrez)
1) The study examined how the structure of an educational hypermedia system called HyperDoc influenced learners' comprehension of its content over multiple sessions.
2) Participants used one of two versions of HyperDoc, which had the same information but different structural organizations, over 4 sessions of browsing and note-taking.
3) Analysis found the overall structure of participants' summaries did not significantly change to match the version of HyperDoc they used, and remained stable over time as their expertise grew.
Ability Study of Proximity Measure for Big Data Mining Context on Clustering (KamleshKumar394)
This document summarizes a research paper on using proximity measures for clustering big data. It discusses the objectives of identifying proximity measures that can handle the volume, variety, and velocity of big data. It then provides background on big data and defines the 3Vs (volume, variety, velocity). Different types of clustering algorithms are described including partitioning, hierarchical, density-based, grid-based, and model-based. Finally, it outlines several taxonomies of proximity measures that can be used for clustering, including Minkowski distances, L1 distances, L2 distances, inner products, Shannon entropy, combinations, and intersections.
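As a small illustration of the Minkowski family mentioned in that taxonomy (a generic sketch, not the paper's implementation), p = 1 gives the L1 (Manhattan) distance and p = 2 the L2 (Euclidean) distance:

```python
# A minimal sketch of the Minkowski distance of order p.
def minkowski(x, y, p):
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

print(minkowski([0, 0], [3, 4], 1))  # 7.0 (L1, Manhattan)
print(minkowski([0, 0], [3, 4], 2))  # 5.0 (L2, Euclidean)
```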
ICDE-2015 Shortest Path Traversal Optimization and Analysis for Large Graph C... (Waqas Nawaz)
Waqas Nawaz Khokhar presented research on optimizing shortest path traversal and analysis for large graph clustering. The presentation outlined challenges with traditional graph clustering approaches for big real-world graphs. It proposed four optimizations: 1) a collaborative similarity measure to reduce complexity from O(n^3) to O(n^2 log n); 2) identifying overlapping shortest path regions to avoid redundant traversals; 3) confining traversals within clusters to limit unnecessary graph regions; and 4) allowing parallel shortest path queries to reduce latency. Experimental results on real and synthetic graphs showed the approaches improved efficiency by 40% in time and an order of magnitude in space while maintaining clustering quality. Future work aims to address intermediate data explosion.
This document describes research applying techniques from program analysis to automatically infer properties of code in the Maple computer algebra system. The researchers developed abstract interpretation frameworks tailored to Maple and used these to gather constraints about Maple code and values. By analyzing the entire Maple library with this approach, they were able to infer simple but useful properties, like type information, for parts of the library. This demonstrated the potential of applying formal methods to understand and reason about large, dynamically-typed code bases.
Abstract: The processing power of computing devices has increased with the number of available cores. This paper presents an approach towards clustering of categorical data on a multi-core platform. The k-modes algorithm is used for clustering categorical data; it uses a simple dissimilarity measure for distance computation. The multi-core approach aims to achieve a speedup in processing. Open Multi-Processing (OpenMP), a shared-memory API based on a thread-level fork-join model, is used to parallelize the k-modes algorithm. The dataset used for the experiment is the Congressional Voting dataset from the UCI repository, which contains members' votes in categorical format, provided as CSV. The experiment is performed for an increasing number of clusters and increasing dataset sizes.
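As a small illustration of the dissimilarity measure the abstract refers to (a sketch of the standard k-modes simple matching dissimilarity, not the paper's OpenMP implementation):

```python
# A minimal sketch of the simple matching dissimilarity used by k-modes:
# the number of attributes on which two categorical records disagree.
def matching_dissimilarity(x, y):
    return sum(a != b for a, b in zip(x, y))

# Hypothetical categorical votes ('y' / 'n'), in the spirit of the
# Congressional Voting dataset mentioned in the abstract.
print(matching_dissimilarity(['y', 'n', 'y'], ['y', 'y', 'n']))  # 2
```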
The document proposes improvements to distributed graph pattern matching algorithms. It introduces a boundary filter technique that aims to shrink large data graphs by removing boundary nodes. These are nodes that only have one relationship in directed graphs. Experiments show the boundary filter approach significantly reduces running time compared to the original distributed tight simulation algorithm, while also improving accuracy by finding more matching subgraphs. The boundary filtering allows independent evaluation of each vertex and scales well to more complex graph patterns.
The document proposes and evaluates two techniques for attention in multi-source sequence-to-sequence learning: flat attention combination and hierarchical attention combination. Both techniques achieved comparable results to existing context vector concatenation approaches on tasks of multimodal translation and automatic post-editing. Hierarchical attention combination performed best on multimodal translation and allows inspecting individual input attentions. The techniques provide a way to model importance of each input sequence.
Content Moderation Services: Leading the Future of Online Safety (sofiawilliams5966)
These services are not just gatekeepers of community standards. They are architects of safe interaction, unseen defenders of user well-being, and the infrastructure supporting the promise of a trustworthy internet.
Comprehensive Roadmap of AI, ML, DS, DA & DSA (epsilonice)
This outlines a comprehensive roadmap for mastering artificial intelligence, machine learning, data science, data analysis, and data structures and algorithms, guiding learners from beginner to advanced levels by building upon foundational Python knowledge.
Delta Airlines New York Office (Airwayscityoffice) (jamespromind)
Visit the Delta Airlines New York Office for personalized assistance with your travel plans. The experienced team offers guidance on ticket changes, flight delays, and more. It’s a helpful resource for those needing support beyond the airport.
Ethical Frameworks for Trustworthy AI – Opportunities for Researchers in Huma... (Karim Baïna)
Artificial Intelligence (AI) is reshaping societies and raising complex ethical, legal, and geopolitical questions. This talk explores the foundations and limits of Trustworthy AI through the lens of global frameworks such as the EU’s HLEG guidelines, UNESCO’s human rights-based approach, OECD recommendations, and NIST’s taxonomy of AI security risks.
We analyze key principles like fairness, transparency, privacy, robustness, and accountability — not only as ideals, but in terms of their practical implementation and tensions. Special attention is given to real-world contexts such as Morocco’s deployment of 4,000 intelligent cameras and the country’s positioning in AI readiness indexes. These examples raise critical issues about surveillance, accountability, and ethical governance in the Global South.
Rather than relying on standardized terms or ethical "checklists", this presentation advocates for a grounded, interdisciplinary, and context-aware approach to responsible AI — one that balances innovation with human rights, and technological ambition with social responsibility.
This rich context of trustworthy and responsible AI frameworks presents a serious opportunity for researchers in the human and social sciences: will they operate as gatekeepers, reinforcing existing ethical constraints, or become revolutionaries, pioneering new paradigms that redefine how AI interacts with society, knowledge production, and policymaking?
Understanding Large Language Model Hallucinations: Exploring Causes, Detectio... (Tamanna36)
This presentation delves into Large Language Model (LLM) hallucinations—incorrect or fabricated outputs that undermine reliability. It covers their causes (e.g., data limitations, transformer architecture), detection methods (like semantic entropy), prevention strategies (fine-tuning, RAG), and ethical concerns (misinformation, bias). The role of tokens and MLOps in managing hallucinations is explored, alongside the feasibility of hallucination-free LLMs. Designed for researchers, developers, and AI enthusiasts, it offers insights and practical approaches to enhance LLM accuracy and trustworthiness in critical applications like healthcare and legal systems.
5. • Hypergraphs: not straightforward to analyze
o complex representation
o lack of tools
[Figure: example hypergraph and its projection on nodes 1-7; the projected graph captures only interactions at the level of nodes]
Motivation for a New Tool
• Projection
o information loss
o no higher-order information
10. Real-world Datasets
• 13 datasets from 6 domains
◦ Email: recipient addresses of an email
◦ Drug components: classes or substances within a single drug, listed in the National Drug Code Directory
◦ Drug use: drugs used by a patient before an emergency visit, reported to the Drug Abuse Warning Network
◦ Online tags: tags on a question in Stack Exchange forums
◦ Online threads: users answering a question in Stack Exchange forums
◦ Coauthorship: coauthors of a publication
11. Structural Patterns
P1. Degree distribution: heavy-tailed
P2. Connected component: giant
P3. Clustering coefficient: high
P4. Effective diameter: small
P5. Singular value distribution: heavy-tailed
12. P1+P5. Heavy-tailed Distributions
[Figure: degree and singular-value distributions, with abundant low-degree nodes and a few high-degree nodes]
Degree and singular-value distributions are heavy-tailed
J. Leskovec, J. Kleinberg, and C. Faloutsos. 2005. Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations. In KDD.
13. P1+P5. Heavy-tailed Distributions
Statistical tests: to confirm heavy-tailed distributions
Lilliefors test (1)
• H0: the distribution is exponential
• H1: the distribution is not exponential
H0 is rejected at the 2.5% significance level
Log likelihood ratio (2): r = log(L1 / L0)
• L1: likelihood of a heavy-tailed distribution (power-law, log-normal)
• L0: likelihood of the exponential distribution
If r > 0, the distribution is more likely to be heavy-tailed.
(1) Hubert W. Lilliefors. 1969. On the Kolmogorov-Smirnov test for the exponential distribution with mean unknown. Journal of the American Statistical Association 64 (1969), 387–389.
(2) Jeff Alstott, Ed Bullmore, and Dietmar Plenz. 2014. powerlaw: a Python package for analysis of heavy-tailed distributions. PLoS ONE 9, 1 (2014).
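As a small illustration of both tests (a sketch on hypothetical degree data, using the statsmodels implementation of the Lilliefors test and the powerlaw package from reference (2)):

```python
# A minimal sketch of the two statistical tests on hypothetical degrees.
import numpy as np
import powerlaw
from statsmodels.stats.diagnostic import lilliefors

rng = np.random.default_rng(0)
degrees = rng.zipf(2.5, size=1_000)  # hypothetical heavy-tailed sample

# Lilliefors test against the exponential family (H0: exponential).
stat, pval = lilliefors(degrees, dist='exp')
print(f"Lilliefors: stat = {stat:.3f}, p = {pval:.3f}")

# Log likelihood ratio r = log(L1 / L0): r > 0 favors the heavy-tailed
# (power-law) fit over the exponential one.
fit = powerlaw.Fit(degrees, discrete=True)
r, p = fit.distribution_compare('power_law', 'exponential')
print(f"LLR: r = {r:.3f}, p = {p:.3f}")
```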
14. P2. Giant Connected Component
A large proportion of nodes are connected
[Figure: proportion of nodes in each connected component]
J. Leskovec, J. Kleinberg, and C. Faloutsos. 2005. Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations. In KDD.
15. P3. High Clustering Coefficient
Local clustering coefficient:
C_i = (2 × number of triangles at i) / (number of wedges at i)
Clustering coefficient:
C = (1 / |V|) · Σ_{i ∈ V} C_i
High likelihood of having links between “friends of friends”
Wedge at i: an open triangle centered at i
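As a small illustration of these definitions (a sketch on a hypothetical toy graph; networkx computes the standard local coefficient C_i = 2·T_i / (k_i (k_i - 1)), one common form of the triangle/wedge ratio above):

```python
# A minimal sketch of the clustering-coefficient definitions,
# on a hypothetical toy graph.
import networkx as nx

G = nx.Graph([(1, 2), (2, 3), (1, 3), (3, 4)])  # hypothetical edges

local = nx.clustering(G)                       # C_i for every node i
C = sum(local.values()) / G.number_of_nodes()  # C = (1/|V|) * sum_i C_i
print(local)
print(C, nx.average_clustering(G))             # the two averages agree
```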
16. P4. Small Effective Diameter
[Figure: 90% of connected pairs are reachable within distance d = 8]
Most pairs of connected nodes: reachable within a small distance
https://web.stanford.edu/class/cs224w/handouts/02-gnp-smallworld.pdf
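As an illustration of this statistic (a brute-force sketch on a small hypothetical random graph; large datasets would need sampling or approximation), the 90% effective diameter is the smallest distance d such that at least 90% of connected node pairs lie within d:

```python
# A minimal brute-force sketch of the 90% effective diameter.
import math
import networkx as nx

G = nx.erdos_renyi_graph(100, 0.05, seed=0)  # hypothetical graph
lengths = dict(nx.all_pairs_shortest_path_length(G))
dists = sorted(d for src in lengths.values() for d in src.values() if d > 0)

# Smallest distance covering at least 90% of connected (ordered) node pairs.
effective_diameter = dists[math.ceil(0.9 * len(dists)) - 1]
print(effective_diameter)
```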
17. Structural Patterns: Intuition
Real-world graphs exhibit:
P1. Degree distribution: heavy-tailed
P2. Connected component: giant
P3. Clustering coefficient: high
P4. Effective diameter: small
P5. Singular value distribution: heavy-tailed
[Diagram: a hypergraph is decomposed into graphs at all decomposition levels; each decomposed graph is expected to show the same real-world patterns]
J. Leskovec, J. Kleinberg, and C. Faloutsos. 2005. Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations. In KDD.
20. Structural Patterns
Giant connected components vary among datasets
If there is a giant connected component
• High Clustering Coefficient
• Small Effective Diameter
[Figure: proportion of nodes in the largest connected component per dataset; small numbers indicate the absence of a giant connected component]
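As a small illustration of the quantity plotted on this slide (a sketch on a hypothetical graph, not one of the paper's datasets):

```python
# A minimal sketch of the proportion of nodes in the largest
# connected component.
import networkx as nx

G = nx.erdos_renyi_graph(200, 0.01, seed=0)  # hypothetical graph
giant = max(nx.connected_components(G), key=len)
print(len(giant) / G.number_of_nodes())
```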