Presentation slides for the paper 'Structural Patterns and Generative Models of Real-world Hypergraphs', published at KDD 2020 (ACM SIGKDD International Conference on Knowledge Discovery and Data Mining).
Recurrent Neural Networks have proven to be very powerful models, as they can propagate context over several time steps. This makes them effective for several problems in Natural Language Processing, such as language modelling, tagging problems, and speech recognition. In this presentation we introduce the basic RNN model and discuss the vanishing gradient problem. We describe LSTM (Long Short-Term Memory) and Gated Recurrent Units (GRU). We also discuss bidirectional RNNs with an example. RNN architectures can be considered deep learning systems in which the number of time steps serves as the depth of the network. It is also possible to build an RNN with multiple hidden layers, each having recurrent connections from the previous time steps, representing abstraction in both time and space.
This document provides an overview of graph neural networks (GNNs). GNNs are a type of neural network that can operate on graph-structured data like molecules or social networks. GNNs learn representations of nodes by propagating information between connected nodes over many layers. They are useful when relationships between objects are important. Examples of applications include predicting drug properties from molecular graphs and program understanding by modeling code as graphs. The document explains how GNNs differ from RNNs and provides examples of GNN variations, datasets, and frameworks.
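As a rough illustration of that propagation step, here is a minimal NumPy sketch of one GNN-style layer; the graph, features, and weight matrix are hypothetical, and the sum aggregation is a simplification rather than any particular published variant.

```python
# A minimal sketch of one round of neighborhood message passing.
import numpy as np

adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]], dtype=float)  # hypothetical 3-node graph
X = np.array([[1.0], [2.0], [3.0]])       # one feature per node
W = np.array([[0.5]])                     # hypothetical learned weight

# Each node sums its neighbors' features, then applies a shared linear
# transform and a ReLU nonlinearity: H = ReLU(A X W).
H = np.maximum(adj @ X @ W, 0.0)
print(H)
```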
The document summarizes sampling methods from Chapter 11 of Bishop's PRML book. It introduces basic sampling algorithms like rejection sampling, importance sampling, and SIR. It then discusses Markov chain Monte Carlo (MCMC) methods which allow sampling from complex distributions using a Markov chain. Specific MCMC methods covered include the Metropolis algorithm, Gibbs sampling, and estimating the partition function using the IP algorithm.
Recurrent Neural Networks. Part 1: Theory (Andrii Gakhov)
The document provides an overview of recurrent neural networks (RNNs) and their advantages over feedforward neural networks. It describes the basic structure and training of RNNs using backpropagation through time. RNNs can process sequential data of variable lengths, unlike feedforward networks. However, RNNs are difficult to train due to vanishing and exploding gradients. More advanced RNN architectures like LSTMs and GRUs address this by introducing gating mechanisms that allow the network to better control the flow of information.
- The document introduces Deep Counterfactual Regret Minimization (Deep CFR), a new algorithm proposed by Noam Brown et al. in ICML 2019 that incorporates deep neural networks into Counterfactual Regret Minimization (CFR) for solving large imperfect-information games.
- CFR is an algorithm for computing Nash equilibria in two-player zero-sum games by minimizing cumulative counterfactual regret. It scales poorly to very large games that require abstraction of the game tree.
- Deep CFR removes the need for abstraction by using a neural network to generalize the strategy across the game tree, allowing it to solve previously intractable games like no-limit poker.
NICE: Non-linear Independent Components Estimation Laurent Dinh, David Krueger, Yoshua Bengio. 2014.
Density estimation using Real NVP
Laurent Dinh, Jascha Sohl-Dickstein, Samy Bengio. 2017.
Glow: Generative Flow with Invertible 1x1 Convolutions
Diederik P. Kingma, Prafulla Dhariwal. 2018.
Paper review material.
The document discusses neural networks, including human neural networks and artificial neural networks (ANNs). It provides details on the key components of ANNs, such as the perceptron and backpropagation algorithm. ANNs are inspired by biological neural systems and are used for applications like pattern recognition, time series prediction, and control systems. The document also outlines some current uses of neural networks in areas like signal processing, anomaly detection, and soft sensors.
Part 1 of the Deep Learning Fundamentals series, this session discusses the use cases and scenarios surrounding deep learning and AI; reviews the fundamentals of artificial neural networks (ANNs) and perceptrons; discusses the basics of optimization, beginning with the cost function, gradient descent, and backpropagation; and covers activation functions (including sigmoid, tanh, and ReLU). The demos included in these slides run on Keras with a TensorFlow backend on Databricks.
This document provides an overview of artificial neural networks and their application as a model of the human brain. It discusses the biological neuron, different types of neural networks including feedforward, feedback, time delay, and recurrent networks. It also covers topics like learning in perceptrons, training algorithms, applications of neural networks, and references key concepts like connectionism, associative memory, and massive parallelism in the brain.
[DL Reading Group] NeRF-VAE: A Geometry Aware 3D Scene Generative Model (Deep Learning JP)
NeRF-VAE is a 3D scene generative model that combines Neural Radiance Fields (NeRF) and Generative Query Networks (GQN) with a variational autoencoder (VAE). It uses a NeRF decoder to generate novel views conditioned on a latent code. An encoder extracts latent codes from input views. During training, it maximizes the evidence lower bound to learn the latent space of scenes and allow for novel view synthesis. NeRF-VAE aims to generate photorealistic novel views of scenes by leveraging NeRF's view synthesis abilities within a generative model framework.
This document discusses feature selection concepts and methods. It defines features as attributes that determine which class an instance belongs to. Feature selection aims to select a relevant subset of features by removing irrelevant, redundant and unnecessary data. This improves learning accuracy, model performance and interpretability. The document categorizes feature selection algorithms as filter, wrapper or embedded methods based on how they evaluate feature subsets. It also discusses concepts like feature relevance, search strategies, successor generation and evaluation measures used in feature selection algorithms.
The document provides information about multi-layer perceptrons (MLPs) and backpropagation. It begins with definitions of perceptrons and MLP architecture. It then describes backpropagation, including the backpropagation training algorithm and cycle. Examples are provided, such as using an MLP to solve the exclusive OR (XOR) problem. Applications of backpropagation neural networks and options like momentum, batch vs sequential training, and adaptive learning rates are also discussed.
The document summarizes imitation learning techniques. It introduces behavioral cloning, which frames imitation learning as a supervised learning problem by learning to mimic expert demonstrations. However, behavioral cloning has limitations as it does not allow for recovery from mistakes. Alternative approaches involve direct policy learning using an interactive expert or inverse reinforcement learning, which aims to learn a reward function that explains the expert's behavior. The document outlines different types of imitation learning problems and algorithms for interactive direct policy learning, including data aggregation and policy aggregation methods.
The document discusses hypergraph motifs, which describe connectivity patterns between three connected hyperedges in a hypergraph. It proposes MoCHy, a family of parallel algorithms for counting instances of hypergraph motifs in large hypergraphs. Experimental results on real-world hypergraphs from different domains show that their motif distributions differ significantly from randomized hypergraphs, and MoCHy can efficiently count motifs in large hypergraphs.
This document discusses precision and recall, which are metrics used to evaluate the performance of classification models. Precision measures the proportion of predicted positive instances that are actually positive, while recall measures the proportion of actual positive instances that are correctly predicted to be positive. The document also presents formulas for calculating precision, recall, and the harmonic mean of precision and recall.
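As a minimal illustration of the formulas that summary describes, the following Python sketch computes precision, recall, and their harmonic mean (F1) on hypothetical labels:

```python
# Hypothetical ground-truth and predicted labels (1 = positive).
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))        # true positives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives

precision = tp / (tp + fp)  # predicted positives that are truly positive
recall = tp / (tp + fn)     # actual positives that were predicted positive
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
print(precision, recall, f1)
```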
A Diffusion Wavelet Approach for 3D Model Matching (rafi)
The document presents a novel diffusion wavelet approach for 3D model matching. It combines diffusion maps with wavelets to extract multi-scale shape features from 3D models. Fisher's discriminant ratio is used to select discriminative wavelet coefficients for model representation. Models are retrieved by comparing their representation vectors at different wavelet scales. Results show the diffusion wavelet approach outperforms spherical harmonics and wavelets for 3D model retrieval.
This document discusses influenceability estimation in social networks. It describes the independent cascade model of influence diffusion, where each node has an independent probability of influencing its neighbors. The problem is to estimate the expected number of nodes reachable from a given seed node. The document presents the naive Monte Carlo (NMC) approach, which samples possible graphs and averages the number of reachable nodes over the samples. While NMC provides an unbiased estimator, it has high variance. The document aims to reduce the variance to improve estimation accuracy.
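As a small illustration of the NMC estimator described above (a sketch on a hypothetical toy graph with made-up influence probabilities, not the document's experimental setup): sample a possible graph by keeping each edge independently with its probability, count the nodes reachable from the seed, and average over samples.

```python
# A minimal sketch of naive Monte Carlo (NMC) influence estimation
# under the independent cascade model.
import random

random.seed(0)
# Hypothetical edges with independent influence probabilities: (u, v, p).
edges = [(0, 1, 0.5), (0, 2, 0.3), (1, 3, 0.4), (2, 3, 0.7)]

def sample_spread(seed, edges):
    """Sample one possible graph (keep each edge with probability p)
    and count the nodes reachable from the seed."""
    live = [(u, v) for u, v, p in edges if random.random() < p]
    adj = {}
    for u, v in live:
        adj.setdefault(u, []).append(v)
    reached, stack = {seed}, [seed]
    while stack:
        u = stack.pop()
        for v in adj.get(u, []):
            if v not in reached:
                reached.add(v)
                stack.append(v)
    return len(reached)

samples = [sample_spread(0, edges) for _ in range(10_000)]
print(sum(samples) / len(samples))  # unbiased, but high-variance, estimate
```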
The document describes a study that tested whether feedback could help everyday people better interpret data visualizations. Participants were asked to compare sections of pie charts and bar charts and estimate percentages. Some participants received feedback on their estimates of pie charts. Those who received feedback improved at interpreting pie charts over the course of the experiment compared to those who did not receive feedback, suggesting feedback can help develop data literacy skills.
Reinforced concrete (RC) shear walls are among the most widely adopted earthquake-resisting structural elements. Accurate prediction of the capacity curves of RC shear walls is of significant importance, since these curves convey important information about progressive damage states, the degree of energy absorption, and the maximum strength. Decades of experimental effort by the research community have established a systematic database of capacity curves, but efforts to productively utilize the accumulated data are still in their infancy. In the hope of adding a new dimension to earthquake engineering, this study provides a machine learning (ML) approach to predicting capacity curves of RC shear walls based on a multi-target prediction model and fundamental statistics. This paper harnesses bootstrapping for uncertainty quantification and affirms the robustness of the proposed method against erroneous data. Results and validations using more than 200 rectangular RC shear walls show promising performance and suggest future research directions toward data- and ML-driven earthquake engineering.
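As a rough illustration of the bootstrapping step the abstract mentions (a sketch on hypothetical capacity values, not the paper's actual pipeline or data):

```python
# A minimal sketch of bootstrapping for uncertainty quantification:
# resample with replacement and collect the statistic of interest.
import numpy as np

rng = np.random.default_rng(0)
capacities = np.array([310., 295., 330., 305., 288., 342., 317.])  # hypothetical

boot_means = [rng.choice(capacities, size=len(capacities), replace=True).mean()
              for _ in range(5_000)]
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {capacities.mean():.1f}, 95% CI = ({lo:.1f}, {hi:.1f})")
```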
The document discusses ensemble clustering methods. It begins by comparing classification and clustering, noting that clustering differs in that ground truth labels are not known beforehand. It then discusses how ensemble clustering can improve upon single clustering algorithms by generating multiple partitions and combining them. The key steps are: 1) generating an ensemble of initial partitions from clustering the data multiple times, 2) aligning the initial partitions into metaclusters, and 3) voting to determine a final clustering assignment. This approach provides benefits of scalability and robustness over single clustering algorithms.
This document summarizes a research paper that proposes using kernel learning methods to detect rumors in microblog posts by modeling how information spreads as propagation trees. It introduces a propagation tree kernel (PTK) that calculates similarity between propagation trees by counting common subtrees. It also proposes a context-sensitive extension of PTK (cPTK) that considers propagation paths from the root node to subtrees. An evaluation on two Twitter datasets shows cPTK achieves the best rumor detection performance compared to other baselines.
Simplicial closure and higher-order link prediction (Austin Benson)
This document summarizes research on simplicial closure and higher-order link prediction in network science. It finds that groups of nodes often interact through complex trajectories before reaching "simplicial closure" where all nodes are jointly present in a simplex. Predicting these closed simplices is framed as a higher-order link prediction problem. Various score functions are proposed based on edge weights, node neighborhoods, and similarity measures. Scores combining local edge weight information consistently perform well, outperforming classical link prediction approaches. The results provide insights into higher-order structure and a framework for evaluating models of complex relational data.
Graph and language embeddings were used to analyze user data from Reddit to predict whether authors would post in the SuicideWatch subreddit. Metapath2vec was used to generate graph embeddings from subreddit and author relationships. Doc2vec was used to generate document embeddings based on language similarity between submissions and subreddits. Combining the graph and document embeddings in a logistic regression achieved 90% accuracy in predicting SuicideWatch posters, reducing both false positives and false negatives compared to using the embeddings separately. Next steps proposed using the embeddings to better understand similarities between related subreddits and predict risk factors in posts.
This document summarizes a lecture on statistical inference and exploratory data analysis. It includes announcements about the class, an overview of the data science workflow and statistical inference. The lecture covers modeling data and uncertainty, populations and samples, probability distributions and fitting models. It concludes with an introduction to exploratory data analysis and an activity to perform EDA in a Jupyter notebook.
A Longitudinal Perspective on the Relationship between Hypermedia Structure... (Pierre Fastrez)
1) The study examined how the structure of an educational hypermedia system called HyperDoc influenced learners' comprehension of its content over multiple sessions.
2) Participants used one of two versions of HyperDoc, which had the same information but different structural organizations, over 4 sessions of browsing and note-taking.
3) Analysis found the overall structure of participants' summaries did not significantly change to match the version of HyperDoc they used, and remained stable over time as their expertise grew.
Ability Study of Proximity Measure for Big Data Mining Context on Clustering (KamleshKumar394)
This document summarizes a research paper on using proximity measures for clustering big data. It discusses the objectives of identifying proximity measures that can handle the volume, variety, and velocity of big data. It then provides background on big data and defines the 3Vs (volume, variety, velocity). Different types of clustering algorithms are described including partitioning, hierarchical, density-based, grid-based, and model-based. Finally, it outlines several taxonomies of proximity measures that can be used for clustering, including Minkowski distances, L1 distances, L2 distances, inner products, Shannon entropy, combinations, and intersections.
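As a small illustration of the Minkowski family mentioned in that taxonomy (a generic sketch, not the paper's implementation), p = 1 gives the L1 (Manhattan) distance and p = 2 the L2 (Euclidean) distance:

```python
# A minimal sketch of the Minkowski distance of order p.
def minkowski(x, y, p):
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

print(minkowski([0, 0], [3, 4], 1))  # 7.0 (L1, Manhattan)
print(minkowski([0, 0], [3, 4], 2))  # 5.0 (L2, Euclidean)
```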
ICDE-2015 Shortest Path Traversal Optimization and Analysis for Large Graph C... (Waqas Nawaz)
Waqas Nawaz Khokhar presented research on optimizing shortest path traversal and analysis for large graph clustering. The presentation outlined challenges with traditional graph clustering approaches for big real-world graphs. It proposed four optimizations: 1) a collaborative similarity measure to reduce complexity from O(n^3) to O(n^2 log n); 2) identifying overlapping shortest path regions to avoid redundant traversals; 3) confining traversals within clusters to limit unnecessary graph regions; and 4) allowing parallel shortest path queries to reduce latency. Experimental results on real and synthetic graphs showed the approaches improved efficiency by 40% in time and an order of magnitude in space while maintaining clustering quality. Future work aims to address intermediate data explosion.
This document describes research applying techniques from program analysis to automatically infer properties of code in the Maple computer algebra system. The researchers developed abstract interpretation frameworks tailored to Maple and used these to gather constraints about Maple code and values. By analyzing the entire Maple library with this approach, they were able to infer simple but useful properties, like type information, for parts of the library. This demonstrated the potential of applying formal methods to understand and reason about large, dynamically-typed code bases.
Abstract: The processing power of computing devices has increased with the number of available cores. This paper presents an approach towards clustering of categorical data on a multi-core platform. The k-modes algorithm is used for clustering categorical data; it uses a simple dissimilarity measure for distance computation. The multi-core approach aims to achieve a speedup in processing. Open Multi-Processing (OpenMP), a shared-memory API based on a thread-level fork-join model, is used to parallelize the k-modes algorithm. The dataset used for the experiment is the Congressional Voting dataset from the UCI repository, which contains members' votes in categorical format, provided as CSV. The experiment is performed for an increasing number of clusters and increasing dataset sizes.
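As a small illustration of the dissimilarity measure the abstract refers to (a sketch of the standard k-modes simple matching dissimilarity, not the paper's OpenMP implementation):

```python
# A minimal sketch of the simple matching dissimilarity used by k-modes:
# the number of attributes on which two categorical records disagree.
def matching_dissimilarity(x, y):
    return sum(a != b for a, b in zip(x, y))

# Hypothetical categorical votes ('y' / 'n'), in the spirit of the
# Congressional Voting dataset mentioned in the abstract.
print(matching_dissimilarity(['y', 'n', 'y'], ['y', 'y', 'n']))  # 2
```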
The document proposes improvements to distributed graph pattern matching algorithms. It introduces a boundary filter technique that aims to shrink large data graphs by removing boundary nodes. These are nodes that only have one relationship in directed graphs. Experiments show the boundary filter approach significantly reduces running time compared to the original distributed tight simulation algorithm, while also improving accuracy by finding more matching subgraphs. The boundary filtering allows independent evaluation of each vertex and scales well to more complex graph patterns.
The document proposes and evaluates two techniques for attention in multi-source sequence-to-sequence learning: flat attention combination and hierarchical attention combination. Both techniques achieved comparable results to existing context vector concatenation approaches on tasks of multimodal translation and automatic post-editing. Hierarchical attention combination performed best on multimodal translation and allows inspecting individual input attentions. The techniques provide a way to model importance of each input sequence.
Content Moderation Services: Leading the Future of Online Safety (sofiawilliams5966)
These services are not just gatekeepers of community standards. They are architects of safe interaction, unseen defenders of user well-being, and the infrastructure supporting the promise of a trustworthy internet.
Comprehensive Roadmap of AI, ML, DS, DA & DSA (epsilonice)
This outlines a comprehensive roadmap for mastering artificial intelligence, machine learning, data science, data analysis, and data structures and algorithms, guiding learners from beginner to advanced levels by building upon foundational Python knowledge.
Delta Airlines New York Office (Airwayscityoffice) (jamespromind)
Visit the Delta Airlines New York Office for personalized assistance with your travel plans. The experienced team offers guidance on ticket changes, flight delays, and more. It’s a helpful resource for those needing support beyond the airport.
Ethical Frameworks for Trustworthy AI – Opportunities for Researchers in Huma... (Karim Baïna)
Artificial Intelligence (AI) is reshaping societies and raising complex ethical, legal, and geopolitical questions. This talk explores the foundations and limits of Trustworthy AI through the lens of global frameworks such as the EU’s HLEG guidelines, UNESCO’s human rights-based approach, OECD recommendations, and NIST’s taxonomy of AI security risks.
We analyze key principles like fairness, transparency, privacy, robustness, and accountability — not only as ideals, but in terms of their practical implementation and tensions. Special attention is given to real-world contexts such as Morocco’s deployment of 4,000 intelligent cameras and the country’s positioning in AI readiness indexes. These examples raise critical issues about surveillance, accountability, and ethical governance in the Global South.
Rather than relying on standardized terms or ethical "checklists", this presentation advocates for a grounded, interdisciplinary, and context-aware approach to responsible AI — one that balances innovation with human rights, and technological ambition with social responsibility.
This rich context of trustworthy and responsible AI frameworks presents a serious opportunity for researchers in the human and social sciences: will they operate as gatekeepers, reinforcing existing ethical constraints, or become revolutionaries, pioneering new paradigms that redefine how AI interacts with society, knowledge production, and policymaking?
Understanding Large Language Model Hallucinations: Exploring Causes, Detectio... (Tamanna36)
This presentation delves into Large Language Model (LLM) hallucinations—incorrect or fabricated outputs that undermine reliability. It covers their causes (e.g., data limitations, transformer architecture), detection methods (like semantic entropy), prevention strategies (fine-tuning, RAG), and ethical concerns (misinformation, bias). The role of tokens and MLOps in managing hallucinations is explored, alongside the feasibility of hallucination-free LLMs. Designed for researchers, developers, and AI enthusiasts, it offers insights and practical approaches to enhance LLM accuracy and trustworthiness in critical applications like healthcare and legal systems.
5. • Hypergraphs: not straightforward to analyze
o complex representation
o lack of tools
[Figure: example hypergraph and its projection on nodes 1-7; the projected graph captures only interactions at the level of nodes]
Motivation for a New Tool
• Projection
o information loss
o no higher-order information
10. Real-world Datasets
• 13 datasets from 6 domains
◦ Email: recipient addresses of an email
◦ Drug components: classes or substances within a single drug, listed in the National Drug Code Directory
◦ Drug use: drugs used by a patient before an emergency visit, reported to the Drug Abuse Warning Network
◦ Online tags: tags on a question in Stack Exchange forums
◦ Online threads: users answering a question in Stack Exchange forums
◦ Coauthorship: coauthors of a publication
11. Structural Patterns
P1. Degree distribution: heavy-tailed
P2. Connected component: giant
P3. Clustering coefficient: high
P4. Effective diameter: small
P5. Singular value distribution: heavy-tailed
12. P1+P5. Heavy-tailed Distributions
[Figure: degree and singular-value distributions, with abundant low-degree nodes and a few high-degree nodes]
Degree and singular-value distributions are heavy-tailed
J. Leskovec, J. Kleinberg, and C. Faloutsos. 2005. Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations. In KDD.
13. P1+P5. Heavy-tailed Distributions
Statistical tests: to confirm heavy-tailed distributions
Lilliefors test (1)
• H0: the distribution is exponential
• H1: the distribution is not exponential
H0 is rejected at the 2.5% significance level
Log likelihood ratio (2): r = log(L1 / L0)
• L1: likelihood of a heavy-tailed distribution (power-law, log-normal)
• L0: likelihood of the exponential distribution
If r > 0, the distribution is more likely to be heavy-tailed.
(1) Hubert W. Lilliefors. 1969. On the Kolmogorov-Smirnov test for the exponential distribution with mean unknown. Journal of the American Statistical Association 64 (1969), 387–389.
(2) Jeff Alstott, Ed Bullmore, and Dietmar Plenz. 2014. powerlaw: a Python package for analysis of heavy-tailed distributions. PLoS ONE 9, 1 (2014).
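As a small illustration of both tests (a sketch on hypothetical degree data, using the statsmodels implementation of the Lilliefors test and the powerlaw package from reference (2)):

```python
# A minimal sketch of the two statistical tests on hypothetical degrees.
import numpy as np
import powerlaw
from statsmodels.stats.diagnostic import lilliefors

rng = np.random.default_rng(0)
degrees = rng.zipf(2.5, size=1_000)  # hypothetical heavy-tailed sample

# Lilliefors test against the exponential family (H0: exponential).
stat, pval = lilliefors(degrees, dist='exp')
print(f"Lilliefors: stat = {stat:.3f}, p = {pval:.3f}")

# Log likelihood ratio r = log(L1 / L0): r > 0 favors the heavy-tailed
# (power-law) fit over the exponential one.
fit = powerlaw.Fit(degrees, discrete=True)
r, p = fit.distribution_compare('power_law', 'exponential')
print(f"LLR: r = {r:.3f}, p = {p:.3f}")
```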
14. P2. Giant Connected Component
A large proportion of nodes are connected
[Figure: proportion of nodes in each connected component]
J. Leskovec, J. Kleinberg, and C. Faloutsos. 2005. Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations. In KDD.
15. P3. High Clustering Coefficient
Local clustering coefficient:
C_i = (2 × number of triangles at i) / (number of wedges at i)
Clustering coefficient:
C = (1 / |V|) · Σ_{i ∈ V} C_i
High likelihood of having links between “friends of friends”
Wedge at i: an open triangle centered at i
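As a small illustration of these definitions (a sketch on a hypothetical toy graph; networkx computes the standard local coefficient C_i = 2·T_i / (k_i (k_i - 1)), one common form of the triangle/wedge ratio above):

```python
# A minimal sketch of the clustering-coefficient definitions,
# on a hypothetical toy graph.
import networkx as nx

G = nx.Graph([(1, 2), (2, 3), (1, 3), (3, 4)])  # hypothetical edges

local = nx.clustering(G)                       # C_i for every node i
C = sum(local.values()) / G.number_of_nodes()  # C = (1/|V|) * sum_i C_i
print(local)
print(C, nx.average_clustering(G))             # the two averages agree
```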
16. P4. Small Effective Diameter
[Figure: 90% of connected pairs are reachable within distance d = 8]
Most pairs of connected nodes: reachable within a small distance
https://web.stanford.edu/class/cs224w/handouts/02-gnp-smallworld.pdf
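As an illustration of this statistic (a brute-force sketch on a small hypothetical random graph; large datasets would need sampling or approximation), the 90% effective diameter is the smallest distance d such that at least 90% of connected node pairs lie within d:

```python
# A minimal brute-force sketch of the 90% effective diameter.
import math
import networkx as nx

G = nx.erdos_renyi_graph(100, 0.05, seed=0)  # hypothetical graph
lengths = dict(nx.all_pairs_shortest_path_length(G))
dists = sorted(d for src in lengths.values() for d in src.values() if d > 0)

# Smallest distance covering at least 90% of connected (ordered) node pairs.
effective_diameter = dists[math.ceil(0.9 * len(dists)) - 1]
print(effective_diameter)
```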
17. Structural Patterns: Intuition
Real-world graphs exhibit:
P1. Degree distribution: heavy-tailed
P2. Connected component: giant
P3. Clustering coefficient: high
P4. Effective diameter: small
P5. Singular value distribution: heavy-tailed
[Diagram: a hypergraph is decomposed into graphs at all decomposition levels; each decomposed graph is expected to show the same real-world patterns]
J. Leskovec, J. Kleinberg, and C. Faloutsos. 2005. Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations. In KDD.
20. Structural Patterns
Giant connected components vary among datasets
If there is a giant connected component
• High Clustering Coefficient
• Small Effective Diameter
[Figure: proportion of nodes in the largest connected component per dataset; small numbers indicate the absence of a giant connected component]
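As a small illustration of the quantity plotted on this slide (a sketch on a hypothetical graph, not one of the paper's datasets):

```python
# A minimal sketch of the proportion of nodes in the largest
# connected component.
import networkx as nx

G = nx.erdos_renyi_graph(200, 0.01, seed=0)  # hypothetical graph
giant = max(nx.connected_components(G), key=len)
print(len(giant) / G.number_of_nodes())
```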