Week 12
The Erdős-Rényi model | The infinite relational model
Applications in many fields: social graphs, biology, logistics, ...
Various purposes: understanding structure, identifying clusters, link prediction
Set of nodes V and set of edges E
Graph G = (V, E)
Adjacency matrix X
Let's study a simple model for random graphs!
How do nodes connect to each other?
About the likelihood
Bernoulli distribution:
→ Based on summary statistics only
About the prior
About the posterior
→ Beta-Binomial conjugate model
About the predictive posterior
→ Used for checking by comparing network stats of samples with actual data
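The Beta-Binomial conjugate update above can be sketched in a few lines. A minimal sketch, assuming a Beta(a, b) prior on the link probability ϕ and counting each unordered pair of nodes once; the function name and the toy graph are illustrative:

```python
import numpy as np

def er_posterior(X, a=1.0, b=1.0):
    """Beta-Binomial conjugate update for the Erdős-Rényi link
    probability phi, given an undirected adjacency matrix X.
    Prior: phi ~ Beta(a, b); likelihood: each dyad ~ Bernoulli(phi).
    Returns the posterior Beta parameters (a + E, b + M - E)."""
    N = X.shape[0]
    iu = np.triu_indices(N, k=1)      # each unordered pair counted once
    E = int(X[iu].sum())              # observed edges
    M = N * (N - 1) // 2              # possible edges
    return a + E, b + (M - E)

# toy example: a 4-node path graph (3 edges out of 6 possible)
X = np.zeros((4, 4), dtype=int)
for i, j in [(0, 1), (1, 2), (2, 3)]:
    X[i, j] = X[j, i] = 1
a_post, b_post = er_posterior(X)
print(a_post, b_post, a_post / (a_post + b_post))  # posterior mean of phi
```

Only the edge count E and the number of dyads M enter the update, which is exactly the "summary statistics only" point above.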
Nodes = members of the karate club
Edge = interaction between two members
Does it follow a random graph model?
Let's fit our Erdős-Rényi model!
Average ϕ = 0.14
Statistics from posterior predictive for checking the random network assumption
Sample ϕ values from the posterior predictive distribution
→ generate adjacency matrices → compute network statistics.
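The three steps above can be sketched as follows. A minimal sketch with illustrative names; Beta(79, 484) stands in for the karate-club posterior (a Beta(1, 1) prior updated with 78 edges among 561 dyads, giving a posterior mean near 0.14 — these figures are assumptions for the example):

```python
import numpy as np

def predictive_check(a_post, b_post, N, stat, S=1000, rng=None):
    """Posterior predictive check for the Erdős-Rényi fit: sample
    phi ~ Beta(a_post, b_post), generate an undirected adjacency
    matrix, and record a network statistic for each sample."""
    rng = np.random.default_rng(0) if rng is None else rng
    out = np.empty(S)
    for s in range(S):
        phi = rng.beta(a_post, b_post)
        U = np.triu(rng.random((N, N)) < phi, k=1)  # each dyad once
        X = U + U.T                                  # symmetrize
        out[s] = stat(X)
    return out

# e.g. maximum degree as the network statistic
max_degree = lambda X: X.sum(axis=1).max()
stats = predictive_check(79, 484, N=34, stat=max_degree, S=500)
# compare the observed statistic with np.quantile(stats, [0.025, 0.975])
```

If the observed statistic falls far outside the predictive interval, the random-graph assumption is suspect, which is what happens for the karate club.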
This karate club is not a random network... Let's find another model!
Nodes are not randomly connected anymore → we have clusters
Parameter K, the number of clusters
Latent variable z that represents the cluster assignment of nodes
Link probability for two nodes from two different clusters k and l
Each cluster follows an Erdős-Rényi model (random clusters).
About the likelihood
For each Xij, we recognize a Bernoulli distribution:
Indeed:
About the prior
Distribution of latent variable z:
with the hyperparameter α to control cluster weights
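Putting the pieces together, the generative side of the Stochastic Block Model can be sketched as below. This is a sketch under assumptions: Dirichlet(α) cluster weights, a Beta(a, b) prior on each block's link probability, and illustrative names throughout:

```python
import numpy as np

def sample_sbm(N, K, alpha=1.0, a=1.0, b=1.0, rng=None):
    """Generative sketch of the Stochastic Block Model: cluster
    weights pi ~ Dirichlet(alpha), assignments z ~ Categorical(pi),
    and each pair of clusters (k, l) gets its own Erdős-Rényi link
    probability phi[k, l] ~ Beta(a, b)."""
    rng = np.random.default_rng() if rng is None else rng
    pi = rng.dirichlet(alpha * np.ones(K))       # cluster weights
    z = rng.choice(K, size=N, p=pi)              # cluster assignments
    phi = rng.beta(a, b, size=(K, K))
    phi = np.triu(phi) + np.triu(phi, 1).T       # symmetric link probs
    P = phi[z[:, None], z[None, :]]              # pairwise link probability
    U = np.triu(rng.random((N, N)) < P, k=1)     # sample each dyad once
    X = (U + U.T).astype(int)
    return z, X

z, X = sample_sbm(N=30, K=3)
```

Conditioned on z, every entry X[i, j] is a Bernoulli draw with parameter ϕ[z_i, z_j], matching the likelihood above.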
How do we choose hyperparameters K and α?
Stochastic Block Model = parametric model
In practice, we only learn about K once we explore the data...
Idea of the Infinite Relational Model [IRM]
Non-parametric version of the Stochastic Block Model
where K → ∞
Back to the distribution of z...
From label to partition distribution
z = [ 1, 2, 1, 3 ] and z' = [ 3, 1, 3, 2 ] describe the same partitioning
Instead of focusing on the cluster labels, we only consider whether a cluster is empty or not.
With K labels:
First cluster: K possible labels
Second cluster: K-1 possible labels
For the same partitioning, the number of possible configurations is
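With K⁺ denoting the number of non-empty clusters (notation assumed here), the counting steps above give:

```latex
K (K - 1) \cdots (K - K^{+} + 1) = \frac{K!}{(K - K^{+})!}
```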
Let's develop the probability of cluster partitions:
We define the following parameter
A = Kα
If K → ∞ and α → 0, with A = Kα held fixed:
In this limit, we obtain the Chinese Restaurant Process formulation.
INTERPRETATION
"Rich gets richer": the probability to start a new cluster decreases as the number of nodes N increases
The parameter A controls the rate at which new clusters start to fill in
Larger A = more likely to join an empty table, i.e. to start a new cluster
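The two rules above can be sketched as a direct sampler. A minimal sketch with illustrative names: item n joins an existing cluster k with probability n_k / (n + A) and opens a new cluster with probability A / (n + A):

```python
import numpy as np

def sample_crp(N, A, rng=None):
    """Draw a partition of N items from the Chinese Restaurant
    Process with concentration parameter A."""
    rng = np.random.default_rng() if rng is None else rng
    z = [0]                                   # first item opens cluster 0
    counts = [1]                              # cluster sizes n_k
    for n in range(1, N):
        probs = np.array(counts + [A], dtype=float)
        probs /= n + A                        # existing: n_k/(n+A), new: A/(n+A)
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)                  # open a new cluster
        else:
            counts[k] += 1                    # rich get richer
        z.append(k)
    return np.array(z)

# larger A tends to produce more clusters
z = sample_crp(N=100, A=5.0, rng=np.random.default_rng(0))
```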
Same link probability and likelihood as the Stochastic Block Model:
About the Prior
About the Posterior
About the Predictive Posterior → Gibbs sampling with S samples
Recall of Gibbs sampling
The distribution of K, the number of clusters, across the samples from the Gibbs sampler
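One Gibbs sweep for the IRM can be sketched as below. This is a minimal collapsed-Gibbs sketch under assumptions: a CRP(A) prior on partitions, a Beta(a, b) prior on every block's link probability (integrated out analytically), an undirected graph with no self-links; all function and variable names are illustrative, and the toy data is synthetic, not the karate club:

```python
import numpy as np
from math import lgamma

def betaln(x, y):
    # log of the Beta function via log-gamma
    return lgamma(x) + lgamma(y) - lgamma(x + y)

def irm_gibbs_sweep(X, z, A=1.0, a=1.0, b=1.0, rng=None):
    """One collapsed-Gibbs sweep over the cluster assignments z:
    each node is removed and reassigned to an existing cluster or a
    new one, with probability = CRP prior x Beta-Bernoulli predictive
    of its edges given the current block counts."""
    rng = np.random.default_rng() if rng is None else rng
    N = X.shape[0]
    for i in range(N):
        others = np.array([j for j in range(N) if j != i])
        _, z_minus = np.unique(z[others], return_inverse=True)
        K = z_minus.max() + 1
        members = [others[z_minus == k] for k in range(K)]
        counts = np.array([len(m) for m in members])
        log_p = np.log(np.append(counts, A))       # CRP prior term
        for c in range(K + 1):                     # K existing + 1 new
            for l in range(K):
                n1 = int(X[i, members[l]].sum())   # i's links toward l
                n0 = counts[l] - n1
                if c == l:                          # within-cluster block
                    m1 = int(X[np.ix_(members[c], members[c])].sum()) // 2
                    m0 = counts[c] * (counts[c] - 1) // 2 - m1
                elif c < K:                         # between-cluster block
                    m1 = int(X[np.ix_(members[c], members[l])].sum())
                    m0 = counts[c] * counts[l] - m1
                else:                               # new cluster: empty block
                    m1 = m0 = 0
                # Beta-Bernoulli predictive of i's edges given the block
                log_p[c] += (betaln(a + m1 + n1, b + m0 + n0)
                             - betaln(a + m1, b + m0))
        p = np.exp(log_p - log_p.max())
        z[others] = z_minus
        z[i] = rng.choice(K + 1, p=p / p.sum())
    return z

# toy usage: two planted blocks, dense within, sparse between
rng = np.random.default_rng(0)
N = 12
z_true = np.repeat([0, 1], N // 2)
P = np.where(z_true[:, None] == z_true[None, :], 0.9, 0.05)
U = np.triu(rng.random((N, N)) < P, k=1)
X = (U + U.T).astype(int)
z = np.zeros(N, dtype=int)
for _ in range(20):
    z = irm_gibbs_sweep(X, z, A=1.0, rng=rng)
```

Keeping the z drawn after each sweep gives the S samples used above, and a histogram of the number of distinct labels per sample gives the distribution of K.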
Checking the Infinite Relational model fit with network statistics
As we did previously for the random graph model fit
Visualize the best partition, i.e. the one with the highest posterior probability
Partition with K = 6 clusters