Week 12
The Erdős-Rényi model | The infinite relational model
Applications in many fields: social graphs, biology, logistics, ...
Various purposes: understanding structure, identifying clusters, link prediction
Set of nodes V and set of edges E
Graph G = (V, E)
Adjacency matrix X
Let's study a simple model for random graphs!
How do nodes connect to each other?
About the likelihood
Bernoulli distribution:
→ Based on summary statistics only
About the prior
About the posterior
→ Beta-Binomial conjugate model
About the predictive posterior
→ Used for checking by comparing network stats of samples with actual data
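The Beta-Binomial conjugate update above can be sketched in a few lines. A minimal sketch, assuming a Beta(a, b) prior on the link probability ϕ and counting each unordered pair of nodes once; the function name and the toy graph are illustrative:

```python
import numpy as np

def er_posterior(X, a=1.0, b=1.0):
    """Beta-Binomial conjugate update for the Erdős-Rényi link
    probability phi, given an undirected adjacency matrix X.
    Prior: phi ~ Beta(a, b); likelihood: each dyad ~ Bernoulli(phi).
    Returns the posterior Beta parameters (a + E, b + M - E)."""
    N = X.shape[0]
    iu = np.triu_indices(N, k=1)      # each unordered pair counted once
    E = int(X[iu].sum())              # observed edges
    M = N * (N - 1) // 2              # possible edges
    return a + E, b + (M - E)

# toy example: a 4-node path graph (3 edges out of 6 possible)
X = np.zeros((4, 4), dtype=int)
for i, j in [(0, 1), (1, 2), (2, 3)]:
    X[i, j] = X[j, i] = 1
a_post, b_post = er_posterior(X)
print(a_post, b_post, a_post / (a_post + b_post))  # posterior mean of phi
```

Only the edge count E and the number of dyads M enter the update, which is exactly the "summary statistics only" point above.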
Nodes = members of the karate club
Edge = interaction between two members
Does it follow a random graph model?
Let's fit our Erdős-Rényi model!
Average ϕ = 0.14
Statistics from posterior predictive for checking the random network assumption
Sample ϕ values from the posterior predictive distribution
→ generate adjacency matrices → compute network statistics.
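The three steps above can be sketched as follows. A minimal sketch with illustrative names; Beta(79, 484) stands in for the karate-club posterior (a Beta(1, 1) prior updated with 78 edges among 561 dyads, giving a posterior mean near 0.14 — these figures are assumptions for the example):

```python
import numpy as np

def predictive_check(a_post, b_post, N, stat, S=1000, rng=None):
    """Posterior predictive check for the Erdős-Rényi fit: sample
    phi ~ Beta(a_post, b_post), generate an undirected adjacency
    matrix, and record a network statistic for each sample."""
    rng = np.random.default_rng(0) if rng is None else rng
    out = np.empty(S)
    for s in range(S):
        phi = rng.beta(a_post, b_post)
        U = np.triu(rng.random((N, N)) < phi, k=1)  # each dyad once
        X = U + U.T                                  # symmetrize
        out[s] = stat(X)
    return out

# e.g. maximum degree as the network statistic
max_degree = lambda X: X.sum(axis=1).max()
stats = predictive_check(79, 484, N=34, stat=max_degree, S=500)
# compare the observed statistic with np.quantile(stats, [0.025, 0.975])
```

If the observed statistic falls far outside the predictive interval, the random-graph assumption is suspect, which is what happens for the karate club.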
This karate club is not a random network... Let's find another model!
Nodes are not randomly connected anymore → we have clusters
Parameter K, the number of clusters
Latent variable z that represents the cluster assignment of nodes
Link probability for two nodes from two different clusters k and l
Each cluster follows an Erdős-Rényi model (random clusters).
About the likelihood
For each Xij, we recognize a Bernoulli distribution:
Indeed:
About the prior
Distribution of latent variable z:
with the hyperparameter α to control cluster weights
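Putting the pieces together, the generative side of the Stochastic Block Model can be sketched as below. This is a sketch under assumptions: Dirichlet(α) cluster weights, a Beta(a, b) prior on each block's link probability, and illustrative names throughout:

```python
import numpy as np

def sample_sbm(N, K, alpha=1.0, a=1.0, b=1.0, rng=None):
    """Generative sketch of the Stochastic Block Model: cluster
    weights pi ~ Dirichlet(alpha), assignments z ~ Categorical(pi),
    and each pair of clusters (k, l) gets its own Erdős-Rényi link
    probability phi[k, l] ~ Beta(a, b)."""
    rng = np.random.default_rng() if rng is None else rng
    pi = rng.dirichlet(alpha * np.ones(K))       # cluster weights
    z = rng.choice(K, size=N, p=pi)              # cluster assignments
    phi = rng.beta(a, b, size=(K, K))
    phi = np.triu(phi) + np.triu(phi, 1).T       # symmetric link probs
    P = phi[z[:, None], z[None, :]]              # pairwise link probability
    U = np.triu(rng.random((N, N)) < P, k=1)     # sample each dyad once
    X = (U + U.T).astype(int)
    return z, X

z, X = sample_sbm(N=30, K=3)
```

Conditioned on z, every entry X[i, j] is a Bernoulli draw with parameter ϕ[z_i, z_j], matching the likelihood above.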
How do we choose hyperparameters K and α?
Stochastic Block Model = parametric model
In practice, we only learn about K once we explore the data...
Idea of the Infinite Relational Model [IRM]
Non-parametric version of the Stochastic Block Model
where K → ∞
Back to the distribution of z...
From label to partition distribution
z = [ 1, 2, 1, 3 ] and z' = [ 3, 1, 3, 2 ] describe the same partitioning
Instead of focusing on the cluster labels, we only consider whether a cluster is empty or not.
With K labels:
First cluster: K possible labels
Second cluster: K-1 possible labels
For the same partitioning, the number of possible configurations is
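With K⁺ denoting the number of non-empty clusters (notation assumed here), the counting steps above give:

```latex
K (K - 1) \cdots (K - K^{+} + 1) = \frac{K!}{(K - K^{+})!}
```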
Let's develop the probability of cluster partitions:
We define the following parameter
A = Kα
If K → ∞ and α → 0, with A = Kα held fixed:
In this limit, we obtain the Chinese Restaurant Process formulation.
INTERPRETATION
"Rich gets richer": the probability to start a new cluster decreases as the number of nodes N increases
The parameter A controls the rate at which new clusters start to fill in
Larger A = more likely to join an empty table, i.e. to start a new cluster
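The two rules above can be sketched as a direct sampler. A minimal sketch with illustrative names: item n joins an existing cluster k with probability n_k / (n + A) and opens a new cluster with probability A / (n + A):

```python
import numpy as np

def sample_crp(N, A, rng=None):
    """Draw a partition of N items from the Chinese Restaurant
    Process with concentration parameter A."""
    rng = np.random.default_rng() if rng is None else rng
    z = [0]                                   # first item opens cluster 0
    counts = [1]                              # cluster sizes n_k
    for n in range(1, N):
        probs = np.array(counts + [A], dtype=float)
        probs /= n + A                        # existing: n_k/(n+A), new: A/(n+A)
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)                  # open a new cluster
        else:
            counts[k] += 1                    # rich get richer
        z.append(k)
    return np.array(z)

# larger A tends to produce more clusters
z = sample_crp(N=100, A=5.0, rng=np.random.default_rng(0))
```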
Same link probability and likelihood as the Stochastic Block Model:
About the Prior
About the Posterior
About the Predictive Posterior → Gibbs sampling with S samples
Recall of Gibbs sampling
The distribution of K, the number of clusters, across the samples from the Gibbs sampler
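One Gibbs sweep for the IRM can be sketched as below. This is a minimal collapsed-Gibbs sketch under assumptions: a CRP(A) prior on partitions, a Beta(a, b) prior on every block's link probability (integrated out analytically), an undirected graph with no self-links; all function and variable names are illustrative, and the toy data is synthetic, not the karate club:

```python
import numpy as np
from math import lgamma

def betaln(x, y):
    # log of the Beta function via log-gamma
    return lgamma(x) + lgamma(y) - lgamma(x + y)

def irm_gibbs_sweep(X, z, A=1.0, a=1.0, b=1.0, rng=None):
    """One collapsed-Gibbs sweep over the cluster assignments z:
    each node is removed and reassigned to an existing cluster or a
    new one, with probability = CRP prior x Beta-Bernoulli predictive
    of its edges given the current block counts."""
    rng = np.random.default_rng() if rng is None else rng
    N = X.shape[0]
    for i in range(N):
        others = np.array([j for j in range(N) if j != i])
        _, z_minus = np.unique(z[others], return_inverse=True)
        K = z_minus.max() + 1
        members = [others[z_minus == k] for k in range(K)]
        counts = np.array([len(m) for m in members])
        log_p = np.log(np.append(counts, A))       # CRP prior term
        for c in range(K + 1):                     # K existing + 1 new
            for l in range(K):
                n1 = int(X[i, members[l]].sum())   # i's links toward l
                n0 = counts[l] - n1
                if c == l:                          # within-cluster block
                    m1 = int(X[np.ix_(members[c], members[c])].sum()) // 2
                    m0 = counts[c] * (counts[c] - 1) // 2 - m1
                elif c < K:                         # between-cluster block
                    m1 = int(X[np.ix_(members[c], members[l])].sum())
                    m0 = counts[c] * counts[l] - m1
                else:                               # new cluster: empty block
                    m1 = m0 = 0
                # Beta-Bernoulli predictive of i's edges given the block
                log_p[c] += (betaln(a + m1 + n1, b + m0 + n0)
                             - betaln(a + m1, b + m0))
        p = np.exp(log_p - log_p.max())
        z[others] = z_minus
        z[i] = rng.choice(K + 1, p=p / p.sum())
    return z

# toy usage: two planted blocks, dense within, sparse between
rng = np.random.default_rng(0)
N = 12
z_true = np.repeat([0, 1], N // 2)
P = np.where(z_true[:, None] == z_true[None, :], 0.9, 0.05)
U = np.triu(rng.random((N, N)) < P, k=1)
X = (U + U.T).astype(int)
z = np.zeros(N, dtype=int)
for _ in range(20):
    z = irm_gibbs_sweep(X, z, A=1.0, rng=rng)
```

Keeping the z drawn after each sweep gives the S samples used above, and a histogram of the number of distinct labels per sample gives the distribution of K.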
Checking the Infinite Relational model fit with network statistics
As we did previously for the random graph model fit
Visualize the best partition, i.e. the one with the highest posterior probability
Partition with K = 6 clusters