This article is the second in a series in which we discuss Markov Chain Monte Carlo techniques.
In the previous article we focused on understanding what Markov Chain Monte Carlo is and why it works. Here we look at Gibbs sampling, a Markov Chain Monte Carlo technique that can be directly applied to many modelling problems.
We saw that the key to this approach was in choosing a random walk that visited each outcome as frequently as the model suggested. By recording the position of the random walk at consecutive time intervals, we could generate a set of simulations that looked like a large sample from the model.
For example, a model might prescribe that two outcomes are possible, “A” and “B” say. In this case, if each outcome occurred with a probability of 50% then an appropriate random walk would transition between “A” and “B”, spending half of the time at each outcome.
Furthermore, by recording the positions of the walk we could get a large sample in which approximately half of the records would have outcome “A” and the other half outcome “B”.
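To make this concrete, here is a minimal sketch in Python of such a two-outcome walk. The switching probability of 0.3 is an arbitrary choice for illustration; any rule that switches with the same probability in both directions has the 50/50 invariant distribution.

```python
import random

def two_state_walk(n_steps, p_switch=0.3, seed=0):
    """Random walk over the outcomes "A" and "B".

    Because the probability of switching is the same in both directions,
    the invariant distribution is 50/50 and the walk spends roughly half
    of its time at each outcome.
    """
    rng = random.Random(seed)
    state = "A"
    positions = []
    for _ in range(n_steps):
        if rng.random() < p_switch:
            state = "B" if state == "A" else "A"
        positions.append(state)  # record the walk at each time interval
    return positions

walk = two_state_walk(100_000)
frac_a = walk.count("A") / len(walk)  # close to 0.5 for a long walk
```

Counting the recorded positions gives a large sample in which roughly half of the records are "A" and half are "B", just as described above.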
Ultimately, we saw that a particular type of random walk was useful in this regard. In particular, we found that if the invariant distribution of a Markov Chain is equal to the modelled distribution, it would (aside from some unusual exceptions) visit each position with the desired long-run frequency.
The objective of the previous article was to ensure that we were comfortable with this key result, as it is the reason why all MCMC methods work. However, on its own this result simply presents us with a new problem:
How do I find a Markov Chain with invariant distribution equal to my modelled distribution?
The purpose of this article is to describe a technique that can be used to answer this question.
Please note that this article builds directly on the previous one, so you may find it helpful to re-read or skim that post first.
Narrowing Down our Search
After the last article, identifying the Markov Chain appropriate to your model may have felt like finding the proverbial needle in the haystack.
There is a whole world of possible Markov Chains out there and naively picking transition probabilities until we find the ones with the right invariant distribution is likely to end in frustration.
To make this search easier, MCMC techniques usually consider only the subset of Markov Chains that have a particular property. This property is known as “detailed balance”.
Detailed balance narrows our focus to those transition probabilities that, when applied to the modelled distribution, result in no net transfer of mass between any pair of outcomes.
This property is only subtly different from the defining property of the invariant distribution. If you recall, the invariant distribution is the one for which the net transfer of mass into each outcome is zero. Detailed balance is similar but applies at a more granular level, pair by pair.
As the net transfer of mass into any outcome is the aggregate of the transfers between it and every other outcome, a distribution that satisfies detailed balance is also an invariant distribution.
Detailed balance is defined more formally by the following equation, which must hold for all outcomes x and y:

p(x) · T(x → y) = p(y) · T(y → x)

where p denotes the modelled distribution and T(x → y) denotes the probability of transitioning from outcome x to outcome y.
Here, the left-hand side represents the amount of mass that is transferred from outcome x to outcome y, while the right-hand side represents the amount transferred from y to x.
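As a quick sanity check, the sketch below verifies both properties numerically for a made-up three-outcome distribution with hand-picked transition probabilities (both are assumptions chosen purely for illustration): the flow of mass between every pair of outcomes balances, and as a consequence the distribution is invariant.

```python
import numpy as np

# A hypothetical modelled distribution over three outcomes.
pi = np.array([0.2, 0.3, 0.5])

# Transition probabilities hand-picked so that detailed balance holds,
# i.e. pi[x] * P[x, y] == pi[y] * P[y, x] for every pair (x, y).
P = np.array([
    [0.700, 0.150, 0.150],
    [0.100, 0.780, 0.120],
    [0.060, 0.072, 0.868],
])

# flow[x, y] is the mass transferred from x to y in one step.
flow = pi[:, None] * P

balanced = np.allclose(flow, flow.T)  # detailed balance: flow is symmetric
invariant = np.allclose(pi @ P, pi)   # which implies pi is invariant
```

Note that invariance here follows automatically from the symmetry of the flow matrix, which is exactly the aggregation argument made above.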
The Gibbs sampling algorithm is one MCMC technique that leverages detailed balance in order to produce a Markov Chain with the desired invariant distribution.
The Gibbs Sampling Algorithm
Often MCMC is used when we want to simulate from a multivariate distribution. In these cases, when the conditional distribution of each variable given the other variables is known, Gibbs sampling can be used.
The Gibbs sampling algorithm is best understood by considering how it produces the next step in a Markov Chain. The animation below shows how this is done when Gibbs sampling is used to simulate from a bivariate Normal distribution.
In the animation the leading black dot represents the latest position of the random walk. In this case the random walk starts at the coordinate (0,0). To produce the next position, Gibbs sampling involves picking a new value for the x-coordinate and, afterwards, a new value for the y-coordinate.
As was alluded to above, the x-coordinate is sampled from the conditional distribution of x given the latest value of y and vice versa for the y-coordinate.
After each coordinate has been updated, the position is recorded as the next step in the walk. In this case, the second position of the walk is at (-0.19,-0.41).
To give you some confidence that the simulations generated by Gibbs sampling are appropriate, I’ve plotted the simulations on top of some bivariate Normal contours. As you can see, the spread of the black dots resembles the shape of the contours. Furthermore, for the reasons discussed in the last article, the resemblance becomes closer as the number of simulations increases.
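For readers who prefer code to animations, here is a minimal sketch of this sampler in Python. It assumes a standard bivariate Normal with correlation rho, for which each conditional distribution is known in closed form: X given Y = y is Normal(rho · y, 1 − rho²), and symmetrically for Y given X = x.

```python
import math
import random

def gibbs_bivariate_normal(n_samples, rho=0.8, seed=0):
    """Gibbs sampler for a standard bivariate Normal with correlation rho."""
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho ** 2)  # sd of each conditional
    x, y = 0.0, 0.0  # start the walk at the origin, as in the animation
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(rho * y, cond_sd)  # sample x from p(x | y)
        y = rng.gauss(rho * x, cond_sd)  # sample y from p(y | x)
        samples.append((x, y))           # record the new position
    return samples

samples = gibbs_bivariate_normal(50_000)
```

Plotting these samples against the contours of the target distribution reproduces the kind of picture shown in the animation, with the sample correlation approaching rho as the number of simulations grows.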
This method of generating a Markov Chain may seem unfamiliar. In particular, you may be more familiar with having a set of transition probabilities that fully describe the probability of moving to each position given the current one.
In this case, however, we only know the transition probabilities corresponding to the sub-steps involved in updating each coordinate. Nevertheless, as the next step in the walk only depends on its current position, this process does generate a Markov Chain.
The key to the Gibbs sampler is that the update to each coordinate satisfies detailed balance. To see this, let's first consider the probability of moving from the coordinate (x, y) to (z, y) via an update to X:

T((x, y) → (z, y)) = p(X = z | Y = y)
In words, the transition probability from (x, y) to (z, y), given an update to X, simply follows from the conditional distribution of X given Y. Furthermore, from the definition of conditional probability, this transition probability can be written as:

T((x, y) → (z, y)) = p(z, y) / p(y)
Note that in the equations above and below I write, for example, p(x, y) instead of p(X = x, Y = y). Both denote the same thing, the probability that X is equal to x and Y is equal to y; I've simply shortened the notation to make it less clunky.
Inserting the modelled joint distribution and our final representation of the transition probabilities into both sides of the detailed balance equation, we get:

p(x, y) · p(z, y) / p(y) = p(z, y) · p(x, y) / p(y)
Clearly both sides are equal, no matter what the value of x, y, or z.
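If you'd like to see this numerically rather than algebraically, the following sketch checks detailed balance for the x-update on a made-up 2×2 joint distribution (the probabilities are arbitrary, chosen only for illustration):

```python
# A hypothetical joint distribution p(X, Y) over two binary variables.
p = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

def p_y(y):
    """Marginal probability p(y)."""
    return sum(prob for (_, y_), prob in p.items() if y_ == y)

def x_update(x, z, y):
    """Transition probability from (x, y) to (z, y): p(z, y) / p(y).

    Note the result does not depend on the starting value x.
    """
    return p[(z, y)] / p_y(y)

# Detailed balance: p(x, y) * T((x, y) -> (z, y)) equals
#                   p(z, y) * T((z, y) -> (x, y)) for all x, z, y.
for y in (0, 1):
    for x in (0, 1):
        for z in (0, 1):
            lhs = p[(x, y)] * x_update(x, z, y)
            rhs = p[(z, y)] * x_update(z, x, y)
            assert abs(lhs - rhs) < 1e-12
```

The assertions pass for every pair of x-values at each y, mirroring the cancellation of p(y) on both sides of the equation above.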
While this shows that an update to the x-coordinate satisfies detailed balance, the same logic shows the same for the y-coordinate, and indeed for any number of variables.
Furthermore, as the update to each coordinate satisfies detailed balance, so does a full Gibbs step. In particular, a Gibbs step is the result of updating every coordinate in turn, and because no net mass is transferred by any individual update, none is transferred once all of the coordinates have been updated.
Thankfully, that’s all there is to Gibbs sampling.
In this article we discussed the Gibbs sampling algorithm, a Markov Chain Monte Carlo technique.
We saw that Gibbs sampling was particularly useful when we wanted to simulate from a multivariate distribution. In particular, the algorithm produces a new simulation by updating each modelled variable in turn and recording the outcome after all of the variables are updated.
The reason why Gibbs sampling works is that the updating process defines a Markov Chain for which the modelled distribution is the invariant distribution.
In subsequent articles we will cover other MCMC techniques but if you have any questions on this article please leave a comment below.