Gaussian Process regression

Among other things, nonparametric Bayesian methods can simplify predictive modelling projects. Here I'll give a brief introduction to one of these methods, Gaussian Process regression, and show how it can help us make inferences about complicated trends in a quick and consistent way.

Introduction

Just as a coin flip can be thought of as a random variable, so can a Gaussian Process. However, while a coin flip produces scalar-valued outcomes (tails = 0, heads = 1), a stochastic process generates functions as outcomes.

In the graph below you can see a number of univariate random functions. Each line is one function and represents one outcome of the Gaussian Process.

[Figure: sample functions drawn from a Gaussian Process prior, annotated with the mean function m and covariance function k]

As you can see, the outcomes of a Gaussian Process are quite varied: the light blue line is pretty straight; the blue, purple and teal lines have one inflection point; and the gold, red, and green lines have two inflection points.

What gives the Gaussian Process its name is that any collection of function values (e.g. f(1), f(5.5), f(-2.2)) follows a multivariate Gaussian distribution. Furthermore, just as the Gaussian distribution is parameterised by a mean and (co)variance, a Gaussian process is parameterised by a mean function and covariance function.
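Concretely, for any finite collection of inputs x_1, …, x_n, this defining property can be written in terms of the mean function m and covariance function k as:

```latex
\begin{pmatrix} f(x_1) \\ \vdots \\ f(x_n) \end{pmatrix}
\sim \mathcal{N}\!\left(
\begin{pmatrix} m(x_1) \\ \vdots \\ m(x_n) \end{pmatrix},\;
\begin{pmatrix}
k(x_1, x_1) & \cdots & k(x_1, x_n) \\
\vdots      & \ddots & \vdots      \\
k(x_n, x_1) & \cdots & k(x_n, x_n)
\end{pmatrix}
\right)
```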

The mean function represents the long-run average outcome of the random functions, while the covariance function determines how points of the function will diverge from the mean and whether the divergence will be of the same degree and direction at different points.

The plot above has been annotated with the form of the mean and covariance functions (m and k) used to generate the lines. In particular, the mean is the zero function and the covariance function decays from one, when the two points are equal, to zero as the points become more distant.

These choices result in smooth functions, centred around zero, where nearby points on the x-axis have similar values and distant points are relatively independent. Furthermore, with this covariance function almost any smooth function is possible and the propensity for certain types of functions to occur can be changed by varying the covariance function’s parameters.
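To make this concrete, here is a minimal sketch of how lines like those in the plot can be generated. I'm assuming a squared-exponential covariance, which matches the behaviour described above (equal to one at zero separation, decaying to zero with distance, and producing smooth functions); the length-scale, grid, and number of samples are illustrative choices, not necessarily the ones used for the plot.

```python
import numpy as np

def squared_exponential(x1, x2, length_scale=1.0):
    # Equals 1 when the points coincide and decays to 0 as they move apart.
    return np.exp(-0.5 * ((x1[:, None] - x2[None, :]) / length_scale) ** 2)

rng = np.random.default_rng(0)
xs = np.linspace(-5, 5, 200)

mean = np.zeros_like(xs)               # the zero mean function
cov = squared_exponential(xs, xs)      # pairwise covariances on the grid
cov += 1e-9 * np.eye(len(xs))          # small jitter for numerical stability

# Each row of `samples` is one random function evaluated on the grid,
# i.e. one outcome of the Gaussian Process.
samples = rng.multivariate_normal(mean, cov, size=7)
```

Varying length_scale changes how rapidly the sampled functions wiggle, which is the kind of parameter tuning mentioned above.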

A key goal of regression is to learn something about an unknown function underlying a set of noisy observations. For example, in simple linear regression we investigate the straight line functions that best map values of the covariates to noisy observations of the response.

In Bayesian analysis, simple linear regression proceeds by placing a prior on the parameters of the linear regression model (the intercept, slope and noise parameters). The posterior distribution of the parameters, conditional on the data, is then used to determine which straight lines could have plausibly generated the data.

As you saw in the plot above, a Gaussian Process can be used as a prior on a much larger set of functions. Furthermore, Gaussian Processes are convenient priors: with Gaussian-distributed noise, the posterior distribution is also a Gaussian Process whose mean and covariance functions are relatively easy to determine.
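Under a zero mean function, that posterior has a standard closed form, sketched below. Here kernel and noise_var are placeholder names for any covariance function (such as the one above) and the observation noise variance:

```python
import numpy as np

def gp_posterior(x_train, y_train, x_test, kernel, noise_var):
    # Posterior mean and covariance of a zero-mean GP observed with
    # i.i.d. Gaussian noise of variance `noise_var`.
    K = kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = kernel(x_train, x_test)    # train/test covariances
    K_ss = kernel(x_test, x_test)    # test/test covariances

    # Use a Cholesky factorisation rather than an explicit inverse for stability.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, K_s)

    post_mean = K_s.T @ alpha
    post_cov = K_ss - v.T @ v
    return post_mean, post_cov
```

The diagonal of post_cov gives pointwise predictive variances; adding the noise variance to these yields the standard deviations used for prediction intervals.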

In the example below I show how these facts allow you to produce fast and convincing analyses of some weather data.

Example

Here I’ve used a Gaussian Process regression model to make predictions about future temperatures.

Historical temperatures were sourced from the Met Office's historical dataset for the Oxford weather station. A subset of the data and the Gaussian Process forecast are summarised in the chart below.

[Figure: historical Oxford temperatures (black points), posterior mean forecast (black line), and one and two standard deviation prediction intervals (blue shading)]

Historical month-end measurements are displayed as black points, with the last reading being at the end of August 2017. After this time the smooth black line represents the mean posterior predicted outcome. Dark and light blue shading has been added to delimit one and two standard deviation prediction intervals.

In this case I've again used the zero function for the mean, but for the covariance I've used a periodic function.

This particular covariance function has three free parameters: a, the degree to which the underlying trend deviates from the mean function; b, the duration between points that are highly correlated; and c, the degree to which nearby points are independent of one another. Assuming independent and identically distributed Gaussian noise contributes one further parameter, the noise variance.
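The exact algebraic form isn't shown above, but one standard periodic covariance consistent with this description (and the one I'll assume here) is the exp-sine-squared kernel, with a, b, and c playing the roles just described:

```python
import numpy as np

def periodic_kernel(x1, x2, a=1.0, b=365.0, c=1.0):
    # a: scale of deviations from the mean function
    # b: period, so points separated by a multiple of b are highly correlated
    # c: how quickly correlation decays within a single period
    d = np.abs(x1[:, None] - x2[None, :])
    return a**2 * np.exp(-2.0 * np.sin(np.pi * d / b) ** 2 / c**2)
```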

Depending on the dataset in question, parameters can be inferred from the data (e.g. using MCMC techniques) or set based on prior beliefs about the form of the trend. Here I have set b equal to 365 days and used maximum likelihood estimation to choose values for the other parameters.
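As an illustrative sketch of that fitting step (not necessarily how this analysis was run), scikit-learn's GaussianProcessRegressor chooses kernel parameters by maximising the log marginal likelihood; the kernel below mirrors the periodic covariance above with b fixed at 365 days. The arrays X, y, and X_future are placeholders, and loading the Met Office data is omitted:

```python
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import (
    ConstantKernel, ExpSineSquared, WhiteKernel,
)

kernel = (
    ConstantKernel(1.0)                 # a: amplitude of the trend
    * ExpSineSquared(
        length_scale=1.0,               # c: within-period decay
        periodicity=365.0,              # b: fixed at one year
        periodicity_bounds="fixed",
    )
    + WhiteKernel(noise_level=1.0)      # i.i.d. Gaussian noise variance
)

gp = GaussianProcessRegressor(kernel=kernel)
# gp.fit(X, y)                                        # maximum likelihood fit
# mean, std = gp.predict(X_future, return_std=True)   # forecast with uncertainty
```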

The key takeaway, however, is that the inferences made by Gaussian Process regression are reasonable (the periodic pattern observable in the historical data is reflected in the modelled predictions) and that the same approach could easily be applied to other datasets.

Final Word

In this article I gave you a taste of nonparametric Bayesian methods by introducing Gaussian Process regression. In particular, I described the basic premise behind Gaussian Process regression and then showed you how it might be applied to a dataset with a periodic trend.

If you have any questions about Gaussian Process regression, please do not hesitate to leave a comment below. In future articles I’ll discuss other nonparametric Bayesian methods and will highlight some of the benefits and pitfalls associated with them.