This article is the second in a series in which I discuss techniques that can be used when presented with incomplete data. In this article I introduce the likelihood function as a way of measuring goodness of fit.
Please note that this article is going to get technical and those without a numerical background may need to have Google on standby, a quiet room to sit in, and a strong coffee prepared. However, we all have to start somewhere and so I encourage you to stick with it, re-read and post questions in the comments section.
In the previous article we discussed a problem involving incomplete data. In the problem we were tasked with estimating future claim payments given data on historical payments. Furthermore, as a first step towards producing an estimate, we came up with a model that described the occurrence and amount of payments.
We assumed that the occurrence of payments could be described by a state transition model. The transition diagram associated with the model is shown below.
Given this model, the reason the data is incomplete is that it doesn’t conclusively tell us which of the states a claim is in. In particular, our data only tells us whether a payment is made or not. Consequently, if a payment isn’t made in the most recent period a claim could either be in the “Open with no payment” state or the “Closed with no payment” state.
In addition to the data and the model, I hinted at an “intuitive” approach you could use to estimate parameters. In this approach pseudo-data is generated from the model based on different values for the parameters and the selected parameter values are those where the pseudo-data is most similar to the actual data.
To illustrate this approach, below I show the actual data (left) and the four sets of pseudo-data (right) from the last article. I have put a star next to the parameter values that I thought generated the most similar pseudo-data.
The problem with this approach is that computers do not have an intuitive notion of similarity. Instead, computers make decisions by comparing numbers. Therefore, in order to automate the process, we need a numerical measure of how good different parameter values are. In this article I present the likelihood function as this measure.
As it is boring to use the term “likelihood function” repeatedly, I will use “likelihood” for short. Also, measuring how good parameter values are is referred to as measuring the “goodness of fit” of the parameters. For the same reason I will use “goodness” as an abbreviation of “goodness of fit”.
Why use the Likelihood & What is it?
Why use the likelihood at all? You may already know other measures of goodness, in which case using the likelihood may seem like reinventing the wheel.
Although the likelihood is only one of many measures, it is a good place to start as it represents a natural progression from our intuitive assessment of goodness. To show this, let’s first look at the intuitive approach again.
In the intuitive approach, goodness is assessed by judging how similar a plot of the pseudo-data is to a plot of the observed data.
Bringing rigour to this approach is difficult, as people will have different opinions on how similar two plots are. However, when the plots are identical we can be sure that everyone will consider them to be highly similar (I would hope!).
The likelihood follows from this, as it measures goodness by calculating the probability of the model generating pseudo-data that is identical to the observed data.
In other words, rather than judging by eye how similar one set of pseudo-data is to the actual data, with the likelihood we can calculate the proportion of the time that the pseudo-data will be identical to the actual data.
Hopefully the link between the likelihood and our intuitive assessment is apparent but please leave a comment if you are unsure.
Now that we understand why we might use the likelihood, we are prepared to discuss how it is constructed. For this purpose we will take our state transition model as a case study. I will first walk through the relatively simple task of constructing the likelihood imagining we had complete data. Following this I move on to our actual problem involving incomplete data.
The Likelihood in the case of Complete Data
In the case of our claim transition model, a complete dataset would include the state of each claim at each time after it has been reported. In order to visualise this it helps to use a tree diagram, as below.
The diagram shows every possible combination of states a claim can be in after being reported, up until the fourth period. The states a claim can be in at each time are represented by nodes (circles). Furthermore, the nodes are arranged in rows as each row represents the time at which the claim is in the given state. Finally, the arrows between nodes represent how the state of the claim can change between times.
There are three different types of node to represent the three possible states of the claim. A claim can be closed (the node labelled with a “C”), it can be open at the end of a period when no payment occurs (“N”), or it can be open when a payment does occur (“P”). The state of the claim at reporting (“Time 0”) is represented by a red dot; other than being open, the state at reporting is not important in our model.
We can use the diagram to represent the data collected on a single claim after observing it for four periods. To illustrate this I have highlighted one “path” on the diagram.
For the highlighted claim, a payment was made in the first period and so it transitioned to the “P” node. No payments were made in the second and third periods, although the claim remained open, so it transitioned to the “N” node. Finally, in the last period it closed and so it transitioned to the “C” node. Although we don’t see the transitions beyond time 4, we know that a closed claim remains closed.
We can calculate the probability of a claim taking this path by multiplying together the probabilities of moving between each of the states, read from the state transition diagram. In particular, for the highlighted claim, the first transition has a probability of “(1-p)q”, the second transition a probability of “(1-p)(1-q)”, the third “(1-p)(1-q)”, and the fourth “p”. Consequently, the probability of a claim taking this path is “(1-p)q(1-p)(1-q)(1-p)(1-q)p”.
If we had only observed this one claim then this probability, considered as a function of “p” and “q”, is equal to the likelihood.
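To make this concrete, the calculation can be sketched in a few lines of Python. This is only an illustrative sketch: the function name is hypothetical, and I am reading "p" as the per-period probability of the claim closing and "q" as the probability of a payment in a period in which the claim stays open, consistent with the transition probabilities quoted above.

```python
def complete_likelihood(p, q):
    """Probability of the highlighted path: a payment in period 1 ("P"),
    no payments in periods 2 and 3 ("N", "N"), then closure in period 4 ("C").
    Note: p and q are read off the transition diagram as described above.
    """
    return ((1 - p) * q          # period 1: stays open and a payment occurs
            * (1 - p) * (1 - q)  # period 2: stays open, no payment
            * (1 - p) * (1 - q)  # period 3: stays open, no payment
            * p)                 # period 4: the claim closes
```

Assuming claims behave independently, the complete-data likelihood for a whole portfolio would simply be the product of one such path probability per claim.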
The Likelihood in the case of Incomplete Data
The only thing that differs between the cases of complete and incomplete data is the data that is recorded. The model is the same in both cases and each claim can still go down any of the paths. However, now our data only records whether a payment has been made or not. In particular, if no payment has been made, the claim could either be closed (“C”) or open (“N”).
As in the case of complete data, we can use the tree diagram to illustrate what incomplete data looks like. Let’s consider the paths relevant to one claim, the same claim we considered in the case of complete data.
As you can see, instead of highlighting one path, this time many paths are highlighted. This is because there are four possible paths the claim could have taken that would have resulted in a payment in the first period followed by three periods with no payments.
The probability of a claim taking a particular path is again the product of the probability of transitioning between each of the nodes. However, as the observed data would have been generated if any of these paths were taken, we must now sum the probabilities of going down each path.
This principle is true for any model. The likelihood in the presence of incomplete data is the sum of the complete data likelihood over all possible states of the unobserved data.
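For our example claim (a payment in the first period followed by three periods with no payments), the four path probabilities can be summed directly. Again this is a sketch with an illustrative function name, reading "p" as the per-period probability of closing and "q" as the probability of a payment given the claim stays open.

```python
def incomplete_likelihood(p, q):
    """Sum of the probabilities of every path consistent with the observed
    data: a payment in period 1, then no payments in periods 2-4."""
    first = (1 - p) * q       # period 1: stays open and a payment occurs
    stay = (1 - p) * (1 - q)  # one further period open with no payment
    return (first * p              # path 1: the claim closes in period 2
            + first * stay * p     # path 2: closes in period 3
            + first * stay**2 * p  # path 3: closes in period 4
            + first * stay**3)     # path 4: still open after period 4
```

Note that the paths which close early contribute the largest terms, because a closed claim produces the remaining "no payment" observations with certainty.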
A Brief Comparison of the Complete & Incomplete Likelihoods
Now that we can construct the incomplete likelihood, you may be wondering how it compares to the complete likelihood.
To make this comparison, let’s consider the incomplete likelihood generated by the claim highlighted in our tree diagram, and each of the complete likelihoods generated by a claim taking one of the possible paths our claim could have taken. In particular, the claim could have closed in the second, third or fourth period, or not at all. Hence, we can construct four complete likelihoods for comparison.
A plot of each of these is contained in the gallery below.
In these graphs, “q” and “p” are plotted on the horizontal and vertical axes. Furthermore, the coloured regions identify parameter values that have a similar likelihood. The likelihood is high for regions coloured in dark red and low for those in dark blue.
You should be able to see that the incomplete likelihood shares features with each of the complete likelihoods. In particular, the regions with a high incomplete likelihood are also high for at least one of the complete likelihoods.
This shouldn’t come as a surprise, as the incomplete likelihood is simply the sum of these complete likelihoods.
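This pointwise relationship can be checked numerically. The sketch below (using NumPy, reading "p" as the per-period closure probability and "q" as the payment probability given the claim stays open) evaluates the four complete likelihoods and their sum over a grid of parameter values, mirroring the gallery plots, and confirms that the incomplete likelihood is at least as large as each complete likelihood everywhere.

```python
import numpy as np

# Grid of parameter values between 0 and 1, as in the gallery plots.
p, q = np.meshgrid(np.linspace(0.01, 0.99, 99), np.linspace(0.01, 0.99, 99))

first = (1 - p) * q       # period 1: open with a payment
stay = (1 - p) * (1 - q)  # one further open period with no payment

completes = [first * p,            # claim closes in period 2
             first * stay * p,     # closes in period 3
             first * stay**2 * p,  # closes in period 4
             first * stay**3]      # still open after period 4
incomplete = sum(completes)

# Because every term is non-negative, the incomplete likelihood dominates
# each individual complete likelihood at every point on the grid.
assert all((incomplete >= c).all() for c in completes)
```

This is exactly why the complete likelihoods show larger low-valued (dark blue) regions: dropping all but one term from the sum can only shrink the value at any given point.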
In addition, we can see that each complete likelihood has a larger dark blue region than the incomplete likelihood. Recalling that the likelihood is a measure of goodness of fit, this indicates that the complete likelihood identifies a smaller set of good parameters than the incomplete likelihood (i.e. it is more precise about which parameters are good).
Again, this seems reasonable as, all else being equal, having more information on historical claims should enable us to be more precise about which parameter values are good and which are bad.
Although this is a contrived example, as insurers typically have data on thousands of claims, hopefully it gives you some comfort that the incomplete likelihood behaves in an intuitive way.
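As a final sanity check, the link back to the intuitive approach can be tested by simulation: generate many pseudo-claims from the model and count how often they reproduce the observed payment pattern. That fraction should approach the incomplete likelihood. This is only a sketch; the function name and the reading of "p" (per-period closure probability) and "q" (payment probability given the claim stays open) are my assumptions.

```python
import random

def simulate_claim(p, q, n_periods, rng):
    """Simulate one claim and return its payment pattern (1 = payment)."""
    pattern = []
    is_open = True
    for _ in range(n_periods):
        if is_open and rng.random() < p:
            is_open = False  # the claim closes this period, so no payment
        made_payment = is_open and rng.random() < q
        pattern.append(1 if made_payment else 0)
    return tuple(pattern)

rng = random.Random(0)  # fixed seed so the experiment is repeatable
p, q = 0.5, 0.5
observed = (1, 0, 0, 0)  # a payment in period 1, then three periods without
n_sims = 200_000
hits = sum(simulate_claim(p, q, 4, rng) == observed for _ in range(n_sims))
empirical = hits / n_sims

# Analytic incomplete likelihood: close in period 2, 3, 4, or stay open.
first, stay = (1 - p) * q, (1 - p) * (1 - q)
analytic = first * p + first * stay * p + first * stay**2 * p + first * stay**3

# The two should agree to within Monte Carlo error (roughly 0.001 here).
assert abs(empirical - analytic) < 0.01
```

In other words, the likelihood really is the long-run proportion of pseudo-datasets that exactly match the observed data, which is the intuitive notion we set out to formalise.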
In this article I introduced the likelihood as a measure of goodness of fit. In addition, I showed why the likelihood is a natural progression from our intuitive assessment of goodness.
After introducing it, we considered how to construct the likelihood in the cases of complete and incomplete data. I then briefly compared the complete and incomplete likelihoods and showed that they behave in an intuitive way.
In subsequent articles we will discuss different techniques that use the likelihood to estimate values for the parameters.
I hope you found this article useful and if you have any comments or questions please leave them in the comments section below.