Performance Review
Before we cheat and look at the true posterior distribution to see how things went, it's important to think about how we would assess our sampling without that crutch. Right now, we have samples from 10 different chains, any of which may or may not be correct. How can we figure out whether the sampling succeeded?
Intuitively, we want all of our chains to have found "the right answer". If they have, they should all agree with each other. We can test this by plotting a histogram or box plot of each chain and comparing.
This is pretty good: these distributions look quite similar. Due to the nature of sampling, values at the far left (which have low probability) are going to be the most variable between the chains, because it requires some luck to sample them.
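The same comparison can be made with numbers instead of pictures. The sketch below is only an illustration: the 10 "chains" are hypothetical stand-ins (independent standard-normal draws from Python's `random` module, not the output of the sampler used here), and the helper names `five_number_summary` and `percentile` are made up for this example. It prints a five-number summary for each chain, then measures how much a far-left percentile and the median each vary from chain to chain.

```python
import random
import statistics

# Hypothetical stand-in for real sampler output: 10 chains of independent
# standard-normal draws. Replace `chains` with your own per-chain samples.
rng = random.Random(0)
N_CHAINS, N_DRAWS = 10, 2000
chains = [[rng.gauss(0.0, 1.0) for _ in range(N_DRAWS)] for _ in range(N_CHAINS)]


def five_number_summary(draws):
    """Return (min, Q1, median, Q3, max) for one chain."""
    q1, med, q3 = statistics.quantiles(draws, n=4)
    return (min(draws), q1, med, q3, max(draws))


def percentile(draws, p):
    """Return the p-th percentile (p in 1..99) of one chain."""
    return statistics.quantiles(draws, n=100)[p - 1]


# One row per chain: chains that agree should have similar rows.
summaries = [five_number_summary(c) for c in chains]
for i, s in enumerate(summaries):
    print(f"chain {i}: " + "  ".join(f"{v:6.2f}" for v in s))

# Chain-to-chain variability of a low-probability tail value vs. the median.
tail_spread = statistics.stdev([percentile(c, 1) for c in chains])
median_spread = statistics.stdev([percentile(c, 50) for c in chains])
print(f"spread of 1st percentile across chains: {tail_spread:.3f}")
print(f"spread of median across chains:         {median_spread:.3f}")
```

With draws like these, the 1st percentile will typically vary more across chains than the median does, which is the "luck" effect described above. Note that this still leaves a human deciding how similar is similar enough.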
Ideally we could figure out a way of doing this without having to eyeball plots, which feels subjective and prone to error. (This is especially true when the person eyeballing it is the same person who just ran the analysis and wants to see it succeed!) How can we evaluate our results in an objective, reproducible way?