The most common tool used here is called the Gelman-Rubin statistic, notated . is a dimensionless (i.e., can be compared across different datasets) measure of how well each chain in a set of chains captures the variation in the dataset.
To paraphrase Tolstoy: All convergent chains are alike—every divergent chain diverges in its own way. So, if we have good chains, they should be statistically indistinguishable, because they should all be good approximations of the same underlying distribution. Specifically, tests whether the variance of individual chains is what you'd expect given the variance of the entire sample. It's mathematically similar to how you would assess if ten different drugs had statistically significant effects, only here we want to see no effect.
If individual chains don't mix properly, then the combined variance is much larger than the individual chain variance. is the square root of that ratio, so it will be much larger than 1:
The most common tool used here is called the Gelman-Rubin statistic, notated . is a dimensionless (i.e., can be compared across different datasets) measure of how well each chain in a set of chains captures the variation in the dataset.
To paraphrase Tolstoy: All convergent chains are alike—every divergent chain diverges in its own way. So, if we have good chains, they should be statistically indistinguishable, because they should all be good approximations of the same underlying distribution. Specifically, tests whether the variance of individual chains is what you'd expect given the variance of the entire sample. It's mathematically similar to how you would assess if ten different drugs had statistically significant effects, only here we want to see no effect.
If individual chains don't mix properly, then the combined variance is much larger than the individual chain variance. is the square root of that ratio, so it will be much larger than 1: