There's a chicken-and-egg problem that rears its ugly head here. It's quite simple to fit an normal distribution to a dataset: just take the mean and standard deviation of the data, and the normal distribution with that mean and standard deviation will be the best fit. But if our dataset is clustered into different parts, we can't easily estimate the parameters of the different mixture model components without knowing which parts of the data fall into each category. But we can't estimate which data points fall into which category until we know what the categories look like!

This is a problem with a long history, and there are many approaches. Bayesian statistics provides an especially elegant approach that we'll be using, but I'd be remiss not to point out how this problem appears in a guise you may not recognize at first.

There's a chicken-and-egg problem that rears its ugly head here. It's quite simple to fit an normal distribution to a dataset: just take the mean and standard deviation of the data, and the normal distribution with that mean and standard deviation will be the best fit. But if our dataset is clustered into different parts, we can't easily estimate the parameters of the different mixture model components without knowing which parts of the data fall into each category. But we can't estimate which data points fall into which category until we know what the categories look like!

This is a problem with a long history, and there are many approaches. Bayesian statistics provides an especially elegant approach that we'll be using, but I'd be remiss not to point out how this problem appears in a guise you may not recognize at first.