Um, Ackshually...

  • You're computing CIs as $\overline{Y} \pm 1.96\,SE$, where $SE = \frac{\sigma_{sample}}{\sqrt{n}}$.
  • This isn't technically correct: it underestimates variation because it assumes $\sigma_{sample}$ is $\sigma$
  • For large values of $n$ (say, over 100 or over 200), this difference barely matters
  • The proper solution to this is:
    • If our population is Gaussian (or close), but we don't know the actual $\sigma$, we should use the Student's t distribution: replace 1.96 with the 97.5th percentile of a t distribution with $n - 1$ degrees of freedom.
    • If our population is distributed weirdly, we can use techniques like bootstrapping (sampling with replacement from our data to generate fake datasets)
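
The bootstrapping idea above can be sketched in a few lines. This is a minimal percentile-bootstrap sketch in Python (deliberately not R, so the extra-credit exercise stays open), using only the standard library. The function name `bootstrap_ci`, its parameters, and the sample values are all made up for illustration; the percentile method is just one of several ways to turn bootstrap resamples into an interval.

```python
import random
import statistics


def bootstrap_ci(data, n_resamples=10_000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `data`.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Each fake dataset is n draws from the data, with replacement;
    # we keep only its mean.
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    alpha = 1.0 - level
    # Cut off the lowest and highest alpha/2 share of the resampled means.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Made-up, right-skewed sample purely for illustration.
sample = [1.2, 0.4, 3.8, 0.9, 7.5, 1.1, 0.3, 2.6, 0.7, 5.0, 1.9, 0.6]
low, high = bootstrap_ci(sample)
print(f"sample mean = {statistics.fmean(sample):.2f}")
print(f"95% bootstrap CI = ({low:.2f}, {high:.2f})")
```

With a skewed sample like this one, the interval does not have to be symmetric around the sample mean, which is one reason to prefer it over $\overline{Y} \pm 1.96\,SE$ when the population is distributed weirdly.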

Extra credit (no, not really) for figuring out how to do either the Student's t method or the bootstrapping method for computing confidence intervals in R.

