A (1 - alpha) confidence interval for the mean is an interval
(a, b) such that the mean of the population, µ, is inside
it (i.e. a < µ < b) with (1 - alpha) "confidence."
The endpoints of the interval, a and b, take values that depend on the random sample selected, so the confidence interval itself also depends on the random sample selected.
To understand better the concept of confidence intervals there
is an excellent simulation at Confidence
This simulation shows that if one calculates 100 confidence intervals, each at the 95% level and each based on its own random sample, then on average 95 of them would contain the true value of the mean µ.
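The idea behind the simulation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population is taken to be normal with mean 5 and standard deviation 2, each sample has size 36, and each interval is computed as (sample mean) ± 1.96 · (sample standard deviation) / √n. The function name and the default values are hypothetical choices made for this sketch.

```python
import random
import statistics


def count_covering_intervals(num_samples=100, n=36, mu=5.0, sigma=2.0,
                             z=1.96, seed=0):
    """Count how many 95% intervals contain the true mean mu.

    Assumption for this sketch: the population is normal(mu, sigma).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        # Draw one random sample and compute its interval.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.mean(sample)
        s = statistics.stdev(sample)
        half_width = z * s / n ** 0.5
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits


print(count_covering_intervals(), "of 100 intervals contain the true mean")
```

The count changes with the seed; it is only the long-run average that should be close to 95 out of 100.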
We let alpha = 0.05 to construct, as an example, a 95% confidence interval. We use the most popular formula for this, which is suitable for sample sizes larger than 30, namely,
Confidence Interval for the Mean for Large Samples
x̄ ± 1.96 · s/√n
where x̄ is the sample mean, s is the sample standard deviation, and n is the sample size.
So if for the selected sample the sample size is 36, with a mean of 5 and a standard deviation of 2, then the 95% confidence interval is given by:
5 ± 1.96 · 2/√36 = 5 ± 0.65
So the 95% confidence interval for the mean using this formula is (4.35, 5.65). Notice that if we select another random sample of size 36, its mean and standard deviation would be different, so we would obtain a different confidence interval.
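The arithmetic of this example can be checked with a short Python sketch. The function name large_sample_ci is a hypothetical helper introduced here for illustration; it simply evaluates (sample mean) ± z · s / √n with z = 1.96 by default.

```python
import math


def large_sample_ci(xbar, s, n, z=1.96):
    """Large-sample confidence interval for the mean: xbar +/- z * s / sqrt(n)."""
    half_width = z * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# The example from the text: n = 36, mean 5, standard deviation 2.
low, high = large_sample_ci(xbar=5, s=2, n=36)
print(f"({low:.2f}, {high:.2f})")  # prints (4.35, 5.65)
```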
We remark that if sigma is known instead of estimated by s,
we can substitute sigma for s in the formula to get a correct
confidence interval for any sample size, provided the population is normal;
that is, we don't need the assumption that n is larger than 30.
Notice that, as seen below in the Normal
Calculator output, the reason we choose -1.96 and 1.96 is that
the area between them under the normal curve is 0.95 = 95%, the level of confidence.
If one wanted a 99% confidence interval we would use -2.58
and 2.58 instead of -1.96 and 1.96 because as seen below in the
Normal Calculator output the area under the normal curve between these numbers is 0.99 = 99%.
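As an alternative to a normal calculator, the cutoffs for any confidence level can be obtained from the standard normal quantile function. The sketch below uses NormalDist from Python's standard statistics module; the helper name z_critical is an assumption made for this illustration.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Return z such that the area between -z and z under the
    standard normal curve equals the given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.95, 0.99):
    print(f"{level:.0%} confidence: +/- {z_critical(level):.2f}")
```

Rounded to two decimals, these are the 1.96 and 2.58 used above.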
Confidence Interval for the Mean for Small Samples
Here we present a confidence interval for the mean, based on the t-distribution, that can be used when the sample size is smaller than 30. For this method to be valid it is very important that the population be normal; that is, if the population is skewed or has outliers, the calculation below will yield an inaccurate confidence interval. The reason that we need normality here is that n being less than 30 is too small for the Central Limit Theorem to do its magic of making the sample mean nearly normal. On the other hand, the reason that we don't need normality for the large sample method is that n being larger than 30 is enough for the sample mean to be nearly normal.
What to do if the population is not normal and the sample is
smaller than thirty?
Some recommend using a bootstrap confidence interval, even with samples as small as ten.
A t-distribution confidence interval is computed using the formula:
x̄ ± t(alpha/2) · s/√n
where t(alpha/2) has DF = n - 1 and comes from this table:
For example, if DF = 18 then, for a 95% confidence interval, t(0.025) = 2.101.
A better way to calculate this quantity is to use this t-distribution calculator.
See the calculation below of t(0.025) for 18 degrees of freedom.
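For readers without access to a table or calculator, the critical value can also be approximated numerically. The sketch below is one possible approach using only the Python standard library: it integrates the t density with Simpson's rule and finds the cutoff by bisection. The function names are hypothetical, and in practice a library routine such as scipy.stats.t.ppf would normally be preferred.

```python
import math


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x), by Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def t_critical(alpha, df):
    """Return t(alpha/2): the point with area alpha/2 to its right."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 100.0  # assumes the answer lies in this range
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(t_critical(0.05, 18), 3))  # about 2.101
```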
For an example of the application of this confidence interval formula we use the data:
Data from Sample A: 9.22, 8.22, 9.72, 6.6, 11.32, 6.48, 4.6, 7.16, 8.78, 14.1, 5.62, 8.84, 8.16, 7.94, 7.74, 3.88, 8.56, 12.46, 10
The computations using the formula are:
Since for these data the mean is 8.42 and the standard deviation
To see how to compute this confidence interval using JMP, see Confidence Interval for the Mean.
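For readers who do not use JMP, the same computation can be sketched in Python. The sketch assumes the Sample A values exactly as listed above and takes the critical value for DF = 18 (about 2.101, the standard table value) as a given constant. Because the listed values may have been rounded, the printed results may differ slightly from the figures quoted in the text.

```python
import math
import statistics

# Sample A, exactly as listed in the text.
sample_a = [9.22, 8.22, 9.72, 6.6, 11.32, 6.48, 4.6, 7.16, 8.78, 14.1,
            5.62, 8.84, 8.16, 7.94, 7.74, 3.88, 8.56, 12.46, 10]

n = len(sample_a)                  # 19 observations, so DF = n - 1 = 18
xbar = statistics.mean(sample_a)   # sample mean
s = statistics.stdev(sample_a)     # sample standard deviation (divisor n - 1)

T_CRIT = 2.101                     # t(0.025) for DF = 18, taken from a t-table

half_width = T_CRIT * s / math.sqrt(n)
low, high = xbar - half_width, xbar + half_width

print(f"n = {n}, mean = {xbar:.2f}, sd = {s:.2f}")
print(f"95% t-interval: ({low:.2f}, {high:.2f})")
```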
Is this confidence interval valid? Well, remember that for
it to be valid the population has to be normal.
Is this a reasonable assumption about the population?
We can check this by doing a normal plot of the data:
Thus, the normality assumption seems to hold, so the t-distribution confidence interval also seems to be valid.
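The coordinates behind a normal plot can also be computed directly. The sketch below pairs each sorted data value with a standard normal score and reports the correlation between the two as a rough numeric summary of how straight the plot is. The plotting positions (i + 0.5)/n are one common convention and an assumption here; JMP may use a different one. A correlation close to 1 is only an informal indication of normality, not a formal test.

```python
from statistics import NormalDist, mean

# Sample A, exactly as listed in the text.
sample_a = [9.22, 8.22, 9.72, 6.6, 11.32, 6.48, 4.6, 7.16, 8.78, 14.1,
            5.62, 8.84, 8.16, 7.94, 7.74, 3.88, 8.56, 12.46, 10]


def normal_plot_points(data):
    """Pair each ordered observation with its standard normal score."""
    n = len(data)
    ordered = sorted(data)
    scores = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(scores, ordered))


def plot_correlation(points):
    """Pearson correlation between normal scores and ordered data."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


points = normal_plot_points(sample_a)
for score, value in points:
    print(f"{score:6.2f}  {value:6.2f}")
print("correlation:", round(plot_correlation(points), 3))
```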
To calculate another type of confidence interval for the mean
called a bootstrap confidence interval go to
Obtaining a Bootstrap Confidence Interval for the Mean with JMP. Our interval can be obtained from the following JMP output.
For a 95% confidence interval the lower limit is the 2.5 percentile, that is, 7.267, and the upper limit is the 97.5 percentile, read from the same output.
Thus, the 95% bootstrap confidence interval is:
Recall that the 95% t-distribution confidence interval was:
Thus, they are very similar. This is remarkable since the assumption (i.e. normality) that is needed for the t-distribution confidence interval to be valid is not needed for the bootstrap confidence interval to be valid.
This strongly suggests that when one is trying to estimate the mean of a population using a random sample, one should calculate both confidence intervals.
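The percentile bootstrap interval described above can be sketched in Python as follows. This is a minimal illustration, not a reproduction of the JMP procedure: the number of resamples, the seed, and the simple index-based percentile rule are assumptions made for this sketch, so the limits will not match the JMP output exactly.

```python
import random
import statistics

# Sample A, exactly as listed in the text.
sample_a = [9.22, 8.22, 9.72, 6.6, 11.32, 6.48, 4.6, 7.16, 8.78, 14.1,
            5.62, 8.84, 8.16, 7.94, 7.74, 3.88, 8.56, 12.46, 10]


def bootstrap_ci(data, num_resamples=10000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement and record the mean of each resample.
    means = sorted(statistics.fmean(rng.choices(data, k=n))
                   for _ in range(num_resamples))
    alpha = 1 - confidence
    lower_index = int((alpha / 2) * num_resamples)          # 2.5 percentile
    upper_index = int((1 - alpha / 2) * num_resamples) - 1  # 97.5 percentile
    return means[lower_index], means[upper_index]


low, high = bootstrap_ci(sample_a)
print(f"95% bootstrap interval: ({low:.2f}, {high:.2f})")
```

Changing the seed or the number of resamples moves the limits slightly, which is expected for a resampling method.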