Lesson 2 Statistical formulae
There are three statistical formulae that we will be using over and over
in this class:
- Our estimate of the expected value, $\hat{x}$.
- Our estimate of the variance of the sample, $S^2(x_i)$.
- Our estimate of the variance of the expected value, $S^2(\hat{x})$.
Monte Carlo as a stream of estimates
Our basic view of a Monte Carlo process (properly designed to deliver an
unbiased estimate of a desired physical or mathematical "effect of interest")
is a black box that takes a stream of random numbers as input and produces
a stream of estimates of the effect of interest as output:

random numbers --> [ Monte Carlo process ] --> estimates $x_1, x_2, x_3, \ldots$

As we saw in our previous FORTRAN example (estimation of $\pi$),
sometimes the individual estimates can be quite approximate, but with a
long enough stream of estimates, $x_i$, we can get a good estimate.
The three formulae that we will develop here will help us gather important
information from the estimates.
Estimate of the expected value, $\hat{x}$
The first, and most important, deals with how we gather from the stream
of estimates the BEST POSSIBLE estimate of the expected value. The
resulting formula for $\hat{x}$ is:

$\hat{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$ (Eq. 2-1)

Thus, our best estimate is the unweighted average of the individual
estimates. This is not surprising, of course.
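
As an illustration of Eq. 2-1, here is a minimal Python sketch of the black-box view applied to the estimation of pi. It is not the course's FORTRAN program: the function names, the seed, and the scoring rule (an estimate of 4 for a hit inside the quarter circle, 0 for a miss) are assumptions made for this example.

```python
import random

def pi_estimates(n, seed=0):
    """Yield n individual estimates x_i of pi.

    Assumed scoring rule: pick a random point in the unit square and
    score 4 if it lands inside the quarter circle of radius 1, else 0.
    """
    rng = random.Random(seed)
    for _ in range(n):
        x, y = rng.random(), rng.random()
        yield 4.0 if x * x + y * y <= 1.0 else 0.0

def sample_mean(estimates):
    """Eq. 2-1: the unweighted average of the individual estimates."""
    total = 0.0
    count = 0
    for x_i in estimates:
        total += x_i
        count += 1
    return total / count

x_hat = sample_mean(pi_estimates(100_000))
print(x_hat)  # close to pi; the exact digits depend on the seed
```

Each individual estimate is only ever 4 or 0, which is about as approximate as an estimate of pi can be, yet the average of a long stream settles near the right answer.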
Let's compare this with two other situations:
- Choosing from a continuous distribution, f(x), over a range (a,b) (i.e.,
  x=a to x=b).
- Choosing from a discrete distribution, where you can only choose from
  M values, $x_i$, each of which has a probability of $p_i$ (e.g.,
  throwing a die with 6 sides).
For a continuous distribution, the true mean, $\bar{x}$,
is found from:

$\bar{x} = \int_a^b x \, f(x) \, dx$

where we have assumed that f(x) is a true distribution, obeying
the following:

$f(x) \ge 0$ in the range (a,b) and $\int_a^b f(x) \, dx = 1$

For the discrete distribution, the corresponding definition for
$\bar{x}$ is:

$\bar{x} = \sum_{i=1}^{M} p_i x_i$

and we, again, assume that the $p_i$
obey the basic requirements for a probability distribution, in this case:

$p_i \ge 0$ for i=1,2,..M and $\sum_{i=1}^{M} p_i = 1$

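Example: For our example of finding $\pi$,
we were dealing with a binomial distribution (i.e., two possible outcomes):
Outcome 1 = Hit the circle: $x_1 = 4$, $p_1 = \pi/4$
Outcome 2 = Miss the circle: $x_2 = 0$, $p_2 = 1 - \pi/4$
Therefore, the expected value is:

$\bar{x} = p_1 x_1 + p_2 x_2 = \frac{\pi}{4}(4) + \left(1 - \frac{\pi}{4}\right)(0) = \pi$
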
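The discrete definition of the true mean (the sum of each value times its probability) can be checked numerically. The sketch below is illustrative only: `discrete_mean` is a hypothetical helper, and the pi case assumes a score of 4 for a hit with probability pi/4 and a score of 0 for a miss.

```python
import math

def discrete_mean(values, probs):
    """True mean of a discrete distribution: sum of p_i * x_i."""
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    return sum(p * x for x, p in zip(values, probs))

# A fair six-sided die: M = 6 values, each with probability 1/6.
die_mean = discrete_mean([1, 2, 3, 4, 5, 6], [1 / 6] * 6)

# The hit-or-miss game for pi (assumed scores 4 and 0).
p_hit = math.pi / 4
pi_mean = discrete_mean([4.0, 0.0], [p_hit, 1 - p_hit])

print(die_mean, pi_mean)
```

The two checks inside the helper are the basic requirements for a probability distribution stated above.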
Included in the text is a simple derivation that shows that Eq. 2-1 is
an unbiased estimate of $\bar{x}$,
given that the individual estimates themselves are unbiased.
Estimate of the variance of the sample, $S^2(x_i)$
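As you know from statistics, the variance of a sample is the expected value
of the squared error. It is a measure of the amount of variation
we should expect from samples. The formulae, for the various distributions
we have considered, are:

For a continuous distribution: $\sigma^2(x) = \int_a^b (x - \bar{x})^2 f(x) \, dx$

For a discrete distribution: $\sigma^2(x) = \sum_{i=1}^{M} p_i (x_i - \bar{x})^2$

For a MC sample: $S^2(x_i) = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \hat{x})^2$ (Eq. 2-2)
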
Note that this last one has some features that need explaining:
1. It uses $S^2$ instead of $\sigma^2$.
   This is because we are getting an estimate of the variance.
2. The argument is $x_i$ instead of x. This is to emphasize that we are
   discussing the variation expected from individual trials.
3. The difference that is squared is the difference from $\hat{x}$
   instead of from $\bar{x}$. This is because we do not know $\bar{x}$,
   just our estimate for it.
4. It divides by N-1 instead of N, which might be expected. This
   is basically due to the approximation in #3. I have had statisticians
   try (and fail) to explain that it has something to do with the loss of
   a degree of freedom when you use $\hat{x}$ for $\bar{x}$.
An interesting derivation in the text runs from Eq. 7-94 through
Eq. 7-99; it shows that Eq. 2-2 is an unbiased estimate of
the true variance of the sample.
Simplified calculational formula
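The problem with using Eq. 2-2 is that you cannot begin to use it until
you know $\hat{x}$, which would mean
that you have to save all of the estimates, $x_i$.
Fortunately, Eq. 2-2 can be reduced to a simpler form:

$S^2(x_i) = \frac{N}{N-1} \left[ \frac{1}{N} \sum_{i=1}^{N} x_i^2 - \left( \frac{1}{N} \sum_{i=1}^{N} x_i \right)^2 \right]$
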
The advantage to this, of course, is that the individual estimates can
contribute to the two running summations and then be discarded.
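
The running-summation idea can be sketched as follows. This is an illustrative Python version, not code from the course; the class name `RunningTally` and the test data are assumptions. It keeps only the count, the sum of the estimates, and the sum of their squares, and compares the result with the save-everything form of Eq. 2-2.

```python
class RunningTally:
    """Accumulate N, sum(x_i), and sum(x_i^2) without storing the x_i."""

    def __init__(self):
        self.n = 0
        self.sum_x = 0.0
        self.sum_x2 = 0.0

    def add(self, x_i):
        # The estimate contributes to both sums and can then be discarded.
        self.n += 1
        self.sum_x += x_i
        self.sum_x2 += x_i * x_i

    def mean(self):
        return self.sum_x / self.n

    def sample_variance(self):
        # N/(N-1) * [mean of the squares - square of the mean]
        n = self.n
        return n / (n - 1) * (self.sum_x2 / n - (self.sum_x / n) ** 2)

def two_pass_variance(data):
    """Eq. 2-2 applied directly; needs every estimate to be saved."""
    n = len(data)
    x_hat = sum(data) / n
    return sum((x - x_hat) ** 2 for x in data) / (n - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
tally = RunningTally()
for x_i in data:
    tally.add(x_i)

print(tally.sample_variance(), two_pass_variance(data))
```

One caveat on this design: when the variance is very small compared with the square of the mean, subtracting two nearly equal running sums can lose precision to round-off, so the two forms may differ slightly in floating-point arithmetic.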
Standard Deviation
As you will remember, the square root of the variance is the standard
deviation, which gives a measure of the variation expected in the
individual $x_i$.
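Example: Back to our example of finding $\pi$,
using the probabilities from the previous example, the associated variance
of the sample would be:
Outcome 1 = Hit the circle: $x_1 = 4$, $p_1 = \pi/4$
Outcome 2 = Miss the circle: $x_2 = 0$, $p_2 = 1 - \pi/4$

$\sigma^2(x) = \frac{\pi}{4}(4 - \pi)^2 + \left(1 - \frac{\pi}{4}\right)(0 - \pi)^2 = \pi(4 - \pi) \approx 2.697$, so $\sigma(x) \approx 1.642$
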
Estimate of the standard deviation of the mean, $S(\hat{x})$
The final formula is for the variance of the mean, $\sigma^2(\hat{x})$.
(The student should be careful to avoid confusing this with the variance
of the sample itself, especially on a test.) The variance of the
mean refers to the amount of variation we should expect among the various
estimates, $\hat{x}$,
we might make of $\bar{x}$.
We should be careful here. Since it is our practice to run a Monte
Carlo calculation, consisting of N samples, only once, we need to realize
that what we are talking about here involves MANY runs of N samples each.
If we run a SERIES of Monte Carlo calculations, EACH of which involves N
estimates, and EACH of which would give us an estimate, $\hat{x}$,
of $\bar{x}$,
then $\sigma^2(\hat{x})$
involves the variation that we would expect in this series of estimates, $\hat{x}$.
Be sure you understand the difference between
$\sigma^2(x)$ and $\sigma^2(\hat{x})$.
There is a compact derivation in the text's Eq. 7-82 to Eq. 7-92, which
shows that:

$\sigma^2(\hat{x}) = \frac{\sigma^2(x)}{N}$

The square root of this variance is, again, the standard deviation,
this time the standard deviation of the mean:

$\sigma(\hat{x}) = \frac{\sigma(x)}{\sqrt{N}}$

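Example: Back to our example of finding $\pi$,
using the probabilities from the previous example, the standard deviation
of the mean for a sample of N=10,000 would be:
Outcome 1 = Hit the circle: $x_1 = 4$, $p_1 = \pi/4$
Outcome 2 = Miss the circle: $x_2 = 0$, $p_2 = 1 - \pi/4$

$\sigma(\hat{x}) = \sqrt{\frac{\pi(4 - \pi)}{10{,}000}} \approx \frac{1.642}{100} \approx 0.0164$
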
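The distinction between the variance of the sample and the variance of the mean can be seen by actually running a series of Monte Carlo calculations. The Python sketch below is illustrative only: it assumes the hit-or-miss scoring of 4 or 0 for the pi example, and it uses N = 1,000 samples per run and 400 runs (hypothetical values chosen to keep the run time short, not the N = 10,000 of the example above). It compares the observed spread of the run means with the sample standard deviation divided by the square root of N.

```python
import math
import random
import statistics

def run_mean(n, rng):
    """One Monte Carlo run: the mean of n hit-or-miss estimates of pi."""
    hits = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / n  # each hit scores 4, each miss scores 0

rng = random.Random(12345)  # arbitrary seed, for repeatability
N = 1_000                   # samples per run (assumed value)
RUNS = 400                  # number of independent runs (assumed value)

means = [run_mean(N, rng) for _ in range(RUNS)]

# Spread actually observed among the run means.
observed = statistics.stdev(means)

# Prediction: sigma(x)/sqrt(N), with sigma^2(x) = pi*(4 - pi) for this game.
predicted = math.sqrt(math.pi * (4.0 - math.pi) / N)

print(f"observed {observed:.4f}, predicted {predicted:.4f}")
```

The individual estimates vary by about 1.64 no matter how many are taken; it is only the run means that tighten as N grows.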
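In actual Monte Carlo practice, as we have seen, we do not know $\sigma(x)$,
but only our estimate of it, $S(x_i)$.
Because of this, we have a slightly different notation and formula:

$S(\hat{x}) = \frac{S(x_i)}{\sqrt{N}}$ (Eq. 2-3)

Using our previously determined formula gives us:

$S(\hat{x}) = \sqrt{ \frac{1}{N-1} \left[ \frac{1}{N} \sum_{i=1}^{N} x_i^2 - \left( \frac{1}{N} \sum_{i=1}^{N} x_i \right)^2 \right] }$
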
This value, $S(\hat{x})$,
is our estimate of the standard deviation of the mean, $\sigma(\hat{x})$.
It is the second most important output result of a Monte Carlo calculation
(the most important being our estimate of the mean itself, $\hat{x}$),
giving us a measure of the confidence we can have in our estimate, $\hat{x}$.
The way that Monte Carlo results are generally reported (in terms of our
notation) is:

$\hat{x} \pm S(\hat{x})$

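Putting the three formulae together, a single Monte Carlo run can report its result as the estimated mean plus or minus the estimated standard deviation of the mean. The following Python sketch is one possible arrangement, not the course's own program: `monte_carlo_report` is a hypothetical helper, and the pi scoring rule (4 for a hit, 0 for a miss) and the seed are assumptions.

```python
import math
import random

def monte_carlo_report(estimates):
    """Return (mean estimate, estimated standard deviation of the mean).

    Uses only running sums, so the individual estimates are not saved.
    """
    n = 0
    sum_x = 0.0
    sum_x2 = 0.0
    for x_i in estimates:
        n += 1
        sum_x += x_i
        sum_x2 += x_i * x_i
    x_hat = sum_x / n
    # Estimated variance of the mean: the sample variance divided by N.
    s2_mean = (sum_x2 / n - x_hat ** 2) / (n - 1)
    # max() guards against a tiny negative value caused by round-off.
    return x_hat, math.sqrt(max(s2_mean, 0.0))

def pi_stream(n, seed):
    """Assumed hit-or-miss estimates of pi: 4 for a hit, 0 for a miss."""
    rng = random.Random(seed)
    for _ in range(n):
        x, y = rng.random(), rng.random()
        yield 4.0 if x * x + y * y <= 1.0 else 0.0

x_hat, s_mean = monte_carlo_report(pi_stream(10_000, seed=2024))
print(f"{x_hat:.4f} +/- {s_mean:.4f}")
```

For N = 10,000 the reported uncertainty should come out near 0.016, in line with the value worked out from the true probabilities.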