NE582 Monte Carlo

 

Lesson 2 Statistical formulae

Required reading: pp. 313-321 of text

There are three statistical formulae that we will be using over and over in this class: 
  1. Our estimate of the expected value.
  2. Our estimate of the variance of the sample.
  3. Our estimate of the variance of the expected value.

Monte Carlo as a stream of estimates

Our basic view of a Monte Carlo process (properly designed to deliver an unbiased estimate of a desired physical or mathematical "effect of interest") is a black box that has a stream of random numbers as input and a stream of estimates of the effect of interest as output:

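[Figure: the Monte Carlo process as a black box, with a stream of random numbers as input and a stream of estimates as output]
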
As we saw in our previous FORTRAN example (estimation of $\pi$), sometimes the individual estimates can be quite approximate, but with a long enough stream of estimates $x_i$, we can get a good estimate.

The three formulae that we will develop here will help us gather important information from the estimates.

Estimate of the expected value, $\hat{x}$

The first, and most important, deals with how we gather from the stream of estimates the BEST POSSIBLE estimate of the expected value.  The resulting formula for $\hat{x}$ is:

$$\hat{x} = \frac{1}{N}\sum_{i=1}^{N} x_i \qquad \text{(Eq. 2-1)}$$

Thus, our best estimate is the unweighted average of the individual estimates.  This is not surprising, of course. 
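
As a minimal sketch of Eq. 2-1 in action, the Python below treats the black box as a generator of individual estimates and averages them without weights. The setup is an assumption, not something stated above: points are drawn in the unit square, a "hit" inside the quarter circle scores 4 and a miss scores 0. The names `pi_estimates` and `mean_estimate` are hypothetical.

```python
import random

def pi_estimates(n, seed=0):
    """Yield n individual estimates x_i (assumed scoring: 4 for a hit, 0 for a miss)."""
    rng = random.Random(seed)
    for _ in range(n):
        x, y = rng.random(), rng.random()
        yield 4.0 if x * x + y * y < 1.0 else 0.0

def mean_estimate(estimates):
    """Eq. 2-1: the unweighted average of the individual estimates."""
    total = 0.0
    count = 0
    for x_i in estimates:
        total += x_i
        count += 1
    return total / count

x_hat = mean_estimate(pi_estimates(100_000))
print(x_hat)
```

Each individual estimate is either 4 or 0, so no single one is close to the answer; only the average is.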
 
 

Let's compare this with two other situations:

    Choosing from a continuous distribution, f(x), over a range (a,b) (i.e., x=a to x=b).

    Choosing from a discrete distribution, where you can only choose from M values, each of which has a probability of $p_i$ (e.g., throwing a die with 6 sides).

For a continuous distribution, the true mean, $\bar{x}$, is found from:

$$\bar{x} = \int_a^b x\, f(x)\, dx$$

where we have assumed that f(x) is a true distribution, obeying the following:

$$f(x) \ge 0$$ in the range (a,b) and

$$\int_a^b f(x)\, dx = 1$$

For the discrete distribution, the corresponding definition for $\bar{x}$ is:

$$\bar{x} = \sum_{i=1}^{M} p_i\, x_i$$

and we, again, assume that the $p_i$ obey the basic requirements for a probability distribution, in this case,

$$p_i \ge 0$$ for i=1,2,...,M

$$\sum_{i=1}^{M} p_i = 1$$
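
For the six-sided die mentioned above, the discrete definition can be worked out directly. This is only an illustration, assuming a fair die whose faces score 1 through 6, so that every probability is 1/6:

```latex
\bar{x} = \sum_{i=1}^{6} p_i\, x_i
        = \frac{1}{6}\,(1 + 2 + 3 + 4 + 5 + 6)
        = \frac{21}{6}
        = 3.5,
\qquad
\sum_{i=1}^{6} p_i = 6 \cdot \frac{1}{6} = 1 .
```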


Example:  For our example of finding $\pi$, we were dealing with a binomial distribution (i.e., two possible outcomes):

Outcome 1 = Hit the circle: [equation image not recovered: score and probability of a hit]

Outcome 2 = Miss the circle: [equation image not recovered: score and probability of a miss]

Therefore, the expected value is:

[equation image not recovered]
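
A worked version of this example, under an assumed setup that may differ in detail from the one used in the previous lesson: a hit occurs with probability $\pi/4$ and scores 4, and a miss scores 0. The discrete definition of the mean then gives:

```latex
x_1 = 4,\quad p_1 = \frac{\pi}{4};
\qquad
x_2 = 0,\quad p_2 = 1 - \frac{\pi}{4};
\qquad
\bar{x} = p_1 x_1 + p_2 x_2
        = \frac{\pi}{4}\cdot 4 + \left(1 - \frac{\pi}{4}\right)\cdot 0
        = \pi .
```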


Included in the text is a simple derivation that shows that Eq. 2-1 is an unbiased estimate of $\bar{x}$, given that the individual estimates themselves are unbiased.

Estimate of the variance of the sample, $S^2(x_i)$

As you know from statistics, the variance of a sample is the expected value of the squared error.  It is a measure of the amount of variation we should expect from samples.  The formulae, for the various distributions we have considered, are:

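For a continuous distribution: $$\sigma^2(x) = \int_a^b (x - \bar{x})^2 f(x)\, dx$$

For a discrete distribution: $$\sigma^2(x) = \sum_{i=1}^{M} p_i\, (x_i - \bar{x})^2$$

For a MC sample: $$S^2(x_i) = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \hat{x})^2$$ (Eq. 2-2)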

Note that this last one has some features that need explaining:

  1. It uses $S^2$ instead of $\sigma^2$.  This is because we are getting an estimate of the variance.

  2. The argument is $x_i$ instead of x.  This is to emphasize that we are discussing the variation expected from individual trials.

  3. The difference that is squared is the difference from $\hat{x}$ instead of from $\bar{x}$.  This is because we do not know $\bar{x}$, just our estimate for it.

  4. It divides by N-1 instead of N, which might be expected.  This is basically due to the approximation in #3.  I have had statisticians try (and fail) to explain that it has something to do with the loss of a degree of freedom when you use $\hat{x}$ for $\bar{x}$.

The text's equations (7-94) through (7-99) give an interesting derivation showing that Eq. 2-2 is an unbiased estimate of the true variance of the sample.
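
The effect of dividing by N-1 can be checked by brute force on a small case. The sketch below is an illustration, not the text's derivation: it assumes a fair six-sided die, enumerates every equally likely sample of N=3 throws, and averages the sample variance over all of them using exact fractions. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

faces = [Fraction(k) for k in range(1, 7)]
true_mean = sum(faces) / 6
true_var = sum((x - true_mean) ** 2 for x in faces) / 6

def sample_variance(sample, divisor):
    """Sum of squared differences from the sample mean, over the given divisor."""
    x_hat = sum(sample) / len(sample)
    return sum((x - x_hat) ** 2 for x in sample) / divisor

def expected_sample_variance(n, use_n_minus_1):
    """Average the sample variance over all 6**n equally likely samples."""
    divisor = n - 1 if use_n_minus_1 else n
    samples = list(product(faces, repeat=n))
    return sum(sample_variance(s, divisor) for s in samples) / len(samples)

print(true_var)                            # 35/12
print(expected_sample_variance(3, True))   # 35/12
print(expected_sample_variance(3, False))  # 35/18
```

Dividing by N-1 reproduces the true variance exactly on average, while dividing by N comes out low by the factor (N-1)/N.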
 

Simplified calculational formula

The problem with using Eq. 2-2 is that you cannot begin to use it until you know $\hat{x}$, which would mean that you have to save all of the estimates, $x_i$.  Fortunately, Eq. 2-2 can be reduced to a simpler form:

$$S^2(x_i) = \frac{N}{N-1}\left[\frac{1}{N}\sum_{i=1}^{N} x_i^2 - \left(\frac{1}{N}\sum_{i=1}^{N} x_i\right)^2\right]$$

The advantage to this, of course, is that the individual estimates can contribute to the two running summations (of $x_i$ and of $x_i^2$) and then be discarded.
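
A minimal sketch of the running-summation idea, assuming the two sums kept are the sum of the estimates and the sum of their squares. The class name `RunningTally` and its methods are hypothetical.

```python
class RunningTally:
    """Accumulates N, sum(x_i) and sum(x_i**2); no estimates are stored."""

    def __init__(self):
        self.n = 0
        self.sum_x = 0.0
        self.sum_x2 = 0.0

    def add(self, x_i):
        # Each estimate contributes to both sums and can then be discarded.
        self.n += 1
        self.sum_x += x_i
        self.sum_x2 += x_i * x_i

    def mean(self):
        return self.sum_x / self.n

    def variance(self):
        # N/(N-1) * (mean of squares - square of mean), equivalent to Eq. 2-2.
        n = self.n
        mean = self.sum_x / n
        return (n / (n - 1)) * (self.sum_x2 / n - mean * mean)

tally = RunningTally()
for x_i in [4.0, 0.0, 4.0, 4.0, 0.0]:
    tally.add(x_i)
print(tally.mean(), tally.variance())
```

One caution: the subtraction of two nearly equal numbers can lose precision in floating point when the variance is very small compared with the square of the mean.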

Standard Deviation

As you will remember, the square root of the variance is the standard deviation, which gives a measure of the variation expected in the individual $x_i$.

Example:  Back to our example of finding $\pi$, using the probabilities from the previous example, the associated variance of the sample would be:

Outcome 1 = Hit the circle: [equation image not recovered: score and probability of a hit]

Outcome 2 = Miss the circle: [equation image not recovered: score and probability of a miss]

[equation images not recovered: variance and standard deviation of the sample]
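
A worked version of this example, again under the assumption (not confirmed by the recovered text) that a hit has probability $\pi/4$ and scores 4, a miss scores 0, and the true mean is therefore $\pi$:

```latex
\sigma^2(x) = \sum_{i=1}^{2} p_i\,(x_i - \bar{x})^2
            = \frac{\pi}{4}\,(4 - \pi)^2 + \left(1 - \frac{\pi}{4}\right)(0 - \pi)^2
            = \pi\,(4 - \pi) \approx 2.697,
\qquad
\sigma(x) = \sqrt{\pi\,(4 - \pi)} \approx 1.642 .
```

A standard deviation of about 1.6 on a quantity near 3.14 confirms how rough each individual estimate is.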


Estimate of the standard deviation of the mean, $S(\hat{x})$

The final formula is for the variance of the mean, $\sigma^2(\hat{x})$.  (The student should be careful to avoid confusing this with the variance of the sample itself, especially on a test.)  The variance of the mean refers to the amount of variation we should expect among the various estimates, $\hat{x}$, that we might make of $\bar{x}$.

We should be careful here.  Since it is our practice to run a Monte Carlo calculation, consisting of N samples, only once, we need to realize that what we are talking about here involves MANY runs of N samples each.  If we run a SERIES of Monte Carlo calculations, EACH of which involves N estimates, and EACH of which would give us an estimate $\hat{x}$ of $\bar{x}$, then $\sigma^2(\hat{x})$ involves the variation that we would expect in this series of estimates, $\hat{x}$.  Be sure you understand the difference between $\sigma^2(\hat{x})$ and $\sigma^2(x)$.
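
The "many runs of N samples each" picture can be simulated directly. The sketch below is an illustration under assumed scoring (4 for a hit in the quarter circle, 0 for a miss): it performs a series of independent runs, collects the mean of each run, and compares the observed spread of those means with the spread of the individual estimates divided by the square root of N. The function names are hypothetical.

```python
import math
import random
import statistics

def one_run(n, rng):
    """One Monte Carlo calculation of n samples; returns its mean estimate."""
    hits = sum(1 for _ in range(n)
               if rng.random() ** 2 + rng.random() ** 2 < 1.0)
    return 4.0 * hits / n

def spread_of_means(runs, n, seed=1):
    """Return (observed std. dev. of the run means, sigma(x) / sqrt(n))."""
    rng = random.Random(seed)
    means = [one_run(n, rng) for _ in range(runs)]
    observed = statistics.stdev(means)
    # sigma(x) for the assumed 4-or-0 scoring is sqrt(pi * (4 - pi)).
    predicted = math.sqrt(math.pi * (4.0 - math.pi) / n)
    return observed, predicted

observed, predicted = spread_of_means(runs=400, n=1000)
print(observed, predicted)
```

The two numbers should agree to within a few percent; the individual estimates vary by about 1.6, while the run means vary far less.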
 
 

There is a compact derivation in the text's Eq. 7-82 to Eq. 7-92, which shows that:

$$\sigma^2(\hat{x}) = \frac{\sigma^2(x)}{N}$$

The square root of this variance is, again, the standard deviation, this time the standard deviation of the mean:

$$\sigma(\hat{x}) = \sqrt{\frac{\sigma^2(x)}{N}} = \frac{\sigma(x)}{\sqrt{N}}$$

Example:  Back to our example of finding $\pi$, using the probabilities from the previous example, the standard deviation of the mean for a sample of N=10,000 would be:

Outcome 1 = Hit the circle: [equation image not recovered: score and probability of a hit]

Outcome 2 = Miss the circle: [equation image not recovered: score and probability of a miss]

[equation image not recovered: standard deviation of the mean for N=10,000]
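
A worked version of this example, under the same assumption as before (a hit has probability $\pi/4$ and scores 4, a miss scores 0), so that the variance of an individual estimate is $\pi(4-\pi)$:

```latex
\sigma(\hat{x}) = \frac{\sigma(x)}{\sqrt{N}}
                = \frac{\sqrt{\pi\,(4 - \pi)}}{\sqrt{10{,}000}}
                \approx \frac{1.642}{100}
                \approx 0.0164 .
```

Because of the square root, cutting this by a factor of 10 would take 100 times as many samples.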








In actual Monte Carlo practice, as we have seen, we do not know $\sigma(x)$, but only our estimate of it, $S(x_i)$.  Because of this, we have a slightly different notation and formula:

$$S(\hat{x}) = \frac{S(x_i)}{\sqrt{N}}$$ (Eq. 2-3)

Using our previously determined formula for $S^2(x_i)$ (Eq. 2-2) gives us:

$$S(\hat{x}) = \sqrt{\frac{1}{N(N-1)}\sum_{i=1}^{N} (x_i - \hat{x})^2}$$

This value $S(\hat{x})$ is our estimate of the standard deviation of the mean $\sigma(\hat{x})$.   It is the second most important output result of a Monte Carlo calculation (the most important being our estimate of the mean itself, $\hat{x}$), giving us a measure of the confidence we can have in our estimate, $\hat{x}$.  The way that Monte Carlo results are generally reported (in terms of our notation) is:

$$\hat{x} \pm S(\hat{x})$$
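
Putting the three formulae together, a complete run might look like the sketch below. It is an illustration only, assuming the 4-for-a-hit, 0-for-a-miss scoring for $\pi$ and using two running sums; `run_and_report` is a hypothetical name.

```python
import math
import random

def run_and_report(n, seed=0):
    """Return (x_hat, S(x_hat)) from a single run of n samples."""
    rng = random.Random(seed)
    sum_x = 0.0
    sum_x2 = 0.0
    for _ in range(n):
        x_i = 4.0 if rng.random() ** 2 + rng.random() ** 2 < 1.0 else 0.0
        sum_x += x_i
        sum_x2 += x_i * x_i
    x_hat = sum_x / n                                    # estimate of the mean
    s2 = (n / (n - 1)) * (sum_x2 / n - x_hat * x_hat)    # variance of the sample
    s_mean = math.sqrt(s2 / n)                           # std. dev. of the mean
    return x_hat, s_mean

x_hat, s_mean = run_and_report(10_000)
print(f"{x_hat:.4f} +/- {s_mean:.4f}")
```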


 




© 2002 by Ronald E. Pevey.  All rights reserved.