Summary from last time:

- The Sample Mean - Numerical measure
of the average or most probable value in some distribution. Can
be measured for any distribution, but knowing the mean value alone for
some sample is not very meaningful.
- The Sample Distribution - Plot of the
frequency of occurrence of ranges of data values in the sample.
The distribution needs to be represented by a reasonable number of
data intervals (counting in bins).
Refer to the Rainfall Distribution example, or to the linked example
of histograms and distributions.
- The Sample Dispersion - Numerical
measure of the range of the data about the mean value. Defined
such that, for a bell-shaped (normal) distribution, +/- 1 dispersion
unit contains 68% of the sample, +/- 2 dispersion units contain 95%
and +/- 3 dispersion units contain 99.7%. This is schematically shown below:
Refer to document on dispersions for more detail.

For instance:

- The Probability that some event will be greater than 0 dispersion units above the mean is 50%
- The Probability that some event will be greater than 1 dispersion unit above the mean is about 16%
- The Probability that some event will be greater than 2 dispersion units above the mean is about 2%
- The Probability that some event will be greater than 3 dispersion units above the mean is about 0.1% (roughly 1 in 1000)
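
These percentages can be checked numerically. The following is a minimal sketch, assuming the data follow a normal (bell-shaped) distribution, which is what the 68%, 95% and 99.7% figures imply. The function names `coverage` and `tail_probability` are illustrative and not part of the original notes.

```python
import math


def coverage(k):
    """Fraction of a normal distribution within +/- k dispersion units of the mean."""
    return math.erf(k / math.sqrt(2))


def tail_probability(k):
    """Probability of lying more than k dispersion units ABOVE the mean (one tail only)."""
    return 0.5 * math.erfc(k / math.sqrt(2))


for k in (0, 1, 2, 3):
    print(f"k = {k}: within +/- {k} units = {coverage(k):.1%}, "
          f"more than {k} units above = {tail_probability(k):.2%}")
```

The exact one-tail values for 1, 2 and 3 dispersion units are about 15.9%, 2.3% and 0.13%, close to the rounded figures quoted above.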

The calculation of dispersion in a distribution is very important because it represents a uniform way to determine probabilities and therefore to determine if some event in the data is expected (i.e. probable) or is significantly different than expected (i.e. improbable).

Over the last 25 years, the mean annual rainfall in Eugene was 51.5 inches with a dispersion of 8 inches.

During this same period, the mean annual rainfall in Seattle was 39.5 inches with a dispersion of 7 inches.

On average, does it rain significantly more in Eugene than Seattle?

Here is the wrong way to do this problem:

- If you follow the earlier procedure, you would note that the difference
in mean rainfall between Seattle and Eugene is 12 inches. Using Eugene's
dispersion of 8 inches, 12 inches is 12/8 = 1.5 dispersion units and
therefore apparently not significant.

But this is not the correct procedure to use when comparing two separate distributions. It is only the correct procedure to use when comparing one data point to the rest of the same distribution.

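When comparing mean values, the relevant uncertainty is the error in the mean, which is the dispersion divided by the square root of the number of data points. In this example, the number of data points is 25 and the square root of 25 is 5. Hence, for Eugene, the error in the mean value of 51.5 inches is 8/5 = 1.6 inches.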

The difference in mean rainfall between Seattle and Eugene is 12 inches, which is 12/1.6 = 7.5 dispersion units when measured against the error in the mean.

Thus there is a highly significant difference in the mean annual rainfall between Eugene and Seattle.
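
The arithmetic for both procedures can be laid out side by side. This is a minimal sketch using only the numbers quoted above; the variable and function names are illustrative. As in the text, it uses only Eugene's dispersion, so it is the same approximation and not a full two-sample test.

```python
import math


def error_in_mean(dispersion, n_points):
    """Error in the mean: dispersion divided by the square root of the sample size."""
    return dispersion / math.sqrt(n_points)


eugene_mean, eugene_dispersion = 51.5, 8.0   # inches
seattle_mean = 39.5                          # inches
n_years = 25

difference = eugene_mean - seattle_mean      # difference between the two means

# Wrong way: treat the difference like a single data point in Eugene's distribution.
wrong_way = difference / eugene_dispersion

# Approximate right way: measure the difference against the error in Eugene's mean.
eugene_error = error_in_mean(eugene_dispersion, n_years)
right_way = difference / eugene_error

print(f"difference in means: {difference:.1f} inches")
print(f"wrong way: {wrong_way:.1f} dispersion units")
print(f"error in the mean: {eugene_error:.1f} inches")
print(f"right way: {right_way:.1f} dispersion units")
```

Dividing by the square root of the number of data points is what turns an unremarkable 1.5 units into 7.5 units: the mean of 25 years is known far more precisely than any single year.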

Note this method is only an approximation. A more exact and proper way to compare two sample means will be given later.

Another way to look at this rainfall comparison is as follows:

We have already determined that 65 inches is not a significant amount of rainfall in Eugene compared to the normal value of 51.5 inches. Would 65 inches be a significant amount of rain in Seattle?

For the case of Seattle, 65 inches is 65-39.5 = 26.5 inches above normal. The dispersion in the Seattle data is 7 inches and so 26.5 inches is 26.5/7 = 3.8 dispersion units above the mean. This is highly significant, which again reinforces the notion that there is a significant difference in mean rainfall between Eugene and Seattle (note also this difference in community web pages).