Summary from last time:

- The Sample Mean - Numerical measure
of the average or most probable value in some distribution. Can
be measured for any distribution, but knowing the mean value alone for
some sample is not very meaningful.
- The Sample Distribution - Plot of the
frequency of occurrence of ranges of data values in the sample.
The distribution needs to be represented by a reasonable number of
data intervals (counting in bins).
Refer to the Rainfall Distribution example for an illustration of
histograms and distributions.
- The Sample Dispersion - Numerical
measure of the range of the data about the mean value. Defined
such that +/- 1 dispersion unit contains 68% of the sample,
+/- 2 dispersion units contains 95% and +/- 3 dispersion units
contains 99.7%. This is schematically shown below:
Refer to document on dispersions for more detail.

For instance:

- The probability that some event will be more than 0 dispersion units above the mean is 50%
- The probability that some event will be more than 1 dispersion unit above the mean is about 16%
- The probability that some event will be more than 2 dispersion units above the mean is about 2%
- The probability that some event will be more than 3 dispersion units above the mean is about 0.1% (roughly 1 in 1000)
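
The percentages in this list can be checked numerically. The sketch below assumes the data follow a normal (bell-shaped) distribution, which is what the 68% / 95% / 99.7% rule implies; the function names `fraction_within` and `probability_above` are made up for this illustration.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution within +/- k dispersion units of the mean."""
    return math.erf(k / math.sqrt(2))

def probability_above(k):
    """Probability of a value more than k dispersion units above the mean (k >= 0)."""
    return 0.5 * (1.0 - fraction_within(k))

for k in range(4):
    print(f"k = {k}: within = {100 * fraction_within(k):6.2f}%, "
          f"above = {100 * probability_above(k):6.2f}%")
```

The one-sided probability is half of what lies outside the +/- k range, because the assumed distribution is symmetric about the mean.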

The calculation of dispersion in a distribution is very important because it represents a uniform way to determine probabilities and therefore to determine if some event in the data is expected (i.e. probable) or is significantly different than expected (i.e. improbable).
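
As a rough sketch of how the three quantities in the summary might be computed for a sample, the example below uses Python's standard library. It assumes that "dispersion" means the sample standard deviation, and the rainfall values are invented for illustration only; they are not the data from the Rainfall Distribution example.

```python
import statistics
from collections import Counter

# Hypothetical annual rainfall totals in inches (illustrative values only).
rainfall = [38.2, 45.1, 51.0, 47.3, 55.6, 42.8, 49.9, 60.4, 44.0, 52.7]

mean = statistics.mean(rainfall)
dispersion = statistics.stdev(rainfall)  # assumed: sample standard deviation

# Sample distribution: count how many values fall in each 5-inch-wide bin.
bin_width = 5
counts = Counter(int(x // bin_width) * bin_width for x in rainfall)

print(f"mean = {mean:.1f} inches, dispersion = {dispersion:.1f} inches")
for lower in sorted(counts):
    print(f"{lower:3d}-{lower + bin_width:<3d} {'*' * counts[lower]}")

# Fraction of the sample within 1 dispersion unit of the mean.
within_one = sum(abs(x - mean) <= dispersion for x in rainfall) / len(rainfall)
print(f"fraction within +/- 1 dispersion unit: {within_one:.2f}")
```

With only ten values the fraction within one dispersion unit will not match 68% closely; the rule is a statement about large samples.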

Eugene | Seattle |
---|---|
mean = 51.5 inches | mean = 39.5 inches |
dispersion = 8.1 inches | dispersion = 7.0 inches |

On average, does it rain significantly more in Eugene than Seattle?

Here is the wrong way to do this problem:

- If you follow the procedure before, you would note the difference
in mean rainfall between Seattle and Eugene is 12 inches.

12 inches is roughly 12/8 = 1.5 dispersion units of the yearly rainfall values and would therefore appear not significant.

But this is not the correct procedure to use when comparing two
separate distributions.
*It is only the correct procedure to use when comparing one data point to the rest of the same distribution.*

Eugene | Seattle |
---|---|
mean = 51.5 inches | mean = 39.5 inches |
dispersion = 8.1 inches | dispersion = 7.0 inches |
N = 25 | N = 25 |
error in mean = 8.1/√25 = 8.1/5 | error in mean = 7.0/√25 = 7.0/5 |
error in mean = 1.6 inches | error in mean = 1.4 inches |

The difference in mean rainfall between Eugene and Seattle is (51.5 - 39.5) = 12 inches, which is 12/1.6 = 7.5 times the error in the mean value.

Thus there is a highly significant difference in the mean annual rainfall between Eugene and Seattle.
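
The arithmetic behind this comparison can be sketched as follows. The helper name `error_in_mean` and the labels `sample_1` / `sample_2` are chosen for this illustration; the numbers are the means, dispersions and sample sizes quoted above, and the method is the same approximation used in the text (dividing the difference by the larger error in the mean).

```python
import math

def error_in_mean(dispersion, n):
    """Error in the mean: dispersion divided by the square root of the sample size."""
    return dispersion / math.sqrt(n)

# Values quoted in the text (inches); sample_1 is the wetter city.
mean_1, dispersion_1, n_1 = 51.5, 8.1, 25
mean_2, dispersion_2, n_2 = 39.5, 7.0, 25

err_1 = error_in_mean(dispersion_1, n_1)  # 8.1 / 5
err_2 = error_in_mean(dispersion_2, n_2)  # 7.0 / 5

difference = mean_1 - mean_2              # 12 inches
units = difference / err_1                # difference in units of the error in the mean

print(f"error in mean: {err_1:.2f} and {err_2:.2f} inches")
print(f"difference = {difference:.1f} inches = {units:.1f} error units")
```

Without rounding the error to 1.6, the ratio comes out slightly below 7.5, which does not change the conclusion.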

Note this method is only an approximation. A more exact and proper way to compare two sample means will be given later.

Another way to look at this rainfall comparison is as follows:

We have already determined that 65 inches is not a significant amount of rainfall in Eugene compared to the normal value of 51.5 inches. Would 65 inches be a significant amount of rain in Seattle?

For the case of Seattle, 65 inches is 65 - 39.5 = 25.5 inches above normal. The dispersion in the Seattle data is 7 inches and so 25.5 inches is 25.5/7 = 3.6 dispersion units above the mean. This is highly significant, which again reinforces the notion that there is a significant difference in mean rainfall between Eugene and Seattle.
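
Comparing a single value with its own distribution, as done here for 65 inches, amounts to expressing the value in dispersion units. A minimal sketch, using the means and dispersions quoted in the text (the function name `dispersion_units` is made up for this example):

```python
def dispersion_units(value, mean, dispersion):
    """Number of dispersion units by which a value lies above (or below) the mean."""
    return (value - mean) / dispersion

# 65 inches of rain, measured against each city's own distribution.
eugene = dispersion_units(65, 51.5, 8.1)
seattle = dispersion_units(65, 39.5, 7.0)

print(f"Eugene:  {eugene:.1f} dispersion units above the mean")
print(f"Seattle: {seattle:.1f} dispersion units above the mean")
```

A value under 2 dispersion units is unremarkable, while one beyond 3 has a probability of roughly 1 in 1000.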