In-class exercise. Everyone in the class should do the following:

- Draw a number from the Sacred Box of Sampling at the front of the lecture room
- Return to your seat with that number and note your seat number
- Be prepared to tell the instructor your number if your seat number is randomly called
- That is all

In the Sacred Box of Sampling there are 170 numbers, which define this Intrinsic Distribution.

The point of the in-class exercise is to demonstrate that a random sampling process applied to an intrinsic distribution that is normally distributed (i.e. a bell curve) will provide a robust estimate of the mean and dispersion after just a small number of samples. While this can be proved with calculus (and in statistics is known as the Central Limit Theorem), the in-class example is probably the best means of demonstrating it.

For the full set of numbers in the box:

- The mean = 175
- The dispersion = 23

The point of the demo in the class is to see how close we can come to recovering this population mean and dispersion from the sample mean and dispersion.
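The exercise can also be mimicked on a computer. The sketch below is only an illustration under stated assumptions: the contents of the box are simulated from a bell curve with mean 175 and dispersion 23 (the real box holds specific numbers that are not listed here), and the function names `make_box` and `draw_estimates` are hypothetical.

```python
import random
import statistics

def make_box(n=170, mean=175.0, dispersion=23.0, seed=1):
    """Simulate a box of n numbers drawn from a bell curve (an assumption)."""
    rng = random.Random(seed)
    return [rng.gauss(mean, dispersion) for _ in range(n)]

def draw_estimates(box, sample_sizes=(5, 10, 20, 40), seed=2):
    """Draw random samples of increasing size and estimate mean and dispersion."""
    rng = random.Random(seed)
    results = []
    for n in sample_sizes:
        sample = rng.sample(box, n)  # draw n numbers without replacement
        results.append((n, statistics.mean(sample), statistics.stdev(sample)))
    return results

box = make_box()
print(f"box: mean = {statistics.mean(box):.1f}, "
      f"dispersion = {statistics.pstdev(box):.1f}")
for n, sample_mean, sample_dispersion in draw_estimates(box):
    print(f"N = {n:2d}: mean = {sample_mean:.1f}, "
          f"dispersion = {sample_dispersion:.1f}")
```

Changing the seeds gives a different set of draws, which is the computer equivalent of calling on different seat numbers.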

Summary from last time:

- The Sample Mean - Numerical measure of the average or most probable value in some distribution. It can be measured for any distribution, but knowing the mean value alone for some sample is not very meaningful.
- The Sample Distribution - Plot of the frequency of occurrence of ranges of data values in the sample. The distribution needs to be represented by a reasonable number of data intervals (counting in bins). Refer to the Rainfall Distribution example for an illustration of histograms and distributions.
- The Sample Dispersion - Numerical measure of the range of the data about the mean value. Defined such that +/- 1 dispersion unit contains 68% of the sample, +/- 2 dispersion units contains 95% and +/- 3 dispersion units contains 99.7%. Refer to the document on dispersions for more detail.

For instance:

- The Probability that some event will be greater than 0 dispersion units above the mean is 50%
- The Probability that some event will be greater than 1 dispersion unit above the mean is 16%
- The Probability that some event will be greater than 2 dispersion units above the mean is 2%
- The Probability that some event will be greater than 3 dispersion units above the mean is 0.1% (1 in 1000)
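These percentages follow from the shape of the bell curve and can be checked numerically. The sketch below assumes an ideal normal distribution and uses the error function from Python's standard library; the helper names `prob_within` and `prob_above` are hypothetical.

```python
import math

def prob_within(k):
    """Fraction of a bell curve within +/- k dispersion units of the mean."""
    return math.erf(k / math.sqrt(2))

def prob_above(k):
    """Probability of a value more than k dispersion units above the mean."""
    return 0.5 * math.erfc(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within +/- {k}: {100 * prob_within(k):.1f}%")
for k in (0, 1, 2, 3):
    print(f"more than {k} above the mean: {100 * prob_above(k):.2f}%")
```

The exact one-sided values for 1, 2 and 3 dispersion units are about 15.9%, 2.3% and 0.13%, which the list above rounds.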

The calculation of dispersion in a distribution is very important because it represents a uniform way to determine probabilities and therefore to determine if some event in the data is expected (i.e. probable) or is significantly different than expected (i.e. improbable).

Eugene | Seattle |
---|---|
mean = 51.5 inches | mean = 39.5 inches |
dispersion = 8.1 inches | dispersion = 7.0 inches |

On average, does it rain significantly more in Eugene than in Seattle?

Here is the wrong way to do this problem:

- If you follow the earlier procedure, you would note that the difference in mean rainfall between Seattle and Eugene is 12 inches.

12 inches is 12/8 = 1.5 dispersion units and therefore not significant.

But this is not the correct procedure to use when comparing two separate distributions. *It is only the correct procedure to use when comparing one data point to the rest of the same distribution.*

Eugene | Seattle |
---|---|
mean = 51.5 inches | mean = 39.5 inches |
dispersion = 8.1 inches | dispersion = 7.0 inches |
N = 25 | N = 25 |
error in mean = 8.1/5 = 1.6 inches | error in mean = 7.0/5 = 1.4 inches |

The difference in mean rainfall between Eugene and Seattle is (51.5 - 39.5) = 12 inches, which is 12/1.6 = 7.5 units of error in the mean.

Thus there is a highly significant difference in the mean annual rainfall between Eugene and Seattle.

Note this method is only an approximation. A more exact and proper way to compare two sample means will be given later.
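The arithmetic of this approximate comparison can be written out as a short sketch. It assumes the figures quoted in the table (means of 51.5 and 39.5 inches, dispersions of 8.1 and 7.0 inches, N = 25 for both) and takes the error in the mean to be the dispersion divided by the square root of N; the function name `error_in_mean` is hypothetical.

```python
import math

def error_in_mean(dispersion, n):
    """Error in the mean: dispersion divided by the square root of N."""
    return dispersion / math.sqrt(n)

# Figures quoted in the rainfall table (sample 1 is the wetter city).
m1, d1 = 51.5, 8.1
m2, d2 = 39.5, 7.0
n_years = 25

e1 = error_in_mean(d1, n_years)  # 8.1 / 5
e2 = error_in_mean(d2, n_years)  # 7.0 / 5
difference = m1 - m2

# As in the text, express the difference in units of the larger error.
units = difference / e1
print(f"errors in the mean: {e1:.2f} and {e2:.2f} inches")
print(f"difference of {difference:.1f} inches = {units:.1f} errors in the mean")
```

Without rounding the error to 1.6, the result comes out slightly below the 7.5 quoted above, which does not change the conclusion.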

Another way to look at this rainfall comparison is as follows:

We have already determined that 65 inches is not a significant amount of rainfall in Eugene compared to the normal value of 51.5 inches. Would 65 inches be a significant amount of rain in Seattle?

For the case of Seattle, 65 inches is 65 - 39.5 = 25.5 inches above normal. The dispersion in the Seattle data is 7 inches and so 25.5 inches is 25.5/7 = 3.6 dispersion units above the mean. This is highly significant, which again reinforces the notion that there is a significant difference in mean rainfall between Eugene and Seattle (note also this difference in community web pages).
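Comparing a single value with its own distribution, as done here for 65 inches, is a one-line calculation. The sketch below assumes the means quoted in the text (51.5 inches for Eugene, 39.5 inches for Seattle) and dispersions of 8.1 and 7.0 inches; the function name `dispersion_units` is hypothetical.

```python
def dispersion_units(value, mean, dispersion):
    """How many dispersion units a single value lies above the mean."""
    return (value - mean) / dispersion

rain = 65.0  # inches in one year

# Assumed figures: Eugene dispersion 8.1 inches, Seattle dispersion 7.0 inches.
eugene = dispersion_units(rain, mean=51.5, dispersion=8.1)
seattle = dispersion_units(rain, mean=39.5, dispersion=7.0)

print(f"Eugene:  {eugene:.1f} dispersion units above the mean")
print(f"Seattle: {seattle:.1f} dispersion units above the mean")
```

A value under 2 dispersion units is unremarkable, while a value above 3 is expected only about once in a thousand years.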

Comparing Two Sample Means - Find the difference of the two sample means in units of sample mean errors. This works as follows:

- Sample 1 has mean M1 and error in the mean E1
- Sample 2 has mean M2 and error in the mean E2

The significance of the difference is (M1 - M2) expressed in units of the combined error in the two means.

Simple Approximation:

- If E1 and E2 are similar then use (M1-M2)/(1.5*E1)
- If E1 > 2*E2 then use (M1-M2)/E1
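Both shortcuts are consistent with combining the two errors in quadrature, which is the standard way to compare two independent means. The notes do not state this expression explicitly, so treat it here as an assumption about what the shortcuts approximate:

$$
\text{significance} = \frac{M_1 - M_2}{\sqrt{E_1^2 + E_2^2}}
$$

When $E_1 \approx E_2$ the denominator is $\sqrt{2}\,E_1 \approx 1.4\,E_1$, which is rounded up to $1.5\,E_1$. When $E_1 > 2E_2$ the denominator is less than $\sqrt{1.25}\,E_1 \approx 1.1\,E_1$, so using $E_1$ alone changes the answer by only about 10%.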

Let's now apply this principle to some real data. The actual salmon count data:

This distribution, defined by 44 points, has a mean of 358,000 salmon with a dispersion of 82,000 salmon. The error in the mean is 12,000 = 82,000/(square root of 44).

Points to note about the distribution:

- The dispersion is fairly large. Is this intrinsic to the population, or a reflection of measuring errors because salmon counting is difficult and unreliable?
- There seems to be a hard lower limit in the data of around 225,000 salmon.
- There is a tail towards very high salmon counts (> 500,000 salmon). Tails like this have a significant impact on the mean value and might represent some kind of anomaly in the data.
- Overall, the distribution is not especially well fit by a bell curve, but the median value of 340,000 is similar to the mean, so we can use our principles of dispersion to calculate significant differences.

Here is the distribution of the data with the last 5 years subtracted out, so there are 39 years' worth of data:

This distribution, defined by 39 points, has a mean of 368,000 salmon with a dispersion of 81,000 salmon and a mean error of 13,000.

Note: The dispersions for the 39 year sample and the 44 year sample are similar. This indicates that we have enough data to accurately determine the dispersion.

Over the last 5 years, the data are defined by an average of 278,000 salmon with a dispersion of 33,000 and a mean error of 15,000 = 33,000/(square root of 5). Do these data show a significant decline in salmon?

Since the mean errors are similar, we can use (M1-M2)/(1.5*E1) as an approximation:

- M1-M2 = 368,000 - 278,000 = 90,000
- 1.5*E1 = 1.5*13,000 = 19,500, or about 20,000
- The difference is 90,000/20,000 = 4.5 units of error in the mean: HIGHLY SIGNIFICANT!
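The same steps can be sketched in code, starting from the means, dispersions and sample sizes quoted above. The helper names `error_in_mean` and `simple_significance` are hypothetical, and treating every case other than E1 > 2*E2 as "similar errors" is a simplifying assumption of this sketch.

```python
import math

def error_in_mean(dispersion, n):
    """Error in the mean: dispersion divided by the square root of N."""
    return dispersion / math.sqrt(n)

def simple_significance(m1, e1, m2, e2):
    """Simple approximation for comparing two sample means."""
    if e1 > 2 * e2:
        return (m1 - m2) / e1
    # Assumption: any other case is treated as "similar errors".
    return (m1 - m2) / (1.5 * e1)

# Earlier 39 years versus the last 5 years of salmon counts.
e_early = error_in_mean(81_000, 39)
e_recent = error_in_mean(33_000, 5)
significance = simple_significance(368_000, e_early, 278_000, e_recent)

print(f"errors in the mean: {e_early:,.0f} and {e_recent:,.0f}")
print(f"decline = {significance:.1f} units of error in the mean")
```

Without the intermediate rounding the result is slightly above 4.5, which leaves the conclusion unchanged.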