
| Eugene | Seattle |
|---|---|
| mean = 51.5 inches | mean = 39.5 inches |
| dispersion = 8.5 | dispersion = 7.0 |
On average, does it rain significantly more in Eugene than Seattle?
Here is the wrong way to do this problem:
The difference in the means, 12 inches, is 12/8 = 1.5 dispersion units and therefore not significant.
But this is not the correct procedure to use when comparing two separate distributions. It is only the correct procedure to use when comparing one data point to the rest of the same distribution.
Again, here is an example of that:
The figure below shows the histogram of rainfall in Eugene from 1900-2000. The bin width in this case is 2 inches of rainfall.
For this case, the 100 year data set gives a mean of 42 inches and a
dispersion of 9 inches. This means that 2/3 of the time, the
annual rainfall in Eugene can be expected to be between 33 and 51
inches. (Note: The mean rainfall using just the last 30 years worth
of data is higher; we will show this below.)
The official rainfall in Eugene in 1996 was 77 inches. Is this an expected 1 in a hundred year rainfall amount? Note, a 1 in 100 chance corresponds to 2.5 dispersion units. A 1 in 1000 chance is about 3 dispersion units.
The 1996 total is (77 - 42)/9 = 3.9 dispersion units above the mean. Therefore
1996 was not a statistical fluctuation in normal weather
patterns; it was a systematic departure.
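The arithmetic behind this kind of single-point check can be sketched in a few lines of Python. The function names below are made up for the example, and the tail probability assumes the rainfall totals follow a normal (Gaussian) distribution, which the notes do not state explicitly.

```python
import math

def dispersion_units(value, mean, dispersion):
    """Distance of a single data point from the mean, in dispersion units."""
    return (value - mean) / dispersion

def upper_tail_probability(z):
    """Chance of landing at least z dispersion units above the mean,
    assuming a normal distribution (an assumption for this sketch)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Eugene, 1900-2000: mean 42 inches, dispersion 9 inches; 1996 total 77 inches.
z_1996 = dispersion_units(77, 42, 9)
print(f"1996 was {z_1996:.1f} dispersion units above the mean")
print(f"normal-tail chance: about 1 in {1 / upper_tail_probability(z_1996):,.0f}")
```

Under that normal assumption, the 1996 total sits well beyond the 2.5 dispersion unit level quoted for a 1 in 100 event.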
The error in the mean can be thought of as a measure of how reliably a mean value has been determined. The more samples you have, the more reliable the mean is. But the error only shrinks as the square root of the number of samples! So if you want to improve the reliability of the mean value by a factor of 10, you would have to get 100 times more samples. This can be difficult, and often you're stuck with what you've got. You then have to make use of it.
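The square-root scaling is easy to see numerically. The sketch below uses a hypothetical helper, `error_in_mean`, that divides the dispersion by the square root of N (the same rule used in the tables in these notes). The dispersion of 8 inches is chosen purely for illustration.

```python
import math

def error_in_mean(dispersion, n_samples):
    """Error in the mean: dispersion divided by sqrt(number of samples)."""
    return dispersion / math.sqrt(n_samples)

dispersion = 8.0  # illustrative value, in inches
for n in (25, 100, 2500):
    print(f"N = {n:5d}  error in mean = {error_in_mean(dispersion, n):.2f}")

# Going from N = 25 to N = 2500 (100 times more samples)
# shrinks the error in the mean by only a factor of 10.
ratio = error_in_mean(dispersion, 25) / error_in_mean(dispersion, 2500)
print(f"improvement factor: {ratio:.1f}")
```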
Back to the Eugene/Seattle comparison based on the last 25 years worth of data (so N = number of samples = 25).
| Eugene | Seattle |
|---|---|
| mean = 51.5 inches | mean = 39.5 inches |
| dispersion = 8.1 | dispersion = 7.0 |
| N = 25 | N = 25 |
| error in mean = 8.1/5 | error in mean = 7.0/5 |
| error in mean = 1.6 | error in mean = 1.4 |
The difference in mean rainfall between Eugene and Seattle is (51.5 - 39.5) = 12 inches, which is 12/1.6 = 7.5 times the error in the mean (not the dispersion).
Thus there is a highly significant difference in the mean annual rainfall between Eugene and Seattle.
Note this method is only an approximation. A more exact and proper way to compare two sample means will be given later.
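As a rough check of the numbers in the table, the comparison can be reproduced in Python. This is only a sketch of the approximation used above (the difference in means divided by the larger error in the mean), and the variable names are made up for the example.

```python
import math

n = 25
eugene = {"mean": 51.5, "dispersion": 8.1}
seattle = {"mean": 39.5, "dispersion": 7.0}

# Error in the mean = dispersion / sqrt(N)
eugene_err = eugene["dispersion"] / math.sqrt(n)    # 8.1 / 5
seattle_err = seattle["dispersion"] / math.sqrt(n)  # 7.0 / 5

difference = eugene["mean"] - seattle["mean"]       # 12 inches

# Approximation used in the text: express the difference in units of
# the (larger) error in the mean.
significance = difference / max(eugene_err, seattle_err)
print(f"errors in mean: {eugene_err:.2f}, {seattle_err:.2f}")
print(f"difference = {difference:.1f} inches = {significance:.1f} mean-error units")
```

With the unrounded error (1.62 rather than 1.6), the ratio comes out slightly below the 7.5 quoted above. The conclusion is unchanged.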
Comparing Two Sample Means - Find
the difference of the two sample means in units of sample mean
errors. This works as follows:
Difference in terms of significance is:
In general, in more qualitative terms:
Simple Approximation:
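One common way to write this comparison is to combine the two errors in the mean in quadrature. It is given here as an assumption: it is the standard textbook form, not something spelled out in these notes. The symbols are $M_1$, $M_2$ for the two sample means and $E_1$, $E_2$ for their errors in the mean.

```latex
\[
  \text{significance} \;=\; \frac{M_1 - M_2}{\sqrt{E_1^{2} + E_2^{2}}}
\]
% When the two errors in the mean are similar, E_1 \approx E_2:
\[
  \sqrt{E_1^{2} + E_2^{2}} \;\approx\; \sqrt{2}\,E_1 \;\approx\; 1.4\,E_1
\]
```

Rounding the factor 1.4 up to 1.5 gives the simple approximation (M1-M2)/1.5E1 that is applied to the salmon data later in these notes.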
Let's now apply this principle to some real data.
First back to rainfall data. I stated earlier that the mean annual
precipitation in Eugene was higher over the last 30 years than it has
been over the last 100 years. Let's see if that difference is significant.
To do this, we break the 100 year data set into two.
Is the difference in means significant?
In fact, one can also note that the dispersions of the two
data sets are similar (about 8 inches), which indicates similar year to
year variations; it's just that the mean level has gone way up.
Now let's focus on another example, based on Salmon Count Data at
Bonneville Dam.
The actual salmon count data:
This distribution, defined by 44 points, has a mean of 358,000
salmon with a dispersion of 82,000 salmon. The error in the mean
is 12,000 (82,000/(square root of 44)).
Points to note about the distribution:
Here is the distribution of the data with the last 5 years subtracted
out, so there are 39 years worth of data:
There has been some speculation, and some data, suggesting that there has been
a recent decline of salmon in the Columbia River System. What
do these data say?
The 100 year Eugene rainfall data set, broken into two:

| 1900 - 1970 | 1970 - 2000 |
|---|---|
| mean = 39.6 inches | mean = 49.9 inches |
| dispersion = 7.7 | dispersion = 8.4 |
| N = 70 | N = 30 |
| error in mean = 0.9 | error in mean = 1.5 |

This is weird since Eugene is the only site in the PNW that
shows this kind of trend.
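Whether the two halves of the Eugene record differ can be checked with the split-period numbers (means of 39.6 and 49.9 inches, errors in the mean of 0.9 and 1.5). The sketch below assumes the two errors are independent and combine in quadrature. That is a standard choice but not necessarily the exact formula these notes intend. `mean_difference_significance` is a hypothetical helper name.

```python
import math

def mean_difference_significance(m1, e1, m2, e2):
    """Difference of two sample means in units of their combined error.

    Assumes the two errors in the mean are independent and add in
    quadrature (an assumption for this sketch, not taken from the notes).
    """
    return abs(m1 - m2) / math.sqrt(e1**2 + e2**2)

# Eugene rainfall: 1900-1970 versus 1970-2000
early_mean, early_err = 39.6, 0.9   # N = 70, dispersion = 7.7
late_mean, late_err = 49.9, 1.5     # N = 30, dispersion = 8.4

sig = mean_difference_significance(late_mean, late_err, early_mean, early_err)
print(f"difference = {late_mean - early_mean:.1f} inches, significance = {sig:.1f}")
```

A result of several combined-error units is well beyond the 2.5 unit level quoted earlier for a 1 in 100 chance.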


This distribution, defined by 39 points, has a mean of 368,000 salmon with a dispersion of 81,000 salmon and a mean error of 13,000.
Note: The dispersions for the 39 year sample and the 44 year sample
are similar; this indicates that we have enough data to accurately
determine the dispersion.
Over the last 5 years, the data are defined by an average of 278,000 salmon with a dispersion of 33,000 and a mean error of 15,000 = (33,000/(sqrt of 5)). Do these data show a significant decline of salmon?
Since the mean errors are similar we can use (M1-M2)/(1.5 x E1) for an approximation:
(368,000 - 278,000)/(1.5 x 13,000) = 90,000/19,500 = 4.6
HIGHLY SIGNIFICANT!
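The salmon comparison can be reproduced the same way. The sketch below applies the (M1-M2)/1.5E1 approximation, taking M1 and E1 from the 39 year sample and M2 from the last 5 years (an assumption about which sample is which). The quadrature version is included only as a cross-check and is likewise an assumption, not something stated in the notes.

```python
import math

# 39 year sample and last-5-year sample, values from the text
m1, disp1, n1 = 368_000, 81_000, 39
m2, disp2, n2 = 278_000, 33_000, 5

e1 = disp1 / math.sqrt(n1)   # about 13,000
e2 = disp2 / math.sqrt(n2)   # about 15,000

approx = (m1 - m2) / (1.5 * e1)               # simple approximation
quadrature = (m1 - m2) / math.hypot(e1, e2)   # cross-check (assumed form)

print(f"e1 = {e1:,.0f}, e2 = {e2:,.0f}")
print(f"approximation: {approx:.1f}, quadrature: {quadrature:.1f}")
```

Both versions give a difference of more than four error units, so the conclusion does not hinge on which form is used.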