For instance, suppose I go to my son's second grade class and I want to determine the mean age of the class. Well, a priori I know that kids of age 6 to 8 are in the second grade and so, to determine the mean age, I really only need to ask 2 or 3 kids, essentially to verify that this is a normal second grade class and not filled with small 40 year olds.
However, determining the mean age of the students in this class (the one you are sitting in) would require considerably more samples, because the dispersion, or range, of ages is larger.
Measuring dispersion is a somewhat mathematically complex procedure that you don't need to know in detail. It will be done automatically for you using the provided tools. What you need to do is understand how to interpret the result.
The measure of dispersion assumes that the sample can be adequately represented by a normal, or Gaussian, distribution. We will discuss this in detail later, but for now we can assume that most samples are adequately represented in this way. Let's return to the rainfall example and show the results of fitting the data to a mean value plus a dispersion. This is shown here:
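As a rough sketch of what such a tool does behind the scenes, the following Python snippet computes the mean and the dispersion (taken here to be the sample standard deviation) for two samples. The ages are invented purely for illustration and are not data from this course; the variable names are likewise hypothetical.

```python
import statistics

# Hypothetical ages in years, invented for illustration only.
second_grade_ages = [6, 7, 7, 7, 8, 7, 6, 8]
mixed_class_ages = [18, 19, 22, 25, 31, 20, 44, 19]

for label, ages in [("second grade", second_grade_ages),
                    ("mixed-age class", mixed_class_ages)]:
    mean = statistics.mean(ages)
    # Sample standard deviation, used here as the "1 sigma" dispersion.
    dispersion = statistics.stdev(ages)
    print(f"{label}: mean = {mean:.1f} years, dispersion = {dispersion:.1f} years")
```

The sample with the wider spread of ages gives the larger dispersion, which is why it takes more measurements to pin down its mean.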
The mean value for this data is 51.5 inches and the dispersion is 8 inches. So how do you interpret this?
A measure of dispersion is also a measure of probability. It is DEFINED in such a way that +/- 1 dispersion unit (usually called 1 sigma) contains 68% (about 2/3) of the sample. In the rainfall example given above, the fit to the data means that in 68% of years the annual rainfall in Eugene is between:
43.5 and 59.5 inches.
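The arithmetic behind that range can be sketched in a few lines of Python. The mean and dispersion are the values quoted above; the helper `fraction_within` is a hypothetical name, and it assumes the data really do follow a normal distribution.

```python
import math

mean_rainfall = 51.5  # inches, from the fit quoted above
sigma = 8.0           # inches, one dispersion unit

# The +/- 1 sigma range around the mean.
low, high = mean_rainfall - sigma, mean_rainfall + sigma

def fraction_within(k):
    """Fraction of a normal distribution lying within +/- k sigma of the mean."""
    return math.erf(k / math.sqrt(2))

print(f"1 sigma range: {low} to {high} inches")
print(f"fraction within 1 sigma: {fraction_within(1):.3f}")
print(f"fraction within 2 sigma: {fraction_within(2):.3f}")
```

For a normal distribution, this gives about 68% within 1 sigma and about 95% within 2 sigma.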
Furthermore, there is no significant difference between quantities that are separated by less than one dispersion unit. That is, it would be inappropriate to say that a 55 inch annual rainfall is significantly above normal, since 55 inches is less than one dispersion unit (8 inches) above the mean of 51.5 inches.
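A quick way to apply this rule is to express the difference from the mean in dispersion units. The sketch below uses the rainfall numbers from the text; the variable names and the simple 1 sigma cutoff are illustrative choices that mirror the rule just stated, not a formal significance test.

```python
mean_rainfall = 51.5  # inches, from the fit quoted above
sigma = 8.0           # inches, one dispersion unit
observed = 55.0       # inches, the year being questioned

# Distance from the mean, measured in dispersion units.
deviation_in_sigma = (observed - mean_rainfall) / sigma

# Rule from the text: less than 1 dispersion unit is not a significant difference.
beyond_one_sigma = abs(deviation_in_sigma) >= 1.0

print(f"deviation: {deviation_in_sigma:.2f} sigma")
print("beyond 1 sigma" if beyond_one_sigma else "within 1 sigma: not significant")
```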
A dispersion is then a measure of the statistical fluctuation around some mean quantity. You must have knowledge of this if you are to identify an event as being significant. Here is a simple guide:
Ages (in years) of a sample of 6 people in my store: 23, 33, 43, 25, 37, 51.
I find that the dispersion in this population is 23 years.
Now, a sample of 6 is not large enough to determine a dispersion accurately (as we will see later, minimum sample sizes are N = 25 to 30). If I thought that 6 did form a representative sample, then in the future I would expect to see people in my store covering the age range 35 +/- (2*23), since that represents 95% of the population according to my statistical sampling. This implies a range of ages from -11 to 81 years. The last time I looked, I didn't see any pre-born people in the store.
Hence, there are physical limits that you can apply to someone's estimate of the dispersion in a population to check that value.
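This kind of sanity check is easy to automate. The sketch below takes the mean and dispersion quoted for the store example, builds the +/- 2 sigma range, and tests it against one physical limit, namely that ages cannot be negative. The variable names are hypothetical.

```python
mean_age = 35.0  # years, as quoted in the store example
sigma = 23.0     # years, the quoted dispersion

# The +/- 2 sigma range should contain about 95% of the population.
low = mean_age - 2 * sigma
high = mean_age + 2 * sigma

# Physical limit: nobody has a negative age.
physically_plausible = low >= 0

print(f"2 sigma range: {low} to {high} years")
if not physically_plausible:
    print("Lower bound is below zero: the dispersion estimate is suspect.")
```

A lower bound below zero is the signal that the quoted dispersion cannot be taken at face value for this population.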
Following are plots showing normal distributions with differing dispersions, to give you a graphical feeling for how dispersion relates to the range of the data.
In many cases, there are physical limits which slightly skew the distribution. For instance, if I measure the size of bacteria in a test tube, I know there will be no bacteria of size zero. If I measure flood levels, there will be no such thing as a zero size flood. Those kinds of data tend to produce distributions like this:
Although it's not symmetrical, the same general principles of dispersion measurement can be applied with high accuracy.
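To get a feel for how well the 1 sigma rule holds up under mild skew, here is a small sketch that assumes a log-normal distribution as a stand-in for this kind of strictly positive, slightly skewed data. The choice of distribution and the shape parameter `s` are assumptions made for illustration, not something taken from the bacteria or flood data.

```python
import math

s = 0.25  # assumed log-normal shape parameter (mild skew); location mu = 0

# Exact mean and standard deviation of a log-normal with mu = 0.
mean = math.exp(s**2 / 2)
std = math.sqrt((math.exp(s**2) - 1) * math.exp(s**2))

def lognormal_cdf(x):
    """Cumulative probability of the assumed log-normal at x > 0."""
    return 0.5 * (1 + math.erf(math.log(x) / (s * math.sqrt(2))))

# Fraction of the distribution lying within +/- 1 dispersion unit of the mean.
fraction = lognormal_cdf(mean + std) - lognormal_cdf(mean - std)
print(f"fraction within 1 sigma: {fraction:.3f}")
```

Under these assumptions the fraction comes out close to the 68% expected for a symmetric normal distribution, which is the sense in which the same interpretation carries over.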