The Chi-Square Goodness of Fit Statistic
 
 

    This statistic enables us to test whether observed frequencies are close to
     those we would expect.

    1. Expected frequencies

            These expectations can come from many sources. For example, if we
            have an honest die, we would expect each of the 6 possible faces
            to come up in 1/6 of all rolls.

So, for example, if we roll a die 60 times we would expect the following frequency (f) for each face (X):

X     f

1     10
2     10
3     10
4     10
5     10
6     10
Total 60
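The expected frequencies are simply the number of rolls multiplied by the probability of each face. A minimal sketch of that arithmetic follows; the function name `expected_frequencies` is an illustrative choice, not something from the original notes.

```python
from fractions import Fraction


def expected_frequencies(probabilities, n_trials):
    """Expected count for each category: probability times number of trials."""
    return [p * n_trials for p in probabilities]


# An honest die: each of the 6 faces has probability 1/6.
die_probs = [Fraction(1, 6)] * 6
fe = expected_frequencies(die_probs, 60)

print([int(f) for f in fe])  # [10, 10, 10, 10, 10, 10]
print(int(sum(fe)))          # 60
```

Using `Fraction` keeps 1/6 exact, so the expected counts come out as whole numbers with no rounding noise.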
 

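Or, as a null hypothesis we have:

$$H_0: p_1 = p_2 = p_3 = p_4 = p_5 = p_6 = \frac{1}{6}$$

where p_i is the probability that face i comes up.

The alternative hypothesis here is complicated! It says only that at least one face has a probability different from 1/6, without saying which face or by how much.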


If we actually roll a die 60 times we wouldn't expect each of the faces
to come up exactly 10 times, only "close to" 10 times. How many
times each face actually comes up is the observed frequency for that face. For example, let us say that we get the following results:
 
X     fo     fe

1     12     10
2      9     10
3     10     10
4     11     10
5      8     10
6     10     10
Total 60     60
 

Are these observed frequencies close to or far from what we would expect if all sides have an equal probability of coming up?
 
 

    In 1900, Karl Pearson discovered a statistic that can tell us the
       answer to this question:

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

where the sum runs over all categories and

fo = the observed frequency

fe = the expected frequency

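Although this is called a "goodness of fit" statistic, it actually measures "badness of fit": the larger its value, the further the observed frequencies are from the expected ones.

This statistic has C-1 degrees of freedom, where C is the number of categories (here, the number of sides that the die has).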
 
 

Thus, to calculate this statistic for the data above we do the following:

X     fo     fe     (fo-fe)     (fo-fe)^2     (fo-fe)^2/fe

1     12     10        2            4              0.4
2      9     10       -1            1              0.1
3     10     10        0            0              0.0
4     11     10        1            1              0.1
5      8     10       -2            4              0.4
6     10     10        0            0              0.0
Total 60     60                                    1.0
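The same column-by-column calculation can be sketched in a few lines of Python. The function name `chi_square` and its return shape are illustrative assumptions; the data are the observed and expected frequencies from the table.

```python
def chi_square(observed, expected):
    """Return Pearson's chi-square statistic and the per-category terms.

    Each term is (fo - fe)^2 / fe; the statistic is their sum.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    terms = [(fo - fe) ** 2 / fe for fo, fe in zip(observed, expected)]
    return sum(terms), terms


fo = [12, 9, 10, 11, 8, 10]   # observed frequencies for faces 1..6
fe = [10] * 6                 # expected frequencies for an honest die

stat, terms = chi_square(fo, fe)
df = len(fo) - 1              # C - 1 degrees of freedom

print([round(t, 1) for t in terms])  # the last column of the table
print(round(stat, 2), df)
```

Returning the individual terms alongside the total makes it easy to see which categories contribute most to the discrepancy.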
 
 
 
 

Is this a big discrepancy or a small one? To answer this question we look up a "critical number" in a chi-square table.
 
 

The d.f. for this statistic equals C - 1, or in our case, 5.  Looking
up the critical number for 5 d.f. gives us 11.07 if the type 1 error
rate is set to .05.

Thus, our sample chi-square value (1.0) does not fall in the
rejection region (which starts at 11.07), and we fail to reject
the null hypothesis above.  That null hypothesis says that
the die is an honest die.
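The table lookup and the decision rule can be sketched as follows. The dictionary holds the .05-level critical values for 1 to 5 degrees of freedom as they appear in standard chi-square tables; the names `CRITICAL_05` and `reject_null` are hypothetical, chosen only for this illustration.

```python
# Critical values of chi-square at a type 1 error rate of .05,
# copied from a standard chi-square table (d.f. 1 through 5 only).
CRITICAL_05 = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49, 5: 11.07}


def reject_null(chi2, df, table=CRITICAL_05):
    """Reject the null hypothesis when the statistic exceeds the critical value."""
    return chi2 > table[df]


# The die example: chi-square = 1.0 with 5 d.f.
print(reject_null(1.0, 5))  # False, so we fail to reject
```

A hard-coded table mirrors the paper lookup described in the text; a statistics library would normally compute the critical value for any d.f. and error rate.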
 


Another Example:

Assume for a moment that in the US, political affiliation is distributed in the following manner: Democrat = 52%, Republican = 40%, and Independent = 8%. Say you were interested in finding out if the political affiliation of UO students resembles that of the general population. In addition, you might be interested in finding out if there is a relationship between a person's political affiliation and their stance on abortion.

The one thing that both of these questions have in common is that they involve nominal variables. A person is either a Democrat, Republican, or Independent and is either pro-choice or pro-life. The data you generate from these studies are frequencies. In other words, you end up with counts of the number of people in your sample who are Democrats, etc. or are pro-choice, etc.

One-Way Chi-Square: Goodness-of-Fit Test

This procedure allows you to examine data regarding various categories of one variable. Because it is used to compare the extent to which observed frequencies fit an expected, theoretical frequency distribution, it is called the goodness-of-fit test.

Consider one of the questions posed earlier: the extent to which political affiliations of UO students are similar to those of the general population. Say we asked 10 UO students their political affiliation. If they were similar to the general population, we would expect about 5 of them (52%, or 5.2 before rounding) to be Democrats, 4 (40%) to be Republicans, and 1 (8%, or 0.8 before rounding) to be Independent. However, say we actually found 7 Democrats, 1 Republican, and 2 Independents.

             Democrat     Republican     Independent

Observed         7             1              2
Expected         5             4              1

The null hypothesis in this situation is that the observed differences in frequencies are due to chance and do not reflect a true difference in frequencies in the population.

The chi-square statistic is calculated using the previous formula:

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

where we do the sum over 3 categories, so

$$\chi^2 = \frac{(7-5)^2}{5} + \frac{(1-4)^2}{4} + \frac{(2-1)^2}{1} = 0.8 + 2.25 + 1 = 4.05$$

We have 3-1 = 2 degrees of freedom here, and from the chi-square table we see the critical value is 5.99 (at the .05 level). Since our value of 4.05 is not beyond the critical value, we fail to reject the null hypothesis. This sample gives no evidence that the political affiliation of UO students differs from that of the general population.
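As a cross-check on this example, the sketch below recomputes the statistic using the rounded expected counts from the text. It also computes a p-value using the fact that, for exactly 2 degrees of freedom, the upper-tail probability of the chi-square distribution is exp(-x/2). That shortcut is an addition for illustration and does not apply to other d.f.

```python
import math

observed = [7, 1, 2]   # Democrat, Republican, Independent
expected = [5, 4, 1]   # rounded expected counts, as used in the text

chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# Upper-tail probability for a chi-square variable with 2 d.f. only.
p_value = math.exp(-chi2 / 2)

print(round(chi2, 2))     # 4.05
print(round(p_value, 3))  # roughly 0.13, well above .05
print(chi2 > 5.99)        # False: not beyond the critical value
```

The p-value and the critical-value comparison are two views of the same decision: a p-value above .05 corresponds to a statistic below 5.99.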



One final example:

It was very dry in the 2001/2002 water year. Was that statistically significant?

Here is the data:

(Normals are based on 30-year means.)

October: Normal = 3.41 Actual = 3.06
November: Normal = 8.32 Actual = 1.61
December: Normal = 8.61 Actual = 3.98
January: Normal = 7.91 Actual = 1.46
February: Normal = 5.64 Actual = 1.69

Null hypothesis is that actual = normal.

Compute the term (actual - normal)^2 / normal for each month:

October: (3.06 - 3.41)^2 / 3.41 = 0.04

November: (1.61 - 8.32)^2 / 8.32 = 5.41

December: (3.98 - 8.61)^2 / 8.61 = 2.49

January: (1.46 - 7.91)^2 / 7.91 = 5.26

February: (1.69 - 5.64)^2 / 5.64 = 2.77

October contributes almost nothing to the signal, but every other month contributes substantially. The sum is 15.96 (computed from the unrounded terms).

The critical value for df = 5 (all the variables are independent) is 11.07, so we reject the null hypothesis. The result holds even at the 0.01 level, where the critical value for 5 d.f. is 15.09; that's how extreme the drought was.
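The monthly arithmetic is easy to get wrong by hand, so here is a minimal sketch that recomputes each term and the total from the figures listed above. The two critical values for 5 d.f. are taken from a standard chi-square table, and the variable names are illustrative.

```python
normal = {"October": 3.41, "November": 8.32, "December": 8.61,
          "January": 7.91, "February": 5.64}
actual = {"October": 3.06, "November": 1.61, "December": 3.98,
          "January": 1.46, "February": 1.69}

# One term per month: (actual - normal)^2 / normal
terms = {m: (actual[m] - normal[m]) ** 2 / normal[m] for m in normal}
chi2 = sum(terms.values())

for month, term in terms.items():
    print(f"{month}: {term:.2f}")
print(f"Sum: {chi2:.2f}")

# Critical values for 5 d.f. from a standard chi-square table.
CRITICAL_DF5 = {0.05: 11.07, 0.01: 15.09}
for alpha, critical in CRITICAL_DF5.items():
    print(f"alpha = {alpha}: reject = {chi2 > critical}")
```

The code follows the text in using 5 d.f.; it only checks the arithmetic and the table comparison.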
