Chi Square Test - is a test measures the probability that the difference between a table of observed values and a table of expected values could have occurred by chance. If the differences are large enough they are said to be significant. Significance levels are typically 5% and 1% depending on the application.
This test is used heavily in the social sciences to look at significant relationships between factors. In the table, one variable is used to categoriize the data into rows and another variable is used to categorize the data into columns. The result is a two dimensional table.
One key element of this test is to determine the expected values for each cell of the table. This is done by tabulating row totals and column totals for the data. Suppose we wanted to know the expected value for cell D3 of a table, the computation would be as follows:
(Total for row 3) * (Total for column D) / (Grand Total for the Table)
or
(118)*(98) / 324 = 35.6913
Excel looks for the user to have two tables included in the spreadsheet. The first includes the actual or observed values the second includes the expected (as calculated above). The Chi-Square Test command is as follows:
=CHITEST(actual data range, expected data range)
Example - Students were given a pretest and then enrolled is courses. The test results were categorized as low, medium, and high. The course performances were categorized as poor, fair, and good. Find the data in the worksheet provided. Test for a significant relationship between pretest results and performance in the course.
The actual and expected tables are in the worksheet. Note that all the calculations for the expected values can be done by Excel. Finally the Chi-Square test was performed on the data. The command loaded in C14 was:
=CHITEST(d3:f5, d9:f11)
Notice that only the table values were used for the test (column or row totals were not included).
The result of 5.44 E-07 translates into 0.000000544, which is much less than 1%. Therefore, the null hypothesis that these two variables (pretest results and course performance) are independent of each other must be rejected. We accept the alternative that there is a significant relationship between the two variables.