When you realize you are looking at categorical data with more than one variable, the first thing to decide is whether to perform a test for independence or a test for homogeneity.
A χ2 test for independence is appropriate when we have one sample from a single population and record two categorical variables for each individual. Everyone in the sample is drawn from the same population.
A χ2 test for homogeneity is appropriate when we have two separate samples and want to determine whether the distribution of one categorical variable differs between their respective populations.
Once you determine which test is appropriate, the next step is to write your hypotheses. Regardless of the test, be sure to include context in your hypotheses, either by using meaningful subscripts or identifying the parameters of interest.
Independence Example
When writing a set of hypotheses for a chi-squared test for independence, your null hypothesis is that there is no association between the two categorical variables in your given population. Your alternative hypothesis is that there IS an association between the two categorical variables of interest.
For example, let’s say that we want to know whether favorite sport is associated with a student’s grade in an AP Statistics class. We could take a random sample of 100 students from our high school’s AP Statistics classes and ask each of them their favorite sport (football, basketball, or baseball) along with their letter grade for the class.
Our hypotheses would be as follows:
Ho: There is no association between sports preference and letter grade in AP Statistics for students at XYZ High School.
Ha: There is an association between sports preference and letter grade in AP Statistics for students at XYZ High School.
Since this problem involves one population (AP Statistics students at XYZ High School), this would require a test for independence.
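To see how the numbers behind this test might be worked out, here is a minimal Python sketch. The counts in the table are made up for illustration (they are not real data), and the function names are our own. Each expected count is (row total × column total) / grand total, and the test statistic adds up (observed − expected)² / expected over every cell.

```python
# Hypothetical counts for illustration only (not real data).
# Rows: favorite sport; columns: letter grade (A, B, C or below).
observed = {
    "football":   [12, 15, 8],
    "basketball": [14, 13, 6],
    "baseball":   [10, 14, 8],
}

def expected_counts(table):
    """Expected count per cell: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

def degrees_of_freedom(table):
    """(number of rows - 1) * (number of columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)

table = list(observed.values())
stat = chi_square_statistic(table)
df = degrees_of_freedom(table)
print(f"chi-square = {stat:.3f}, df = {df}")
```

With three sports and three grade categories, the table has (3 − 1)(3 − 1) = 4 degrees of freedom. A large statistic relative to that chi-square distribution would count as evidence against Ho.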
Homogeneity Example
When writing a set of hypotheses for a chi-squared test for homogeneity, your null hypothesis is that there is no difference in the distribution of the categorical variable between population 1 and population 2. The alternative hypothesis is that there is a difference in the distribution of the categorical variable between the two populations of interest.
For example, if we wanted to see whether the distribution of sports preference differs between AP Statistics students and AP Calculus students, we could take separate random samples of 100 Stats students and 100 Calculus students and determine if the distribution of football, baseball, or basketball preference differs between these two groups.
Our hypotheses would be as follows:
Ho: There is no difference in the distribution of sports preference between AP Statistics and AP Calculus students at XYZ High School.
Ha: There is a difference in the distribution of sports preference between AP Statistics and AP Calculus students at XYZ High School.
Since this problem involves two populations (AP Statistics students at XYZ High School and AP Calculus students at XYZ High School), this would require a test for homogeneity (we are looking to see if two populations are homogeneous in terms of sports preference).
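As a rough illustration of the homogeneity setup, the sketch below uses two made-up samples of 100 students each (hypothetical counts, not real data). The arithmetic is the same as for any two-way table. The p-value line relies on a shortcut that only holds when there are exactly 2 degrees of freedom, where the upper-tail area of the chi-square distribution equals exp(−x/2). For other tables you would use a chi-square table or calculator instead.

```python
import math

# Hypothetical counts for illustration only (not real data).
# Each list is a separate random sample of 100 students.
# Columns: football, basketball, baseball.
stats_sample = [40, 35, 25]
calc_sample = [30, 30, 40]

def chi_square_homogeneity(sample_1, sample_2):
    """Return (test statistic, degrees of freedom) for a two-sample table."""
    table = [sample_1, sample_2]
    row_totals = [sum(row) for row in table]
    col_totals = [a + b for a, b in zip(sample_1, sample_2)]
    grand_total = sum(row_totals)
    stat = 0.0
    for row, r in zip(table, row_totals):
        for o, c in zip(row, col_totals):
            e = r * c / grand_total  # expected count for this cell
            stat += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df

stat, df = chi_square_homogeneity(stats_sample, calc_sample)

# Shortcut valid only for df == 2: P(chi-square > x) = exp(-x / 2).
p_value = math.exp(-stat / 2) if df == 2 else None
print(f"chi-square = {stat:.3f}, df = {df}, p-value = {p_value}")
```

Notice that the row totals (100 and 100) are fixed by the design of the study. That is the practical difference from the independence setting, where only the overall sample size is fixed in advance.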
A test for homogeneity is also used in a randomized experiment, since randomly assigning treatments creates two “populations.” For instance, one group of people receives a new drug treatment and the other group receives a placebo 💉.
Chi-squared tests require two familiar conditions for inference:
For our test for independence, we need to verify that our data was collected using a simple random sample.
For our test for homogeneity, we need to verify that our data was collected using a stratified random sample (an independent random sample from each population) or that treatments were randomly assigned (experimental design).
Also, when sampling without replacement, we should check the 10% condition for independence.
For our large counts condition, we need to verify that all of our expected counts are at least 5 (similar to other chi-square tests).
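The two numeric checks above can be sketched in a few lines of Python. The function names, the expected-count tables, and the population sizes here are all hypothetical, chosen only to show one passing and one failing case for each condition.

```python
def large_counts_ok(expected, minimum=5):
    """True when every expected count is at least `minimum`."""
    return all(e >= minimum for row in expected for e in row)

def ten_percent_ok(sample_size, population_size):
    """True when the sample is no more than 10% of the population."""
    return 10 * sample_size <= population_size

# Hypothetical expected counts (not real data).
expected_pass = [[35.0, 32.5, 32.5], [35.0, 32.5, 32.5]]
expected_fail = [[4.2, 10.8], [9.8, 25.2]]  # 4.2 is below 5

print(large_counts_ok(expected_pass))
print(large_counts_ok(expected_fail))

# Hypothetical population sizes for a sample of 100.
print(ten_percent_ok(100, 1500))
print(ten_percent_ok(100, 400))
```

Remember that the large counts check uses the expected counts, not the observed counts, and that the 10% check only applies when sampling without replacement.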
🎥 Watch: AP Stats Unit 8 - Chi Squared Tests