The Chi-square test and Interpreting the Chi-squared test
SeungUk Lee · 2021. 5. 22. 12:46

1. The chi-square test for independence: it lets us not only describe the relationship between two categorical variables, but also assess objectively whether one variable really influences the other or whether they are independent.
1-1) Hypothesis test: the chi-square test is a hypothesis test. The null hypothesis is that the variables are independent; the alternative hypothesis is that they are dependent. Note that it is not possible to formulate a one-sided hypothesis here. If the variables were independent, we could calculate each expected joint frequency by multiplying the corresponding marginal frequencies and dividing by the overall sample size.
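The expected-frequency rule above can be sketched in a few lines of NumPy. The 2×3 table below is a hypothetical example, not data from the post:

```python
import numpy as np

# Hypothetical 2x3 contingency table of observed counts
observed = np.array([[20, 30, 10],
                     [30, 20, 40]])

row_totals = observed.sum(axis=1, keepdims=True)  # marginal frequencies per row
col_totals = observed.sum(axis=0, keepdims=True)  # marginal frequencies per column
n = observed.sum()                                # overall sample size

# Under independence: expected count = (row total * column total) / n
expected = row_totals * col_totals / n
print(expected)
```

Each expected cell is what we would see if row and column membership were unrelated; the chi-squared statistic then measures how far the observed table is from this one.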
1-2) The chi-square test: the statistic is computed as

χ² = Σᵢ (Oᵢ − Eᵢ)² / Eᵢ

For every cell i in the contingency table, we take the observed count minus the expected count, square the result, divide it by the expected count, and sum these values over all cells. The chi-squared statistic follows a chi-squared distribution. It is a distribution with only one parameter, the degrees of freedom, and that parameter completely determines the shape of the distribution.
The distribution is always positive, and it has a couple of interesting properties.
- The higher the degrees of freedom, the less skewed the distribution becomes.
- The distribution becomes wider as the degrees of freedom increase.
- And it moves to the right.
** Note that the degrees of freedom parameter equals the mean of the distribution.
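This property is easy to confirm with `scipy.stats.chi2`, whose `mean` method returns the analytical mean of the distribution:

```python
from scipy.stats import chi2

# The mean of a chi-squared distribution equals its degrees-of-freedom parameter
for df in (1, 5, 10):
    assert chi2.mean(df) == df
print("mean equals df for df = 1, 5, 10")
```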
Then, how do we calculate the right degrees of freedom? In a table with r rows and c columns, there are (r − 1) × (c − 1) degrees of freedom.
Once you find a p-value using software or a table, you can conduct the hypothesis test. If the p-value is below the significance level, you reject the null hypothesis; if it is not, you cannot reject the null hypothesis.
** The chi-squared statistic is better approximated by the chi-squared distribution as the sample size, that is, the total number of cases in the cross table, increases. As a rule of thumb, the expected count in each cell should be at least five.
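The whole procedure, statistic, degrees of freedom, p-value, and the expected-count check, can be run with `scipy.stats.chi2_contingency`. The table below is the same hypothetical example, not data from the post:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x3 table -> (2-1) * (3-1) = 2 degrees of freedom
observed = np.array([[20, 30, 10],
                     [30, 20, 40]])

stat, p_value, dof, expected = chi2_contingency(observed, correction=False)
print(f"chi-squared = {stat:.3f}, df = {dof}, p = {p_value:.4f}")

# The approximation is trustworthy only if every expected count is at least 5
assert (expected >= 5).all()

# Decision at the 5% significance level
if p_value < 0.05:
    print("Reject the null hypothesis: the variables appear dependent.")
else:
    print("Cannot reject the null hypothesis of independence.")
```

`correction=False` disables the Yates continuity correction, which scipy would otherwise apply to tables with one degree of freedom.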
2. Interpreting the Chi-squared test
2-1) The strength of the association between two variables: without correcting for the number of cells, the size of the chi-squared statistic cannot be interpreted by itself, and it tells us nothing about the effect size, that is, the strength of the association. Several indices have been created to express the strength of association between two nominal variables, and the most popular is Cramer's V:

V = sqrt( χ² / (n × min(r − 1, c − 1)) )
The value for Cramer's V ranges from zero to one, regardless of the size of a contingency table. The value of zero means that there is no association between the variables and the value of one means that there is a perfect association.
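Cramer's V is straightforward to compute from the chi-squared statistic; a minimal sketch, reusing the same hypothetical table:

```python
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v(observed):
    """Cramer's V: sqrt(chi2 / (n * min(r - 1, c - 1))), bounded in [0, 1]."""
    chi2_stat, _, _, _ = chi2_contingency(observed, correction=False)
    n = observed.sum()
    r, c = observed.shape
    return np.sqrt(chi2_stat / (n * min(r - 1, c - 1)))

observed = np.array([[20, 30, 10],
                     [30, 20, 40]])
print(cramers_v(observed))
```

Dividing by min(r − 1, c − 1) is what makes the index comparable across tables of different sizes.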
** Note that the less square the contingency table, that is, the more unequal the number of rows and columns, the larger the index tends to become, even without strong evidence of a meaningful association.
2-2) The pattern of the association: residual analysis helps us find out in which cells particularly high or low values are observed. For this purpose, the residuals per cell can be used: the difference between the observed and expected frequency in each cell. These residuals then have to be standardized.
The resulting standardized residuals approximately follow a z-distribution, so their values can be interpreted directly as the number of standard deviations the observed frequency lies away from the expected frequency.
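One common way to standardize, assumed here since the post does not give the formula, is the adjusted residual, which divides each raw residual by its estimated standard error so the result is approximately standard normal under independence:

```python
import numpy as np

# Same hypothetical contingency table as before
observed = np.array([[20, 30, 10],
                     [30, 20, 40]])

n = observed.sum()
row = observed.sum(axis=1, keepdims=True)
col = observed.sum(axis=0, keepdims=True)
expected = row * col / n

# Adjusted standardized residuals: approximately z-distributed under
# the null hypothesis of independence
residuals = (observed - expected) / np.sqrt(
    expected * (1 - row / n) * (1 - col / n)
)
print(np.round(residuals, 2))
```

Cells with residuals beyond roughly ±2 are the ones driving a significant chi-squared result.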
'데이터 사이언스 > 통계' 카테고리의 다른 글
The regression equation and the regression model (0) | 2021.05.26 |
---|---|
Fisher's exact test and linear regression(regression line) (0) | 2021.05.24 |
Chi-squared as goodness of fit and The side notes of the Chi-squared test (0) | 2021.05.23 |
Controlling other variables and Contingency table (0) | 2021.05.21 |
two dependent proportions and two dependent means (0) | 2021.05.20 |