Chi-square Correlation Test for Nominal Data
#Statistics #Basics #Correlation
In this article, we discuss the chi-square correlation test, which detects whether two nominal series are correlated.
Steps
 Find all the possible values of the two nominal series A and B;
 Count the co-occurrences of the combinations (A, B);
 Calculate the expected co-occurrences of the combinations (A, B);
 Calculate chi-square;
 Determine whether the null hypothesis (that A and B are independent) can be rejected.
Define the Series
Suppose we are analyzing two series A and B. Series A can take values $a_1$ and $a_2$, while series B can take values $b_1$, $b_2$ and $b_3$.
$$ \begin{align} A &:= \{a_1, a_2\} \\ B &:= \{b_1, b_2, b_3\} \end{align} $$
As an example, we will use the following A and B series for our calculations in this article.
| index | A | B |
|---|---|---|
| 1 | a1 | b2 |
| 2 | a1 | b2 |
| 3 | a1 | b1 |
| 4 | a2 | b1 |
| 5 | a2 | b3 |
| 6 | a2 | b2 |
| 7 | a1 | b2 |
| 8 | a2 | b2 |
Count Co-occurrences
To analyze correlations between the two series, we need to look at whether the values of series A and those of series B tend to occur together. For example, we would like to know how the values of B are distributed given that $a_1$ has occurred.
Now we construct a contingency table to record the occurrences of the combinations (A, B).
|  | a1 | a2 |
|---|---|---|
| b1 | 1 | 1 |
| b2 | 3 | 2 |
| b3 | 0 | 1 |
where the cells are filled with the number of occurrences of the corresponding combinations. For example, the combination (a1, b1) occurred once, hence the 1 in the first row, first column.
This table records the observed frequencies. We denote it as table O and each cell as $o_{ij}$, where $i$ indexes the values of A and $j$ indexes the values of B.
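The counting step can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the variable names (`pair_counts`, `observed`) are chosen here for illustration and are not part of the original article.

```python
from collections import Counter

# Example series from the table above.
A = ["a1", "a1", "a1", "a2", "a2", "a2", "a1", "a2"]
B = ["b2", "b2", "b1", "b1", "b3", "b2", "b2", "b2"]

# Count how often each (A, B) pair occurs; missing pairs count as 0.
pair_counts = Counter(zip(A, B))

a_values = sorted(set(A))
b_values = sorted(set(B))

# observed[b][a] mirrors the contingency table: rows are b, columns are a.
observed = {b: {a: pair_counts[(a, b)] for a in a_values} for b in b_values}

for b in b_values:
    print(b, [observed[b][a] for a in a_values])
```

Each printed row should match the corresponding row of the contingency table.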
Pearson’s chi-square test compares the observed counts with the counts we would expect if A and B were independent.
First of all, we define an expectation table E. Each element of E is calculated as
$$ e_{ij} = \frac{ \text{number of } a_i \times \text{ number of } b_j }{ \text{ total number of rows in the original data table } } $$
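As a worked instance of this formula on the example data: $a_1$ occurs 4 times, $b_2$ occurs 5 times, and there are 8 rows, so the expected count for (a1, b2) is $4 \times 5 / 8 = 2.5$. The sketch below computes the whole table E the same way; the names `a_totals`, `b_totals`, and `expected` are illustrative choices.

```python
from collections import Counter

# Example series from the article.
A = ["a1", "a1", "a1", "a2", "a2", "a2", "a1", "a2"]
B = ["b2", "b2", "b1", "b1", "b3", "b2", "b2", "b2"]
n = len(A)  # total number of rows in the original data table

# Marginal counts: how often each value of A and of B occurs on its own.
a_totals = Counter(A)
b_totals = Counter(B)

# e_ij = (number of a_i) * (number of b_j) / n
expected = {
    (a, b): a_totals[a] * b_totals[b] / n
    for a in sorted(a_totals)
    for b in sorted(b_totals)
}

for pair, e in expected.items():
    print(pair, e)
```

Note that the expected counts sum to the number of rows, just like the observed counts.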
Now if we compare the observed table with the expected one,
$$ o_{ij} - e_{ij} $$
we get the deviation from the expected table. Squaring each deviation, dividing it by the expected count, and summing over all cells, we define
$$ \chi^2 = \sum_{i,j} \frac{ (o_{ij} - e_{ij})^2 } { e_{ij} } $$
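Putting the observed and expected counts together, the statistic for the example data can be computed with a short loop. This is a plain-Python sketch rather than a library routine; the variable names are illustrative.

```python
from collections import Counter

# Example series from the article.
A = ["a1", "a1", "a1", "a2", "a2", "a2", "a1", "a2"]
B = ["b2", "b2", "b1", "b1", "b3", "b2", "b2", "b2"]
n = len(A)

observed = Counter(zip(A, B))  # o_ij, zero for pairs that never occur
a_totals = Counter(A)
b_totals = Counter(B)

chi2 = 0.0
for a in a_totals:
    for b in b_totals:
        o = observed[(a, b)]
        e = a_totals[a] * b_totals[b] / n  # expected count e_ij
        chi2 += (o - e) ** 2 / e

print(round(chi2, 4))  # approximately 1.2 for this example
```

The cells for $b_1$ contribute nothing because observed and expected counts agree there; most of the total comes from the two $b_3$ cells.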
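How to Use the Chi-square Value
The final question is how to use the result. We usually have a threshold $\chi_0^2$, which depends on the chosen significance level and on the degrees of freedom, $(r-1)(c-1)$ for a contingency table with $r$ rows and $c$ columns. Whenever our calculated value is larger than this threshold, we reject the null hypothesis that the two series are independent, i.e., we conclude that they are correlated. The value of $\chi_0^2$ can be found in the chi-square distribution tables of statistics textbooks.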
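For the example in this article, the contingency table has 3 rows and 2 columns, so there are $(3-1)(2-1) = 2$ degrees of freedom, and the statistic evaluates to about 1.2. The sketch below makes the decision without a lookup table by using the fact that, for exactly 2 degrees of freedom, the chi-square survival function is $e^{-x/2}$. This shortcut is an assumption specific to this example; for other degrees of freedom a textbook table or a statistics library is needed. The helper names and the significance level of 0.05 are illustrative choices.

```python
import math

def critical_value_dof2(alpha):
    """Threshold chi_0^2 for 2 degrees of freedom only: solves exp(-x/2) = alpha."""
    return -2.0 * math.log(alpha)

def p_value_dof2(chi2):
    """P(X >= chi2) for a chi-square variable with 2 degrees of freedom only."""
    return math.exp(-chi2 / 2.0)

chi2 = 1.2    # statistic for the example data in this article
alpha = 0.05  # assumed significance level
dof = (3 - 1) * (2 - 1)  # (rows - 1) * (columns - 1)

threshold = critical_value_dof2(alpha)
reject_independence = chi2 > threshold

print(f"dof={dof}, threshold={threshold:.3f}, p-value={p_value_dof2(chi2):.3f}")
print("reject independence:", reject_independence)
```

Since 1.2 is well below the threshold of roughly 5.99, the example data give no grounds to reject independence. With only 8 observations and several expected counts below 5, the chi-square approximation is also not very reliable here, so the example is best read as an illustration of the mechanics.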
Other Methods
L Ma (2018). 'Chi-square Correlation Test for Nominal Data', Datumorphism, 11 April. Available at: https://datumorphism.leima.is/wiki/statistics/correlationanalysischisquare/.