In the previous post, we went through feature selection for numerical values. In this post, we are going to look at feature selection for categorical values. And yes! The method is the "Chi-Square Test of Independence".
Chi Square - Formula
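For reference, the standard Pearson chi-square statistic can be written as below. Here O_ij is the observed count and E_ij is the expected count in row i, column j of a table with r rows and c columns. The symbol names are a notational choice made for this sketch.

```latex
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
```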
Degrees of freedom formula
Degrees of freedom = (number of rows - 1) × (number of columns - 1)
Chi Square - Explanation
- Let us state the null hypothesis and the alternate hypothesis.
- Null hypothesis - the two variables are independent.
- Alternate hypothesis - the two variables are not independent.
- Now calculate the degrees of freedom and the chi-square statistic.
- Look up the critical value for those degrees of freedom at the chosen confidence level (for example 95%, which is a significance level of 0.05).
- If the chi-square statistic is greater than the critical value, then reject the null hypothesis.
Note - Use a contingency table for better representation.
Example in detail
Below is the contingency table.
Next, we need to calculate the sum of each row and the sum of each column, like below.
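A minimal sketch of this step in Python. The 3x4 table of counts is made up for illustration and is not the table from this post; the names `observed`, `row_sums`, `col_sums` and `total_sum` are also just chosen for the sketch.

```python
# Hypothetical 3x4 contingency table of observed counts
# (made-up numbers, not the table from this post).
observed = [
    [20, 30, 25, 25],
    [30, 20, 25, 25],
    [25, 25, 30, 20],
]

# Sum across each row.
row_sums = [sum(row) for row in observed]

# zip(*observed) transposes the table, so each tuple is one column.
col_sums = [sum(col) for col in zip(*observed)]

# Grand total of all counts.
total_sum = sum(row_sums)

print("row sums:   ", row_sums)
print("column sums:", col_sums)
print("total sum:  ", total_sum)
```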
Then we need to calculate the expected value for each cell, using its row sum and column sum:
EXPECTED VALUE = (ROWSUM*COLUMNSUM)/TOTALSUM
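The expected value formula can be sketched in Python as follows. The table is again a made-up 3x4 example (not the one from this post), and the variable names are only illustrative.

```python
# Hypothetical observed counts (made-up numbers for illustration).
observed = [
    [20, 30, 25, 25],
    [30, 20, 25, 25],
    [25, 25, 30, 20],
]

row_sums = [sum(row) for row in observed]
col_sums = [sum(col) for col in zip(*observed)]
total_sum = sum(row_sums)

# Expected value of each cell = (row sum * column sum) / total sum.
expected = [[r * c / total_sum for c in col_sums] for r in row_sums]

for row in expected:
    print([round(value, 3) for value in row])
```

Each row of `expected` adds up to the same row sum as the observed table, which is a quick way to check the calculation.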
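Then we need to calculate the chi-square value for each cell, which is (observed - expected)² / expected. Please see below.
And finally the sum over all cells, which is the chi-square statistic, is below.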
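Here is a small sketch of the per-cell values and their sum, assuming the same made-up 3x4 table as a stand-in for the post's table. The names `contributions` and `chi_square` are chosen for this sketch only.

```python
# Hypothetical observed counts (made-up numbers for illustration).
observed = [
    [20, 30, 25, 25],
    [30, 20, 25, 25],
    [25, 25, 30, 20],
]

row_sums = [sum(row) for row in observed]
col_sums = [sum(col) for col in zip(*observed)]
total_sum = sum(row_sums)
expected = [[r * c / total_sum for c in col_sums] for r in row_sums]

# Per-cell chi-square value: (observed - expected)^2 / expected.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(o_row, e_row)]
    for o_row, e_row in zip(observed, expected)
]

# The chi-square statistic is the sum of all per-cell values.
chi_square = sum(sum(row) for row in contributions)

print(round(chi_square, 4))  # about 5.3393 for this made-up table
```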
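How do we find the critical value from the degrees of freedom? It is explained below.
Look at the chi-square distribution table. Here our degrees of freedom is 6 and the confidence level is 95% (0.95), which is a significance level of 0.05. See the picture below and we obtain the critical value, which is 12.592 for 6 degrees of freedom at the 0.05 level.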
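If no printed table is at hand, the critical value can also be approximated numerically. The sketch below uses only the Python standard library: it evaluates the chi-square CDF through the series expansion of the regularized lower incomplete gamma function and then searches for the critical value by bisection. The function names `chi2_cdf` and `chi2_critical_value` are made up for this sketch and are not from any library.

```python
import math


def chi2_cdf(x, df):
    """P(X <= x) for a chi-square variable with df degrees of freedom.

    Uses the series for the regularized lower incomplete gamma function
    P(a, z) with a = df / 2 and z = x / 2.
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a          # first term of the series
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)  # next term: multiply by z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_critical_value(df, confidence=0.95):
    """Value x where the CDF reaches `confidence`, found by bisection."""
    lo, hi = 0.0, float(df)
    while chi2_cdf(hi, df) < confidence:  # grow the bracket until it contains x
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < confidence:
            lo = mid
        else:
            hi = mid
    return hi


print(round(chi2_critical_value(6, 0.95), 3))  # about 12.592
```

If SciPy is installed, `scipy.stats.chi2.ppf(0.95, 6)` should give the same number with much less code.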
The Final calculation is explained below.
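Putting the steps together, the final comparison might look like the sketch below. The table is the same made-up 3x4 example (so the degrees of freedom come out as 6), the function name `chi_square_test` is hypothetical, and 12.592 is the standard table value for 6 degrees of freedom at the 0.05 significance level.

```python
def chi_square_test(observed, critical_value):
    """Return (statistic, degrees of freedom, reject_null) for a table of counts."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    total_sum = sum(row_sums)

    # Sum of (observed - expected)^2 / expected over every cell,
    # with expected = (row sum * column sum) / total sum.
    statistic = 0.0
    for row, r in zip(observed, row_sums):
        for o, c in zip(row, col_sums):
            e = r * c / total_sum
            statistic += (o - e) ** 2 / e

    # (number of rows - 1) * (number of columns - 1)
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df, statistic > critical_value


# Hypothetical observed counts (made-up numbers, not the post's table).
observed = [
    [20, 30, 25, 25],
    [30, 20, 25, 25],
    [25, 25, 30, 20],
]

CRITICAL_VALUE = 12.592  # table value for df = 6 at the 0.05 level

statistic, df, reject = chi_square_test(observed, CRITICAL_VALUE)
print("chi-square:", round(statistic, 4), "df:", df)
if reject:
    print("Reject the null hypothesis: the variables are not independent.")
else:
    print("Fail to reject the null hypothesis: no evidence against independence.")
```

For this made-up table the statistic is smaller than the critical value, so the null hypothesis is not rejected.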
Take some rest and come back for the next post. Such an interesting field. Loving this!