KNN: K-Nearest Neighbours
- Can be used for both classification and regression.
- It predicts a test case from its nearest neighbours in the training data.
- It slows down as the data grows, because every prediction scans the stored records.
- It is a kind of lazy learning, which means it never builds a model from the training data up front; it simply stores the records and defers all work to prediction time.
- Local heuristic: predictions depend only on the points nearest to the test case.
- KNN is time consuming at prediction time, but often competitive in accuracy with other algorithms.
Detailed Explanation
- Pick the number of neighbours (k) you want to use for classification or regression.
- Choose a method for measuring distance.
- Keep the data set with all its records.
- For every new point, find the k nearest neighbours using the chosen distance method.
- Let them vote if it is classification, or take a mean/median for regression.
- The distance score of each stored record to the new point is what ranks the neighbours.
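The steps above can be sketched as a short function. The points, labels, and the choice of Euclidean distance here are made-up illustrations, not from any particular data set:

```python
# Minimal KNN classification sketch: Euclidean distance + majority vote.
import math
from collections import Counter

def knn_classify(train_X, train_y, test_point, k=3):
    # 1) measure the distance from the test point to every stored record
    dists = [(math.dist(test_point, x), y) for x, y in zip(train_X, train_y)]
    # 2) keep the k nearest neighbours
    nearest = sorted(dists, key=lambda d: d[0])[:k]
    # 3) let them vote
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_y = ["yes", "yes", "yes", "no", "no", "no"]
print(knn_classify(train_X, train_y, (2, 2), k=3))  # the three nearest are all "yes"
```

Note there is no training step: the function just stores and scans the records, which is the lazy-learning behaviour described above.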
KNN Calculation
- For each training record, it computes distance(test, train) = sqrt((x1-x2)^2 + (y1-y2)^2). With 100 training observations, it computes 100 distance scores for each test case.
- Finally it takes the k minimum distance scores (say three, for k = 3) and checks the majority label among them, yes or no; if the majority is yes, it assigns yes to the test case. This is classification.
- For regression, take the mean/median of those neighbours' target values instead.
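The calculation above, in the regression variant, looks like this sketch; the training pairs and the test point are invented for illustration:

```python
# One distance score per training record, then the mean of the
# k nearest targets as the regression prediction.
import math

train = [((1.0, 2.0), 10.0), ((2.0, 2.0), 12.0),
         ((8.0, 9.0), 50.0), ((9.0, 9.0), 55.0)]
test = (1.5, 2.0)

# distance score for every training record, sorted smallest first
scores = sorted((math.dist(test, x), y) for x, y in train)
k = 3
nearest = scores[:k]
prediction = sum(y for _, y in nearest) / k  # mean of the k nearest targets
```

Swapping `sum(...) / k` for a median gives the median variant mentioned above.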
If we are building a KNN model for classification, k should be odd. If we give an even value, for example k = 2, we are asking KNN to choose two nearest neighbours; if one of them is yes and the other is no, the vote is tied and the model gets confused. That is why we give odd values for k: with k = 3, one class must hold the majority.
k = 1 means select the single nearest neighbour; that is how it works. Formulas for the distance methods follow.
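The tie problem with an even k can be seen directly by counting votes; the labels here are hypothetical:

```python
# With k=2 the two nearest neighbours can split the vote; adding a
# third neighbour (k=3) forces a majority.
from collections import Counter

labels_k2 = ["yes", "no"]          # hypothetical two nearest neighbours
labels_k3 = ["yes", "no", "yes"]   # one more neighbour, k=3

counts_k2 = Counter(labels_k2).most_common()   # [('yes', 1), ('no', 1)] -> tie
winner_k3 = Counter(labels_k3).most_common(1)[0][0]  # 'yes' wins the vote
```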
- Popular Distance Methods
- Euclidean
- Manhattan
- Weighted Distance
If the data is continuous, go for Euclidean. If the data is a mix of categorical and numerical, go for Manhattan.
If you want to assign weightage to a few independent variables, go for Weighted Distance.
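The three distance methods listed above can be computed side by side; the points and the per-feature weights are made-up illustration values:

```python
# Euclidean, Manhattan, and weighted Euclidean distance between two points.
import math

a, b = (1.0, 4.0, 2.0), (3.0, 1.0, 2.0)

euclidean = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
manhattan = sum(abs(x - y) for x, y in zip(a, b))

weights = (2.0, 1.0, 0.5)  # hypothetical weightage per independent variable
weighted = math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))
```

Raising a weight above 1 makes differences in that variable count for more when ranking neighbours.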
With continuous variables we can calculate distance using the formula directly, but how do we calculate distance when it comes to a categorical variable? Like logistic regression, KNN creates dummies for the categorical variable and then calculates distance on those 0/1 columns.
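A rough sketch of that dummy-variable idea, with an invented colour column mixed with a numeric one:

```python
# Expand a categorical column into 0/1 indicator (dummy) columns,
# then measure distance on the combined numeric vector.
import math

def to_dummies(value, categories):
    # one indicator column per category, as in dummy-variable encoding
    return [1.0 if value == c else 0.0 for c in categories]

colours = ["red", "green", "blue"]
row_a = [5.0] + to_dummies("red", colours)   # numeric feature + dummies
row_b = [7.0] + to_dummies("blue", colours)

distance = math.dist(row_a, row_b)
```

Two rows sharing the same category get 0 from the dummy columns; differing categories contribute a fixed amount to the distance.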
Collaborative Filtering
In the image above (a user-movie rating table, not reproduced here),
KNN computes the distance between the two users' rating vectors, and if it finds the distance low, it recommends the movies watched by one person to the other.
This is how the recommendations of e-commerce and other e-platforms work.
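That recommendation idea can be sketched in a few lines; the users, movies, ratings, and the "low distance" threshold are all invented for illustration:

```python
# If two users' rating vectors are close, recommend titles one has
# watched that the other has not.
import math

ratings = {                      # one rating per movie, same movie order
    "asha": [5.0, 4.0, 0.0],
    "ben":  [5.0, 5.0, 4.0],
}
watched = {"asha": {"Movie A", "Movie B"},
           "ben":  {"Movie A", "Movie B", "Movie C"}}

dist = math.dist(ratings["asha"], ratings["ben"])
if dist < 5.0:                   # hypothetical "low distance" threshold
    recs = watched["ben"] - watched["asha"]   # titles only ben has seen
```

Real systems use far larger matrices and refinements such as cosine similarity, but the neighbour-then-recommend structure is the same.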