Unsupervised ML- Cluster Analysis
When the data has no target variable, only independent variables, we use unsupervised learning such as cluster analysis.
Cluster Analysis.
- Unsupervised learning technique
- Groups similar data points together
- A cluster is a collection of items that are similar to one another and dissimilar to the items belonging to other clusters.
- Variance within a cluster should be low, and variance between clusters should be high.
- We usually use cluster analysis to find hidden patterns in the data.
- We do this analysis when we don’t have a target variable.
Clustering Techniques
Example - Amazon Suggestion
To suggest products, Amazon groups its existing customers into clusters with similar buying habits.
When a new customer comes in, suggestions are generated automatically from the existing cluster that the customer best matches.
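The idea of matching a new customer to an existing cluster can be sketched as a nearest-centroid lookup. This is only an illustration, not how Amazon actually does it: the cluster names, the two features, the centroid values and the product lists are all made up for this sketch.

```python
import math

# Hypothetical cluster centroids for existing customers, described by two
# made-up features: (average order value, orders per month).
centroids = {
    "bargain_hunters": (15.0, 6.0),
    "occasional_big_spenders": (120.0, 1.0),
}

# Hypothetical products that are popular within each cluster.
popular_products = {
    "bargain_hunters": ["phone case", "USB cable"],
    "occasional_big_spenders": ["laptop", "noise-cancelling headphones"],
}

def nearest_cluster(customer, centroids):
    """Return the name of the cluster whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda name: math.dist(customer, centroids[name]))

def suggest(customer):
    """Suggest the products popular in the new customer's nearest cluster."""
    return popular_products[nearest_cluster(customer, centroids)]

new_customer = (110.0, 2.0)
print(nearest_cluster(new_customer, centroids))
print(suggest(new_customer))
```

In practice the features would normally be scaled before distances are compared, since here the order value dominates the distance.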
Different types of clustering methods
- Distance based
- Hierarchical
- Partitioning
- Probabilistic
Distance based - Ex: K Means algorithm, which works well on many datasets.
We go for K Means.
For a particular dataset, how many clusters need to be built? K Means requires us to define the number of clusters up front. The choice depends on
- Domain
- Business understanding
- Statistical Method
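- Ex. of statistical methods: Elbow Method, Dendrogram
Elbow:
Considers the percentage of variance explained as a function of the number of clusters. The "elbow" is the point after which adding another cluster gives only a small additional gain.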
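A minimal sketch of the elbow idea, assuming synthetic 2-D data with three well-separated groups. The helper `kmeans_wcss`, the restart count and the data are assumptions of this sketch, not part of the notes. The percentage of variance explained is computed here as 1 minus the within-cluster sum of squares divided by the total sum of squares.

```python
import math
import random

def kmeans_wcss(points, k, rng, n_restarts=10, max_iter=100):
    """Run a basic k-means several times; return the lowest within-cluster sum of squares."""
    best = math.inf
    for _ in range(n_restarts):
        centroids = rng.sample(points, k)  # start from k distinct data points
        for _ in range(max_iter):
            # Assign every point to its nearest centroid.
            groups = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
                groups[nearest].append(p)
            # Recompute centroids; an empty cluster keeps its old centroid.
            new_centroids = [
                tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[j]
                for j, g in enumerate(groups)
            ]
            if new_centroids == centroids:
                break
            centroids = new_centroids
        wcss = sum(
            math.dist(p, centroids[j]) ** 2
            for j, g in enumerate(groups)
            for p in g
        )
        best = min(best, wcss)
    return best

# Synthetic data (an assumption of this sketch): three blobs of 30 points each.
rng = random.Random(0)
centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
points = [
    (cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
    for cx, cy in centers
    for _ in range(30)
]

# With k = 1 the within-cluster sum of squares equals the total sum of squares.
tss = kmeans_wcss(points, 1, rng)
explained = {k: 100 * (1 - kmeans_wcss(points, k, rng) / tss) for k in range(1, 7)}
for k, pct in explained.items():
    print(f"k={k}: {pct:.1f}% of variance explained")
```

Plotting `explained` against k and looking for the bend in the curve gives the suggested number of clusters. On data like this the gain should flatten after k = 3.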
K Means clustering: steps to build K Means clustering
1. Specify the desired number of clusters k
2. Randomly assign each data point to a cluster
3. Calculate the centroid of every cluster
4. Reassign each point to the closest cluster centroid (grouping based on minimum distance)
5. Recalculate the cluster centroids
6. Repeat steps 4 and 5 until the assignment of data points to clusters no longer changes
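The steps above can be sketched in plain Python as follows. This is a minimal illustration rather than a production implementation. The function names, the shuffled round-robin used for the random assignment and the handling of empty clusters are assumptions of this sketch.

```python
import math
import random

def centroid(cluster_points):
    """Mean of a non-empty list of equal-length tuples."""
    return tuple(sum(coords) / len(cluster_points) for coords in zip(*cluster_points))

def kmeans(points, k, seed=0, max_iter=100):
    """Cluster points into k groups; return (labels, centroids)."""
    # Step 1: k is supplied by the caller.
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Step 2: randomly assign each data point to a cluster. The shuffled
    # round-robin is an assumption that keeps every cluster non-empty at the start.
    order = list(range(len(points)))
    rng.shuffle(order)
    labels = [0] * len(points)
    for position, idx in enumerate(order):
        labels[idx] = position % k
    centroids = None
    for _ in range(max_iter):
        # Steps 3 and 5: (re)calculate the centroid of every cluster.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Assumption: a cluster that became empty keeps its previous centroid.
            new_centroids.append(centroid(members) if members else centroids[j])
        centroids = new_centroids
        # Step 4: reassign each point to the closest centroid (minimum distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Step 6: stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

The cluster numbers themselves are arbitrary and depend on the random start. What matters is which points end up sharing a label.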
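Before K Means - Data preprocessing: data scaling is important when the variables are continuous, because K Means groups points by distance and variables measured on larger scales would otherwise dominate.
Feature engineering also needs to be performed.
Both have to be done before K Means clustering.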
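As an illustration of the scaling step, here is a minimal sketch of z-score standardization, which is one common choice (min-max scaling is another). The `standardize` helper and the customer data are hypothetical.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Rescale every column to mean 0 and standard deviation 1 (z-scores)."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Assumption: a constant column (standard deviation 0) is only centred,
    # to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stds))
        for row in rows
    ]

# Hypothetical customers: (age in years, annual income in dollars).
customers = [(25, 30_000), (40, 85_000), (58, 52_000), (33, 120_000)]
scaled = standardize(customers)
for row in scaled:
    print(tuple(round(v, 2) for v in row))
```

Without this step, the income column (tens of thousands) would swamp the age column (tens) in any Euclidean distance. After scaling, both contribute on a comparable footing.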
K Means Clustering - Applications
- Data mining
- Information retrieval
- Text mining
- Web analysis
- Marketing
- Medical diagnostics