Cluster Analysis

Unsupervised ML- Cluster Analysis

When we don't have a target variable and only have independent variables, we use cluster analysis.

Cluster Analysis.

Unsupervised Learning Technique
Grouping similar data
A cluster is a collection of items that are similar to one another and dissimilar to the items belonging to other clusters.
Variance within a cluster should be low, and variance between clusters should be high.
We usually use cluster analysis to find hidden patterns in the data.
We do this analysis when we don't have a target variable.

Clustering Techniques

Example - Amazon Suggestion

Amazon suggests products by grouping its existing customers into clusters with similar habits.

When a new customer comes in, suggestions are generated automatically based on the existing cluster that the customer falls into.
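The idea of placing a new customer into an existing cluster can be sketched as a nearest-centroid lookup. This is only an illustration, not a description of Amazon's actual system: the cluster names, the two features (orders per month, average basket value), the centroid values, and the suggestion lists are all made up for the example.

```python
import math

# Hypothetical centroids of existing customer clusters:
# (orders per month, average basket value). Values are invented.
cluster_centroids = {
    "occasional_small": (1.0, 20.0),
    "frequent_small": (8.0, 25.0),
    "occasional_large": (1.5, 150.0),
}

# Hypothetical suggestions attached to each cluster.
cluster_suggestions = {
    "occasional_small": ["phone case", "paperback"],
    "frequent_small": ["coffee pods", "snack box"],
    "occasional_large": ["laptop", "office chair"],
}


def nearest_cluster(customer, centroids):
    """Return the name of the cluster whose centroid is closest (Euclidean)."""
    return min(centroids, key=lambda name: math.dist(customer, centroids[name]))


new_customer = (7.0, 30.0)
cluster = nearest_cluster(new_customer, cluster_centroids)
print(cluster, cluster_suggestions[cluster])
```

In this sketch the features are left unscaled, so the basket value dominates the distance. With real data the features would normally be scaled first.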

Different types of clustering methods

Distance based
Hierarchical
Partitioning
Probabilistic

Distance based - Ex: K Means algorithm, which works well on many kinds of datasets.

We go for K Means here.

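For a particular dataset, how many clusters need to be built? The number of clusters has to be defined before running K Means.

One way to choose it considers the percentage of variance explained as a function of the number of clusters (commonly known as the elbow method): pick the point beyond which adding more clusters explains little additional variance.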
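The variance-versus-number-of-clusters idea can be sketched in plain Python. To keep the example short and exact, it does not run K Means. It uses small made-up one-dimensional data and tries every way of cutting the sorted values into k contiguous groups, keeping the split with the lowest within-cluster sum of squares. The data values and function names are assumptions for illustration only.

```python
from itertools import combinations


def sum_of_squares(values):
    """Sum of squared deviations from the mean of the values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_within_ss(sorted_values, k):
    """Lowest within-cluster sum of squares over all splits into k contiguous groups."""
    n = len(sorted_values)
    best = float("inf")
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        total = sum(
            sum_of_squares(sorted_values[a:b]) for a, b in zip(bounds, bounds[1:])
        )
        best = min(best, total)
    return best


# Made-up data with three visible groups.
data = sorted([1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.1, 9.0, 8.7])
total_ss = sum_of_squares(data)

# Share of total variance explained by the clustering, for k = 1..5.
explained = {k: 1 - best_within_ss(data, k) / total_ss for k in range(1, 6)}
for k, share in explained.items():
    print(f"k={k}: {share:.1%} of variance explained")
```

The explained share rises steeply up to k = 3 and then flattens, which is the "elbow" that suggests three clusters for this data.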

K Means clustering: 5 steps to build K Means clustering

1. Specify the desired number of clusters k.
2. Randomly assign each data point to a cluster.
3. Calculate the centroid of every cluster.
4. Reassign each point to the closest cluster centroid (grouping based on minimum distance).
5. Recalculate the cluster centroids.
Repeat steps 4 and 5 until the assignment of data points to clusters no longer changes.
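The steps above can be sketched in plain Python as follows. This is a minimal illustration, not a production implementation: the function names, the sample points, the iteration cap, and the handling of an empty cluster (re-seeding it with a random point) are assumptions added for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 2: random initial assignment (balanced so no cluster starts empty).
    labels = [i % k for i in range(len(points))]
    rng.shuffle(labels)
    for _ in range(max_iter):
        # Steps 3 and 5: compute the centroid of every cluster.
        centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            # Assumption: an empty cluster is re-seeded with a random point.
            centroids.append(centroid(members) if members else rng.choice(points))
        # Step 4: reassign each point to the closest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Stop when the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Step 1: choose k. The points are made up, with two visible groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
labels, centroids = k_means(points, k=2)
print(labels)
print(centroids)
```

Which group receives label 0 and which receives label 1 depends on the random initial assignment. Only the grouping itself is meaningful.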

Before K Means - Data preprocessing: data scaling is important if the data contains continuous variables, because K Means groups points by distance.

Feature engineering also needs to be performed.

Both have to be done before K Means clustering.
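One common way to scale continuous variables is z-score standardization, sketched below with the standard library only. The column names and values are invented for illustration, and standardization is just one possible scaling choice. The notes do not say which method is intended.

```python
from statistics import mean, pstdev


def standardize(column):
    """Rescale a column to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma for x in column]


# Made-up data: without scaling, income would dominate any distance
# calculation simply because its numbers are much larger than ages.
ages = [25, 32, 47, 51, 62]
incomes = [30000, 42000, 58000, 61000, 90000]

scaled_points = list(zip(standardize(ages), standardize(incomes)))
for point in scaled_points:
    print(point)
```

After scaling, both variables contribute on a comparable footing to the distances that K Means uses.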

K Means Clustering - Applications

Data mining
Information retrieval
Text mining
Web analysis
Marketing
Medical diagnostics