07. K Means Clustering

We input some unlabeled data, and the unsupervised learning algorithm returns possible clusters of the data.

This means we have data that contains only features (no labels), and we want to see whether there are patterns in the data that would allow us to create groups or clusters.

In short, we have unlabeled data and attempt to discover possible labels through clustering.

K Means Clustering

K Means clustering is an unsupervised learning algorithm that attempts to group similar data points together into clusters.

Typical clustering problems include:

  • Cluster similar documents
  • Cluster customers based on features
  • Market segmentation
  • Identify similar physical groups

The overall goal is to divide data into distinct groups such that observations within each group are similar.
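One common way to make "similar" precise, and the quantity the K Means steps work to reduce, is the within-cluster sum of squared Euclidean distances. The symbols below ($x_i$ for a data point, $C_k$ for the set of points in cluster $k$, $\mu_k$ for that cluster's centroid) are notation introduced here for illustration:

```latex
\min_{C_1,\dots,C_K} \; \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
```

A smaller value means observations sit closer to the centroid of their own cluster, i.e. the groups are tighter.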

The K Means Algorithm

  • Choose a number of clusters 'K'
  • Randomly assign each point to a cluster
  • Until clusters stop changing, repeat the following:
      • For each cluster, compute the cluster centroid by taking the mean vector of the points in the cluster
      • Assign each data point to the cluster whose centroid is closest
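The steps above can be sketched in plain Python as follows. This is a minimal, illustrative sketch rather than a library implementation: the names `k_means`, `squared_distance` and `mean_vector` are hypothetical, distance is assumed to be squared Euclidean, and re-seeding an empty cluster with a random data point is an added assumption that the steps above do not cover.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(data: List[Point], k: int, seed: int = 0,
            max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Return (cluster label per point, centroid per cluster)."""
    rng = random.Random(seed)
    # Randomly assign each point to one of the k clusters.
    labels = [rng.randrange(k) for _ in data]
    centroids: List[Point] = []
    for _ in range(max_iter):
        # Centroid of each cluster = mean vector of its points.
        centroids = []
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            # Assumption: an empty cluster is re-seeded with a random data point.
            centroids.append(mean_vector(members) if members else rng.choice(data))
        # Assign each point to the cluster whose centroid is closest.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in data
        ]
        if new_labels == labels:  # clusters stopped changing
            break
        labels = new_labels
    return labels, centroids


data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels, centroids = k_means(data, k=2, seed=42)
print(labels)
print(centroids)
```

Because the starting assignment is random, a different seed can give a different clustering on data whose groups are less clearly separated; in practice K Means is usually run from several starting points and the result with the lowest within-cluster sum of squares is kept.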

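  • There is no easy answer for choosing the best 'K' value.

  • One common approach is the elbow method: run K Means for a range of K values, compute the sum of squared errors (SSE), i.e. the sum of squared distances from each point to its cluster centroid, and plot SSE against K. SSE tends to fall as K grows (it reaches 0 when every point is its own cluster), so pick the K at which the curve bends like an elbow, after which adding more clusters barely reduces SSE.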

[Figure: elbow method plot]
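The elbow method described above can be sketched by running K Means for K = 1 to 5 and recording the SSE (sum of squared distances from each point to its closest centroid) for each K. This sketch makes its own assumptions: the helper names `kmeans_sse` and `farthest_first` are hypothetical, the nine toy points are made up for illustration, and centroids are initialized with a deterministic farthest-first rule (start from the first point, then repeatedly add the point farthest from the centroids chosen so far) instead of a random assignment, so the output is reproducible.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(data: List[Point], k: int) -> List[Point]:
    """Assumed initialization: first point, then repeatedly the farthest point."""
    centroids = [data[0]]
    while len(centroids) < k:
        centroids.append(max(data, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def kmeans_sse(data: List[Point], k: int, max_iter: int = 100) -> float:
    """Run basic K Means and return the within-cluster sum of squared errors."""
    centroids = farthest_first(data, k)
    for _ in range(max_iter):
        # Assign each point to the cluster with the closest centroid.
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in data:
            clusters[min(range(k), key=lambda c: sq_dist(p, centroids[c]))].append(p)
        # Move each centroid to the mean of its points (keep it if the cluster is empty).
        new_centroids = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # centroids stopped moving, assignments are stable
            break
        centroids = new_centroids
    # SSE: squared distance from every point to its closest (own) centroid.
    return sum(min(sq_dist(p, c) for c in centroids) for p in data)


points: List[Point] = [
    (1.0, 1.0), (1.0, 2.0), (2.0, 1.0),        # group near (1, 1)
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),  # group near (10, 10)
    (1.0, 10.0), (1.0, 11.0), (2.0, 10.0),     # group near (1, 10)
]
sse_by_k = {k: kmeans_sse(points, k) for k in range(1, 6)}
for k, sse in sse_by_k.items():
    print(k, round(sse, 2))
```

On this toy data the SSE comes out to roughly 328, 125.5, 4.0, 3.17 and 2.33 for K = 1 to 5: large drops up to K = 3, then only small gains, so the elbow sits at K = 3. Farthest-first initialization is sensitive to outliers; randomized schemes such as k-means++ are a common alternative.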

In practice, the choice of K often depends on the context of the problem and on domain knowledge.

K Means Example 1

Note: the computeCost() function is not available in K Means; this needs to be checked again.