machine learning hands on clustering

MACHINE LEARNING Clustering

Upload: dragoscrintea

Post on 18-Jul-2015


Category:

Data & Analytics



TRANSCRIPT

Page 1: Machine learning hands on clustering

MACHINE LEARNING Clustering

Page 2: Machine learning hands on clustering

WHAT’S ON THE MENU - RECOMMENDATIONS

1. Why so popular

2. Supervised vs Unsupervised Learning

3. Topic2

4. Topic3

5. Topic4

6. Wrap-up

Page 3: Machine learning hands on clustering

MACHINE LEARNING

http://videolectures.net/Top/Computer_Science/Machine_Learning/

Page 4: Machine learning hands on clustering

WHY IS MACHINE LEARNING (CS 229) THE MOST POPULAR COURSE AT STANFORD? - ANDREW NG

Page 5: Machine learning hands on clustering

WHAT CAN YOU TELL ME ABOUT X?

Supervised vs unsupervised learning

Supervised learning

Typical methods: regression and classification

Given an object with an observed set of features X1, …, Xn and a response Y, the goal is to predict Y using X1, …, Xn.

Unsupervised learning

Typical methods: principal component analysis (PCA), expectation maximization (EM) and clustering (k-means and its variations)

Given an object with an observed set of features X1, …, Xn, the goal is to discover relationships or groups among variables or observations. Clustering algorithms try to find natural groupings in the data, so that similar observations end up in the same group.

Page 6: Machine learning hands on clustering

APPLICATIONS

Market segmentation: given market research results, how can you find the best customer segments?

Anomaly detection: find fraud, detect network attacks, or discover problems in servers or other sensor-equipped machinery. It is important to be able to find new types of anomalies that have never been seen before.

Healthcare: assigning areas to hospitals based on how accident-prone each area is; gene clustering

Page 7: Machine learning hands on clustering

GROUPING UNLABELED ITEMS USING K-MEANS CLUSTERING

SWAT

Strengths:

• Will always converge

• Scales well

Weaknesses:

• Can converge at local minima

• Slow on very large datasets

• Sensitive to choosing the wrong k

Advantages:

• Easy to implement

Page 9: Machine learning hands on clustering

SIMILARITY

There are several ways to measure similarity between observations.

Manhattan distance

Euclidean distance

Cosine distance
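A minimal sketch of the three measures listed above, in plain Python. The function names are illustrative and not taken from any library; the cosine version assumes neither vector is all zeros.

```python
import math

def manhattan_distance(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 minus the cosine of the angle between a and b (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

p, q = (0.0, 0.0), (3.0, 4.0)
print(manhattan_distance(p, q))                 # 7.0
print(euclidean_distance(p, q))                 # 5.0
print(cosine_distance((1.0, 0.0), (0.0, 1.0)))  # 1.0
```

Manhattan and Euclidean distance depend on the magnitude of the features, while cosine distance only looks at direction, so the choice changes which observations count as "similar".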

Page 10: Machine learning hands on clustering

K-MEANS PSEUDO CODE

Randomly create k points for starting centroids

Repeat while any point has changed cluster assignment (until convergence):

    Cluster assignment step:
        For every point in the dataset
            For every centroid
                Calculate the distance between the centroid and the point
            Assign the point to the cluster with the lowest distance

    Move centroid step:
        For every cluster
            Calculate the mean of the points in that cluster
            Assign the centroid to the mean
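The pseudocode above can be sketched in plain Python as follows. This is one possible reading, not a reference implementation: the names `kmeans` and `squared_distance`, the `max_iter` cap, starting from k randomly chosen data points, and keeping the old centroid when a cluster is empty are all assumptions made for the example.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance; enough for finding the nearest centroid."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100, seed=None):
    """Cluster a list of equal-length numeric tuples into k groups.

    Returns (centroids, assignments), where assignments[i] is the index
    of the centroid that points[i] belongs to.
    """
    rng = random.Random(seed)
    # Assumption: "randomly create k points" is done by picking k data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = [None] * len(points)

    for _ in range(max_iter):
        changed = False
        # Cluster assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            if nearest != assignments[i]:
                assignments[i] = nearest
                changed = True
        if not changed:
            break  # no point changed cluster: converged
        # Move centroid step: move each centroid to the mean of its points.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments

data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, assignments = kmeans(data, k=2, seed=0)
print(centroids)
print(assignments)
```

On this toy data the two groups are far apart, so the first three points should share one label and the last three the other; which group gets label 0 depends on the random start.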

Page 11: Machine learning hands on clustering

COST FUNCTION & RANDOM INITIALIZATION

for i = 1 to 100 {

    randomly initialize k-means

    run k-means and get cluster assignments c(1 to m) and centroid positions µ(1 to K)

    compute cost function J(c(1 to m), µ(1 to K))

}

Pick the clustering that gave the lowest J(c(1 to m), µ(1 to K))

Cluster assignment step: minimize J with respect to c(1 to m) while holding µ(1 to K) fixed

Move centroid step: minimize J with respect to µ(1 to K) while holding c(1 to m) fixed
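A sketch of the random-restart loop above in plain Python. The slide does not spell out J, so this example assumes the usual distortion: the mean squared distance between each point and the centroid it is assigned to. All function names, the compact `kmeans` helper and the default of 100 restarts are illustrative.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, rng, max_iter=100):
    """One k-means run from a random start; returns (assignments, centroids)."""
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = [None] * len(points)
    for _ in range(max_iter):
        # Cluster assignment step: minimize J over c with the centroids fixed.
        new = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new == assignments:
            break
        assignments = new
        # Move centroid step: minimize J over the centroids with c fixed.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids

def cost(points, assignments, centroids):
    """Assumed J(c, mu): mean squared distance to the assigned centroid."""
    total = sum(sq_dist(p, centroids[a]) for p, a in zip(points, assignments))
    return total / len(points)

def best_of_restarts(points, k, restarts=100, seed=0):
    """Run k-means `restarts` times and keep the clustering with the lowest J."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        c, mu = kmeans(points, k, rng)
        j = cost(points, c, mu)
        if best is None or j < best[0]:
            best = (j, c, mu)
    return best
```

Calling `best_of_restarts(points, k)` returns a tuple `(J, assignments, centroids)`. Each individual run can stop in a local minimum; keeping the run with the lowest J is what makes the repeated random initialization worthwhile.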

Page 12: Machine learning hands on clustering

PERFORMANCE CONSIDERATION

K-means

K-means has a computational complexity of O(iKnm), where:

i is the number of iterations,

K is the number of clusters,

n is the number of observations,

m is the number of features.
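To get a feel for what O(iKnm) means in practice, the product can be evaluated for concrete sizes. The numbers below are made up for illustration, and the result is only a rough count of coordinate-level operations, not a runtime prediction.

```python
def kmeans_operation_estimate(iterations, clusters, observations, features):
    """Rough operation count implied by O(i * K * n * m)."""
    return iterations * clusters * observations * features

# Hypothetical workload: 20 iterations, 10 clusters, 1M observations, 50 features.
base = kmeans_operation_estimate(20, 10, 1_000_000, 50)
# Same workload after halving the number of features.
halved_features = kmeans_operation_estimate(20, 10, 1_000_000, 25)

print(base)                    # 10000000000
print(halved_features / base)  # 0.5
```

Because the cost is a plain product, halving any single factor (iterations, clusters, observations or features) halves the estimate.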

Improvements:

• Reducing the average number of iterations.

• Parallel implementation of K-means by leveraging Hadoop or Spark.

• Reducing the number of outliers and noisy features by filtering noise with a smoothing algorithm.

• Reducing the dimensionality of the data.

Page 13: Machine learning hands on clustering

FRAMEWORKS

Java: Weka, Mahout, Spark

Python: scikit-learn, PySpark, Pylearn2 (Theano)

C++: Shogun

.NET: Encog

https://github.com/josephmisiti/awesome-machine-learning

Page 14: Machine learning hands on clustering

PLATFORMS - IBM BLUEMIX

Page 15: Machine learning hands on clustering

PLATFORMS – MICROSOFT AZURE ML

Page 16: Machine learning hands on clustering

REFERENCES

http://www.dataschool.io/15-hours-of-expert-machine-learning-videos/

http://www-bcf.usc.edu/~gareth/ISL/

Page 17: Machine learning hands on clustering

BOOKS