Introduction to clustering algorithms

1. Clustering Algorithm: Complex Network Algorithm (Amir Hadifar)

Upload: hadifar

Post on 13-Feb-2017



2. Objectives

At the end of this presentation you will:
- Understand data science and its applications
- Get an overview of machine learning
- Learn some types of clustering algorithms
- Implement clustering with R

3. Data science and its applications

- Extract knowledge or insight from data
- From speech recognition and search engines to health care and the humanities
- These scenarios involve:
  - Storing, organizing, and integrating huge amounts of unstructured data
  - Processing and analyzing data
  - Extracting knowledge and insight from data, and predicting the future from it
- Processing, analyzing, and extracting knowledge and insight are done through machine learning

4. Data science and its applications

5. Machine learning

A field of study that gives computers the ability to learn without being explicitly programmed.

Classified into three broad categories:
- Supervised learning
- Unsupervised learning
- *Reinforcement learning

6. Machine learning categories

Supervised learning:
- Decision tree learning
- Classification
- …

Unsupervised learning:
- Clustering
- Association rule learning
- …

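7. Cluster definition

Cluster analysis, or clustering, is the grouping of similar objects together (each group is called a cluster).

Types of similarity in clustering:
- Intra-class similarity: similarity between objects in the same cluster (should be high)
- Inter-class similarity: similarity between objects in different clusters (should be low)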
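
The two kinds of similarity can be illustrated with a small sketch in plain Python. It assumes Euclidean distance as the (inverse) similarity measure and uses made-up toy points; the function names are hypothetical, not part of the talk.

```python
from itertools import combinations
from math import dist

# Hypothetical toy data: two well-separated clusters of 2-D points.
clusters = {
    "a": [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    "b": [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)],
}

def mean_intra_distance(points):
    """Average distance between pairs inside one cluster (small = high intra-class similarity)."""
    pairs = list(combinations(points, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

def mean_inter_distance(points_a, points_b):
    """Average distance between points of two clusters (large = low inter-class similarity)."""
    total = sum(dist(p, q) for p in points_a for q in points_b)
    return total / (len(points_a) * len(points_b))

intra_a = mean_intra_distance(clusters["a"])
inter_ab = mean_inter_distance(clusters["a"], clusters["b"])
```

For a reasonable clustering, `intra_a` should come out much smaller than `inter_ab`.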

8. Clustering scenarios

Clustering is used in scenarios such as:
- Market segmentation
- News summarization (cluster, then find the centroid)
- City planning
- Image segmentation

9. Methods of clustering

- Partitioning methods (centroid models)
- Hierarchical methods (connectivity models)
- Density-based methods
- Grid-based methods
- Model-based methods
- Constraint-based methods

10. Partitioning method

Given a database of 'n' objects, a partitioning method constructs 'k' partitions of the data (k ≤ n) that satisfy the following:
- Each group contains at least one object
- Each object belongs to exactly one group

Points to remember:
- The method creates an initial partitioning
- It then uses an iterative relocation technique to improve the partitioning

11. K-means, or Lloyd's algorithm
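
A minimal sketch of Lloyd's algorithm in plain Python, assuming Euclidean distance and initial centroids supplied by the caller. The function name `kmeans`, the `max_iter` parameter, and the toy points are illustrative choices, not taken from the slides.

```python
from math import dist

def kmeans(points, centroids, max_iter=100):
    """Alternate an assignment step and an update step until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(old)  # assumption: an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical toy data: two visually obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
```

The assignment step is the "iterative relocation" of objects between groups; the result depends on the initial centroids, which is what variants such as k-means++ try to improve.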

12. Other k-means variants

- K-means++
- Streaming k-means
- Mini-batch k-means
- K-medoids
- Fuzzy k-means
- Many others
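
As one example of these variants, k-means++ changes only how the initial centroids are picked: the first is chosen uniformly at random, and each later one is sampled with probability proportional to its squared distance from the nearest centroid already chosen. A minimal sketch, assuming Euclidean distance; the function name and `seed` parameter are hypothetical.

```python
import random
from math import dist

def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centroids with the k-means++ seeding rule."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]  # first centroid: uniform at random
    while len(centroids) < k:
        # Weight of each point: squared distance to its nearest chosen centroid.
        weights = [min(dist(p, c) for c in centroids) ** 2 for p in points]
        # Already-chosen points have weight 0, so they are never picked again.
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids

# Hypothetical toy data.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
seeds = kmeans_pp_init(points, k=2)
```

The sketch assumes the points are distinct; with heavy duplication all weights could become zero before `k` centroids are found.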

13. K-means clustering with R

14. Hierarchical clustering

- Agglomerative: bottom-up
- Divisive: top-down
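
The bottom-up (agglomerative) direction can be sketched in plain Python as follows: every point starts as its own cluster, and the two closest clusters are merged repeatedly. The sketch assumes Euclidean distance, measures cluster distance by the closest pair of points, and stops at a caller-chosen number of clusters `k`; these are illustrative choices, and the naive search is meant for small inputs only.

```python
from math import dist

def agglomerative(points, k):
    """Merge the two closest clusters until only k clusters remain."""
    clusters = [[p] for p in points]  # start: one cluster per point
    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest pair of clusters found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Cluster distance here: distance between the closest pair of members.
                d = min(dist(p, q) for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return clusters

# Hypothetical toy data: two nearby pairs and one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (20.0, 20.0)]
result = agglomerative(points, k=2)
```

A divisive method would run the other way, starting from one cluster holding all points and splitting it.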

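15. Calculate distance between points

Linkage criteria for the distance between two clusters:
- Single linkage: distance between the closest pair of points
- Complete linkage: distance between the farthest pair of points
- Average linkage: average distance over all pairs of points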
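
The three linkage criteria differ only in how they reduce the pairwise point distances between two clusters to a single number. A minimal sketch in plain Python, assuming Euclidean distance and hypothetical toy clusters `a` and `b`:

```python
from math import dist

def single_linkage(a, b):
    """Distance between the closest pair of points, one from each cluster."""
    return min(dist(p, q) for p in a for q in b)

def complete_linkage(a, b):
    """Distance between the farthest pair of points, one from each cluster."""
    return max(dist(p, q) for p in a for q in b)

def average_linkage(a, b):
    """Mean distance over all cross-cluster pairs."""
    return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))

# Hypothetical toy clusters.
a = [(0.0, 0.0), (0.0, 1.0)]
b = [(3.0, 0.0), (4.0, 0.0)]

d_single = single_linkage(a, b)
d_complete = complete_linkage(a, b)
d_average = average_linkage(a, b)
```

For any pair of clusters the values are ordered: single ≤ average ≤ complete.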

16. Hierarchical clustering with R

17. Density-based methods

- Areas of higher density are considered clusters
- Sparse areas are usually considered noise
- They use two basic ideas:
  - Density reachability
  - Density connectivity

18. DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

19. DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
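
A minimal sketch of DBSCAN in plain Python, assuming Euclidean distance and the two usual parameters: a neighborhood radius `eps` and a minimum neighbor count `min_pts` (the point itself is counted). The brute-force neighbor search and the toy points are illustrative only.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id, or NOISE if no dense region reaches it."""
    labels = [None] * len(points)  # None = not visited yet

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # not a core point; may later become a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is also a core point: keep expanding
        cluster += 1
    return labels

# Hypothetical toy data: two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
dbscan_labels = dbscan(points, eps=1.5, min_pts=3)
```

Note that the number of clusters is never passed in; it falls out of `eps` and `min_pts`, and the isolated point is reported as noise.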

20. DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

Advantages:
- Does not require a priori specification of the number of clusters
- Able to identify noise data while clustering
- Able to find clusters of arbitrary size and arbitrary shape

Disadvantages:
- Fails on neck-type datasets
- Does not work well on high-dimensional data

21. Grid-based algorithms

- Use a multi-resolution grid data structure
- Clustering complexity depends on the number of grid cells, not on the number of objects
- The space is divided into a finite number of cells that form a grid structure, on which all clustering operations are performed
- Examples: CLIQUE, STING, WaveCluster
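
The grid idea can be sketched in plain Python: points are binned into cells once, and every later step works on cells rather than on individual objects. This is a generic single-resolution illustration, not CLIQUE, STING, or WaveCluster; `cell_size`, `min_count`, and the rule that dense cells sharing a side form one cluster are assumptions made for the example.

```python
from collections import Counter

def grid_clusters(points, cell_size, min_count):
    """Group dense grid cells that share a side into clusters of cells."""
    # Bin each 2-D point into the integer coordinates of its grid cell.
    counts = Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)
    dense = {cell for cell, n in counts.items() if n >= min_count}

    clusters, seen = [], set()
    for cell in sorted(dense):
        if cell in seen:
            continue
        # Flood fill over neighboring dense cells.
        stack, group = [cell], set()
        seen.add(cell)
        while stack:
            cx, cy = stack.pop()
            group.add((cx, cy))
            for nb in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        clusters.append(group)
    return clusters

# Hypothetical toy data.
points = [(0.1, 0.1), (0.2, 0.3), (1.1, 0.2), (1.4, 0.4),
          (5.5, 5.5), (5.6, 5.7), (9.0, 0.5)]
grid_result = grid_clusters(points, cell_size=1.0, min_count=2)
```

After the single binning pass, the flood fill touches only the dense cells, which is why the cost of this stage does not grow with the number of objects.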

22. CLIQUE (CLustering In QUEst)

- CLIQUE is used for clustering high-dimensional data
- High-dimensional data means data with many attributes
- CLIQUE identifies the dense units in subspaces
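
A rough sketch of the dense-unit idea in plain Python, not a full CLIQUE implementation. It assumes equal-width intervals on every attribute and an absolute count threshold `min_count`; the function names and toy rows are hypothetical. Two-dimensional candidates are built only from dense one-dimensional units, because a unit can be dense only if its lower-dimensional projections are dense.

```python
from collections import Counter
from itertools import combinations

def dense_units_1d(rows, width, min_count):
    """Return {(dimension, interval)} for intervals holding at least min_count rows."""
    counts = Counter((d, int(v // width)) for row in rows for d, v in enumerate(row))
    return {unit for unit, n in counts.items() if n >= min_count}

def dense_units_2d(rows, width, min_count, units_1d):
    """Count only 2-D units whose two 1-D projections are both dense."""
    counts = Counter()
    for row in rows:
        cells = [(d, int(v // width)) for d, v in enumerate(row)]
        for u, v in combinations(cells, 2):
            if u in units_1d and v in units_1d:
                counts[(u, v)] += 1
    return {unit for unit, n in counts.items() if n >= min_count}

# Hypothetical toy data: 4 rows with 3 attributes each.
rows = [(0.1, 0.2, 5.0), (0.3, 0.4, 9.0), (0.5, 0.6, 1.0), (7.0, 0.7, 3.0)]
units_1d = dense_units_1d(rows, width=1.0, min_count=3)
units_2d = dense_units_2d(rows, width=1.0, min_count=3, units_1d=units_1d)
```

Here the third attribute is spread out and contributes no dense unit, so the only dense 2-D unit lies in the subspace of the first two attributes.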

23. Stack Overflow analysis using R

24. Stack Overflow analysis using R

25. Stack Overflow analysis using R