Unsupervised Learning - Pantelis P. Analytis - Cornell University

Unsupervised Learning

Pantelis P. Analytis

March 19, 2018

1 Introduction

2 Finding structure in graphs

3 Clustering analysis

4 Dimensionality reduction

What’s unsupervised learning?

Most of the data available on the internet do not have labels. How can we make sense of them?

Finding structure in graphs

Organizing the web

First attempts to organize the web were based on human-curated directories (Yahoo, LookSmart).

People also used methods from information retrieval to uncover relevant documents.

Yet the web has a deluge of untrusted documents, spam, random webpages, advertisements, etc.

Elements of the PageRank algorithm

Solution: Use social feedback to rank the quality of documents.

You can see links as votes. A page is more important when it has more incoming links.

For instance, www.nytimes.com has numerous incoming links, as opposed to www.inkefalonia.gr.

Links from important pages count more, which makes the definition recursive.

The iterative PageRank algorithm

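At t = 0, assume an initial probability distribution:

$$PR(p_i; 0) = \frac{1}{N}.$$

At each time step, the computation yields:

$$PR(p_i; t + 1) = \frac{1 - d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j; t)}{L(p_j)}$$

Here N is the number of pages, d is the damping factor, M(p_i) is the set of pages that link to p_i, and L(p_j) is the number of outgoing links on page p_j.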
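The update rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `pagerank` function, the three-page `graph`, the default `d=0.85` and the fixed number of iterations are all assumptions made for the example, and the code assumes every page has at least one outgoing link so that L(p_j) is never zero.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate the PageRank update on a small link graph.

    links maps each page to the list of pages it links to.
    Assumes every page has at least one outgoing link, so L(p_j) > 0.
    """
    pages = list(links)
    n = len(pages)
    # t = 0: uniform initial distribution, PR(p_i; 0) = 1/N
    pr = {p: 1.0 / n for p in pages}
    # M(p_i): the pages that link to p_i
    incoming = {p: [q for q in pages if p in links[q]] for p in pages}
    for _ in range(iterations):
        # Each page keeps a base share (1 - d)/N and receives an equal
        # split of the current rank of every page that links to it.
        pr = {
            p: (1 - d) / n + d * sum(pr[q] / len(links[q]) for q in incoming[p])
            for p in pages
        }
    return pr


# Hypothetical three-page web: A and C both link to B, B links to both.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
ranks = pagerank(graph)
```

In this toy graph B collects votes from both other pages, so it ends up with the highest rank, while A and C receive equal shares; the ranks continue to sum to 1 at every step.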

PageRank equilibrium

PageRank: The spider trap

The Scaled PageRank algorithm

Scaled PageRank Update Rule

Apply basic PR rule.

Scale all values down by factor s.

Divide the leftover 1 - s units of PageRank evenly over all nodes.
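Written as a single formula, the three steps of the scaled rule can be combined as follows. This is a sketch under two assumptions: that the "basic PR rule" is the plain flow-splitting sum over incoming links, and that N denotes the number of nodes.

```latex
PR(p_i; t + 1) = s \sum_{p_j \in M(p_i)} \frac{PR(p_j; t)}{L(p_j)} + \frac{1 - s}{N}
```

Under these assumptions the scaled rule has the same form as the iterative update given earlier, with the scaling factor s playing the role of d.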

What’s unsupervised learning?

Most of the data available on the internet do not have labels. How can we make sense of them?

Clustering: the k-means algorithm

Input: K, set of points $x_1, \dots, x_n$

Place centroids $c_1, \dots, c_K$ randomly.

Then repeat until convergence:

For each point $x_i$, find the nearest centroid $c_j$ and assign that point to that cluster. In math notation: $\arg\min_j D(x_i, c_j)$.

For each cluster $j = 1, \dots, K$, find the new centroid of all points $x_i$ assigned to cluster $j$ in the previous step. In math notation: $c_j(a) = \frac{1}{n_j} \sum_{x_i \to c_j} x_i(a)$ for $a = 1, \dots, d$.

Stop when the algorithm has converged, i.e. none of the items changes cluster.
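The loop above can be sketched in plain Python as follows. The `kmeans` function, its `seed` and `max_iter` parameters and the toy `data` are assumptions for illustration; in particular, "random" placement is interpreted here as picking K distinct data points, and the distance D is taken to be Euclidean.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Cluster points (tuples of floats) into k groups."""
    rng = random.Random(seed)
    # Assumption: "random" placement means picking k distinct data points.
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point.
        new_assignment = [
            min(range(k), key=lambda j: math.dist(x, centroids[j]))
            for x in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its points.
        for j in range(k):
            members = [x for x, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment


# Hypothetical data: two well-separated groups of three points each.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centroids, labels = kmeans(data, k=2)
```

On this toy data the first three points end up in one cluster and the last three in the other. Which cluster gets index 0 depends on the random initialization.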

Converging to clusters

How do we select k?

There are diminishing returns to adding more clusters: each additional cluster reduces the distance between points and their centroids by less.

An intuitive approach suggests picking the k after which the distance reduction flattens out.
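One way to make "flattens out" concrete is to look at the relative improvement from one k to the next. The sketch below is only one possible heuristic: the `pick_k` function, the `min_gain` threshold and the distance values are all hypothetical and chosen for illustration.

```python
def pick_k(total_distance, min_gain=0.10):
    """Return the smallest k after which adding a cluster helps little.

    total_distance[i] is the summed point-to-centroid distance for k = i + 1.
    min_gain is a hypothetical threshold on the relative improvement.
    """
    for i in range(len(total_distance) - 1):
        current, following = total_distance[i], total_distance[i + 1]
        # Stop at the first k where one more cluster gains less than min_gain.
        if current == 0 or (current - following) / current < min_gain:
            return i + 1
    return len(total_distance)


# Made-up distances for k = 1..6, for illustration only.
distances = [100.0, 55.0, 30.0, 28.0, 27.0, 26.5]
best_k = pick_k(distances)
```

With these made-up numbers the curve drops sharply up to k = 3 and barely moves afterwards, so the heuristic settles on three clusters.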

Hierarchical Clustering

Agglomerative vs. divisive

Agglomerative clustering starts from the bottom, with each point in its own cluster, and merges clusters into larger ones.

Divisive clustering starts with one cluster, which is gradually split into smaller ones.

How do we determine the nearness of clusters?

Complete linkage: $D(X, Y) = \max_{x \in X, y \in Y} d(x, y)$

Single linkage: $D(X, Y) = \min_{x \in X, y \in Y} d(x, y)$

Average linkage: $D(X, Y) = \frac{1}{|X||Y|} \sum_{x \in X} \sum_{y \in Y} d(x, y)$
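The three linkage criteria translate almost directly into code. In the sketch below the function names and the two toy clusters are assumptions, and the point-to-point distance d defaults to the Euclidean distance.

```python
import math


def complete_linkage(X, Y, d=math.dist):
    """Largest pairwise distance between the two clusters."""
    return max(d(x, y) for x in X for y in Y)


def single_linkage(X, Y, d=math.dist):
    """Smallest pairwise distance between the two clusters."""
    return min(d(x, y) for x in X for y in Y)


def average_linkage(X, Y, d=math.dist):
    """Mean of all |X| * |Y| pairwise distances."""
    return sum(d(x, y) for x in X for y in Y) / (len(X) * len(Y))


# Two hypothetical clusters on a line.
X = [(0.0,), (1.0,)]
Y = [(3.0,), (5.0,)]
```

For these clusters the pairwise distances are 3, 5, 2 and 4, so complete linkage gives 5, single linkage gives 2 and average linkage gives 3.5.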

Agglomerative Clustering

Two ways to decide when to stop merging:

Pick k upfront and stop when we have k clusters.

Stop when a cluster with low cohesion is created (diameter, radius or density-based approaches).
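The first stopping rule, a fixed k, can be sketched as follows. The `agglomerative` function and the toy points are hypothetical. The sketch uses single linkage with Euclidean distance, assumes 1 <= k <= number of points, and uses a brute-force search that is only suitable for small inputs.

```python
import math


def agglomerative(points, k):
    """Merge the two nearest clusters until only k clusters remain."""
    clusters = [[p] for p in points]  # bottom level: one cluster per point

    def single_linkage(a, b):
        return min(math.dist(x, y) for x in a for y in b)

    while len(clusters) > k:
        # Find the closest pair of clusters (i < j).
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: single_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        # Merge cluster j into cluster i.
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


# Hypothetical data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (20.0, 20.0)]
result = agglomerative(points, k=3)
```

With k = 3 the two tight pairs are merged first and the isolated point stays in a cluster of its own.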

Kohonen’s self-organizing maps

Step 0: Randomly position the grid’s neurons in the data space.

Step 1: Select one data point, either randomly or by systematically cycling through the dataset in order.

Step 2: Find the neuron that is closest to the chosen data point. This neuron is called the Best Matching Unit (BMU).

Step 3: Move the BMU closer to that data point. The distance moved by the BMU is determined by a learning rate, which decreases after each iteration.

Step 4: Move the BMU’s neighbors closer to that data point as well, with farther away neighbors moving less. Neighbors are identified using a radius around the BMU, and the value for this radius decreases after each iteration.

Update the learning rate and BMU radius before repeating Steps 1 to 4. Iterate these steps until the positions of the neurons have stabilized.
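Steps 0 to 4 can be put together in a short sketch. Everything specific here is an assumption made for illustration: the `train_som` function, a one-dimensional chain of neurons instead of a two-dimensional grid, linearly decaying learning rate and radius, a Gaussian neighborhood function, and a fixed number of iterations in place of a stability check.

```python
import math
import random


def train_som(data, n_neurons=5, iterations=200, lr0=0.5, radius0=2.0, seed=0):
    """Train a 1-D chain of neurons on 2-D data (a minimal sketch)."""
    rng = random.Random(seed)
    lo = [min(p[a] for p in data) for a in range(2)]
    hi = [max(p[a] for p in data) for a in range(2)]
    # Step 0: place the neurons at random positions in the data space.
    neurons = [[rng.uniform(lo[a], hi[a]) for a in range(2)] for _ in range(n_neurons)]
    for t in range(iterations):
        # Assumed schedule: learning rate and radius decay linearly.
        decay = 1.0 - t / iterations
        lr, radius = lr0 * decay, radius0 * decay
        # Step 1: select one data point at random.
        x = rng.choice(data)
        # Step 2: the best matching unit is the neuron closest to x.
        bmu = min(range(n_neurons), key=lambda i: math.dist(neurons[i], x))
        # Steps 3 and 4: move the BMU and its chain neighbors toward x,
        # with neighbors farther along the chain moving less.
        for i in range(n_neurons):
            grid_dist = abs(i - bmu)
            if grid_dist <= radius:
                influence = math.exp(-grid_dist ** 2 / (2 * max(radius, 1e-9) ** 2))
                for a in range(2):
                    neurons[i][a] += lr * influence * (x[a] - neurons[i][a])
    return neurons


# Hypothetical data: the corners and center of the unit square.
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
neurons = train_som(data)
```

Because every update moves a neuron part of the way toward a data point, the neurons stay inside the bounding box of the data while the chain unfolds over it.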

Principal component analysis

Often used to accelerate supervised learning.

Visualization

Dimensionality reduction in recommender systems
