Different types of data, e.g. continuous data: height; categorical data, ordered (ordinal): growth rate...


Upload: rachel-green

Post on 28-Mar-2015


TRANSCRIPT

Page 1

Different types of data, e.g.

Continuous data: height

Categorical data

ordered (ordinal): growth rate (very slow, slow, medium, fast, very fast)

not ordered (nominal): fruit colour (yellow, green, purple, red, orange)

Binary data: fruit / no fruit

Page 2

Similarity matrix

We define a similarity between units – like the correlation between continuous variables.

(also can be a dissimilarity or distance matrix)

A similarity can be constructed as an average of the similarities between the units on each variable.

(can use weighted average)

This provides a way of combining different types of variables.
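A minimal sketch of this averaging idea, assuming a range-scaled difference for continuous variables and a plain match/mismatch score for categorical and binary variables. The function names, the example plants and the assumed height range of 100 cm are all hypothetical and not taken from the slides.

```python
def variable_similarity(a, b, kind, value_range=None):
    """Similarity in [0, 1] between two units on a single variable."""
    if kind == "continuous":
        # Assumption: scale the absolute difference by the variable's range.
        if not value_range:
            return 1.0
        return 1.0 - abs(a - b) / value_range
    # Categorical or binary: 1 if the values match, 0 otherwise.
    return 1.0 if a == b else 0.0


def mixed_similarity(unit_a, unit_b, kinds, ranges, weights=None):
    """Weighted average of the per-variable similarities."""
    if weights is None:
        weights = [1.0] * len(kinds)
    total = sum(
        w * variable_similarity(a, b, kind, rng)
        for a, b, kind, rng, w in zip(unit_a, unit_b, kinds, ranges, weights)
    )
    return total / sum(weights)


# Hypothetical plants: (height in cm, fruit colour, fruit present)
kinds = ["continuous", "categorical", "binary"]
ranges = [100.0, None, None]  # assumed range of height across all units
plant_1 = (150.0, "red", 1)
plant_2 = (125.0, "red", 0)

# Per-variable similarities are 0.75, 1 and 0, so the plain average is 1.75 / 3.
print(mixed_similarity(plant_1, plant_2, kinds, ranges))
# Giving height double weight: (2 * 0.75 + 1 + 0) / 4
print(mixed_similarity(plant_1, plant_2, kinds, ranges, weights=[2.0, 1.0, 1.0]))
```

Subtracting a similarity from 1 gives a corresponding dissimilarity.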

Page 3

Distance metrics

relevant for continuous variables:

Euclidean

city block or Manhattan

[Figure: two diagrams, each showing points A and B]

(also many other variations)
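As a small illustration, the two distances between points A and B can be computed as below. The coordinates are made up for the example.

```python
import math


def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City block distance: sum of the absolute differences on each variable."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical coordinates for the two points
A = (0.0, 0.0)
B = (3.0, 4.0)

print(euclidean(A, B))  # 5.0
print(manhattan(A, B))  # 7.0
```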

Page 4

Similarity coefficients for binary data

simple matching

count if both units 0 or both units 1

Jaccard

count only if both units 1

(and many other variants)

simple matching can be extended to categorical data
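A short sketch of the two coefficients, with each unit given as a list of 0/1 values. The example units are invented, and returning 0 when neither unit has any 1s is an assumed convention (the Jaccard coefficient is undefined in that case).

```python
def simple_matching(a, b):
    """Proportion of variables on which the two units agree (0-0 or 1-1)."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def jaccard(a, b):
    """Proportion of 1-1 matches among variables where at least one unit is 1."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    # Assumed convention: similarity 0 when neither unit has any 1s.
    return both / either if either else 0.0


# Hypothetical presence/absence records for two units
u = [1, 1, 0, 0, 0]
v = [1, 0, 0, 0, 1]

print(simple_matching(u, v))  # 0.6 (three agreements out of five)
print(jaccard(u, v))          # one 1-1 match out of three variables with a 1

# simple_matching only tests equality, so it also works on categorical values
print(simple_matching(["red", "slow"], ["red", "fast"]))  # 0.5
```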

Page 5

Clustering methods

hierarchical

divisive

put everything together and split

monothetic / polythetic

agglomerative

keep everything separate and join the most similar points (classical cluster analysis)

non-hierarchical

k-means clustering
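For the non-hierarchical case, a bare-bones k-means sketch is given below. It assumes squared Euclidean distance and uses the first k points as starting centres, which is a simplification chosen for reproducibility; the data points are hypothetical.

```python
def kmeans(points, k, iterations=100):
    """Alternate between assigning points to the nearest centre and updating centres."""
    # Assumption: take the first k points as the initial centres.
    centres = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centre.
        new_labels = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centres[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


# Hypothetical units measured on two continuous variables
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.8), (0.9, 1.1)]
labels, centres = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0]
```

Unlike the hierarchical methods, the number of clusters k has to be chosen in advance, and the result can depend on the starting centres.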

Page 6

Agglomerative hierarchical

Single linkage or nearest neighbour

finds the minimum spanning tree: shortest tree that connects all points

prone to chaining

Complete linkage or furthest neighbour

compact clusters of approximately equal size (makes compact groups even when none exist)

Average linkage methods

between single and complete linkage
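The linkage rules differ only in how the distance between two clusters is defined. The sketch below is a naive agglomerative loop, assuming Euclidean distance between points; the function names and the one-dimensional example data are hypothetical, chosen so that chaining is visible.

```python
import math


def cluster_distance(c1, c2, points, linkage):
    """Distance between two clusters (lists of point indices) under a linkage rule."""
    dists = [math.dist(points[i], points[j]) for i in c1 for j in c2]
    if linkage == "single":      # nearest neighbour
        return min(dists)
    if linkage == "complete":    # furthest neighbour
        return max(dists)
    if linkage == "average":     # mean of all pairwise distances
        return sum(dists) / len(dists)
    raise ValueError("unknown linkage: " + linkage)


def agglomerate(points, n_clusters, linkage="single"):
    """Start with every point separate and repeatedly join the two closest clusters."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        pairs = ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters)))
        a, b = min(
            pairs,
            key=lambda ab: cluster_distance(clusters[ab[0]], clusters[ab[1]], points, linkage),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical points on a line with slowly widening gaps
points = [(0.0,), (1.0,), (2.1,), (3.3,), (4.6,)]

print(agglomerate(points, 2, "single"))    # [[0, 1, 2, 3], [4]]  (chaining)
print(agglomerate(points, 2, "complete"))  # [[0, 1], [2, 3, 4]]
print(agglomerate(points, 2, "average"))   # [[0, 1], [2, 3, 4]]
```

With single linkage each new point is chained onto the growing cluster, while complete linkage splits the same data into two more evenly sized groups.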