mev: joe white

Analysis of Multiple Experiments

TIGR Multiple Experiment Viewer (MeV)

Joseph White, DFCI, January 24, 2008

MeV

• Stand-alone Java application for analysis

• New version: 4.1

• Not database-centric; uses TDMS files

• Writes TDMS files

• Primarily for normalized data

• MeV does not currently write MAGE-TAB

• Download MeV from: tm4.org

Outline

• Description of MeV
• How MeV treats expression data
• Some essential concepts
• Demo: basic operations in MeV
– New file loader
– ANOVA example
• Demo of MeV new features
– Affymetrix file reader
– Non-parametric tests
– CGH

• GCOD

The Expression Matrix is a representation of data from multiple microarray experiments.

Each element is a log ratio, usually log2(Cy5/Cy3).

Red indicates a positive log ratio, i.e., Cy5 > Cy3

Green indicates a negative log ratio, i.e., Cy5 < Cy3

Black indicates a log ratio of zero, i.e., Cy5 and Cy3 are very close in value

[Figure: expression matrix with Gene 1 to Gene 6 as rows and Exp 1 to Exp 6 as columns, each cell colored by its log ratio]

Gray indicates missing data
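The color legend can be made concrete with a short sketch. It assumes raw Cy5 and Cy3 intensities are available for every gene and experiment, and that a missing reading is stored as None. The function names and the `tol` band that counts as "close to zero" are hypothetical illustrations, not MeV's API; a real heat map typically shades along a continuous gradient rather than four discrete colors.

```python
import math
from typing import List, Optional

Matrix = List[List[Optional[float]]]

def log_ratio(cy5: Optional[float], cy3: Optional[float]) -> Optional[float]:
    """Return log2(Cy5 / Cy3), or None if either channel is missing or non-positive."""
    if cy5 is None or cy3 is None or cy5 <= 0 or cy3 <= 0:
        return None
    return math.log2(cy5 / cy3)

def expression_matrix(cy5_rows: Matrix, cy3_rows: Matrix) -> Matrix:
    """Build a genes-by-experiments matrix of log2 ratios from paired intensity tables."""
    return [[log_ratio(a, b) for a, b in zip(r5, r3)]
            for r5, r3 in zip(cy5_rows, cy3_rows)]

def cell_color(value: Optional[float], tol: float = 0.1) -> str:
    """Map one log ratio to the slide's legend; tol is a hypothetical 'close to zero' band."""
    if value is None:
        return "gray"    # missing data
    if value > tol:
        return "red"     # positive log ratio: Cy5 > Cy3
    if value < -tol:
        return "green"   # negative log ratio: Cy5 < Cy3
    return "black"       # Cy5 and Cy3 very close in value

# Hypothetical intensities: 2 genes x 3 experiments, with one missing Cy5 reading.
cy5 = [[400.0, 100.0, 250.0], [120.0, None, 60.0]]
cy3 = [[100.0, 400.0, 240.0], [120.0, 80.0, 240.0]]

matrix = expression_matrix(cy5, cy3)
colors = [[cell_color(v) for v in row] for row in matrix]
print(colors)  # [['red', 'green', 'black'], ['black', 'gray', 'green']]
```

A fourfold excess of Cy5 gives log2(4) = 2 (red), a fourfold excess of Cy3 gives -2 (green), and the log scale makes these two cases symmetric around zero.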

Expression Vectors: Gene Expression Vectors

Gene expression vectors encapsulate the expression of a gene over a set of experimental conditions or sample types.

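Example gene expression vector over 8 conditions: -0.8, 0.8, 1.5, 1.8, 0.5, -1.3, -0.4, 1.5

[Figure: the same vector plotted as log2(Cy5/Cy3), y axis from -2 to 2, against conditions 1 to 8]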

Expression Vectors as Points in ‘Expression Space’

[Figure: genes G1 to G5, with their log ratios for Exp 1 to Exp 3 tabulated, plotted as points in a space whose axes are Experiment 1, Experiment 2, and Experiment 3; genes with similar expression lie close together]

Distance and Similarity

- The ability to calculate a distance (or similarity, its inverse) between two expression vectors is fundamental to clustering algorithms

- Distance between vectors is the basis on which decisions are made when grouping similar patterns of expression

- The choice of distance metric defines what "distance" means

Distance: a measure of similarity between genes.

        Exp 1  Exp 2  Exp 3  Exp 4  Exp 5  Exp 6
Gene A  x1A    x2A    x3A    x4A    x5A    x6A
Gene B  x1B    x2B    x3B    x4B    x5B    x6B

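Some distance metrics (MeV provides 11):

1. Euclidean: $d_{AB} = \sqrt{\sum_{i=1}^{6} (x_{iA} - x_{iB})^2}$

2. Manhattan: $d_{AB} = \sum_{i=1}^{6} |x_{iA} - x_{iB}|$

3. Pearson correlation

[Figure: two points, p0 and p1, illustrating the distance between them]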
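The three metrics above can be written directly from the elements x_iA and x_iB of two expression vectors. This is a plain-Python sketch of the textbook formulas, not MeV's Java implementation; the function names and the sample values are illustrative.

```python
import math
from typing import Sequence

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Manhattan distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient r between two equal-length vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    if ss_a == 0 or ss_b == 0:
        raise ValueError("r is undefined for a constant expression vector")
    return cov / math.sqrt(ss_a * ss_b)

# Hypothetical log ratios for Gene A and Gene B over Exp 1 to Exp 6.
gene_a = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5]
gene_b = [0.0, -1.0, 1.0, 1.0, 1.5, -2.5]

print(euclidean(gene_a, gene_b))  # 2.5
print(manhattan(gene_a, gene_b))  # 4.5
print(pearson(gene_a, gene_b))
```

Euclidean and Manhattan distances grow with any difference in magnitude, whereas r depends only on the shape of the two profiles, so a vector and any scaled, shifted copy of it have r = 1.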

Distance is Defined by a Metric

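[Figure: two pairs of expression profiles plotted as log2(Cy5/Cy3), y axis from -2 to 2, with the distance D computed for each pair]

Distance metric:  Euclidean  Pearson (r × -1)
D:                4.2        -1.00
D:                1.4        -0.90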
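To see why the choice of metric matters, the sketch below compares two made-up pairs of profiles; the values are illustrative and are not the data behind the figure. Pearson distance is taken as r × -1, matching the slide's "Pearson(r*-1)" label, so perfectly correlated profiles score -1.00.

```python
import math
from typing import Sequence

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson distance as r * -1: -1.0 when perfectly correlated, +1.0 when anti-correlated."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return -cov / (sa * sb)

# Pair 1 (hypothetical): the same pattern, shifted up by 1.5 log2 units.
p1_a = [-1.5, -0.5, 0.5, 1.5, 0.5, -0.5]
p1_b = [x + 1.5 for x in p1_a]
# Pair 2 (hypothetical): nearby, low-amplitude profiles with weakly related shapes.
p2_a = [0.2, 0.4, 0.1, 0.3, 0.0, 0.2]
p2_b = [0.3, 0.2, 0.2, 0.1, 0.1, 0.3]

for name, a, b in [("pair 1", p1_a, p1_b), ("pair 2", p2_a, p2_b)]:
    print(f"{name}: Euclidean {euclidean(a, b):.2f}, "
          f"Pearson(r*-1) {pearson_distance(a, b):.2f}")
```

Euclidean distance ranks pair 2 as the closer pair, while Pearson distance ranks pair 1 as closer because its two profiles share exactly the same shape. Which answer is "right" depends on whether magnitude or pattern matters for the question being asked.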

Normal distribution

x = μ marks the mean of the distribution (the peak of the curve)

σ = standard deviation of the distribution
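The normal curve is fully described by μ and σ. The minimal sketch below writes out the density formula, estimates μ and σ from a handful of hypothetical replicate log ratios, and cross-checks the result against Python's `statistics.NormalDist`. The sample values are made up for illustration.

```python
import math
import statistics

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma  # distance from the mean in units of sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical replicate log ratios for one gene.
values = [0.8, 1.1, 0.9, 1.3, 0.9]
mu = statistics.mean(values)      # estimate of the mean
sigma = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)

peak = normal_pdf(mu, mu, sigma)            # the curve is highest at x = mu
one_sd = normal_pdf(mu + sigma, mu, sigma)  # one sigma away, density drops by exp(-0.5)

# Cross-check against the standard library, and the fraction within +/- 1 sigma.
dist = statistics.NormalDist(mu, sigma)
within_one_sd = dist.cdf(mu + sigma) - dist.cdf(mu - sigma)
print(round(mu, 3), round(sigma, 3), round(within_one_sd, 4))
```

About 68% of a normal distribution lies within one σ of μ, regardless of the particular values of μ and σ.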

Current MeV Algorithms

• Hierarchical Clustering
• K-Means Clustering
• Support Trees for HCL
• EASE (annotation clustering)
• Self-Organizing Maps
• K-Nearest Neighbors
• Support Vector Machines
• Relevance Networks
• Template Matching
• PCA
• CGH
• Bayesian Networks

• t-test
• ANOVA
– One- and two-factor
• SAM
• Non-parametric tests
– Wilcoxon
– Fisher Exact Test
– Mack-Skillings
– Kruskal-Wallis
• BRIDGE

Demos

• File loaders

• HTA data: ANOVA

• Affymetrix data: SAM

• Non-Parametric tests

• CGH

GeneChip Oncology Database

[Figure: pie chart of GCOD content by cancer type]

Breast Cancer (10%)
CNS Tumor (14%)
Head and Neck (12%)
Leukemia (28%)
Lung Cancer (6%)
Prostate Cancer (10%)
Ovarian Cancer (4%)
Other (16%)

[Figure: GeneChip Oncology Database, bar chart of Number of Chips (axis 0 to 1600) by array type: HG-U133A, HG_U95A(v2), Hu6800, Other]

[Figure: bar chart titled "Number of Chips per Study", showing Number of Studies (axis 0 to 25) for studies with > 200, 100 ~ 200, 50 ~ 100, 20 ~ 50, and < 20 chips]

GCOD statistics

• Studies: 52
• Hybridizations: 4,591
• Analysis result sets: 12,637
• Signal values: 204,296,195
• Samples: 3,644
• Probesets: 160,817
– e.g., HG-U133A: 22,293; HG_U133_Plus_2: 54,684
• Array designs: 9
• Accessions: 54,414

MeV Team

• Eleanor Howe

• Sarita Nair

• Raktim Sinha

• [email protected]
