Exhaustive Signature Algorithm Guy Harari

Post on 16-Feb-2016

Page 1: Exhaustive Signature Algorithm

Exhaustive Signature Algorithm

Guy Harari

Page 2: Exhaustive Signature Algorithm

Outline

• ISA biclustering algorithm
• Bimax biclustering algorithm
• Exhaustive Signature Algorithm
• Results and future work

Page 3: Exhaustive Signature Algorithm

ISA algorithm

• Was developed by Sven Bergmann in 2003.
• Goal: find genes/conditions having correlated expression.
• Frequently used, compared and improved.
• Good results in real data.

Page 4: Exhaustive Signature Algorithm

ISA - details

• Input – expression matrix E, initial gene set.
• Compute E^G by normalizing each column.
• For each condition:
  – z-test avg. normalized expression in gene subset against avg. expression in condition.
  – If above a threshold, select the condition.
• Do the same for resulting condition set.
• Repeat until convergence of gene set.
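The iteration described on this slide can be sketched in plain Python. This is a minimal illustration, not the published ISA implementation: the function names, the default thresholds, and the exact z statistic (subset mean times the square root of the subset size, which follows from unit-variance columns) are assumptions made for the example.

```python
from math import sqrt
from statistics import mean, pstdev


def zscore_columns(matrix):
    """Return a copy of `matrix` with every column scaled to mean 0, std 1."""
    normed_cols = []
    for col in zip(*matrix):
        mu, sd = mean(col), pstdev(col)
        # A constant column carries no signal; map it to zeros.
        normed_cols.append([(v - mu) / sd if sd > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*normed_cols)]


def transpose(matrix):
    return [list(row) for row in zip(*matrix)]


def select(normed, subset, threshold):
    """Pick columns whose average over the rows in `subset` is significantly high.

    Columns have unit variance, so the mean of k entries has standard
    deviation 1/sqrt(k) under the null; z = mean * sqrt(k) (assumed test).
    """
    k = len(subset)
    if k == 0:
        return frozenset()
    chosen = set()
    for j in range(len(normed[0])):
        avg = mean(normed[i][j] for i in subset)
        if avg * sqrt(k) > threshold:
            chosen.add(j)
    return frozenset(chosen)


def isa(expr, seed_genes, t_cond=1.0, t_gene=1.0, max_iter=100):
    """Alternate condition and gene selection until the gene set stops changing."""
    e_g = zscore_columns(expr)             # genes x conditions
    e_c = zscore_columns(transpose(expr))  # conditions x genes
    genes, conds = frozenset(seed_genes), frozenset()
    for _ in range(max_iter):
        conds = select(e_g, genes, t_cond)
        new_genes = select(e_c, conds, t_gene)
        if new_genes == genes:
            break
        genes = new_genes
    return genes, conds
```

Starting from a partial seed, the loop grows the gene set until it is stable; the thresholds play the role of the hard-to-tune parameters mentioned on the next slide.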

Page 5: Exhaustive Signature Algorithm

ISA - drawbacks

• Initial gene set should be given.
• Few biclusters for specific parameter value.
• Parameter values are hard to optimize.
• Expression values aren’t normally distributed.
• Genes might not be independent.

Page 6: Exhaustive Signature Algorithm

Exhaustive approach

• Use Bimax algorithm to find seeds.
• For each seed apply ISA with random parameters.
• Drop similar seeds while running.
• Drop similar biclusters from ISA.
• Observation: applying the algorithm separately for positive and negative values improves results.

Page 7: Exhaustive Signature Algorithm

Bimax algorithm

• Input – expression matrix
• Binarize the matrix (set the b% highest and lowest values to 1, all others to 0).
• Goal – find all submatrices which:
  – Contain only 1’s.
  – Are inclusion-maximal.
• Method:
  – Drop areas in matrix with 0’s only.
  – Recursively apply Bimax on other areas.
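To make the goal concrete, here is a small Python sketch of the binarization step and of what "inclusion-maximal all-1 submatrix" means. It deliberately uses brute-force enumeration over row subsets rather than Bimax's recursive divide-and-conquer, so it is only usable on tiny matrices. Reading "b%" as b% on each tail is an assumption, as are the function names.

```python
from itertools import combinations


def binarize(matrix, b):
    """Mark entries among the b% lowest or b% highest values with 1.

    Assumption: b% is taken on each tail separately. Ties at the cut-offs
    are all marked, so slightly more than b% can be selected.
    """
    values = sorted(v for row in matrix for v in row)
    k = int(len(values) * b / 100)
    if k == 0:
        return [[0] * len(row) for row in matrix]
    low_cut, high_cut = values[k - 1], values[-k]
    return [[1 if (v <= low_cut or v >= high_cut) else 0 for v in row]
            for row in matrix]


def maximal_biclusters(binary):
    """Enumerate all inclusion-maximal all-1 submatrices by brute force.

    For every non-empty row subset, take the columns shared by all of its
    rows, then extend to every row containing those columns. The resulting
    (rows, columns) pair cannot grow in either direction.
    """
    n_rows, n_cols = len(binary), len(binary[0])
    ones = [frozenset(j for j in range(n_cols) if binary[i][j])
            for i in range(n_rows)]
    found = set()
    for size in range(1, n_rows + 1):
        for rows in combinations(range(n_rows), size):
            cols = frozenset.intersection(*(ones[i] for i in rows))
            if not cols:
                continue
            closed_rows = frozenset(i for i in range(n_rows) if cols <= ones[i])
            found.add((closed_rows, cols))
    return found
```

The enumeration is exponential in the number of rows, which is exactly why the real algorithm prunes all-0 regions and recurses instead.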

Page 8: Exhaustive Signature Algorithm

Bimax - illustration

1 0 1 1 0 1 0 0

0 0 0 0 1 0 1 1

1 0 0 1 0 1 0 0

0 0 1 1 1 0 1 1

0 1 0 0 0 0 0 1

0 0 0 1 0 1 0 1

1 1 1 0 1 1 0 1

Page 9: Exhaustive Signature Algorithm

Bimax - illustration

1 0 1 1 0 1 0 0

0 0 0 0 1 0 1 1

1 0 0 1 0 1 0 0

0 0 1 1 1 0 1 1

0 1 0 0 0 0 0 1

0 0 0 1 0 1 0 1

1 1 1 0 1 1 0 1

Page 10: Exhaustive Signature Algorithm

Bimax - illustration

1 1 1 1 0 0 0 0

0 0 0 0 1 0 1 1

1 0 1 1 0 0 0 0

0 1 1 0 1 0 1 1

0 0 0 0 0 1 0 1

0 0 1 1 0 0 0 1

1 1 0 1 1 1 0 1

Page 11: Exhaustive Signature Algorithm

Bimax - illustration

1 1 1 1 0 0 0 0

1 0 1 1 0 0 0 0

0 0 0 0 1 0 1 1

0 1 1 0 1 0 1 1

0 0 0 0 0 1 0 1

0 0 1 1 0 0 0 1

1 1 0 1 1 1 0 1

Page 12: Exhaustive Signature Algorithm

Bimax - illustration

1 1 1 1 0 0 0 0

1 0 1 1 0 0 0 0

0 1 1 0 1 0 1 1

0 0 1 1 0 0 0 1

1 1 0 1 1 1 0 1

0 0 0 0 1 0 1 1

0 0 0 0 0 1 0 1

Page 13: Exhaustive Signature Algorithm

Bimax - drawbacks

• Information loss due to binarization.
• Binarization parameter is hard to control.
• Runtime depends linearly on no. of biclusters.
• Usually returns millions of biclusters.
• Poor results on real data.

Page 14: Exhaustive Signature Algorithm

Exhaustive Signature Algorithm

• Apply Bimax on the input expression matrix.
• Keep biclusters that:
  – Do not overlap with other biclusters.
  – Have low p-value w.r.t. a bicluster score.
• Sort resulting biclusters by size.
• Begin with the largest, apply ISA for each one.
• Keep new biclusters that do not overlap with previous ones.
• Stop if more than N biclusters found.
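The outer loop of these steps can be sketched as follows. This is an illustrative outline only: `refine` stands in for the ISA run, biclusters are represented as (gene set, condition set) pairs, size is taken as genes times conditions, and the overlap cut-off `max_overlap` is a hypothetical parameter not given on the slides. The sketch stops as soon as `n_max` biclusters are kept.

```python
def jaccard(a, b):
    """Jaccard index of two sets: |a & b| / |a | b|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlaps(bic1, bic2, max_overlap):
    """True if the gene sets or the condition sets are too similar."""
    gene_sim = jaccard(bic1[0], bic2[0])
    cond_sim = jaccard(bic1[1], bic2[1])
    return max(gene_sim, cond_sim) > max_overlap


def size(bic):
    """Assumed size measure: number of genes times number of conditions."""
    return len(bic[0]) * len(bic[1])


def esa(seeds, refine, n_max, max_overlap=0.5):
    """Refine seeds from largest to smallest, keeping non-overlapping results.

    `seeds` are the filtered Bimax biclusters; `refine` maps a seed to a
    refined bicluster (for example, one ISA run started from the seed).
    """
    kept = []
    for seed in sorted(seeds, key=size, reverse=True):
        candidate = refine(seed)
        if not candidate[0] or not candidate[1]:
            continue  # refinement collapsed to an empty bicluster
        if any(overlaps(candidate, prev, max_overlap) for prev in kept):
            continue
        kept.append(candidate)
        if len(kept) >= n_max:
            break
    return kept
```

Processing the largest seeds first means a large bicluster suppresses its smaller overlapping neighbours, which is one source of the bias towards large biclusters noted under future work.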

Page 15: Exhaustive Signature Algorithm

ESA – details

• Overlaps – use Jaccard index, take the larger.
• Score – average abs. Pearson correlation between gene pairs.
• P-value:
  – Randomize input matrix using edge shuffling.
  – Apply ESA on randomized matrix.
  – Keep score distribution of all biclusters found.
  – P-value = right tail of score distribution of resulting biclusters.
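The three quantities on this slide can be written down in a few lines of Python. This is a sketch under assumptions: "take the larger" is read as the larger of the gene-set and condition-set Jaccard indices, correlations are computed over the bicluster's own conditions, and the p-value is the plain empirical fraction of null scores at least as large as the observed one. The edge-shuffling randomization itself is not shown, and all names are illustrative.

```python
from itertools import combinations
from math import sqrt


def jaccard(a, b):
    """Jaccard index of two sets: |a & b| / |a | b|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap(bic1, bic2):
    """Larger of the gene-set and condition-set Jaccard indices (assumed reading)."""
    return max(jaccard(bic1[0], bic2[0]), jaccard(bic1[1], bic2[1]))


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def bicluster_score(expr, genes, conds):
    """Average absolute Pearson correlation over all gene pairs in the bicluster."""
    cols = sorted(conds)
    profiles = [[expr[g][c] for c in cols] for g in sorted(genes)]
    pairs = list(combinations(profiles, 2))
    if not pairs:
        return 0.0
    return sum(abs(pearson(x, y)) for x, y in pairs) / len(pairs)


def empirical_p_value(score, null_scores):
    """Right-tail fraction of null scores that reach the observed score."""
    return sum(1 for s in null_scores if s >= score) / len(null_scores)
```

Because the score uses the absolute correlation, a bicluster made of perfectly anti-correlated genes scores as high as a perfectly correlated one, which motivates the positive/negative split on the next slide.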

Page 16: Exhaustive Signature Algorithm

ESA – details

• Observation: anti-correlated genes usually do not pass enrichment tests simultaneously.

• So apply ESA separately on positive and negative expression values.

• Also change ISA:
  – For positive run, test: score > threshold
  – For negative run, test: -score > threshold
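A short Python sketch of the sign-specific variant follows. The slides do not say how the matrix is split, so zeroing out entries of the opposite sign is an assumption here, and `split_by_sign` and `passes` are hypothetical helper names.

```python
def split_by_sign(expr):
    """Return (positive part, negative part) of an expression matrix.

    Assumption: entries of the opposite sign are replaced by 0.0.
    """
    positive = [[v if v > 0 else 0.0 for v in row] for row in expr]
    negative = [[v if v < 0 else 0.0 for v in row] for row in expr]
    return positive, negative


def passes(score, threshold, sign):
    """One-sided threshold test: sign=+1 for the positive run, -1 for the negative run."""
    return sign * score > threshold
```

With `sign=-1` a strongly negative score passes while a strongly positive one does not, so each run picks up only one direction of regulation.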

Page 17: Exhaustive Signature Algorithm

ESA - experiments

• Apply the algorithms: SAMBA, Bimax, ISA, ESA and ESANP (negative and positive values separately).
• Datasets:
  – Gasch 2001 (yeast heat shock)
  – Whitfield 2002 (human cell cycle)

• Evaluation: GO, TF and KEGG enrichment tests
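The slides do not state which enrichment statistic was used. A common choice for GO, TF and KEGG enrichment is the hypergeometric tail test, sketched below in Python as an assumption; the function and parameter names are illustrative, and base 10 for the logarithm is also assumed.

```python
from math import comb, log10


def enrichment_p_value(n_total, n_annotated, n_bicluster, n_hits):
    """Hypergeometric right tail: P(at least n_hits annotated genes in the bicluster).

    n_total      genes in the background
    n_annotated  background genes carrying the annotation (term, TF, pathway)
    n_bicluster  genes in the bicluster
    n_hits       annotated genes observed in the bicluster
    """
    upper = min(n_annotated, n_bicluster)
    tail = sum(
        comb(n_annotated, i) * comb(n_total - n_annotated, n_bicluster - i)
        for i in range(n_hits, upper + 1)
    )
    return tail / comb(n_total, n_bicluster)


def minus_log_p(p):
    """Transform a p-value to the -log scale used for plotting (base 10 assumed)."""
    return -log10(p)
```

For example, if all 5 genes of a bicluster carry an annotation shared by 5 of 10 background genes, the p-value is 1/252 and the plotted value is about 2.4.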

Page 18: Exhaustive Signature Algorithm

Results – Yeast, GO

[Plot. Axes: -log(pval), #Terms. Series: ESANP, SAMBA, ISA, Bimax, ESA.]

Page 19: Exhaustive Signature Algorithm

Results – Yeast, TF

[Plot. Axes: -log(pval), #TFs. Series: ESANP, SAMBA, ISA, Bimax, ESA.]

Page 20: Exhaustive Signature Algorithm

Results – Yeast, KEGG

[Plot. Axes: -log(pval), #PWs. Series: ESANP, SAMBA, ISA, Bimax, ESA.]

Page 21: Exhaustive Signature Algorithm

Results – Human, GO

[Plot. Axes: -log(pval), #Terms. Series: ESANP, SAMBA, ISA, Bimax, ESA.]

Page 22: Exhaustive Signature Algorithm

Results – Human, KEGG

[Plot. Axes: -log(pval), #PWs. Series: ESANP, SAMBA, ISA, Bimax, ESA.]

Page 23: Exhaustive Signature Algorithm

Conclusions

• ESA exploits both Bimax’s power and ISA’s accuracy.

• ESA avoids ISA’s parameter selection.
• ESA avoids ISA’s seed generation.
• ESA reduces #biclusters from Bimax.
• ESA shows good results on real data.

Page 24: Exhaustive Signature Algorithm

Future work

• Test the algorithm on other datasets.
• Initiate binarization parameter automatically.
• Evaluate results with other criteria.
• Avoid bias towards large biclusters.