Introduction to Bioinformatics

Upload: baker-orr

Post on 30-Dec-2015


LECTURE 9: Clustering gene expression

* Chapter 9: The genomics of wine-making


9.1 Chateau Hajji Feruz Tepe

* Wine making dates back to at least 5000 BC, based on archaeological finds at Hajji Feruz Tepe in Iran.

Overview of Neolithic houses at Hajji Feruz Tepe that yielded six wine jars in the floor along one wall of the room.


One of six jars once filled with wine from the Neolithic residence at Hajji Feruz Tepe (Iran).

Chemical analysis of patches of a reddish residue covering the interior of this vessel showed that this originally was resinated wine.


* Recipe for wine making:

1. fruit juice (or other sugar-rich liquid)

2. yeast: Saccharomyces cerevisiae


Yeast (Saccharomyces cerevisiae) is a unicellular fungus found naturally on grapevines. It is responsible for wine-making: it ferments sugars and produces alcohol.


From being budded off from its parent cell, to reproducing its own offspring, each yeast cell goes through a number of typical steps that also involve changes in gene expression, turning whole pathways on and off.


Remember, a gene is an on-off switch, and RNA and proteins are messengers between the genes.

If a gene is ‘on’ the gene is ‘expressed’. The degree to which the gene is expressed is called the expression level of the gene.

If a gene is off, it can be said that it has expression level zero.


Today the study of such phenomena is possible through microarray technology, which can measure the expression level of every gene in a cell.

With the gene expression data, genes can be clustered on the basis of the similarity of their expression profiles.


* With water, sugar and flour, yeast ferments the sugars in the dough and produces carbon dioxide CO2 (this causes the dough to rise). In this process it produces alcohol as a by-product (originally perhaps as near-toxic protection!).

* When the sugar supply is exhausted S. cerevisiae must find a new source of energy: when oxygen is available it shifts to respiration: alcohol now becomes the source of energy.

* This state change is called the diauxic shift


* S. cerevisiae is one of the most studied organisms in biology.

* S. cerevisiae is a complex unicellular eukaryote.

* 12.5 Mbp genome in 16 linear chromosomes (not counting the mitochondrial genome), containing 6400 genes (2000 more than E. coli).


* S. cerevisiae can be regarded as a complex factory transforming many raw materials to final materials, involving many ‘conveyor belts’ between the genes

* Such a conveyor belt of coupled expressed genes is called a genetic pathway

* The diauxic shift means that the whole system has to be transformed from the old process to the new one: entirely new pathways are switched on, and old pathways are shut off.


* Therefore it is useful to monitor the genome-wide expression of S. cerevisiae over time, including the diauxic shift.

* This monitoring can be done with microarrays, which are among the most important tools in bioinformatics.

* Other dynamical processes, such as the cell cycle, can also be studied with microarrays.

* This requires the data analysis of the microarrays – here we study the clustering of expression profiles: time series of expression levels.


9.2 Monitoring cellular communication

* Purpose of microarrays: snap-shot of the expression levels in the cell.

* Expressed gene = DNA → mRNA → proteins ….

* Expressed genes therefore give rise to high numbers of mRNA molecules in the cell.

* Idea of microarrays: measure the concentrations of mRNA, and reverse-compute the DNA belonging to this mRNA.

* Because mRNA has been spliced (the introns are removed and the exons joined), the back-computed DNA is not identical to the genomic DNA. It is called cDNA: complementary DNA.


* The cDNA computed from mRNA hints at an expressed gene; the cDNA is stored as an EST: Expressed Sequence Tag.

* EST sequencing can identify genes that are ‘missed’ with ab initio gene-finding methods, such as ORF-finder.


9.3 Microarray technologies

* A microarray is an array of sensitive spots, each containing a stretch of DNA, e.g. based on an EST

* Hybridization (=chemical binding) of the DNA with components in the substrate indicates the presence of the associated mRNA

* The hybridization can be made visible by attaching fluorescent molecules (red, green) to the DNA and later illuminating them with a suitable laser.


Until recently we lacked tools to observe genome-wide expression

1989 saw the introduction of the microarray technique by Stephen Fodor

Only in 1992 did this technique become generally available, and it was still very costly.

Stephen Fodor, developer of the microarray.


Example of an Affymetrix microarray simulation. (a) Simulated single-channel oligonucleotide microarray slide image (crop from the top left corner). An Affymetrix .cel file was used as the ground-truth data, so the text about the slide type is observable. (b) Real Affymetrix slide image, shown for comparison.


9.4 The diauxic shift and yeast gene expression

* In 1997 DeRisi et al. used microarrays to measure the genome-wide expression of S. cerevisiae during the diauxic shift.

* 9 initial hours of growth, 6 hours before the diauxic shift, and 6 hours thereafter.

* They measured the mRNAs on the array at t time steps before the diauxic shift, and compared those with the mRNA levels at time 0.


* This experiment gave a set of about 43,000 ratios: seven time points (t1, t2, …, t7) of 6400 gene expression levels, each normalized to its start value.

* This is the reference design in microarray literature
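The normalization to the start value can be sketched as follows. This is a minimal illustration, not the procedure used in the original experiment; the gene names and numbers are made up, and the layout (start value first, then seven time points) is an assumption.

```python
def ratios_to_start(levels):
    """Divide each later expression level by the start value (first entry)."""
    start = levels[0]
    if start == 0:
        raise ValueError("start value is zero; the ratio is undefined")
    return [x / start for x in levels[1:]]

# Hypothetical raw levels: the start value followed by seven time points.
raw = {
    "geneA": [2.0, 2.0, 2.2, 2.4, 3.0, 4.0, 6.0, 8.0],
    "geneB": [5.0, 5.0, 4.5, 4.0, 2.5, 2.0, 1.25, 1.0],
}

# One row of seven ratios per gene.
ratios = {gene: ratios_to_start(levels) for gene, levels in raw.items()}
```

With n genes this yields an n × 7 table of ratios, which is the input for the analysis steps that follow.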


* This kind of experiment typically provides a time series that is small relative to the size of the genome; here m = 7 time points for n = 6400 genes.

* This is due to the cost of an array: about 1000 euro per array.

* With this kind of experiment we can in principle also reconstruct the gene regulatory networks


9.4.1 Data Description

* First analyse the relative change in activity.

* Fewer than 5% of the genes change by more than 1.5-fold up (ratio > 1.5) or down (ratio < 0.67).

* Fold-change: f = new_value/old_value; if f > 1 the fold-change is f, if f < 1 the fold-change is −1/f.

* Example: x0 = 1, x1 = 0.3333 gives fold-change −3; x0 = 1, x1 = 3 gives fold-change +3.
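The fold-change rule above can be written as a small function. This is a sketch; the function name is hypothetical, and treating f = 1 as a fold-change of 1 is an assumption, since the slide only covers f > 1 and f < 1.

```python
def fold_change(old_value, new_value):
    """Signed fold-change: f if f >= 1, otherwise -1/f, with f = new/old."""
    if old_value <= 0 or new_value <= 0:
        raise ValueError("expression values must be positive")
    f = new_value / old_value
    return f if f >= 1 else -1.0 / f

up = fold_change(1.0, 3.0)       # threefold increase
down = fold_change(1.0, 0.3333)  # threefold decrease, up to rounding of 0.3333
```

The sign only encodes the direction of the change, so a threefold increase and a threefold decrease have the same magnitude.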


* Now select only those genes with an absolute fold-change above a certain threshold:

abs(fold-change) > threshold
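A minimal sketch of this selection step, assuming each gene has a series of ratios (new value divided by start value) and that a gene is kept if it exceeds the threshold at any time point. The slide does not say whether "any" or "all" time points are meant, and the data are made up.

```python
def signed_fold(ratio):
    """Convert a ratio into a signed fold-change (f, or -1/f when f < 1)."""
    return ratio if ratio >= 1 else -1.0 / ratio

def select_genes(ratios, threshold):
    """Keep genes whose largest absolute fold-change exceeds the threshold."""
    selected = {}
    for gene, series in ratios.items():
        if max(abs(signed_fold(r)) for r in series) > threshold:
            selected[gene] = series
    return selected

# Hypothetical ratio series for three genes.
ratios = {
    "geneA": [1.0, 1.1, 1.2, 2.0, 3.0],    # rises threefold
    "geneB": [1.0, 0.9, 1.05, 1.1, 0.95],  # stays close to 1
    "geneC": [1.0, 0.8, 0.5, 0.4, 0.25],   # falls fourfold
}
kept = select_genes(ratios, threshold=1.5)
print(sorted(kept))  # ['geneA', 'geneC']
```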


9.4.2 Data Clustering

* Next, cluster the genes according to their expression profiles.

* Aim for high intra-cluster similarity and low inter-cluster similarity.

* This requires a distance/similarity measure and a clustering algorithm.


1. Define a suitable distance measure d(x1, x2), e.g. one based on Pearson’s correlation coefficient r (such as 1 − r), a normalized distance like the Mahalanobis distance, or a metric like the generalized p-norm.

2. Define a clustering criterion, e.g.:

$C = \sum_{i,j \text{ in same cluster}} d_{ij} - \sum_{i,j \text{ in different clusters}} d_{ij}$

3. Apply a suitable clustering algorithm, e.g. hierarchical or K-means clustering.
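Steps 1 and 2 can be sketched in plain Python. Two assumptions are made here that the slides do not state: the correlation-based distance is taken as 1 − r, and the sums in the criterion run over unordered pairs (i < j). The example profiles are made up.

```python
from itertools import combinations
from math import sqrt

def pearson_distance(x, y):
    """1 - Pearson correlation: 0 for perfectly correlated profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("constant profile: correlation is undefined")
    return 1.0 - cov / (sx * sy)

def criterion(profiles, labels, dist=pearson_distance):
    """C = sum of within-cluster distances minus sum of between-cluster distances."""
    within = between = 0.0
    for i, j in combinations(range(len(profiles)), 2):
        d = dist(profiles[i], profiles[j])
        if labels[i] == labels[j]:
            within += d
        else:
            between += d
    return within - between

profiles = [
    [1.0, 2.0, 3.0, 4.0],  # rising
    [2.0, 4.0, 6.0, 8.5],  # rising
    [4.0, 3.0, 2.0, 1.0],  # falling
    [8.0, 6.5, 4.0, 2.0],  # falling
]
good = criterion(profiles, [0, 0, 1, 1])  # rising and falling genes separated
bad = criterion(profiles, [0, 1, 0, 1])   # rising and falling genes mixed
print(good < bad)  # True
```

With a distance, a lower C means tighter clusters that lie further apart, so the assignment that separates rising from falling profiles scores lower.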


Hierarchical clustering


K-means clustering
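The slide itself shows only a figure here, so the following is a generic sketch of the standard K-means (Lloyd) iteration, not the lecture's own implementation. Using squared Euclidean distance and taking the first k profiles as initial centroids are assumptions made to keep the example deterministic; the data are made up.

```python
def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def mean_vector(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def k_means(points, k, max_iter=100):
    """Alternate between assigning points to the nearest centroid and
    recomputing each centroid as the mean of its members."""
    centroids = [list(p) for p in points[:k]]  # assumption: first k points
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = mean_vector(members)
    return labels, centroids

# Hypothetical expression profiles: two low genes and two high genes.
profiles = [
    [1.0, 1.1, 0.9],
    [3.0, 3.2, 2.9],
    [1.2, 0.9, 1.0],
    [2.8, 3.1, 3.0],
]
labels, centroids = k_means(profiles, k=2)
print(labels)  # [0, 1, 0, 1]
```

In practice the result depends on the initial centroids, so K-means is usually run several times from different starting points.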


Gene function and Clustering

1. Genes with similar expression profiles tend to have similar functions.


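Distances between two clusters A and B, where the $x_i$ are the profiles in A, the $y_j$ the profiles in B, and $m_A$, $m_B$ the cluster means:

1. Single linkage: $d_{AB} = \min_{i,j} \lVert x_i - y_j \rVert$.

2. Average linkage: $d_{AB} = \operatorname{mean}_{i,j} \lVert x_i - y_j \rVert$.

3. Centroid distance: $d_{AB} = \lVert m_A - m_B \rVert$.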
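The three cluster distances, and their use in a simple bottom-up (agglomerative) clustering, can be sketched as follows. The function names are hypothetical, the Euclidean norm is assumed, and stopping at a fixed number of clusters stands in for choosing a cut-off level in the tree.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)

def single_linkage(A, B):
    return min(dist(x, y) for x in A for y in B)

def average_linkage(A, B):
    return sum(dist(x, y) for x in A for y in B) / (len(A) * len(B))

def centroid_distance(A, B):
    mean = lambda pts: [sum(col) / len(pts) for col in zip(*pts)]
    return dist(mean(A), mean(B))

def agglomerate(points, n_clusters, linkage=average_linkage):
    """Start with one cluster per point and repeatedly merge the two
    closest clusters until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

A = [[0.0, 0.0], [0.0, 2.0]]
B = [[3.0, 0.0], [3.0, 2.0]]
print(single_linkage(A, B), centroid_distance(A, B))  # 3.0 3.0
```

Calling `agglomerate(A + B, 2)` on the four points recovers the two groups A and B. Recording the order of the merges would give the tree used for visualisation.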


9.4.3 Data Visualisation

* In a tree (dendrogram), using hierarchical clustering.

* In a plane, using multidimensional scaling (MDS).


2. Multidimensional scaling


1. Hierarchical clustering: level of cut-off


Pre-processing

* Select only genes with ‘enough’ fold-change

* Delete missing values
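One possible reading of the second pre-processing step is to drop every gene whose time series has a gap. This is an assumption: the slide does not say whether whole genes are deleted or only single values, and the data below are made up.

```python
from math import isnan

def is_missing(value):
    """Treat None and NaN as missing measurements."""
    return value is None or (isinstance(value, float) and isnan(value))

def drop_incomplete(ratios):
    """Remove every gene whose time series contains a missing value."""
    return {
        gene: series
        for gene, series in ratios.items()
        if not any(is_missing(v) for v in series)
    }

# Hypothetical ratio series; two of them have a gap.
ratios = {
    "geneA": [1.0, 1.4, 2.1],
    "geneB": [1.0, None, 0.6],
    "geneC": [1.0, float("nan"), 0.9],
}
complete = drop_incomplete(ratios)
print(sorted(complete))  # ['geneA']
```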


Heatmap (horizontal axis: time steps →; vertical axis: gene in hierarchical cluster →)


9.5 CASE STUDY: Cell-cycle regulated genes

* A set of microarrays over the cell-cycle of yeast.


From being budded off from its parent cell to producing its own offspring, each yeast cell goes through a number of typical steps that also involve changes in gene expression, turning whole pathways on and off.


Here we examine the expression of the entire yeast genome through two rounds of the cell cycle.

The temporal expression of the genes is measured by microarray at 24 time points, one every five hours. In detail, we have the expression profiles of about 6400 genes.


END of LECTURE 9