Post on 22-Dec-2015

Chapter 8: Biological Knowledge Assembly and Interpretation

Ju Han Kim, Division of Biomedical Informatics, Seoul National University College of Medicine, Seoul, Korea

Presenter: Zhen Gao

Outline

Review of major computational approaches to facilitate biological interpretation of high-throughput microarray and RNA-Seq experiments.

Input: Microarray / RNA-Seq

DEG: Differentially Expressed Genes

Co-expression / clustering

Gene Set-Wise Differential Expression Analysis

Differential Co-Expression Analysis

Gene of interest, gene list, gene pair or gene list pair

FAA: Functional Annotation Analysis: Gene Ontology (GO) or pathway analysis

Gene list with annotations

Visualization, semantic assembly and knowledge learning: concept lattice analysis (BioLattice)

Glossary

FAA: Functional Annotation Analysis
GO: Gene Ontology
Pathway
DEG: Differentially Expressed Genes
GSEA: Gene Set Enrichment Analysis
Biological Interpretation and Biological Semantics
Concept lattice analysis

Pathway and Ontology-Based Analysis

GO and biological pathway-based analysis is one of the most powerful methods for inferring the biological meaning of expression changes. It is applied to lists of genes obtained by:

differential expression analysis
co-expression analysis (or clustering)

Attributes that can be used for FAA:

transcription factor binding
clinical phenotypes such as disease associations
MeSH (Medical Subject Headings) terms
microRNA binding sites
protein family memberships
chromosomal bands, etc.
GO terms
biological pathways

Features may have their own ontological structures

GO is structured as a DAG (Directed Acyclic Graph)

DEGs: three techniques that help obtain DEGs:

t-test
Wilcoxon's rank sum test
ANOVA

Note that the multiple-hypothesis-testing problem must be properly managed, because one test is performed per gene.
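The multiple-testing note can be made concrete with a small sketch. It assumes the Benjamini-Hochberg false discovery rate procedure as the correction (the slide does not name one), and the gene names and p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values, e.g. from a t-test between two conditions.
gene_p = {"geneA": 0.001, "geneB": 0.008, "geneC": 0.039,
          "geneD": 0.041, "geneE": 0.60}
q = dict(zip(gene_p, benjamini_hochberg(list(gene_p.values()))))
degs = [gene for gene, q_value in q.items() if q_value < 0.05]
```

With these made-up values, four genes pass an unadjusted p < 0.05 cut-off, but only geneA and geneB remain after adjustment.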

Co-expression analysis

puts similar expression profiles together and different ones apart
returns genes that are assumed to be co-regulated

Clustering algorithms:

hierarchical-tree clustering
partitional clustering
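A minimal sketch of grouping similar expression profiles, under stated assumptions: distance is taken as 1 minus the Pearson correlation, and clusters are obtained by cutting a single-linkage hierarchy at a fixed distance (equivalent to linking every pair closer than the cut-off). The profiles, the cut-off and the function names are hypothetical, and every profile is assumed to have non-zero variance.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def single_linkage_clusters(profiles, max_distance):
    """Cut a single-linkage tree at max_distance (distance = 1 - correlation)."""
    genes = list(profiles)
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the cluster representative.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if 1 - pearson(profiles[a], profiles[b]) <= max_distance:
                parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for g in genes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical profiles over four samples.
profiles = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8.5],
    "g3": [4, 3, 2, 1],
    "g4": [8, 6, 4.5, 2],
}
clusters = single_linkage_clusters(profiles, max_distance=0.1)
```

Here g1 and g2 rise together while g3 and g4 fall together, so two clusters are returned.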

Pathways are powerful resources for the understanding of shared biological processes

E.g.: KEGG, MetaCyc and BioCarta (signaling pathways)

MetaCyc:

an experimentally determined non-redundant metabolic pathway database

It is the largest collection containing over 1400 metabolic pathways

Ontology / GO:

provides a shared understanding of a certain domain of information
controlled vocabularies

DAG structure with the three vocabularies of GO:

Molecular Function (MF)
Cellular Component (CC)
Biological Process (BP)

Other common ontologies and annotation resources:

MIPS: integrated source, protein properties, variety of complete genomes
MeSH: clinical terms, including disease names
OMIM (Online Mendelian Inheritance in Man)
UMLS (Unified Medical Language System)

GO enrichment test: for example,

if 20% of the genes in a gene list are annotated with the GO term 'apoptosis'

while only 1% of the genes in the whole human genome fall into this functional category,

then the term is over-represented (enriched) in the gene list.

Common statistical tests:

Chi-square test
binomial test
hypergeometric test

hypergeometric test:
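The slide's own formula is not reproduced in this transcript. A common formulation of the one-sided test, with symbols chosen here as assumptions, is: given N background genes of which K carry the annotation, and a gene list of size n of which k carry it, the enrichment p-value is P(X >= k) = sum over i from k to min(K, n) of C(K, i) C(N - K, n - i) / C(N, n). A minimal sketch:

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """One-sided p-value P(X >= k) for over-representation.

    N: genes in the background; K: background genes with the annotation;
    n: genes in the list of interest; k: list genes with the annotation.
    """
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Hypothetical numbers in the spirit of the 'apoptosis' example:
# 1% of a 20,000-gene background is annotated, versus 20 of 100 list genes.
p_value = hypergeom_enrichment_p(N=20000, K=200, n=100, k=20)
```

The background size N is an explicit argument on purpose: changing it changes the p-value.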

Pitfalls to avoid when using the hypergeometric test

Choice of background, which has a substantial impact on the result:

all genes having at least one GO annotation
all genes ever known in genome databases
all genes on the microarray

GO has a hierarchical tree (or graph) structure, while the hypergeometric test assumes independence of categories

Common Tools

DAVID, ArrayXPath, Pathway Miner, EASE, GOFish, GOTree, etc.

Gene Set-Wise Differential Expression Analysis

Evaluates coordinated differential expression of gene groups

Gene Set Enrichment Analysis (GSEA):

the first method developed in this category
evaluates, for each pre-defined gene set, the significance of its association with phenotypic classes

Difference between FAA and GSEA:

FAA: find over-represented GO terms in an interesting gene list

GSEA: start from pre-defined gene lists and test how they change under different conditions.

Advantages of gene set-wise differential expression analysis:

it has successfully identified modest but coordinated changes in gene expression that might have been missed by conventional 'individual gene-wise' differential expression analysis (many tiny expression changes can collectively create a big change)

straightforward biological interpretation, because the gene sets are defined by biological knowledge

Enrichment Score (ES) is calculated by evaluating the fractions of genes in S ("hits"), weighted by their correlation, and the fractions of genes not in S ("misses") present up to a given position i in the ranked gene list L, where the N genes are ordered according to their correlation with the phenotype.
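The running-sum calculation described here can be sketched as follows, assuming the commonly used weighted form in which each hit adds |r|^p divided by the sum of |r|^p over S, each miss subtracts 1 / (N - |S|), and ES is the largest deviation of the running sum from zero. The exponent p, the function name and the example data are assumptions for illustration, and the gene set is assumed to contain at least one hit and leave at least one miss.

```python
def enrichment_score(ranked_genes, correlations, gene_set, p=1.0):
    """Maximum deviation from zero of the weighted hit/miss running sum."""
    in_set = [gene in gene_set for gene in ranked_genes]
    # Normalisers: total hit weight and number of genes outside the set.
    hit_weight = sum(abs(r) ** p for r, hit in zip(correlations, in_set) if hit)
    n_miss = len(ranked_genes) - sum(in_set)

    running, es = 0.0, 0.0
    for r, hit in zip(correlations, in_set):
        if hit:
            running += abs(r) ** p / hit_weight
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(es):
            es = running
    return es


# Hypothetical list, already ranked by correlation with the phenotype.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
corr = [0.9, 0.7, 0.3, -0.1, -0.5, -0.8]
es = enrichment_score(ranked, corr, gene_set={"g1", "g2", "g4"})
```

In this toy case the set members sit near the top of the list, so the running sum peaks early and ES is about 0.94. Significance is normally assessed by permutation, which is not shown.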

Typical gene sets:

regulatory-motif sets
function-related sets
disease-related sets

Database: MSigDB:

6769 gene sets classified into five different collections
has some interesting extensions

Differential Co-Expression Analysis

Co-expression analysis:

determines the degree of co-expression of a cluster of genes under a certain condition

Differential co-expression analysis:

determines the degree of co-expression difference of a gene pair or a gene cluster across different conditions

3 major types:

(a) differential co-expression of gene cluster(s)
(b) gene pair-wise differential co-expression
(c) differential co-expression of paired gene sets

Type (a): identify differentially co-expressed gene cluster(s) between two conditions

Let conditions and genes be denoted by J and I, respectively. The mean squared residual of the model is a measure of the co-expression of the genes:
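The equation itself is not reproduced in this transcript. A minimal sketch, assuming the standard additive-model form of the mean squared residual, H(I, J) = 1 / (|I||J|) times the sum over i in I and j in J of (a_ij - a_iJ - a_Ij + a_IJ)^2, where a_iJ, a_Ij and a_IJ are the row, column and overall means. The function name and example matrices are hypothetical.

```python
def mean_squared_residual(matrix):
    """Mean squared residual of a genes-by-conditions submatrix.

    Rows are the genes in I, columns are the conditions in J.
    A value near zero means the genes vary coherently across the conditions.
    """
    n_genes, n_conds = len(matrix), len(matrix[0])
    row_means = [sum(row) / n_conds for row in matrix]
    col_means = [sum(row[j] for row in matrix) / n_genes
                 for j in range(n_conds)]
    overall = sum(row_means) / n_genes

    total = 0.0
    for i, row in enumerate(matrix):
        for j, a in enumerate(row):
            total += (a - row_means[i] - col_means[j] + overall) ** 2
    return total / (n_genes * n_conds)


# Hypothetical data: genes that shift together give a residual of about 0,
# while genes moving in opposite directions give a larger residual.
coherent = mean_squared_residual([[1, 2, 3], [2, 3, 4], [4, 5, 6]])
incoherent = mean_squared_residual([[1, 2, 3], [3, 2, 1]])
```

Comparing this score for the same gene cluster under two conditions is one way to express a difference in co-expression between them.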

Type (b): identify differentially co-expressed gene pairs

Techniques:

F-statistic
a meta-analytic approach
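As a hedged illustration of testing whether one gene pair is co-expressed differently in two conditions, the sketch below compares the two Pearson correlations after Fisher's z-transformation. This is a swapped-in, widely used technique, not the F-statistic or the meta-analytic method named on the slide. The function name and inputs are hypothetical, and it assumes |r| < 1 and more than three samples per condition.

```python
from math import atanh, erfc, sqrt


def fisher_z_difference(r1, n1, r2, n2):
    """Compare two correlations of the same gene pair across two conditions.

    r1, r2: Pearson correlations in condition 1 and condition 2.
    n1, n2: number of samples in each condition.
    Returns the z statistic and a two-sided p-value (normal approximation).
    """
    standard_error = sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (atanh(r1) - atanh(r2)) / standard_error
    p_value = erfc(abs(z) / sqrt(2.0))
    return z, p_value


# Hypothetical gene pair: strongly co-expressed in one condition only.
z, p_value = fisher_z_difference(r1=0.9, n1=30, r2=-0.2, n2=30)
```

Applied to every gene pair, such a test produces many p-values, so the multiple-testing caveat applies here as well.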

Note that the identification of differentially co-expressed gene clusters or gene pairs usually does not use pre-defined gene sets or pairs.

Thus the interpretation may also be improved by ontology and pathway-based annotation analysis.

Type (c): the dCoxS (differential co-expression of gene sets) algorithm identifies gene set pairs that are differentially co-expressed across different conditions

Biological pathways can be used as pre-defined gene sets, and the differential co-expression of the biological pathway pairs between conditions is analyzed.

To measure the expression similarity between paired gene sets under the same condition, dCoxS defines the interaction score (IS) as the correlation coefficient between the sample-wise entropies. Because IS uses only sample-wise distances, it can be obtained even when the two pathways contain different numbers of genes.
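The idea of correlating sample-wise distances can be sketched as below. This is not the dCoxS implementation: plain Euclidean distance between samples is used as a simple stand-in for its entropy-based measure, and the function names and data are hypothetical. Each gene set is given as a list of gene rows, one value per sample, and sample-pair distances are assumed not to be all identical.

```python
from itertools import combinations
from math import sqrt


def sample_distances(gene_set_expr):
    """Distance between every pair of samples, using only this gene set."""
    n_samples = len(gene_set_expr[0])
    return [sqrt(sum((gene[a] - gene[b]) ** 2 for gene in gene_set_expr))
            for a, b in combinations(range(n_samples), 2)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def interaction_score(set1_expr, set2_expr):
    """Correlation between the sample-pair distances of two gene sets."""
    return pearson(sample_distances(set1_expr), sample_distances(set2_expr))


# Hypothetical pathways with 2 and 3 genes, measured on the same 4 samples.
pathway_a = [[1, 2, 3, 4], [2, 2, 3, 5]]
pathway_b = [[1, 1, 2, 3], [0, 1, 1, 2], [5, 5, 6, 8]]
score = interaction_score(pathway_a, pathway_b)
```

Both distance vectors have one entry per sample pair, which is why the two pathways may differ in size. The score would then be computed per condition and compared across conditions.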

Biological Interpretation and Biological Semantics

Biomedical semantics provides rich descriptions for biomedical domain knowledge.

Motivation for Biological Semantics: GO has limitations:

The result of a GO analysis is typically a long unordered list of annotations
Most of the analysis tools evaluate only one cluster at a time
It is time-consuming to read the massive annotation lists
The lists are hard to assemble manually
Many annotations are redundant

Introducing BioLattice:

a mathematical framework based on concept lattice analysis
organizes traditional clusters and associated annotations into a lattice of concepts
provides a graphical summary
considers gene expression clusters as objects and annotations as attributes

Thus, complex relations among clusters and annotations are clarified, ordered and visualized.
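A minimal sketch of the underlying idea, treating clusters as objects and annotations as attributes. It enumerates formal concepts by brute force over subsets of clusters, which is only practical for tiny inputs and is not the Ganter algorithm or the BioLattice software. The cluster names and annotations are hypothetical.

```python
from itertools import combinations


def formal_concepts(context):
    """Return all (extent, intent) pairs of a small formal context.

    context maps each object (cluster) to its set of attributes (annotations).
    """
    objects = list(context)
    all_attributes = set().union(*context.values())
    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            # Intent: annotations shared by every cluster in the subset.
            intent = set(all_attributes)
            for obj in subset:
                intent &= context[obj]
            # Extent: every cluster that carries all annotations in the intent.
            extent = frozenset(o for o in objects if intent <= context[o])
            concepts.add((extent, frozenset(intent)))
    return concepts


# Hypothetical clusters with their enriched annotations.
context = {
    "cluster1": {"apoptosis", "cell cycle"},
    "cluster2": {"apoptosis", "DNA repair"},
    "cluster3": {"cell cycle"},
}
concepts = formal_concepts(context)
```

Each concept pairs a maximal group of clusters with the annotations they all share, for example cluster1 and cluster2 with 'apoptosis'. Ordering the concepts by inclusion of their extents gives the lattice.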

Another advantage of BioLattice is that heterogeneous biological knowledge resources can be added

Tool to construct BioLattice:

The Ganter algorithm
http://www.snubi.org/software/biolattice/

Conclusion

Review of major computational approaches to facilitate biological interpretation of high-throughput microarray and RNA-Seq experiments.
