Microarray Gene Expression Data Analysis A.Venkatesh CBBL Functional Genomics Chapter: 07

Post on 24-Dec-2015

Microarray Gene Expression Data Analysis

A. Venkatesh, CBBL

Functional Genomics

Chapter: 07

7.1 Introduction

7.2 Normalization of Microarray Gene Expression Data

7.3 Data Analysis

7.4 Identification of Differentially Expressed Genes

7.5 Identification of Co-expressed Genes

7.6 Application for Pathway Inference

7.7 Summary

Contents

Microarray technology provides a powerful tool that enables researchers to observe the mRNA expression levels of thousands of genes simultaneously.

Expression data show directly which genes are differentially expressed under a particular experimental condition, thus indicating possible functional connections between the inputs and certain components of the gene network.

The gene network may include protein-protein interactions identified using the two-hybrid system.

Introduction

Linear models have often been used to model multiplicative errors.

One simple technique for revealing systematic errors is to visualize the data in a two-dimensional (2D) scatter plot, in which signal intensities under two different conditions are plotted on a plane with the two intensity values serving as the x- and y-coordinates.

Normalization of Microarray Gene Expression Data

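In a scatter plot of two comparable data sets, the points are expected to lie roughly symmetrically about the diagonal y = x. Deviation from this symmetry generally indicates systematic errors.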

The same can be said about the overall intensity values of the two channels from the same microarray slide.

To “correct” systematic errors, one generally needs to model the relationship between the correct data and the erroneous data.

These models could be either linear or nonlinear models, depending on the complexity of such a relationship.

Linear models have often been used to model multiplicative errors and additive errors.
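As a minimal sketch of such a linear model, assume the observed intensities y relate to the reference intensities x as y ≈ a·x + b, where a captures a multiplicative error and b an additive one. The function names and the toy intensities below are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def fit_linear_error(x, y):
    """Least-squares fit of y ~ a*x + b; returns (a, b)."""
    a, b = np.polyfit(x, y, deg=1)
    return a, b

def correct(y, a, b):
    """Invert the fitted model to map y back onto the scale of x."""
    return (y - b) / a

# Hypothetical data: channel y is channel x scaled by 1.5 and shifted by 20.
x = np.array([100.0, 200.0, 400.0, 800.0, 1600.0])
y = 1.5 * x + 20.0

a, b = fit_linear_error(x, y)
y_corrected = correct(y, a, b)  # back on the same scale as x
```

With real data the fit would normally use genes believed to be unchanged between the two conditions; which genes qualify is a choice this sketch does not address.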

The goal here is to compare a series of expression data for the same gene under different conditions or at different time points, and to decide how the gene’s expression levels change as a function of time or biological conditions.

“Artificially” adjusting the expression levels clearly will not make this analysis any easier.

One way to minimize the effects of error correction is to adjust the intensity values of both the x and y data sets simultaneously. For example, one can multiply all x intensities by √a and divide all y intensities by √a, where a is the multiplicative factor relating the two data sets.

In this way, error correction is achieved while the effects of incorrect adjustments are “minimized”.
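The symmetric adjustment can be sketched as follows, assuming a purely multiplicative error y ≈ a·x. Estimating a as the least-squares slope through the origin is an assumption made for this example; the slides do not say how a is obtained.

```python
import numpy as np

def symmetric_rescale(x, y):
    """Split a multiplicative correction evenly between both data sets:
    x is multiplied by sqrt(a) and y is divided by sqrt(a)."""
    a = np.sum(x * y) / np.sum(x * x)  # least-squares slope through the origin
    root = np.sqrt(a)
    return x * root, y / root, a

# Hypothetical data: the y channel reads four times brighter than x.
x = np.array([100.0, 200.0, 400.0, 800.0])
y = 4.0 * x

x_adj, y_adj, a = symmetric_rescale(x, y)
# Both channels now sit on a common scale, each moved by a factor of sqrt(a).
```

Because each data set moves by only √a rather than one moving by the full factor a, a wrong estimate of a distorts each channel less.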

There are many different approaches to data transformation, among which the most commonly used in the microarray field is to take the logarithm of the expression data, mainly for the following reasons.

First, the variation of log-transformed intensities and log-transformed ratios of intensities is less dependent on absolute magnitude. Log transformation could equalize variability in the wildly varying raw microarray data.

Second, log transformation could even out highly skewed distributions and thus bring the data closer to a normal distribution.
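A small numerical illustration of both points, using made-up intensities and base-2 logarithms (the base is an assumption; the slides do not specify one):

```python
import numpy as np

# Two hypothetical genes, each doubling between condition A and condition B.
cond_a = np.array([100.0, 10000.0])
cond_b = np.array([200.0, 20000.0])

raw_diff = cond_b - cond_a                      # 100 and 10000: tied to magnitude
log_ratio = np.log2(cond_b) - np.log2(cond_a)   # about 1 for both genes

# A strongly right-skewed set of raw intensities: 1, 2, 4, ..., 1024.
raw = 2.0 ** np.arange(11)
logged = np.log2(raw)                           # 0, 1, ..., 10

raw_gap = np.mean(raw) - np.median(raw)         # large: mean is pulled by the tail
log_gap = np.mean(logged) - np.median(logged)   # zero: symmetric after the transform
```

The same two-fold change appears as the same log ratio regardless of how bright the spot is, and the mean and median coincide once the skewed values are log-transformed.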

Data Analysis

Data Transformation

Principal component analysis (PCA) is a multivariate technique for examining relationships among a group of data points in Euclidean space.

It has been widely used in the analysis of gene expression data for various purposes, including the identification of outliers in a data set and the reduction of dimensionality.

The basic idea of principal component analysis is to find an orthogonal transformation of multidimensional data points into a k-dimensional space that maximizes the scattering (variance) of the projections of the data points in the new space.
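One common way to compute such a projection is through the singular value decomposition of the mean-centered data. The sketch below takes that route; the function name `pca` and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def pca(data, k):
    """Project the rows of `data` (samples x features) onto the top-k
    principal components."""
    centered = data - data.mean(axis=0)
    # Rows of vt are orthonormal directions ordered by decreasing variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:k]
    projected = centered @ components.T
    explained = (s[:k] ** 2) / np.sum(s ** 2)  # fraction of total variance
    return projected, components, explained

# Hypothetical 3-D points that vary almost entirely along one direction.
rng = np.random.default_rng(0)
t = rng.normal(size=50)
data = np.column_stack([t, 2 * t, -t]) + 0.01 * rng.normal(size=(50, 3))

projected, components, explained = pca(data, k=2)
```

Here nearly all of the variance falls on the first component, which is the situation in which reducing dimensionality loses little information.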

Principal Component Analysis

Identification of Differentially Expressed Genes

When analyzing microarray data, one needs to understand what contributes to the observed data.

Schematic of expression profiles of three genes (represented by square, closed circle, and diamond symbols) under six different conditions or over six time points.

• The basic idea of data clustering is to partition a data set into nonoverlapping subsets (or clusters) such that data points of the same cluster are “highly” correlated, whereas data points from different clusters are not.

• Four classes of genes with distinct gene expression profiles for each class of genes.
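The slides do not define "highly correlated"; the Pearson correlation coefficient is one common choice and is assumed in this sketch. The three profiles are made-up values over six conditions.

```python
import numpy as np

# Hypothetical expression profiles of three genes over six conditions.
profiles = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],     # gene A: steadily increasing
    [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],   # gene B: same shape as A, larger scale
    [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],     # gene C: steadily decreasing
])

# corr[i, j] is the Pearson correlation between the profiles of gene i and gene j.
corr = np.corrcoef(profiles)
```

Genes A and B correlate perfectly despite their different scales, so a correlation-based clustering would group them together and keep gene C apart.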

Basics of Gene Expression Data Clustering

Schematic of two types of data clustering problems. The x- and y-axes represent expression levels at two different time points (or under two different conditions).

(a) Data set with two apparent clusters. (b) Three data clusters in a noisy background.

There are a number of popular clustering techniques for gene expression data.

They include K-means clustering, hierarchical clustering and self-organizing maps.

The following is a list of a few available computer software programs for gene expression and other data clustering, based on (or including) the K-means algorithm.

1. GeneSpring
2. Spotfire
3. Expression Profiler
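For readers who want to see the K-means idea itself, here is a minimal sketch of the standard iteration (assign each point to its nearest center, then move each center to the mean of its points). It is not the implementation used by any of the programs listed above, and the function name, parameters, and data are assumptions.

```python
import numpy as np

def nearest_center(points, centers):
    """Index of the closest center (Euclidean distance) for every point."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest_center(points, centers)
        # Move each center to the mean of its points; keep it if it has none.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return nearest_center(points, centers), centers

# Hypothetical expression levels at two time points for six genes.
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centers = kmeans(points, k=2)
```

The number of clusters k must be chosen in advance, which is the main practical limitation of this approach.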

Clustering of Gene Expression Data

Hierarchical data clustering. (a) Set of data points in two-dimensional space. (b) Representation of a clustering tree of the data set.

The objective of this clustering paradigm is to provide a hierarchical view of a clustering problem at different levels of resolution.

At the highest resolution, every data point forms a cluster by itself.

At the lowest resolution, the whole data set forms one cluster. In between, each cluster at a particular level is formed by merging the two closest clusters at a higher (resolution) level.
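The bottom-up merging described above can be sketched as follows. The slides do not say how the distance between two clusters is measured; single linkage (distance between the closest pair of members) is assumed here, and the function name and points are illustrative.

```python
import numpy as np

def agglomerate(points):
    """Single-linkage agglomerative clustering.
    Returns the merges in order as (cluster_a, cluster_b, distance), where
    each cluster is a tuple of point indices."""
    points = np.asarray(points, dtype=float)
    clusters = [(i,) for i in range(len(points))]  # highest resolution
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(np.linalg.norm(points[p] - points[q])
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges

points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 7.0]]
merges = agglomerate(points)
```

The list of merges is the clustering tree: reading it from the first entry to the last moves from the highest resolution (every point alone) to the lowest (one cluster).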

Self-organizing Maps

Self-organizing maps (SOMs) represent a class of neural networks often used for data clustering.

The SOM approach has a similar objective to that of a K-means approach.

It tries to identify a good representative for a group of nearby (similar) data points and to group data points around these representatives.
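A minimal sketch of this idea, assuming a one-dimensional grid of units, a Gaussian neighborhood, and linearly decaying learning rate and neighborhood width. All of these choices, along with the function names and the synthetic data, are assumptions made for the example.

```python
import numpy as np

def train_som(data, n_units=4, n_epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D self-organizing map; returns unit weights (n_units x dims)."""
    rng = np.random.default_rng(seed)
    # Initialise units evenly between the per-feature minimum and maximum.
    frac = np.linspace(0.0, 1.0, n_units)[:, None]
    weights = data.min(axis=0) + frac * (data.max(axis=0) - data.min(axis=0))
    grid = np.arange(n_units)
    for epoch in range(n_epochs):
        decay = 1.0 - epoch / n_epochs           # shrinks from 1 toward 0
        lr = lr0 * decay
        sigma = max(sigma0 * decay, 0.1)
        for x in data[rng.permutation(len(data))]:
            # Best-matching unit (BMU): the representative closest to x.
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
            # Units near the BMU on the grid are pulled toward x as well.
            h = np.exp(-((grid - bmu) ** 2) / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
    return weights

def assign(data, weights):
    """Index of the best-matching unit for each data point."""
    d = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return d.argmin(axis=1)

# Two hypothetical groups of genes with clearly different expression levels.
rng = np.random.default_rng(1)
group_a = rng.normal(loc=0.0, scale=0.5, size=(20, 2))
group_b = rng.normal(loc=10.0, scale=0.5, size=(20, 2))
data = np.vstack([group_a, group_b])

weights = train_som(data)
labels = assign(data, weights)
```

Unlike K-means, the representatives are tied together by the grid, so neighboring units end up representing similar data points.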

SOM

A great deal of information could be revealed about genes through the rational design of microarray gene expression experiments and sensible interpretation of gene expression patterns.

Assigning roles to genes in a particular gene network is possible when additional information from other sources, like genomic or proteomic data, is available.

A number of major efforts in building public databases for gene expression data are underway, to facilitate the functional inference of genes and gene networks in a systematic manner.

E.g., GEO (Gene Expression Omnibus, NCBI), Saccharomyces cerevisiae.

Summary

Thank you!!!