
Differential Gene Expression with the limma package

20 March 2012Functional Genomics


Linear regression

• Fit a straight line through a set of points such that the distance from the points to the line is minimized.

• The slope and intercept of the line are adjusted to minimize the squares of the vertical distances of the points from the line.

• The line represents the model; the distances between the points and the line are the residuals.

• Simple regression minimizes the sum of the squares of the residuals. This is the method of least squares.
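
As a minimal sketch of least squares in R, the example below fits a line to a handful of made-up points with lm() and checks the result against the textbook closed-form formulas. The data values and variable names are assumptions chosen for illustration only.

```r
# Toy data (made up for illustration): y rises roughly linearly with x
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)

# lm() finds the intercept and slope that minimize the sum of squared residuals
fit <- lm(y ~ x)

# The same estimates from the closed-form least-squares formulas
slope <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
intercept <- mean(y) - slope * mean(x)

# Residuals are the vertical distances between the points and the fitted line
rss <- sum(residuals(fit)^2)

# Any other line, here one with a slightly steeper slope, has a larger sum of squares
rss_other <- sum((y - (intercept + (slope + 0.1) * x))^2)
```

Comparing rss with rss_other shows the point of the method: nudging the slope away from the fitted value can only increase the sum of squared residuals.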


Assume you have a data set of gene expression in tumor vs. normal tissue.

Y = Y0 + β Z

Y is the expression of gene X
Y0 is the mean expression in normal tissue
β is the difference in expression between tumor and normal tissue
Z is the group variable (0 for normal; 1 for tumor)

This is a simple mathematical expression of what is being calculated for a linear model.
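
A small sketch of the two-group model in R, using made-up log-expression values for one gene. With a 0/1 group variable, the fitted intercept is the mean of the normal samples and the coefficient on the group variable is the tumor-minus-normal difference. All numbers and names here are illustrative assumptions.

```r
# Simulated log-expression of one gene (values are made up)
normal <- c(7.1, 6.9, 7.3, 7.0)
tumor  <- c(8.4, 8.9, 8.6, 8.7)

Y <- c(normal, tumor)
Z <- c(0, 0, 0, 0, 1, 1, 1, 1)   # 0 = normal, 1 = tumor

fit <- lm(Y ~ Z)

Y0   <- coef(fit)[["(Intercept)"]]  # mean expression in normal tissue
beta <- coef(fit)[["Z"]]            # tumor mean minus normal mean
```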


Multivariate linear regression

Y = Y0 + β Z + γ W

Y is the expression of gene X
Y0 is the mean expression in normal tissue
β is the difference in expression between tumor and normal tissue
Z is the group variable (0 for normal; 1 for tumor)
γ is the age effect
W is the age group

Suppose you have another variable, such as age. You can add it right in!
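
A sketch of the model with a second variable, assuming a hypothetical 0/1 coding for age group and made-up expression values built from known effects (1.5 for tissue, 0.5 for age) plus small fixed noise. lm() should recover values close to those effects.

```r
# Hypothetical design: 12 samples, tissue and age group balanced
Z <- rep(c(0, 1), each = 6)    # 0 = normal, 1 = tumor
W <- rep(c(0, 1), times = 6)   # 0 = younger, 1 = older (assumed coding)

# Made-up data: baseline 7, tissue effect 1.5, age effect 0.5, small fixed noise
noise <- c(0.05, -0.03, 0.02, -0.04, 0.01, -0.01,
           0.03, -0.02, 0.04, -0.05, 0.00, 0.01)
Y <- 7 + 1.5 * Z + 0.5 * W + noise

# Additive model: one coefficient for tissue, one for age
fit <- lm(Y ~ Z + W)
coef(fit)
```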


Multivariate linear regression

Y = Y0 + β Z + γ W + δ Z*W

Y is the expression of gene X
Y0 is the mean expression in normal tissue
β is the difference in expression between tumor and normal tissue
Z is the group variable (0 for normal; 1 for tumor)
γ is the age effect
W is the age group
δ is the age-by-tissue interaction effect

Add a component to look for age-by-tissue interaction effects: δ Z*W

You can ask for differences in gene expression due to tissue, due to age, and due to an age-by-tissue interaction.
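
A sketch of the interaction model in R, again with a hypothetical 0/1 age coding and made-up data. In an R formula, Z * W expands to Z + W + Z:W, so the fit has one coefficient for tissue, one for age, and one for the interaction (simulated here as 0.8).

```r
# Hypothetical design: 12 samples, 3 per tissue-by-age combination
Z <- rep(c(0, 1), each = 6)    # 0 = normal, 1 = tumor
W <- rep(c(0, 1), times = 6)   # 0 = younger, 1 = older (assumed coding)

# Made-up data with an extra effect that appears only in older tumor samples
noise <- c(0.05, -0.03, 0.02, -0.04, 0.01, -0.01,
           0.03, -0.02, 0.04, -0.05, 0.00, 0.01)
Y <- 7 + 1.5 * Z + 0.5 * W + 0.8 * Z * W + noise

# Z * W is shorthand for Z + W + Z:W
fit <- lm(Y ~ Z * W)

interaction_effect <- coef(fit)[["Z:W"]]   # the age-by-tissue term
```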


limma

• R package for differential gene expression that uses linear modeling for each gene in your data set

• Expression data will be log-intensity values for Affy data

• Designed to be used in conjunction with the affy package
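
To illustrate what "linear modeling for each gene" means, here is a base-R sketch that fits the same two-group model to every row of a small simulated expression matrix. This is not limma itself: limma's lmFit() does the per-gene fitting and eBayes() then moderates the variance estimates across genes, which this sketch does not attempt. The matrix, gene names, and group labels are made up.

```r
set.seed(42)

# Simulated log2 expression matrix: 5 genes (rows) x 6 arrays (columns)
expr <- matrix(rnorm(30, mean = 8, sd = 0.2), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("array", 1:6)))
group <- factor(c("MT", "MT", "MT", "WT", "WT", "WT"))

# Make one gene truly different between the groups
expr["gene1", group == "WT"] <- expr["gene1", group == "WT"] + 2

# Same design matrix for every gene: intercept plus a WT indicator
design <- model.matrix(~ group)

# Fit the linear model gene by gene; each row of coefs holds one gene's estimates
coefs <- t(apply(expr, 1, function(y) lm.fit(design, y)$coefficients))
colnames(coefs) <- c("MT_mean", "WT_minus_MT")
```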


limma checklist

• Assumes you’ve done an experiment and have CEL files (if you’ve done single color Affy arrays)

• Assumes you have data/information about the arrays (Targets)

• Assumes you have normalized your data and have an ExpressionSet object (called exprSet in older versions of Biobase)


Name   FileName            Target
MT1    MTP1_Ackerman.CEL   MT
MT2    MTP2_Ackerman.CEL   MT
MT3    MTP3_Ackerman.CEL   MT
WT1    WTP1_Ackerman.CEL   WT
WT2    WTP2_Ackerman.CEL   WT
WT3    WTP3_Ackerman.CEL   WT

This is my targets file for limma using the Ackerman data.

Note that I renamed the CEL files compared to what was originally in my home directory.
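
A sketch of reading a targets table like this one in base R and turning the Target column into a grouping factor. The table is inlined as tab-separated text so the example runs on its own; on disk it would normally be a tab-delimited file, and limma also provides readTargets() for this purpose. The variable names are assumptions.

```r
# The targets table, inlined as tab-separated text
targets_text <- paste(
  "Name\tFileName\tTarget",
  "MT1\tMTP1_Ackerman.CEL\tMT",
  "MT2\tMTP2_Ackerman.CEL\tMT",
  "MT3\tMTP3_Ackerman.CEL\tMT",
  "WT1\tWTP1_Ackerman.CEL\tWT",
  "WT2\tWTP2_Ackerman.CEL\tWT",
  "WT3\tWTP3_Ackerman.CEL\tWT",
  sep = "\n")

targets <- read.delim(text = targets_text, stringsAsFactors = FALSE)

# The Target column defines the groups to be compared
group <- factor(targets$Target, levels = c("MT", "WT"))
table(group)
```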


new('exprSet',
    exprs       = ...., # Object of class matrix
    se.exprs    = ...., # Object of class matrix
    phenoData   = ...., # Object of class phenoData
    annotation  = ...., # Object of class character
    description = ...., # Object of class MIAME
    notes       = ...., # Object of class character
)

Slots

exprs: Object of class "matrix". The observed expression levels. This is a matrix with columns representing patients or cases and rows representing genes.

se.exprs: Object of class "matrix". This is a matrix of the same dimensions as exprs which contains standard error estimates for the estimated expression levels.

phenoData: Object of class "phenoData". This is an instance of class phenoData containing the patient (or case) level data. The columns of the pData slot of this entity represent variables and the rows represent patients or cases.

annotation: A character string identifying the annotation that may be used for the exprSet instance.

description: Object of class "MIAME". For compatibility with previous versions of this class, description can also be a "character". The class characterOrMIAME has been defined just for this.

notes: Object of class "character". Vector of explanatory text.

http://www.stat.ucl.ac.be/ISdidactique/Rhelp/library/Biobase/html/exprSet-class.html

ExpressionSet object

slotNames()


Running limma

• Need to create an ExpressionSet (formerly exprSet) object using the affy package
  – Or some other method, depending on the array platform

• Need a design matrix
  – Representation of the different RNA targets which have been hybridized to the array

• Can have a contrast matrix
  – Uses information in the design matrix to do comparisons of interest
  – Don’t always need a contrast matrix
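
A base-R sketch of what a design matrix and a contrast matrix look like for a two-group experiment (three MT and three WT arrays). Here the contrast is built by hand to keep the example self-contained; in limma, makeContrasts() and contrasts.fit() are the usual tools. The expression values for the single gene are made up.

```r
strain <- factor(c("MT", "MT", "MT", "WT", "WT", "WT"))

# Design matrix with one column per group (no intercept):
# each coefficient is then a group mean
design <- model.matrix(~ 0 + strain)
colnames(design) <- levels(strain)

# Contrast matrix built by hand: WT minus MT
contrast <- matrix(c(-1, 1), ncol = 1,
                   dimnames = list(colnames(design), "WTvsMT"))

# Apply both to one made-up gene
y <- c(8.1, 7.9, 8.0, 9.2, 9.0, 9.1)
group_means <- lm.fit(design, y)$coefficients
wt_vs_mt <- sum(contrast[, "WTvsMT"] * group_means)
```

If the design is instead built with an intercept, as in model.matrix(~ strain), the second coefficient is already the WT-minus-MT difference, which is one case where no contrast matrix is needed.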


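library(affy)
library(limma)
library(makecdfenv)

# Build a CDF environment from the chip definition file,
# then read all CEL files in the working directory
Array.CDF = make.cdf.env("MoGene-1_0-st-v1.cdf")
CELData = ReadAffy()
CELData@cdfName = "Array.CDF"

slotNames(CELData)
pData(CELData)

# Normalize and summarize with RMA
eset = rma(CELData)
pData(eset)

# Design matrix: column 1 is the intercept (MT mean),
# column 2 is the WT-minus-MT difference
strain = c("MT","MT","MT","WT","WT","WT")
design = model.matrix(~factor(strain))
colnames(design) = c("MT","WTvsMT")

fit = lmFit(eset, design)
fit = eBayes(fit)
options(digits=2)

# Top 40 genes for the WT-vs-MT coefficient, Benjamini-Hochberg adjusted
topTable(fit, coef=2, number=40, adjust.method="BH")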


Time Series


• Differential gene expression methods don’t work well for time series
  – Assumption of independence of observations doesn’t hold in time series

• BETR takes correlations/dependencies into account to detect changes in gene expression that are sustained over time

• http://bioc.ism.ac.jp/2.5/bioc/html/betr.html

• http://bioc.ism.ac.jp/2.5/bioc/vignettes/betr/inst/doc/betr.pdf


Running BETR

• Need a data frame that describes the arrays• Need to specify the conditions/contrasts


betr() function usage and arguments


The file describes a three time point time series of diaphragm development.

This annotation file has the list of CEL files, associates them with a time point, and indicates which arrays are replicates (must be an event number)

In this example, this file is called “samples3.txt”

These data are available in GEO (accession GSE35243).


library(betr)
library(affy)
library(Biobase)

# Read the sample annotation (CEL file names, time points, replicate labels)
test = read.AnnotatedDataFrame("samples3.txt", sep="\t", quote="")

# Read the CEL files listed in the annotation and normalize with RMA
test.data = ReadAffy(phenoData=test)
norm.data = rma(test.data)

# Run BETR on the normalized time course
prob.data = betr(eset=norm.data, twoColor=FALSE, twoCondition=NULL,
                 timepoint=as.numeric(pData(norm.data)$time),
                 replicate=as.character(pData(norm.data)$rep), alpha=0.05)

write.table(prob.data, file="betr_results.txt", sep="\t")


Next time

• pbx1 assignment: find the location of the probes in another one of the probesets for zebrafish.

• Read limma documentation

• Run limma on your data set

• Be sure you have your Galaxy account set up