Small sample AIC lecture, slides only

Information-theoretic analysis of -omics data: An introduction

David R. Bickel

University of Ottawa

17 November 2008

David Bickel (uOttawa) Information theory 17 November 2008

Today’s class

Differential gene/protein/metabolite expression

Which genes express differently between treatment and control?

Examples of "treatments"

Medical: drug or chemotherapy applied to some patients

Basic: hormone or other chemical added to some cell cultures

Other examples?

How much information or evidence is in the measurements

for differential expression?

for equivalent expression?

Pick the differentially expressed genes

What is differential gene/protein/metabolite expression?

An average expression ratio of 1 indicates equivalent expression

Two types of differential expression

An average expression ratio less than 1 indicates under-expression

An average expression ratio greater than 1 indicates over-expression

"Average expression" is over the population, not just the observed data

The histogram of a large expression data set resembles the true distribution

Gene expression ratios measured by microarrays

A sample from the treatment group and a sample from the control group are hybridized to the same microarray slide

Each gene’s expression ratio is a measurement of its expression in the treatment group relative to its expression in the control group

Based on the expression data, which genes are differentially expressed?
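The distinction between the population average and the observed data can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the slides: expression ratios are drawn from a log-normal distribution, the "average" is taken on the log scale (a geometric mean), and the names `simulate_ratios` and `average_ratio` are made up for illustration.

```python
import math
import random

def simulate_ratios(true_ratio, n, sd_log=0.5, seed=0):
    """Draw n hypothetical expression ratios whose population average
    (on the log scale) equals true_ratio. sd_log is an assumed spread."""
    rng = random.Random(seed)
    return [math.exp(rng.gauss(math.log(true_ratio), sd_log)) for _ in range(n)]

def average_ratio(ratios):
    """Average expression ratio on the log scale (geometric mean)."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# An equivalently expressed gene: the population average ratio is 1.
small = simulate_ratios(true_ratio=1.0, n=3)
large = simulate_ratios(true_ratio=1.0, n=10_000)

print("n = 3:    ", round(average_ratio(small), 3))
print("n = 10000:", round(average_ratio(large), 3))
```

With only three measurements the observed average can sit well away from 1 even though the gene is equivalently expressed, whereas the large sample lands close to the population value.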

[Figure: for each of data set #1, data set #2, data set #4, and data set #6, plots of the data (n = 3), data (n = 6), model (n = 3), model (n = 6), evidence (n = 3), and evidence (n = 6)]

For each data set, indicate whether the gene is equivalently expressed (E) or differentially expressed (D) according to the plot of the data, according to the model, and according to the evidence for each number of observations (3 or 6). Equivalent expression means the average expression ratio is 1.

Statistical models

p stands for the number of unknown parameters in a model

Equivalent expression model

Unknown variability of expression

Expression ratio known to be 1

One unknown parameter (p = 1)

Differential expression model

Unknown variability of expression

Unknown expression ratio

Two unknown parameters (p = 2)

How do the model plots change your initial assessments?
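As a rough illustration of the two models, the sketch below estimates their unknown parameters from a handful of expression ratios. It assumes, beyond what the slide states, that the ratios are analysed on the log scale and that variability is measured by a variance; the function names and the example ratios are hypothetical.

```python
import math

def fit_equivalent(ratios):
    """Equivalent expression model: log ratio fixed at 0 (ratio = 1),
    so the variance is the only unknown parameter (p = 1)."""
    logs = [math.log(r) for r in ratios]
    variance = sum(x ** 2 for x in logs) / len(logs)
    return {"p": 1, "ratio": 1.0, "variance": variance}

def fit_differential(ratios):
    """Differential expression model: both the expression ratio and the
    variance are estimated from the data (p = 2)."""
    logs = [math.log(r) for r in ratios]
    mean_log = sum(logs) / len(logs)
    variance = sum((x - mean_log) ** 2 for x in logs) / len(logs)
    return {"p": 2, "ratio": math.exp(mean_log), "variance": variance}

ratios = [1.8, 2.3, 1.6]  # made-up measurements, n = 3
print(fit_equivalent(ratios))
print(fit_differential(ratios))
```

The differential model reports a smaller variance because it is free to move its expression ratio toward the observed data.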

Balancing complexity and fit

The differential expression model (p = 2) is more complex than the equivalent expression model (p = 1)

More complex models tend to fit data better than simple models, even if the simple models are better

Overly complex models make poor generalizations

A sample of patients may not represent the population

A single experiment may not reflect typical biological processes

$$\frac{\text{Fit}}{\text{Complexity}} = \text{Evidence}$$

How does balancing fit with complexity change your assessments?
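Why the more complex model never fits the observed data worse can be made explicit with a short calculation. As a sketch, assume that $x_1, \dots, x_n$ are the log expression ratios, that the equivalent expression model fixes their mean at 0, and that the differential expression model estimates the mean by the sample average $\bar{x}$; this notation is not in the slides.

```latex
\frac{1}{n}\sum_{i=1}^{n} x_i^{2}
  \;=\; \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^{2} \;+\; \bar{x}^{2}
```

The left side is the mean squared error of the equivalent expression model and the first term on the right is that of the differential expression model. The simpler model's error is therefore larger by $\bar{x}^{2} \ge 0$ whatever the truth is, so Fit alone always favors p = 2 and has to be weighed against Complexity.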

Quality of model fit to the data

n = sample size

number of measured expression ratios

MSE = mean of squared errors of the model

degree to which the model disagrees with the observed data (log scale)

$$\text{Fit} = \left(\frac{1}{\sqrt{\text{MSE}}}\right)^{n}$$

degree to which the model fits the observed data (assuming a normal distribution)
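The definition of Fit translates directly into code. The sketch below is illustrative only: the function names and the MSE value are made up, and the log-scale variant is an addition that avoids overflow or underflow when n is large.

```python
import math

def fit(mse, n):
    """Fit = (1 / sqrt(MSE)) ** n."""
    return (1.0 / math.sqrt(mse)) ** n

def log_fit(mse, n):
    """Natural log of Fit, i.e. -(n / 2) * ln(MSE)."""
    return -0.5 * n * math.log(mse)

print(fit(0.25, 3))      # (1 / 0.5) ** 3
print(log_fit(0.25, 3))
```

Under a normal distribution the maximized likelihood is proportional to MSE raised to the power -n/2, which is exactly Fit, so comparing Fit between two models on the same data amounts to comparing their maximized likelihoods.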

Model complexity

n = sample size

number of measured expression ratios

p = model dimension

number of unknown parameters in the model

p = 1 for the equivalent expression model

p = 2 for the differential expression model

$$p_c = p + \frac{p\,(p + 1)}{2\,(n - p + 1)}$$

effective number of parameters in the model (corrected for small n)

$$\text{Complexity} = 2.718^{\,p_c}$$

$$\frac{\text{Fit}}{\text{Complexity}} = \text{Evidence}$$
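Putting the pieces together, the following sketch computes the evidence for each model and the ratio of the two. It follows the formulas on the slides, reading the operator lost between n and p in the denominator of p_c as a minus sign and taking 2.718 as e. The use of log expression ratios for the MSE, the function names, and the example data are assumptions, not part of the lecture.

```python
import math

def corrected_dimension(p, n):
    """p_c = p + p (p + 1) / (2 (n - p + 1)), as transcribed from the slide."""
    return p + p * (p + 1) / (2 * (n - p + 1))

def evidence(mse, p, n):
    """Evidence = Fit / Complexity, with Fit = (1 / sqrt(MSE)) ** n
    and Complexity = e ** p_c."""
    fit = (1.0 / math.sqrt(mse)) ** n
    complexity = math.exp(corrected_dimension(p, n))
    return fit / complexity

def weight_of_evidence(ratios):
    """Evidence for differential expression divided by evidence for
    equivalent expression, using MSEs of the log ratios (an assumption)."""
    logs = [math.log(r) for r in ratios]
    n = len(logs)
    mean_log = sum(logs) / n
    mse_equivalent = sum(x ** 2 for x in logs) / n                 # ratio fixed at 1
    mse_differential = sum((x - mean_log) ** 2 for x in logs) / n  # ratio estimated
    return evidence(mse_differential, 2, n) / evidence(mse_equivalent, 1, n)

print(corrected_dimension(1, 3), corrected_dimension(2, 3))
print(weight_of_evidence([1.8, 2.3, 1.6]))  # made-up ratios, n = 3
```

A weight of evidence above 1 favors differential expression and a value below 1 favors equivalent expression. The sketch does not guard against an MSE of exactly zero, which occurs when all measured ratios are identical.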

Answers

How do our analyses compare to the truth?

If a statistical method says an equivalently expressed gene is differentially expressed, is the method useless?

If a statistical method says a differentially expressed gene is equivalently expressed, is the method useless?

The advantage of obtaining more data

The best possible assessment given the available data

How confident should you be in your assessments?

Should you obtain more data before making an assessment?

The expression data sets

             data set #1   data set #2               data set #4   data set #6
ratio        1             2                         1             1.4
expression   equivalent    differential              equivalent    differential
n = 10       0.44/1.38     0.14/0.09                 0.14/0.17     0.19/0.37 **
n = 25       0.29/0.71     0.03/0.002                4.77/1.00 *   0.05/0.04
n = 100      36/69         (1 × 10^-4)/(2 × 10^-7)   16/32         0.03/0.01

Key

n is the number of observed expression ratios.

Each ratio is (Evidence differentially expressed)/(Evidence equivalently expressed), the weight of evidence favoring differential expression over equivalent expression.

* misleading evidence for differential expression

** misleading evidence for equivalent expression
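The table can be checked mechanically against the key. The sketch below encodes each entry as a pair (evidence for differential expression, evidence for equivalent expression) and reports the cells where the favored model disagrees with the true expression status. The n = 100 entry for data set #2 is read as 1 × 10^-4 over 2 × 10^-7, and the variable and function names are hypothetical.

```python
# (evidence for differential, evidence for equivalent) as read from the table
table = {
    "data set #1": ("equivalent",   {10: (0.44, 1.38), 25: (0.29, 0.71),  100: (36, 69)}),
    "data set #2": ("differential", {10: (0.14, 0.09), 25: (0.03, 0.002), 100: (1e-4, 2e-7)}),
    "data set #4": ("equivalent",   {10: (0.14, 0.17), 25: (4.77, 1.00),  100: (16, 32)}),
    "data set #6": ("differential", {10: (0.19, 0.37), 25: (0.05, 0.04),  100: (0.03, 0.01)}),
}

def misleading_entries(table):
    """Return (data set, n, favored model) wherever the weight of evidence
    favors the model that does not match the true expression status."""
    found = []
    for name, (truth, by_n) in table.items():
        for n, (ev_diff, ev_equiv) in by_n.items():
            favored = "differential" if ev_diff / ev_equiv > 1 else "equivalent"
            if favored != truth:
                found.append((name, n, favored))
    return found

for name, n, favored in misleading_entries(table):
    print(f"{name}, n = {n}: misleading evidence for {favored} expression")
```

Only two cells are flagged, both at the smaller sample sizes, and at n = 100 every data set is assessed in agreement with its true status.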

Further study

The method presented is based on the Akaike information criterion (AIC) after correcting it for small numbers of measurements

$$\text{AIC}_c = -2 \ln(\text{Evidence})$$

Software packages with the AIC but without the correction may be unreliable for small numbers of observations (n < 40)

Kenneth Burnham and David Anderson, Model Selection and Multi-Model Inference

These slides and figures will be on the lab website

www.statomics.com
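Substituting the lecture's definitions Evidence = Fit/Complexity, Fit = (1/√MSE)^n, and Complexity = e^{p_c} (taking 2.718 as e) shows how the evidence relates to the more familiar additive form of the criterion. This is a worked sketch from those definitions, not a formula taken from the slides.

```latex
\mathrm{AIC}_c
  = -2 \ln(\mathrm{Evidence})
  = -2 \ln(\mathrm{Fit}) + 2 \ln(\mathrm{Complexity})
  = n \ln(\mathrm{MSE}) + 2\,p_c
```

A lower AICc therefore corresponds to higher evidence, and the weight of evidence for one model over another equals $\exp(-\Delta\mathrm{AIC}_c / 2)$, where $\Delta\mathrm{AIC}_c$ is the first model's AICc minus the second's.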