Statistical model for count data
Speaker: Tzu-Chun Lo
Advisor: Yao-Ting Huang

Posted on 18-Dec-2015


Outline

•Why use a statistical model
•Target
  ▫Gene expression
•Binomial distribution
  ▫Poisson distribution
•Overdispersion
•Negative binomial
  ▫Chi-square approximation
•Conclusion

Statistical model

•A statistical model is a probability distribution constructed to enable inferences to be drawn or decisions made from data.

(Diagram: Population → sample of a given size → information (mean, variance) → inference → make a decision: hypothesis testing.)

Example: a designer wants to learn about consumers' height, weight, etc. from a sample. We have to choose a statistical model for the sample (mean, variance).

Target

•Gene expression
  ▫We would like to use a statistical model to test whether an observed difference in read counts is significant.

(Figure: read-count profile. One region looks like a significant region; for another, can we be sure? Is the difference signal or just noise?)

Count data

•A type of data in which the observations can take only the non-negative integer values {0, 1, 2, 3, ...}, and where these integers arise from counting rather than ranking.
•An individual piece of count data is often termed a count variable.

Binomial, Poisson, negative binomial: all of them are distributions for this type of data.

Binomial distribution

•The number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p.
•Notation: $X \sim B(n, p)$, with $f(k) = P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$ for $k = 0, 1, \dots, n$.

Binomial distribution

Ex: p = 0.8, 1 - p = 0.2, n = 3 trials, 2 successes. Each of the arrangements (1 1 0), (1 0 1), (0 1 1) has probability 0.8 × 0.8 × 0.2 = 0.128, so f(2) = 3 × 0.128 = 0.384.
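As a quick check of the arithmetic, here is a minimal Python sketch of the binomial formula for the p = 0.8, n = 3 example. The helper names `binom_pmf` and `brute_force` are illustrative, not from the slides; the brute-force version enumerates all 2³ yes/no sequences to show that counting arrangements with `comb` gives the same result.

```python
from itertools import product
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p): number of arrangements times the probability of each."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def brute_force(k: int, n: int, p: float) -> float:
    """Add up the probability of every length-n 0/1 sequence with exactly k ones."""
    total = 0.0
    for seq in product((0, 1), repeat=n):
        if sum(seq) == k:
            prob = 1.0
            for outcome in seq:
                prob *= p if outcome == 1 else 1 - p
            total += prob
    return total


f2 = binom_pmf(2, n=3, p=0.8)
print(f"{f2:.3f}")                          # 0.384
print(f"{brute_force(2, n=3, p=0.8):.3f}")  # 0.384
```

The closed form scales to any n, while the enumeration grows as 2ⁿ and is only useful as a sanity check on tiny cases.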

Example: a player scored 33 goals from 110 shots this season.

Success: 33/110 = 0.3; fail: 0.7

What is the probability that he scores 6 goals in 10 shots?

Binomial distribution

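•Exactly six goals: $P(X = 6) = \binom{10}{6}(0.3)^6(0.7)^4 \approx 0.037$
•At most three goals: $P(X \le 3) = \sum_{k=0}^{3}\binom{10}{k}(0.3)^k(0.7)^{10-k} \approx 0.650$

(Figure: binomial(n = 10, p = 0.3) probability mass function; x-axis: goals (0 to 10), y-axis: probability.)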
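The two questions on this slide can be answered with the binomial pmf and its cumulative sum. This Python sketch assumes, as the slide does, that the 10 shots are independent with the season-long success rate 33/110 = 0.3; `binom_pmf` and `binom_cdf` are hypothetical helper names.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k): add up the pmf from 0 to k."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


n, p = 10, 33 / 110  # 10 shots, success rate estimated from the season (0.3)
exactly_six = binom_pmf(6, n, p)
at_most_three = binom_cdf(3, n, p)
print(f"P(X = 6)  = {exactly_six:.4f}")    # 0.0368
print(f"P(X <= 3) = {at_most_three:.4f}")  # 0.6496
```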

Poisson distribution

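•Expresses the probability of a given number of events occurring in a fixed interval.
•Notation: $X \sim \mathrm{Pois}(\lambda)$, with $P(X = k) = \dfrac{\lambda^k e^{-\lambda}}{k!}$ for $k = 0, 1, 2, \dots$, where λ is the mean number of events per interval.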
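A minimal Python sketch of the Poisson pmf, assuming a hypothetical rate λ = 3. Summing over a truncated support (k < 60, where the remaining tail is negligible) illustrates the property the talk relies on later: the mean and the variance are both λ.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Pois(lam): lam**k * e**(-lam) / k!."""
    return lam**k * exp(-lam) / factorial(k)


lam = 3.0  # hypothetical rate: 3 events per interval
ks = range(60)  # truncate the infinite support; the tail beyond 60 is negligible for lam = 3
probs = [poisson_pmf(k, lam) for k in ks]
mean = sum(k * q for k, q in zip(ks, probs))
var = sum((k - mean) ** 2 * q for k, q in zip(ks, probs))
print(f"P(X = 3) = {poisson_pmf(3, lam):.4f}")      # 0.2240
print(f"mean = {mean:.4f}, variance = {var:.4f}")  # mean = 3.0000, variance = 3.0000
```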

Poisson distribution

•Suppose the interval is one game, so we count goals per game.

e = 2.718281828…

(Figures: a plot titled binomial(n = 10, p = 0.3) with x-axis goals and y-axis probability, and a bar chart of the number of games with each goal count (0 to 8), Poisson vs. raw data.)

Poisson

•Total: 11 games
•Score: 33 goals
•33/11 = 3 goals per game, so λ = 3
•Poisson: mean = variance = λ = 3
•Raw data: mean = 3, sample variance = 4.6
•Using the Poisson model, our test could be inaccurate in this case.

Goals in a game:            0    1    2    3    4    5    6    7
Poisson (expected games):   0.5  1.6  2.5  2.5  1.8  1.1  0.6  0.2
Raw data (observed games):  1    2    2    2    2    0    1    1
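The Poisson row of the table can be reproduced from the raw data alone: estimate λ as total goals divided by games, then multiply the Poisson pmf by the 11 games. A Python sketch of that calculation follows; rounding to one decimal gives the slide's Poisson row.

```python
from math import exp, factorial

# Observed data from the slide: number of games in which the player scored 0..7 goals.
goals = list(range(8))
observed_games = [1, 2, 2, 2, 2, 0, 1, 1]

n_games = sum(observed_games)                                    # 11
total_goals = sum(g * c for g, c in zip(goals, observed_games))  # 33
lam = total_goals / n_games                                      # 3.0 goals per game


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Pois(lam)."""
    return lam**k * exp(-lam) / factorial(k)


# Expected number of games with g goals if goals per game were Poisson(lam).
expected_games = [n_games * poisson_pmf(g, lam) for g in goals]

for g, e, o in zip(goals, expected_games, observed_games):
    print(f"{g} goals: Poisson {e:.1f} games, observed {o}")
```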

Overdispersion

•The presence of greater variability (statistical dispersion) in a data set than would be expected based on a given simple statistical model.
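One simple way to quantify overdispersion (not necessarily the method used in the talk) is the variance-to-mean ratio, which is about 1 for Poisson data. The sketch below applies it to the goals-per-game data from the earlier table. `D` is the classic index-of-dispersion statistic, which is approximately chi-square with n - 1 degrees of freedom under a Poisson model.

```python
from statistics import mean, variance

# Goals per game reconstructed from the slide's table: 1 game with 0 goals, 2 with 1, ...
counts_by_goals = {0: 1, 1: 2, 2: 2, 3: 2, 4: 2, 5: 0, 6: 1, 7: 1}
goals_per_game = [g for g, c in counts_by_goals.items() for _ in range(c)]

m = mean(goals_per_game)       # sample mean
s2 = variance(goals_per_game)  # sample variance (n - 1 denominator)
dispersion_index = s2 / m      # about 1 for Poisson data; > 1 suggests overdispersion
D = (len(goals_per_game) - 1) * s2 / m  # roughly chi-square(n - 1) if the data are Poisson

print(f"mean = {m:.2f}, variance = {s2:.2f}, variance/mean = {dispersion_index:.2f}")
```

With only 11 games the ratio is a rough indicator; a larger sample would give a more reliable check.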

Negative binomial

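•Gamma-Poisson (mixture) distribution: each count is Poisson with a rate λ that itself follows a gamma distribution, which lets the variance exceed the mean.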
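The gamma-Poisson construction can be checked by simulation. This sketch uses a hypothetical mean μ = 3 and dispersion k = 2, draws each rate λ from a gamma distribution with shape k and scale μ/k (so its mean is μ), then draws a Poisson count with that rate. Python's standard library has `random.gammavariate` but no Poisson sampler, so Knuth's multiplication method is written out by hand. The sample variance should land near μ + μ²/k rather than μ.

```python
import random
from math import exp
from statistics import fmean


def poisson_sample(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def gamma_poisson_sample(mu: float, k: float, rng: random.Random) -> int:
    """Draw lam ~ Gamma(shape=k, scale=mu/k), which has mean mu, then X ~ Poisson(lam)."""
    lam = rng.gammavariate(k, mu / k)
    return poisson_sample(lam, rng)


rng = random.Random(2015)
mu, k = 3.0, 2.0  # hypothetical mean and dispersion parameter
draws = [gamma_poisson_sample(mu, k, rng) for _ in range(200_000)]

sample_mean = fmean(draws)
sample_var = fmean([(x - sample_mean) ** 2 for x in draws])
print(f"sample mean     = {sample_mean:.2f}  (theory: {mu})")
print(f"sample variance = {sample_var:.2f}  (theory: {mu + mu**2 / k})")
```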

Negative binomial

Parameter estimation

Approximate control limits

•Chi-square approximation

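$v = \dfrac{2\mu}{1 + \mu / k}$ (degrees of freedom of the approximating chi-square, where μ is the mean count and k is the negative binomial dispersion parameter)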
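The formula for v follows from matching the first two moments of a scaled chi-square variable c·χ²_v to a negative binomial with mean μ and variance μ + μ²/k (this variance parameterization is an assumption, chosen because it reproduces the formula above). The sketch below computes v and c, then forms approximate control limits. Because the standard library has no chi-square quantile function, it substitutes the Wilson-Hilferty approximation, and `approx_control_limits` is a hypothetical helper rather than the talk's own procedure.

```python
from math import sqrt
from statistics import NormalDist


def chi_square_params(mu: float, k: float) -> tuple[float, float]:
    """Match c * chi2(v) to a negative binomial with mean mu and variance mu + mu**2 / k:
    c * v = mu and 2 * c**2 * v = mu + mu**2 / k, so v = 2*mu / (1 + mu/k), c = (1 + mu/k) / 2."""
    v = 2 * mu / (1 + mu / k)
    c = (1 + mu / k) / 2
    return v, c


def chi2_quantile_wh(prob: float, v: float) -> float:
    """Wilson-Hilferty approximation to the chi-square quantile (an assumption for this
    sketch; an exact quantile function would be preferable in practice)."""
    z = NormalDist().inv_cdf(prob)
    base = 1 - 2 / (9 * v) + z * sqrt(2 / (9 * v))
    return v * max(base, 0.0) ** 3  # clamp: the approximation can dip below 0 for tiny v


def approx_control_limits(mu: float, k: float, alpha: float = 0.05) -> tuple[float, float]:
    """Hypothetical helper: central (1 - alpha) limits for a negative binomial count."""
    v, c = chi_square_params(mu, k)
    return c * chi2_quantile_wh(alpha / 2, v), c * chi2_quantile_wh(1 - alpha / 2, v)


v, c = chi_square_params(mu=3.0, k=2.0)  # hypothetical mean and dispersion
print(f"v = {v:.2f}, c = {c:.3f}")  # v = 2.40, c = 1.250
print(approx_control_limits(3.0, 2.0))
```

As k grows, μ/k shrinks, v approaches 2μ and c approaches 1/2, so the approximation's variance approaches μ, the Poisson value.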

Example

= 67.0

Conclusion

•Conclusion

•Thank you for your attention

Statistical model

•Suitable type
  ▫Which distribution should we use?
•Parameters
  ▫Get some information from the data
•Inference
  ▫What do we want to know?
  ▫How could we make a decision?

Hypothesis testing

Statistical model

•Suitable type
  ▫Binomial distribution
•Parameters
  ▫n = 10, p = 0.7
•Inference
  ▫2 successes
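To connect this setup to hypothesis testing, one option (an assumption; the slide does not say which test is used) is a one-sided p-value: how likely are 2 or fewer successes if the model B(10, 0.7) is correct? A minimal Python sketch:

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Slide setup: n = 10 trials, p = 0.7, and we observed 2 successes.
n, p, observed = 10, 0.7, 2

# One-sided p-value: chance of seeing `observed` or fewer successes under B(n, p).
p_value = sum(binom_pmf(k, n, p) for k in range(observed + 1))
print(f"P(X <= {observed}) = {p_value:.4f}")  # 0.0016
```

At the usual 0.05 level, a p-value this small would lead us to reject p = 0.7 for these data.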

(Figure: dbinom(0:10, size = 10, prob = 0.3); x-axis: goals (0 to 10), y-axis: probability (0.00 to 0.25).)

Multinomial distribution

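•The analog of the Bernoulli distribution is the categorical distribution, where each trial results in exactly one of some fixed finite number k of possible outcomes; the multinomial distribution counts the outcomes over n such trials, just as the binomial counts Bernoulli successes.
•http://en.wikipedia.org/wiki/Multinomial_distribution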

Trinomial distribution
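The multinomial generalizes the binomial formula to k outcome categories, with pmf n!/(x1! ... xk!) · p1^x1 ... pk^xk. A Python sketch with hypothetical trinomial numbers follows; `multinomial_pmf` is an illustrative helper name. With two categories it reduces to the binomial example from earlier (f(2) = 0.384).

```python
from math import factorial, prod


def multinomial_pmf(counts: list[int], probs: list[float]) -> float:
    """n! / (x1! ... xk!) * p1**x1 * ... * pk**xk, with n = sum(counts)."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    coef = factorial(sum(counts))
    for x in counts:
        coef //= factorial(x)  # exact integer division at every step
    return coef * prod(p**x for p, x in zip(probs, counts))


# Trinomial example (hypothetical numbers): outcomes with probabilities 0.5, 0.3, 0.2,
# observed 2, 1 and 1 times in 4 trials: 12 * 0.25 * 0.3 * 0.2.
print(multinomial_pmf([2, 1, 1], [0.5, 0.3, 0.2]))
```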

Count data

•A type of data in which the observations can take only the non-negative integer values {0, 1, 2, 3, ...}, and where these integers arise from counting rather than ranking.
•We tend to treat each gene (or region) as receiving a fixed fraction of the reads, which gives two views:
  ▫The probability that a read appears in this region (binomial distribution)
  ▫The number of read counts in this interval (Poisson distribution)
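The two views on this slide are linked: if each of N reads lands in a region with a small probability p, the region's read count is binomial, and for large N and small p that is well approximated by a Poisson distribution with λ = Np. The numbers below (N = 1,000,000 reads, p = 0.00001) are hypothetical, chosen only to show how close the two pmfs are.

```python
from math import comb, exp, factorial


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Pois(lam)."""
    return lam**k * exp(-lam) / factorial(k)


# Hypothetical sequencing run: N reads in total, each falling in the region of
# interest with a small fixed probability p (the region's "fraction").
N, p = 1_000_000, 1e-5
lam = N * p  # expected read count in the region (about 10)

for k in (5, 10, 15):
    print(k, f"binomial {binom_pmf(k, N, p):.6f}", f"Poisson {poisson_pmf(k, lam):.6f}")
```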

Poisson example

(Figure: number of games with each goal count (0 to 9), Poisson vs. raw data.)

Negative binomial
