Statistical model for count data
Speaker : Tzu-Chun Lo
Advisor : Yao-Ting Huang
Outline
•Why use statistical model
•Target
▫Gene expression
•Binomial distribution
▫Poisson distribution
•Over dispersion
•Negative binomial
▫Chi-square approximation
•Conclusion
Statistical model
•A statistical model is a probability distribution constructed to enable inferences to be drawn or decisions made from data.
[Diagram: Population → sample → inference → decision; roles labelled "designer" and "consumer"]
•Sample information : mean, variance, size (e.g. height, weight)
•Inference : make a decision by hypothesis testing
•We have to choose a statistical model for the sample (mean, variance)
Target
•Gene expression
▫We would like to use a statistical model to test whether an observed difference in read counts is significant.
[Figure: read-count regions annotated "Looks like a significant region" and "How about this one? Can we be sure? Noise or not?"]
Count data
•A type of data in which the observations can take only the non-negative integer values {0, 1, 2, 3, ...}, and where these integers arise from counting rather than ranking.
•An individual piece of count data is often termed a count variable.
•Binomial, Poisson, and negative binomial data are all of this type.
Binomial distribution
•The number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p.
•Notation : $X \sim B(n, p)$, $f(k) = \binom{n}{k} p^k (1-p)^{n-k}$
Binomial distribution
•Ex : p = 0.8, (1-p) = 0.2, trials : 3, successes : 2
•Possible orderings : (1 1 0) (1 0 1) (0 1 1)
•$f(2) = 3 \times 0.8^2 \times 0.2 = 0.384$
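The example above can be checked numerically. A minimal sketch using only the Python standard library; the function name `binomial_pmf` is an illustrative choice, not something from the slides.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Slide example: p = 0.8, 3 trials, 2 successes.
# Each ordering (1 1 0), (1 0 1), (0 1 1) has probability 0.8 * 0.8 * 0.2,
# and comb(3, 2) = 3 counts the orderings.
f2 = binomial_pmf(2, 3, 0.8)
print(round(f2, 3))
```

The printed value should agree with f(2) = 0.384 from the slide.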
33 goals in 110 shots this season
Success : 0.3, Fail : 0.7
What is the probability that he scores 6 goals in 10 shots?
Binomial distribution
•Exactly six goals
•At most three goals
[Figure: binomial(n=10, p=0.3); x-axis: goals (0 to 10), y-axis: probability]
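Both questions on this slide reduce to sums of binomial probabilities with n = 10 and p = 0.3. A rough sketch, assuming each shot is independent with the same success rate (the helper `binomial_pmf` is an illustrative name):

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3  # 10 shots, success rate 33/110 = 0.3

# Exactly six goals: a single term of the distribution.
p_exactly_six = binomial_pmf(6, n, p)

# At most three goals: add the terms for 0, 1, 2 and 3 goals.
p_at_most_three = sum(binomial_pmf(k, n, p) for k in range(4))

print(f"P(X = 6)  = {p_exactly_six:.4f}")
print(f"P(X <= 3) = {p_at_most_three:.4f}")
```

Six goals in ten shots comes out at under 4%, while three or fewer goals covers roughly 65% of the probability mass.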
Poisson distribution
•Expresses the probability of a given number of events occurring in a fixed interval.
•Notation : $X \sim \mathrm{Pois}(\lambda)$, $f(k) = \frac{\lambda^k e^{-\lambda}}{k!}$
Poisson distribution
•Suppose the interval is one game, so we count goals per game
e = 2.718281828…
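A small sketch of the Poisson probabilities for goals per game. The rate of 3 goals per game is an assumption here, taken from the 33 goals in 11 games used elsewhere in these slides; `poisson_pmf` is an illustrative name.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean count per interval is lam."""
    return lam**k * exp(-lam) / factorial(k)

lam = 3.0  # assumed rate: 3 goals per game

for k in range(8):
    print(k, round(poisson_pmf(k, lam), 4))
```

Unlike the binomial, there is no upper limit n on the count; only the rate per interval is needed.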
[Figure: binomial(n=10, p=0.3); x-axis: goals, y-axis: probability]
[Figure: Poisson vs. raw data; x-axis: goals per game (0 to 8), y-axis: number of games (0 to 3)]
Poisson
•Total : 11 games
•Score : 33 goals
•(33/11) = 3 goals per game
•Poisson :
•Raw data :
•In this case, a test based on the Poisson could be inaccurate
Goals per game      0    1    2    3    4    5    6    7
Poisson (games)     0.5  1.6  2.5  2.5  1.8  1.1  0.6  0.2
Raw data (games)    1    2    2    2    2    0    1    1
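The Poisson row can be reproduced by multiplying the Poisson probability of each goal count by the 11 games. A minimal sketch under the assumption that the row holds expected numbers of games rounded to one decimal:

```python
from math import exp, factorial

games = 11
lam = 33 / games  # 3 goals per game

# Raw data row: number of games with 0, 1, ..., 7 goals.
observed = [1, 2, 2, 2, 2, 0, 1, 1]

# Expected number of games with k goals if goals per game were Poisson(lam).
expected = [games * lam**k * exp(-lam) / factorial(k) for k in range(8)]

for k, (e, o) in enumerate(zip(expected, observed)):
    print(f"{k} goals: expected {e:.1f} games, observed {o}")
```

The observed row has more games at the extremes (0, 6 and 7 goals) than the Poisson row predicts.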
Overdispersion
•The presence of greater variability (statistical dispersion) in a data set than would be expected based on a given simple statistical model.
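For a Poisson model the variance equals the mean, so a variance-to-mean ratio well above 1 points to overdispersion. A rough check on the goals data, assuming the raw-data row (1 2 2 2 2 0 1 1 games with 0 to 7 goals) and using the sample variance, which is one possible choice:

```python
from statistics import mean, variance

# Number of games with 0, 1, ..., 7 goals (raw data row).
games_per_goal_count = [1, 2, 2, 2, 2, 0, 1, 1]

# Expand to one entry per game: the goals scored in that game.
goals = [k for k, n_games in enumerate(games_per_goal_count) for _ in range(n_games)]

m = mean(goals)
s2 = variance(goals)  # sample variance (divides by n - 1)

print(f"mean = {m}, variance = {s2:.2f}, variance/mean = {s2 / m:.2f}")
```

Here the variance is about 1.5 times the mean, more spread than a Poisson with 3 goals per game allows for.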
Negative binomial
•Gamma-Poisson (mixture) distribution
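A sketch of the negative binomial probabilities written in terms of a mean mu and a dispersion k, the form obtained when a Poisson rate follows a Gamma distribution with shape k and mean mu. This parameterization and the values mu = 3, k = 5 are assumptions for illustration.

```python
from math import exp, lgamma, log

def neg_binomial_pmf(x, mu, k):
    """P(X = x) for a negative binomial with mean mu and dispersion k."""
    # log of Gamma(x + k) / (Gamma(k) * x!)
    log_coeff = lgamma(x + k) - lgamma(k) - lgamma(x + 1)
    return exp(log_coeff + k * log(k / (k + mu)) + x * log(mu / (k + mu)))

mu, k = 3.0, 5.0  # illustrative values only, not taken from the slides
xs = range(200)   # the tail beyond 200 is negligible for these values

total = sum(neg_binomial_pmf(x, mu, k) for x in xs)
nb_mean = sum(x * neg_binomial_pmf(x, mu, k) for x in xs)
nb_var = sum((x - nb_mean) ** 2 * neg_binomial_pmf(x, mu, k) for x in xs)

print(round(total, 6), round(nb_mean, 6), round(nb_var, 6))
```

The variance is mu + mu^2 / k, which exceeds the Poisson variance mu; as k grows the extra term shrinks and the distribution approaches the Poisson.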
Negative binomial
Parameter estimation
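One simple option is the method of moments, which may not be the estimator used in the talk. Assuming the variance form mu + mu^2 / k, the sample mean estimates mu and k follows from the sample variance. The function name and the reuse of the goals data are illustrative.

```python
from statistics import mean, variance

def nb_moment_estimates(counts):
    """Method-of-moments estimates (mu, k), assuming Var = mu + mu^2 / k."""
    m = mean(counts)
    s2 = variance(counts)  # sample variance (divides by n - 1)
    if s2 <= m:
        raise ValueError("no overdispersion: sample variance does not exceed the mean")
    return m, m * m / (s2 - m)

# Goals per game for the 11 games (33 goals in total).
goals = [0, 1, 1, 2, 2, 3, 3, 4, 4, 6, 7]

mu_hat, k_hat = nb_moment_estimates(goals)
print(mu_hat, round(k_hat, 3))
```

With only 11 games the estimate of k is very noisy; maximum likelihood is a common alternative.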
Approximate control limits
•Chi-square approximation
$v = \frac{2\mu}{1 + \mu/k}$
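One way to read v = 2μ / (1 + μ/k) is as the degrees of freedom of a scaled chi-square variable whose mean and variance match those of the negative binomial. The scale factor c below comes from that moment-matching assumption and is not stated in the slides; mu = 3 and k = 5 are illustrative.

```python
def chi_square_params(mu, k):
    """Return (v, c) so that c * chi-square(v) matches the NB mean and variance."""
    c = (1 + mu / k) / 2
    v = 2 * mu / (1 + mu / k)
    return v, c

mu, k = 3.0, 5.0  # illustrative values only
v, c = chi_square_params(mu, k)

# A chi-square variable with v degrees of freedom has mean v and variance 2v.
approx_mean = c * v
approx_var = c * c * 2 * v

print(v, approx_mean, approx_var)
```

Control limits would then come from chi-square quantiles with v degrees of freedom, multiplied by c to return to the count scale.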
Example
= 67.0
Conclusion
•Conclusion
•Thank you for your attention
Statistical model
•Suitable type
▫Which distribution should we use?
•Parameters
▫Get some information from data
•Inference
▫What do we want to know?
▫How could we make a decision? (Hypothesis testing)
Statistical model
•Suitable type
▫Binomial distribution
•Parameters
▫n = 10, p = 0.7
•Inference
▫2 successes
[Figure: dbinom(0:10, n=10, p=0.3); x-axis: goals, y-axis: probability]
Multinomial distribution
•The analog of the Bernoulli distribution is the categorical distribution, where each trial results in exactly one of some fixed finite number k of possible outcomes.
•http://en.wikipedia.org/wiki/Multinomial_distribution
Trinomial distribution
Count data
•A type of data in which the observations can take only the non-negative integer values {0, 1, 2, 3, ...}, and where these integers arise from counting rather than ranking.
•We tend to use fixed fractions of genes.
The probability that reads appear in this region (Binomial distribution)
The number of read counts in this interval (Poisson distribution)
Poisson example
[Figure: Poisson vs. raw data; x-axis: 0 to 9, y-axis: 0 to 3]
Negative binomial