Binomial setting and distributions


Upload: magdalen-powers

Post on 18-Jan-2018


Binomial distributions are models for some categorical variables, typically representing the number of successes in a series of n independent trials. The observations must meet these requirements:

- the total number of observations n is fixed in advance
- each observation falls into just one of two categories: success and failure
- the outcomes of all n observations are statistically independent
- all n observations have the same probability p of success

Applications for binomial distributions

Binomial distributions describe the possible number of times that a particular event will occur in a sequence of observations.

- In a clinical trial, a patient's condition may improve or not. The binomial distribution describes the number of patients who improved (not how much better they feel) among the study participants.
- Is a child obese or not (based on their body mass index)? The binomial distribution describes the number of obese children in a random sample of school-age children.
- In a quality control study, we assess the number of defective items in a lot of goods, irrespective of the type of defect.

Binomial parameters

We express a binomial distribution for the count X of successes among n observations as a function of the parameters n and p: X ~ B(n,p).

- The parameter n is the total number of observations.
- The parameter p is the probability of success on each observation.
- The count of successes X can be any whole number between 0 and n.

The CDC estimates that a third of adult men are obese. In a random sample of 10 adult men, each man is either obese or not. The variable X is the number of obese men among those 10 men sampled, our count of "successes." For each man, the probability of success, "obese," is 1/3. The number X of obese men among 10 men has the binomial distribution B(n = 10, p = 1/3).
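The binomial setting in the obesity example can be illustrated with a short simulation. This is only a sketch under the stated assumptions (n = 10, p = 1/3, independent draws); the seed, the number of simulated samples, and the variable names `one_sample`, `x`, and `counts` are arbitrary choices for illustration, not part of the original slides.

```r
# Sketch: simulate the count of "successes" for X ~ B(n = 10, p = 1/3).
set.seed(1)          # arbitrary seed, only so the run is repeatable

n <- 10              # fixed number of observations (men sampled)
p <- 1/3             # same probability of success ("obese") for each man

# One sample: n independent success/failure draws, coded 1 and 0.
one_sample <- rbinom(n, size = 1, prob = p)
x <- sum(one_sample)                 # count of successes in this sample

# Many samples: rbinom() can also draw the counts directly.
counts <- rbinom(10000, size = n, prob = p)   # 10,000 simulated values of X

range(counts)        # every simulated count lies between 0 and n
mean(counts)         # average count, expected to be near n * p
```

Each simulated value of X is a whole number between 0 and 10, which matches the range of possible counts described above.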
Binomial probabilities

The number of ways of arranging k successes in a series of n observations (with constant probability p of success) is the number of possible combinations (unordered sequences). This can be calculated with the binomial coefficient:

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}, \qquad k = 0, 1, 2, \ldots, n$$

R: choose(n,k)

The binomial coefficient "n choose k" uses the factorial notation "!". The factorial n! for any strictly positive whole number n is:

$$n! = n \times (n-1) \times (n-2) \times \cdots \times 3 \times 2 \times 1$$

The binomial coefficient counts the number of ways in which k successes can be arranged among n observations. The binomial probability P(X = k) is this count multiplied by the probability of any specific arrangement of the k successes:

$$P(X = k) = \binom{n}{k}\, p^{k} (1-p)^{n-k}$$

Taken over all possible values X = 0, 1, 2, ..., k, ..., n, these probabilities total 1.

The probability that a binomial random variable takes any range of values is the sum of each probability for getting exactly that many successes in n observations.

$$P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2)$$

The frequency of color blindness (dyschromatopsia) in the Caucasian American male population is estimated to be about 8%. In a group of 25 Caucasian American males, what is the probability that exactly five are color blind?

$$P(X = 5) = \frac{n!}{k!\,(n-k)!}\, p^{k} (1-p)^{n-k} = \frac{25!}{5!\,20!}\,(0.08)^{5}(0.92)^{20}$$

$$= \frac{21 \times 22 \times 23 \times 24 \times 25}{1 \times 2 \times 3 \times 4 \times 5}\,(0.08)^{5}(0.92)^{20} \approx 53{,}130 \times 0.0000032768 \times 0.1887 \approx 0.0329$$

Use technology:
> dbinom(5,25,.08)
which returns about 0.0329.

The incidence of major depression in adults is about 10%. A random sample of 50 adults will be tested for depression. The variable X is the number of individuals diagnosed with depression among all 50 and has the binomial distribution Bin(n = 50, p = 0.1). The probability that exactly 2 adults in the sample have depression is ????

A) B) C) D) E) 0.112

Binomial mean and variance

The center and spread of the binomial distribution for a count X are defined by the mean μ and standard deviation σ:

$$\mu = np, \qquad \sigma = \sqrt{np(1-p)}$$

The incidence of major depression in adults is about 10%. A random sample of 50 adults will be tested for depression.
The variable X is the number of individuals diagnosed with depression among all 50 and has the binomial distribution Bin(n = 50, p = 0.1). Thus,

$$\mu = np = 50 \times 0.1 = 5, \qquad \sigma = \sqrt{np(1-p)} = \sqrt{50 \times 0.1 \times 0.9} = \sqrt{4.5} \approx 2.12$$

Effect of changing p when n is fixed

Binomial distributions are skewed when p is close to 0 or close to 1 (especially if the sample is small).

[Figure: effect of changing n for a fixed value of p]

Normal approximation to binomial

A binomial distribution can be approximated by a Normal distribution when both np ≥ 10 and n(1 − p) ≥ 10. The approximation can be improved by using a continuity correction to take into account the fact that the Normal distribution is continuous. Hint: P(X = x) = P(x − 0.5 ≤ X ≤ x + 0.5), with the right-hand side evaluated under the Normal curve.

The incidence of major depression in adults is about 10%.

- Count of adults diagnosed with depression in a sample of 20 adults, Bin(n = 20, p = 0.1). No Normal approximation. Why? Because np = 20 × 0.1 = 2, which is less than 10.
- Count of adults diagnosed with depression in a sample of 100 adults, Bin(n = 100, p = 0.1). Normal approximation OK. Why? Because np = 10 and n(1 − p) = 90, so both are at least 10.

[Figure panels: Binomial, n = 20, p = 0.1; Binomial, n = 100, p = 0.1]

The frequency of color blindness (dyschromatopsia) in the Caucasian American male population is about 8%. We take a random sample of size 125 from this population. What is the probability that 6 individuals or fewer in the sample are color blind?

Distribution of the count X: B(n = 125, p = 0.08), so np = 10.

Exact binomial probability, in R:
> pbinom(6,size=125,prob=.08)
P(X ≤ 6) ≈ 0.1198, or about 12%.

Normal approximation: N(μ = np = 10, σ = √(np(1 − p)) = 3.033)
> pnorm(6, mean=10, sd=3.033)
P(X ≤ 6) ≈ 0.094, or about 9%.

Or z = (x − μ)/σ = (6 − 10)/3.033 ≈ −1.32, which gives P(X ≤ 6) = 0.0934 from Table B.

The Normal approximation is reasonable, but not quite close to 12%. Here p = .08 is not close to 0.5, but np = 10 just meets the criterion.

Using a continuity correction greatly improves the approximation:
> pnorm(6.5, mean=10, sd=3.033)
P(X ≤ 6) = P(X ≤ 6.5) ≈ 0.124

[Figure: distributions for the color blindness example, panels n = 50, n = 125, n = 1000]

The larger the sample size, the better the Normal approximation fits the binomial distribution.
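The comparison in the color blindness example (n = 125, p = 0.08) can be reproduced in a few lines of R. This is a minimal sketch using only base R functions; the variable names and the small floating-point tolerance in the rule-of-thumb check are illustrative assumptions rather than anything taken from the slides.

```r
# Sketch: exact binomial probability vs. Normal approximation, X ~ B(125, 0.08).
n <- 125
p <- 0.08

mu    <- n * p                    # binomial mean, np
sigma <- sqrt(n * p * (1 - p))    # binomial standard deviation

# Rule of thumb from the text: np >= 10 and n(1 - p) >= 10.
# The tiny tolerance guards against round-off when np is exactly 10.
rule_ok <- (mu >= 10 - 1e-9) && (n * (1 - p) >= 10)

exact     <- pbinom(6, size = n, prob = p)       # P(X <= 6), exact binomial
approx    <- pnorm(6,   mean = mu, sd = sigma)   # Normal, no correction
approx_cc <- pnorm(6.5, mean = mu, sd = sigma)   # Normal, continuity correction

# Show the three probabilities side by side.
round(c(exact = exact, normal = approx, corrected = approx_cc), 3)

# Absolute error of each approximation against the exact value.
abs(c(normal = approx, corrected = approx_cc) - exact)
```

The corrected value lands much closer to the exact binomial probability than the uncorrected one, which is the point the example makes about the continuity correction.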
The Poisson distributions

A Poisson distribution describes the count X of occurrences of an event in fixed, finite intervals of time or space when occurrences are all independent, and the probability of an occurrence is the same over all possible intervals.

Think of the Poisson distribution as describing the number of items in containers.

Items                    Containers
Radioactive decays       Second
Weeds                    Acre of farm land
Fleas                    Dog
Cardiovascular deaths    County / year

If we divide a natural lawn into 1 ft² quadrants, we can count how many dandelions are in each quadrant. Dandelion seeds are wind-spread. The probabilities of a quadrant containing 0, 1, 2, 3, ... dandelions are given by a Poisson distribution when two conditions hold:

(i) independence of dandelions: the presence of one dandelion in a quadrant does not make the presence of another more or less likely.
(ii) homogeneity of quadrants: each quadrant is equally likely to contain dandelions.

Poisson probabilities

If μ is the population mean number of occurrences for a specified interval of time or space, then the Poisson probability of observing k occurrences (k = 0, 1, 2, ...) at constant μ (> 0) is:

$$P(X = k) = \frac{e^{-\mu}\,\mu^{k}}{k!}$$

The Poisson distribution has mean μ and standard deviation √μ.

Effect of changing μ: the Poisson distribution is skewed when μ ≤ 5.

The number of deer crossing a road at night during mating season in a particular rural area can be modeled with a Poisson distribution. A local survey conducted over 4 nights found a total of 20 deer crossings. Based on this information, what is the probability that fewer than three deer would cross on a given night during mating season in this area?

To compute this probability using the Poisson distribution, we need to know μ. In this case μ = 20 / 4 = 5 deer crossings per night. "Fewer than three" means X ≤ 2, so:
> ppois(2,lambda=5)
which returns about 0.125. (R calls the Poisson mean lambda.)

Historical records over 20 years in a particular town indicate an average of 4 severe rainstorms per year.
Modeling the occurrences of severe rainstorms with the Poisson distribution (mean of 4 per year):

- Probability that there would be no severe rainstorm next year: $P(X = 0) = 4^{0} e^{-4} / 0! \approx 0.0183$
- Probability of 5 severe rainstorms next year: $P(X = 5) = 4^{5} e^{-4} / 5! \approx 0.1563$
- Probability of 1 or more severe rainstorms next year: $P(X \ge 1) = 1 - P(X = 0) = 1 - 0.0183 = 0.9817$
- Probability of more than 5 severe rainstorms next year: $P(X > 5) = 1 - P(X \le 5) = 1 - 0.7851 = 0.2149$

x     P(X = x)    P(X ≤ x)
0     1.832%      1.832%
1     7.326%      9.158%
...
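The rainstorm probabilities can be checked with R's built-in Poisson functions. The following is a sketch, assuming a Poisson model with a mean of 4 storms per year as stated in the example; the variable names and the choice to tabulate x from 0 to 10 are illustrative assumptions.

```r
# Sketch: Poisson probabilities for the rainstorm example (mean 4 per year).
mu <- 4   # Poisson mean; R's functions call this argument `lambda`

p_none        <- dpois(0, lambda = mu)       # P(X = 0)
p_five        <- dpois(5, lambda = mu)       # P(X = 5)
p_one_or_more <- 1 - dpois(0, lambda = mu)   # P(X >= 1) = 1 - P(X = 0)
p_more_than_5 <- 1 - ppois(5, lambda = mu)   # P(X > 5)  = 1 - P(X <= 5)

# The Poisson formula written out by hand agrees with dpois().
by_hand <- mu^5 * exp(-mu) / factorial(5)

# Table of P(X = x) and P(X <= x) in percent; the range 0:10 is arbitrary.
x <- 0:10
poisson_table <- data.frame(
  x         = x,
  p_equal   = round(100 * dpois(x, lambda = mu), 3),
  p_at_most = round(100 * ppois(x, lambda = mu), 3)
)
print(poisson_table)
```

Note that `ppois(q, lambda)` returns the cumulative probability P(X ≤ q), so "more than 5" is computed as one minus `ppois(5, ...)`.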