
    Substitute Class - Lecture Notes

    Abhijit Kiran Valluri

    February 10, 2016

    1 Birthday Problem

The birthday problem, also often referred to as the birthday paradox, is the problem of determining the probability that at least one pair of individuals, in a group of n, have the same birthday. Using the pigeonhole principle, one can readily see that the probability reaches 1 when we have a group of 367 people or more (counting the extra day, February 29, in a leap year). But what is surprising to most people, at first glance, is that the probability exceeds 1/2 with a group of just 23 people. Most people typically guess that there should be at least 100 people before this happens. In fact, there is no paradox here at all; the confusion is merely due to the counter-intuitive nature of the problem.

    1.1 Calculating the probability

Before we calculate the probability, we should note that we make the following assumptions: (1) all the days of the year are equally likely to be the birthday of a person; (2) we have a representative sample of the population. That is, we have not cherry-picked people in the group of n people such that any particular day, week, or month is more likely to be a birthday within the group. Together with the first assumption, this implies that each person is equally likely to have his/her birthday on any day of the year.

Then, let us define the event A = {at least two people have the same birthday}, and consider the complement of this event, A^c. The birthdays of all the individuals are mutually independent events. Hence, for A^c to occur, we can reason as follows: person 1 can have their birthday on any of the 366 days, as this does not cause any clashes; person 2 can have their birthday on any of the remaining 365 days; similarly, person 3 can have their birthday on any of the remaining 364 days, and so on.

Then, the probability of event A^c is

P(A^c) = (366/366) × (365/366) × · · · × ((367 − n)/366),   (1)

when there are n people. When n = 23, P(A^c) ≈ 0.4937, and hence P(A) > 0.5. Therefore, we only need 23 people in the group for there to be a better than 50% chance that two people share a birthday. If there are 70 people, this probability is about 99.9%.
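These numbers are easy to check numerically. Below is a minimal Python sketch (my addition, not part of the original notes) that evaluates the exact product in (1) and compares it against a Monte Carlo estimate; the function names and the trial count of 100,000 are arbitrary choices.

    import random

    def p_shared_birthday_exact(n, days=366):
        """P(at least two of n people share a birthday), via the product in (1)."""
        p_no_clash = 1.0
        for i in range(n):
            p_no_clash *= (days - i) / days
        return 1.0 - p_no_clash

    def p_shared_birthday_mc(n, days=366, trials=100_000):
        """Monte Carlo estimate: draw n uniform birthdays and check for a clash."""
        clashes = 0
        for _ in range(trials):
            birthdays = [random.randrange(days) for _ in range(n)]
            if len(set(birthdays)) < n:
                clashes += 1
        return clashes / trials

    print(p_shared_birthday_exact(23))  # ~0.5063, so P(A) > 0.5 at n = 23
    print(p_shared_birthday_mc(23))     # should land close to the exact value
    print(p_shared_birthday_exact(70))  # ~0.999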

    2 Binary Symmetric Communication Channel

A binary symmetric channel (BSC) is a very common communications channel, used in information theory and communication theory courses as a learning tool and to develop some fundamental theorems in communication theory. Essentially, during a single channel use, the channel takes in a binary input {0, 1} and outputs a binary bit, with a small probability of flipping the bit, given by the crossover probability.

    Q:  How do we infer the input from the output in such a channel?


    Figure 1: A binary symmetric channel, with crossover probability  p

A: Well, the input and output are often represented by random variables, X and Y respectively. Given the above figure and the crossover probability, we can compute the conditional probability of Y given X as:

P(Y = 0 | X = 0) = P(Y = 1 | X = 1) = 1 − p,   (2)
P(Y = 0 | X = 1) = P(Y = 1 | X = 0) = p.   (3)

Using Bayes' theorem, we can get the probability distribution of X given Y as follows:

P(X = x | Y = 0) = P(Y = 0 | X = x) P(X = x) / [P(Y = 0 | X = x) P(X = x) + P(Y = 0 | X = x̄) P(X = x̄)],   (4)

where x̄ denotes the other input symbol,

and likewise for the other probabilities. Now, the distribution of X is called the a priori probability distribution, which gives us the distribution of the symbols in the input. The above defined probability distribution of X given Y is the a posteriori probability. This idea is the basis for the so-called MAP estimate (maximum a posteriori) that is used in estimation and detection theory.
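To make the MAP estimate concrete, here is a small Python sketch (my addition, not from the notes) that evaluates the posterior in (4) for a BSC and picks the maximizing input; the crossover probability p = 0.1 and the uniform prior are illustrative assumptions.

    def bsc_posterior(y, p, prior0):
        """Posterior P(X = x | Y = y) for a BSC with crossover probability p.

        prior0 is the a priori probability P(X = 0); P(X = 1) = 1 - prior0.
        Returns (P(X = 0 | Y = y), P(X = 1 | Y = y)), via Bayes' theorem as in (4).
        """
        prior = {0: prior0, 1: 1.0 - prior0}

        def likelihood(y_bit, x_bit):
            # Channel law from (2)-(3): the bit flips with probability p.
            return (1.0 - p) if y_bit == x_bit else p

        joint = {x: likelihood(y, x) * prior[x] for x in (0, 1)}
        total = joint[0] + joint[1]  # the normalizer P(Y = y)
        return joint[0] / total, joint[1] / total

    def map_estimate(y, p, prior0):
        """MAP estimate: the input x that maximizes the posterior."""
        post0, post1 = bsc_posterior(y, p, prior0)
        return 0 if post0 >= post1 else 1

    print(bsc_posterior(0, p=0.1, prior0=0.5))  # (0.9, 0.1): output 0 points to input 0
    print(map_estimate(0, p=0.1, prior0=0.5))   # 0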

    3 Examples of PMFs, PDFs and CDFs

Let us take a look at some common probability mass functions (PMFs) and probability density functions (PDFs), and their corresponding cumulative distribution functions (CDFs).

    3.1 Binomial

X ∼ B(n, p)   (5)

A binomial random variable, X, is obtained when we perform n Bernoulli trials, i.e., a success/failure experiment such as a coin toss. The probability of success in each trial is p, and we repeat this n times. We then count the total number of successes in the n trials. Clearly, this is a discrete random variable.

We see that the n Bernoulli trials are independent. Hence, the probability of X = k is simply the probability of there being exactly k successes in n trials. We can choose which k of the n trials are the successes in C(n, k) = n!/(k! (n − k)!) ways, and hence we get the following:

P(X = k) = C(n, k) p^k (1 − p)^(n−k).   (6)

Clearly, the sum of the above terms from k = 0 to n is 1: by the binomial theorem, the sum equals (p + (1 − p))^n = 1.
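As a quick check of (6), here is a short Python sketch (my addition) that evaluates the PMF with math.comb and verifies that the terms sum to 1; n = 10 and p = 0.3 are arbitrary illustrative values.

    from math import comb

    def binomial_pmf(k, n, p):
        """P(X = k) for X ~ B(n, p), as in (6)."""
        return comb(n, k) * p**k * (1 - p)**(n - k)

    n, p = 10, 0.3
    pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
    print(pmf[3])    # P(X = 3) ~ 0.2668
    print(sum(pmf))  # ~ 1.0, by the binomial theorem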


    3.2 Geometric

X ∼ Geo(p)   (7)

The geometric random variable is also a discrete random variable. It is the discrete analogue of the exponential random variable. There are two slightly different but equally valid ways of defining the geometric random variable; I will describe one of them here. We define the geometric random variable as the number of Bernoulli trials needed to get one success. The support set is {1, 2, 3, . . .}. The parameter p is the probability of success for the Bernoulli trials. So, considering a coin toss with heads as success and tails as failure, we keep tossing the coin until we see a head. The probability that we see the first head on the kth trial is the probability of seeing tails in the first k − 1 trials and then seeing a head. This gives us

P(X = k) = (1 − p)^(k−1) p.   (8)

Summing the above from 1 to infinity gives us 1, as expected.
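The definition translates directly into a sampler: keep performing Bernoulli trials until the first success. A small Python sketch (my addition; p = 0.25 and the trial count are illustrative) compares the empirical frequency of X = 2 with the PMF in (8).

    import random

    def sample_geometric(p):
        """Number of Bernoulli(p) trials up to and including the first success."""
        k = 1
        while random.random() >= p:  # failure occurs with probability 1 - p
            k += 1
        return k

    p, trials = 0.25, 100_000
    samples = [sample_geometric(p) for _ in range(trials)]
    print(samples.count(2) / trials)  # empirical P(X = 2)
    print((1 - p) ** 1 * p)           # (1 - p)^(2-1) * p = 0.1875, from (8)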

    3.3 Uniform (continuous)

X ∼ U(a, b)   (9)

The uniform random variable can be defined for both the discrete and continuous case. We will look at the continuous version. The uniform random variable is defined on an interval [a, b]. The probability of any single point within this interval is zero, as a point is a set of measure zero, so that is not a meaningful way to describe the distribution. Instead, we can say that the probability of any portion of the line segment from a to b is the length of that portion of the segment divided by the length of the whole segment, b − a.

    More useful is its PDF. This is given as

f(x) = { 1/(b − a)   if x ∈ [a, b],
       { 0           otherwise.        (10)

    The CDF can be obtained from the PDF as

F(x) = ∫_{−∞}^{x} f(u) du = { 0                  if x < a,
                             { (x − a)/(b − a)    if a ≤ x < b,
                             { 1                  if x ≥ b.        (11)
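Both (10) and (11) translate directly into code. A minimal Python sketch (my addition; a = 2 and b = 5 are arbitrary):

    def uniform_pdf(x, a, b):
        """PDF of U(a, b), as in (10)."""
        return 1.0 / (b - a) if a <= x <= b else 0.0

    def uniform_cdf(x, a, b):
        """CDF of U(a, b), as in (11)."""
        if x < a:
            return 0.0
        if x >= b:
            return 1.0
        return (x - a) / (b - a)

    a, b = 2.0, 5.0
    print(uniform_pdf(3.0, a, b))  # 1/3
    print(uniform_cdf(3.0, a, b))  # (3 - 2)/(5 - 2) = 1/3
    print(uniform_cdf(1.0, a, b), uniform_cdf(6.0, a, b))  # 0.0 1.0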

    3.4 Exponential

X ∼ exp(λ)   (12)

The exponential random variable is a continuous random variable. It is the continuous analogue of the geometric distribution that we saw earlier. A very important property of the exponential distribution is that it is memoryless: P(X > s + t | X > s) = P(X > t) for all s, t ≥ 0.

    The parameter λ > 0 is also called the rate. The PDF of the random variable is given by

f(x) = { λe^(−λx)   if x ≥ 0,
       { 0          if x < 0.        (13)

The CDF follows by integrating the PDF:

F(x) = { 1 − e^(−λx)   if x ≥ 0,
       { 0             if x < 0.        (14)
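The memoryless property is easy to see empirically. Below is a Python sketch (my addition) that draws exponential samples by inverse-transform sampling and checks that P(X > s + t | X > s) ≈ P(X > t) = e^(−λt); the values of λ, s, and t are arbitrary.

    import math
    import random

    def sample_exponential(lam):
        """Inverse-transform sample: X = -ln(U)/lam with U uniform on (0, 1]."""
        u = 1.0 - random.random()  # avoid u = 0, so log(u) is always defined
        return -math.log(u) / lam

    lam, trials, s, t = 0.5, 200_000, 1.0, 2.0
    samples = [sample_exponential(lam) for _ in range(trials)]

    survivors = [x for x in samples if x > s]
    cond = sum(x > s + t for x in survivors) / len(survivors)
    print(cond)                # empirical P(X > s + t | X > s)
    print(math.exp(-lam * t))  # P(X > t) = e^(-lambda*t) ~ 0.3679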


    3.5 Gaussian (or Normal)

X ∼ N(µ, σ²)   (15)

The Gaussian random variable, also called the Normal random variable, is a very common random variable, probably the most important one that you will learn in probability theory, at least for engineers. It appears almost everywhere. And oftentimes, engineers will stick in a Gaussian random variable for an unknown distribution just because it is so nice to work with! And if it works, all the merrier! The parameters µ and σ² are respectively the mean and variance of the distribution. The PDF is given by

f(x) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²)),   x ∈ R.   (16)

Unfortunately, the CDF of the Gaussian random variable cannot be expressed in terms of elementary functions, and hence has no closed-form expression.
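In practice, the Gaussian CDF is evaluated numerically, most commonly through the error function erf, which is related to the standard normal CDF by Φ(z) = (1 + erf(z/√2))/2. A minimal Python sketch (my addition):

    import math

    def normal_cdf(x, mu=0.0, sigma=1.0):
        """Gaussian CDF evaluated via the error function: Phi((x - mu)/sigma)."""
        z = (x - mu) / sigma
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    print(normal_cdf(0.0))   # 0.5, by symmetry about the mean
    print(normal_cdf(1.96))  # ~0.975, the familiar 95% two-sided point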

Due to the shape of the PDF, the normal distribution is sometimes also called the "bell curve". The phrase "grading on a curve" usually refers to using the "bell curve", namely the normal distribution, to distribute the grades over the point scale!

Since the Gaussian distribution is so widespread in its occurrence and usage, I have decided to include a figure of the bell curve for you to inspect.

Figure 2: The world-famous bell curve, i.e., the probability density function of the Gaussian random variable.
