+math

NAME : SITI ASZA IFFA BT MOHD SHARIAF IC NO : 930212-05-5516 CLASS : 5 IBNU KHALDUN TEACHER: TEACHER VASANTHI


HISTORY OF PROBABILITY

The scientific study of probability is a modern development. Gambling shows that there has been an interest in quantifying the ideas of probability for millennia, but exact mathematical descriptions of use in those problems only arose much later.

According to Richard Jeffrey, "Before the middle of the seventeenth century, the term 'probable' (Latin probabilis) meant approvable, and was applied in that sense, univocally, to opinion and to action. A probable action or opinion was one such as sensible people would undertake or hold, in the circumstances." However, in legal contexts especially, 'probable' could also apply to propositions for which there was good evidence.

Aside from some elementary considerations made by Girolamo Cardano in the 16th century, the doctrine of probabilities dates to the correspondence of Pierre de Fermat and Blaise Pascal (1654). Christiaan Huygens (1657) gave the earliest known scientific treatment of the subject. Jakob Bernoulli's Ars Conjectandi (posthumous, 1713) and Abraham de Moivre's Doctrine of Chances (1718) treated the subject as a branch of mathematics. See Ian Hacking's The Emergence of Probability and James Franklin's The Science of Conjecture for histories of the early development of the very concept of mathematical probability.

The theory of errors may be traced back to Roger Cotes's Opera Miscellanea (posthumous, 1722), but a memoir prepared by Thomas Simpson in 1755 (printed 1756) first applied the theory to the discussion of errors of observation. The reprint (1757) of this memoir lays down the axioms that positive and negative errors are equally probable, and that there are certain assignable limits within which all errors may be supposed to fall; continuous errors are discussed and a probability curve is given.

Pierre-Simon Laplace (1774) made the first attempt to deduce a rule for the combination of observations from the principles of the theory of probabilities. He represented the law of probability of errors by a curve y = φ(x), x being any error and y its probability, and laid down three properties of this curve:

1. it is symmetric as to the y-axis;
2. the x-axis is an asymptote, the probability of the error being 0;
3. the area enclosed is 1, it being certain that an error exists.

He also gave (1781) a formula for the law of facility of error (a term due to Lagrange, 1774), but one which led to unmanageable equations. Daniel Bernoulli (1778) introduced the principle of the maximum product of the probabilities of a system of concurrent errors.

The method of least squares is due to Adrien-Marie Legendre (1805), who introduced it in his Nouvelles méthodes pour la détermination des orbites des comètes (New Methods for Determining the Orbits of Comets). In ignorance of Legendre's contribution, an Irish-American writer, Robert Adrain, editor of "The Analyst" (1808), first deduced the law of facility of error,

φ(x) = ce^(−h²x²),

h being a constant depending on the precision of observation, and c a scale factor ensuring that the area under the curve equals 1. He gave two proofs, the second being essentially the same as John Herschel's (1850). Gauss gave the first proof that seems to have been known in Europe (the third after Adrain's) in 1809. Further proofs were given by Laplace (1810, 1812), Gauss (1823), James Ivory (1825, 1826), Hagen (1837), Friedrich Bessel (1838), W. F. Donkin (1844, 1856), and Morgan Crofton (1870). Other contributors were Ellis (1844), De Morgan (1864), Glaisher (1872), and Giovanni Schiaparelli (1875). Peters's (1856) formula for r, the probable error of a single observation, is well known.

In the nineteenth century authors on the general theory included Laplace, Sylvestre Lacroix (1816), Littrow (1833), Adolphe Quetelet (1853), Richard Dedekind (1860), Helmert (1872), Hermann Laurent (1873), Liagre, Didion, and Karl Pearson. Augustus De Morgan and George Boole improved the exposition of the theory.

Andrey Markov introduced the notion of Markov chains (1906), which play an important role in the theory of stochastic processes and its applications.

The modern theory of probability, based on measure theory, was developed by Andrey Kolmogorov (1933).

On the geometric side (see integral geometry) contributors to The Educational Times were influential (Miller, Crofton, McColl, Wolstenholme, Watson, and Artemas Martin).

EXAMPLE OF PROBABILITY IN OUR LIFE

EXAMPLE 1:

Suppose there is a school with 60% boys and 40% girls as students. The female students wear trousers or skirts in equal numbers; the boys all wear trousers. An observer sees a (random) student from a distance; all the observer can see is that this student is wearing trousers. What is the probability that this student is a girl? The correct answer can be computed using Bayes' theorem.

The event A is that the student observed is a girl, and the event B is that the student observed is wearing trousers. To compute P(A|B), we first need to know:

P(A), or the probability that the student is a girl regardless of any other information. Since the observer sees a random student, meaning that all students have the same probability of being observed, and the fraction of girls among the students is 40%, this probability equals 0.4.

P(B|A), or the probability of the student wearing trousers given that the student is a girl. As they are as likely to wear skirts as trousers, this is 0.5.

P(B), or the probability of a (randomly selected) student wearing trousers regardless of any other information. Since half of the girls and all of the boys are wearing trousers, this is 0.5×0.4 + 1×0.6 = 0.8.

Given all this information, the probability of the observer having spotted a girl, given that the observed student is wearing trousers, can be computed by substituting these values into Bayes' formula:

P(A|B) = P(B|A) P(A) / P(B) = (0.5 × 0.4) / 0.8 = 0.25
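The same calculation can be checked with a short Python sketch (Python is used here only for illustration; the variable names are mine, not part of the original project):

```python
# Example 1: P(girl | trousers) via Bayes' theorem, using the values above.
p_girl = 0.4            # P(A): 40% of the students are girls
p_boy = 0.6             # 60% are boys
p_tr_given_girl = 0.5   # P(B|A): girls wear trousers half the time
p_tr_given_boy = 1.0    # boys always wear trousers

# Total probability of trousers: P(B) = P(B|girl)P(girl) + P(B|boy)P(boy)
p_trousers = p_tr_given_girl * p_girl + p_tr_given_boy * p_boy   # 0.8

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
print(p_tr_given_girl * p_girl / p_trousers)   # 0.25
```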

EXAMPLE 2:

Suppose a certain drug test is 99% sensitive and 99% specific, that is, the test will correctly identify a drug user as testing positive 99% of the time, and will correctly identify a non-user as testing negative 99% of the time. This would seem to be a relatively accurate test, but Bayes' theorem can be used to demonstrate the relatively high probability of misclassifying non-users as users. Let's assume a corporation decides to test its employees for drug use, and that only 0.5% of the employees actually use the drug. What is the probability that, given a positive drug test, an employee is actually a drug user? Let "D" stand for being a drug user and "N" indicate being a non-user. Let "+" be the event of a positive drug test. We need to know the following:

P(D), or the probability that the employee is a drug user, regardless of any other information. This is 0.005, since 0.5% of the employees are drug users. This is the prior probability of D.

Page 8: +math

P(N), or the probability that the employee is not a drug user. This is 1 − P(D), or 0.995.

P(+|D), or the probability that the test is positive, given that the employee is a drug user. This is 0.99, since the test is 99% sensitive.

P(+|N), or the probability that the test is positive, given that the employee is not a drug user. This is 0.01, since the test will produce a false positive for 1% of non-users.

P(+), or the probability of a positive test event, regardless of other information. This is 0.0149 or 1.49%, found by adding the probability of a true positive (99% × 0.5% = 0.495%) and the probability of a false positive (1% × 99.5% = 0.995%). This is the overall (marginal) probability of a positive test.

Given this information, we can compute the posterior probability P(D|+) of an employee who tested positive actually being a drug user:

P(D|+) = P(+|D) P(D) / P(+) = (0.99 × 0.005) / 0.0149 ≈ 0.332

Despite the apparent accuracy of the test, only about a third of the employees who test positive are actually drug users.
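Again, a minimal Python check of the arithmetic (illustrative only; the names are assumptions of this sketch):

```python
# Example 2: P(drug user | positive test) via Bayes' theorem.
p_user = 0.005               # P(D): prior probability of being a user
p_nonuser = 1 - p_user       # P(N) = 0.995
p_pos_given_user = 0.99      # P(+|D): sensitivity
p_pos_given_nonuser = 0.01   # P(+|N): false-positive rate

# Total probability of a positive test: P(+) = P(+|D)P(D) + P(+|N)P(N)
p_pos = p_pos_given_user * p_user + p_pos_given_nonuser * p_nonuser  # 0.0149

# Posterior: P(D|+) = P(+|D) * P(D) / P(+)
print(round(p_pos_given_user * p_user / p_pos, 4))   # 0.3322
```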


INTRODUCTION

Probable and likely and their cognates in other modern languages derive from medieval learned Latin probabilis and verisimilis, deriving from Cicero and generally applied to an opinion to mean plausible or generally approved.

Probability has a dual aspect: on the one hand the probability or likelihood of hypotheses given the evidence for them, and on the other hand the behavior of stochastic processes such as the throwing of dice or coins. The study of the former is historically older in, for example, the law of evidence, while the mathematical treatment of dice began with the work of Pascal and Fermat in the 1650s.

Probability is distinguished from statistics. While statistics deals with data and inferences from it, (stochastic) probability deals with the stochastic (random) processes which lie behind data or outcomes.

Two major applications of probability theory in everyday life are in risk assessment and in trade on commodity markets. Governments typically apply probabilistic methods in environmental regulation where it is called "pathway analysis", often measuring well-being using methods that are stochastic in nature, and choosing projects to undertake based on statistical analyses of their probable effect on the population as a whole.

A good example is the effect of the perceived probability of any widespread Middle East conflict on oil prices - which have ripple effects in the economy as a whole. An assessment by a commodity trader that a war is more likely vs. less likely sends prices up or down, and signals other traders of that opinion. Accordingly, the probabilities are not assessed independently nor necessarily very rationally. The theory of behavioral finance emerged to describe the effect of such groupthink on pricing, on policy, and on peace and conflict.

It can reasonably be said that the discovery of rigorous methods to assess and combine probability assessments has had a profound effect on modern society. Accordingly, it may be of some importance to most citizens to understand how odds and probability assessments are made, and how they contribute to reputations and to decisions, especially in a democracy.

Another significant application of probability theory in everyday life is reliability. Many consumer products, such as automobiles and consumer electronics, utilize reliability theory.


b) Difference between theoretical and empirical probability

Empirical probability

The empirical probability of an event is an estimate of the likelihood that the event will happen, based on how often the event occurs after collecting data or running an experiment (over a large number of trials). It is based specifically on direct observation.


Theoretical probability

The theoretical probability of an event is the number of ways the event can occur divided by the total number of possible outcomes. It is found from a sample space of known, equally likely outcomes.

Comparing empirical probability and theoretical probability

Karen and Jason roll two dice 50 times and record their results in the accompanying chart.

1) What is their empirical probability of rolling a 7?

2) What is their theoretical probability of rolling a 7?

3) How do the empirical and theoretical probabilities compare?

Solution

1) The empirical probability is 13/50 = 26% (from their chart, a 7 came up 13 times in 50 rolls).

2) The theoretical probability is 6/36 = 1/6 ≈ 16.7%.

3) Karen and Jason rolled more 7's than would be expected theoretically.
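Karen and Jason's experiment is easy to repeat in software. Here is a small Python simulation (a sketch, not part of the original project; the seed is fixed only to make the run reproducible):

```python
import random

# Roll two dice 50 times and compare the empirical probability of a 7
# with the theoretical value of 1/6.
random.seed(1)
trials = 50
sevens = sum(1 for _ in range(trials)
             if random.randint(1, 6) + random.randint(1, 6) == 7)

print(f"empirical P(7)   = {sevens / trials:.3f}")
print(f"theoretical P(7) = {1 / 6:.3f}")
```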


a) {1, 2, 3, 4, 5, 6}

b) When two dice are tossed, each outcome of the first die (1 to 6) pairs with each outcome of the second die (1 to 6), giving 36 equally likely outcomes:

(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)
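The same 36 outcomes can be generated programmatically; a minimal Python sketch (illustrative only):

```python
from itertools import product

# Sample space for two dice: every ordered pair (first die, second die).
sample_space = list(product(range(1, 7), repeat=2))

print(len(sample_space))   # 36
print(sample_space[:6])    # [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6)]
```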

a) Sum of the dots on both turned-up faces (x), possible outcomes, and probability P(x):

x = 2:  (1,1)                                  P(x) = 1/36
x = 3:  (1,2) (2,1)                            P(x) = 2/36 = 1/18
x = 4:  (1,3) (3,1) (2,2)                      P(x) = 3/36 = 1/12
x = 5:  (1,4) (4,1) (2,3) (3,2)                P(x) = 4/36 = 1/9
x = 6:  (1,5) (5,1) (2,4) (4,2) (3,3)          P(x) = 5/36
x = 7:  (1,6) (6,1) (2,5) (5,2) (3,4) (4,3)    P(x) = 6/36 = 1/6
x = 8:  (2,6) (6,2) (3,5) (5,3) (4,4)          P(x) = 5/36
x = 9:  (3,6) (6,3) (4,5) (5,4)                P(x) = 4/36 = 1/9
x = 10: (4,6) (6,4) (5,5)                      P(x) = 3/36 = 1/12
x = 11: (5,6) (6,5)                            P(x) = 2/36 = 1/18
x = 12: (6,6)                                  P(x) = 1/36
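The table above can be reproduced by counting the outcomes for each sum; a short Python sketch (illustrative, not part of the original project):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely outcomes give each sum x,
# then P(x) = count / 36.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
for x in sorted(counts):
    print(x, counts[x], Fraction(counts[x], 36))   # e.g. 7 6 1/6
```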

b) Event (x), possible outcomes, and probability P(x):

A: (1,2) (1,3) (1,4) (1,5) (1,6) (2,1) (2,3) (2,4) (2,5) (2,6) (3,1) (3,2) (3,4) (3,5) (3,6) (4,1) (4,2) (4,3) (4,5) (4,6) (5,1) (5,2) (5,3) (5,4) (5,6) (6,1) (6,2) (6,3) (6,4) (6,5)
P(A) = 30/36 = 5/6

B:

C: (1,2) (1,4) (1,6) (2,1) (2,2) (2,3) (2,5) (3,2) (3,3) (3,4) (4,1) (4,3) (5,2) (5,3) (5,5) (6,1)
P(C) = 16/36 = 4/9

D: (2,2) (5,3) (5,5) (3,3) (3,5)
P(D) = 5/36

a) Sum of the two numbers (x), frequency (f), fx and fx² for n = 50 tosses:

x      f      fx      fx²
2      3      6       12
3      4      12      36
4      6      24      96
5      9      45      225
6      4      24      144
7      2      14      98
8      11     88      704
9      4      36      324
10     2      20      200
11     1      11      121
12     4      48      576
       ∑f = 50   ∑fx = 328   ∑fx² = 2536

i) mean = ∑fx / ∑f = 328/50 = 6.56

ii) variance = ∑fx²/∑f − (mean)² = 2536/50 − 6.56² = 7.686

iii) standard deviation = √(∑fx²/∑f − (mean)²) = √7.686 = 2.772

b) Predicted mean for 100 tosses of the dice: the mean of the sum does not depend on the number of tosses, so it should stay close to 6.56 and, as the number of tosses grows, approach the theoretical mean of 7.

c) Results for n = 100 tosses:

x      f      fx      fx²
2      14     28      56
3      8      24      72
4      10     40      160
5      12     60      300
6      10     60      360
7      9      63      441
8      12     96      768
9      6      54      486
10     6      60      600
11     7      77      847
12     6      72      864
       ∑f = 100   ∑fx = 634   ∑fx² = 4954

i) mean = ∑fx / ∑f = 634/100 = 6.34

ii) variance = ∑fx²/∑f − (mean)² = 4954/100 − 6.34² = 9.344

iii) standard deviation = √9.344 = 3.057
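The grouped-frequency formulas above are easy to check in Python (a sketch using the n = 50 table; the dictionary layout is my own):

```python
from math import sqrt

# Frequency table for n = 50 tosses: sum x -> frequency f.
freq = {2: 3, 3: 4, 4: 6, 5: 9, 6: 4, 7: 2,
        8: 11, 9: 4, 10: 2, 11: 1, 12: 4}

n = sum(freq.values())                                          # Σf = 50
mean = sum(x * f for x, f in freq.items()) / n                  # Σfx / Σf
var = sum(x * x * f for x, f in freq.items()) / n - mean ** 2   # Σfx²/Σf − mean²

print(mean, round(var, 3), round(sqrt(var), 3))   # 6.56 7.686 2.772
# Swapping in the n = 100 table gives 6.34, 9.344 and 3.057.
```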

a) Probability distribution of x, the sum of the dots on the two dice:

x      2     3     4     5     6     7     8     9     10    11    12
P(x)   1/36  1/18  1/12  1/9   5/36  1/6   5/36  1/9   1/12  1/18  1/36

b) mean = ∑ x·P(x)
        = 2(1/36) + 3(1/18) + 4(1/12) + 5(1/9) + 6(5/36) + 7(1/6) + 8(5/36) + 9(1/9) + 10(1/12) + 11(1/18) + 12(1/36)
        = 7

variance = ∑ x²·P(x) − (mean)²
         = 2²(1/36) + 3²(1/18) + 4²(1/12) + 5²(1/9) + 6²(5/36) + 7²(1/6) + 8²(5/36) + 9²(1/9) + 10²(1/12) + 11²(1/18) + 12²(1/36) − 7²
         = 5.83

standard deviation = √5.83 = 2.415
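A corresponding Python check of the theoretical values (a sketch; the closed form (6 − |x − 7|)/36 for P(x) is my shorthand, and matches the distribution table above):

```python
from fractions import Fraction
from math import sqrt

# P(x) for the sum of two fair dice, x = 2..12.
dist = {x: Fraction(6 - abs(x - 7), 36) for x in range(2, 13)}

mean = sum(x * p for x, p in dist.items())                 # Σ x·P(x) = 7
var = sum(x * x * p for x, p in dist.items()) - mean ** 2  # 35/6 ≈ 5.83

print(mean, float(var), round(sqrt(var), 3))               # 7 5.833... 2.415
```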

                      Part 4 (n = 50)   Part 4 (n = 100)   Part 5 (theoretical)
Mean                  6.56              6.34               7.00
Variance              7.686             9.344              5.83
Standard deviation    2.772             3.057              2.415

We can see that the mean, variance and standard deviation obtained through the experiments in Part 4 are different from, but close to, the theoretical values in Part 5.

For the mean, when the number of trials increased from n = 50 to n = 100, its value (from 6.56 to 6.34) stayed close to the theoretical value of 7, and by the Law of Large Numbers (see the next section) it should converge to 7 as the number of trials keeps growing.

Nevertheless, the empirical variance and empirical standard deviation obtained in Part 4 moved further from the theoretical values in Part 5. This appears to go against the Law of Large Numbers, probably because:

1) the sample (n = 100) is not large enough for the mean, variance and standard deviation to settle near their theoretical values;

2) the Law of Large Numbers is not an absolute law: deviations are still possible, although their probability becomes small as the number of trials grows.

In conclusion, the empirical mean, variance and standard deviation can differ from the theoretical values. As the number of trials (the sample size) gets bigger, the empirical values should approach the theoretical values. However, deviations remain possible, especially when the number of trials (or sample size) is not large enough.
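The Law of Large Numbers behavior described above can be illustrated with a simulation (a sketch; the sample sizes are chosen arbitrarily and the seed is fixed for reproducibility):

```python
import random
from statistics import mean, pvariance

# As the number of tosses grows, the empirical mean and variance of the
# sum of two dice should settle near the theoretical 7 and 5.83.
random.seed(0)
for n in (50, 100, 10_000):
    sums = [random.randint(1, 6) + random.randint(1, 6) for _ in range(n)]
    print(n, round(mean(sums), 2), round(pvariance(sums), 2))
```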