
Post on 13-Jan-2016


A Brief History of Statistics

Medieval Times: Dice and Gambling

Modern Times: Dice and Games/Gambling

Dice Probabilities

1/6 = 16.7%

Sum of two dice (rows: first die, columns: second die)

      1   2   3   4   5   6
  1   2   3   4   5   6   7
  2   3   4   5   6   7   8
  3   4   5   6   7   8   9
  4   5   6   7   8   9  10
  5   6   7   8   9  10  11
  6   7   8   9  10  11  12

1/36 = 2.78%

6/36 = 16.7%

Dice Outcomes Are Independent

Sum of Two Dice

Probability Distribution
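
The distribution of the sum can be checked by enumerating all 36 equally likely outcomes of two independent dice. This is a minimal sketch; the variable names are illustrative and not from the slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
outcomes = list(product(range(1, 7), repeat=2))

# Count how many outcomes give each sum, then divide by 36.
counts = Counter(a + b for a, b in outcomes)
distribution = {
    total: Fraction(n, len(outcomes)) for total, n in sorted(counts.items())
}

for total, prob in distribution.items():
    print(f"sum {total:2d}: {counts[total]}/36 = {float(prob):.2%}")
```

A sum of 2 can occur only one way (1/36), while a sum of 7 can occur six ways (6/36 = 1/6).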

Blaise Pascal

1600’s: Probability & Gambling

At least one "6" in four rolls vs. at least one double-six in 24 throws

Do these have equal probabilities?
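
The question can be answered with the complement rule: the chance of at least one success is 1 minus the chance of no success in every independent trial. A short sketch, with illustrative variable names:

```python
# At least one six in 4 rolls of one die: 1 - P(no six in 4 rolls).
p_one_six = 1 - (5 / 6) ** 4

# At least one double-six in 24 throws of two dice: 1 - P(no double-six in 24 throws).
p_double_six = 1 - (35 / 36) ** 24

print(f"one six in 4 rolls:        {p_one_six:.4f}")
print(f"double-six in 24 throws:   {p_double_six:.4f}")
```

The two probabilities are close but not equal: the first is slightly above one half and the second slightly below it.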

Blaise Pascal (1623 - 1662), Chevalier de Méré (1607 - 1684)

Binomial / Bernoulli Distribution

1654-1705

Binomial Distribution

• The principal reason for using a normal curve test on a dichotomy has been the past difficulty of calculating the exact binomial distribution.
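
With a computer the exact binomial tail is easy to calculate, so it can be compared directly with the normal-curve approximation. The sketch below uses hypothetical numbers (20 trials, success probability 0.5, 15 or more successes); the function names are illustrative, and the continuity correction is one common choice, not something stated in the slides.

```python
from math import comb, erf, sqrt


def binom_upper_tail(n, p, k):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def normal_upper_tail(n, p, k):
    """Normal approximation to P(X >= k), with a continuity correction."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    z = (k - 0.5 - mean) / sd
    return 0.5 * (1 - erf(z / sqrt(2)))


# Hypothetical example: 15 or more successes in 20 fair trials.
n, p, k = 20, 0.5, 15
exact = binom_upper_tail(n, p, k)
approx = normal_upper_tail(n, p, k)
print(f"exact binomial: {exact:.4f}")
print(f"normal approx:  {approx:.4f}")
```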

1761: Bayes Formula

[Figure: probability distributions of the height of the person for female and male, with axis ticks at 60, 67.5, and 75; new data: child height = 66.5]

Posterior (X | Data) = P(Data | X) × Prior (X) / Σ P(Data | X) × Prior (X), where X = Gender
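
Bayes' formula for this kind of example can be sketched numerically. The observed height of 66.5 comes from the slide; everything else here is an assumption made for illustration: equal priors for the two genders, and normal height distributions with hypothetical means (64 and 70) and a hypothetical standard deviation of 3.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))


# Hypothetical prior and height models (mean, sd); not taken from the slides.
prior = {"female": 0.5, "male": 0.5}
height_model = {"female": (64.0, 3.0), "male": (70.0, 3.0)}

observed_height = 66.5  # the "new data" value shown on the slide

# Bayes' formula: posterior is proportional to likelihood times prior.
unnormalized = {
    g: normal_pdf(observed_height, *height_model[g]) * prior[g] for g in prior
}
evidence = sum(unnormalized.values())
posterior = {g: v / evidence for g, v in unnormalized.items()}

for g, prob in posterior.items():
    print(f"P({g} | height = {observed_height}) = {prob:.3f}")
```

Under these assumed numbers the observation is closer to the female mean, so the posterior probability of "female" comes out higher than the prior of 0.5.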

1701 - 1761

Bayesian Formulas – ExcelD

Google Ngram Viewer

• Ngram: word or string in a corpus
• Corpus: a large or complete collection of writings
• Team of researchers from Harvard, Google, Encyclopaedia Britannica, and the American Heritage Dictionary
• Analyzed 5 million books from 1500 to 2008
• 500 billion unique words
• ~4% of all books ever published

Ngram Viewer: “Bayes”, “Bayesian” (1760 to 2000)

Ngram Viewer: “statistics” (1800 to 2000)

Observation on Height

• Adolphe Quételet (1796-1874)
• Mid-1800s: studied social data and crime
• ‘Quetelet Index’: Weight / Height²
• Now known as the “Body Mass Index”

"The average person"

[Chart: “Normal”, 1800 to 2000]

1st Regression Line - 1877

The first “Regression Line”

Francis Galton, 1822 - 1911

Ngram Viewer: “statistics”, “correlation”, “regression” (1800 to 2000)

Ngram Viewer: “Standard Deviation” (1800 to 2000)

Tukey

1915 – 2000

He introduced the box plot in his 1977 book, "Exploratory Data Analysis".
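
A box plot is drawn from a five-number summary: minimum, lower quartile, median, upper quartile, and maximum. The sketch below computes one for a small hypothetical data set using Python's standard library. Note that `statistics.quantiles` uses an interpolation rule that can differ slightly from Tukey's original "hinges" on some data sets.

```python
from statistics import quantiles

# Hypothetical data, for illustration only.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Quartiles (the three cut points that split the data into four parts).
q1, median, q3 = quantiles(data, n=4, method="inclusive")

five_number = (min(data), q1, median, q3, max(data))
print("min, Q1, median, Q3, max =", five_number)
```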


Ngram Viewer: “sliderule” (1800 to 2000)

Ngram Viewer: “calculator”

Ngram Viewer: “computer”, “internet”

Machine Learning

Ngram Viewer: “chi square”

chi-square test vs. z-test on a proportion

A two-tailed Z-test for two proportions (using a pooled estimate of p) and a chi-square test for a 2-by-2 table will give exactly the same P-value.
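
The equivalence can be checked numerically: for a 2-by-2 table, the chi-square statistic (without a continuity correction) equals the square of the pooled Z statistic. The counts below are hypothetical, and the sketch assumes no continuity correction is applied to either test.

```python
from math import erfc, sqrt

# Hypothetical 2-by-2 table (successes, failures) for two groups.
a, b = 30, 70  # group 1
c, d = 45, 55  # group 2

n1, n2 = a + b, c + d
p1, p2 = a / n1, c / n2
pooled = (a + c) / (n1 + n2)

# Two-tailed Z-test for two proportions with a pooled estimate of p.
z = (p1 - p2) / sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
p_z = erfc(abs(z) / sqrt(2))

# Pearson chi-square for the same table (1 degree of freedom, no correction).
n = n1 + n2
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
p_chi2 = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df

print(f"z^2  = {z**2:.6f}, P = {p_z:.6f}")
print(f"chi2 = {chi2:.6f}, P = {p_chi2:.6f}")
```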
