“Adolphe Quetelet: Statistics and Social Science in the Early 19th Century”

Evan Brott

February 3, 2003



Quetelet: 1796-1874

• Today, Quetelet is nearly unknown

• But, he made major contributions to statistics

• Also one of his era’s greatest social scientists

Main Works

• 1835 – Publishes Physique Sociale: A Treatise on Man, and the Development of His Faculties, which introduces the concept of the ‘Average Man,’ a basic concept in the Social Sciences. (That’s him on the right)

• 1846 – Is the first to fit a normal curve to a distribution of human traits

Outline

1) Science in the 1830’s

2) The Early Life of Quetelet

3) The Average Man: a Study of Mortality

4) Comparisons of Average Men: a Look at European Sex Ratios

5) Statistical Morality and Early ANOVA: Crime and Punishment in 1820’s France

6) Fitting a Normal Curve: the Chest Size of a Scotsman

7) Quetelet’s Legacy

Part I: Science in the 1830’s

or: They Thought WHAT!?

State of the Arts

• Quetelet’s research was from 1820-1850.

• MANY theories we take for granted were not yet developed.

Biology

• 1859 – Darwin publishes The Origin of Species

• 1860s – Pasteur develops Germ Theory of Disease

• 1865 – Mendel discovers basics of Genetics

Quetelet’s Environment

• Spontaneous Generation not disproved

• Quetelet believes Miasmic Theory of Disease

• Many results seemed strange without understanding heredity

Social Science

• Quetelet one of the first mathematical social scientists

• 1830’s beliefs seem very strange today

• Ex: Phrenology: Personality read by the shape of the skull

Early Statistical History

• Beginnings in 17th century

• Studied Laws of Probability through Gambling

More Proto-Statistics

• 1680s: Newton and Leibniz independently develop Theory of Calculus

• 1689: Bernoulli first states the

Law of Large Numbers

Normal Distribution

• 1733: De Moivre finds Normal Distribution arises as a limit of the Binomial

• 1778-1812: Laplace develops the

Central Limit Theorem

• 1809: Gauss finds that most random errors are distributed normally

Future Statistical Knowledge

• 1890s: Pearson develops his correlation coefficient

• 1908: Gosset (a.k.a. ‘Student’) publishes the t-distribution

• 1920s: Fisher’s work starts the modern era of statistics

Part II: The Early Life of Quetelet

or: how to build an observatory without really trying

Origins

• Born on 2/22/1796 in Ghent, Belgium

• Doctorate in conic sections from University of Ghent in 1819

Astronomy

• Initial post-doctoral work in astronomy under Arago and Bouvard

• Famous story about founding Belgium’s first observatory: traveled to France at age 26, and got funding despite having NO experience at all.

Astronomical Statistics

• Galileo first showed astronomical measurement errors were:

- random

- symmetric

- more often small than large

Hypothesized Error Distributions

• Thomas Simpson (1756)

• Daniel Bernoulli (1777)

• Carl Friedrich Gauss (1809)

More Statistical Exposure

• Met the 75-year old Laplace while getting funding for his observatory

• Post-doctoral mathematical work with Fourier

The Census

• 1826: began work with the Belgian Department of the Census- was in charge by 1829.

• All censuses at that time were total population counts; Laplace thought of a simpler method

• Count the annual births; then multiply by the ratio of population to births, estimated from full counts in several sample regions
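
A minimal sketch of Laplace's ratio idea: scale a birth count by a population-per-birth ratio measured in a few fully counted regions. Every number here is hypothetical (the regional counts and the national birth total are invented for illustration), and the function names are not from the slides.

```python
# Hypothetical sample regions: (counted population, registered annual births).
sample_regions = [
    (52_000, 1_850),
    (31_500, 1_120),
    (78_200, 2_790),
]


def population_per_birth(regions):
    """Pooled ratio of population to births across the sampled regions."""
    total_population = sum(pop for pop, _ in regions)
    total_births = sum(births for _, births in regions)
    return total_population / total_births


def estimate_population(national_births, regions):
    """Ratio estimate: national births times the sampled population-per-birth."""
    return national_births * population_per_birth(regions)


ratio = population_per_birth(sample_regions)
# 140,000 national births is an assumed figure, not historical data.
estimate = estimate_population(140_000, sample_regions)
print(f"population per birth: {ratio:.2f}")
print(f"estimated population: {estimate:,.0f}")
```

The estimate is only as good as the assumption that the sampled ratio holds everywhere, which is the weakness de Keverberg's letter points at.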

Quetelet’s Plan

• Quetelet was interested in Laplace’s method

• Received a letter from Baron de Keverberg

• Letter said there were far too many variables in social science for random sampling to work

• Quetelet was convinced, and conducted a full census instead

Part III: The Average Man

Physique Sociale

• Newton’s mechanical physics was highly esteemed in Quetelet’s time

• Quetelet envisioned a similar Social Physics

• Central to this was the idea of

The Average Man – which was likened to a social ‘center of gravity’

What is the Average Man?

• It’s exactly what you think it is

• Consider human size:

Small AVERAGE Large

Influential

• Quetelet was obviously not the first to think of this sort of thing

• But he popularized it and, as we will see, carried the concept much further

• It is a VERY common concept today

Nutritional Example

“The average man needs 250g of carbohydrates each day”

Common Example

• “The Average Family has 2.4 Children”

(Here, we see the Average man doesn’t necessarily exist)

Political Example

- “The Average American will save $278 with my tax plan”

- “But 50% goes to the top 1% of Americans”

- “The bottom 20% pays no taxes”

- “The top 1% makes over $300,000 already”

- And so on . . .

Silly Example

• “The Average Man has less than 2 legs”

(Out of the world’s 6 billion people, at least 10,000 have only 1 leg . . .)

What Quetelet Thought

“If an individual at any given epoch of society possessed all the qualities of the average man, he would represent all that is great,

good, or beautiful.”

Cournot’s Critique

• “A totally average man, if forced to exist, would be an unviable monstrosity: just as the averages of several different right triangles will not be a right triangle.”
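
Cournot's triangle remark can be checked with a small numerical sketch. The two triangles below are chosen only for illustration (they are not from Cournot or Quetelet), and "average" is taken to mean the side-by-side arithmetic mean, which is one possible reading of the quote.

```python
def is_right_triangle(a, b, c):
    """True if the three sides (in any order) satisfy a^2 + b^2 = c^2."""
    x, y, z = sorted((a, b, c))
    return abs(x * x + y * y - z * z) < 1e-9


# Two right triangles picked for illustration.
triangles = [(3, 4, 5), (5, 12, 13)]

# Average the triangles side by side: shortest with shortest, and so on.
average = tuple(sum(sides) / len(triangles) for sides in zip(*triangles))

print(average)                                         # (4.0, 8.0, 9.0)
print(all(is_right_triangle(*t) for t in triangles))   # True
print(is_right_triangle(*average))                     # False: 16 + 64 = 80, not 81
```

Each input has the property, but the average does not, which is the core of the objection to treating the Average Man as a real, viable individual.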

Quetelet’s First Example

• The beginnings of Survival Analysis came from Mortality Tables

• These listed the expected times of death

• In short, the Age of the Average Man

Quetelet’s Work

• Mortality = P(dying this year) × 10,000

• Viability = 1 / P(dying this year)
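
The two measures above are simple transformations of the yearly death probability. A minimal sketch, assuming that probability is estimated as deaths divided by the number of people alive at the start of the year; the age group and its counts are hypothetical.

```python
def mortality(deaths, population):
    """Deaths per 10,000 people: P(dying this year) * 10,000."""
    return 10_000 * deaths / population


def viability(deaths, population):
    """Reciprocal of the yearly death probability: 1 / P(dying this year)."""
    return population / deaths


# Hypothetical age group: 8,000 people alive at the start of the year, 100 deaths.
print(mortality(100, 8_000))   # 125.0
print(viability(100, 8_000))   # 80.0
```

A high viability therefore marks an age at which death in the coming year is unlikely.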

Part IV: Many Average Men

or: Where Male Babies Come From

Categories

• Quetelet did not only envision the Average Man as a ‘global average’

• Rather, there was: An Average Man – and Woman – for every

“race, location, age, and epoch – and all combinations of these”

• Allowed between group comparisons

Categories

• This was also understood before his time

• The mortality tables were divided by gender, location, and occupation

• Still, Quetelet popularized and greatly refined the notion

The Sex Ratio

1.06 : 1.00

• It is a biological fact that 1.06 male babies are born for every female baby.

• Known as early as the 17th Century

• Why?

Current Thought

• Evolutionary: men are more expendable

• Sources of variation:

- Prenatal diseases disproportionately affect boys

- First birth, younger women have more boys

- Effects of family planning

• Quetelet noticed most of these!

The Mind of God

• 1710: John Arbuthnot believes probability evidences the Divine Mind:

• Sees sex ratio as evidence – more men die in war, but still enough left to evenly match with women

• One of the first applications of probability outside of pure math / gaming

Quetelet: by Country

• Shows global average; evidence of variation

Sources of Variation

• Tried to explain why different countries had different ratios

• Decided on racial differences (e.g. Russians naturally have more boys than Swedes)

• Showed many other possible causes

South Africa

• Climate, Race, Lifestyle, Small Samples

Legitimacy

• The following page shows a table of births by marital status

• Quetelet never said WHY this effect was there – surely he didn’t think church sanction ‘blessed’ the couple with more boys?

• Proxy for age? Or social status?

Legitimacy

Age

• Quetelet presented other theories, this one from Hofacker:

• Overstates effect

Other Theories

• Dismisses Bicke’s family planning theory

• Shows first marriages (not births) lead to more boys

• Town vs. Country also considered

• Decides on Race

Still Births

• Several chapters later, demonstrates that stillbirths are predominantly male

• Does not realize that differing levels of healthcare can exaggerate this effect- accounting for variation

Part V: Analysis of Crime

or: “If you must murder, try to be a well-educated woman over 30”

Victorian STAT 410

• Ordinary Least Squares had been known for only a few decades (Legendre 1805, Gauss 1809)

• ‘Regression’ would not be called such until Galton in the 1870’s

• Hypothesis Testing, ANOVA still in extremely vague state

Criminology

• Data collected from the French Courts of Assize from 1825-1830

• Avg. Probability of Conviction: 0.614

Question

• P(conviction) = 0.614 for THE average man.

• Is this probability different for different groups of people (different “ average ‘men’ ”)?

Answer: YES!

New Question: How can we Explain this Variation?

• From the table, it appears that gender, age, type of crime, appearance at trial, and educational status are important.

• How can we tell which of these are significantly different from 0.614?

• Which of these variations are more significant than the yearly variation?

• Can we make multiple comparisons?

Quetelet’s Paradigm

• 3 sources of variation

- Constant

(e.g. women always have a lower rate)

- Variable

(e.g. conviction rate decreases w/ time)

- Accidental

(e.g. a change in alcohol policy at the university causes more arrests, but not convictions, in 1828.)

Analysis of Variation

Relative Degree of Influence

• Calculated as

$$\text{RDI} = \frac{|P(\text{conviction} \mid \text{status}) - 0.614|}{0.614}$$

• For instance, for crimes against property we get |0.655-0.614|/0.614 = 0.067

• Thus, property crimes are ‘average crimes’
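
The relative degree of influence can be sketched in a few lines. The rates 0.614, 0.655 and 0.960 are the ones quoted in these slides; the function and variable names are made up for illustration.

```python
OVERALL_RATE = 0.614  # average conviction probability quoted in the slides


def relative_degree_of_influence(group_rate, overall_rate=OVERALL_RATE):
    """|P(conviction | status) - P(conviction)| / P(conviction)."""
    return abs(group_rate - overall_rate) / overall_rate


property_rdi = relative_degree_of_influence(0.655)  # crimes against property
no_show_rdi = relative_degree_of_influence(0.960)   # accused absent from trial

print(f"property crimes: {property_rdi:.3f}")
print(f"no-shows:        {no_show_rdi:.3f}")
```

A value near zero means the group behaves like the overall average; the no-show figure comes out roughly eight times larger than the property-crime figure.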

How to Assess Variability without knowledge of σ²

• Quetelet used (x_max - x_avg)/x_avg and (x_avg - x_min)/x_avg to give limits on variability.

• Hence for superior education we get (0.40-0.35)/0.40 = 0.125 below the average and (0.48-0.40)/0.40 = 0.200 above it
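
A minimal sketch of these two variability limits, using the superior-education figures quoted above (minimum 0.35, average 0.40, maximum 0.48). The function name and the (lower, upper) return order are choices made here, not Quetelet's notation.

```python
def variability_limits(x_min, x_avg, x_max):
    """Return (lower, upper) relative deviations of the extremes from the average."""
    lower = (x_avg - x_min) / x_avg
    upper = (x_max - x_avg) / x_avg
    return lower, upper


lower, upper = variability_limits(x_min=0.35, x_avg=0.40, x_max=0.48)
print(f"lower variability: {lower:.3f}")
print(f"upper variability: {upper:.3f}")
```

Because only the yearly extremes enter the calculation, these limits are a stand-in for a variance rather than an estimate of one, and a single unusual year can move them a lot.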

What does all this mean?

• Higher ‘relative degree of influence’ means the cause is more likely to be constant, i.e. P(conviction|status) ≠ P(conviction)

• If ‘variability’ is less than R.D.I., then variation by year (variable cause) is less important than the constant cause

Example: No Shows

• Average Conviction Rate = 0.960

• Relative Degree of Influence = 0.563

• Lower Variability = 0.031

• Upper Variability = 0.010

• High RDI -> significant

• Small variability -> same across years

Comparisons

• Can we compare groups’ conviction rates?

• No, not really. We have a very poor grasp on variability, and cannot conduct hypothesis testing.

• Nevertheless, Quetelet states that the best position to be in was “a well-educated female over thirty, appearing voluntarily to answer a crime against persons.”

Primitive ANOVA

• Can we decide which causes are more variable or influential?

• Well, sort of. Quetelet has the basic framework of ANOVA set up

• Lacks consistency and optimality properties; ANOVA will be refined by Fisher in the early 20th century

Multiple Comparisons

• Many data groupings highly dependent (e.g. gender and higher education in the 1820’s)

• Basic, modern ANOVA would fail in these circumstances too!

• So the ‘well-educated, voluntarily appearing woman over 30’ comment is not valid

Poisson

• Quetelet’s most famous contemporary (by today’s standards, anyway) was Poisson.

• Poisson also analyzed this same dataset

• Summary:

- Using corrected data for 1825, refutes Quetelet’s claim of decreasing rates

- Modeling jury selections as a binomial random variable, gets a rate distribution

- Comes up with pseudo-Bayesian probabilities on conviction.

Part VI: Fitting a Normal Curve

or: Statistics and the 48-inch chest

What is Normally Distributed?

• Laplace’s CLT (1778-1812) showed that the Normal is the limit of many distributions

• Gauss (1809) shows it is a very common error distribution

• Quetelet is the first to show human physiology can be normally distributed

• Thinks ALL natural variables are normal

Scottish Army Uniforms in 1819

• Data on the following page collected by Scottish army

• Needed to fit shirts to soldiers – so tried to estimate soldiers’ shirt sizes

Average Soldier?

• Can’t just clothe the ‘Average Soldier’ – gotta clothe ‘em all.

• Possibility – Average soldier of each height

1846

• Instead, decides to fit a normal curve to his data.

• Did not have a normal table – used a binomial with n=999 (1,000 outcomes)

• Created a table by realizing $y_{n+1} = y_n \cdot \frac{999-n}{n+1}$ for the binomial
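
The recurrence above can be sketched as follows. This is an illustration of the recurrence only, not a reconstruction of Quetelet's actual table: it assumes a fair-coin binomial (p = 1/2), starts from y_0 = 1, and the comparison with a normal density is added here to show why such a table can stand in for a normal one.

```python
import math

N_TRIALS = 999  # gives 1,000 possible outcomes, 0 through 999


def binomial_weights(n_trials=N_TRIALS):
    """Build y_0..y_n with y_{k+1} = y_k * (n - k) / (k + 1), starting at y_0 = 1."""
    weights = [1]
    for k in range(n_trials):
        # Integer arithmetic stays exact: the product is always divisible by k + 1.
        weights.append(weights[-1] * (n_trials - k) // (k + 1))
    return weights


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


weights = binomial_weights()
total = sum(weights)  # 2**999 when p = 1/2
probabilities = [w / total for w in weights]

mean = N_TRIALS / 2           # binomial mean n*p
sd = math.sqrt(N_TRIALS) / 2  # binomial standard deviation sqrt(n*p*(1-p))

for k in (450, 480, 500):
    print(k, probabilities[k], normal_pdf(k, mean, sd))
```

Each weight is the binomial coefficient C(999, k), so a table of 1,000 entries can be filled in by hand with one multiplication and one division per row.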

Odd fit

• 1) Split data at median

• 2) Find upper/lower cumulative frequencies

• 3) Transform to rank scale through inverse binomial

• 4) ‘Match ranks to transformed ranks through trial and error’ (???)

• 5) Transform fitted ranks through inverse normal.

Influence

• This gave Quetelet mathematical justification for the average man

• He asks: can we tell the difference between these measurements, and very inaccurate measurements on a single soldier?

• Normal can only arise through Accidental causes: All is NORMAL!

Part VII: Quetelet’s Fallout

or: The Good, the Bad, and the Statistical

Francis Galton (1822-1911)

• Primary work in the 1870s

• Discovered Genetics independently of Mendel

• Coined the phrase ‘regression to the mean’

• Developed several intelligence tests

• Mentor to Karl Pearson; Cousin to Darwin

• Found direct precursor to Pearson’s r²

• Often considered the father of social science

• Often mistakenly credited for Quetelet’s work on the Normal

Theory of Heredity

• Firmly believed that performance was based solely on genetics

• Severely discounted education/life experience

• Concerned with intelligence, strength and beauty- thought all were dependent on each other

Fallacy

• Armed with:

- his belief in heredity

- Darwin’s theory of evolution

- Quetelet’s many Average Men

• Reached startling conclusion: groups of people can be mathematically shown to be inferior to others!

Eugenics

• Therefore, we must ‘improve the human stock’

• Galton’s methods:

- encourage matings between desirable people

- forced sterilization of the truly unfit

(criminals, the insane, etc.)

• Science largely accepted in late 19th century England

• Pearson was the first Galton Professor of Eugenics at University College London!

Theory to Practice

• Most infamously adopted in Germany,

1930-1945

• Justified concept of ‘Aryan Master Race’

• Sterilization upgraded to genocide

• Obviously, today Eugenics is widely condemned

• Galton’s Eugenics merely ‘bad’, not ‘monstrous’

Quetelet’s Fault?

• Made few value judgments in comparison (e.g. only found one highly qualified mention of racial intelligence)

• Considered the Average Man to be ‘beautiful,’ not ‘mediocre’

• Advocated social reform (education, increased government spending) – not the gradual breeding out of the inferiors

• That is: NO!!!

Florence Nightingale (1820-1910)

• Studied statistics extensively under her friends Quetelet and William Farr

• Strong believer that statistics was evidence of the Divine Mind: Statistics

was her religion

• Worked extensively in wartime hospitals, saving many lives

• Used statistics to do so!

Hospital Sanitation

• Germ Theory of disease not understood

• Hospitals – especially at war – lacked even basic methods of sterilization

• Demonstrated that Dr. Lister’s antiseptic surgical implements saved many lives- using Quetelet’s statistical methods

Eulogy for Quetelet

• “Quetelet has shown us the path we must go on if we are to discover the laws of the Divine Government of the Moral World.”

• “It is not understood that human actions are – not subordinate, but – reducible to general laws . . . Of these at present, we know hardly any. Our object in life is to ascertain what they are.”

• “A fitting memorial to Quetelet would therefore be the introduction of his science in the studies of Oxford.”

Overview of Quetelet’s Statistical Contributions

• Did much to firmly establish statistics as a reputable science, and to mathematicize the Social Sciences

• The Average Man is an enduring paradigm for statistical and social reasoning

• Showed basics of data analysis, hypothesis testing, and analysis of variance

• Demonstrated that natural human traits are normally distributed

THE END