Sta 114, spring 2008
Probability and statistics
Instructor: Sayan Mukherjee
TA: Quanlin Li
STAT 113
There are three kinds of lies: lies, damned lies, and statistics.
B. Disraeli
Perspectives on stats
What is probability?
Probability is a branch of mathematics that deals with calculating the likelihood of a given event's occurrence, which is expressed as a number between 0 and 1.
What is statistics?
Statistics derives from the Latin statisticum collegium ("council of state") and the Italian statista ("statesman" or "politician").
Statistik: German term first introduced by Gottfried Achenwall (1749). It originally designated the analysis of data about the state, or the "science of state", and acquired the meaning of the collection and classification of data generally in the early 19th century.
Statistics as inverse probability -- estimating parameters from experimental data
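The "inverse" direction can be illustrated with a coin. This is a minimal sketch, not taken from the slides: the bias `true_p`, the sample size, and the helper names are hypothetical choices for illustration. Probability goes from a known parameter to data; statistics goes from the data back to an estimate of the parameter.

```python
import random


def simulate_flips(p, n, rng):
    """Forward (probability) direction: draw n coin flips with known bias p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def estimate_bias(flips):
    """Inverse (statistics) direction: estimate the bias from observed flips.

    The sample frequency is the maximum likelihood estimate of a
    Bernoulli parameter.
    """
    return sum(flips) / len(flips)


rng = random.Random(0)   # fixed seed so the run is repeatable
true_p = 0.3             # hypothetical bias, assumed for illustration only
flips = simulate_flips(true_p, 10_000, rng)
p_hat = estimate_bias(flips)
print(f"true p = {true_p}, estimated p = {p_hat:.3f}")
```

With more flips the estimate typically moves closer to the true bias, which is the behavior the later slides on large n describe.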
Well-posed problems
A problem is well-posed if its solution
• exists
• is unique
• is stable, i.e. depends continuously on the data
Inverse problems are typically ill-posed
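The stability condition can be made concrete with a small linear system. The sketch below is an illustration under assumptions, not an example from the slides: the matrix and right-hand sides are made up, and a nearly singular 2x2 system is only a finite-dimensional stand-in for instability (strictly speaking it is ill-conditioned rather than ill-posed). A change of 0.0001 in the data moves the solution from about (1, 1) to about (0, 2).

```python
def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve [[a11, a12], [a21, a22]] @ [x1, x2] = [b1, b2] by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    x1 = (b1 * a22 - a12 * b2) / det
    x2 = (a11 * b2 - b1 * a21) / det
    return x1, x2


# Hypothetical, nearly singular matrix: the two rows are almost parallel.
A = (1.0, 1.0, 1.0, 1.0001)

x = solve_2x2(*A, 2.0, 2.0001)       # solution is approximately (1, 1)
x_pert = solve_2x2(*A, 2.0, 2.0002)  # data changed by 1e-4 in one entry

print("original data :", x)
print("perturbed data:", x_pert)
```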
Class requirements and rules
Course webpage
First digits
List of world records
Count entries starting with: {1,2,3,4,5,6,7,8,9}
Count entries ending with: {1,2,3,4,5,6,7,8,9}
Accounting fraud
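The counting exercise above can be sketched in a few lines. The slide does not name it, but the first-digit pattern is usually called Benford's law, which predicts a frequency of log10(1 + 1/d) for leading digit d; that identification is an assumption here. The data below are synthetic stand-ins (integers spread evenly on a log scale), not the world-records list, and `digit_frequencies` is a hypothetical helper.

```python
import math
import random
from collections import Counter


def digit_frequencies(values, position):
    """Relative frequency of digits 1-9 in the first or last position.

    Entries whose digit in that position is 0 are left out, matching the
    slide's digit set {1,...,9}.
    """
    index = 0 if position == "first" else -1
    counts = Counter(int(str(v)[index]) for v in values)
    total = sum(counts[d] for d in range(1, 10))
    return {d: counts[d] / total for d in range(1, 10)}


rng = random.Random(0)
# Synthetic stand-in data: positive integers spanning six orders of magnitude.
values = [int(10 ** rng.uniform(2, 8)) for _ in range(20_000)]

first = digit_frequencies(values, "first")
last = digit_frequencies(values, "last")
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for d in range(1, 10):
    print(f"{d}  first={first[d]:.3f}  benford={benford[d]:.3f}  last={last[d]:.3f}")
```

On data like this the leading digits are heavily skewed toward 1 while the trailing digits are close to uniform. Figures that have been made up tend not to show that skew, which is presumably the link to accounting fraud.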
What’s wrong with the heartland?
It’s the emptiness
The geometry of randomness
Dido’s problem (isoperimetry): Among all closed curves in the plane of fixed length, find the one that encloses the largest area.
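A quick numerical comparison shows what the problem is asking. This is a sketch under assumptions: the three shapes and the unit length are illustrative choices, not from the slides. It relies on the classical isoperimetric inequality 4πA ≤ L², with equality for the circle, so the ratio 4πA/L² measures how close a shape comes to the optimum.

```python
import math


def area_circle(length):
    """Area of a circle whose circumference is `length`."""
    radius = length / (2 * math.pi)
    return math.pi * radius ** 2


def area_square(length):
    """Area of a square whose perimeter is `length`."""
    return (length / 4) ** 2


def area_equilateral_triangle(length):
    """Area of an equilateral triangle whose perimeter is `length`."""
    side = length / 3
    return math.sqrt(3) / 4 * side ** 2


def isoperimetric_ratio(area, length):
    """4*pi*A / L^2: equals 1 for a circle and is smaller for other shapes."""
    return 4 * math.pi * area / length ** 2


length = 1.0  # the same fixed length for every curve
areas = {
    "circle": area_circle(length),
    "square": area_square(length),
    "triangle": area_equilateral_triangle(length),
}
for name, area in areas.items():
    print(f"{name:9s} area={area:.4f}  ratio={isoperimetric_ratio(area, length):.4f}")
```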
The geometry of Gaussian random variables
A Gaussian distribution:
A draw of n Gaussian random variables is a point in an n-dimensional space. How far from the origin is this point?
$$\|x\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$$
For n large the answer is that with very high probability
$$1 - \frac{c}{\sqrt{n}} \le \frac{\|x\|}{\sqrt{n}} \le 1 + \frac{c}{\sqrt{n}}.$$
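The claim that the distance from the origin is close to the square root of n can be checked by simulation. This is a minimal sketch that assumes independent standard Gaussian coordinates (mean 0, variance 1); the dimensions, the number of draws, and the helper name are illustrative choices. It reports the smallest and largest value of ‖x‖/√n over repeated draws.

```python
import math
import random


def gaussian_norm(n, rng):
    """Euclidean norm of one draw of n independent standard Gaussians."""
    return math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)))


rng = random.Random(0)
spread = {}
for n in (10, 100, 1000, 10_000):
    ratios = [gaussian_norm(n, rng) / math.sqrt(n) for _ in range(100)]
    spread[n] = (min(ratios), max(ratios))
    print(f"n={n:6d}  min ratio={spread[n][0]:.3f}  max ratio={spread[n][1]:.3f}")
```

As n grows the interval of observed ratios should tighten around 1, so in high dimensions the draws sit in a thin shell rather than near the origin.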
Law of large numbers or central limit theorem
The previous observation is a special case of the following phenomenon:
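Given a smooth function $f$ of $n$ variables $x = (x_1, \ldots, x_n)$, the following is true:
$$\Pr\left( \left| f(x) - \mathbb{E}_x f(x) \right| \ge h \right) \le C_1 \exp\left(-C_2 h^2 n\right).$$
A classic example: $f(x) = \frac{x_1 + x_2 + \cdots + x_n}{n}$.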
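For the sample mean, one well-known bound of this exponential form is Hoeffding's inequality. The sketch below swaps in that specific case as an assumption; the slides do not name it. For independent variables in [0, 1], Hoeffding gives a tail probability of at most 2 exp(-2 h² n), i.e. C1 = 2 and C2 = 2. The Uniform(0, 1) variables, the value of h, and the trial count are illustrative choices.

```python
import math
import random


def tail_frequency(n, h, trials, rng):
    """Fraction of trials in which the mean of n Uniform(0,1) draws
    deviates from its expectation 0.5 by at least h."""
    hits = 0
    for _ in range(trials):
        mean = sum(rng.random() for _ in range(n)) / n
        if abs(mean - 0.5) >= h:
            hits += 1
    return hits / trials


def hoeffding_bound(n, h):
    """Hoeffding's bound 2*exp(-2*h^2*n) for variables taking values in [0, 1]."""
    return 2.0 * math.exp(-2.0 * h * h * n)


rng = random.Random(0)
h = 0.1
results = {}
for n in (10, 50, 200):
    results[n] = (tail_frequency(n, h, 2000, rng), hoeffding_bound(n, h))
    print(f"n={n:4d}  observed={results[n][0]:.4f}  bound={results[n][1]:.4f}")
```

The bound is loose for small n (it can exceed 1), but both the observed frequency and the bound fall quickly as n grows.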
Geometry of real data
Digits in space
Mandarin tones
Regression -- pedestrian detection
Papageorgiou and Poggio, 1998
Daimler Chrysler
Experimental Mercedes
A fast version, integrated with a real-time obstacle detection system
Constantine Papageorgiou
People classification/detection
Stuttgart
STA 293 03, fall 2005
More regression: talking faces
Text-to-visual-speech (TTVS) systems:
• Hunter
• It's automatic
• Today show
Conclusion
Statistics is about predictive modeling that quantifies uncertainty
There are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know.
---- Donald Rumsfeld