estimation and uncertainty 12-706/73-359 original lecture by h. scott matthews, cmu sept 24, 2003
Estimation and Uncertainty
12-706/73-359
Original lecture by H. Scott Matthews, CMU
Sept 24, 2003
Fermi Problems
Estimating an unknown quantity is sometimes called a “Fermi problem,” after physicist Enrico Fermi.
He wanted to show students that they had the power to do estimation.
His first problem: “How many piano tuners are there in Chicago?”
Sample Fermi Problems
How much tea is there in China?
How many pounds of human hair are cut every day?
How many leaves are there on all the trees in the world?
If you got a penny for each time someone said “Damn!” in the United States, how long would it take you to become a billionaire?
What area of the Earth would it take to supply the U.S. with all its energy needs if solar energy could be converted with 1% efficiency? Solar energy at Earth is about 1 kW/m^2.
Cobblers in the US – Method 1
Cobblers repair shoes.
On average, assume 20 min/task.
Thus about 20 jobs/day, or ~5000 jobs/yr per cobbler.
How many jobs are needed overall for the US?
  I get shoes fixed once every 5 years.
  About 280M people in US.
  Thus 280M/5 = 56M shoes fixed/year.
56M/5000 ~ 11,000 => 10^4 cobblers in US.
Sensitivity:
  Am I representative?
  Are all shoe repairs done by cobblers?
  Do cobblers work 8 hours per day?
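The Method 1 arithmetic, with one of the sensitivity questions, can be sketched in a few lines of Python. The 56M repairs/year and 20 jobs/day are the slide's figures; the 250 working days per year is an assumption implied by "20 jobs/day ~ 5000/yr", and the alternative workloads of 10 and 24 jobs/day are purely illustrative.

```python
REPAIRS_PER_YEAR = 56e6    # slide's figure: shoes fixed per year in the US
WORK_DAYS_PER_YEAR = 250   # assumption: implied by 20 jobs/day ~ 5000 jobs/yr


def cobblers_needed(repairs_per_year, jobs_per_day, work_days=WORK_DAYS_PER_YEAR):
    """Cobblers required to handle the yearly repair demand."""
    jobs_per_cobbler_per_year = jobs_per_day * work_days
    return repairs_per_year / jobs_per_cobbler_per_year


baseline = cobblers_needed(REPAIRS_PER_YEAR, jobs_per_day=20)

# Sensitivity to the workload assumption (illustrative alternatives).
for jobs_per_day in (10, 20, 24):
    count = cobblers_needed(REPAIRS_PER_YEAR, jobs_per_day)
    print(f"{jobs_per_day} jobs/day -> about {round(count):,} cobblers")
```

Halving the daily workload doubles the answer, so the estimate stays near 10^4 only if cobblers really are busy most of the day.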
Cobblers in the US – Method 2
Greater Pittsburgh Yellow Pages has 36 entries under “Shoe Repairing”.
Assume each repair shop has two employees: 72 cobblers in greater Pittsburgh.
Population of greater Pittsburgh = 2.3 million (2000 Census) = 0.82% of U.S.
Number of cobblers in U.S. = 72/0.0082 = 8780.
Sensitivity:
  Is Pittsburgh representative?
  Is “greater Pittsburgh” the right area for the Yellow Pages?
  Average number of employees of a shoe repair shop?
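Method 2 is a proportional scale-up from a local count. A minimal sketch follows, using the slide's inputs plus the 280M US population from Method 1; the function name `scale_up` is hypothetical. Because the population share is not rounded to 0.82% here, the result comes out slightly below the slide's 8780.

```python
def scale_up(local_count, local_population, national_population):
    """Scale a local count to the nation, assuming the locality is representative."""
    return local_count * national_population / local_population


SHOPS = 36                    # Yellow Pages entries under "Shoe Repairing"
PITTSBURGH_POPULATION = 2.3e6
US_POPULATION = 280e6         # figure used in Method 1

us_estimate = scale_up(SHOPS * 2, PITTSBURGH_POPULATION, US_POPULATION)
print(f"Two employees per shop: about {round(us_estimate):,} cobblers")

# Sensitivity to the employees-per-shop assumption (1 and 3 are illustrative).
for employees_per_shop in (1, 2, 3):
    estimate = scale_up(SHOPS * employees_per_shop, PITTSBURGH_POPULATION, US_POPULATION)
    print(f"{employees_per_shop} per shop -> about {round(estimate):,}")
```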
Cobblers in the US
Methods 1 and 2 give “close” answers: 11,000 v. 8780.
Actual: Census Dept says 5,120 in US.
This depends on the accuracy of job counting in the Census:
  Listing of occupations
  Full-time vs. part-time
  Number of responses received
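One way to judge how "close" the estimates are is to express each as a multiple of the Census count. This is a sketch using only the three numbers above; the "factor off" measure is one reasonable choice, not the lecture's stated metric.

```python
estimates = {"Method 1": 11_000, "Method 2": 8_780}
census_count = 5_120

# Ratio of each estimate to the published count.
factors = {name: value / census_count for name, value in estimates.items()}

for name, factor in factors.items():
    print(f"{name}: {factor:.1f}x the Census count")
```

Both estimates land within about a factor of two of the published figure, which is typical of what a back-of-the-envelope estimate can deliver.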
Problem of Unknown Numbers
If we need a piece of data, we can:
  Look it up in a reference source
  Collect the number through survey/investigation
  Guess it ourselves
  Get experts to help guess it
Often only a ‘ballpark’, ‘back of the envelope’ or ‘order of magnitude’ estimate is needed:
  Situations when the actual number is unavailable or where rough estimates are good enough
  E.g. 100s, 1000s, … (10^2, 10^3, etc.)
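An order-of-magnitude comparison can be made explicit by rounding on a log scale. The sketch below assumes the common convention of rounding log10 to the nearest integer, so the cut-over between 10^n and 10^(n+1) falls near 3.16 x 10^n; the function name is hypothetical.

```python
import math


def order_of_magnitude(value):
    """Nearest power-of-ten exponent for a positive number."""
    return round(math.log10(value))


# The two cobbler estimates and the Census count from the earlier slides.
for value in (11_000, 8_780, 5_120):
    print(f"{value:,} ~ 10^{order_of_magnitude(value)}")
```

Under this convention all three cobbler figures share the same order of magnitude, 10^4.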
Methodology
First develop an upper bound and a lower bound. This allows a “sanity check” on the answer.
Use at least two independent methods of estimation and compare the answers.
Identify sensitivity to errors in the data. For sensitive data, be sure you have good values.
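The bounding step might look like the following sketch, applied to the cobbler problem. The (low, high) ranges are illustrative assumptions chosen for this example, not values from the lecture, and the 250 working days per year is likewise assumed.

```python
POPULATION = 280e6
WORK_DAYS_PER_YEAR = 250          # assumption
YEARS_BETWEEN_REPAIRS = (2, 10)   # illustrative (low, high) range
JOBS_PER_DAY = (10, 24)           # illustrative (low, high) range


def cobblers(years_between_repairs, jobs_per_day):
    """Cobblers needed given repair frequency and per-cobbler workload."""
    repairs_per_year = POPULATION / years_between_repairs
    return repairs_per_year / (jobs_per_day * WORK_DAYS_PER_YEAR)


# Fewest cobblers: rare repairs and busy cobblers. Most: the opposite.
lower_bound = cobblers(max(YEARS_BETWEEN_REPAIRS), max(JOBS_PER_DAY))
upper_bound = cobblers(min(YEARS_BETWEEN_REPAIRS), min(JOBS_PER_DAY))


def passes_sanity_check(estimate):
    """True if the estimate lies between the lower and upper bounds."""
    return lower_bound <= estimate <= upper_bound


print(f"bounds: {lower_bound:,.0f} to {upper_bound:,.0f}")
print(passes_sanity_check(11_000), passes_sanity_check(8_780))
```

An estimate outside the bounds signals either an arithmetic slip or an assumption that needs revisiting.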
In the absence of “Real Data”
Are there similar or related values that we know or can guess? (proxies)
  Example: registered voters v. population
Are there ‘rules of thumb’ in the area?
  E.g. ‘Rule of 72’ for compound interest
  r*t = 72, with r in percent per year and t in years: an investment at 6% doubles in 12 yrs
Set up a ‘model’ to estimate the unknown
  Linear, product, etc. functional forms
  Divide and conquer
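A rule of thumb is worth checking against the exact relationship at least once. The sketch below compares the Rule of 72 with the exact doubling time under annual compounding, t = ln 2 / ln(1 + r); the function names are hypothetical.

```python
import math


def doubling_time_rule_of_72(rate_percent):
    """Approximate years to double at the given annual rate (in percent)."""
    return 72 / rate_percent


def doubling_time_exact(rate_percent):
    """Exact years to double with annual compounding."""
    return math.log(2) / math.log(1 + rate_percent / 100)


for rate in (2, 6, 12):
    approx = doubling_time_rule_of_72(rate)
    exact = doubling_time_exact(rate)
    print(f"{rate}%: rule of 72 = {approx:.1f} yr, exact = {exact:.1f} yr")
```

At 6% the rule gives 12 years against an exact value just under 12, which is more than accurate enough for estimation purposes.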
Methods
Similarity – do we have data that might apply to our problem?
Stratification – segment the population into subgroups, estimate each group
Triangulation – create models with different approaches and compare results
‘How much disk space to store every word you hear in a lifetime?’
How many words per day can you hear?
  12 hours per day at 120 words per minute = 86,400 words/day, or about 31.5 million per year
How much disk space to store them?
  Average word < 10 characters, so < 315 MB/year
Average lifetime? 75 years?
Answer: < 25 GB, less than the capacity of a laptop disk
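The chain of multiplications is easy to check in code. This sketch uses the slide's inputs and assumes one byte per character and a 365-day year; computed without intermediate rounding, the inputs give about 31.5 million words per year and just under 24 GB over a lifetime.

```python
HOURS_PER_DAY = 12
WORDS_PER_MINUTE = 120
BYTES_PER_WORD = 10      # upper bound: < 10 characters, assuming 1 byte each
LIFETIME_YEARS = 75

words_per_day = HOURS_PER_DAY * 60 * WORDS_PER_MINUTE
words_per_year = words_per_day * 365
bytes_per_year = words_per_year * BYTES_PER_WORD
lifetime_gb = bytes_per_year * LIFETIME_YEARS / 1e9   # decimal gigabytes

print(f"{words_per_day:,} words/day, {words_per_year / 1e6:.1f} million words/yr")
print(f"{bytes_per_year / 1e6:.0f} MB/yr, {lifetime_gb:.1f} GB per lifetime")
```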
‘How much energy used by lighting in US residences?’
Assume 25 light fixtures per house.
Assume each is in use an average of 2 hours per day.
Assume the average fixture is 50W.
Thus each fixture uses 100Wh/day.
Each house uses 2500Wh/day.
100 million households would use 250 million kWh/day, or about 91,300 million kWh/yr.
‘How much energy used by lighting in US residences?’
Our guess: 91,300 million kWh/yr
DOE: “lighting is 5-10% of household elec” (http://www.eren.doe.gov/erec/factsheets/eelight.html)
2000 US residential demand ~ 1.2 million million kWh (source below)
  10% is 120,000 million kWh
  5% is 60,000 million kWh
2000 demand source: http://www.eia.doe.gov/cneaf/electricity/epm/epmt44p1.html
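The lighting estimate and its check against the DOE range can be sketched as follows. All inputs are the slide's; a 365-day year is assumed, which gives 91,250 million kWh/yr, matching the slide's figure to within rounding.

```python
FIXTURES_PER_HOUSE = 25
HOURS_PER_DAY = 2
WATTS_PER_FIXTURE = 50
HOUSEHOLDS = 100e6
RESIDENTIAL_DEMAND_KWH = 1.2e12   # 2000 US residential demand from the slide

kwh_per_house_per_day = FIXTURES_PER_HOUSE * HOURS_PER_DAY * WATTS_PER_FIXTURE / 1000
lighting_kwh_per_year = kwh_per_house_per_day * HOUSEHOLDS * 365

# Compare with the DOE statement that lighting is 5-10% of household electricity.
lighting_share = lighting_kwh_per_year / RESIDENTIAL_DEMAND_KWH
within_doe_range = 0.05 <= lighting_share <= 0.10

print(f"{lighting_kwh_per_year / 1e6:,.0f} million kWh/yr")
print(f"share of residential demand: {lighting_share:.1%}, in range: {within_doe_range}")
```

The estimate falls at roughly 7.6% of residential demand, inside the 5-10% band.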
How many TV sets in the US?
Can this be calculated?
Estimation approach #1: Survey/similarity
  How many TV sets are owned by the class?
  Scale up by the number of people in the US
  Should we consider the class a representative sample? Why not?
TV Sets in US – Method 2
Segmenting: work from # households and # TVs per household (may survey for one input)
Assume x households in US
Assume segments of ownership z_1, ..., z_5 (i.e. what % owns 0, owns 1, etc.), where z_k is the fraction of households owning k-1 sets
Then estimated number of television sets in US = x*(4*z_5 + 3*z_4 + 2*z_3 + 1*z_2 + 0*z_1)
TV Sets in US – By Segmentation
Assume 50 million households in US
Assume 19% have 4, 30% 3, 35% 2, 15% 1, 1% 0 television sets
Then 50,000,000*(4*.19+3*.3+2*.35+.15) = 125.5 M television sets
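The segmentation (stratification) estimate is a weighted sum over ownership groups. A minimal sketch with the slide's numbers follows; it also checks that the segment shares add up to 100%, which is an easy thing to get wrong when guessing them.

```python
import math

HOUSEHOLDS = 50e6

# Fraction of households owning each number of TV sets (from the slide).
ownership_shares = {4: 0.19, 3: 0.30, 2: 0.35, 1: 0.15, 0: 0.01}

# The segments should cover every household exactly once.
shares_sum_to_one = math.isclose(sum(ownership_shares.values()), 1.0)

tvs_per_household = sum(sets * share for sets, share in ownership_shares.items())
total_tvs = HOUSEHOLDS * tvs_per_household

print(f"shares sum to 1: {shares_sum_to_one}")
print(f"{tvs_per_household:.2f} TVs/household, {total_tvs / 1e6:.1f} million TVs")
```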
TV Sets in US – Method 3
Estimation approach #3 – published data
Source: Statistical Abstract of US
  Gives many basic statistics such as population, areas, etc.
How well did we do?
Most recent data = 1997
  But ‘recently’ increasing < 3% per year
Our TV/household model: 125.5M TVs; Statistical Abstract: 229M TVs
  % error relative to our estimate: (229M - 125.5M)/125.5M ~ 82%
What assumptions are crucial in determining our answer? Were we right?
What other data on this table validate our models?
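The error calculation, and a quick test of which assumption matters most, can be sketched as below. The 2.51 TVs per household comes from the segmentation slide; the alternative of 100 million households is the lecture's own "handy data" figure. The helper name `relative_error` is hypothetical.

```python
PUBLISHED_TVS = 229e6        # Statistical Abstract figure
TVS_PER_HOUSEHOLD = 2.51     # from the segmentation model


def relative_error(reference, other):
    """Absolute difference as a fraction of the reference value."""
    return abs(other - reference) / reference


our_estimate = 50e6 * TVS_PER_HOUSEHOLD         # model with 50M households
revised_estimate = 100e6 * TVS_PER_HOUSEHOLD    # model with ~100M households

print(f"vs our estimate:    {relative_error(our_estimate, PUBLISHED_TVS):.0%}")
print(f"vs published value: {relative_error(PUBLISHED_TVS, our_estimate):.0%}")
print(f"with 100M households: {revised_estimate / 1e6:.0f}M TVs, "
      f"{relative_error(PUBLISHED_TVS, revised_estimate):.0%} off")
```

Most of the gap traces to the household count: doubling it to about 100 million brings the model within roughly 10% of the published number.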
Some handy/often used data
Population of US: 275-300 million
Number of households: ~100 million
Average personal income: ~$30,000
Good Assumptions
Justify and document your assumptions
Have some basis in known facts or experience
Do not allow bias toward the answer to affect your assumptions
Example: what will the inflation rate be next year?
  Is past inflation a good predictor?
  Can I find current inflation?
  Should I assume change from current conditions?
  We typically use history to guide us
Notes on Estimation
Move from abstract to concrete, identifying assumptions
Draw from experience and basic data sources
Use statistical techniques/surveys if needed
Be creative, BUT be logical and able to justify
Find answer, then learn from it
Apply a reasonableness test