Univariate Random Variables Geog 210C Introduction to Spatial Data Analysis Chris Funk Lecture 3

Page 1: Univariate Random Variables Geog 210Cchris/Lecture3_210C_Spring2011... · Chris Funk Lecture 3. Monte Carlo Simulation -I General approach based on repeated random sampling of a distribution

Univariate Random Variables

Geog 210C

Introduction to Spatial Data Analysis

Chris Funk

Lecture 3

Bootstrapping

Term inspired by Rudolf Erich Raspe’s The Surprising Adventures of Baron Munchausen

The Baron, trapped at the bottom of the ocean, pulls himself up by his bootstraps

Used by Bradley Efron in 1979 to refer to a technique for estimating the uncertainty of statistical parameters

Based on resampling (with replacement) from the observed set of data (as opposed to a theoretical distribution)

C. Funk Geog 210C Spring 2011

Baron von Munchausen

Bootstrapping versus Monte Carlo Simulation

Bootstrapping draws from the observational data set

Monte Carlo Simulation draws from the cumulative distribution function (CDF):

Empirical CDF, or

Theoretical CDF
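
The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rainfall values are hypothetical, the theoretical CDF is assumed to be a normal distribution fitted to them, and the helper names `bootstrap_sample` and `monte_carlo_sample` are invented for this example.

```python
import random
from statistics import NormalDist

def bootstrap_sample(data, rng):
    """Bootstrap: resample the observations themselves, with replacement."""
    return [rng.choice(data) for _ in data]

def monte_carlo_sample(dist, size, rng):
    """Monte Carlo: invert uniform random numbers through a (theoretical) CDF."""
    draws = []
    while len(draws) < size:
        p = rng.random()              # uniform in [0, 1)
        if p > 0.0:                   # inv_cdf needs 0 < p < 1
            draws.append(dist.inv_cdf(p))
    return draws

rng = random.Random(0)
observed = [612.0, 540.0, 701.0, 655.0, 580.0, 630.0]  # hypothetical rainfall totals, mm
fitted = NormalDist.from_samples(observed)             # assumed theoretical CDF

boot = bootstrap_sample(observed, rng)   # only values already seen in the data
mc = monte_carlo_sample(fitted, len(observed), rng)  # can produce unseen values
```

A bootstrap sample can only repeat observed values, while the Monte Carlo draws can fall anywhere the fitted distribution has probability.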

Random Variables: Some Definitions

Random variable (RV): variable, say X, with series of possible outcomes (realizations), i.e., x-values
e.g., total number of members in the households of a city

Probability distribution: a table, graph, or mathematical function that links potential outcomes of a RV with probabilities of their occurrence
e.g., probability of a household selected at random to have x members

Discrete and continuous RVs:
discrete: RVs taking particular values (finite or countably infinite), e.g., counting variables such as population, number of accidents on a road, number of floods or earthquakes in a region
continuous: RVs taking any value within an interval (uncountably many values), e.g., height, temperature, speed, distance

Bootstrapping Example-WRSI in Southern Africa

Context: in October of 2002, El Nino conditions quickly evolved

Statistical Question: What was the likely impact of El Nino on Southern African crop production?

Science Question: How do El Nino teleconnections interact with rainfall and crop water requirements in Southern Africa?

Method:
Use a long time-series of rainfall to drive a gridded water requirement satisfaction index (WRSI) model
Use bootstrapping to assess which areas had below normal WRSI values during El Nino events

Crop Phenology

Crop Water Requirements

Technique (run at each pixel)

Long term average WRSI

1. Translate data into anomalies: WRSI’ = WRSI - WRSIavg

2. Assume ‘n’ El Nino years

3. Calculate 1,000 samples of ‘n’-year average WRSI anomalies

4. Sort all 1,000 average anomalies, identify the 5th percentile value

5. If the average El Nino anomaly is less than this value, it is significant at the 95% level
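
The five steps above can be sketched for a single pixel as follows. This is a minimal sketch, not the lecture's implementation: the WRSI series and the El Nino year indices are hypothetical, the function name `bootstrap_percentile` is invented here, and taking element `int(0.05 * 1000)` of the sorted means is only one of several reasonable ways to approximate the 5th percentile.

```python
import random
from statistics import fmean

def bootstrap_percentile(anomalies, n, n_samples=1000, percentile=5, seed=0):
    """Steps 3-4: bootstrap n-year mean anomalies, return an approximate percentile."""
    rng = random.Random(seed)
    # rng.choices samples WITH replacement from the observed anomalies.
    means = sorted(fmean(rng.choices(anomalies, k=n)) for _ in range(n_samples))
    return means[int(n_samples * percentile / 100)]

# Hypothetical 20-year WRSI series at one pixel; indices 3, 7, 13 are El Nino years.
wrsi = [95, 88, 97, 60, 92, 99, 85, 58, 96, 90,
        94, 87, 98, 62, 91, 93, 89, 96, 86, 94]
el_nino_years = [3, 7, 13]

wrsi_avg = fmean(wrsi)                                   # long term average
anomalies = [w - wrsi_avg for w in wrsi]                 # step 1
el_nino_mean = fmean([anomalies[i] for i in el_nino_years])  # step 2
threshold = bootstrap_percentile(anomalies, n=len(el_nino_years))
significant = el_nino_mean < threshold                   # step 5
```

In a gridded application the same function would simply be called once per pixel with that pixel's anomaly series.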

Weak and moderate-to-strong El Nino WRSI anomalies

[Figure panels: Weak El Nino anomalies; Weak-to-Strong Anomalies]

Bootstrapping used to assess significance at the 95% level

El Nino Rainfall Anomalies

The inter-ocular acuity test?

Forecast Interpretation using Monte Carlo Simulation

1. Fit distribution

2. Draw sample from distribution contingent on probabilistic forecast

3. Refit distribution to resampled data

4. Estimate values of interest:
Change in mean
Probability of exceedance
10th and 90th percentile rainfall

FIT Process

[Figure: historical data ("Datos históricos") with fitted distribution; x-axis: quantile, 0 to 2000; y-axis: 0.0000 to 0.0015. Panels: Probability Map; Map of Rainfall]

Fit Process

[Figure: climatological distribution divided into terciles; x-axis: quantile, 0 to 2000; y-axis: 0.0000 to 0.0015]

Guatemala Climatology: Mean 955 mm, Std. Dev. 257 mm

Climatology: 33% 34% 33%
Forecast: 25% 35% 40%
60 Samples: 15 21 24

[Figure: FIT, the Interpreted Forecast Distribution; same axes]
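
One way the fit, resample, refit sequence might look in code is sketched below, using the Guatemala numbers. Several things here are assumptions rather than facts from the slides: the climatology is taken to be normal, the three forecast categories are taken to be ordered lower/middle/upper tercile, and the names `draw_in_tercile` and `fit_forecast` are invented for this example.

```python
import random
from statistics import NormalDist, fmean, stdev

def draw_in_tercile(dist, tercile, rng):
    """Draw one value from tercile 0, 1 or 2 of dist by inverse-CDF sampling."""
    p = tercile / 3.0 + rng.random() / 3.0   # uniform within that tercile's CDF range
    p = min(max(p, 1e-12), 1.0 - 1e-12)      # inv_cdf needs 0 < p < 1
    return dist.inv_cdf(p)

def fit_forecast(dist, counts, seed=0):
    """Resample dist with counts[t] draws per tercile, then refit a normal."""
    rng = random.Random(seed)
    sample = [draw_in_tercile(dist, t, rng)
              for t, n in enumerate(counts) for _ in range(n)]
    return NormalDist(fmean(sample), stdev(sample)), sample

# Assumed normal climatology with the Guatemala mean and standard deviation (mm).
climatology = NormalDist(mu=955.0, sigma=257.0)

# 60 samples split 25% / 35% / 40% across the (assumed) lower/middle/upper terciles.
forecast_dist, sample = fit_forecast(climatology, counts=(15, 21, 24))

# Values of interest from the refitted distribution.
change_in_mean = forecast_dist.mean - climatology.mean
p10, p90 = forecast_dist.inv_cdf(0.10), forecast_dist.inv_cdf(0.90)
prob_exceed_clim_mean = 1.0 - forecast_dist.cdf(climatology.mean)
```

With only 60 draws the refitted mean is noisy; larger counts in the same proportions give a more stable estimate of the shift.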

Original and Forecasted Distributions for Catacamas

Climatic Distribution: Mean of 621, Std. Dev of 118

Forecast of 20/35/45 for lo/mid/hi

FIT distribution: Mean of 650, Std. Dev of 105

[Figure: Effects of Climate Forecast on Rainfall Distribution; x-axis: Rainfall, mm (200 to 1000); y-axis: Probability (0 to 4 x 10^-3); marked lines: Original Mean, Forecasted mean]

Fit Example (March 2011)

March-May 2011 Seasonal Performance Forecasts

Greater Horn of Africa Consensus Climate Outlook for March to May 2011 (Source: ICPAC)

Rainfall outlooks for March-May 2011 (Source: KMA)

MAM 2011 Seasonal forecast: Probability of ML category of precipitation (Source: ECMWF)

Source: Ethiopian NMA

Probability and Cumulative Mass Functions

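Probability distribution function (PDF): tabulation of occurrence probabilities for outcomes of a discrete RV:

$p_X(x) = \text{Prob}\{X = x\}$, with $0 \le p_X(x) \le 1$ and $\sum_x p_X(x) = 1$

e.g., PDF of household members

Cumulative distribution function (CDF): cumulative form of PDF:

$F_X(x) = \text{Prob}\{X \le x\} = \sum_{x_i \le x} \text{Prob}\{X = x_i\}$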

Expected Value of a Discrete RV

Expected value: mean E{X} of RV X using N outcomes {x1, . . . , xN}:

$E\{X\} = \mu = \sum_{i=1}^{N} x_i \, \text{Prob}\{X = x_i\}$

probability weighted sum of N outcomes

NOTE: the expected value could be an impossible outcome, e.g., the mean of an integer-valued discrete RV need not be an integer

Example: household data:

expected value (mean):

note that the mean value 3.75 is not an integer

Expectation of a Linear Combination of a RV

expectation E{Y} of a RV Y defined as function y = h(x) of RV X:

$E\{Y\} = E\{h(X)\} = \sum_{i=1}^{N} h(x_i) \, \text{Prob}\{X = x_i\}$

Special cases:

expectation of a constant RV X = c: $E\{c\} = c$

expectation of a RV Y = cX, i.e., product of a RV X with a constant: $E\{cX\} = c \, E\{X\}$

expectation of a RV Y = a + bX, i.e., a linear combination of a RV X: $E\{a + bX\} = a + b \, E\{X\}$

expectation of a linear combination of a RV X = linear combination of its expectation E{X}
=> expectation is a linear operation (same as summation)

Variance of a Discrete RV

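expected value of squared deviations from mean μ = E{X}, or, expectation of a function h(x) = (x − μ)^2 of a RV X:

$V\{X\} = \sigma^2 = E\{(X - \mu)^2\} = \sum_{i=1}^{N} (x_i - \mu)^2 \, \text{Prob}\{X = x_i\}$

computational formula: $V\{X\} = E\{(X - \mu)^2\} = E\{X^2\} - \mu^2$

Household data example: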
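
The household table itself is not reproduced in this transcript, so the sketch below uses a hypothetical household-size distribution (the probabilities are invented for illustration and do not give the 3.75 mean quoted earlier). It checks numerically that the definition and the computational formula for the variance agree, and that expectation is linear.

```python
# Hypothetical PDF of household size: {number of members: probability}.
pmf = {1: 0.10, 2: 0.20, 3: 0.25, 4: 0.20, 5: 0.15, 6: 0.10}

def expectation(pmf, h=lambda x: x):
    """E{h(X)}: probability weighted sum of h over all outcomes."""
    return sum(h(x) * p for x, p in pmf.items())

mean = expectation(pmf)                                    # E{X}
variance = expectation(pmf, lambda x: (x - mean) ** 2)     # E{(X - mu)^2}
variance_shortcut = expectation(pmf, lambda x: x ** 2) - mean ** 2  # E{X^2} - mu^2

# Linearity: E{a + bX} should equal a + b E{X}.
a, b = 2.0, 3.0
lhs = expectation(pmf, lambda x: a + b * x)
rhs = a + b * mean
```

As in the lecture's example, the mean of this integer-valued RV is not itself an integer.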

Binomial Distribution (1)

General remarks:
situations where, on a number of N > 1 experiments (trials), one or the other of two MECE (mutually exclusive, collectively exhaustive) events will occur: (i) success (coded as 1), or (ii) failure (coded as 0)
RV of interest X = number of event occurrences (sum of zeros and ones) over N trials; X can take N + 1 integer outcomes: from 0 (event absent in all trials) to N (event present in all trials)
binomial distribution used to calculate the probabilities that X attains any of the N + 1 possible outcomes

Conditions of applicability:
1. probability of event occurrence does not change from trial to trial, e.g., in a coin tossing experiment, if the coin is “fair”, the probability of either heads or tails is 0.5, and does not change from one coin flip to another (or from one coin to another)
2. outcomes of each of N trials are mutually independent, e.g., if N > 1 coins are flipped simultaneously, outcomes of one coin do not affect outcomes of other coins

Binomial Distribution (2)

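Binomial PDF:

$\text{Prob}\{X = x\} = \binom{N}{x} \pi^x (1 - \pi)^{N - x}, \quad x = 0, 1, \ldots, N$

(1) combinatorial part: $\binom{N}{x} = \frac{N!}{x! \, (N - x)!}$ (N choose x) = number of distinct ways of getting x success outcomes (x ones) from a collection of N trials

(2) probability part: $\pi^x (1 - \pi)^{N - x}$ = probability of any one particular sequence of x successes and N − x failures

Fitting a Binomial PDF to data:
estimate parameters N and π from sample data
find probability for any particular number of successes x, for given π and for required N
special case: N = 1 corresponds to the Bernoulli distribution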

Binomial Distribution (3)

Binomial PDF:

$\text{Prob}\{X = x\} = \binom{N}{x} \pi^x (1 - \pi)^{N - x}$

Example: years in which a lake has frozen in a 200 year record:

whether the lake freezes in a particular year (success) depends only on the conditions of that year, not on those of previous years
under the assumption of no climate change, the probability π that the lake will freeze in any year is constant over the record: π = 10/200 = 0.05

Requisites:

probability that lake freezes exactly once (x = 1) in N = 10 years:

$\text{Prob}\{X = 1\} = \binom{10}{1} (0.05)^1 (0.95)^9 \approx 0.315$

probability that lake freezes at least once (x ≥ 1) in N = 10 years:

$\text{Prob}\{X \ge 1\} = \sum_{x=1}^{10} \text{Prob}\{X = x\} = 1 - \text{Prob}\{X = 0\} = 1 - (0.95)^{10} \approx 0.401$

the lake cannot freeze exactly once and exactly twice in the same 10-year period: the events are mutually exclusive, so their probabilities add

for a Binomial RV X: E{X} = Nπ, and V{X} = Nπ(1 − π)
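
The two lake probabilities can be checked with a short script. This is a small sketch using only the standard library; the function name `binomial_pmf` is chosen for this example.

```python
from math import comb

def binomial_pmf(x, n, pi):
    """Prob{X = x} for a Binomial RV with n trials and success probability pi."""
    return comb(n, x) * pi ** x * (1 - pi) ** (n - x)

pi = 10 / 200          # lake froze in 10 of 200 years
n = 10                 # number of years considered

p_exactly_once = binomial_pmf(1, n, pi)
# "At least once" is the complement of "never" ...
p_at_least_once = 1 - binomial_pmf(0, n, pi)
# ... and also the sum of the mutually exclusive outcomes x = 1, ..., n.
p_at_least_once_sum = sum(binomial_pmf(x, n, pi) for x in range(1, n + 1))

expected = n * pi                  # E{X} = N * pi
variance = n * pi * (1 - pi)       # V{X} = N * pi * (1 - pi)
```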

Geometric Distribution

General remarks:
situations where, on an experiment (trial), one or the other of two dichotomous events will occur: (i) success (coded as 1), or (ii) failure (coded as 0)
RV of interest X = number of trials that will be required to observe the next success
Geometric distribution used to calculate the probabilities that X attains any positive integer value

Conditions of applicability:
1. outcomes are independent from one trial to another, and probability of event occurrence does not change from trial to trial (same conditions as for Binomial)
2. trials must occur in a sequence

Geometric PDF:

$\text{Prob}\{X = x\} = \pi (1 - \pi)^{x - 1}, \quad x = 1, 2, \ldots$, where π is the probability of success on a single trial

Examples:
model for trials that occur consecutively in time: waiting time distribution, e.g., lengths of weather regimes such as “spells” of dry periods
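
As a rough illustration of the waiting time interpretation, the sketch below tabulates geometric probabilities for a dry spell. The daily rain probability of 0.3 is a hypothetical value, and the sum is truncated at 200 days, which is an approximation to the infinite series.

```python
def geometric_pmf(x, pi):
    """Prob{X = x}: first success occurs on trial x (x - 1 failures, then a success)."""
    return pi * (1 - pi) ** (x - 1)

pi = 0.3  # hypothetical probability that any given day is wet
probs = {x: geometric_pmf(x, pi) for x in range(1, 201)}

total = sum(probs.values())                        # should be very close to 1
mean_wait = sum(x * p for x, p in probs.items())   # should be very close to 1 / pi
p_dry_week = (1 - pi) ** 7                         # no rain in the first 7 days
```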

Poisson Distribution (1)*

General remarks:
used for modeling number of discrete events occurring in a sequence, e.g., number of hurricanes over a time period, or number of gasoline stations along a portion of a highway
individual events being counted are independent, i.e., the probability of event occurrence in an interval depends only on the size of that interval, not on where the particular interval is located or on how often events have been observed in other intervals
Poisson events occur randomly, but at a constant rate; a sequence of Poisson events is said to stem from a Poisson process

Poisson PDF:

$\text{Prob}\{X = x\} = \frac{\mu^x e^{-\mu}}{x!}, \quad x = 0, 1, 2, \ldots$, where μ is the intensity (average number of events per interval)

Fitting a Poisson PDF to data:
estimate intensity from sample data by taking their average
equate that intensity to theoretical population parameter μ

*How many fish are in the sea?

Poisson Distribution (2)

Data set: # of tornados per year reported in NY state from 1959-1988; rate of tornado occurrence: μ = E{X} = 138/30 = 4.6 tornados/year

Sample and Poisson-derived probabilities:
stem plot: sample frequencies of # of tornados per year for x = {0, 1, . . . , 12}
Poisson probabilities with mean 4.6 evaluated at x = {0, 1, . . . , 12}, and superimposed on stem plot
ignore thick lines connecting bullets of the Poisson PDF, since it is discrete

Why bother about fitting theoretical distributions to data:
smooth sampling variations due to limited number of data
condense information with parametric distributions
extrapolate probabilities of events not seen in the sample data
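
The Poisson probabilities superimposed on the stem plot can be recomputed from the fitted rate. This sketch assumes the standard Poisson PDF, Prob{X = x} = μ^x e^(−μ) / x!, and uses only the tornado totals given above (138 tornados in 30 years); the per-year sample frequencies are not available in this transcript, so they are not reproduced.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """Prob{X = x} for a Poisson RV with mean (intensity) mu."""
    return mu ** x * exp(-mu) / factorial(x)

mu = 138 / 30                                   # fitted rate: 4.6 tornados/year
probs = [poisson_pmf(x, mu) for x in range(13)] # x = 0, 1, ..., 12

most_likely_count = probs.index(max(probs))     # mode of the fitted distribution
p_more_than_12 = 1 - sum(probs)                 # extrapolated tail probability
```

The last line illustrates the extrapolation point: the fitted distribution assigns a small but nonzero probability to counts larger than any in the plotted range.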

Uniform Distribution

General remarks:
situations where any particular event out of a total of K events has equal probability of occurrence as any other event
uniformly distributed events are said to denote complete independence (or lack of knowledge)

Uniform PDF:

$\text{Prob}\{X = x_k\} = \frac{1}{K}, \quad k = 1, \ldots, K$

Continuous RVs: Some Definitions (1)

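Probability density function (PDF): tabulation of probabilities of occurrence of (class) outcomes of a continuous RV:

$f_X(x) \approx \frac{\text{Prob}\{x \le X \le x + \varepsilon\}}{\varepsilon}$

ε denotes a very small positive number

Cumulative distribution function (CDF):

$F_X(x) = \text{Prob}\{X \le x\} = \int_{-\infty}^{x} f_X(u) \, du$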

Continuous RVs: Some Definitions (2)

Quantile function or inverse CDF:

$x_p = F_X^{-1}(p)$, i.e., the value $x_p$ such that $F_X(x_p) = \text{Prob}\{X \le x_p\} = p$, for p in [0, 1]

Monte Carlo simulation: procedure of sampling from a CDF:
1. draw (simulate) a random number p_i in [0, 1]
2. retrieve a simulated quantile as: $x_i = F_X^{-1}(p_i)$
3. repeat steps 1-2 S times to get a set of S simulated values

simulated values are distributed according to CDF FX(x)
used for uncertainty propagation in model predictions

Continuous RVs: Some Definitions (3)

Link between CDF and PDF: the PDF is the derivative of the CDF:

$f_X(x) = \frac{d F_X(x)}{dx}$

if the CDF has discontinuities, then the PDF is not defined there ⇒ the CDF is always defined even if the PDF is not

Expected value of a continuous RV: mean E{X} = μ of RV X:

$E\{X\} = \mu = \int_{-\infty}^{+\infty} x \, f_X(x) \, dx$

probability weighted sum of infinitely many outcomes

Variance of a continuous RV: expected value of squared deviations (X − μ)^2 from mean μ = E{X}:

$V\{X\} = \sigma^2 = \int_{-\infty}^{+\infty} (x - \mu)^2 \, f_X(x) \, dx$

Uniform Random Variable

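All outcomes within an interval [a, b] are equiprobable

Uniform PDF: $f_X(x) = \frac{1}{b - a}$ for $a \le x \le b$, and 0 otherwise

Uniform CDF: $F_X(x) = \frac{x - a}{b - a}$ for $a \le x \le b$ (0 below a, 1 above b)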

Exponential Random Variable

Continuous equivalent of geometric distribution; used for modeling random (waiting) times, e.g., radioactive decay

Exponential PDF: $f_X(x) = \lambda e^{-\lambda x}, \quad x \ge 0$

Exponential CDF: $F_X(x) = 1 - e^{-\lambda x}, \quad x \ge 0$

λ > 0 is interpreted as the rate of events over the unit interval

Monte Carlo Drawing from Exponential RVs

Exponential CDF:

$F_X(x) = 1 - e^{-\lambda x}, \quad x \ge 0$

Monte Carlo simulation:

generate uniform random numbers in [0, 1] (in Matlab use rand):
p = [0.2140 0.6435 0.3200 0.9601 0.7266 0.4120 0.7446 0.2679 0.4399 0.9334]

use quantile function, i.e., solve FX(x) = p with respect to x:

$x = F_X^{-1}(p) = -\frac{\ln(1 - p)}{\lambda}$

to compute simulated values (the values listed correspond to λ = 0.5):
x = [0.4816 2.0628 0.7713 6.4428 2.5936 1.0621 2.7298 0.6237 1.1593 5.4181]
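
The simulated values above can be reproduced by inverting the exponential CDF. In this sketch the rate λ = 0.5 is not stated on the slide; it is inferred here from the listed p and x values, and the function name `exponential_quantile` is chosen for this example.

```python
from math import log

def exponential_quantile(p, lam):
    """Inverse of F(x) = 1 - exp(-lam * x): the x value with CDF equal to p."""
    return -log(1.0 - p) / lam

lam = 0.5  # assumed rate, inferred from the slide's numbers
p = [0.2140, 0.6435, 0.3200, 0.9601, 0.7266,
     0.4120, 0.7446, 0.2679, 0.4399, 0.9334]

# One simulated value per uniform random number.
x = [exponential_quantile(p_i, lam) for p_i in p]

# Values listed on the slide, for comparison.
slide_x = [0.4816, 2.0628, 0.7713, 6.4428, 2.5936,
           1.0621, 2.7298, 0.6237, 1.1593, 5.4181]
max_abs_diff = max(abs(a - b) for a, b in zip(x, slide_x))
```

To generate fresh draws instead of reusing the slide's numbers, `random.random()` from the Python standard library plays the role of Matlab's `rand`.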

Standard Normal (Gaussian) Random Variable

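Most famous continuous distribution, with characteristic bell shape

PDF: $f_X(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2 / 2}$

CDF: $F_X(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-u^2 / 2} \, du$

no analytical equation for the CDF; approximated using numerical methods

for a standard Gaussian RV X: E{X} = 0, and V{X} = 1

symmetry of quantiles: FX(xp) = p, FX(−xp) = 1 − p = FX(x1−p) => x1−p = −xp

Prob {X in [−2, +2]} = FX(2) − FX(−2) = 0.977 − 0.023 = 0.954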
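
Because the Gaussian CDF has no closed form, in practice it is evaluated numerically. A brief sketch using Python's standard library `statistics.NormalDist`, which provides `cdf` and `inv_cdf`, checks the interval probability and the quantile symmetry stated above:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Prob{X in [-2, +2]} = F(2) - F(-2)
prob_within_2 = std_normal.cdf(2.0) - std_normal.cdf(-2.0)

# Symmetry of quantiles: x_(1-p) = -x_p, here with p = 0.9
x_p = std_normal.inv_cdf(0.9)
x_1_minus_p = std_normal.inv_cdf(0.1)
```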

Normal (Gaussian) Random Variable

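Most famous continuous distribution, with characteristic bell shape

PDF, for mean μ and standard deviation σ: $f_X(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2 \sigma^2} \right)$

CDF: $F_X(x) = \int_{-\infty}^{x} f_X(u) \, du$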

Link Between Std Normal and Normal RVs

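Gaussian and Std Gaussian PDF:

$f_X(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{(x - \mu)^2}{2 \sigma^2} \right)$ and $f_Z(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2 / 2}$

From a Gaussian to a Std Gaussian RV: normalize the x-data, i.e., get their z-scores: $z = \frac{x - \mu}{\sigma}$

From a std Gaussian to a Gaussian RV: multiply the z data by the target std deviation σ and add the target mean μ: $x = \mu + \sigma z$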
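
The two transformations are inverses of each other, which a short round trip makes concrete. As an illustration only, the sketch borrows the Catacamas climatological mean and standard deviation (621 and 118) as the Gaussian parameters; treating that rainfall distribution as Gaussian is an assumption made here, and the function names are invented for this example.

```python
def to_z_score(x, mu, sigma):
    """Gaussian value -> standard Gaussian value (z-score)."""
    return (x - mu) / sigma

def from_z_score(z, mu, sigma):
    """Standard Gaussian value -> Gaussian value with mean mu, std deviation sigma."""
    return mu + sigma * z

mu, sigma = 621.0, 118.0   # assumed Gaussian parameters (Catacamas climatology)

x = 739.0                          # one standard deviation above the mean
z = to_z_score(x, mu, sigma)       # standardize
x_back = from_z_score(z, mu, sigma)  # recover the original value
```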

Some Continuous Distribution Examples

[Figure panels: Gaussian PDFs; Gaussian CDFs; Lognormal PDF/CDF]