Stochastic modelling and quasi-random numbers
Stochastic models + quasi-random
(Teytaud, TAO (Inria), LRI (Paris-Sud), UMR CNRS 8623,
France;
OASE Lab, NUTN, Taiwan)
First part: randomness.
What is a stochastic / randomized model
Terminology, tools
Second part: quasi-random points
Random points can be very disappointing
Sometimes quasi-random points are better
Useful maths
we will need these tools...
Prime number: 2,3,5,7,11,13,17,...
P(A|B): conditioning in probability.
P(dice=1 | dice in {1,2,3} ) ?
P(dice=3 | dice in {1,2} ) ?
Frequency in data x(1),x(2),...,x(n):
1,2,6,3,7: frequency(odd) ?
frequency ( x(i+1) > x(i) ) ?
frequency ( x(i+1) > 3 | x(i) < 4 ) ?
Modulo: 4%3, 5%3, 7%10, (-1)%4 ? 0%1 ?
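As a warm-up, here is a minimal Octave/Matlab check of the frequency questions above, on the data 1,2,6,3,7 (the values in the comments are the answers):

x = [1 2 6 3 7];
freq_odd = mean(mod(x,2) == 1)          % frequency(odd) = 3/5
freq_up = mean(x(2:end) > x(1:end-1))   % frequency( x(i+1) > x(i) ) = 3/4
idx = find(x(1:end-1) < 4);             % steps with x(i) < 4
freq_cond = mean(x(idx+1) > 3)          % frequency( x(i+1) > 3 | x(i) < 4 ) = 2/3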
Let's take time to understand random simulations
I guess you all know how to simulate a random variable uniform in [0,1]
e.g. double u=drand48();
But do you know how to simulate one year of weather in Tainan ?
Not so simple.
Let's see this in more detail.
Random sequence
in dimension 1
What is a climate model ?
Define:
w1 = weather at time step 1
w2 = weather at time step 2
w3 = weather at time step 3
w4 = weather at time step 4
==> let's keep it simple: define the weather as one single number in [0,1] (think of temperature, or anything you want...)
I want a generative model
Just as I can repeat u=drand48() and generate a sample u1, u2, u3, I want to be able to generate
W1=(w11,w12,w13,...,w1T)
W2=(w21,w22,w23,...,w2T)
W3=...
==> think of a generator of curves
Random sequence
in dimension 1
What is a climate model ?
Define:
w1 = weather at time step 1
The model tells you how w1 can behave. For example, it gives the density function g: P(w1 in I) = integral of g on I
Take-home message number 1:
a random variable w on R is entirely defined by P(w in I) for each interval I
Random sequence
in dimension 1
P(w1 in I) = integral of g on I
Suppose each wi is generated independently with this density.
Then "w4 very high and w5 very low" is unrealistic; but in this model it happens very often!
Generating wi: also easy with the inverse cumulative distribution.
Realistic curve: large-scale variations!
Unrealistic curve: independent noise, with an almost constant average value.
So what can we do ?
A good model should not give the (independent) distribution of w2, but the distribution of w2 conditionally on w1 !
w1 = invG1( drand48() );
w2 = invG2( w1, drand48() );
w3 = invG3( w2, drand48() );
==> does it make sense ? This is a Markov chain. w1, w2, w3 should NOT be generated independently!
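Below, a minimal Octave/Matlab sketch of such a generator. The conditional inverse CDF invG here is a toy assumption (a bounded random walk), not a real climate model:

invG = @(w, u) min(max(w + 0.3*(u - 0.5), 0), 1);  % toy inverse conditional CDF
T = 365;
w = zeros(1, T);
w(1) = rand();                  % w1: unconditional draw
for t = 2:T
  w(t) = invG(w(t-1), rand());  % w(t) depends on w(t-1): order-1 Markov chain
end
plot(w);                        % successive values are correlated, as desired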
Variant
A good model should not give the (independent) distribution of w2, but the distribution of w2 conditionally on w1 !
w1 = invG1( drand48() );
w2 = invG2( w1, drand48() );
w3 = invG3( w2, w1, drand48() );
w4 = invG4( w3, w2, drand48() );
w5 = invG5( w4, w3, drand48() );
==> order-2 Markov chain
==> let's stay at order 1 for today
Let's see an example
Assume that we have a plant.
This plant is a function:
(Production,State,Benefit) = f( Demand , State , Weather )
Demand = g(weather,economy,noise)
(where Economy is the part of Economy which is not too dependent on weather)
Benefit per year
= expectation of the sum of f3 (= benefit) over one year
Graphically
Weather:
w1, w2, w3, w4, w5, ... ==> random sequence
==> we assume a distribution of w(i) | w(i-1)
==> this is a Markov model ( forget w(i-2) )
Economy
e1, e2, e3, e4, e5, ... ==> random sequence
==> we assume a distribution of e(i) | e(i-1)
Noise = given distribution
==> n1, n2, n3, ....
Graphically
[Diagram: three chains of nodes w1 ... w5, e1 ... e5, d1 ... d5; an arrow means a dependency: each w(i) depends on w(i-1), each e(i) on e(i-1), and each d(i) on d(i-1), e(i) and w(i).]
The model should tell you how to generate d2, given d1, e2, w2.
(ei, di, wi) is a Markov chain. (di) is a hidden Markov chain: a part of the state is hidden.
How to build a
stochastic model ?
It's about uncertainties
Even without hidden models, it's complicated
We have not discussed how to design a stochastic model (typically from a historical archive):
Typically, discretization: w(k) in I1 or I2 or I3, with I1 = ]-inf, a], I2 = ]a, b], I3 = ]b, +inf[
G(w, w') = frequency of w(k+1) = w' given w(k) = w
Typically, extreme values are more rare in models than in reality
Check the extreme events
Usually, it's good to have more extreme values than in the data (because all models tend to make them too rare...).
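A minimal Octave/Matlab sketch of this recipe (discretize, then count transition frequencies); the thresholds a, b and the placeholder archive are assumptions:

a = 0.33; b = 0.66;               % assumed discretization thresholds
w = rand(1, 1000);                % placeholder archive; use real data here
s = 1 + (w > a) + (w > b);        % state 1 if w<=a, 2 if a<w<=b, 3 if w>b
G = zeros(3, 3);
for k = 1:length(s)-1
  G(s(k), s(k+1)) = G(s(k), s(k+1)) + 1;   % count observed transitions
end
G = G ./ sum(G, 2)                % row-normalize: G(i,j) = P(state j | state i)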
Example: French climate
France has a quiet climate:
No big wind
No heavy rains
No heat wave
But:
2003: huge heat wave. 15 000 died in France.
1999: hurricane-like winds (96 died in Europe; gusts at 169 km/h in Paris)
1987: huge rainfall (96 mm in 24 hours)
(the 2003 death toll: 6.2 times more than the 921 earthquake!)
Example: 2003 heat wave
Paris:
9 days with max temp. > 35°C
1 night with min temp. 25.5°C
European countries were not ready for this
Example: 2003 heat wave
==> plenty of take-home messages
Bad model: air conditioning was sometimes automatically stopped, because such high temperatures were considered measurement bugs ==> extreme values neglected
Heat wave + no wind ==> increased pollution
==> old people die (babies were carefully protected...)
==> pollution and temperature are not independent
Risk was underestimated:
Maybe (probably ?) climate change had an impact: the archive is not trustable
Heat might stop electricity production:
- close to an electricity breakdown, due to correlations between demand and production
- how many people would die in such a case ?
Example: 2003 heat wave
==> plenty of take-home messages
Be careful with neglected extreme values
==> extreme values are not always measurement bugs
==> switching off the air conditioning because it's "too hot"... (some systems were not ready for such extreme temperatures)
Independence is a very strong assumption
P(A) = 0.01 and P(B) = 0.01; what is P(A and B) ?
do **not** answer 0.0001 ! ! ! (if A and B are strongly correlated, as heat and pollution are, P(A and B) can be close to 0.01)
Archive = not always trustable (in particular, weather)
Quasi-random points
(Teytaud, TAO (Inria), LRI (Paris-Sud), UMR CNRS 8623;
collaborations with S. Gelly, J. Mary, S. Lallich, E. Prudhomme, ...)
Quasi-random points ?
Dimension 1
Dimension n
Better in dimension n
Strange spaces
Quasi-random points ?
Why do we need random / quasi-random points ?
Numerical integration [thousands of papers; Niederreiter 92]
integral(f) nearly equal to (1/N) sum of f(xi) (see the sketch after this list)
Learning [Cervellera et al, IEEETNN 2004, Mary phD 2005]
Optimization [Teytaud et al, EA'2005]
Modelization of random processes [Growe-Kruska et al, IEEE BPTP'03]
Path planning [Tuffin]
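A minimal Octave/Matlab sketch of the Monte-Carlo integration item above; the integrand f is a toy assumption with a known integral:

f = @(x) sin(pi*x(:,1)) .* x(:,2);   % toy integrand on [0,1]^2
N = 10000;
x = rand(N, 2);                      % N random points in the square
estimate = mean(f(x))                % ~ integral(f) = (2/pi)*(1/2) ~ 0.3183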
Where do we need numerical integration ?
Just everywhere. Expected pollution (= average pollution...) = integral of possible pollutions as a function of many random variables (weather, defects in parts, gasoline, use of the car...)
Take-home message
When optimizing the design of something which is built in a factory, take into account the variance in the production system ==> all cars are different.
==> very important effect
==> real piece != specifications
Why do we need numerical integration ?
Expected benefit (= average benefit...) = integral of possible benefits as a function of many random variables (weather, prices of raw materials...)
==> economical benefit (company)
==> overall welfare (state)
Why do we need numerical integration ?
Risk (= probability of failure...) = integral of possible failures as a function of many random variables (quakes, floods, heat waves, electricity breakdowns, human error...)
Take-home message
Human error must be taken into account:
- difficult to model
- e.g. a minimum probability that action X is not performed (for all actions X) (or that an unexpected action Y is performed) (what about an adversarial human ?)
==> protection by independent validations
Why do we need numerical integration ?
Expected benefit as a function
of many prices/random variables,
Expected efficiency depending on machining
vibrations
Evaluating schedules in industry (with random events like faults, delays...) (e.g. processors)
How to know if some points
are well distributed ?
I propose N points x=(x1,...,xN)
How to know if these points are well distributed ?
A naive solution:
f(x) = max over y of min over i of ||y - xi|| (to be minimized)
(naive, but not always so bad)
How to know if some points
are well distributed ?
I propose N points x=(x1,...,xN)
How to know if these points are well distributed ?
A naive solution:
g(x) = min over i of min over j != i of ||xj - xi||^2 (to be maximized)
= dispersion (naive, but not always so bad)
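A minimal Octave/Matlab sketch of the criterion g above: the smallest squared pairwise distance among N candidate points (larger = better spread):

N = 100;
x = rand(N, 2);                    % candidate points
g = inf;
for i = 1:N
  for j = i+1:N
    g = min(g, sum((x(i,:) - x(j,:)).^2));  % squared distance ||xj - xi||^2
  end
end
g                                  % to be maximized over candidate point sets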
Low Discrepancy ?
Discrepancy = sup over rectangles R of | Area(R) - Frequency(R) |
(Frequency(R) = fraction of the points falling in R)
Low Discrepancy ?
Discrepancy2 = mean over rectangles R of | Area(R) - Frequency(R) |^2
Is there better than random points for low discrepancy ?
Random --> Discrepancy ~ sqrt( 1/n )
Quasi-random --> Discrepancy ~ log(n)^d / n
Quasi-random with n known in advance --> Discrepancy ~ log(n)^(d-1) / n
Koksma & Hlawka: error in Monte-Carlo integration <= Discrepancy x V
V = total variation (in the sense of Hardy & Krause) (many generalizations in Hickernell, A Generalized Discrepancy and Quadrature Error Bound, 1998)
==> sometimes V or log(n)^d is huge
==> don't always trust QR
Dimension 1
What would you do ?
--> Van Der Corput
n=1, n=2, n=3...
n=1, n=10, n=11, n=100, n=101, n=110... (p=2)
x=.1, x=.01, x=.11, x=.001, x=.101, x=.011... (binary: mirror the digits of n across the point!)
Dimension 1
What would you do ?
--> Van Der Corput
n=1, n=2, n=3...
n=1, n=2, n=10, n=11, n=12, n=20... (p=3)
x=.1, x=.2, x=.01, x=.11, x=.21, x=.02... (ternary!)
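A minimal Octave/Matlab sketch of the Van der Corput sequence in base p (save as vdc.m): write n in base p and mirror its digits across the radix point:

function x = vdc(n, p)
  % n-th Van der Corput number in base p
  x = 0;
  f = 1 / p;
  while n > 0
    x = x + f * mod(n, p);  % next base-p digit of n, placed after the point
    n = floor(n / p);
    f = f / p;
  end
end
% vdc(1,2)=0.5, vdc(2,2)=0.25, vdc(3,2)=0.75, ... as on the slide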
Dimension 1 more general
p=2, but also p=3, 4, ...
but p=13 is not very nice: the first points 1/13, 2/13, 3/13, ... form a slowly increasing ramp
Dimension 2: maybe just
use two Van Der Corput sequences with same p ?
x --> (x,x) ? ==> all points fall on the diagonal!
Dimension 2
x --> (x,x') ? with two different bases.
Dimension 2 or n : Halton
x --> (x,x') with diff. prime numbers is ok
(needs maths...) (as small numbers are better, use the n smallest ones...)
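A minimal Octave/Matlab sketch of a 2-D Halton sequence, pairing Van der Corput sequences in bases 2 and 3 (vdc as defined above):

N = 200;
pts = zeros(N, 2);
for n = 1:N
  pts(n, :) = [vdc(n, 2), vdc(n, 3)];   % base 2 for x, base 3 for y
end
plot(pts(:,1), pts(:,2), '+');          % well-spread points in the unit square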
Dimension n+1 : Hammersley
(n/(N+1), xn, x'n) --> closed sequence
(i.e., the number N of points is known in advance)
Dimension n : the trouble
There are not so many small prime numbers
Dimension n : scrambling
(here, random comes back)
Pi(p) : [1,p-1] --> [1,p-1], a randomly chosen permutation
Pi(p) applied to the digits of the coordinate built with prime p
(there exist much more complicated scramblings)
Beyond low discrepancy ?
Other discrepancies : why rectangles ?
Other solutions : lattices {x0 + n.x} modulo 1
(very fast and simple; see the sketch below)
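A minimal Octave/Matlab sketch of such a lattice (rank-1, with a random shift); the step alpha is an assumption, one common choice being based on irrationals:

N = 200;
alpha = [1/sqrt(2), (sqrt(5)-1)/2];   % irrational step per coordinate (assumed)
x0 = rand(1, 2);                      % random shift x0
n = (1:N)';
pts = mod(x0 + n * alpha, 1);         % {x0 + n.alpha} modulo 1
plot(pts(:,1), pts(:,2), '+');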
Let's see very different approaches:
Low discrepancy for other spaces than [0,1]^n
Stratification
Symmetries
Why in the square ?
Other spaces/distributions: Gaussians, the sphere
Some animals are quite good at low discrepancy
Why in the square ?
Uniformity in the square is ok
But what about Gaussian distributions ?
x in ]0,1[^d
y(i) such that P( N > y(i) ) = x(i), with N a standard Gaussian
then y is quasi-random and gaussian
==> so you can have
quasi-random Gaussian numbers
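A minimal Octave/Matlab sketch of quasi-random Gaussian numbers via this inverse-CDF trick (vdc as defined above; erfinv gives the Gaussian inverse CDF):

N = 500;
x = zeros(N, 2);
for n = 1:N
  x(n, :) = [vdc(n, 2), vdc(n, 3)];   % Halton point in ]0,1[^2
end
y = -sqrt(2) * erfinv(2*x - 1);       % y such that P( N > y ) = x
plot(y(:,1), y(:,2), '+');            % quasi-random Gaussian cloud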
Why in the square ?
Other n-dimensional random variables by the conditioning trick:
Consider a QR point (x1, ..., xn) in [0,1]^n.
You want to simulate z with distribution Z:
z1 = inf { z ; P(Z1 <= z) >= x1 } = invG1(x1)
z2 = inf { z ; P(Z2 <= z | Z1 = z1) >= x2 } = invG2(z1, x2)
z3 = inf { z ; P(Z3 <= z | Z1 = z1, Z2 = z2) >= x3 } = invG3(z1, z2, x3)
...
==> ok for strange spaces or variables!
==> QR: choose the best ordering of variables (most important variables first)
Why in the square ?
Theorem: If x is random([0,1]^n),
then z is distributed as Z !
==> convert the uniform square into strange spaces or variables
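A minimal Octave/Matlab sketch of the conditioning trick for a toy Z: the uniform distribution on the triangle { 0 <= z2 <= z1 <= 1 }. Here invG1 and invG2 are the exact inverse conditional CDFs of this toy Z (vdc as defined above):

N = 300;
z = zeros(N, 2);
for n = 1:N
  x = [vdc(n, 2), vdc(n, 3)];   % QR point in [0,1]^2
  z1 = sqrt(x(1));              % invG1: P(Z1 <= z) = z^2 on this triangle
  z2 = z1 * x(2);               % invG2: Z2 | Z1=z1 is uniform on [0, z1]
  z(n, :) = [z1, z2];
end
plot(z(:,1), z(:,2), '+');      % quasi-random points, uniform on the triangle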
Why not for random walks ?
500 steps of random walks ==> huge dimension
Quasi-random basically does not work in huge dimension
But the first coordinates of QR are ok; just use them for the most important coordinates! ==> change the order of variables and use conditioning !
coordinate 1 for the 250th step, coordinate 2 for the 1st step, coordinate 3 for the 500th step.
Why not for random walks ?
Quasi-random number x in R^500 (e.g. Gaussian)
Change the order: y(250) first ( y(250) <-- x(1) )
y(1) | y(250) <-- x(2)
y(500) | y(1) and y(250) <-- x(3)
Why not for random walks ?
500 steps of random walks ==> huge dimension
But strong derandomization is possible: start with y(250), then y(1), then y(500), then y(125), then y(375)...
Why not for random walks ?
500 steps of random walks ==> huge dimension
But strong derandomization is possible:
see e.g. Hickernell 1998 for nice generalizations; drawback = rewriting the model
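A minimal Octave/Matlab sketch of the idea (two levels only), assuming a standard Gaussian random walk W with 500 unit-variance steps and a Brownian-bridge-style conditioning; the remaining, less important steps can be filled in with ordinary random numbers (vdc as defined above):

ginv = @(u) sqrt(2) * erfinv(2*u - 1);         % Gaussian inverse CDF
M = 100;                                       % number of simulated paths
W = zeros(M, 2);                               % columns: W(500), W(250)
for n = 1:M
  x = [vdc(n, 2), vdc(n, 3)];                  % one QR point per path
  W(n, 1) = sqrt(500) * ginv(x(1));            % W(500) ~ N(0, 500)
  W(n, 2) = W(n,1)/2 + sqrt(125) * ginv(x(2)); % W(250) | W(500) ~ N(W(500)/2, 125)
end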
Very different approaches for derandomization ?
Symmetries : instead of x1 and x2 in [0,1],
try x and 1-x
Or more generally, just draw n/2 points, and use their symmetries
==> in dimension d, n/2^d points and their 2^d symmetries
The 2^d symmetries of n/2^d points are better than n points!
==> antithetic variables
==> roughly, it is almost always better, whereas quasi-random might be disappointing
Symmetries / antithetic variables in Octave/Matlab
x = rand(800,2);
subplot(2,2,1); plot(x(:,1), x(:,2), '+');         % 800 i.i.d. points
x = rand(400,2); x = [x; 1-x];                     % 400 points + central symmetry
subplot(2,2,2); plot(x(:,1), x(:,2), '+');
x = rand(200,2);
x = [x; 1-x; x(:,1), 1-x(:,2); 1-x(:,1), x(:,2)];  % 200 points + axis reflections
subplot(2,2,3); plot(x(:,1), x(:,2), '+');
x = rand(100,2);
x = [x; 1-x; x(:,1), 1-x(:,2); 1-x(:,1), x(:,2)];
x = [x; x(:,2), x(:,1)];                           % also swap the two coordinates
subplot(2,2,4); plot(x(:,1), x(:,2), '+');
Free !
Very different approaches for derandomization ?
Control : instead of estimating E f(x),
choose g looking like f and estimate E (f-g)(x)
Then E f = E g + E(f-g) : much better (if Var(f-g) is small)
Troubles:
You need a good g
You must be able to evaluate E g
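A minimal Octave/Matlab sketch of a control variate; the target f and the look-alike g are toy assumptions with known means:

f = @(x) exp(x);                   % toy target; E f = e - 1 on [0,1]
g = @(x) 1 + x;                    % looks like f; E g = 1.5 exactly
N = 1000;
x = rand(N, 1);
naive = mean(f(x))                 % plain Monte-Carlo estimate of E f
control = 1.5 + mean(f(x) - g(x))  % E g + estimate of E(f-g): smaller variance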
Very different approaches for derandomization ?
Pi-estimation (importance sampling) : instead of estimating E f(x) with x of density d,
look for y with density pi proportional to |f|.d
Then E f(x) = E [ f(y) d(y) / pi(y) ] ==> the variance is much better
Troubles:
You have to generate y
You have to know the density pi
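A minimal Octave/Matlab sketch, assuming x uniform on [0,1] and a toy f whose optimal sampling density pi is available in closed form:

f = @(x) x.^2;                 % toy target; E f = 1/3
% take pi(y) = 3*y.^2 (proportional to f); inverse-CDF sampling: y = u^(1/3)
N = 1000;
u = rand(N, 1);
y = u.^(1/3);                  % y has density pi
est = mean(f(y) ./ (3*y.^2))   % weights d(y)/pi(y); here every term equals 1/3,
                               % so the variance is exactly zero (optimal pi)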
Very different approaches for derandomization ?
Stratification (jittering) :
Instead of generating n points i.i.d.,
generate
k points in stratum 1
k points in stratum 2
...
k points in stratum m
with m.k = n
==> more stable
==> depends on the choice of strata
Simple stratification
x = rand(40,2);
subplot(1,2,1); plot(x(:,1), x(:,2), '+');     % 40 i.i.d. points
x = [0.5*rand(10,2);                           % 10 points in the lower-left quadrant
     0.5+0.5*rand(10,2);                       % 10 in the upper-right
     0.5*rand(10,1), 0.5+0.5*rand(10,1);       % 10 in the upper-left
     0.5+0.5*rand(10,1), 0.5*rand(10,1)];      % 10 in the lower-right
subplot(1,2,2); plot(x(:,1), x(:,2), '+');
Summary on MC improvements ?
In many books you will read that quasi-random points are great.
Remember that people who spend their lives studying quasi-random numbers will rarely conclude that it was all a bit useless.
Sometimes it's really good.
Sometimes it's similar to random.
Modern Quasi-Monte-Carlo methods (randomized) are usually at least as good as random methods ==> no risk.
Jittering / strata / symmetry usually very good.
Summary on MC improvements ?
Carefully designing the model (from data) is often more important than the randomization.
Neglecting dependencies is often a disaster.
Yet, there are cases in which improved MC are the key.
Remarks on random search: dispersion much better than discrepancy...
Biblio (almost all on google)
For pi-estimation, stratification, symmetries, ...: see standard Monte-Carlo textbooks.
Owen, A.B., Quasi-Monte Carlo Sampling, a chapter on QMC for a SIGGRAPH 2003 course.
Hickernell, F.J., A generalized discrepancy and quadrature error bound, 1998.
Tuffin, B., On the use of low-discrepancy sequences in Monte-Carlo methods, 1996.
Matousek, J., Geometric Discrepancy (book, 1999).
These slides: http://www.lri.fr/~teytaud/btr2.pdf or http://www.lri.fr/~teytaud/btr2.ppt