Random Number Generation Using Low Discrepancy Points
Donald Mango, FCAS, MAAA
Centre Solutions
June 7, 1999
1999 CAS/CARe Reinsurance Seminar
Baltimore, Maryland
What is Discrepancy?
• Consider a large number of points inside a unit hypercube: an n-dimensional hypercube of length 1 on each side
• For any “sub-volume” of the hypercube, Discrepancy = the difference between the proportion of points inside the sub-volume and the volume itself
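The definition above can be sketched in a few lines of Python for a single axis-aligned box. The function name `box_discrepancy` and the two sample point sets are illustrative assumptions, not part of the presentation. Formal discrepancy measures take the worst case over all such boxes, whereas this sketch checks one box only.

```python
import random

def box_discrepancy(points, lower, upper):
    """|proportion of points inside the box - volume of the box| for one box."""
    inside = sum(
        all(lo <= x < hi for x, lo, hi in zip(p, lower, upper))
        for p in points
    )
    volume = 1.0
    for lo, hi in zip(lower, upper):
        volume *= hi - lo
    return abs(inside / len(points) - volume)

# 100 evenly spread points: midpoints of a 10 x 10 grid on the unit square
grid = [((i + 0.5) / 10, (j + 0.5) / 10) for i in range(10) for j in range(10)]

# 100 pseudo-random points (the seed is arbitrary)
rng = random.Random(1999)
pseudo = [(rng.random(), rng.random()) for _ in range(100)]

lower, upper = (0.0, 0.0), (0.5, 0.5)   # a sub-volume of size 0.25
print(box_discrepancy(grid, lower, upper))
print(box_discrepancy(pseudo, lower, upper))
```

For this box the grid holds exactly 25 of its 100 points, so its discrepancy is zero. The pseudo-random set will typically miss by a few points.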
Low Discrepancy Point Generator:
• Method to generate a set of points which fills out a given n-dimensional unit hypercube, with as little discrepancy as possible
• Attempt to be systematic and efficient in filling a space, given the number of points
• My paper discusses “Faure” Points, just one of many alternatives
• Faure method relies on prime numbers
Other Low Discrepancy Point Generators:
• Named after number theorists: Sobol’, Niederreiter, Halton, Hammersley, ...
• More advanced methods use “irreducible polynomials” -- polynomial equivalents of prime numbers (cannot be factored)
• More complex algorithms
• Less flexible than Faure
Linear Congruential Generator:
• X_{n+1} = (a X_n + c) mod m
• Used in spreadsheets -- RAND() in Excel, @RAND in Lotus
• Sequential
• Cyclical, with a long cycle length or “period”
• “Randomized” in spreadsheets by using a random seed value (X_0) = the system clock
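A minimal Python sketch of the recurrence follows. The default constants a, c and m are a commonly quoted textbook choice used here purely for illustration. They are an assumption and are not claimed to be the values used by Excel or Lotus.

```python
def lcg(seed, a=1103515245, c=12345, m=2**31):
    """Yield X_{n+1} = (a * X_n + c) mod m, starting from X_0 = seed."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

def uniforms(seed, count, a=1103515245, c=12345, m=2**31):
    """First `count` outputs scaled to [0, 1)."""
    gen = lcg(seed, a, c, m)
    return [next(gen) / m for _ in range(count)]

# A deliberately tiny generator makes the cyclical behaviour visible:
tiny = lcg(seed=7, a=5, c=3, m=16)
cycle = [next(tiny) for _ in range(16)]
print(cycle)  # visits every value 0..15 once, then returns to the seed
```

With m = 16 the sequence repeats after 16 draws. Practical generators use a much larger m so that the period is long.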
LDPMAKER Excel 97 Workbook:
• Available in the 1999 Spring Forum section of the CAS Website:
www.casact.org/pubs/forum/99spforum/99spftoc.htm
• Includes both:
  • A spreadsheet-only calculation (recalc-driven), and
  • A Visual Basic for Applications (VBA) macro-driven generator (run with a button)
LDPMAKER Excel 97 Workbook:
• “Example” sheet is a spreadsheet-only calculation
• Demonstrates formulas
• Not very flexible
Example: 4 Dimensions, 24 Iterations
• Dimension #1:
• First, convert each iteration number N to base Prime (= 5)
• Iteration 1 = 01 (base 5); Iteration 10 = 20 (base 5)
• F(N, 1) = Faure point (Iteration N, Dimension 1)
  F(1,1) = 0/5^2 + 1/5 = 0.20
  F(10,1) = 2/5^2 + 0/5 = 0.08
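The Dimension #1 calculation can be sketched in Python as follows. The helper names `to_base` and `faure_dim1` are hypothetical. The code simply mirrors the two worked values above: the base-5 digits are reflected about the decimal point, so the last digit is divided by 5, the next by 5^2, and so on.

```python
def to_base(n, base):
    """Digits of n in the given base, least significant digit first."""
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)
        digits.append(remainder)
    return digits

def faure_dim1(n, prime):
    """Dimension #1 value: the k-th digit (from the right) is divided by prime**(k+1)."""
    return sum(d / prime ** (k + 1) for k, d in enumerate(to_base(n, prime)))

print(to_base(10, 5))      # [0, 2], i.e. "20" in base 5 read right to left
print(faure_dim1(1, 5))    # 1/5
print(faure_dim1(10, 5))   # 0/5 + 2/25
```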
Example: 4 Dimensions, 24 Iterations
• Dimension #2:
• Start with the base Prime digits from Dimension #1 and “shuffle” them
• Using combinations, sum of digits and MOD operator
• First digit in Dimension #2 = [ Sum (first digit, second digit) from Dimension #1 ] MOD Prime
• Dimension #1, Iteration 10 = 20 (base 5); Dimension #2, Iteration 10 = 22 (base 5)
• Formula for F(N,2) is the same
Example: 4 Dimensions, 24 Iterations
• Dimensions #3 and higher:
• Start with the base Prime digits from the previous dimension and “shuffle” them
• Formula for F(N,3) ... is the same
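The digit shuffle for all dimensions can be sketched as below. This assumes the standard Faure construction, in which each new digit is a combination-weighted sum of the previous dimension's digits, taken MOD Prime. The workbook's formulas may be organized differently, and the function names are hypothetical. The sketch does reproduce the slide example of 20 (base 5) becoming 22 (base 5) for iteration 10.

```python
from math import comb

def to_base(n, base):
    """Digits of n in the given base, least significant digit first."""
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)
        digits.append(remainder)
    return digits

def shuffle_digits(digits, prime):
    """New digit j = sum over i >= j of C(i, j) * old digit i, MOD prime."""
    m = len(digits)
    return [
        sum(comb(i, j) * digits[i] for i in range(j, m)) % prime
        for j in range(m)
    ]

def faure_point(n, dims, prime):
    """Coordinates of iteration n in `dims` dimensions."""
    digits = to_base(n, prime)
    point = []
    for _ in range(dims):
        # Same formula in every dimension: digit k is divided by prime**(k+1).
        point.append(sum(d / prime ** (k + 1) for k, d in enumerate(digits)))
        digits = shuffle_digits(digits, prime)
    return point

print(shuffle_digits(to_base(10, 5), 5))   # [2, 2], i.e. "22" in base 5
print(faure_point(10, dims=4, prime=5))
```

For iteration 10 the first two coordinates come out at roughly 0.08 and 0.48 (2/5 + 2/25), matching the base-5 digits shown in the example.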
Loops in the Faure Algorithm:
• Fills out the space in ever-larger loops of ever-smaller spacing
• Fills out the space sequentially
• There MAY be an issue with ending the iterations in the middle of one of these loops
• Examples later in the test results...
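As a rough aid for choosing iteration counts, the sketch below lists candidate loop boundaries. It assumes that a loop ends at Prime^k - 1 iterations. That is an inference from the counts used in this presentation (24 = 5^2 - 1 in the example; 728, 2,186, 342 and 2,400 in the test results), not a statement taken from the paper. The function name is hypothetical.

```python
def loop_boundaries(prime, max_iterations):
    """Iteration counts of the form prime**k - 1 that do not exceed max_iterations."""
    boundaries = []
    power = prime
    while power - 1 <= max_iterations:
        boundaries.append(power - 1)
        power *= prime
    return boundaries

print(loop_boundaries(5, 24))      # [4, 24]
print(loop_boundaries(3, 2500))    # [2, 8, 26, 80, 242, 728, 2186]
print(loop_boundaries(7, 2500))    # [6, 48, 342, 2400]
```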
Visual Basic for Applications (VBA) Version:
• VBA = real programming language
• Recursive algorithm using “dynamic arrays”: arrays which are dimensioned (sized) at run-time
• Generalization of spreadsheet-only calculations
• FAST
Performance Test #1: Sum of Limited Paretos
Table 2 (from Paper) - Pareto Parameters
Test # / Pareto #   B        Q      Policy Limit   Limited Expected Value
1 / 1               10,000   1.10   100,000        21,321
1 / 2               15,000   1.30   250,000        28,874
Test #1 Theoretical Result                         50,194
2 / 1               10,000   1.10   50,000         16,404
2 / 2               15,000   1.30   25,000         12,745
2 / 3               25,000   1.20   40,000         21,744
2 / 4               12,500   1.40   50,000         14,834
2 / 5               30,000   2.00   25,000         13,636
Test #2 Theoretical Result                         79,364
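The limited expected values in Table 2 can be checked with a short Python sketch. It assumes the two-parameter Pareto with survival function (B / (B + x))^Q. That parameterization is inferred from the table values, which it reproduces to within rounding, and the function name is hypothetical.

```python
def pareto_lev(b, q, limit):
    """E[min(X, limit)] when P(X > x) = (b / (b + x)) ** q, for q != 1."""
    return b / (q - 1) * (1 - (b / (b + limit)) ** (q - 1))

test_1 = [(10_000, 1.10, 100_000), (15_000, 1.30, 250_000)]
test_2 = [
    (10_000, 1.10, 50_000),
    (15_000, 1.30, 25_000),
    (25_000, 1.20, 40_000),
    (12_500, 1.40, 50_000),
    (30_000, 2.00, 25_000),
]

for label, rows in (("Test #1", test_1), ("Test #2", test_2)):
    values = [pareto_lev(*row) for row in rows]
    print(label, [round(v) for v in values], round(sum(values)))
```

The sums land within a dollar or so of the theoretical results shown (50,194 and 79,364). The small differences come from rounding the individual rows.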
Performance Test #1: Sum of Limited Paretos
Table 3: Sum of 2 Limited Paretos
# of Iterations   LDP Value   LDP % Error   RAND() Value   RAND() % Error
250               49,170      -2.04%        47,573         -5.22%
728               50,022      -0.34%        50,267         0.15%
1,000             49,769      -0.85%        49,640         -1.10%
1,500             49,903      -0.58%        51,307         2.22%
2,186             50,137      -0.11%        50,737         1.08%
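A test of this kind might be set up as in the sketch below. Several assumptions are made: the standard Faure construction with Prime = 3, inversion of the Pareto distribution function 1 - (B / (B + x))^Q, and one dimension per Pareto. The LDPMAKER workbook may differ in detail, so the output is not expected to match Table 3 exactly. All function names are hypothetical.

```python
from math import comb

def faure_point(n, dims, prime):
    """Iteration n of a Faure-type sequence (standard construction, assumed)."""
    digits = []
    while n > 0:
        n, remainder = divmod(n, prime)
        digits.append(remainder)               # least significant digit first
    point = []
    for _ in range(dims):
        point.append(sum(d / prime ** (k + 1) for k, d in enumerate(digits)))
        m = len(digits)
        digits = [
            sum(comb(i, j) * digits[i] for i in range(j, m)) % prime
            for j in range(m)
        ]
    return point

def limited_pareto(u, b, q, limit):
    """Invert the Pareto distribution function at u, then cap at the policy limit."""
    return min(b * ((1 - u) ** (-1 / q) - 1), limit)

# Test #1 rows of Table 2: (B, Q, Policy Limit)
PARETOS = [(10_000, 1.10, 100_000), (15_000, 1.30, 250_000)]

def ldp_estimate(iterations, prime=3):
    """Average of the sum of limited Paretos over the first `iterations` points."""
    total = 0.0
    for n in range(1, iterations + 1):
        u = faure_point(n, len(PARETOS), prime)
        total += sum(limited_pareto(ui, *p) for ui, p in zip(u, PARETOS))
    return total / iterations

estimate = ldp_estimate(728)
print(round(estimate), f"{estimate / 50_194 - 1:+.2%}")
```

Swapping `faure_point` for pairs of pseudo-random uniforms gives the corresponding RAND()-style estimate for comparison.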
Performance Test #1: Sum of Limited Paretos
Table 4: Sum of 5 Limited Paretos
# of Iterations   LDP Value   LDP % Error   RAND() Value   RAND() % Error
342               79,319      -0.06%        80,179         1.03%
1,000             79,201      -0.21%        78,837         -0.66%
1,500             79,206      -0.20%        79,088         -0.35%
2,000             79,280      -0.11%        79,049         -0.40%
2,400             79,358      -0.01%        79,154         -0.27%
Performance Test #2: Sum of Poissons
Table 5: Sum of 2 Poissons (λ = 8)
# of Iterations   LDP % Error   RAND() % Error
250               -0.42%        1.30%
728               -0.03%        0.64%
1,000             -0.22%        0.23%
2,000             -0.09%        -0.08%
2,186             -0.01%        0.17%
Performance Test #2: Sum of Poissons
Table 6: Sum of 5 Poissons (λ = 8)
# of Iterations   LDP % Error   RAND() % Error
342               -0.24%        0.78%
1,000             -0.20%        0.59%
2,000             -0.11%        -0.22%
2,400             -0.04%        -0.23%
Performance Test #3: Low Frequency Events
Table 7 - Pareto Parameters used for Severity
Pareto #   B        Q      Policy Limit   Limited Expected Value
1          10,000   1.30   50,000         13,860
Test #1 Theoretical Result                693
2          25,000   1.60   50,000         20,113
3          5,000    1.10   50,000         10,660
Test #2 Theoretical Result                2,232
Performance Test #3: Low Frequency Events
Table 8: One Event, 5% Prob of Occurrence
# of Iterations   LDP Value   LDP % Error   RAND() Value   RAND() % Error
250               563         -18.82%       1,009          45.60%
728               615         -11.19%       657            -5.23%
1,000             670         -3.27%        569            -17.93%
1,500             667         -3.81%        613            -11.58%
2,186             690         -0.50%        662            -4.45%
Performance Test #3: Low Frequency Events
Table 9: Two Events, each with 5% Prob of Occurrence
# of Iterations   LDP Value   LDP % Error   RAND() Value   RAND() % Error
342               2,199       -1.46%        3,175          42.26%
1,000             2,251       0.86%         2,456          10.04%
1,500             2,221       -0.49%        2,295          2.83%
2,400             2,204       -1.22%        2,348          5.20%
Performance Test #4: 99th Percentile of Sum of Normals
Table 10 - Normal Parameters
Test # / Normal #   Mean    StdDev    99th Percentile
1 / 1               2,000   750       -
1 / 2               1,000   500       -
1 Combined          3,000   901.4     5,097
2 / 1               1,000   300       -
2 / 2               1,000   800       -
2 / 3               500     300       -
2 / 4               750     600       -
2 / 5               2,000   100       -
2 Combined          5,250   1090.9    7,788
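The “Combined” rows can be reproduced with the Python standard library, assuming the component normals are independent so that variances add. That assumption is consistent with the combined standard deviations shown in the table. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def combined_normal(components, p=0.99):
    """Mean, std dev and p-th percentile of a sum of independent normals."""
    mean = sum(m for m, _ in components)
    std = sqrt(sum(s ** 2 for _, s in components))
    return mean, std, NormalDist(mean, std).inv_cdf(p)

test_1 = [(2_000, 750), (1_000, 500)]
test_2 = [(1_000, 300), (1_000, 800), (500, 300), (750, 600), (2_000, 100)]

for label, rows in (("Test #1", test_1), ("Test #2", test_2)):
    mean, std, pct99 = combined_normal(rows)
    print(label, mean, round(std, 1), round(pct99))
```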
Performance Test #4: 99th Percentile of Sum of Normals
Table 11 - 99th Pctle of Sum of 2 Normals
# of Iterations   LDP Value   LDP % Error   RAND() Value   RAND() % Error
250               5,084       -0.25%        4,800          -5.82%
728               5,036       -1.19%        4,898          -3.91%
1,000             4,995       -2.00%        4,934          -3.19%
1,500             5,047       -0.98%        4,989          -2.12%
2,186             5,070       -0.52%        4,967          -2.55%
Performance Test #4: 99th Percentile of Sum of Normals
Table 12 - 99th Pctle of Sum of 5 Normals
# of Iterations   LDP Value   LDP % Error   RAND() Value   RAND() % Error
342               7,661       -1.63%        7,524          -3.38%
1,000             7,808       0.26%         7,653          -1.73%
1,500             7,808       0.26%         7,650          -1.76%
2,400             7,804       0.21%         7,703          -1.09%
Performance Test #5: Mixed Bag
• Sum of 5 each from:
  • LogNormal
  • Pareto
  • Uniform
  • Normal
• Testing variability of estimates over 10 runs
Performance Test #5: Mixed Bag
Table 14 - Avg % Error and Std Dev of % Error over 10 runs
# of Iterations   LDP Average % Error   LDP Std Dev of % Error   Rand Average % Error   Rand Std Dev of % Error
250               -10.39%               0.33%                    -0.36%                 5.51%
500               -2.28%                0.71%                    -3.03%                 7.79%
1,000             -0.47%                1.36%                    -0.76%                 4.39%
1,500             -0.41%                0.69%                    -0.67%                 4.62%
2,000             -0.41%                0.62%                    -1.40%                 4.01%
3,000             -0.72%                0.47%                    -1.17%                 2.79%
Possible Concerns in Using LDPs
• Unused Dimensions:
  • Example: modeling Excess Claims
  • # of Excess claims between 0 and 30 requires 30 dimensions
  • If # claims < 30, are the “used” dimensions still filled out with low discrepancy?
  • Dr. Tom?
Possible Concerns in Using LDPs
• Time Series:
  • Example: Probability of 2 consecutive years of loss ratio exceeding 75%
  • How many dimensions is this problem?
  • Can’t use a single dimension of LDPs, because they are sequentially dependent
  • Need to know “over how many years”, then set dimensions
Possible Concerns in Using LDPs
• Correlation:
  • If two variables are
    • 100% correlated ==> 1 dimension
    • 0% correlated ==> 2 dimensions
    • x% correlated ==> ? dimensions
  • Is promise of “low discrepancy” still fulfilled?
  • How to implement?
Possible Concerns in Using LDPs
• Loop Boundaries:
  • Faure algorithm fills out space sequentially in ever-expanding loops of ever-finer granularity
  • If iteration count does not finish on a loop boundary (depends on Prime), there may be potential bias...
  • See Appendix B of paper