Load Forecast Uncertainty Estimation using Monte Carlo Simulations
April 26th, 2018
Fernando Peña-Silva
Load Forecasting Analyst
Facilities & Transmission
[Map: NSPI Major Facilities 2018. Legend: 69 kV, 138 kV, 230 kV and 345 kV transmission lines; hydro, thermal, combustion turbine, wind (transmission), tidal and biomass generating plants; major transmission substations; interconnection with NB Power. Line routing is not to scale.]
Context
▪ Nova Scotia Power serves about 500,000 customers (about 1 million people).
▪ A 10-year load forecast is filed annually with the Utility and Review Board (UARB).
▪ Prior to 2017, the forecast provided discrete high/low scenarios based on economics, electrification, weather and large customers.
▪ In 2016, the UARB's consultant recommended broadening the sensitivity analysis to encompass a wider range of assumptions.
▪ Rather than constructing more discrete scenarios, we adopted a probabilistic approach (P10/P90) based on Monte Carlo simulations.
Generation Mix
[Pie chart: 2017 Annual Production Volumes by source: coal; natural gas; oil and petcoke; purchased power (other); wind and hydro; purchased power (IPP); purchased power (COMFIT); biomass (renewables)]
Trends of the 3 major customer groups
Current regression model (reporting)
Load Forecast SAE Model (Residential)
XCool (cooling):
  End use: AC saturation, AC efficiency, thermal efficiency, home size
  Economic: income, retail sales, household size, price
XHeat (heating):
  End use: heating saturation, resistance/heat pump, heating efficiency, thermal efficiency, home size
  Economic: income, retail sales, household size, price
XOther (other uses):
  End use: saturation levels (water heat, appliances), lighting densities, plug loads, appliance efficiency
  Economic: income, retail sales, household size, price
Weather and calendar inputs: heating degree days, cooling degree days, billing days
$$\mathit{AvgUse}_m = a + b_c\,\mathit{XCool}_m + b_h\,\mathit{XHeat}_m + b_o\,\mathit{XOther}_m + c\,\mathit{ReportedDSM}_m + e_m$$
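Once the coefficients are known, the SAE equation can be evaluated directly for any set of monthly inputs. The sketch below is a minimal Python illustration, assuming hypothetical coefficient values and pre-computed XCool, XHeat and XOther values (in the full model these are built from the end-use, economic and weather inputs listed above). `SAECoefficients` and `predict_avg_use` are illustrative names, not part of any tool.

```python
from dataclasses import dataclass


@dataclass
class SAECoefficients:
    """Regression coefficients of the residential SAE model (hypothetical values)."""
    a: float    # intercept
    b_c: float  # coefficient on XCool
    b_h: float  # coefficient on XHeat
    b_o: float  # coefficient on XOther
    c: float    # coefficient on reported DSM


def predict_avg_use(coef, xcool, xheat, xother, reported_dsm):
    """Deterministic SAE prediction of AvgUse_m (the error term e_m is omitted)."""
    return (coef.a + coef.b_c * xcool + coef.b_h * xheat
            + coef.b_o * xother + coef.c * reported_dsm)


# Hypothetical coefficients; c is assumed negative so that DSM reduces use
coef = SAECoefficients(a=50.0, b_c=1.0, b_h=1.0, b_o=1.0, c=-1.0)
print(predict_avg_use(coef, xcool=10.0, xheat=400.0, xother=300.0, reported_dsm=20.0))
```

This deterministic function is the piece that stays fixed in the Monte Carlo step; only its inputs are randomized.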
Deterministic Results
[Figures: Net System Peak by year; Yearly Net System Requirements]
Monte Carlo Simulations
Monte Carlo Method
▪ The base forecast from the SAE model is deterministic: it predicts a single value (sales, peak demand) for each point in time.
▪ The Monte Carlo method adds volatility to the forecast by applying historical variation (or best guesses, where data are lacking) to the inputs (predictors).
▪ The regression coefficients obtained in the previous step remain fixed: "we claim we understand the nature of current sales".
▪ Using software tools (e.g., Oracle's Crystal Ball, R, SAS, MATLAB), we generate random numbers for each predictor, simulating thousands of input values drawn from "realistic" probability distributions.
▪ After each trial, we calculate the output from the trial inputs and the known, fixed coefficients, then collect all results to obtain the probability distribution of the output for each year.
▪ For this work, we only considered variation in economics and weather.
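The trial loop described above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical linear model with fixed coefficients and normally distributed inputs; `monte_carlo`, `sales_model` and every parameter value are illustrative, not the NSPI model.

```python
import random
import statistics


def monte_carlo(model, samplers, n_trials, seed=None):
    """Run n_trials of `model`, drawing each predictor from its sampler.

    model: function mapping a dict of predictor values to one output;
           its coefficients stay fixed across trials.
    samplers: dict of predictor name -> function(rng) returning one random draw.
    Returns the list of outputs, i.e. an empirical distribution of the output.
    """
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_trials):
        inputs = {name: draw(rng) for name, draw in samplers.items()}
        outputs.append(model(inputs))
    return outputs


def sales_model(x):
    """Hypothetical model with fixed coefficients (illustration only)."""
    return 100.0 + 2.0 * x["income"] + 0.05 * x["hdd"]


samplers = {
    "income": lambda rng: rng.gauss(50.0, 2.0),   # hypothetical mean and SD
    "hdd": lambda rng: rng.gauss(4000.0, 200.0),  # hypothetical mean and SD
}
sales = monte_carlo(sales_model, samplers, n_trials=10_000, seed=42)
print(round(statistics.mean(sales), 1), round(statistics.stdev(sales), 1))
```

Running the same loop once per forecast year (with that year's input distributions) gives one output distribution per year.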
Simulating Probability Distributions
▪ If historical data are available and normally distributed, extracting the mean and standard deviation is straightforward.
▪ Software tools can then use the parameters of the best fit (e.g., mean and SD) to generate as many pseudorandom numbers as needed.
▪ In a normal distribution, the mean locates the data within the range of possible values, while the standard deviation measures variability.
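Fitting a normal distribution to history and then drawing pseudorandom values can be sketched with Python's standard-library `statistics.NormalDist` (Python 3.8+). The heating degree day history below is invented for illustration.

```python
from statistics import NormalDist

# Hypothetical annual heating degree days for 10 historical years (not NSPI data)
history = [4100, 3950, 4230, 4020, 3880, 4150, 4060, 3990, 4210, 4010]

# Best-fit normal: sample mean and sample standard deviation of the history
fit = NormalDist.from_samples(history)
print(f"mean = {fit.mean:.1f}, SD = {fit.stdev:.1f}")

# Generate as many pseudorandom draws as needed (seeded for reproducibility)
draws = fit.samples(10_000, seed=1)
```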
Other types of distributions
▪ Normal distributions (bell curves) usually arise from naturally occurring events.
▪ Other distributions may result from filters, rules, biases and lack of information.
▪ Right panel: taking the MAX of monthly (bell-shaped) HDDs introduces a bias that skews the normal distribution. This may show up when studying the system peak in a given year.
▪ Bottom: when little or no data are available, uniform and triangular distributions are suggested; they are easy to create and understand, albeit not very realistic.
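To see why MAX skews the distribution, the sketch below takes the maximum of twelve bell-shaped monthly values per simulated year and compares sample skewness. It assumes, for simplicity, independent monthly values with identical, hypothetical normal parameters, and it also shows the standard-library uniform and triangular samplers for data-poor inputs.

```python
import random
import statistics

rng = random.Random(7)


def skewness(xs):
    """Sample skewness (third standardized moment, population form)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)


# 10,000 simulated years of twelve hypothetical bell-shaped monthly values
monthly = [[rng.gauss(600.0, 50.0) for _ in range(12)] for _ in range(10_000)]
yearly_max = [max(months) for months in monthly]

print(round(skewness([m[0] for m in monthly]), 2))  # near 0: a single month is symmetric
print(round(skewness(yearly_max), 2))               # positive: MAX skews to the right

# When data are scarce: draws from a best-guess range (hypothetical bounds)
uniform_draw = rng.uniform(1.0, 3.0)
triangular_draw = rng.triangular(1.0, 3.0, 1.5)  # low, high, mode
```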
How our model works in practice
$$\mathit{AvgUse}_m = a + b_c\,\mathit{XCool}_m + b_h\,\mathit{XHeat}_m + b_o\,\mathit{XOther}_m + \dots$$
[Diagram: Trial Price, Trial Income, Trial March HDD → Output]
• We reproduce the base SAE model in Excel.
• The regression coefficients are imported.
• Economic indicators, HDDs and CDDs are simulated.
• For now, end-use forecasts are not modelled.
• Reported DSM is used to correct the outputs.
Results 1: Long term sales forecast
▪ Once all predictor distributions (per time interval) are set, we run the regression model with each set of trial inputs.
▪ We gather the distribution of the output (sales) for each year.
▪ The standard deviation, or spread, of the outcome is a measure of uncertainty!
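With trials grouped by forecast year, the per-year spread and P10/P90 band can be summarized as below. This is a minimal sketch: the simulated outputs are synthetic, and their spread is assumed to grow with the horizon only to make the example visible. `summarize_by_year` is an illustrative name.

```python
import random
import statistics


def summarize_by_year(outputs_by_year):
    """Summarize each year's simulated output distribution.

    outputs_by_year: dict year -> list of simulated outputs (e.g. annual sales).
    Returns dict year -> (mean, SD, P10, P90); the SD is the uncertainty measure.
    """
    summary = {}
    for year, outputs in sorted(outputs_by_year.items()):
        cuts = statistics.quantiles(outputs, n=10, method="inclusive")  # deciles
        summary[year] = (statistics.fmean(outputs), statistics.stdev(outputs),
                         cuts[0], cuts[-1])
    return summary


# Synthetic simulated sales (GWh), illustration only
rng = random.Random(3)
outputs_by_year = {
    year: [rng.gauss(11_000.0, 100.0 * (year - 2017)) for _ in range(5_000)]
    for year in range(2018, 2021)
}
for year, (mean, sd, p10, p90) in summarize_by_year(outputs_by_year).items():
    print(year, f"mean={mean:.0f} SD={sd:.0f} P10={p10:.0f} P90={p90:.0f}")
```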
Results 2: Long term peak forecast
▪ We use a version of the SAE model for the monthly system peak. It takes weather-normalized versions of the energy loads and multiplies them by the temperature at peak for that month.
▪ The model is very accurate (R² > 0.9) and also highly sensitive to peak temperature, whose distribution is slightly skewed.
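One possible reading of the peak model description is sketched below: weather-normalized monthly energy scaled by a peak-day temperature driver. The functional form, the use of peak-day heating degrees as the temperature driver, the name `monthly_peak` and all coefficients are assumptions for illustration, not the model used in this work.

```python
def monthly_peak(norm_heat_energy, norm_other_energy, peak_hdd, b0, b_heat, b_other):
    """Hypothetical monthly peak model.

    norm_heat_energy: weather-normalized heating energy for the month
    norm_other_energy: weather-normalized non-weather energy for the month
    peak_hdd: heating degrees on the peak day (the peak-temperature driver)
    b0, b_heat, b_other: hypothetical regression coefficients
    """
    return b0 + b_heat * norm_heat_energy * peak_hdd + b_other * norm_other_energy


print(monthly_peak(40.0, 400.0, 30.0, b0=100.0, b_heat=0.25, b_other=0.5))  # 600.0
print(monthly_peak(40.0, 400.0, 35.0, b0=100.0, b_heat=0.25, b_other=0.5))  # 650.0
```

Because the peak-temperature term enters multiplicatively, any skew in its simulated distribution carries straight through to the simulated peak.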
Sensitivity analysis 1: Energy Sales
▪ As a byproduct, one can compute the rank (Spearman's) correlation between each predictor and the output, and rank the predictors in order of importance.
▪ This gives a relative (per-predictor) estimate of each predictor's contribution to variance.
▪ d = difference between the ranks of each output-predictor pair; N = number of pairs (trials).
$$\rho_S = 1 - \frac{6\sum d^2}{N(N^2 - 1)}$$
$$C2Var_i = \frac{\rho_{S_i}^2}{\sum_j \rho_{S_j}^2}$$
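The rank-correlation and contribution-to-variance formulas can be sketched in plain Python as follows. The trial data are synthetic and all names are illustrative; the simple 6Σd² form assumes no tied ranks, which holds almost surely for continuous random draws. Tools such as Crystal Ball offer sensitivity charts of this kind directly; the code only shows the arithmetic.

```python
import random


def ranks(values):
    """Rank positions 1..N (assumes no ties)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r


def spearman(x, y):
    """Spearman's rho = 1 - 6*sum(d^2) / (N(N^2 - 1)), valid without tied ranks."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def contribution_to_variance(rhos):
    """C2Var_i = rho_i^2 / sum_j rho_j^2 for each predictor name."""
    total = sum(r * r for r in rhos.values())
    return {name: r * r / total for name, r in rhos.items()}


# Synthetic trials: output driven mostly by HDD, weakly by price (illustration only)
rng = random.Random(0)
trials = [{"hdd": rng.gauss(4000, 200), "price": rng.gauss(15, 1)} for _ in range(2000)]
sales = [0.05 * t["hdd"] - 1.0 * t["price"] + rng.gauss(0, 2) for t in trials]

rhos = {name: spearman([t[name] for t in trials], sales) for name in ("hdd", "price")}
for name, share in sorted(contribution_to_variance(rhos).items(), key=lambda kv: -kv[1]):
    print(f"{name}: rho_S={rhos[name]:+.2f}, contribution={share:.0%}")
```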
Sensitivity analysis 2: Peak Demand
▪ The system peak forecast depends heavily on the temperature on the peak day, and only loosely on cumulative effects (economics, price).
Correlations
Adding correlations
▪ Once a Monte Carlo model has been completed, correlations between predictors can be added.
▪ In principle, all variables exhibit some degree of correlation; it is up to the analyst to select those that will be included.
▪ Including correlations affects the forecast uncertainty, particularly in the mid-to-long term.
▪ We can proceed as in the historical-variation study of the predictors, this time examining the correlation of historical measures between pairs of predictors.
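Reminder: Correlations
$$\operatorname{corr}(X, Y) = \frac{E[(X - \bar{X})(Y - \bar{Y})]}{\sigma_X \sigma_Y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$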
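The sample form of the correlation formula translates directly into code. A minimal sketch; `pearson` is an illustrative name and the series are made up.

```python
import math


def pearson(xs, ys):
    """Sample Pearson correlation: sum of co-deviations over the square root
    of the product of the sums of squared deviations."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical yearly series for two predictors (illustration only)
print(pearson([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```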
Simulated Correlations (after 10K trials)
What correlations should we add?
▪ Calculating correlations produces numbers; the analyst has to use their knowledge to decide whether a given correlation should be included (should I include the temperature-price pair? It depends!).
▪ Different software may handle correlations differently, in some cases significantly affecting the forecast computation time.
▪ We test each calculated correlation for significance with a t-test: for 20 years of annual data (N − 2 = 18 degrees of freedom), |t*| > 2.1 rejects the null hypothesis of zero correlation at the 5% significance level (two-tailed).
$$t^* = \frac{\rho_S \sqrt{N - 2}}{\sqrt{1 - \rho_S^2}}$$
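The significance test can be sketched as below. The critical value 2.101 is the two-tailed 5% value of Student's t with 18 degrees of freedom (20 yearly observations); for other sample sizes it would come from t tables or, for example, `scipy.stats.t.ppf(0.975, N - 2)`. Function names are illustrative.

```python
import math

T_CRIT_18_DF = 2.101  # two-tailed 5% critical value, N - 2 = 18 degrees of freedom


def t_statistic(rho, n):
    """t* = rho * sqrt(N - 2) / sqrt(1 - rho^2) for a correlation from N pairs."""
    return rho * math.sqrt(n - 2) / math.sqrt(1.0 - rho * rho)


def is_significant(rho, n=20, t_crit=T_CRIT_18_DF):
    """Reject zero correlation when |t*| exceeds the critical value.

    The default t_crit only matches n=20; pass the right value for other N.
    """
    return abs(t_statistic(rho, n)) > t_crit


print(is_significant(0.5), is_significant(0.4))  # True False
```

With 20 yearly observations, the threshold corresponds to |ρ_S| of roughly 0.44.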
Correlations make a difference
Test case
Monthly Sales Forecast & 2017 Actuals
▪ The purpose of the confidence bands is to assess probable outcomes.
▪ At the moment, the Load Forecast team produces year-to-year, long-term forecast reports that are submitted to both NSPI and the UARB.
▪ We still have to find ways to assess historical DSM activities and end uses, and integrate them into the Monte Carlo variables.
▪ However, our current forecast model runs monthly, so we can assess short-term accuracy by comparing against actuals (so far, actuals are within 5% of the Monte Carlo mean).
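The monthly accuracy check can be sketched as a comparison of each actual against the simulated distribution for that month. The 5% tolerance mirrors the figure quoted above; also checking the P10-P90 band is an added assumption, and `accuracy_check` and the numbers are hypothetical, not 2017 actuals.

```python
def accuracy_check(actual, mc_mean, p10, p90, tolerance=0.05):
    """Compare a monthly actual with the Monte Carlo forecast for that month.

    Returns (relative deviation from the MC mean,
             whether it is within the tolerance,
             whether the actual falls inside the P10-P90 band).
    """
    deviation = (actual - mc_mean) / mc_mean
    return deviation, abs(deviation) <= tolerance, p10 <= actual <= p90


# Hypothetical month (GWh), illustration only
dev, within_5pct, in_band = accuracy_check(actual=1040.0, mc_mean=1000.0, p10=950.0, p90=1060.0)
print(f"deviation={dev:+.1%}, within 5%: {within_5pct}, inside P10-P90: {in_band}")
```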
Conclusions and future work
▪ We use Monte Carlo simulation to assess the probable outcomes produced by our regression models.
▪ The predicted uncertainty results from combining the (known) uncertainties of many inputs.
▪ Our challenge: realistic DSM and end-use probability distributions.
▪ Once the assumptions are understood (e.g., through sensitivity analysis) and the predictions trusted, we can explore (probabilistic) actions that affect inputs, which in turn affect outcomes.
▪ We look forward to hearing from those in the audience who may be using Monte Carlo simulations in similar or new ways!
Thank you!
Questions?