
Learning, selecting, and control in residential demand response for grid reliability

Yingying Li, Qinran Hu, Na Li, Alison Su, Jun Shinmada

08/04/2018

Residential demand constitutes the largest share of electricity demand, yet it is underutilized in demand response (DR).

Share of electricity demand by sector:
▪ Residential: 38%
▪ Commercial and Other: 37%
▪ Industrial: 25%
Source: U.S. Energy Information Administration (EIA), “Annual Electric Power Industry Report”.

DR capacity for 2015/2016 in PJM:
▪ Residential: 14%
Source: PJM Interconnection, “Demand response strategy,” Tech. Rep., 2017.

Previous pilots

▪ Location: Arizona, California, Colorado, Hawaii, Massachusetts, Michigan, New Mexico, New York, Texas, Utah, Virginia, etc.

▪ Device: Non-intrusive, Switch, ThinkEco, Ecobee, NEST, etc.

▪ Reward: Time of use, Real-time pricing, Coupon, Discounted bill, Gift card, Check, Raffle, Other recognition, etc.

▪ Control: Direct load control, Voluntary, etc.

Issues

▪ Monetary incentive is low

▪ Users quit if they are pushed too hard

▪ User uncertainty is high and unknown

▪ Users prefer a simple DR program

This Talk: Learning for Residential DR

1. A pilot study: Learning the customers
2. Real-time learning/decision making

Both parts follow the same pipeline: Data → Model → Decision

▪ Learn user behavior
▪ Select the “right” users
▪ Set the “right” control actions

Goal: reliable aggregated DR, e.g., individual reductions that add up to a target

The pilot study by ThinkEco, Inc

▪ Tech: SmartAC kits for window ACs, app/web control

▪ Data Resolution: 1 minute

▪ Time: Jun-Sep 2015 to now

▪ DR: multiple events in the afternoon/evening during the entire summer
   - Two control types: i) reset temp. target, ii) cycling rate
   - Opt-out option available

▪ Incentives: a simple example is $5 for setup and $15 for the device staying online; other types exist as well

Data of the pilot

▪ For each AC unit (minute level)

▪ For each DR event (four events)

[Figure: traces of target temperature, room temperature, and energy consumption around a DR event; DR events in 2016]

Targets of the pilot

Can we improve DR performance by analyzing data?

Data → Model → Decision

Aggregated behavior

[Figure: aggregated behavior. The green curve is a baseline computed from the full summer’s dataset (KEMA).]

What can we learn?

[Figure: room temperature, past vs. now]

✓AC operation pattern

✓House thermal model

✓Temperature preference

✓Occupancy

✓Opt-out rate

✓User type (classification)

Etc…

AC operation pattern follows a Bernoulli distribution

[Figure: histogram of AC#1’s power consumption (probability vs. power), with modes labeled Rest, Fan, and Compressor, shown next to a Bernoulli distribution]

[Figure: histogram of the aggregated power consumption of ACs]
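A minimal sketch of how the Bernoulli parameter could be read off minute-level power data, assuming the compressor counts as "on" whenever power exceeds a cut-off. The function name, the 200 W threshold, and the readings are hypothetical and are not taken from the pilot.

```python
def estimate_on_probability(power_watts, threshold_watts=200.0):
    """Estimate p = P(compressor on) from a list of power samples.

    threshold_watts is a hypothetical cut between the rest/fan modes and
    the compressor mode; a real value would come from the unit's histogram.
    """
    if not power_watts:
        raise ValueError("need at least one power sample")
    on_flags = [1 if w >= threshold_watts else 0 for w in power_watts]
    return sum(on_flags) / len(on_flags)


# Made-up minute-level readings (watts): rest near 0, fan near 60,
# compressor near 500.
readings = [2, 58, 510, 495, 61, 3, 505, 498, 0, 502]
p_on = estimate_on_probability(readings)
```

Here five of the ten made-up samples exceed the threshold, so the estimate is 0.5.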

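Learn house thermal model with regression

$$\frac{\Delta\theta_k}{\Delta t_k} = a\,(\theta_A - \theta_{room}) + u_k Q + w_k$$

where $a\,(\theta_A - \theta_{room})$ is the heat loss rate and $u_k Q$ is the cooling effect.

▪ Apply linear regression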
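The regression step can be sketched as an ordinary least-squares fit of the two coefficients a and Q. This is an illustration under assumptions, not the pilot's code: θ_A is read as the ambient temperature, u_k as a 0/1 AC on/off signal, Q as negative for cooling, and all data below are synthetic.

```python
import random


def fit_thermal_model(theta_room, theta_ambient, u, dt=1.0):
    """Least-squares fit of a and Q in
    (theta[k+1] - theta[k]) / dt = a * (theta_A[k] - theta[k]) + u[k] * Q + w[k],
    solved through the 2x2 normal equations (no intercept term)."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for k in range(len(theta_room) - 1):
        x1 = theta_ambient[k] - theta_room[k]          # heat-loss regressor
        x2 = u[k]                                      # cooling regressor
        y = (theta_room[k + 1] - theta_room[k]) / dt   # temperature change rate
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        r1 += x1 * y
        r2 += x2 * y
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("regressors are collinear; cannot identify a and Q")
    a = (r1 * s22 - r2 * s12) / det
    q = (s11 * r2 - s12 * r1) / det
    return a, q


# Synthetic data with hypothetical parameters (degrees F, one-minute steps).
rng = random.Random(0)
a_true, q_true = 0.02, -0.3
theta_ambient = [90.0] * 500
u = [rng.randint(0, 1) for _ in range(500)]
theta_room = [75.0]
for k in range(499):
    rate = a_true * (theta_ambient[k] - theta_room[k]) + u[k] * q_true
    theta_room.append(theta_room[k] + rate + rng.gauss(0.0, 0.01))

a_hat, q_hat = fit_thermal_model(theta_room, theta_ambient, u)
```

With enough on/off switching in the data, the two regressors are not collinear and the fit recovers values close to the parameters used to generate the series.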

Use thermal model to design better AC control

For example, a smooth response

[Figure label: +7 °F offset]

Predicting Opt Out

Data → Model → Decision

▪ Raw Data → Inputs:
- Users’ occupancy rate
- Decision in previous DR events
- Temperature preference
- Ambient temperature
* Recent-day and same-weekday effect

▪ Output:
- Opt-out probability

▪ ANN prediction model: ~85% detection rate for opt-out
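To make the input-to-output mapping concrete, here is a small sketch of a one-hidden-layer neural network that outputs an opt-out probability. The architecture, the feature scaling, the labeling rule, and the data are all assumptions for illustration; the slides do not describe the actual ANN, and this toy does not reproduce the reported detection rate.

```python
import math
import random


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """One tanh hidden layer; the sigmoid output is read as opt-out probability."""
    W1, b1, W2, b2 = params
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    return sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2), h


def train(X, y, hidden=4, lr=0.1, epochs=200, seed=0):
    """Plain stochastic gradient descent on the cross-entropy loss."""
    rng = random.Random(seed)
    d = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            p, h = forward((W1, b1, W2, b2), x)
            g = p - t  # gradient of the loss w.r.t. the output logit
            for j in range(hidden):
                gh = g * W2[j] * (1.0 - h[j] ** 2)  # back-propagate before updating W2[j]
                W2[j] -= lr * g * h[j]
                b1[j] -= lr * gh
                for i in range(d):
                    W1[j][i] -= lr * gh * x[i]
            b2 -= lr * g
    return W1, b1, W2, b2


# Synthetic, made-up inputs: [occupancy rate, opted out in the previous event,
# temperature-preference gap, ambient temperature], each scaled to [0, 1].
rng = random.Random(1)
X = [[rng.random(), float(rng.random() < 0.3), rng.random(), rng.random()]
     for _ in range(200)]
# Hypothetical labeling rule, used only so the network has something to learn.
y = [1 if 0.8 * x[0] + 0.6 * x[1] + 0.5 * x[2] > 0.9 else 0 for x in X]

params = train(X, y)
probs = [forward(params, x)[0] for x in X]
accuracy = sum((p > 0.5) == bool(t) for p, t in zip(probs, y)) / len(y)
```

The recent-day and same-weekday effects from the slide would enter as extra input features; they are left out here to keep the sketch short.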

1. A pilot study: Learning the customers
2. Real-time learning/decision making

Multi-armed bandit learning algorithm in aggregating demands

Introduction to multi-armed bandit (MAB) problem

MAB is about exploration vs. exploitation

▪ Ex. slot machine: select the top K arms to maximize the expected reward

▪ Demand response: select a number of customers to maximize reliability (minimize variance)

A MAB model for reliability: Nonmonotone objective function

Ramping support, ancillary service

A MAB model for reliability in DR (simplified)

➢ A set of customers S

➢ Each customer i reduces one unit of load with probability p_i (Bernoulli distribution)

➢ A target total reduction D_t at time step t

➢ Objective: choose a subset S_t of customers to minimize the reliability cost

➢ Performance analysis:

Regret = online reliability cost − offline optimal reliability cost
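A short sketch of the model above, under the assumption that the reliability cost is the expected squared deviation of the realized reduction from the target; the slides do not spell the cost out, so this choice is an assumption. For independent Bernoulli reductions it splits into a squared bias plus a variance term, which the code cross-checks by brute-force enumeration. All function names are illustrative.

```python
from itertools import product


def reliability_cost(p_selected, target):
    """E[(sum_i X_i - D)^2] for independent X_i ~ Bernoulli(p_i):
    (sum_i p_i - D)^2 + sum_i p_i * (1 - p_i)."""
    mean = sum(p_selected)
    variance = sum(p * (1.0 - p) for p in p_selected)
    return (mean - target) ** 2 + variance


def reliability_cost_by_enumeration(p_selected, target):
    """Same expectation, computed by summing over all 0/1 outcomes."""
    total = 0.0
    for outcome in product([0, 1], repeat=len(p_selected)):
        prob = 1.0
        for x, p in zip(outcome, p_selected):
            prob *= p if x else (1.0 - p)
        total += prob * (sum(outcome) - target) ** 2
    return total


def regret(online_costs, offline_costs):
    """Cumulative regret: online cost minus offline optimal cost."""
    return sum(online_costs) - sum(offline_costs)


# Two customers with hypothetical probabilities and a target of one unit.
cost = reliability_cost([0.9, 0.5], target=1)
```

For this example the mean reduction is 1.4 and the variance is 0.34, so the cost is 0.16 + 0.34 = 0.5.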

[Figure: four customers, numbered 1 to 4, with reduction probabilities P1, P2, P3, P4]

Results:

[Figures: 100 users, D = 35; 100 users, D_t ∈ [10, 30]]

Yingying Li, Qinran Hu, Na Li, "Learning and Selecting the Right Customers for Reliability: A Multi-armed Bandit Approach", Control and Decision Conference, 2018

Thm: Our DR algorithm CUCB-Avg achieves O(log T) regret, where T is the total number of DR events.

1. A pilot study: Learning the customers
2. Real-time learning: Multi-armed Bandit


Ongoing work with ThinkEco:

Residential DR field study in New York City with 40K+ AC devices this summer

Human-machine interaction; Engineering-Learning integration


Conclusion:

DR pilots produce a huge amount of valuable data.

The examples show that learning techniques are helpful for DR.

Backup

Offline optimal selection algorithm

Objective: minimize the reliability cost, if we know p_i for all i

➢ Rank the customers by p_i (figure: customers 1 to 4 with P1, P2, P3, P4)

➢ Determine the number k of customers to select

Theorem (Li, Hu, Li, 2018): Algorithm 1 produces an optimal output for the offline optimization problem.
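The rank-then-pick-k idea can be sketched as follows, assuming the reliability cost is the expected squared deviation from the target (an assumption, since the slides do not show the formula). Under that cost, adding a customer to a set whose expected reduction is s changes the cost by p(2(s − D) + 1), so it pays to keep adding the highest-probability customers while s ≤ D − 1/2. The stopping rule below follows from that calculation and may differ in detail from the authors' Algorithm 1.

```python
from itertools import combinations


def reliability_cost(p_selected, target):
    """Expected squared deviation for independent Bernoulli reductions."""
    mean = sum(p_selected)
    return (mean - target) ** 2 + sum(p * (1.0 - p) for p in p_selected)


def offline_select(p, target):
    """Rank customers by p (descending) and keep adding them while the
    expected reduction so far is at most target - 1/2."""
    order = sorted(range(len(p)), key=lambda i: p[i], reverse=True)
    chosen, total = [], 0.0
    for i in order:
        if total > target - 0.5:
            break
        chosen.append(i)
        total += p[i]
    return chosen


def brute_force_min_cost(p, target):
    """Smallest cost over all subsets; only feasible for a few customers."""
    best = reliability_cost([], target)
    for k in range(1, len(p) + 1):
        for subset in combinations(p, k):
            best = min(best, reliability_cost(subset, target))
    return best


# Hypothetical probabilities and target.
p = [0.9, 0.5, 0.5]
selected = offline_select(p, target=1)
selected_cost = reliability_cost([p[i] for i in selected], 1)
```

In this example only the most reliable customer is selected: the cost is 0.1, while every other subset costs at least 0.5.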

Online Algorithm

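Define: sample average

$$\bar{p}_i(t) = \frac{1}{T_i(t)} \sum_{\tau \in I_i(t)} X_{\tau,i}$$

where $T_i(t)$ is the number of times arm i has been selected, $I_i(t)$ is the set of events when arm i was selected, and $X_{\tau,i}$ is the realization.

Greedy Algorithm: Use the sample average to run the offline optimal algorithm.

➢ Too much exploitation

➢ Not enough exploration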

Online Algorithm: UCB: Upper Confidence Bound (Auer et al. 2002)

Define: Upper Confidence Bound = sample average (exploitation) + confidence bonus (exploration)

UCB Algorithm: Use the UCB to run the offline optimal algorithm.

➢ Popular algorithm for K-arm maximization MAB, with log(T) regret

➢ Performs poorly in our reliability problem:

   ➢ Tends to select fewer arms → larger variance and less exploration
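As an illustration of the exploitation-plus-exploration structure, the sketch below computes a UCB1-style index in the spirit of Auer et al. (2002). The bonus form sqrt(alpha · ln t / n), the default alpha = 2, and the clipping to 1 are assumptions; the exact bound used on the slide is not recoverable from the text.

```python
import math


def ucb_index(sample_mean, n_selected, t, alpha=2.0):
    """Sample mean (exploitation) plus a confidence bonus (exploration).

    The bonus shrinks as the arm is selected more often. The result is
    clipped to 1 because the estimated quantity is a probability.
    """
    if n_selected == 0:
        return 1.0  # never-tried arms get the most optimistic value
    bonus = math.sqrt(alpha * math.log(t) / n_selected)
    return min(1.0, sample_mean + bonus)


# Same sample mean, different numbers of selections (hypothetical values).
rarely_tried = ucb_index(0.2, n_selected=100, t=100)
often_tried = ucb_index(0.2, n_selected=400, t=100)
```

The rarely tried arm receives the larger index, which is what drives exploration.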

The Proposed Online Algorithm: UCB-Average (Li, Hu, Li, 2018)

Define: Upper Confidence Bound = sample average (exploitation) + confidence bonus (exploration)

UCB-Average Algorithm:

➢ Use the UCB to rank the arms

➢ Use the sample average to determine the number K of selected arms
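A compact simulation sketch of the two-step rule above: rank by UCB, then pick the number of arms using sample averages. Several details are assumptions rather than the authors' specification: the squared-deviation reliability cost, the UCB1-style bonus with alpha = 2, the "stop once the running sum exceeds D − 1/2" rule, and selecting every customer at the first step.

```python
import math
import random


def reliability_cost(p_selected, target):
    """Expected squared deviation for independent Bernoulli reductions."""
    mean = sum(p_selected)
    return (mean - target) ** 2 + sum(p * (1.0 - p) for p in p_selected)


def select(rank_scores, size_scores, target):
    """Rank arms by rank_scores; add arms while the running sum of
    size_scores is at most target - 1/2."""
    order = sorted(range(len(rank_scores)), key=lambda i: rank_scores[i], reverse=True)
    chosen, total = [], 0.0
    for i in order:
        if total > target - 0.5:
            break
        chosen.append(i)
        total += size_scores[i]
    return chosen


def ucb_average(p_true, targets, alpha=2.0, seed=0):
    """Simulate the selection rule; return the per-event regret."""
    rng = random.Random(seed)
    n = len(p_true)
    counts, means, regrets = [0] * n, [0.0] * n, []
    for t, d in enumerate(targets, start=1):
        if t == 1:
            chosen = list(range(n))  # initial step: try every customer once
        else:
            ucb = [min(1.0, means[i] + math.sqrt(alpha * math.log(t) / counts[i]))
                   for i in range(n)]
            chosen = select(ucb, means, d)  # UCB ranks, sample mean sizes
        for i in chosen:  # observe Bernoulli reductions, update running means
            x = 1 if rng.random() < p_true[i] else 0
            counts[i] += 1
            means[i] += (x - means[i]) / counts[i]
        online = reliability_cost([p_true[i] for i in chosen], d)
        best = select(p_true, p_true, d)  # offline choice with known p
        regrets.append(online - reliability_cost([p_true[i] for i in best], d))
    return regrets


# Hypothetical setup: 20 customers, 300 DR events, constant target of 5 units.
p_true = [0.05 * (i + 1) for i in range(19)] + [0.5]
regrets = ucb_average(p_true, targets=[5] * 300)
```

Ranking by UCB keeps under-explored customers in play, while sizing the set with sample averages avoids the tendency of plain UCB to select too few arms.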

Regret Analysis of the Algorithm (Li, Hu, Li, 2018)

Define: Regret = online cost − offline optimal cost

Two cases are analyzed: time varying and stationary.

Interpretation of the regret (stationary case)

Proof Sketch

The regret is split into four parts:

I: Initial time step

II: Sample average (estimation) is far away from the true value

III: Select an arm that is currently under-explored

IV: The other events

Future work:

➢ Use the historical data as prior information

➢ Introduce heterogeneity and sub-arms in modeling the arms

➢ Reduce the uncertainty by taking side information (Contextual Bandits)

➢ Group arms into super-arms based on these inputs

➢ Use mechanism design/prices/rewards to “influence” users

➢ …


Targets of the pilot

▪ Higher, predictable demand reductions

▪ Lower opt-out rates / customer fatigue

Can we improve DR performance by analyzing data?

Data → Model → Decision

[Figure: DR events in 2016]
