

A Two-layer Approach for Estimating Behind-the-Meter PV Generation Using Smart Meter Data

Fankun Bu, Graduate Student Member, IEEE, Rui Cheng, Graduate Student Member, IEEE, and Zhaoyu Wang, Senior Member, IEEE

Abstract: As the cost of residential solar systems decreases, rooftop photovoltaic (PV) has been widely integrated into distribution systems. Most rooftop PV systems are installed behind-the-meter (BTM), i.e., only the net demand is metered, while the native demand and PV generation are not separately recorded. Under this condition, the PV generation and native demand are invisible to utilities, which brings challenges for optimal distribution system operation and expansion. In this paper, we propose a novel two-layer approach to disaggregate the unknown PV generation and native demand from the known hourly net demand data recorded by smart meters: 1) At the aggregate level, the proposed approach separates the total PV generation and native demand time series from the total net demand time series for customers with PVs. 2) At the customer level, the separated aggregate-level PV generation is allocated to individual PVs. These two layers leverage the spatial correlations of native demand and PV generation, respectively. One primary advantage of our proposed approach is that it is more independent and practical compared to previous works because it does not require PV array parameters, meteorological data, or previously recorded solar power exemplars. We have verified our proposed approach using real native demand and PV generation data.

Index Terms: Rooftop photovoltaic, behind-the-meter, PV generation estimation, smart meter, and distribution system.

I. INTRODUCTION

In the last decade, residential rooftop photovoltaic (PV) has been proliferating in distribution systems. In most cases, utilities only install a bi-directional smart meter to record the net demand of customers with PVs. This type of installation is referred to as behind-the-meter (BTM), in which case the net demand equals native demand minus PV generation. Therefore, the PV generation produced by the solar array and the native demand consumed by appliances are unknown to utilities. Metering only the net demand reduces the financial cost for utilities; however, as the penetration level of PV increases, the unobservability of notable PV generation and native demand brings significant challenges to distribution systems. We focus on three specific applications to elaborate the necessity of estimating the unknown BTM PV generation and native demand: First, the unavailability of native load and PV generation might cause unacceptable forecasting errors because some forecasters require reconstituting the generation and native demand time series [1], [2]. In contrast, knowing BTM PV generation and native load can help utilities forecast generation and load separately, thus providing utilities with useful information regarding load/generation growth. Second, the invisibility of PV generation and native load can hinder designing optimal service restoration plans [3], [4]. During the restoration stage after an outage, the native demand might be several times higher than the pre-outage demand due to the simultaneous restarting of a large number of air-conditioning appliances. This anomalous demand should be estimated for optimal restoration plans because it can damage electric devices when simultaneously restoring a large number of customers. In practice, utilities multiply the normal native demand before the outage by a ratio to estimate the anomalous demand during restoration. Also, utilities typically do not consider PVs as reliable restoration sources [3]. Therefore, separating normal native demand and generation is needed for estimating the restoration demand. Third, the unobservability of native demand and solar generation might cause inaccurate reliability analysis. When evaluating a transmission system's reliability, each distribution system is generally simplified as a bus whose native load duration curve is constructed [5], [6]. For those utilities with a high-penetration PV integration, directly using the net demand to construct the load duration curve can significantly underestimate the actual native load. This is because the net demand is typically smaller than the native demand due to the existence of PV generation. In contrast, using the native demand separated from the net demand can help construct more accurate load duration curves. In summary, disaggregating BTM PV generation and native demand from the recorded net demand can enhance distribution system observability and awareness and can also provide more accurate information for transmission system reliability analysis.

This work was supported in part by the National Science Foundation under EPCN 2042314 and in part by the Grid Modernization Initiative of the U.S. Department of Energy (DOE) under GMLC project 2.1.1 – FASTDERMS. (Corresponding author: Zhaoyu Wang)

F. Bu, R. Cheng and Z. Wang are with the Department of Electrical and Computer Engineering, Iowa State University, Ames, IA 50011, USA (e-mail: [email protected]; [email protected]).

Previous works on BTM PV generation disaggregation can be categorized into two types: Type I - Model-based approaches: A PV array performance model is employed to represent physical PV arrays. In [7], a PV model is combined with a clear sky model to estimate customer-level solar generation. In [8], a virtual equivalent PV station model is utilized to represent the aggregate generation of BTM PVs within a region. In [9] and [10], a physical PV model and a statistical model are utilized to estimate BTM solar generation and native demand, respectively. One primary disadvantage of these model-based approaches is that detailed PV array parameters or accurate meteorological data are required. However, in practice, these parameters are typically unavailable to utilities. Also, acquiring meteorological data might cause additional costs to utilities.

Type II - Model-free approaches: In [11] and [12], net demands under heterogeneous weather conditions are employed to estimate BTM PV capacity, which is then multiplied by a standard solar power time series to infer BTM PV generations. In [13], native demand and PV generation are estimated using 1-second net demand data by identifying appliances' states, which are then leveraged to estimate appliance demands and solar power. Based on the variation difference between load and solar power, in [14], an approach is proposed for estimating service transformer-level PV generation. In [15], regional-level generation is estimated by installing additional sensors to record typical PV generation profiles. In [16], feeder-level solar generation is estimated by utilizing net load measurements and a nearby PV farm's generation readings. Using known native loads for customers without PVs and the generations for a limited number of observable PVs, in [17], the authors formulate an optimization process to estimate the aggregated native load and PV generation. In [18], a federated learning-based framework is proposed to probabilistically estimate community-level BTM solar generation. Furthermore, previously in [19] and [20], we have proposed two approaches for estimating the unknown BTM generation using measured solar power exemplars. One primary shortcoming of the model-free approaches is that they rely on contextual information, i.e., recorded solar power exemplars or meteorological data, which might bring additional costs to utilities.

arXiv:2110.07697v2 [eess.SP] 31 Oct 2021

Considering the shortcomings of previous approaches, we propose a novel BTM PV generation and native demand estimation framework which does not require previously recorded solar power and meteorological measurements. Our approach is based on two findings from real data. The first finding is referred to as the spatial correlation of native load, i.e., the native demands of two sizeable residential customer groups are strongly correlated and have highly homogeneous shapes. The second finding is associated with the spatial correlation of solar power generation, i.e., the generations for different PVs in a geographically bounded distribution system are highly correlated and have highly similar profiles. Specifically, our proposed approach contains two layers: (1) At the aggregate level, the total generation of all BTM PVs in a distribution system is estimated by leveraging our first finding regarding the native demand spatial correlation. (2) Then, at the individual customer level, utilizing our second finding regarding PV generation spatial correlation, the estimated aggregate BTM PV generation is allocated to individual customers with PVs. By doing this while not relying on native demand data, our approach can avoid the impact of customer-level load uncertainty [21]. Specifically, first, we train a model to produce multiple candidate generation time series, using solar power data generated by a publicly available tool. Second, we address the challenge of determining the peak generation for each PV by leveraging our observation from real data that the actual peak generation is highly correlated with the difference between the minimum diurnal native demand and the minimum diurnal net demand. Since the minimum diurnal native demand is unknown due to the existence of PV generation, we approximate it using the minimum nocturnal native demand, which equals the known minimum nocturnal net demand because PV does not generate power during nighttime. Finally, the allocating procedure is formulated as an optimization problem. The overall structure of our proposed approach is shown in Fig. 1. We have verified our proposed approach using real hourly native demand and PV generation data [22].

[Fig. 1. Overall structure of the proposed BTM PV generation estimation approach. Blocks: customers with and without PVs; native demand spatial correlation; aggregate BTM PV generation; PV generation spatial correlation; individual BTM PV generation.]

[Fig. 2. Three-day actual native demand curves for three example groups with different customer numbers (40, 60, and 80). Axes: Time (hour) versus P (kWh).]
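The peak-generation indicator described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `peak_generation_proxy` is hypothetical, hours are assumed to be integer hour-of-day labels (0 to 23), and nighttime is assumed to be hours 21 through 4, following the 9:00 P.M. to 5:00 A.M. window used later in the paper. The returned value is a quantity the paper reports as correlated with the actual peak generation, not the peak itself.

```python
# Assumed nighttime hour labels (9:00 P.M. to 5:00 A.M.); an assumption of this sketch.
NIGHT_HOURS = frozenset({21, 22, 23, 0, 1, 2, 3, 4})


def peak_generation_proxy(net_demand, hours, night_hours=NIGHT_HOURS):
    """Return (min nocturnal net demand) - (min diurnal net demand).

    net_demand: hourly net demand readings of one customer with PV.
    hours: hour-of-day label (0-23) for each reading.
    The minimum nocturnal net demand stands in for the unknown minimum
    diurnal native demand, because PV output is zero at night.
    """
    if len(net_demand) != len(hours):
        raise ValueError("net_demand and hours must have the same length")
    night = [p for p, h in zip(net_demand, hours) if h in night_hours]
    day = [p for p, h in zip(net_demand, hours) if h not in night_hours]
    if not night or not day:
        raise ValueError("need both nighttime and daytime readings")
    return min(night) - min(day)


# Toy one-day profile: flat 1.0 kWh at night, 1.5 kWh by day,
# except a midday hour where PV export pushes net demand to -2.0 kWh.
hours = list(range(24))
net = [1.0 if h in NIGHT_HOURS else 1.5 for h in hours]
net[12] = -2.0
proxy = peak_generation_proxy(net, hours)
```

In this toy profile the proxy is the gap between the nighttime floor and the deepest midday dip. How the proxy is mapped to an actual peak value is left to the optimization formulated later in the paper.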

Throughout the paper, vectors are denoted using bold italic letters, and matrices are represented as bold non-italic letters. In addition, we adopt the sign convention that the native demand consumed by customers and the power output from PVs are both positive.

The rest of the paper is organized as follows: Section II introduces our first and second findings regarding spatial correlation of native demand/generation. Section III presents how we estimate the aggregate generation for customers with PVs. Section IV presents the procedure of formulating and solving an optimization problem to allocate the estimated aggregate generation to individual PVs. In Section V, case studies are analyzed. Section VI concludes the paper.

II. SPATIAL CORRELATION OF NATIVE DEMAND/PV GENERATION

A. Finding 1: Native Demand Spatial Correlation between Two Sizeable Groups

By examining real residential native demand data, we find that once the customer numbers for two groups reach a certain level, their native demands are highly correlated. This finding is leveraged for estimating the aggregate native demand time series for customers with PVs.

Specifically, we use native demand curves to illustrate the observed spatial correlation. Fig. 2 presents real native demand curves for three example groups with different customer numbers, i.e., 40, 60, and 80, respectively. We can observe that these three curves demonstrate almost identical shapes, although they have different magnitudes. The high shape similarity can also be corroborated by Fig. 3, which presents normalized native demand curves corresponding to the curves in Fig. 2. Note that the normalized curves are obtained by dividing the real curves in Fig. 2 by their peaks, respectively.

[Fig. 3. Three-day normalized native demand curves for three example groups with different customer numbers (40, 60, and 80). Axes: Time (hour) versus P (p.u.).]

To stress the importance of Fig. 3, we first define two types of customer groups: the residential customers with and without PVs. These two customer groups are denoted as $C_w$ and $C_o$, respectively. For $C_o$, the native demand is recorded by smart meters. For $C_w$, we only know the net demand, and we do not know the native demand. Our goal is to estimate $C_w$'s unknown native demand and thus to estimate its PV generation. Fig. 3 suggests that, given the known native demand curve of $C_o$, we can infer the unknown native demand curve of $C_w$ by multiplying the normalized native demand curve of $C_o$ by a ratio, $r$.

Since the native demands for the customers in $C_o$ are directly recorded by smart meters, the normalized native demand curve of $C_o$ can be obtained by first aggregating the native demand time series of customers in $C_o$, and then normalizing that aggregate curve. The challenge for inferring the unknown native demand curve of $C_w$ is that the ratio, $r$, is unknown and needs to be estimated. Note that for the customers in $C_w$, the native demand during the daytime is unknown. This is because PV generates power during the daytime, which can mask the native demand in the case of net metering. Thus, we cannot use daytime native demand to compute $r$. Instead, we use the nocturnal native demand to estimate $r$ because PV does not generate power during nighttime, and thus the nocturnal native demand for $C_w$ is known. Based on the above inference, we propose to first utilize the nocturnal native demand to compute a nocturnal native demand ratio, $r_n$, and then approximate $r$ as $r_n$.

One condition for approximating $r$ as $r_n$ is that $r$ should be similar to $r_n$. To verify this condition, we randomly select two groups with different customer numbers ranging from 20 to 80. Then, for each group, the native demand time series are spatially aggregated over customers to obtain an aggregate native demand time series. After that, we compute $r$ using the two groups' native demand time series throughout a certain period, and compute $r_n$ using the two groups' native demand time series only during nighttime within that period. Finally, we plot $r$ against $r_n$, as shown in Fig. 4. We can see that $r$ is almost identical to $r_n$. Therefore, we can accurately estimate $r$ by directly letting it equal $r_n$.
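A short worked step may help show why the whole-period ratio and the nighttime ratio agree. It assumes the idealized case in which the two aggregate native demand curves have exactly the same shape, i.e., they are proportional at every hour; the symbols $P_w(t)$, $P_o(t)$, and $I_n$ (the set of nighttime hours) follow the definitions given in Section III. Real data satisfy this only approximately, which is why the paper verifies the relationship empirically.

```latex
\begin{align}
P_w(t) &= r\,P_o(t), \qquad t = 1,\dots,T, \\
r_n &= \frac{\sum_{t \in I_n} P_w(t)}{\sum_{t \in I_n} P_o(t)}
     = \frac{r \sum_{t \in I_n} P_o(t)}{\sum_{t \in I_n} P_o(t)}
     = r .
\end{align}
```

Under exact proportionality the ratio is the same over any subset of hours with nonzero demand, so restricting the sums to nighttime loses nothing.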

Once we obtain the estimate of $r$, we can compute the unknown native demand of $C_w$ by multiplying the known native demand of $C_o$ by the estimate of $r$. After that, estimating the unknown PV generation of $C_w$ is straightforward, i.e., by subtracting the recorded net demand measurements from the estimated native demand.

[Fig. 4. The relationship between the native demand ratio and the nocturnal native demand ratio between two example customer groups.]

[Fig. 5. Three-day real generation curves for three example PVs with different capacities (PV 1, PV 2, PV 3). Axes: Time (hour) versus G (kWh).]

B. Finding 2: Generation Spatial Correlation between Two PVs

There are two primary factors that determine the generation spatial correlation: (1) In most cases, a distribution system is geographically bounded in a small district. (2) The most widely available sampling resolution for smart meters is 1 hour. Under these two conditions, different PV arrays are subject to nearly identical meteorological inputs. Thus, the identical inputs can result in highly similar shapes among PV generation curves. Fig. 5 presents three example PV generation curves corresponding to different PV array capacities. Similar to the native demand curves for sizeable customer groups, these three generation curves also demonstrate significant spatial correlation, i.e., they possess highly similar shapes. This high similarity can also be corroborated by Fig. 6, where the normalized generation curves corresponding to the three curves in Fig. 5 overlap with each other. Most importantly, Figs. 5 and 6 suggest that estimating a BTM PV generation curve comes down to two steps: first, determine the generation curve's shape, and then determine its magnitude. This two-step method can notably simplify the estimation of unknown BTM PV generation time series. This is because, compared to model-based methods, our approach is built on the high similarity among generation curves; therefore, it requires significantly less information.

[Fig. 6. Three-day normalized generation curves for three example PVs with different capacities (PV 1, PV 2, PV 3). Axes: Time (hour) versus G (p.u.).]

III. ESTIMATING AGGREGATE BTM PV GENERATION FOR CUSTOMERS WITH PVS

As elaborated in Section II-A, the native demands of two sizeable customer groups are highly correlated. This high correlation suggests that we can infer the unknown native demand of $C_w$ by multiplying the known native demand of $C_o$ by a ratio:

$$\boldsymbol{P}_w = \boldsymbol{P}_o\, r, \tag{1}$$

where $\boldsymbol{P}_w = \{P_w(t)\}$ and $\boldsymbol{P}_o = \{P_o(t)\}$, $t = 1, ..., T$, denote the estimated native demand time series for $C_w$ and the actual native demand time series for $C_o$, respectively. $T$ is the total number of native demand samples in a selected window (e.g., one month). $P_o(t)$ is computed by aggregating the measured native demands over customers without PVs:

$$P_o(t) = \sum_{i=1}^{N_o} P_{o,i}(t), \quad t = 1, ..., T, \tag{2}$$

where $N_o$ represents the total number of customers in $C_o$, i.e., customers without PVs. $P_{o,i}(t)$ denotes the measured native demand at time $t$ for the $i$'th customer in $C_o$.

In (1), $r$ denotes the native demand ratio between $C_w$ and $C_o$, and is defined as follows:

$$r = \frac{\sum_{t=1}^{T} P_w(t)}{\sum_{t=1}^{T} P_o(t)}. \tag{3}$$

However, as presented in Section II-A, since the diurnal native demand for $C_w$ is masked by PV generation and unavailable to utilities, we need to estimate $r$ using nocturnal native demand measurements. This approximation method is based on the observation that PV does not generate power during nighttime and the verification that $r$ and $r_n$ are almost identical. Specifically, we use $r_n$ to approximate $r$:

$$r = r_n = \frac{\sum_{t \in I_n} P_w(t)}{\sum_{t \in I_n} P_o(t)}, \tag{4}$$

where $I_n$ denotes the set of nighttime hours. In our paper, $I_n$ refers to the hours between 9:00 P.M. and 5:00 A.M. Note that for the hours in $I_n$, since PV does not generate power, $P_w(t)$ equals the known aggregate net demand, $P'_w(t)$. Therefore,

$$r = \frac{\sum_{t \in I_n} P'_w(t)}{\sum_{t \in I_n} P_o(t)}, \tag{5}$$

where $P'_w(t)$ is computed by aggregating the measured net demands over customers in $C_w$:

$$P'_w(t) = \sum_{i=1}^{N_w} P'_{w,i}(t), \quad t = 1, ..., T, \tag{6}$$

where $N_w$ represents the total number of customers in $C_w$. $P'_{w,i}(t)$ denotes the measured net demand at time $t$ for the $i$'th customer in $C_w$.

Then, using the estimate of $r$ and the known native demand time series for $C_o$, we can apply (1) to compute the estimated native demand time series for $C_w$. Finally, inferring the PV generation time series for $C_w$, $\boldsymbol{G}_w = \{G_w(t)\}$, $t = 1, ..., T$, is straightforward:

$$\boldsymbol{G}_w = \boldsymbol{P}_w - \boldsymbol{P}'_w, \tag{7}$$

where $\boldsymbol{P}'_w = \{P'_w(t)\}$, $t = 1, ..., T$, denotes the known net demand time series for $C_w$.

The above procedure for estimating the aggregate-level PV generation and native demand for $C_w$ is illustrated in Fig. 7.
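The aggregate-level procedure in (1), (2), and (5) to (7) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `estimate_aggregate_pv`, the list-of-lists data layout, and the encoding of nighttime as hour labels 21 through 4 are assumptions of this sketch.

```python
# Assumed encoding of I_n: hour-of-day labels for 9:00 P.M. to 5:00 A.M.
NIGHT_HOURS = frozenset({21, 22, 23, 0, 1, 2, 3, 4})


def estimate_aggregate_pv(native_without_pv, net_with_pv, hours,
                          night_hours=NIGHT_HOURS):
    """Estimate r, aggregate native demand, and aggregate PV generation for C_w.

    native_without_pv: per-customer native demand series for C_o (list of lists).
    net_with_pv: per-customer net demand series for C_w (list of lists).
    hours: hour-of-day label (0-23) for each time step.
    """
    p_o = [sum(vals) for vals in zip(*native_without_pv)]   # eq. (2)
    p_w_net = [sum(vals) for vals in zip(*net_with_pv)]     # eq. (6)
    if not (len(p_o) == len(p_w_net) == len(hours)):
        raise ValueError("all series must cover the same time steps")

    night_idx = [t for t, h in enumerate(hours) if h in night_hours]
    denom = sum(p_o[t] for t in night_idx)
    if denom == 0:
        raise ValueError("nocturnal native demand of C_o sums to zero")
    r = sum(p_w_net[t] for t in night_idx) / denom          # eq. (5)

    p_w = [r * p for p in p_o]                              # eq. (1)
    g_w = [pw - pn for pw, pn in zip(p_w, p_w_net)]         # eq. (7)
    return r, p_w, g_w


# Toy example: two customers per group over one day.
hours = list(range(24))
c_o = [[1.0] * 24, [1.0] * 24]      # native demand, customers without PVs
c_w = [[2.0] * 24, [2.0] * 24]      # net demand, customers with PVs
c_w[0][12] = 1.0                    # 1.0 kWh of PV output at noon
c_w[1][12] = 0.0                    # 2.0 kWh of PV output at noon
r, p_w, g_w = estimate_aggregate_pv(c_o, c_w, hours)
```

With real data the daytime estimate can be slightly negative at some hours because the two groups' shapes are not exactly proportional; how to treat such values is not specified in this part of the paper.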

IV. ESTIMATING BTM PV GENERATION FOR EACH INDIVIDUAL PV

Knowing the aggregate BTM PV generation and native demand might not be sufficient for some applications [23], [24]. For example, some demand response schemes require known customer-level native demand [11]. Therefore, estimating individual customers' BTM native demand and PV generation is of significance.

To achieve this goal, we propose an approach to allocate the estimated aggregate PV generation/native demand time series to individual customers with PVs. As discussed in Section II-B, estimating an individual PV's generation curve boils down to determining the generation curve's shape and its magnitude. Thus, in this section, we first determine candidate shapes for individual PV generation curves. Then, we estimate the peak generation for each individual PV. Finally, allocating the aggregate PV generation to individual PVs is formulated as an optimization process.

A. Generating Diverse Candidate Shapes for Individual PVs

As discussed earlier, in a geographically bounded distribution system, two primary factors determining a generation curve are the magnitude and shape. In this section, we elaborate on how to obtain candidate shapes for individual PVs' generation curves.

In Section III, we have obtained the estimated time series for the aggregate generation of all PVs. One question is whether we can use that shape to represent the unknown shapes of individual PVs. To answer this question, we have conducted a numerical experiment. First, we normalized the aggregate generation curve of all PVs by dividing the aggregate generation time series by its peak. Then, in the same way, we normalized the generation curve of an example PV facing south. The two normalized curves are plotted in Fig. 8. It can be seen that the normalized curve corresponding to the aggregate generation for all PVs is highly similar to the normalized curve for a south-facing PV. One primary reason for this similarity is that the majority of residential PVs face south because a south-facing PV can typically generate more power than PVs in other directions. Most importantly, Fig. 8 shows that a south-facing PV's generation curve can be accurately represented by the normalized aggregate generation curve of all PVs.
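The peak normalization used in this experiment, together with one simple way to quantify how close two normalized shapes are, can be sketched as below. The helper names `normalize_by_peak` and `max_shape_gap` are hypothetical, and the maximum absolute gap is only an illustrative similarity measure; the paper compares the curves visually in Fig. 8.

```python
def normalize_by_peak(series):
    """Divide a generation (or demand) series by its peak so the peak is 1 p.u."""
    peak = max(series)
    if peak <= 0:
        raise ValueError("series must have a positive peak")
    return [v / peak for v in series]


def max_shape_gap(series_a, series_b):
    """Largest absolute difference between two peak-normalized curves."""
    if len(series_a) != len(series_b):
        raise ValueError("series must have the same length")
    norm_a = normalize_by_peak(series_a)
    norm_b = normalize_by_peak(series_b)
    return max(abs(a - b) for a, b in zip(norm_a, norm_b))


# Toy curves with the same shape but different magnitudes.
aggregate = [0.0, 2.0, 4.0, 2.0, 0.0]
south_facing = [0.0, 3.0, 6.0, 3.0, 0.0]
gap = max_shape_gap(aggregate, south_facing)
```

A gap near zero indicates that the aggregate curve is a reasonable stand-in for the individual curve's shape, while a larger gap (as for east- or west-facing arrays) signals that a different candidate shape is needed.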

[Fig. 7. Detailed structure of the proposed aggregate-level BTM PV generation/native demand estimation. Panels: Smart Meter Measurements (customers without PVs and customers with PVs); Aggregate Native/Net Demand; Estimating Aggregate BTM Native Demand and PV Generation.]

[Fig. 8. Three-day normalized aggregate generation curve for all PVs and normalized generation curve for an individual PV facing south. Axes: Time (hour) versus G (p.u.).]

[Fig. 9. Three-day normalized aggregate generation curve of all PVs and normalized generation curves for two example PVs facing east and west, respectively. Axes: Time (hour) versus G (p.u.).]

Note that in distribution systems, in addition to the majority of south-facing PVs, there exist some residential PVs with other azimuths, such as east or west. These non-south-facing PVs' generation curves cannot be fully represented by the normalized aggregate PV generation curve in Fig. 8. Specifically, compared to the normalized aggregate PV generation curve, the normalized generation curves for an east-facing PV and a west-facing PV are somewhat "left-skewed" and "right-skewed", respectively, as shown in Fig. 9. Therefore, it is necessary to obtain candidate shapes for those non-south-facing PVs' generation curves. To achieve this goal, our basic idea is first to feed PV power data generated by PVWatts Calculator into a regression model to capture the relationship between the generations for a south-facing PV and a non-south-facing PV. Then, the aggregate generation curve estimated in Section III, which can accurately represent a south-facing PV's generation curve, is fed into the trained regression model to produce diverse generation curves corresponding to non-south azimuths. The overall structure is shown in Fig. 10:

1) Training A Gaussian Process Regression Model: Since the shape of a south-facing PV's generation curve can be approximated as the shape of the aggregate generation curve of all PVs, one intuitive way for inferring non-south-facing PVs' candidate shapes is to produce diverse shapes based on the south-facing PV's estimated generation curve. This idea is based on our observation that there exists a mapping between the generation curves for PVs with different azimuths. Therefore, one critical step for producing diverse candidate generation curves is to identify the relationship between a non-south-facing PV's generation curve and a south-facing PV's generation curve. To capture the relationship, first, we use PVWatts Calculator [25], an online application developed by the National Renewable Energy Laboratory (NREL), to generate power output data for PVs with typical azimuths, e.g., east, south, and west. Then, using the generated PV output power data, we train a Gaussian Process Regression (GPR) model to capture the relationship between the generation curve corresponding to a typical non-south azimuth (e.g., east) and the generation curve corresponding to the south azimuth. The primary reason for selecting GPR is that, in our numerical tests, GPR performed better on our dataset than some other state-of-the-art nonlinear regression models, such as the Support Vector Machine model and the Polynomial regression model.

[Fig. 10. Overall structure for producing diverse candidate PV generation curves using power output data generated by PVWatts Calculator. Blocks: PVWatts Calculator power curves for PVs with typical azimuths; trained regression models; estimated curve for a south-facing PV; produced diverse generation curves.]

Specifically, first, we use PVWatts Calculator to generate time-series data for a south-facing PV and a PV with another typical azimuth (e.g., east). Then, each time series is normalized so that the peak generation is 1 p.u. The two normalized time series corresponding to the south-facing PV and the non-south-facing PV are denoted as $\mathbf{G}^*_s = \{G^*_s(t)\}$ and $\mathbf{G}^*_{ns} = \{G^*_{ns}(t)\}$, $t = 1, ..., T$, respectively, where $G^*_s(t)$ and $G^*_{ns}(t)$ denote the normalized generation at time $t$ for a south-facing PV and a non-south-facing PV, respectively. Our goal is to use $G^*_s(t)$ to explain $G^*_{ns}(t)$ because PVs in a geographically bounded distribution system typically have highly correlated generations. By conducting numerical experiments, we find that in addition to $G^*_s(t)$, the hour-in-day, $H_d(t)$, and day-in-year, $D_y(t)$, are also related to $G^*_{ns}(t)$. Therefore, we use $G^*_s(t)$, $H_d(t)$, and $D_y(t)$ as the input variables and $G^*_{ns}(t)$ as the output variable to train a GPR model. The function of GPR is to capture the relationship between $G^*_{ns}(t)$ and $G^*_s(t)$. The basic idea behind GPR is that if the distance between two explanatory variables is small, the difference between their corresponding dependent variables will also be relatively small. Specifically, the output, $G^*_{ns}(t)$, is denoted as a function of the input vector, $\mathbf{X}^*(t)$:

$$G^*_{ns}(t) = f(\mathbf{X}^*(t)), \tag{8}$$

where $\mathbf{X}^*(t) = [G^*_s(t), H_d(t), D_y(t)]^T$. For GPR, $f(\mathbf{X}^*(t))$ is assumed to be a random variable reflecting the uncertainty of functions evaluated at $\mathbf{X}^*(t)$. Specifically, the function $f(\mathbf{X}^*(t))$ is distributed as a Gaussian process:

$$f(\mathbf{X}^*(t)) \sim \mathcal{GP}\big(\mu(\mathbf{X}^*(t)), K(\mathbf{X}^*(t), \mathbf{X}^*(t'))\big), \tag{9}$$

where $\mu(\mathbf{X}^*(t))$ represents the expected value of $f(\mathbf{X}^*(t))$, i.e., the value of $G^*_{ns}(t)$. The covariance function, $K(\mathbf{X}^*(t), \mathbf{X}^*(t'))$, represents the dependence between the $G^*_{ns}(t)$’s at different times. In our problem, the covariance function, $K(\cdot, \cdot)$, is specified by the squared exponential kernel function:

$$K(\mathbf{X}^*(t), \mathbf{X}^*(t')) = \sigma_f^2 \exp\left(-\frac{\|\mathbf{X}^*(t) - \mathbf{X}^*(t')\|_2^2}{2\sigma^2}\right), \tag{10}$$

where $\|\cdot\|_2$ represents the $l_2$-norm, and $\sigma_f$ and $\sigma$ are hyper-parameters, which are determined using cross-validation. Intuitively, (10) measures the distance between $\mathbf{X}^*(t)$ and $\mathbf{X}^*(t')$, which can also reflect the similarity between $G^*_{ns}(t)$ and $G^*_{ns}(t')$.

Note that $G^*_s(t)$ and $G^*_{ns}(t)$ are solar powers generated using PVWatts Calculator; thus, they are known and a $T$-dimensional joint Gaussian distribution can be constructed as:

$$\begin{bmatrix} f(\mathbf{X}^*(1)) \\ \vdots \\ f(\mathbf{X}^*(T)) \end{bmatrix} \sim \mathcal{N}(\boldsymbol{\mu}^*, \boldsymbol{\Sigma}^*), \tag{11}$$

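where

$$\boldsymbol{\mu}^* = \begin{bmatrix} \mu(\mathbf{X}^*(1)) \\ \vdots \\ \mu(\mathbf{X}^*(T)) \end{bmatrix}, \tag{12a}$$

$$\boldsymbol{\Sigma}^* = \begin{bmatrix} K(\mathbf{X}^*(1), \mathbf{X}^*(1)) & \cdots & K(\mathbf{X}^*(1), \mathbf{X}^*(T)) \\ \vdots & \ddots & \vdots \\ K(\mathbf{X}^*(T), \mathbf{X}^*(1)) & \cdots & K(\mathbf{X}^*(T), \mathbf{X}^*(T)) \end{bmatrix}. \tag{12b}$$

The joint Gaussian distribution formulated in (11) represents a trained non-parametric model, which captures the relationship between $G^*_{ns}(t)$ and $G^*_s(t)$.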
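As a minimal illustrative sketch (not the authors' implementation), the kernel in (10) and the covariance matrix in (12b) can be evaluated in plain Python as follows. The function names, the input values, and the hyper-parameter values are assumptions chosen only for illustration; in the paper, the hyper-parameters are determined by cross-validation.

```python
import math

def se_kernel(x, x_prime, sigma_f, sigma):
    """Squared exponential kernel of (10) for two input vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return sigma_f ** 2 * math.exp(-sq_dist / (2.0 * sigma ** 2))

def covariance_matrix(inputs, sigma_f, sigma):
    """T-by-T matrix of (12b): entry (t, t') is K(X*(t), X*(t'))."""
    return [[se_kernel(a, b, sigma_f, sigma) for b in inputs] for a in inputs]

# Hypothetical inputs [G*_s(t), H_d(t), D_y(t)]; the numbers are made up.
X_train = [[0.2, 9.0, 100.0], [0.8, 12.0, 100.0], [0.3, 16.0, 100.0]]

# Hypothetical hyper-parameter values (assumed, not taken from the paper).
Sigma = covariance_matrix(X_train, sigma_f=1.0, sigma=2.0)
```

The diagonal entries equal $\sigma_f^2$, and the off-diagonal entries shrink as the inputs move apart, which is the similarity notion described above. In practice, the three inputs have different scales, so some rescaling may be needed before a single length scale $\sigma$ is meaningful; the paper does not state how this is handled.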

2) Inferring A Non-south-facing PV’s Generation Curve: As shown in Fig. 8, the normalized generation curve for a south-facing PV, $\mathbf{G}_s = \{G_s(t)\}$, $t = 1, ..., T$, can be approximated as the normalized estimated aggregate generation curve for all PVs:

$$\mathbf{G}_s = \frac{\hat{\mathbf{G}}_w}{G_m}, \tag{13}$$

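where $G_m$ denotes the peak of $\hat{\mathbf{G}}_w$. To infer the unknown generation time series for a non-south-facing PV, $\mathbf{G}_{ns} = \{G_{ns}(t)\}$, $t = 1, ..., T$, we assume $G_{ns}(t)$ is a function of $G_s(t)$, i.e., $G_{ns}(t) = f(G_s(t))$. By appending $f(G_s(t))$ to the end of (11), a $(T+1)$-dimensional joint Gaussian distribution can be constructed as:

$$\begin{bmatrix} G^*_{ns}(1) \\ \vdots \\ G^*_{ns}(T) \\ G_{ns}(t) \end{bmatrix} = \begin{bmatrix} f(\mathbf{X}^*(1)) \\ \vdots \\ f(\mathbf{X}^*(T)) \\ f(\mathbf{X}(t)) \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} \boldsymbol{\mu}^* \\ \mu_1 \end{bmatrix}, \begin{bmatrix} \boldsymbol{\Sigma}^* & \boldsymbol{\Sigma}_{*1} \\ \boldsymbol{\Sigma}_{*1}^T & \Sigma_{11} \end{bmatrix} \right), \tag{14}$$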

where $\mathbf{X}(t) = [G_s(t), H_d(t), D_y(t)]^T$ is a vector of explanatory variables, $\boldsymbol{\Sigma}_{*1}$ represents the training-test set covariances, and $\Sigma_{11}$ is the test set covariance. Since $G^*_{ns}(t)$, $\mathbf{X}^*(t)$, and $\mathbf{X}(t)$ are known, using the Bayes rule, the distribution of $G_{ns}(t)$ conditioned on $\mathbf{G}^*_{ns}$ can be computed as follows:

$$G_{ns}(t) \mid \mathbf{G}^*_{ns} \sim \mathcal{N}(\mu_1(t), \Sigma_1(t)), \tag{15}$$

where $\mu_1(t) = \boldsymbol{\Sigma}_{*1}^T {\boldsymbol{\Sigma}^*}^{-1} \mathbf{G}^*_{ns}$ and $\Sigma_1(t) = \Sigma_{11} - \boldsymbol{\Sigma}_{*1}^T {\boldsymbol{\Sigma}^*}^{-1} \boldsymbol{\Sigma}_{*1}$. Note that $\mu_1(t)$ denotes the most probable value of the estimated generation at time $t$ for a non-south-facing PV. By conducting the above inference procedure for all $t$, we can obtain a candidate generation time series corresponding to a particular typical PV azimuth. Since there are multiple typical azimuths, such as east and west, we can infer multiple candidate PV generation time series:

$$\mathbf{G}^j_{ns} = \{G^j_{ns}(t)\}, \quad t = 1, ..., T, \quad j = 1, ..., N_{ns}, \tag{16}$$

where $G^j_{ns}(t)$ denotes the inferred PV generation at time $t$ for the $j$’th typical non-south-facing azimuth. $N_{ns}$ denotes the total number of typical non-south-facing PV azimuths and is determined by conducting numerical experiments.
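The conditioning step in (15) can be sketched in plain Python as below. This is only an illustration under stated assumptions: the helper names (`se_kernel`, `solve`, `gpr_predict`), the training values, and the hyper-parameters are hypothetical, and the small `jitter` term added to the diagonal is a common numerical safeguard that is not part of (15).

```python
import math

def se_kernel(x, y, sigma_f=1.0, sigma=1.0):
    """Squared exponential kernel of (10)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return sigma_f ** 2 * math.exp(-sq / (2.0 * sigma ** 2))

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z

def gpr_predict(X_train, y_train, x_test, sigma_f=1.0, sigma=1.0, jitter=1e-8):
    """Posterior mean and variance of (15) for one test input X(t)."""
    n = len(X_train)
    # Sigma^* of (12b); jitter is an assumed stabilizer, not in the paper.
    Sigma = [[se_kernel(X_train[i], X_train[j], sigma_f, sigma)
              + (jitter if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    # Sigma_{*1}: covariances between training inputs and the test input.
    k_star = [se_kernel(x, x_test, sigma_f, sigma) for x in X_train]
    # Sigma_11: prior variance at the test input.
    k_tt = se_kernel(x_test, x_test, sigma_f, sigma)
    alpha = solve(Sigma, y_train)            # Sigma^{*-1} G*_ns
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve(Sigma, k_star)                 # Sigma^{*-1} Sigma_{*1}
    var = k_tt - sum(k * w for k, w in zip(k_star, v))
    return mean, var

# Hypothetical training data: inputs [G*_s, H_d, D_y] and outputs G*_ns.
X_train = [[0.0, 6.0, 1.0], [0.5, 9.0, 1.0], [1.0, 12.0, 1.0]]
y_train = [0.0, 0.7, 0.9]

# Hypothetical test input X(t) = [G_s(t), H_d(t), D_y(t)].
mean, var = gpr_predict(X_train, y_train, [0.5, 9.0, 1.0], sigma=3.0)
```

Repeating `gpr_predict` for every $t$ in the window would yield one candidate curve per azimuth, as in (16). For realistic window lengths, a library-based solver would be preferable to the hand-written elimination used here.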

B. Estimating Peak Generation for Each Individual PV

Simply knowing the candidate shapes for unknown generation curves is insufficient for allocating the estimated aggregate generation to individual PVs. As discussed earlier, we should also know the magnitudes of the candidate generation curves. To estimate the peak generation, we employ our observation from real data that the peak generation is almost identical to the difference between the minimum diurnal native demand and the minimum net demand.

Specifically, to explain our observation regarding the correlation, we start with Fig. 11, which shows the load duration curves for the $i$’th customer’s diurnal native demand, $P_{w,d,i}(t)$, and diurnal net demand, $P'_{w,d,i}(t)$. We can compute the difference between the minimums of $P_{w,d,i}(t)$ and $P'_{w,d,i}(t)$:

$$D_{w,i} = P_{w,d,i} - P'_{w,d,i}, \tag{17}$$

where $P_{w,d,i}$ and $P'_{w,d,i}$ denote the minimums of $P_{w,d,i}(t)$ and $P'_{w,d,i}(t)$ during a selected window, respectively. Note that $P_{w,d,i}$ is positive, and $P'_{w,d,i}$ is negative. Our finding is that $D_{w,i}$ is highly similar to the peak generation, $G_{w,m,i}$, as shown in Fig. 12. This relationship inspires us to approximate $G_{w,m,i}$ as $D_{w,i}$:

$$\hat{G}_{w,m,i} = D_{w,i}, \quad i = 1, ..., N_w, \tag{18}$$


Fig. 11. Load duration curves for an example customer’s diurnal native demand and diurnal net demand. Annotations: minimum diurnal native demand, minimum net demand, and their difference.

where $\hat{G}_{w,m,i}$ is the estimate of $G_{w,m,i}$. However, one challenge is that $D_{w,i}$ depends on $P_{w,d,i}$, which is unknown due to BTM PV generation. Therefore, we need to estimate $P_{w,d,i}$, which relies on another finding from real native demand data. Specifically, as shown in Fig. 13, the minimum diurnal native demand, $P_{w,d,i}$, can be approximated as the minimum nocturnal native demand, $P_{w,n,i}$:

$$P_{w,d,i} \approx P_{w,n,i}, \quad i = 1, ..., N_w. \tag{19}$$

Note that since PV does not generate power during nighttime, $P_{w,n,i}$ is known to utilities. Finally, using the estimate of $P_{w,d,i}$ and the known $P'_{w,d,i}$, we can compute $D_{w,i}$ using (17), and then compute $\hat{G}_{w,m,i}$ using (18).
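The chain (19), (17), (18) can be sketched for one customer and one window as follows. This is a minimal illustration: the function name, the net demand values, and the daytime mask are hypothetical, and how daytime hours are identified is an assumption that the text above does not specify.

```python
def estimate_peak_generation(net_demand, is_daytime):
    """Estimate a PV's peak generation from net demand only.

    net_demand: hourly net demand in kW over a selected window.
    is_daytime: same-length list of booleans (assumed to be given).
    """
    nocturnal = [p for p, day in zip(net_demand, is_daytime) if not day]
    diurnal = [p for p, day in zip(net_demand, is_daytime) if day]
    # (19): at night the PV output is zero, so net demand equals native
    # demand, and its minimum stands in for the minimum diurnal native demand.
    min_native_estimate = min(nocturnal)
    # Minimum diurnal net demand, known from the smart meter.
    min_net = min(diurnal)
    # (17) and (18): the difference is taken as the peak generation estimate.
    return min_native_estimate - min_net

# Hypothetical eight-hour toy window (kW); the numbers are made up.
net = [0.5, 0.4, 0.6, -2.0, -3.1, -1.0, 0.8, 0.7]
day = [False, False, False, True, True, True, False, False]
peak_estimate = estimate_peak_generation(net, day)
```

In this toy window the nocturnal minimum is 0.4 kW and the diurnal net minimum is -3.1 kW, so the sketch returns their difference as the peak estimate.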

Fig. 12. The relationship between peak generation and the difference between minimum diurnal native demand and minimum net demand: (a) Spring; (b) Summer.

Fig. 13. The relationship between minimum diurnal native demand and minimum nocturnal native demand: (a) Spring; (b) Summer.

C. Allocating the Estimated Aggregate PV Generation to Individual PVs

Sections III, IV-A, and IV-B provide the estimated aggregate generation time series of all PVs, inferred candidate generation curves for individual PVs, and estimated generation peaks for individual PVs, respectively. Therefore, estimating individual PVs’ generation curves comes down to allocating the estimated aggregate generation time series to individual PVs. This allocation procedure is formulated as an optimization problem:

$$\min_{K, \boldsymbol{\gamma}} \; \|G_e K \mathbf{1} - \hat{\mathbf{G}}_w\|_2^2 + \lambda \|\boldsymbol{\gamma}\|_2^2 \tag{20a}$$

$$\text{s.t.} \quad G_e K \le \mathbf{1} (\hat{\mathbf{G}}_{w,m} + \boldsymbol{\gamma})^T, \tag{20b}$$

$$\mathbf{0} \le \boldsymbol{\gamma} \le P_0 \mathbf{1}, \tag{20c}$$

where $G_e = [\mathbf{G}_s, \mathbf{G}^1_{ns}, ..., \mathbf{G}^{N_{ns}}_{ns}]$ is a $T$-by-$N_e$ matrix, which denotes a collection of candidate generation curves. $N_e = N_{ns} + 1$ denotes the total number of candidate generation curves. $K = [\mathbf{K}_1, ..., \mathbf{K}_{N_w}]$ is an $N_e$-by-$N_w$ matrix of decision variables, which denote the weights assigned to candidate generation curves for individual PVs. $\mathbf{K}_i$, $i = 1, ..., N_w$, is an $N_e$-by-1 vector, which denotes the weights assigned to candidate generation curves for the $i$’th PV. The $\mathbf{1}$ in (20a) is an $N_w$-by-1 vector of ones. $G_e K$ results in a $T$-by-$N_w$ matrix, which is a collection of estimated generation time series for individual PVs. The first term in the objective function (20a) reflects the difference between the estimated aggregate PV generation, $\hat{\mathbf{G}}_w$, and the summation of individual PVs’ estimated generations, $G_e K \mathbf{1}$. The second term in the objective function (20a) accounts for the estimation errors of peak generations. $\lambda$ is a tuning parameter. $\boldsymbol{\gamma}$ is an $N_w$-by-1 vector with non-negative elements, which reflect the errors of approximating $G_{w,m,i}$ as $D_{w,i}$, as shown in (18). The $\mathbf{1}$ in (20b) is a $T$-by-1 vector of ones. $\hat{\mathbf{G}}_{w,m} = [\hat{G}_{w,m,1}, ..., \hat{G}_{w,m,N_w}]^T$ denotes an $N_w$-by-1 vector of the estimated generation peaks for all PVs. $(\hat{\mathbf{G}}_{w,m} + \boldsymbol{\gamma})$ denotes the corrected generation peaks with consideration of estimation errors. $\mathbf{1} (\hat{\mathbf{G}}_{w,m} + \boldsymbol{\gamma})^T$ produces a $T$-by-$N_w$ matrix, in which each column contains the same element. Constraint (20b) ensures that the estimated generation time series for each PV does not exceed its corrected peak generation. $\mathbf{0}$ is an $N_w$-by-1 vector of zeros. $P_0$ denotes the maximum error of approximating $G_{w,m,i}$ as $D_{w,i}$ for individual PVs. The $\mathbf{1}$ in (20c) is an $N_w$-by-1 vector of ones. Constraint (20c) ensures that the estimation errors for individual PVs are non-negative and no larger than an upper bound. The reason for constraining the elements of $\boldsymbol{\gamma}$ to be non-negative is that $D_{w,i}$ typically under-estimates $G_{w,m,i}$, as shown in Fig. 13.

The optimization problem in (20) is a convex quadratic programming problem; thus, we can obtain a unique solution for $K$, i.e., $K^* = [\mathbf{K}^*_1, ..., \mathbf{K}^*_{N_w}]$. Then, the estimated generation time series for the $i$’th PV, $\hat{\mathbf{G}}_{w,i} = \{\hat{G}_{w,i}(t)\}$, $t = 1, ..., T$, can be computed as:

$$\hat{\mathbf{G}}_{w,i} = G_e \mathbf{K}^*_i, \quad i = 1, ..., N_w. \tag{21}$$

Then, the estimated native demand time series for the $i$’th customer, $\hat{\mathbf{P}}_{w,i} = \{\hat{P}_{w,i}(t)\}$, $t = 1, ..., T$, can be computed as:

$$\hat{\mathbf{P}}_{w,i} = \mathbf{P}'_{w,i} + \hat{\mathbf{G}}_{w,i}, \quad i = 1, ..., N_w, \tag{22}$$

where $\mathbf{P}'_{w,i} = \{P'_{w,i}(t)\}$, $t = 1, ..., T$, denotes the known net demand time series recorded by the smart meter for the $i$’th customer with PV.
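To make the dimensions in (20) to (22) concrete, the sketch below evaluates the objective (20a), checks constraints (20b) and (20c) for a given pair of $K$ and $\boldsymbol{\gamma}$, and then applies (21) and (22). It is not a solver and not the authors' code: in practice (20) would be passed to a quadratic programming package. All function names and numbers are hypothetical.

```python
def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def evaluate_allocation(Ge, K, gamma, G_agg, G_peak, lam, P0):
    """Return (objective of (20a), feasibility of (20b)-(20c), Ge K)."""
    G_ind = matmul(Ge, K)                   # T-by-Nw; column i is Ge K_i
    total = [sum(row) for row in G_ind]     # Ge K 1, a T-by-1 vector
    fit = sum((s - g) ** 2 for s, g in zip(total, G_agg))
    penalty = lam * sum(v ** 2 for v in gamma)
    cap_ok = all(G_ind[t][i] <= G_peak[i] + gamma[i]       # (20b)
                 for t in range(len(G_ind))
                 for i in range(len(gamma)))
    gamma_ok = all(0.0 <= v <= P0 for v in gamma)          # (20c)
    return fit + penalty, cap_ok and gamma_ok, G_ind

def native_demand(net, generation):
    """(22): native demand = net demand + PV generation."""
    return [p + g for p, g in zip(net, generation)]

# Hypothetical toy problem: T = 3 hours, Ne = 2 candidates, Nw = 2 PVs.
Ge = [[0.5, 0.75], [1.0, 1.0], [0.5, 0.25]]   # candidate curves (p.u.)
K = [[2.0, 0.0], [0.0, 2.0]]                  # trial weights (kW)
gamma = [0.0, 0.5]                            # trial peak corrections (kW)
G_agg = [2.5, 4.0, 1.5]                       # estimated aggregate PV (kW)
G_peak = [2.0, 1.75]                          # estimated peaks from (18)

objective, feasible, G_ind = evaluate_allocation(
    Ge, K, gamma, G_agg, G_peak, lam=100.0, P0=1.0)

# (21) for PV 1 is the first column of Ge K; (22) adds its net demand.
gen_pv1 = [row[0] for row in G_ind]
native_pv1 = native_demand([0.5, -1.0, 0.25], gen_pv1)
```

With these toy numbers the aggregate is matched exactly, so the objective reduces to the penalty term $\lambda \|\boldsymbol{\gamma}\|_2^2$, which shows how $\lambda$ discourages large corrections of the peak estimates.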


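Fig. 14. Detailed steps of the individual customer-level BTM PV generation estimation. Flowchart steps: approximate the minimum diurnal native demand by the minimum nocturnal native demand as in (19); compute $D_{w,i}$ using (17) and the peak estimates using (18); pass the candidate generation curves and the estimated aggregate generation to the optimization in (20); obtain $\mathbf{K}^*_i$, $i = 1, ..., N_w$; compute the individual generation using (21).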

Note that (20) can be solved for a selected window. The window size, $T$, can impact estimation accuracy and runtime, which will be examined in the Case Study Section. The detailed steps for estimating customer-level PV generation are illustrated in Fig. 14.

V. CASE STUDY

In this section, the proposed two-layer BTM solar power and native demand estimation approach is verified using real PV generation and native demand data.

A. Dataset Description

The hourly native demand and PV generation data used in this paper are from a public dataset [22]. The native demand and solar power data span one year. The dataset contains 100 customers with PVs and 115 customers without PVs. For the customers with PVs, the net demand is obtained by subtracting PV generation from native demand.

B. Aggregate-level BTM PV Generation Estimation Validation

Fig. 15 shows three-day actual and estimated aggregate PV generation/native demand curves. It can be seen that the estimated curves closely follow the actual curves. To quantitatively evaluate the estimation accuracy, we also compute the mean absolute percentage error (MAPE) as follows:

$$MAPE_G = \frac{100\%}{N_d} \cdot \sum_{t \in I_d} \left| \frac{G_w(t) - \hat{G}_w(t)}{G_{w,m}} \right|, \tag{23a}$$

$$MAPE_P = \frac{100\%}{N_d} \cdot \sum_{t \in I_d} \left| \frac{P_w(t) - \hat{P}_w(t)}{P_{w,m}} \right|, \tag{23b}$$

where $MAPE_G$ and $MAPE_P$ denote the computed MAPEs for PV generation and native demand, respectively. $I_d$ denotes the set of daytime hours. $N_d$ denotes the total number of hours in $I_d$. $G_{w,m}$ and $P_{w,m}$ denote the actual peaks of PV generation and native demand, respectively. The computed values for $MAPE_G$ and $MAPE_P$ are 1.21% and 1.28%, respectively, reflecting the high accuracy of our proposed approach.
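The metric in (23) can be sketched as follows. This is an illustrative helper rather than the authors' evaluation script: the function name and the toy series are hypothetical, and taking the actual peak as the maximum of the supplied actual series is an assumption.

```python
def mape(actual, estimated, daytime_hours):
    """Peak-normalized MAPE of (23), in percent, over daytime hours only."""
    peak = max(actual)  # assumed: actual peak taken over the supplied series
    total = sum(abs((actual[t] - estimated[t]) / peak) for t in daytime_hours)
    return 100.0 * total / len(daytime_hours)

# Hypothetical five-hour toy series (kW); hours 1 to 3 are daytime.
actual_gen = [0.0, 2.0, 4.0, 2.0, 0.0]
estimated_gen = [0.0, 2.2, 3.8, 1.8, 0.0]
mape_g = mape(actual_gen, estimated_gen, daytime_hours=[1, 2, 3])
```

Because the errors are normalized by the peak rather than by the hourly actual value, hours with near-zero generation do not inflate the metric. The same helper applies to native demand in (23b) and to the per-customer versions of the metric.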

Fig. 15. Three-day actual and estimated aggregate PV generation/native demand curves: (a) aggregate PV generation; (b) aggregate native demand. Horizontal axis: time (hour).

C. Customer-level BTM PV Generation Estimation Validation

1) Estimation Performance: Fig. 16 shows three-day actual and estimated PV generation and native demand curves for an example customer with PV. We can see that the estimated curves closely fit the actual curves. To comprehensively examine the performance of our approach, we compute the MAPE for all customers with PVs. Specifically, the MAPEs for the $i$’th customer are computed as follows:

$$MAPE_{G,i} = \frac{100\%}{N_d} \cdot \sum_{t \in I_d} \left| \frac{G_{w,i}(t) - \hat{G}_{w,i}(t)}{G_{w,m,i}} \right|, \tag{24a}$$

$$MAPE_{P,i} = \frac{100\%}{N_d} \cdot \sum_{t \in I_d} \left| \frac{P_{w,i}(t) - \hat{P}_{w,i}(t)}{P_{w,m,i}} \right|, \tag{24b}$$

where $G_{w,m,i}$ and $P_{w,m,i}$ denote the actual generation and native demand peaks for the $i$’th customer, respectively. Table I summarizes the empirical cumulative distribution function (CDF) of estimation MAPE, which is constructed using all the computed MAPEs. As can be seen, for the estimated hourly PV generation, 80% of the MAPEs are less than 6.28%. Regarding the estimated hourly native demand, 80% of the MAPEs are less than 3.80%. This verifies the estimation accuracy of our proposed approach.

TABLE I
EMPIRICAL CDF OF ESTIMATION MAPE

| Empirical CDF | 0.2 | 0.4 | 0.6 | 0.8 | 1.0 |
|---|---|---|---|---|---|
| MAPE of G (%) | 2.99 | 4.04 | 4.91 | 6.28 | 8.98 |
| MAPE of P (%) | 1.79 | 2.16 | 2.83 | 3.80 | 5.65 |
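One way to read values such as those in Table I off a list of per-customer MAPEs is sketched below. The exact construction used for Table I is not stated in the text, so this helper, its name, and the sample values are assumptions for illustration only.

```python
import math

def empirical_cdf_value(values, level):
    """Smallest observed value v such that at least `level` of the
    observations are less than or equal to v."""
    ordered = sorted(values)
    # Rounding guards against floating-point noise in level * n.
    k = math.ceil(round(level * len(ordered), 9))
    return ordered[max(k, 1) - 1]

# Hypothetical per-customer MAPEs (%); the numbers are made up.
sample_mapes = [3.0, 1.0, 5.0, 2.0, 4.0]
value_at_80 = empirical_cdf_value(sample_mapes, 0.8)
```

For the five sample values, four of them (80%) are at most 4.0, which is the value the helper returns at level 0.8.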

Note that the above results are obtained under the conditions that (1) five produced candidate generation curves are employed ($N_e = 5$), (2) the tuning parameter in (20a) is 100 ($\lambda = 100$), and (3) the optimization process specified in (20) is executed for individual windows with a time length of one month ($T = 720$ hours; the entire year is divided into 12 windows).


Fig. 16. Three-day actual and estimated PV generation/native demand curves for an example customer with PV: (a) PV generation; (b) native demand. Horizontal axis: time (hour).

Fig. 17. Three-day produced candidate generation curves corresponding to three typical azimuths, i.e., east, south, and west. Horizontal axis: time (hour); vertical axis: G (p.u.).

2) Testing the Candidate Generation Curves: As elaborated in Section IV-A, diverse candidate generation curves are produced for representing the unknown BTM generation. Thus, it is of interest to examine the effectiveness of producing candidate curves. Fig. 17 shows three produced candidate generation curves corresponding to three typical azimuths, i.e., east, south, and west, respectively. We can observe that compared to the generation curve corresponding to the south, the produced curve corresponding to the east is “left-skewed”, and the produced curve corresponding to the west is “right-skewed”. Therefore, the produced curves demonstrate diversity, which is consistent with our observation on real PV generation curves shown in Fig. 9.

In addition, we have also quantitatively examined the effectiveness of producing diverse candidate generation curves. Specifically, we test the impact of the number of candidate generation curves, i.e., we solve (20) separately for three cases with different numbers of candidate curves: (I) one candidate generation curve corresponding to the south azimuth; (II) three candidate generation curves corresponding to the east, south, and west, respectively; and (III) five candidate generation curves corresponding to the east, southeast, south, southwest, and west, respectively. The other conditions for the three cases are the same: $\lambda = 100$ and $T = 720$ hours. To evaluate the impact of the candidate number, we compute the average MAPE over all PVs’ MAPEs obtained from (24). The results are summarized in Table II. We can see that as the candidate number increases, the estimation error decreases, and the execution time increases. In addition, the MAPE for Case I is greater than those for Cases II and III, and Cases II and III provide nearly identical MAPEs. This is because three candidate curves, corresponding to the east, south, and west, can comprehensively represent the unknown BTM generation curve; adding extra candidate curves results in only a slight accuracy improvement.

TABLE II
IMPACT OF CANDIDATE GENERATION CURVES

| Case | I | II | III |
|---|---|---|---|
| Average MAPE of G (%) | 5.677 | 5.474 | 5.473 |
| Average MAPE of P (%) | 3.924 | 3.086 | 3.086 |
| Runtime (s) | 40 | 125 | 194 |

3) Testing the Tuning Parameter $\lambda$: As discussed in Section IV-C, $\lambda$ in (20) reflects the confidence of estimating peak generations for individual PVs. One general principle for determining $\lambda$ is that the largest element in $\boldsymbol{\gamma}$ should be a couple of kilowatts. In addition, the solutions for (20) should not be sensitive to $\lambda$, i.e., (20) should be robust to $\lambda$. To verify the robustness of our proposed approach, we solve (20) based on different values of $\lambda$, and then compute the corresponding average MAPEs for the estimated PV generation and native demand. Other conditions are that $T = 720$ hours and five candidate generation curves, corresponding to the east, southeast, south, southwest, and west, are employed. The results show that for $\lambda$ ranging from 100 to 500 with an interval of 100, the average MAPEs for PV generation and native demand do not change (5.47% and 3.09%). The invariant average MAPEs demonstrate the robustness of our proposed approach.

4) Testing the Window Size $T$: Since our proposed approach can be conducted for each divided window, it is important to examine the impact of window size on estimation accuracy. To do this, we perform our approach for windows with different lengths and then compute the estimation MAPE. In Table III, it can be seen that the average MAPE decreases as $T$ increases. This is because for a wider window, the probability of the minimum diurnal native demand, $P_{w,d,i}$, equaling the minimum nocturnal native demand, $P_{w,n,i}$, is larger. Thus, we have a smaller estimation error for $P_{w,d,i}$, as seen in (19). Then, based on (17) and (18), it can be seen that the smaller estimation error for $P_{w,d,i}$ results in a more accurate $D_{w,i}$, which then yields a more accurate estimate of $G_{w,m,i}$. Finally, more accurate peak generation estimates result in smaller estimation errors for the PV generation and native demand time series.

TABLE III
IMPACT OF WINDOW SIZE T

| T (month) | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Average MAPE of G (%) | 5.47 | 5.30 | 5.18 | 5.08 |
| Average MAPE of P (%) | 3.09 | 2.99 | 2.92 | 2.87 |


D. Performance Comparison

We compare our proposed approach with previous works from two perspectives, qualitatively and quantitatively.

1) Qualitative Analysis: From a qualitative point of view, one primary advantage of our approach is that it does not require meteorological data or solar generation exemplars. At the aggregate level, our approach can perform PV generation estimation using only recorded net demand data. At the customer level, our approach can also work by relying only on recorded smart meter data, although leveraging data generated by PVWatts Calculator can improve the estimation accuracy.

2) Quantitative Comparison: At the customer level, we have also compared our approach with previous works. Specifically, we focus on comparing our approach with the method presented in [20], which demonstrates better performance compared to previous works. Table IV summarizes the computed MAPEs for our approach and the compared approach. Note that the average MAPEs for our approach have lower and upper bounds because the considered window size, $T$, ranges from one month to four months. As can be seen, the approach in [20] demonstrates an estimation accuracy similar to that of our approach. However, our approach does not require solar exemplars, which makes it more independent and practical.

TABLE IV
AVERAGE MAPE COMPARISON

| Approaches | Our Approach | Approach in [20] |
|---|---|---|
| Average MAPE of G (%) | [5.08, 5.47] | 5.24 |
| Average MAPE of P (%) | [2.87, 3.09] | 2.95 |

VI. CONCLUSION

This paper proposes an independent and practical BTM solar power/native demand estimation approach. Our proposed approach contains two interconnected layers. The aggregate level leverages the spatial correlation of native demand to perform the aggregate PV generation/native demand estimation. The customer level utilizes the spatial correlation of PV generation to allocate the estimated aggregate PV generation/native demand to individual customers. The Case Study verifies that our approach can accurately estimate BTM PV generation/native demand, significantly enhancing distribution system observability and situation awareness. The numerical experiments also demonstrate that our approach does not require meteorological data or measured solar power exemplars. Therefore, our approach is more independent and thus practical for utilities to implement.

REFERENCES

[1] J. Black and V. Rojo, “Long-term load forecast methodology overview,” Sep. 2019, https://www.iso-ne.com/static-assets/documents/2019/09/p1load forecast methodology.pdf.

[2] F. Wang, Z. Xuan, Z. Zhen, K. Li, T. Wang, and M. Shi, “A day-ahead PV power forecasting method based on LSTM-RNN model and time correlation modification under partial daily pattern prediction framework,” Energy Convers. Manage., vol. 212, no. 112766, pp. 1–14, May 2020.

[3] R. Seguin, J. Woyak, D. Costyk, J. Hambrick, and B. Mather, “High penetration PV integration handbook for distribution engineers,” Nat. Renew. Energy Lab., Golden, CO, USA, Tech. Rep. NREL/TP-5D00-63114, 2016.

[4] T. A. Short, Electric Power Distribution Handbook. Boca Raton, London, New York: CRC Press, 2014.

[5] V. Krishnan and J. D. McCalley, “Building foresight in long-term infrastructure planning using end-effect mitigation models,” IEEE Systems Journal, vol. 11, no. 4, pp. 2040–2051, 2017.

[6] W. Buehring, C. Huber, and J. Marques, “Expansion planning for electrical generating systems,” International Atomic Energy Agency, Vienna, Austria, Tech. Rep. STI/DOC/10/241, 1984.

[7] D. Chen and D. Irwin, “Sundance: Black-box behind-the-meter solar disaggregation,” in e-Energy, pp. 16–19, May 2017.

[8] Y. Wang, N. Zhang, Q. Chen, D. S. Kirschen, P. Li, and Q. Xia, “Data-driven probabilistic net load forecasting with high penetration of behind-the-meter PV,” IEEE Trans. Power Syst., vol. 33, no. 3, pp. 3255–3264, May 2018.

[9] F. Kabir, N. Yu, W. Yao, R. Yang, and Y. Zhang, “Estimation of behind-the-meter solar generation by integrating physical with statistical models,” in IEEE SmartGridComm., pp. 1–6, Oct. 2019.

[10] F. Kabir, N. Yu, W. Yao, R. Yang, and Y. Zhang, “Joint estimation of behind-the-meter solar generation in a community,” IEEE Trans. Sustain. Energy, vol. 12, no. 1, pp. 682–694, 2021.

[11] K. Li, F. Wang, Z. Mi, M. Fotuhi-Firuzabad, N. Duic, and T. Wang, “Capacity and output power estimation approach of individual behind-the-meter distributed photovoltaic system for demand response baseline estimation,” Appl. Energy, vol. 253, p. 113595, 2019.

[12] F. Wang, K. Li, X. Wang, L. Jiang, J. Ren, Z. Mi, M. Shafie-khah, and J. Catalao, “A distributed PV system capacity estimation approach based on support vector machine with customer net load curve features,” Energies, vol. 11, no. 7, p. 1750, Jul. 2018.

[13] C. Dinesh, S. Welikala, Y. Liyanage, M. P. B. Ekanayake, R. I. Godaliyadda, and J. Ekanayake, “Non-intrusive load monitoring under residential solar power influx,” Appl. Energy, vol. 205, pp. 1068–1080, Aug. 2017.

[14] F. Sossan, L. Nespoli, V. Medici, and M. Paolone, “Unsupervised disaggregation of photovoltaic production from composite power flow measurements of heterogeneous prosumers,” IEEE Trans. Ind. Informat., vol. 14, no. 9, pp. 3904–3913, Sep. 2018.

[15] H. Shaker, H. Zareipour, E. Muljadi, and D. Wood, “A data-driven approach for estimating the power generation of invisible solar sites,” IEEE Trans. Smart Grid, vol. 7, no. 5, pp. 2466–2476, Sep. 2016.

[16] E. C. Kara, C. M. Roberts, M. D. Tabone, L. Alvarez, D. S. Callaway, and E. M. Stewart, “Disaggregating solar generation from feeder-level measurements,” Sustain. Energy, Grids Netw., vol. 13, pp. 112–121, 2018.

[17] K. Li, J. Yan, L. Hu, F. Wang, and N. Zhang, “Two-stage decoupled estimation approach of aggregated baseline load under high penetration of behind-the-meter PV system,” IEEE Trans. Smart Grid, pp. 1–1, 2021.

[18] J. Lin, J. Ma, and J. Zhu, “A privacy-preserving federated learning method for probabilistic community-level behind-the-meter solar generation disaggregation,” IEEE Trans. Smart Grid, pp. 1–1, 2021.

[19] F. Bu, K. Dehghanpour, Y. Yuan, Z. Wang, and Y. Zhang, “A data-driven game-theoretic approach for behind-the-meter PV generation disaggregation,” IEEE Trans. Power Syst., vol. 35, no. 4, pp. 3133–3144, 2020.

[20] F. Bu, K. Dehghanpour, Y. Yuan, Z. Wang, and Y. Guo, “Disaggregating customer-level behind-the-meter PV generation using smart meter data and solar exemplars,” IEEE Trans. Power Syst., pp. 1–1, 2021.

[21] F. Bu, K. Dehghanpour, Y. Yuan, and Z. Wang, “Quantifying load uncertainty using real smart meter data,” in 2020 IEEE SmartGridComm, 2020, pp. 1–6.

[22] K. Nagasawa, C. R. Upshaw, J. D. Rhodes, C. L. Holcomb, D. A. Walling, and M. E. Webber, “Data management for a large-scale smart grid demonstration project in Austin, Texas,” in Proc. 6th Int. Conf. Energy Sustain., Jul. 23-26, 2012, pp. 1027–1031.

[23] Q. Zhang, Y. Guo, Z. Wang, and F. Bu, “Distributed optimal conservation voltage reduction in integrated primary-secondary distribution systems,” IEEE Trans. Smart Grid, pp. 1–1, 2021.

[24] R. Cheng, Z. Wang, Y. Guo, and F. Bu, “Analyzing photovoltaic’s impact on conservation voltage reduction in distribution networks,” Oct. 2021, arXiv:2110.14777.

[25] A. P. Dobos, “PVWatts version 5 manual,” Nat. Renew. Energy Lab., Golden, CO, USA, Tech. Rep. NREL/TP-6A20-62641, Sep. 2014.