
uSIM2020 - Building to Buildings: Urban and Community Energy Modelling, November 12th, 2020

Short-term Forecasting of Residential Electricity Demand Using CNN-LSTM

Ashkan Lotfipoor1, Sandhya Patidar2, David Jenkins2

1PhD Researcher, Institute for Infrastructure and Environment, Heriot-Watt University

2Associate Professor, Institute for Infrastructure and Environment, Heriot-Watt University

Abstract

Near-accurate forecasting of energy demand has become a non-trivial requirement for developing effective management and planning strategies/policies for a resilient energy system. This paper aims to develop a novel deep learning-based energy demand prediction model by combining convolutional neural networks and long short-term memory units. The proposed model consists of two one-dimensional convolutional layers with max pooling, two bidirectional LSTM layers and, finally, three fully connected dense layers. The energy consumption data available for a household based in Findhorn Ecovillage, located in the north of Scotland, for a six-week period during February and March 2015 was utilised for training, validating, and testing the models. The proposed model provides energy demand prediction for short-term forecasting (5 minutes). The results obtained from the model are compared against four classical and widely applied algorithms for time series forecasting: autoregressive integrated moving average (ARIMA), light gradient boosting machine (LightGBM), random forest (RF), and deep neural networks (DNN). The results demonstrate the effectiveness of the proposed architecture, which outperformed all of these well-established models.

Introduction

During the past decade, smart meters and grids have become widespread around the globe; for example, the number of smart meters installed in the UK reached 14.9 million at the end of June 2019 (Kerai, 2019). Smart meters provide a vast amount of information on energy supply and demand to researchers and industry stakeholders. Over the years, researchers have used these data to improve energy efficiency and sustainability on many frontiers, such as community energy modelling, energy management and energy forecasting. Near-accurate forecasting of energy demand could serve several purposes, including optimising the utilisation of available resources and identifying potential risks, and is essential for developing effective management and planning strategies/policies for a robust and resilient energy system.

ASHRAE breaks energy estimation models into two main categories: physics-based and data-driven models (Owen et al., 2009). Data-driven models can be divided into two types of methodologies: statistical models and machine learning algorithms.

In recent years, with the boom of artificial intelligence in various fields and the widespread use of machine learning algorithms, many researchers have started testing the potential application of these techniques for accurate forecasting of energy demand. For example, Robinson et al. (2017), in a comprehensive study, evaluated the performance of a linear regressor, an RBF-kernel support vector regressor (SVR), an AdaBoost regressor, a bagging regressor, a gradient boosting regressor, a random forest, a multilayer perceptron regressor (MLP regressor), and a K-nearest neighbour regressor. Theile et al. (2018) studied the performance of Support Vector Machines (SVMs) and recurrent neural networks for day-ahead electricity consumption prediction for a group of households; the models were trained on observed weather data while forecasting was performed using predicted weather data. Johannesen et al. (2019) compared three algorithms, namely random forest, K-nearest neighbour and a linear regressor, for forecasting electricity demand profiles of urban areas. Similarly, Huang et al. (2019) used XGBoost, extreme learning machines (ELM) and MLPs for energy demand prediction for residential buildings.

Deep learning has also drawn the attention of many researchers working in the area of energy demand prediction. Related examples include the work of Amarasinghe et al. (2017), who investigated the effectiveness of convolutional neural networks (CNNs) for energy load forecasting at the individual building level. Paterakis et al. (2017) compared the performance of MLPs with SVMs, Gaussian processes, regression trees, ensemble boosting and linear regression. Kim and Cho (2019a) used deep learning with a state-explainable autoencoder for the prediction of electricity consumption profiles. An autoencoder is a neural network that is trained to attempt to copy its input to its output; it learns efficient data representations (encodings) and produces an encoded vector representing the input sequence.


This vector is then provided as an input to the decoder model, which interprets it and generates the output sequence (Goodfellow et al., 2016). Moustris et al. (2020) explored the potential of MLPs for medium-, short- and very short-term forecasting of electricity load; their study utilises a human thermal comfort-discomfort biometeorological index as one of the critical features in forecasting. Kim and Cho (2019b) proposed a CNN-LSTM model, combining a CNN and an LSTM, for predicting electric energy consumption on the IHEPC dataset. Similarly, Le et al. (2019) used a CNN and Bi-LSTM network to predict energy consumption on the IHEPC dataset.

Despite these recent developments, little progress has been made on the robustness of such models. Many accurate machine learning models are comparable or even superior to traditional methods, but they have never been deployed as a solution for the public sector. Often, researchers use datasets that are expensive and complex to collect, so when stakeholders want to use these methods to solve their problems, they face the challenge of running these models. To bridge this gap, this paper aims to develop a prediction model that is easy to implement and yet capable of very accurate prediction. A further objective is to compare the developed methodology with existing methods.

The proposed model provides energy demand prediction (EDP) for short-term forecasting (5 minutes). The results obtained from the model are compared against four classical and widely applied algorithms: autoregressive integrated moving average (ARIMA), random forest, light gradient boosting machine (LightGBM), and deep neural networks (DNN).

The remainder of the paper is organised as follows: the Study Area and Data Collection section presents details about the dataset; the Methodology section introduces the different elements of the methodology and the approach for combining them into a workflow; the Results and Discussion section reports the key findings along with a critical discussion of the overall success and potential of the proposed methods, and ideas for future work.

Study Area and Data Collection

To train and test the developed models, high-resolution data gathered from the Findhorn Ecovillage site, located in northern Scotland, is used in this study. The demand data for Findhorn was collected as part of the Orchestration of Renewable Integrated Generation In Neighbourhoods (ORIGIN) project. For the case study, high-resolution data (five-minute interval) collected over a continuous period from February 2015 to March 2015 is used.

Figure 1: Findhorn Ecovillage position in Scotland

The selected case-study household is built over two floors, with a total area of approximately 96 m2. The readings collected from the smart meter relate solely to overall electricity usage, as the house uses a gas boiler for heating. The dynamics of the electricity demand profile for the case-study dwelling during the stated period are presented in Figure 2.

Figure 2: Demand graph from Feb 2015 to Mar 2015

Methodology

Deep Neural Networks

Perceptrons were developed in the 1950s and 1960s by Rosenblatt (1962), inspired by earlier work by McCulloch and Pitts (1943). A perceptron takes in a user-specified number of input units, x1, x2, ..., xn, each associated with a specific weight, w1, w2, ..., wn. The weights are multiplied by the input units and summed together to produce the logit of the neuron. The logit is then passed through an activation function to produce the output (Buduma and Locascio, 2017). To learn complex relationships, we need neurons that employ some nonlinearity. Today, it is more common to use other types of neurons, notably the sigmoid neuron. A sigmoid neuron is similar to a perceptron, except that the output is not 0 or 1; instead, it is σ(wx + b), where σ is the sigmoid function.
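To make the neuron arithmetic above concrete, here is a minimal NumPy sketch (the input, weight, and bias values are illustrative only, not taken from the paper):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative values: three input units with their weights and a bias.
x = np.array([0.5, -1.2, 3.0])   # input units x1..xn
w = np.array([0.8, 0.1, -0.4])   # weights w1..wn
b = 0.2                          # bias

logit = np.dot(w, x) + b         # weighted sum of the inputs (the logit)
output = sigmoid(logit)          # sigmoid neuron output, a value in (0, 1)
print(output)
```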


A standard neural network consists of many neurons, each producing a sequence of real-valued activations. For decades, researchers wanted to train deep multilayer neural networks, but no successful attempts were reported before 2006 (Bengio, 2009). Hinton et al. (2006) at the University of Toronto introduced Deep Belief Networks (DBNs), with a learning algorithm that greedily trains one layer at a time. Deep learning methods aim at learning feature hierarchies, with features at higher levels of the hierarchy formed by the composition of lower-level features (Bengio, 2009). Deep feedforward networks, also called feedforward neural networks or multilayer perceptrons (MLPs), are the quintessential deep learning models (Goodfellow et al., 2016).

Convolutional Neural Networks

Convolutional networks were inspired by the structure of the visual system. The first computational models based on these local connectivities between neurons and hierarchically organised transformations of the image are found in Fukushima (1980). Later, LeCun and collaborators designed and trained convolutional networks using the error gradient, obtaining state-of-the-art performance (LeCun et al., 1998). Convolution preserves the relationship between pixels by learning image features using small squares of input data; it is a mathematical operation that takes two inputs, such as an image matrix and a filter. The architecture of a CNN is designed to take advantage of the 2D structure of an input image. This is achieved with local connections and tied weights, followed by some form of pooling (average/maximum), which results in translation-invariant features. Similarly to computer vision tasks, in time series problems it is desirable to extract a small number of low-level features with a small receptive field across the entire input. This method can significantly improve the accuracy of the prediction system while keeping the computational cost within an acceptable range.
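As an illustration of a small receptive field sliding across a demand series, the following Keras sketch shows how a 1-D convolution with pooling down-samples a batch of windows (the batch size, filter count and window length here are illustrative, not the paper's final configuration):

```python
import numpy as np
from tensorflow.keras import layers, models

# A batch of univariate time-series windows: (samples, timesteps, channels).
x = np.random.rand(4, 35, 1).astype("float32")

extractor = models.Sequential([
    layers.Conv1D(filters=8, kernel_size=2, activation="relu",
                  input_shape=(35, 1)),  # local features, receptive field of 2
    layers.MaxPooling1D(pool_size=2),    # keep the strongest local responses
])

print(extractor(x).shape)  # (4, 17, 8): a down-sampled feature map
```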

Long Short-term Memory

The most effective sequence models used in deep learning are called gated recurrent neural networks. These include long short-term memory (LSTM) and gated recurrent units (GRU) (Goodfellow et al., 2016). To combat the problem of vanishing gradients (Hochreiter et al., 2001), Hochreiter and Schmidhuber (1997) introduced the LSTM architecture, adding self-loops to produce paths along which the gradient can flow for long durations. A crucial addition has been to make the weight on this self-loop conditioned on the context, rather than fixed (Gers et al., 2000).

LSTMs make small modifications to the information by multiplications and additions. With LSTMs, the information flows through a mechanism known as cell states, which allow LSTMs to let information through selectively. The gates are composed of a sigmoid neuron and a point-wise product. They act on the signals they receive and, similar to a neural network's nodes, block or pass on information based on its strength and import, which they filter with their own sets of weights.

Those weights, like the weights that modulate the input and hidden states, are adjusted via the learning process of recurrent networks. That is, the cells learn when to allow data to enter, leave or be deleted through the iterative process of making guesses, backpropagating error, and adjusting weights via gradient descent (Goodfellow et al., 2016).
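For reference, the standard LSTM update equations from the literature (not reproduced in the original paper) make this gating mechanism explicit; here $x_t$ is the input, $h_t$ the hidden state, $c_t$ the cell state, and $\odot$ the element-wise product:

$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \quad \text{(forget gate)}$$
$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \quad \text{(input gate)}$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \quad \text{(output gate)}$$
$$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c) \quad \text{(candidate state)}$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \quad \text{(cell-state self-loop)}$$
$$h_t = o_t \odot \tanh(c_t) \quad \text{(hidden state)}$$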

Figure 3: LSTM Architecture (Goodfellow et al., 2016)

Model Architecture

In many recent studies, both CNNs and LSTMs have shown significant performance gains over deep neural networks (DNNs) across a variety of tasks (Huang and Kuo, 2018; Ding et al., 2018; Núñez et al., 2018; Zhao et al., 2019; Bhunia et al., 2019). CNNs, LSTMs and DNNs are complementary in their modelling capabilities: CNNs are good at reducing frequency variations, LSTMs are good at sequence modelling, and DNNs are appropriate for mapping features to a more separable space (Sainath et al., 2015). In this paper, a deep neural network is developed that uses this complementarity to predict energy demand.

Figure 4 shows the complete architecture of the developed model. The two convolutional layers extract features from the input variables (energy demand at a 5-minute interval). The convolutional layers develop a feature map of the input variables, which improves the accuracy of the model in comparison with vanilla LSTM networks. After each convolutional layer there is a max-pooling layer, which acts as a tool to reduce overfitting and to minimise the computational cost by reducing the number of parameters to learn. Max pooling is a sample-based discretisation process whose objective is to down-sample an input representation and reduce its dimensionality (Kulkarni and Satapathy, 2019); by using a pooling layer, only the features with the highest value and importance are selected. After flattening (converting a matrix to a single array) the output of the convolutional layers, the information is passed to the two Bi-LSTM layers to detect patterns over long periods in both directions. Finally, the output of these layers is passed to three fully connected dense layers, with a dropout layer in between, to make the prediction.


Figure 4: CNN-LSTM Architecture

The proposed methodology was implemented to predict the next (five-minute) time step of consumption. To do this, the 35 previous observations were fed into the model as lagged features. This number was selected through a grid search to obtain the optimum window size, maximising performance while minimising computational cost. It is worth mentioning that the predicted value is not used for forecasting the next time step.
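A minimal sketch of this lagged-feature construction follows; the function name and the dummy series are illustrative, with only the window length of 35 taken from the text:

```python
import numpy as np

def make_lagged_windows(series, window=35):
    """Turn a 1-D demand series into (samples, window, 1) inputs
    and next-step targets for one-step-ahead forecasting."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])   # the 35 previous observations
        y.append(series[i + window])     # the next five-minute value
    X = np.asarray(X)[..., np.newaxis]   # add a channel axis for Conv1D
    return X, np.asarray(y)

# Example with a dummy demand series.
demand = np.random.rand(1000)
X, y = make_lagged_windows(demand, window=35)
print(X.shape, y.shape)  # (965, 35, 1) (965,)
```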

For the CNN layers, a filter size of 128 was selected as the optimum number, with a kernel size of 2. The activation function for these layers was the rectified linear unit (ReLU) function, which is the default recommendation in modern neural networks (Goodfellow et al., 2016). As explained earlier, for the pooling phase of the network, max pooling was selected with a pool size of 2. Once the CNN layers had produced their feature maps and the output had been flattened, it was sent to two Bi-LSTM layers.

A bidirectional RNN consists of two independent RNNs combined. This structure allows the network to have both backward and forward information about the sequence at each time step: bidirectional layers run the inputs in two ways, one from past to future and one from future to past, so the network can, at any point in time, preserve information from both the past and the future. Based on our experiments, this architecture helps the model to better understand the patterns in the demand load. Next, for regularising the network, a dropout layer with a dropout rate of 0.3 was used after the second Bi-LSTM layer.

In the developed model, three fully connected dense layers with 50, 10 and 1 neurons, respectively, were used. These layers used the ReLU activation function. For compiling the model, the Adaptive Moment Estimation (Adam) algorithm (Kingma and Ba, 2014) was used as the gradient-based optimiser; the value of the learning rate was obtained by first running the model with a learning rate scheduler. Mean square error (MSE) was used as the loss function for training the model, and root mean square error (RMSE) and mean absolute error (MAE) were used as the metrics for evaluating its performance. Furthermore, early stopping was used to prevent the model from overfitting the training data. The model was trained for 200 epochs with a batch size of 256. In total, 1,385,275 parameters were trained during the model development phase. The model was developed using the Keras neural-network library with a TensorFlow 2.1.0 backend in Python.
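Putting these pieces together, the following Keras sketch reconstructs the described architecture under stated assumptions: the LSTM unit counts and learning rate are placeholders (the paper does not state them), the final dense layer is given a linear output as a common choice for regression, and the convolutional feature map is passed directly to the Bi-LSTM layers, which is the dimensionally consistent reading of the flattening step. Details may therefore differ from the authors' implementation.

```python
from tensorflow.keras import layers, models, callbacks, optimizers

def build_cnn_lstm(window=35):
    """CNN-LSTM as described in the text; LSTM unit counts are assumed."""
    model = models.Sequential([
        # Feature extraction: two 1-D convolutions, each with max pooling.
        layers.Conv1D(128, kernel_size=2, activation="relu",
                      input_shape=(window, 1)),
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(128, kernel_size=2, activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        # Sequence modelling: two bidirectional LSTM layers.
        layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
        layers.Bidirectional(layers.LSTM(64)),
        layers.Dropout(0.3),  # regularisation after the second Bi-LSTM
        # Mapping to the forecast: dense layers with 50, 10 and 1 neurons.
        layers.Dense(50, activation="relu"),
        layers.Dense(10, activation="relu"),
        layers.Dense(1),  # linear output assumed here
    ])
    # The learning rate is a placeholder; the authors tuned it with a scheduler.
    model.compile(optimizer=optimizers.Adam(learning_rate=1e-4),
                  loss="mse", metrics=["mae"])
    return model

model = build_cnn_lstm()
early_stop = callbacks.EarlyStopping(patience=10, restore_best_weights=True)
# model.fit(X_train, y_train, validation_split=0.2,
#           epochs=200, batch_size=256, callbacks=[early_stop])
```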

Evaluation Index of Model Performance

The values predicted by the proposed model are evaluated using three performance metrics for regression models: MAE, MSE and RMSE. Furthermore, the residual error was calculated for the test data to evaluate the performance of the model by plotting its distribution. In regression analysis, the difference between the observed value of the dependent variable (y_i) and the predicted value (x_i) is called the residual error (e):

$$e = y_i - x_i$$

MAE measures the average magnitude of the errors in the predicted values, without considering their direction. It is a linear score, which means that all the individual differences are weighted equally in the average. MAE is calculated as follows:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - x_i\right|$$

MSE measures the average of the squares of the errors; in other words, it is the average squared difference between the predicted values and the actual values. The equation for MSE is as follows:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - x_i\right)^2$$

RMSE is the standard deviation of the residual errors. It is the square root of the average of the squared differences between predictions and actual values. The RMSE is calculated as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - x_i\right)^2}$$


Table 1: Performance of different algorithms on the dataset

        ARIMA          Random Forest   LightGBM        DNN             CNN-LSTM        C24
        Train   Test   Train   Test    Train   Test    Train   Test    Train   Test    Train   Test
MAE     0.230   -      0.207   0.205   0.245   0.276   0.202   0.201   0.200   0.192   -       0.180
MSE     0.227   -      0.200   0.298   0.154   0.225   0.211   0.209   0.206   0.214   -       0.151
RMSE    0.477   -      0.459   0.476   0.493   0.475   0.479   0.469   0.454   0.463   -       0.389

Results and Discussion

The performance of the developed network is evaluated against one mainstream statistical method (ARIMA) and three established machine learning algorithms, namely random forest, light gradient boosting machine (LightGBM) (Ke et al., 2017) and a DNN. LightGBM is a gradient boosting framework that uses tree-based learning algorithms. For the ARIMA model, the values of the hyperparameters (p, d, q) were chosen using the autocorrelation function (ACF) and the partial autocorrelation function (PACF). For the random forest and LightGBM models, the hyperparameters were tuned by grid search so that the maximum performance of the individual models was achieved; moreover, cross-validation was performed to check the degree of generalisation of the models. For the DNN, four dense layers with 15, 10, 5 and 1 neurons were used, with similar hyperparameter values (activation function = ReLU, optimisation algorithm = Adam with a learning rate of 1e-4). All models used the same 80/20 ratio for splitting the dataset into training and testing sets; for the neural networks, a train/validation/test configuration was used, with 20% of the training data serving as the validation set. Table 1 summarises the results (MAE, MSE and RMSE) obtained from all the algorithms for the training and testing sets.

Although the differences in performance between the models are not substantial, the CNN-LSTM network provides the lowest values in all three evaluation metrics, outperforming all other models. DNNs require large training datasets because of the considerable number of parameters that need to be tuned by a learning algorithm: the network starts from a poor initial state, and an optimisation algorithm such as stochastic gradient descent or Adam is used to converge the network to an optimal position. So, given more data, the CNN-LSTM network would likely outperform the other models more substantially.

It can also be argued that the CNN-LSTM network generalises best to unseen data, given the small difference between the RMSE on the training and test data, which increases the robustness and generality of the model. The reason for this could be the measures implemented to prevent the model from overfitting, such as the max-pooling layers, the dropout layer, and early stopping. Furthermore, it is worth noting that all the machine learning algorithms outperform the statistical model used in this study, showing substantial promise for machine learning algorithms in time series forecasting of electricity demand load.
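As an illustration of this benchmarking setup, the following sketch shows a chronological 80/20 split and a grid search for one of the baselines; the parameter grid, the use of TimeSeriesSplit for cross-validation, and the dummy data are assumptions, not the authors' actual search space:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# X, y: lagged windows and next-step targets, as built earlier.
# For a time series, the split is chronological rather than shuffled.
X = np.random.rand(1000, 35)
y = np.random.rand(1000)
split = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [5, 10, None]},
    cv=TimeSeriesSplit(n_splits=5),  # cross-validation on the training data
    scoring="neg_root_mean_squared_error",
)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.score(X_test, y_test))
```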

Figure 5: Predicted values and actual values for a day

Figure 5 depicts predictions from all the models against the actual measurements in the testing set.


As can be seen, RF and CNN-LSTM are able to follow the general trend of the data, showing a capability for generalisation. The ARIMA model also predicts accurately; however, it has seen the test data during model development, and if a model sees the test data during the development phase, its accuracy and generalisation cannot be adequately assessed.

Although the most commonly applied metrics were used here to assess the accuracy of the model predictions, it is particularly difficult to conclude whether the predictions are consistently accurate. For this reason, the probability density function (PDF) of the residual errors, depicted in Figure 6, was created to check the distribution of the error. A symmetrically distributed probability density function with a mean close to zero and a small standard deviation indicates more accurate and reliable results. The PDF of CNN-LSTM has a mean closer to zero than that of any other method.
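A minimal sketch of how such a residual distribution can be inspected; the residuals here are synthetic stand-ins for the test-set errors, and the plotting choices are illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt

# Dummy residuals standing in for (observed - predicted) on the test set.
rng = np.random.default_rng(0)
residuals = rng.normal(loc=0.01, scale=0.45, size=2000)

print(f"mean={residuals.mean():.4f}, std={residuals.std():.4f}")
plt.hist(residuals, bins=50, density=True)  # empirical PDF of the errors
plt.axvline(0.0, linestyle="--")            # a reliable model centres near zero
plt.xlabel("Residual error")
plt.ylabel("Density")
plt.show()
```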

Figure 6: Histogram of Residual Error on test data

Furthermore, to evaluate the efficiency of the developed model on unseen data, a different dwelling from Findhorn village was selected for further testing. It is worth noting that the chosen building has a similar construction and overall design to the building used for training the model. The results (under C24 in Table 1) show even better performance of the developed model on this dataset: the proposed model predicted the demand for this building with an RMSE of 0.389. As mentioned earlier, the CNN-LSTM network outperforms the other approaches; however, the method needs to be tested on several different datasets, with many different architectures (number of neurons, layers, etc.), to accurately analyse the effectiveness of this algorithm for generating near-accurate energy demand predictions.

Conclusion

This study was conducted to investigate the performance of a CNN-LSTM model for generating energy demand forecasts for residential buildings. The presented model was trained and tested on the demand load of a building in Findhorn Ecovillage in Scotland and then further validated through an independent application to another building in the same area.

The novel structure of the proposed model consists of two one-dimensional convolutional layers with max pooling, two bidirectional LSTM layers and three fully connected dense layers. Furthermore, the performance of the model was tested against ARIMA, RF, LightGBM and a DNN to see whether CNN-LSTM shows any advantage for this problem. The results obtained demonstrated the effectiveness of the proposed architecture, which outperformed all of these well-established models: the RMSE value measured on the test data for CNN-LSTM was the lowest among the algorithms investigated herein. Thus, it can be concluded that, for the present investigation, the CNN-LSTM architecture performs well in generating accurate load forecasts. A more detailed investigation is needed to explore the further potential of the proposed model. For future work, the authors plan to implement a multivariate approach to further improve the load forecasting abilities, using a variety of features such as weather-related data and building information in parallel. In addition, it would be interesting to see whether adding more datasets to the training and evaluation steps of the model development could achieve even better performance. Furthermore, future work will focus on the real-world problem of multi-step forecasting for household electricity consumption.

Acknowledgement

The authors would like to acknowledge the Findhorn Foundation and the academic and industry partners within the Centre for Energy Systems Integration (CESI).

References

Amarasinghe, K., D. L. Marino, and M. Manic (2017). Deep neural networks for energy load forecasting. In 2017 IEEE 26th International Symposium on Industrial Electronics (ISIE), pp. 1483–1488.

Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends in Machine Learning 2(1), 1–127.

Bhunia, A. K., A. Konwer, A. K. Bhunia, A. Bhowmick, P. P. Roy, and U. Pal (2019). Script identification in natural scene image and video frames using an attention based convolutional-LSTM network. Pattern Recognition 85, 172–184.

Buduma, N. and N. Locascio (2017). Fundamentals of Deep Learning: Designing Next-Generation Machine Intelligence Algorithms. O'Reilly Media, Inc.

Ding, L., W. Fang, H. Luo, P. E. D. Love, B. Zhong, and X. Ouyang (2018). A deep hybrid learning model to detect unsafe behavior: Integrating convolution neural networks and long short-term memory. Automation in Construction 86, 118–124.

Fukushima, K. (1980). Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics 36(4), 193–202.

Gers, F. A., J. A. Schmidhuber, and F. A. Cummins (2000). Learning to forget: Continual prediction with LSTM. Neural Computation 12(10), 2451–2471.

Goodfellow, I., Y. Bengio, and A. Courville (2016). Deep Learning. MIT Press.

Hinton, G. E., S. Osindero, and Y.-W. Teh (2006). A fast learning algorithm for deep belief nets. Neural Computation 18(7), 1527–1554.

Hochreiter, S., Y. Bengio, P. Frasconi, and J. Schmidhuber (2001). Gradient flow in recurrent nets: The difficulty of learning long-term dependencies. Volume 6, pp. 237–243. IEEE Press.

Hochreiter, S. and J. Schmidhuber (1997). Long short-term memory. Neural Computation 9(8), 1735–1780.

Huang, C. J. and P. H. Kuo (2018). A deep CNN-LSTM model for particulate matter (PM2.5) forecasting in smart cities. Sensors 18(7).

Huang, Y., Y. Yuan, H. Chen, J. Wang, Y. Guo, and T. Ahmad (2019). A novel energy demand prediction strategy for residential buildings based on ensemble learning. In Energy Procedia, Volume 158, pp. 3411–3416.

Johannesen, N. J., M. Kolhe, and M. Goodwin (2019). Relative evaluation of regression tools for urban area electrical energy demand forecasting. Journal of Cleaner Production 218, 555–564.

Ke, G., Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T.-Y. Liu (2017). LightGBM: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems, pp. 3146–3154.

Kerai, M. (2019). Statistical release and data: Smart meters, Great Britain, quarter 4 2019. Report, Department for Business, Energy and Industrial Strategy.

Kim, J. Y. and S. B. Cho (2019a). Electric energy consumption prediction by deep learning with state explainable autoencoder. Energies 12(4), 739.

Kim, T.-Y. and S.-B. Cho (2019b). Predicting residential energy consumption using CNN-LSTM neural networks. Energy 182, 72–81.

Kingma, D. P. and J. Ba (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Kulkarni, A. and S. Satapathy (2019). Optimization in Machine Learning and Applications. Springer Singapore.

Le, T., M. T. Vo, B. Vo, E. Hwang, S. Rho, and S. W. Baik (2019). Improving electric energy consumption prediction using CNN and Bi-LSTM. Applied Sciences 9(20).

LeCun, Y., L. Bottou, Y. Bengio, and P. Haffner (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324.

McCulloch, W. S. and W. Pitts (1943). A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics 5(4), 115–133.

Moustris, K., K. A. Kavadias, D. Zafirakis, and J. K. Kaldellis (2020). Medium, short and very short-term prognosis of load demand for the Greek island of Tilos using artificial neural networks and human thermal comfort-discomfort biometeorological data. Renewable Energy 147, 100–109.

Núñez, J. C., R. Cabido, J. J. Pantrigo, A. S. Montemayor, and J. F. Vélez (2018). Convolutional neural networks and long short-term memory for skeleton-based human activity and hand gesture recognition. Pattern Recognition 76, 80–94.

Owen, M. (Ed.) (2009). 2009 ASHRAE Handbook: Fundamentals. American Society of Heating, Refrigeration and Air-Conditioning Engineers.

Paterakis, N. G., E. Mocanu, M. Gibescu, B. Stappers, and W. v. Alst (2017). Deep learning versus traditional machine learning methods for aggregated energy demand prediction. In 2017 IEEE PES Innovative Smart Grid Technologies Conference Europe (ISGT-Europe), pp. 1–6.

Robinson, C., B. Dilkina, J. Hubbs, W. Zhang, S. Guhathakurta, M. A. Brown, and R. M. Pendyala (2017). Machine learning approaches for estimating commercial building energy consumption. Applied Energy 208, 889–904.

Rosenblatt, F. (1962). Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Spartan Books.

Sainath, T. N., O. Vinyals, A. Senior, and H. Sak (2015). Convolutional, long short-term memory, fully connected deep neural networks. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4580–4584.


Theile, P., A. L. Towle, K. Karnataki, A. Crosara, K. Paridari, G. Turk, and L. Nordstrom (2018). Day-ahead electricity consumption prediction of a population of households: Analyzing different machine learning techniques based on real data from RTE in France. In 2018 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm 2018).

Zhao, J., X. Mao, and L. Chen (2019). Speech emotion recognition using deep 1D and 2D CNN LSTM networks. Biomedical Signal Processing and Control 47, 312–323.