ELECTRICITY LOAD FORECASTING WITH ARTIFICIAL NEURAL NETWORKS

Artificial neural networks are powerful tools for time series forecasting. The problem addressed in this article is to do multi-step prediction of a stationary time series, and to find the associated prediction limits. Artificial neural network models for time series are non-linear. However, results that are applicable to linear models are sometimes mistakenly applied to non-linear models. One example where this is observed is in multi-step forecasting. A bootstrap method is proposed to calculate one- and multi-step predictions and prediction limits. The results are applied to an electricity load time series as well as to a pure autoregressive time series.


INTRODUCTION
Artificial neural networks are powerful tools for time series forecasting. See [10], [6], [8] and [17] for various applications. The problem addressed in this article is to do multi-step prediction of a stationary time series, and to find the associated prediction limits. Artificial neural network models for time series are non-linear. However, results that are applicable to linear models are sometimes mistakenly applied to non-linear models. One example where this is observed is in multi-step forecasting.

Let $\{Z_t\}$ represent a stationary time series. Suppose that the observation at time t can be described by the following model:

$$Z_t = g(Z_{t-1}, Z_{t-2}, \ldots, Z_{t-p}; \theta) + \varepsilon_t \qquad (1)$$

where $\theta$ is a vector of parameters and $\varepsilon_t$ is a random error term. The minimum mean square error forecast of a future observation is the expected value of the future value, given the observed time series [1]. Let the vector $\mathbf{Z}_n = (Z_1, \ldots, Z_n)$ represent the n observations. At time n (which is the number of terms in the time series), the h-step forecast of $Z_{n+h}$, given observations $\mathbf{Z}_n$, is the conditional expected value

$$\hat{Z}_n(h) = E[Z_{n+h} \mid \mathbf{Z}_n]. \qquad (2)$$

From (1) it follows that

$$Z_{n+h} = g(Z_{n+h-1}, \ldots, Z_{n+h-p}; \theta) + \varepsilon_{n+h}. \qquad (3)$$

The notation h is used to indicate the number of time steps of the forecast. It is assumed that forecasts are made from time n, in other words, the observed time series is used up to its last value to produce forecasts. $\hat{Z}_n(1)$ is, for instance, the forecast of the future value $Z_{n+1}$, which is unknown at time n, and likewise, $\hat{Z}_n(2)$ is the forecast of the future value $Z_{n+2}$ at time n. The number of previous time series observations used to explain the current observation is denoted by p and it also corresponds to the number of input terms in a neural network. If this model is used for electricity load forecasting on an hourly basis, the loads of the p previous hours will be used to forecast the current load.
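Therefore, using (2) and (3), the forecast of $Z_{n+h}$, given observations $\mathbf{Z}_n$, is

$$\hat{Z}_n(h) = E[g(Z_{n+h-1}, \ldots, Z_{n+h-p}; \theta) \mid \mathbf{Z}_n]. \qquad (4)$$

In the case of a linear model, forecasts can be calculated recursively, by using forecasts calculated in previous steps, since the expected value of a linear function is the linear function of the expected values:

$$E[g(Z_{n+h-1}, \ldots, Z_{n+h-p}; \theta) \mid \mathbf{Z}_n] = g(E[Z_{n+h-1} \mid \mathbf{Z}_n], \ldots, E[Z_{n+h-p} \mid \mathbf{Z}_n]; \theta), \qquad (5)$$

$$E[Z_{n+h-j} \mid \mathbf{Z}_n] = \hat{Z}_n(h-j), \quad j = 1, \ldots, p. \qquad (6)$$

Therefore, from equations (4), (5) and (6), the h-step forecast in the linear case can be written as

$$\hat{Z}_n(h) = g(\hat{Z}_n(h-1), \ldots, \hat{Z}_n(h-p); \theta). \qquad (7)$$

It can be seen that the forecast depends on forecasts of the preceding time points, $\hat{Z}_n(h-1), \ldots, \hat{Z}_n(h-p)$. In neural network terminology, outputs from the neural network are fed back as inputs to produce subsequent forecasts.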
However, if the function defined by the neural network architecture, $g(Z_{t-1}, Z_{t-2}, \ldots, Z_{t-p}; \theta)$, is non-linear, equation (5) is not true. Even so, it is often seen in neural network literature that forecasts produced by the network are used as inputs for subsequent forecasts [8]. A number of different approaches to obtain one-step and multi-step forecasts for non-linear models are discussed by [11]. One of the methods, namely the bootstrap, is used in this article to produce multi-step forecasts.
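The failure of equation (5) for a non-linear function can be illustrated numerically. The sketch below is not taken from the article: the function `g` (a one-lag `tanh` map standing in for a trained network), the last observation `z_n` and the error standard deviation `sigma` are all hypothetical values chosen for illustration. It compares the two-step value obtained by feeding the one-step forecast back into `g` with a Monte Carlo approximation of the conditional expectation.

```python
import numpy as np


def g(z):
    # Hypothetical non-linear one-lag map standing in for a trained network.
    return np.tanh(2.0 * z)


rng = np.random.default_rng(0)
z_n = 0.5      # assumed last observed value
sigma = 0.5    # assumed standard deviation of the error term

# One-step forecast: exact, because z_n is known.
one_step = g(z_n)

# Two-step value obtained by feeding the one-step forecast back in.
plug_in = g(one_step)

# Two-step forecast as a conditional expectation: average g over
# simulated values of the unknown Z_{n+1} = one_step + error.
errors = rng.normal(0.0, sigma, size=200_000)
monte_carlo = g(one_step + errors).mean()

print("plug-in two-step value :", plug_in)
print("expected two-step value:", monte_carlo)
```

Under these assumed values the plug-in value is noticeably larger than the simulated expectation, which is the bias that arises when equation (5) is applied to a non-linear model.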
Since no forecast is complete without standard error and prediction limits, it is also shown how these quantities can be calculated with bootstrap methods. In [13] confidence intervals for linear models based on bootstrap methodology are proposed. The results are extended to non-linear models in this article. Confidence intervals are derived for a time series that is generated by an autoregressive process. The results are also applied to find predictions and confidence limits for an electricity load time series.

FORECASTING WITH AUTOREGRESSIVE MODELS
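In the case of an autoregressive model of order p or AR(p) model (see [1]), which is linear, the h-step forecast (see equation (7)) is

$$\hat{Z}_n(h) = \phi_1 \hat{Z}_n(h-1) + \phi_2 \hat{Z}_n(h-2) + \cdots + \phi_p \hat{Z}_n(h-p), \qquad (8)$$

where $\phi_1, \ldots, \phi_p$ are the autoregressive coefficients and $\hat{Z}_n(h-j) = Z_{n+h-j}$ if $h \le j$. In that case, the observed time series value is used. Using the property of conditional expectation, it follows that, for example,

$$\hat{Z}_n(1) = \phi_1 Z_n + \phi_2 Z_{n-1} + \cdots + \phi_p Z_{n-p+1}, \qquad \hat{Z}_n(2) = \phi_1 \hat{Z}_n(1) + \phi_2 Z_n + \cdots + \phi_p Z_{n-p+2}.$$

The forecasts are calculated recursively, and converge to the mean of the time series for large values of h.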
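Assuming that the distribution of $\varepsilon_t$ is normal, the distribution of $Z_{n+h}$, given $\mathbf{Z}_n$, is normal with mean $\hat{Z}_n(h)$ and variance

$$\sigma_\varepsilon^2 \left(1 + \theta_1^2 + \theta_2^2 + \cdots + \theta_{h-1}^2\right),$$

where $\sigma_\varepsilon^2$ is the variance of $\varepsilon_t$ and the $\theta_j$'s are the weights in the moving average representation of $Z_t$ (see reference [1]):

$$Z_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots$$

The $100(1-\alpha)\%$ probability limits for $Z_{n+h}$ are given by

$$\hat{Z}_n(h) \pm z_{\alpha/2}\, \sigma_\varepsilon \sqrt{1 + \theta_1^2 + \cdots + \theta_{h-1}^2},$$

where $z_{\alpha/2}$ is the $100(1-\alpha/2)$-th percentile of the standard normal distribution.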
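The recursive forecasts and normal-theory limits for an AR(p) model can be sketched as follows. This is a minimal illustration and not the computation used in the article: the function name `ar_forecast` and its arguments are hypothetical, the coefficients and error standard deviation are assumed to be known, and the model is written without a constant term. The moving average weights are obtained from the standard recursion $w_0 = 1$, $w_j = \sum_{i=1}^{\min(j,p)} \phi_i w_{j-i}$.

```python
import numpy as np
from statistics import NormalDist


def ar_forecast(z, phi, sigma, h_max, alpha=0.05):
    """Recursive AR(p) forecasts with normal-theory probability limits.

    z     : observed series, most recent value last
    phi   : assumed AR coefficients (phi_1, ..., phi_p)
    sigma : assumed standard deviation of the error term
    """
    p = len(phi)
    history = list(z[-p:])
    forecasts = []
    for _ in range(h_max):
        recent = history[::-1][:p]            # Z_{t-1}, ..., Z_{t-p}
        f = float(np.dot(phi, recent))
        forecasts.append(f)
        history.append(f)                     # feed the forecast back in

    # Weights of the moving average representation.
    w = [1.0]
    for j in range(1, h_max):
        w.append(sum(phi[i - 1] * w[j - i] for i in range(1, min(j, p) + 1)))

    # Forecast error variance for h = 1, ..., h_max.
    variance = sigma ** 2 * np.cumsum(np.square(w))
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    forecasts = np.array(forecasts)
    half_width = z_crit * np.sqrt(variance)
    return forecasts, forecasts - half_width, forecasts + half_width


forecasts, lower, upper = ar_forecast([1.0, 2.0], phi=[0.5], sigma=1.0, h_max=3)
print(forecasts, lower, upper)
```

Because the limits depend only on the cumulative sum of squared weights, the interval width increases monotonically with h for the linear model.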

MULTI-STEP FORECASTS AND CONFIDENCE INTERVALS FOR NEURAL NETWORK MODELS
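Neural network models are non-linear, and the expected value of a non-linear function is generally not of a simple form. In the case where the model in equation (1) is non-linear, the observation at time n+1 is

$$Z_{n+1} = g(Z_n, Z_{n-1}, \ldots, Z_{n-p+1}; \theta) + \varepsilon_{n+1}$$

and the minimum MSE single-step (h=1) forecast (from equation (4)) is

$$\hat{Z}_n(1) = E[g(Z_n, \ldots, Z_{n-p+1}; \theta) \mid \mathbf{Z}_n] = g(Z_n, \ldots, Z_{n-p+1}; \theta), \qquad (12)$$

since the expected value of a function of known values (in this case $Z_n, \ldots, Z_{n-p+1}$) is equal to the function of the known values. The maximum likelihood estimator of $\hat{Z}_n(1)$ is

$$g(Z_n, \ldots, Z_{n-p+1}; \hat{\theta}), \qquad (13)$$

where $\hat{\theta}$ is an estimator of $\theta$ which is obtained by maximising the likelihood function of the observations, or in neural network terminology, the vector $\hat{\theta}$ represents the weights of the trained network.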
The 2-, 3-, … step forecasts are not of such a simple form, since they depend on the future time series observations that are unknown. Consider

$$\hat{Z}_n(2) = E[g(Z_{n+1}, Z_n, \ldots, Z_{n-p+2}; \theta) \mid \mathbf{Z}_n], \qquad (16)$$

which is the expected value of a non-linear function. Instead of calculating this expected value, which may be a complex integral, bootstrap methodology can be used to estimate the minimum MSE forecasts.
Bootstrapping is a resampling method introduced by Efron [4]. Different methods have been proposed for bootstrapping in the context of time series [2], [7], [9], [14], [15]. In this article, the method based on residuals is used ([3], [9] and [5]). The residuals of the neural network are the differences between observed time series values and the corresponding values predicted by the neural network. The idea is to generate a large number of time series from the same population as the observed time series, called bootstrap time series. For each bootstrap time series, some statistic of interest is calculated. In this way, a large sample of possible realizations of the statistic is obtained, called the empirical distribution of the statistic. The empirical distribution of the statistic is an estimator of the true sampling distribution. The variance of the statistic is, for instance, estimated by the variance of the empirical distribution.
In this application, the statistic is a future value of the time series. The residuals of the neural network and the trained network are used to produce many different realizations of the time series from the point n+1 onwards. It is possible to generate a whole distribution of possible values at each time point. The mean of the empirical distribution is an estimate of the forecast and the percentiles are used as confidence limits.
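The two-step forecast in equation (16) depends on $Z_{n+1}$. Since $Z_{n+1}$ is unknown, it has to be estimated. A possible realization of $Z_{n+1}$ can be obtained by using the one-step forecast and adding a typical error term to it. Recall from (12) and (13) that the estimated one-step forecast is $g(Z_n, \ldots, Z_{n-p+1}; \hat{\theta})$. A possible realization of $Z_{n+1}$ is given by:

$$Z^*_{n+1}(j) = g(Z_n, \ldots, Z_{n-p+1}; \hat{\theta}) + e^*_1(j). \qquad (17)$$

The error terms, denoted by $e^*_1(j)$, j=1,2,…,m, are observations (drawn randomly with replacement) from the residuals of the model, $e_t = Z_t - g(Z_{t-1}, \ldots, Z_{t-p}; \hat{\theta})$, with $t = p+1, \ldots, n$. The outputs of the neural network corresponding with the m input sets $(Z^*_{n+1}(j), Z_n, \ldots, Z_{n-p+2})$ are averaged to get the two-step forecast:

$$\hat{Z}_n(2) = \frac{1}{m} \sum_{j=1}^{m} g(Z^*_{n+1}(j), Z_n, \ldots, Z_{n-p+2}; \hat{\theta}).$$

This approach can be extended to three steps, four steps, and so on. Suppose one-step, two-step, up to h-step forecasts are required. Following the above procedure, the j-th bootstrap (j=1,2,…,m) time series is generated as follows (see (17)):

$$Z^*_{n+1}(j) = g(Z_n, Z_{n-1}, \ldots, Z_{n-p+1}; \hat{\theta}) + e^*_1(j)$$
$$Z^*_{n+2}(j) = g(Z^*_{n+1}(j), Z_n, \ldots, Z_{n-p+2}; \hat{\theta}) + e^*_2(j)$$
$$\vdots$$
$$Z^*_{n+h}(j) = g(Z^*_{n+h-1}(j), Z^*_{n+h-2}(j), \ldots; \hat{\theta}) + e^*_h(j)$$

A total of m bootstrap series is required. In general, the forecast of a future observation is the average network output associated with the p most recent observations across the m bootstrap time series. These may be only bootstrap time series values (if h>p):

$$\hat{Z}_n(h) = \frac{1}{m} \sum_{j=1}^{m} g(Z^*_{n+h-1}(j), \ldots, Z^*_{n+h-p}(j); \hat{\theta})$$

or a combination of bootstrap time series values and actual observations (if h≤p):

$$\hat{Z}_n(h) = \frac{1}{m} \sum_{j=1}^{m} g(Z^*_{n+h-1}(j), \ldots, Z^*_{n+1}(j), Z_n, \ldots, Z_{n+h-p}; \hat{\theta}).$$

The procedure can be summarized in a number of steps:
• Train the neural network on the time series, in other words, fit the model (1) to the time series $Z_1, \ldots, Z_n$.
• Generate a bootstrap time series $Z^*_{n+1}, Z^*_{n+2}, \ldots, Z^*_{n+h}$ by feeding the p most recent values (observed or bootstrap) into the trained network and adding an error term $e^*$ to each output, where $e^*$ is an observation (drawn randomly with replacement) from the residuals $e_{p+1}, \ldots, e_n$.
• Repeat the previous step m times, where the minimum value for m is 1000. A minimum of 1000 observations is required for the calculation of the percentiles. The 2.5th and 97.5th percentiles are on the left and right extremes of the distribution. The generated bootstrap time series values will tend to be close to the mean, with only a few extreme values. A relatively large sample is therefore required to estimate values in the extremes of the distribution more accurately. There are therefore m bootstrap time series with possible values for $Z_{n+1}, \ldots, Z_{n+h}$.
• The 2.5th and 97.5th percentiles of $Z^*_{n+h}(j)$, j=1,2,…,m, are the 95% confidence limits associated with the h-step forecast.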
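The steps above can be sketched in code. The following is a minimal illustration under stated assumptions, not the implementation used in the article: `predict` is a hypothetical callable standing in for the trained network (it receives the p most recent values, most recent first, and returns one output), and `residuals` is assumed to hold the residuals of the fitted model. The function name `bootstrap_forecast` and its arguments are likewise illustrative.

```python
import numpy as np


def bootstrap_forecast(predict, z, residuals, p, h_max, m=1000, seed=0):
    """Residual-bootstrap forecasts and 95% percentile limits.

    predict   : hypothetical stand-in for the trained network
    z         : observed series, most recent value last
    residuals : residuals of the fitted model
    """
    rng = np.random.default_rng(seed)
    residuals = np.asarray(residuals, dtype=float)
    outputs = np.empty((m, h_max))   # network outputs at each step
    paths = np.empty((m, h_max))     # bootstrap values Z*_{n+1}, ..., Z*_{n+h}

    for j in range(m):
        window = list(z[-p:])        # p most recent values, oldest first
        for h in range(h_max):
            out = predict(np.array(window[::-1]))   # most recent first
            value = out + rng.choice(residuals)     # add a resampled error
            outputs[j, h] = out
            paths[j, h] = value
            window = window[1:] + [value]           # slide the input window

    # Forecast: average network output across the m bootstrap series.
    forecast = outputs.mean(axis=0)
    # Limits: percentiles of the bootstrap values at each horizon.
    lower, upper = np.percentile(paths, [2.5, 97.5], axis=0)
    return forecast, lower, upper
```

For h=1 every bootstrap series uses the same observed inputs, so the averaged output equals the ordinary one-step forecast; from h=2 onwards the inputs differ between series and the average approximates the conditional expectation.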

Example 1
The time series considered is total electricity load demand measured in kilowatt hours accumulated hourly by Eskom (the largest electricity supplier in South Africa). The data were scaled between -1.0 and 1.0. A time plot of two weeks' data is given in Figure 1. Four weeks' data, a total of n=672 observations, were used to train the network. Input variables included periodic terms to take care of the 168-hour, 24-hour and 12-hour consumption patterns present in the data, as well as electricity load values of previous time points. A feed-forward neural network with one hidden layer and a sigmoidal activation function on the hidden layer was trained to predict the next load value, given the set of inputs. Bootstrap methodology was used to predict the electricity loads for 1 to 24 hours ahead, together with their standard errors and a 95% confidence interval. The results are presented in Figure 2. The notation L95 and U95 on the graph indicates the lower and upper 95% confidence limits respectively.
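One possible way to construct such inputs is sketched below. The article does not state the exact form of the periodic terms or the number of lagged loads, so the sine and cosine pairs, the value of `p` and the function names `scale_to_range` and `build_inputs` are assumptions made purely for illustration.

```python
import numpy as np


def scale_to_range(x):
    # Linearly rescale a series so that its minimum is -1.0 and maximum 1.0.
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    return 2.0 * (x - lo) / (hi - lo) - 1.0


def build_inputs(load, p, periods=(168, 24, 12)):
    """Build network inputs from an hourly load series.

    Each row holds a sine and cosine term per period (an assumed form
    of the periodic terms) followed by the p previous load values.
    """
    load = np.asarray(load, dtype=float)
    t = np.arange(p, len(load))              # time points with p lags available
    columns = []
    for period in periods:
        columns.append(np.sin(2.0 * np.pi * t / period))
        columns.append(np.cos(2.0 * np.pi * t / period))
    for lag in range(1, p + 1):
        columns.append(load[t - lag])        # load at time t - lag
    return np.column_stack(columns), load[t]
```

With n=672 hourly observations and, say, p=3 lags, this yields 669 training rows of 9 inputs each, with the load at time t as the target.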
A linear regression model was also used to predict the electricity load. The same set of explanatory variables (inputs) was used. In this case, the prediction limits are based on the assumption that the prediction errors follow a normal distribution. The predictions and 95% prediction limits for the linear model are given in Figure 3. It can be seen that the forecasts obtained by the neural network and the linear model compare well.
It can be seen that the predictions are quite smooth and that, in the case of the linear model, the width of the interval gradually increases with the forecasting period. The same is not true for the confidence interval obtained by the bootstrap method, especially towards the end of the forecasting period. This is due to the fact that the bootstrap method is essentially data driven and does not rely on any assumptions about the probability distribution of the data.

Example 2
The data used for this example were 200 computer-generated values of a stationary autoregressive process of order 2 (AR(2)), as given by equation (8). The generating process is linear, and it can therefore not be expected that a neural network model would produce better prediction results than a linear model. A feed-forward neural network with two input nodes for two past values of the time series and one hidden layer with a sigmoidal activation function was trained. Bootstrap methodology was used to calculate predictions and prediction limits. The prediction results are given in Figures 4 and 5. As in the previous example, the results for the neural network, where bootstrap methods were used, and for the linear model are broadly similar.

CONCLUSIONS
It is shown that bootstrap methodology can be used to calculate multi-step predictions of neural networks and their associated prediction limits.
Efficient algorithms are essential for network training, since the bootstrap approach is very computer-intensive. In order to derive prediction limits, a minimum of 1000 time series should be resampled from the observed time series. Considering the ever-increasing speed of computers and the development of more efficient algorithms for network training, this is not viewed as a real problem. Two examples are given. In both cases, neural networks, as well as linear regression models, are used to model the time series, and predictions and prediction limits are calculated. The results compare reasonably well. Further research is required to establish whether the bootstrap results can be improved by, for instance, increasing the number of bootstrap replications.
Several methods have been proposed for the calculation of bootstrap confidence intervals. See [16] for a review. The method based on percentiles is used in this application.

Figure 1: Hourly electricity load demand time series