Title: A comprehensive review on deep learning approaches in wind forecasting applications
Abstract: CAAI Transactions on Intelligence Technology, Volume 7, Issue 2, pp. 129-143. REVIEW, Open Access.
Zhou Wu and Gan Luo, College of Automation, Chongqing University, Chongqing, China; Zhile Yang (corresponding author, ORCID 0000-0001-8580-534X), Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China; Yuanjun Guo, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China; Kang Li, School of Electronic and Electrical Engineering, University of Leeds, Leeds, UK; Yusheng Xue, State Grid Electric Power Research Institute, Nanjing, Jiangsu, China.
First published: 18 January 2022. https://doi.org/10.1049/cit2.12076

Abstract
The effective use of wind energy is an essential part of the sustainable development of human society, particularly under the recent unprecedented pressure to build a low-carbon energy environment. Accurate wind resource and power forecasting plays a key role in increasing wind penetration. However, such forecasting has not been widely adopted in real-world applications because of the strong stochastic characteristics of wind energy. In recent years, the rapid growth of deep learning methods has provided new, effective tools for wind forecasting. This paper provides a comprehensive overview of deep-learning-based forecasting models in the field of wind energy. Featured approaches include time-series-based recurrent neural networks, restricted Boltzmann machines, convolutional neural networks and auto-encoder-based approaches.
Future development directions of deep-learning-based wind energy forecasting are also discussed.

1 INTRODUCTION
Given the growing demand for energy, it is critical to incorporate renewable energy into the power supply. Demand for renewable energy is expected to increase because of its lower operating costs and its preferential use in many power systems [1]. As a clean, environmentally friendly renewable resource with high economic benefits, wind energy is very important for the sustainable development of human society. Owing to these advantages, wind energy has grown rapidly over the past 10 years and has become one of the most cost-competitive energy sources in the world. In 2020, 93 GW of new wind capacity was installed globally. China and the United States are the world's largest onshore wind energy markets and together accounted for more than 60% of the new installed capacity in 2020 [2]. By 2020, China's cumulative installed wind capacity exceeded 216 million kW, around 40% of the world total, making China one of the leaders in global wind power development [3]. In the State Grid operating area, the cumulative installed capacity of new energy power generation is 350 million kW, of which wind power accounts for 169 million kW, a yearly increase of 16%. Annual new energy generation is 510.2 billion kilowatt-hours (kWh), 9.2% of total generation, of which wind power contributes 315.2 billion kWh, a yearly increase of 11% [4]. US wind power reached an important milestone in 2019 with an operating capacity of 100 GW. Since 2008, US wind generation capacity has quadrupled, and wind has become the largest source of renewable generation capacity in the United States, accounting for 7.2% of US electricity in 2019 [5].
The top countries in global installed wind capacity, such as Germany, India, Italy, Spain, the United Kingdom, France, Brazil and Canada, are also vigorously developing wind energy [6]. Despite these advantages, the smooth integration of large-scale wind power into the grid still faces many challenges. Because wind is random, volatile and intermittent, connecting large-scale wind power to the grid makes it very difficult to balance power supply and demand, and it has led to widespread wind power curtailment. One possible solution is to improve wind speed and power prediction. More accurate wind forecasts can help optimize the overall planning and scheduling of the power grid, find the optimal combination of wind turbines and ensure the safe and stable operation of the power system, thereby increasing the economic benefits of wind energy. Accurate wind forecasting is also a key prerequisite for increasing the grid's capacity to absorb wind power. Wind energy forecasting has long been an intractable problem in energy systems, and numerous reviews have been published covering data processing and power and resource forecasting. Jung et al. [7] reviewed potential technologies that can improve the performance of wind energy forecasting models and emphasized promising knowledge systems for forecasting. Tascikaraoglu et al. [8] outlined combined wind energy forecasting methods, focussing on various model combinations. Wang et al. [9] summarized eight multi-step wind speed forecasting strategies and compared 48 hybrid models based on them. Bokde et al. [10] compared existing methods that use empirical mode decomposition (EMD) and its improved versions as pre-processing techniques. Liu et al.
[11] provided a detailed review and classification of data processing techniques in wind energy forecasting, together with an in-depth study of the purpose, function, details and performance of each method. Liu et al. [12] reviewed eight kinds of shallow and deep intelligent predictors for wind energy prediction, as well as auxiliary methods that can improve predictive ability, including ensemble learning and optimization algorithms. Vargas et al. [13] demonstrated a new literature review method, systematic literature network analysis, and used it to summarize the development of wind energy analysis in decision-making over the past 30 years. The authors pointed out that the most commonly used methods in recent years were Monte Carlo simulation and artificial neural networks. Wang et al. [14] reviewed applications of artificial intelligence algorithms in wind energy forecasting. Gonzalez et al. [15] summarized the performance indicators commonly used for deterministic and probabilistic short-term wind power forecasting and explained how these indicators behave across different data sets, time resolutions and specific model attributes. Yang et al. [16] provided a comprehensive summary and comparison of more than one hundred wind forecasting methods from three perspectives: wind speed and power prediction, uncertainty prediction and ramp event prediction. Although numerous reviews of wind forecasting exist, artificial intelligence, and deep learning in particular, has advanced rapidly in recent years and provides many new techniques for wind forecasting. However, previous reviews mainly focussed on classification issues and did not discuss development trends in detail.
This paper summarizes deep-learning-based wind forecasting methods from the past 5 years, providing a comprehensive survey for researchers developing new, effective wind forecasting tools. The remainder of the paper is organized as follows: Section 2 describes basic concepts in the wind energy forecasting field. Section 3 presents deep-learning-based wind forecasting models. Section 4 discusses possible future research directions in wind energy forecasting. Section 5 concludes the paper. The deep-learning-based prediction framework, which summarizes the categories of each technique, is shown in Figure 1.

FIGURE 1 Wind energy prediction framework based on deep learning. AE, auto-encoder; CNN, convolutional neural network; DBM, deep Boltzmann machines; DBN, deep belief network; ESN, echo state network; GRU, gated recurrent unit; LSTM, long short-term memory; RBM, restricted Boltzmann machine; RNN, recurrent neural network; SAE, stacked auto-encoder; SDAE, stacked denoising auto-encoders

2 OVERVIEW OF WIND ENERGY FORECASTING
Wind is the movement of the atmosphere and a form of solar energy. When there is an atmospheric pressure difference, air moves from the higher-pressure area to the lower-pressure area. Wind is caused by three concurrent factors: uneven heating of the Earth's atmosphere by the sun, irregularities of the Earth's surface and the rotation of the Earth. As wind flows across a wind turbine, the specially shaped blades create an air pressure difference that produces lift and drag. When the lift exceeds the drag, the rotor shaft rotates and drives the generator to produce electricity [17, 18]. Wind power P can be calculated as follows:

$$P = \frac{1}{2} \rho A v^{3} \qquad (1)$$

where P represents the wind power, ρ denotes the density of air, A is the swept area of the wind turbine, and v is the wind speed.
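As a quick numerical illustration of Eq. (1), the following minimal Python sketch evaluates the power in the wind for a hypothetical 80 m rotor at standard sea-level air density (1.225 kg/m^3); both values are assumptions chosen only for illustration. Eq. (1) gives the kinetic power flowing through the swept area; a real turbine converts only part of it, which this sketch does not model.

```python
import math


def wind_power(rho: float, area: float, v: float) -> float:
    """Return the power in the wind, P = 0.5 * rho * A * v**3, in watts.

    rho  -- air density in kg/m^3
    area -- rotor swept area in m^2
    v    -- wind speed in m/s
    """
    return 0.5 * rho * area * v ** 3


def swept_area(rotor_diameter: float) -> float:
    """Swept area of a horizontal-axis rotor, A = pi * (D / 2)**2, in m^2."""
    return math.pi * (rotor_diameter / 2) ** 2


if __name__ == "__main__":
    # Hypothetical turbine: 80 m rotor, standard sea-level air density.
    area = swept_area(80.0)
    for v in (4.0, 8.0, 12.0):
        print(f"v = {v:4.1f} m/s -> P = {wind_power(1.225, area, v) / 1e6:6.2f} MW")
    # Cubic dependence: a 10% change in v gives roughly a 33% change in P.
    print(wind_power(1.225, area, 8.8) / wind_power(1.225, area, 8.0))
```

Because of the cube, doubling the wind speed multiplies P by eight, and a 10% error in v becomes roughly a 33% error in P (1.1³ = 1.331).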
Wind power exhibits a highly non-linear, cubic dependence on wind speed, so small wind speed errors are amplified in power estimates, and accurate wind speed prediction can help deliver more wind power [19]. Studies have also shown that a 10% improvement in wind speed forecasting accuracy can increase wind power generation by about 30% relative to expectations [20].

Wind time-series forecasting classifications and applications
To date, there is no uniform, strict standard for the boundaries between forecasting horizons; the classification depends strongly on the application. Soman et al. [21] divided the forecasting period into four categories, very short-term, short-term, medium-term and long-term, as shown in Figure 2.

FIGURE 2 Forecasting period classifications

The forecasting period equals the time resolution multiplied by the number of predicted steps and usually refers to the period of the test set rather than the training set. It is calculated as follows:

$$T_p = t_i \times s_t \qquad (2)$$

where T_p is the forecasting period, t_i is the time resolution of the data and s_t is the number of predicted time steps.
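Eq. (2) is simple arithmetic, but when setting up experiments it can be convenient to compute it directly from time intervals. The sketch below uses Python's standard `datetime.timedelta`; the helper names and example resolutions are hypothetical, and `steps_for_horizon` is an inverse of Eq. (2) added here for convenience rather than something defined in the source.

```python
import math
from datetime import timedelta


def forecasting_period(resolution: timedelta, steps: int) -> timedelta:
    """Eq. (2): T_p = t_i * s_t (data resolution times number of predicted steps)."""
    if steps < 1:
        raise ValueError("steps must be a positive integer")
    return resolution * steps


def steps_for_horizon(horizon: timedelta, resolution: timedelta) -> int:
    """Inverse of Eq. (2): steps needed to cover a horizon, rounded up."""
    return math.ceil(horizon / resolution)


if __name__ == "__main__":
    # Hypothetical data resolutions and step counts.
    print(forecasting_period(timedelta(minutes=10), 6))   # 1:00:00
    print(forecasting_period(timedelta(minutes=15), 96))  # 1 day, 0:00:00
    print(forecasting_period(timedelta(hours=1), 168))    # 7 days, 0:00:00
    print(steps_for_horizon(timedelta(hours=6), timedelta(minutes=10)))  # 36
```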
The applications corresponding to each forecasting period are as follows:
(1) Very short-term: electricity market clearing, electricity regulation, real-time grid operation, wind turbine control, power quality research, load following and distribution.
(2) Short-term: economic load dispatch planning, load increment/decrement decisions, load sharing and operational security in the electricity market.
(3) Medium-term: energy allocation, economic dispatch, reserve requirement decisions, generator online/offline decisions, coordination of wind farms and storage devices, planned maintenance of network lines, transmission network planning, congestion management, day-ahead energy and reserve scheduling, and wind farm maintenance and troubleshooting.
(4) Long-term: wind energy resource assessment, wind farm construction planning, optimal operating cost, annual maintenance planning, operation and maintenance of conventional generation, operation management, feasibility studies for wind farms, design of wind farm operation plans, energy trading strategy and coordination of the optimal unit portfolio [10, 21-26].

Wind energy forecasting goals and results
Wind energy forecasting is indispensable for effective energy planning and decision-making. In terms of the forecasting process, there are two types of forecasting: direct and indirect. Direct forecasting predicts wind power directly from historical wind speed or wind power data. Indirect forecasting first predicts future wind speed and then converts the predicted wind speed into a wind power forecast using the power curve of the wind turbine [10]. Indirect methods are more accurate and therefore more popular. In terms of forecasting results, wind forecasting models can also be divided into two categories: deterministic forecasting and probabilistic forecasting [27, 28]. Deterministic forecasting, also called point forecasting, produces a single value.
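The indirect route can be sketched as a two-stage pipeline: a wind speed forecast from any model is mapped through the turbine power curve. The curve below is a hypothetical tabulated curve (cut-in 3 m/s, rated 2,000 kW reached at 12 m/s, cut-out 25 m/s) with linear interpolation between points; real curves come from manufacturer data or are fitted to measurements, and zero output below cut-in and above cut-out is an assumption of this sketch.

```python
from bisect import bisect_right

# Hypothetical tabulated power curve: (wind speed in m/s, output in kW).
POWER_CURVE = [
    (3.0, 0.0),      # cut-in speed
    (5.0, 200.0),
    (7.0, 650.0),
    (9.0, 1250.0),
    (11.0, 1800.0),
    (12.0, 2000.0),  # rated speed, rated power
    (25.0, 2000.0),  # cut-out speed
]


def speed_to_power(v: float, curve=POWER_CURVE) -> float:
    """Map one wind speed to turbine output by linear interpolation on the curve."""
    speeds = [s for s, _ in curve]
    if v < speeds[0] or v > speeds[-1]:
        return 0.0  # assumed: no output below cut-in or above cut-out
    i = bisect_right(speeds, v)
    if i == len(speeds):
        return curve[-1][1]  # exactly at cut-out speed
    (s0, p0), (s1, p1) = curve[i - 1], curve[i]
    return p0 + (p1 - p0) * (v - s0) / (s1 - s0)


def indirect_forecast(speed_forecast):
    """Indirect forecasting: predicted wind speeds -> predicted power via the power curve."""
    return [speed_to_power(v) for v in speed_forecast]


if __name__ == "__main__":
    # Hypothetical wind speed forecast (m/s) from some upstream model.
    print(indirect_forecast([2.0, 4.0, 10.0, 12.0, 26.0]))
```

Keeping the speed model and the power curve separate is what makes the method indirect: the speed predictor can be swapped without touching the conversion step.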
The result of probabilistic forecasting is usually an interval, and the probability distribution of values within the interval can also be given. A single deterministic forecast cannot reflect the uncertainty and randomness of wind speed. Many wind energy applications need to account for this uncertainty and randomness, so probabilistic forecasting has attracted increasing attention in recent years [29].

Wind energy forecasting models
At the most basic level, wind forecasting methods can be divided into five categories: the persistence method, the physical method, conventional statistical methods, machine learning methods with shallow structures and machine learning methods with deep structures, that is, deep learning [25]. The persistence method is straightforward. It assumes that wind speed or power at a given future time will be the same as when the forecast is made [24]:

$$P(t+k) = P(t) \qquad (3)$$

where k is the number of steps ahead. This model performs well in very short-term forecasting, but its accuracy decreases as the time scale increases. Hence, it is usually used as a benchmark against which new models are compared [30]. The physical method usually refers to the numerical weather prediction (NWP) model. An NWP model builds a complex physical and mathematical model that simulates the evolution of wind by comprehensively considering meteorological and geographic factors such as temperature, humidity, air pressure and terrain [31]. NWP models are usually used for weather forecasts over large areas, and wind speed prediction is only one part of their output. There are two types of NWP models: global and regional. Ref. [26] provides an overview of global and regional NWP models and of commercial and operational wind power forecasting systems and their main features. Because the physical method reflects the essence of atmospheric motion, its accuracy is relatively high.
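A minimal sketch of the persistence benchmark in Eq. (3), assuming a plain Python list of evenly spaced observations and mean absolute error as the score; the function names and wind speed values are hypothetical and only show how forecasts and targets are aligned for a horizon of k steps.

```python
def persistence_forecast(series, k=1):
    """Eq. (3), P(t + k) = P(t): the latest observation is the k-step-ahead forecast.

    Returns (forecasts, targets), where forecasts[j] = series[j] is the forecast
    issued at time j and targets[j] = series[j + k] is the value it is scored against.
    """
    if not 1 <= k < len(series):
        raise ValueError("horizon k must satisfy 1 <= k < len(series)")
    return list(series[:-k]), list(series[k:])


def mean_absolute_error(forecasts, targets):
    return sum(abs(f - y) for f, y in zip(forecasts, targets)) / len(targets)


if __name__ == "__main__":
    # Hypothetical 10-minute wind speeds in m/s.
    speeds = [5.0, 5.5, 6.0, 6.2, 7.0, 6.5]
    for k in (1, 2):
        print(k, mean_absolute_error(*persistence_forecast(speeds, k)))
```

On this toy series the error grows from k = 1 to k = 2, in line with the degradation described above, although a six-point series is no evidence on its own.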
However, the physical method needs to process extremely large amounts of data and perform complex calculations. Its very high computing requirements are a significant obstacle for ordinary researchers [32]. Moreover, because the partial differential equations in the mathematical model are chaotic, an exact solution cannot be obtained, and errors grow as the forecast time increases. In light of these characteristics, NWP models are less suitable for short forecast horizons and more suitable for medium-term or long-term forecasting [8, 21]. Recent research has generally focussed on very short-term or short-term prediction [13], so NWP is applied less often. Conventional statistical methods use collected wind speed time-series data to make predictions. Many statistical models for wind speed forecasting have been developed over the years. Poggi et al. [33] used auto-regressive (AR) models to simulate wind speed time series, and Nielsen et al. [34] used quantile regression (QR) to make predictions independently. To improve forecasting performance, many auto-regressive moving average (ARMA) models have been developed [35-38]. Numerous other AR-based models have also been developed for wind speed prediction, such as vector auto-regressive [39], auto-regressive with exogenous input (ARX) [40], auto-regressive conditional heteroskedasticity [41, 42], auto-regressive integrated moving average (ARIMA) [43-46], seasonal ARIMA [47], fractional ARIMA [48] and ARFIMA [49] models. To improve prediction accuracy and model robustness, researchers have also developed many hybrid models based on ARIMA, such as WT-ARIMA [50], RWT-ARIMA [43] and VMD-ARIMA [51].
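To make the AR family concrete, the sketch below fits its simplest member, an AR(p) model with intercept, x_t = c + a_1 x_{t-1} + ... + a_p x_{t-p}, by ordinary least squares via the normal equations, and produces recursive multi-step forecasts. It is written in pure Python to stay dependency-free, and the function names are hypothetical. ARMA and ARIMA models add moving-average terms and differencing and are usually estimated by maximum likelihood (for example with the ARIMA class in statsmodels); this sketch does not reproduce that.

```python
def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular normal equations; try a smaller p or more data")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ar(series, p):
    """Least-squares fit of x_t = c + a_1 x_{t-1} + ... + a_p x_{t-p}; returns [c, a_1..a_p]."""
    if len(series) - p < p + 1:
        raise ValueError("series too short for the requested order p")
    # Design matrix rows: [1, x_{t-1}, ..., x_{t-p}] for each usable t.
    rows = [[1.0] + [series[t - j] for j in range(1, p + 1)] for t in range(p, len(series))]
    y = [series[t] for t in range(p, len(series))]
    n = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(n)]
    return _solve(xtx, xty)


def forecast_ar(series, coef, steps):
    """Recursive multi-step forecast: each prediction is fed back as an input."""
    p = len(coef) - 1
    history = list(series)
    out = []
    for _ in range(steps):
        yhat = coef[0] + sum(coef[j] * history[-j] for j in range(1, p + 1))
        out.append(yhat)
        history.append(yhat)
    return out


if __name__ == "__main__":
    # Hypothetical hourly wind speeds in m/s.
    speeds = [6.1, 6.4, 6.0, 5.7, 5.9, 6.3, 6.8, 7.1, 6.9, 6.5, 6.2, 6.4]
    coef = fit_ar(speeds, 2)
    print("coefficients:", [round(c, 3) for c in coef])
    print("next 3 steps:", [round(v, 2) for v in forecast_ar(speeds, coef, 3)])
```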
However, these linear statistical models analyse only the superficial relationships between variables in a time series and have difficulty handling complicated non-linear relationships. To obtain more satisfactory predictions, numerous non-linear statistical models have been proposed [52]. Zhang et al. [53] combined AR and Gaussian process regression (GPR) to improve prediction accuracy. Karakuş et al. [54] proposed a polynomial auto-regressive model, which is non-linear in its inputs but linear in its parameters. Owing to its non-linear Hammerstein term, the Hammerstein auto-regressive model outperforms ARIMA models [55]. Enhanced models such as smooth transition auto-regressive, self-exciting threshold auto-regressive [56] and Markov switching auto-regressive [57] models have also been proposed. Researchers have also used less common models, for example, non-linear auto-regressive with exogenous input [58], generalized auto-regressive conditional heteroskedasticity (GARCH) [59], multiple-kernel relevance vector regression [60], threshold seasonal auto-regressive conditional heteroscedasticity [61] and Bayesian-based adaptive robust multi-kernel regression [62]. However, as time-series data become more complex, traditional statistical models struggle to meet accuracy requirements because they have little ability to extract features from data. Shallow machine learning methods include neural networks with only a few layers. Marugan et al. [63] summarized most shallow neural network models. Compared with the persistence method and traditional statistical methods, shallow machine learning methods achieve higher prediction accuracy and work better in practice. Nevertheless, these models can learn only shallow features in wind time-series data and require extensive feature engineering [64]. Deep learning refers to machine learning methods based on deep network architectures.
It learns the characteristics of the input data through a computational model composed of multiple non-linear processing layers. Compared with shallow machine learning models and traditional statistical models, deep learning methods can extract more abstract and hidden features from data and therefore achieve better accuracy in prediction tasks. The effectiveness and accuracy of deep-learning-based prediction models have been widely recognized.

3 DEEP-LEARNING-BASED WIND FORECASTING
Wind speed prediction usually involves three steps: wind energy data processing, prediction and model performance evaluation. Deep neural networks (DNNs) are generally used as feature extractors and predictors. Many DNNs have been applied to wind forecasting. The basic deep-learning-based prediction structures mainly include the recurrent neural network (RNN), convolutional neural network (CNN) and restricted Boltzmann machine (RBM). There are also other deep networks, such as the generative adversarial network, extreme learning machine (ELM), stacked auto-encoder (SAE) and stacked denoising auto-encoders (SDAE). Table 1 summarizes models with a long short-term memory (LSTM) predictor, and Table 2 summarizes other forecasting models.

RNN-based models
The RNN derives from the feed-forward neural network. Unlike conventional feed-forward networks, it uses recurrent connections that feed the result of the previous time step back into the current computation, which gives it memory [65]. RNNs are therefore well suited to learning the non-linear characteristics of sequence data.

3.1.1 Models with long short-term memory predictor
The LSTM network was designed to address the vanishing gradient problem that occurs when an RNN learns sequences with long-term dependencies [66]. LSTM is far more complicated than the simple RNN structure; because it is so widely used, its principle is only briefly summarized here.
An LSTM cell consists of an input gate i_t, a forget gate f_t, an update gate g_t and an output gate o_t. Figure 3 illustrates a single LSTM cell. The LSTM is computed as follows:

$$i_t = \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) \qquad (4)$$
$$f_t = \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) \qquad (5)$$
$$g_t = \tanh\left(W_g [h_{t-1}, x_t] + b_g\right) \qquad (6)$$
$$o_t = \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) \qquad (7)$$
$$c_t = c_{t-1} \odot f_t + g_t \odot i_t \qquad (8)$$
$$h_t = \tanh(c_t) \odot o_t \qquad (9)$$

where W_{i,f,g,o} are the weight matrices, b_{i,f,g,o} are the bias vectors, [h_{t-1}, x_t] is the concatenation of the previous hidden state and the current input, c_t is the memory cell state, h_t is the hidden state, ⊙ denotes element-wise multiplication and σ is the sigmoid activation function.

FIGURE 3 The structure of long short-term memory

Wu et al. [67] used a CNN to extract features and then used LSTM for short-term prediction. However, such models still have shortcomings, such as long training time and insufficient prediction accuracy. To improve LSTM performance, researchers have proposed many modifications. Extending the LSTM cell with peephole connections lets the gates read the memory cell state directly; this solves the problem that, when the output gate is closed, the gates cannot obtain any information from the memory cell, and it brings better predictions [68]. Yu et al. [69] proposed LSTM-EFG, which enhances the effect of the forget gate and improves the activation function. A shared-weight LSTM network was introduced to reduce training time and the number of variables to be optimized [70]. To further control over-fitting in LSTM, Eze et al. [71] designed an oLSTM model based on mixed regularization combining LSTM and dropout. The proposed model is an energy-based regression method that captures the co-adaptation of input variables and can effectively control the vanishing gradient problem in mapping input to output wind data. An LSTM-Ms model uses feed-forward neural networks to construct sequences at coarser time scales than the original series and then uses LSTM to process these sequences [72].
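The following sketch implements the forward pass of Eqs. (4)-(9) literally in pure Python, so the role of each gate is visible. It assumes, as the equations do, that each gate's weight matrix acts on the concatenation [h_{t-1}, x_t]; the parameter dictionary layout, `init_params` and the random weights are hypothetical placeholders, and no training is shown. Deep learning frameworks provide optimized equivalents (for example `torch.nn.LSTM` in PyTorch) with their own parameter layouts.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def affine(W, b, v):
    """Return W v + b, with W stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; params maps 'i', 'f', 'g', 'o' to (W, b) pairs."""
    z = list(h_prev) + list(x_t)                          # [h_{t-1}, x_t]
    i = [sigmoid(a) for a in affine(*params["i"], z)]     # input gate, Eq. (4)
    f = [sigmoid(a) for a in affine(*params["f"], z)]     # forget gate, Eq. (5)
    g = [math.tanh(a) for a in affine(*params["g"], z)]   # update gate, Eq. (6)
    o = [sigmoid(a) for a in affine(*params["o"], z)]     # output gate, Eq. (7)
    c = [cp * ft + gt * it for cp, ft, gt, it in zip(c_prev, f, g, i)]  # Eq. (8)
    h = [math.tanh(ct) * ot for ct, ot in zip(c, o)]      # Eq. (9)
    return h, c


def init_params(input_size, hidden_size, seed=0):
    """Hypothetical initialisation: small uniform random weights, zero biases."""
    rng = random.Random(seed)
    params = {}
    for gate in "ifgo":
        W = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size + input_size)]
             for _ in range(hidden_size)]
        params[gate] = (W, [0.0] * hidden_size)
    return params


def run_lstm(sequence, params, hidden_size):
    """Run the cell over a sequence from zero initial states; return final (h, c)."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x_t in sequence:
        h, c = lstm_step(x_t, h, c, params)
    return h, c


if __name__ == "__main__":
    # Hypothetical normalised wind speeds, one feature per time step.
    seq = [[0.42], [0.45], [0.51], [0.48]]
    h, c = run_lstm(seq, init_params(input_size=1, hidden_size=3), hidden_size=3)
    print("final hidden state:", [round(v, 4) for v in h])
```

In a forecasting model, the final hidden state h would typically be passed to a trained output layer that maps it to the predicted wind speed or power.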
With LSTM-Ms, long-term dependencies in wind speed sequences are easier to learn. Pei et al. [73] proposed EWT-NCULSTM. Compared with the traditional LSTM, the new cell update LSTM (NCULSTM) combines the input gate and the forget gate into an update gate and, drawing on the gated recurrent unit (GRU), improves how the memory cell is updated. The empirical wavelet transform (EWT) is used to decompose the wind speed data for noise reduction. NCULSTM then predicts each sub-sequence, and the sub-sequence predictions are summed to obtain the final result. Many methods consider only the correlation of meteorological factors, not their causality. Zhang et al. [74] proposed a long short-term memory network based on neighbourhood gates (NLSTM), which dynamically adjusts the network structure according to the specific equivalent tree causality to handle complex causality in wind speed prediction, thereby improving prediction accuracy. Excessive stacking of LSTM units may reduce training accuracy and efficiency. Lopez et al. [75] found a better starting point for training by evaluating a number of instances and using their output signals in a ridge regression to obtain the output-layer weights. Generally, high-frequency wind speed sub-series have short-term dependence, whereas low-frequency sub-series have both short-term and long-term dependence. Liu et al. [12] suggested that using models with different characteristics to predict sub-sequences of different frequencies is more likely to achieve satisfactory results. To further improve prediction accuracy, researchers have developed many hybrid models. The basic idea is to refine the input data with signal processing and analysis methods and then use one or more predictors to make predictions. Qu et al.
[76] employed principal component analysis (PCA) to extract useful information from NWP data and fed it into LSTM for prediction. An adaptive LSTM was proposed that uses Pearson correlation analysis to extract strongly correlated factors and feeds them into LSTM for prediction [77]. Huang et al. [78] designed an EEMD-GPR-LSTM method, in which ensemble empirical mode decomposition (EEMD) decomposes the original wind speed data, and LSTM and GPR are each used to predict the intrinsic mode functions. Finally, the weights of the two prediction results are determined by the variance-covariance method to produce the combined prediction. Liu et al. [79] designed a new hybrid model that mixes two RNNs. Their EWT-LSTM-Elman model uses EWT to obtain multiple sub-signals, then uses LSTM to predict the low-frequency sub-signals and an Elman neural network (ElmanNN) to predict the high-frequency sub-signals, with satisfactory experimental results. Li et al. [80] used MM to process the wind speed sequence into a stationary long-term baseline and a non-stationary short-term residue and then used LSTM to make predictions. Liu et al. [81] introduced a DWT-LSTM model for short-term wind power forecasting. The discrete wavelet transform (DWT) decomposes the non-stationary time series into multiple highly stationary components; LSTM then predicts each component independently, and the final prediction is obtained by linearly summing the component predictions. Liu et al. [82] proposed a deep SDAE-LSTM architecture with feature selection. In this model, a feature selection framework based on mutual information is first developed to determine the most suitable inputs for the prediction model. The authors then used SDAE to capture the inherent features of the original data and LSTM to output the results. Wu et al.
[83] proposed a DBSCAN-SDAE-LSTM model, which first selects representative training samples from NWP data by density-based spatial clustering of applications with noise (DBSCAN), then uses SDAE together with batch normalization for deep feature extraction and finally uses LSTM for prediction. Liu et al. [84] used wavelet packet decomposition (WPD) to decompose the original data into high-frequency and low-frequency sub-sequences; a 1D-CNN predicts the high-frequency sub-sequences and a CNNLSTM predicts the low-frequency sub-sequences, forming a WPD-LSTMCNN-CNN hybrid architecture. Li et al. [85] developed a combined EWT-LSTM-RELM-IEWT model. Unlike other models, this hybrid model uses a regularized extreme learning machine (RELM) to model the error sequence of each sub-signal and adopts an inverse empirical wavelet transform (IEWT) to construct the final prediction sequence and filter outliers. Jaseena et al. [86] proposed an SAE-LSTM model, which uses an SAE to identify deep features of the input series and then employs a stacked LSTM to make predictions. Moreno et al. [87] proposed a four-step forecasting framework: (1) AM-FM demodulation; (2) VMD-SSA (singular spectrum analysis) decomposition; (3) ensemble forecasting and reconstruction; (4) model accuracy verification. Another study considered preliminary prediction errors and proposed CEEMDAN-error-VMD-LSTM, which uses a multi-step decomposition prediction strategy. First, the original data are decomposed into sub-sequences and a residual sequence by the complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) algorithm, and each sequence is predicted with LSTM. The error sequence is obtained as the difference between the prediction of the original sequence and the original observations. Variational mode decomposition (VMD