A HYBRID LEAST SQUARES SUPPORT VECTOR MACHINE WITH BAT AND CUCKOO SEARCH ALGORITHMS FOR TIME SERIES FORECASTING

Least Squares Support Vector Machine (LSSVM) has been known to be one of the effective forecasting models. However, its operation relies on two important parameters (regularization and kernel). Pre-determining the values of these parameters affects the results of the forecasting model; hence, to find their optimal values, this study investigates the adaptation of the Bat and Cuckoo Search algorithms to optimize the LSSVM parameters. Even though Cuckoo Search has been proven able to solve global optimization problems in various areas, the algorithm suffers a slow convergence rate when the step size is large. Hence, to enhance the search ability of Cuckoo Search, it is integrated with the Bat algorithm, which offers a balanced search between global and local. Evaluation was performed separately to further analyze the strengths of Bat and Cuckoo Search in optimizing the LSSVM parameters. Five evaluation metrics were utilized: mean absolute percentage error (MAPE), accuracy, symmetric mean absolute percentage error (SMAPE), root mean squared percentage error (RMSPE) and fitness value. Experimental results on diabetes forecasting demonstrated that the proposed BAT-LSSVM and CUCKOO-LSSVM generated lower MAPE and SMAPE and, at the same time, produced higher accuracy and fitness values compared to particle swarm optimization (PSO)-LSSVM and a non-optimized LSSVM. Following this success, this study integrated the two algorithms to better optimize the LSSVM. The newly proposed forecasting algorithm, termed CUCKOO-BAT-LSSVM, produces better forecasts in terms of MAPE, accuracy and RMSPE. Such an outcome provides an alternative model to be used in facilitating decision-making in forecasting.


INTRODUCTION
Time Series Forecasting is a machine learning field that uses historical data to build a model before utilizing it to predict future observations. Technically, it can be defined as follows: "A time series is a set of observations x_t, each one being recorded at a specific time" (Brockwell & Davis, 2002). The least squares support vector machine (LSSVM) is a machine learning technique that is widely used in forecasting. LSSVM differs from the support vector machine (SVM): SVM requires quadratic programming with inequality constraints, whereas LSSVM solves a system of linear equations with equality constraints (Suykens, Gestel, Brabanter, Moor, & Vandewalle, 2002). The performance of LSSVM depends on two important parameters: the regularization parameter (γ) and the kernel parameter (σ²). LSSVM is prone to overfitting or underfitting when the selection of these parameters is inadequate. In the literature, optimizing the LSSVM hyper-parameters encompasses two approaches: cross validation (CV) and the theoretical technique. The first approach is inefficient due to the exhaustiveness of the parameter search, whereas the second approach embraces meta-heuristic search algorithms that perform well in most cases (Mustaffa, Yusof, & Kamaruddin, 2014).
Based on the literature, meta-heuristic algorithms comprise two categories: local search-based and population search-based. A local search-based meta-heuristic algorithm focuses on a single solution and iterates to improve it; examples are Simulated Annealing (Hu, Wang, & Ma, 2015) and Tabu Search (Yao, Hu, Zhang, & Jin, 2014). On the other hand, a population search-based meta-heuristic algorithm randomly generates a set of solutions and chooses the optimal one by evaluating them against an objective function. Among statistical forecasting techniques (Singh, & Chang, 2012), Autoregressive Integrated Moving Average (Lee & Tong, 2011), K-nearest neighbor (Fan, Guo, Zheng, & Hong, 2019) and Bayesian models (Thompson & Miller, 1986) are examples of the multivariate approach. Despite the statistical approaches yielding acceptable estimates, they do not address the nonlinear characteristics of forecasting (Chan et al., 2013). Thus, artificial intelligence approaches such as the Neural Network (NN) have been presented to overcome this shortcoming. NN has been employed in many fields and proven successful in generating high accuracy. Despite this, NN is complicated, as it requires estimation and may be trapped in local minima. To overcome this problem, there have been studies on integrating NN with meta-heuristic algorithms (Ozerdem, Olaniyi, & Oyedotun, 2017).
On the other hand, machine learning offers the ability for computers to learn without explicit programming (Samuel, 1959). Among the methods offered by machine learning are SVM and LSSVM. SVM is a large-margin algorithm that separates training samples into two classes by a maximum-margin hyperplane between the classes. It is a powerful tool for high-dimensional data and, being very effective, has been used in many application domains. In SVM, two important parameters (i.e. the regularization and the kernel parameters) need to be correctly determined in order to minimize the generalization error; different parameter settings will affect the results of the prediction model (Mustaffa et al., 2014). The computational process of standard SVM relies on quadratic programming solvers, which are difficult to apply and require high computational cost. In contrast, the LSSVM approach solves a set of linear equations without quadratic programming solvers (Mustaffa, Sulaiman, Ernawan, & Noor, 2018; Mustaffa et al., 2014). LSSVM has good convergence and produces high accuracy, which has led to its use in forecasting. However, like SVM, LSSVM relies on the initialization of its regularization and kernel parameters to minimize generalization errors. Determining the best values for the parameters in order to generate the least error can be formulated as an optimization problem. Luo et al. (2008) proposed to tune the parameters of LSSVM by a quantum-inspired evolutionary algorithm (QEA), an example of the evolutionary algorithms used in various optimization applications. QEA can speed up the evolution by mutation steps that lead to diverse solutions. Hybrid models generate good accuracy compared with LSSVM tuned by the cross validation method with wavelet and Gaussian kernels (Luo et al., 2008). Mustaffa et al. (2014) proposed an LSSVM tuned by an Improved Artificial Bee Colony (IABC) for gasoline price forecasting.
The MAPE and RMSPE results of the proposed method were better than those of LSSVM tuned by the Artificial Bee Colony (ABC) and LSSVM tuned by a Back-Propagation Neural Network. There have also been studies on the use of the Bat algorithm to optimize LSSVM (Hegazy et al., 2015; Soliman & Salam, 2014; Wu & Peng, 2015). Soliman and Salam (2014) proposed a Bat-optimized LSSVM for weekly stock price forecasting, and the results indicated that their method achieved better performance in terms of root mean squared error (RMSE), mean absolute error (MAE), SMAPE and percent mean relative error (PMRE). Similarly, Wu and Peng (2015) addressed China wind power forecasting; their results demonstrated higher accuracy and lower mean squared error (MSE), RMSE and MAE. A similar trend was obtained by Hegazy et al. (2015), who compared five swarm algorithms, namely the Bat algorithm, Artificial Bee Colony, Flower Pollination algorithm, modified CS and PSO, for optimizing LSSVM on stock historical data. The results showed that the Flower Pollination algorithm with LSSVM was the better method in terms of error rate.
In previous work, researchers (Mustaffa et al., 2018; Ong & Zainuddin, 2019; Sun & Sun, 2017) proved the success of CS in different fields. Sun and Sun (2017) demonstrated the success of a hybrid concentration-forecasting model using Principal Component Analysis and an LSSVM optimized by CS against a plain LSSVM. Mustaffa et al. (2018) introduced a hybrid LSSVM with four swarm algorithms, namely CS, Genetic Algorithm, Differential Evolution and Grey Wolf Optimizer, for water level forecasting; all proposed algorithms were able to produce small MSE and RMSPE and high Theil's U values. Ong and Zainuddin (2019) proposed an improved CS algorithm for optimizing a wavelet Neural Network. Zheng et al. (2018) developed a PSO with a mutation operation to optimize the parameters of a constructed wavelet LSSVM and used it for forecasting dissolved gas. Zhang, Tan and Yang (2012) proposed a hybrid method for predicting electricity prices by combining the wavelet transform with the auto-regressive integrated moving average (ARIMA), while the LSSVM was optimized by PSO.
Due to the success of the Bat and CS algorithms in earlier studies, investigations on how they can help to tune the LSSVM parameters are worth conducting. Nevertheless, the slow convergence rate of the CS algorithm needs to be resolved and, since the Bat algorithm offers a balanced search, the integration of the two optimization algorithms may benefit machine learning methods such as LSSVM.

Least Squares Support Vector Machine
In LSSVM, suppose a set of M points {x_j, y_j}, where x_j denotes the input values and y_j the output values. The LSSVM estimation function for nonlinear regression is shown in Equation 1 (Suykens et al., 2002):

y(x) = wᵀ φ(x) + b    (1)
where w is the weight vector, φ(x_j) is the nonlinear mapping function, b is the bias term and e_j is the error between the actual and the predicted output. The weight vector w and the bias term b are obtained from the optimization problem displayed in Equation 2:

min J(w, e) = (1/2) wᵀw + (γ/2) Σ_{j=1..M} e_j²    (2)

subject to the equality constraints defined in Equation 3:

y_j = wᵀ φ(x_j) + b + e_j,  j = 1, 2, …, M    (3)

where e_j is the error variable, J(w, e) is the loss function and γ is the adjustable (regularization) constant. The Lagrangian multiplier function is applied to Equation 3, which yields

L(w, b, e; α) = J(w, e) − Σ_{j=1..M} α_j ( wᵀ φ(x_j) + b + e_j − y_j )

where α_j is the Lagrangian multiplier. The conditions for optimality of this problem are generated by setting all derivatives equal to zero, as formulated in Equation 4:

∂L/∂w = 0 → w = Σ_j α_j φ(x_j),  ∂L/∂b = 0 → Σ_j α_j = 0,  ∂L/∂e_j = 0 → α_j = γ e_j,  ∂L/∂α_j = 0 → wᵀ φ(x_j) + b + e_j − y_j = 0    (4)
where j = 1, 2, …, M. After eliminating w and e_j, the Karush-Kuhn-Tucker (KKT) conditions for optimality are transformed into the set of linear equations shown in Equation 5:

[ 0    1ᵀ       ] [ b ]   [ 0 ]
[ 1    Ω + I/γ ] [ α ] = [ y ]    (5)

where 1 = [1, …, 1]ᵀ, α = [α_1, …, α_M]ᵀ, y = [y_1, …, y_M]ᵀ and Ω is the kernel matrix with entries Ω_jk = φ(x_j)ᵀ φ(x_k) = K(x_j, x_k).
The LSSVM model for function estimation is shown in Equation 6:

y(x) = Σ_{j=1..M} α_j K(x, x_j) + b    (6)
where α_j and b are the solutions of the linear system in Equation 5 and K(x, x_j) is the kernel function. In this study, the kernel function is the radial basis function (RBF) kernel, as defined in Equation 7.

K(x, x_j) = exp( −‖x − x_j‖² / (2σ²) )    (7)
In the RBF kernel, σ is a tuning parameter; its value, together with the value of the regularization parameter γ in Equation 2, needs to be optimized in order to minimize the generalization error. This study proposes the optimization of the two parameters using population-based meta-heuristic algorithms.
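Equations 5 to 7 admit a direct implementation: training an LSSVM reduces to solving one linear system. The following Python sketch is our own illustration of those equations; the function names (`rbf_kernel`, `lssvm_fit`, `lssvm_predict`) and the test data are assumptions, not taken from the study.

```python
import numpy as np

def rbf_kernel(X1, X2, sigma):
    # Equation 7: K(x, x_j) = exp(-||x - x_j||^2 / (2 sigma^2))
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma, sigma):
    # Solve the KKT linear system of Equation 5 for the bias b and alpha
    M = len(y)
    A = np.zeros((M + 1, M + 1))
    A[0, 1:] = 1.0                 # top row:    [0, 1^T]
    A[1:, 0] = 1.0                 # left column: [1, Omega + I/gamma]
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(M) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[1:], sol[0]         # alpha, b

def lssvm_predict(X_train, alpha, b, sigma, X_new):
    # Equation 6: y(x) = sum_j alpha_j K(x, x_j) + b
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b
```

For large γ the model nearly interpolates the training points, which is the overfitting risk the parameter search is meant to control.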

Bat Algorithm
Bat algorithm is a meta-heuristic algorithm. It was formulated by Yang (2010) and has proven effective for global optimization. Bats utilize echolocation to sense distance; in the algorithm, each bat has a position (X_j) that represents a solution, a velocity (V_j), a frequency (F_j) that represents the objective function value, a loudness (A) and a pulse rate (r). All bats move randomly and update their frequencies, velocities and positions based on Equations 8, 9 and 10, respectively (Yang, 2010):

F_j = F_min + (F_max − F_min) β    (8)

V_j(t) = V_j(t−1) + (X_j(t−1) − X*) F_j    (9)

X_j(t) = X_j(t−1) + V_j(t)    (10)

where β is a random number between 0 and 1 and X* is the best position (best solution) among all bats. The pseudo-code of the basic Bat algorithm is shown in Algorithm 1.

Algorithm 1: Basic Bat algorithm (Yang, 2010)
Input : Determine the number of Bats (n). Randomly determine the initialized positions (X_n) and velocities (V_n). Determine the frequencies F_n, pulse rate r and loudness A.

Output : Return the best solution
Step 1 : Evaluate each Bat and find the best
Step 2 : While not stopping criterion do
Step 3 : For j = 1 to n do
Step 4 : Generate a new solution by adjusting frequency, updating velocity and updating position using Equations 8, 9 and 10
Step 5 : If (rand > r)
Step 6 : Generate a solution around the best solution
Step 7 : End if
Step 8 : If (rand < A and F(X_j) < F(X*))
Step 9 : Replace the current solution with the new solution
Step 10 : Replace the current fitness with the new fitness
Step 11 : End if
Step 12 : End for
Step 13 : End while
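Equations 8 to 10 and Algorithm 1 translate almost line for line into code. The Python sketch below is our own illustration; the function name `bat_minimize`, the parameter values and the sphere test objective are assumptions, not settings from the study.

```python
import numpy as np

def bat_minimize(f, dim, n=20, iters=200, f_min=0.0, f_max=2.0,
                 loudness=0.5, pulse_rate=0.5, lb=-5.0, ub=5.0, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n, dim))   # bat positions (candidate solutions)
    V = np.zeros((n, dim))              # bat velocities
    fit = np.array([f(x) for x in X])
    b = np.argmin(fit)
    best, best_fit = X[b].copy(), fit[b]
    for _ in range(iters):
        for j in range(n):
            beta = rng.random()
            Fj = f_min + (f_max - f_min) * beta        # Equation 8
            V[j] = V[j] + (X[j] - best) * Fj           # Equation 9
            x_new = np.clip(X[j] + V[j], lb, ub)       # Equation 10
            if rng.random() > pulse_rate:              # random walk around the best
                x_new = np.clip(best + 0.01 * rng.standard_normal(dim), lb, ub)
            f_new = f(x_new)
            if rng.random() < loudness and f_new < fit[j]:   # accept improving move
                X[j], fit[j] = x_new, f_new
                if f_new < best_fit:
                    best, best_fit = x_new.copy(), f_new
    return best, best_fit
```

The pulse rate r controls how often a bat abandons its Equation 10 move for a local random walk around the current best, which is the mechanism behind the algorithm's balance between global and local search.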

Cuckoo Search
Prior to the Bat algorithm, Yang (2010) also designed an optimization algorithm inspired by cuckoo species and their obligate brood parasitism, laying their eggs in the nests of other species or hosts. The CS algorithm is based on three rules: first, the appropriate place (i.e. the nest) for an egg is chosen randomly; second, the bird lays only one egg at a time; and finally, the best nests are carried over to the next generation. The next step is to determine the best nest and the best eggs to be used as input for future generations.
The number of available host nests is fixed at a pre-determined value, while the probability that host birds detect a cuckoo's egg lies in [0, 1]. If an exotic egg is discovered, the host bird either eliminates the egg or abandons the nest and builds a new one. The pseudo-code of basic CS is shown in Algorithm 2 (Hegazy et al., 2015; Shehab, Khader, & Laouchedi, 2018; Yang, 2010).

Algorithm 2: Basic Cuckoo Search (Yang, 2010)
Input : Determine the initial population (n) and the hosts' nests (X_j); objective function f(x)
Output : Return the best solution
Step 1 : While not stopping criterion do
Step 2 : For j = 1 to n do
Step 3 : Generate a new solution by selecting a cuckoo randomly, then evaluate it by the objective function
Step 4 : Randomly select a nest (i)
Step 5 : If (F_j > F_i)
Step 6 : Update the solution with the new one
Step 7 : End if
Step 8 : End for
Step 9 : A fraction (pa) of poor-quality nests is abandoned and new nests are generated
Step 10 : Find the best solution by ranking the solutions
Step 11 : End while
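Algorithm 2 maps directly onto code once the "cuckoo" move is made concrete as a Levy flight. The Python sketch below is our own illustration; the function names, bounds, parameter values and the sphere test objective are assumptions, not the study's settings.

```python
import math
import numpy as np

def levy_step(dim, rng, beta=1.5):
    # Mantegna's algorithm for Levy-distributed step lengths
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma = (num / den) ** (1 / beta)
    u = rng.standard_normal(dim) * sigma
    v = rng.standard_normal(dim)
    return u / np.abs(v) ** (1 / beta)

def cuckoo_minimize(f, dim, n=15, iters=300, pa=0.25, lb=-5.0, ub=5.0, seed=0):
    rng = np.random.default_rng(seed)
    nests = rng.uniform(lb, ub, (n, dim))   # one nest = one candidate solution
    fit = np.array([f(x) for x in nests])
    for _ in range(iters):
        best = nests[np.argmin(fit)].copy()
        for j in range(n):
            # generate a cuckoo via a Levy flight biased toward the best nest
            step = 0.01 * levy_step(dim, rng) * (nests[j] - best)
            x_new = np.clip(nests[j] + step, lb, ub)
            i = rng.integers(n)             # compare against a random nest
            f_new = f(x_new)
            if f_new < fit[i]:
                nests[i], fit[i] = x_new, f_new
        k = max(1, int(pa * n))             # abandon a fraction pa of worst nests
        worst = np.argsort(fit)[-k:]
        nests[worst] = rng.uniform(lb, ub, (k, dim))
        fit[worst] = [f(x) for x in nests[worst]]
    g = np.argmin(fit)
    return nests[g], fit[g]
```

The heavy-tailed Levy steps produce occasional long jumps that help escape local minima; the fixed 0.01 scale factor is the large-step behavior the paper identifies as the cause of CS's slow convergence.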

Particle Swarm Optimization
PSO models the flocking behavior of birds. It was first introduced by Kennedy and Eberhart (1995), where each single solution represents a bird (called a particle) in the search space. A particle has an initial random position vector (X_i) and velocity (V_i) in the search area. The best position is determined by the objective function that measures the cost of a position. In each iteration, each particle updates its velocity at time (t) based on Equation 11 and calculates its new position at time (t) using Equation 12 (Yang, 2010):

V_i(t) = w V_i(t−1) + c_1 r_1 (pbest_i − X_i(t−1)) + c_2 r_2 (gbest − X_i(t−1))    (11)

X_i(t) = X_i(t−1) + V_i(t)    (12)

where w is the inertia weight, c_1 and c_2 are acceleration coefficients, r_1 and r_2 are random numbers in [0, 1], pbest_i is the best position found by particle i and gbest is the best position found by the swarm. The pseudo-code of a basic PSO is shown in Algorithm 3 (Yang, 2010).

Algorithm 3: Basic PSO (Yang, 2010)
Input : Determine the number of particles (n). Randomly determine the initialized positions (X_n) and velocities (V_n).

Output : Return the best solution
Step 1 : Evaluate each particle and find pbest and gbest
Step 2 : While not stopping criterion do
Step 3 : For i = 1 to number of particles do
Step 4 : Calculate the fitness function f
Step 5 : Update the personal best and global best of each particle
Step 6 : Update the velocity of the particle using Equation 11
Step 7 : Update the position of the particle using Equation 12
Step 8 : End for
Step 9 : End while
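Equations 11 and 12 give the whole of Algorithm 3. A compact, vectorized Python sketch (our own illustration; names and parameter values are assumptions, not the study's settings):

```python
import numpy as np

def pso_minimize(f, dim, n=20, iters=200, w=0.7, c1=1.5, c2=1.5,
                 lb=-5.0, ub=5.0, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n, dim))     # particle positions
    V = np.zeros((n, dim))                # particle velocities
    pbest = X.copy()                      # personal best positions
    pfit = np.array([f(x) for x in X])
    g = np.argmin(pfit)
    gbest, gfit = pbest[g].copy(), pfit[g]
    for _ in range(iters):
        r1 = rng.random((n, dim))
        r2 = rng.random((n, dim))
        V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)  # Equation 11
        X = np.clip(X + V, lb, ub)                                 # Equation 12
        fit = np.array([f(x) for x in X])
        improved = fit < pfit
        pbest[improved] = X[improved]
        pfit[improved] = fit[improved]
        g = np.argmin(pfit)
        if pfit[g] < gfit:
            gbest, gfit = pbest[g].copy(), pfit[g]
    return gbest, gfit
```

The inertia weight w plays the role the paper attributes to loudness and pulse rate in the Bat algorithm: it trades off exploration of the search area against convergence on gbest.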

PROPOSED METHODS
This section discusses the three optimized LSSVMs: BAT-LSSVM, CUCKOO-LSSVM and PSO-LSSVM. The process flow of each method and the performance comparisons among the three (i.e., BAT-LSSVM, CUCKOO-LSSVM and PSO-LSSVM) are presented in turn, followed by a detailed procedure for combining the two algorithms (i.e., Bat and CS) to optimize LSSVM. The purpose of this comparison is to evaluate and determine which combination yields the highest accuracy. Performance is measured using five metrics: MAPE (Yusof et al., 2015), as shown in Equation 13, accuracy (Yusof, Kamaruddin, Husni, Ku-Mahamud, & Mustaffa, 2013), SMAPE, RMSPE and fitness value.

MAPE = (100/N) Σ_{t=1..N} |(A_t − P_t) / A_t|    (13)

where A_t is the actual value, P_t is the predicted value and N is the number of observations.
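For reference, the metrics can be computed as follows. This is an illustrative Python sketch: Equation 13 fixes MAPE, while the SMAPE and RMSPE forms and the accuracy = 100 − MAPE convention are common definitions we assume here rather than forms taken verbatim from the study.

```python
import numpy as np

def mape(actual, pred):
    # Equation 13; assumes all actual values are nonzero
    actual, pred = np.asarray(actual, float), np.asarray(pred, float)
    return 100.0 * np.mean(np.abs((actual - pred) / actual))

def smape(actual, pred):
    # symmetric MAPE: error relative to the mean of |actual| and |pred|
    actual, pred = np.asarray(actual, float), np.asarray(pred, float)
    return 100.0 * np.mean(np.abs(actual - pred) /
                           ((np.abs(actual) + np.abs(pred)) / 2.0))

def rmspe(actual, pred):
    # root mean squared percentage error
    actual, pred = np.asarray(actual, float), np.asarray(pred, float)
    return 100.0 * np.sqrt(np.mean(((actual - pred) / actual) ** 2))

def accuracy(actual, pred):
    # a common forecasting convention: accuracy = 100 - MAPE
    return 100.0 - mape(actual, pred)
```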

BAT-LSSVM
In order to employ the Bat algorithm to optimize the hyper-parameters of LSSVM (σ, γ), the value of each pair of parameters (σ, γ) is a potential solution in the search area with restricted boundaries, where the frequency of each bat is represented by the objective function, MAPE. The objective function evaluates the solutions that are generated randomly in the beginning; then, the bat with the minimum or maximum frequency (depending on the problem) is identified. New solutions are generated by adjusting frequency, velocity and position. If a stopping criterion is satisfied, the best solution is obtained, where it has the lowest MAPE. The process flow of the proposed BAT-LSSVM is shown in Algorithm 4.


Algorithm 4: Proposed BAT-LSSVM
Input : Determine the number of Bats (n). Randomly determine the initialized positions (X_n), where x_i represents the hyper-parameters of LSSVM (σ, γ). Randomly determine the initialized velocities (V_n). Determine the frequencies F_n, pulse rate r and loudness A.

Output : Return the best solution
Step 1 : Initialize the LSSVM model using the generated solution
Step 2 : Train the LSSVM model
Step 3 : Evaluate the LSSVM model using Equation 13
Step 4 : Frequencies, F_n, at x_i are determined by the objective function f(x_i)
Step 5 : Evaluate each Bat and find the best F(X*)
Step 6 : While not stopping criterion do
Step 7 : For j = 1 to n do
Step 8 : Generate a new solution by adjusting frequency, updating velocity and position using Equations 8, 9 and 10
Step 9 : If (rand > r)
Step 10 : Generate a solution around the best solution
Step 11 : End if
Step 12 : Train the LSSVM model with the new solution
Step 13 : Evaluate the new solutions and update F(X_j)
Step 14 : If (rand < A and F(X_j) > F(X*))
Step 15 : Replace the current solution with the new solution
Step 16 : Replace the best F(X*)
Step 17 : End if
Step 18 : End for
Step 19 : End while
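The core of Algorithm 4 is an objective function that trains the LSSVM with a candidate (σ, γ) pair and returns the validation MAPE, which the Bat loop then minimizes. The self-contained Python sketch below is our illustration of that idea; the helper names, parameter bounds and synthetic data are assumptions, not the study's settings.

```python
import numpy as np

def rbf(X1, X2, sigma):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_train_predict(Xtr, ytr, Xte, sigma, gamma):
    # train via the Equation 5 linear system, predict via Equation 6
    M = len(ytr)
    A = np.zeros((M + 1, M + 1))
    A[0, 1:] = A[1:, 0] = 1.0
    A[1:, 1:] = rbf(Xtr, Xtr, sigma) + np.eye(M) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], ytr)))
    b, alpha = sol[0], sol[1:]
    return rbf(Xte, Xtr, sigma) @ alpha + b

def mape(actual, pred):
    return 100.0 * np.mean(np.abs((actual - pred) / actual))

def bat_lssvm(Xtr, ytr, Xval, yval, n=10, iters=30, seed=0):
    # each bat position encodes one candidate (sigma, gamma) pair
    rng = np.random.default_rng(seed)
    lb, ub = np.array([0.05, 1.0]), np.array([5.0, 1000.0])

    def objective(p):            # validation MAPE of the trained LSSVM
        return mape(yval, lssvm_train_predict(Xtr, ytr, Xval, p[0], p[1]))

    X = rng.uniform(lb, ub, (n, 2))
    V = np.zeros((n, 2))
    fit = np.array([objective(x) for x in X])
    best, best_fit = X[np.argmin(fit)].copy(), fit.min()
    for _ in range(iters):
        for j in range(n):
            Fj = 2.0 * rng.random()                       # Equation 8
            V[j] = V[j] + (X[j] - best) * Fj              # Equation 9
            x_new = np.clip(X[j] + V[j], lb, ub)          # Equation 10
            if rng.random() > 0.5:                        # pulse-rate branch
                x_new = np.clip(best + 0.01 * rng.standard_normal(2) * (ub - lb),
                                lb, ub)
            f_new = objective(x_new)
            if rng.random() < 0.5 and f_new < fit[j]:     # loudness acceptance
                X[j], fit[j] = x_new, f_new
                if f_new < best_fit:
                    best, best_fit = x_new.copy(), f_new
    return best, best_fit        # optimized (sigma, gamma) and validation MAPE
```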

CUCKOO-LSSVM
In optimizing the hyper-parameters of LSSVM (σ, γ) using the Cuckoo Search algorithm, the solutions of the algorithm are depicted by a pair of parameters (σ, γ), which represent the nest. In this study, each nest is evaluated by an objective function, where the MAPE is used as the objective function. At the beginning of the CUCKOO-LSSVM process, initial solutions are generated randomly; the solutions are then trained by the LSSVM model, and the LSSVM model is evaluated using MAPE. If a stopping criterion is not satisfied, the generation of new solutions and evaluation continues until the conditions are met. After that, the best solution is chosen, which holds the minimum value of MAPE (i.e. the maximum value of fitness). The process flow of the proposed CUCKOO-LSSVM is shown in Algorithm 5.

Algorithm 5: Proposed CUCKOO-LSSVM
Input : Determine the initial population (n) and the hosts' nests (X_j), where X_j represents the hyper-parameters of LSSVM (σ, γ). Determine the rate (pa). Randomly generate a solution.

Output : Return the best solution
Step 1 : Initialize the LSSVM model using the generated solution
Step 2 : Train the LSSVM model
Step 3 : Evaluate the LSSVM model using Equation 13
Step 4 : Determine the fitness, F_n, at x_j by the objective function f(x_j)
Step 5 : Evaluate each nest
Step 6 : While not stopping criterion do
Step 7 : Find the best nest F(X*), which represents the maximum value
Step 8 : Choose a nest x_i randomly, avoiding the best nest
Step 9 : Train the LSSVM model, evaluate the new nest by the objective function and find its fitness F(x_i)
Step 10 : If (F(x_i) >= F(X*))
Step 11 : Replace the current solution with the new solution
Step 12 : Replace the best F(X*)
Step 13 : End if
Step 14 : If (rand > pa)
Step 15 : Replace some of the worse nests
Step 16 : End if
Step 17 : End while

PSO-LSSVM
In order to employ PSO in optimizing the hyper-parameters of LSSVM (σ, γ), the value of each pair of parameters (σ, γ) is determined by the value of the particle's position in the search area. The position of each particle changes depending on its velocity and on the change of the personal best (pbest) and the global best (gbest). The evaluation of each position is based on an objective function formulated using MAPE. LSSVM model training continues using the new solutions (positions) until the stopping conditions are met. Then, the best solution is chosen and stored. The process flow of the proposed PSO-LSSVM is shown in Algorithm 6.

Algorithm 6: Proposed PSO-LSSVM
Input : Determine the number of particles (n). Randomly determine the initialized positions (X_n), where x_i represents the hyper-parameters of LSSVM (σ, γ). Randomly determine the initialized velocities (V_n).

Output: Return the best solution
Step 1 : Initialize the LSSVM model using generated solution.
Step 3 : Evaluate the LSSVM model using Equation 13 Step 4 : Frequencies, Fn, at xi is determined by objective function, f(xi): Step 5 : Evaluate each swarm and find pbest and gbest Step 6 : While not stopping criterion do Step 7 : For j = 1 to n do Step 8 : Update personal best and global best of each swarm Step 9 : Update velocity of the particle by using Equation 11 Step 10 : Update the position of the particle using Equation 12 Step 11 : Train the LSSVM model with new solution.
Step 12 : Evaluate new solutions, and calculate the fitness function f Step 13 : Update personal best and global best of each swarm Step 14 : End while
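The velocity and position updates of Algorithm 6 can be sketched in Python. This is a minimal, self-contained illustration under stated assumptions: `objective` is a stand-in for training the LSSVM and returning its MAPE, the two-dimensional search box plays the role of the (σ, γ) bounds, and the inertia and acceleration coefficients (`w`, `c1`, `c2`) are conventional textbook defaults, not the paper's Table 1 settings.

```python
import random

def pso_optimize(objective, bounds, n_particles=10, iters=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box with a basic PSO: each particle's
    velocity is pulled toward its personal best and the global best."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + cognitive (pbest) + social (gbest) terms.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clip the particle to the search boundaries.
                pos[i][d] = min(max(pos[i][d] + vel[i][d],
                                    bounds[d][0]), bounds[d][1])
            f = objective(pos[i])
            if f < pbest_f[i]:             # update personal best
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:            # update global best
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f
```

A quadratic surrogate objective suffices to exercise the loop; in the paper's setting each evaluation would instead train an LSSVM with the candidate (σ, γ) and return its MAPE.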

BAT-CUCKOO-LSSVM
In this integration, two algorithms are used to optimize the LSSVM hyper-parameters (σ, γ). First, the Bat algorithm is employed, and its outcome is used as the input for CS. The value of each pair of parameters (σ, γ) is a potential solution in the search area with restricted boundaries, where the frequency of each bat is represented by the objective function (i.e., MAPE). The objective function is used to evaluate solutions which are randomly generated. This is followed by determining the best bat (i.e., the one with the minimum frequency). New solutions are produced by adjusting the frequency, velocity, and position of the bats. The process is repeated until the stopping criterion is fulfilled (i.e., the smallest MAPE). The solutions produced by the Bat algorithm are passed to CS as initial solutions, where each solution is considered as a nest. Each nest is evaluated using the objective function (also based on MAPE). Then, the solutions are trained by LSSVM and the model with the smallest MAPE is stored for prediction purposes. Generations of new solutions and evaluations continue until the stopping condition is met. The process flow of the proposed BAT-CUCKOO-LSSVM is shown in Algorithm 7.

Algorithm 7: Proposed BAT-CUCKOO-LSSVM
Input : Determine the number of Bats (n). Determine the initialized position (Xn), where, xi represents the hyper-parameters of LSSVM (σ, γ). Randomly, determine the initialized velocity (Vn). Determine the frequencies Fn, pulse rate r, loudness A.
Output : Return the solutions that include the best one.
Step 1 : Initialize the LSSVM model using generated solution.
Step 2 : Train the LSSVM model.
Step 3 : Evaluate the LSSVM model using Equation 13
Step 4 : Frequencies, Fn, at xi are determined by the objective function, f(xi)
Step 5 : Evaluate each Bat and find the best F(X*)
Step 6 : While not stopping criterion do
Step 7 : For j = 1 to n do
Step 8 : Generate new solutions by adjusting frequency, update velocity, update position using Equations 8, 9, and 10
Step 9 : If (rand > r)
Step 10 : Generate a solution around the best solution
Step 11 : End if
Step 12 : Train the LSSVM model with new solutions.
Step 13 : Evaluate new solutions and update F(Xj)
Step 14 : If (rand < A & F(Xj) > F(X*))
Step 15 : Replace the current solution with new solution
Step 16 : Replace the best F(X*)
Step 17 : End if
Step 18 : End while

Input : Determine the initial population of (n), and the host nests (Xj), where, Xj represents the hyper-parameters of LSSVM (σ, γ) that are gained from the Bat algorithm. Determine the rate (pa).
Output : Return the best solution
Step 1 : Nests equal gained solutions.
Step 2 : Evaluate each nest
Step 3 : While not stopping criterion do
Step 4 : Find best nest F(X*), which represents maximum value.
Step 5 : Choose a nest randomly xi and avoid best nest.
Step 6 : Train the LSSVM model, evaluate new nest by objective function, and find fitness F(xi)
Step 7 : If (F(xi) >= F(X*))
Step 8 : Replace the current solution with new solution
Step 9 : Replace the best F(X*)
Step 10 : End if
Step 11 : If (rand > pa)
Step 12 : Replace some of the worse nests.
Step 13 : End if
Step 14 : End while
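The Bat stage of the hybrid (frequency, velocity, and position updates with pulse rate r and loudness A) can be sketched in Python. This is an illustrative sketch only: `objective` again stands in for training the LSSVM and returning its MAPE, the pulse rate and loudness are held constant here (some Bat variants update them over time), and all parameter values are assumptions rather than the paper's settings.

```python
import random

def bat_optimize(objective, bounds, n_bats=10, iters=40, f_min=0.0, f_max=2.0,
                 loudness=0.9, pulse_rate=0.5, seed=1):
    """Minimize `objective` with a basic Bat algorithm sketch mirroring the
    frequency/velocity/position updates of Equations 8-10 in the paper."""
    rng = random.Random(seed)
    dim = len(bounds)
    clip = lambda x, d: min(max(x, bounds[d][0]), bounds[d][1])
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_bats)]
    vel = [[0.0] * dim for _ in range(n_bats)]
    fit = [objective(p) for p in pos]
    b = min(range(n_bats), key=lambda i: fit[i])
    best, best_f = pos[b][:], fit[b]
    for _ in range(iters):
        for i in range(n_bats):
            # Draw a frequency, then update velocity and candidate position.
            freq = f_min + (f_max - f_min) * rng.random()
            cand = [0.0] * dim
            for d in range(dim):
                vel[i][d] += (pos[i][d] - best[d]) * freq
                cand[d] = clip(pos[i][d] + vel[i][d], d)
            if rng.random() > pulse_rate:
                # Local random walk around the current best solution.
                cand = [clip(best[d] + 0.1 * rng.gauss(0, 1), d)
                        for d in range(dim)]
            f = objective(cand)
            # Accept improving solutions with probability given by loudness A.
            if rng.random() < loudness and f < fit[i]:
                pos[i], fit[i] = cand, f
                if f < best_f:
                    best, best_f = cand[:], f
    return best, best_f
```

In the BAT-CUCKOO hybrid, the population returned here would seed the nests of the Cuckoo Search stage; in the CUCKOO-BAT hybrid the seeding runs the other way.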


CUCKOO-BAT-LSSVM
As for the CUCKOO-BAT-LSSVM, the operation starts with CS. Cuckoo Search solutions are depicted by a pair of parameters (σ, γ), which indicates a nest. Each nest is evaluated using the objective function (i.e., MAPE). Similar to the earlier process in BAT-CUCKOO-LSSVM, initial solutions are generated randomly. Then, they are trained by LSSVM and evaluated using MAPE. If the stopping criterion is not met, generations of new solutions and evaluations continue. After that, the solutions (i.e., the ones with the smallest MAPE) will be used to represent potential solutions in the search area of the Bat algorithm. The standard Bat operation is then followed until the stopping criterion is fulfilled. The process flow of the proposed CUCKOO-BAT-LSSVM is shown in Algorithm 8.

Algorithm 8: Proposed CUCKOO-BAT-LSSVM
Input : Determine the initial population of (n), and the host nests (Xj), where, Xj represents the hyper-parameters of LSSVM (σ, γ). Determine the rate (pa). Randomly, generate a solution.
Output : Return the solutions that include the best one.
Step 1 : Initialize the LSSVM model using generated solution.
Step 2 : Train the LSSVM model.
Step 3 : Evaluate the LSSVM model using Equation 13
Step 4 : The fitness, Fn, at xj is determined by the objective function, f(xj)
Step 5 : Evaluate each nest
Step 6 : While not stopping criterion do
Step 7 : Find best nest F(X*), which represents maximum value.
Step 8 : Choose a nest randomly xi and avoid best nest.
Step 9 : Train the LSSVM model, evaluate new nest by objective function, and find fitness F(xi)
Step 10 : If (F(xi) >= F(X*))
Step 11 : Replace the current solution with new solution
Step 12 : Replace the best F(X*)
Step 13 : End if
Step 14 : If (rand > pa)
Step 15 : Replace some of the worse nests.
Step 16 : End if
Step 17 : End while

Input : Determine the number of Bats (n). Determine the initialized position (Xn), where, xi represents the hyper-parameters of LSSVM (σ, γ) that is gained from the Cuckoo algorithm. Randomly, determine the initialized velocity (Vn). Determine the frequencies Fn, pulse rate r, loudness A.
Output : Return the best solution
Step 1 : Evaluate each Bat and find best F(X*)
Step 2 : While not stopping criterion do
Step 3 : For j = 1 to n do
Step 4 : Generate new solution by adjusting frequency, update velocity, update position using Equations 8, 9, and 10
Step 5 : If (rand > r)
Step 6 : Generate a solution around the best solution
Step 7 : End if
Step 8 : Train the LSSVM model with new solution.
Step 9 : Evaluate new solutions and update F(Xj)
Step 10 : If (rand < A & F(Xj) > F(X*))
Step 11 : Replace the current solution with new solution
Step 12 : Replace the best F(X*)
Step 13 : End if
Step 14 : End while

EXPERIMENTAL SETUP
Implementation of BAT-LSSVM, CUCKOO-LSSVM, PSO-LSSVM, BAT-CUCKOO-LSSVM, and CUCKOO-BAT-LSSVM is conducted using the LS-SVMlab toolbox (Pelckmans et al., 2002). The parameter settings of these experiments are presented in Table 1.

Data Description
In this study, a real dataset, the Diabetes Dataset (Dua & Graff, 2017), obtained from the UCI repository, was utilized to test the effectiveness of the forecasting. The samples covered were from 10 October 1989 to 21 April 1991. Table 2 shows the descriptive statistics of the dataset, which includes 1683 cases and four variables: date, time, code, and value of diabetes mellitus. Before utilizing the data, the dataset was divided into three sub-datasets: 70% for training, 15% for validation, and 15% for testing.
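The 70/15/15 split described above can be sketched as a chronological partition. A brief Python sketch follows; the function name `chronological_split` is hypothetical, and the sketch assumes (as is usual for time series forecasting) that the split preserves temporal order rather than shuffling the cases.

```python
def chronological_split(series, train=0.70, valid=0.15):
    """Split an ordered series into train/validation/test sub-datasets
    without shuffling, preserving the temporal order of observations."""
    n = len(series)
    i = int(n * train)            # end of the training portion
    j = int(n * (train + valid))  # end of the validation portion
    return series[:i], series[i:j], series[j:]
```

For the 1683-case diabetes dataset, this yields roughly 1178 training, 252 validation, and 253 testing cases (integer truncation makes the last portion absorb the remainder).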

Data Normalization
In order to simplify the training task and produce better results, data normalization was performed using min-max normalization (Yusof et al., 2015), which is expressed in Equation 18:

x' = (x - x_min) / (x_max - x_min)     (18)

where x' is the normalized data, x is the original data, x_min refers to the minimum value in the dataset, and x_max refers to the maximum value.
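Min-max normalization as in Equation 18 can be sketched in a few lines of Python; the guard for a constant series is an added assumption, since the formula is undefined when x_max equals x_min.

```python
def min_max_normalize(values):
    """Scale values into [0, 1] via x' = (x - x_min) / (x_max - x_min)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Degenerate case: a constant series maps to all zeros.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]
```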

RESULTS

The forecasted values for the testing dataset (i.e., values for the attribute diabetes) were generated by each method. A comparison of the results among the three optimized methods, BAT-LSSVM, CUCKOO-LSSVM, and PSO-LSSVM, and the non-optimized LSSVM is presented in Table 3. It is noted that BAT-LSSVM generated the lowest MAPE, i.e., 21.26162 (as highlighted in Table 3), which is smaller than CUCKOO-LSSVM (21.36823), PSO-LSSVM (24.7515), and LSSVM (26.84089). Figure 2 shows the change in MAPE for the four methods. The prediction accuracy of the four methods is also reported in Table 3. The highest score can be seen in BAT-LSSVM (78.73838, as highlighted in Table 3), whereas the lowest is produced by the non-optimized LSSVM. In contrast to the MAPE, the larger the value for accuracy, the better the method. Figure 3 illustrates the difference in the predicted accuracy.
In addition, the data in Table 3 also includes information on SMAPE for the four LSSVM methods. The average SMAPE of BAT-LSSVM is 18.26129948, CUCKOO-LSSVM is 18.20837469, PSO-LSSVM is 22.76379234, and LSSVM is 24.55659539. CUCKOO-LSSVM has the smallest SMAPE compared with the other methods; the change is illustrated in Figure 4. From Table 3, it can also be observed that the RMSPE value for BAT-LSSVM is the smallest. Figure 5 shows the change in RMSPE test values for all methods under analysis. Besides the error rate, Table 3 also shows the fitness value for LSSVM, CUCKOO-LSSVM, PSO-LSSVM, and BAT-LSSVM. The average fitness of BAT-LSSVM is 0.03434636, which is similar to CUCKOO-LSSVM (0.035003629), while the fitness for PSO-LSSVM is 0.019302678, which is closer to the one obtained by LSSVM (0.016658599). Nevertheless, CUCKOO-LSSVM is a better method as it generated the strongest fitness. Illustration of the results is provided in Figure 6. Based on these results (i.e., error rates, accuracy, and fitness), it is noted that the two strong methods are BAT-LSSVM (the best in MAPE, accuracy, and RMSPE) and CUCKOO-LSSVM (the best in terms of SMAPE and fitness). Experimental results using two algorithms to optimize the LSSVM are shown in Table 4.
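The error measures used throughout the results can be sketched in Python. The paper's exact equations for these metrics are not reproduced in this section, so the following are the common textbook definitions (SMAPE in particular has several variants in the literature); the accuracy convention of 100 minus MAPE is inferred from Table 3, where a MAPE of 21.26162 pairs with an accuracy of 78.73838.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 / len(actual) * sum(abs((a - f) / a)
                                     for a, f in zip(actual, forecast))

def smape(actual, forecast):
    """Symmetric MAPE, in percent (one common variant)."""
    return 100.0 / len(actual) * sum(2.0 * abs(f - a) / (abs(a) + abs(f))
                                     for a, f in zip(actual, forecast))

def rmspe(actual, forecast):
    """Root mean square percentage error, in percent."""
    n = len(actual)
    return 100.0 * (sum(((a - f) / a) ** 2
                        for a, f in zip(actual, forecast)) / n) ** 0.5

def accuracy(actual, forecast):
    """Forecast accuracy taken as 100 - MAPE, the convention implied by Table 3."""
    return 100.0 - mape(actual, forecast)
```

Note that MAPE and RMSPE divide by the actual values, so they assume the series contains no zeros, which holds for the diabetes value attribute used here.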
Data in Table 4 compares the outcome of four methods: CUCKOO-LSSVM, BAT-LSSVM, CUCKOO-BAT-LSSVM, and BAT-CUCKOO-LSSVM. Investigation of the effectiveness of using two optimizers has shown that CUCKOO-BAT-LSSVM is better than BAT-CUCKOO-LSSVM. The average MAPE for CUCKOO-BAT-LSSVM is 21.2158, in contrast to BAT-CUCKOO-LSSVM, which generated a higher value, i.e., 21.2943. Figure 7 shows the change in MAPE values for the four methods. As CUCKOO-BAT-LSSVM produced the smallest MAPE and RMSPE (Figure 10), it also generated the highest forecasting accuracy (CUCKOO-BAT-LSSVM with 78.7842% and BAT-CUCKOO-LSSVM with 78.7057%). Figure 8 shows the change in accuracy values for the four methods. However, the best fitness (on average) was produced by CUCKOO-LSSVM, which was 0.035003629, as compared with CUCKOO-BAT-LSSVM, which produced the worst (0.033579709). The change in the results is presented in Figure 11. Nevertheless, CUCKOO-BAT-LSSVM is still the best method as the difference is not significant.

Journal of ICT, 19, No. 3 (July) 2020, pp: 351-379

CONCLUSION
This study focuses on how to optimize the hyper-parameters of LSSVM, one of the efficient models in machine learning. LSSVM was optimized using three swarm intelligence algorithms: Bat, CS, and PSO. The methods were evaluated on medical data to predict the value of diabetes. The success of the CS algorithm in finding solutions for global optimization problems in different fields and the strength of the Bat algorithm in exploration and exploitation have yielded good results. The experimental results have demonstrated that BAT-LSSVM is a good forecasting model in terms of MAPE, accuracy, and RMSPE, whereas CUCKOO-LSSVM is better in terms of SMAPE and fitness value.
In addition, the use of the two swarm algorithms (i.e., Bat and CS) to optimize LSSVM has increased LSSVM performance. A lower error rate and higher accuracy have been obtained by both optimizers. The experimental results also showed that the hybrid CUCKOO-BAT-LSSVM produced better results than the hybrid BAT-CUCKOO-LSSVM in terms of accuracy, MAPE, RMSPE, and fitness value. It also obtained better results compared with CUCKOO-LSSVM and BAT-LSSVM in terms of accuracy, MAPE, and RMSPE. With this, the proposed CUCKOO-BAT-LSSVM can be acknowledged as an alternative solution for an accurate forecasting model. To further enhance the forecasting, an investigation of parameter selection for swarm algorithms is worth exploring.