Document Type : Research Article
Authors
^{1} Department of Electrical Engineering, Nowshahr Branch, Islamic Azad University, Nowshahr, Iran
^{2} Department of Mechatronics Engineering, K. N. Toosi University of Technology, Tehran, Iran
^{3} Department of Mechanical, Electrical, and Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
Abstract
Keywords
Computational intelligence (CI) is the knowledge for emulating human-mind reasoning or the instincts of living creatures by a computer program. Learning is one of the foremost cognitive phenomena of CI: it employs acquired data along with training algorithms to generate functional information. CI refers to methodologies for solving or modeling complex problems with uncertainties where traditional modeling approaches are ineffective. Two main divisions of CI are Fuzzy reasoning and the neural network. The fusion of Fuzzy reasoning, as linguistic rules, with the network structure and learning capability of the neural network, namely ANFIS, has the potential to possess the benefits of both in a unified framework, so that recent studies have cited its usage in many engineering problems such as control, system identification, and prediction.
Zadeh (1975) proposed type 2 Fuzzy systems, which are undoubtedly prominent for confronting uncertainty and dealing with noisy data [1]. For this reason, T2FS and T2ANFIS seem superior to their type 1 family. Unlike T1FS, where the membership grades are precise, the membership grades of T2FS are Fuzzy [2].
System identification requires a suitable model and an algorithm (or algorithms) to update its parameters so that the model's output has the lowest difference from the output of the identified system. Training algorithms of ANFIS and T2ANFIS as models in system identification are divided into three types: derivative-based, derivative-free, and hybrid algorithms. Hybrid algorithms are made by merging the two aforementioned types.
Among derivative-based algorithms, gradient descent (GD) is a traditional algorithm for optimizing the parameters of ANN, ANFIS, and T2ANFIS. The learning rate has a crucial role in the convergence speed of GD. Selecting a high learning rate to acquire faster convergence may result in instability of the training process. By contrast, selecting a small learning rate reduces the convergence speed, and the algorithm may stagnate in a local minimum [3, 4]. The Levenberg–Marquardt (LM) algorithm evolved from the interpolation of GD and Gauss–Newton (GN); its momentum coefficient determines how close it is to GD or GN. Although it has a faster convergence rate than GD, its convergence slows whenever the dimension of the parameters is large [5]. The Kalman filter family, such as KF [6], EKF [7], and UKF [8], is another group of derivative-based algorithms. Convenient consistency and good estimation, even in the presence of process noise or uncertainty, are the main reasons for the preference of the KF family over other derivative-based algorithms in recent studies.
A higher probability of stagnating in local minima is the main problem of derivative-based algorithms [9]. However, stochastic or adaptive selection of the learning parameters (e.g., learning rate and momentum coefficient) may tackle this problem to a certain extent [10-12].
Derivative-free algorithms are distinguished from derivative-based algorithms in many aspects. One of the most brilliant advantages of derivative-free algorithms such as the genetic algorithm (GA; [13]), simulated annealing (SA; [14]), artificial bee colony (ABC; [15]), ant colony optimization (ACO; [16]), search group algorithm (SGA; [17]), water cycle algorithm (WCA; [18]), sliding mode control (SMC; [19]), and PSO [20] is that they do not require the derivative of the objective function with respect to the MF parameters in the update process. Besides, they are more robust in finding a global minimum. Another noteworthy point is that they are applicable to any membership function, mainly in the antecedent part of T2ANFIS (with any shape and structure) [21]. Their main drawback, however, is a high computational burden that makes their usage impractical in online identification and control. Therefore, hybrid algorithms have been introduced, which integrate the favorable features of derivative-based and derivative-free algorithms; this integration provides a synergy that improves the unfavorable characteristics of both.
Training stability, as one of the most significant issues in the identification process, has attracted the attention of many researchers in this field. Generally, Lyapunov stability analysis is the most commonly used analytical approach for this purpose. The Lyapunov stability theorem determines adaptive boundaries for the adjustable parameters of the algorithms (APAs) so that these boundaries guarantee the stability of the training process. Because of the probabilistic nature of some algorithms like SA, or the existence of mutation in GA, applying the Lyapunov stability theorem to derivative-based algorithms seems more feasible than to derivative-free or hybrid algorithms. However, some researchers have reported the implementation of the Lyapunov stability theorem on derivative-free algorithms like PSO [22] and SMC [19].
The authors of [23] presented a stability analysis for a type 2 Fuzzy neural network trained with LM. They used the first-order Taylor-series expansion for all of the model parameters separately. Despite the suitable degrees of freedom provided by this choice, which can result in proper performance, LM has an intrinsically high computational burden. The authors in [22] obtained stable adaptive boundaries for different algorithms. They provided detailed mathematical proofs to attain the stabilizing boundaries of GD and then LM in training both the antecedent and the consequent parameters of the ANFIS model. They also obtained adaptive lower and upper boundaries of a hybrid algorithm in which PSO and EKF were used for training the antecedent and consequent parameters of the model (PSO+EKF), respectively. Different from [23], they divided the model parameters into two distinct forms: vector-form and matrix-form parameters. The same analysis was provided by the authors of [24] and [25] for obtaining the learning-rate boundary. A type 2 Fuzzy wavelet neural network model was presented in [24], with its parameters optimized by GD. In addition to defining a stabilizing boundary for the learning rate, they started learning with a small learning rate; during learning, the proposed algorithm changed the learning rate to guarantee stability and speed up learning. The stabilizing boundaries of their work were similar to those in [22]. In [25], a self-compensatory interval type 2 FNN (TSCIT2FNN) was utilized as the model. The antecedent and consequent parameters were optimized by GD and a variable EKF, respectively. To guarantee learning stability, they examined the convergence of GD but did not provide any analysis for the convergence of the consequent parameters. A similar approach was reported in [26] and [27]. The authors of [26] examined training stability and presented two Fuzzy models, providing their first-order derivatives in full detail.
In [27], the input data were clustered to reduce the computational complexity. In their work, the algorithm updated the weights by a pseudo-inverse matrix instead of calculating them iteratively. Moreover, the authors provided a stability analysis similar to those in [24], [25], and [26].
In most of the above references and other related publications, the authors' focus has been on how to expand Lyapunov functions in terms of the model parameters to obtain stabilizing boundaries. They usually used a Lyapunov function in the form of the squared error between the model output and the output of the real system (the system to be identified). Zabihi Shesh Poli, Aliyari Shoorehdeli, and Moarefianpour (2019) proposed, for the first time, a novel Lyapunov function in the form of the square of the error difference [28]. They analyzed the stability of the training process in the identification of an interval type 2 adaptive Neuro-Fuzzy inference system (IT2ANFIS), first by GD and then by a hybrid algorithm based on GD and FFRLS. They provided new adaptive laws for the learning rate in GD and the covariance matrix in FFRLS. The same authors in [29] studied the stability of the aforementioned system with GD and KF training the antecedent and consequent parameters, respectively. They showed that when KF was substituted for FFRLS, higher degrees of freedom were attained; in addition, they reached a relatively better performance compared with GD+FFRLS. This paper utilizes the same Lyapunov function used in [28] and [29] to obtain the stability conditions of a swarm-based algorithm. In doing so, the stability analysis is conducted on the identification of the IT2ANFIS model when its antecedent and consequent parameters are trained by PSO and KF, respectively (PSO+KF).
2.1. The structure of the Type 2 fuzzy system (T2FS) of this study
From the idea of blurring the points on a type 1 fuzzy membership function (MF), a type 2 MF can be attained, which defines a type 2 fuzzy set (T2FS) [30]. Unlike type 1 fuzzy sets, the membership degree in a T2FS is itself a fuzzy set in [0, 1]. This evolution provides a three-dimensional, bivariate MF whose secondary grade lies in [0, 1] for a general T2FS. A T2FS is represented as

Ã = {((x, u), μ_Ã(x, u)) | ∀x ∈ X, ∀u ∈ J_x ⊆ [0, 1]} (1)
The variables x and u are the primary and the secondary variables, respectively. Restricting the secondary grade to 1 forms the interval T2FS (IT2FS), which trivializes the third dimension and can be expressed as follows:

Ã = ∫_{x∈X} ∫_{u∈J_x} 1/(x, u), J_x ⊆ [0, 1] (2)
The Fuzzy system of this study is a specific Takagi–Sugeno–Kang (TSK) type IT2FS. This structure employs interval type 2 MFs (T2MFs) with a Gaussian spread in the antecedent and a summation of crisp multiples of the inputs in the consequent. A Gaussian T2MF with an uncertain mean and an identical standard deviation is shown in Fig. 1. Similar to type 1 TSK Fuzzy systems, the rule governing the Fuzzy system is represented as below [2]:

(3) 
where the T2MF in the antecedent corresponds to a given rule and input, and the crisp coefficients multiply the inputs in the consequent.
Fig. 1. A Gaussian T2MF with an uncertain mean and an identical standard deviation
2.2. IT2ANFIS and its output calculation
Representation of an IT2FS in a network structure with learning capability creates a model called IT2ANFIS. This model is shown in Fig. 2. The input to the model is fuzzified as

(4) 

(5) 
In equations (4) and (5), the lower and upper mean values correspond to a given rule and input, respectively, and the standard deviation is as defined in the above equations. The lower and upper membership grades are determined as

(6) 

(7) 
Using (6) and (7), the lower and upper firing strengths are defined as follows:

(8) 
Different from T1FS, defuzzification and output calculation in T2FS require type reduction. Iterative type-reduction mechanisms have been presented in the literature [31-34], but performing a stability analysis requires a closed-form model output. On that account, this paper utilizes the closed solution provided by [35], which calculates the model output as follows:
(9) 
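As a concrete illustration, a forward pass of the TSK IT2ANFIS can be sketched in Python. The function names are hypothetical, and the closed-form output used here (the average of the lower- and upper-firing weighted means) is a common approximation, not necessarily the exact closed form of [35]:

```python
import numpy as np

def it2_gaussian_grades(x, m_low, m_up, sigma):
    """Lower/upper membership grades of a Gaussian IT2 MF with an
    uncertain mean in [m_low, m_up] and a fixed standard deviation."""
    g_low_mean = np.exp(-0.5 * ((x - m_low) / sigma) ** 2)
    g_up_mean = np.exp(-0.5 * ((x - m_up) / sigma) ** 2)
    # Upper grade: 1 inside the mean interval, the nearer Gaussian outside it.
    upper = np.where(x < m_low, g_low_mean, np.where(x > m_up, g_up_mean, 1.0))
    # Lower grade: the smaller of the two Gaussians centred at the extreme means.
    lower = np.minimum(g_low_mean, g_up_mean)
    return lower, upper

def it2anfis_output(inputs, m_low, m_up, sigma, c):
    """One forward pass of a TSK IT2ANFIS.
    m_low, m_up, sigma: (rules, inputs) antecedent parameters.
    c: (rules, inputs + 1) consequent coefficients (last column is the bias).
    The output averages the lower- and upper-firing weighted means; this is an
    assumed closed-form approximation, not necessarily the form of [35]."""
    lower, upper = it2_gaussian_grades(inputs[None, :], m_low, m_up, sigma)
    f_low = np.prod(lower, axis=1)           # lower firing strengths, cf. (8)
    f_up = np.prod(upper, axis=1)            # upper firing strengths, cf. (8)
    y_rule = c[:, :-1] @ inputs + c[:, -1]   # crisp TSK consequents, cf. (3)
    y_l = f_low @ y_rule / (f_low.sum() + 1e-12)
    y_r = f_up @ y_rule / (f_up.sum() + 1e-12)
    return 0.5 * (y_l + y_r)
```

When all rule consequents are identical, any weighted average returns that same value, which gives a quick sanity check of the implementation.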
Fig. 2. The IT2ANFIS model [28]
2.3. Stability analysis
Generally, in an identification process, the algorithms tune the model parameters so that the model output best fits the actual output of the identified system. Selecting suitable values of the APAs (such as the covariance matrix in the KF, or the inertia factor and maximum gain in PSO) is of high priority. As explained previously, to escape from local minima, as well as to have consistent convergence, these parameters must be selected adaptively. To determine the APAs adaptively and find their stabilizing boundaries, the Lyapunov stability theorem requires a suitable positive-definite Lyapunov function. These boundaries are calculated so as to make the difference of the Lyapunov function negative, and they guarantee the stability of the identification process. Although the algorithm may be stable outside these boundaries, that stability is not guaranteed. The boundaries of the APAs acquired from the Lyapunov stability theorem are applied to the algorithms at each data entry and iteration to maintain the training stability of the model (i.e., IT2ANFIS). The main contribution of this study is the exploitation of a newly found Lyapunov function in deriving the stabilizing boundaries for the APAs of the KF and PSO in a hybrid combination, namely PSO+KF.
2.3.1. Proposed Lyapunov function
In most studies related to this field, the objective function is the square of the error between the model and the actual system to be identified. Moreover, the references that selected the Lyapunov function in the same form as this objective function did not consider the error rate. For this reason, the Lyapunov function proposed by the authors of [28] and [29] is utilized: the square of the error difference between the model and the actual system. This Lyapunov function can be interpreted as the kinetic energy of the system, which accounts for the error rate; the error itself is still minimized because the objective function is the square of the error. Therefore, the Lyapunov function of this study is represented as
V(k) = ½ Δe²(k) (10)
And,
(11) 
Error e(k) is the difference between the real output of the system to be identified and the output of the IT2ANFIS model. According to (10), the change of the Lyapunov function, ΔV(k), can be written as:
ΔV(k) = V(k+1) − V(k) = ½ [Δe²(k+1) − Δe²(k)] (12)
where Δe(k) is the error difference, which can be expressed based on the first-order Taylor series expansion of the error with respect to the IT2ANFIS parameters as follows:
Δe(k) = [∂e(k)/∂x]ᵀ Δx(k) + [∂e(k)/∂Γ]ᵀ ΔΓ(k) (13)
The error at the time k+1 is defined as
e(k+1) = e(k) + Δe(k) (14)
ΔM(k) is the difference of Δe(k) (the rate of the error difference), which can be expressed in the same way as (13), as follows:
ΔM(k) = [∂Δe(k)/∂x]ᵀ Δx(k) + [∂Δe(k)/∂Γ]ᵀ ΔΓ(k) (15)
The error difference at the time k+1 is defined as
Δe(k+1) = Δe(k) + ΔM(k) (16)
where x and Γ are the antecedent and consequent parameters in vector form, respectively, denoted by
(17) 
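Combining the definitions stated in this subsection (the Lyapunov function is the square of the error difference, and ΔM(k) is the rate of that difference), the change of the Lyapunov function expands as below; the ½ factor is a conventional normalization assumed here:

```latex
V(k) = \tfrac{1}{2}\,\Delta e^{2}(k), \qquad
\Delta e(k+1) = \Delta e(k) + \Delta M(k)
\;\Longrightarrow\;
\Delta V(k) = V(k+1) - V(k)
            = \Delta M(k)\,\Delta e(k) + \tfrac{1}{2}\,\Delta M^{2}(k)
```

This is the quantity that the stability conditions of the following subsection must drive negative.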
2.3.2. Stabilizing conditions of the IT2ANFIS identification trained with PSO+KF
Theorem. Given that the antecedent and the consequent parameters of the IT2ANFIS are trained with PSO and KF respectively, Lyapunov stability theorem of this study requires the following conditions:
(18) 
Proof. To attain the above condition according to the proposed theorem, Δe(k) and ΔM(k) should be calculated separately for the PSO and the KF algorithms. Then, by substituting them into (12), ΔV(k) can be defined. Therefore, sections 2.3.2.1 and 2.3.2.2 give a detailed explanation of acquiring Δe(k) and ΔM(k).
2.3.2.1 Particle Swarm Optimization (PSO) algorithm
PSO is an intelligent algorithm introduced by Eberhart and Kennedy in 1995 [37]. It is an iterative technique inspired by the social behavior of flocks of birds or schools of fish. In this algorithm, each particle is a possible solution to the optimization problem, and each particle carries a velocity and a position in a multidimensional search space at each stage of the optimization process. Each element of the position vector is a variable of the problem (here, each component of the antecedent parameters). The participation of the particles in the optimization problem improves the state of the particle population, and the algorithm reaches an optimal solution. The optimization procedure of this algorithm is summarized as follows:
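The classic velocity-position form of the procedure can be sketched in Python. The function names, the fixed values of `w`, `c1`, `c2`, and the bounds are illustrative assumptions, not the paper's tuned setup (only the seven-particle default mirrors Section 3):

```python
import numpy as np

rng = np.random.default_rng(0)

def pso_step(pos, vel, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """One classic PSO update (velocity, then position) for a whole swarm.
    pos, vel, pbest: (particles, dims); gbest: (dims,).
    w, c1, c2 are illustrative values, not the paper's adaptive APAs."""
    r1 = rng.random(pos.shape)
    r2 = rng.random(pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    return pos + vel, vel

def pso_minimize(f, n_particles=7, dims=2, iters=200, span=5.0):
    """Minimize f with the step above; 7 particles mirrors Section 3."""
    pos = rng.uniform(-span, span, (n_particles, dims))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.apply_along_axis(f, 1, pos)
    gbest = pbest[np.argmin(pbest_val)].copy()
    for _ in range(iters):
        pos, vel = pso_step(pos, vel, pbest, gbest)
        val = np.apply_along_axis(f, 1, pos)
        improved = val < pbest_val          # update personal bests
        pbest[improved] = pos[improved]
        pbest_val[improved] = val[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()  # update global best
    return gbest, float(pbest_val.min())
```

In the identification problem, each position vector would hold one candidate set of antecedent parameters, and `f` would be the identification error.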
The objective function is defined as follows:
(19) 
v(k+1) = ω v(k) + φ₁ [p_best(k) − x(k)] + φ₂ [g_best(k) − x(k)] (20)
Here, φ₁ = c₁r₁ and φ₂ = c₂r₂ are random parameters, where r₁ and r₂ are random numbers in the interval [0, 1], c₁ and c₂ are the acceleration constants, and ω is the inertia weight such that 0 < ω < 1.
x(k+1) = x(k) + v(k+1) (21)
Proving stability for the global best particle can guarantee the convergence of the other particles. Under this condition, the best experience of the global best particle evidently equals the best particle among all particles. Therefore, for the stability analysis of the global best particle according to the Lyapunov stability of the proposed theorem, equations (20) and (21) can be written for the global best particle as:
(22) 
The parameter k is called the maximum gain.
Based on (22), the following relation can be deduced:
(23) 
By defining d(k) as follows:
(24) 
Equation (23) is written as
(25) 
Thus, the first part of (13) can be developed as:
(26) 
Similarly, the expansion of the first part of (15) is:
(27) 
Expanding (27) further, we have:
(28) 
Here, H is the pseudo-diagonal Hessian matrix that contains the second derivatives of the error w.r.t. the antecedent parameters; a corresponding derivative operator is also defined. The first part of (28) is developed as:
(29) 
For expanding the second part of (28), the following relation should be determined:
(30) 
The acceleration term in the particle dynamics is defined as:
(31) 
This term is the particle acceleration vector. Noting that this study has a discrete-time nature and that the dimension of the particles is n, equation (31) can be rewritten as follows:
(32) 
Indices 1 to n are the numbers attributed to each variable of the particle's position and velocity, and I is the identity matrix. Therefore,
(33) 
Consequently,
(34) 
Finally, the first part of (15) is derived as:
(35) 
By defining the following quantity:
(36) 
Equation (35) is written as below:
(37) 
2.3.2.2 Kalman Filter (KF) algorithm
The equations related to the KF are accessible in the literature [6, 36]. Suppose the output of the IT2ANFIS model is given and Γ is the vector of consequent parameters of this model, as defined in (17); then the IT2ANFIS model is represented in state-space form as:
(38) 
where ω(k) and ν(k) are the process and the measurement white noises, respectively, which are artificially added to the state-space equation. The measurement matrix relates the output of the IT2ANFIS to the parameters to be estimated and is built up from the inputs, the governing Fuzzy rules, and the antecedent parameters. The task of the KF is to estimate Γ in the model. Thus, the update rules of the KF are written as below:
(39) 
where Q(k) and R(k) are the covariance matrices of ω(k) and ν(k), respectively. Here, the prediction error, the covariance matrix P(k) of the state variables Γ(k), and the Kalman gain L(k) appear in the update rules. With some straightforward mathematical operations on (38) and (39), the second part of (13) can be written as follows:
(40) 
By similar mathematical operations, the second part of (15) is expanded as:
(41) 
Using (36), and defining ξ as
(42) 
the change of the Lyapunov function, ΔV(k), can be written as below:
(43) 
For stability of the identification process according to the proposed theorem, equation (43) must be negative definite. For that reason, we have the following conditions:
(44) 
Considering the sign of the second part of (44) and rearranging its first part, we obtain the following inequality:
(45) 
Although there are several ways to satisfy the negative definiteness of (43), one of the most straightforward solutions for (45) is to consider the following inequalities:
(46) 
The stabilizing boundaries of the proposed theorem based on PSO+KF are:
(47) 
(48) 
By a similar procedure, the remaining stabilizing boundaries are:
(49) 
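The use of these boundaries inside the hybrid loop can be sketched as below. This is a generic sketch with hypothetical names: `kf_param_update` is a standard Kalman-filter parameter estimator with illustrative covariances, and `clip_apa` merely stands in for projecting an APA onto the adaptive intervals of (47)-(49), whose closed forms come from the theorem and are not reproduced here:

```python
import numpy as np

def kf_param_update(theta, P, phi, y, R=1e-2, Q=1e-6):
    """Standard Kalman-filter update for estimating linear (consequent)
    parameters theta from regressor phi and measured output y.
    R, Q are illustrative measurement/process covariances."""
    P = P + Q * np.eye(len(theta))   # prediction step (random-walk parameter model)
    err = y - phi @ theta            # prediction error
    S = phi @ P @ phi + R            # innovation covariance (scalar output)
    L = P @ phi / S                  # Kalman gain
    theta = theta + L * err          # state (parameter) update
    P = P - np.outer(L, phi) @ P     # covariance update
    return theta, P, err

def clip_apa(value, lower, upper):
    """Project an APA (e.g. inertia factor, maximum gain, or xi) into its
    stabilizing interval; the bounds stand in for those of (47)-(49)."""
    return min(max(value, lower), upper)
```

At each data entry, the PSO APAs and ξ would be clipped with `clip_apa` before the PSO and KF updates are applied.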
The desirable regions attained from (47) and (49) in the ω-k plane cover the first and the third quadrants of the coordinate system. Since these boundaries are not predefined and are obtained during simulation, for presentable boundaries and better performance of the proposed theorem we derive the intersection of the boundaries in the theorem with the boundaries provided in [37]. According to [37], PSO is represented in the following state-space form:
(50) 
Where the state variables and the input of the system were determined by:
(51) 
Therefore, for the asymptotic stability of the origin, the following condition was obtained:
(52) 
Finally, the reliable regions obtained from the intersection of the proposed theorem and [38] are shown in Fig. 3. This region is broader than the one defined in [22]. One reason is the existence of the matrix H, which adds more flexibility in selecting ω and the maximum gain. Moreover, minimizing the error rate besides the error increases the capability of the algorithms in the optimization problem.
2.4. Computational burden
In the IT2ANFIS model of this study, n inputs, m Fuzzy rules per input, and h training data are considered. In the proposed theorem, if p is the number of particles, the computational burden is 4mnhp + 3mhp + 9mnp + 3hp + const. Since h has the highest dimension among the others, when all parameters have the same stopping condition, the computational burden of the proposed theorem will be of order O(h).
The adaptive stabilizing boundaries of this study are utilized in the identification of the Mackey-Glass chaotic time series and a stochastic non-linear system. The parameters of the IT2ANFIS are tuned adaptively so as to predict the future values of these systems. The results of this study are compared with those of the LM algorithm in [23, 39], EKF and PSO+KF in [40], GD+FFRLS in [28], and GD+KF in [29]. The root-mean-square errors (RMSEs) reported for both the training and the checking data are the expected values over 10 separate runs. MATLAB R2015b is the simulation software of the benchmark, and the hardware is a computer with a 2.4 GHz processor and 4 GB of installed memory.
Fig. 3. Reliable regions of inertia factor and maximum gain obtained from the intersection of the proposed theorem and [38]
3.1. Implementation of the proposed theorem in predicting the future values of the Mackey-Glass chaotic time series
Example 1. The equation governing the dynamics of the Mackey-Glass chaotic time series is given below [41]:
dx(t)/dt = 0.2 x(t−τ) / (1 + x¹⁰(t−τ)) − 0.1 x(t) (53)
where x(0) = 1.2 and τ = 17. The trained IT2ANFIS model should be able to predict the values of x(t+1), defined as the output of the system, using the input values x(t), x(t−1), x(t−2), and x(t−3). For the training phase, 750 sets of data are considered, and 335 sets of data are considered for the prediction phase (checking data). In this system, the numbers of tuning parameters in the antecedent and the consequent are 48 and 20, respectively. For comparison, noise with a signal-to-noise ratio (SNR) of 10 dB is added to the data during the identification process. The following relation describes the SNR:
SNR = 10 log₁₀(P_signal / P_noise) dB (54)
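For reproducibility, the benchmark data can be generated as sketched below. The Euler step size, integration details, and helper names are assumptions rather than the paper's exact setup; `add_noise_snr` implements the SNR relation (54):

```python
import numpy as np

def mackey_glass(n=1200, tau=17, x0=1.2, dt=0.1, beta=0.2, gamma=0.1, p=10):
    """Euler integration of the Mackey-Glass delay equation (53), sampled at
    unit time; the step size and sampling are illustrative assumptions."""
    sps = int(round(1.0 / dt))        # integration steps per unit-time sample
    delay = int(round(tau / dt))      # delay expressed in integration steps
    total = n * sps
    x = np.empty(total + delay + 1)
    x[:delay + 1] = x0                # constant history x(t) = x0 for t <= 0
    for k in range(delay, delay + total):
        x_tau = x[k - delay]
        x[k + 1] = x[k] + dt * (beta * x_tau / (1 + x_tau ** p) - gamma * x[k])
    return x[delay + sps::sps]        # one sample per unit of time

def add_noise_snr(signal, snr_db, rng=np.random.default_rng(0)):
    """Add white noise at a given SNR in dB, following (54)."""
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / 10 ** (snr_db / 10)
    return signal + rng.normal(scale=np.sqrt(p_noise), size=signal.shape)
```

With τ = 17 the series is chaotic, and the first 750 and next 335 samples could then serve as training and checking sets, respectively.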
In the simulation of the benchmarks based on the proposed theorem, seven particles are considered for tuning the antecedent parameters by the PSO. Fig. 4 shows the training RMSE of the IT2ANFIS model in example 1 when the inertia factor and the maximum gain in PSO and ξ in KF are defined within the boundaries of the proposed theorem.
Fig. 4. Training RMSE of the IT2ANFIS, stabilizing values of the inertia factor and the maximum gain in PSO, and ξ in KF, when the APAs are defined within the boundaries of the proposed theorem (Ex. 1)
The orange-colored area at the uppermost part of Fig. 4 is the region selected for the APAs of the PSO (inertia factor and maximum gain). The middle part of Fig. 4 shows the variations of ξ in the KF algorithm during the tuning of the consequent parameters; as shown, these variations are bounded within the boundaries defined in the theorem. The lower part of Fig. 4 shows the convergence of the training RMSE based on the boundaries shown. Fig. 5 shows the predicted output of the IT2ANFIS model (blue dashed line) versus the real checking data (solid red line), which confirms the proper performance of this theorem in terms of prediction RMSE. The convergence of the antecedent parameters in example 1 is shown in Fig. 6; similarly, Fig. 7 shows the convergence of the consequent parameters. The settling of these parameters at constant values after some iterations indicates the stability of the training process. As shown in Fig. 8, when the APAs are chosen outside the boundaries defined in the proposed theorem, instability is inevitable. To compare the performance of this study with other related research, white noise with an SNR of 10 dB is added to the data sets; the result is shown in Fig. 9, where the solid blue line is the predicted noisy output of the IT2ANFIS during the checking data and the red line represents the real checking data without noise. Table (1) compares the RMSEs resulting from example 1 based on the proposed study with those of EKF and PSO+KF [40], LM [23, 39], GD+FFRLS [28], and GD+KF [29].
As stated, the results of this study have a better RMSE than the others, but a higher simulation time (35 seconds) than [28], [29], and [40], because in PSO each particle is a candidate solution, which increases the computational burden. However, the simulation time is lower than in [39] and the PSO+KF algorithm of [40]. The results of table (1) are given for two situations. First, the results are obtained when the data sets are not subjected to noise, although whenever the KF is utilized as an optimizer, a small amount of artificial white noise (SNR higher than 50 dB) must be added to the data sets; in this case, the RMSEs of the training and checking data are the smallest. Then, the results are given when the data sets are subjected to white noise of about 10 dB SNR, for which two states are considered:
First, the RMSEs are given based on the difference between the predicted noisy data and the real noisy data; then they are given based on the difference between the predicted noisy data and the real non-noisy data. The mean value, the best value, and the standard deviation of the RMSEs are computed over 10 separate runs. Given that the checking data are more important, it makes sense to compare these data: the mean checking RMSE of the proposed theorem is about 0.0642, which is the smallest value among the methods.
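The reported statistics can be reproduced from per-run RMSEs with a short helper; the names are illustrative:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def summarize_runs(rmses):
    """Mean, best (smallest), and sample standard deviation of the RMSEs
    over independent runs, as reported in Tables 1 and 2."""
    r = np.asarray(rmses, dtype=float)
    return {"mean": float(r.mean()), "best": float(r.min()),
            "std": float(r.std(ddof=1))}
```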
Fig. 5. Predicted data during checking data using the identified model versus the real checking data (Ex. 1)
Fig. 6. Convergence of the antecedent parameters of IT2ANFIS (Ex. 1)
Fig. 7. Convergence of the consequent parameters of IT2ANFIS (Ex. 1)
Fig. 8. Training instability when the APAs are chosen outside of the boundaries defined in the proposed theorem (Ex. 1)
Fig. 9. The predicted noisy output of the IT2ANFIS during the checking data (solid blue line) and the real checking data without noise (red line) (Ex. 1)
3.2. Application to the prediction of future values in a stochastic non-linear system
Example 2. The system in this example is a stochastic nonlinear system represented by
(55) 
The variables x, y, and z in the training process are random numbers in the range [1,6].
Similarly, for the checking data, these variables are randomly selected in the range [1.5, 5.5]. In total, 216 and 125 sets of data are considered for the training and checking data, respectively. Fig. 10 shows the training and the checking RMSE of the IT2ANFIS when the APAs in the PSO and KF are defined within the boundaries of the proposed theorem. Fig. 11 shows the predicted output of the IT2ANFIS (blue dashed line) versus the real checking data (solid red line); the figure demonstrates the proper performance of this theorem in terms of prediction RMSE for this example. Fig. 12 shows the convergence of some of the antecedent and consequent parameters, which suggests the correctness of the proposed theorem. As depicted in Fig. 13, training instability occurs when the APAs are chosen outside the defined boundaries. Table (2) compares the RMSEs resulting from this study with those of GD+FFRLS [28] and GD+KF [29]. Although [28] and [29] show lower simulation times, they have higher training and checking RMSEs compared with this study.
Fig. 10. RMSE of training and checking data (Ex. 2)
Fig. 11. Predicted data during checking data using the identified model versus the real checking data (Ex. 2)
Fig. 12. Convergence of some of the antecedent and consequent parameters of IT2ANFIS (Ex. 2)
Fig. 13. Training instability when the APAs are chosen outside of the boundaries defined in the proposed theorem (Ex. 2)
Table (1): Comparison of the RMSEs resulting from the proposed study with other references (Ex. 1)
Example 1 — Inputs: x[t], x[t−1], x[t−2], x[t−3]; Output: x[t+1]; No. of training/checking data: 750/335

| Method | Data | Statistic | Without noise | 10 dB SNR (vs. real noisy data) | 10 dB SNR (vs. real non-noisy data) |
|---|---|---|---|---|---|
| LM [23, 39] | Training | Mean | 0.008 | - | 0.0914 |
| | Checking | Mean | 0.04 | - | 0.0913 |
| | Simulation time | | 70 sec | | |
| EKF [40] | Training | Mean | - | - | 0.0770 |
| | Checking | Mean | - | - | 0.0943 |
| | Simulation time | | 32.2 sec | | |
| PSO+KF [40] | Training | Mean | - | - | 0.0765 |
| | Checking | Mean | - | - | 0.0933 |
| | Simulation time | | 59.53 sec | | |
| GD+FFRLS [28] | Training | Mean | 9.61e-4 | 0.0256 | 0.0594 |
| | | Best | 8.6e-4 | 0.0225 | 0.0588 |
| | | Standard dev. | 6e-5 | 0.0027 | 0.002 |
| | Checking | Mean | 9.77e-4 | 0.0261 | 0.0695 |
| | | Best | 8.7e-4 | 0.023 | 0.0664 |
| | | Standard dev. | 5.1e-4 | 0.0026 | 0.0025 |
| | Simulation time | | 21 sec | | |
| GD+KF [29] | Training | Mean | 9.5e-4 | 10.2e-4 | 0.0536 |
| | | Best | 9.1e-4 | 9.6e-4 | 0.0524 |
| | | Standard dev. | 2.35e-4 | 4.28e-5 | 0.0015 |
| | Checking | Mean | 9.7e-4 | 11.3e-4 | 0.067 |
| | | Best | 9.4e-4 | 10.8e-4 | 0.0645 |
| | | Standard dev. | 5.53e-5 | 4.15e-5 | 0.0018 |
| | Simulation time | | 18 sec | | |
| Proposed theorem (PSO+KF) | Training | Mean | 8.3e-4 | 9.8e-4 | 0.0510 |
| | | Best | 7.9e-4 | 9.2e-4 | 0.0485 |
| | | Standard dev. | 3.5e-4 | 1.93e-5 | 0.0016 |
| | Checking | Mean | 9.2e-4 | 11e-4 | 0.0642 |
| | | Best | 8.79e-4 | 10.6e-4 | 0.0625 |
| | | Standard dev. | 3.2e-5 | 4.9e-5 | 0.0015 |
| | Simulation time | | 35 sec | | |
Abbreviations: LM: Levenberg–Marquardt; GD: gradient descent; FFRLS: forgetting-factor recursive least squares; KF: Kalman filter; EKF: extended Kalman filter; PSO: particle swarm optimization.
Table (2): Comparison of the RMSEs resulting from the proposed study with [28] and [29] (Ex. 2)
Example 2 — Inputs: x, y, z; Output: u; No. of training/checking data: 216/125

| Method | Data | Mean | Best | Standard dev. | Simulation time |
|---|---|---|---|---|---|
| GD+FFRLS [28] | Training | 0.1586 | 0.0758 | 0.0312 | 4.5 sec |
| | Checking | 0.1678 | 0.1053 | 0.0263 | |
| GD+KF [29] | Training | 0.1025 | 0.0835 | 0.0121 | 1 sec |
| | Checking | 0.118 | 0.0956 | 0.0124 | |
| Proposed theorem (PSO+KF) | Training | 0.1012 | 0.0841 | 0.0126 | 8 sec |
| | Checking | 0.11 | 0.0924 | 0.0128 | |
3.3. Discussions
In training the antecedent part of the IT2ANFIS by PSO, the matrix H, known as the Hessian matrix, provides more flexibility in the inertia factor and the maximum gain. This flexibility allows the choice of a wide range of APAs. The existence of this matrix, which contains second-order derivatives, can increase the computational load in the identification process to some extent, but it is an inseparable part of the proposed theorem. The matrices R and Q are weight factors between the state (prediction) and the measurement, and their ratio is reflected in the Kalman gain. Large values of Q mean higher uncertainty in the states, in other words, less confidence in the results of the states (prediction), so the filter must rely more on the measurements. Similarly, large values of R mean more uncertainty in the measurements, so the filter should be updated less by them; with a large R, the weight of the measurements decreases, and the RMSE increases during the training and checking data. The performance of the algorithm improves when the APAs of the PSO are selected both randomly and purposefully; in this study, in addition to providing adaptive stabilizing boundaries, their randomness is also preserved. One point that should be mentioned regarding the stability of the model parameters is that if the mean of one of the membership functions in the antecedent part goes toward infinity, the corresponding firing strength goes to zero; similarly, if the standard deviation of one of the membership functions goes toward infinity, the related firing strength becomes one, and instability does not occur.
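The described Q/R trade-off can be seen in a scalar sketch (all values purely illustrative): as the measurement covariance R grows, the Kalman gain shrinks and the filter trusts measurements less.

```python
def scalar_kalman_gain(P, R, Q):
    """Scalar Kalman gain for a unit measurement model.
    P: prior state covariance, Q: process covariance, R: measurement covariance."""
    P_pred = P + Q                 # predicted state covariance
    return P_pred / (P_pred + R)   # gain shrinks as R grows

# Gain for increasing measurement covariance R (illustrative values)
gains = [scalar_kalman_gain(P=1.0, R=r, Q=0.1) for r in (0.1, 1.0, 10.0)]
```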
Utilizing the type 2 Fuzzy system in the IT2ANFIS model has improved the ability of the system in the presence of noise and uncertainties. The main contribution of this study, compared with other related studies, was finding a novel Lyapunov function to introduce newer and broader stabilizing boundaries that enhance the performance of the algorithms in an identification process. Therefore, the introduction of the novel Lyapunov function has not only led to new adaptive stabilizing boundaries for the APAs but also increased the performance of the identification process (training and checking data). The adaptive stabilizing boundaries attained from this study result in the stability of the training process, while choosing the APAs outside these boundaries does not guarantee stability. Adaptive selection of the APAs in the proposed theorem has increased the chance of escaping from local-minimum traps. Implementation of the proposed theorem in predicting the future values of the Mackey-Glass chaotic time series and a stochastic non-linear system shows that the performance of the theorem is comparable with those presented in [28] and [29], and better than [23], [39], and [40]; in fact, the proposed theorem performs slightly better than [28] and [29]. In terms of simulation time, the proposed theorem is slower than [28], [29], and [40], but faster than the others, owing to the high computational burden of PSO compared with the other algorithms of this study. As shown in tables 1 and 2, replacing FFRLS with KF can increase the noise-confronting capability of the algorithm, and replacing GD with PSO reinforces this effect. Another issue is that derivative-based algorithms may not be applicable to models with membership functions that do not have explicit derivatives at some points.
This problem is solved in this study by substituting PSO for the derivative-based algorithms.
Acknowledgment
The authors gratefully acknowledge the ISEE journal editorial board for their work on the original version of this document.
Submission date: 06/07/2021
Acceptance date: 15/12/2021
Corresponding author: Mahdi Aliyari Shoorehdeli, Department of Mechatronics Engineering, K. N. Toosi University of Technology, Tehran, Iran