Title: Optimal distributed learning for disturbance rejection in networked non-linear games under unknown dynamics
IET Control Theory & Applications, Volume 13, Issue 17, pp. 2838-2848. Special Issue: Distributed Optimisation and Learning for Networked Systems.

Farzaneh Tatari (corresponding author, [email protected]), Electrical Engineering Department, Semnan University, Semnan, Iran; Kyriakos G. Vamvoudakis, Daniel Guggenheim School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA; Majid Mazouchi, Electrical Engineering Department, Ferdowsi University of Mashhad, Mashhad, Iran.

First published: 01 November 2019. https://doi.org/10.1049/iet-cta.2018.5832
Abstract

In this study, an online distributed optimal adaptive algorithm is introduced for continuous-time non-linear differential graphical games under unknown systems subject to external disturbances. The proposed algorithm learns online the approximate solution to the coupled Hamilton–Jacobi–Isaacs equations. Each of the players in the game uses an actor-critic network structure and an intelligent identifier to find the unknown parameters of the systems. The authors use recorded past observations concurrently with current data to speed up convergence by exploring the state space. The closed-loop stability and convergence of the policies to Nash equilibrium are ensured by using Lyapunov stability theory. Finally, a simulation example shows the efficiency of the proposed algorithm.

1 Introduction

Distributed control of multi-agent systems (MASs) on communication graphs has attracted great attention, motivated by its possible applications in many engineering systems that involve networks. There is considerable literature on distributed control methods to solve the consensus problem [1–6]. This problem is mainly divided into two categories: the leaderless consensus problem and the leader–follower problem. In the second one, which is the problem of interest in this study, the objective is to design a local control protocol for each agent, depending only on local information, that ensures all agents follow the trajectory of an agent called the leader. It has been recognised in the literature [7–12] that game theory provides a proper framework to study multi-agent problems. Based on differential game theory, the differential graphical game concept is introduced in [7] to provide a framework to solve the leader–follower problem in an optimal manner, where the tracking error dynamics, actions, and performance index of each follower agent depend on local neighbour information.
Solving differential graphical games in the presence of unknown external disturbances is an important issue: such disturbances are inevitable in many practical MASs and can be a principal cause of poor performance or even instability. It is known that solving the differential graphical game with external disturbances for non-linear systems relies on finding the Nash equilibrium solutions to coupled Hamilton–Jacobi–Isaacs (HJI) equations. However, the coupled HJI equations are non-linear partial differential equations (PDEs) that are difficult or impossible to solve, and may not have global analytical solutions even in simple cases. Therefore, numerical methods are required to solve them approximately. Reinforcement learning (RL) techniques [13] have been employed to solve optimal control problems with and without disturbances and modelling uncertainties [14–16]. RL techniques have also emerged as an efficient tool to approximately solve the coupled HJI equations online [9, 17, 18]. In [9], a policy iteration algorithm is provided to find the solution to coupled HJI equations, but it is limited to linear systems and closed-loop stability of the equilibrium point is not provided. In [17, 18], the authors presented an RL method to design robust adaptive tracking control laws for multi-wheeled mobile robots. However, these works rely on either complete knowledge of the system dynamics [9] or at least partial knowledge of the system dynamics [17, 18]. In practice, most systems are difficult to model exactly. Furthermore, it is well known that non-linearities commonly exist in physical systems, and many physical systems possess higher-order dynamics.
Therefore, finding the solution to the coupled HJI equations of non-linear systems with higher-order unknown dynamics is an important issue from a practical point of view, and is also challenging due to the dependency of the coupled HJI equations on the communication graph. To the best of our knowledge, there have not been any results on differential graphical games of non-linear systems with higher-order dynamics in the presence of disturbance and completely unknown dynamics. This motivates our research. Contributions: The contributions of the present study are three-fold. (i) We formulate the problem of non-linear leader–follower consensus in the presence of disturbances as a multi-agent zero-sum differential graphical game under completely unknown non-linear dynamics. (ii) An optimal distributed learning algorithm is proposed to approximately solve the multi-agent zero-sum differential graphical game of general affine non-linear systems in the presence of external disturbances under unknown dynamics; to this end, the completely unknown non-linear dynamics are identified online through learning-based identifiers that also use an experience replay technique. (iii) Rigorous proofs provide guarantees for convergence of the policies to the approximate Nash equilibrium while guaranteeing closed-loop stability. Background on graphs: The communication network is described by a graph G = (V, E), where V = {v_1, ..., v_N} is the set of vertices representing the N agents and E ⊆ V × V is the set of edges of the graph. (v_i, v_j) ∈ E shows that there is an edge from node i to node j. An adjacency matrix A = [a_ij] is often used to represent the graph topology, where a_ij > 0 if there is an edge from node j to node i, and a_ij = 0 otherwise. The set of neighbours of node i is N_i = {j : a_ij > 0}, and node i belongs to the neighbour sets of the nodes it sends information to. d_i = Σ_{j ∈ N_i} a_ij is the weighted in-degree of node i. The leader is represented by node 0, and information is sent from the leader to the agents for which the leader is in their neighbourhood. Structure: The paper is organised as follows.
The problem formulation is explained in Section 2 and the coupled HJI equations are derived in Section 3. Section 4 explains the MAS approximation-based identifiers. The proposed distributed optimal adaptive learning algorithm in the presence of disturbance and unknown dynamics is introduced in Section 5. The simulation results are discussed in Section 6 and the conclusions are drawn in Section 7.

2 Problem formulation

Consider the dynamics of each agent i, as a physical component in a directed strongly connected communication graph (the cyber component), to be

(1) dx_i/dt = f(x_i) + g(x_i) u_i + k(x_i) ω_i,

where x_i is the measurable state vector, u_i is the control input, ω_i is the external disturbance input, and f(·), g(·), and k(·) are, respectively, the drift, the input, and the disturbance dynamics, which will be considered unknown in our developments. It is assumed that the closed-loop system is locally Lipschitz (a classical assumption to have a unique solution for any initial condition x_i(0)). Consider the uncontrolled leader dynamics that generates the target state x_0 as

(2) dx_0/dt = f(x_0).

The local neighbourhood tracking error for every agent can be defined as

(3) δ_i = Σ_{j ∈ N_i} a_ij (x_i − x_j) + g_i (x_i − x_0),

where the pinning gain g_i is non-zero for at least one agent, which communicates directly with the leader, and g_i = 0 otherwise. The time derivative of (3) gives the local tracking error dynamics (4). In order to achieve synchronisation, a distributed control shall be designed which keeps the tracking error (3) bounded under the unknown dynamics of the MAS.

2.1 Bounded L2-gain synchronisation problem

Consider system (4) with measured outputs (with a left-invertible output map that can be directly measured), disturbances ω_i ∈ L2[0, ∞), and performance outputs that penalise the local tracking error and the control effort. It is desired to design the control to solve the synchronisation problem when ω_i = 0 and also to satisfy a bounded L2-gain condition (disturbance attenuation level) for a given γ_i > 0 when ω_i ≠ 0 for all agents, for a bounded function of the initial conditions [19], where the weighting matrices are symmetric and constant.
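As a concrete illustration of the local neighbourhood tracking error (3), the following sketch computes δ_i for every agent from an adjacency matrix and pinning gains. The state stacking and the sign convention δ_i = Σ_j a_ij (x_i − x_j) + g_i (x_i − x_0) are assumptions based on the standard graphical-game formulation, not taken verbatim from the paper.

```python
import numpy as np

def local_tracking_errors(x, x0, A, g):
    """Local neighbourhood tracking errors delta_i (a sketch of Eq. (3)).

    x  : (N, n) array of follower states
    x0 : (n,)   leader state
    A  : (N, N) adjacency matrix, A[i, j] = a_ij
    g  : (N,)   pinning gains, g[i] > 0 iff agent i hears the leader
    """
    N = x.shape[0]
    delta = np.zeros_like(x)
    for i in range(N):
        # relative-state disagreement with neighbours, plus the pinning term
        delta[i] = sum(A[i, j] * (x[i] - x[j]) for j in range(N)) \
                   + g[i] * (x[i] - x0)
    return delta
```

Note that each δ_i uses only the states of agent i, its neighbours, and (if pinned) the leader, which is what makes the resulting control protocols distributed.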
Let γ_i* be the minimum value of γ_i for which the above disturbance attenuation condition is satisfied. The local performance index for every agent i is defined in (5). It is shown in [9] that the solution of the bounded L2-gain synchronisation problem is equivalent to the solution of the following multi-player zero-sum differential graphical game, in which the control and disturbance players try to minimise and maximise the value, respectively. The game has a unique saddle point solution for every agent if the upper and lower values of the game coincide [20]; the associated value is the value of the game. This is equivalent to a Nash equilibrium condition. Therefore, given (4), the value function for every node i is given by (6).

Remark 1. The inclusion of a game-theoretic control framework in the learning setting guarantees a high degree of robustness, which is required to maintain a sufficient stability margin of the closed-loop system with respect to parametric uncertainties and output disturbances.

3 Coupled HJI equations

The value function (6) can be equivalently described by the Lyapunov equation (7), written in terms of the Hamiltonian function. Employing the stationarity conditions in the Hamiltonians yields the optimal control policy (8) and the worst-case disturbance policy (9). Substituting (8) and (9) into (7) yields the coupled HJI equations (10), with boundary condition V_i(0) = 0. For a given solution to (10), the equations can be rewritten in terms of the associated Hamiltonian. The coupled HJI equations (10) are highly non-linear PDEs and require complete knowledge of the dynamics, which makes them difficult to solve. Therefore, we will use approximation-based techniques.

Remark 2. In system (4) with the corresponding value function (6), the optimal control policy and the worst-case disturbance minimise and maximise the cost function (6), respectively. Therefore, the optimal control policy and the worst-case disturbance can be obtained by employing the stationarity conditions (8) and (9), respectively.
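In the standard affine zero-sum graphical-game setting, the stationarity conditions (8) and (9) take the familiar forms below. The symbols R_ii, γ_i, and the graph-dependent scaling (d_i + g_i) are illustrative reconstructions consistent with [7, 9], not the paper's exact notation, since the extracted text omits the original equations:

```latex
u_i^{*} = -\tfrac{1}{2}\,(d_i + g_i)\,R_{ii}^{-1}\, g^{\top}(x_i)\,\frac{\partial V_i}{\partial \delta_i},
\qquad
\omega_i^{*} = \frac{1}{2\gamma_i^{2}}\,(d_i + g_i)\, k^{\top}(x_i)\,\frac{\partial V_i}{\partial \delta_i}
```

Both policies depend on the state only through the gradient of V_i with respect to the local tracking error δ_i, which is why they are distributed.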
As shown in (8) and (9), the optimal control policy and the worst-case disturbance are both functions of the local tracking error δ_i, through the gradient term ∂V_i/∂δ_i. Hence, the optimal control and the worst-case disturbance are both distributed policies.

4 Approximation-based system identification

Before we proceed, the following definition and assumptions are needed.

Definition 1 (persistence of excitation (PE)). The bounded vector signal s(t) is PE over the interval [t, t + T] if there exist T > 0 and β > 0 such that ∫_t^{t+T} s(τ) s(τ)ᵀ dτ ≥ β I for all t, where I is the identity matrix of conformable dimension.

Assumption 1. Given admissible feedback control policies, the non-linear Lyapunov equations (7) have locally smooth solutions.

Remark 3. Assumption 1 is widely used, since optimal control problems do not necessarily have smooth or even continuous value functions [21]. In this study, all derivations are performed under the assumption of smooth solutions to (7) and (10) (see for instance [7, 8, 11]). This allows us to use the Weierstrass high-order approximation theorem [21].

Assumption 2. On a given compact set, the reconstruction errors, the approximator basis functions, and the gradients of both are bounded.

Remark 4. Assumption 2 is standard in the literature [7, 8], according to the Weierstrass high-order approximation theorem. Note further that the approximators used are so-called functional link neural networks (see [22] for more details), for which the activation functions can be squashing functions such as the standard sigmoid, Gaussian, and hyperbolic tangent functions. Furthermore, the bounds mentioned above are only used for the stability analysis; they are not used in the controller design. Motivated by Modares et al. [23], in order to identify the unknown dynamics of every agent on a compact set, we use identifiers of the form

(11) f(x_i) = W_fᵀ σ_f(x_i) + ε_f(x_i), g(x_i) = W_gᵀ σ_g(x_i) + ε_g(x_i), k(x_i) = W_kᵀ σ_k(x_i) + ε_k(x_i),

where W_f, W_g, and W_k are unknown weights, σ_f, σ_g, and σ_k are basis functions, and ε_f, ε_g, and ε_k are the reconstruction errors.
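The PE condition of Definition 1 can be checked numerically on sampled data. The sketch below forms the windowed Gramian ∫ s sᵀ dτ by a Riemann sum over each window and tests its smallest eigenvalue against β; the sampling scheme and data format are illustrative assumptions, and a sampled check is only evidence about (not a proof of) PE of the underlying continuous-time signal.

```python
import numpy as np

def is_pe(samples, dt, T_win, beta):
    """Check the PE condition of Definition 1 on sampled data:
    int_t^{t+T} s(tau) s(tau)^T dtau >= beta * I for every window start.
    samples: (K, m) array of the signal s sampled with spacing dt."""
    K, m = samples.shape
    win = int(round(T_win / dt))
    for start in range(K - win + 1):
        # Riemann-sum approximation of the windowed Gramian
        G = sum(np.outer(samples[k], samples[k]) * dt
                for k in range(start, start + win))
        if np.linalg.eigvalsh(G).min() < beta:
            return False
    return True
```

A persistently exciting regressor is exactly what the identifier update below needs when no recorded data are replayed; the experience replay technique of Section 4 relaxes this to a rank condition on stored data.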
By using (11), system (1) can be re-written in the linear-in-parameters regressor form (12), with a regressor vector that stacks the basis functions and a weight matrix that stacks W_f, W_g, and W_k. Using Assumption 2, the ideal weights and the reconstruction error are bounded. The following lemma, adopted from [23], provides a filtered regressor for (12).

Lemma 1. The solution to (12) can be expressed in the filtered form (13), (14), where (14) generates the filtered version of the regressor and the initial state of (12) enters through an exponentially decaying term. Each side of (13) is divided by a normalising signal to obtain the normalised form (15).

Based on Lemma 1 and (15), we consider the identifier weights estimator to be of the form (16), where the estimator output is generated by the estimated weight matrix of agent i at time t. We define the state estimation error of agent i as in (17), in terms of the parameter estimation error. We use the idea of experience replay [23], which employs recorded observations along with current data to obtain the tuning law of the identifier weights. The recorded past data are collected and stored in the history stack of each agent at sampled times. The history stack of agent i must contain as many linearly independent elements as the dimension of the basis of the uncertainty in (13) (a rank condition) in order to satisfy the PE condition. The tuning algorithm for the agent identifier weights is given by (18), where the learning rate is a positive definite matrix that affects the speed of learning.

Theorem 1. Consider the system given by (12). Let the online identifier tuning law be given by the update law (18) with the filtered regressor given by (14). Then, given that the recorded data matrix satisfies the full-rank condition, for a bounded model approximation error, the identifier weights estimation errors are uniformly ultimately bounded (UUB), i.e. there exist a bound and a time after which the estimation error remains within that bound.

Proof. The proof is an extension of the proof in [23].
It can be shown that, when the rank condition is satisfied, the identifier weights estimation error is bounded outside the residual set characterised in (19), where the bound depends on the smallest singular value of the recorded data matrix. □

Remark 5. In order to minimise the bound (19), the design parameter a and the number of recorded data points must be chosen appropriately: the bound decreases for a large design parameter a, and the number of recorded data points should be maximised to reduce the error bound.

Now, (4) can be written in the compact form (20) over the graph, where the cardinality of the neighbour sets determines the dimensions. Therefore, the local error dynamics (4) are approximated as in (21), using the estimated values of the identified dynamics. It is worth noting that the resulting approximation error is UUB based on Theorem 1.

Remark 6. In RL, there exist methods that are model-free, for which system identification is not required. However, due to the coupling terms in the coupled HJI equations (10) and their dependence on the graph topology and the unknown dynamics, the model-free solution in [24] cannot be straightforwardly extended to solve the present coupled HJI equations. To overcome the difficulty of solving the coupled HJI equations for MASs under unknown dynamics, this study uses a simple system identifier along with a learning algorithm for every agent to approximately solve the coupled HJI equations and identify the unknown dynamics simultaneously.

5 Learning algorithm

We now use critic and actor approximators to solve the coupled HJI equations (10). The critic approximates the cost of each agent, and two actors approximate the optimal control and the worst-case disturbance.

5.1 Critic approximators

According to the Weierstrass higher-order approximation theorem [21], there exist independent basis sets and constant approximator weights such that the solutions V_i are uniformly approximated on a compact set as in (22), (23), where the activation function vectors contain the basis functions and the residuals are the approximation errors.
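To make the critic parameterisation (22) concrete, the sketch below evaluates V̂ = Ŵᵀφ(δ) and its gradient for a hand-picked quadratic basis on a two-dimensional tracking error. Both the basis φ and the weights are illustrative assumptions, not the paper's choices.

```python
import numpy as np

def critic_value_and_gradient(W_hat, delta):
    """Critic output V_hat = W_hat^T phi(delta) and gradient dV_hat/ddelta,
    using the illustrative quadratic basis phi = [d0^2, d0*d1, d1^2]."""
    d0, d1 = delta
    phi = np.array([d0**2, d0 * d1, d1**2])
    dphi = np.array([[2 * d0, 0.0],      # d(phi)/d(delta), shape (3, 2)
                     [d1,     d0],
                     [0.0, 2 * d1]])
    return W_hat @ phi, dphi.T @ W_hat
```

The gradient term dφᵀ/dδ · Ŵ is exactly what the policy expressions (8) and (9) consume in place of the unknown ∇V_i, which is why the critic basis must be differentiable.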
Remark 7. The approximators (11) and (22) are functional link approximators in a Fourier series form, which can approximate every smooth function and its derivative. Note that the approximation errors converge to zero uniformly as the number of basis functions tends to infinity. Moreover, according to Assumption 2, the basis functions, their gradients, and the residuals are bounded.

Using the critic approximators (22) and fixed feedback policies, the Hamiltonians (7) can be approximated. Note that, according to Assumption 2, the resulting residual errors are bounded on the compact set.

Assumption 3. On a given compact set we assume that: (i) the system dynamics are bounded, (ii) the basis functions and their gradients are bounded by known constants, and (iii) the critic approximator weights are bounded by known constants.

Remark 8. Assumption 3 is a standard assumption in the neuroadaptive control literature (see [8, 22, 23]). Although Assumption 3 restricts the considered class of non-linear systems, many practical systems (e.g. robotic systems [25] and aircraft systems [26]) satisfy such a property.

The critic approximator outputs and the approximate Bellman equations can, respectively, be written as (24) and (25), in terms of the current estimated values of the critic weights. It is desired to pick the critic weights to minimise the squared residual error of (25). Hence, the gradient-based tuning law for the critic weights of each player is selected as in (26), where the learning rate determines the speed of convergence.

Lemma 2. Consider a set of given admissible feedback control policies and disturbances, let (26) be the tuning law of the critic approximator weights along with (18) for tuning the identifier weights, and assume that the normalised regressor is PE.
Then, for bounded reconstruction errors, the critic weights estimation errors converge exponentially to a residual set.

Proof. From the coupled HJI equations one obtains the relation (27). Substituting (27) into (25) and performing some manipulations yields (28). Substituting (28) into (26) yields (29). One can see that (29) is a linear time-varying system driven by a bounded input, and therefore the closed-form solution for the critic weight estimation error is given by (30), where the state transition matrix is found from (31). The state transition matrix has an exponentially stable equilibrium point provided that the normalised regressor is PE [27]. Using Assumption 2, the PE condition, and the boundedness of the residuals, one obtains the exponential bound (32). This completes the proof. □

5.2 Actor approximators for optimal control and worst-case disturbance

Based on (8) and (9), the control and worst-case disturbance policies are approximated as in (33) and (34), where the actor approximators hold the current estimated values of the ideal weights. The critic and actor weight estimation errors are defined in (35). In order to ensure closed-loop system stability and that the policies form a Nash equilibrium, the tuning laws for the two actors are selected as in (36) and (37), with diagonal positive definite tuning matrices. Finally, the proposed scheme is summarised in Algorithm 1 (see Fig. 1). Moreover, the block diagram of the proposed distributed learning algorithm for every agent is depicted in Fig. 2, where the solid lines show the associated signals and the dashed lines show the approximator weight tunings.

Fig. 1: Algorithm 1 — Disturbance rejection algorithm in non-linear networked games with unknown dynamics

Fig. 2: Optimal distributed disturbance rejection algorithm under unknown dynamics for every agent i

5.3 Stability and convergence analysis

The main theorem, which provides closed-loop system stability and convergence of the policies to Nash equilibrium, is now presented.

Theorem 2. Consider the dynamical system (20) with the drift, input, and disturbance dynamics unknown. Suppose that the normalised regressor is PE and Assumptions 1–3 hold. Let the approximator identifier weights be updated by (18); let the value function, control, and worst-case disturbance of each agent be, respectively, given by (24), (33), and (34); and let the tuning laws of agent i's critic, optimal control actor, and worst-case disturbance actor be, respectively, given by (26), (36), and (37). Then, the closed-loop system states, the critic approximator errors, and the two actor approximator errors are UUB, for a sufficiently large number of approximator basis functions.

Proof. See the Appendix. □

Corollary 1. Let the assumptions and statements of Theorem 2 hold. Then, the solution provided by the critic and the two actor approximators provides a solution to the coupled HJI equations (10), and the resulting inputs form a Nash equilibrium solution of the zero-sum game.

Proof. The proof is a direct consequence of Theorem 2. □

6 Simulations

Consider a network of three single-link manipulators with revolute joints actuated by DC motors (as shown in Fig. 3). Every single-link manipulator state contains the motor and link positions and velocities [28, 29]. The pinning gain and edge weights are chosen as one. The graph structure and the manipulators' dynamics are as shown in Fig. 3.

Fig. 3: MAS of three single-link manipulators

It is assumed that the structure of all the agents' dynamics is known and that the MAS parameter vector is unknown; the unknown identifier weights correspond to these parameters. As pointed out in Remark 5, in order to reduce the system identification error bound in (19), the design parameter and the number of recorded data points are chosen accordingly.
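The experience-replay identifier update used throughout Section 4 and in this simulation can be sketched as one Euler step that sums the current estimation error with errors recomputed on the recorded history stack, in the spirit of the tuning law (18) and of Modares et al. [23]. The scalar gain a, the (phi, y) data format, and this exact discretisation are illustrative assumptions.

```python
import numpy as np

def concurrent_learning_step(W_hat, data_now, stack, a, dt):
    """One Euler step of an experience-replay (concurrent-learning) update.

    W_hat    : (p, n) current weight estimate for the model y ~ W^T phi
    data_now : (phi, y) pair from the current measurement
    stack    : list of recorded (phi_k, y_k) pairs (the history stack)
    a        : positive scalar learning rate; dt : integration step
    """
    phi, y = data_now
    dW = a * np.outer(phi, y - W_hat.T @ phi)            # current-data term
    for phi_k, y_k in stack:                             # replayed-data terms
        dW += a * np.outer(phi_k, y_k - W_hat.T @ phi_k)
    return W_hat + dt * dW
```

With a history stack whose regressors span the parameter space (the rank condition of Theorem 1), the replayed terms keep the update informative even when the instantaneous regressor is not PE.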
The sample rate constant is chosen as 0.001 s in order to satisfy the rank condition and limit the identification error bound (19) for every agent. The identifier learning rates are picked for every agent, and the design parameters are selected so that inequality (47) holds. Since the critic approximators need to adapt faster than the actors to guarantee closed-loop stability, the critic learning rates are chosen larger than the actor learning rates. The critic and actor approximators' activation functions are chosen appropriately for every agent. PE is guaranteed by adding a small exponentially decreasing probing noise to the control inputs. Fig. 4 shows that the unknown parameters of the agents' dynamics converge to their true values. The evolution of the critic weights is shown in Fig. 5. Fig. 6 shows the local tracking errors and their convergence to a neighbourhood of zero.

Fig. 4: Evolution of the unknown parameters of the manipulators
Fig. 5: Evolution and convergence of the critic weights
Fig. 6: Evolution of the manipulators' tracking errors

Since the motor and link positions and velocities are limited, the state vector is confined to a compact set, and hence it can be inferred that Assumptions 1–3 are satisfied. As shown in Fig. 4, the unknown parameters uniformly converge to their true values, and with the chosen basis functions it is evident that Assumption 2 is satisfied. Also, the obtained value function is a locally smooth solution, which indicates the satisfaction of Assumption 1. Assumption 3 is also satisfied since on every compact set the relevant signals are all bounded and, as depicted in Fig. 6, the tracking errors remain bounded.

7 Conclusion

An online distributed learning control algorithm based on RL techniques is presented to solve continuous-time unknown multi-player non-linear graphical games in the presence of disturbances.
The distributed learning algorithm is implemented in the form of actor-critic structures to approximate the optimal policies of the players. In order to identify the unknown dynamics, we use identifiers in conjunction with the actor–critic networks. The coupled HJI equations of the agents are approximately solved. The boundedness of the closed-loop signals is proved using Lyapunov stability theory, and it is ensured that the policies form a Nash equilibrium. Future research efforts will focus on extending the model-based technique to a model-free approach.

8 Acknowledgments

Kyriakos G. Vamvoudakis is supported in part by the National Science Foundation (grant no. NSF CAREER CPS-1851588), in part by NATO (grant no. SPS G5176), and in part by ONR Minerva (grant no. N00014-18-1-2160).

10 Appendix

10.1 Proof of Theorem 2

The following fact [30] will be used in the proof.

Fact 1. For any two vectors x and y and any η > 0, it holds that 2xᵀy ≤ (1/η) xᵀx + η yᵀy.

We consider the Lyapunov function (38), composed of the value functions and the squared estimation errors of the identifier, critic, and actor weights. Its time derivative is given by (39). The first term in (39), using (17), (21), and (33)–(35) and some manipulation, can be written as in (40). For the second term in (39), utilising (21), (26), and (35), one obtains (41); using (26), (35), and Fact 1, (41) can be written as (42), where the involved scalars are constant. Using (40), (42), (17), and (35), (39) becomes (43). Setting the corresponding terms in (43) to zero yields the actor tuning laws (36) and (37); replacing (36) and (37) gives (44) and (45). Since the residuals are bounded, (43) can be reformed as (46). Let the design parameters be chosen such that the resulting square matrix is positive definite. Finally, (43) becomes (47), and the Lyapunov derivative is negative as long as the condition (48) holds. According to [31], if the error norm exceeds this bound, then the Lyapunov derivative is negative and the closed-loop signals are UUB.

9 References

[1] Dimarogonas, D.V.,
Frazzoli, E., and Johansson, K.H.: 'Distributed event-triggered control for multi-agent systems', IEEE Trans. Autom. Control, 2012, 57, (5), pp. 1291–1297
[2] Li, Z., Ren, W., and Liu, X., et al.: 'Consensus of multi-agent systems with general linear and Lipschitz nonlinear dynamics using distributed adaptive protocols', IEEE Trans. Autom. Control, 2013, 58, (7), pp. 1786–1791
[3] Yang, T., Roy, S., and Wan, Y., et al.: 'Constructing consensus controllers for networks with identical general linear agents', Int. J. Robust Nonlinear Control, 2011, 21, pp. 1237–1256
[4] Meng, Z., Ren, W., and Cao, Y., et al.: 'Leaderless and leader-following consensus with communication and input delays under a directed network topology', IEEE Trans. Syst. Man Cybern. B, Cybern., 2011, 41, (1), pp. 75–88
[5] Rezaei, M.H., and Menhaj, M.B.: 'Adaptive output stationary average consensus for heterogeneous unknown linear multi-agent systems', IET Control Theory Applic., 2018, 12, (7), pp. 847–856
[6] Li, C.J., and Liu, G.P.: 'Data-driven consensus for non-linear networked multi-agent systems with switching topology and time-varying delays', IET Control Theory Applic., 2018, 12, (12), pp. 1773–1779
[7] Vamvoudakis, K.G., Lewis, F.L., and Hudas, G.R.: 'Multi-agent differential graphical games: online adaptive learning solution for synchronization with optimality', Automatica, 2012, 48, pp. 1598–1611
[8] Tatari, F., Naghibi-Sistani, M.B., and Vamvoudakis, K.G.: 'Distributed learning algorithm for nonlinear differential graphical games', Trans. Inst. Meas. Control, 2017, 39, (2), pp. 173–182
[9] Jiao, Q., Modares, H., and Xu, S., et al.: 'Multi-agent zero-sum differential graphical games for disturbance rejection in distributed control', Automatica, 2016, 69, pp. 24–34
[10] Kamalapurkar, R., Klotz, J.R., and Walters, P., et al.: 'Model-based reinforcement learning in differential graphical games', IEEE Trans. Control Netw. Syst., 2018, 5, (1), pp. 423–433
[11] Tatari, F.,
Naghibi-Sistani, M.B., and Vamvoudakis, K.G.: 'Distributed optimal synchronization control of linear networked systems under unknown dynamics'. Proc. American Control Conf., Seattle, WA, 2017, pp. 668–673
[12] Mazouchi, M., Naghibi-Sistani, M.B., and Hosseini-Sani, S.K.: 'A novel distributed optimal adaptive control algorithm for nonlinear multi-agent differential graphical games', IEEE/CAA J. Automatica Sin., 2018, 5, (1), pp. 331–341
[13] Sutton, R.S., and Barto, A.G.: 'Reinforcement learning: an introduction' (MIT Press, USA, 1998, 1st edn.)
[14] Gao, W., and Jiang, Z.: 'Adaptive dynamic programming and adaptive optimal output regulation of linear systems', IEEE Trans. Autom. Control, 2016, 61, (12), pp. 4164–4169
[15] Modares, H., Lewis, F.L., and Naghibi-Sistani, M.B.: 'Online solution of nonquadratic two-player zero-sum games arising in the control of constrained-input systems', Int. J. Adapt. Control Signal Process., 2014, 28, pp. 232–254
[16] Vamvoudakis, K.G., and Lewis, F.L.: 'Online solution of nonlinear two-player zero-sum games using synchronous policy iteration', Int. J. Robust Nonlinear Control, 2012, 22, (13), pp. 1460–1483
[17] Luy, N.T., Thanh, N.T., and Tri, H.M.: 'Reinforcement learning-based robust adaptive tracking control for multi-wheeled mobile robots synchronization with optimality'. IEEE Workshop on Robotic Intelligence in Informationally Structured Space, Singapore, 2013, pp. 74–81
[18] Tan, L.N.: 'Distributed optimal integrated tracking control for separate kinematic and dynamic uncertain non-holonomic mobile mechanical multi-agent systems', IET Control Theory Applic., 2017, 11, (18), pp. 3249–3260
[19] Aliyu, M.D.S.: 'Nonlinear H∞ control, Hamiltonian systems and Hamilton–Jacobi equations' (CRC Press, Taylor and Francis Group, Boca Raton, 2011)
[20] Basar, T., and Bernhard, P.: 'H∞ optimal control and related minimax design problems: a dynamic game approach' (Springer, USA, 2008, 2nd edn.)
[21] Abu-Khalaf, M., and Lewis, F.L.: 'Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach', Automatica, 2005, 41, pp. 779–791
[22] Lewis, F.L., Jagannathan, S., and Yesildirek, A.: 'Neural network control of robot manipulators and nonlinear systems' (Taylor and Francis, London, UK, 1999)
[23] Modares, H., Lewis, F.L., and Naghibi-Sistani, M.B.: 'Adaptive optimal control of unknown constrained-input systems using policy iteration and neural networks', IEEE Trans. Neural Netw. Learn. Syst., 2013, 24, (10), pp. 1513–1525
[24] Vamvoudakis, K.G.: 'Non-zero sum Nash Q-learning for unknown deterministic continuous-time linear systems', Automatica, 2015, 61, pp. 274–281
[25] Slotine, J.J.E., and Li, W.P.: 'Applied nonlinear control' (Prentice Hall, Englewood Cliffs, NJ, USA, 1991)
[26] Sastry, S., and Bodson, M.: 'Adaptive control: stability, convergence, and robustness' (Prentice Hall, Englewood Cliffs, NJ, 1989)
[27] Ioannou, P., and Sun, J.: 'Robust adaptive control' (Prentice Hall, New Jersey, 1996)
[28] Spong, M.: 'Modeling and control of elastic joint robots', ASME J. Dyn. Syst. Meas. Control, 1987, 109, pp. 310–319
[29] Raghavan, S.: 'Observers and compensators for nonlinear systems, with application to flexible-joint robots'. PhD dissertation, University of California at Berkeley, Berkeley, CA, USA, 1992
[30] Hardy, G., Littlewood, J., and Polya, G.: 'Inequalities' (Cambridge University Press, UK, 1989, 2nd edn.)
[31] Khalil, H.K.: 'Nonlinear systems' (Prentice-Hall, USA, 1996)