Tag Archives: Adaptive Dynamic Programming

A novel method for mathematically compressing the value function so that value iteration / policy iteration run in time polynomial in the state dimension

Alex Gorodetsky, Sertac Karaman, and Youssef Marzouk, High-dimensional stochastic optimal control using continuous tensor decompositions, The International Journal of Robotics Research, Vol. 37, Issue 2-3, pp. 340-377, DOI: 10.1177/0278364917753994.

Motion planning and control problems are embedded and essential in almost all robotics applications. These problems are often formulated as stochastic optimal control problems and solved using dynamic programming algorithms. Unfortunately, most existing algorithms that guarantee convergence to optimal solutions suffer from the curse of dimensionality: the run time of the algorithm grows exponentially with the dimension of the state space of the system. We propose novel dynamic programming algorithms that alleviate the curse of dimensionality in problems that exhibit certain low-rank structure. The proposed algorithms are based on continuous tensor decompositions recently developed by the authors. Essentially, the algorithms represent high-dimensional functions (e.g. the value function) in a compressed format, and directly perform dynamic programming computations (e.g. value iteration, policy iteration) in this format. Under certain technical assumptions, the new algorithms guarantee convergence towards optimal solutions with arbitrary precision. Furthermore, the run times of the new algorithms scale polynomially with the state dimension and polynomially with the ranks of the value function. This approach realizes substantial computational savings in “compressible” problem instances, where value functions admit low-rank approximations. We demonstrate the new algorithms in a wide range of problems, including a simulated six-dimensional agile quadcopter maneuvering example and a seven-dimensional aircraft perching example. In some of these examples, we estimate computational savings of up to 10 orders of magnitude over standard value iteration algorithms. We further demonstrate the algorithms running in real time on board a quadcopter during a flight experiment under motion capture.
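The core trick is easiest to see in a low-dimensional toy. Below is a minimal sketch of value iteration performed entirely in a compressed format, assuming a 2-D grid shortest-path problem and using truncated SVD where the paper uses its continuous tensor-train decomposition; all names and parameters are illustrative, not the authors' implementation:

```python
import numpy as np

# Toy discounted shortest-path problem on an N x N grid. The value
# function is re-compressed after every Bellman backup, so dynamic
# programming is performed entirely in the low-rank format.
N, gamma, rank, n_iters = 64, 0.95, 5, 200
goal = (N - 1, N - 1)

def bellman_backup(V):
    # Four deterministic moves with unit cost; edge padding makes
    # off-grid moves equivalent to staying in place; the goal is free.
    padded = np.pad(V, 1, mode="edge")
    neighbors = np.stack([padded[:-2, 1:-1], padded[2:, 1:-1],
                          padded[1:-1, :-2], padded[1:-1, 2:]])
    Vnew = 1.0 + gamma * neighbors.min(axis=0)
    Vnew[goal] = 0.0
    return Vnew

def compress(V, r):
    # Truncated SVD stands in for the continuous tensor decomposition:
    # only r singular triplets (O(N*r) numbers) are kept.
    U, s, Vt = np.linalg.svd(V, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

V = np.zeros((N, N))
for _ in range(n_iters):
    V = compress(bellman_backup(V), rank)
print("value at farthest corner:", V[0, 0])
```

With rank much smaller than N, storage drops from N^2 entries to O(N * rank), which is the 2-D analogue of the polynomial-in-dimension scaling claimed in the abstract.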

Value iteration applied to control systems when the model of the plant is replaced by data acquired from the plant

Yongqiang Li, Zhongsheng Hou, Yuanjing Feng, Ronghu Chi, Data-driven approximate value iteration with optimality error bound analysis, Automatica, Volume 78, April 2017, Pages 79-87, ISSN 0005-1098, DOI: 10.1016/j.automatica.2016.12.019.

Features of the data-driven approximate value iteration (AVI) algorithm, proposed in Li et al. (2014) for the optimal stabilization problem, include that only process data is required and that the estimate of the domain of attraction of the closed-loop system is enlarged. However, the controller generated by the data-driven AVI algorithm is an approximate solution of the optimal control problem. In this work, a quantitative analysis of the error bound between the optimal cost and the cost under the designed controller is given. This error bound is determined by the approximation error of the estimate of the optimal cost and the approximation error of the controller function estimator. The first is in turn determined by the approximation error of the data-driven dynamic programming (DP) operator with respect to the DP operator and the approximation error of the value function estimator. These three approximation errors are zero when the data set of the plant is sufficiently rich and complete and the number of samples in the state space of interest is infinite. This means that the cost under the designed controller equals the optimal cost as the number of iterations tends to infinity.
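As a rough illustration of what "only process data is required" means, here is a hedged sketch of approximate value iteration in which the DP operator is evaluated purely through sampled transitions and the value function lives in a parametric estimator; the scalar plant, costs, and polynomial estimator are assumptions of this sketch, not the authors' construction:

```python
import numpy as np

# Scalar plant x+ = 0.9 x + 0.5 u with quadratic stage cost; the
# algorithm only ever touches sampled transitions, never the model.
gamma = 0.9
xs = np.linspace(-2.0, 2.0, 41)   # sampled states
us = np.linspace(-1.0, 1.0, 21)   # sampled inputs

def plant(x, u):                  # used only to generate the data set
    return 0.9 * x + 0.5 * u

def stage_cost(x, u):
    return x**2 + u**2

# Data set of transitions (x, u, x_next) gathered from the plant.
X, U = np.meshgrid(xs, us, indexing="ij")
Xn = plant(X, U)

theta = np.zeros(5)               # degree-4 polynomial value estimator
for _ in range(60):
    # Data-driven DP operator: minimize over the sampled inputs only.
    q = stage_cost(X, U) + gamma * np.polyval(theta, Xn)
    theta = np.polyfit(xs, q.min(axis=1), 4)

def controller(x):
    # Greedy (approximately optimal) controller induced by the estimate.
    q = stage_cost(x, us) + gamma * np.polyval(theta, plant(x, us))
    return us[np.argmin(q)]

print(controller(1.5))
```

The two approximation errors discussed in the abstract show up here as the finiteness of the sampled state/input grids and the projection of the value function onto the polynomial estimator.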

NOTE: Another paper on the same issue in the same journal.

Model-based reinforcement learning with a reduced number of basis functions to approximate the value function, a study of its convergence guarantees, and a nice state of the art on the use of (model-based) reinforcement learning for automatic control

Rushikesh Kamalapurkar, Joel A. Rosenfeld, Warren E. Dixon, Efficient model-based reinforcement learning for approximate online optimal control, Automatica, Volume 74, 2016, Pages 247-258, ISSN 0005-1098, DOI: 10.1016/j.automatica.2016.08.004.

An infinite horizon optimal regulation problem is solved online for a deterministic control-affine nonlinear dynamical system using a state following (StaF) kernel method to approximate the value function. Unlike traditional methods that aim to approximate a function over a large compact set, the StaF kernel method aims to approximate a function in a small neighborhood of a state that travels within a compact set. Simulation results demonstrate that stability and approximate optimality of the control system can be achieved with significantly fewer basis functions than may be required for global approximation methods.
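A minimal sketch of the StaF idea, assuming Gaussian kernels, hand-picked offsets, and a stand-in target function: the kernel centers are defined relative to the current state, so only a handful of weights are needed for a good fit in the moving neighborhood:

```python
import numpy as np

rng = np.random.default_rng(1)

# Five kernel centers pinned at fixed offsets from the current state,
# so they "follow" the state as it moves through the compact set.
offsets = np.array([[-0.2, 0.0], [0.2, 0.0],
                    [0.0, -0.2], [0.0, 0.2], [0.0, 0.0]])

def staf_features(y, x, width=0.3):
    # Gaussian kernels centered at x + offsets, evaluated at y.
    centers = x + offsets
    return np.exp(-np.sum((y[None, :] - centers)**2, axis=1) / width**2)

def target_value(y):              # stand-in for the unknown value function
    return np.sum(y**2)

x = np.array([1.0, -0.5])         # current state
# Fit the five weights from samples in a small ball around x.
Y = x + 0.2 * rng.standard_normal((30, 2))
Phi = np.array([staf_features(y, x) for y in Y])
w, *_ = np.linalg.lstsq(Phi, np.array([target_value(y) for y in Y]),
                        rcond=None)

y_test = x + np.array([0.05, -0.05])
print(staf_features(y_test, x) @ w, "vs", target_value(y_test))
```

A global RBF approximation over the whole compact set would need a grid of centers growing exponentially with dimension; here five local bases suffice because accuracy is only demanded near the current state.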

Reinforcement learning in the automatic control area

Yu Jiang and Zhong-Ping Jiang, Global Adaptive Dynamic Programming for Continuous-Time Nonlinear Systems, IEEE Transactions on Automatic Control, vol. 60, no. 11, pp. 2917-2929, Nov. 2015, DOI: 10.1109/TAC.2015.2414811.

This paper presents a novel method of global adaptive dynamic programming (ADP) for the adaptive optimal control of nonlinear polynomial systems. The strategy consists of relaxing the problem of solving the Hamilton-Jacobi-Bellman (HJB) equation to an optimization problem, which is solved via a new policy iteration method. The proposed method is distinguished from previously known nonlinear ADP methods in that neural network approximation is avoided, giving rise to significant computational improvement. Moreover, instead of being semiglobally or locally stabilizing, the resultant control policy is globally stabilizing for a general class of nonlinear polynomial systems. Furthermore, in the absence of a priori knowledge of the system dynamics, an online learning method is devised to implement the proposed policy iteration technique by generalizing the current ADP theory. Finally, three numerical examples are provided to validate the effectiveness of the proposed method.
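A hedged sketch of the relaxed policy-iteration loop for a scalar polynomial system; the paper solves the relaxation as a sum-of-squares program, whereas here policy evaluation is done by plain least squares on sampled states, purely for illustration:

```python
import numpy as np

# Polynomial system xdot = -x^3 + u, running cost x^2 + u^2.
xs = np.linspace(-1.5, 1.5, 101)  # sample points for the least squares
f = lambda x: -x**3               # drift (input enters additively)

def dV(c, x):
    # V(x) = c0 x^2 + c1 x^4 + c2 x^6, so V'(x) is as below.
    return 2*c[0]*x + 4*c[1]*x**3 + 6*c[2]*x**5

mu = lambda x: -x                 # initial globally stabilizing policy
coeffs = np.zeros(3)
for _ in range(10):
    u = mu(xs)
    # Policy evaluation: V'(x)(f(x) + u) + x^2 + u^2 = 0 at the samples,
    # solved for the coefficients of V in the least-squares sense.
    basis = np.stack([2*xs, 4*xs**3, 6*xs**5], axis=1) * (f(xs) + u)[:, None]
    coeffs, *_ = np.linalg.lstsq(basis, -(xs**2 + u**2), rcond=None)
    # Policy improvement: u = -(1/2) V'(x), since g(x) = 1 and R = 1.
    mu = lambda x, c=coeffs: -0.5 * dV(c, x)

print("value-function coefficients:", coeffs)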

Nice summary of reinforcement learning in control (Adaptive Dynamic Programming), and of the use of Q-learning plus neural-network approximators for solving a control problem under a game-theoretic framework

Kyriakos G. Vamvoudakis, Non-zero sum Nash Q-learning for unknown deterministic continuous-time linear systems, Automatica, Volume 61, November 2015, Pages 274-281, ISSN 0005-1098, DOI: 10.1016/j.automatica.2015.08.017.

This work proposes a novel Q-learning algorithm to solve the problem of non-zero-sum Nash games of linear time-invariant systems with N players (control inputs) and centralized uncertain/unknown dynamics. We first formulate the Q-function of each player as a parametrization of the state and of all the other players' control inputs. An integral reinforcement learning approach is used to develop a model-free structure of N actors / N critics to estimate the parameters of the N coupled Q-functions online, while also guaranteeing closed-loop stability and convergence of the control policies to a Nash equilibrium. A fourth-order simulation example with five players is presented to show the efficacy of the proposed approach.
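To make the coupled Q-function parametrization concrete, here is a hedged sketch for a two-player, discrete-time LQ game with batch least squares standing in for the paper's continuous-time N-player integral reinforcement learning; the plant, costs, and update scheme are assumptions of this sketch:

```python
import numpy as np

# Scalar plant x+ = a x + b1 u1 + b2 u2 with per-player quadratic costs.
rng = np.random.default_rng(2)
a, b1, b2, gamma = 0.8, 0.5, 0.3, 0.95
q, r = [1.0, 1.0], [1.0, 2.0]

def feats(x, u1, u2):
    # Quadratic Q-function basis in z = (x, u1, u2).
    return np.stack([x*x, x*u1, x*u2, u1*u1, u1*u2, u2*u2], axis=-1)

k = np.array([0.0, 0.0])          # linear policies u_i = k_i x
for _ in range(30):
    # Exploration data; the model appears only as a data generator.
    x = rng.uniform(-1, 1, 500)
    u1 = rng.uniform(-1, 1, 500)
    u2 = rng.uniform(-1, 1, 500)
    xn = a*x + b1*u1 + b2*u2
    # Q-function evaluation for each player from the Bellman identity
    # Q_i(z) = cost_i + gamma * Q_i(x+, k1 x+, k2 x+), by least squares.
    Phi = feats(x, u1, u2) - gamma * feats(xn, k[0]*xn, k[1]*xn)
    H = [np.linalg.lstsq(Phi, qi*x*x + ri*ui*ui, rcond=None)[0]
         for qi, ri, ui in zip(q, r, (u1, u2))]
    # Best-response improvement: dQ_i/du_i = 0 given the other's policy.
    k = np.array([-(H[0][1] + H[0][4]*k[1]) / (2*H[0][3]),
                  -(H[1][2] + H[1][4]*k[0]) / (2*H[1][5])])

print("approximate Nash feedback gains:", k)
```

Each player's Q-function depends on the other player's input through the cross terms (x*u2, u1*u2 for player 1), which is exactly why the estimators are coupled and why the improvement step is a best response rather than a plain minimization.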