Tag Archives: Modelless Reinforcement Learning

Value iteration in control systems where the model of the plant is replaced by data acquired from the plant

Yongqiang Li, Zhongsheng Hou, Yuanjing Feng, Ronghu Chi, Data-driven approximate value iteration with optimality error bound analysis, Automatica, Volume 78, April 2017, Pages 79-87, ISSN 0005-1098, DOI: 10.1016/j.automatica.2016.12.019.

The data-driven approximate value iteration (AVI) algorithm, proposed in Li et al. (2014) for the optimal stabilization problem, requires only process data and enlarges the estimate of the domain of attraction of the closed loop. However, the controller generated by the data-driven AVI algorithm is only an approximate solution to the optimal control problem. This work gives a quantitative bound on the error between the optimal cost and the cost under the designed controller. The bound is determined by two quantities: the approximation error of the estimate of the optimal cost and the approximation error of the controller function estimator. The first of these is in turn determined by the approximation error of the data-driven dynamic programming (DP) operator with respect to the DP operator and by the approximation error of the value function estimator. These three approximation errors are zero when the data set of the plant is sufficient and infinitely complete and the number of samples in the state space of interest is infinite. This means that the cost under the designed controller equals the optimal cost when the number of iterations is infinite.
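To make the three error sources in the abstract concrete, here is a minimal sketch of value iteration driven by sampled transitions rather than a plant model. It is not the algorithm of the paper. The scalar plant, the quadratic stage cost, the grids, and the helper names (`collect_data`, `q_values`, `approximate_value_iteration`) are all assumptions chosen for illustration. The finite set of sampled transitions stands in for the data-driven DP operator, linear interpolation over the state grid stands in for the value function estimator, and the greedy lookup table stands in for the controller function estimator.

```python
import numpy as np


def collect_data(states, actions):
    """Record one transition per (state, action) pair from a toy plant."""
    x, u = np.meshgrid(states, actions, indexing="ij")
    x_next = 0.9 * x + 0.5 * u  # hypothetical plant, used only to produce data
    cost = x**2 + u**2          # hypothetical quadratic stage cost
    return x_next, cost


def q_values(states, v, x_next, cost):
    """Data-driven DP backup: stage cost plus interpolated value at the next state."""
    # np.interp clamps next states outside the grid to the boundary values.
    v_next = np.interp(x_next.ravel(), states, v).reshape(x_next.shape)
    return cost + v_next


def approximate_value_iteration(states, actions, x_next, cost, n_iter=200):
    """Iterate the sampled backup from V = 0 and return values and a greedy policy."""
    v = np.zeros_like(states)
    for _ in range(n_iter):
        v = q_values(states, v, x_next, cost).min(axis=1)
    # Greedy controller with respect to the final value estimate.
    best = q_values(states, v, x_next, cost).argmin(axis=1)
    return v, actions[best]


states = np.linspace(-2.0, 2.0, 41)   # sampled states (assumed grid)
actions = np.linspace(-2.0, 2.0, 41)  # sampled inputs (assumed grid)
x_next, cost = collect_data(states, actions)
v, policy = approximate_value_iteration(states, actions, x_next, cost)

i = int(np.argmin(np.abs(states - 1.0)))
print(f"V(1.0) ~ {v[i]:.3f}, u(1.0) = {policy[i]:.2f}")
```

Only `x_next` and `cost` reach the iteration, so the plant equation is never used by the algorithm itself. Under these assumptions, refining the two grids should shrink the gap to the optimal cost, which loosely mirrors the abstract's statement that the errors vanish as the data become complete.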

NOTE: Another paper on the same issue in the same journal.