{"id":291,"date":"2015-10-19T11:04:52","date_gmt":"2015-10-19T10:04:52","guid":{"rendered":"http:\/\/babel.isa.uma.es\/kipr\/?p=291"},"modified":"2015-10-19T11:04:52","modified_gmt":"2015-10-19T10:04:52","slug":"nice-summary-of-reinforcement-learning-in-control-adaptive-dynamic-programming-and-the-use-of-q-learning-plus-nn-approximators-for-solving-a-control-problem-under-a-game-theory-framework","status":"publish","type":"post","link":"https:\/\/babel.isa.uma.es\/kipr\/?p=291","title":{"rendered":"Nice summary of reinforcement learning in control (Adaptive Dynamic Programming) and the use of Q-learning plus NN approximators for solving a control problem under a game theory framework"},"content":{"rendered":"<h4>Kyriakos G. Vamvoudakis, <strong>Non-zero sum Nash Q-learning for unknown deterministic continuous-time linear systems<\/strong>, Automatica, Volume 61, November 2015, Pages 274-281, ISSN 0005-1098, <a href=\"http:\/\/dx.doi.org\/10.1016\/j.automatica.2015.08.017\" target=\"_blank\">DOI: 10.1016\/j.automatica.2015.08.017<\/a>.<\/h4>\n<blockquote><p>This work proposes a novel Q-learning algorithm to solve the problem of non-zero sum Nash games of linear time invariant systems with N -players (control inputs) and centralized uncertain\/unknown dynamics. We first formulate the Q-function of each player as a parametrization of the state and all other the control inputs or players. An integral reinforcement learning approach is used to develop a model-free structure of N -actors\/ N -critics to estimate the parameters of the N -coupled Q-functions online while also guaranteeing closed-loop stability and convergence of the control policies to a Nash equilibrium. A 4th order, simulation example with five players is presented to show the efficacy of the proposed approach.<\/p><\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>Kyriakos G. 
Vamvoudakis, Non-zero sum Nash Q-learning for unknown deterministic continuous-time linear systems, Automatica, Volume 61, November 2015, Pages 274-281, <span class=\"ellipsis\">&hellip;<\/span> <span class=\"more-link-wrap\"><a href=\"https:\/\/babel.isa.uma.es\/kipr\/?p=291\" class=\"more-link\"><span>Read More &rarr;<\/span><\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[18],"tags":[139,138,51,13,15],"class_list":["post-291","post","type-post","status-publish","format-standard","hentry","category-applications-of-reinforcement-learning-to-control-engineering","tag-adaptive-dynamic-programming","tag-game-theory","tag-neural-networks","tag-q-learning","tag-reinforcement-learning"],"_links":{"self":[{"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/posts\/291"}],"collection":[{"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=291"}],"version-history":[{"count":1,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/posts\/291\/revisions"}],"predecessor-version":[{"id":292,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/posts\/291\/revisions\/292"}],"wp:attachment":[{"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=291"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=291"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/babel.isa.uma.es\/kipr
\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=291"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
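<p>To give a feel for the model-free Q-learning idea the abstract describes, here is a minimal sketch for the much simpler single-player, discrete-time case (scalar LQR). This is <em>not</em> the paper's algorithm: the paper handles N-player continuous-time games with integral reinforcement learning and actor/critic approximators, whereas this sketch only shows how Q-function parameters of a linear system can be estimated from data without knowing the dynamics, followed by greedy policy improvement. All constants below are illustrative choices.</p>

```python
import numpy as np

# Scalar system x' = a*x + b*u; a and b are UNKNOWN to the learner
# and are used only to simulate transitions.
a, b = 0.9, 0.5
q_c, r_c, gamma = 1.0, 0.1, 0.95   # quadratic stage cost and discount

rng = np.random.default_rng(0)
K = 0.0                            # initial feedback gain, u = -K*x

for it in range(10):               # policy iteration
    Phi, y = [], []
    x = 1.0
    for _ in range(200):           # collect data under an exploratory policy
        u = -K * x + 0.1 * rng.standard_normal()
        cost = q_c * x**2 + r_c * u**2
        xn = a * x + b * u
        un = -K * xn               # on-policy next action
        # Bellman equation for the quadratic Q-function of policy K:
        #   Q(x,u) - gamma*Q(xn,un) = cost,  Q = w . [x^2, 2xu, u^2]
        phi = (np.array([x * x, 2 * x * u, u * u])
               - gamma * np.array([xn * xn, 2 * xn * un, un * un]))
        Phi.append(phi)
        y.append(cost)
        x = xn if abs(xn) < 10 else rng.standard_normal()
    # Critic step: least-squares estimate of the Q-function parameters.
    w, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    # Actor step: greedy improvement, argmin_u Q(x,u) gives u = -(w2/w3)*x.
    K = w[1] / w[2]

print("learned gain K:", K)
```

<p>The key point mirrored from the paper is that the critic never uses the dynamics (a, b): it fits Q-function parameters purely from observed states, inputs, and costs, and the actor is improved from those parameters.</p>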