{"id":1370,"date":"2023-07-11T09:34:48","date_gmt":"2023-07-11T08:34:48","guid":{"rendered":"https:\/\/babel.isa.uma.es\/kipr\/?p=1370"},"modified":"2023-07-11T09:34:48","modified_gmt":"2023-07-11T08:34:48","slug":"state-of-the-art-of-the-convergence-of-monte-carlo-exploring-starts-rl-policy-iteration-kind-method","status":"publish","type":"post","link":"https:\/\/babel.isa.uma.es\/kipr\/?p=1370","title":{"rendered":"State of the art on the convergence of Monte Carlo Exploring Starts RL, a policy-iteration-like method"},"content":{"rendered":"<h4>Jun Liu, <strong>On the convergence of reinforcement learning with Monte Carlo Exploring Starts<\/strong>. Automatica, Volume 129, 2021. <a href=\"https:\/\/doi.org\/10.1016\/j.automatica.2021.109693\" target=\"_blank\">DOI: 10.1016\/j.automatica.2021.109693<\/a>.<\/h4>\n<blockquote><p>A basic simulation-based reinforcement learning algorithm is the Monte Carlo Exploring Starts (MCES) method, also known as optimistic policy iteration, in which the value function is approximated by simulated returns and a greedy policy is selected at each iteration. The convergence of this algorithm in the general setting has been an open question. In this paper, we investigate the convergence of this algorithm for the case with undiscounted costs, also known as the stochastic shortest path problem. The results complement existing partial results on this topic and thereby help further settle the open problem.<\/p><\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>Jun Liu, On the convergence of reinforcement learning with Monte Carlo Exploring Starts. Automatica, Volume 129, 2021. DOI: 10.1016\/j.automatica.2021.109693. 
<span class=\"ellipsis\">&hellip;<\/span> <span class=\"more-link-wrap\"><a href=\"https:\/\/babel.isa.uma.es\/kipr\/?p=1370\" class=\"more-link\"><span>Read More &rarr;<\/span><\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[84],"tags":[465,15],"class_list":["post-1370","post","type-post","status-publish","format-standard","hentry","category-reinforcement-learning-in-ai","tag-monte-carlo-exploring-starts-rl","tag-reinforcement-learning"],"_links":{"self":[{"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/posts\/1370"}],"collection":[{"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1370"}],"version-history":[{"count":1,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/posts\/1370\/revisions"}],"predecessor-version":[{"id":1371,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/posts\/1370\/revisions\/1371"}],"wp:attachment":[{"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1370"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1370"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1370"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}