{"id":1767,"date":"2024-07-18T12:42:39","date_gmt":"2024-07-18T11:42:39","guid":{"rendered":"https:\/\/babel.isa.uma.es\/kipr\/?p=1767"},"modified":"2024-07-18T12:42:39","modified_gmt":"2024-07-18T11:42:39","slug":"rl-in-periodic-scenarios","status":"publish","type":"post","link":"https:\/\/babel.isa.uma.es\/kipr\/?p=1767","title":{"rendered":"RL in periodic scenarios"},"content":{"rendered":"<h4>A. Aniket and A. Chattopadhyay, <strong>Online Reinforcement Learning in Periodic MDP,<\/strong> IEEE Transactions on Artificial Intelligence, vol. 5, no. 7, pp. 3624-3637, July 2024 <a href=\"https:\/\/10.1109\/TAI.2024.3375258\" target=\"_blank\">DOI: 10.1109\/TAI.2024.3375258<\/a>.<\/h4>\n<blockquote><p>We study learning in periodic Markov decision process (MDP), a special type of nonstationary MDP where both the state transition probabilities and reward functions vary periodically, under the average reward maximization setting. We formulate the problem as a stationary MDP by augmenting the state space with the period index and propose a periodic upper confidence bound reinforcement learning-2 (PUCRL2) algorithm. We show that the regret of PUCRL2 varies linearly with the period N and as O(TlogT\u2212\u2212\u2212\u2212\u2212\u221a) with the horizon length T . Utilizing the information about the sparsity of transition matrix of augmented MDP, we propose another algorithm [periodic upper confidence reinforcement learning with Bernstein bounds (PUCRLB) which enhances upon PUCRL2, both in terms of regret ( O(N\u2212\u2212\u221a) dependency on period] and empirical performance. Finally, we propose two other algorithms U-PUCRL2 and U-PUCRLB for extended uncertainty in the environment in which the period is unknown but a set of candidate periods are known. Numerical results demonstrate the efficacy of all the algorithms.<\/p><\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>A. Aniket and A. Chattopadhyay, Online Reinforcement Learning in Periodic MDP, IEEE Transactions on Artificial Intelligence, vol. 5, no. 7, <span class=\"ellipsis\">&hellip;<\/span> <span class=\"more-link-wrap\"><a href=\"https:\/\/babel.isa.uma.es\/kipr\/?p=1767\" class=\"more-link\"><span>Read More &rarr;<\/span><\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[84],"tags":[556],"class_list":["post-1767","post","type-post","status-publish","format-standard","hentry","category-reinforcement-learning-in-ai","tag-periodic-rl"],"_links":{"self":[{"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/posts\/1767"}],"collection":[{"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1767"}],"version-history":[{"count":1,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/posts\/1767\/revisions"}],"predecessor-version":[{"id":1768,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/posts\/1767\/revisions\/1768"}],"wp:attachment":[{"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1767"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1767"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1767"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}