{"id":1966,"date":"2025-10-09T08:51:09","date_gmt":"2025-10-09T07:51:09","guid":{"rendered":"https:\/\/babel.isa.uma.es\/kipr\/?p=1966"},"modified":"2025-10-09T08:51:09","modified_gmt":"2025-10-09T07:51:09","slug":"improvements-in-offline-rl-from-previously-acquired-datasets","status":"publish","type":"post","link":"https:\/\/babel.isa.uma.es\/kipr\/?p=1966","title":{"rendered":"Improvements in offline RL (from previously acquired datasets)"},"content":{"rendered":"\n<h4 class=\"wp-block-heading\">Lan Wu, Quan Liu, Renyang You, <strong>State slow feature softmax Q-value regularization for offline reinforcement learning,<\/strong> Engineering Applications of Artificial Intelligence, Volume 160, Part A, 2025, <a href=\"https:\/\/doi.org\/10.1016\/j.engappai.2025.111828\">10.1016\/j.engappai.2025.111828<\/a>.<\/h4>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Offline reinforcement learning is constrained by its reliance on pre-collected datasets, without the opportunity for further interaction with the environment. This restriction often results in distribution shifts, which can exacerbate Q-value overestimation and degrade policy performance. To address these issues, we propose a method called state slow feature softmax Q-value regularization (SQR), which enhances the stability and accuracy of Q-value estimation in offline settings. SQR employs slow feature representation learning to extract dynamic information from state trajectories, promoting the stability and robustness of the state representations. Additionally, a softmax operator is incorporated into the Q-value update process to smooth Q-value estimation, reducing overestimation and improving policy optimization. Finally, we apply our approach to locomotion and navigation tasks and establish a comprehensive experimental analysis framework. Empirical results demonstrate that SQR outperforms state-of-the-art offline RL baselines, achieving performance improvements ranging from 2.5% to 44.6% on locomotion tasks and 2.0% to 71.1% on navigation tasks. Moreover, it achieves the highest score on 7 out of 15 locomotion datasets and 4 out of 6 navigation datasets. Detailed experimental results confirm the stabilizing effect of slow feature learning and the effectiveness of the softmax regularization in mitigating Q-value overestimation, demonstrating the superiority of SQR in addressing key challenges in offline reinforcement learning.\n<\/p>\n<\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>Lan Wu, Quan Liu, Renyang You, State slow feature softmax Q-value regularization for offline reinforcement learning, Engineering Applications of Artificial <span class=\"ellipsis\">&hellip;<\/span> <span class=\"more-link-wrap\"><a href=\"https:\/\/babel.isa.uma.es\/kipr\/?p=1966\" class=\"more-link\"><span>Read More &rarr;<\/span><\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[84],"tags":[536],"class_list":["post-1966","post","type-post","status-publish","format-standard","hentry","category-reinforcement-learning-in-ai","tag-offline-rl"],"_links":{"self":[{"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/posts\/1966"}],"collection":[{"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1966"}],"version-history":[{"count":1,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/posts\/1966\/revisions"}],"predecessor-version":[{"id":1967,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/posts\/1966\/revisions\/1967"}],"wp:attachment":[{"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1966"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1966"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1966"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}