Improvements in offline RL (from previously acquired datasets)

Lan Wu, Quan Liu, Renyang You, "State slow feature softmax Q-value regularization for offline reinforcement learning," Engineering Applications of Artificial Intelligence, Volume 160, Part A, 2025, doi: 10.1016/j.engappai.2025.111828.

Offline reinforcement learning is constrained by its reliance on pre-collected datasets, with no opportunity for further interaction with the environment. This restriction often results in distribution shift, which can exacerbate Q-value overestimation and degrade policy performance. To address these issues, we propose a method called state slow feature softmax Q-value regularization (SQR), which enhances the stability and accuracy of Q-value estimation in offline settings. SQR employs slow feature representation learning to extract dynamic information from state trajectories, promoting stable and robust state representations. Additionally, a softmax operator is incorporated into the Q-value update to smooth Q-value estimation, reducing overestimation and improving policy optimization. Finally, we apply our approach to locomotion and navigation tasks and establish a comprehensive experimental analysis framework. Empirical results demonstrate that SQR outperforms state-of-the-art offline reinforcement learning baselines, achieving performance improvements ranging from 2.5% to 44.6% on locomotion tasks and from 2.0% to 71.1% on navigation tasks. Moreover, it achieves the highest score on 7 of 15 locomotion datasets and 4 of 6 navigation datasets. Detailed experimental results confirm the stabilizing effect of slow feature learning and the effectiveness of the softmax regularization in mitigating Q-value overestimation, demonstrating the strength of SQR in addressing key challenges in offline reinforcement learning.
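To make the two ingredients of the abstract concrete, here is a minimal PyTorch sketch of (1) a temporal "slowness" loss on a state encoder and (2) a softmax (Boltzmann-weighted) backup used in place of the hard max over target Q-values. The names (`StateEncoder`, `tau`, the sampled-action backup) and all architectural details are illustrative assumptions for this sketch, not the paper's exact formulation.

```python
# Sketch of slow-feature regularization + softmax Q-value backup.
# Assumptions: continuous actions, candidate actions sampled from the
# current policy at the next state, a critic that takes [state, action].
import torch
import torch.nn as nn
import torch.nn.functional as F


class StateEncoder(nn.Module):
    """Maps raw states to features intended to vary slowly along a trajectory."""

    def __init__(self, state_dim: int, feat_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, feat_dim),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)


def slow_feature_loss(encoder: StateEncoder,
                      s: torch.Tensor, s_next: torch.Tensor) -> torch.Tensor:
    """Penalize rapid change of features across consecutive states."""
    return F.mse_loss(encoder(s_next), encoder(s))


def softmax_q_target(target_q: nn.Module, s_next: torch.Tensor,
                     candidate_actions: torch.Tensor,
                     reward: torch.Tensor, done: torch.Tensor,
                     gamma: float = 0.99, tau: float = 1.0) -> torch.Tensor:
    """Boltzmann-weighted backup over candidate actions instead of a hard max.

    candidate_actions: (batch, n_actions, action_dim), e.g. sampled from the
    current policy at s_next (an assumption for this sketch).
    """
    b, n, a_dim = candidate_actions.shape
    s_rep = s_next.unsqueeze(1).expand(-1, n, -1).reshape(b * n, -1)
    q = target_q(torch.cat([s_rep, candidate_actions.reshape(b * n, a_dim)], dim=-1))
    q = q.view(b, n)                            # (batch, n_actions)
    w = F.softmax(q / tau, dim=1)               # Boltzmann weights
    v_next = (w * q).sum(dim=1, keepdim=True)   # smoothed value estimate
    return reward + gamma * (1.0 - done) * v_next
```

Weighting Q-values by a softmax rather than taking the hard max damps the propagation of overestimated values through the Bellman backup; the temperature `tau` interpolates between the mean of the candidate Q-values (large `tau`) and the max (`tau` near 0).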
