On the theoretical convergence of Q-learning when the environment is not stationary

Diogo S. Carvalho, Pedro A. Santos, Francisco S. Melo, "Reinforcement learning in convergently non-stationary environments: Feudal hierarchies and learned representations," Artificial Intelligence, Volume 347, 2025, doi: 10.1016/j.artint.2025.104382.

We study the convergence of Q-learning-based methods in convergently non-stationary environments, particularly in the context of hierarchical reinforcement learning and of the dynamic features encountered in deep reinforcement learning. We demonstrate that Q-learning with tabular representations converges when applied to convergently non-stationary dynamics, such as those arising in a feudal hierarchical setting. Additionally, we establish convergence for Q-learning-based deep reinforcement learning methods with convergently non-stationary features, such as those arising in representation-based settings. Our findings offer theoretical support for the application of Q-learning in these complex scenarios and present methodologies for extending established theoretical results from standard cases to their convergently non-stationary counterparts.
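
To make the tabular setting concrete, here is a minimal illustrative sketch (in Python, not taken from the paper): a small MDP whose transition kernel drifts toward a fixed limiting kernel as training proceeds, while standard Q-learning runs on the changing environment. The kernel P_inf, the 1/(t+1) drift schedule, the reward table, and all numerical values are assumptions chosen purely for illustration; under conditions of the kind studied in the paper, the Q-learning iterates should approach the optimal Q-values of the limiting MDP, which the sketch compares against via value iteration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 2, 2
gamma, epsilon = 0.9, 0.1

# Limiting transition kernel P_inf[s, a] = distribution over next states (assumed values),
# and a reward table R[s, a] (also assumed).
P_inf = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.7, 0.3], [0.4, 0.6]]])
R = np.array([[1.0, 0.0], [0.0, 1.0]])

def P_t(t):
    """Time-dependent kernel that converges to P_inf as t grows (drift ~ 1/(t+1))."""
    drift = 1.0 / (t + 1)
    return (1.0 - drift) * P_inf + drift * np.full_like(P_inf, 1.0 / n_states)

# Tabular Q-learning on the drifting environment, with per-(s, a) decaying step sizes
# so the usual Robbins-Monro step-size conditions hold.
Q = np.zeros((n_states, n_actions))
counts = np.zeros((n_states, n_actions))
s = 0
for t in range(100_000):
    a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
    s_next = rng.choice(n_states, p=P_t(t)[s, a])
    counts[s, a] += 1
    alpha = 1.0 / counts[s, a]
    target = R[s, a] + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
    s = s_next

# For comparison: Q* of the limiting (stationary) MDP via value iteration.
Q_star = np.zeros_like(Q)
for _ in range(1000):
    Q_star = R + gamma * (P_inf @ Q_star.max(axis=1))

print("Q-learning estimate under the drifting kernel:\n", Q)
print("Q* of the limiting MDP:\n", Q_star)
```

In this toy example the drift vanishes over time, so the environment is non-stationary but converges to a fixed MDP, and the learned Q-table ends up close to Q* of that limit; the paper's results address when this kind of behavior is guaranteed in general, including the feudal hierarchical and learned-representation cases.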
