Y. Fu and Y. Gao, "Learning Hidden Transition for Nonstationary Environments With Multistep Tree Search," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 55, no. 10, pp. 7012-7023, Oct. 2025, doi: 10.1109/TSMC.2025.3578730.
Deep reinforcement learning (DRL) algorithms have shown impressive results in various applications, but nonstationary environments, such as those with varying operating conditions or external disturbances, remain a significant challenge. To address this challenge, we propose the hidden transition inference (HTI) framework, which learns nonstationary transitions within multistep tree search. Unlike previous methods, which focus on single-step transition changes, the HTI framework improves decision-making by inferring environmental variations over multiple steps. Specifically, the framework constructs a probabilistic graphical model for Monte Carlo tree search (MCTS) in latent space and uses the variational lower bound of the hidden states for policy improvement. Furthermore, this work theoretically proves the convergence of the HTI framework, which supports its use in nonstationary environments. The proposed framework is integrated with Sampled MuZero, a state-of-the-art MCTS-based algorithm, and evaluated on multiple control tasks with different nonstationary transition dynamics. Experimental results show that the HTI framework improves the inference capability of tree search in nonstationary environments, indicating its potential for nonstationary control problems.
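The contrast between single-step and multistep treatment of transition changes can be made concrete with a toy rollout. The sketch below is not the HTI framework and uses no learned model, latent space, or tree search. It assumes a hypothetical 1-D system whose action gain drifts linearly in time, and compares a planner that freezes the gain at its current value with one that extrapolates the drift across the planning horizon. All names (`true_gain`, `step`, `rollout`) and constants are illustrative assumptions.

```python
# Toy illustration only: a hand-written drifting system, not the paper's method.

def true_gain(t, g0=1.0, drift=0.05):
    """Hypothetical nonstationary parameter: the action gain drifts linearly in time."""
    return g0 + drift * t


def step(x, a, gain):
    """One transition of a 1-D system whose response to action a scales with gain."""
    return x + gain * a


def rollout(x0, actions, gain_at):
    """Roll the state forward, using gain_at(k) for the k-th step of the plan."""
    x = x0
    for k, a in enumerate(actions):
        x = step(x, a, gain_at(k))
    return x


t0 = 10                 # time at which planning starts (assumed)
x0 = 0.0                # initial state (assumed)
actions = [1.0] * 5     # a fixed 5-step action sequence to evaluate

# Ground truth: the gain keeps drifting while the plan is executed.
x_true = rollout(x0, actions, lambda k: true_gain(t0 + k))

# Frozen model: assumes the gain stays at its planning-time value for every step.
x_frozen = rollout(x0, actions, lambda k: true_gain(t0))

# Drift-aware model: estimates the drift from the two most recent gains
# and extrapolates it over the planning horizon.
est_drift = true_gain(t0) - true_gain(t0 - 1)
x_aware = rollout(x0, actions, lambda k: true_gain(t0) + est_drift * k)

err_frozen = abs(x_true - x_frozen)
err_aware = abs(x_true - x_aware)
print("frozen-model error:", err_frozen)
print("drift-aware error: ", err_aware)
```

In this toy setting the frozen model's prediction error grows with the horizon, while the drift-aware rollout tracks the true trajectory up to floating-point error. The drift here is linear and directly observable, so the example only illustrates why variation over multiple steps matters for lookahead, not how it would be inferred in practice.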