Monthly Archives: November 2025


Analysis of using RL as a PID tuning method

Ufuk Demircioğlu, Halit Bakır, Reinforcement learning–driven proportional–integral–derivative controller tuning for mass–spring systems: Stability, performance, and hyperparameter analysis, Engineering Applications of Artificial Intelligence, Volume 162, Part D, 2025, 10.1016/j.engappai.2025.112692.

Artificial intelligence (AI) methods—particularly reinforcement learning (RL)—are used to tune Proportional–Integral–Derivative (PID) controller parameters for a mass–spring–damper system. Learning is performed with the Twin Delayed Deep Deterministic Policy Gradient (TD3) actor–critic algorithm, implemented in MATLAB (Matrix Laboratory) and Simulink (a simulation environment by MathWorks). The objective is to examine the effect of critical RL hyperparameters—including experience buffer size, mini-batch size, and target policy smoothing noise—on the quality of learned PID gains and control performance. The proposed method eliminates the need for manual gain tuning by enabling the RL agent to autonomously learn optimal control strategies through continuous interaction with the Simulink-modeled mass–spring–damper system, where the agent observes responses and applies control actions to optimize the PID gains. Results show that small buffer sizes and suboptimal batch configurations cause unstable behavior, while buffer sizes of 10⁵ or larger and mini-batch sizes between 64 and 128 yield robust tracking. A target policy smoothing noise of 0.01 produced the best performance, while values between 0.05 and 0.1 also provided stable results. Comparative analysis with the classical Simulink PID tuner indicated that, for this linear system, the conventional tuner achieved slightly better transient performance, particularly in overshoot and settling time. Although the RL-based method showed adaptability and generated valid PID gains, it did not surpass the classical approach in this structured system. These findings highlight the promise of AI- and RL-driven control for uncertain, nonlinear, or time-varying dynamics, while underscoring the importance of hyperparameter optimization in realizing the potential of RL-based PID tuning.
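
Independent of the paper's MATLAB/Simulink toolchain, the structure of the problem (an agent proposing PID gains, a simulated mass–spring–damper evaluating them, and a tracking-error signal closing the loop) can be sketched in a few lines. The following is a minimal illustrative stand-in, not the authors' setup: the plant parameters, the random-search "agent" used in place of TD3, and names such as step_response_cost are all assumptions.

```python
import numpy as np

# Mass-spring-damper: m*x'' + c*x' + k*x = u, tracking a unit step reference.
M, C, K = 1.0, 0.5, 2.0          # assumed plant parameters (illustrative only)
DT, T_END = 0.001, 5.0

def step_response_cost(kp, ki, kd, ref=1.0):
    """Simulate a PID-controlled step response and return a tracking cost."""
    x = v = integ = 0.0
    prev_err = ref - x               # avoid an artificial derivative kick at t=0
    cost = 0.0
    for _ in range(int(T_END / DT)):
        err = ref - x
        integ += err * DT
        deriv = (err - prev_err) / DT
        u = kp * err + ki * integ + kd * deriv      # PID control law
        a = (u - C * v - K * x) / M                 # plant acceleration
        v += a * DT
        x += v * DT
        prev_err = err
        cost += (err ** 2) * DT                     # integral of squared error
    return cost

# Stand-in for the learning agent: random search over gains, keep the best.
# (The paper instead trains a TD3 agent against a Simulink model.)
rng = np.random.default_rng(0)
best_gains, best_cost = None, np.inf
for _ in range(200):
    kp, ki, kd = rng.uniform(0, 50), rng.uniform(0, 20), rng.uniform(0, 10)
    c = step_response_cost(kp, ki, kd)
    if c < best_cost:
        best_gains, best_cost = (kp, ki, kd), c

print("best PID gains (Kp, Ki, Kd):", best_gains, "cost:", best_cost)
```

The same cost signal, negated, is the kind of reward an RL agent would maximize; swapping the random search for TD3 changes the optimizer, not the environment interface.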

RL with both discrete and continuous actions

Chengcheng Yan, Shujie Chen, Jiawei Xu, Xuejie Wang, Zheng Peng, Hybrid Reinforcement Learning in parameterized action space via fluctuates constraint, Engineering Applications of Artificial Intelligence, Volume 162, Part C, 2025, 10.1016/j.engappai.2025.112499.

Parameterized actions in Reinforcement Learning (RL) combine discrete actions with continuous action parameters and are widely employed in game scenarios. However, previous works have concentrated on the network structure of RL algorithms for handling hybrid actions, neglecting the impact of fluctuations in action parameters on the agent's movement trajectory. Because discrete and continuous actions are coupled, instability in discrete actions influences the selection of the corresponding continuous parameters, causing the agent to deviate from the optimal path. In this paper, we propose a parameterized RL approach based on a parameter fluctuation restriction (PFR), called CP-DQN, to address this problem. Our method effectively mitigates value fluctuations in action parameters by constraining the change in the action parameter between adjacent time steps. Additionally, we incorporate a supervision module to optimize the entire training process. To quantify the benefit of our approach in minimizing trajectory deviations, we propose an indicator that measures the influence of parameter fluctuations on performance in hybrid action spaces. Our method is evaluated in three environments with hybrid action spaces, and the experiments demonstrate its superiority over existing approaches.
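
The central mechanism, restricting how far the continuous parameters of a hybrid action may drift between adjacent time steps, can be expressed as a simple penalty term. The sketch below is one illustrative reading, not the CP-DQN implementation: the name fluctuation_penalty, the weight, and the choice to skip the penalty when the discrete action switches are assumptions, and the term is shown as reward shaping rather than as part of the training loss.

```python
import numpy as np

def fluctuation_penalty(prev_action, action, weight=0.1):
    """Penalize changes in the continuous parameters of a hybrid action
    between adjacent time steps (illustrative PFR-style term).

    Each action is a pair (discrete_id, continuous_params)."""
    prev_k, prev_params = prev_action
    k, params = action
    if k != prev_k:
        # Discrete action switched: apply no continuity penalty here
        # (one of several possible design choices).
        return 0.0
    diff = np.asarray(params) - np.asarray(prev_params)
    return -weight * float(np.sum(diff ** 2))

# Example: same discrete action, but the continuous parameter jitters.
a_prev = (1, [0.20])
a_curr = (1, [0.55])
print("shaping term added to the reward:", fluctuation_penalty(a_prev, a_curr))
```
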

A variant of RL aimed at reducing the bias of conventional Q-learning

Fanghui Huang, Wenqi Han, Xiang Li, Xinyang Deng, Wen Jiang, Reducing the estimation bias and variance in reinforcement learning via Maxmean and Aitken value iteration, Engineering Applications of Artificial Intelligence, Volume 162, Part C, 2025, 10.1016/j.engappai.2025.112502.

Value-based reinforcement learning methods suffer from overestimation bias because of the max operator, resulting in suboptimal policies. Meanwhile, variance in value estimation causes instability in the networks. Many algorithms have been proposed to address these issues, but they lack a theoretical analysis of the degree of estimation bias and of the trade-off between estimation bias and variance. Motivated by the above, in this paper we propose a novel method based on Maxmean and Aitken value iteration, named MMAVI. The Maxmean operation uses the average of multiple state–action values (Q values) as the estimated target value to mitigate bias and variance. Aitken value iteration is used to update Q values and improve the convergence rate. Combining the proposed method with Q-learning and the deep Q-network, we design two novel algorithms suited to different environments. To understand the effect of MMAVI, we analyze it both theoretically and empirically. In theory, we derive closed-form expressions for the reduction in bias and variance, and prove that our method converges faster than traditional methods based on the Bellman equation. In addition, the convergence of our algorithms is proved in the tabular setting. Finally, we demonstrate that our proposed algorithms outperform state-of-the-art algorithms in several environments.
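
To make the two ingredients concrete, the toy tabular sketch below assumes that the Maxmean target takes the maximum over actions of the ensemble-mean Q values, and that Aitken's classical delta-squared extrapolation is applied to the last three estimates of each updated Q entry; how MMAVI actually combines these in the Bellman update may differ from this reading.

```python
import numpy as np

def maxmean_target(q_ensemble, s_next, reward, gamma=0.99):
    """Maxmean-style target: max over actions of the ensemble-mean Q values
    (an assumed reading of the operator)."""
    mean_q = np.mean([q[s_next] for q in q_ensemble], axis=0)  # shape: (n_actions,)
    return reward + gamma * np.max(mean_q)

def aitken(q0, q1, q2):
    """Classical Aitken delta-squared extrapolation of three successive estimates."""
    denom = q2 - 2.0 * q1 + q0
    if abs(denom) < 1e-8:
        return q2
    return q0 - (q1 - q0) ** 2 / denom

# Toy updates for one transition (s, a, r, s_next) with an ensemble of 2 Q-tables.
n_states, n_actions, alpha = 5, 3, 0.1
rng = np.random.default_rng(0)
q_ensemble = [np.zeros((n_states, n_actions)) for _ in range(2)]
history = {}  # (table_idx, s, a) -> recent estimates, for the Aitken step

s, a, r, s_next = 0, 1, 1.0, 2
for _ in range(10):
    target = maxmean_target(q_ensemble, s_next, r)
    i = int(rng.integers(len(q_ensemble)))         # update one ensemble member
    q = q_ensemble[i]
    q[s, a] += alpha * (target - q[s, a])          # standard TD step toward the target
    hist = history.setdefault((i, s, a), [])
    hist.append(float(q[s, a]))
    if len(hist) >= 3:                             # accelerate with the last three values
        q[s, a] = aitken(*hist[-3:])

print("Q estimates for (s=0, a=1):", [float(q[s, a]) for q in q_ensemble])
```

Averaging over the ensemble damps the upward bias of the max operator, while the extrapolation step is what speeds up convergence of the value iterates in this sketch.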