{"id":1982,"date":"2025-11-06T16:29:46","date_gmt":"2025-11-06T15:29:46","guid":{"rendered":"https:\/\/babel.isa.uma.es\/kipr\/?p=1982"},"modified":"2025-11-06T16:29:46","modified_gmt":"2025-11-06T15:29:46","slug":"analysis-of-using-rl-as-a-pid-tuning-method","status":"publish","type":"post","link":"https:\/\/babel.isa.uma.es\/kipr\/?p=1982","title":{"rendered":"Analysis of using RL as a PID tuning method"},"content":{"rendered":"\n<h4 class=\"wp-block-heading\">Ufuk Demircio\u011flu, Halit Bak\u0131r,  <strong>Reinforcement learning\u2013driven proportional\u2013integral\u2013derivative controller tuning for mass\u2013spring systems: Stability, performance, and hyperparameter analysis,<\/strong> Engineering Applications of Artificial Intelligence, Volume 162, Part D, 2025, <a href=\"https:\/\/doi.org\/10.1016\/j.engappai.2025.112692\">10.1016\/j.engappai.2025.112692<\/a>.<\/h4>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>Artificial intelligence (AI) methods\u2014particularly reinforcement learning (RL)\u2014are used to tune Proportional\u2013Integral\u2013Derivative (PID) controller parameters for a mass\u2013spring\u2013damper system. Learning is performed with the Twin Delayed Deep Deterministic Policy Gradient (TD3) actor\u2013critic algorithm, implemented in MATLAB (Matrix Laboratory) and Simulink (a simulation environment by MathWorks). The objective is to examine the effect of critical RL hyperparameters\u2014including experience buffer size, mini-batch size, and target policy smoothing noise\u2014on the quality of learned PID gains and control performance. The proposed method eliminates the need for manual gain tuning by enabling the RL agent to autonomously learn optimal control strategies through continuous interaction with the Simulink-modeled mass\u2013spring\u2013damper system, where the agent observes responses and applies control actions to optimize the PID gains. 
Results show that small buffer sizes and suboptimal batch configurations cause unstable behavior, while buffer sizes of 10<sup>5<\/sup> or larger and mini-batch sizes between 64 and 128 yield robust tracking. A target policy smoothing noise of 0.01 produced the best performance, while values between 0.05 and 0.1 also provided stable results. Comparative analysis with the classical Simulink PID tuner indicated that, for this linear system, the conventional tuner achieved slightly better transient performance, particularly in overshoot and settling time. Although the RL-based method showed adaptability and generated valid PID gains, it did not surpass the classical approach in this structured system. These findings highlight the promise of AI- and RL-driven control in uncertain, nonlinear, or variable dynamics, while underscoring the importance of hyperparameter optimization in realizing the potential of RL-based Proportional\u2013Integral\u2013Derivative tuning.\n<\/p>\n<\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>Ufuk Demircio\u011flu, Halit Bak\u0131r, Reinforcement learning\u2013driven proportional\u2013integral\u2013derivative controller tuning for mass\u2013spring systems: Stability, performance, and hyperparameter analysis, Engineering Applications of <span class=\"ellipsis\">&hellip;<\/span> <span class=\"more-link-wrap\"><a href=\"https:\/\/babel.isa.uma.es\/kipr\/?p=1982\" class=\"more-link\"><span>Read More 
&rarr;<\/span><\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[18],"tags":[590],"class_list":["post-1982","post","type-post","status-publish","format-standard","hentry","category-applications-of-reinforcement-learning-to-control-engineering","tag-pid-tuning"],"_links":{"self":[{"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/posts\/1982"}],"collection":[{"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1982"}],"version-history":[{"count":1,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/posts\/1982\/revisions"}],"predecessor-version":[{"id":1983,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/posts\/1982\/revisions\/1983"}],"wp:attachment":[{"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1982"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1982"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1982"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}