{"id":1806,"date":"2024-09-12T14:51:43","date_gmt":"2024-09-12T13:51:43","guid":{"rendered":"https:\/\/babel.isa.uma.es\/kipr\/?p=1806"},"modified":"2024-09-12T14:51:43","modified_gmt":"2024-09-12T13:51:43","slug":"integrating-the-physical-model-of-a-model-predictive-controller-into-an-actor-critic-rl-framework-to-improve-safety-and-flexibility-at-the-same-time","status":"publish","type":"post","link":"https:\/\/babel.isa.uma.es\/kipr\/?p=1806","title":{"rendered":"Integrating the physical model of a Model Predictive Controller into an Actor-Critic RL framework to improve safety and flexibility at the same time"},"content":{"rendered":"<h4>Angel Romero, Yunlong Song, Davide Scaramuzza, <strong>Actor-Critic Model Predictive Control,<\/strong> IEEE International Conference on Robotics and Automation, Yokohama, 2024 <a href=\"https:\/\/arxiv.org\/abs\/2306.09852\" target=\"_blank\">arXiv:2306.09852 [cs.RO]<\/a>.<\/h4>\n<blockquote><p>An open research question in robotics is how to combine the benefits of model-free reinforcement learning (RL)\u2014known for its strong task performance and flexibility in optimizing general reward formulations\u2014with the robustness and online replanning capabilities of model predictive control (MPC). This paper provides an answer by introducing a new framework called Actor-Critic Model Predictive Control. The key idea is to embed a differentiable MPC within an actor-critic RL framework. The proposed approach leverages the short-term predictive optimization capabilities of MPC with the exploratory and end-to-end training properties of RL. The resulting policy effectively manages both short-term decisions through the MPC-based actor and long-term prediction via the critic network, unifying the benefits of both model-based control and end-to-end learning. We validate our method in both simulation and the real world with a quadcopter platform across various high-level tasks. We show that the proposed architecture can achieve real-time control performance, learn complex behaviors via trial and error, and retain the predictive properties of the MPC to better handle out of distribution behaviour.<\/p><\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>Angel Romero, Yunlong Song, Davide Scaramuzza, Actor-Critic Model Predictive Control, IEEE International Conference on Robotics and Automation, Yokohama, 2024 arXiv:2306.09852 <span class=\"ellipsis\">&hellip;<\/span> <span class=\"more-link-wrap\"><a href=\"https:\/\/babel.isa.uma.es\/kipr\/?p=1806\" class=\"more-link\"><span>Read More &rarr;<\/span><\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[18],"tags":[245,469],"class_list":["post-1806","post","type-post","status-publish","format-standard","hentry","category-applications-of-reinforcement-learning-to-control-engineering","tag-actor-critic","tag-model-predictive-control"],"_links":{"self":[{"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/posts\/1806"}],"collection":[{"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1806"}],"version-history":[{"count":1,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/posts\/1806\/revisions"}],"predecessor-version":[{"id":1807,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/posts\/1806\/revisions\/1807"}],"wp:attachment":[{"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1806"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1806"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1806"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}