Tag Archives: Model Predictive Control

Improving sample efficiency under sparse rewards and large continuous action spaces through predictive control in RL

Antonyshyn, L., Givigi, S., Deep Model-Based Reinforcement Learning for Predictive Control of Robotic Systems with Dense and Sparse Rewards, J Intell Robot Syst 110, 100 (2024) DOI: 10.1007/s10846-024-02118-y.

Sparse rewards and sample efficiency are open areas of research in the field of reinforcement learning. These problems are especially important when considering applications of reinforcement learning to robotics and other cyber-physical systems. This is so because in these domains many tasks are goal-based and naturally expressed with binary successes and failures, action spaces are large and continuous, and real interactions with the environment are limited. In this work, we propose Deep Value-and-Predictive-Model Control (DVPMC), a model-based predictive reinforcement learning algorithm for continuous control that uses system identification, value function approximation and sampling-based optimization to select actions. The algorithm is evaluated on a dense reward and a sparse reward task. We show that it can match the performance of a predictive control approach to the dense reward problem, and outperforms model-free and model-based learning algorithms on the sparse reward task on the metrics of sample efficiency and performance. We verify the performance of an agent trained in simulation using DVPMC on a real robot playing the reach-avoid game. Video of the experiment can be found here: https://youtu.be/0Q274kcfn4c.

Integrating the physical model of a Model Predictive Controller into an Actor-Critic RL framework to improve safety and flexibility at the same time

Angel Romero, Yunlong Song, Davide Scaramuzza, Actor-Critic Model Predictive Control, IEEE International Conference on Robotics and Automation, Yokohama, 2024 arXiv:2306.09852 [cs.RO].

An open research question in robotics is how
to combine the benefits of model-free reinforcement learning
(RL)—known for its strong task performance and flexibility in
optimizing general reward formulations—with the robustness
and online replanning capabilities of model predictive control
(MPC). This paper provides an answer by introducing a new
framework called Actor-Critic Model Predictive Control. The
key idea is to embed a differentiable MPC within an actor-
critic RL framework. The proposed approach leverages the
short-term predictive optimization capabilities of MPC with
the exploratory and end-to-end training properties of RL. The
resulting policy effectively manages both short-term decisions
through the MPC-based actor and long-term prediction via
the critic network, unifying the benefits of both model-based
control and end-to-end learning. We validate our method in
both simulation and the real world with a quadcopter platform
across various high-level tasks. We show that the proposed
architecture can achieve real-time control performance, learn
complex behaviors via trial and error, and retain the predictive
properties of the MPC to better handle out of distribution
behaviour.

A MPC-based (non-POMDP) approach to sequential decision planning with partial observability in continuous time and space

Nishimura H, Schwager M., SACBP: Belief space planning for continuous-time dynamical systems via stochastic sequential action control, . The International Journal of Robotics Research. 2021;40(10-11):1167-1195 DOI: 10.1177/02783649211037697.

We propose a novel belief space planning technique for continuous dynamics by viewing the belief system as a hybrid dynamical system with time-driven switching. Our approach is based on the perturbation theory of differential equations and extends sequential action control to stochastic dynamics. The resulting algorithm, which we name SACBP, does not require discretization of spaces or time and synthesizes control signals in near real-time. SACBP is an anytime algorithm that can handle general parametric Bayesian filters under certain assumptions. We demonstrate the effectiveness of our approach in an active sensing scenario and a model-based Bayesian reinforcement learning problem. In these challenging problems, we show that the algorithm significantly outperforms other existing solution techniques including approximate dynamic programming and local trajectory optimization.