Applications of reinforcement learning to robots

Improving the generalization of robotic RL by drawing inspiration from the human motion control system

September 4, 2025 08:28 , Juan-Antonio Fernández-Madrigal

P. Zhang, Z. Hua and J. Ding, A Central Motor System Inspired Pretraining Reinforcement Learning for Robotic Control, IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 55, no. 9, pp. 6285-6298, Sept. 2025, 10.1109/TSMC.2025.3577698.

Robots typically encounter diverse tasks, bringing a significant challenge for motion control. Pretraining reinforcement learning (PRL) enables robots to adapt quickly to various tasks by exploiting reusable skills. The existing PRL methods often rely on datasets and human expert knowledge, struggle to discover diverse and dynamic skills, and exhibit limited generalization and adaptability to different types of robots and downstream tasks. This article proposes a novel PRL algorithm based on the central motor system mechanisms, which can discover diverse and dynamic skills without relying on data and expert knowledge, effectively enabling robots to tackle different types of downstream tasks. Inspired by the cerebellum’s role in balance control and skill storage within the central motor system, an intrinsic fused reward is introduced to explore dynamic skills and eliminate dependence on data and expert knowledge during pretraining. Drawing from the basal ganglia’s function in motor programming, a discrete skill encoding method is designed to increase the diversity of discovered skills, improving the performance of complex robots in challenging environments. Furthermore, incorporating the basal ganglia’s role in motor regulation, a skill activity function is proposed to generate skills at varying dynamic levels, thereby improving the adaptability of robots in multiple downstream tasks. The effectiveness of the proposed algorithm has been demonstrated through simulation experiments on four different morphological robots across multiple downstream tasks.

Posted in: Applications of reinforcement learning to robots, Psycho-physiological bases of engineering , Tagged: Reward generation, RL pre-training

Stacking multiple MDPs in an abstraction hierarchy to better solve RL

September 4, 2025 08:21 , Juan-Antonio Fernández-Madrigal

Roberto Cipollone, Marco Favorito, Flavio Maiorana, Giuseppe De Giacomo, Luca Iocchi, Fabio Patrizi, Exploiting robot abstractions in episodic RL via reward shaping and heuristics, Robotics and Autonomous Systems, Volume 193, 2025, 10.1016/j.robot.2025.105116.

One major limitation to the applicability of Reinforcement Learning (RL) to many domains of practical relevance, in particular in robotic applications, is the large number of samples required to learn an optimal policy. To address this problem and improve learning efficiency, we consider a linear hierarchy of abstraction layers of the Markov Decision Process (MDP) underlying the target domain. Each layer is an MDP representing a coarser model of the one immediately below in the hierarchy. In this work, we propose novel techniques to automatically define Reward Shaping and Reward Heuristic functions that are based on the solution obtained at a higher level of abstraction and provide rewards to the finer (possibly the concrete) MDP at the lower level, thus inducing an exploration heuristic that can effectively guide the learning process in the more complex domain. In contrast with other works in Hierarchical RL, our technique imposes fewer requirements on the design of the abstract models and is tolerant to modeling errors, thus making the proposed approach practical. We formally analyze the relationship between the abstract models and the exploration heuristic induced in the lower-level domain, we prove that the method guarantees optimal convergence, and finally demonstrate its effectiveness experimentally in several complex robotic domains.
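As a rough illustration of how a solution obtained at an abstract layer can guide learning in the layer below, here is a minimal sketch based on classic potential-based reward shaping. It is not the construction used in the paper: the corridor, the `abstract_of` mapping, the `abstract_value` table and the discount `GAMMA` are all hypothetical.

```python
# Toy sketch: potential-based reward shaping driven by an abstract value function.
# All names and numbers are illustrative assumptions, not taken from the paper.

GAMMA = 0.9

# Hypothetical abstract MDP with three "rooms"; the values are assumed to have
# been computed beforehand by solving the (much smaller) abstract problem.
abstract_value = {"room_a": 0.0, "room_b": 0.5, "room_c": 1.0}


def abstract_of(cell: int) -> str:
    """Map a concrete corridor cell (0..8) to its abstract room."""
    return ("room_a", "room_b", "room_c")[min(cell // 3, 2)]


def potential(cell: int) -> float:
    """Potential of a concrete state = value of its abstract state."""
    return abstract_value[abstract_of(cell)]


def shaped_reward(reward: float, cell: int, next_cell: int) -> float:
    """Add the shaping term gamma * phi(s') - phi(s) to the environment reward."""
    return reward + GAMMA * potential(next_cell) - potential(cell)


if __name__ == "__main__":
    # Crossing from room_a into room_b earns a bonus even with zero reward.
    print(shaped_reward(0.0, 2, 3))
    # Moving back across the same boundary is penalised.
    print(shaped_reward(0.0, 3, 2))
```

The shaping term only depends on the abstract state of the two cells, so the concrete learner is nudged towards rooms that the abstract solution rates higher without changing the environment reward itself.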

Posted in: Applications of reinforcement learning to robots , Tagged: Hierarchical MDPs, Hierarchical reinforcement learning

Improving the adaptation of RL to robots with different parameters through fuzzy logic

May 2, 2025 08:42 , Juan-Antonio Fernández-Madrigal

A. G. Haddad, M. B. Mohiuddin, I. Boiko and Y. Zweiri, Fuzzy Ensembles of Reinforcement Learning Policies for Systems With Variable Parameters, IEEE Robotics and Automation Letters, vol. 10, no. 6, pp. 5361-5368, June 2025, 10.1109/LRA.2025.3559833.

This paper presents a novel approach to improving the generalization capabilities of reinforcement learning (RL) agents for robotic systems with varying physical parameters. We propose the Fuzzy Ensemble of RL policies (FERL), which enhances performance in environments where system parameters differ from those encountered during training. The FERL method selectively fuses aligned policies, determining their collective decision based on fuzzy memberships tailored to the current parameters of the system. Unlike traditional centralized training approaches that rely on shared experiences for policy updates, FERL allows for independent agent training, facilitating efficient parallelization. The effectiveness of FERL is demonstrated through extensive experiments, including a real-world trajectory tracking application in a quadrotor slung-load system. Our method improves the success rates by up to 15.6% across various simulated systems with variable parameters compared to the existing benchmarks of domain randomization and robust adaptive ensemble adversary RL. In the real-world experiments, our method achieves a 30% reduction in 3D position RMSE compared to individual RL policies. The results underscore FERL's robustness and applicability to real robotic systems.
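To make the idea of fusing policies through fuzzy memberships more concrete, the following is a minimal sketch that blends the actions of independently trained policies according to a triangular membership over a single system parameter. It does not reproduce FERL itself (in particular, the policy alignment step is omitted); the stand-in policies, the parameter centers and the membership width are hypothetical.

```python
# Toy sketch: membership-weighted blending of independently trained policies.
# Policies, parameter values and widths are hypothetical placeholders.
from typing import Callable, Sequence


def triangular(x: float, center: float, half_width: float) -> float:
    """Triangular membership: 1 at the center, 0 beyond center +/- half_width."""
    return max(0.0, 1.0 - abs(x - center) / half_width)


def fuzzy_action(
    state: float,
    param: float,
    policies: Sequence[Callable[[float], float]],
    centers: Sequence[float],
    half_width: float,
) -> float:
    """Membership-weighted average of the actions proposed by each policy."""
    weights = [triangular(param, c, half_width) for c in centers]
    total = sum(weights)
    if total == 0.0:
        # Parameter outside every membership: fall back to the nearest policy.
        nearest = min(range(len(centers)), key=lambda i: abs(param - centers[i]))
        return policies[nearest](state)
    return sum(w * pi(state) for w, pi in zip(weights, policies)) / total


# Two stand-in "policies", imagined as trained for payload masses 1.0 and 2.0.
def light_policy(state: float) -> float:
    return -1.0 * state


def heavy_policy(state: float) -> float:
    return -2.0 * state


if __name__ == "__main__":
    # A payload of 1.5 lies halfway between the two training values.
    print(fuzzy_action(1.0, 1.5, [light_policy, heavy_policy], [1.0, 2.0], 1.0))
```

Because each policy is only queried at decision time, the policies can be trained separately and in parallel, which matches the independence property highlighted in the abstract.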

Posted in: Applications of reinforcement learning to robots , Tagged: Deep reinforcement learning, Fuzzy logic, Simulation-to-real problem

On the explainability of Deep RL and its improvement through the integration of human preferences

April 3, 2025 06:38 , Juan-Antonio Fernández-Madrigal

Georgios Angelopoulos, Luigi Mangiacapra, Alessandra Rossi, Claudia Di Napoli, Silvia Rossi, What is behind the curtain? Increasing transparency in reinforcement learning with human preferences and explanations, Engineering Applications of Artificial Intelligence, Volume 149, 2025, 10.1016/j.engappai.2025.110520.

In this work, we investigate whether the transparency of a robot’s behaviour is improved when human preferences on the actions the robot performs are taken into account during the learning process. For this purpose, a shielding mechanism called Preference Shielding is proposed and included in a reinforcement learning algorithm to account for human preferences. We also use the shielding to decide when to provide explanations of the robot’s actions. We carried out a within-subjects study involving 26 participants to evaluate the robot’s transparency. Results indicate that considering human preferences during learning improves legibility compared with providing only explanations. In addition, combining human preferences and explanations further amplifies transparency. Results also confirm that increased transparency leads to an increase in people’s perception of the robot’s safety, comfort, and reliability. These findings show the importance of transparency during learning and suggest a paradigm for robotic applications when a robot has to learn a task in the presence of or in collaboration with a human.
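The general notion of a shield that filters the agent's choices using human preferences can be sketched as follows. This is only a guess at the mechanics, not the paper's Preference Shielding: the rule used here for triggering an explanation (whenever the shield overrides the greedy action) and all names are assumptions.

```python
# Toy sketch: a preference-based shield over a table of action values.
# The explanation rule and all names are hypothetical, not from the paper.
from typing import Dict, Set, Tuple


def shielded_action(
    q_values: Dict[str, float], dispreferred: Set[str]
) -> Tuple[str, bool]:
    """Return (action, explain).

    The greedy action is kept unless the human marked it as dispreferred;
    in that case the best remaining action is chosen and `explain` is True.
    """
    greedy = max(q_values, key=q_values.get)
    allowed = {a: q for a, q in q_values.items() if a not in dispreferred}
    if not allowed:
        # Every action is dispreferred: act greedily but flag an explanation.
        return greedy, True
    choice = max(allowed, key=allowed.get)
    return choice, choice != greedy


if __name__ == "__main__":
    q = {"left": 0.2, "right": 0.9, "wait": 0.5}
    print(shielded_action(q, {"right"}))
    print(shielded_action(q, set()))
```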

Posted in: Applications of reinforcement learning to robots , Tagged: Deep reinforcement learning, Explainability, Human-robot integration

Interesting survey of the existing sim-to-real gap in RL in the context of humanoid robots

March 20, 2025 15:41 , Juan-Antonio Fernández-Madrigal

D. Kim, H. Lee, J. Cha and J. Park, Bridging the Reality Gap: Analyzing Sim-to-Real Transfer Techniques for Reinforcement Learning in Humanoid Bipedal Locomotion, IEEE Robotics & Automation Magazine, vol. 32, no. 1, pp. 49-58, March 2025, 10.1109/MRA.2024.3505784.

Reinforcement learning (RL) offers a promising solution for controlling humanoid robots, particularly for bipedal locomotion, by learning adaptive and flexible control strategies. However, direct RL application is hindered by time-consuming trial-and-error processes, necessitating training in simulation before real-world transfer. This introduces a reality gap that degrades performance. Although various methods have been proposed for sim-to-real transfer, they have not been validated on a consistent hardware platform, making it difficult to determine which components are key to overcoming the reality gap. In contrast, we systematically evaluate techniques to enhance RL policy robustness during sim-to-real transfer by controlling variables and comparing them on a single robot to isolate and analyze the impact of each technique. These techniques include dynamics randomization, state history usage, noise/bias/delay modeling, state selection, perturbations, and network size. We quantitatively assess the reality gap by simulating diverse conditions and conducting experiments on real hardware. Our findings provide insights into bridging the reality gap, advancing robust RL-trained humanoid robots for real-world applications.
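Two of the techniques listed above, dynamics randomization and noise/delay modeling, can be sketched in a few lines. This is an illustrative toy rather than the setup evaluated in the article: the parameter names, their ranges and the `NoisyDelayedSensor` class are assumptions.

```python
# Toy sketch: per-episode dynamics randomization plus a noisy, delayed sensor.
# Parameter names and ranges are made up for illustration.
import random
from collections import deque


def sample_dynamics(rng: random.Random) -> dict:
    """Draw one set of simulator parameters, typically once per episode."""
    return {
        "mass_scale": rng.uniform(0.8, 1.2),
        "friction": rng.uniform(0.5, 1.25),
        "motor_delay_steps": rng.randint(0, 3),
    }


class NoisyDelayedSensor:
    """Adds Gaussian noise and a fixed delay (in steps) to a scalar reading."""

    def __init__(self, delay_steps: int, noise_std: float, rng: random.Random):
        # Keep the last delay_steps + 1 samples; the oldest one is reported.
        self.buffer = deque(maxlen=delay_steps + 1)
        self.noise_std = noise_std
        self.rng = rng

    def read(self, true_value: float) -> float:
        self.buffer.append(true_value)
        delayed = self.buffer[0]  # oldest stored sample
        return delayed + self.rng.gauss(0.0, self.noise_std)


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_dynamics(rng))
    sensor = NoisyDelayedSensor(delay_steps=2, noise_std=0.01, rng=rng)
    print([round(sensor.read(v), 3) for v in (1.0, 2.0, 3.0, 4.0)])
```

Until the buffer fills up, the sensor simply reports the oldest sample it has, so the first readings repeat the initial value.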

Posted in: Applications of reinforcement learning to robots , Tagged: Simulation-to-real problem

RL for multiple tasks in the case of quadrotors and a short state of the art about the general problem

February 20, 2025 10:24 , Juan-Antonio Fernández-Madrigal

J. Xing, I. Geles, Y. Song, E. Aljalbout and D. Scaramuzza, Multi-Task Reinforcement Learning for Quadrotors, IEEE Robotics and Automation Letters, vol. 10, no. 3, pp. 2112-2119, March 2025, DOI: 10.1109/LRA.2024.3520894.

Reinforcement learning (RL) has shown great effectiveness in quadrotor control, enabling specialized policies to develop even human-champion-level performance in single-task scenarios. However, these specialized policies often struggle with novel tasks, requiring a complete retraining of the policy from scratch. To address this limitation, this paper presents a novel multi-task reinforcement learning (MTRL) framework tailored for quadrotor control, leveraging the shared physical dynamics of the platform to enhance sample efficiency and task performance. By employing a multi-critic architecture and shared task encoders, our framework facilitates knowledge transfer across tasks, enabling a single policy to execute diverse maneuvers, including high-speed stabilization, velocity tracking, and autonomous racing. Our experimental results, validated both in simulation and real-world scenarios, demonstrate that our framework outperforms baseline approaches in terms of sample efficiency and overall task performance. Video is available at https://youtu.be/HfK9UT1OVnY.

Posted in: Applications of reinforcement learning to control engineering, Applications of reinforcement learning to robots , Tagged: Quadrotor

Survey on robot navigation, particularly using RL and other self-learning approaches for that task

December 12, 2024 16:07 , Juan-Antonio Fernández-Madrigal

Suaib Al Mahmud, Abdurrahman Kamarulariffin, Azhar Mohd Ibrahim, Ahmad Jazlan Haja Mohideen, Advancements and Challenges in Mobile Robot Navigation: A Comprehensive Review of Algorithms and Potential for Self‐Learning Approaches, Journal of Intelligent & Robotic Systems (2024) 110:120, DOI: 10.1007/s10846-024-02149-5.

Mobile robot navigation has been a very popular research topic for some time. With the goal of enhancing the autonomy in mobile robot navigation, numerous algorithms (traditional AI-based, swarm intelligence-based, self-learning-based) have been built and implemented independently, and also in blended manners. Nevertheless, the problem of efficient autonomous robot navigation persists in multiple degrees due to the limitation of these algorithms. The lack of knowledge on the implemented techniques and their shortcomings acts as a hindrance to further development on this topic. This is why an extensive study on the previously implemented algorithms, their applicability, their weaknesses as well as their potential needs to be conducted in order to assess how to improve mobile robot navigation performance. In this review paper, a comprehensive review of mobile robot navigation algorithms has been conducted. The findings suggest that, even though the self-learning algorithms require huge amounts of training data and have the possibility of learning erroneous behavior, they possess huge potential to overcome challenges rarely addressed by the other traditional algorithms. The findings also indicate that in the domain of machine learning-based algorithms, integration of knowledge representation with a neuro-symbolic approach has the capacity to improve the accuracy and performance of self-robot navigation training by a significant margin.

Posted in: Applications of reinforcement learning to robots, Robot motion planning , Tagged: Navigation

A particular action space for human-manipulator physical interaction learning through RL

November 21, 2024 13:10 , Juan-Antonio Fernández-Madrigal

Roberto Martín-Martín, Michelle A. Lee, Rachel Gardner, Silvio Savarese, Jeannette Bohg, Animesh Garg, Variable Impedance Control in End-Effector Space: An Action Space for Reinforcement Learning in Contact-Rich Tasks, arXiv:1906.08880 [cs.RO].

Reinforcement Learning (RL) of contact-rich manipulation tasks has yielded impressive results in recent years. While many studies in RL focus on varying the observation space or reward model, few efforts have focused on the choice of action space (e.g. joint or end-effector space, position, velocity, etc.). However, studies in robot motion control indicate that choosing an action space that conforms to the characteristics of the task can simplify exploration and improve robustness to disturbances. This paper studies the effect of different action spaces in deep RL and advocates for Variable Impedance Control in End-effector Space (VICES) as an advantageous action space for constrained and contact-rich tasks. We evaluate multiple action spaces on three prototypical manipulation tasks: Path Following (task with no contact), Door Opening (task with kinematic constraints), and Surface Wiping (task with continuous contact). We show that VICES improves sample efficiency, maintains low energy consumption, and ensures safety across all three experimental setups. Further, RL policies learned with VICES can transfer across different robot models in simulation, and from simulation to real for the same robot.
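A variable-impedance action in end-effector space can be pictured as a desired displacement plus per-axis stiffness, which a low-level controller converts into joint torques. The sketch below assumes a planar two-axis task, critical damping for unit mass, and a Jacobian supplied by the caller; these choices and all function names are illustrative and are not claimed to match the VICES controller.

```python
# Toy sketch: turning a (displacement, stiffness) action into joint torques.
# Damping rule, dimensions and names are assumptions made for illustration.
import math
from typing import List, Sequence


def impedance_force(
    velocity: Sequence[float],
    delta_pos: Sequence[float],
    stiffness: Sequence[float],
) -> List[float]:
    """Per-axis spring-damper force pulling the end effector by delta_pos.

    Damping is set to 2 * sqrt(k), i.e. critical damping for unit mass.
    """
    force = []
    for v, dx, k in zip(velocity, delta_pos, stiffness):
        d = 2.0 * math.sqrt(k)
        force.append(k * dx - d * v)
    return force


def joint_torques(
    jacobian: Sequence[Sequence[float]], force: Sequence[float]
) -> List[float]:
    """tau = J^T f, with the Jacobian given as task rows by joint columns."""
    n_joints = len(jacobian[0])
    return [
        sum(jacobian[i][j] * force[i] for i in range(len(force)))
        for j in range(n_joints)
    ]


if __name__ == "__main__":
    # Policy action: move 0.1 along x with a stiff x axis and a compliant y axis.
    f = impedance_force(velocity=[0.0, 0.2], delta_pos=[0.1, 0.0],
                        stiffness=[100.0, 25.0])
    print(f)
    print(joint_torques([[0.0, 1.0], [2.0, 0.0]], f))
```

Letting the policy choose the stiffness per axis is what allows it to be firm along a path while staying compliant where contact is expected.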

Posted in: Applications of reinforcement learning to robots , Tagged: Human-Machine Interaction, Physical interaction

Mapless (egocentric) navigation with hierarchical RL that includes a good survey of current RL approaches for that task

November 14, 2024 16:31 , Juan-Antonio Fernández-Madrigal

Yan Gao, Feiqiang Lin, Boliang Cai, Jing Wu, Changyun Wei, Raphael Grech, Ze Ji, Mapless navigation via Hierarchical Reinforcement Learning with memory-decaying novelty, Robotics and Autonomous Systems, Volume 182, 2024, DOI: 10.1016/j.robot.2024.104815.

Hierarchical Reinforcement Learning (HRL) has shown superior performance for mapless navigation tasks. However, it remains limited in unstructured environments that might contain terrains like long corridors and dead corners, which can lead to local minima. This is because most HRL-based mapless navigation methods employ a simplified reward setting and exploration strategy. In this work, we propose a novel reward function for training the high-level (HL) policy, which contains two components: extrinsic reward and intrinsic reward. The extrinsic reward encourages the robot to move towards the target location, while the intrinsic reward is computed based on novelty, episode memory and memory decaying, making the agent capable of accomplishing spontaneous exploration. We also design a novel neural network structure that incorporates an LSTM network to augment the agent with memory and reasoning capabilities. We test our method in unknown environments and specific scenarios prone to the local minimum problem to evaluate the navigation performance and local minimum resolution ability. The results show that our method significantly increases the success rate when compared to advanced RL-based methods, achieving a maximum improvement of nearly 28%. Our method demonstrates effective improvement in addressing the local minimum issue, especially in cases where the baselines fail completely. Additionally, numerous ablation studies consistently confirm the effectiveness of our proposed reward function and neural network structure.
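The combination of an extrinsic reward with a novelty-based intrinsic reward that relies on a decaying episodic memory can be illustrated with a small count-based sketch. The grid discretization, the decay factor, the `1 / (1 + visits)` novelty formula and the weight `beta` are assumptions chosen for clarity, not the reward defined in the paper.

```python
# Toy sketch: novelty reward from an episodic memory whose entries fade over time.
# Discretization, decay rule and weighting are illustrative assumptions.
from collections import defaultdict


class DecayingNoveltyMemory:
    def __init__(self, decay: float = 0.9, cell_size: float = 1.0):
        self.decay = decay
        self.cell_size = cell_size
        self.visits = defaultdict(float)  # grid cell -> decayed visit count

    def _cell(self, x: float, y: float):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def intrinsic_reward(self, x: float, y: float) -> float:
        """Fade all memories, score the current cell, then record the visit."""
        for key in list(self.visits):
            self.visits[key] *= self.decay
        cell = self._cell(x, y)
        reward = 1.0 / (1.0 + self.visits[cell])
        self.visits[cell] += 1.0
        return reward

    def reset(self) -> None:
        """Clear the memory at the start of a new episode."""
        self.visits.clear()


def total_reward(extrinsic: float, intrinsic: float, beta: float = 0.1) -> float:
    """High-level reward = extrinsic term + weighted intrinsic term."""
    return extrinsic + beta * intrinsic


if __name__ == "__main__":
    memory = DecayingNoveltyMemory(decay=0.5)
    for position in [(0.2, 0.3), (0.4, 0.1), (5.0, 5.0)]:
        print(position, memory.intrinsic_reward(*position))
```

Revisiting the same cell yields a smaller bonus, which discourages lingering in dead ends, while the decay lets old places become attractive again if the agent has not returned for a while.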

Posted in: Applications of reinforcement learning to robots, Robot motion planning, Robot task planning , Tagged: Mapless navigation, Reactive navigation

Building a graph of skills in order to leverage already learnt ones in a robotic RL context

November 14, 2024 16:25 , Juan-Antonio Fernández-Madrigal

Qiangxing Tian, Shanshan Zhang, Donglin Wang, Jinxin Liu, Shuyu Yang, GSC: A graph-based skill composition framework for robot learning, Robotics and Autonomous Systems, Volume 182, 2024, DOI: 10.1016/j.robot.2024.104787.

Humans excel at performing a wide range of sophisticated tasks by leveraging skills acquired from prior experiences. This characteristic is especially essential in robotics empowered by deep reinforcement learning, as learning every skill from scratch is time-consuming and may not always be feasible. With the prior skills incorporated, skill composition aims to accelerate the learning process on new robotic tasks. Previous works have given insight into combining pre-trained task-agnostic skills, but these skills are transformed into a fixed-order representation, which captures potentially complex skill relations poorly. In this paper, we propose a novel Graph-based framework for Skill Composition (GSC). To learn rich structural information, a carefully designed skill graph is constructed, where skill representations are taken as nodes and skill relations are utilized as edges. Furthermore, to allow it to be trained efficiently on a large-scale skill set, a transformer-style graph updating method is employed to achieve comprehensive information aggregation. Our simulation experiments indicate that GSC outperforms the state-of-the-art methods on various challenging tasks. Additionally, we successfully apply the technique to the navigation task on a real quadruped robot. The project homepage can be found at Graph Skill Composition.
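A skill graph with a transformer-style update can be pictured as each skill node attending to itself and its neighbors and replacing its representation with the attention-weighted mix. The sketch below uses plain dot-product attention without learned projections, which is a deliberate simplification; the data layout and the function name are hypothetical and do not reproduce GSC.

```python
# Toy sketch: one attention-style aggregation step over a skill graph.
# No learned weights; embeddings and edges are illustrative assumptions.
import math
from typing import Dict, List


def attention_update(
    embeddings: Dict[str, List[float]], neighbors: Dict[str, List[str]]
) -> Dict[str, List[float]]:
    """Each node attends to itself and its neighbors (scaled dot product)."""
    updated = {}
    for node, query in embeddings.items():
        keys = [node] + list(neighbors.get(node, []))
        scale = math.sqrt(len(query))
        scores = [
            sum(q * k for q, k in zip(query, embeddings[n])) / scale for n in keys
        ]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        norm = sum(exps)
        weights = [e / norm for e in exps]
        updated[node] = [
            sum(w * embeddings[n][i] for w, n in zip(weights, keys))
            for i in range(len(query))
        ]
    return updated


if __name__ == "__main__":
    skills = {"walk": [1.0, 0.0], "turn": [0.0, 1.0]}
    relations = {"walk": ["turn"], "turn": ["walk"]}
    print(attention_update(skills, relations))
```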

Posted in: Applications of reinforcement learning to robots , Tagged: Skill learning
