Category Archives: Robotics

Improving the adaptation of RL to robots with different parameters through fuzzy ensembles of policies

A. G. Haddad, M. B. Mohiuddin, I. Boiko and Y. Zweiri, Fuzzy Ensembles of Reinforcement Learning Policies for Systems With Variable Parameters, IEEE Robotics and Automation Letters, vol. 10, no. 6, pp. 5361-5368, June 2025, DOI: 10.1109/LRA.2025.3559833.

This paper presents a novel approach to improving the generalization capabilities of reinforcement learning (RL) agents for robotic systems with varying physical parameters. We propose the Fuzzy Ensemble of RL policies (FERL), which enhances performance in environments where system parameters differ from those encountered during training. The FERL method selectively fuses aligned policies, determining their collective decision based on fuzzy memberships tailored to the current parameters of the system. Unlike traditional centralized training approaches that rely on shared experiences for policy updates, FERL allows for independent agent training, facilitating efficient parallelization. The effectiveness of FERL is demonstrated through extensive experiments, including a real-world trajectory tracking application in a quadrotor slung-load system. Our method improves success rates by up to 15.6% across various simulated systems with variable parameters compared to the existing benchmarks of domain randomization and robust adaptive ensemble adversary RL. In the real-world experiments, our method achieves a 30% reduction in 3D position RMSE compared to individual RL policies. The results underscore FERL's robustness and applicability to real robotic systems.
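
As a rough illustration of the core idea, here is a minimal sketch (with a hypothetical triangular membership function and policies assumed to be plain callables; the paper's actual fusion rule may differ) of fusing an ensemble of policies through fuzzy memberships over the current system parameter:

```python
import numpy as np

def triangular_membership(p, center, width):
    """Triangular fuzzy membership of parameter value p around a policy's
    training parameter `center` (an illustrative membership shape)."""
    return max(0.0, 1.0 - abs(p - center) / width)

def fused_action(policies, centers, p, obs, width=0.5):
    """Weight each policy's action by its fuzzy membership for the current
    system parameter p, then take the normalized convex combination.
    A sketch of the idea, not the paper's exact fusion rule."""
    weights = np.array([triangular_membership(p, c, width) for c in centers])
    if weights.sum() == 0.0:        # p far from every training parameter
        weights = np.ones_like(weights)
    weights /= weights.sum()
    actions = np.array([pi(obs) for pi in policies])
    return weights @ actions        # membership-weighted ensemble action
```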

Using Deep RL to adapt the process and measurement noise models in EKF localization

Islem Kobbi, Abdelhak Benamirouche, Mohamed Tadjine, Enhancing pose estimation for mobile robots: A comparative analysis of deep reinforcement learning algorithms for adaptive Extended Kalman Filter-based estimation, Engineering Applications of Artificial Intelligence, Volume 150, 2025, DOI: 10.1016/j.engappai.2025.110548.

The Extended Kalman Filter (EKF) is a widely used algorithm for state estimation in control systems. However, its lack of adaptability limits its performance in dynamic and uncertain environments. To address this limitation, we use an approach that leverages Deep Reinforcement Learning (DRL) to achieve adaptive state estimation in the EKF. By integrating DRL techniques, we enable the state estimator to autonomously learn and update the values of the process and measurement noise covariance matrices, Q and R, based on observed data, which encode environmental changes or system failures. In this research, we compare the performance of four DRL algorithms, namely Deep Deterministic Policy Gradient (DDPG), Twin Delayed Deep Deterministic Policy Gradient (TD3), Soft Actor-Critic (SAC), and Proximal Policy Optimization (PPO), in optimizing the EKF’s adaptability. The experiments are conducted in both simulated and real-world settings using the Gazebo simulation environment and the Robot Operating System (ROS). The results demonstrate that the DRL-based adaptive state estimator outperforms traditional methods in terms of estimation accuracy and robustness. The comparative analysis provides insights into the strengths and limitations of different DRL agents, showing that TD3 and DDPG are the most effective algorithms, with TD3 achieving superior performance, resulting in a 91% improvement over the classic EKF, due to its delayed update mechanism that reduces training noise. This research highlights the potential of DRL to advance state estimation algorithms, offering valuable insights for future work in adaptive estimation techniques.
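
To make the mechanism concrete, the following sketch shows a standard EKF step whose Q and R matrices are rescaled by a trained DRL agent before each update. The names `agent.act`, `Q0` and `R0` are hypothetical, and the scalar-scaling action parametrization is an assumption for illustration, not the paper's exact design:

```python
import numpy as np

def ekf_step(x, P, u, z, f, F, h, H, Q, R):
    """One standard EKF predict/update step. F and H are the Jacobians of
    f and h (assumed evaluated beforehand, for brevity)."""
    x_pred = f(x, u)
    P_pred = F @ P @ F.T + Q
    y = z - h(x_pred)                       # innovation
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)     # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

def adaptive_ekf_step(agent, innov_history, x, P, u, z, f, F, h, H, Q0, R0):
    """Adaptive loop: a trained DRL agent (e.g. TD3, the paper's best
    performer) maps recent innovations to positive scalings of the nominal
    noise covariances Q0 and R0 before the filter step."""
    q_scale, r_scale = agent.act(innov_history)
    return ekf_step(x, P, u, z, f, F, h, H, q_scale * Q0, r_scale * R0)
```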

On the explainability of Deep RL and its improvement through the integration of human preferences

Georgios Angelopoulos, Luigi Mangiacapra, Alessandra Rossi, Claudia Di Napoli, Silvia Rossi, What is behind the curtain? Increasing transparency in reinforcement learning with human preferences and explanations, Engineering Applications of Artificial Intelligence, Volume 149, 2025, DOI: 10.1016/j.engappai.2025.110520.

In this work, we investigate whether the transparency of a robot’s behaviour is improved when human preferences on the actions the robot performs are taken into account during the learning process. For this purpose, a shielding mechanism called Preference Shielding is proposed and included in a reinforcement learning algorithm to account for human preferences. We also use the shielding to decide when to provide explanations of the robot’s actions. We carried out a within-subjects study involving 26 participants to evaluate the robot’s transparency. Results indicate that considering human preferences during learning improves legibility compared with providing only explanations. In addition, combining human preferences and explanations further amplifies transparency. Results also confirm that increased transparency leads to an increase in people’s perception of the robot’s safety, comfort, and reliability. These findings show the importance of transparency during learning and suggest a paradigm for robotic applications when a robot has to learn a task in the presence of or in collaboration with a human.
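
A minimal sketch of the shielding idea, assuming a discrete action space and a simple mapping from states to disallowed actions; the paper's Preference Shielding mechanism is richer than this, but the override-and-explain pattern is the point:

```python
def preference_shield(state, action, preferences, q_values):
    """If the learner's chosen action is disallowed by human preferences,
    substitute the best-valued allowed action and signal that the robot
    should explain the deviation. `preferences` maps a state key to a set
    of disallowed action indices (an illustrative representation)."""
    disallowed = preferences.get(state, set())
    if action not in disallowed:
        return action, False                  # no override, no explanation
    allowed = [a for a in range(len(q_values)) if a not in disallowed]
    best = max(allowed, key=lambda a: q_values[a])
    return best, True                         # overridden: explain this
```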

Interesting survey of the sim-to-real gap in RL in the context of humanoid robots

D. Kim, H. Lee, J. Cha and J. Park, Bridging the Reality Gap: Analyzing Sim-to-Real Transfer Techniques for Reinforcement Learning in Humanoid Bipedal Locomotion, IEEE Robotics & Automation Magazine, vol. 32, no. 1, pp. 49-58, March 2025, DOI: 10.1109/MRA.2024.3505784.

Reinforcement learning (RL) offers a promising solution for controlling humanoid robots, particularly for bipedal locomotion, by learning adaptive and flexible control strategies. However, direct RL application is hindered by time-consuming trial-and-error processes, necessitating training in simulation before real-world transfer. This introduces a reality gap that degrades performance. Although various methods have been proposed for sim-to-real transfer, they have not been validated on a consistent hardware platform, making it difficult to determine which components are key to overcoming the reality gap. In contrast, we systematically evaluate techniques to enhance RL policy robustness during sim-to-real transfer by controlling variables and comparing them on a single robot to isolate and analyze the impact of each technique. These techniques include dynamics randomization, state history usage, noise/bias/delay modeling, state selection, perturbations, and network size. We quantitatively assess the reality gap by simulating diverse conditions and conducting experiments on real hardware. Our findings provide insights into bridging the reality gap, advancing robust RL-trained humanoid robots for real-world applications.
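
As an example of one of the surveyed techniques, here is a minimal sketch of dynamics randomization, where physical parameters are resampled per episode so the policy trains against a distribution of dynamics; the parameter names and ranges below are purely illustrative:

```python
import numpy as np

# Illustrative per-episode randomization ranges (not the paper's values).
RANDOMIZATION_RANGES = {
    "link_mass_scale":   (0.8, 1.2),
    "joint_friction":    (0.0, 0.3),
    "motor_delay_steps": (0, 3),       # integer range: actuation delay
    "obs_noise_std":     (0.0, 0.02),
}

def sample_dynamics(rng: np.random.Generator) -> dict:
    """Draw one set of randomized dynamics parameters for an episode;
    floats are sampled uniformly, integer ranges inclusively."""
    return {k: (rng.uniform(lo, hi) if isinstance(lo, float)
                else int(rng.integers(lo, hi + 1)))
            for k, (lo, hi) in RANDOMIZATION_RANGES.items()}
```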

Planning under uncertainty for tasks that must be completed within a time bound

Michal Staniaszek, Lara Brudermüller, Yang You, Raunak Bhattacharyya, Bruno Lacerda, Nick Hawes, Time-bounded planning with uncertain task duration distributions, Robotics and Autonomous Systems, Volume 186, 2025, DOI: 10.1016/j.robot.2025.104926.

We consider planning problems where a robot must gather reward by completing tasks at each of a large set of locations while constrained by a time bound. Our focus is problems where the context under which each task will be executed can be predicted, but is not known in advance. Here, the term context refers to the conditions under which the task is executed, and can be related to the robot’s internal state (e.g., how well is it localised?), or the environment itself (e.g., how dirty is the floor the robot must clean?). This context has an impact on the time required to execute the task, which we model probabilistically. We model the problem of time-bounded planning for tasks executed under uncertain contexts as a Markov decision process with discrete time in the state, and propose variants on this model which allow adaptation to different robotics domains. Due to the intractability of the general model, we propose simplifications to allow planning in large domains. The key idea behind these simplifications is constraining navigation using a solution to the travelling salesperson problem. We evaluate our models on maps generated from real-world environments and consider two domains with different characteristics: UV disinfection, and cleaning. We evaluate the effect of model variants and simplifications on performance, and show that policies obtained for our models outperform a rule-based baseline, as well as a model which does not consider context. We also evaluate our models in a real robot experiment where a quadruped performs simulated inspection tasks in an industrial environment.
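
The following sketch conveys the flavour of the simplified model: with the visit order fixed by a TSP tour, a policy over states (next task index, remaining time) is computed by backward induction over stochastic task durations. All names and shapes here are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def plan_time_bounded(tsp_order, durations, rewards, T):
    """Backward induction on states (next task index i, remaining time t),
    with navigation order fixed by a TSP tour. durations[task] is a dict
    {duration: probability}; at each task the policy chooses to attempt or
    skip. An attempt whose duration exceeds the remaining time overruns
    the bound and yields nothing, which the sum below encodes by omission."""
    n = len(tsp_order)
    V = np.zeros((n + 1, T + 1))                 # V[n, :] = 0: tour done
    attempt = np.zeros((n, T + 1), dtype=bool)
    for i in range(n - 1, -1, -1):
        task = tsp_order[i]
        for t in range(T + 1):
            skip = V[i + 1, t]
            do = sum(p * (rewards[task] + V[i + 1, t - d])
                     for d, p in durations[task].items() if d <= t)
            V[i, t] = max(skip, do)
            attempt[i, t] = do > skip
    return V, attempt
```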

RL for multiple tasks in the case of quadrotors, with a short state of the art on the general problem

J. Xing, I. Geles, Y. Song, E. Aljalbout and D. Scaramuzza, Multi-Task Reinforcement Learning for Quadrotors, IEEE Robotics and Automation Letters, vol. 10, no. 3, pp. 2112-2119, March 2025, DOI: 10.1109/LRA.2024.3520894.

Reinforcement learning (RL) has shown great effectiveness in quadrotor control, enabling specialized policies to develop even human-champion-level performance in single-task scenarios. However, these specialized policies often struggle with novel tasks, requiring a complete retraining of the policy from scratch. To address this limitation, this paper presents a novel multi-task reinforcement learning (MTRL) framework tailored for quadrotor control, leveraging the shared physical dynamics of the platform to enhance sample efficiency and task performance. By employing a multi-critic architecture and shared task encoders, our framework facilitates knowledge transfer across tasks, enabling a single policy to execute diverse maneuvers, including high-speed stabilization, velocity tracking, and autonomous racing. Our experimental results, validated both in simulation and real-world scenarios, demonstrate that our framework outperforms baseline approaches in terms of sample efficiency and overall task performance. Video is available at https://youtu.be/HfK9UT1OVnY.
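
A minimal PyTorch sketch of what a multi-critic architecture with a shared task encoder can look like: one policy head conditioned on a learned task embedding, and a separate critic per task. The layer sizes and exact wiring are assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class MultiTaskActorCritic(nn.Module):
    """Shared task encoder + single policy + one critic per task."""
    def __init__(self, obs_dim, act_dim, n_tasks, hid=128, emb=16):
        super().__init__()
        self.task_emb = nn.Embedding(n_tasks, emb)   # shared task encoder
        self.actor = nn.Sequential(
            nn.Linear(obs_dim + emb, hid), nn.Tanh(),
            nn.Linear(hid, act_dim), nn.Tanh())
        self.critics = nn.ModuleList(                # one critic per task
            nn.Sequential(nn.Linear(obs_dim + emb, hid), nn.Tanh(),
                          nn.Linear(hid, 1))
            for _ in range(n_tasks))

    def forward(self, obs, task_id: int):
        # Condition both actor and the task's critic on the task embedding.
        z = torch.cat([obs, self.task_emb(torch.tensor(task_id))], dim=-1)
        return self.actor(z), self.critics[task_id](z)
```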

A novel safety-critical robotic architecture

Manuel Schrick, Johannes Hinckeldeyn, Marko Thiel, Jochen Kreutzfeldt, A microservice based control architecture for mobile robots in safety-critical applications, Robotics and Autonomous Systems, Volume 183, 2025, DOI: 10.1016/j.robot.2024.104795.

Mobile robots have become more and more common in public spaces. This increases the importance of meeting the safety requirements of autonomous robots. Simple mechanisms, such as emergency braking, alone do not suffice in these highly dynamic situations. Moreover, current robot control approaches in the literature and in practice do not take safety particularly into account. A more sophisticated situational approach to assessment and planning is needed as part of the high-level process control. This paper presents the concept of a safety-critical Robot Control Architecture for mobile robots based on microservices and a Hierarchical Finite State Machine. It expands already existing architectures by drastically reducing the amount of centralized logic and thus increasing the overall system’s level of concurrency, interruptibility and fail-safety. Furthermore, it introduces new potential for code reuse that allows for straightforward implementation of safety mechanisms such as internal diagnostics systems. In doing so, this concept presents the template of a new type of state machine implementation. It is demonstrated with the application of a delivery robot, which was implemented and operated in public during a broader research project.
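
To illustrate the state-machine side of the concept, here is a minimal hierarchical FSM sketch in which events unhandled by the current state bubble up to its parent, so safety transitions defined at the root apply everywhere; the state and event names are invented for the example:

```python
class State:
    def __init__(self, name, parent=None, on_event=None):
        self.name, self.parent = name, parent
        self.on_event = on_event or {}       # event -> next state name

class HFSM:
    """If the current state does not handle an event, its parent gets a
    chance -- the defining property of a hierarchical FSM. Putting safety
    transitions in the root makes them interrupt any nested activity."""
    def __init__(self, states, initial):
        self.states, self.current = states, states[initial]

    def dispatch(self, event):
        s = self.current
        while s is not None:
            if event in s.on_event:
                self.current = self.states[s.on_event[event]]
                return self.current.name
            s = s.parent                     # bubble the event upward
        return self.current.name             # event ignored

# Illustrative wiring: an emergency stop handled at the root level.
root = State("Root", on_event={"emergency_stop": "SafeStop"})
states = {
    "Root": root,
    "Drive": State("Drive", parent=root, on_event={"arrived": "Deliver"}),
    "Deliver": State("Deliver", parent=root, on_event={"done": "Drive"}),
    "SafeStop": State("SafeStop", parent=root),
}
fsm = HFSM(states, "Drive")
assert fsm.dispatch("emergency_stop") == "SafeStop"
```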

Survey on robot navigation, particularly on RL and other self-learning approaches to that task

Suaib Al Mahmud, Abdurrahman Kamarulariffin, Azhar Mohd Ibrahim, Ahmad Jazlan Haja Mohideen, Advancements and Challenges in Mobile Robot Navigation: A Comprehensive Review of Algorithms and Potential for Self‐Learning Approaches, Journal of Intelligent & Robotic Systems (2024) 110:120, DOI: 10.1007/s10846-024-02149-5.

Mobile robot navigation has been a popular research topic for quite a while. With the goal of enhancing autonomy in mobile robot navigation, numerous algorithms (traditional AI-based, swarm intelligence-based, self-learning-based) have been built and implemented independently, and also in blended manners. Nevertheless, the problem of efficient autonomous robot navigation persists to varying degrees due to the limitations of these algorithms. The lack of knowledge of the implemented techniques and their shortcomings acts as a hindrance to further development on this topic. This is why an extensive study of the previously implemented algorithms, their applicability, their weaknesses as well as their potential needs to be conducted in order to assess how to improve mobile robot navigation performance. In this review paper, a comprehensive review of mobile robot navigation algorithms has been conducted. The findings suggest that, even though self-learning algorithms require huge amounts of training data and risk learning erroneous behaviour, they possess huge potential to overcome challenges rarely addressed by the other traditional algorithms. The findings also suggest that, in the domain of machine learning-based algorithms, the integration of knowledge representation with a neuro-symbolic approach has the capacity to improve the accuracy and performance of self-learning robot navigation training by a significant margin.

A particular action space for learning contact-rich manipulation tasks through RL

Roberto Martín-Martín, Michelle A. Lee, Rachel Gardner, Silvio Savarese, Jeannette Bohg, Animesh Garg, Variable Impedance Control in End-Effector Space: An Action Space for Reinforcement Learning in Contact-Rich Tasks, arXiv:1906.08880 [cs.RO].

Reinforcement Learning (RL) of contact-rich manipulation tasks has yielded impressive results in recent years. While many studies in RL focus on varying the observation space or reward model, few efforts have focused on the choice of action space (e.g. joint or end-effector space, position, velocity, etc.). However, studies in robot motion control indicate that choosing an action space that conforms to the characteristics of the task can simplify exploration and improve robustness to disturbances. This paper studies the effect of different action spaces in deep RL and advocates for Variable Impedance Control in End-effector Space (VICES) as an advantageous action space for constrained and contact-rich tasks. We evaluate multiple action spaces on three prototypical manipulation tasks: Path Following (task with no contact), Door Opening (task with kinematic constraints), and Surface Wiping (task with continuous contact). We show that VICES improves sample efficiency, maintains low energy consumption, and ensures safety across all three experimental setups. Further, RL policies learned with VICES can transfer across different robot models in simulation, and from simulation to real for the same robot.
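
A minimal sketch of the action-space idea (position-only, with illustrative variable names; see the paper for the exact controller): the policy outputs a target pose plus per-axis stiffness and damping gains, and a Cartesian impedance law maps them to joint torques:

```python
import numpy as np

def vices_torques(action, x, xd, J, coriolis, gravity):
    """Turn a variable-impedance end-effector action into joint torques.
    `action` packs [x_des, kp, kd]: a 3D target position and per-axis
    stiffness/damping gains chosen by the RL policy at every step."""
    x_des, kp, kd = np.split(action, 3)
    f = kp * (x_des - x) - kd * xd          # Cartesian impedance law
    return J.T @ f + coriolis + gravity     # map wrench to joint torques
```

Exposing kp and kd as part of the action is what lets the learned policy be stiff where precision matters and compliant where contact occurs, which is the paper's central argument for this action space.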

Mapless (egocentric) navigation with hierarchical RL, including a good survey of current RL approaches to that task

Yan Gao, Feiqiang Lin, Boliang Cai, Jing Wu, Changyun Wei, Raphael Grech, Ze Ji, Mapless navigation via Hierarchical Reinforcement Learning with memory-decaying novelty, Robotics and Autonomous Systems, Volume 182, 2024, DOI: 10.1016/j.robot.2024.104815.

Hierarchical Reinforcement Learning (HRL) has shown superior performance for mapless navigation tasks. However, it remains limited in unstructured environments that might contain terrains like long corridors and dead corners, which can lead to local minima. This is because most HRL-based mapless navigation methods employ a simplified reward setting and exploration strategy. In this work, we propose a novel reward function for training the high-level (HL) policy, which contains two components: extrinsic reward and intrinsic reward. The extrinsic reward encourages the robot to move towards the target location, while the intrinsic reward is computed based on novelty, episode memory and memory decaying, making the agent capable of accomplishing spontaneous exploration. We also design a novel neural network structure that incorporates an LSTM network to augment the agent with memory and reasoning capabilities. We test our method in unknown environments and specific scenarios prone to the local minimum problem to evaluate the navigation performance and local minimum resolution ability. The results show that our method significantly increases the success rate when compared to advanced RL-based methods, achieving a maximum improvement of nearly 28%. Our method demonstrates effective improvement in addressing the local minimum issue, especially in cases where the baselines fail completely. Additionally, numerous ablation studies consistently confirm the effectiveness of our proposed reward function and neural network structure.
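
A minimal sketch of an intrinsic reward built from episodic novelty with memory decay, in the spirit of (but not identical to) the paper's formulation; the distance metric, decay rate, and scaling are assumptions:

```python
import numpy as np

class DecayingNoveltyReward:
    """States close to remembered ones are less novel, and memories fade
    over time, so long-unvisited regions become rewarding to revisit --
    the property that helps escape local minima such as dead corners."""
    def __init__(self, decay=0.99, radius=1.0):
        self.memory, self.weights = [], []
        self.decay, self.radius = decay, radius

    def __call__(self, state):
        state = np.asarray(state)
        self.weights = [w * self.decay for w in self.weights]  # fade memory
        novelty = 1.0
        for m, w in zip(self.memory, self.weights):
            if np.linalg.norm(state - m) < self.radius:
                novelty = min(novelty, 1.0 - w)  # familiar -> less novel
        self.memory.append(state)
        self.weights.append(1.0)
        return novelty
```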