Category Archives: Robotics

Adapting a shared teleoperation system to network delays

B. Güleçyüz et al., Enhancing Shared Autonomy in Teleoperation Under Network Delay: Transparency- and Confidence-Aware Arbitration, IEEE Robotics and Automation Letters, vol. 10, no. 10, pp. 9654-9661, Oct. 2025, DOI: 10.1109/LRA.2025.3596436.

Shared autonomy bridges human expertise with machine intelligence, yet existing approaches often overlook the impact of teleoperation delays. To address this gap, we propose a novel shared autonomy approach that enables robots to gradually learn from teleoperated demonstrations while adapting to network delays. Our method improves intent prediction by accounting for delayed feedback to the human operator and adjusts the arbitration function to balance reduced human confidence due to delay with confidence in learned autonomy. To ensure system stability, which might be compromised by delay and arbitration of human and autonomy control forces, we introduce a three-port extension of the Time-Domain Passivity Approach with Energy Reflection (TDPA-ER). Experimental validation with 12 participants demonstrated improvements in intent prediction accuracy, task performance, and the quality of final learned autonomy, highlighting the potential of our approach to enhance teleoperation and learning quality in remote environments.
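
To make the arbitration idea concrete, here is a minimal sketch of a delay-aware blending rule in Python. The exponential confidence decay and the blending weight below are assumptions of mine for illustration, not the formulas from the paper:

```python
import numpy as np

def arbitrate(u_human, u_auto, delay_s, p_intent, delay_decay=2.0):
    """Blend human and autonomous commands under network delay.

    u_human, u_auto : human and autonomy command vectors
    delay_s         : measured round-trip delay in seconds
    p_intent        : autonomy's confidence in its intent prediction, in [0, 1]
    delay_decay     : hypothetical decay rate of trust in the delayed
                      human signal (an assumption of this sketch)
    """
    c_human = np.exp(-delay_decay * delay_s)   # human confidence shrinks with delay
    alpha = p_intent * (1.0 - c_human)         # arbitration weight for the autonomy
    return (1.0 - alpha) * u_human + alpha * u_auto

# Example: 150 ms round-trip delay, autonomy fairly sure of the operator's goal.
u_cmd = arbitrate(np.array([0.3, 0.0]), np.array([0.5, 0.1]),
                  delay_s=0.15, p_intent=0.8)
```

With zero delay the operator stays in full control; as delay grows, authority shifts toward the autonomy only to the extent that its intent prediction is trusted.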

Improving the generalization of robotic RL through inspiration from the human motor control system

P. Zhang, Z. Hua and J. Ding, A Central Motor System Inspired Pretraining Reinforcement Learning for Robotic Control, IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 55, no. 9, pp. 6285-6298, Sept. 2025, DOI: 10.1109/TSMC.2025.3577698.

Robots typically encounter diverse tasks, which poses a significant challenge for motion control. Pretraining reinforcement learning (PRL) enables robots to adapt quickly to various tasks by exploiting reusable skills. Existing PRL methods often rely on datasets and human expert knowledge, struggle to discover diverse and dynamic skills, and exhibit limited generalization and adaptability to different types of robots and downstream tasks. This article proposes a novel PRL algorithm based on central motor system mechanisms, which can discover diverse and dynamic skills without relying on data or expert knowledge, effectively enabling robots to tackle different types of downstream tasks. Inspired by the cerebellum’s role in balance control and skill storage within the central motor system, an intrinsic fused reward is introduced to explore dynamic skills and eliminate dependence on data and expert knowledge during pretraining. Drawing from the basal ganglia’s function in motor programming, a discrete skill encoding method is designed to increase the diversity of discovered skills, improving the performance of complex robots in challenging environments. Furthermore, incorporating the basal ganglia’s role in motor regulation, a skill activity function is proposed to generate skills at varying dynamic levels, thereby improving the adaptability of robots in multiple downstream tasks. The effectiveness of the proposed algorithm is demonstrated through simulation experiments on four robots of different morphologies across multiple downstream tasks.
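
A rough sketch of what skill-conditioned pretraining with a fused intrinsic reward can look like. The two reward terms below (an energy term favoring dynamic motion plus a separation term favoring skill diversity) are illustrative stand-ins of mine, not the paper’s cerebellum- and basal-ganglia-inspired formulations:

```python
import numpy as np

rng = np.random.default_rng(0)
N_SKILLS, STATE_DIM = 8, 4

# Running mean state per discrete skill, used as a crude "skill signature".
skill_means = np.zeros((N_SKILLS, STATE_DIM))

def intrinsic_reward(state, velocity, z, w_dyn=0.5, w_div=0.5):
    """Fused intrinsic reward (illustrative form, not the paper's):
    - a 'dynamic' term that favors energetic motion, and
    - a 'diversity' term that favors states far from other skills' means."""
    dyn = np.linalg.norm(velocity)
    others = np.delete(skill_means, z, axis=0)
    div = np.min(np.linalg.norm(others - state, axis=1))
    return w_dyn * dyn + w_div * div

def update_skill_mean(state, z, lr=0.05):
    skill_means[z] += lr * (state - skill_means[z])

# Pretraining loop skeleton: sample a discrete skill code, roll it out,
# and train the skill-conditioned policy on the intrinsic signal only.
for episode in range(3):
    z = rng.integers(N_SKILLS)
    state, velocity = rng.normal(size=STATE_DIM), rng.normal(size=STATE_DIM)
    r = intrinsic_reward(state, velocity, z)
    update_skill_mean(state, z)
```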

Stacking multiple MDPs in an abstraction hierarchy to make RL more sample-efficient

Roberto Cipollone, Marco Favorito, Flavio Maiorana, Giuseppe De Giacomo, Luca Iocchi, Fabio Patrizi, Exploiting robot abstractions in episodic RL via reward shaping and heuristics, Robotics and Autonomous Systems, Volume 193, 2025, DOI: 10.1016/j.robot.2025.105116.

One major limitation to the applicability of Reinforcement Learning (RL) to many domains of practical relevance, in particular in robotic applications, is the large number of samples required to learn an optimal policy. To address this problem and improve learning efficiency, we consider a linear hierarchy of abstraction layers of the Markov Decision Process (MDP) underlying the target domain. Each layer is an MDP representing a coarser model of the one immediately below in the hierarchy. In this work, we propose novel techniques to automatically define Reward Shaping and Reward Heuristic functions that are based on the solution obtained at a higher level of abstraction and provide rewards to the finer (possibly the concrete) MDP at the lower level, thus inducing an exploration heuristic that can effectively guide the learning process in the more complex domain. In contrast with other works in Hierarchical RL, our technique imposes fewer requirements on the design of the abstract models and is tolerant to modeling errors, thus making the proposed approach practical. We formally analyze the relationship between the abstract models and the exploration heuristic induced in the lower-level domain, prove that the method guarantees convergence to an optimal policy, and finally demonstrate its effectiveness experimentally in several complex robotic domains.
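
The textbook instance of this idea is potential-based reward shaping with the abstract value function as the potential. A minimal sketch, assuming a toy grid-to-room abstraction; the paper’s construction and its heuristics are more general:

```python
GAMMA = 0.99

def abstract(state):
    """Hypothetical abstraction: map a fine grid cell to a 5x5 'room'."""
    x, y = state
    return (x // 5, y // 5)

# Values of the abstract (room-level) MDP; in the paper these come from
# solving the coarser model, here they are toy numbers.
V_abs = {(0, 0): 0.2, (0, 1): 0.6, (1, 1): 1.0}

def phi(state):
    return V_abs.get(abstract(state), 0.0)

def shaped_reward(reward, state, next_state):
    """Potential-based shaping F = gamma*phi(s') - phi(s), which provably
    preserves optimal policies (Ng et al., 1999); treat this as the
    textbook special case of the paper's technique."""
    return reward + GAMMA * phi(next_state) - phi(state)

# Crossing from room (0,0) into the higher-valued room (0,1) earns a bonus.
print(shaped_reward(0.0, (3, 4), (3, 5)))
```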

Improving the adaptation of RL to robots with different parameters through fuzzy ensembles

A. G. Haddad, M. B. Mohiuddin, I. Boiko and Y. Zweiri, Fuzzy Ensembles of Reinforcement Learning Policies for Systems With Variable Parameters, IEEE Robotics and Automation Letters, vol. 10, no. 6, pp. 5361-5368, June 2025, DOI: 10.1109/LRA.2025.3559833.

This paper presents a novel approach to improving the generalization capabilities of reinforcement learning (RL) agents for robotic systems with varying physical parameters. We propose the Fuzzy Ensemble of RL policies (FERL), which enhances performance in environments where system parameters differ from those encountered during training. The FERL method selectively fuses aligned policies, determining their collective decision based on fuzzy memberships tailored to the current parameters of the system. Unlike traditional centralized training approaches that rely on shared experiences for policy updates, FERL allows for independent agent training, facilitating efficient parallelization. The effectiveness of FERL is demonstrated through extensive experiments, including a real-world trajectory tracking application on a quadrotor slung-load system. Our method improves success rates by up to 15.6% across various simulated systems with variable parameters compared to the existing benchmarks of domain randomization and robust adaptive ensemble adversary RL. In the real-world experiments, our method achieves a 30% reduction in 3D position RMSE compared to individual RL policies. The results underscore FERL’s robustness and applicability to real robotic systems.
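
A minimal sketch of the fusion step: each policy is trained independently at one parameter setting, and at run time their actions are combined with fuzzy membership weights evaluated at the current parameter estimate. The triangular membership shape and its width are assumptions of this sketch, not the paper’s design:

```python
import numpy as np

def tri_membership(x, center, width):
    """Triangular fuzzy membership centered at `center`."""
    return max(0.0, 1.0 - abs(x - center) / width)

def ferl_action(policies, centers, param, obs, width=0.5):
    """Fuse actions of independently trained policies with fuzzy weights.

    policies : list of callables obs -> action, each trained at the
               parameter value centers[i]
    param    : current (estimated) physical parameter, e.g. payload mass
    """
    w = np.array([tri_membership(param, c, width) for c in centers])
    if w.sum() == 0.0:          # outside all supports: fall back to nearest
        w[np.argmin(np.abs(np.array(centers) - param))] = 1.0
    w = w / w.sum()
    actions = np.stack([pi(obs) for pi in policies])
    return w @ actions

# Usage with two dummy policies trained at 0.5 kg and 1.5 kg payloads:
# at 1.0 kg the fused command interpolates between them.
policies = [lambda o: np.array([0.1, 0.0]), lambda o: np.array([0.3, 0.1])]
a = ferl_action(policies, centers=[0.5, 1.5], param=1.0, obs=None)
```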

Using Deep RL to adapt the process and measurement noise models in EKF localization

Islem Kobbi, Abdelhak Benamirouche, Mohamed Tadjine, Enhancing pose estimation for mobile robots: A comparative analysis of deep reinforcement learning algorithms for adaptive Extended Kalman Filter-based estimation, Engineering Applications of Artificial Intelligence, Volume 150, 2025, DOI: 10.1016/j.engappai.2025.110548.

The Extended Kalman Filter (EKF) is a widely used algorithm for state estimation in control systems. However, its lack of adaptability limits its performance in dynamic and uncertain environments. To address this limitation, we propose an approach that leverages Deep Reinforcement Learning (DRL) to achieve adaptive state estimation in the EKF. By integrating DRL techniques, we enable the state estimator to autonomously learn and update the values of the process and measurement noise covariance matrices, Q and R, based on observed data, which encode environmental changes or system failures. In this research, we compare the performance of four DRL algorithms, namely Deep Deterministic Policy Gradient (DDPG), Twin Delayed Deep Deterministic Policy Gradient (TD3), Soft Actor-Critic (SAC), and Proximal Policy Optimization (PPO), in optimizing the EKF’s adaptability. The experiments are conducted in both simulated and real-world settings using the Gazebo simulation environment and the Robot Operating System (ROS). The results demonstrate that the DRL-based adaptive state estimator outperforms traditional methods in terms of estimation accuracy and robustness. The comparative analysis provides insights into the strengths and limitations of different DRL agents, showing that TD3 and DDPG are the most effective algorithms, with TD3 achieving superior performance, resulting in a 91% improvement over the classic EKF, due to its delayed update mechanism that reduces training noise. This research highlights the potential of DRL to advance state estimation algorithms, offering valuable insights for future work in adaptive estimation techniques.
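
The core mechanism is easy to sketch: a standard EKF step in which a learned agent rescales Q and R on the fly. The log-scale action parameterization below is my assumption for illustration, and the innovation y is the kind of signal one would feed back into the agent’s observation:

```python
import numpy as np

def ekf_step(x, P, u, z, f, F, h, H, Q0, R0, agent_action):
    """One EKF predict/update with agent-scaled noise covariances.

    agent_action : 2-vector of log-scales for Q and R, produced by a DRL
                   policy (e.g. TD3) from recent innovations
    f, h         : process and measurement models
    F, H         : their Jacobians evaluated at the current estimate
    """
    Q = Q0 * np.exp(agent_action[0])   # adapted process noise
    R = R0 * np.exp(agent_action[1])   # adapted measurement noise

    # Predict
    x_pred = f(x, u)
    P_pred = F @ P @ F.T + Q

    # Update
    y = z - h(x_pred)                  # innovation (fed back to the agent)
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new, y
```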

On the transparency of Deep RL and its improvement through the integration of human preferences and explanations

Georgios Angelopoulos, Luigi Mangiacapra, Alessandra Rossi, Claudia Di Napoli, Silvia Rossi, What is behind the curtain? Increasing transparency in reinforcement learning with human preferences and explanations, Engineering Applications of Artificial Intelligence, Volume 149, 2025, DOI: 10.1016/j.engappai.2025.110520.

In this work, we investigate whether the transparency of a robot’s behaviour is improved when human preferences on the actions the robot performs are taken into account during the learning process. For this purpose, a shielding mechanism called Preference Shielding is proposed and integrated into a reinforcement learning algorithm to account for human preferences. We also use the shield to decide when to provide explanations of the robot’s actions. We carried out a within-subjects study involving 26 participants to evaluate the robot’s transparency. Results indicate that considering human preferences during learning improves legibility compared with providing only explanations. In addition, combining human preferences and explanations further amplifies transparency. Results also confirm that increased transparency leads to an increase in people’s perception of the robot’s safety, comfort, and reliability. These findings show the importance of transparency during learning and suggest a paradigm for robotic applications when a robot has to learn a task in the presence of or in collaboration with a human.
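
A shield of this kind is essentially a filter between the value function and the executed action. Here is a minimal sketch; the preference predicate and the explanation hook are hypothetical interfaces of mine, not the paper’s implementation:

```python
import numpy as np

def preference_shield(q_values, state, preferred, explain):
    """Shield a greedy action choice with human preferences.

    q_values  : array of Q(s, a) for all actions in `state`
    preferred : callable (state, action) -> bool encoding human preferences
    explain   : callback fired when the shield overrides the agent, a
                natural hook for generating an explanation
    """
    a_greedy = int(np.argmax(q_values))
    if preferred(state, a_greedy):
        return a_greedy
    # Override: best-valued action that respects the preferences.
    allowed = [a for a in range(len(q_values)) if preferred(state, a)]
    if not allowed:
        return a_greedy          # nothing preferred: keep the agent's choice
    a_safe = max(allowed, key=lambda a: q_values[a])
    explain(state, a_greedy, a_safe)
    return a_safe
```

Triggering explanations exactly when the shield intervenes is one plausible reading of how shielding can double as a transparency mechanism.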

An interesting survey of sim-to-real transfer techniques for RL in the context of humanoid robots

D. Kim, H. Lee, J. Cha and J. Park, Bridging the Reality Gap: Analyzing Sim-to-Real Transfer Techniques for Reinforcement Learning in Humanoid Bipedal Locomotion, IEEE Robotics & Automation Magazine, vol. 32, no. 1, pp. 49-58, March 2025, DOI: 10.1109/MRA.2024.3505784.

Reinforcement learning (RL) offers a promising solution for controlling humanoid robots, particularly for bipedal locomotion, by learning adaptive and flexible control strategies. However, direct RL application is hindered by time-consuming trial-and-error processes, necessitating training in simulation before real-world transfer. This introduces a reality gap that degrades performance. Although various methods have been proposed for sim-to-real transfer, they have not been validated on a consistent hardware platform, making it difficult to determine which components are key to overcoming the reality gap. In contrast, we systematically evaluate techniques to enhance RL policy robustness during sim-to-real transfer by controlling variables and comparing them on a single robot to isolate and analyze the impact of each technique. These techniques include dynamics randomization, state history usage, noise/bias/delay modeling, state selection, perturbations, and network size. We quantitatively assess the reality gap by simulating diverse conditions and conducting experiments on real hardware. Our findings provide insights into bridging the reality gap, advancing robust RL-trained humanoid robots for real-world applications.
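
Several of the surveyed techniques can be framed as wrappers around the simulator. Below is a compact sketch combining dynamics randomization, observation noise/bias, and actuation delay; the wrapped environment interface (reset/step/set_dynamics) is assumed for illustration only:

```python
import numpy as np
from collections import deque

class Sim2RealWrapper:
    """Apply a few of the surveyed robustness techniques around a simulator:
    dynamics randomization, observation noise/bias, and actuation delay."""

    def __init__(self, env, action_dim, noise_std=0.01, bias_std=0.005,
                 delay_steps=2):
        self.env, self.action_dim = env, action_dim
        self.noise_std, self.bias_std = noise_std, bias_std
        self.delay_steps = delay_steps
        self.rng = np.random.default_rng()

    def reset(self):
        # Dynamics randomization: perturb physics each episode.
        self.env.set_dynamics(mass_scale=self.rng.uniform(0.8, 1.2),
                              friction_scale=self.rng.uniform(0.7, 1.3))
        self.bias = self.rng.normal(0.0, self.bias_std)
        # Pre-fill the queue to model a constant actuation delay.
        self.queue = deque(np.zeros(self.action_dim)
                           for _ in range(self.delay_steps))
        return self._corrupt(self.env.reset())

    def step(self, action):
        self.queue.append(action)
        obs, reward, done, info = self.env.step(self.queue.popleft())
        return self._corrupt(obs), reward, done, info

    def _corrupt(self, obs):
        # Sensor noise plus a per-episode bias on every observation.
        return obs + self.bias + self.rng.normal(0.0, self.noise_std, obs.shape)
```

Randomization ranges, noise levels, and delay length are exactly the variables the paper controls and compares on a single platform.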

Planning tasks with uncertain durations under a global time bound

Michal Staniaszek, Lara Brudermüller, Yang You, Raunak Bhattacharyya, Bruno Lacerda, Nick Hawes, Time-bounded planning with uncertain task duration distributions, Robotics and Autonomous Systems, Volume 186, 2025, DOI: 10.1016/j.robot.2025.104926.

We consider planning problems where a robot must gather reward by completing tasks at each of a large set of locations while constrained by a time bound. Our focus is problems where the context under which each task will be executed can be predicted, but is not known in advance. Here, the term context refers to the conditions under which the task is executed, and can be related to the robot’s internal state (e.g., how well is it localised?), or the environment itself (e.g., how dirty is the floor the robot must clean?). This context has an impact on the time required to execute the task, which we model probabilistically. We model the problem of time-bounded planning for tasks executed under uncertain contexts as a Markov decision process with discrete time in the state, and propose variants on this model which allow adaptation to different robotics domains. Due to the intractability of the general model, we propose simplifications to allow planning in large domains. The key idea behind these simplifications is constraining navigation using a solution to the travelling salesperson problem. We evaluate our models on maps generated from real-world environments and consider two domains with different characteristics: UV disinfection, and cleaning. We evaluate the effect of model variants and simplifications on performance, and show that policies obtained for our models outperform a rule-based baseline, as well as a model which does not consider context. We also evaluate our models in a real robot experiment where a quadruped performs simulated inspection tasks in an industrial environment.
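
The TSP-constrained simplification admits a compact dynamic program: with the visit order fixed, the state reduces to (position in the tour, remaining time). A toy sketch with made-up reward and duration distributions, and travel times omitted for brevity:

```python
from functools import lru_cache

# Tasks visited in a fixed order given by a TSP tour (the paper's
# simplification); durations are context-dependent and stochastic.
# Each task here has a toy two-point duration distribution (assumed).
rewards   = [5.0, 3.0, 8.0, 2.0]
durations = [[(3, 0.7), (6, 0.3)],      # (duration, probability) pairs
             [(2, 0.6), (4, 0.4)],
             [(5, 0.5), (9, 0.5)],
             [(1, 0.8), (2, 0.2)]]
T_MAX = 12

@lru_cache(maxsize=None)
def V(i, t):
    """Max expected reward from task i of the tour with t time units left.
    At each task the policy may skip it or attempt it."""
    if i == len(rewards) or t <= 0:
        return 0.0
    skip = V(i + 1, t)
    attempt = sum(p * ((rewards[i] + V(i + 1, t - d)) if d <= t else 0.0)
                  for d, p in durations[i])
    return max(skip, attempt)

print(V(0, T_MAX))   # optimal expected reward under the time bound
```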

Multi-task RL for quadrotors, with a short review of the state of the art on the general problem

J. Xing, I. Geles, Y. Song, E. Aljalbout and D. Scaramuzza, Multi-Task Reinforcement Learning for Quadrotors, IEEE Robotics and Automation Letters, vol. 10, no. 3, pp. 2112-2119, March 2025, DOI: 10.1109/LRA.2024.3520894.

Reinforcement learning (RL) has shown great effectiveness in quadrotor control, enabling specialized policies to develop even human-champion-level performance in single-task scenarios. However, these specialized policies often struggle with novel tasks, requiring a complete retraining of the policy from scratch. To address this limitation, this paper presents a novel multi-task reinforcement learning (MTRL) framework tailored for quadrotor control, leveraging the shared physical dynamics of the platform to enhance sample efficiency and task performance. By employing a multi-critic architecture and shared task encoders, our framework facilitates knowledge transfer across tasks, enabling a single policy to execute diverse maneuvers, including high-speed stabilization, velocity tracking, and autonomous racing. Our experimental results, validated both in simulation and real-world scenarios, demonstrate that our framework outperforms baseline approaches in terms of sample efficiency and overall task performance. Video is available at https://youtu.be/HfK9UT1OVnY.
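
The multi-critic layout is straightforward to sketch in PyTorch: a shared actor conditioned on a task embedding, with one critic head per task so that differently scaled task returns do not interfere. Sizes and task count below are arbitrary placeholders, not the paper’s hyperparameters:

```python
import torch
import torch.nn as nn

class MultiTaskQuadPolicy(nn.Module):
    """Minimal sketch of a multi-critic MTRL layout: shared actor and
    task encoder, per-task critics."""

    def __init__(self, obs_dim=18, act_dim=4, n_tasks=3, emb=16, hidden=128):
        super().__init__()
        self.task_embedding = nn.Embedding(n_tasks, emb)
        self.actor = nn.Sequential(
            nn.Linear(obs_dim + emb, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim), nn.Tanh())   # normalized rotor commands
        # One critic per task avoids cross-task value-scale interference.
        self.critics = nn.ModuleList(
            nn.Sequential(nn.Linear(obs_dim + emb, hidden), nn.Tanh(),
                          nn.Linear(hidden, 1))
            for _ in range(n_tasks))

    def forward(self, obs, task_id):
        z = self.task_embedding(task_id)
        x = torch.cat([obs, z], dim=-1)
        return self.actor(x), self.critics[task_id.item()](x)

# Usage: one policy, three tasks (stabilization, tracking, racing).
policy = MultiTaskQuadPolicy()
action, value = policy(torch.zeros(1, 18), torch.tensor([0]))
```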

A novel control architecture for mobile robots in safety-critical applications

Manuel Schrick, Johannes Hinckeldeyn, Marko Thiel, Jochen Kreutzfeldt, A microservice based control architecture for mobile robots in safety-critical applications, Robotics and Autonomous Systems, Volume 183, 2025, DOI: 10.1016/j.robot.2024.104795.

Mobile robots have become more and more common in public spaces. This increases the importance of meeting the safety requirements of autonomous robots. Simple mechanisms, such as emergency braking, alone do not suffice in these highly dynamic situations. Moreover, current robot control approaches in the literature and in practice do not take safety sufficiently into account. A more sophisticated situational approach to assessment and planning is needed as part of the high-level process control. This paper presents the concept of a safety-critical Robot Control Architecture for mobile robots based on microservices and a Hierarchical Finite State Machine. It expands existing architectures by drastically reducing the amount of centralized logic, thus increasing the overall system’s level of concurrency, interruptibility and fail-safety. Furthermore, it introduces new potential for code reuse that allows for straightforward implementation of safety mechanisms such as internal diagnostics systems. In doing so, this concept presents the template for a new type of state machine implementation. It is demonstrated with the application of a delivery robot, which was implemented and operated in real public spaces during a broader research project.
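
The interplay between a nominal task hierarchy and a preempting safety layer can be illustrated with a toy state machine. This compresses into one class what the paper distributes across microservices, so treat it as structure only, not the paper’s architecture:

```python
from enum import Enum, auto

class Mode(Enum):
    IDLE = auto(); NAVIGATE = auto(); DELIVER = auto(); SAFE_STOP = auto()

class DeliveryRobotHSM:
    """Tiny hierarchical state machine: nominal states sit under a
    supervisor that a diagnostics service can preempt at any time."""

    def __init__(self):
        self.mode = Mode.IDLE

    def on_diagnostics(self, fault: bool):
        # The safety transition dominates the nominal hierarchy.
        if fault:
            self.mode = Mode.SAFE_STOP

    def tick(self, goal_reached: bool):
        if self.mode is Mode.SAFE_STOP:
            return "brakes engaged, awaiting operator reset"
        if self.mode is Mode.IDLE:
            self.mode = Mode.NAVIGATE
        elif self.mode is Mode.NAVIGATE and goal_reached:
            self.mode = Mode.DELIVER
        elif self.mode is Mode.DELIVER:
            self.mode = Mode.IDLE
        return f"mode={self.mode.name}"

robot = DeliveryRobotHSM()
robot.tick(goal_reached=False)    # IDLE -> NAVIGATE
robot.on_diagnostics(fault=True)  # diagnostics preempts: -> SAFE_STOP
```

Keeping the safety transition outside the nominal states is what makes the behaviour interruptible: no nominal state needs to know about, or poll for, fault conditions.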