Category Archives: Robotics

Using DL to decide when a robot should stop exploring

Luperto, M., Ferrara, M.M., Princisgh, M. et al., Estimating map completeness in robot exploration, Auton Robot 50, 6 (2026) 10.1007/s10514-025-10221-8.

We present a novel method that, given a grid map of a partially explored indoor environment, estimates the amount of the explored area in the map and whether it is worth continuing to explore the uncovered part of the environment. Our method is based on the idea that modern deep learning models can successfully solve this task by leveraging visual clues in the map. Thus, we train a deep convolutional neural network on images depicting grid maps from partially explored environments, with annotations derived from the knowledge of the entire map, which is not available when the network is used for inference. We show that our network can be used to define a stopping criterion to successfully terminate the exploration process when this is expected to no longer add relevant details about the environment to the map, saving more than 35% of the total exploration time compared to covering the whole environment area.
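As a rough illustration of how such a stopping criterion could be wired into an exploration loop (the network itself is replaced by a stub stream of completeness predictions, and the 0.9 threshold is an assumed value, not the paper's):

```python
def should_stop(estimated_completeness: float, threshold: float = 0.9) -> bool:
    """Terminate exploration once the predicted explored fraction of the
    environment exceeds a threshold. In the paper this estimate comes from
    a CNN applied to the current grid map; here it is a plain input."""
    return estimated_completeness >= threshold

# Stub per-step completeness predictions standing in for the network output.
predictions = [0.35, 0.55, 0.72, 0.88, 0.93]
stop_step = next(i for i, c in enumerate(predictions) if should_stop(c))
```

In a real system the robot would keep selecting frontiers until `should_stop` fires, trading a small amount of unmapped area for the reported time savings.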

Fixing artifacts of occupancy grid maps through DL

Leon Davies, Baihua Li, Mohamad Saada, Simon Sølvsten, Qinggang Meng, Transformation & Translation Occupancy Grid Mapping: 2-dimensional deep learning refined SLAM, Robotics and Autonomous Systems, Volume 200, 2026, 10.1016/j.robot.2026.105405.

SLAM (Simultaneous Localisation and Mapping) is an important component in robotics, providing a map of an environment and enabling localisation and navigation. While 3D LiDAR odometry and mapping systems have advanced in recent years, producing accurate motion estimates and detailed 3D maps, high-quality 2D occupancy grid maps (OGMs) remain challenging to obtain in large, complex indoor environments. OGMs are often degraded by drifts in odometry, sensor artefacts, and partial observability, resulting in maps with fractured walls, double boundaries, and artefacts that limit readability for mapping-centric tasks such as floor plan creation. To address this, we propose Transformation & Translation Occupancy Grid Mapping (TT-OGM), a system-level pipeline that targets map fidelity. TT-OGM leverages 3D scan registration to stabilise 2D map construction via projection and standard occupancy updates, then applies a learned GAN-based refinement module as post-processing to remove artefacts, regularise structure, and complete small missing regions. To enable training at scale, we introduce an offline DRL-based data generation process that produces paired but weakly aligned erroneous/clean OGMs spanning diverse error modes and severities. We demonstrate TT-OGM in real-time on a building-scale dataset collected at Loughborough University and evaluate map fidelity against a registered floor-plan reference using mIoU, masked SSIM, and occupied-boundary F1. We additionally report localisation accuracy on S3Ev2 using translation ATE (RMSE) against Cartographer and SLAM Toolbox (Karto). Our results show that 3D registration improves baseline 2D map quality over standard 2D SLAM outputs, and that GAN refinement further increases structural consistency and boundary accuracy in our pipeline. 
Additional ablations on synthetic stress tests and qualitative transfer to unseen Radish sequences show that the refinement module consistently improves OGM readability under common noise, moderate drift, and clutter conditions.

Enhancing RRT* with more intelligent sampling

Asmaa Loulou, Mustafa Unel, Hybrid attention-guided RRT*: Learning spatial sampling priors for accelerated path planning, Robotics and Autonomous Systems, Volume 198, 2026, 10.1016/j.robot.2026.105338.

Sampling-based planners such as RRT* are widely used for motion planning in high-dimensional and complex environments. However, their reliance on uniform sampling often leads to slow convergence and inefficiency, especially in scenarios with narrow passages or long-range dependencies. To address this, we propose HAGRRT*, a Hybrid Attention-Guided RRT* algorithm that learns to generate spatially informed sampling priors. Our method introduces a new neural architecture that fuses multi-scale convolutional features with a lightweight cross-attention mechanism, explicitly conditioned on the start and goal positions. These features are decoded via a DPT-inspired module to produce 2D probability maps that guide the sampling process. Additionally, we propose an obstacle-aware loss function that penalizes disconnected and infeasible predictions which further encourages the network to focus on traversable, goal-directed regions. Extensive experiments on both structured (maze) and unstructured (forest) environments show that HAGRRT* achieves significantly faster convergence and improved path quality compared to both classical RRT* and recent deep-learning guided variants. Our method consistently requires fewer iterations and samples and is able to generalize across varying dataset types. On structured scenarios, our method achieves an average reduction of 39.6% in the number of samples and an average of 24.4% reduction in planning time compared to recent deep learning methods. On unstructured forest maps, our method reduces the number of samples by 71.5%, and planning time by 81.7% compared to recent deep learning methods, and improves the success rate from 67% to 93%. These results highlight the robustness, efficiency, and generalization ability of our approach across a wide range of planning environments.
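A minimal sketch of how a learned probability map could bias the planner's sampler: with some mixing probability the sample is drawn from the prior, otherwise uniformly, so the planner keeps its completeness guarantee. The grid, the `bias` weight, and the one-hot prior are illustrative assumptions; in the paper the maps come from the attention network.

```python
import random

def guided_sample(prob_map, rng, bias=0.8):
    """With probability `bias`, draw a cell from the learned spatial prior
    (a 2-D grid of non-negative weights); otherwise sample uniformly.
    The uniform fallback preserves probabilistic completeness."""
    h, w = len(prob_map), len(prob_map[0])
    if rng.random() < bias:
        cells = [(r, c) for r in range(h) for c in range(w)]
        weights = [prob_map[r][c] for r, c in cells]
        return rng.choices(cells, weights=weights)[0]
    return (rng.randrange(h), rng.randrange(w))

prior = [[0.0] * 10 for _ in range(10)]
prior[5][5] = 1.0                       # all prior mass on one free cell
rng = random.Random(0)
samples = [guided_sample(prior, rng, bias=1.0) for _ in range(5)]
```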

See also: the not-so-strong influence of time on some cognitive processes, such as speech processing (https://doi.org/10.1016/j.tics.2025.05.017)

Adapting a shared teleoperation system to network delays

B. Güleçyüz et al., Enhancing Shared Autonomy in Teleoperation Under Network Delay: Transparency- and Confidence-Aware Arbitration, IEEE Robotics and Automation Letters, vol. 10, no. 10, pp. 9654-9661, Oct. 2025, 10.1109/LRA.2025.3596436.

Shared autonomy bridges human expertise with machine intelligence, yet existing approaches often overlook the impact of teleoperation delays. To address this gap, we propose a novel shared autonomy approach that enables robots to gradually learn from teleoperated demonstrations while adapting to network delays. Our method improves intent prediction by accounting for delayed feedback to the human operator and adjusts the arbitration function to balance reduced human confidence due to delay with confidence in learned autonomy. To ensure system stability, which might be compromised by delay and arbitration of human and autonomy control forces, we introduce a three-port extension of the Time-Domain Passivity Approach with Energy Reflection (TDPA-ER). Experimental validation with 12 participants demonstrated improvements in intent prediction accuracy, task performance, and the quality of final learned autonomy, highlighting the potential of our approach to enhance teleoperation and learning quality in remote environments.
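The arbitration idea can be caricatured in a few lines: blend the human and autonomy commands, with the human's weight degraded as delay grows. The exponential confidence decay and the normalised weight `alpha` are assumptions made for illustration, not the paper's actual functions.

```python
import math

def arbitrate(u_human, u_auto, c_auto, delay_s, k=2.0):
    """Blend human and autonomy commands. Human confidence is assumed to
    decay exponentially with round-trip delay; `alpha` shifts authority
    toward autonomy as its confidence rises relative to the delayed human."""
    c_human = math.exp(-k * delay_s)             # assumed delay model
    alpha = c_auto / (c_auto + c_human + 1e-9)   # arbitration weight
    return alpha * u_auto + (1 - alpha) * u_human, alpha

# With no delay and equal confidence, authority is split evenly.
u, alpha = arbitrate(u_human=0.0, u_auto=1.0, c_auto=1.0, delay_s=0.0)
_, alpha_delayed = arbitrate(0.0, 1.0, c_auto=1.0, delay_s=2.0)
```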

Improving the generalization of robotic RL through inspiration from the human motor control system

P. Zhang, Z. Hua and J. Ding, A Central Motor System Inspired Pretraining Reinforcement Learning for Robotic Control, IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 55, no. 9, pp. 6285-6298, Sept. 2025, 10.1109/TSMC.2025.3577698.

Robots typically encounter diverse tasks, posing a significant challenge for motion control. Pretraining reinforcement learning (PRL) enables robots to adapt quickly to various tasks by exploiting reusable skills. The existing PRL methods often rely on datasets and human expert knowledge, struggle to discover diverse and dynamic skills, and exhibit limited generalization and adaptability to different types of robots and downstream tasks. This article proposes a novel PRL algorithm based on the central motor system mechanisms, which can discover diverse and dynamic skills without relying on data and expert knowledge, effectively enabling robots to tackle different types of downstream tasks. Inspired by the cerebellum’s role in balance control and skill storage within the central motor system, an intrinsic fused reward is introduced to explore dynamic skills and eliminate dependence on data and expert knowledge during pretraining. Drawing from the basal ganglia’s function in motor programming, a discrete skill encoding method is designed to increase the diversity of discovered skills, improving the performance of complex robots in challenging environments. Furthermore, incorporating the basal ganglia’s role in motor regulation, a skill activity function is proposed to generate skills at varying dynamic levels, thereby improving the adaptability of robots in multiple downstream tasks. The effectiveness of the proposed algorithm has been demonstrated through simulation experiments on four different morphological robots across multiple downstream tasks.

Stacking multiple MDPs in an abstraction hierarchy to better solve RL

Roberto Cipollone, Marco Favorito, Flavio Maiorana, Giuseppe De Giacomo, Luca Iocchi, Fabio Patrizi, Exploiting robot abstractions in episodic RL via reward shaping and heuristics, Robotics and Autonomous Systems, Volume 193, 2025, 10.1016/j.robot.2025.105116.

One major limitation to the applicability of Reinforcement Learning (RL) to many domains of practical relevance, in particular in robotic applications, is the large number of samples required to learn an optimal policy. To address this problem and improve learning efficiency, we consider a linear hierarchy of abstraction layers of the Markov Decision Process (MDP) underlying the target domain. Each layer is an MDP representing a coarser model of the one immediately below in the hierarchy. In this work, we propose novel techniques to automatically define Reward Shaping and Reward Heuristic functions that are based on the solution obtained at a higher level of abstraction and provide rewards to the finer (possibly the concrete) MDP at the lower level, thus inducing an exploration heuristic that can effectively guide the learning process in the more complex domain. In contrast with other works in Hierarchical RL, our technique imposes fewer requirements on the design of the abstract models and is tolerant to modeling errors, thus making the proposed approach practical. We formally analyze the relationship between the abstract models and the exploration heuristic induced in the lower-level domain, we prove that the method guarantees optimal convergence, and finally demonstrate its effectiveness experimentally in several complex robotic domains.
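The reward-shaping side can be illustrated with classic potential-based shaping, taking the potential to be the value of the abstract state a concrete state maps to; potential-based shaping is known to preserve the optimal policy. The abstraction map `abstract` and the toy value table are hypothetical, and the paper's machinery goes beyond this basic scheme.

```python
def shaped_reward(r, s, s_next, v_abs, abstract, gamma=0.99):
    """Potential-based shaping F = gamma*Phi(s') - Phi(s), where the
    potential Phi(s) is the value of s's abstract state in the coarser MDP."""
    phi = v_abs[abstract(s)]
    phi_next = v_abs[abstract(s_next)]
    return r + gamma * phi_next - phi

v_abs = {0: 0.0, 1: 1.0}        # toy values of two abstract states
abstract = lambda s: s // 5     # concrete states 0-4 -> 0, 5-9 -> 1
# Crossing into the higher-valued abstract region earns a shaping bonus.
f = shaped_reward(0.0, 4, 5, v_abs, abstract)
```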

Improving the adaptation of RL to robots with different parameters through fuzzy ensembles

A. G. Haddad, M. B. Mohiuddin, I. Boiko and Y. Zweiri, Fuzzy Ensembles of Reinforcement Learning Policies for Systems With Variable Parameters, IEEE Robotics and Automation Letters, vol. 10, no. 6, pp. 5361-5368, June 2025, 10.1109/LRA.2025.3559833.

This paper presents a novel approach to improving the generalization capabilities of reinforcement learning (RL) agents for robotic systems with varying physical parameters. We propose the Fuzzy Ensemble of RL policies (FERL), which enhances performance in environments where system parameters differ from those encountered during training. The FERL method selectively fuses aligned policies, determining their collective decision based on fuzzy memberships tailored to the current parameters of the system. Unlike traditional centralized training approaches that rely on shared experiences for policy updates, FERL allows for independent agent training, facilitating efficient parallelization. The effectiveness of FERL is demonstrated through extensive experiments, including a real-world trajectory tracking application in a quadrotor slung-load system. Our method improves the success rates by up to 15.6% across various simulated systems with variable parameters compared to the existing benchmarks of domain randomization and robust adaptive ensemble adversary RL. In the real-world experiments, our method achieves a 30% reduction in 3D position RMSE compared to individual RL policies. The results underscore FERL's robustness and applicability to real robotic systems.
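A toy sketch of the fusion step, assuming triangular memberships over a scalar system parameter and constant-action stub policies (the paper's membership design and policy networks are richer than this):

```python
def tri(x, centre, width):
    """Triangular fuzzy membership centred on a training parameter value."""
    return max(0.0, 1.0 - abs(x - centre) / width)

def ferl_action(param, policies, centres, width=1.0):
    """Fuse actions of independently trained policies, weighting each by
    the fuzzy membership of the current parameter to the parameter value
    that policy was trained on (weights normalised before mixing)."""
    w = [tri(param, c, width) for c in centres]
    total = sum(w) or 1.0
    return sum(wi * p(param) for wi, p in zip(w, policies)) / total

# Stub policies trained at param=0.0 and param=1.0; query halfway between.
policies = [lambda p: 0.0, lambda p: 1.0]
action = ferl_action(0.5, policies, centres=[0.0, 1.0])
```

At `param=0.5` both memberships are 0.5, so the fused action interpolates the two policies; at a training centre only that policy contributes.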

Using Deep RL to model transitions and observations in EKF localization

Islem Kobbi, Abdelhak Benamirouche, Mohamed Tadjine, Enhancing pose estimation for mobile robots: A comparative analysis of deep reinforcement learning algorithms for adaptive Extended Kalman Filter-based estimation, Engineering Applications of Artificial Intelligence, Volume 150, 2025, 10.1016/j.engappai.2025.110548.

The Extended Kalman Filter (EKF) is a widely used algorithm for state estimation in control systems. However, its lack of adaptability limits its performance in dynamic and uncertain environments. To address this limitation, we used an approach that leverages Deep Reinforcement Learning (DRL) to achieve adaptive state estimation in the EKF. By integrating DRL techniques, we enable the state estimator to autonomously learn and update the values of the system dynamics and measurement noise covariance matrices, Q and R, based on observed data, which encode environmental changes or system failures. In this research, we compare the performance of four DRL algorithms, namely Deep Deterministic Policy Gradient (DDPG), Twin Delayed Deep Deterministic Policy Gradient (TD3), Soft Actor-Critic (SAC), and Proximal Policy Optimization (PPO), in optimizing the EKF’s adaptability. The experiments are conducted in both simulated and real-world settings using the Gazebo simulation environment and the Robot Operating System (ROS). The results demonstrate that the DRL-based adaptive state estimator outperforms traditional methods in terms of estimation accuracy and robustness. The comparative analysis provides insights into the strengths and limitations of different DRL agents, showing that the TD3 and the DDPG are the most effective algorithms, with TD3 achieving superior performance, resulting in a 91% improvement over the classic EKF, due to its delayed update mechanism that reduces training noise. This research highlights the potential of DRL to advance state estimation algorithms, offering valuable insights for future work in adaptive estimation techniques.
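The core loop can be sketched on a scalar filter: the DRL agent's role reduces to emitting scale factors for the nominal Q and R at each step. Here those scales are plain function arguments, and the random-walk motion model is an illustrative stand-in for the robot's dynamics.

```python
def ekf_step(x, P, z, q_scale, r_scale, Q0=0.01, R0=0.1):
    """One predict/update cycle of a scalar Kalman filter whose noise
    covariances are rescaled online; in the paper the scales would come
    from a trained DRL agent (e.g. TD3) observing the filter's behaviour."""
    Q, R = q_scale * Q0, r_scale * R0
    x_pred, P_pred = x, P + Q           # predict (random-walk motion model)
    K = P_pred / (P_pred + R)           # Kalman gain
    x_new = x_pred + K * (z - x_pred)   # correct with measurement z
    P_new = (1.0 - K) * P_pred
    return x_new, P_new

x, P = ekf_step(x=0.0, P=1.0, z=1.0, q_scale=1.0, r_scale=1.0)
# Inflating R (distrusting the sensor) pulls the estimate less toward z.
x_hi_r, _ = ekf_step(x=0.0, P=1.0, z=1.0, q_scale=1.0, r_scale=10.0)
```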

On the explainability of Deep RL and its improvement through the integration of human preferences

Georgios Angelopoulos, Luigi Mangiacapra, Alessandra Rossi, Claudia Di Napoli, Silvia Rossi, What is behind the curtain? Increasing transparency in reinforcement learning with human preferences and explanations, Engineering Applications of Artificial Intelligence, Volume 149, 2025, 10.1016/j.engappai.2025.110520.

In this work, we investigate whether the transparency of a robot’s behaviour is improved when human preferences on the actions the robot performs are taken into account during the learning process. For this purpose, a shielding mechanism called Preference Shielding is proposed and included in a reinforcement learning algorithm to account for human preferences. We also use the shielding to decide when to provide explanations of the robot’s actions. We carried out a within-subjects study involving 26 participants to evaluate the robot’s transparency. Results indicate that considering human preferences during learning improves legibility compared with providing only explanations. In addition, combining human preferences and explanations further amplifies transparency. Results also confirm that increased transparency leads to an increase in people’s perception of the robot’s safety, comfort, and reliability. These findings show the importance of transparency during learning and suggest a paradigm for robotic applications when a robot has to learn a task in the presence of or in collaboration with a human.
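One simple reading of a preference shield is action masking plus a flag that can trigger an explanation whenever the shield fires; the action names and Q-values below are hypothetical.

```python
def preference_shield(proposed, q_values, disallowed):
    """If the agent's proposed action violates a stated human preference,
    substitute the highest-value permitted action and report that the
    shield fired (the firing flag is what would prompt an explanation)."""
    if proposed not in disallowed:
        return proposed, False
    allowed = [a for a in q_values if a not in disallowed]
    return max(allowed, key=q_values.get), True

q = {"left": 0.2, "right": 0.9, "wait": 0.5}
# The human has expressed a preference against "right".
action, fired = preference_shield("right", q, disallowed={"right"})
```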

Interesting survey of sim-to-real transfer techniques for RL in the context of humanoid robots

D. Kim, H. Lee, J. Cha and J. Park, Bridging the Reality Gap: Analyzing Sim-to-Real Transfer Techniques for Reinforcement Learning in Humanoid Bipedal Locomotion, IEEE Robotics & Automation Magazine, vol. 32, no. 1, pp. 49-58, March 2025, 10.1109/MRA.2024.3505784.

Reinforcement learning (RL) offers a promising solution for controlling humanoid robots, particularly for bipedal locomotion, by learning adaptive and flexible control strategies. However, direct RL application is hindered by time-consuming trial-and-error processes, necessitating training in simulation before real-world transfer. This introduces a reality gap that degrades performance. Although various methods have been proposed for sim-to-real transfer, they have not been validated on a consistent hardware platform, making it difficult to determine which components are key to overcoming the reality gap. In contrast, we systematically evaluate techniques to enhance RL policy robustness during sim-to-real transfer by controlling variables and comparing them on a single robot to isolate and analyze the impact of each technique. These techniques include dynamics randomization, state history usage, noise/bias/delay modeling, state selection, perturbations, and network size. We quantitatively assess the reality gap by simulating diverse conditions and conducting experiments on real hardware. Our findings provide insights into bridging the reality gap, advancing robust RL-trained humanoid robots for real-world applications.
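Dynamics randomization, the first technique listed, can be sketched as per-episode resampling of physical parameters within fixed fractions of their nominal values, forcing the policy to stay robust across the sampled range. The parameter names and ranges here are illustrative, not taken from the paper.

```python
import random

# Fractional randomization ranges per parameter (illustrative values).
RANGES = {"mass": 0.2, "friction": 0.3, "motor_delay_s": 0.5}

def randomise_dynamics(rng, nominal):
    """At the start of each training episode, perturb every physical
    parameter multiplicatively within its configured fraction."""
    return {k: v * (1.0 + rng.uniform(-RANGES[k], RANGES[k]))
            for k, v in nominal.items()}

rng = random.Random(0)
params = randomise_dynamics(
    rng, {"mass": 10.0, "friction": 0.8, "motor_delay_s": 0.02})
```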