Monthly Archives: April 2026

Selecting the best RL result from offline RL in order to minimize risks

Giorgio Angelotti, Nicolas Drougard, Caroline P. C. Chanel, An offline risk-aware policy selection method for Bayesian Markov decision processes, Artificial Intelligence, Volume 354, 2026, 10.1016/j.artint.2026.104519.

In Offline Model Learning for Planning and in Offline Reinforcement Learning, the limited data set hinders the estimation of the value function of the corresponding Markov Decision Process (MDP). Consequently, the performance of the obtained policy in the real world is bounded and possibly risky, especially when deploying a wrong policy can lead to catastrophic consequences. For this reason, several pathways are being followed with the aim of reducing the model error (or the distributional shift between the learned model and the true one) and, more broadly, obtaining risk-aware solutions with respect to model uncertainty. But when it comes to the final application, which baseline should a practitioner choose? In an offline context where computational time is not an issue and robustness is the priority, we propose Exploitation vs Caution (EvC), a paradigm that (1) elegantly incorporates model uncertainty abiding by the Bayesian formalism, and (2) selects, from a fixed set of candidate policies provided, for instance, by the current baselines, the policy that maximizes a risk-aware objective over the Bayesian posterior. We validate EvC against state-of-the-art approaches in different discrete, yet simple, environments offering a fair variety of MDP classes. In the tested scenarios, EvC manages to select robust policies and hence stands out as a useful tool for practitioners who aim to apply offline planning and reinforcement learning solvers in the real world.
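The selection step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a tabular MDP with known rewards, a Dirichlet posterior over transitions built from visit counts, and CVaR as the risk-aware objective (the abstract does not say which risk measure is used). All names (`sample_transition_model`, `select_policy`, the toy `counts` and `rewards`) are hypothetical.

```python
import random

def sample_transition_model(counts, prior=1.0, rng=random):
    """Draw one transition model from a Dirichlet posterior (assumed form).

    counts[s][a][s2] is the number of observed transitions (s, a) -> s2.
    A Dirichlet draw is obtained by normalising independent Gamma draws.
    """
    model = []
    for state_counts in counts:
        per_action = []
        for next_counts in state_counts:
            draws = [rng.gammavariate(c + prior, 1.0) for c in next_counts]
            total = sum(draws)
            per_action.append([d / total for d in draws])
        model.append(per_action)
    return model

def policy_value(model, rewards, policy, start, gamma=0.9, sweeps=200):
    """Iterative policy evaluation; returns the value of the start state."""
    n = len(model)
    v = [0.0] * n
    for _ in range(sweeps):
        v = [
            rewards[s][policy[s]]
            + gamma * sum(p * v[s2] for s2, p in enumerate(model[s][policy[s]]))
            for s in range(n)
        ]
    return v[start]

def cvar(values, alpha):
    """Mean of the worst alpha-fraction of the values (lower tail)."""
    ordered = sorted(values)
    k = max(1, int(alpha * len(ordered)))
    return sum(ordered[:k]) / k

def select_policy(counts, rewards, candidates, start=0, alpha=0.1,
                  n_models=200, seed=0):
    """Score every candidate on the same posterior samples; keep the best."""
    rng = random.Random(seed)
    models = [sample_transition_model(counts, rng=rng) for _ in range(n_models)]
    scores = [
        cvar([policy_value(m, rewards, pi, start) for m in models], alpha)
        for pi in candidates
    ]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, scores

# Toy data: state 0 is the start, states 1 (good) and 2 (bad) are absorbing.
# Action 0 in state 0 is well observed; action 1 was tried only three times.
counts = [
    [[200, 0, 0], [0, 2, 1]],
    [[0, 200, 0], [0, 200, 0]],
    [[0, 0, 200], [0, 0, 200]],
]
rewards = [[1.0, 0.0], [5.0, 5.0], [-5.0, -5.0]]
candidates = [[0, 0, 0], [1, 0, 0]]  # one action index per state
best, scores = select_policy(counts, rewards, candidates)
print(best, scores)
```

In this toy setting the poorly observed action has a wide posterior, so its lower tail is heavily penalised and the well-observed candidate is preferred. Swapping `cvar` for another risk measure changes only the scoring line.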

Using DL to decide when a robot should keep exploring

Luperto, M., Ferrara, M.M., Princisgh, M. et al., Estimating map completeness in robot exploration, Auton Robot 50, 6 (2026) 10.1007/s10514-025-10221-8.

We present a novel method that, given a grid map of a partially explored indoor environment, estimates the amount of the explored area in the map and whether it is worth continuing to explore the uncovered part of the environment. Our method is based on the idea that modern deep learning models can successfully solve this task by leveraging visual clues in the map. Thus, we train a deep convolutional neural network on images depicting grid maps from partially explored environments, with annotations derived from the knowledge of the entire map, which is not available when the network is used for inference. We show that our network can be used to define a stopping criterion to successfully terminate the exploration process when this is expected to no longer add relevant details about the environment to the map, saving more than 35% of the total exploration time compared to covering the whole environment area.
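To make the annotation and stopping ideas concrete, here is a minimal sketch. It assumes a cell-counting definition of completeness (known cells in the partial map divided by known cells in the complete map) and a fixed threshold for stopping. Neither detail is given in the abstract, and the names `explored_fraction` and `should_stop` as well as the cell encoding are hypothetical. At inference time the fraction would come from the trained network, since the complete map is not available.

```python
UNKNOWN, FREE, OCCUPIED = -1, 0, 1  # assumed cell encoding

def explored_fraction(partial_map, full_map):
    """Training label: share of the full map's known cells already observed."""
    total = 0
    seen = 0
    for partial_row, full_row in zip(partial_map, full_map):
        for p, f in zip(partial_row, full_row):
            if f != UNKNOWN:
                total += 1
                if p != UNKNOWN:
                    seen += 1
    return seen / total if total else 0.0

def should_stop(estimated_fraction, threshold=0.95):
    """Stop exploring once the estimated completeness reaches the threshold."""
    return estimated_fraction >= threshold

full_map = [
    [FREE, FREE, OCCUPIED],
    [OCCUPIED, FREE, UNKNOWN],
]
partial_map = [
    [FREE, FREE, UNKNOWN],
    [UNKNOWN, FREE, UNKNOWN],
]
label = explored_fraction(partial_map, full_map)
print(label, should_stop(label))
```

The threshold trades exploration time against missed detail; 0.95 is an arbitrary illustrative value, not one reported by the authors.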

Fixing artifacts of occupancy grid maps through DL

Leon Davies, Baihua Li, Mohamad Saada, Simon Sølvsten, Qinggang Meng, Transformation & Translation Occupancy Grid Mapping: 2-dimensional deep learning refined SLAM, Robotics and Autonomous Systems, Volume 200, 2026, 10.1016/j.robot.2026.105405.

SLAM (Simultaneous Localisation and Mapping) is an important component in robotics, providing a map of an environment and enabling localisation and navigation. While 3D LiDAR odometry and mapping systems have advanced in recent years, producing accurate motion estimates and detailed 3D maps, high-quality 2D occupancy grid maps (OGMs) remain challenging to obtain in large, complex indoor environments. OGMs are often degraded by drifts in odometry, sensor artefacts, and partial observability, resulting in maps with fractured walls, double boundaries, and artefacts that limit readability for mapping-centric tasks such as floor plan creation. To address this, we propose Transformation & Translation Occupancy Grid Mapping (TT-OGM), a system-level pipeline that targets map fidelity. TT-OGM leverages 3D scan registration to stabilise 2D map construction via projection and standard occupancy updates, then applies a learned GAN-based refinement module as post-processing to remove artefacts, regularise structure, and complete small missing regions. To enable training at scale, we introduce an offline DRL-based data generation process that produces paired but weakly aligned erroneous/clean OGMs spanning diverse error modes and severities. We demonstrate TT-OGM in real-time on a building-scale dataset collected at Loughborough University and evaluate map fidelity against a registered floor-plan reference using mIoU, masked SSIM, and occupied-boundary F1. We additionally report localisation accuracy on S3Ev2 using translation ATE (RMSE) against Cartographer and SLAM Toolbox (Karto). Our results show that 3D registration improves baseline 2D map quality over standard 2D SLAM outputs, and that GAN refinement further increases structural consistency and boundary accuracy in our pipeline. 
Additional ablations on synthetic stress tests and qualitative transfer to unseen Radish sequences show that the refinement module consistently improves OGM readability under common noise, moderate drift, and clutter conditions.
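Two of the metrics named above, mIoU and translation ATE (RMSE), can be sketched in a few lines. This is an illustrative version under simplifying assumptions, not the authors' evaluation code: maps are already registered to the reference and encoded as 0 (free) and 1 (occupied), and trajectories are already aligned and time-associated. The function names are hypothetical.

```python
import math

def mean_iou(pred, ref, classes=(0, 1)):
    """Mean intersection-over-union across the given cell classes."""
    ious = []
    for c in classes:
        inter = 0
        union = 0
        for pred_row, ref_row in zip(pred, ref):
            for p, r in zip(pred_row, ref_row):
                inter += int(p == c and r == c)
                union += int(p == c or r == c)
        if union:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

def translation_ate_rmse(estimated, ground_truth):
    """RMSE of 2D position errors between paired, pre-aligned poses."""
    squared = [
        (ex - gx) ** 2 + (ey - gy) ** 2
        for (ex, ey), (gx, gy) in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared) / len(squared))

pred_map = [[1, 1], [0, 0]]
ref_map = [[1, 0], [0, 0]]
miou = mean_iou(pred_map, ref_map)

est_traj = [(3.0, 4.0), (0.0, 5.0)]
gt_traj = [(0.0, 0.0), (0.0, 0.0)]
ate = translation_ate_rmse(est_traj, gt_traj)
print(miou, ate)
```

A full ATE computation would normally first align the estimated trajectory to the ground truth; that step is omitted here.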