Monthly Archives: February 2026

Using evolutionary computation to find better rewards in partially observable RL

Zhengwei Zhu, Zhixuan Chen, Chenyang Zhu, Wen Si, Fang Wang, Optimizing potential-based reward automata in partially observable reinforcement learning using genetic local search, Engineering Applications of Artificial Intelligence, Volume 169, 2026, 10.1016/j.engappai.2026.114054.

Partially observable reinforcement learning extends the reinforcement learning framework to environments in which agents have limited visibility of the state space, making it particularly relevant for applications in robotics and autonomous vehicle navigation. However, a primary challenge in partially observable reinforcement learning is defining effective reward functions that can guide the learning process despite partial observability. To address this challenge, this paper introduces a novel approach for constructing potential-based reward automata by employing genetic local search methods. Specifically, our method constructs these automata from compressed representations of exploration trajectories, which succinctly capture critical decision points and essential state transitions while eliminating redundant steps. By optimizing trajectory samples and shortening agent trajectories to their crucial transitions, our technique significantly reduces computational overhead. Formally, we define the learning objective as an optimization problem aimed at maximizing the log-likelihood of future observations while simultaneously minimizing the structural complexity of the learned reward automata. Furthermore, by incorporating value-based strategies to estimate potential values within the reward automata, our approach improves learning efficiency and facilitates the identification of optimal reward structures. We empirically evaluate our proposed method on seven partially observable grid-world benchmarks. Experimental results demonstrate that our method achieves superior performance relative to state-of-the-art reward automata-based techniques, exhibiting both accelerated learning speeds and higher accumulated rewards. Additionally, our genetic local search algorithm consistently outperforms comparative heuristic methods in terms of learning curves and reward accumulation.

Enhancing RRT* with more intelligent sampling of the planning space

Asmaa Loulou, Mustafa Unel, Hybrid attention-guided RRT*: Learning spatial sampling priors for accelerated path planning, Robotics and Autonomous Systems, Volume 198, 2026, 10.1016/j.robot.2026.105338.

Sampling-based planners such as RRT* are widely used for motion planning in high-dimensional and complex environments. However, their reliance on uniform sampling often leads to slow convergence and inefficiency, especially in scenarios with narrow passages or long-range dependencies. To address this, we propose HAGRRT*, a Hybrid Attention-Guided RRT* algorithm that learns to generate spatially informed sampling priors. Our method introduces a new neural architecture that fuses multi-scale convolutional features with a lightweight cross-attention mechanism, explicitly conditioned on the start and goal positions. These features are decoded via a DPT-inspired module to produce 2D probability maps that guide the sampling process. Additionally, we propose an obstacle-aware loss function that penalizes disconnected and infeasible predictions which further encourages the network to focus on traversable, goal-directed regions. Extensive experiments on both structured (maze) and unstructured (forest) environments show that HAGRRT* achieves significantly faster convergence and improved path quality compared to both classical RRT* and recent deep-learning guided variants. Our method consistently requires fewer iterations and samples and is able to generalize across varying dataset types. On structured scenarios, our method achieves an average reduction of 39.6% in the number of samples and an average of 24.4% reduction in planning time compared to recent deep learning methods. On unstructured forest maps, our method reduces the number of samples by 71.5%, and planning time by 81.7% compared to recent deep learning methods, and improves the success rate from 67% to 93%. These results highlight the robustness, efficiency, and generalization ability of our approach across a wide range of planning environments.
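To make the idea of a sampling prior concrete, here is a minimal sketch of how a planner could draw its next sample from a 2D probability map instead of uniformly. It is not the HAGRRT* implementation: the function name `sample_from_prior`, the grid-cell layout (rows as y, columns as x, unit-sized cells), and the `uniform_mix` fallback to uniform sampling are all assumptions made for illustration, and the map itself would come from the network rather than being written by hand.

```python
import random
from typing import List, Tuple


def sample_from_prior(
    prob_map: List[List[float]],
    rng: random.Random,
    uniform_mix: float = 0.1,
) -> Tuple[float, float]:
    """Draw one (x, y) sample from a grid of non-negative weights.

    With probability `uniform_mix` (an assumed safeguard), a cell is picked
    uniformly so that regions the map rates as unlikely are still reachable.
    """
    rows, cols = len(prob_map), len(prob_map[0])
    if rng.random() < uniform_mix:
        r, c = rng.randrange(rows), rng.randrange(cols)
    else:
        cells = [(r, c) for r in range(rows) for c in range(cols)]
        weights = [prob_map[r][c] for r, c in cells]
        # random.choices normalizes the weights internally.
        r, c = rng.choices(cells, weights=weights, k=1)[0]
    # Jitter inside the chosen cell to obtain a continuous sample.
    return (c + rng.random(), r + rng.random())


# Illustrative map (assumed): all of the mass sits in the bottom-right cell.
rng = random.Random(0)
prior = [[0.0, 0.0],
         [0.0, 1.0]]
samples = [sample_from_prior(prior, rng, uniform_mix=0.0) for _ in range(100)]
```

In an RRT*-style loop, a call like this would replace the uniform draw of the random configuration; the rest of the algorithm (nearest neighbor, steering, rewiring) stays unchanged. Setting `uniform_mix` to zero trusts the map completely, which is only sensible if the map never assigns zero weight to a region the path actually needs.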

See also: the not-so-strong influence of time in some cognitive processes, such as speech processing (https://doi.org/10.1016/j.tics.2025.05.017)