Author Archives: Juan-Antonio Fernández-Madrigal

Learning how to reset the episode in RL

S. -H. Lee and S. -W. Seo, Self-Supervised Curriculum Generation for Autonomous Reinforcement Learning Without Task-Specific Knowledge, IEEE Robotics and Automation Letters, vol. 9, no. 5, pp. 4043-4050, May 2024 DOI: 10.1109/LRA.2024.3375714.

A significant bottleneck in applying current reinforcement learning algorithms to real-world scenarios is the need to reset the environment between every episode. This reset process demands substantial human intervention, making it difficult for the agent to learn continuously and autonomously. Several recent works have introduced autonomous reinforcement learning (ARL) algorithms that generate curricula for jointly training reset and forward policies. While their curricula can reduce the number of required manual resets by taking into account the agent’s learning progress, they rely on task-specific knowledge, such as predefined initial states or reset reward functions. In this paper, we propose a novel ARL algorithm that can generate a curriculum adaptive to the agent’s learning progress without task-specific knowledge. Our curriculum empowers the agent to autonomously reset to diverse and informative initial states. To achieve this, we introduce a success discriminator that estimates the success probability from each initial state when the agent follows the forward policy. The success discriminator is trained with relabeled transitions in a self-supervised manner. Our experimental results demonstrate that our ARL algorithm can generate an adaptive curriculum and enable the agent to efficiently bootstrap to solve sparse-reward maze navigation and manipulation tasks, outperforming baselines with significantly fewer manual resets.
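The paper's core mechanism is the success discriminator. As a rough illustration of how such a component could look, here is a minimal PyTorch sketch (my own illustration, not the authors' code): the network, a relabeling-based training step, and a curriculum heuristic that prefers initial states of intermediate difficulty. All names, shapes, and the 0.3–0.7 difficulty band are assumptions.

```python
# Illustrative sketch (not the authors' code) of a success discriminator that
# scores candidate initial states by the probability that the forward policy
# succeeds from them. Names and shapes are assumptions for the example.
import torch
import torch.nn as nn

class SuccessDiscriminator(nn.Module):
    def __init__(self, state_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state):
        # Estimated probability that the forward policy succeeds from `state`.
        return torch.sigmoid(self.net(state))

def train_step(disc, optimizer, states, success_labels):
    """One self-supervised update: `success_labels` (floats) come from
    relabeling visited states as 1.0 if the subsequent forward rollout
    succeeded, else 0.0."""
    probs = disc(states).squeeze(-1)
    loss = nn.functional.binary_cross_entropy(probs, success_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def select_reset_target(disc, candidate_states, low=0.3, high=0.7):
    """Curriculum heuristic (an assumption here): prefer initial states of
    intermediate difficulty, where success is neither certain nor hopeless."""
    with torch.no_grad():
        p = disc(candidate_states).squeeze(-1)
    mask = (p > low) & (p < high)
    pool = candidate_states[mask] if mask.any() else candidate_states
    return pool[torch.randint(len(pool), (1,))]
```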

Networked differential-drive telerobot remotely controlled despite disturbances and delays

Luca Nanu, Luigi Colangelo, Carlo Novara, Carlos Perez Montenegro, Embedded model control of networked control systems: An experimental robotic application, Mechatronics, Volume 99, 2024 DOI: 10.1016/j.mechatronics.2024.103160.

In a Networked Control System (NCS), the absence of physical communication links in the loop leads to relevant issues, such as measurement delays and asynchronous execution of the control commands. In general, these issues may significantly compromise the performance of the NCS, possibly causing unstable behaviours. This paper presents an original approach to the design of a complete digital control unit for a system characterized by a varying sampling time and asynchronous command execution. The approach is based on the Embedded Model Control (EMC) methodology, whose key feature is the estimation of the disturbances, errors and nonlinearities affecting the plant to be controlled and their online cancellation. In this way, measurement delays and execution asynchronicity are treated as errors and rejected up to a given frequency by the EMC unit. The effectiveness of the proposed approach is demonstrated in a real-world case study, where the NCS consists of a differential-drive mobile robot (the plant) and a control unit, and the two subsystems communicate through the web without physical connection links. After a preliminary verification using a high-fidelity numerical simulator, the designed controller is validated in several experimental tests, carried out on a real-time embedded system incorporated in the robotic platform.
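The estimate-then-cancel idea at the heart of EMC can be sketched in a few lines. The following is a simplified, hedged reading of the methodology applied to a first-order scalar plant (not the paper's controller); the gains and the extended-state-observer structure are assumptions made for the example.

```python
# Minimal sketch of the disturbance-rejection idea behind Embedded Model
# Control (a simplified reading, not the paper's controller): an embedded
# model predicts the nominal output, the mismatch is attributed to a lumped
# disturbance, and the command cancels the low-frequency part of it.
class EmbeddedModelController:
    def __init__(self, a, b, k_fb, beta=0.2):
        self.a, self.b = a, b   # nominal plant: x+ = a*x + b*(u + d)
        self.k_fb = k_fb        # state-feedback gain (assumed tuned offline)
        self.beta = beta        # disturbance-estimator bandwidth in (0, 1)
        self.x_hat = 0.0        # embedded-model state
        self.d_hat = 0.0        # lumped disturbance estimate

    def step(self, y_meas, x_ref):
        # Innovation: what the embedded model failed to predict. Measurement
        # delays and asynchronous execution show up here as "errors".
        e = y_meas - self.x_hat
        # Low-pass estimate of the lumped disturbance.
        self.d_hat += self.beta * e / self.b
        # Control: track the reference and cancel the estimated disturbance.
        u = self.k_fb * (x_ref - y_meas) - self.d_hat
        # Propagate the embedded model with the applied command.
        self.x_hat = self.a * self.x_hat + self.b * u + self.beta * e
        return u
```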

Improving EKF and UKF when diverse precision sensors are used for localization through adaptive covariances

Giseo Park, Optimal vehicle position estimation using adaptive unscented Kalman filter based on sensor fusion, Mechatronics, Volume 99, 2024 DOI: 10.1016/j.mechatronics.2024.103144.

Precise position recognition systems are actively used in various automotive technology fields such as autonomous vehicles, intelligent transportation systems, and vehicle driving safety systems. In line with this demand, this paper proposes a new vehicle position estimation algorithm based on sensor fusion between low-cost standalone global positioning system (GPS) and inertial measurement unit (IMU) sensors. In order to estimate accurate vehicle position information using two complementary sensor types, an adaptive unscented Kalman filter (AUKF), an optimal state estimation algorithm, is applied to the vehicle kinematic model. Since this AUKF includes an adaptive covariance matrix whose value changes under GPS outage conditions, it has high estimation robustness even if the accuracy of the GPS measurement signal is low. A comparison of estimation errors against the widely used extended Kalman filter (EKF) and UKF in real-vehicle experiments confirms the improved estimation performance of the proposed AUKF algorithm. The given test course includes roads of various shapes as well as GPS outage sections, so it is suitable for evaluating vehicle position estimation performance.
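To make the adaptive-covariance mechanism concrete, here is a hedged sketch (not the paper's exact AUKF): the GPS measurement covariance R is inflated on outage, or when innovations grow larger than the filter predicts, so the filter temporarily trusts the IMU-driven prediction more. The inflation factor and the innovation-based heuristic are assumptions; the measurement update itself is the standard one shared by EKF/UKF variants.

```python
# Hedged sketch of the adaptive-covariance idea (not the paper's exact AUKF).
import numpy as np

def adaptive_R(R_nominal, gps_valid, innovation, S, inflation=1e6):
    """Return the measurement covariance to use at this step.
    - On outage, make R huge so the GPS update is effectively ignored.
    - Otherwise, scale R up when innovations are larger than the filter
      predicts (a standard innovation-based adaptation heuristic; S may be
      computed with the nominal R)."""
    if not gps_valid:
        return R_nominal * inflation
    ratio = float(innovation @ np.linalg.inv(S) @ innovation)
    return R_nominal * max(1.0, ratio / len(innovation))

def kf_update(x, P, z, H, R):
    """Standard Kalman measurement update, shared by EKF/UKF variants."""
    nu = z - H @ x                     # innovation
    S = H @ P @ H.T + R                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    x = x + K @ nu
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P, nu, S
```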

POMDPs focused on obtaining policies that can be understood just by observing the robot's actions

Miguel Faria, Francisco S. Melo, Ana Paiva, “Guess what I’m doing”: Extending legibility to sequential decision tasks, Artificial Intelligence, Volume 330, 2024 DOI: 10.1016/j.artint.2024.104107.

In this paper we investigate the notion of legibility in sequential decision tasks under uncertainty. Previous works that extend legibility to scenarios beyond robot motion either focus on deterministic settings or are computationally too expensive. Our proposed approach, dubbed PoLMDP, is able to handle uncertainty while remaining computationally tractable. We establish the advantages of our approach against state-of-the-art approaches in several scenarios of varying complexity. We also showcase the use of our legible policies as demonstrations in machine teaching scenarios, establishing their superiority in teaching new behaviours against the commonly used demonstrations based on the optimal policy. Finally, we assess the legibility of our computed policies through a user study, where people are asked to infer the goal of a mobile robot following a legible policy by observing its actions.
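The notion of legibility can be illustrated with a small sketch (a simplified reading, not the PoLMDP algorithm itself): a Bayesian observer infers the agent's goal from its actions, and the agent scores each action by a mix of its value for the true goal and how identifiable that goal becomes after the action is seen. The softmax observer model and the trade-off weight beta are assumptions.

```python
# Illustrative sketch of legibility in sequential decision-making
# (not the PoLMDP algorithm). q_tables: one (num_states, num_actions)
# numpy Q-table per candidate goal.
import numpy as np

def observer_goal_posterior(q_tables, state, action, prior):
    """P(goal | s, a), assuming the observer models the agent as
    softmax-rational for each candidate goal's Q-function."""
    likelihoods = np.array([
        np.exp(q[state, action]) / np.exp(q[state]).sum()
        for q in q_tables
    ])
    post = likelihoods * prior
    return post / post.sum()

def legible_action(q_tables, true_goal, state, prior, beta=0.5):
    """Trade off value for the true goal against how identifiable the
    true goal is after the observer sees the action (beta is assumed)."""
    q_true = q_tables[true_goal]
    scores = []
    for a in range(q_true.shape[1]):
        ident = observer_goal_posterior(q_tables, state, a, prior)[true_goal]
        scores.append(q_true[state, a] + beta * np.log(ident + 1e-12))
    return int(np.argmax(scores))
```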

On the influence of the representations learned through deep RL on the learning process

Han Wang, Erfan Miahi, Martha White, Marlos C. Machado, Zaheer Abbas, Raksha Kumaraswamy, Vincent Liu, Adam White, Investigating the properties of neural network representations in reinforcement learning, Artificial Intelligence, Volume 330, 2024 DOI: 10.1016/j.artint.2024.104100.

In this paper we investigate the properties of representations learned by deep reinforcement learning systems. Much of the early work on representations for reinforcement learning focused on designing fixed-basis architectures to achieve properties thought to be desirable, such as orthogonality and sparsity. In contrast, the idea behind deep reinforcement learning methods is that the agent designer should not encode representational properties, but rather that the data stream should determine the properties of the representation—good representations emerge under appropriate training schemes. In this paper we bring these two perspectives together, empirically investigating the properties of representations that support transfer in reinforcement learning. We introduce and measure six representational properties over more than 25,000 agent-task settings. We consider Deep Q-learning agents with different auxiliary losses in a pixel-based navigation environment, with source and transfer tasks corresponding to different goal locations. We develop a method to better understand why some representations work better for transfer, through a systematic approach varying task similarity and measuring and correlating representation properties with transfer performance. We demonstrate the generality of the methodology by investigating representations learned by a Rainbow agent that successfully transfers across Atari 2600 game modes.
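As a concrete hint of what "measuring representational properties" means, here are textbook versions of two such metrics, sparsity and orthogonality. The paper defines its own six properties; these two are illustrative stand-ins, not the paper's definitions.

```python
# Sketch of two common representational metrics (illustrative versions,
# not the paper's exact six). phi: (num_states, num_features) matrix of
# learned representations.
import numpy as np

def sparsity(phi, eps=1e-8):
    """Fraction of (near-)inactive features, averaged over states."""
    return float((np.abs(phi) < eps).mean())

def orthogonality(phi):
    """1 minus the mean absolute cosine similarity between the
    representations of distinct states (1.0 = fully orthogonal)."""
    unit = phi / (np.linalg.norm(phi, axis=1, keepdims=True) + 1e-12)
    sim = np.abs(unit @ unit.T)
    n = len(phi)
    off_diag = (sim.sum() - np.trace(sim)) / (n * (n - 1))
    return float(1.0 - off_diag)
```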

Graph NNs in RL for improving sample efficiency

Feng Zhang, Chengbin Xuan, Hak-Keung Lam, An obstacle avoidance-specific reinforcement learning method based on fuzzy attention mechanism and heterogeneous graph neural networks, Engineering Applications of Artificial Intelligence, Volume 130, 2024 DOI: 10.1016/j.engappai.2023.107764.

Deep reinforcement learning (RL) is an advancing learning tool for handling robotics control problems. However, it typically suffers from poor sample efficiency and effectiveness. The emergence of Graph Neural Networks (GNNs) enables the integration of RL and graph representation learning techniques, realising outstanding training performance and transfer capability by casting control scenarios into the corresponding graph domain. Nevertheless, existing approaches strongly depend on artificial graph-formation processes with intensive bias and cannot propagate messages discriminatively on explicit physical dependence, which leads to restricted flexibility, limited size-transfer capability and suboptimal performance. This paper proposes a fuzzy attention mechanism-based heterogeneous graph neural network (FAM-HGNN) framework for resolving the control problem under the RL context. The FAM emphasises significant connections and weakens trivial connections in a fully connected graph, which mitigates the potential negative influence caused by the artificial graph-formation process. The HGNN obtains a higher level of relational inductive bias by conducting graph propagation on a masked graph. Experimental results show that FAM-HGNN outperforms multi-layer perceptron-based and existing GNN-based RL approaches regarding training performance and size-transfer capability. We also conducted an ablation study and sensitivity analysis to further validate the efficacy of the proposed method.
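A hedged sketch of the fuzzy-attention idea as read from the abstract (not the paper's FAM-HGNN layer): attention logits over a fully connected graph are gated by a fuzzy membership function, so weak, "trivial" edges are damped before message passing. The Gaussian membership function and all shapes are assumptions.

```python
# Hedged sketch of fuzzy-gated attention over a fully connected graph
# (an illustration, not the paper's FAM-HGNN layer).
import torch
import torch.nn as nn

class FuzzyAttentionLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # Learnable centre/width of a Gaussian fuzzy membership over logits
        # (the membership shape is an assumption made for this sketch).
        self.centre = nn.Parameter(torch.tensor(1.0))
        self.width = nn.Parameter(torch.tensor(1.0))

    def forward(self, h):
        # h: (num_nodes, dim) node features of a fully connected graph.
        logits = self.q(h) @ self.k(h).T / h.shape[-1] ** 0.5
        # Fuzzy gate in [0, 1]: high membership = "significant" connection.
        gate = torch.exp(-((logits - self.centre) / self.width) ** 2)
        attn = torch.softmax(logits, dim=-1) * gate
        attn = attn / (attn.sum(-1, keepdim=True) + 1e-9)  # renormalise
        return attn @ self.v(h)  # messages aggregated per node
```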

Using RL as a framework to study political issues

Lion Schulz, Rahul Bhui, Political reinforcement learners, Trends in Cognitive Sciences, Volume 28, Issue 3, 2024, Pages 210-222 DOI: 10.1016/j.tics.2023.12.001.

Politics can seem home to the most calculating and yet least rational elements of humanity. How might we systematically characterize this spectrum of political cognition? Here, we propose reinforcement learning (RL) as a unified framework to dissect the political mind. RL describes how agents algorithmically navigate complex and uncertain domains like politics. Through this computational lens, we outline three routes to political differences, stemming from variability in agents’ conceptions of a problem, the cognitive operations applied to solve the problem, or the backdrop of information available from the environment. A computational vantage on maladies of the political mind offers enhanced precision in assessing their causes, consequences, and cures.

An object-oriented paradigm to improve transfer learning in RL, i.e., a sort of symbolic abstraction mechanism

Ofir Marom, Benjamin Rosman, Transferable dynamics models for efficient object-oriented reinforcement learning, Artificial Intelligence, Volume 329, 2024 DOI: 10.1016/j.artint.2024.104079.

The Reinforcement Learning (RL) framework offers a general paradigm for constructing autonomous agents that can make effective decisions when solving tasks. An important area of study within the field of RL is transfer learning, where an agent utilizes knowledge gained from solving previous tasks to solve a new task more efficiently. While the notion of transfer learning is conceptually appealing, in practice, not all RL representations are amenable to transfer learning. Moreover, much of the research on transfer learning in RL is purely empirical. Previous research has shown that object-oriented representations are suitable for the purposes of transfer learning with theoretical efficiency guarantees. Such representations leverage the notion of object classes to learn lifted rules that apply to grounded object instantiations. In this paper, we extend previous research on object-oriented representations and introduce two formalisms: the first is based on deictic predicates, and is used to learn a transferable transition dynamics model; the second is based on propositions, and is used to learn a transferable reward dynamics model. In addition, we extend previously introduced efficient learning algorithms for object-oriented representations to our proposed formalisms. Our frameworks are then combined into a single efficient algorithm that learns transferable transition and reward dynamics models across a domain of related tasks. We illustrate our proposed algorithm empirically on an extended version of the Taxi domain, as well as the more difficult Sokoban domain, showing the benefits of our approach with regards to efficient learning and transfer.
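The lifted-rule idea that makes object-oriented representations transferable can be shown with a toy example (illustrative only; the paper's deictic-predicate and propositional formalisms are considerably richer, and its rules are learned rather than hand-written as they are here).

```python
# Toy sketch of lifted rules in object-oriented RL: a rule is defined once
# per object *class* and applies to every grounded *instance*, which is what
# makes the learned dynamics transferable across tasks.
from dataclasses import dataclass

@dataclass
class Obj:
    cls: str     # object class, e.g. "taxi" or "passenger"
    attrs: dict  # grounded attributes, e.g. {"x": 2, "y": 3}

# Lifted rule: (class, action) -> (precondition, effect). Learned from data
# in the real algorithm; hand-written here for the example.
RULES = {
    ("taxi", "move_north"): (
        lambda o, world: o.attrs["y"] < world["height"] - 1,  # precondition
        lambda o: o.attrs.update(y=o.attrs["y"] + 1),         # effect
    ),
}

def apply_action(objects, action, world):
    """Apply a lifted rule to every instance whose class it covers."""
    for o in objects:
        rule = RULES.get((o.cls, action))
        if rule is not None:
            pre, eff = rule
            if pre(o, world):
                eff(o)
    return objects

# Usage: the same "move_north" rule transfers to any taxi in any grid size.
taxis = [Obj("taxi", {"x": 0, "y": 0}), Obj("taxi", {"x": 4, "y": 4})]
apply_action(taxis, "move_north", {"height": 5})
```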

A review of state-of-the-art path planning methods applied to autonomous driving

Mohamed Reda, Ahmed Onsy, Amira Y. Haikal, Ali Ghanbari, Path planning algorithms in the autonomous driving system: A comprehensive review, Robotics and Autonomous Systems, Volume 174, 2024 DOI: 10.1016/j.robot.2024.104630.

This comprehensive review focuses on the Autonomous Driving System (ADS), which aims to reduce the human errors that cause about 95% of car accidents. The ADS consists of six stages: sensors, perception, localization, assessment, path planning, and control. We explain the main state-of-the-art techniques used in each stage, analyzing 275 papers, with 162 specifically on path planning due to its complexity, NP-hard optimization nature, and pivotal role in ADS. This paper categorizes path planning techniques into three primary groups: traditional (graph-based, sampling-based, gradient-based, optimization-based, interpolation curve algorithms), machine and deep learning, and meta-heuristic optimization, detailing their advantages and drawbacks. Findings show that meta-heuristic optimization methods, representing 23% of our study, are preferred for being general problem solvers capable of handling complex problems, with faster convergence and reduced risk of local minima. Machine and deep learning techniques, accounting for 25%, are favored for their learning capabilities and fast responses to known scenarios. The trend towards hybrid algorithms (27%) combines various methods, merging the benefits of each algorithm while offsetting its drawbacks. Moreover, adaptive parameter tuning is crucial for enhancing efficiency and applicability and for balancing search capability. This review sheds light on the future of path planning in autonomous driving systems, helping to tackle current challenges and unlock the full capabilities of autonomous vehicles.
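To make the "traditional, graph-based" family concrete, here is a textbook A* planner on a 4-connected occupancy grid (a standard example, not taken from the review).

```python
# Minimal A* on a 4-connected grid, as a concrete instance of the
# graph-based path planning family discussed in the review.
import heapq

def astar(grid, start, goal):
    """grid: 2D list, 0 = free, 1 = obstacle; start/goal: (row, col)."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan
    frontier = [(h(start), 0, start, None)]  # (f, g, node, parent)
    parents, g_cost = {}, {start: 0}
    while frontier:
        _, g, node, parent = heapq.heappop(frontier)
        if node in parents:           # already expanded with a better cost
            continue
        parents[node] = parent
        if node == goal:              # reconstruct the path back to start
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        r, c = node
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) \
                    and grid[nr][nc] == 0 \
                    and g + 1 < g_cost.get(nxt, float("inf")):
                g_cost[nxt] = g + 1
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt, node))
    return None  # no path exists
```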

Integrating symbolic (common sense) reasoning and probabilistic planning (POMDPs) in robots

Shiqi Zhang, Piyush Khandelwal, Peter Stone, iCORPP: Interleaved commonsense reasoning and probabilistic planning on robots, Robotics and Autonomous Systems, Volume 174, 2024 DOI: 10.1016/j.robot.2023.104613.

Robot sequential decision-making in the real world is a challenge because it requires robots to simultaneously reason about the current world state and dynamics while planning actions to accomplish complex tasks. On the one hand, declarative languages and reasoning algorithms support representing and reasoning with commonsense knowledge, but these algorithms are not good at planning actions toward maximizing cumulative reward over a long, unspecified horizon. On the other hand, probabilistic planning frameworks, such as Markov decision processes (MDPs) and partially observable MDPs (POMDPs), support planning to achieve long-term goals under uncertainty, but they are ill-equipped to represent or reason about knowledge that is not directly related to actions. In this article, we present an algorithm, called iCORPP, to simultaneously estimate the current world state, reason about world dynamics, and construct task-oriented controllers. In this process, robot decision-making problems are decomposed into two interdependent (smaller) subproblems that focus on reasoning to “understand the world” and planning to “achieve the goal”, respectively. The developed algorithm has been implemented and evaluated both in simulation and on real robots using everyday service tasks, such as indoor navigation and dialog management. Results show significant improvements in scalability, efficiency, and adaptiveness, compared to competitive baselines including handcrafted action policies.
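The decomposition can be sketched schematically (an illustration, not the authors' implementation): a commonsense reasoner first estimates the state and shrinks the model to what is task-relevant, and only then does a probabilistic planner solve the reduced problem. The knowledge-base interface and the action set below are hypothetical.

```python
# Schematic sketch of the reason-then-plan decomposition described above
# (not the iCORPP implementation; the knowledge-base interface and the
# action set are hypothetical).
def reason_about_world(knowledge_base, observations):
    """Commonsense reasoning step: estimate the current world state and the
    task-relevant dynamics (e.g., 'the kitchen door is usually closed at
    night'), returning a compact model instead of the full world."""
    relevant_states = [s for s in knowledge_base["states"]
                       if knowledge_base["relevant"](s, observations)]
    transition_probs = knowledge_base["dynamics"](observations)
    return relevant_states, transition_probs

def plan(relevant_states, transition_probs, reward, gamma=0.95, iters=100):
    """Probabilistic planning step: value iteration over the *reduced* model.
    Assumes transition_probs(s, a) maps next states within the reduced model
    to probabilities; planning over the small model is what makes the
    decomposition scale."""
    V = {s: 0.0 for s in relevant_states}
    for _ in range(iters):
        for s in relevant_states:
            V[s] = max(
                sum(p * (reward(s, a, s2) + gamma * V[s2])
                    for s2, p in transition_probs(s, a).items())
                for a in ("move", "ask", "wait")  # hypothetical action set
            )
    return V
```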