Graph NNs in RL for improving sample efficiency

Feng Zhang, Chengbin Xuan, Hak-Keung Lam, An obstacle avoidance-specific reinforcement learning method based on fuzzy attention mechanism and heterogeneous graph neural networks, Engineering Applications of Artificial Intelligence, Volume 130, 2024 DOI: 10.1016/j.engappai.2023.107764.

Deep reinforcement learning (RL) is an advancing learning tool to handle robotics control problems. However, it typically suffers from sample efficiency and effectiveness. The emergence of Graph Neural Networks (GNNs) enables the integration of the RL and graph representation learning techniques. It realises outstanding training performance and transfer capability by forming controlling scenarios into the corresponding graph domain. Nevertheless, the existing approaches strongly depend on the artificial graph formation processes with intensive bias and cannot propagate messages discriminatively on explicit physical dependence, which leads to restricted flexibility, size transfer capability and suboptimal performance. This paper proposes a fuzzy attention mechanism-based heterogeneous graph neural network (FAM-HGNN) framework for resolving the control problem under the RL context. FAM emphasises the significant connections and weakening of the trivial connections in a fully connected graph, which mitigates the potential negative influence caused by the artificial graph formation process. HGNN obtains a higher level of relational inductive bias by conducting graph propagations on a masked graph. Experimental results show that our FAM-HGNN outperforms the multi-layer perceptron-based and the existing GNN-based RL approaches regarding training performance and size transfer capability. We also conducted an ablation study and sensitivity analysis to validate the efficacy of the proposed method further.

Using RL as a framework to study political issues

Lion Schulz, Rahul Bhui, Political reinforcement learners, Trends in Cognitive Sciences, Volume 28, Issue 3, 2024, Pages 210-222 DOI: 10.1016/j.tics.2023.12.001.

Politics can seem home to the most calculating and yet least rational elements of humanity. How might we systematically characterize this spectrum of political cognition? Here, we propose reinforcement learning (RL) as a unified framework to dissect the political mind. RL describes how agents algorithmically navigate complex and uncertain domains like politics. Through this computational lens, we outline three routes to political differences, stemming from variability in agents\u2019 conceptions of a problem, the cognitive operations applied to solve the problem, or the backdrop of information available from the environment. A computational vantage on maladies of the political mind offers enhanced precision in assessing their causes, consequences, and cures.

Object oriented paradigm to improve transfer learning in RL, i.e., a sort of symbolic abstraction mechanism

Ofir Marom, Benjamin Rosman, Transferable dynamics models for efficient object-oriented reinforcement learning, Robotics and Autonomous Systems, Volume 174, 2024 DOI: 10.1016/j.artint.2024.104079.

The Reinforcement Learning (RL) framework offers a general paradigm for constructing autonomous agents that can make effective decisions when solving tasks. An important area of study within the field of RL is transfer learning, where an agent utilizes knowledge gained from solving previous tasks to solve a new task more efficiently. While the notion of transfer learning is conceptually appealing, in practice, not all RL representations are amenable to transfer learning. Moreover, much of the research on transfer learning in RL is purely empirical. Previous research has shown that object-oriented representations are suitable for the purposes of transfer learning with theoretical efficiency guarantees. Such representations leverage the notion of object classes to learn lifted rules that apply to grounded object instantiations. In this paper, we extend previous research on object-oriented representations and introduce two formalisms: the first is based on deictic predicates, and is used to learn a transferable transition dynamics model; the second is based on propositions, and is used to learn a transferable reward dynamics model. In addition, we extend previously introduced efficient learning algorithms for object-oriented representations to our proposed formalisms. Our frameworks are then combined into a single efficient algorithm that learns transferable transition and reward dynamics models across a domain of related tasks. We illustrate our proposed algorithm empirically on an extended version of the Taxi domain, as well as the more difficult Sokoban domain, showing the benefits of our approach with regards to efficient learning and transfer.

A review of state-of-the-art path planning methods applied to autonomous driving

Mohamed Reda, Ahmed Onsy, Amira Y. Haikal, Ali Ghanbari, Path planning algorithms in the autonomous driving system: A comprehensive review, Robotics and Autonomous Systems, Volume 174, 2024 DOI: 10.1016/j.robot.2024.104630.

This comprehensive review focuses on the Autonomous Driving System (ADS), which aims to reduce human errors that are the reason for about 95% of car accidents. The ADS consists of six stages: sensors, perception, localization, assessment, path planning, and control. We explain the main state-of-the-art techniques used in each stage, analyzing 275 papers, with 162 specifically on path planning due to its complexity, NP-hard optimization nature, and pivotal role in ADS. This paper categorizes path planning techniques into three primary groups: traditional (graph-based, sampling-based, gradient-based, optimization-based, interpolation curve algorithms), machine and deep learning, and meta-heuristic optimization, detailing their advantages and drawbacks. Findings show that meta-heuristic optimization methods, representing 23% of our study, are preferred for being general problem solvers capable of handling complex problems. In addition, they have faster convergence and reduced risk of local minima. Machine and deep learning techniques, accounting for 25%, are favored for their learning capabilities and fast responses to known scenarios. The trend towards hybrid algorithms (27%) combines various methods, merging each algorithm’s benefits and overcoming the other’s drawbacks. Moreover, adaptive parameter tuning is crucial to enhance efficiency, applicability, and balancing the search capability. This review sheds light on the future of path planning in autonomous driving systems, helping to tackle current challenges and unlock the full capabilities of autonomous vehicles.

Integrating symbolic (common sense) reasoning and probabilistic planning (POMDPs) in robots

Shiqi Zhang, Piyush Khandelwal, Peter Stone, iCORPP: Interleaved commonsense reasoning and probabilistic planning on robots, Robotics and Autonomous Systems, Volume 174, 2024 DOI: 10.1016/j.robot.2023.104613.

Robot sequential decision-making in the real world is a challenge because it requires the robots to simultaneously reason about the current world state and dynamics, while planning actions to accomplish complex tasks. On the one hand, declarative languages and reasoning algorithms support representing and reasoning with commonsense knowledge. But these algorithms are not good at planning actions toward maximizing cumulative reward over a long, unspecified horizon. On the other hand, probabilistic planning frameworks, such as Markov decision processes (MDPs) and partially observable MDPs (POMDPs), support planning to achieve long-term goals under uncertainty. But they are ill-equipped to represent or reason about knowledge that is not directly related to actions. In this article, we present an algorithm, called iCORPP, to simultaneously estimate the current world state, reason about world dynamics, and construct task-oriented controllers. In this process, robot decision-making problems are decomposed into two interdependent (smaller) subproblems that focus on reasoning to “understand the world” and planning to “achieve the goal” respectively. The developed algorithm has been implemented and evaluated both in simulation and on real robots using everyday service tasks, such as indoor navigation, and dialog management. Results show significant improvements in scalability, efficiency, and adaptiveness, compared to competitive baselines including handcrafted action policies.

Clock synchronization in the CAN bus

M. Akp\u0131nar and K. W. Schmidt, Predictable Timestamping for the Controller Area Network: Evaluation and Effect on Clock Synchronization Accuracy, IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 54, no. 3, pp. 1926-1935, March 2024 DOI: 10.1109/TSMC.2023.3332559.

Accurate timestamps are important for clock synchronization (CS) and cyber-security on the controller area network (CAN). This article proposes a new predictable timestamping (TS) method on CAN. Different from existing TS methods, our method reduces the effect of uncertainties that are caused by the CAN bit timing, oscillator drifts, and different cable lengths. Accordingly, our TS method provides an improved TS quality, which is confirmed in comprehensive hardware experiments. We further show the positive impact of our TS method on CS for CAN with clock accuracies below 100 ns.

Improving sample efficiency of RL through memory reconstruction

Y. Kang et al., Sample Efficient Reinforcement Learning Using Graph-Based Memory Reconstruction, IEEE Transactions on Artificial Intelligence, vol. 5, no. 2, pp. 751-762, Feb. 2024 DOI: 10.1109/TAI.2023.3268612.

Reinforcement learning (RL) algorithms typically require orders of magnitude more interactions than humans to learn effective policies. Research on memory in neuroscience suggests that humans’ learning efficiency benefits from associating their experiences and reconstructing potential events. Inspired by this finding, we introduce a human brainlike memory structure for agents and build a general learning framework based on this structure to improve the RL sampling efficiency. Since this framework is similar to the memory reconstruction process in psychology, we name the newly proposed RL framework as graph-based memory reconstruction (GBMR). In particular, GBMR first maintains an attribute graph on the agent’s memory and then retrieves its critical nodes to build and update potential paths among these nodes. This novel pipeline drives the RL agent to learn faster with its memory-enhanced value functions and reduces interactions with the environment by reconstructing its valuable paths. Extensive experimental analyses and evaluations in the grid maze and some challenging Atari environments demonstrate GBMRs superiority over traditional RL methods. We will release the source code and trained models to facilitate further studies in this research direction.

Correcting systematic and non-systematic errors in odometry

Bibiana Fari�a, Jonay Toledo, Leopoldo Acosta, Improving odometric sensor performance by real-time error processing and variable covariance, Mechatronics, Volume 98, 2024 DOI: 10.1016/j.mechatronics.2023.103123.

This paper presents a new method to increase odometric sensor accuracy by systematic and non-systematic errors processing. Mobile robot localization is improved combining this technique with a filter that fuses the information from several sensors characterized by their covariance. The process focuses on calculating the odometric speed difference with respect to the filter to implement an error type detection module in real time. The correction of systematic errors consists in an online parameter adjustment using the previous information and conditioned by the filter accuracy. This data is also applied to design a variable odometric covariance which describes the sensor reliability and determines the influence of both errors on the robot localization. The method is implemented in a low-cost autonomous wheelchair with a LIDAR, IMU and encoders fused by an UKF algorithm. The experimental results prove that the estimated poses are closer to the real ones than using other well-known previous methods.

Survey on methods for learning from demonstration in robotics

M. Tavassoli, S. Katyara, M. Pozzi, N. Deshpande, D. G. Caldwell and D. Prattichizzo, Learning Skills From Demonstrations: A Trend From Motion Primitives to Experience Abstraction, IEEE Transactions on Cognitive and Developmental Systems, vol. 16, no. 1, pp. 57-74, Feb. 20248 DOI: 10.1109/TCDS.2023.3296166.

The uses of robots are changing from static environments in factories to encompass novel concepts such as human\u2013robot collaboration in unstructured settings. Preprogramming all the functionalities for robots becomes impractical, and hence, robots need to learn how to react to new events autonomously, just like humans. However, humans, unlike machines, are naturally skilled in responding to unexpected circumstances based on either experiences or observations. Hence, embedding such anthropoid behaviors into robots entails the development of neuro-cognitive models that emulate motor skills under a robot learning paradigm. Effective encoding of these skills is bound to the proper choice of tools and techniques. This survey paper studies different motion and behavior learning methods ranging from movement primitives (MPs) to experience abstraction (EA), applied to different robotic tasks. These methods are scrutinized and then experimentally benchmarked by reconstructing a standard pick-n-place task. Apart from providing a standard guideline for the selection of strategies and algorithms, this article aims to draw a perspective on their possible extensions and improvements.

On the complexities of RL when it confronts the real (natural) world

Toby Wise, Kara Emery, Angela Radulescu, Naturalistic reinforcement learning, Trends in Cognitive Sciences, Volume 28, Issue 2, 2024, Pages 144-158 DOI: 10.1016/j.tics.2023.08.016.

Humans possess a remarkable ability to make decisions within real-world environments that are expansive, complex, and multidimensional. Human cognitive computational neuroscience has sought to exploit reinforcement learning (RL) as a framework within which to explain human decision-making, often focusing on constrained, artificial experimental tasks. In this article, we review recent efforts that use naturalistic approaches to determine how humans make decisions in complex environments that better approximate the real world, providing a clearer picture of how humans navigate the challenges posed by real-world decisions. These studies purposely embed elements of naturalistic complexity within experimental paradigms, rather than focusing on simplification, generating insights into the processes that likely underpin humans\u2019 ability to navigate complex, multidimensional real-world environments so successfully.