Including a safety procedure in RL to keep physical agents within their constraints while learning

Kim Peter Wabersich, Melanie N. Zeilinger, A predictive safety filter for learning-based control of constrained nonlinear dynamical systems, Automatica, Volume 129, 2021. DOI: 10.1016/j.automatica.2021.109597.

The transfer of reinforcement learning (RL) techniques into real-world applications is challenged by safety requirements in the presence of physical limitations. Most RL methods, in particular the most popular algorithms, do not support explicit consideration of state and input constraints. In this paper, we address this problem for nonlinear systems with continuous state and input spaces by introducing a predictive safety filter, which is able to turn a constrained dynamical system into an unconstrained safe system to which any RL algorithm can be applied ‘out-of-the-box’. The predictive safety filter receives the proposed control input and decides, based on the current system state, whether it can be safely applied to the real system or whether it has to be modified. Safety is thereby established by a continuously updated safety policy, which is based on a model predictive control formulation using a data-driven system model and considering state- and input-dependent uncertainties.
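As a rough illustration of the filtering idea (not the paper's MPC formulation), the sketch below certifies a proposed RL action on a known 1D double integrator by rolling out a simple braking backup policy; the dynamics, horizon, and backup policy are all assumptions made for this toy.

```python
# A minimal safety-filter sketch: instead of solving the paper's MPC problem,
# this toy accepts a proposed action only if a braking backup policy can keep
# the (known) 1D double integrator inside its state constraints afterwards.
import numpy as np

DT, HORIZON = 0.1, 30
X_MAX, U_MAX = 1.0, 1.0            # state and input constraints

def step(state, u):
    """Double-integrator dynamics: state = (position, velocity)."""
    p, v = state
    return np.array([p + DT * v, v + DT * u])

def backup_action(state):
    """Backup policy: brake toward zero velocity."""
    return np.clip(-state[1] / DT, -U_MAX, U_MAX)

def is_safe(state):
    return abs(state[0]) <= X_MAX

def safety_filter(state, u_rl):
    """Apply u_rl if the backup policy can keep the system safe afterwards;
    otherwise substitute the backup action (the 'modified' input)."""
    u_rl = np.clip(u_rl, -U_MAX, U_MAX)
    probe = step(state, u_rl)
    for _ in range(HORIZON):                 # certify recoverability
        if not is_safe(probe):
            return backup_action(state)      # reject: use safe fallback
        probe = step(probe, backup_action(probe))
    return u_rl                              # accept the RL action

state = np.array([0.8, 0.5])
print(safety_filter(state, u_rl=1.0))  # likely rejected near the boundary
```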

Including attention mechanisms in Long Short-Term Memory networks

Lin, X., Zhong, G., Chen, K. et al., Attention-Augmented Machine Memory, Cogn Comput 13, 751–760 (2021). DOI: 10.1007/s12559-021-09854-5.

The attention mechanism plays an important role in the perception and cognition of human beings. Among others, many machine learning models have been developed to memorize sequential data, such as the Long Short-Term Memory (LSTM) network and its extensions. However, due to the lack of an attention mechanism, they cannot pay special attention to the important parts of the sequences. In this paper, we present a novel machine learning method called attention-augmented machine memory (AAMM). It seamlessly integrates the attention mechanism into the memory cell of LSTM. As a result, it enables the network to focus on valuable information in the sequences and ignore irrelevant information during its learning. We have conducted experiments on two sequence classification tasks, for pattern classification and sentiment analysis, respectively. The experimental results demonstrate the advantages of AAMM over LSTM and some other related approaches. Hence, AAMM can be considered a substitute for LSTM in sequence learning applications.
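The abstract does not spell out the AAMM equations, so the following is only a generic sketch of attention over an LSTM's past cell states in PyTorch; the 50/50 mixing of the current cell state with the attended summary is an invented placeholder, not the paper's formulation.

```python
# A rough sketch of attention integrated into LSTM memory (assumed scheme):
# at each step, the cell state is mixed with an attention-weighted summary
# of the past cell states, so important parts of the sequence persist.
import torch, torch.nn as nn

class AttentiveLSTMCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        self.query = nn.Linear(hidden_size, hidden_size)

    def forward(self, x_seq):
        B, T, _ = x_seq.shape
        H = self.cell.hidden_size
        h = x_seq.new_zeros(B, H)
        c = x_seq.new_zeros(B, H)
        memory = []                                       # past cell states
        for t in range(T):
            h, c = self.cell(x_seq[:, t], (h, c))
            if memory:
                M = torch.stack(memory, dim=1)            # (B, t, H)
                scores = torch.einsum('bh,bth->bt', self.query(h), M)
                w = torch.softmax(scores, dim=1)
                c = 0.5 * c + 0.5 * torch.einsum('bt,bth->bh', w, M)
            memory.append(c)
        return h                                          # final summary state

out = AttentiveLSTMCell(8, 16)(torch.randn(4, 10, 8))
print(out.shape)   # torch.Size([4, 16])
```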

Mixing logical planning with NNs for decision making

Zuo, G., Pan, T., Zhang, T. et al., SOAR Improved Artificial Neural Network for Multistep Decision-making Tasks, Cogn Comput 13, 612–625 (2021). DOI: 10.1007/s12559-020-09716-6.

Recently, artificial neural networks (ANNs) have been applied to various robot-related research areas due to their powerful spatial feature abstraction and temporal information prediction abilities. Decision-making has also played a fundamental role in the research area of robotics. How to improve ANNs with the characteristics of decision-making is a challenging research issue. ANNs are connectionist models, which means they are naturally weak in long-term planning, logical reasoning, and multistep decision-making. Considering that a small refinement of the inner network structures of ANNs will usually lead to exponentially growing data costs, an additional planning module seems necessary for the further improvement of ANNs, especially for small data learning. In this paper, we propose a state operator and result (SOAR) improved ANN (SANN) model, which takes advantage of both the long-term cognitive planning ability of SOAR and the powerful feature detection ability of ANNs. It mimics the cognitive mechanism of the human brain to improve the traditional ANN with an additional logical planning module. In addition, a data fusion module is constructed to convert the logical sequences from SOAR into a probability vector and combine it with the original data feature array of the ANN. The proposed architecture is validated in two types of robot multistep decision-making experiments for a grasping task: a multiblock simulated experiment and a multicup experiment in a real scenario. The experimental results show the efficiency and high accuracy of our proposed architecture. The integration of SOAR and ANN is a good compromise between logical planning with small data and probabilistic classification with big data. It also has strong potential for more complicated tasks that require robust classification, long-term planning, and fast learning. Some potential applications include recognition of grasping order in multi-object environments and cooperative grasping by multiple agents.
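One plausible reading of the fusion module (an assumption, since the abstract gives no equations) is a simple concatenation of the SOAR probability vector with the ANN feature array ahead of the final classifier:

```python
# A minimal sketch of the kind of fusion the abstract describes (details are
# assumptions): concatenate the symbolic planner's probability vector over
# candidate actions with the network's feature array, then classify.
import torch, torch.nn as nn

class FusionHead(nn.Module):
    def __init__(self, feat_dim, n_actions):
        super().__init__()
        self.out = nn.Linear(feat_dim + n_actions, n_actions)

    def forward(self, features, planner_probs):
        # planner_probs: probabilities SOAR assigns to each candidate action
        fused = torch.cat([features, planner_probs], dim=-1)
        return self.out(fused)

head = FusionHead(feat_dim=128, n_actions=5)
logits = head(torch.randn(2, 128), torch.full((2, 5), 0.2))
print(logits.shape)   # torch.Size([2, 5])
```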

Classical task planning at an abstract level for achieving good low-level motion planning under uncertainty

Antony Thomas, Fulvio Mastrogiovanni, Marco Baglietto, MPTP: Motion-planning-aware task planning for navigation in belief space, Robotics and Autonomous Systems, Volume 141, 2021. DOI: 10.1016/j.robot.2021.103786.

We present an integrated Task-Motion Planning (TMP) framework for navigation in large-scale environments. Of late, TMP for manipulation has attracted significant interest, resulting in a proliferation of different approaches. In contrast, TMP for navigation has received considerably less attention. Autonomous robots operating in real-world complex scenarios require planning in the discrete (task) space and the continuous (motion) space. In knowledge-intensive domains, on the one hand, a robot has to reason at the highest level, for example, about the objects to procure and the regions to navigate to in order to acquire them; on the other hand, the feasibility of the respective navigation tasks has to be checked at the execution level. This presents a need for motion-planning-aware task planners. In this paper, we discuss a probabilistically complete approach that leverages this task-motion interaction for navigating in large knowledge-intensive domains, returning a plan that is optimal at the task level. The framework is intended for motion planning under motion and sensing uncertainty, which is formally known as belief space planning. The underlying methodology is validated in simulation in an office environment, and its scalability is tested in the larger Willow Garage world. A comparison with the work that is closest to our approach is also provided. We also demonstrate the adaptability of our approach by considering a building floor navigation domain. Finally, we discuss the limitations of our approach and put forward suggestions for improvements and future work.
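A schematic of the task-motion interaction loop, much simplified relative to MPTP: enumerate task-level plans in cost order and keep the cheapest one whose navigation edges all pass a (here, stubbed-out) belief-space feasibility check. The edge costs and feasibility rule are invented for the toy.

```python
# Toy task-motion interaction: the task planner proposes visit orders, the
# motion level vetoes edges that are infeasible under uncertainty, and the
# cheapest surviving plan wins (task-level optimality among feasible plans).
from itertools import permutations

EDGE_COST = {('start', 'A'): 2, ('start', 'B'): 3,
             ('A', 'B'): 1, ('B', 'A'): 1}

def motion_feasible(a, b):
    """Stand-in for a belief-space motion planner; here, a fixed lookup of
    edges whose expected collision risk would be acceptable."""
    return (a, b) != ('start', 'A')      # pretend this edge is too risky

def plan(goals):
    candidates = []
    for order in permutations(goals):
        route = ['start', *order]
        edges = list(zip(route, route[1:]))
        cost = sum(EDGE_COST[e] for e in edges)
        candidates.append((cost, edges))
    for cost, edges in sorted(candidates):   # cheapest feasible plan first
        if all(motion_feasible(a, b) for a, b in edges):
            return cost, edges
    return None

print(plan(['A', 'B']))   # (4, [('start', 'B'), ('B', 'A')])
```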

Physiological bases of navigation

Eva Zita Patai, Hugo J. Spiers, The Versatile Wayfinder: Prefrontal Contributions to Spatial Navigation, Trends in Cognitive Sciences, Volume 25, Issue 6, 2021, Pages 520-533. DOI: 10.1016/j.tics.2021.02.010.

The prefrontal cortex (PFC) supports decision-making, goal tracking, and planning. Spatial navigation is a behavior that taxes these cognitive processes, yet the role of the PFC in models of navigation has been largely overlooked. In humans, activity in the dorsolateral PFC (dlPFC) and ventrolateral PFC (vlPFC) during detours reveals a role in inhibition and replanning. The dorsal anterior cingulate cortex (dACC) is implicated in planning and spontaneous, internally generated changes of route. The orbitofrontal cortex (OFC) integrates representations of the environment with the value of actions, providing a ‘map’ of possible decisions. In rodents, medial frontal areas interact with the hippocampus during spatial decisions and switching between navigation strategies. In reviewing these advances, we provide a framework for how different prefrontal regions may contribute to different stages of navigation.

Generating counterfactual explanations of deep RL decisions to identify flawed agents

Matthew L. Olson, Roli Khanna, Lawrence Neal, Fuxin Li, Weng-Keen Wong, Counterfactual state explanations for reinforcement learning agents via generative deep learning, Artificial Intelligence, Volume 295, 2021. DOI: 10.1016/j.artint.2021.103455.

Counterfactual explanations, which deal with “why not?” scenarios, can provide insightful explanations of an AI agent’s behavior (Miller [38]). In this work, we focus on generating counterfactual explanations for deep reinforcement learning (RL) agents which operate in visual input environments like Atari. We introduce counterfactual state explanations, a novel example-based approach to counterfactual explanations based on generative deep learning. Specifically, a counterfactual state illustrates what minimal change is needed to an Atari game image such that the agent chooses a different action. We also evaluate the effectiveness of counterfactual states on human participants who are not machine learning experts. Our first user study investigates whether humans can discern if the counterfactual state explanations are produced by the actual game or by a generative deep learning approach. Our second user study investigates whether counterfactual state explanations can help non-expert participants identify a flawed agent; we compare against a baseline approach based on a nearest-neighbor explanation which uses images from the actual game. Our results indicate that counterfactual state explanations have sufficient fidelity to the actual game images to enable non-experts to identify a flawed RL agent more effectively than both the nearest-neighbor baseline and having no explanation at all.
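The paper synthesizes counterfactual states with a deep generative model; as a far simpler illustration of the underlying objective, the sketch below perturbs an input directly by gradient descent until a toy policy's argmax action flips, while penalizing the size of the change. The network, target action, and penalty weight are all arbitrary choices for the demo.

```python
# Simplified counterfactual-state objective: find a small change to the
# state that makes the policy prefer a different ("why not?") action.
import torch, torch.nn as nn

policy = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
state = torch.randn(1, 16)
target_action = 2                        # the alternative action to explain

x = state.clone().requires_grad_(True)
opt = torch.optim.Adam([x], lr=0.05)
for _ in range(200):
    logits = policy(x)
    loss = nn.functional.cross_entropy(logits, torch.tensor([target_action]))
    loss = loss + 0.1 * (x - state).pow(2).sum()  # stay close to the original
    opt.zero_grad(); loss.backward(); opt.step()
    if logits.argmax().item() == target_action:   # action flipped: stop
        break

print('counterfactual delta norm:', (x - state).norm().item())
```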

POMDPs to combine human semantic sensing with robot sensing

Luke Burks, Nisar Ahmed, Ian Loefgren, Luke Barbier, Jeremy Muesing, Jamison McGinley, Sousheel Vunnam, Collaborative human-autonomy semantic sensing through structured POMDP planning, Robotics and Autonomous Systems, Volume 140, 2021. DOI: 10.1016/j.robot.2021.103753.

Autonomous unmanned systems and robots must be able to actively leverage all available information sources — including imprecise but readily available semantic observations provided by human collaborators. This work develops and validates a novel active collaborative human–machine sensing solution for robotic information gathering and optimal decision-making problems, with an example implementation of a dynamic target search scenario. Our approach uses continuous partially observable Markov decision process (CPOMDP) planning to generate vehicle trajectories that optimally exploit imperfect detection data from onboard sensors, as well as semantic natural language observations that can be specifically requested from human sensors. The key innovations are a method for the inclusion of a human querying/sensing model in a CPOMDP-based autonomous decision-making process, as well as a scalable hierarchical Gaussian mixture model formulation for efficiently solving CPOMDPs with semantic observations in continuous dynamic state spaces. Unlike previous state-of-the-art approaches, this allows planning in large, complex, highly segmented environments. Our solution is demonstrated and validated with a real human–robot team engaged in dynamic indoor target search and capture scenarios on a custom testbed.
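A toy version of folding a human "semantic sensor" into a belief update (the paper's CPOMDP and Gaussian-mixture machinery is far richer): the robot asks whether the target is inside a region and performs a Bayes update of a discrete grid belief under an assumed yes/no reliability model.

```python
# Bayes update of a target-location belief from a noisy human answer to
# "is the target inside region R?" (region, grid size, and reliability are
# assumptions for this sketch).
import numpy as np

belief = np.full(10, 0.1)                  # uniform belief over 10 cells
region = np.zeros(10); region[3:6] = 1.0   # cells covered by the question
P_CORRECT = 0.9                            # assumed human reliability

def update(belief, answer_yes):
    like = np.where(region == 1, P_CORRECT, 1 - P_CORRECT)
    if not answer_yes:
        like = 1 - like
    post = belief * like
    return post / post.sum()               # normalized posterior

print(update(belief, answer_yes=True).round(3))
```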

Studying magicians' tricks to understand decision-making and how to influence it

Alice Pailhès, Gustav Kuhn, Mind Control Tricks: Magicians’ Forcing and Free Will, Trends in Cognitive Sciences, Volume 25, Issue 5, 2021, Pages 338-341. DOI: 10.1016/j.tics.2021.02.001.

A new research program has recently emerged that investigates magicians’ mind control tricks, also called forces. This research highlights the psychological processes that underpin decision-making, illustrates the ease by which our decisions can be covertly influenced, and helps answer questions about our sense of free will and agency over choices.

Improving POMDP solving efficiency by eliminating variables in the state structure

Eric A. Hansen, An integrated approach to solving influence diagrams and finite-horizon partially observable decision processes, Artificial Intelligence, Volume 294, 2021. DOI: 10.1016/j.artint.2020.103431.

We show how to integrate a variable elimination approach to solving influence diagrams with a value iteration approach to solving finite-horizon partially observable Markov decision processes (POMDPs). The integration of these approaches creates a variable elimination algorithm for influence diagrams that has much more relaxed constraints on elimination order, which allows improved scalability in many cases. The new algorithm can also be viewed as a generalization of the value iteration algorithm for POMDPs that solves non-Markovian as well as Markovian problems, in addition to leveraging a factored representation for improved efficiency. The development of a single algorithm that integrates and generalizes both of these classic algorithms, one for influence diagrams and the other for POMDPs, unifies these two approaches to solving Bayesian decision problems in a way that combines their complementary advantages.
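For context, here is a bare-bones finite-horizon POMDP value-iteration backup over alpha-vectors, the classical building block the paper generalizes, run on a tiny invented two-state problem with exact enumeration and no pruning:

```python
# Exact alpha-vector backup for a finite-horizon POMDP: each new vector is
# R[a] plus, for each observation, the propagated value of one chosen
# future alpha-vector. Model numbers below are arbitrary.
import numpy as np
from itertools import product

S, A, O = 2, 2, 2
T = np.array([[[1, 0], [0, 1]], [[1, 0], [0, 1]]], float)  # T[a, s, s']
Z = np.array([[[0.85, 0.15], [0.15, 0.85]]] * 2)           # Z[a, s', o]
R = np.array([[-1, -1], [10, -100]], float)                # R[a, s]

def backup(alphas):
    new = []
    for a in range(A):
        # choose one future alpha-vector per observation
        for choice in product(range(len(alphas)), repeat=O):
            vec = R[a].copy()
            for o, k in enumerate(choice):
                vec += (T[a] * Z[a][:, o]).dot(alphas[k])
            new.append(vec)
    return new

alphas = [np.zeros(S)]      # horizon-0 value function
for _ in range(3):          # three backups = horizon 3
    alphas = backup(alphas)
print(len(alphas), 'alpha-vectors')   # grows fast without pruning
```

The exponential growth in vectors printed at the end is exactly why factored representations and relaxed elimination orders, as pursued in the paper, matter for scalability.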

Cubature Kalman Filter (a fixed-point representation of uncertainties, as in the UKF)

Juan-Carlos Santos-León, Ramón Orive, Daniel Acosta, Leopoldo Acosta, The Cubature Kalman Filter revisited, Automatica, Volume 127, 2021. DOI: 10.1016/j.automatica.2021.109541.

In this paper, the construction and effectiveness of the so-called Cubature Kalman Filter (CKF) are revisited, as well as its extensions to higher degrees of precision. In this sense, some cubature rules that are stable with respect to the dimension and have a quasi-optimal number of nodes are built, and their numerical performance is checked in comparison with other known formulas. All these cubature rules are suitably placed in the mathematical framework of numerical integration in several variables. A method based on the discretization of higher-order partial derivatives by certain divided differences is used to provide stable rules of degrees d=5 and d=7, though it can also be applied for higher dimensions. The application of these old and new formulas to the filter algorithm is tested by means of some examples.
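For reference, the standard degree-3 spherical-radial rule that underlies the basic CKF (the paper goes on to build stable higher-degree rules) uses 2n points at ±√n·e_i with equal weights 1/(2n), pushed through a Cholesky factor of the covariance:

```python
# Degree-3 cubature rule of the CKF: propagate a Gaussian (mean, cov)
# through a nonlinearity f using 2n equally weighted points.
import numpy as np

def cubature_points(mean, cov):
    n = len(mean)
    L = np.linalg.cholesky(cov)
    units = np.sqrt(n) * np.concatenate([np.eye(n), -np.eye(n)])
    pts = mean + units @ L.T                 # 2n cubature points
    return pts, np.full(2 * n, 1 / (2 * n))  # equal weights

def propagate(f, mean, cov):
    pts, w = cubature_points(mean, cov)
    ys = np.array([f(p) for p in pts])
    y_mean = w @ ys
    y_cov = (ys - y_mean).T @ np.diag(w) @ (ys - y_mean)
    return y_mean, y_cov

f = lambda x: np.array([np.sin(x[0]), x[1] ** 2])
print(propagate(f, np.zeros(2), np.eye(2)))
```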