Category Archives: Cognitive Sciences

Mixing logical planning with NNs for decision making

Zuo, G., Pan, T., Zhang, T. et al., SOAR Improved Artificial Neural Network for Multistep Decision-making Tasks, . Cogn Comput 13, 612–625 (2021) DOI: 10.1007/s12559-020-09716-6.

Recently, artificial neural networks (ANNs) have been applied to various robot-related research areas due to their powerful spatial feature abstraction and temporal information prediction abilities. Decision-making has also played a fundamental role in the research area of robotics. How to improve ANNs with the characteristics of decision-making is a challenging research issue. ANNs are connectionist models, which means they are naturally weak in long-term planning, logical reasoning, and multistep decision-making. Considering that a small refinement of the inner network structures of ANNs will usually lead to exponentially growing data costs, an additional planning module seems necessary for the further improvement of ANNs, especially for small data learning. In this paper, we propose a state operator and result (SOAR) improved ANN (SANN) model, which takes advantage of both the long-term cognitive planning ability of SOAR and the powerful feature detection ability of ANNs. It mimics the cognitive mechanism of the human brain to improve the traditional ANN with an additional logical planning module. In addition, a data fusion module is constructed to combine the probability vector obtained by SOAR planning and the original data feature array. A data fusion module is constructed to convert the information from the logical sequences in SOAR to the probabilistic vector in ANNs. The proposed architecture is validated in two types of robot multistep decision-making experiments for a grasping task: a multiblock simulated experiment and a multicup experiment in a real scenario. The experimental results show the efficiency and high accuracy of our proposed architecture. The integration of SOAR and ANN is a good compromise between logical planning with small data and probabilistic classification with big data. It also has strong potential for more complicated tasks that require robust classification, long-term planning, and fast learning. Some potential applications include recognition of grasping order in multiobject environment and cooperative grasping of multiagents.

Physiological bases of navigation

Eva Zita Patai, Hugo J. Spiers, The Versatile Wayfinder: Prefrontal Contributions to Spatial Navigation, . Trends in Cognitive Sciences, Volume 25, Issue 6, 2021, Pages 520-533 DOI: 10.1016/j.tics.2021.02.010.

The prefrontal cortex (PFC) supports decision-making, goal tracking, and planning. Spatial navigation is a behavior that taxes these cognitive processes, yet the role of the PFC in models of navigation has been largely overlooked. In humans, activity in dorsolateral PFC (dlPFC) and ventrolateral PFC (vlPFC) during detours, reveal a role in inhibition and replanning. Dorsal anterior cingulate cortex (dACC) is implicated in planning and spontaneous internally-generated changes of route. Orbitofrontal cortex (OFC) integrates representations of the environment with the value of actions, providing a ‘map’ of possible decisions. In rodents, medial frontal areas interact with hippocampus during spatial decisions and switching between navigation strategies. In reviewing these advances, we provide a framework for how different prefrontal regions may contribute to different stages of navigation.

Generating contrafactual explanations of Deep RL decisions to identify flawed agents

Matthew L. Olson, Roli Khanna, Lawrence Neal, Fuxin Li, Weng-Keen Wong, Counterfactual state explanations for reinforcement learning agents via generative deep learning, . Artificial Intelligence, Volume 295, 2021 DOI: 10.1016/j.artint.2021.103455.

Counterfactual explanations, which deal with “why not?” scenarios, can provide insightful explanations to an AI agent’s behavior [Miller [38]]. In this work, we focus on generating counterfactual explanations for deep reinforcement learning (RL) agents which operate in visual input environments like Atari. We introduce counterfactual state explanations, a novel example-based approach to counterfactual explanations based on generative deep learning. Specifically, a counterfactual state illustrates what minimal change is needed to an Atari game image such that the agent chooses a different action. We also evaluate the effectiveness of counterfactual states on human participants who are not machine learning experts. Our first user study investigates if humans can discern if the counterfactual state explanations are produced by the actual game or produced by a generative deep learning approach. Our second user study investigates if counterfactual state explanations can help non-expert participants identify a flawed agent; we compare against a baseline approach based on a nearest neighbor explanation which uses images from the actual game. Our results indicate that counterfactual state explanations have sufficient fidelity to the actual game images to enable non-experts to more effectively identify a flawed RL agent compared to the nearest neighbor baseline and to having no explanation at all.

Studying magician tricks to understand decision making and how to influence it

Alice Pailhès, Gustav Kuhn, Mind Control Tricks: Magicians’ Forcing and Free Will, . Trends in Cognitive Sciences, Volume 25, Issue 5, 2021, Pages 338-341 DOI: 10.1016/j.tics.2021.02.001.

A new research program has recently emerged that investigates magicians’ mind control tricks, also called forces. This research highlights the psychological processes that underpin decision-making, illustrates the ease by which our decisions can be covertly influenced, and helps answer questions about our sense of free will and agency over choices.

Improving POMDP solving efficiency by eliminating variables in the state structure

Eric A. Hansen, An integrated approach to solving influence diagrams and finite-horizon partially observable decision processes, . Artificial Intelligence, Volume 294, 2021 DOI: 10.1016/j.artint.2020.103431.

We show how to integrate a variable elimination approach to solving influence diagrams with a value iteration approach to solving finite-horizon partially observable Markov decision processes (POMDPs). The integration of these approaches creates a variable elimination algorithm for influence diagrams that has much more relaxed constraints on elimination order, which allows improved scalability in many cases. The new algorithm can also be viewed as a generalization of the value iteration algorithm for POMDPs that solves non-Markovian as well as Markovian problems, in addition to leveraging a factored representation for improved efficiency. The development of a single algorithm that integrates and generalizes both of these classic algorithms, one for influence diagrams and the other for POMDPs, unifies these two approaches to solving Bayesian decision problems in a way that combines their complementary advantages.

Model-based (on ordinary differential equations) and partially model-free Policy Iteration on continuous space and time

Jaeyoung Lee, Richard S. Sutton, Policy iterations for reinforcement learning problems in continuous time and space — Fundamental theory and methods, . Automatica, Volume 126, 2021 DOI: 10.1016/j.automatica.2020.109421.

Policy iteration (PI) is a recursive process of policy evaluation and improvement for solving an optimal decision-making/control problem, or in other words, a reinforcement learning (RL) problem. PI has also served as the fundamental for developing RL methods. In this paper, we propose two PI methods, called differential PI (DPI) and integral PI (IPI), and their variants, for a general RL framework in continuous time and space (CTS), where the environment is modeled by a system of ordinary differential equations (ODEs). The proposed methods inherit the current ideas of PI in classical RL and optimal control and theoretically support the existing RL algorithms in CTS: TD-learning and value-gradient-based (VGB) greedy policy update. We also provide case studies including (1) discounted RL and (2) optimal control tasks. Fundamental mathematical properties – admissibility, uniqueness of the solution to the Bellman equation (BE), monotone improvement, convergence, and optimality of the solution to the Hamilton–Jacobi–Bellman equation (HJBE) – are all investigated in-depth and improved from the existing theory, along with the general and case studies. Finally, the proposed ones are simulated with an inverted-pendulum model and their model-based and partially model-free implementations to support the theory and further investigate them beyond.

Formalization of “making sense” of sensory perceptions and use in several practical cases that compare favourably, because of the use of induction, to neural network approaches

Richard Evans, José Hernández-Orallo, Johannes Welbl, Pushmeet Kohli, Marek Sergot, Making sense of sensory input, . Artificial Intelligence, Volume 293, 2021 DOI: 10.1016/j.artint.2020.103438.

This paper attempts to answer a central question in unsupervised learning: what does it mean to “make sense” of a sensory sequence? In our formalization, making sense involves constructing a symbolic causal theory that both explains the sensory sequence and also satisfies a set of unity conditions. The unity conditions insist that the constituents of the causal theory – objects, properties, and laws – must be integrated into a coherent whole. On our account, making sense of sensory input is a type of program synthesis, but it is unsupervised program synthesis. Our second contribution is a computer implementation, the Apperception Engine, that was designed to satisfy the above requirements. Our system is able to produce interpretable human-readable causal theories from very small amounts of data, because of the strong inductive bias provided by the unity conditions. A causal theory produced by our system is able to predict future sensor readings, as well as retrodict earlier readings, and impute (fill in the blanks of) missing sensory readings, in any combination. In fact, it is able to do all three tasks simultaneously. We tested the engine in a diverse variety of domains, including cellular automata, rhythms and simple nursery tunes, multi-modal binding problems, occlusion tasks, and sequence induction intelligence tests. In each domain, we test our engine’s ability to predict future sensor values, retrodict earlier sensor values, and impute missing sensory data. The Apperception Engine performs well in all these domains, significantly out-performing neural net baselines. We note in particular that in the sequence induction intelligence tests, our system achieved human-level performance. This is notable because our system is not a bespoke system designed specifically to solve intelligence tests, but a general-purpose system that was designed to make sense of any sensory sequence.

Continuation paper: https://doi.org/10.1016/j.artint.2021.103521

Notes:

  • Use HMMs with the states being sets of atomic propositions and the transition function logical predicates, therefore mixing a non-symbolic framework (HMM) with a completely symbolic one.
  • Assume perceptions to be previously discretized and modelled as grounded atoms.
  • Need to be provided with both the sensory (discretized) input and commonsense knowledge about the predicates used for making sense.
  • Include a very clear and simple representation of deduction, induction and abduction (Fig. 1).

On the role of the hippocampus in managing the environmental context

Andrew P. Maurer, Lynn Nadel, The Continuity of Context: A Role for the Hippocampus, . Trends in Cognitive Sciences, Volume 25, Issue 3, 2021, Pages 187-199 DOI: 10.1016/j.tics.2020.12.007.

Tracking moment-to-moment change in input and detecting change sufficient to require altering behavior is crucial to survival. Here, we discuss how the brain evaluates change over time, focusing on the hippocampus and its role in tracking context. We leverage the anatomy and physiology of the hippocampal longitudinal axis, re-entrant loops, and amorphous networks to account for stimulus equivalence and the updating of an organism’s sense of its context. Place cells have a central role in tracking contextual continuities and discontinuities across multiple scales, a capacity beyond current models of pattern separation and completion. This perspective highlights the critical role of the hippocampus in both spatial cognition and episodic memory: tracking change and detecting boundaries separating one context, or episode, from another.

Classification with decision trees based on POMDPs

Shlomi Maliah, Guy Shani, Using POMDPs for learning cost sensitive decision trees, . Artificial Intelligence, Volume 292, 2021 DOI: 10.1016/j.artint.2020.103400.

In classification, an algorithm learns to classify a given instance based on a set of observed attribute values. In many real world cases testing the value of an attribute incurs a cost. Furthermore, there can also be a cost associated with the misclassification of an instance. Cost sensitive classification attempts to minimize the expected cost of classification, by deciding after each observed attribute value, which attribute to measure next. In this paper we suggest Partially Observable Markov Decision Processes (POMDPs) as a modeling tool for cost sensitive classification. POMDPs are typically solved through a policy over belief states. We show how a relatively small set of potentially important belief states can be identified, and define an MDP over these belief states. To identify these potentially important belief states, we construct standard decision trees over all attribute subsets, and the leaves of these trees become the state space of our tree-based MDP. At each phase we decide on the next attribute to measure, balancing the cost of the measurement and the classification accuracy. We compare our approach to a set of previous approaches, showing our approach to work better for a range of misclassification costs.

A new clustering algorithm based on swarm intelligence that is alleged to require no parameterization

Michael C. Thrun, Alfred Ultsch, Swarm intelligence for self-organized clustering, . Artificial Intelligence, Volume 290, 2021, DOI: 10.1016/j.artint.2020.103237.

Algorithms implementing populations of agents which interact with one another and sense their environment may exhibit emergent behavior such as self-organization and swarm intelligence. Here a swarm system, called Databionic swarm (DBS), is introduced which is able to adapt itself to structures of high-dimensional data characterized by distance and/or density-based structures in the data space. By exploiting the interrelations of swarm intelligence, self-organization and emergence, DBS serves as an alternative approach to the optimization of a global objective function in the task of clustering. The swarm omits the usage of a global objective function and is parameter-free because it searches for the Nash equilibrium during its annealing process. To our knowledge, DBS is the first swarm combining these approaches. Its clustering can outperform common clustering methods such as K-means, PAM, single linkage, spectral clustering, model-based clustering, and Ward, if no prior knowledge about the data is available. A central problem in clustering is the correct estimation of the number of clusters. This is addressed by a DBS visualization called topographic map which allows assessing the number of clusters. It is known that all clustering algorithms construct clusters, irrespective of the data set contains clusters or not. In contrast to most other clustering algorithms, the topographic map identifies, that clustering of the data is meaningless if the data contains no (natural) clusters. The performance of DBS is demonstrated on a set of benchmark data, which are constructed to pose difficult clustering problems and in two real-world applications.