Monthly Archives: July 2023

Studying magicians’ tricks to understand decision making and how to influence it

Alice Pailhès, Gustav Kuhn, Mind Control Tricks: Magicians’ Forcing and Free Will, Trends in Cognitive Sciences, Volume 25, Issue 5, 2021, Pages 338-341, DOI: 10.1016/j.tics.2021.02.001.

A new research program has recently emerged that investigates magicians’ mind control tricks, also called forces. This research highlights the psychological processes that underpin decision-making, illustrates the ease by which our decisions can be covertly influenced, and helps answer questions about our sense of free will and agency over choices.

Improving POMDP solving efficiency by eliminating variables in the state structure

Eric A. Hansen, An integrated approach to solving influence diagrams and finite-horizon partially observable decision processes, Artificial Intelligence, Volume 294, 2021, DOI: 10.1016/j.artint.2020.103431.

We show how to integrate a variable elimination approach to solving influence diagrams with a value iteration approach to solving finite-horizon partially observable Markov decision processes (POMDPs). The integration of these approaches creates a variable elimination algorithm for influence diagrams that has much more relaxed constraints on elimination order, which allows improved scalability in many cases. The new algorithm can also be viewed as a generalization of the value iteration algorithm for POMDPs that solves non-Markovian as well as Markovian problems, in addition to leveraging a factored representation for improved efficiency. The development of a single algorithm that integrates and generalizes both of these classic algorithms, one for influence diagrams and the other for POMDPs, unifies these two approaches to solving Bayesian decision problems in a way that combines their complementary advantages.

Cubature (fixed point representation of uncertainties, as in UKF) Kalman Filter

Juan-Carlos Santos-León, Ramón Orive, Daniel Acosta, Leopoldo Acosta, The Cubature Kalman Filter revisited, Automatica, Volume 127, 2021, DOI: 10.1016/j.automatica.2021.109541.

In this paper, the construction and effectiveness of the so-called Cubature Kalman Filter (CKF) is revisited, as well as its extensions for higher degrees of precision. In this sense, some stable (with respect to the dimension) cubature rules with a quasi-optimal number of nodes are built, and their numerical performance is checked in comparison with other known formulas. All these cubature rules are suitably placed in the mathematical framework of numerical integration in several variables. A method based on the discretization of higher order partial derivatives by certain divided differences is used to provide stable rules of degrees d=5 and d=7, though it can also be applied for higher dimensions. The application of these old and new formulas to the filter algorithm is tested by means of some examples.
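As a point of reference for the "known formulas" mentioned above, here is a minimal sketch of the classic third-degree spherical-radial cubature rule commonly used in the CKF: 2n equally weighted points placed at the mean plus or minus sqrt(n) times each column of a square root of the covariance. This is not the degree-5 or degree-7 rule built in the paper, and the function names (`cholesky`, `cubature_points`, `cubature_expectation`) are illustrative assumptions.

```python
import math


def cholesky(P):
    """Lower-triangular S with S S^T = P (P symmetric positive definite)."""
    n = len(P)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(S[i][k] * S[j][k] for k in range(j))
            if i == j:
                S[i][j] = math.sqrt(P[i][i] - s)
            else:
                S[i][j] = (P[i][j] - s) / S[j][j]
    return S


def cubature_points(mean, cov):
    """Return the 2n points mean +/- sqrt(n) * (column i of S)."""
    n = len(mean)
    S = cholesky(cov)
    scale = math.sqrt(n)
    points = []
    for sign in (1.0, -1.0):
        for i in range(n):
            points.append([mean[r] + sign * scale * S[r][i] for r in range(n)])
    return points


def cubature_expectation(f, mean, cov):
    """Approximate E[f(x)] for x ~ N(mean, cov) with equal weights 1/(2n)."""
    pts = cubature_points(mean, cov)
    return sum(f(p) for p in pts) / len(pts)


# Usage: the rule reproduces first and second moments of the Gaussian.
mean = [1.0, 2.0]
cov = [[2.0, 0.5], [0.5, 1.0]]
second_moment = cubature_expectation(lambda x: x[0] * x[1], mean, cov)
```

In a filter, the same points would be pushed through the process and measurement functions to obtain the predicted mean and covariance; higher-degree rules change only the placement and weighting of the points.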

A hierarchical POMDP system for robot manipulation

Wenrui Zhao, Weidong Chen, Hierarchical POMDP planning for object manipulation in clutter, Robotics and Autonomous Systems, Volume 139, 2021, DOI: 10.1016/j.robot.2021.103736.

Object manipulation planning in clutter suffers from perception uncertainties due to occlusion, as well as action constraints required by collision avoidance. The partially observable Markov decision process (POMDP) provides a general model for planning under uncertainties. However, a manipulation task usually has a large action space, which not only makes task planning intractable but also brings significant motion planning effort to check action feasibility. In this work, a new kind of hierarchical POMDP is presented for object manipulation tasks, in which a brief abstract POMDP is extracted and utilized together with the original POMDP. A hierarchical belief tree search algorithm is also proposed for efficient online planning, which constructs fewer belief nodes by building part of the tree with the abstract POMDP and invokes motion planning fewer times by determining action feasibility with the observation function of the abstract POMDP. A learning mechanism is also designed in case there are unknown probabilities in the transition and observation functions. This planning framework is demonstrated with an object fetching task and the performance is empirically validated by simulations and experiments.
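Each node of a belief tree such as the one described above holds a belief, that is, a probability distribution over states that is updated after every action and observation. The following is a minimal sketch of that standard Bayes update, not of the paper's hierarchical search. The two-bin toy problem, the 0.8 and 0.3 detection probabilities, and all function names are assumptions made for illustration.

```python
def belief_update(belief, action, observation, transition, observe):
    """Standard POMDP update: b'(s') is proportional to
    O(o | s', a) * sum_s T(s' | s, a) * b(s)."""
    states = list(belief)
    predicted = {
        s2: sum(transition(s, action, s2) * belief[s] for s in states)
        for s2 in states
    }
    unnormalized = {
        s2: observe(observation, s2, action) * predicted[s2] for s2 in states
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {s: v / total for s, v in unnormalized.items()}


# Hypothetical toy problem: an object sits in the left or right bin, and a
# "look" action yields a noisy detection because of occlusion.
def transition(state, action, next_state):
    return 1.0 if state == next_state else 0.0  # looking does not move anything


def observe(observation, state, action):
    p_seen_left = 0.8 if state == "left" else 0.3  # assumed sensor model
    return p_seen_left if observation == "seen_left" else 1.0 - p_seen_left


prior = {"left": 0.5, "right": 0.5}
posterior = belief_update(prior, "look", "seen_left", transition, observe)
```

With the numbers assumed here, a single "seen_left" detection moves the belief in the left bin from 0.5 to 0.4 / 0.55, roughly 0.73.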

A hierarchical robot control architecture that supports learning of skills at different levels through “curriculum learning” and an interesting approach to mixing behaviours

Suro, F., Ferber, J., Stratulat, T. et al., A hierarchical representation of behaviour supporting open ended development and progressive learning for artificial agents, Auton Robot 45, 245–264 (2021), DOI: 10.1007/s10514-020-09960-7.

One of the challenging aspects of open ended or lifelong agent development is that the final behaviour for which an agent is trained at a given moment can be an element for the future creation of one, or even several, behaviours of greater complexity, whose purpose cannot be anticipated. In this paper, we present modular influence network design (MIND), an artificial agent control architecture suited to open ended and cumulative learning. The MIND architecture encapsulates sub behaviours into modules and combines them into a hierarchy reflecting the modular and hierarchical nature of complex tasks. Compared to similar research, the main original aspect of MIND is the multi layered hierarchy using a generic control signal, the influence, to obtain an efficient global behaviour. This article shows the ability of MIND to learn a curriculum of independent didactic tasks of increasing complexity covering different aspects of a desired behaviour. In so doing we demonstrate the contributions of MIND to open-ended development: encapsulation into modules allows for the preservation and re-usability of all the skills acquired during the curriculum and their focused retraining, the modular structure serves the evolving topology by easing the coordination of new sensors, actuators and heterogeneous learning structures.

Model-based (on ordinary differential equations) and partially model-free Policy Iteration on continuous space and time

Jaeyoung Lee, Richard S. Sutton, Policy iterations for reinforcement learning problems in continuous time and space — Fundamental theory and methods, . Automatica, Volume 126, 2021 DOI: 10.1016/j.automatica.2020.109421.

Policy iteration (PI) is a recursive process of policy evaluation and improvement for solving an optimal decision-making/control problem, or in other words, a reinforcement learning (RL) problem. PI has also served as the foundation for developing RL methods. In this paper, we propose two PI methods, called differential PI (DPI) and integral PI (IPI), and their variants, for a general RL framework in continuous time and space (CTS), where the environment is modeled by a system of ordinary differential equations (ODEs). The proposed methods inherit the current ideas of PI in classical RL and optimal control and theoretically support the existing RL algorithms in CTS: TD-learning and value-gradient-based (VGB) greedy policy update. We also provide case studies including (1) discounted RL and (2) optimal control tasks. Fundamental mathematical properties – admissibility, uniqueness of the solution to the Bellman equation (BE), monotone improvement, convergence, and optimality of the solution to the Hamilton–Jacobi–Bellman equation (HJBE) – are all investigated in-depth and improved from the existing theory, along with the general and case studies. Finally, the proposed methods are simulated with an inverted-pendulum model, in both model-based and partially model-free implementations, to support the theory and investigate them further.
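To make the evaluation/improvement loop concrete in continuous time, here is a minimal model-based sketch for the simplest possible case: a scalar linear system dx/dt = a*x + b*u with undiscounted cost equal to the integral of q*x^2 + r*u^2, and a linear policy u = -k*x. This is the classical Kleinman-style iteration for LQR, used here as a stand-in; it is not the DPI or IPI method of the paper, and the function name and numerical values are assumptions.

```python
import math


def policy_iteration_scalar_lqr(a, b, q, r, k0, iterations=10):
    """Alternate policy evaluation and improvement for u = -k*x.

    Evaluation: with V(x) = p*x^2, the Bellman equation for a fixed gain k
    reads 2*(a - b*k)*p + q + r*k^2 = 0, solved in closed form for p.
    Improvement: minimizing the Hamiltonian over u gives k = b*p / r.
    """
    if a - b * k0 >= 0:
        raise ValueError("the initial gain must stabilize the system")
    k = k0
    values = []
    for _ in range(iterations):
        p = (q + r * k * k) / (2.0 * (b * k - a))  # policy evaluation
        values.append(p)
        k = b * p / r                              # policy improvement
    return k, values


# Usage with assumed numbers: an unstable plant (a > 0) and a stabilizing k0.
gain, values = policy_iteration_scalar_lqr(a=1.0, b=1.0, q=1.0, r=1.0, k0=2.0)

# Closed-form solution of the scalar Riccati (HJB) equation for comparison.
p_star = 1.0 * (1.0 + math.sqrt(1.0 + 1.0)) / 1.0
```

The sequence of values decreases monotonically toward the Riccati solution, which is the scalar counterpart of the monotone improvement and convergence properties listed in the abstract.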

Motion planning with uncertain obstacles is NP-hard

Shimanuki L, Axelrod B., Hardness of Motion Planning with Obstacle Uncertainty in Two Dimensions, The International Journal of Robotics Research. 2021;40(10-11):1151-1166, DOI: 10.1177/0278364921992787.

We consider the problem of motion planning in the presence of uncertain obstacles, modeled as polytopes with Gaussian-distributed faces (PGDFs). A number of practical algorithms exist for motion planning in the presence of known obstacles by constructing a graph in configuration space, then efficiently searching the graph to find a collision-free path. We show that such an exact algorithm is unlikely to be practical in the domain with uncertain obstacles. In particular, we show that safe 2D motion planning among PGDF obstacles is NP-hard with respect to the number of obstacles, and remains NP-hard after being restricted to a graph. Our reduction is based on a path encoding of MAXQHORNSAT and uses the risk of collision with an obstacle to encode variable assignments and literal satisfactions. This implies that, unlike in the known case, planning under uncertainty is hard, even when given a graph containing the solution. We further show by reduction from 3-SAT that both safe 3D motion planning among PGDF obstacles and the related minimum constraint removal problem remain NP-hard even when restricted to cases where each obstacle overlaps with at most a constant number of other obstacles.

Formalization of “making sense” of sensory perceptions and use in several practical cases that compare favourably, because of the use of induction, to neural network approaches

Richard Evans, José Hernández-Orallo, Johannes Welbl, Pushmeet Kohli, Marek Sergot, Making sense of sensory input, Artificial Intelligence, Volume 293, 2021, DOI: 10.1016/j.artint.2020.103438.

This paper attempts to answer a central question in unsupervised learning: what does it mean to “make sense” of a sensory sequence? In our formalization, making sense involves constructing a symbolic causal theory that both explains the sensory sequence and also satisfies a set of unity conditions. The unity conditions insist that the constituents of the causal theory – objects, properties, and laws – must be integrated into a coherent whole. On our account, making sense of sensory input is a type of program synthesis, but it is unsupervised program synthesis. Our second contribution is a computer implementation, the Apperception Engine, that was designed to satisfy the above requirements. Our system is able to produce interpretable human-readable causal theories from very small amounts of data, because of the strong inductive bias provided by the unity conditions. A causal theory produced by our system is able to predict future sensor readings, as well as retrodict earlier readings, and impute (fill in the blanks of) missing sensory readings, in any combination. In fact, it is able to do all three tasks simultaneously. We tested the engine in a diverse variety of domains, including cellular automata, rhythms and simple nursery tunes, multi-modal binding problems, occlusion tasks, and sequence induction intelligence tests. In each domain, we test our engine’s ability to predict future sensor values, retrodict earlier sensor values, and impute missing sensory data. The Apperception Engine performs well in all these domains, significantly out-performing neural net baselines. We note in particular that in the sequence induction intelligence tests, our system achieved human-level performance. This is notable because our system is not a bespoke system designed specifically to solve intelligence tests, but a general-purpose system that was designed to make sense of any sensory sequence.

Continuation paper:

  • Use HMMs with the states being sets of atomic propositions and the transition function logical predicates, therefore mixing a non-symbolic framework (HMM) with a completely symbolic one.
  • Assume perceptions to be previously discretized and modelled as grounded atoms.
  • Need to be provided with both the sensory (discretized) input and commonsense knowledge about the predicates used for making sense.
  • Include a very clear and simple representation of deduction, induction and abduction (Fig. 1).

Discrete Q-learning used, along with a deep CNN for localization, for mobile robot navigation

Amirhossein Shantia, Rik Timmers, Yiebo Chong, Cornel Kuiper, Francesco Bidoia, Lambert Schomaker, Marco Wiering, Two-stage visual navigation by deep neural networks and multi-goal reinforcement learning, Robotics and Autonomous Systems, Volume 138, 2021, DOI: 10.1016/j.robot.2021.103731.

In this paper, we propose a two-stage learning framework for visual navigation in which the experience of the agent during exploration of one goal is shared to learn to navigate to other goals. We train a deep neural network for estimating the robot’s position in the environment using ground truth information provided by a classical localization and mapping approach. The second simpler multi-goal Q-function learns to traverse the environment by using the provided discretized map. Transfer learning is applied to the multi-goal Q-function from a maze structure to a 2D simulator and is finally deployed in a 3D simulator where the robot uses the estimated locations from the position estimator deep network. In the experiments, we first compare different architectures to select the best deep network for location estimation, and then compare the effects of the multi-goal reinforcement learning method to traditional reinforcement learning. The results show a significant improvement when multi-goal reinforcement learning is used. Furthermore, the results of the location estimator show that a deep network can learn and generalize in different environments using camera images with high accuracy in both position and orientation.
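One simple way to share experience across goals on a discretized map is to index the Q-table by goal as well as by state, and to replay every observed transition against all goals. The sketch below illustrates that idea on a small grid; it is an assumption about how such sharing can be done in the tabular case, not the update rule used in the paper. The grid size, reward of 1 on reaching the goal, and all names are hypothetical.

```python
import random

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action, size):
    """Deterministic grid move; bumping into a wall leaves the state unchanged."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if 0 <= nr < size and 0 <= nc < size:
        return (nr, nc)
    return state


def train_multi_goal_q(size=4, steps=20000, gamma=0.9, alpha=1.0, seed=0):
    """Random exploration; each transition updates Q(goal, state, action)
    for every goal, so experience gathered once serves all goals.
    alpha=1.0 is adequate here only because the toy dynamics are deterministic."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(size) for c in range(size)]
    q = {(g, s): [0.0] * len(ACTIONS) for g in cells for s in cells}
    state = rng.choice(cells)
    for _ in range(steps):
        action = rng.randrange(len(ACTIONS))
        nxt = step(state, action, size)
        for goal in cells:
            # Reward 1 on arrival (terminal for that goal), otherwise bootstrap.
            target = 1.0 if nxt == goal else gamma * max(q[(goal, nxt)])
            q[(goal, state)][action] += alpha * (target - q[(goal, state)][action])
        state = nxt
    return q


def greedy_path_length(q, start, goal, size, limit=50):
    """Follow the greedy policy for `goal`; return steps taken, or None on failure."""
    state, n = start, 0
    while state != goal and n < limit:
        values = q[(goal, state)]
        state = step(state, values.index(max(values)), size)
        n += 1
    return n if state == goal else None


q_table = train_multi_goal_q()
steps_needed = greedy_path_length(q_table, (0, 0), (3, 3), size=4)
```

In the two-stage setting described above, the state fed to such a table would come from the position estimator rather than from ground truth.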

Learning the parameters of a Bernoulli distribution for modelling the transmission times in remote control with known plant dynamics

Konstantinos Gatsis, George J. Pappas, Statistical learning for analysis of networked control systems over unknown channels, Automatica, Volume 125, 2021, DOI: 10.1016/j.automatica.2020.109386.

Recent control trends are increasingly relying on communication networks and wireless channels to close the loop for Internet-of-Things applications. Traditionally these approaches are model-based, i.e., assuming a network or channel model they are focused on stability analysis and appropriate controller designs. However the availability of such wireless channel modeling is fundamentally challenging in practice as channels are typically unknown a priori and only available through data samples. In this work we aim to develop algorithms that rely on channel sample data to determine the mean square stability and performance of networked control tasks. In this regard our work is the first to characterize the amount of channel modeling that is required to answer such a question. Specifically we examine how many channel data samples are required in order to answer with high confidence whether a given networked control system is stable or not. This analysis is based on the notion of sample complexity from the learning literature and is facilitated by concentration inequalities. Moreover we establish a direct relation between the sample complexity and the networked system stability margin, i.e., the underlying packet success rate of the channel and the spectral radius of the dynamics of the control system. This illustrates that it becomes impractical to verify stability under a large range of plant and channel configurations. We validate our theoretical results in numerical simulations.
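The link between sample count and stability margin can be illustrated with a deliberately simplified scalar example. Assume the state evolves as x_{k+1} = a_closed * x_k when a packet arrives (probability p, i.i.d.) and as x_{k+1} = a_open * x_k otherwise, with |a_closed| < 1 < |a_open|. Mean-square stability then holds exactly when p * a_closed^2 + (1 - p) * a_open^2 < 1. The sketch below uses Hoeffding's inequality as the concentration bound; the paper's own bounds and system model may differ, and every name and number here is an assumption.

```python
import math


def samples_needed(epsilon, delta):
    """Hoeffding: n >= ln(2/delta) / (2*epsilon^2) samples guarantee
    |p_hat - p| <= epsilon with probability at least 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def hoeffding_radius(n, delta):
    """Confidence radius around the empirical success rate after n samples."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def critical_success_rate(a_open, a_closed):
    """Smallest packet success rate for mean-square stability (scalar case):
    p*a_closed^2 + (1 - p)*a_open^2 < 1  iff  p > this value."""
    return (a_open ** 2 - 1.0) / (a_open ** 2 - a_closed ** 2)


def stability_verdict(successes, n, delta, a_open, a_closed):
    """Decide stability from channel samples, or admit the data is insufficient."""
    p_hat = successes / n
    eps = hoeffding_radius(n, delta)
    p_crit = critical_success_rate(a_open, a_closed)
    if p_hat - eps > p_crit:
        return "stable"
    if p_hat + eps < p_crit:
        return "unstable"
    return "undecided"


# Usage with assumed numbers: 443 successful transmissions out of 738.
verdict = stability_verdict(443, 738, delta=0.05, a_open=1.2, a_closed=0.5)
```

Because the required number of samples grows as the inverse square of the gap between the true success rate and the critical one, a system close to the stability boundary needs far more channel data to certify, which matches the remark above about small stability margins.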