Author Archives: Juan-antonio Fernández-madrigal

On the existence of prior knowledge, “pre-wired” in animal brains, that guides further learning

Elisabetta Versace, Antone Martinho-Truswell, Alex Kacelnik, Giorgio Vallortigara, Priors in Animal and Artificial Intelligence: Where Does Learning Begin?, Trends in Cognitive Sciences, Volume 22, Issue 11, 2018, Pages 963-965, DOI: 10.1016/j.tics.2018.07.005.

A major goal for the next generation of artificial intelligence (AI) is to build machines that are able to reason and cope with novel tasks, environments, and situations in a manner that approaches the abilities of animals. Evidence from precocial species suggests that driving learning through suitable priors can help to successfully face this challenge.

A new model of reinforcement learning based on the human brain that copes with continuous spaces through continuous rewards, with a short but nice review of the state of the art in RL applied to large, continuous spaces

Feifei Zhao, Yi Zeng, Guixiang Wang, Jun Bai, Bo Xu, A Brain-Inspired Decision Making Model Based on Top-Down Biasing of Prefrontal Cortex to Basal Ganglia and Its Application in Autonomous UAV Explorations, Cognitive Computation, Volume 10, Issue 2, pp 296–306, DOI: 10.1007/s12559-017-9511-3.

Decision making is a fundamental ability for intelligent agents (e.g., humanoid robots and unmanned aerial vehicles). During decision making process, agents can improve the strategy for interacting with the dynamic environment through reinforcement learning. Many state-of-the-art reinforcement learning models deal with relatively smaller number of state-action pairs, and the states are preferably discrete, such as Q-learning and Actor-Critic algorithms. While in practice, in many scenario, the states are continuous and hard to be properly discretized. Better autonomous decision making methods need to be proposed to handle these problems. Inspired by the mechanism of decision making in human brain, we propose a general computational model, named as prefrontal cortex-basal ganglia (PFC-BG) algorithm. The proposed model is inspired by the biological reinforcement learning pathway and mechanisms from the following perspectives: (1) Dopamine signals continuously update reward-relevant information for both basal ganglia and working memory in prefrontal cortex. (2) We maintain the contextual reward information in working memory. This has a top-down biasing effect on reinforcement learning in basal ganglia. The proposed model separates the continuous states into smaller distinguishable states, and introduces continuous reward function for each state to obtain reward information at different time. To verify the performance of our model, we apply it to many UAV decision making experiments, such as avoiding obstacles and flying through window and door, and the experiments support the effectiveness of the model. Compared with traditional Q-learning and Actor-Critic algorithms, the proposed model is more biologically inspired, and more accurate and faster to make decision.
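The abstract names two ingredients: splitting a continuous state into smaller distinguishable states, and using a continuous reward function. The sketch below illustrates only those two ideas with plain tabular Q-learning on a one-dimensional corridor. It is not the PFC-BG algorithm from the paper; the corridor, the bin count, the goal position and all function names are assumptions made for illustration.

```python
import random

N_BINS = 10             # hypothetical resolution of the state discretization
ACTIONS = (-0.1, 0.1)   # move left or right along the corridor [0, 1]
GOAL = 0.95             # hypothetical goal position


def discretize(x):
    """Map a continuous position in [0, 1] to one of N_BINS integer states."""
    return min(N_BINS - 1, int(x * N_BINS))


def continuous_reward(x):
    """Graded reward: less negative the closer the agent is to the goal."""
    return -abs(GOAL - x)


def train(episodes=500, steps=50, alpha=0.3, gamma=0.9, epsilon=0.3, seed=0):
    """Tabular Q-learning over the discretized states (epsilon-greedy)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_BINS)]
    for _ in range(episodes):
        x = rng.random()  # random continuous start position
        for _ in range(steps):
            s = discretize(x)
            if rng.random() < epsilon:
                a = rng.randrange(2)                  # explore
            else:
                a = 0 if q[s][0] > q[s][1] else 1     # exploit
            x = min(1.0, max(0.0, x + ACTIONS[a]))    # move, clipped to [0, 1]
            r = continuous_reward(x)
            s_next = discretize(x)
            # Standard Q-learning update on the discretized state.
            q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
    return q


q_table = train()
policy = ["left" if left > right else "right" for left, right in q_table]
print(policy)
```

Because the reward is graded rather than given only at the goal, every step carries a learning signal, which is the practical appeal of a continuous reward in this kind of setting.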

Z-numbers: an extension of fuzzy variables for cognitive decision making, and the concept of cognitive information

Hong-gang Peng, Jian-qiang Wang, Outranking Decision-Making Method with Z-Number Cognitive Information, Cognitive Computation, Volume 10, Issue 5, pp 752–768, DOI: 10.1007/s12559-018-9556-y.

The Z-number provides an adequate and reliable description of cognitive information. The nature of Z-numbers is complex, however, and important issues in Z-number computation remain to be addressed. This study focuses on developing a computationally simple method with Z-numbers to address multicriteria decision-making (MCDM) problems. Processing Z-numbers requires the direct computation of fuzzy and probabilistic uncertainties. We used an effective method to analyze the Z-number construct. Next, we proposed some outranking relations of Z-numbers and defined the dominance degree of discrete Z-numbers. Also, after analyzing the characteristics of elimination and choice translating reality III (ELECTRE III) and qualitative flexible multiple criteria method (QUALIFLEX), we developed an improved outranking method. To demonstrate this method, we provided an illustrative example concerning job-satisfaction evaluation. We further verified the validity of the method by a criteria test and comparative analysis. The results demonstrate that the method can be successfully applied to real-world decision-making problems, and it can identify more reasonable outcomes than previous methods. This study overcomes the high computational complexity in existing Z-number computation frameworks by exploring the pairwise comparison of Z-numbers. The method inherits the merits of the classical outranking method and considers the non-compensability of criteria. Therefore, it has remarkable potential to address practical decision-making problems involving Z-information.

SLAM as a sampling problem, with some references to the state of the art in signal sampling

Golnoosh Elhami et al., Sampling at Unknown Locations: Uniqueness and Reconstruction Under Constraints, IEEE Transactions on Signal Processing, vol. 66, no. 22, DOI: 10.1109/TSP.2018.2872019.

Traditional sampling results assume that the sample locations are known. Motivated by simultaneous localization and mapping (SLAM) and structure from motion (SfM), we investigate sampling at unknown locations. Without further constraints, the problem is often hopeless. For example, we recently showed that, for polynomial and bandlimited signals, it is possible to find two signals, arbitrarily far from each other, that fit the measurements. However, we also showed that this can be overcome by adding constraints to the sample positions. In this paper, we show that these constraints lead to a uniform sampling of a composite of functions. Furthermore, the formulation retains the key aspects of the SLAM and SfM problems, whilst providing uniqueness, in many cases. We demonstrate this by studying two simple examples of constrained sampling at unknown locations. In the first, we consider sampling a periodic bandlimited signal composite with an unknown linear function. We derive the sampling requirements for uniqueness and present an algorithm that recovers both the bandlimited signal and the linear warping. Furthermore, we prove that, when the requirements for uniqueness are not met, the cases of multiple solutions have measure zero. For our second example, we consider polynomials sampled such that the sampling positions are constrained by a rational function. We previously proved that, if a specific sampling requirement is met, uniqueness is achieved. In addition, we present an alternate minimization scheme for solving the resulting non-convex optimization problem. Finally, fully reproducible simulation results are provided to support our theoretical analysis.
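To see why the unconstrained problem is "often hopeless", consider the simplest possible ambiguity: stretching the signal and the sample locations by the same factor leaves the measurements unchanged. The sketch below shows this for one polynomial. It is only an illustration of the idea, not the construction used in the paper; the signal, the locations and the scale factors are hypothetical.

```python
import math


def f(x):
    """Reference polynomial signal (hypothetical example)."""
    return x ** 2


def make_rescaled(c):
    """Return g(x) = f(x / c), a stretched copy of f that is also a polynomial."""
    return lambda x: f(x / c)


locations = [0.5, 1.0, 1.5, 2.0]           # true (unknown) sample positions
measurements = [f(x) for x in locations]   # all the observer gets to see

for c in (0.25, 4.0):
    g = make_rescaled(c)
    stretched = [c * x for x in locations]  # alternative sample positions
    # g sampled at the stretched positions reproduces the same measurements.
    assert all(math.isclose(g(x), y) for x, y in zip(stretched, measurements))
```

Both (f, locations) and (g, stretched locations) explain the data equally well, so the measurements alone cannot pick one; constraints on the sample positions are what rule such alternatives out.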

Sharing beliefs (pdfs) between human and robot

Rina Tse, Mark Campbell, Human–Robot Communications of Probabilistic Beliefs via a Dirichlet Process Mixture of Statements, IEEE Transactions on Robotics, vol. 34, no. 5, DOI: 10.1109/TRO.2018.2830360.

This paper presents a natural framework for information sharing in cooperative tasks involving humans and robots. In this framework, all information gathered over time by a human–robot team is exchanged and summarized in the form of a fused probability density function (pdf). An approach for an intelligent system to describe its belief pdfs in English expressions is presented. This belief expression generation is achieved through two goodness measures: semantic correctness and information preservation. In order to describe complex, multimodal belief pdfs, a Mixture of Statements (MoS) model is proposed such that optimal expressions can be generated through compositions of multiple statements. The model is further extended to a nonparametric Dirichlet process MoS generation, such that the optimal number of statements required for describing a given pdf is automatically determined. Results based on information loss, human collaborative task performances, and correctness rating scores suggest that the proposed method for generating belief expressions is an effective approach for communicating probabilistic information between robots and humans.

Automatic design of a robot to perform given tasks with an optimal configuration

Sehoon Ha et al., Computational Design of Robotic Devices From High-Level Motion Specifications, IEEE Transactions on Robotics, vol. 34, no. 5, DOI: 10.1109/TRO.2018.2830419.

We present a novel computational approach to design the robotic devices from high-level motion specifications. Our computational system uses a library of modular components—actuators, mounting brackets, and connectors—to define the space of possible robot designs. The process of creating a new robot begins with a set of input trajectories that specify how its end effectors and/or body should move. By searching through the combinatorial set of possible arrangements of modular components, our method generates a functional, as-simple-as-possible robotic device that is capable of tracking the input motion trajectories. To significantly improve the efficiency of this discrete optimization process, we propose a novel heuristic that guides the search for appropriate designs. Briefly, our heuristic function estimates how much an intermediate robot design needs to change before it becomes able to execute the target motion trajectories. We demonstrate the effectiveness of our computational design method by automatically creating a variety of robotic manipulators and legged robots. To generate these results, we define our own robotic kit that includes off-the-shelf actuators and 3-D printable connectors. We validate our results by fabricating two robotic devices designed with our method.

On the need for replanning in POMDPs when applied to real systems, due to imperfect sensing and the computational cost of online planning

Ali-akbar Agha-mohammadi et al., SLAP: Simultaneous Localization and Planning Under Uncertainty via Dynamic Replanning in Belief Space, IEEE Transactions on Robotics, vol. 34, no. 5, DOI: 10.1109/TRO.2018.2838556.

Simultaneous localization and planning (SLAP) is a crucial ability for an autonomous robot operating under uncertainty. In its most general form, SLAP induces a continuous partially observable Markov decision process (POMDP), which needs to be repeatedly solved online. This paper addresses this problem and proposes a dynamic replanning scheme in belief space. The underlying POMDP, which is continuous in state, action, and observation space, is approximated offline via sampling-based methods, but operates in a replanning loop online to admit local improvements to the coarse offline policy. This construct enables the proposed method to combat changing environments and large localization errors, even when the change alters the homotopy class of the optimal trajectory. It further outperforms the state-of-the-art Feedback-based Information RoadMap (FIRM) method by eliminating unnecessary stabilization steps. Applying belief space planning to physical systems brings with it a plethora of challenges. A key focus of this paper is to implement the proposed planner on a physical robot and show the SLAP solution performance under uncertainty, in changing environments and in the presence of large disturbances, such as a kidnapped robot situation.

Interesting study of the number of optimal points in SLAM, considering it as a non-linear, non-convex optimization problem

Heng Wang, Shoudong Huang, Guanghong Yang, Gamini Dissanayake, Comparison of two different objective functions in 2D point feature SLAM, Automatica, Volume 97, 2018, Pages 172-181, DOI: 10.1016/j.automatica.2018.08.009.

This paper compares two different objective functions in 2D point feature Simultaneous Localization and Mapping (SLAM). It is shown that the objective function can have a significant impact on the convergence of the iterative optimization techniques used in SLAM. When Frobenius norm is adopted for the error term of the orientation part of odometry, the SLAM problem has much better convergence properties, as compared with that using the angle difference as the error term. For one-step case, we have proved that there is one and only one minimum to the SLAM problem, and strong duality always holds. For two-step case, strong duality always holds except when three very special conditions hold simultaneously (which happens with probability zero), thus the global optimal solution to primal SLAM problem can be obtained by solving the corresponding Lagrangian dual problem in most cases. Further, for arbitrary m-step cases, we also show using examples that much better convergence results can be obtained. Simulation examples are given to demonstrate the different convergence properties using two different objective functions.
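A small numerical sketch can give some intuition for why the choice of orientation error matters. For 2D rotation matrices, the squared Frobenius norm of R(a) - R(b) equals 4(1 - cos(a - b)), which is smooth and periodic in the angle difference, whereas a raw angle difference is not periodic. The code below compares the two for a pair of nearly identical orientations on either side of the ±π boundary. It assumes an unwrapped angle difference for the comparison; the abstract does not say how the paper handles wrapping, and the function names are hypothetical.

```python
import math


def rot(theta):
    """2D rotation matrix for angle theta, as nested tuples."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))


def frobenius_error(a, b):
    """Squared Frobenius norm of R(a) - R(b)."""
    ra, rb = rot(a), rot(b)
    return sum((ra[i][j] - rb[i][j]) ** 2 for i in range(2) for j in range(2))


def raw_angle_error(a, b):
    """Squared raw angle difference, with no wrapping (an assumption here)."""
    return (a - b) ** 2


# Two headings that are physically almost identical (0.02 rad apart),
# but sit on opposite sides of the +/- pi boundary.
a = math.pi - 0.01
b = -math.pi + 0.01

print("Frobenius error:", frobenius_error(a, b))
print("Raw angle error:", raw_angle_error(a, b))
```

The Frobenius error stays small because it depends only on the rotation itself, while the raw angle error is large even though the two orientations almost coincide.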

On how psychological time emerges from execution of actions in the environment

Jennifer T. Coull, Sylvie Droit-Volet, Explicit Understanding of Duration Develops Implicitly through Action, Trends in Cognitive Sciences, Volume 22, Issue 10, 2018, Pages 923-937, DOI: 10.1016/j.tics.2018.07.011.

Time is relative. Changes in cognitive state or sensory context make it appear to speed up or slow down. Our perception of time is a rather fragile mental construct derived from the way events in the world are processed and integrated in memory. Nevertheless, the slippery concept of time can be structured by draping it over more concrete functional scaffolding. Converging evidence from developmental studies of children and neuroimaging in adults indicates that we can represent time in spatial or motor terms. We hypothesise that explicit processing of time is mediated by motor structures of the brain in adulthood because we implicitly learn about time through action during childhood. Future challenges will be to harness motor or spatial representations of time to optimise behaviour, potentially for therapeutic gain.

A very interesting analysis of how reinforcement learning depends on time, both for MDPs and for the psychological basis of RL in the human brain

Elijah A. Petter, Samuel J. Gershman, Warren H. Meck, Integrating Models of Interval Timing and Reinforcement Learning, Trends in Cognitive Sciences, Volume 22, Issue 10, 2018, Pages 911-922, DOI: 10.1016/j.tics.2018.08.004.

We present an integrated view of interval timing and reinforcement learning (RL) in the brain. The computational goal of RL is to maximize future rewards, and this depends crucially on a representation of time. Different RL systems in the brain process time in distinct ways. A model-based system learns ‘what happens when’, employing this internal model to generate action plans, while a model-free system learns to predict reward directly from a set of temporal basis functions. We describe how these systems are subserved by a computational division of labor between several brain regions, with a focus on the basal ganglia and the hippocampus, as well as how these regions are influenced by the neuromodulator dopamine.

Some quotes beyond the abstract:

The Markov assumption also makes explicit the requirements for temporal representation. All temporal dynamics must be captured by the state-transition function, which means that the state representation must encode the time-invariant structure of the environment.
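
As a rough illustration of a model-free system that predicts reward from temporal basis functions, the sketch below runs TD(0) with the simplest such basis: a one-hot code for the time elapsed since a cue (often called a tapped delay line or complete serial compound). Making elapsed time part of the state is what keeps the problem Markov. This is a minimal sketch under those assumptions, not the model proposed in the paper; the trial length, reward time and parameter values are hypothetical.

```python
def td_timing(n_steps=10, reward_step=7, gamma=0.9, alpha=0.1, n_trials=2000):
    """TD(0) value learning where the state is the time step since cue onset.

    With a one-hot time basis, the weight w[t] is simply the predicted value
    at time t. A reward of 1 is delivered on leaving step `reward_step`.
    """
    w = [0.0] * (n_steps + 1)  # w[n_steps] stays 0: value after the trial ends
    for _ in range(n_trials):
        for t in range(n_steps):
            r = 1.0 if t == reward_step else 0.0
            delta = r + gamma * w[t + 1] - w[t]  # temporal-difference error
            w[t] += alpha * delta
    return w[:n_steps]


values = td_timing()
for t, v in enumerate(values):
    print(t, round(v, 3))
```

After training, the predicted value ramps up as the reward time approaches (by a factor of 1/gamma per step) and is zero afterwards, so the learned weights encode when the reward is expected as well as how large it is.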