Tag Archives: Decision Making

A novel method of mathematical compression of the value function for polynomial (in the state) time complexity of value iteration / policy iteration

Alex Gorodetsky, Sertac Karaman, and Youssef Marzouk, High-dimensional stochastic optimal control using continuous tensor decompositions, The International Journal of Robotics Research Vol 37, Issue 2-3, pp. 340 – 377, DOI: 10.1177/0278364917753994.

Motion planning and control problems are embedded and essential in almost all robotics applications. These problems are often formulated as stochastic optimal control problems and solved using dynamic programming algorithms. Unfortunately, most existing algorithms that guarantee convergence to optimal solutions suffer from the curse of dimensionality: the run time of the algorithm grows exponentially with the dimension of the state space of the system. We propose novel dynamic programming algorithms that alleviate the curse of dimensionality in problems that exhibit certain low-rank structure. The proposed algorithms are based on continuous tensor decompositions recently developed by the authors. Essentially, the algorithms represent high-dimensional functions (e.g. the value function) in a compressed format, and directly perform dynamic programming computations (e.g. value iteration, policy iteration) in this format. Under certain technical assumptions, the new algorithms guarantee convergence towards optimal solutions with arbitrary precision. Furthermore, the run times of the new algorithms scale polynomially with the state dimension and polynomially with the ranks of the value function. This approach realizes substantial computational savings in “compressible” problem instances, where value functions admit low-rank approximations. We demonstrate the new algorithms in a wide range of problems, including a simulated six-dimensional agile quadcopter maneuvering example and a seven-dimensional aircraft perching example. In some of these examples, we estimate computational savings of up to 10 orders of magnitude over standard value iteration algorithms. We further demonstrate the algorithms running in real time on board a quadcopter during a flight experiment under motion capture.

A model of the interdependence of previous sensorimotor experiences in the following decision making

Evelina Dineva & Gregor Schöner, How infants’ reaches reveal principles of sensorimotor decision making, Connection Science vol. 30 iss. 1, p. 53-80, DOI: 10.1080/09540091.2017.1405382.

In Piaget’s classical A-not-B-task, infants repeatedly make a sensorimotor decision to reach to one of two cued targets. Perseverative errors are induced by switching the cue from A to B, while spontaneous errors are unsolicited reaches to B when only A is cued. We argue that theoretical accounts of sensorimotor decision-making fail to address how motor decisions leave a memory trace that may impact future sensorimotor decisions. Instead, in extant neural models, perseveration is caused solely by the history of stimulation. We present a neural dynamic model of sensorimotor decision-making within the framework of Dynamic Field Theory, in which a dynamic instability amplifies fluctuations in neural activation into macroscopic, stable neural activation states that leave memory traces. The model predicts perseveration, but also a tendency to repeat spontaneous errors. To test the account, we pool data from several A-not-B experiments. A conditional probabilities analysis accounts quantitatively how motor decisions depend on the history of reaching. The results provide evidence for the interdependence among subsequent reaching decisions that is explained by the model, showing that by amplifying small differences in activation and affecting learning, decisions have consequences beyond the individual behavioural act.

Towards taking into account the complexity of finding the best option in decision-making systems

Peter Bossaerts, Carsten Murawski, Computational Complexity and Human Decision-Making, Trends in Cognitive Sciences, Volume 21, Issue 12, 2017, Pages 917-929, DOI: 10.1016/j.tics.2017.09.005.

The rationality principle postulates that decision-makers always choose the best action available to them. It underlies most modern theories of decision-making. The principle does not take into account the difficulty of finding the best option. Here, we propose that computational complexity theory (CCT) provides a framework for defining and quantifying the difficulty of decisions. We review evidence showing that human decision-making is affected by computational complexity. Building on this evidence, we argue that most models of decision-making, and metacognition, are intractable from a computational perspective. To be plausible, future theories of decision-making will need to take into account both the resources required for implementing the computations implied by the theory, and the resource constraints imposed on the decision-maker by biology.

An application of POMDPs to robot surveillance

S. Witwicki et al., Autonomous Surveillance Robots: A Decision-Making Framework for Networked Muiltiagent Systems, IEEE Robotics & Automation Magazine, vol. 24, no. 3, pp. 52-64, DOI: 10.1109/MRA.2017.2662222.

This article proposes an architecture for an intelligent surveillance system, where the aim is to mitigate the burden on humans in conventional surveillance systems by incorporating intelligent interfaces, computer vision, and autonomous mobile robots. Central to the intelligent surveillance system is the application of research into planning and decision making in this novel context. In this article, we describe the robot surveillance decision problem and explain how the integration of components in our system supports fully automated decision making. Several concrete scenarios deployed in real surveillance environments exemplify both the flexibility of our system to experiment with different representations and algorithms and the portability of our system into a variety of problem contexts. Moreover, these scenarios demonstrate how planning enables robots to effectively balance surveillance objectives, autonomously performing the job of human patrols and responders.

POMDPs with multicriteria in the cost to optimize – a hierarchical approach

Seyedshams Feyzabadi, Stefano Carpin, Planning using hierarchical constrained Markov decision processes, Autonomous Robots, Volume 41, Issue 8, pp 1589–1607, DOI: 10.1007/s10514-017-9630-4.

Constrained Markov decision processes offer a principled method to determine policies for sequential stochastic decision problems where multiple costs are concurrently considered. Although they could be very valuable in numerous robotic applications, to date their use has been quite limited. Among the reasons for their limited adoption is their computational complexity, since policy computation requires the solution of constrained linear programs with an extremely large number of variables. To overcome this limitation, we propose a hierarchical method to solve large problem instances. States are clustered into macro states and the parameters defining the dynamic behavior and the costs of the clustered model are determined using a Monte Carlo approach. We show that the algorithm we propose to create clustered states maintains valuable properties of the original model, like the existence of a solution for the problem. Our algorithm is validated in various planning problems in simulation and on a mobile robot platform, and we experimentally show that the clustered approach significantly outperforms the non-hierarchical solution while experiencing only moderate losses in terms of objective functions.

Prediction of changes in behaviors of cars for autohomous driving, based on POMDPs made efficient by separation of multiple policies

Enric Galceran, Alexander G. Cunningham, Ryan M. Eustice, Edwin Olson,Multipolicy decision-making for autonomous driving via changepoint-based behavior prediction: Theory and experiment, Autonomous Robots, August 2017, Volume 41, Issue 6, pp 1367–1382, DOI: 10.1007/s10514-017-9619-z.

This paper reports on an integrated inference and decision-making approach for autonomous driving that models vehicle behavior for both our vehicle and nearby vehicles as a discrete set of closed-loop policies. Each policy captures a distinct high-level behavior and intention, such as driving along a lane or turning at an intersection. We first employ Bayesian changepoint detection on the observed history of nearby cars to estimate the distribution over potential policies that each nearby car might be executing. We then sample policy assignments from these distributions to obtain high-likelihood actions for each participating vehicle, and perform closed-loop forward simulation to predict the outcome for each sampled policy assignment. After evaluating these predicted outcomes, we execute the policy with the maximum expected reward value. We validate behavioral prediction and decision-making using simulated and real-world experiments.

A good intro about actor-critic and decision making without model on MDPs

J. Wang and I. C. Paschalidis, “An Actor-Critic Algorithm With Second-Order Actor and Critic,” in IEEE Transactions on Automatic Control, vol. 62, no. 6, pp. 2689-2703, June 2017.DOI: 10.1109/TAC.2016.2616384.

Actor-critic algorithms solve dynamic decision making problems by optimizing a performance metric of interest over a user-specified parametric class of policies. They employ a combination of an actor, making policy improvement steps, and a critic, computing policy improvement directions. Many existing algorithms use a steepest ascent method to improve the policy, which is known to suffer from slow convergence for ill-conditioned problems. In this paper, we first develop an estimate of the (Hessian) matrix containing the second derivatives of the performance metric with respect to policy parameters. Using this estimate, we introduce a new second-order policy improvement method and couple it with a critic using a second-order learning method. We establish almost sure convergence of the new method to a neighborhood of a policy parameter stationary point. We compare the new algorithm with some existing algorithms in two applications and demonstrate that it leads to significantly faster convergence.

On how the calculus of utility of actions drives many human behaviours

Julian Jara-Ettinger, Hyowon Gweon, Laura E. Schulz, Joshua B. Tenenbaum, The Naïve Utility Calculus: Computational Principles Underlying Commonsense Psychology, Trends in Cognitive Sciences, Volume 20, Issue 8, 2016, Pages 589-604, ISSN 1364-6613, DOI: 10.1016/j.tics.2016.05.011.

We propose that human social cognition is structured around a basic understanding of ourselves and others as intuitive utility maximizers: from a young age, humans implicitly assume that agents choose goals and actions to maximize the rewards they expect to obtain relative to the costs they expect to incur. This \u2018naïve utility calculus\u2019 allows both children and adults observe the behavior of others and infer their beliefs and desires, their longer-term knowledge and preferences, and even their character: who is knowledgeable or competent, who is praiseworthy or blameworthy, who is friendly, indifferent, or an enemy. We review studies providing support for the naïve utility calculus, and we show how it captures much of the rich social reasoning humans engage in from infancy.

Theoretical models for explaining the human (quick) decicion-making process

Roger Ratcliff, Philip L. Smith, Scott D. Brown, Gail McKoon, Diffusion Decision Model: Current Issues and History, Trends in Cognitive Sciences, Volume 20, Issue 4, April 2016, Pages 260-281, ISSN 1364-6613, DOI: 10.1016/j.tics.2016.01.007.

There is growing interest in diffusion models to represent the cognitive and neural processes of speeded decision making. Sequential-sampling models like the diffusion model have a long history in psychology. They view decision making as a process of noisy accumulation of evidence from a stimulus. The standard model assumes that evidence accumulates at a constant rate during the second or two it takes to make a decision. This process can be linked to the behaviors of populations of neurons and to theories of optimality. Diffusion models have been used successfully in a range of cognitive tasks and as psychometric tools in clinical research to examine individual differences. In this review, we relate the models to both earlier and more recent research in psychology.

The quick-intuition vs. slow-deliberation dilemma from a decision-making perspective

Y-Lan Boureau, Peter Sokol-Hessner, Nathaniel D. Daw, Deciding How To Decide: Self-Control and Meta-Decision Making, Trends in Cognitive Sciences, Volume 19, Issue 11, November 2015, Pages 700-710, ISSN 1364-6613, DOI: 10.1016/j.tics.2015.08.013.

Many different situations related to self control involve competition between two routes to decisions: default and frugal versus more resource-intensive. Examples include habits versus deliberative decisions, fatigue versus cognitive effort, and Pavlovian versus instrumental decision making. We propose that these situations are linked by a strikingly similar core dilemma, pitting the opportunity costs of monopolizing shared resources such as executive functions for some time, against the possibility of obtaining a better outcome. We offer a unifying normative perspective on this underlying rational meta-optimization, review how this may tie together recent advances in many separate areas, and connect several independent models. Finally, we suggest that the crucial mechanisms and meta-decision variables may be shared across domains.