Tag Archives: Decision Making

Towards taking into account the complexity of finding the best option in decision-making systems

Peter Bossaerts, Carsten Murawski, Computational Complexity and Human Decision-Making, Trends in Cognitive Sciences, Volume 21, Issue 12, 2017, Pages 917-929, DOI: 10.1016/j.tics.2017.09.005.

The rationality principle postulates that decision-makers always choose the best action available to them. It underlies most modern theories of decision-making. The principle does not take into account the difficulty of finding the best option. Here, we propose that computational complexity theory (CCT) provides a framework for defining and quantifying the difficulty of decisions. We review evidence showing that human decision-making is affected by computational complexity. Building on this evidence, we argue that most models of decision-making, and metacognition, are intractable from a computational perspective. To be plausible, future theories of decision-making will need to take into account both the resources required for implementing the computations implied by the theory, and the resource constraints imposed on the decision-maker by biology.

An application of POMDPs to robot surveillance

S. Witwicki et al., Autonomous Surveillance Robots: A Decision-Making Framework for Networked Muiltiagent Systems, IEEE Robotics & Automation Magazine, vol. 24, no. 3, pp. 52-64, DOI: 10.1109/MRA.2017.2662222.

This article proposes an architecture for an intelligent surveillance system, where the aim is to mitigate the burden on humans in conventional surveillance systems by incorporating intelligent interfaces, computer vision, and autonomous mobile robots. Central to the intelligent surveillance system is the application of research into planning and decision making in this novel context. In this article, we describe the robot surveillance decision problem and explain how the integration of components in our system supports fully automated decision making. Several concrete scenarios deployed in real surveillance environments exemplify both the flexibility of our system to experiment with different representations and algorithms and the portability of our system into a variety of problem contexts. Moreover, these scenarios demonstrate how planning enables robots to effectively balance surveillance objectives, autonomously performing the job of human patrols and responders.

POMDPs with multicriteria in the cost to optimize – a hierarchical approach

Seyedshams Feyzabadi, Stefano Carpin, Planning using hierarchical constrained Markov decision processes, Autonomous Robots, Volume 41, Issue 8, pp 1589–1607, DOI: 10.1007/s10514-017-9630-4.

Constrained Markov decision processes offer a principled method to determine policies for sequential stochastic decision problems where multiple costs are concurrently considered. Although they could be very valuable in numerous robotic applications, to date their use has been quite limited. Among the reasons for their limited adoption is their computational complexity, since policy computation requires the solution of constrained linear programs with an extremely large number of variables. To overcome this limitation, we propose a hierarchical method to solve large problem instances. States are clustered into macro states and the parameters defining the dynamic behavior and the costs of the clustered model are determined using a Monte Carlo approach. We show that the algorithm we propose to create clustered states maintains valuable properties of the original model, like the existence of a solution for the problem. Our algorithm is validated in various planning problems in simulation and on a mobile robot platform, and we experimentally show that the clustered approach significantly outperforms the non-hierarchical solution while experiencing only moderate losses in terms of objective functions.

Prediction of changes in behaviors of cars for autohomous driving, based on POMDPs made efficient by separation of multiple policies

Enric Galceran, Alexander G. Cunningham, Ryan M. Eustice, Edwin Olson,Multipolicy decision-making for autonomous driving via changepoint-based behavior prediction: Theory and experiment, Autonomous Robots, August 2017, Volume 41, Issue 6, pp 1367–1382, DOI: 10.1007/s10514-017-9619-z.

This paper reports on an integrated inference and decision-making approach for autonomous driving that models vehicle behavior for both our vehicle and nearby vehicles as a discrete set of closed-loop policies. Each policy captures a distinct high-level behavior and intention, such as driving along a lane or turning at an intersection. We first employ Bayesian changepoint detection on the observed history of nearby cars to estimate the distribution over potential policies that each nearby car might be executing. We then sample policy assignments from these distributions to obtain high-likelihood actions for each participating vehicle, and perform closed-loop forward simulation to predict the outcome for each sampled policy assignment. After evaluating these predicted outcomes, we execute the policy with the maximum expected reward value. We validate behavioral prediction and decision-making using simulated and real-world experiments.

A good intro about actor-critic and decision making without model on MDPs

J. Wang and I. C. Paschalidis, “An Actor-Critic Algorithm With Second-Order Actor and Critic,” in IEEE Transactions on Automatic Control, vol. 62, no. 6, pp. 2689-2703, June 2017.DOI: 10.1109/TAC.2016.2616384.

Actor-critic algorithms solve dynamic decision making problems by optimizing a performance metric of interest over a user-specified parametric class of policies. They employ a combination of an actor, making policy improvement steps, and a critic, computing policy improvement directions. Many existing algorithms use a steepest ascent method to improve the policy, which is known to suffer from slow convergence for ill-conditioned problems. In this paper, we first develop an estimate of the (Hessian) matrix containing the second derivatives of the performance metric with respect to policy parameters. Using this estimate, we introduce a new second-order policy improvement method and couple it with a critic using a second-order learning method. We establish almost sure convergence of the new method to a neighborhood of a policy parameter stationary point. We compare the new algorithm with some existing algorithms in two applications and demonstrate that it leads to significantly faster convergence.

On how the calculus of utility of actions drives many human behaviours

Julian Jara-Ettinger, Hyowon Gweon, Laura E. Schulz, Joshua B. Tenenbaum, The Naïve Utility Calculus: Computational Principles Underlying Commonsense Psychology, Trends in Cognitive Sciences, Volume 20, Issue 8, 2016, Pages 589-604, ISSN 1364-6613, DOI: 10.1016/j.tics.2016.05.011.

We propose that human social cognition is structured around a basic understanding of ourselves and others as intuitive utility maximizers: from a young age, humans implicitly assume that agents choose goals and actions to maximize the rewards they expect to obtain relative to the costs they expect to incur. This \u2018naïve utility calculus\u2019 allows both children and adults observe the behavior of others and infer their beliefs and desires, their longer-term knowledge and preferences, and even their character: who is knowledgeable or competent, who is praiseworthy or blameworthy, who is friendly, indifferent, or an enemy. We review studies providing support for the naïve utility calculus, and we show how it captures much of the rich social reasoning humans engage in from infancy.

Theoretical models for explaining the human (quick) decicion-making process

Roger Ratcliff, Philip L. Smith, Scott D. Brown, Gail McKoon, Diffusion Decision Model: Current Issues and History, Trends in Cognitive Sciences, Volume 20, Issue 4, April 2016, Pages 260-281, ISSN 1364-6613, DOI: 10.1016/j.tics.2016.01.007.

There is growing interest in diffusion models to represent the cognitive and neural processes of speeded decision making. Sequential-sampling models like the diffusion model have a long history in psychology. They view decision making as a process of noisy accumulation of evidence from a stimulus. The standard model assumes that evidence accumulates at a constant rate during the second or two it takes to make a decision. This process can be linked to the behaviors of populations of neurons and to theories of optimality. Diffusion models have been used successfully in a range of cognitive tasks and as psychometric tools in clinical research to examine individual differences. In this review, we relate the models to both earlier and more recent research in psychology.

The quick-intuition vs. slow-deliberation dilemma from a decision-making perspective

Y-Lan Boureau, Peter Sokol-Hessner, Nathaniel D. Daw, Deciding How To Decide: Self-Control and Meta-Decision Making, Trends in Cognitive Sciences, Volume 19, Issue 11, November 2015, Pages 700-710, ISSN 1364-6613, DOI: 10.1016/j.tics.2015.08.013.

Many different situations related to self control involve competition between two routes to decisions: default and frugal versus more resource-intensive. Examples include habits versus deliberative decisions, fatigue versus cognitive effort, and Pavlovian versus instrumental decision making. We propose that these situations are linked by a strikingly similar core dilemma, pitting the opportunity costs of monopolizing shared resources such as executive functions for some time, against the possibility of obtaining a better outcome. We offer a unifying normative perspective on this underlying rational meta-optimization, review how this may tie together recent advances in many separate areas, and connect several independent models. Finally, we suggest that the crucial mechanisms and meta-decision variables may be shared across domains.

Modelling emotions in adaptive agents through the action selection part of reinforcement learning, plus some references on the neurophysiological bases of RL and a good review of literature on emotions

Joost Broekens , Elmer Jacobs , Catholijn M. Jonker, A reinforcement learning model of joy, distress, hope and fear, Connection Science, Vol. 27, Iss. 3, 2015, DOI: 10.1080/09540091.2015.1031081.

In this paper we computationally study the relation between adaptive behaviour and emotion. Using the reinforcement learning framework, we propose that learned state utility, V(s), models fear (negative) and hope (positive) based on the fact that both signals are about anticipation of loss or gain. Further, we propose that joy/distress is a signal similar to the error signal. We present agent-based simulation experiments that show that this model replicates psychological and behavioural dynamics of emotion. This work distinguishes itself by assessing the dynamics of emotion in an adaptive agent framework – coupling it to the literature on habituation, development, extinction and hope theory. Our results support the idea that the function of emotion is to provide a complex feedback signal for an organism to adapt its behaviour. Our work is relevant for understanding the relation between emotion and adaptation in animals, as well as for human–robot interaction, in particular how emotional signals can be used to communicate between adaptive agents and humans.

Quantum probability theory as an alternative to classical (Kolgomorov) probability theory for modelling human decision making processes, and a curious description of the effect of a particular ordering of decisions in the complete result

Peter D. Bruza, Zheng Wang, Jerome R. Busemeyer, Quantum cognition: a new theoretical approach to psychology, Trends in Cognitive Sciences, Volume 19, Issue 7, July 2015, Pages 383-393, ISSN 1364-6613, DOI: 10.1016/j.tics.2015.05.001.

What type of probability theory best describes the way humans make judgments under uncertainty and decisions under conflict? Although rational models of cognition have become prominent and have achieved much success, they adhere to the laws of classical probability theory despite the fact that human reasoning does not always conform to these laws. For this reason we have seen the recent emergence of models based on an alternative probabilistic framework drawn from quantum theory. These quantum models show promise in addressing cognitive phenomena that have proven recalcitrant to modeling by means of classical probability theory. This review compares and contrasts probabilistic models based on Bayesian or classical versus quantum principles, and highlights the advantages and disadvantages of each approach.