Category Archives: Psycho-physiological Bases Of Engineering

How mood influences learning, specifically the perception of rewards in the context of reinforcement learning

Eran Eldar, Robb B. Rutledge, Raymond J. Dolan, Yael Niv, Mood as Representation of Momentum, Trends in Cognitive Sciences, Volume 20, Issue 1, January 2016, Pages 15-24, ISSN 1364-6613, DOI: 10.1016/j.tics.2015.07.010.

Experiences affect mood, which in turn affects subsequent experiences. Recent studies suggest two specific principles. First, mood depends on how recent reward outcomes differ from expectations. Second, mood biases the way we perceive outcomes (e.g., rewards), and this bias affects learning about those outcomes. We propose that this two-way interaction serves to mitigate inefficiencies in the application of reinforcement learning to real-world problems. Specifically, we propose that mood represents the overall momentum of recent outcomes, and its biasing influence on the perception of outcomes ‘corrects’ learning to account for environmental dependencies. We describe potential dysfunctions of this adaptive mechanism that might contribute to the symptoms of mood disorders.
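
The two-way interaction described above can be illustrated with a toy sketch. It is not the authors' model: the function name, the parameters `alpha`, `eta` and `f`, the `tanh` squashing and the multiplicative bias are all assumptions made for illustration, and rewards are assumed to be non-negative. Mood tracks a running average of recent prediction errors, and in turn inflates or deflates the perceived reward.

```python
import math

def mood_biased_learning(rewards, alpha=0.1, eta=0.1, f=1.5):
    """Toy learner whose mood biases how rewards are perceived.

    alpha: learning rate for the reward expectation (assumed value)
    eta:   update rate for the running average of prediction errors (assumed)
    f:     strength of the mood bias; f = 1 means no bias (assumed)
    """
    v = 0.0  # current reward expectation
    h = 0.0  # running average of recent prediction errors ("momentum")
    history = []
    for r in rewards:
        mood = math.tanh(h)          # squash momentum into (-1, 1)
        perceived = r * f ** mood    # good mood inflates, bad mood deflates
        delta = perceived - v        # prediction error on the perceived reward
        v += alpha * delta           # learn from the biased outcome
        h += eta * (delta - h)       # mood follows recent prediction errors
        history.append((mood, perceived, v))
    return history

history = mood_biased_learning([1.0] * 20)
for mood, perceived, v in history[:3]:
    print(f"mood={mood:.3f} perceived={perceived:.3f} expectation={v:.3f}")
```

With a constant reward of 1.0, the first outcome is perceived without bias, and the positive surprise it produces lifts the mood so that the next, identical outcome is perceived as slightly larger.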

Modelling emotions in adaptive agents through the action selection part of reinforcement learning, plus some references on the neurophysiological bases of RL and a good review of literature on emotions

Joost Broekens, Elmer Jacobs, Catholijn M. Jonker, A reinforcement learning model of joy, distress, hope and fear, Connection Science, Vol. 27, Iss. 3, 2015, DOI: 10.1080/09540091.2015.1031081.

In this paper we computationally study the relation between adaptive behaviour and emotion. Using the reinforcement learning framework, we propose that learned state utility, V(s), models fear (negative) and hope (positive) based on the fact that both signals are about anticipation of loss or gain. Further, we propose that joy/distress is a signal similar to the error signal. We present agent-based simulation experiments that show that this model replicates psychological and behavioural dynamics of emotion. This work distinguishes itself by assessing the dynamics of emotion in an adaptive agent framework – coupling it to the literature on habituation, development, extinction and hope theory. Our results support the idea that the function of emotion is to provide a complex feedback signal for an organism to adapt its behaviour. Our work is relevant for understanding the relation between emotion and adaptation in animals, as well as for human–robot interaction, in particular how emotional signals can be used to communicate between adaptive agents and humans.
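
A minimal sketch of the mapping proposed in the abstract, assuming a plain TD(0) learner: the sign of the learned state value V(s) is read as hope or fear, and the sign of the temporal-difference error as joy or distress. The function names, the two-state example and the parameter values are hypothetical and are not taken from the paper.

```python
def td_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """Apply one TD(0) update to V[s] and return the TD error."""
    delta = r + gamma * V[s_next] - V[s]
    V[s] += alpha * delta
    return delta

def emotion_labels(V, s, delta):
    """Read anticipation off the state value and reaction off the TD error."""
    anticipation = "hope" if V[s] > 0 else "fear" if V[s] < 0 else "neutral"
    reaction = "joy" if delta > 0 else "distress" if delta < 0 else "neutral"
    return anticipation, reaction

# The same reward is delivered 50 times on the transition start -> goal.
V = {"start": 0.0, "goal": 0.0}
deltas = [td_update(V, "start", 1.0, "goal") for _ in range(50)]

print(emotion_labels(V, "start", deltas[-1]))
print(f"first TD error: {deltas[0]:.3f}, last TD error: {deltas[-1]:.4f}")
```

Because the TD error shrinks as the reward becomes expected, the joy signal fades with repetition, which is one simple way to picture the habituation effect mentioned in the abstract.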

On how human cognition detects regularities in noisy sensory data (“statistical learning” in psychology terms)

Annabelle Goujon, André Didierjean, Simon Thorpe, Investigating implicit statistical learning mechanisms through contextual cueing, Trends in Cognitive Sciences, Volume 19, Issue 9, September 2015, Pages 524-533, ISSN 1364-6613, DOI: 10.1016/j.tics.2015.07.009.

Since its inception, the contextual cueing (CC) paradigm has generated considerable interest in various fields of cognitive sciences because it constitutes an elegant approach to understanding how statistical learning (SL) mechanisms can detect contextual regularities during a visual search. In this article we review and discuss five aspects of CC: (i) the implicit nature of learning, (ii) the mechanisms involved in CC, (iii) the mediating factors affecting CC, (iv) the generalization of CC phenomena, and (v) the dissociation between implicit and explicit CC phenomena. The findings suggest that implicit SL is an inherent component of ongoing processing which operates through clustering, associative, and reinforcement processes at various levels of sensory-motor processing, and might result from simple spike-timing-dependent plasticity.

Quantum probability theory as an alternative to classical (Kolmogorov) probability theory for modelling human decision-making processes, and a curious description of how the order in which decisions are made affects the overall result

Peter D. Bruza, Zheng Wang, Jerome R. Busemeyer, Quantum cognition: a new theoretical approach to psychology, Trends in Cognitive Sciences, Volume 19, Issue 7, July 2015, Pages 383-393, ISSN 1364-6613, DOI: 10.1016/j.tics.2015.05.001.

What type of probability theory best describes the way humans make judgments under uncertainty and decisions under conflict? Although rational models of cognition have become prominent and have achieved much success, they adhere to the laws of classical probability theory despite the fact that human reasoning does not always conform to these laws. For this reason we have seen the recent emergence of models based on an alternative probabilistic framework drawn from quantum theory. These quantum models show promise in addressing cognitive phenomena that have proven recalcitrant to modeling by means of classical probability theory. This review compares and contrasts probabilistic models based on Bayesian or classical versus quantum principles, and highlights the advantages and disadvantages of each approach.
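
The order effect can be reproduced in a few lines. The sketch below assumes a two-dimensional real state space in which each question is a projection onto a direction; the angles are arbitrary illustrative choices, not values from the paper. In classical probability the joint probability of answering "yes" to A and to B does not depend on the order, whereas with non-commuting projections it does.

```python
import math

def unit(angle_deg):
    """Unit vector in the plane at the given angle."""
    t = math.radians(angle_deg)
    return (math.cos(t), math.sin(t))

def overlap_sq(u, v):
    """Squared inner product: probability of projecting u onto v."""
    return (u[0] * v[0] + u[1] * v[1]) ** 2

def p_yes_then_yes(state, first, second):
    """P(yes to first) * P(yes to second after collapsing onto first)."""
    return overlap_sq(state, first) * overlap_sq(first, second)

psi = unit(0)    # initial belief state (assumed)
a = unit(30)     # "yes" direction for question A (assumed)
b = unit(75)     # "yes" direction for question B (assumed)

p_ab = p_yes_then_yes(psi, a, b)  # ask A first, then B
p_ba = p_yes_then_yes(psi, b, a)  # ask B first, then A
print(f"P(A then B) = {p_ab:.4f}, P(B then A) = {p_ba:.4f}")
```

Answering A first collapses the state onto A's direction, so the subsequent answer to B is judged from a different starting point than if B had been asked first.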

Semantic and syntactic bootstrapped learning for robots, inspired by similar processes in humans, that use language as a scaffolding mechanism to improve learning in unknown situations

Worgotter, F.; Geib, C.; Tamosiunaite, M.; Aksoy, E.E.; Piater, J.; Hanchen Xiong; Ude, A.; Nemec, B.; Kraft, D.; Kruger, N.; Wachter, M.; Asfour, T., Structural Bootstrapping—A Novel, Generative Mechanism for Faster and More Efficient Acquisition of Action-Knowledge, IEEE Transactions on Autonomous Mental Development, vol. 7, no. 2, pp. 140-154, June 2015, DOI: 10.1109/TAMD.2015.2427233.

Humans, but also robots, learn to improve their behavior. Without existing knowledge, learning either needs to be explorative and, thus, slow or, to be more efficient, it needs to rely on supervision, which may not always be available. However, once some knowledge base exists an agent can make use of it to improve learning efficiency and speed. This happens for our children at the age of around three when they very quickly begin to assimilate new information by making guided guesses how this fits to their prior knowledge. This is a very efficient generative learning mechanism in the sense that the existing knowledge is generalized into as-yet unexplored, novel domains. So far generative learning has not been employed for robots and robot learning remains to be a slow and tedious process. The goal of the current study is to devise for the first time a general framework for a generative process that will improve learning and which can be applied at all different levels of the robot’s cognitive architecture. To this end, we introduce the concept of structural bootstrapping (borrowed and modified from child language acquisition) to define a probabilistic process that uses existing knowledge together with new observations to supplement our robot’s data-base with missing information about planning-, object-, as well as, action-relevant entities. In a kitchen scenario, we use the example of making batter by pouring and mixing two components and show that the agent can efficiently acquire new knowledge about planning operators, objects as well as required motor pattern for stirring by structural bootstrapping. Some benchmarks are shown, too, that demonstrate how structural bootstrapping improves performance.

Developmental approach for a robot manipulator that learns in several bootstrapped stages, strongly inspired by infant development

Ugur, E.; Nagai, Y.; Sahin, E.; Oztop, E., Staged Development of Robot Skills: Behavior Formation, Affordance Learning and Imitation with Motionese, IEEE Transactions on Autonomous Mental Development, vol. 7, no. 2, pp. 119-139, June 2015, DOI: 10.1109/TAMD.2015.2426192.

Inspired by infant development, we propose a three staged developmental framework for an anthropomorphic robot manipulator. In the first stage, the robot is initialized with a basic reach-and-enclose-on-contact movement capability, and discovers a set of behavior primitives by exploring its movement parameter space. In the next stage, the robot exercises the discovered behaviors on different objects, and learns the caused effects; effectively building a library of affordances and associated predictors. Finally, in the third stage, the learned structures and predictors are used to bootstrap complex imitation and action learning with the help of a cooperative tutor. The main contribution of this paper is the realization of an integrated developmental system where the structures emerging from the sensorimotor experience of an interacting real robot are used as the sole building blocks of the subsequent stages that generate increasingly more complex cognitive capabilities. The proposed framework includes a number of common features with infant sensorimotor development. Furthermore, the findings obtained from the self-exploration and motionese guided human-robot interaction experiments allow us to reason about the underlying mechanisms of simple-to-complex sensorimotor skill progression in human infants.

Finding the common utility of actions in several tasks learnt in the same domain in order to reduce the learning cost of reinforcement learning

Rosman, B.; Ramamoorthy, S., Action Priors for Learning Domain Invariances, IEEE Transactions on Autonomous Mental Development, vol. 7, no. 2, pp. 107-118, June 2015, DOI: 10.1109/TAMD.2015.2419715.

An agent tasked with solving a number of different decision making problems in similar environments has an opportunity to learn over a longer timescale than each individual task. Through examining solutions to different tasks, it can uncover behavioral invariances in the domain, by identifying actions to be prioritized in local contexts, invariant to task details. This information has the effect of greatly increasing the speed of solving new problems. We formalise this notion as action priors, defined as distributions over the action space, conditioned on environment state, and show how these can be learnt from a set of value functions. We apply action priors in the setting of reinforcement learning, to bias action selection during exploration. Aggressive use of action priors performs context based pruning of the available actions, thus reducing the complexity of lookahead during search. We additionally define action priors over observation features, rather than states, which provides further flexibility and generalizability, with the additional benefit of enabling feature selection. Action priors are demonstrated in experiments in a simulated factory environment and a large random graph domain, and show significant speed ups in learning new tasks. Furthermore, we argue that this mechanism is cognitively plausible, and is compatible with findings from cognitive psychology.
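
One way to picture action priors, as a rough sketch rather than the paper's exact procedure: for each state, count how often each action is greedy across the value functions of previously solved tasks, add a pseudo-count, and normalize. The function names, the dictionary layout of the Q-functions, the pseudo-count and the "corridor" example are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def learn_action_priors(q_functions, actions, pseudo_count=1.0):
    """Estimate P(action | state) from greedy actions of several tasks.

    q_functions: list of dicts mapping state -> {action: value}
    """
    counts = defaultdict(lambda: {a: pseudo_count for a in actions})
    for q in q_functions:
        for state, values in q.items():
            best = max(values.values())
            for a in actions:
                if values[a] == best:      # action is greedy in this task
                    counts[state][a] += 1.0
    priors = {}
    for state, c in counts.items():
        total = sum(c.values())
        priors[state] = {a: n / total for a, n in c.items()}
    return priors

def sample_exploratory_action(priors, state, actions, rng):
    """Draw an exploration action in proportion to the learned prior."""
    weights = [priors[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

actions = ["left", "right", "wait"]
task_1 = {"corridor": {"left": 1.0, "right": 0.2, "wait": 0.0}}
task_2 = {"corridor": {"left": 0.1, "right": 0.9, "wait": 0.0}}

priors = learn_action_priors([task_1, task_2], actions)
print(priors["corridor"])
print(sample_exploratory_action(priors, "corridor", actions, random.Random(0)))
```

In this example "wait" is never the best action in either task, so it keeps only its pseudo-count mass and is explored less often in a new task; dropping actions below a probability threshold would correspond to the pruning mentioned in the abstract.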

Neural support for the cognitive map: place cells and grid cells

Kate J. Jeffery, Distorting the metric fabric of the cognitive map, Trends in Cognitive Sciences, Volume 19, Issue 6, June 2015, Pages 300-301, ISSN 1364-6613, DOI: 10.1016/j.tics.2015.04.001.

Grid cells are neurons whose regularly spaced firing fields form apparently symmetric arrays, or grids, that are thought to collectively provide an environment-independent metric framework for the brain’s cognitive map of space. However, two recent studies show that grids are naturally distorted, revealing greater local environment-specific effects than previously recognized.

Reinforcement learning when a human is the one providing the rewards to the algorithm

W. Bradley Knox, Peter Stone, Framing reinforcement learning from human reward: Reward positivity, temporal discounting, episodicity, and performance, Artificial Intelligence, Volume 225, August 2015, Pages 24-50, ISSN 0004-3702, DOI: 10.1016/j.artint.2015.03.009.

Several studies have demonstrated that reward from a human trainer can be a powerful feedback signal for control-learning algorithms. However, the space of algorithms for learning from such human reward has hitherto not been explored systematically. Using model-based reinforcement learning from human reward, this article investigates the problem of learning from human reward through six experiments, focusing on the relationships between reward positivity, which is how generally positive a trainer’s reward values are; temporal discounting, the extent to which future reward is discounted in value; episodicity, whether task learning occurs in discrete learning episodes instead of one continuing session; and task performance, the agent’s performance on the task the trainer intends to teach. This investigation is motivated by the observation that an agent can pursue different learning objectives, leading to different resulting behaviors. We search for learning objectives that lead the agent to behave as the trainer intends.
We identify and empirically support a “positive circuits” problem with low discounting (i.e., high discount factors) for episodic, goal-based tasks that arises from an observed bias among humans towards giving positive reward, resulting in an endorsement of myopic learning for such domains. We then show that converting simple episodic tasks to be non-episodic (i.e., continuing) reduces and in some cases resolves issues present in episodic tasks with generally positive reward and—relatedly—enables highly successful learning with non-myopic valuation in multiple user studies. The primary learning algorithm introduced in this article, which we call “vi-tamer”, is the first algorithm to successfully learn non-myopically from reward generated by a human trainer; we also empirically show that such non-myopic valuation facilitates higher-level understanding of the task. Anticipating the complexity of real-world problems, we perform further studies—one with a failure state added—that compare (1) learning when states are updated asynchronously with local bias—i.e., states quickly reachable from the agent’s current state are updated more often than other states—to (2) learning with the fully synchronous sweeps across each state in the vi-tamer algorithm. With these locally biased updates, we find that the general positivity of human reward creates problems even for continuing tasks, revealing a distinct research challenge for future work.
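
The “positive circuits” problem can be seen with simple arithmetic. The sketch below assumes a trainer who gives reward 1.0 for each of three steps towards the goal, after which the episode ends, and a smaller but still positive reward of 0.5 for every step of a loop that never reaches the goal. These numbers and function names are illustrative assumptions, not values from the article.

```python
def return_reach_goal(step_reward, n_steps, gamma):
    """Discounted return of a path that ends the episode after n_steps."""
    return sum(step_reward * gamma ** t for t in range(n_steps))

def return_loop_forever(loop_reward, gamma):
    """Discounted return of repeating a positively rewarded loop (gamma < 1)."""
    return loop_reward / (1.0 - gamma)

for gamma in (0.0, 0.5, 0.9):
    goal = return_reach_goal(1.0, 3, gamma)
    loop = return_loop_forever(0.5, gamma)
    better = "goal" if goal > loop else "loop"
    print(f"gamma={gamma}: goal={goal:.2f} loop={loop:.2f} -> prefers {better}")
```

A myopic agent (gamma = 0) prefers the goal-directed step, while a far-sighted agent (gamma = 0.9) earns more by circling forever, because ending the episode cuts off the stream of positive reward.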

A Bayesian framework to explain magnitude estimation in the human mind

Frederike H. Petzschner, Stefan Glasauer, Klaas E. Stephan, A Bayesian perspective on magnitude estimation, Trends in Cognitive Sciences, Volume 19, Issue 5, May 2015, Pages 285-293, ISSN 1364-6613, DOI: 10.1016/j.tics.2015.03.002.

Our representation of the physical world requires judgments of magnitudes, such as loudness, distance, or time. Interestingly, magnitude estimates are often not veridical but subject to characteristic biases. These biases are strikingly similar across different sensory modalities, suggesting common processing mechanisms that are shared by different sensory systems. However, the search for universal neurobiological principles of magnitude judgments requires guidance by formal theories. Here, we discuss a unifying Bayesian framework for understanding biases in magnitude estimation. This Bayesian perspective enables a re-interpretation of a range of established psychophysical findings, reconciles seemingly incompatible classical views on magnitude estimation, and can guide future investigations of magnitude estimation and its neurobiological mechanisms in health and in psychiatric diseases, such as schizophrenia.
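
The characteristic bias can be sketched with the simplest Bayesian estimator, assuming a Gaussian prior over magnitudes and Gaussian measurement noise. This is a simplification for illustration (the function name and all numbers are hypothetical), and it works on raw rather than logarithmic magnitudes.

```python
def posterior_mean(measurement, prior_mean, prior_var, noise_var):
    """Posterior mean for a Gaussian prior combined with a Gaussian likelihood."""
    w = prior_var / (prior_var + noise_var)  # weight given to the measurement
    return w * measurement + (1.0 - w) * prior_mean

prior_mean, prior_var, noise_var = 10.0, 4.0, 4.0  # assumed values

for measurement in (6.0, 10.0, 14.0):
    estimate = posterior_mean(measurement, prior_mean, prior_var, noise_var)
    print(f"measured {measurement:.1f} -> estimated {estimate:.1f}")
```

Small magnitudes are overestimated and large ones underestimated, each pulled towards the prior mean, and the pull grows as the measurement noise increases relative to the prior variance.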