Category Archives: Cognitive Sciences

Improving reward shaping in deep RL to avoid users' biases and boost learning efficiency

Jiawei Lin, Xuekai Wei, Weizhi Xian, Jielu Yan, Leong Hou U, Yong Feng, Zhaowei Shang, Mingliang Zhou, Continuous reinforcement learning via advantage value difference reward shaping: A proximal policy optimization perspective, Engineering Applications of Artificial Intelligence, Volume 151, 2025, DOI: 10.1016/j.engappai.2025.110676.

Deep reinforcement learning has shown great promise in industrial applications. However, these algorithms suffer from low learning efficiency because of sparse reward signals in continuous control tasks. Reward shaping addresses this issue by transforming sparse rewards into more informative signals, but some designs that rely on domain experts or heuristic rules can introduce cognitive biases, leading to suboptimal solutions. To overcome this challenge, this paper proposes the advantage value difference (AVD), a generalized potential-based end-to-end exploration reward function. The main contribution of this paper is to improve the agent’s exploration efficiency, accelerate the learning process, and prevent premature convergence to local optima. The method leverages the temporal difference error to estimate the potential of states and uses the advantage function to guide the learning process toward more effective strategies. In the context of engineering applications, this paper proves the superiority of AVD in continuous control tasks within the multi-joint dynamics with contact (MuJoCo) environment. Specifically, the proposed method achieves an average increase of 23.5% in episode rewards for the Hopper, Swimmer, and Humanoid tasks compared with the state-of-the-art approaches. The results demonstrate the significant improvement in learning efficiency achieved by AVD for industrial robotic systems.
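The abstract does not give the exact AVD formula, but the general recipe it builds on, potential-based reward shaping with a learned value estimate as the potential, can be sketched in a few lines. A minimal illustration, assuming a scalar state and an externally trained value function (names and constants are illustrative, not the paper's code):

```python
def shaped_reward(r, s, s_next, value_fn, gamma=0.99):
    """Potential-based shaping: F(s, s') = gamma * Phi(s') - Phi(s).

    Using a learned value estimate V as the potential Phi densifies a
    sparse reward while provably preserving the optimal policy
    (Ng et al., 1999), which is what protects against biased shaping.
    """
    return r + gamma * value_fn(s_next) - value_fn(s)

# Toy example: a linear value estimate over 1-D states (illustrative).
value_fn = lambda s: 0.5 * s
print(shaped_reward(0.0, s=1.0, s_next=2.0, value_fn=value_fn))
```

Because the shaping term is a difference of potentials, it adds dense learning signal without changing which policy is optimal, which is the property that makes such schemes safer than hand-crafted heuristic bonuses.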

When to rely on memories versus sampling sensory information anew to guide behavior

Levi Kumle, Anna C. Nobre, Dejan Draschkow, Sensorimnemonic decisions: choosing memories versus sensory information, Trends in Cognitive Sciences, Volume 29, Issue 4, 2025, Pages 311-313, DOI: 10.1016/j.tics.2024.12.010.

We highlight a fundamental psychological function that is central to many of our interactions in the environment – when to rely on memories versus sampling sensory information anew to guide behavior. By operationalizing sensorimnemonic decisions, we aim to encourage and advance research into this pivotal process for understanding how memories serve adaptive cognition.

On the innate ability of vertebrates to recognize numbers and to distinguish ratios of quantities

Elena Lorenzi, Dmitry Kobylkov, Giorgio Vallortigara, Is there an innate sense of number in the brain?, Cerebral Cortex, Volume 35, Issue 2, February 2025, DOI: 10.1093/cercor/bhaf004.

The approximate number system, or "sense of number," is a crucial, presymbolic mechanism enabling animals to estimate quantities, which is essential for survival in various contexts (e.g., estimating numerosities of social companions, prey, predators, and so on). Behavioral studies indicate that a sense of number is widespread across vertebrates and invertebrates. Specific brain regions such as the intraparietal sulcus and prefrontal cortex in primates, or equivalent areas in birds and fish, are involved in numerical estimation, and their activity is modulated by the ratio of quantities. Data gathered across species strongly suggest similar evolutionary pressures for number estimation, pointing to a likely common origin, at least across vertebrates. On the other hand, few studies have investigated the origins of the sense of number. Recent findings, however, have shown that numerosity-selective neurons exist in newborn animals, such as domestic chicks and zebrafish, supporting the hypothesis that the approximate number system is innate. Control-rearing experiments on visually naïve animals further support the notion that the sense of number is innate and does not require any specific instructive experience in order to be triggered.
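The ratio dependence mentioned above is commonly modeled with scalar variability (Weber's law): internal numerosity estimates carry noise proportional to their magnitude, so discriminability depends on the ratio of two quantities rather than their absolute difference. A minimal simulation of this standard textbook model (not the authors' code; the Weber fraction is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
WEBER = 0.2  # illustrative Weber fraction

def discriminate(n1, n2, trials=100_000):
    """Fraction of trials where the noisy estimate of n2 exceeds n1's.

    Scalar variability: the sd of each estimate grows linearly with its
    magnitude, so accuracy tracks the ratio n2/n1, not n2 - n1.
    """
    e1 = rng.normal(n1, WEBER * n1, trials)
    e2 = rng.normal(n2, WEBER * n2, trials)
    return (e2 > e1).mean()

print(discriminate(8, 16))   # 2:1 ratio -> near-ceiling accuracy (~0.99)
print(discriminate(16, 24))  # 3:2 ratio -> harder (~0.92), same difference of 8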

It seems that working memory in the human brain uses pointers

Edward Awh, Edward K. Vogel, Working memory needs pointers, Trends in Cognitive Sciences, Volume 29, Issue 3, 2025, Pages 230-241, DOI: 10.1016/j.tics.2024.12.006.

Cognitive neuroscience has converged on a definition of working memory (WM) as a capacity-limited system that maintains highly accessible representations via stimulus-specific neural patterns. We argue that this standard definition may be incomplete. We highlight the fundamental need to recognize specific instances or tokens and to bind those tokens to the surrounding context. We propose that contextual binding is supported by spatiotemporal ‘pointers’ and that pointers are the source of neural signals that track the number of stored items, independent of their content. These content-independent pointers may provide a productive perspective for understanding item-based capacity limits in WM and the role of WM as a gateway for long-term storage.
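The pointer metaphor maps naturally onto a familiar data structure: a small, fixed set of content-independent slots that bind items to their spatiotemporal context by reference. The toy sketch below is a speculative illustration of that reading, not the authors' model; the names and the capacity of four are ours:

```python
from dataclasses import dataclass

@dataclass
class Pointer:
    """A content-independent binding of an item to its context."""
    context: tuple    # e.g., (location, time) of the token
    item_ref: object  # reference to content stored elsewhere (e.g., LTM)

class WorkingMemory:
    CAPACITY = 4  # illustrative item-based limit, independent of content

    def __init__(self):
        self.slots = []

    def store(self, context, item_ref):
        if len(self.slots) >= self.CAPACITY:
            self.slots.pop(0)  # displace the oldest token
        self.slots.append(Pointer(context, item_ref))

    def load(self, context):
        """Retrieve content via its contextual binding, not its content."""
        for p in reversed(self.slots):
            if p.context == context:
                return p.item_ref
        return None

wm = WorkingMemory()
wm.store(("left", 0), "red square")
wm.store(("right", 0), "blue circle")
print(wm.load(("left", 0)))  # -> "red square"
```

On this reading, neural signals that track the number of stored items would correspond to the number of occupied slots, regardless of what each slot points to.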

On the reasons for the pervasiveness of the myth of meritocracy

Ian R. Hadden, Céline Darnon, Lewis Doyle, Matthew J. Easterbrook, Sébastien Goudeau, Andrei Cimpian, Why the belief in meritocracy is so pervasive, Trends in Cognitive Sciences, Volume 29, Issue 2, 2025, Pages 101-104, DOI: 10.1016/j.tics.2024.12.008.

People worldwide tend to believe that their societies are more meritocratic than they actually are. We propose that the belief in meritocracy is widespread because it is rooted in simple, seemingly obvious causal–explanatory intuitions. Our proposal suggests solutions for debunking the myth of meritocracy and increasing support for equity-oriented policies.

On the two ways humans learn language: constructing wholes from parts and discovering parts within wholes

Susan Goldin-Meadow, Inbal Arnon, Whole-to-part development in language creation, Trends in Cognitive Sciences, Volume 29, Issue 1, 2025, Pages 12-14, DOI: 10.1016/j.tics.2024.09.015.

Children approach language by learning parts and constructing wholes. But they can also first learn wholes and then discover parts. We demonstrate this understudied yet impactful process in children creating language without input. Whole-to-part learning thus need not be driven by hard-to-segment input and is a bias that children bring to language.

On the limited throughput of human cognition and its implications, e.g., in engineering

Jieyu Zheng, Markus Meister, The unbearable slowness of being: Why do we live at 10 bits/s?, Neuron, 2024, DOI: 10.1016/j.neuron.2024.11.008.

This article is about the neural conundrum behind the slowness of human behavior. The information throughput of a human being is about 10 bits/s. In comparison, our sensory systems gather data at about 10^9 bits/s. The stark contrast between these numbers remains unexplained and touches on fundamental aspects of brain function: what neural substrate sets this speed limit on the pace of our existence? Why does the brain need billions of neurons to process 10 bits/s? Why can we only think about one thing at a time? The brain seems to operate in two distinct modes: the "outer" brain handles fast high-dimensional sensory and motor signals, whereas the "inner" brain processes the reduced few bits needed to control behavior. Plausible explanations exist for the large neuron numbers in the outer brain, but not for the inner brain, and we propose new research directions to remedy this.
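The ~10 bits/s figure comes from back-of-the-envelope calculations over behaviors such as typing or speaking. A worked example in that spirit, using Shannon's classic estimate of roughly one bit of entropy per character of English (all numbers illustrative):

```python
# Back-of-the-envelope estimate of behavioral information throughput.
WPM = 120             # a fast typist, words per minute
CHARS_PER_WORD = 5    # English average, counting the trailing space
BITS_PER_CHAR = 1.0   # Shannon's entropy estimate for English text

chars_per_second = WPM * CHARS_PER_WORD / 60
throughput = chars_per_second * BITS_PER_CHAR
print(f"{throughput:.0f} bits/s")  # -> 10 bits/s
```

Even a world-class typist only shifts this number by a small factor, which is why the nine-orders-of-magnitude gap to sensory input rates is the puzzle.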

Survey and benchmarking of open-source, low-cost LLMs for generating program code

Jessica López Espejel, Mahaman Sanoussi Yahaya Alassan, Merieme Bouhandi, Walid Dahhane, El Hassane Ettifouri, Low-cost language models: Survey and performance evaluation on Python code generation, Engineering Applications of Artificial Intelligence, Volume 140, 2025, DOI: 10.1016/j.engappai.2024.109490.

Large Language Models (LLMs) have become a popular choice for many Natural Language Processing (NLP) tasks due to their versatility and ability to produce high-quality results. Specifically, they are increasingly used for automatic code generation to help developers tackle repetitive coding tasks. However, LLMs’ substantial computational and memory requirements often make them inaccessible to users with limited resources. This paper focuses on very low-cost models which offer a more accessible alternative to resource-intensive LLMs. We notably: (1) propose a thorough semi-manual evaluation of their performance in generating Python code, (2) introduce a Chain-of-Thought (CoT) prompting strategy to improve model reasoning and code quality, and (3) propose a new dataset of 60 programming problems, with varied difficulty levels, designed to extend existing benchmarks like HumanEval and EvalPlus. Our findings show that some low-cost models achieve results competitive with larger models like ChatGPT despite using significantly fewer resources. We will make our dataset and prompts publicly available to support further research.
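The abstract does not spell out the CoT prompt or the evaluation harness, but both ingredients are standard. A minimal sketch of a chain-of-thought prompt template plus a HumanEval-style functional check that executes a candidate solution against assert-based tests (the template wording and the `generate` step are placeholders, not the paper's):

```python
import textwrap

COT_TEMPLATE = textwrap.dedent("""\
    You are a Python expert. Solve the problem step by step:
    1. Restate the problem in your own words.
    2. Outline the algorithm before writing any code.
    3. Write the final function inside one code block.

    Problem:
    {problem}
    """)

def evaluate(candidate_src: str, tests: str) -> bool:
    """HumanEval-style check: run the candidate against assert-based tests."""
    namespace = {}
    try:
        exec(candidate_src, namespace)  # define the candidate function
        exec(tests, namespace)          # raises AssertionError on failure
        return True
    except Exception:
        return False

problem = "Write add(a, b) that returns the sum of two numbers."
prompt = COT_TEMPLATE.format(problem=problem)
# `generate(prompt)` stands in for any low-cost model's completion call;
# here we hard-code a plausible completion to keep the sketch runnable.
candidate = "def add(a, b):\n    return a + b"
print(evaluate(candidate, "assert add(2, 3) == 5"))  # -> True
```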

RL training with a massive number of scenarios, GPU-accelerated

Michael Matthews, Michael Beukman, Chris Lu, Jakob Foerster, Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks, arXiv:2410.23208 [cs.LG].

While large models trained with self-supervised learning on offline datasets have shown remarkable capabilities in text and image domains, achieving the same generalisation for agents that act in sequential decision problems remains an open challenge. In this work, we take a step towards this goal by procedurally generating tens of millions of 2D physics-based tasks and using these to train a general reinforcement learning (RL) agent for physical control. To this end, we introduce Kinetix: an open-ended space of physics-based RL environments that can represent tasks ranging from robotic locomotion and grasping to video games and classic RL environments, all within a unified framework. Kinetix makes use of our novel hardware-accelerated physics engine Jax2D that allows us to cheaply simulate billions of environment steps during training. Our trained agent exhibits strong physical reasoning capabilities, being able to zero-shot solve unseen human-designed environments. Furthermore, fine-tuning this general agent on tasks of interest shows significantly stronger performance than training an RL agent tabula rasa. This includes solving some environments that standard RL training completely fails at. We believe this demonstrates the feasibility of large-scale, mixed-quality pre-training for online RL and we hope that Kinetix will serve as a useful framework to investigate this further.
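The Kinetix API itself is not shown here, but the core training recipe, one agent exposed to a fresh procedurally generated task every episode, reduces to a simple loop. A self-contained toy sketch with stand-in environment and agent classes (illustrative only, not the Jax2D/Kinetix interfaces):

```python
import random

def sample_task(rng):
    """Procedurally generate a task spec (parameters are illustrative)."""
    return {"gravity": rng.uniform(-10.0, 0.0),
            "goal": rng.choice(["reach", "avoid", "balance"])}

class ToyEnv:
    """Stand-in for a physics environment; not the real Kinetix API."""
    def __init__(self, task):
        self.task, self.t = task, 0
    def reset(self):
        self.t = 0
        return [self.task["gravity"], 0.0]
    def step(self, action):
        self.t += 1
        reward = 1.0 if action > 0 else 0.0  # dummy task-dependent signal
        return [self.task["gravity"], float(self.t)], reward, self.t >= 10

class RandomAgent:
    def act(self, obs):
        return random.choice([-1, 1])
    def update(self, obs, reward, done):
        pass  # a real agent would perform an RL update here

rng, agent = random.Random(0), RandomAgent()
for episode in range(3):                # tens of millions, in the real setup
    env = ToyEnv(sample_task(rng))      # a fresh random task each episode
    obs, done = env.reset(), False
    while not done:
        obs, reward, done = env.step(agent.act(obs))
        agent.update(obs, reward, done)
print("trained across", episode + 1, "procedurally generated tasks")
```

The paper's contribution is making this loop cheap at scale: a hardware-accelerated engine lets billions of such steps run on GPU, which is what makes open-ended pre-training feasible.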

A good review of allostasis and control theory applied to physiology

Eli Sennesh, Jordan Theriault, Dana Brooks, Jan-Willem van de Meent, Lisa Feldman Barrett, Karen S. Quigley, Interoception as modeling, allostasis as control, Biological Psychology, Volume 167, 2022, DOI: 10.1016/j.biopsycho.2021.108242.

The brain regulates the body by anticipating its needs and attempting to meet them before they arise – a process called allostasis. Allostasis requires a model of the changing sensory conditions within the body, a process called interoception. In this paper, we examine how interoception may provide performance feedback for allostasis. We suggest studying allostasis in terms of control theory, reviewing control theory’s applications to related issues in physiology, motor control, and decision making. We synthesize these by relating them to the important properties of allostatic regulation as a control problem. We then sketch a novel formalism for how the brain might perform allostatic control of the viscera by analogy to skeletomotor control, including a mathematical view on how interoception acts as performance feedback for allostasis. Finally, we suggest ways to test implications of our hypotheses.
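The contrast the authors draw between reactive (homeostatic) and anticipatory (allostatic) regulation can be made concrete with a toy control loop: a proportional feedback term corrects errors after they occur, while a feedforward term driven by an internal model of an upcoming disturbance acts before they occur. A minimal sketch under those assumptions, not the paper's formalism:

```python
def simulate(anticipate, steps=50, setpoint=37.0):
    """Regulate a scalar 'body temperature' against a predictable demand.

    Feedback term   : proportional correction of the current error
                      (reactive, homeostatic).
    Feedforward term: correction from an internal model that predicts
                      the disturbance one step ahead (allostatic).
    """
    x, k_fb, errors = setpoint, 0.5, []
    for t in range(steps):
        disturbance = -0.8 if 10 <= t < 30 else 0.0  # e.g., cold exposure
        predicted = -0.8 if 9 <= t < 29 else 0.0     # model leads by one step
        u = k_fb * (setpoint - x)                    # homeostatic feedback
        if anticipate:
            u -= predicted                           # allostatic feedforward
        x += u + disturbance
        errors.append(abs(setpoint - x))
    return max(errors)

# The anticipatory controller shows a much smaller peak deviation,
# because it pre-empts the disturbance instead of chasing it.
print("reactive only :", round(simulate(False), 2))
print("anticipatory  :", round(simulate(True), 2))
```

In this analogy, interoception supplies both the state estimate behind the feedback term and the performance signal for improving the internal model that drives the feedforward term.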