A possible explanation for the formation of concepts in the human brain

Luca D. Kolibius, Sheena A. Josselyn, Simon Hanslmayr, On the origin of memory neurons in the human hippocampus, Trends in Cognitive Sciences, Volume 29, Issue 5, 2025, Pages 421-433, DOI: 10.1016/j.tics.2025.01.013.

The hippocampus is essential for episodic memory, yet its coding mechanism remains debated. In humans, two main theories have been proposed: one suggests that concept neurons represent specific elements of an episode, while another posits a conjunctive code, where index neurons code the entire episode. Here, we integrate new findings of index neurons in humans and other animals with the concept-specific memory framework, proposing that concept neurons evolve from index neurons through overlapping memories. This process is supported by engram literature, which posits that neurons are allocated to a memory trace based on excitability and that reactivation induces excitability. By integrating these insights, we connect two historically disparate fields of neuroscience: engram research and human single neuron episodic memory research.
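
A toy simulation (ours, not the paper's) may help make the proposed mechanism concrete: neurons are allocated to an engram based on excitability, reactivation boosts excitability, so episodes sharing an element keep recruiting the same neurons. This is how an episode-specific index neuron could drift toward an element-specific concept neuron. All parameters below are illustrative.

```python
# Toy sketch (not from the paper): allocation-to-engram by excitability,
# with reactivation-induced excitability, so successive related episodes
# tend to re-recruit the same neurons.
import numpy as np

rng = np.random.default_rng(0)
n_neurons, engram_size = 200, 10
excitability = rng.normal(0.0, 1.0, n_neurons)  # baseline excitability

def encode_episode():
    """Allocate the most excitable neurons to the new memory trace."""
    winners = np.argsort(excitability)[-engram_size:]
    excitability[winners] += 1.0    # reactivation raises excitability
    excitability[:] *= 0.95         # slow decay back toward baseline
    return set(winners)

first = encode_episode()
overlaps = [len(first & encode_episode()) / engram_size for _ in range(5)]
print(overlaps)  # overlap stays high: the same neurons are re-recruited
```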

On the problem of choice overload for animal decision-making

Jessie C. Tanner, Claire T. Hemingway, Choice overload and its consequences for animal decision-making, Trends in Cognitive Sciences, Volume 29, Issue 5, 2025, Pages 403-406, DOI: 10.1016/j.tics.2025.01.003.

Animals routinely make decisions with important consequences for their survival and reproduction, but they frequently make suboptimal decisions. Here, we explore choice overload as one reason why animals may make suboptimal decisions, argue that choice overload may have important ecological and evolutionary consequences, and propose future directions.

Improving the adaptation of RL to robots with varying parameters through fuzzy ensembles of policies

A. G. Haddad, M. B. Mohiuddin, I. Boiko and Y. Zweiri, Fuzzy Ensembles of Reinforcement Learning Policies for Systems With Variable Parameters, IEEE Robotics and Automation Letters, vol. 10, no. 6, pp. 5361-5368, June 2025, DOI: 10.1109/LRA.2025.3559833.

This paper presents a novel approach to improving the generalization capabilities of reinforcement learning (RL) agents for robotic systems with varying physical parameters. We propose the Fuzzy Ensemble of RL policies (FERL), which enhances performance in environments where system parameters differ from those encountered during training. The FERL method selectively fuses aligned policies, determining their collective decision based on fuzzy memberships tailored to the current parameters of the system. Unlike traditional centralized training approaches that rely on shared experiences for policy updates, FERL allows for independent agent training, facilitating efficient parallelization. The effectiveness of FERL is demonstrated through extensive experiments, including a real-world trajectory tracking application in a quadrotor slung-load system. Our method improves success rates by up to 15.6% across various simulated systems with variable parameters compared to the existing benchmarks of domain randomization and robust adaptive ensemble adversary RL. In the real-world experiments, our method achieves a 30% reduction in 3D position RMSE compared to individual RL policies. The results underscore FERL's robustness and applicability to real robotic systems.
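
A minimal sketch of the fusion step as we read it (not the authors' implementation): each policy is trained at a nominal parameter value, triangular fuzzy memberships score how well each nominal value matches the current system parameter, and the ensemble action is the membership-weighted combination. The policies, centers, and width below are hypothetical stand-ins.

```python
# Minimal sketch of fuzzy fusion of RL policies (our reading of FERL,
# not the authors' code). Each policy was trained at a nominal
# parameter value; triangular memberships weight their actions.
import numpy as np

def triangular(x, center, width):
    """Triangular fuzzy membership centered on a training parameter."""
    return max(0.0, 1.0 - abs(x - center) / width)

def ferl_action(policies, centers, width, param, obs):
    """Fuse the actions of the policies by fuzzy membership."""
    weights = np.array([triangular(param, c, width) for c in centers])
    if weights.sum() == 0:                 # outside all supports:
        weights[np.argmin(np.abs(centers - param))] = 1.0  # nearest policy
    weights /= weights.sum()
    actions = np.array([pi(obs) for pi in policies])
    return weights @ actions               # membership-weighted action

# Hypothetical stand-ins: three policies trained at payload masses
# 0.5, 1.0, 1.5 kg; each maps an observation to a 2D action.
centers = np.array([0.5, 1.0, 1.5])
policies = [lambda o, c=c: np.tanh(c * o[:2]) for c in centers]
print(ferl_action(policies, centers, width=0.5, param=1.2,
                  obs=np.ones(4)))
```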

Improving reward shaping in Deep RL to avoid designers' cognitive biases and boost learning efficiency

Jiawei Lin, Xuekai Wei, Weizhi Xian, Jielu Yan, Leong Hou U, Yong Feng, Zhaowei Shang, Mingliang Zhou, Continuous reinforcement learning via advantage value difference reward shaping: A proximal policy optimization perspective, Engineering Applications of Artificial Intelligence, Volume 151, 2025, DOI: 10.1016/j.engappai.2025.110676.

Deep reinforcement learning has shown great promise in industrial applications. However, these algorithms suffer from low learning efficiency because of sparse reward signals in continuous control tasks. Reward shaping addresses this issue by transforming sparse rewards into more informative signals, but some designs that rely on domain experts or heuristic rules can introduce cognitive biases, leading to suboptimal solutions. To overcome this challenge, this paper proposes the advantage value difference (AVD), a generalized potential-based end-to-end exploration reward function. The main contribution of this paper is to improve the agent’s exploration efficiency, accelerate the learning process, and prevent premature convergence to local optima. The method leverages the temporal difference error to estimate the potential of states and uses the advantage function to guide the learning process toward more effective strategies. In the context of engineering applications, this paper proves the superiority of AVD in continuous control tasks within the multi-joint dynamics with contact (MuJoCo) environment. Specifically, the proposed method achieves an average increase of 23.5% in episode rewards for the Hopper, Swimmer, and Humanoid tasks compared with the state-of-the-art approaches. The results demonstrate the significant improvement in learning efficiency achieved by AVD for industrial robotic systems.
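
For context, here is a generic potential-based shaping sketch in the spirit of AVD (the exact advantage-value-difference form is in the paper): the auxiliary reward gamma*Phi(s') - Phi(s) is added to the sparse environment reward, which provably preserves optimal policies (Ng et al., 1999) while densifying the learning signal. The value function below is a hypothetical stand-in.

```python
# Generic potential-based reward shaping sketch (in the spirit of AVD;
# see the paper for the exact advantage-value-difference form).
# Shaping with F(s, s') = gamma * phi(s') - phi(s) preserves optimal
# policies while densifying sparse rewards.
import numpy as np

gamma = 0.99

def shaped_reward(r, s, s_next, phi):
    """Sparse environment reward plus the potential-based bonus."""
    return r + gamma * phi(s_next) - phi(s)

# Hypothetical potential for illustration: a distance-to-goal proxy
# standing in for a learned value estimate.
goal = np.array([1.0, 1.0])
phi = lambda s: -np.linalg.norm(s - goal)

s, s_next = np.zeros(2), np.array([0.3, 0.2])
print(shaped_reward(0.0, s, s_next, phi))  # positive: we moved closer
```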

Using Deep RL to adapt the process and measurement noise covariances in EKF localization

Islem Kobbi, Abdelhak Benamirouche, Mohamed Tadjine, Enhancing pose estimation for mobile robots: A comparative analysis of deep reinforcement learning algorithms for adaptive Extended Kalman Filter-based estimation, Engineering Applications of Artificial Intelligence, Volume 150, 2025, DOI: 10.1016/j.engappai.2025.110548.

The Extended Kalman Filter (EKF) is a widely used algorithm for state estimation in control systems. However, its lack of adaptability limits its performance in dynamic and uncertain environments. To address this limitation, we used an approach that leverages Deep Reinforcement Learning (DRL) to achieve adaptive state estimation in the EKF. By integrating DRL techniques, we enable the state estimator to autonomously learn and update the values of the system dynamics and measurement noise covariance matrices, Q and R, based on observed data, which encode environmental changes or system failures. In this research, we compare the performance of four DRL algorithms, namely Deep Deterministic Policy Gradient (DDPG), Twin Delayed Deep Deterministic Policy Gradient (TD3), Soft Actor-Critic (SAC), and Proximal Policy Optimization (PPO), in optimizing the EKF's adaptability. The experiments are conducted in both simulated and real-world settings using the Gazebo simulation environment and the Robot Operating System (ROS). The results demonstrate that the DRL-based adaptive state estimator outperforms traditional methods in terms of estimation accuracy and robustness. The comparative analysis provides insights into the strengths and limitations of different DRL agents, showing that TD3 and DDPG are the most effective algorithms, with TD3 achieving superior performance, resulting in a 91% improvement over the classic EKF, due to its delayed update mechanism that reduces training noise. This research highlights the potential of DRL to advance state estimation algorithms, offering valuable insights for future work in adaptive estimation techniques.
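
A sketch of the idea as we read it (linear case for brevity; an EKF would linearize F and H at each step): a trained policy maps the recent innovation to scale factors for Q and R before the standard predict/update cycle. The policy below is a hypothetical stand-in for a trained TD3/DDPG actor.

```python
# Sketch of a DRL-adaptive Kalman step (our illustration, not the
# paper's code): a policy maps the recent innovation to scale factors
# for the process and measurement noise covariances Q and R.
import numpy as np

def adaptive_kf_step(x, P, z, F, H, Q0, R0, policy, innovation):
    """One (linear) Kalman step with policy-scaled Q and R."""
    q_scale, r_scale = policy(innovation)    # DRL agent's action
    Q, R = q_scale * Q0, r_scale * R0
    x_pred = F @ x                           # predict
    P_pred = F @ P @ F.T + Q
    y = z - H @ x_pred                       # innovation
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ y                   # update
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new, y

# Hypothetical stand-in for a trained actor: inflate Q when the
# innovation is large (model mismatch), trust the model otherwise.
policy = lambda innov: (1.0 + 10.0 * float(np.linalg.norm(innov)), 1.0)

x, P = np.zeros(2), np.eye(2)
F = np.array([[1.0, 0.1], [0.0, 1.0]])       # constant-velocity model
H = np.array([[1.0, 0.0]])                   # position-only measurement
x, P, innov = adaptive_kf_step(x, P, np.array([0.5]), F, H,
                               np.eye(2), np.eye(1), policy, np.zeros(1))
print(x, innov)
```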

When to rely on memories versus sampling sensory information anew to guide behavior

Levi Kumle, Anna C. Nobre, Dejan Draschkow, Sensorimnemonic decisions: choosing memories versus sensory information, Trends in Cognitive Sciences, Volume 29, Issue 4, 2025, Pages 311-313, DOI: 10.1016/j.tics.2024.12.010.

We highlight a fundamental psychological function that is central to many of our interactions in the environment – when to rely on memories versus sampling sensory information anew to guide behavior. By operationalizing sensorimnemonic decisions, we aim to encourage and advance research into this pivotal process for understanding how memories serve adaptive cognition.

On the transparency of Deep RL behaviour and its improvement through the integration of human preferences and explanations

Georgios Angelopoulos, Luigi Mangiacapra, Alessandra Rossi, Claudia Di Napoli, Silvia Rossi, What is behind the curtain? Increasing transparency in reinforcement learning with human preferences and explanations, Engineering Applications of Artificial Intelligence, Volume 149, 2025, DOI: 10.1016/j.engappai.2025.110520.

In this work, we investigate whether the transparency of a robot’s behaviour is improved when human preferences on the actions the robot performs are taken into account during the learning process. For this purpose, a shielding mechanism called Preference Shielding is proposed and included in a reinforcement learning algorithm to account for human preferences. We also use the shielding to decide when to provide explanations of the robot’s actions. We carried out a within-subjects study involving 26 participants to evaluate the robot’s transparency. Results indicate that considering human preferences during learning improves legibility compared with providing only explanations. In addition, combining human preferences and explanations further amplifies transparency. Results also confirm that increased transparency leads to an increase in people’s perception of the robot’s safety, comfort, and reliability. These findings show the importance of transparency during learning and suggest a paradigm for robotic applications when a robot has to learn a task in the presence of or in collaboration with a human.
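
A minimal sketch of what a preference shield could look like (our reading of the idea, not the authors' implementation): human-dispreferred actions are masked out before action selection, and a shield override doubles as the trigger for an explanation. The Q-values and mask below are hypothetical.

```python
# Minimal sketch of a preference shield (our reading, not the authors'
# code): dispreferred actions are masked out before the greedy choice,
# and a veto triggers an explanation of the robot's behavior.
import numpy as np

def shielded_action(q_values, preferred_mask, explain):
    """Pick the best action among human-preferred ones."""
    masked = np.where(preferred_mask, q_values, -np.inf)
    action = int(np.argmax(masked))
    if action != int(np.argmax(q_values)):  # shield overrode the agent
        explain(action)                     # a moment worth explaining
    return action

# Hypothetical example: 4 actions, the human disprefers action 2.
q = np.array([0.1, 0.4, 0.9, 0.3])
mask = np.array([True, True, False, True])
print(shielded_action(q, mask, lambda a: print(f"explaining action {a}")))
```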

An interesting survey of sim-to-real transfer techniques for RL in the context of humanoid robots

D. Kim, H. Lee, J. Cha and J. Park, Bridging the Reality Gap: Analyzing Sim-to-Real Transfer Techniques for Reinforcement Learning in Humanoid Bipedal Locomotion, IEEE Robotics & Automation Magazine, vol. 32, no. 1, pp. 49-58, March 2025, DOI: 10.1109/MRA.2024.3505784.

Reinforcement learning (RL) offers a promising solution for controlling humanoid robots, particularly for bipedal locomotion, by learning adaptive and flexible control strategies. However, direct RL application is hindered by time-consuming trial-and-error processes, necessitating training in simulation before real-world transfer. This introduces a reality gap that degrades performance. Although various methods have been proposed for sim-to-real transfer, they have not been validated on a consistent hardware platform, making it difficult to determine which components are key to overcoming the reality gap. In contrast, we systematically evaluate techniques to enhance RL policy robustness during sim-to-real transfer by controlling variables and comparing them on a single robot to isolate and analyze the impact of each technique. These techniques include dynamics randomization, state history usage, noise/bias/delay modeling, state selection, perturbations, and network size. We quantitatively assess the reality gap by simulating diverse conditions and conducting experiments on real hardware. Our findings provide insights into bridging the reality gap, advancing robust RL-trained humanoid robots for real-world applications.
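
Two of the surveyed techniques, dynamics randomization and observation noise/bias/delay modeling, in minimal form (illustrative ranges and dimensions, not the paper's settings):

```python
# Minimal sketch of two surveyed sim-to-real techniques: dynamics
# randomization and observation noise/bias/delay modeling
# (illustrative ranges, not the paper's settings).
import numpy as np
from collections import deque

rng = np.random.default_rng(0)

def randomize_dynamics(nominal):
    """Resample physical parameters at each episode reset."""
    return {k: v * rng.uniform(0.8, 1.2) for k, v in nominal.items()}

class NoisyDelayedObs:
    """Add sensor noise and bias, plus a fixed observation delay."""
    def __init__(self, delay_steps, noise_std, bias_std, obs_dim):
        self.buf = deque(maxlen=delay_steps + 1)
        self.noise_std = noise_std
        self.bias = rng.normal(0.0, bias_std, obs_dim)

    def __call__(self, obs):
        self.buf.append(obs)
        delayed = self.buf[0]  # oldest buffered obs (delay ramps up)
        return delayed + self.bias + rng.normal(0.0, self.noise_std,
                                                delayed.shape)

params = randomize_dynamics({"mass": 32.0, "friction": 0.9})
sensor = NoisyDelayedObs(delay_steps=2, noise_std=0.01,
                         bias_std=0.005, obs_dim=3)
for t in range(4):
    print(params, sensor(np.ones(3) * t))
```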

A new perspective on convex optimization algorithm design based on electric circuit theory

Stephen P. Boyd, Tetiana Parshakova, Ernest K. Ryu, Jaewook J. Suh, Optimization Algorithm Design via Electric Circuits, NeurIPS 2024 (spotlight), https://openreview.net/forum?id=9Jmt1eER9P.

We present a novel methodology for convex optimization algorithm design using ideas from electric RLC circuits. Given an optimization problem, the first stage of the methodology is to design an appropriate electric circuit whose continuous-time dynamics converge to the solution of the optimization problem at hand. Then, the second stage is an automated, computer-assisted discretization of the continuous-time dynamics, yielding a provably convergent discrete-time algorithm. Our methodology recovers many classical (distributed) optimization algorithms and enables users to quickly design and explore a wide range of new algorithms with convergence guarantees.
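
The flavor of the two-stage pipeline on the simplest possible case (our illustration, not the paper's automated procedure): the dynamics of a resistor-capacitor circuit realizing x' = -grad f(x) converge to the minimizer, and forward-Euler discretization of that flow recovers gradient descent. Richer RLC topologies and discretizations yield momentum-type and distributed methods, which the paper's computer-assisted stage handles with convergence guarantees.

```python
# Illustration of the two-stage idea on the simplest case (ours, not
# the paper's pipeline): the circuit flow x' = -grad f(x) converges to
# the minimizer; forward-Euler discretization with step h recovers
# gradient descent.
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])  # f(x) = 0.5 * x^T A x, convex
grad = lambda x: A @ x

x, h = np.array([1.0, -1.0]), 0.1
for _ in range(200):
    x = x - h * grad(x)                  # discretized circuit dynamics
print(x)                                 # -> [0, 0], the minimizer
```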

On the innate ability of vertebrates to recognize numerosities and to discriminate ratios of quantities

Elena Lorenzi, Dmitry Kobylkov, Giorgio Vallortigara, Is there an innate sense of number in the brain?, Cerebral Cortex, Volume 35, Issue 2, February 2025, DOI: 10.1093/cercor/bhaf004.

The approximate number system or "sense of number" is a crucial, presymbolic mechanism enabling animals to estimate quantities, which is essential for survival in various contexts (e.g., estimating numerosities of social companions, prey, predators, and so on). Behavioral studies indicate that a sense of number is widespread across vertebrates and invertebrates. Specific brain regions such as the intraparietal sulcus and prefrontal cortex in primates, or equivalent areas in birds and fish, are involved in numerical estimation, and their activity is modulated by the ratio of quantities. Data gathered across species strongly suggest similar evolutionary pressures for number estimation, pointing to a likely common origin, at least across vertebrates. On the other hand, few studies have investigated the origins of the sense of number. Recent findings, however, have shown that numerosity-selective neurons exist in newborn animals, such as domestic chicks and zebrafish, supporting the hypothesis of an innate approximate number system. Control-rearing experiments on visually naïve animals further support the notion that the sense of number is innate and does not need any specific instructive experience in order to be triggered.
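
The ratio modulation the authors describe is Weber's law; a toy simulation (illustrative parameters, not fitted to any species) shows how discrimination accuracy tracks the ratio of two numerosities rather than their absolute difference:

```python
# Toy sketch of ratio-dependent numerosity discrimination (Weber's
# law), the signature behavior of the approximate number system.
# The Weber fraction below is illustrative, not fitted to any species.
import numpy as np

rng = np.random.default_rng(0)
w = 0.2                                  # illustrative Weber fraction

def perceived(n, trials):
    """Noisy log-scaled numerosity estimates (scalar variability)."""
    return rng.normal(np.log(n), w, trials)

def p_correct(n1, n2, trials=100_000):
    """Probability of correctly judging n2 > n1 from noisy estimates."""
    return np.mean(perceived(n2, trials) > perceived(n1, trials))

# Same "8 versus larger" comparison at different ratios: accuracy
# tracks the ratio, not the absolute difference.
for n2 in (9, 12, 16, 24):
    print(f"8 vs {n2} (ratio {n2 / 8:.2f}): {p_correct(8, n2):.2f}")
```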