RL for multiple tasks in the case of quadrotors and a short state of the art about the general problem

J. Xing, I. Geles, Y. Song, E. Aljalbout and D. Scaramuzza, Multi-Task Reinforcement Learning for Quadrotors, IEEE Robotics and Automation Letters, vol. 10, no. 3, pp. 2112-2119, March 2025, DOI: 10.1109/LRA.2024.3520894.

Reinforcement learning (RL) has shown great effectiveness in quadrotor control, enabling specialized policies to develop even human-champion-level performance in single-task scenarios. However, these specialized policies often struggle with novel tasks, requiring a complete retraining of the policy from scratch. To address this limitation, this paper presents a novel multi-task reinforcement learning (MTRL) framework tailored for quadrotor control, leveraging the shared physical dynamics of the platform to enhance sample efficiency and task performance. By employing a multi-critic architecture and shared task encoders, our framework facilitates knowledge transfer across tasks, enabling a single policy to execute diverse maneuvers, including high-speed stabilization, velocity tracking, and autonomous racing. Our experimental results, validated both in simulation and real-world scenarios, demonstrate that our framework outperforms baseline approaches in terms of sample efficiency and overall task performance. Video is available at https://youtu.be/HfK9UT1OVnY.

An adaptive KF for estimating angles from IMUs

Zolfa Anvari, Ali Mirhaghgoo, Yasin Salehi, Real-time angle estimation in IMU sensors: An adaptive Kalman filter approach with forgetting factor, Mechatronics, Volume 106, 2025, DOI: 10.1016/j.mechatronics.2024.103280.

In recent years, the applications of Inertial Measurement Unit (IMU) sensors have witnessed significant growth across multiple fields. However, challenges regarding angle estimation using these sensors have emerged, primarily because of the lack of accuracy in accelerometer-based dynamic motion measurements and the associated bias and error accumulation when combined with gyroscope integration. Consequently, the Kalman filter has become a popular choice for addressing these issues, as it enables the sensor to operate dynamically. Despite its widespread use, the Kalman filter requires precise noise statistics estimation for optimal noise cancellation. To accommodate this requirement, adaptive Kalman filter algorithms have been developed for estimating zero-mean Gaussian process matrix (Q) and measurement matrix (R) variances. This study introduces a real-time adaptive approach that employs a forgetting factor to precisely estimate roll and pitch angles in a 6-axis IMU. The study’s novelty lies in its algorithm, which computes the forgetting factor based on the estimation error of the last samples in the sequence. Experimental results for roll angle indicate that, in response to a step change signal, this method achieves a 54%, 39%, and 70% reduction in RMS error relative to the raw sensor data, traditional Kalman filter, and a hybrid adaptive method, respectively. Moreover, this technique exhibits significant improvements in both fixed and sinusoidal conditions for roll and pitch angles, successfully carrying out tasks within required timescales without failures related to computation time.
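
To make the idea of a forgetting factor concrete, here is a minimal sketch of a scalar angle filter that fuses a gyroscope rate with an accelerometer-derived angle and re-estimates the measurement variance R online. It is not the algorithm of the paper: the function name `adaptive_kf`, its parameters, and the rule that derives the forgetting factor from the mean squared innovation of the last samples are all assumptions chosen for illustration.

```python
from collections import deque


def adaptive_kf(gyro_rates, accel_angles, dt, q=1e-4, r0=1e-2, window=10):
    """Scalar angle Kalman filter with an adaptively estimated R.

    Hypothetical sketch: the forgetting-factor rule below is an assumption,
    not the rule published in the paper.
    """
    angle = accel_angles[0]          # initial state from the first measurement
    p = r0                           # initial state variance (assumed)
    r = r0                           # running estimate of measurement variance
    recent = deque(maxlen=window)    # squared innovations of the last samples
    estimates = []
    for rate, z in zip(gyro_rates, accel_angles):
        # Prediction: integrate the gyroscope rate over one time step.
        angle_pred = angle + rate * dt
        p_pred = p + q
        # Innovation: accelerometer angle minus predicted angle.
        innov = z - angle_pred
        recent.append(innov * innov)
        mean_sq = sum(recent) / len(recent)
        # Forgetting factor in [0.5, 0.99]: a large recent error relative to
        # the current R makes the filter forget old statistics faster.
        lam = max(0.5, min(0.99, r / (mean_sq + r)))
        # Exponentially weighted, innovation-based update of R, kept positive.
        r = max(1e-6, lam * r + (1.0 - lam) * (innov * innov - p_pred))
        # Standard Kalman correction with the adapted R.
        k = p_pred / (p_pred + r)
        angle = angle_pred + k * innov
        p = (1.0 - k) * p_pred
        estimates.append(angle)
    return estimates
```

With a constant true angle and an accelerometer reading that alternates around it, the estimate should settle near the true value while R adapts to the observed measurement noise.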

On the reasons for the pervasiveness of the myth of meritocracy

Ian R. Hadden, Céline Darnon, Lewis Doyle, Matthew J. Easterbrook, Sébastien Goudeau, Andrei Cimpian, Why the belief in meritocracy is so pervasive, Trends in Cognitive Sciences, Volume 29, Issue 2, 2025, Pages 101-104, DOI: 10.1016/j.tics.2024.12.008.

People worldwide tend to believe that their societies are more meritocratic than they actually are. We propose the belief in meritocracy is widespread because it is rooted in simple, seemingly obvious causal–explanatory intuitions. Our proposal suggests solutions for debunking the myth of meritocracy and increasing support for equity-oriented policies.

On the two ways humans learn language: building wholes from learned parts, and discovering parts within wholes learned first

Susan Goldin-Meadow, Inbal Arnon, Whole-to-part development in language creation, Trends in Cognitive Sciences, Volume 29, Issue 1, 2025, Pages 12-14, DOI: 10.1016/j.tics.2024.09.015.

Children approach language by learning parts and constructing wholes. But they can also first learn wholes and then discover parts. We demonstrate this understudied yet impactful process in children creating language without input. Whole-to-part learning thus need not be driven by hard-to-segment input and is a bias that children bring to language.

On the use of GPUs for parallelization of MPCs through the parallelization of symbolic mathematical expressions

S. H. Jeon, S. Hong, H. J. Lee, C. Khazoom and S. Kim, CusADi: A GPU Parallelization Framework for Symbolic Expressions and Optimal Control, IEEE Robotics and Automation Letters, vol. 10, no. 2, pp. 899-906, Feb. 2025, DOI: 10.1109/LRA.2024.3512254.

The parallelism afforded by GPUs presents significant advantages in training controllers through reinforcement learning (RL). However, integrating model-based optimization into this process remains challenging due to the complexity of formulating and solving optimization problems across thousands of instances. In this work, we present CusADi, an extension of the casadi symbolic framework to support the parallelization of arbitrary closed-form expressions on GPUs with CUDA. We also formulate a closed-form approximation for solving general optimal control problems, enabling large-scale parallelization and evaluation of MPC controllers. Our results show a ten-fold speedup relative to similar MPC implementation on the CPU, and we demonstrate the use of CusADi for various applications, including parallel simulation, parameter sweeps, and policy training.

On the limited throughput of human cognition and its implications, e.g., in Engineering

Jieyu Zheng and Markus Meister, The unbearable slowness of being: Why do we live at 10 bits/s?, Neuron (2024), DOI: 10.1016/j.neuron.2024.11.008.

This article is about the neural conundrum behind the slowness of human behavior. The information throughput of a human being is about 10 bits/s. In comparison, our sensory systems gather data at ~10^9 bits/s. The stark contrast between these numbers remains unexplained and touches on fundamental aspects of brain function: what neural substrate sets this speed limit on the pace of our existence? Why does the brain need billions of neurons to process 10 bits/s? Why can we only think about one thing at a time? The brain seems to operate in two distinct modes: the “outer” brain handles fast high-dimensional sensory and motor signals, whereas the “inner” brain processes the reduced few bits needed to control behavior. Plausible explanations exist for the large neuron numbers in the outer brain, but not for the inner brain, and we propose new research directions to remedy this.

Survey and benchmarking of open-source, low-cost LLMs for generating program code

Jessica López Espejel, Mahaman Sanoussi Yahaya Alassan, Merieme Bouhandi, Walid Dahhane, El Hassane Ettifouri, Low-cost language models: Survey and performance evaluation on Python code generation, Engineering Applications of Artificial Intelligence, Volume 140, 2025, DOI: 10.1016/j.engappai.2024.109490.

Large Language Models (LLMs) have become a popular choice for many Natural Language Processing (NLP) tasks due to their versatility and ability to produce high-quality results. Specifically, they are increasingly used for automatic code generation to help developers tackle repetitive coding tasks. However, LLMs’ substantial computational and memory requirements often make them inaccessible to users with limited resources. This paper focuses on very low-cost models which offer a more accessible alternative to resource-intensive LLMs. We notably: (1) propose a thorough semi-manual evaluation of their performance in generating Python code, (2) introduce a Chain-of-Thought (CoT) prompting strategy to improve model reasoning and code quality, and (3) propose a new dataset of 60 programming problems, with varied difficulty levels, designed to extend existing benchmarks like HumanEval and EvalPlus. Our findings show that some low-cost compatible models achieve competitive results compared to larger models like ChatGPT despite using significantly fewer resources. We will make our dataset and prompts publicly available to support further research.

A novel safety-critical robotic architecture

Manuel Schrick, Johannes Hinckeldeyn, Marko Thiel, Jochen Kreutzfeldt, A microservice based control architecture for mobile robots in safety-critical applications, Robotics and Autonomous Systems, Volume 183, 2025, DOI: 10.1016/j.robot.2024.104795.

Mobile robots have become more and more common in public space. This increases the importance of meeting safety requirements of autonomous robots. Simple mechanisms, such as emergency braking, alone do not suffice in these highly dynamic situations. Moreover, actual robotic control approaches in literature and practice do not take safety particularly into account. A more sophisticated situational approach for assessment and planning is needed as part of the high-level process control. This paper presents the concept of a safety-critical Robot Control Architecture for mobile robots based on microservices and a Hierarchical Finite State Machine. It expands already existing architectures by drastically reducing the amount of centralized logic and thus increasing the overall system’s level of concurrency, interruptibility and fail-safety. Furthermore, it introduces new potential for code reuse that allows for straightforward implementation of safety mechanisms such as internal diagnostics systems. In doing so, this concept presents the template of a new type of state machine implementation. It is demonstrated with the application of a delivery robot, which was implemented and operated in real public during a broader research project.
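
As a rough illustration of why a Hierarchical Finite State Machine helps with interruptibility, the sketch below lets a safety event declared once on a parent state preempt any of its child states. This is a minimal sketch, not the architecture of the paper: the classes `State` and `HFSM`, the helper `build_delivery_robot`, and all state and event names are hypothetical, and the microservice layer is omitted entirely.

```python
class State:
    """A node in a hierarchical state machine (hypothetical minimal design)."""

    def __init__(self, name, parent=None, transitions=None):
        self.name = name
        self.parent = parent                   # enclosing super-state, if any
        self.transitions = transitions or {}   # event name -> target state name


class HFSM:
    """Dispatches events to the current state, bubbling up through its parents."""

    def __init__(self, states, initial):
        self.states = {s.name: s for s in states}
        self.current = self.states[initial]

    def dispatch(self, event):
        """Return True if some state in the hierarchy handled the event."""
        node = self.current
        while node is not None:
            target = node.transitions.get(event)
            if target is not None:
                self.current = self.states[target]
                return True
            node = node.parent                 # let the super-state try
        return False                           # event ignored in this state


def build_delivery_robot():
    """Example machine; every state and event name here is made up."""
    operating = State("operating", transitions={"obstacle_too_close": "safe_stop"})
    driving = State("driving", parent=operating, transitions={"arrived": "handover"})
    handover = State("handover", parent=operating, transitions={"done": "driving"})
    safe_stop = State("safe_stop", transitions={"cleared": "driving"})
    return HFSM([operating, driving, handover, safe_stop], initial="driving")
```

Because the `obstacle_too_close` transition lives on the `operating` super-state, it applies in both `driving` and `handover` without being repeated, which is the kind of code reuse for safety mechanisms the abstract alludes to.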

Survey on mobile robot navigation, particularly RL and other self-learning approaches to that task

Suaib Al Mahmud, Abdurrahman Kamarulariffin, Azhar Mohd Ibrahim, Ahmad Jazlan Haja Mohideen, Advancements and Challenges in Mobile Robot Navigation: A Comprehensive Review of Algorithms and Potential for Self‐Learning Approaches, Journal of Intelligent & Robotic Systems (2024) 110:120, DOI: 10.1007/s10846-024-02149-5.

Mobile robot navigation has been a very popular topic of practice among researchers since a while. With the goal of enhancing the autonomy in mobile robot navigation, numerous algorithms (traditional AI-based, swarm intelligence-based, self-learning-based) have been built and implemented independently, and also in blended manners. Nevertheless, the problem of efficient autonomous robot navigation persists in multiple degrees due to the limitation of these algorithms. The lack of knowledge on the implemented techniques and their shortcomings act as a hindrance to further development on this topic. This is why an extensive study on the previously implemented algorithms, their applicability, their weaknesses as well as their potential needs to be conducted in order to assess how to improve mobile robot navigation performance. In this review paper, a comprehensive review of mobile robot navigation algorithms has been conducted. The findings suggest that, even though the self-learning algorithms require huge amounts of training data and have the possibility of learning erroneous behavior, they possess huge potential to overcome challenges rarely addressed by the other traditional algorithms. The findings also insinuate that in the domain of machine learning-based algorithms, integration of knowledge representation with a neuro-symbolic approach has the capacity to improve the accuracy and performance of self-robot navigation training by a significant margin.

A particular action space for human-manipulator physical interaction learning through RL

Roberto Martín-Martín, Michelle A. Lee, Rachel Gardner, Silvio Savarese, Jeannette Bohg, Animesh Garg, Variable Impedance Control in End-Effector Space: An Action Space for Reinforcement Learning in Contact-Rich Tasks, arXiv:1906.08880 [cs.RO].

Reinforcement Learning (RL) of contact-rich manipulation tasks has yielded impressive results in recent years. While many studies in RL focus on varying the observation space or reward model, few efforts focused on the choice of action space (e.g. joint or end-effector space, position, velocity, etc.). However, studies in robot motion control indicate that choosing an action space that conforms to the characteristics of the task can simplify exploration and improve robustness to disturbances. This paper studies the effect of different action spaces in deep RL and advocates for Variable Impedance Control in End-effector Space (VICES) as an advantageous action space for constrained and contact-rich tasks. We evaluate multiple action spaces on three prototypical manipulation tasks: Path Following (task with no contact), Door Opening (task with kinematic constraints), and Surface Wiping (task with continuous contact). We show that VICES improves sample efficiency, maintains low energy consumption, and ensures safety across all three experimental setups. Further, RL policies learned with VICES can transfer across different robot models in simulation, and from simulation to real for the same robot.
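
A variable-impedance action space in end-effector coordinates can be pictured as follows: the policy outputs a desired displacement of the end effector together with a stiffness per axis, and a low-level impedance law turns both into joint torques. The sketch below is only an illustration under stated assumptions: the function `vices_torques`, the stiffness range, the critically damped choice Kd = 2*sqrt(Kp) (unit mass assumed), and the restriction to translational axes are not taken from the paper.

```python
import math


def vices_torques(action, velocity, jacobian, kp_range=(10.0, 300.0)):
    """Map a policy action to joint torques through an impedance law.

    action   = [dx_1..dx_n, g_1..g_n]: desired end-effector displacement per
               axis, followed by normalized stiffness commands in [-1, 1].
    velocity = current end-effector velocity per axis (length n).
    jacobian = n x m list of lists relating joint to end-effector velocities.
    All names and ranges are hypothetical choices for this sketch.
    """
    n = len(velocity)
    deltas, gains = action[:n], action[n:]
    lo, hi = kp_range
    forces = []
    for i in range(n):
        g = min(1.0, max(-1.0, gains[i]))          # clip the stiffness command
        kp = lo + (hi - lo) * (g + 1.0) / 2.0      # rescale [-1, 1] to [lo, hi]
        kd = 2.0 * math.sqrt(kp)                   # critical damping, unit mass
        forces.append(kp * deltas[i] - kd * velocity[i])
    # Joint torques from the end-effector force: tau = J^T f.
    n_joints = len(jacobian[0])
    return [sum(jacobian[i][j] * forces[i] for i in range(n))
            for j in range(n_joints)]
```

Letting the policy choose the stiffness means it can be compliant along a constrained direction (for example, against a surface) and stiff along a free one, which is the intuition behind using such an action space for contact-rich tasks.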