Author Archives: Juan-antonio Fernández-madrigal

On the limited throughput of the human cognition and its implications, e.g., in Engineering

Jieyu Zheng1, and Markus Meister, The unbearable slowness of being: Why do we live at 10 bits/s?, Neuron (2024), DOI: 10.1016/j.neuron.2024.11.008.

This article is about the neural conundrum behind the slowness of human behavior. The information throughput of a human being is about 10 bits/s. In comparison, our sensory systems gather data at 10 bits/s. The stark contrast between these numbers remains unexplained and touches on fundamental aspects of brain function: what neural substrate sets this speed limit on the pace of our existence? Why does the brain need billions of neurons to process 10 bits/s? Why can we only think about one thing at a time? The brain seems to operate in two distinct modes: the ‘‘outer’’ brain handles fast high-dimensional sensory and motor signals, whereas the ‘‘inner’’ brain processes the reduced few bits needed to control behavior. Plausible explanations exist for the large neuron numbers in the outer brain, but not for the inner brain, and we propose new research directions to remedy this.

Survey and benchmarking of open-source, low-cost LLMs for generating program code

Jessica López Espejel, Mahaman Sanoussi Yahaya Alassan, Merieme Bouhandi, Walid Dahhane, El Hassane Ettifouri, Low-cost language models: Survey and performance evaluation on Python code generation, Engineering Applications of Artificial Intelligence, Volume 140, 2025, DOI: 10.1016/j.engappai.2024.109490.

Large Language Models (LLMs) have become a popular choice for many Natural Language Processing (NLP) tasks due to their versatility and ability to produce high-quality results. Specifically, they are increasingly used for automatic code generation to help developers tackle repetitive coding tasks. However, LLMs’ substantial computational and memory requirements often make them inaccessible to users with limited resources. This paper focuses on very low-cost models which offer a more accessible alternative to resource-intensive LLMs. We notably: (1) propose a thorough semi-manual evaluation of their performance in generating Python code, (2) introduce a Chain-of-Thought (CoT) prompting strategy to improve model reasoning and code quality, and (3) propose a new dataset of 60 programming problems, with varied difficulty levels, designed to extend existing benchmarks like HumanEval and EvalPlus. Our findings show that some low-cost compatible models achieve competitive results compared to larger models like ChatGPT despite using significantly fewer resources. We will make our dataset and prompts publicly available to support further research.

A novel safety-critical robotic architecture

Manuel Schrick, Johannes Hinckeldeyn, Marko Thiel, Jochen Kreutzfeldt, A microservice based control architecture for mobile robots in safety-critical applications, Robotics and Autonomous Systems, Volume 183, 2025, DOI: 10.1016/j.robot.2024.104795.

Mobile robots have become more and more common in public space. This increases the importance of meeting safety requirements of autonomous robots. Simple mechanisms, such as emergency braking, alone do not suffice in these highly dynamic situations. Moreover, actual robotic control approaches in literature and practice do not take safety particularly into account. A more sophisticated situational approach for assessment and planning is needed as part of the high-level process control. This paper presents the concept of a safety-critical Robot Control Architecture for mobile robots based on microservices and a Hierarchical Finite State Machine. It expands already existing architectures by drastically reducing the amount of centralized logic and thus increasing the overall system’s level of concurrency, interruptibility and fail-safety. Furthermore, it introduces new potential for code reuse that allows for straightforward implementation of safety mechanisms such as internal diagnostics systems. In doing so, this concept presents the template of a new type of state machine implementation. It is demonstrated with the application of a delivery robot, which was implemented and operated in real public during a broader research project.

Survey on robotics navigation, particularly using RL and other approaches for self-learning that task

Suaib Al Mahmud, Abdurrahman Kamarulariffin, Azhar Mohd Ibrahim, Ahmad Jazlan Haja Mohideen, Advancements and Challenges in Mobile Robot Navigation: A Comprehensive Review of Algorithms and Potential for Self‐Learning Approaches, Journal of Intelligent & Robotic Systems (2024) 110:120, DOI: 10.1007/s10846-024-02149-5.

Mobile robot navigation has been a very popular topic of practice among researchers since a while. With the goal of enhancing the autonomy in mobile robot navigation, numerous algorithms (traditional AI-based, swarm intelligence-based, self-learning-based) have been built and implemented independently, and also in blended manners. Nevertheless, the problem of efficient autonomous robot navigation persists in multiple degrees due to the limitation of these algorithms. The lack of knowledge on the implemented techniques and their shortcomings act as a hindrance to further development on this topic. This is why an extensive study on the previously implemented algorithms, their applicability, their weaknesses as well as
their potential needs to be conducted in order to assess how to improve mobile robot navigation performance. In this review paper, a comprehensive review of mobile robot navigation algorithms has been conducted. The findings suggest that, even though the self-learning algorithms require huge amounts of training data and have the possibility of learning erroneous behavior, they possess huge potential to overcome challenges rarely addressed by the other traditional algorithms. The findings also insinuate that in the domain of machine learning-based algorithms, integration of knowledge representation with a neuro-symbolic approach has the capacity to improve the accuracy and performance of self-robot navigation training by a significant margin.

A particular action space for human-manipulator physical interaction learning through RL

Roberto Martín-Martín, Michelle A. Lee, Rachel Gardner, Silvio Savarese, Jeannette Bohg, Animesh Garg, Variable Impedance Control in End-Effector Space: An Action Space for Reinforcement Learning in Contact-Rich Tasks, arXiv:1906.08880 [cs.RO].

Reinforcement Learning (RL) of contact-rich manipulation tasks has yielded impressive results in recent years. While many studies in RL focus on varying the observation space or reward model, few efforts focused on the choice of action space (e.g. joint or end-effector space, position, velocity, etc.). However, studies in robot motion control indicate that choosing an action space that conforms to the characteristics of the task can simplify exploration and improve robustness to disturbances. This paper studies the effect of different action spaces in deep RL and advocates for Variable Impedance Control in End-effector Space (VICES) as an advantageous action space for constrained and contact-rich tasks. We evaluate multiple action spaces on three prototypical manipulation tasks: Path Following (task with no contact), Door Opening (task with kinematic constraints), and Surface Wiping (task with continuous contact). We show that VICES improves sample efficiency, maintains low energy consumption, and ensures safety across all three experimental setups. Further, RL policies learned with VICES can transfer across different robot models in simulation, and from simulation to real for the same robot. Further information is available at this https URL.

A very good explanaition of how to model certain data to further sampling from it

Richard E. Turner, Cristiana-Diana Diaconu, Stratis Markou, Aliaksandra Shysheya, Andrew Y. K. Foong and Bruno Mlodozeniec, Denoising Diffusion Probabilistic Models in Six Simple Steps, arXiv:2402.04384 [cs.LG] [cs.LG].

Denoising Diffusion Probabilistic Models (DDPMs) are a very popular class of deep generative model that have been successfully applied to a diverse range of problems including image and video generation, protein and material synthesis, weather forecasting, and neural surrogates of partial differential equations. Despite their ubiquity it is hard to find an introduction to DDPMs which is simple, comprehensive, clean and clear. The compact explanations necessary in research papers are not able to elucidate all of the different design steps taken to formulate the DDPM and the rationale of the steps that are presented is often omitted to save space. Moreover, the expositions are typically presented from the variational lower bound perspective which is unnecessary and arguably harmful as it obfuscates why the method is working and suggests generalisations that do not perform well in practice. On the other hand, perspectives that take the continuous time-limit are beautiful and general, but they have a high barrier-to-entry as they require background knowledge of stochastic differential equations and probability flow. In this note, we distill down the formulation of the DDPM into six simple steps each of which comes with a clear rationale. We assume that the reader is familiar with fundamental topics in machine learning including basic probabilistic modelling, Gaussian distributions, maximum likelihood estimation, and deep learning.

RL training with a massive amount of scenarios, GPU accelerated

Michael Matthews, Michael Beukman, Chris Lu, Jakob Foerster, Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks, arXiv:2410.23208 [cs.LG].

While large models trained with self-supervised learning on offline datasets have shown remarkable capabilities in text and image domains, achieving the same generalisation for agents that act in sequential decision problems remains an open challenge. In this work, we take a step towards this goal by procedurally generating tens of millions of 2D physics-based tasks and using these to train a general reinforcement learning (RL) agent for physical control. To this end, we introduce Kinetix: an open-ended space of physics-based RL environments that can represent tasks ranging from robotic locomotion and grasping to video games and classic RL environments, all within a unified framework. Kinetix makes use of our novel hardware-accelerated physics engine Jax2D that allows us to cheaply simulate billions of environment steps during training. Our trained agent exhibits strong physical reasoning capabilities, being able to zero-shot solve unseen human-designed environments. Furthermore, fine-tuning this general agent on tasks of interest shows significantly stronger performance than training an RL agent *tabula rasa*. This includes solving some environments that standard RL training completely fails at. We believe this demonstrates the feasibility of large scale, mixed-quality pre-training for online RL and we hope that Kinetix will serve as a useful framework to investigate this further.

A good review of allostasis and control theory applied to physiology

Eli Sennesh, Jordan Theriault, Dana Brooks, Jan-Willem van de Meent, Lisa Feldman Barrett, Karen S. Quigley, Interoception as modeling, allostasis as control, Biological Psychology, Volume 167, 2022 DOI: 10.1016/j.biopsycho.2021.108242.

The brain regulates the body by anticipating its needs and attempting to meet them before they arise – a process called allostasis. Allostasis requires a model of the changing sensory conditions within the body, a process called interoception. In this paper, we examine how interoception may provide performance feedback for allostasis. We suggest studying allostasis in terms of control theory, reviewing control theory’s applications to related issues in physiology, motor control, and decision making. We synthesize these by relating them to the important properties of allostatic regulation as a control problem. We then sketch a novel formalism for how the brain might perform allostatic control of the viscera by analogy to skeletomotor control, including a mathematical view on how interoception acts as performance feedback for allostasis. Finally, we suggest ways to test implications of our hypotheses.

Generating intrinsic rewards to address the sparse reward problem of RL

Z. Gao et al., Self-Supervised Exploration via Temporal Inconsistency in Reinforcement Learning, IEEE Transactions on Artificial Intelligence, vol. 5, no. 11, pp. 5530-5539, Nov. 2024, DOI: 10.1109/TAI.2024.3413692.

In sparse extrinsic reward settings, reinforcement learning remains a challenge despite increasing interest in this field. Existing approaches suggest that intrinsic rewards can alleviate issues caused by reward sparsity. However, many studies overlook the critical role of temporal information, essential for human curiosity. This article introduces a novel intrinsic reward mechanism inspired by human learning processes, where curiosity is evaluated by comparing current observations with historical knowledge. Our method involves training a self-supervised prediction model, periodically saving snapshots of the model parameters, and employing the nuclear norm to assess the temporal inconsistency between predictions from different snapshots as intrinsic rewards. Additionally, we propose a variational weighting mechanism to adaptively assign weights to the snapshots, enhancing the model’s robustness and performance. Experimental results across various benchmark environments demonstrate the efficacy of our approach, which outperforms other state-of-the-art methods without incurring additional training costs and exhibits higher noise tolerance. Our findings indicate that leveraging temporal information in intrinsic rewards can significantly improve exploration performance, motivating future research to develop more robust and accurate reward systems for reinforcement learning.

Using multiple data with diverse fidelities to provide surrogate simulations through GPs

Ben Yang, Boyi Chen, Yanbin Liu, Jinbao Chen, Gaussian process fusion method for multi-fidelity data with heterogeneity distribution in aerospace vehicle flight dynamics, Engineering Applications of Artificial Intelligence, Volume 138, Part A, 2024, DOI: 10.1016/j.engappai.2024.109228.

In the engineering design of aerospace vehicles, design data at different stages exhibit hierarchical and heterogeneous distribution characteristics. Specifically, high-fidelity design data (such as from computational fluid dynamics simulations and flight tests) are costly and time-consuming to obtain. Moreover, the limited high-precision samples that are acquired often fail to cover the entire design space, resulting in a distribution characterized by small sample sizes. A critical challenge in data-driven modeling is efficiently fusing low-fidelity data with limited heterogeneous high-fidelity data to improve model accuracy and predictive performance. In response to this challenge, this paper introduces a Gaussian process fusion method for multi-fidelity data, founded on distribution characteristics. Multi-fidelity data are represented as intermediate surrogates using Gaussian processes, identifying heteroscedastic noise properties and deriving posterior distributions. The fusion is then treated as an optimization problem for prediction variance, using K-nearest neighbors and spatial clustering to determine optimal weights, which are adaptively adjusted based on sample density. These weights are adaptively adjusted based on the sample density to strengthen the local modeling behavior. The paper concludes with a comparative analysis, evaluating the proposed method against other conventional approaches using numerical cases and an aerodynamic prediction scenario for aerospace vehicles. A comparative analysis shows that the proposed method improves global modeling accuracy by 45% and reduces the demand for high-fidelity samples by over 40% compared to traditional methods. Applied in aerospace design, the method effectively merges multi-source data, establishing a robust hypersonic aerodynamic database while controlling modeling costs and demonstrating robustness to sample distribution.