kipr | Scientific papers that were of interest for Prof. Juan-Antonio Fernández-Madrigal

Improving the adaptation of RL to robots with different parameters through Fuzzy

May 2, 2025 08:42, Juan-Antonio Fernández-Madrigal

A. G. Haddad, M. B. Mohiuddin, I. Boiko and Y. Zweiri, Fuzzy Ensembles of Reinforcement Learning Policies for Systems With Variable Parameters, IEEE Robotics and Automation Letters, vol. 10, no. 6, pp. 5361-5368, June 2025, DOI: 10.1109/LRA.2025.3559833.

This paper presents a novel approach to improving the generalization capabilities of reinforcement learning (RL) agents for robotic systems with varying physical parameters. We propose the Fuzzy Ensemble of RL policies (FERL), which enhances performance in environments where system parameters differ from those encountered during training. The FERL method selectively fuses aligned policies, determining their collective decision based on fuzzy memberships tailored to the current parameters of the system. Unlike traditional centralized training approaches that rely on shared experiences for policy updates, FERL allows for independent agent training, facilitating efficient parallelization. The effectiveness of FERL is demonstrated through extensive experiments, including a real-world trajectory tracking application in a quadrotor slung-load system. Our method improves the success rates by up to 15.6% across various simulated systems with variable parameters compared to the existing benchmarks of domain randomization and robust adaptive ensemble adversary RL. In the real-world experiments, our method achieves a 30% reduction in 3D position RMSE compared to individual RL policies. The results underscore FERL's robustness and applicability to real robotic systems.
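To make the idea of fusing policies through fuzzy memberships more concrete, here is a minimal sketch. It is not the authors' FERL implementation: the triangular membership shape, the function names, the parameter centers and the toy proportional policies are all assumptions chosen for illustration, and the policy alignment and selection step mentioned in the abstract is left out.

```python
from bisect import bisect_right


def triangular_memberships(x, centers):
    """Memberships of x in triangular fuzzy sets peaked at the sorted `centers`.

    At most two neighbouring sets are active and the memberships sum to 1.
    Values outside the covered range are clipped to the nearest center.
    """
    x = min(max(x, centers[0]), centers[-1])
    j = min(bisect_right(centers, x) - 1, len(centers) - 2)
    w = (x - centers[j]) / (centers[j + 1] - centers[j])
    mu = [0.0] * len(centers)
    mu[j], mu[j + 1] = 1.0 - w, w
    return mu


def fuse_actions(param, centers, policies, obs):
    """Blend the actions of all policies, weighted by their fuzzy memberships."""
    mu = triangular_memberships(param, centers)
    actions = [policy(obs) for policy in policies]
    dim = len(actions[0])
    return [sum(m * a[i] for m, a in zip(mu, actions)) for i in range(dim)]


def make_policy(gain):
    """Hypothetical stand-in for a trained policy: a proportional controller."""
    return lambda obs: [-gain * o for o in obs]


# Assumed setup: three policies, each trained for one payload mass (kg).
CENTERS = [0.5, 1.0, 2.0]
POLICIES = [make_policy(1.0), make_policy(2.0), make_policy(4.0)]

if __name__ == "__main__":
    # A mass of 0.75 kg lies halfway between the first two training values.
    print(fuse_actions(0.75, CENTERS, POLICIES, [1.0, 0.0]))
```

Because each policy only needs its own training parameter, the ensemble members in such a scheme can be trained independently and combined afterwards, which matches the parallelization argument in the abstract.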

Improving reward shaping in Deep RL for avoiding user’s biases and boosting learning efficiency

April 24, 2025 08:32, Juan-Antonio Fernández-Madrigal

Jiawei Lin, Xuekai Wei, Weizhi Xian, Jielu Yan, Leong Hou U, Yong Feng, Zhaowei Shang, Mingliang Zhou, Continuous reinforcement learning via advantage value difference reward shaping: A proximal policy optimization perspective, Engineering Applications of Artificial Intelligence, Volume 151, 2025, DOI: 10.1016/j.engappai.2025.110676.

Deep reinforcement learning has shown great promise in industrial applications. However, these algorithms suffer from low learning efficiency because of sparse reward signals in continuous control tasks. Reward shaping addresses this issue by transforming sparse rewards into more informative signals, but some designs that rely on domain experts or heuristic rules can introduce cognitive biases, leading to suboptimal solutions. To overcome this challenge, this paper proposes the advantage value difference (AVD), a generalized potential-based end-to-end exploration reward function. The main contribution of this paper is to improve the agent’s exploration efficiency, accelerate the learning process, and prevent premature convergence to local optima. The method leverages the temporal difference error to estimate the potential of states and uses the advantage function to guide the learning process toward more effective strategies. In the context of engineering applications, this paper proves the superiority of AVD in continuous control tasks within the multi-joint dynamics with contact (MuJoCo) environment. Specifically, the proposed method achieves an average increase of 23.5% in episode rewards for the Hopper, Swimmer, and Humanoid tasks compared with the state-of-the-art approaches. The results demonstrate the significant improvement in learning efficiency achieved by AVD for industrial robotic systems.
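As background for the "potential-based" part of the abstract, the sketch below shows classic potential-based reward shaping, where the shaped reward is r + γΦ(s') − Φ(s). It is not the AVD method: AVD estimates the potential from the temporal difference error and the advantage function, whereas the distance-based potential, the toy 1-D chain and all names here are assumptions for illustration only.

```python
def shaped_reward(reward, phi_s, phi_next, gamma, terminal=False):
    """Classic potential-based shaping: r + gamma * Phi(s') - Phi(s).

    The potential of a terminal state is taken as 0.
    """
    next_potential = 0.0 if terminal else phi_next
    return reward + gamma * next_potential - phi_s


def potential(position, goal=5):
    """Hypothetical hand-made potential: negative distance to the goal."""
    return -float(abs(goal - position))


def discounted_return(rewards, gamma):
    return sum(gamma ** t * r for t, r in enumerate(rewards))


GAMMA = 0.9
STATES = [0, 1, 2, 3, 4, 5]          # one episode walking straight to the goal
SPARSE = [0.0, 0.0, 0.0, 0.0, 1.0]   # reward only on reaching the goal
SHAPED = [
    shaped_reward(r, potential(s), potential(s_next), GAMMA, terminal=(s_next == 5))
    for r, s, s_next in zip(SPARSE, STATES[:-1], STATES[1:])
]

if __name__ == "__main__":
    print(SHAPED)  # every step now carries a signal
    # The shaping terms telescope: the two returns differ only by -Phi(s_0).
    print(discounted_return(SHAPED, GAMMA) - discounted_return(SPARSE, GAMMA))
```

The telescoping property is what makes this family of shaping functions attractive: the return of every trajectory from a given start state shifts by the same constant, so the ranking of policies is preserved while the agent receives a denser signal.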

Posted in: Reinforcement learning in AI, Tagged: Deep reinforcement learning, Mujoco, Reward shaping, Sparse rewards

Using Deep RL to model transitions and observations in EKF localization

April 21, 2025 12:54, Juan-Antonio Fernández-Madrigal

Islem Kobbi, Abdelhak Benamirouche, Mohamed Tadjine, Enhancing pose estimation for mobile robots: A comparative analysis of deep reinforcement learning algorithms for adaptive Extended Kalman Filter-based estimation, Engineering Applications of Artificial Intelligence, Volume 150, 2025, DOI: 10.1016/j.engappai.2025.110548.

The Extended Kalman Filter (EKF) is a widely used algorithm for state estimation in control systems. However, its lack of adaptability limits its performance in dynamic and uncertain environments. To address this limitation, we used an approach that leverages Deep Reinforcement Learning (DRL) to achieve adaptive state estimation in the EKF. By integrating DRL techniques, we enable the state estimator to autonomously learn and update the values of the system dynamics and measurement noise covariance matrices, Q and R, based on observed data, which encode environmental changes or system failures. In this research, we compare the performance of four DRL algorithms, namely Deep Deterministic Policy Gradient (DDPG), Twin Delayed Deep Deterministic Policy Gradient (TD3), Soft Actor-Critic (SAC), and Proximal Policy Optimization (PPO), in optimizing the EKF’s adaptability. The experiments are conducted in both simulated and real-world settings using the Gazebo simulation environment and the Robot Operating System (ROS). The results demonstrate that the DRL-based adaptive state estimator outperforms traditional methods in terms of estimation accuracy and robustness. The comparative analysis provides insights into the strengths and limitations of different DRL agents, showing that the TD3 and the DDPG are the most effective algorithms, with TD3 achieving superior performance, resulting in a 91% improvement over the classic EKF, due to its delayed update mechanism that reduces training noise. This research highlights the potential of DRL to advance state estimation algorithms, offering valuable insights for future work in adaptive estimation techniques.
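The sketch below illustrates where the covariances Q and R enter an EKF step and how an external agent could adjust them. It is a scalar-state simplification, not the paper's estimator: the log-scale action parametrization in `scale_noise`, the function names and the toy models in the usage part are assumptions. In the matrix form, the products become matrix products and the division by S becomes a matrix inverse.

```python
import math


def ekf_step(x, P, u, z, f, f_jac, h, h_jac, Q, R):
    """One predict/update cycle of a scalar-state EKF.

    f, h are the motion and measurement models; f_jac, h_jac their derivatives.
    Q is the process noise variance, R the measurement noise variance.
    """
    # Predict
    x_pred = f(x, u)
    F = f_jac(x, u)
    P_pred = F * P * F + Q
    # Update
    H = h_jac(x_pred)
    innovation = z - h(x_pred)
    S = H * P_pred * H + R
    K = P_pred * H / S
    return x_pred + K * innovation, (1.0 - K * H) * P_pred


def scale_noise(Q0, R0, action):
    """Assumed action parametrization: the agent outputs log-scale factors,
    which keeps both variances positive for any real-valued action."""
    a_q, a_r = action
    return Q0 * math.exp(a_q), R0 * math.exp(a_r)


if __name__ == "__main__":
    # Toy models (assumptions): odometry increment and direct position reading.
    f = lambda x, u: x + u
    f_jac = lambda x, u: 1.0
    h = lambda x: x
    h_jac = lambda x: 1.0

    for action in [(0.0, 0.0), (0.0, math.log(3.0))]:
        Q, R = scale_noise(0.0, 1.0, action)
        print(action, ekf_step(0.0, 1.0, 0.0, 2.0, f, f_jac, h, h_jac, Q, R))
```

Raising R makes the filter trust the measurement less, so the estimate moves a shorter distance toward z; this is the knob that a learned policy would turn when the sensor degrades.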

Posted in: Mobile Robot Localization, Tagged: Deep reinforcement learning, EKF

When to rely on memories versus sampling sensory information anew to guide behavior

April 3, 2025 06:41, Juan-Antonio Fernández-Madrigal

Levi Kumle, Anna C. Nobre, Dejan Draschkow, Sensorimnemonic decisions: choosing memories versus sensory information, Trends in Cognitive Sciences, Volume 29, Issue 4, 2025, Pages 311-313, DOI: 10.1016/j.tics.2024.12.010.

We highlight a fundamental psychological function that is central to many of our interactions in the environment – when to rely on memories versus sampling sensory information anew to guide behavior. By operationalizing sensorimnemonic decisions we aim to encourage and advance research into this pivotal process for understanding how memories serve adaptive cognition.

Posted in: Psycho-physiological bases of engineering, Tagged: Mental imagery

On the explainability of Deep RL and its improvement through the integration of human preferences

April 3, 2025 06:38, Juan-Antonio Fernández-Madrigal

Georgios Angelopoulos, Luigi Mangiacapra, Alessandra Rossi, Claudia Di Napoli, Silvia Rossi, What is behind the curtain? Increasing transparency in reinforcement learning with human preferences and explanations, Engineering Applications of Artificial Intelligence, Volume 149, 2025, DOI: 10.1016/j.engappai.2025.110520.

In this work, we investigate whether the transparency of a robot’s behaviour is improved when human preferences on the actions the robot performs are taken into account during the learning process. For this purpose, a shielding mechanism called Preference Shielding is proposed and included in a reinforcement learning algorithm to account for human preferences. We also use the shielding to decide when to provide explanations of the robot’s actions. We carried out a within-subjects study involving 26 participants to evaluate the robot’s transparency. Results indicate that considering human preferences during learning improves legibility compared with providing only explanations. In addition, combining human preferences and explanations further amplifies transparency. Results also confirm that increased transparency leads to an increase in people’s perception of the robot’s safety, comfort, and reliability. These findings show the importance of transparency during learning and suggest a paradigm for robotic applications when a robot has to learn a task in the presence of or in collaboration with a human.

Posted in: Applications of reinforcement learning to robots, Tagged: Deep reinforcement learning, Explainability, Human-robot integration

Interesting survey of existing sim-to-real gap in RL in the context of humanoid robots

March 20, 2025 15:41, Juan-Antonio Fernández-Madrigal

D. Kim, H. Lee, J. Cha and J. Park, Bridging the Reality Gap: Analyzing Sim-to-Real Transfer Techniques for Reinforcement Learning in Humanoid Bipedal Locomotion, IEEE Robotics & Automation Magazine, vol. 32, no. 1, pp. 49-58, March 2025, DOI: 10.1109/MRA.2024.3505784.

Reinforcement learning (RL) offers a promising solution for controlling humanoid robots, particularly for bipedal locomotion, by learning adaptive and flexible control strategies. However, direct RL application is hindered by time-consuming trial-and-error processes, necessitating training in simulation before real-world transfer. This introduces a reality gap that degrades performance. Although various methods have been proposed for sim-to-real transfer, they have not been validated on a consistent hardware platform, making it difficult to determine which components are key to overcoming the reality gap. In contrast, we systematically evaluate techniques to enhance RL policy robustness during sim-to-real transfer by controlling variables and comparing them on a single robot to isolate and analyze the impact of each technique. These techniques include dynamics randomization, state history usage, noise/bias/delay modeling, state selection, perturbations, and network size. We quantitatively assess the reality gap by simulating diverse conditions and conducting experiments on real hardware. Our findings provide insights into bridging the reality gap, advancing robust RL-trained humanoid robots for real-world applications.
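Two of the techniques listed in the abstract, dynamics randomization and noise/delay modeling, can be sketched in a few lines. This is a generic illustration, not the setup used in the paper: the parameter names, the ±20% range, the Gaussian noise model and the class and function names are all assumptions.

```python
import random
from collections import deque


def sample_dynamics(nominal, rel_range, rng):
    """Dynamics randomization: draw each physical parameter uniformly within
    +/- rel_range of its nominal value, typically once per training episode."""
    return {
        name: value * rng.uniform(1.0 - rel_range, 1.0 + rel_range)
        for name, value in nominal.items()
    }


class DelayedNoisyObservation:
    """Sensor model that returns the observation from `delay_steps` steps ago
    (or the oldest one available) with additive Gaussian noise."""

    def __init__(self, delay_steps, noise_std, rng):
        self.buffer = deque(maxlen=delay_steps + 1)
        self.noise_std = noise_std
        self.rng = rng

    def __call__(self, true_obs):
        self.buffer.append(list(true_obs))
        delayed = self.buffer[0]
        return [v + self.rng.gauss(0.0, self.noise_std) for v in delayed]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical nominal parameters of a simulated robot link.
    nominal = {"mass": 10.0, "friction": 0.5}
    print(sample_dynamics(nominal, 0.2, rng))

    sensor = DelayedNoisyObservation(delay_steps=2, noise_std=0.01, rng=rng)
    for t in range(1, 5):
        print(t, sensor([float(t)]))
```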

Posted in: Applications of reinforcement learning to robots, Tagged: Simulation-to-real problem

A new perspective of considering convex optimization problems based on electric circuit theory

March 6, 2025 10:41, Juan-Antonio Fernández-Madrigal

Stephen P. Boyd, Tetiana Parshakova, Ernest K. Ryu, Jaewook J. Suh, Optimization Algorithm Design via Electric Circuits, NeurIPS 2024 spotlight, 25 Sept 2024, Last Modified: 14 Jan 2025, https://openreview.net/forum?id=9Jmt1eER9P.

We present a novel methodology for convex optimization algorithm design using ideas from electric RLC circuits. Given an optimization problem, the first stage of the methodology is to design an appropriate electric circuit whose continuous-time dynamics converge to the solution of the optimization problem at hand. Then, the second stage is an automated, computer-assisted discretization of the continuous-time dynamics, yielding a provably convergent discrete-time algorithm. Our methodology recovers many classical (distributed) optimization algorithms and enables users to quickly design and explore a wide range of new algorithms with convergence guarantees.
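The two-stage pattern described in the abstract (continuous-time dynamics that converge to the solution, followed by a discretization) can be illustrated with the simplest textbook case: the gradient flow dx/dt = −∇f(x), whose forward-Euler discretization is plain gradient descent. This sketch does not reproduce the paper's circuit construction or its automated discretization; the quadratic objective, the step size and the function names are assumptions.

```python
def grad(x):
    """Gradient of f(x) = 0.5 * x^T A x - b^T x with A = [[3, 1], [1, 2]], b = [1, 1]."""
    return [3.0 * x[0] + 1.0 * x[1] - 1.0,
            1.0 * x[0] + 2.0 * x[1] - 1.0]


def euler_gradient_flow(x0, step, iters):
    """Forward-Euler discretization of dx/dt = -grad f(x).

    Each Euler step of size `step` is one gradient-descent iteration.
    """
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


if __name__ == "__main__":
    # The minimizer solves A x = b, i.e. x = [0.2, 0.4].
    print(euler_gradient_flow([0.0, 0.0], step=0.1, iters=500))
```

The continuous-time flow converges for any convex quadratic, but the discretized iteration only converges when the step is small enough relative to the largest eigenvalue of A. Choosing a discretization that keeps the convergence guarantee is the part the paper automates.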

Posted in: Mathematics, Tagged: Convex optimization, Electric circuits, Optimization

On the innate ability of vertebrates for number recognition and the one of distinguishing ratios of numbers

March 6, 2025 10:33, Juan-Antonio Fernández-Madrigal

Elena Lorenzi, Dmitry Kobylkov, Giorgio Vallortigara, Is there an innate sense of number in the brain?, Cerebral Cortex, Volume 35, Issue 2, February 2025, DOI: 10.1093/cercor/bhaf004.

The approximate number system or «sense of number» is a crucial, presymbolic mechanism enabling animals to estimate quantities, which is essential for survival in various contexts (e.g., estimating numerosities of social companions, prey, predators, and so on). Behavioral studies indicate that a sense of number is widespread across vertebrates and invertebrates. Specific brain regions such as the intraparietal sulcus and prefrontal cortex in primates, or equivalent areas in birds and fish, are involved in numerical estimation, and their activity is modulated by the ratio of quantities. Data gathered across species strongly suggest similar evolutionary pressures for number estimation pointing to a likely common origin, at least across vertebrates. On the other hand, few studies have investigated the origins of the sense of number. Recent findings, however, have shown that numerosity-selective neurons exist in newborn animals, such as domestic chicks and zebrafish, supporting the hypothesis of an innateness of the approximate number system. Control-rearing experiments on visually naïve animals further support the notion that the sense of number is innate and does not need any specific instructive experience in order to be triggered.

Posted in: Psycho-physiological bases of engineering, Tagged: Numbers in the brain

It seems that the human brain working memory uses pointers

March 6, 2025 10:21, Juan-Antonio Fernández-Madrigal

Edward Awh, Edward K. Vogel, Working memory needs pointers, Trends in Cognitive Sciences, Volume 29, Issue 3, 2025, Pages 230-241, DOI: 10.1016/j.tics.2024.12.006.

Cognitive neuroscience has converged on a definition of working memory (WM) as a capacity-limited system that maintains highly accessible representations via stimulus-specific neural patterns. We argue that this standard definition may be incomplete. We highlight the fundamental need to recognize specific instances or tokens and to bind those tokens to the surrounding context. We propose that contextual binding is supported by spatiotemporal ‘pointers’ and that pointers are the source of neural signals that track the number of stored items, independent of their content. These content-independent pointers may provide a productive perspective for understanding item-based capacity limits in WM and the role of WM as a gateway for long-term storage.

Posted in: Psycho-physiological bases of engineering, Tagged: Working memory

Planning tasks under uncertainty that have a maximum time to be finished

March 6, 2025 10:17, Juan-Antonio Fernández-Madrigal

Michal Staniaszek, Lara Brudermüller, Yang You, Raunak Bhattacharyya, Bruno Lacerda, Nick Hawes, Time-bounded planning with uncertain task duration distributions, Robotics and Autonomous Systems, Volume 186, 2025, DOI: 10.1016/j.robot.2025.104926.

We consider planning problems where a robot must gather reward by completing tasks at each of a large set of locations while constrained by a time bound. Our focus is problems where the context under which each task will be executed can be predicted, but is not known in advance. Here, the term context refers to the conditions under which the task is executed, and can be related to the robot’s internal state (e.g., how well it is localised?), or the environment itself (e.g., how dirty is the floor the robot must clean?). This context has an impact on the time required to execute the task, which we model probabilistically. We model the problem of time-bounded planning for tasks executed under uncertain contexts as a Markov decision process with discrete time in the state, and propose variants on this model which allow adaptation to different robotics domains. Due to the intractability of the general model, we propose simplifications to allow planning in large domains. The key idea behind these simplifications is constraining navigation using a solution to the travelling salesperson problem. We evaluate our models on maps generated from real-world environments and consider two domains with different characteristics: UV disinfection, and cleaning. We evaluate the effect of model variants and simplifications on performance, and show that policies obtained for our models outperform a rule-based baseline, as well as a model which does not consider context. We also evaluate our models in a real robot experiment where a quadruped performs simulated inspection tasks in an industrial environment.
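A minimal sketch of a Markov decision process with discrete time in the state can help to see how a time bound and probabilistic task durations interact. It is far simpler than the models in the paper: navigation, context prediction and the travelling-salesperson simplification are all ignored, and the task set, rewards, duration distributions and function names are hypothetical.

```python
from functools import lru_cache

# Hypothetical tasks: name -> (reward, {duration in time steps: probability}).
TASKS = {
    "A": (5.0, {2: 0.5, 4: 0.5}),
    "B": (3.0, {1: 1.0}),
}


def make_solver(tasks):
    """Return value(time_left, remaining): the best expected reward obtainable.

    The state is (time_left, frozenset of tasks not yet attempted). A task that
    overruns the time bound yields no reward and ends the episode.
    """

    @lru_cache(maxsize=None)
    def value(time_left, remaining):
        best = 0.0  # stopping is always allowed
        for name in remaining:
            reward, durations = tasks[name]
            rest = remaining - {name}
            expected = 0.0
            for duration, prob in durations.items():
                if duration <= time_left:
                    expected += prob * (reward + value(time_left - duration, rest))
            best = max(best, expected)
        return best

    return value


if __name__ == "__main__":
    value = make_solver(TASKS)
    for bound in (1, 3, 5):
        print(bound, value(bound, frozenset(TASKS)))
```

With a bound of 3 steps the solver prefers the short, certain task B first, because starting with A risks a 4-step duration that exhausts the budget. This is the kind of trade-off that a planner ignoring duration uncertainty would miss.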

Posted in: Robot task planning, Tagged: Tasks with time
