kipr | Scientific papers that were of interest for Prof. Juan-Antonio Fernández-Madrigal

GPU-accelerated RL training with a massive number of scenarios

November 21, 2024 12:45, Juan-Antonio Fernández-Madrigal

Michael Matthews, Michael Beukman, Chris Lu, Jakob Foerster, Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks, arXiv:2410.23208 [cs.LG].

While large models trained with self-supervised learning on offline datasets have shown remarkable capabilities in text and image domains, achieving the same generalisation for agents that act in sequential decision problems remains an open challenge. In this work, we take a step towards this goal by procedurally generating tens of millions of 2D physics-based tasks and using these to train a general reinforcement learning (RL) agent for physical control. To this end, we introduce Kinetix: an open-ended space of physics-based RL environments that can represent tasks ranging from robotic locomotion and grasping to video games and classic RL environments, all within a unified framework. Kinetix makes use of our novel hardware-accelerated physics engine Jax2D that allows us to cheaply simulate billions of environment steps during training. Our trained agent exhibits strong physical reasoning capabilities, being able to zero-shot solve unseen human-designed environments. Furthermore, fine-tuning this general agent on tasks of interest shows significantly stronger performance than training an RL agent *tabula rasa*. This includes solving some environments that standard RL training completely fails at. We believe this demonstrates the feasibility of large scale, mixed-quality pre-training for online RL and we hope that Kinetix will serve as a useful framework to investigate this further.

Posted in: Reinforcement learning in AI

A good review of allostasis and control theory applied to physiology

November 21, 2024 12:33, Juan-Antonio Fernández-Madrigal

Eli Sennesh, Jordan Theriault, Dana Brooks, Jan-Willem van de Meent, Lisa Feldman Barrett, Karen S. Quigley, Interoception as modeling, allostasis as control, Biological Psychology, Volume 167, 2022 DOI: 10.1016/j.biopsycho.2021.108242.

The brain regulates the body by anticipating its needs and attempting to meet them before they arise – a process called allostasis. Allostasis requires a model of the changing sensory conditions within the body, a process called interoception. In this paper, we examine how interoception may provide performance feedback for allostasis. We suggest studying allostasis in terms of control theory, reviewing control theory’s applications to related issues in physiology, motor control, and decision making. We synthesize these by relating them to the important properties of allostatic regulation as a control problem. We then sketch a novel formalism for how the brain might perform allostatic control of the viscera by analogy to skeletomotor control, including a mathematical view on how interoception acts as performance feedback for allostasis. Finally, we suggest ways to test implications of our hypotheses.

Posted in: Control Engineering, Psycho-physiological bases of engineering, Tagged: Allostasis

Generating intrinsic rewards to address the sparse reward problem of RL

November 14, 2024 16:49, Juan-Antonio Fernández-Madrigal

Z. Gao et al., Self-Supervised Exploration via Temporal Inconsistency in Reinforcement Learning, IEEE Transactions on Artificial Intelligence, vol. 5, no. 11, pp. 5530-5539, Nov. 2024, DOI: 10.1109/TAI.2024.3413692.

In sparse extrinsic reward settings, reinforcement learning remains a challenge despite increasing interest in this field. Existing approaches suggest that intrinsic rewards can alleviate issues caused by reward sparsity. However, many studies overlook the critical role of temporal information, essential for human curiosity. This article introduces a novel intrinsic reward mechanism inspired by human learning processes, where curiosity is evaluated by comparing current observations with historical knowledge. Our method involves training a self-supervised prediction model, periodically saving snapshots of the model parameters, and employing the nuclear norm to assess the temporal inconsistency between predictions from different snapshots as intrinsic rewards. Additionally, we propose a variational weighting mechanism to adaptively assign weights to the snapshots, enhancing the model’s robustness and performance. Experimental results across various benchmark environments demonstrate the efficacy of our approach, which outperforms other state-of-the-art methods without incurring additional training costs and exhibits higher noise tolerance. Our findings indicate that leveraging temporal information in intrinsic rewards can significantly improve exploration performance, motivating future research to develop more robust and accurate reward systems for reinforcement learning.
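The core idea of scoring disagreement between model snapshots with a nuclear norm can be sketched as follows. This is only an illustration of the general idea, not the authors' implementation: the function name `temporal_inconsistency_reward`, the row normalisation, and the toy inputs are assumptions, and the variational weighting of snapshots described in the abstract is left out. It assumes NumPy is available.

```python
import numpy as np


def temporal_inconsistency_reward(snapshot_predictions):
    """Nuclear norm of the matrix whose rows are the predictions that
    different saved snapshots make for the same observation.

    Hypothetical helper: rows are rescaled to unit length (an assumption
    made here) so the score reflects disagreement in direction rather
    than the raw magnitude of the predictions.
    """
    matrix = np.asarray(snapshot_predictions, dtype=float)
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    matrix = matrix / np.maximum(norms, 1e-12)
    # The nuclear norm is the sum of the singular values.
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    return float(singular_values.sum())


# Three snapshots that agree produce a rank-1 matrix (low score).
agree = temporal_inconsistency_reward([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
# Three snapshots that disagree spread energy over more singular values.
disagree = temporal_inconsistency_reward([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
print(agree, disagree)
```

With unit-length rows the Frobenius norm is fixed, so a larger nuclear norm indicates that the snapshot predictions span more directions, which is the sense in which it can serve as a curiosity signal.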

Posted in: Reinforcement learning in AI, Tagged: Deep reinforcement learning, Sparse rewards

Using data of diverse fidelities to build surrogate simulations through GPs

November 14, 2024 16:42, Juan-Antonio Fernández-Madrigal

Ben Yang, Boyi Chen, Yanbin Liu, Jinbao Chen, Gaussian process fusion method for multi-fidelity data with heterogeneity distribution in aerospace vehicle flight dynamics, Engineering Applications of Artificial Intelligence, Volume 138, Part A, 2024, DOI: 10.1016/j.engappai.2024.109228.

In the engineering design of aerospace vehicles, design data at different stages exhibit hierarchical and heterogeneous distribution characteristics. Specifically, high-fidelity design data (such as from computational fluid dynamics simulations and flight tests) are costly and time-consuming to obtain. Moreover, the limited high-precision samples that are acquired often fail to cover the entire design space, resulting in a distribution characterized by small sample sizes. A critical challenge in data-driven modeling is efficiently fusing low-fidelity data with limited heterogeneous high-fidelity data to improve model accuracy and predictive performance. In response to this challenge, this paper introduces a Gaussian process fusion method for multi-fidelity data, founded on distribution characteristics. Multi-fidelity data are represented as intermediate surrogates using Gaussian processes, identifying heteroscedastic noise properties and deriving posterior distributions. The fusion is then treated as an optimization problem for prediction variance, using K-nearest neighbors and spatial clustering to determine optimal weights. These weights are adaptively adjusted based on the sample density to strengthen the local modeling behavior. A comparative analysis against other conventional approaches, using numerical cases and an aerodynamic prediction scenario for aerospace vehicles, shows that the proposed method improves global modeling accuracy by 45% and reduces the demand for high-fidelity samples by over 40% compared to traditional methods. Applied in aerospace design, the method effectively merges multi-source data, establishing a robust hypersonic aerodynamic database while controlling modeling costs and demonstrating robustness to sample distribution.
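To make "fusion as an optimization problem for prediction variance" concrete, here is a minimal sketch of the textbook case: two independent Gaussian predictions of the same quantity, combined with the weights that minimise the variance of the result (inverse-variance weighting). This is a stand-in technique, not the paper's method; the K-nearest-neighbour and clustering-based weight adaptation is not reproduced, and the function name and numbers are hypothetical.

```python
def fuse_predictions(mean_low, var_low, mean_high, var_high):
    """Combine two independent Gaussian predictions of the same quantity.

    For a weighted sum w_low * low + w_high * high with w_low + w_high = 1,
    the variance w_low^2 * var_low + w_high^2 * var_high is minimised by
    weights proportional to the inverse of each variance.
    """
    if var_low <= 0 or var_high <= 0:
        raise ValueError("variances must be positive")
    weight_high = var_low / (var_low + var_high)
    weight_low = 1.0 - weight_high
    fused_mean = weight_low * mean_low + weight_high * mean_high
    fused_var = (var_low * var_high) / (var_low + var_high)
    return fused_mean, fused_var


# Hypothetical surrogate outputs at one design point: the low-fidelity
# surrogate is noisier (variance 4.0) than the high-fidelity one (1.0).
fused_mean, fused_var = fuse_predictions(1.0, 4.0, 2.0, 1.0)
print(fused_mean, fused_var)
```

The fused variance is never larger than the smaller of the two input variances, which is why even noisy low-fidelity data can sharpen a prediction when the independence assumption holds.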

Posted in: Systems and Signals, Tagged: Gaussian processes, Prediction by simulation, Simulation

Mapless (egocentric) navigation with hierarchical RL that includes a good survey of current RL approaches for that task

November 14, 2024 16:31, Juan-Antonio Fernández-Madrigal

Yan Gao, Feiqiang Lin, Boliang Cai, Jing Wu, Changyun Wei, Raphael Grech, Ze Ji, Mapless navigation via Hierarchical Reinforcement Learning with memory-decaying novelty, Robotics and Autonomous Systems, Volume 182, 2024, DOI: 10.1016/j.robot.2024.104815.

Hierarchical Reinforcement Learning (HRL) has shown superior performance for mapless navigation tasks. However, it remains limited in unstructured environments that might contain terrains like long corridors and dead corners, which can lead to local minima. This is because most HRL-based mapless navigation methods employ a simplified reward setting and exploration strategy. In this work, we propose a novel reward function for training the high-level (HL) policy, which contains two components: extrinsic reward and intrinsic reward. The extrinsic reward encourages the robot to move towards the target location, while the intrinsic reward is computed based on novelty, episode memory and memory decaying, making the agent capable of accomplishing spontaneous exploration. We also design a novel neural network structure that incorporates an LSTM network to augment the agent with memory and reasoning capabilities. We test our method in unknown environments and specific scenarios prone to the local minimum problem to evaluate the navigation performance and local minimum resolution ability. The results show that our method significantly increases the success rate when compared to advanced RL-based methods, achieving a maximum improvement of nearly 28%. Our method demonstrates effective improvement in addressing the local minimum issue, especially in cases where the baselines fail completely. Additionally, numerous ablation studies consistently confirm the effectiveness of our proposed reward function and neural network structure.
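A novelty bonus with a decaying memory can be sketched in a few lines. The abstract does not give the paper's formula, so everything below is an assumption chosen for illustration: the class name `DecayingNoveltyMemory`, the grid discretisation, the exponential decay, and the `1 / sqrt(1 + strength)` bonus.

```python
import math
from collections import defaultdict


class DecayingNoveltyMemory:
    """Hypothetical episodic memory over grid cells whose traces fade."""

    def __init__(self, decay=0.9, cell_size=1.0):
        self.decay = decay
        self.cell_size = cell_size
        self.strength = defaultdict(float)  # memory trace per visited cell

    def _cell(self, x, y):
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def reward(self, x, y):
        # Every stored trace fades a little at each step.
        for cell in self.strength:
            self.strength[cell] *= self.decay
        cell = self._cell(x, y)
        # Weakly remembered (or never visited) cells yield a larger bonus.
        novelty = 1.0 / math.sqrt(1.0 + self.strength[cell])
        self.strength[cell] += 1.0
        return novelty


memory = DecayingNoveltyMemory(decay=0.5)
first = memory.reward(0.2, 0.3)    # first visit to cell (0, 0)
second = memory.reward(0.4, 0.1)   # immediate revisit: smaller bonus
for _ in range(20):
    memory.reward(5.0, 5.0)        # spend time elsewhere
returned = memory.reward(0.1, 0.1)  # the old trace has mostly faded
print(first, second, returned)
```

Because traces fade, a location that was explored long ago becomes attractive again, which is one way such a bonus could help an agent back out of a dead end instead of treating the whole corridor as permanently "known".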

Posted in: Applications of reinforcement learning to robots, Robot motion planning, Robot task planning, Tagged: Mapless navigation, Reactive navigation

Building a graph of skills in order to leverage already learnt ones in a robotic RL context

November 14, 2024 16:25, Juan-Antonio Fernández-Madrigal

Qiangxing Tian, Shanshan Zhang, Donglin Wang, Jinxin Liu, Shuyu Yang, GSC: A graph-based skill composition framework for robot learning, Robotics and Autonomous Systems, Volume 182, 2024, DOI: 10.1016/j.robot.2024.104787.

Humans excel at performing a wide range of sophisticated tasks by leveraging skills acquired from prior experiences. This characteristic is especially essential in robotics empowered by deep reinforcement learning, as learning every skill from scratch is time-consuming and may not always be feasible. With the prior skills incorporated, skill composition aims to accelerate the learning process on new robotic tasks. Previous works have given insight into combining pre-trained task-agnostic skills, but the skills are transformed into a fixed-order representation, which captures potentially complex skill relations poorly. In this paper, we propose a novel Graph-based framework for Skill Composition (GSC). To learn rich structural information, a carefully designed skill graph is constructed, where skill representations are taken as nodes and skill relations are utilized as edges. Furthermore, to allow it to be trained efficiently on a large-scale skill set, a transformer-style graph updating method is employed to achieve comprehensive information aggregation. Our simulation experiments indicate that GSC outperforms the state-of-the-art methods on various challenging tasks. Additionally, we successfully apply the technique to the navigation task on a real quadruped robot. The project homepage can be found at Graph Skill Composition.

Posted in: Applications of reinforcement learning to robots, Tagged: Skill learning

Improving exploration of the state space in RL for learning robotic skills through the use of RRTs

October 25, 2024 09:23, Juan-Antonio Fernández-Madrigal

Khandate, G., Saidi, T.L., Shang, S. et al., R×R: Rapid eXploration for Reinforcement learning via sampling-based reset distributions and imitation pre-training, Auton Robot 48, 17 (2024) DOI: 10.1007/s10514-024-10170-8.

We present a method for enabling Reinforcement Learning of motor control policies for complex skills such as dexterous manipulation. We posit that a key difficulty for training such policies is the difficulty of exploring the problem state space, as the accessible and useful regions of this space form a complex structure along manifolds of the original high-dimensional state space. This work presents a method to enable and support exploration with Sampling-based Planning. We use a generally applicable non-holonomic Rapidly-exploring Random Trees algorithm and present multiple methods to use the resulting structure to bootstrap model-free Reinforcement Learning. Our method is effective at learning various challenging dexterous motor control skills of higher difficulty than previously shown. In particular, we achieve dexterous in-hand manipulation of complex objects while simultaneously securing the object without the use of passive support surfaces. These policies also transfer effectively to real robots. A number of example videos can also be found on the project website: sbrl.cs.columbia.edu

Posted in: Applications of reinforcement learning to robots, Tagged: RRT, Sample efficiency

Robot exploration through decision-making and Gaussian processes

October 24, 2024 06:18, Juan-Antonio Fernández-Madrigal

Stephens, A., Budd, M., Staniaszek, M. et al., Planning under uncertainty for safe robot exploration using Gaussian process prediction, Auton Robot 48, 18 (2024) DOI: 10.1007/s10514-024-10172-6.

The exploration of new environments is a crucial challenge for mobile robots. This task becomes even more complex with the added requirement of ensuring safety. Here, safety refers to the robot staying in regions where the values of certain environmental conditions (such as terrain steepness or radiation levels) are within a predefined threshold. We consider two types of safe exploration problems. First, the robot has a map of its workspace, but the values of the environmental features relevant to safety are unknown beforehand and must be explored. Second, both the map and the environmental features are unknown, and the robot must build a map whilst remaining safe. Our proposed framework uses a Gaussian process to predict the value of the environmental features in unvisited regions. We then build a Markov decision process that integrates the Gaussian process predictions with the transition probabilities of the environmental model. The Markov decision process is then incorporated into an exploration algorithm that decides which new region of the environment to explore based on information value, predicted safety, and distance from the current position of the robot. We empirically evaluate the effectiveness of our framework through simulations and its application on a physical robot in an underground environment.
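The trade-off among information value, predicted safety, and distance can be illustrated with a small greedy scorer. This is a simplified stand-in for the Markov decision process the abstract describes, not the authors' algorithm: the function names, the use of predictive standard deviation as a proxy for information value, and all numbers are assumptions. The only property of the Gaussian process used here is that its prediction at a location is Gaussian with a mean and a standard deviation.

```python
import math


def safety_probability(mean, std, threshold):
    """P(feature <= threshold) for a Gaussian prediction N(mean, std^2).

    Assumes std > 0.
    """
    z = (threshold - mean) / (std * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def choose_region(candidates, robot_xy, threshold,
                  min_safety=0.9, distance_weight=0.1):
    """Pick the candidate with the best score among sufficiently safe ones.

    candidates maps a name to (x, y, predicted_mean, predicted_std).
    Returns None when no candidate is safe enough.
    """
    best_name, best_score = None, -math.inf
    for name, (x, y, mean, std) in candidates.items():
        if safety_probability(mean, std, threshold) < min_safety:
            continue  # predicted too likely to exceed the safe threshold
        distance = math.hypot(x - robot_xy[0], y - robot_xy[1])
        # Predictive std stands in for information value (an assumption).
        score = std - distance_weight * distance
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical predictions of terrain steepness, safe when <= 1.0.
candidates = {
    "near_known": (1.0, 0.0, 0.2, 0.05),
    "far_uncertain": (4.0, 0.0, 0.3, 0.2),
    "risky": (2.0, 0.0, 0.9, 0.3),
}
cautious = choose_region(candidates, (0.0, 0.0), threshold=1.0)
curious = choose_region(candidates, (0.0, 0.0), threshold=1.0,
                        distance_weight=0.01)
print(cautious, curious)
```

Lowering `distance_weight` makes the robot willing to travel further for a more informative region, while the "risky" candidate is excluded in both cases because its safety probability falls below `min_safety`.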

Posted in: Robot motion planning, Tagged: Active exploration, Decision making, Gaussian processes

How self-learning approaches to mobile robot navigation, despite their long training times, can tackle situations that other methods rarely cope with

October 10, 2024 09:30, Juan-Antonio Fernández-Madrigal

Al Mahmud, S., Kamarulariffin, A., Ibrahim, A.M. et al., Advancements and Challenges in Mobile Robot Navigation: A Comprehensive Review of Algorithms and Potential for Self-Learning Approaches, J Intell Robot Syst 110, 120 (2024) DOI: 10.1007/s10846-024-02149-5.

Mobile robot navigation has been a very popular research topic for some time. With the goal of enhancing the autonomy in mobile robot navigation, numerous algorithms (traditional AI-based, swarm intelligence-based, self-learning-based) have been built and implemented independently, and also in blended manners. Nevertheless, the problem of efficient autonomous robot navigation persists in multiple degrees due to the limitations of these algorithms. The lack of knowledge of the implemented techniques and their shortcomings acts as a hindrance to further development on this topic. This is why an extensive study of the previously implemented algorithms, their applicability, their weaknesses as well as their potential needs to be conducted in order to assess how to improve mobile robot navigation performance. In this review paper, a comprehensive review of mobile robot navigation algorithms has been conducted. The findings suggest that, even though the self-learning algorithms require huge amounts of training data and have the possibility of learning erroneous behavior, they possess huge potential to overcome challenges rarely addressed by the other traditional algorithms. The findings also suggest that in the domain of machine learning-based algorithms, integration of knowledge representation with a neuro-symbolic approach has the capacity to improve the accuracy and performance of self-robot navigation training by a significant margin.

Posted in: Applications of reinforcement learning to robots, Tagged: Navigation, Review

Improving sample efficiency under sparse rewards and large continuous action spaces through predictive control in RL

October 10, 2024 09:26, Juan-Antonio Fernández-Madrigal

Antonyshyn, L., Givigi, S., Deep Model-Based Reinforcement Learning for Predictive Control of Robotic Systems with Dense and Sparse Rewards, J Intell Robot Syst 110, 100 (2024) DOI: 10.1007/s10846-024-02118-y.

Sparse rewards and sample efficiency are open areas of research in the field of reinforcement learning. These problems are especially important when considering applications of reinforcement learning to robotics and other cyber-physical systems. This is so because in these domains many tasks are goal-based and naturally expressed with binary successes and failures, action spaces are large and continuous, and real interactions with the environment are limited. In this work, we propose Deep Value-and-Predictive-Model Control (DVPMC), a model-based predictive reinforcement learning algorithm for continuous control that uses system identification, value function approximation and sampling-based optimization to select actions. The algorithm is evaluated on a dense reward and a sparse reward task. We show that it can match the performance of a predictive control approach to the dense reward problem, and outperforms model-free and model-based learning algorithms on the sparse reward task on the metrics of sample efficiency and performance. We verify the performance of an agent trained in simulation using DVPMC on a real robot playing the reach-avoid game. Video of the experiment can be found here: https://youtu.be/0Q274kcfn4c.
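How a predictive model, a value function, and sampling-based optimization can be combined to select actions is easiest to see in a toy planner. The sketch below uses plain random shooting on a one-dimensional problem; it is not DVPMC itself, and the function `plan_action`, the hand-written `model`, `reward_fn`, and `value_fn` (stand-ins for what would be learned), and all constants are assumptions.

```python
import random


def plan_action(state, model, reward_fn, value_fn,
                horizon=3, num_samples=1000, rng=None):
    """Random-shooting planner: sample action sequences, roll them out
    through the model, and return the first action of the best sequence.

    The value function scores the final predicted state, so the planner
    still gets a signal when no sparse reward is reached within the horizon.
    """
    rng = rng or random.Random(0)
    best_score, best_first_action = float("-inf"), 0.0
    for _ in range(num_samples):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, total = state, 0.0
        for a in actions:
            s = model(s, a)          # predicted next state
            total += reward_fn(s)    # predicted reward along the way
        total += value_fn(s)         # bootstrap beyond the horizon
        if total > best_score:
            best_score, best_first_action = total, actions[0]
    return best_first_action


GOAL = 5.0


def model(s, a):
    # Stand-in for an identified dynamics model.
    return s + a


def reward_fn(s):
    # Sparse, binary reward: 1 only near the goal.
    return 1.0 if abs(s - GOAL) < 0.1 else 0.0


def value_fn(s):
    # Stand-in for a learned value function.
    return -abs(s - GOAL)


action = plan_action(0.0, model, reward_fn, value_fn)
print(action)
```

From state 0 the goal at 5 is out of reach within three steps, so every sampled rollout collects zero reward; the terminal value term alone steers the planner towards positive actions, which is the role a value function plays in sparse-reward settings.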

Posted in: Reinforcement learning in AI, Tagged: Deep reinforcement learning, Model Predictive Control, Sample efficiency, Sparse rewards

