November | 2024

A particular action space for human-manipulator physical interaction learning through RL

November 21, 2024 13:10 , Juan-Antonio Fernández-Madrigal

Roberto Martín-Martín, Michelle A. Lee, Rachel Gardner, Silvio Savarese, Jeannette Bohg, Animesh Garg, Variable Impedance Control in End-Effector Space: An Action Space for Reinforcement Learning in Contact-Rich Tasks, arXiv:1906.08880 [cs.RO].

Reinforcement Learning (RL) of contact-rich manipulation tasks has yielded impressive results in recent years. While many studies in RL focus on varying the observation space or reward model, few efforts focused on the choice of action space (e.g. joint or end-effector space, position, velocity, etc.). However, studies in robot motion control indicate that choosing an action space that conforms to the characteristics of the task can simplify exploration and improve robustness to disturbances. This paper studies the effect of different action spaces in deep RL and advocates for Variable Impedance Control in End-effector Space (VICES) as an advantageous action space for constrained and contact-rich tasks. We evaluate multiple action spaces on three prototypical manipulation tasks: Path Following (task with no contact), Door Opening (task with kinematic constraints), and Surface Wiping (task with continuous contact). We show that VICES improves sample efficiency, maintains low energy consumption, and ensures safety across all three experimental setups. Further, RL policies learned with VICES can transfer across different robot models in simulation, and from simulation to real for the same robot. Further information is available at this https URL.
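As a rough illustration of what an end-effector impedance action space means in practice, here is a minimal sketch. It assumes the policy outputs a pose displacement and per-axis stiffness gains, which a low-level controller maps to joint torques. The function name, the damping derived from stiffness (critical damping for unit mass), and the omission of orientation, gravity and inertia terms are all simplifying assumptions, not the paper's controller.

```python
import numpy as np

def impedance_torques(jacobian, x, x_dot, delta_x, kp, damping_ratio=1.0):
    """Joint torques from a simplified end-effector impedance law.

    Hypothetical sketch: the RL action is (delta_x, kp). Orientation,
    gravity compensation and inertia shaping are deliberately left out.
    """
    kp = np.asarray(kp, dtype=float)
    # Assumption: damping is derived from stiffness (unit effective mass).
    kd = 2.0 * damping_ratio * np.sqrt(kp)
    x_des = x + delta_x                       # displaced target position
    wrench = kp * (x_des - x) - kd * x_dot    # spring-damper force in task space
    return jacobian.T @ wrench                # map task-space force to joint torques

# Toy example: 3D task space, 2 joints, robot at rest.
J = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])
tau = impedance_torques(
    J,
    x=np.zeros(3),
    x_dot=np.zeros(3),
    delta_x=np.array([0.1, 0.0, 0.0]),
    kp=np.array([100.0, 100.0, 100.0]),
)
```

Letting the policy choose the stiffness as well as the displacement is what makes the impedance "variable": the same displacement can be executed stiffly in free space or compliantly in contact.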

Posted in: Applications of reinforcement learning to robots , Tagged: Human-Machine Interaction, Physical interaction

A very good explanation of how to model certain data in order to sample from it

November 21, 2024 12:59 , Juan-Antonio Fernández-Madrigal

Richard E. Turner, Cristiana-Diana Diaconu, Stratis Markou, Aliaksandra Shysheya, Andrew Y. K. Foong and Bruno Mlodozeniec, Denoising Diffusion Probabilistic Models in Six Simple Steps, arXiv:2402.04384 [cs.LG].

Denoising Diffusion Probabilistic Models (DDPMs) are a very popular class of deep generative model that have been successfully applied to a diverse range of problems including image and video generation, protein and material synthesis, weather forecasting, and neural surrogates of partial differential equations. Despite their ubiquity it is hard to find an introduction to DDPMs which is simple, comprehensive, clean and clear. The compact explanations necessary in research papers are not able to elucidate all of the different design steps taken to formulate the DDPM and the rationale of the steps that are presented is often omitted to save space. Moreover, the expositions are typically presented from the variational lower bound perspective which is unnecessary and arguably harmful as it obfuscates why the method is working and suggests generalisations that do not perform well in practice. On the other hand, perspectives that take the continuous time-limit are beautiful and general, but they have a high barrier-to-entry as they require background knowledge of stochastic differential equations and probability flow. In this note, we distill down the formulation of the DDPM into six simple steps each of which comes with a clear rationale. We assume that the reader is familiar with fundamental topics in machine learning including basic probabilistic modelling, Gaussian distributions, maximum likelihood estimation, and deep learning.
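For readers who want a concrete handle on the data-augmentation side of a DDPM, the sketch below shows the usual closed-form noising step, x_t = sqrt(ᾱ_t) x_0 + sqrt(1 - ᾱ_t) ε. It uses the common beta-schedule parameterisation, which may differ from the notation used in the note; the function name and the linear schedule are assumptions made for illustration.

```python
import numpy as np

def forward_noise(x0, t, betas, rng):
    """Sample x_t given x_0 under the standard DDPM forward process."""
    alphas_bar = np.cumprod(1.0 - betas)      # cumulative signal retention
    eps = rng.standard_normal(x0.shape)       # Gaussian noise to be predicted
    xt = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps
    return xt, eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)         # assumed linear noise schedule
x0 = rng.standard_normal((4, 8))              # toy "data" batch
xt, eps = forward_noise(x0, 999, betas, rng)
```

A denoising network would then be trained to recover `eps` (or, equivalently, `x0`) from `xt` and `t`; sampling runs the learned denoiser backwards from pure noise.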

Posted in: Probability and statistics , Tagged: Probabilistic model

RL training with a massive number of scenarios, GPU accelerated

November 21, 2024 12:45 , Juan-Antonio Fernández-Madrigal

Michael Matthews, Michael Beukman, Chris Lu, Jakob Foerster, Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks, arXiv:2410.23208 [cs.LG].

While large models trained with self-supervised learning on offline datasets have shown remarkable capabilities in text and image domains, achieving the same generalisation for agents that act in sequential decision problems remains an open challenge. In this work, we take a step towards this goal by procedurally generating tens of millions of 2D physics-based tasks and using these to train a general reinforcement learning (RL) agent for physical control. To this end, we introduce Kinetix: an open-ended space of physics-based RL environments that can represent tasks ranging from robotic locomotion and grasping to video games and classic RL environments, all within a unified framework. Kinetix makes use of our novel hardware-accelerated physics engine Jax2D that allows us to cheaply simulate billions of environment steps during training. Our trained agent exhibits strong physical reasoning capabilities, being able to zero-shot solve unseen human-designed environments. Furthermore, fine-tuning this general agent on tasks of interest shows significantly stronger performance than training an RL agent *tabula rasa*. This includes solving some environments that standard RL training completely fails at. We believe this demonstrates the feasibility of large scale, mixed-quality pre-training for online RL and we hope that Kinetix will serve as a useful framework to investigate this further.

Posted in: Reinforcement learning in AI

A good review of allostasis and control theory applied to physiology

November 21, 2024 12:33 , Juan-Antonio Fernández-Madrigal

Eli Sennesh, Jordan Theriault, Dana Brooks, Jan-Willem van de Meent, Lisa Feldman Barrett, Karen S. Quigley, Interoception as modeling, allostasis as control, Biological Psychology, Volume 167, 2022 DOI: 10.1016/j.biopsycho.2021.108242.

The brain regulates the body by anticipating its needs and attempting to meet them before they arise – a process called allostasis. Allostasis requires a model of the changing sensory conditions within the body, a process called interoception. In this paper, we examine how interoception may provide performance feedback for allostasis. We suggest studying allostasis in terms of control theory, reviewing control theory’s applications to related issues in physiology, motor control, and decision making. We synthesize these by relating them to the important properties of allostatic regulation as a control problem. We then sketch a novel formalism for how the brain might perform allostatic control of the viscera by analogy to skeletomotor control, including a mathematical view on how interoception acts as performance feedback for allostasis. Finally, we suggest ways to test implications of our hypotheses.

Posted in: Control Engineering, Psycho-physiological bases of engineering , Tagged: Allostasis

Generating intrinsic rewards to address the sparse reward problem of RL

November 14, 2024 16:49 , Juan-Antonio Fernández-Madrigal

Z. Gao et al., Self-Supervised Exploration via Temporal Inconsistency in Reinforcement Learning, IEEE Transactions on Artificial Intelligence, vol. 5, no. 11, pp. 5530-5539, Nov. 2024, DOI: 10.1109/TAI.2024.3413692.

In sparse extrinsic reward settings, reinforcement learning remains a challenge despite increasing interest in this field. Existing approaches suggest that intrinsic rewards can alleviate issues caused by reward sparsity. However, many studies overlook the critical role of temporal information, essential for human curiosity. This article introduces a novel intrinsic reward mechanism inspired by human learning processes, where curiosity is evaluated by comparing current observations with historical knowledge. Our method involves training a self-supervised prediction model, periodically saving snapshots of the model parameters, and employing the nuclear norm to assess the temporal inconsistency between predictions from different snapshots as intrinsic rewards. Additionally, we propose a variational weighting mechanism to adaptively assign weights to the snapshots, enhancing the model’s robustness and performance. Experimental results across various benchmark environments demonstrate the efficacy of our approach, which outperforms other state-of-the-art methods without incurring additional training costs and exhibits higher noise tolerance. Our findings indicate that leveraging temporal information in intrinsic rewards can significantly improve exploration performance, motivating future research to develop more robust and accurate reward systems for reinforcement learning.
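The core idea of scoring disagreement between model snapshots with a nuclear norm can be sketched as follows. This is only an illustration under assumptions: the predictions are stacked as rows, centered on their mean, and the adaptive (variational) weighting of snapshots described in the abstract is omitted. The function name is hypothetical and the paper's exact formulation may differ.

```python
import numpy as np

def temporal_inconsistency_reward(snapshot_predictions):
    """Intrinsic reward from disagreement between snapshot predictions.

    snapshot_predictions: array of shape (n_snapshots, feature_dim), one
    predicted next-state feature vector per saved model snapshot.
    """
    preds = np.asarray(snapshot_predictions, dtype=float)
    # Remove the shared component so that full agreement yields zero reward.
    deviations = preds - preds.mean(axis=0, keepdims=True)
    # Nuclear norm = sum of singular values of the deviation matrix.
    return float(np.linalg.norm(deviations, ord="nuc"))

agree = temporal_inconsistency_reward([[1.0, 2.0], [1.0, 2.0]])
disagree = temporal_inconsistency_reward([[1.0, 0.0], [0.0, 1.0]])
```

States on which old and new snapshots still disagree are treated as not yet learned, so they receive a larger exploration bonus.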

Posted in: Reinforcement learning in AI , Tagged: Deep reinforcement learning, Sparse rewards

Using data of diverse fidelities to build surrogate simulations with GPs

November 14, 2024 16:42 , Juan-Antonio Fernández-Madrigal

Ben Yang, Boyi Chen, Yanbin Liu, Jinbao Chen, Gaussian process fusion method for multi-fidelity data with heterogeneity distribution in aerospace vehicle flight dynamics, Engineering Applications of Artificial Intelligence, Volume 138, Part A, 2024, DOI: 10.1016/j.engappai.2024.109228.

In the engineering design of aerospace vehicles, design data at different stages exhibit hierarchical and heterogeneous distribution characteristics. Specifically, high-fidelity design data (such as from computational fluid dynamics simulations and flight tests) are costly and time-consuming to obtain. Moreover, the limited high-precision samples that are acquired often fail to cover the entire design space, resulting in a distribution characterized by small sample sizes. A critical challenge in data-driven modeling is efficiently fusing low-fidelity data with limited heterogeneous high-fidelity data to improve model accuracy and predictive performance. In response to this challenge, this paper introduces a Gaussian process fusion method for multi-fidelity data, founded on distribution characteristics. Multi-fidelity data are represented as intermediate surrogates using Gaussian processes, identifying heteroscedastic noise properties and deriving posterior distributions. The fusion is then treated as an optimization problem for prediction variance, using K-nearest neighbors and spatial clustering to determine optimal weights, which are adaptively adjusted based on sample density. These weights are adaptively adjusted based on the sample density to strengthen the local modeling behavior. The paper concludes with a comparative analysis, evaluating the proposed method against other conventional approaches using numerical cases and an aerodynamic prediction scenario for aerospace vehicles. A comparative analysis shows that the proposed method improves global modeling accuracy by 45% and reduces the demand for high-fidelity samples by over 40% compared to traditional methods. Applied in aerospace design, the method effectively merges multi-source data, establishing a robust hypersonic aerodynamic database while controlling modeling costs and demonstrating robustness to sample distribution.
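To make the "fusion as a variance optimization problem" idea concrete, the sketch below uses plain inverse-variance weighting of several surrogate predictions. This is a textbook stand-in, not the paper's method: it assumes the surrogates are unbiased and independent, and it replaces the K-nearest-neighbour and clustering-based weights with pointwise precision weights. All names are illustrative.

```python
import numpy as np

def fuse_predictions(means, variances):
    """Minimum-variance fusion of independent, unbiased predictions.

    means, variances: arrays of shape (n_models, n_points), e.g. GP
    posterior means and variances from low- and high-fidelity surrogates.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    precisions = 1.0 / variances
    weights = precisions / precisions.sum(axis=0)   # weights sum to 1 per point
    fused_mean = (weights * means).sum(axis=0)
    fused_var = 1.0 / precisions.sum(axis=0)        # never exceeds any input variance
    return fused_mean, fused_var, weights

# Two surrogates predicting one point: a confident one and a noisier one.
fused_mean, fused_var, weights = fuse_predictions(
    means=[[1.0], [3.0]],
    variances=[[1.0], [3.0]],
)
```

The fused prediction leans towards the more confident surrogate, and its variance is lower than either input, which is the behaviour a density-adaptive weighting scheme refines locally.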

Posted in: Systems and Signals , Tagged: Gaussian processes, Prediction by simulation, Simulation

Mapless (egocentric) navigation with hierarchical RL that includes a good survey of current RL approaches for that task

November 14, 2024 16:31 , Juan-Antonio Fernández-Madrigal

Yan Gao, Feiqiang Lin, Boliang Cai, Jing Wu, Changyun Wei, Raphael Grech, Ze Ji, Mapless navigation via Hierarchical Reinforcement Learning with memory-decaying novelty, Robotics and Autonomous Systems, Volume 182, 2024, DOI: 10.1016/j.robot.2024.104815.

Hierarchical Reinforcement Learning (HRL) has shown superior performance for mapless navigation tasks. However, it remains limited in unstructured environments that might contain terrains like long corridors and dead corners, which can lead to local minima. This is because most HRL-based mapless navigation methods employ a simplified reward setting and exploration strategy. In this work, we propose a novel reward function for training the high-level (HL) policy, which contains two components: extrinsic reward and intrinsic reward. The extrinsic reward encourages the robot to move towards the target location, while the intrinsic reward is computed based on novelty, episode memory and memory decaying, making the agent capable of accomplishing spontaneous exploration. We also design a novel neural network structure that incorporates an LSTM network to augment the agent with memory and reasoning capabilities. We test our method in unknown environments and specific scenarios prone to the local minimum problem to evaluate the navigation performance and local minimum resolution ability. The results show that our method significantly increases the success rate when compared to advanced RL-based methods, achieving a maximum improvement of nearly 28%. Our method demonstrates effective improvement in addressing the local minimum issue, especially in cases where the baselines fail completely. Additionally, numerous ablation studies consistently confirm the effectiveness of our proposed reward function and neural network structure.
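A novelty bonus with a decaying memory can be sketched as below. The abstract does not give the formulation, so this is a generic count-based bonus over grid cells whose visit counts fade over time; the class name, cell size, decay rate and bonus formula are all assumptions.

```python
import numpy as np

class DecayingNovelty:
    """Count-based novelty bonus whose visit memory decays each step."""

    def __init__(self, cell_size=1.0, decay=0.9):
        self.cell_size = cell_size
        self.decay = decay
        self.memory = {}  # grid cell -> decayed visit count

    def reward(self, position):
        # Forget a little of every remembered visit.
        for cell in self.memory:
            self.memory[cell] *= self.decay
        # Discretise the position into a grid cell.
        cell = tuple(np.floor(np.asarray(position) / self.cell_size).astype(int).tolist())
        visits = self.memory.get(cell, 0.0)
        bonus = 1.0 / np.sqrt(1.0 + visits)   # high for unseen or long-forgotten cells
        self.memory[cell] = visits + 1.0
        return float(bonus)

novelty = DecayingNovelty()
first = novelty.reward([0.2, 0.3])        # unseen cell
repeat = novelty.reward([0.4, 0.1])       # same cell, visited one step ago
for _ in range(50):
    novelty.reward([5.5, 5.5])            # spend time elsewhere
revisit = novelty.reward([0.2, 0.3])      # memory of the first cell has faded
```

Because old visits fade, a dead end is penalised shortly after being explored but becomes attractive again later, which is one way to discourage the agent from oscillating in a local minimum.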

Posted in: Applications of reinforcement learning to robots, Robot motion planning, Robot task planning , Tagged: Mapless navigation, Reactive navigation

Building a graph of skills in order to leverage already learnt ones in a robotic RL context

November 14, 2024 16:25 , Juan-Antonio Fernández-Madrigal

Qiangxing Tian, Shanshan Zhang, Donglin Wang, Jinxin Liu, Shuyu Yang, GSC: A graph-based skill composition framework for robot learning, Robotics and Autonomous Systems, Volume 182, 2024, DOI: 10.1016/j.robot.2024.104787.

Humans excel at performing a wide range of sophisticated tasks by leveraging skills acquired from prior experiences. This characteristic is especially essential in robotics empowered by deep reinforcement learning, as learning every skill from scratch is time-consuming and may not always be feasible. With the prior skills incorporated, skill composition aims to accelerate the learning process on new robotic tasks. Previous works have given insight into combining pre-trained task-agnostic skills, whereas skills are transformed into fixed order representation, resulting in poor capturing of potential complex skill relations. In this paper, we novelly propose a Graph-based framework for Skill Composition (GSC). To learn rich structural information, a carefully designed skill graph is constructed, where skill representations are taken as nodes and skill relations are utilized as edges. Furthermore, to allow it trained efficiently on large-scale skill set, a transformer-style graph updating method is employed to achieve comprehensive information aggregation. Our simulation experiments indicate that GSC outperforms the state-of-the-art methods on various challenging tasks. Additionally, we successfully apply the technique to the navigation task on a real quadruped robot. The project homepage can be found at Graph Skill Composition.

Posted in: Applications of reinforcement learning to robots , Tagged: Skill learning
