Category Archives: Applications Of Reinforcement Learning To Robots

A particular action space for learning physical-interaction (contact-rich) manipulation tasks through RL

Roberto Martín-Martín, Michelle A. Lee, Rachel Gardner, Silvio Savarese, Jeannette Bohg, Animesh Garg, Variable Impedance Control in End-Effector Space: An Action Space for Reinforcement Learning in Contact-Rich Tasks, arXiv:1906.08880 [cs.RO], 2019.

Reinforcement Learning (RL) of contact-rich manipulation tasks has yielded impressive results in recent years. While many studies in RL focus on varying the observation space or reward model, few efforts have focused on the choice of action space (e.g. joint or end-effector space, position, velocity, etc.). However, studies in robot motion control indicate that choosing an action space that conforms to the characteristics of the task can simplify exploration and improve robustness to disturbances. This paper studies the effect of different action spaces in deep RL and advocates for Variable Impedance Control in End-effector Space (VICES) as an advantageous action space for constrained and contact-rich tasks. We evaluate multiple action spaces on three prototypical manipulation tasks: Path Following (task with no contact), Door Opening (task with kinematic constraints), and Surface Wiping (task with continuous contact). We show that VICES improves sample efficiency, maintains low energy consumption, and ensures safety across all three experimental setups. Further, RL policies learned with VICES can transfer across different robot models in simulation, and from simulation to real for the same robot. Further information is available on the project website.
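
To make the action-space idea concrete, here is a minimal sketch (my own illustration, not the authors' code) of how a VICES-style action could be interpreted: the policy outputs a desired end-effector displacement plus variable stiffness gains, and an impedance law turns them into joint torques. The Jacobian and end-effector twist are assumed to come from the robot model.

```python
import numpy as np

def vices_action_to_torque(action, x_dot, J):
    """Hedged sketch of a VICES-style action mapping (not the paper's code).

    action : desired end-effector displacement (6,) followed by diagonal
             stiffness gains (6,), both chosen by the RL policy at each step.
    x_dot  : current end-effector twist (6,), assumed available from the robot.
    J      : end-effector Jacobian (6 x n_joints), assumed given by the model.
    """
    delta_x_des = action[:6]                  # pose offset commanded by the policy
    k_p = np.clip(action[6:], 0.0, 300.0)     # variable stiffness, bounded for safety
    k_d = 2.0 * np.sqrt(k_p)                  # critically-damped damping heuristic

    wrench = k_p * delta_x_des - k_d * x_dot  # impedance law in end-effector space
    return J.T @ wrench                       # map the wrench to joint torques

# Toy call with placeholder state for a 7-DoF arm.
rng = np.random.default_rng(0)
tau = vices_action_to_torque(rng.uniform(-1, 1, 12), x_dot=np.zeros(6),
                             J=rng.standard_normal((6, 7)))
print(tau.shape)  # (7,)
```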

Mapless (egocentric) navigation with hierarchical RL, in a paper that also includes a good survey of current RL approaches to the task

Yan Gao, Feiqiang Lin, Boliang Cai, Jing Wu, Changyun Wei, Raphael Grech, Ze Ji, Mapless navigation via Hierarchical Reinforcement Learning with memory-decaying novelty, Robotics and Autonomous Systems, Volume 182, 2024, DOI: 10.1016/j.robot.2024.104815.

Hierarchical Reinforcement Learning (HRL) has shown superior performance for mapless navigation tasks. However, it remains limited in unstructured environments that might contain terrains like long corridors and dead corners, which can lead to local minima. This is because most HRL-based mapless navigation methods employ a simplified reward setting and exploration strategy. In this work, we propose a novel reward function for training the high-level (HL) policy, which contains two components: extrinsic reward and intrinsic reward. The extrinsic reward encourages the robot to move towards the target location, while the intrinsic reward is computed based on novelty, episode memory and memory decaying, making the agent capable of accomplishing spontaneous exploration. We also design a novel neural network structure that incorporates an LSTM network to augment the agent with memory and reasoning capabilities. We test our method in unknown environments and specific scenarios prone to the local minimum problem to evaluate the navigation performance and local minimum resolution ability. The results show that our method significantly increases the success rate when compared to advanced RL-based methods, achieving a maximum improvement of nearly 28%. Our method demonstrates effective improvement in addressing the local minimum issue, especially in cases where the baselines fail completely. Additionally, numerous ablation studies consistently confirm the effectiveness of our proposed reward function and neural network structure.
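
The reward design described above (an extrinsic goal-progress term plus a novelty-based intrinsic term with episodic memory and memory decay) can be sketched roughly as follows. This is my own simplified reading; the state embedding, decay factor, distance threshold, and scaling are assumed placeholders, not the authors' implementation.

```python
import numpy as np

class DecayingNoveltyReward:
    """Sketch of a memory-decaying novelty bonus: states similar to recently
    stored episodic memories earn a smaller bonus, and stored memories slowly
    lose weight over time."""

    def __init__(self, decay=0.99, radius=0.5, beta=0.1):
        self.memory = []        # list of (embedding, weight) pairs for this episode
        self.decay = decay      # per-step memory decay factor
        self.radius = radius    # distance below which a state counts as "seen"
        self.beta = beta        # scale of the intrinsic term

    def intrinsic(self, obs_embedding):
        # Decay existing memories, then measure similarity to the new state.
        self.memory = [(m, w * self.decay) for m, w in self.memory]
        seen = sum(w for m, w in self.memory
                   if np.linalg.norm(m - obs_embedding) < self.radius)
        self.memory.append((obs_embedding, 1.0))
        return self.beta / (1.0 + seen)   # novel states get the largest bonus

def high_level_reward(extrinsic, obs_embedding, novelty: DecayingNoveltyReward):
    # HL reward = goal-directed extrinsic term + novelty-based intrinsic term.
    return extrinsic + novelty.intrinsic(obs_embedding)
```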

Building a graph of skills in order to leverage already learnt ones in a robotic RL context

Qiangxing Tian, Shanshan Zhang, Donglin Wang, Jinxin Liu, Shuyu Yang, GSC: A graph-based skill composition framework for robot learning, Robotics and Autonomous Systems, Volume 182, 2024, DOI: 10.1016/j.robot.2024.104787.

Humans excel at performing a wide range of sophisticated tasks by leveraging skills acquired from prior experiences. This characteristic is especially essential in robotics empowered by deep reinforcement learning, as learning every skill from scratch is time-consuming and may not always be feasible. With prior skills incorporated, skill composition aims to accelerate the learning process on new robotic tasks. Previous works have given insight into combining pre-trained task-agnostic skills, but they transform skills into a fixed-order representation, which poorly captures potentially complex relations between skills. In this paper, we propose a Graph-based framework for Skill Composition (GSC). To learn rich structural information, a carefully designed skill graph is constructed, where skill representations are taken as nodes and skill relations are utilized as edges. Furthermore, to allow it to be trained efficiently on a large-scale skill set, a transformer-style graph updating method is employed to achieve comprehensive information aggregation. Our simulation experiments indicate that GSC outperforms the state-of-the-art methods on various challenging tasks. Additionally, we successfully apply the technique to the navigation task on a real quadruped robot. The project homepage can be found at Graph Skill Composition.
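
As a rough idea of what a "transformer-style graph updating" step over such a skill graph might look like, here is a generic attention-based message-passing sketch; it is not GSC's actual architecture, and the scoring and aggregation rules are assumptions.

```python
import numpy as np

def transformer_style_graph_update(node_feats, adjacency):
    """One attention-based message-passing step over a skill graph.
    Generic sketch only; GSC's real update rule may differ.

    node_feats : (n_skills, d) skill embeddings (graph nodes)
    adjacency  : (n_skills, n_skills) 0/1 matrix of skill relations (edges)
    """
    n, d = node_feats.shape
    adjacency = np.maximum(adjacency, np.eye(n))     # ensure self-loops
    scores = node_feats @ node_feats.T / np.sqrt(d)  # pairwise attention logits
    scores = np.where(adjacency > 0, scores, -1e9)   # attend only along edges
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return node_feats + weights @ node_feats         # residual aggregation
```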

Improving exploration of the state space in RL for learning robotic skills through the use of RRTs

Khandate, G., Saidi, T.L., Shang, S. et al., R×R: Rapid eXploration for Reinforcement learning via sampling-based reset distributions and imitation pre-training, Auton Robot 48, 17 (2024), DOI: 10.1007/s10514-024-10170-8.

We present a method for enabling Reinforcement Learning of motor control policies for complex skills such as dexterous manipulation. We posit that a key difficulty in training such policies is exploring the problem state space, as the accessible and useful regions of this space form a complex structure along manifolds of the original high-dimensional state space. This work presents a method to enable and support exploration with Sampling-based Planning. We use a generally applicable non-holonomic Rapidly-exploring Random Trees algorithm and present multiple methods to use the resulting structure to bootstrap model-free Reinforcement Learning. Our method is effective at learning various challenging dexterous motor control skills of higher difficulty than previously shown. In particular, we achieve dexterous in-hand manipulation of complex objects while simultaneously securing the object without the use of passive support surfaces. These policies also transfer effectively to real robots. A number of example videos can also be found on the project website: sbrl.cs.columbia.edu
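
One way such a planner's output can bootstrap RL is as a reset distribution: each episode starts from a state the tree has already reached, so exploration covers the useful manifold instead of only the nominal initial state. The snippet below is a minimal, hypothetical illustration of that idea; the placeholder states and the `reset_to` call in the comment are assumptions, not the paper's API.

```python
import random

def sample_reset_state(tree_states):
    """Sketch: draw an episode start state uniformly from the nodes of a
    sampling-based planner's tree (one plausible reading of sampling-based
    reset distributions; not the paper's implementation)."""
    return random.choice(tree_states)

# Hypothetical usage: `rrt_nodes` would hold states reached by an offline
# non-holonomic RRT run; each RL episode would then call something like
# env.reset_to(sample_reset_state(rrt_nodes)) before rolling out the policy.
rrt_nodes = [[0.0, 0.0], [0.1, 0.3], [0.4, 0.2]]   # placeholder states
start_state = sample_reset_state(rrt_nodes)
print(start_state)
```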

How self-learning approaches to mobile robot navigation can tackle situations that other methods rarely cope with, despite their long training times

Al Mahmud, S., Kamarulariffin, A., Ibrahim, A.M. et al., Advancements and Challenges in Mobile Robot Navigation: A Comprehensive Review of Algorithms and Potential for Self-Learning Approaches, J Intell Robot Syst 110, 120 (2024), DOI: 10.1007/s10846-024-02149-5.

Mobile robot navigation has been a popular research topic for quite a while. With the goal of enhancing the autonomy of mobile robot navigation, numerous algorithms (traditional AI-based, swarm intelligence-based, self-learning-based) have been built and implemented independently, and also in blended manners. Nevertheless, the problem of efficient autonomous robot navigation persists to varying degrees due to the limitations of these algorithms. The lack of knowledge about the implemented techniques and their shortcomings acts as a hindrance to further development on this topic. This is why an extensive study of the previously implemented algorithms, their applicability, their weaknesses as well as their potential needs to be conducted in order to assess how to improve mobile robot navigation performance. In this review paper, a comprehensive review of mobile robot navigation algorithms has been conducted. The findings suggest that, even though the self-learning algorithms require huge amounts of training data and have the possibility of learning erroneous behavior, they possess huge potential to overcome challenges rarely addressed by the other traditional algorithms. The findings also suggest that, in the domain of machine learning-based algorithms, integrating knowledge representation with a neuro-symbolic approach has the capacity to improve the accuracy and performance of self-learning-based robot navigation training by a significant margin.

Reducing the number of samples needed in RL through evolutionary techniques

Onori, G., Shahid, A.A., Braghin, F. et al., Adaptive Optimization of Hyper-Parameters for Robotic Manipulation through Evolutionary Reinforcement Learning, J Intell Robot Syst 110, 108 (2024), DOI: 10.1007/s10846-024-02138-8.

Deep Reinforcement Learning applications are growing due to their capability of teaching the agent any task autonomously and generalizing the learning. However, this comes at the cost of a large number of samples and interactions with the environment. Moreover, the robustness of learned policies is usually achieved by a tedious tuning of hyper-parameters and reward functions. In order to address this issue, this paper proposes an evolutionary RL algorithm for the adaptive optimization of hyper-parameters. The policy is trained using an on-policy algorithm, Proximal Policy Optimization (PPO), coupled with an evolutionary algorithm. The achieved results demonstrate an improvement in the sample efficiency of the RL training on a robotic grasping task. In particular, the learning is improved with respect to the baseline case of a non-evolutionary agent. The evolutionary agent needs % fewer samples to completely learn the grasping task, enabled by the adaptive transfer of knowledge between the agents through the evolutionary algorithm. The proposed approach also demonstrates the possibility of updating reward parameters during training, potentially providing a general approach to creating reward functions.
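
The coupling of PPO training with an evolutionary outer loop over hyper-parameters could look roughly like the population-based sketch below. The specific hyper-parameters, selection rule, and perturbation range are placeholders, not the paper's exact algorithm.

```python
import copy
import random

def evolve_hyperparameters(population, perturb=0.2):
    """Population-based sketch of adapting PPO hyper-parameters during training.

    population : list of dicts such as
        {"score": 12.3, "hparams": {"lr": 3e-4, "clip": 0.2, "entropy": 0.01}}
    Lower-scoring agents copy hyper-parameters from better ones and perturb them,
    so knowledge is transferred adaptively between agents as training proceeds.
    """
    ranked = sorted(population, key=lambda a: a["score"], reverse=True)
    elite = ranked[: max(1, len(ranked) // 2)]
    losers = ranked[len(ranked) // 2:]
    for agent in losers:
        parent = random.choice(elite)
        agent["hparams"] = {k: v * random.uniform(1 - perturb, 1 + perturb)
                            for k, v in copy.deepcopy(parent["hparams"]).items()}
    return population

# Between evolution steps each agent would run ordinary PPO updates with its own
# hyper-parameters; "score" would come from evaluation returns on the grasping task.
```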

Dealing with combinatorially large action spaces in RL through action masking

Z. Wu, Y. Li, W. Zhan, C. Liu, Y.-H. Liu and M. Tomizuka, Efficient Reinforcement Learning of Task Planners for Robotic Palletization Through Iterative Action Masking Learning, IEEE Robotics and Automation Letters, vol. 9, no. 11, pp. 9303-9310, Nov. 2024, DOI: 10.1109/LRA.2024.3440731.

The development of robotic systems for palletization in logistics scenarios is of paramount importance, addressing critical efficiency and precision demands in supply chain management. This paper investigates the application of Reinforcement Learning (RL) in enhancing task planning for such robotic systems. Confronted with the substantial challenge of a vast action space, which is a significant impediment to efficiently applying off-the-shelf RL methods, our study introduces a novel method of utilizing supervised learning to iteratively prune and manage the action space effectively. By reducing the complexity of the action space, our approach not only accelerates the learning phase but also ensures the effectiveness and reliability of the task planning in robotic palletization. The experimental results underscore the efficacy of this method, highlighting its potential in improving the performance of RL applications in complex and high-dimensional environments like logistics palletization.
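
Independently of how the mask is learned, applying an action mask inside the policy is straightforward: invalid actions receive zero probability before sampling. A minimal sketch follows, with a hand-written boolean mask standing in for the learned mask model described in the paper.

```python
import numpy as np

def masked_action_probabilities(logits, valid_mask):
    """Sketch of action masking for a large discrete action space: invalid
    placements get probability zero before sampling. In the paper's iterative
    scheme the mask would come from a supervised model; here it is a plain
    boolean array."""
    masked = np.where(valid_mask, logits, -np.inf)   # forbid invalid actions
    masked -= masked.max()                           # numerical stability
    probs = np.exp(masked)
    return probs / probs.sum()

logits = np.array([1.2, 0.3, -0.5, 2.0])
mask = np.array([True, False, True, False])          # e.g. infeasible placements
print(masked_action_probabilities(logits, mask))
```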

Improving explainability of deep RL in Robotics

Mehran Taghian, Shotaro Miwa, Yoshihiro Mitsuka, Johannes Günther, Shadan Golestan, Osmar Zaiane, Explainability of deep reinforcement learning algorithms in robotic domains by using Layer-wise Relevance Propagation, Engineering Applications of Artificial Intelligence, Volume 137, Part A, 2024, DOI: 10.1016/j.engappai.2024.109131.

A key component of the recent success of reinforcement learning is the introduction of neural networks for representation learning. Doing so allows for solving challenging problems in several domains, one of which is robotics. However, a major criticism of deep reinforcement learning (DRL) algorithms is their lack of explainability and interpretability. This problem is exacerbated further in robotics, as robots often share space with humans, making it imperative to be able to reason about their behavior. In this paper, we propose to analyze the learned representation in a robotic setting by utilizing Graph Networks (GNs). Using the GN and Layer-wise Relevance Propagation (LRP), we represent the observations as an entity-relationship graph, which allows us to interpret the learned policy. We evaluate our approach in two environments in MuJoCo. These two environments were carefully designed to effectively measure the value of the knowledge gained by our approach to analyzing learned representations. This approach allows us not only to analyze how different parts of the observation space contribute to the decision-making process but also to differentiate between policies and their differences in performance. This difference in performance also allows for reasoning about the agent’s recovery from faults. These insights are key contributions to explainable deep reinforcement learning in robotic settings.
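
For readers unfamiliar with LRP, its core operation is redistributing a layer's output relevance onto that layer's inputs in proportion to their contributions. Below is the textbook epsilon-rule for a single linear layer, shown purely for illustration; the paper applies the propagation through graph-network layers, which is not reproduced here.

```python
import numpy as np

def lrp_linear(a, W, relevance_out, eps=1e-6):
    """Epsilon-LRP rule for one fully connected layer (textbook form).

    a             : (n_in,) activations entering the layer
    W             : (n_in, n_out) weight matrix
    relevance_out : (n_out,) relevance of the layer's outputs
    Returns the relevance redistributed onto the layer's inputs.
    """
    z = a[:, None] * W                                   # input-to-output contributions
    denom = z.sum(axis=0)
    denom = denom + eps * np.where(denom >= 0, 1.0, -1.0)  # stabilizer, keeps sign
    return (z / denom * relevance_out).sum(axis=1)

# Toy example: relevance of the first output unit traced back to three inputs.
a = np.array([0.5, 1.0, -0.2])
W = np.random.default_rng(1).standard_normal((3, 2))
print(lrp_linear(a, W, relevance_out=np.array([1.0, 0.0])))
```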

A good survey and taxonomy for DRL in robotics

Chen Tang, Ben Abbatematteo, Jiaheng Hu, Rohan Chandra, Roberto Martín-Martín, Peter Stone, Deep Reinforcement Learning for Robotics: A Survey of Real-World Successes, arXiv:2408.03539 [cs.RO], https://www.arxiv.org/abs/2408.03539.

Reinforcement learning (RL), particularly its combination with deep neural networks referred to as deep RL (DRL), has shown tremendous promise across a wide range of applications, suggesting its potential for enabling the development of sophisticated robotic behaviors. Robotics problems, however, pose fundamental difficulties for the application of RL, stemming from the complexity and cost of interacting with the physical world. This article provides a modern survey of DRL for robotics, with a particular focus on evaluating the real-world successes achieved with DRL in realizing several key robotic competencies. Our analysis aims to identify the key factors underlying those exciting successes, reveal underexplored areas, and provide an overall characterization of the status of DRL in robotics. We highlight several important avenues for future work, emphasizing the need for stable and sample-efficient real-world RL paradigms, holistic approaches for discovering and integrating various competencies to tackle complex long-horizon, open-world tasks, and principled development and evaluation procedures. This survey is designed to offer insights for both RL practitioners and roboticists toward harnessing RL’s power to create generally capable real-world robotic systems.

Safety in RL through “predictive safety filters”

Aksel Vaaler, Svein Jostein Husa, Daniel Menges, Thomas Nakken Larsen, Adil Rasheed, Modular control architecture for safe marine navigation: Reinforcement learning with predictive safety filters, Artificial Intelligence, Volume 336, 2024, DOI: 10.1016/j.artint.2024.104201.

Many autonomous systems are safety-critical, making it essential to have a closed-loop control system that satisfies constraints arising from underlying physical limitations and safety aspects in a robust manner. However, this is often challenging to achieve for real-world systems. For example, autonomous ships at sea have nonlinear and uncertain dynamics and are subject to numerous time-varying environmental disturbances such as waves, currents, and wind. There is increasing interest in using machine learning-based approaches to adapt these systems to more complex scenarios, but there are few standard frameworks that guarantee the safety and stability of such systems. Recently, predictive safety filters (PSF) have emerged as a promising method to ensure constraint satisfaction in learning-based control, bypassing the need for explicit constraint handling in the learning algorithms themselves. The safety filter approach leads to a modular separation of the problem, allowing the use of arbitrary control policies in a task-agnostic way. The filter takes in a potentially unsafe control action from the main controller and solves an optimization problem to compute a minimal perturbation of the proposed action that adheres to both physical and safety constraints. In this work, we combine reinforcement learning (RL) with predictive safety filtering in the context of marine navigation and control. The RL agent is trained on path-following and safety adherence across a wide range of randomly generated environments, while the predictive safety filter continuously monitors the agent’s proposed control actions and modifies them if necessary. The combined PSF/RL scheme is implemented on a simulated model of Cybership II, a miniature replica of a typical supply ship. Safety performance and learning rate are evaluated and compared with those of a standard, non-PSF, RL agent. It is demonstrated that the predictive safety filter is able to keep the vessel safe without hindering the learning rate and performance of the RL agent.
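
The filtering step itself amounts to a small optimization problem: find the admissible action closest to the RL proposal. The sketch below shows that idea with a generic inequality constraint; a real predictive safety filter would instead enforce constraints over a predicted trajectory of the vessel model, and the constraint function and bounds used here are placeholder assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def safety_filter(u_rl, constraint_fn, u_bounds):
    """Sketch of a safety filter: return the action closest to the RL proposal
    that satisfies a safety constraint (illustrative only; not the paper's
    predictive formulation).

    u_rl          : proposed control action from the RL policy
    constraint_fn : callable, must be >= 0 for safe actions
    u_bounds      : list of (low, high) bounds on each control input
    """
    result = minimize(lambda u: np.sum((u - u_rl) ** 2), x0=u_rl,
                      bounds=u_bounds,
                      constraints=[{"type": "ineq", "fun": constraint_fn}])
    return result.x

# Hypothetical usage: project an unsafe two-dimensional command back into a
# unit-norm safe envelope while staying as close as possible to the proposal.
safe_u = safety_filter(np.array([0.9, 0.8]),
                       constraint_fn=lambda u: 1.0 - np.sum(u ** 2),
                       u_bounds=[(-1.0, 1.0), (-1.0, 1.0)])
print(safe_u)
```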

See also: https://doi.org/10.1016/j.artint.2024.104195