Category Archives: Applications Of Reinforcement Learning To Robots

Safety in RL through “predictive safety filters”

Aksel Vaaler, Svein Jostein Husa, Daniel Menges, Thomas Nakken Larsen, Adil Rasheed, Modular control architecture for safe marine navigation: Reinforcement learning with predictive safety filters, Artificial Intelligence, Volume 336, 2024, DOI: 10.1016/j.artint.2024.104201.

Many autonomous systems are safety-critical, making it essential to have a closed-loop control system that satisfies constraints arising from underlying physical limitations and safety aspects in a robust manner. However, this is often challenging to achieve for real-world systems. For example, autonomous ships at sea have nonlinear and uncertain dynamics and are subject to numerous time-varying environmental disturbances such as waves, currents, and wind. There is increasing interest in using machine learning-based approaches to adapt these systems to more complex scenarios, but there are few standard frameworks that guarantee the safety and stability of such systems. Recently, predictive safety filters (PSF) have emerged as a promising method to ensure constraint satisfaction in learning-based control, bypassing the need for explicit constraint handling in the learning algorithms themselves. The safety filter approach leads to a modular separation of the problem, allowing the use of arbitrary control policies in a task-agnostic way. The filter takes in a potentially unsafe control action from the main controller and solves an optimization problem to compute a minimal perturbation of the proposed action that adheres to both physical and safety constraints. In this work, we combine reinforcement learning (RL) with predictive safety filtering in the context of marine navigation and control. The RL agent is trained on path-following and safety adherence across a wide range of randomly generated environments, while the predictive safety filter continuously monitors the agent’s proposed control actions and modifies them if necessary. The combined PSF/RL scheme is implemented on a simulated model of Cybership II, a miniature replica of a typical supply ship. Safety performance and learning rate are evaluated and compared with those of a standard, non-PSF, RL agent. It is demonstrated that the predictive safety filter is able to keep the vessel safe without impeding the learning rate or performance of the RL agent.
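The "minimal perturbation" idea in the abstract can be illustrated with a toy one-step filter. This is only a sketch under assumptions that are not from the paper: a 1-D model x_next = x + dt * u, a position limit x_max, an actuator limit u_max, and a one-step horizon (a real predictive safety filter solves a multi-step optimization over a vessel model). All names and numbers are hypothetical.

```python
def safety_filter(x, u_proposed, x_max=10.0, u_max=2.0, dt=0.5):
    """Return the action closest to u_proposed that keeps the next state safe.

    Assumed toy model (not from the paper): x_next = x + dt * u,
    with constraints |u| <= u_max and x_next <= x_max.
    """
    # Upper bound on u implied by the state constraint x + dt * u <= x_max.
    u_upper = min(u_max, (x_max - x) / dt)
    u_lower = -u_max
    if u_upper < u_lower:
        raise ValueError("no feasible action from this state")
    # In 1-D, minimising (u - u_proposed)^2 over an interval is a clip.
    return max(u_lower, min(u_upper, u_proposed))


# Near the limit, the proposed action is reduced just enough to stay safe.
safe = safety_filter(x=9.5, u_proposed=2.0)
# Far from the limit, the proposed action passes through unchanged.
unchanged = safety_filter(x=0.0, u_proposed=1.5)
```

Because the filter only sees the proposed action and the state, the policy that produced the action can be anything, which is the modular separation the abstract describes.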

See also: https://doi.org/10.1016/j.artint.2024.104195

Using physical models to guide Deep RL in robotics

X. Li, W. Shang and S. Cong, Offline Reinforcement Learning of Robotic Control Using Deep Kinematics and Dynamics, IEEE/ASME Transactions on Mechatronics, vol. 29, no. 4, pp. 2428-2439, Aug. 2024 DOI: 10.1109/TMECH.2023.3336316.

With the rapid development of deep learning, model-free reinforcement learning algorithms have achieved remarkable results in many fields. However, their high sample complexity and the potential for causing damage to environments and robots pose severe challenges for their application in real-world environments. Model-based reinforcement learning algorithms are often used to reduce the sample complexity. One limitation of these algorithms is the inevitable modeling errors. While the black-box model can fit complex state transition models, it ignores the existing knowledge of physics and robotics, especially studies of kinematic and dynamic models of the robotic manipulator. Compared with the black-box model, the physics-inspired deep models do not require specific knowledge of each system to obtain interpretable kinematic and dynamic models. In model-based reinforcement learning, these models can simulate the motion and be combined with classical controllers. This is due to their sharing the same form as traditional models, leading to higher precision tracking results. In this work, we utilize physics-inspired deep models to learn the kinematics and dynamics of a robotic manipulator. We propose a model-based offline reinforcement learning algorithm for controller parameter learning, combined with the traditional computed-torque controller. Experiments on trajectory tracking control of the Baxter manipulator, both in joint and operational space, are conducted in simulation and real environments. Experimental results demonstrate that our algorithm can significantly improve tracking accuracy and exhibits strong generalization and robustness.
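For readers unfamiliar with the computed-torque controller mentioned in the abstract, here is a minimal sketch on a single-joint pendulum. It assumes a known analytic model with hypothetical parameters and hand-picked gains; the paper instead learns the kinematic and dynamic models and the controller parameters, which this sketch does not attempt.

```python
import math

# Hypothetical pendulum parameters: mass, length, viscous friction, gravity.
M_KG, L_M, B, G = 1.0, 0.5, 0.1, 9.81
INERTIA = M_KG * L_M ** 2


def computed_torque(q, qd, q_des, qd_des=0.0, qdd_des=0.0, kp=25.0, kd=10.0):
    """Cancel the modelled dynamics and impose PD error dynamics."""
    # Desired acceleration: feedforward term plus PD feedback on the error.
    v = qdd_des + kd * (qd_des - qd) + kp * (q_des - q)
    # Inverse dynamics: inertia * v plus friction and gravity compensation.
    return INERTIA * v + B * qd + M_KG * G * L_M * math.sin(q)


def simulate(q_des, steps=5000, dt=0.001):
    """Integrate the pendulum under the controller and return the final angle."""
    q, qd = 0.0, 0.0
    for _ in range(steps):
        tau = computed_torque(q, qd, q_des)
        # Forward dynamics of the same pendulum model.
        qdd = (tau - B * qd - M_KG * G * L_M * math.sin(q)) / INERTIA
        qd += qdd * dt
        q += qd * dt
    return q


final_angle = simulate(q_des=1.0)
```

With a perfect model the tracking error follows a linear second-order system set by kp and kd; modelling errors break that cancellation, which is why the accuracy of the learned model matters.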

Improving reward-sparse situations in RL by adding backward learning

X. Qi, D. Chen, Z. Li and X. Tan, Back-Stepping Experience Replay With Application to Model-Free Reinforcement Learning for a Soft Snake Robot, IEEE Robotics and Automation Letters, vol. 9, no. 9, pp. 7517-7524, Sept. 2024 DOI: 10.1109/LRA.2024.3427550.

In this letter, we propose a novel technique, Back-stepping Experience Replay (BER), that is compatible with arbitrary off-policy reinforcement learning (RL) algorithms. BER aims to enhance learning efficiency in systems with approximate reversibility, reducing the need for complex reward shaping. The method constructs reversed trajectories using back-stepping transitions to reach random or fixed targets. Interpretable as a bi-directional approach, BER addresses inaccuracies in back-stepping transitions through a purification of the replay experience during learning. Given the intricate nature of soft robots and their complex interactions with environments, we present an application of BER in a model-free RL approach for the locomotion and navigation of a soft snake robot, which is capable of serpentine motion enabled by anisotropic friction between the body and ground. In addition, a dynamic simulator is developed to assess the effectiveness and efficiency of the BER algorithm, in which the robot demonstrates successful learning (reaching a 100% success rate) and adeptly reaches random targets, achieving an average speed 48% faster than that of the best baseline approach.
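The idea of building reversed trajectories can be sketched as follows. This is an illustration only, not the authors' algorithm: it assumes the system is exactly reversible through a user-supplied `invert_action` function, uses the start of the forward trajectory as the target, assigns a hypothetical sparse reward, and omits the purification step the abstract mentions.

```python
def reverse_trajectory(transitions, invert_action):
    """Turn forward (state, action, next_state) steps into reversed ones.

    Each output tuple is (state, action, next_state, reward, goal), where
    the goal is the first state of the forward trajectory.
    """
    goal = transitions[0][0]
    reversed_transitions = []
    for state, action, next_state in reversed(transitions):
        # Swap the endpoints and apply the assumed inverse action.
        new_state, new_next = next_state, state
        reward = 1.0 if new_next == goal else 0.0  # sparse reward at the goal
        reversed_transitions.append(
            (new_state, invert_action(action), new_next, reward, goal)
        )
    return reversed_transitions


# Toy 1-D chain: moving +1 three times from state 0 to state 3.
forward = [(0, +1, 1), (1, +1, 2), (2, +1, 3)]
backward = reverse_trajectory(forward, invert_action=lambda a: -a)
```

The reversed transitions end in a rewarded state even if the forward rollout never reached any target, which is how such data can help in sparse-reward settings.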

Avoiding the sim-to-real RL transfer problem through learning the parameters of a physical system

Viktor Wiberg, Erik Wallin, Arvid Fälldin, Tobias Semberg, Morgan Rossander, Eddie Wadbro, Martin Servin, Sim-to-real transfer of active suspension control using deep reinforcement learning, Robotics and Autonomous Systems, Volume 179, 2024 DOI: 10.1016/j.robot.2024.104731.

We explore sim-to-real transfer of deep reinforcement learning controllers for a heavy vehicle with active suspensions designed for traversing rough terrain. While related research primarily focuses on lightweight robots with electric motors and fast actuation, this study uses a forestry vehicle with a complex hydraulic driveline and slow actuation. We simulate the vehicle using multibody dynamics and apply system identification to find an appropriate set of simulation parameters. We then train policies in simulation using various techniques to mitigate the sim-to-real gap, including domain randomization, action delays, and a reward penalty to encourage smooth control. In reality, the policies trained with action delays and a penalty for erratic actions perform nearly at the same level as in simulation. In experiments on level ground, the motion trajectories closely overlap when turning to either side, as well as in a route tracking scenario. When faced with a ramp that requires active use of the suspensions, the simulated and real motions are in close alignment. This shows that the actuator model together with system identification yields a sufficiently accurate model of the actuators. We observe that policies trained without the additional action penalty exhibit fast switching or bang–bang control. These present smooth motions and high performance in simulation but transfer poorly to reality. We find that policies make marginal use of the local height map for perception, showing no indications of predictive planning. However, the strong transfer capabilities entail that further development concerning perception and performance can be largely confined to simulation.
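Two of the training techniques named in the abstract, action delays and a penalty on erratic actions, can be sketched in a few lines. The class name, the quadratic penalty form, and the weight are assumptions made for illustration; the abstract does not give the exact formulation used.

```python
from collections import deque


class ActionDelay:
    """Apply each action a fixed number of steps after it was commanded."""

    def __init__(self, delay_steps, initial_action=0.0):
        # Pre-fill the queue so the first outputs are the initial action.
        self.buffer = deque([initial_action] * delay_steps)

    def step(self, action):
        self.buffer.append(action)
        return self.buffer.popleft()


def smoothness_penalty(previous_action, action, weight=0.1):
    """Negative reward term that grows with the change between actions."""
    return -weight * (action - previous_action) ** 2


# A bang-bang command sequence reaches the actuator two steps late.
delay = ActionDelay(delay_steps=2)
commands = [1.0, -1.0, 1.0, -1.0]
applied = [delay.step(a) for a in commands]

# Each full switch between +1 and -1 is penalised.
penalties = [smoothness_penalty(p, a) for p, a in zip(commands, commands[1:])]
```

Training against delayed actuation and penalising switching both push the policy away from the fast bang-bang behaviour that the abstract reports as transferring poorly.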

RL to learn the coordination of different goals in autonomous driving

J. Liu, J. Yin, Z. Jiang, Q. Liang and H. Li, Attention-Based Distributional Reinforcement Learning for Safe and Efficient Autonomous Driving, IEEE Robotics and Automation Letters, vol. 9, no. 9, pp. 7477-7484, Sept. 2024 DOI: 10.1109/LRA.2024.3427551.

Autonomous driving vehicles play a critical role in intelligent transportation systems and have garnered considerable attention. Currently, the popular approach in autonomous driving systems is to design separate optimal objectives for each independent module. Therefore, a major concern arises from the fact that these diverse optimal objectives may have an impact on the final driving policy. However, reinforcement learning provides a promising solution to tackle the challenge through joint training and its exploration ability. This letter aims to develop a safe and efficient reinforcement learning approach with advanced features for autonomous navigation in urban traffic scenarios. Firstly, we develop a novel distributional reinforcement learning method that integrates an implicit distribution model into an actor-critic framework. Subsequently, we introduce a spatial attention module to capture interaction features between the ego vehicle and other traffic vehicles, and design a temporal attention module to extract the long-term sequential feature. Finally, we utilize bird’s-eye-view as a context-aware representation of traffic scenarios, fused by the above spatio-temporal features. To validate our approach, we conduct experiments on the NoCrash and CoRL benchmarks, especially on our closed-loop openDD scenarios. The experimental results demonstrate the impressive performance of our approach in terms of convergence and stability compared to the baselines.

Making RL safer by first learning what is a safe situation

K. Fan, Z. Chen, G. Ferrigno and E. D. Momi, Learn From Safe Experience: Safe Reinforcement Learning for Task Automation of Surgical Robot, IEEE Transactions on Artificial Intelligence, vol. 5, no. 7, pp. 3374-3383, July 2024 DOI: 10.1109/TAI.2024.3351797.

Surgical task automation in robotics can improve the outcomes, reduce quality-of-care variance among surgeons and relieve surgeons’ fatigue. Reinforcement learning (RL) methods have shown considerable performance in robot autonomous control in complex environments. However, the existing RL algorithms for surgical robots do not consider any safety requirements, which is unacceptable in automating surgical tasks. In this work, we propose an approach called safe experience reshaping (SER) that can be integrated into any offline RL algorithm. First, the method identifies and learns the geometry of constraints. Second, a safe experience is obtained by projecting an unsafe action to the tangent space of the learned geometry, which means that the action is in the safe space. Then, the collected safe experiences are used for safe policy training. We designed three tasks that closely resemble real surgical tasks including 2-D cutting tasks and a contact-rich debridement task in 3-D space to evaluate the safe RL framework. We compare our framework to five state-of-the-art (SOTA) RL methods including reward penalty and primal-dual methods. Results show that our framework gets a lower rate of constraint violations and better performance in task success, especially with a higher convergence speed.
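The projection step described in the abstract can be illustrated with a small vector computation. This sketch assumes a single constraint whose normal direction is already known; in the paper the constraint geometry is learned, which is not reproduced here. The function name and the example numbers are hypothetical.

```python
def project_to_tangent(action, normal):
    """Remove the component of `action` along the constraint normal.

    Computes a - (a . n / n . n) * n, so the result lies in the plane
    tangent to the constraint surface at the current point.
    """
    dot_an = sum(a * n for a, n in zip(action, normal))
    dot_nn = sum(n * n for n in normal)
    if dot_nn == 0.0:
        raise ValueError("normal must be non-zero")
    scale = dot_an / dot_nn
    return [a - scale * n for a, n in zip(action, normal)]


# Hypothetical example: the constraint surface is the plane z = 0,
# and the proposed action would push the tool through it.
unsafe_action = [1.0, 2.0, -3.0]
surface_normal = [0.0, 0.0, 1.0]
safe_action = project_to_tangent(unsafe_action, surface_normal)
```

The projected action keeps the motion along the surface and drops only the part that would violate the constraint.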

Learning how to reset the episode in RL

S. -H. Lee and S. -W. Seo, Self-Supervised Curriculum Generation for Autonomous Reinforcement Learning Without Task-Specific Knowledge, IEEE Robotics and Automation Letters, vol. 9, no. 5, pp. 4043-4050, May 2024 DOI: 10.1109/LRA.2024.3375714.

A significant bottleneck in applying current reinforcement learning algorithms to real-world scenarios is the need to reset the environment between every episode. This reset process demands substantial human intervention, making it difficult for the agent to learn continuously and autonomously. Several recent works have introduced autonomous reinforcement learning (ARL) algorithms that generate curricula for jointly training reset and forward policies. While their curricula can reduce the number of required manual resets by taking into account the agent’s learning progress, they rely on task-specific knowledge, such as predefined initial states or reset reward functions. In this paper, we propose a novel ARL algorithm that can generate a curriculum adaptive to the agent’s learning progress without task-specific knowledge. Our curriculum empowers the agent to autonomously reset to diverse and informative initial states. To achieve this, we introduce a success discriminator that estimates the success probability from each initial state when the agent follows the forward policy. The success discriminator is trained with relabeled transitions in a self-supervised manner. Our experimental results demonstrate that our ARL algorithm can generate an adaptive curriculum and enable the agent to efficiently bootstrap to solve sparse-reward maze navigation and manipulation tasks, outperforming baselines with significantly fewer manual resets.

Graph NNs in RL for improving sample efficiency

Feng Zhang, Chengbin Xuan, Hak-Keung Lam, An obstacle avoidance-specific reinforcement learning method based on fuzzy attention mechanism and heterogeneous graph neural networks, Engineering Applications of Artificial Intelligence, Volume 130, 2024 DOI: 10.1016/j.engappai.2023.107764.

Deep reinforcement learning (RL) is an advancing learning tool to handle robotics control problems. However, it typically suffers from poor sample efficiency and effectiveness. The emergence of Graph Neural Networks (GNNs) enables the integration of the RL and graph representation learning techniques. It realises outstanding training performance and transfer capability by forming controlling scenarios into the corresponding graph domain. Nevertheless, the existing approaches strongly depend on the artificial graph formation processes with intensive bias and cannot propagate messages discriminatively on explicit physical dependence, which leads to restricted flexibility, size transfer capability and suboptimal performance. This paper proposes a fuzzy attention mechanism-based heterogeneous graph neural network (FAM-HGNN) framework for resolving the control problem under the RL context. FAM emphasises the significant connections and weakens the trivial connections in a fully connected graph, which mitigates the potential negative influence caused by the artificial graph formation process. HGNN obtains a higher level of relational inductive bias by conducting graph propagations on a masked graph. Experimental results show that our FAM-HGNN outperforms the multi-layer perceptron-based and the existing GNN-based RL approaches regarding training performance and size transfer capability. We also conducted an ablation study and sensitivity analysis to validate the efficacy of the proposed method further.

Offline RL in robotics

L. Yao, B. Zhao, X. Xu, Z. Wang, P. K. Wong and Y. Hu, Efficient Incremental Offline Reinforcement Learning With Sparse Broad Critic Approximation, IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 54, no. 1, pp. 156-169, Jan. 2024 DOI: 10.1109/TSMC.2023.3305498.

Offline reinforcement learning (ORL) has been getting increasing attention in robot learning, benefiting from its ability to avoid hazardous exploration and learn policies directly from precollected samples. Approximate policy iteration (API) is one of the most commonly investigated ORL approaches in robotics, due to its linear representation of policies, which makes it fairly transparent in both theoretical and engineering analysis. One open problem of API is how to design efficient and effective basis functions. The broad learning system (BLS) has been extensively studied in supervised and unsupervised learning in various applications. However, few investigations have been conducted on ORL. In this article, a novel incremental ORL approach with sparse broad critic approximation (BORL) is proposed with the advantages of BLS, which approximates the critic function in a linear manner with randomly projected sparse and compact features and dynamically expands its broad structure. The BORL is the first extension of API with BLS in the field of robotics and ORL. The approximation ability and convergence performance of BORL are also analyzed. Comprehensive simulation studies are then conducted on two benchmarks, and the results demonstrate that the proposed BORL can obtain comparable or better performance than conventional API methods without laborious hyperparameter fine-tuning work. To further demonstrate the effectiveness of BORL in practical robotic applications, a variable force tracking problem in robotic ultrasound scanning (RUSS) is investigated, and a learning-based adaptive impedance control (LAIC) algorithm is proposed based on BORL. The experimental results demonstrate the advantages of LAIC compared with conventional force tracking methods.

See also: X. Wang, D. Hou, L. Huang and Y. Cheng, “Offline–Online Actor–Critic,” in IEEE Transactions on Artificial Intelligence, vol. 5, no. 1, pp. 61-69, Jan. 2024, doi: 10.1109/TAI.2022.3225251

Hierarchical Deep-RL for continuous and large state spaces

A. P. Pope et al., Hierarchical Reinforcement Learning for Air Combat at DARPA’s AlphaDogfight Trials, IEEE Transactions on Artificial Intelligence, vol. 4, no. 6, pp. 1371-1385, Dec. 2023 DOI: 10.1109/TAI.2022.3222143.

Autonomous control in high-dimensional, continuous state spaces is a persistent and important challenge in the fields of robotics and artificial intelligence. Because of high risk and complexity, the adoption of AI for autonomous combat systems has been a long-standing difficulty. In order to address these issues, DARPA’s AlphaDogfight Trials (ADT) program sought to vet the feasibility of and increase trust in AI for autonomously piloting an F-16 in simulated air-to-air combat. Our submission to ADT solves the high-dimensional, continuous control problem using a novel hierarchical deep reinforcement learning approach consisting of a high-level policy selector and a set of separately trained low-level policies specialized for excelling in specific regions of the state space. Both levels of the hierarchy are trained using off-policy, maximum entropy methods with expert knowledge integrated through reward shaping. Our approach outperformed human expert pilots and achieved a second-place rank in the ADT championship event.
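The two-level structure described in the abstract, a high-level selector choosing among specialised low-level policies, can be sketched as below. Everything here is a hypothetical stand-in: the policy names, the `advantage` state field, and the hand-written selection rule replace what the paper trains with off-policy, maximum entropy methods.

```python
def pursue(state):
    """Hypothetical low-level policy for states where the agent is ahead."""
    return +1.0


def evade(state):
    """Hypothetical low-level policy for states where the agent is behind."""
    return -1.0


LOW_LEVEL_POLICIES = [pursue, evade]


def select_policy(state):
    """Hand-written stand-in for the learned high-level policy selector."""
    return 0 if state["advantage"] >= 0 else 1


def hierarchical_act(state):
    """Pick a low-level policy, then let it produce the control action."""
    index = select_policy(state)
    return index, LOW_LEVEL_POLICIES[index](state)


chosen_ahead = hierarchical_act({"advantage": 0.3})
chosen_behind = hierarchical_act({"advantage": -0.7})
```

Splitting the problem this way lets each low-level policy be trained separately on its own region of the state space, with the selector only deciding which one is in control.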