Category Archives: Applications Of Reinforcement Learning To Robots

Making RL safer by first learning what is a safe situation

K. Fan, Z. Chen, G. Ferrigno and E. D. Momi, Learn From Safe Experience: Safe Reinforcement Learning for Task Automation of Surgical Robot, IEEE Transactions on Artificial Intelligence, vol. 5, no. 7, pp. 3374-3383, July 2024 DOI: 10.1109/TAI.2024.3351797.

Surgical task automation in robotics can improve the outcomes, reduce quality-of-care variance among surgeons and relieve surgeons’ fatigue. Reinforcement learning (RL) methods have shown considerable performance in robot autonomous control in complex environments. However, the existing RL algorithms for surgical robots do not consider any safety requirements, which is unacceptable in automating surgical tasks. In this work, we propose an approach called safe experience reshaping (SER) that can be integrated into any offline RL algorithm. First, the method identifies and learns the geometry of constraints. Second, a safe experience is obtained by projecting an unsafe action to the tangent space of the learned geometry, which means that the action is in the safe space. Then, the collected safe experiences are used for safe policy training. We designed three tasks that closely resemble real surgical tasks including 2-D cutting tasks and a contact-rich debridement task in 3-D space to evaluate the safe RL framework. We compare our framework to five state-of-the-art (SOTA) RL methods including reward penalty and primal-dual methods. Results show that our framework gets a lower rate of constraint violations and better performance in task success, especially with a higher convergence speed.
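The projection step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a single constraint whose learned geometry is summarized locally by a normal vector; how the paper actually represents the learned geometry is not stated here. The function name `project_to_tangent` and the example vectors are hypothetical.

```python
from typing import List, Sequence


def project_to_tangent(action: Sequence[float], normal: Sequence[float]) -> List[float]:
    """Remove the component of `action` that points along `normal`.

    Assumption: `normal` is the local normal of a learned constraint
    surface (hypothetical here), so the result lies in its tangent space.
    """
    norm_sq = sum(n * n for n in normal)
    if norm_sq == 0.0:
        # No constraint direction available: leave the action unchanged.
        return list(action)
    scale = sum(a * n for a, n in zip(action, normal)) / norm_sq
    return [a - scale * n for a, n in zip(action, normal)]


# Illustrative values: the action pushes partly along the constraint normal.
unsafe_action = [1.0, 0.5]
constraint_normal = [0.0, 2.0]
safe_action = project_to_tangent(unsafe_action, constraint_normal)
```

In this sketch the projected action keeps only the motion tangent to the constraint, which is the kind of reshaped sample that would then be stored as a safe experience.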

Learning how to reset the episode in RL

S. -H. Lee and S. -W. Seo, Self-Supervised Curriculum Generation for Autonomous Reinforcement Learning Without Task-Specific Knowledge, IEEE Robotics and Automation Letters, vol. 9, no. 5, pp. 4043-4050, May 2024 DOI: 10.1109/LRA.2024.3375714.

A significant bottleneck in applying current reinforcement learning algorithms to real-world scenarios is the need to reset the environment between every episode. This reset process demands substantial human intervention, making it difficult for the agent to learn continuously and autonomously. Several recent works have introduced autonomous reinforcement learning (ARL) algorithms that generate curricula for jointly training reset and forward policies. While their curricula can reduce the number of required manual resets by taking into account the agent’s learning progress, they rely on task-specific knowledge, such as predefined initial states or reset reward functions. In this paper, we propose a novel ARL algorithm that can generate a curriculum adaptive to the agent’s learning progress without task-specific knowledge. Our curriculum empowers the agent to autonomously reset to diverse and informative initial states. To achieve this, we introduce a success discriminator that estimates the success probability from each initial state when the agent follows the forward policy. The success discriminator is trained with relabeled transitions in a self-supervised manner. Our experimental results demonstrate that our ARL algorithm can generate an adaptive curriculum and enable the agent to efficiently bootstrap to solve sparse-reward maze navigation and manipulation tasks, outperforming baselines with significantly fewer manual resets.

Graph NNs in RL for improving sample efficiency

Feng Zhang, Chengbin Xuan, Hak-Keung Lam, An obstacle avoidance-specific reinforcement learning method based on fuzzy attention mechanism and heterogeneous graph neural networks, Engineering Applications of Artificial Intelligence, Volume 130, 2024 DOI: 10.1016/j.engappai.2023.107764.

Deep reinforcement learning (RL) is an advancing learning tool to handle robotics control problems. However, it typically suffers from sample efficiency and effectiveness. The emergence of Graph Neural Networks (GNNs) enables the integration of the RL and graph representation learning techniques. It realises outstanding training performance and transfer capability by forming controlling scenarios into the corresponding graph domain. Nevertheless, the existing approaches strongly depend on the artificial graph formation processes with intensive bias and cannot propagate messages discriminatively on explicit physical dependence, which leads to restricted flexibility, size transfer capability and suboptimal performance. This paper proposes a fuzzy attention mechanism-based heterogeneous graph neural network (FAM-HGNN) framework for resolving the control problem under the RL context. FAM emphasises the significant connections and weakening of the trivial connections in a fully connected graph, which mitigates the potential negative influence caused by the artificial graph formation process. HGNN obtains a higher level of relational inductive bias by conducting graph propagations on a masked graph. Experimental results show that our FAM-HGNN outperforms the multi-layer perceptron-based and the existing GNN-based RL approaches regarding training performance and size transfer capability. We also conducted an ablation study and sensitivity analysis to validate the efficacy of the proposed method further.

Offline RL in robotics

L. Yao, B. Zhao, X. Xu, Z. Wang, P. K. Wong and Y. Hu, Efficient Incremental Offline Reinforcement Learning With Sparse Broad Critic Approximation, IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 54, no. 1, pp. 156-169, Jan. 2024 DOI: 10.1109/TSMC.2023.3305498.

Offline reinforcement learning (ORL) has been getting increasing attention in robot learning, benefiting from its ability to avoid hazardous exploration and learn policies directly from precollected samples. Approximate policy iteration (API) is one of the most commonly investigated ORL approaches in robotics, due to its linear representation of policies, which makes it fairly transparent in both theoretical and engineering analysis. One open problem of API is how to design efficient and effective basis functions. The broad learning system (BLS) has been extensively studied in supervised and unsupervised learning in various applications. However, few investigations have been conducted on ORL. In this article, a novel incremental ORL approach with sparse broad critic approximation (BORL) is proposed with the advantages of BLS, which approximates the critic function in a linear manner with randomly projected sparse and compact features and dynamically expands its broad structure. The BORL is the first extension of API with BLS in the field of robotics and ORL. The approximation ability and convergence performance of BORL are also analyzed. Comprehensive simulation studies are then conducted on two benchmarks, and the results demonstrate that the proposed BORL can obtain comparable or better performance than conventional API methods without laborious hyperparameter fine-tuning work. To further demonstrate the effectiveness of BORL in practical robotic applications, a variable force tracking problem in robotic ultrasound scanning (RUSS) is investigated, and a learning-based adaptive impedance control (LAIC) algorithm is proposed based on BORL. The experimental results demonstrate the advantages of LAIC compared with conventional force tracking methods.

See also: X. Wang, D. Hou, L. Huang and Y. Cheng, “Offline–Online Actor–Critic,” in IEEE Transactions on Artificial Intelligence, vol. 5, no. 1, pp. 61-69, Jan. 2024, doi: 10.1109/TAI.2022.3225251

Hierarchical Deep-RL for continuous and large state spaces

A. P. Pope et al., Hierarchical Reinforcement Learning for Air Combat at DARPA’s AlphaDogfight Trials, IEEE Transactions on Artificial Intelligence, vol. 4, no. 6, pp. 1371-1385, Dec. 2023 DOI: 10.1109/TAI.2022.3222143.

Autonomous control in high-dimensional, continuous state spaces is a persistent and important challenge in the fields of robotics and artificial intelligence. Because of high risk and complexity, the adoption of AI for autonomous combat systems has been a long-standing difficulty. In order to address these issues, DARPA’s AlphaDogfight Trials (ADT) program sought to vet the feasibility of and increase trust in AI for autonomously piloting an F-16 in simulated air-to-air combat. Our submission to ADT solves the high-dimensional, continuous control problem using a novel hierarchical deep reinforcement learning approach consisting of a high-level policy selector and a set of separately trained low-level policies specialized for excelling in specific regions of the state space. Both levels of the hierarchy are trained using off-policy, maximum entropy methods with expert knowledge integrated through reward shaping. Our approach outperformed human expert pilots and achieved a second-place rank in the ADT championship event.

Hierarchical RL with continuous options

Zhigang Huang, Quan Liu, Fei Zhu, Hierarchical reinforcement learning with adaptive scheduling for robot control, Engineering Applications of Artificial Intelligence, Volume 126, Part D, 2023 DOI: 10.1016/j.engappai.2023.107130.

Conventional hierarchical reinforcement learning (HRL) relies on discrete options to represent explicitly distinguishable knowledge, which may lead to severe performance bottlenecks. It is possible to represent richer knowledge through continuous options, but reliable scheduling methods are lacking. To design an available scheduling method for continuous options, in this paper, the hierarchical reinforcement learning with adaptive scheduling (HAS) algorithm is proposed. Its low-level controller learns diverse options, while the high-level controller schedules options to learn solutions. It achieves an adaptive balance between exploration and exploitation during the frequent scheduling of continuous options, maximizing the representation potential of continuous options. It builds on multi-step static scheduling and makes switching decisions according to the relative advantages of the previous and the estimated continuous options, enabling the agent to focus on different behaviors at different phases of the task. The expected t-step distance is applied to demonstrate the superiority of adaptive scheduling in terms of exploration. Furthermore, an interruption incentive based on annealing is proposed to alleviate excessive exploration during the early training phase, accelerating the convergence rate. Finally, we apply HAS to robot control with sparse rewards in continuous spaces, and develop a comprehensive experimental analysis scheme. The experimental results not only demonstrate the high performance and robustness of HAS, but also provide evidence that the adaptive scheduling method has a positive effect both on the representation and option policies.

RL to learn not only manipulator skills but also safety skills

A. C. Ak, E. E. Aksoy and S. Sariel, Learning Failure Prevention Skills for Safe Robot Manipulation, IEEE Robotics and Automation Letters, vol. 8, no. 12, pp. 7994-8001, Dec. 2023 DOI: 10.1109/LRA.2023.3324587.

Robots are more capable of achieving manipulation tasks for everyday activities than before. However, the safety of manipulation skills that robots employ is still an open problem. Considering all possible failures during skill learning increases the complexity of the process and restrains learning an optimal policy. Nonetheless, safety-focused modularity in the acquisition of skills has not been adequately addressed in previous works. For that purpose, we reformulate skills as base and failure prevention skills, where base skills aim at completing tasks and failure prevention skills aim at reducing the risk of failures to occur. Then, we propose a modular and hierarchical method for safe robot manipulation by augmenting base skills by learning failure prevention skills with reinforcement learning and forming a skill library to address different safety risks. Furthermore, a skill selection policy that considers estimated risks is used for the robot to select the best control policy for safe manipulation. Our experiments show that the proposed method achieves the given goal while ensuring safety by preventing failures. We also show that with the proposed method, skill learning is feasible and our safe manipulation tools can be transferred to the real environment.
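The idea of choosing between a base skill and failure prevention skills according to estimated risk can be sketched as follows. This is a simplified threshold rule standing in for the skill selection policy, not the paper's method; the skill names, the risk values, and the `risk_threshold` parameter are all hypothetical.

```python
from typing import Dict


def select_skill(estimated_risks: Dict[str, float], risk_threshold: float = 0.5) -> str:
    """Return the failure prevention skill with the highest estimated risk,
    or the base skill when no risk exceeds the threshold.

    Assumption: `estimated_risks` maps each failure prevention skill name to
    an estimated risk in [0, 1]; "base" denotes the task-completing skill.
    """
    if not estimated_risks:
        return "base"
    skill, risk = max(estimated_risks.items(), key=lambda item: item[1])
    return skill if risk > risk_threshold else "base"


# Illustrative risk estimates for two hypothetical failure prevention skills.
risks = {"prevent_spill": 0.8, "prevent_collision": 0.3}
chosen = select_skill(risks)
```

Keeping the prevention skills separate from the base skill is what makes the library modular: a new safety risk adds an entry to the library without retraining the base skill.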

Dealing with affordances in robotics through RL

X. Yang, Z. Ji, J. Wu and Y. -K. Lai, Recent Advances of Deep Robotic Affordance Learning: A Reinforcement Learning Perspective, IEEE Transactions on Cognitive and Developmental Systems, vol. 15, no. 3, pp. 1139-1149, Sept. 2023 DOI: 10.1109/TCDS.2023.3277288.

As a popular concept proposed in the field of psychology, affordance has been regarded as one of the important abilities that enable humans to understand and interact with the environment. Briefly, it captures the possibilities and effects of the actions of an agent applied to a specific object or, more generally, a part of the environment. This article provides a short review of the recent developments of deep robotic affordance learning (DRAL), which aims to develop data-driven methods that use the concept of affordance to aid in robotic tasks. We first classify these papers from a reinforcement learning (RL) perspective and draw connections between RL and affordances. The technical details of each category are discussed and their limitations are identified. We further summarize them and identify future challenges from the aspects of observations, actions, affordance representation, data-collection, and real-world deployment. A final remark is given at the end to propose a promising future direction of the RL-based affordance definition to include the predictions of arbitrary action consequences.

Using “empowerment” to better select actions in RL when there are only sparse rewards

Dai, S., Xu, W., Hofmann, A. et al. An empowerment-based solution to robotic manipulation tasks with sparse rewards, Auton Robot 47, 617–633 (2023) DOI: 10.1007/s10514-023-10087-8.

In order to provide adaptive and user-friendly solutions to robotic manipulation, it is important that the agent can learn to accomplish tasks even if they are only provided with very sparse instruction signals. To address the issues reinforcement learning algorithms face when task rewards are sparse, this paper proposes an intrinsic motivation approach that can be easily integrated into any standard reinforcement learning algorithm and can allow robotic manipulators to learn useful manipulation skills with only sparse extrinsic rewards. Through integrating and balancing empowerment and curiosity, this approach shows superior performance compared to other state-of-the-art intrinsic exploration approaches during extensive empirical testing. When combined with other strategies for tackling the exploration challenge, e.g. curriculum learning, our approach is able to further improve the exploration efficiency and task success rate. Qualitative analysis also shows that when combined with diversity-driven intrinsic motivations, this approach can help manipulators learn a set of diverse skills which could potentially be applied to other more complicated manipulation tasks and accelerate their learning process.
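One simple way to picture how intrinsic signals can be added to a sparse extrinsic reward is a weighted sum. The sketch below assumes that form; the abstract does not say how empowerment and curiosity are actually balanced, and the function name, the weights `w_emp` and `w_cur`, and the example values are hypothetical.

```python
def shaped_reward(extrinsic: float, empowerment: float, curiosity: float,
                  w_emp: float = 0.5, w_cur: float = 0.5) -> float:
    """Combine a sparse extrinsic reward with two intrinsic bonuses.

    Assumption: a fixed weighted sum; the weights are illustrative only.
    """
    return extrinsic + w_emp * empowerment + w_cur * curiosity


# With no extrinsic reward, the agent is still rewarded for exploring.
reward = shaped_reward(extrinsic=0.0, empowerment=0.25, curiosity=0.75)
```

Because the combined value is an ordinary scalar reward, it can be passed to a standard RL algorithm without changing the algorithm itself.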

A survey of guided RL for improving its application on robotics

J. Eßer, N. Bach, C. Jestel, O. Urbann and S. Kerner, Guided Reinforcement Learning: A Review and Evaluation for Efficient and Effective Real-World Robotics [Survey], IEEE Robotics & Automation Magazine, vol. 30, no. 2, pp. 67-85, June 2023 DOI: 10.1109/MRA.2022.3207664.

Recent successes aside, reinforcement learning (RL) still faces significant challenges in its application to the real-world robotics domain. Guiding the learning process with additional knowledge offers a potential solution, thus leveraging the strengths of data- and knowledge-driven approaches. However, this field of research encompasses several disciplines and hence would benefit from a structured overview.

In this article, we propose a concept of guided RL that provides a systematic approach toward accelerating the training process and improving performance for real-world robotics settings. We introduce a taxonomy that structures guided RL approaches and shows how different sources of knowledge can be integrated into the learning pipeline in a practical way. Based on this, we describe available approaches in this field and quantitatively evaluate their specific impact in terms of efficiency, effectiveness, and sim-to-real transfer within the robotics domain.