Tag Archives: Sparse Rewards

Improving reward shaping in Deep RL for avoiding user’s biases and boosting learning efficiency

Jiawei Lin, Xuekai Wei, Weizhi Xian, Jielu Yan, Leong Hou U, Yong Feng, Zhaowei Shang, Mingliang Zhou, Continuous reinforcement learning via advantage value difference reward shaping: A proximal policy optimization perspective, Engineering Applications of Artificial Intelligence, Volume 151, 2025 10.1016/j.engappai.2025.110676.

Deep reinforcement learning has shown great promise in industrial applications. However, these algorithms suffer from low learning efficiency because of sparse reward signals in continuous control tasks. Reward shaping addresses this issue by transforming sparse rewards into more informative signals, but some designs that rely on domain experts or heuristic rules can introduce cognitive biases, leading to suboptimal solutions. To overcome this challenge, this paper proposes the advantage value difference (AVD), a generalized potential-based end-to-end exploration reward function. The main contribution of this paper is to improve the agent’s exploration efficiency, accelerate the learning process, and prevent premature convergence to local optima. The method leverages the temporal difference error to estimate the potential of states and uses the advantage function to guide the learning process toward more effective strategies. In the context of engineering applications, this paper proves the superiority of AVD in continuous control tasks within the multi-joint dynamics with contact (MuJoCo) environment. Specifically, the proposed method achieves an average increase of 23.5% in episode rewards for the Hopper, Swimmer, and Humanoid tasks compared with the state-of-the-art approaches. The results demonstrate the significant improvement in learning efficiency achieved by AVD for industrial robotic systems.

Generating intrinsic rewards to address the sparse reward problem of RL

Z. Gao et al., Self-Supervised Exploration via Temporal Inconsistency in Reinforcement Learning, IEEE Transactions on Artificial Intelligence, vol. 5, no. 11, pp. 5530-5539, Nov. 2024, DOI: 10.1109/TAI.2024.3413692.

In sparse extrinsic reward settings, reinforcement learning remains a challenge despite increasing interest in this field. Existing approaches suggest that intrinsic rewards can alleviate issues caused by reward sparsity. However, many studies overlook the critical role of temporal information, essential for human curiosity. This article introduces a novel intrinsic reward mechanism inspired by human learning processes, where curiosity is evaluated by comparing current observations with historical knowledge. Our method involves training a self-supervised prediction model, periodically saving snapshots of the model parameters, and employing the nuclear norm to assess the temporal inconsistency between predictions from different snapshots as intrinsic rewards. Additionally, we propose a variational weighting mechanism to adaptively assign weights to the snapshots, enhancing the model’s robustness and performance. Experimental results across various benchmark environments demonstrate the efficacy of our approach, which outperforms other state-of-the-art methods without incurring additional training costs and exhibits higher noise tolerance. Our findings indicate that leveraging temporal information in intrinsic rewards can significantly improve exploration performance, motivating future research to develop more robust and accurate reward systems for reinforcement learning.

Improving sample efficiency under sparse rewards and large continuous action spaces through predictive control in RL

Antonyshyn, L., Givigi, S., Deep Model-Based Reinforcement Learning for Predictive Control of Robotic Systems with Dense and Sparse Rewards, J Intell Robot Syst 110, 100 (2024) DOI: 10.1007/s10846-024-02118-y.

Sparse rewards and sample efficiency are open areas of research in the field of reinforcement learning. These problems are especially important when considering applications of reinforcement learning to robotics and other cyber-physical systems. This is so because in these domains many tasks are goal-based and naturally expressed with binary successes and failures, action spaces are large and continuous, and real interactions with the environment are limited. In this work, we propose Deep Value-and-Predictive-Model Control (DVPMC), a model-based predictive reinforcement learning algorithm for continuous control that uses system identification, value function approximation and sampling-based optimization to select actions. The algorithm is evaluated on a dense reward and a sparse reward task. We show that it can match the performance of a predictive control approach to the dense reward problem, and outperforms model-free and model-based learning algorithms on the sparse reward task on the metrics of sample efficiency and performance. We verify the performance of an agent trained in simulation using DVPMC on a real robot playing the reach-avoid game. Video of the experiment can be found here: https://youtu.be/0Q274kcfn4c.

Improving reward-sparse situations in RL by adding backward learning

X. Qi, D. Chen, Z. Li and X. Tan, Back-Stepping Experience Replay With Application to Model-Free Reinforcement Learning for a Soft Snake Robot, IEEE Robotics and Automation Letters, vol. 9, no. 9, pp. 7517-7524, Sept. 2024 DOI: 10.1109/LRA.2024.3427550.

In this letter, we propose a novel technique, Back-stepping Experience Replay (BER), that is compatible with arbitrary off-policy reinforcement learning (RL) algorithms. BER aims to enhance learning efficiency in systems with approximate reversibility, reducing the need for complex reward shaping. The method constructs reversed trajectories using back-stepping transitions to reach random or fixed targets. Interpretable as a bi-directional approach, BER addresses inaccuracies in back-stepping transitions through a purification of the replay experience during learning. Given the intricate nature of soft robots and their complex interactions with environments, we present an application of BER in a model-free RL approach for the locomotion and navigation of a soft snake robot, which is capable of serpentine motion enabled by anisotropic friction between the body and ground. In addition, a dynamic simulator is developed to assess the effectiveness and efficiency of the BER algorithm, in which the robot demonstrates successful learning (reaching a 100% success rate) and adeptly reaches random targets, achieving an average speed 48% faster than that of the best baseline approach.

Using “empowerment” to better select actions in RL when there are only sparse rewards

Dai, S., Xu, W., Hofmann, A. et al. An empowerment-based solution to robotic manipulation tasks with sparse rewards, Auton Robot 47, 617\u2013633 (2023) DOI: 10.1007/s10514-023-10087-8.

In order to provide adaptive and user-friendly solutions to robotic manipulation, it is important that the agent can learn to accomplish tasks even if they are only provided with very sparse instruction signals. To address the issues reinforcement learning algorithms face when task rewards are sparse, this paper proposes an intrinsic motivation approach that can be easily integrated into any standard reinforcement learning algorithm and can allow robotic manipulators to learn useful manipulation skills with only sparse extrinsic rewards. Through integrating and balancing empowerment and curiosity, this approach shows superior performance compared to other state-of-the-art intrinsic exploration approaches during extensive empirical testing. When combined with other strategies for tackling the exploration challenge, e.g. curriculum learning, our approach is able to further improve the exploration efficiency and task success rate. Qualitative analysis also shows that when combined with diversity-driven intrinsic motivations, this approach can help manipulators learn a set of diverse skills which could potentially be applied to other more complicated manipulation tasks and accelerate their learning process.