{"id":1777,"date":"2024-09-06T08:02:28","date_gmt":"2024-09-06T07:02:28","guid":{"rendered":"https:\/\/babel.isa.uma.es\/kipr\/?p=1777"},"modified":"2024-09-06T08:02:28","modified_gmt":"2024-09-06T07:02:28","slug":"improving-reward-sparse-situations-in-rl-by-adding-backward-learning","status":"publish","type":"post","link":"https:\/\/babel.isa.uma.es\/kipr\/?p=1777","title":{"rendered":"Improving reward-sparse situations in RL by adding backward learning"},"content":{"rendered":"<h4>X. Qi, D. Chen, Z. Li and X. Tan, <strong>Back-Stepping Experience Replay With Application to Model-Free Reinforcement Learning for a Soft Snake Robot,<\/strong> IEEE Robotics and Automation Letters, vol. 9, no. 9, pp. 7517-7524, Sept. 2024 <a href=\"https:\/\/doi.org\/10.1109\/LRA.2024.3427550\" target=\"_blank\">DOI: 10.1109\/LRA.2024.3427550<\/a>.<\/h4>\n<blockquote><p>In this letter, we propose a novel technique, Back-stepping Experience Replay (BER), that is compatible with arbitrary off-policy reinforcement learning (RL) algorithms. BER aims to enhance learning efficiency in systems with approximate reversibility, reducing the need for complex reward shaping. The method constructs reversed trajectories using back-stepping transitions to reach random or fixed targets. Interpretable as a bi-directional approach, BER addresses inaccuracies in back-stepping transitions through a purification of the replay experience during learning. Given the intricate nature of soft robots and their complex interactions with environments, we present an application of BER in a model-free RL approach for the locomotion and navigation of a soft snake robot, which is capable of serpentine motion enabled by anisotropic friction between the body and ground. In addition, a dynamic simulator is developed to assess the effectiveness and efficiency of the BER algorithm, in which the robot demonstrates successful learning (reaching a 100% success rate) and adeptly reaches random targets, achieving an average speed 48% faster than that of the best baseline approach.<\/p><\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>X. Qi, D. Chen, Z. Li and X. 
<p>Category: Applications of reinforcement learning to robots. Tags: deep reinforcement learning, sparse rewards.</p>