{"id":1664,"date":"2024-01-25T13:10:52","date_gmt":"2024-01-25T12:10:52","guid":{"rendered":"https:\/\/babel.isa.uma.es\/kipr\/?p=1664"},"modified":"2024-01-25T13:10:52","modified_gmt":"2024-01-25T12:10:52","slug":"hierarchical-deep-rl-for-continuous-and-large-state-spaces","status":"publish","type":"post","link":"https:\/\/babel.isa.uma.es\/kipr\/?p=1664","title":{"rendered":"Hierarchical Deep-RL for continuous and large state spaces"},"content":{"rendered":"<h4>A. P. Pope et al. <strong>Hierarchical Reinforcement Learning for Air Combat at DARPA&#8217;s AlphaDogfight Trials, <\/strong> EEE Transactions on Artificial Intelligence, vol. 4, no. 6, pp. 1371-1385, Dec. 2023 <a href=\"https:\/\/doi.org\/10.1109\/TAI.2022.3222143\" target=\"_blank\">DOI: 10.1109\/TAI.2022.3222143<\/a>.<\/h4>\n<blockquote><p>Autonomous control in high-dimensional, continuous state spaces is a persistent and important challenge in the fields of robotics and artificial intelligence. Because of high risk and complexity, the adoption of AI for autonomous combat systems has been a long-standing difficulty. In order to address these issues, DARPA&#8217;s AlphaDogfight Trials (ADT) program sought to vet the feasibility of and increase trust in AI for autonomously piloting an F-16 in simulated air-to-air combat. Our submission to ADT solves the high-dimensional, continuous control problem using a novel hierarchical deep reinforcement learning approach consisting of a high-level policy selector and a set of separately trained low-level policies specialized for excelling in specific regions of the state space. Both levels of the hierarchy are trained using off-policy, maximum entropy methods with expert knowledge integrated through reward shaping. Our approach outperformed human expert pilots and achieved a second-place rank in the ADT championship event.<\/p><\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>A. P. Pope et al. 
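<p>The abstract only sketches the agent's structure: a high-level policy selector arbitrates among separately trained low-level specialist policies, with both levels trained by off-policy, maximum-entropy RL (soft actor-critic is the best-known method of that family) and expert knowledge injected through reward shaping. The Python snippet below is a minimal, assumption-laden illustration of that selector/specialist split, not the authors' code: the class names, the fixed linear policies, the dummy observation, and the softmax-based selection rule are all invented here for clarity, whereas the paper learns both levels with deep networks.</p>

<pre><code>
import numpy as np


class LowLevelPolicy:
    """Stand-in for one separately trained specialist policy.

    In the paper each specialist is a neural network trained to excel in a
    particular region of the state space; here it is just a fixed linear map
    from the observation to a bounded continuous control command.
    """

    def __init__(self, weights):
        self.weights = weights  # shape: (obs_dim, act_dim)

    def act(self, obs):
        return np.tanh(obs @ self.weights)  # bounded continuous action


class PolicySelector:
    """Stand-in for the high-level policy that picks which specialist acts.

    The paper trains this selector with RL as well; this sketch scores each
    specialist with a fixed linear head and samples from a softmax, loosely
    mimicking a soft (maximum-entropy-style) choice among specialists.
    """

    def __init__(self, score_weights, temperature=1.0, rng=None):
        self.score_weights = score_weights  # shape: (obs_dim, n_policies)
        self.temperature = temperature
        self.rng = rng or np.random.default_rng()

    def select(self, obs):
        scores = (obs @ self.score_weights) / self.temperature
        probs = np.exp(scores - scores.max())   # numerically stable softmax
        probs /= probs.sum()
        return int(self.rng.choice(len(probs), p=probs))


class HierarchicalAgent:
    """Glue layer: the selector picks a specialist, which emits the action."""

    def __init__(self, selector, specialists):
        self.selector = selector
        self.specialists = specialists

    def act(self, obs):
        k = self.selector.select(obs)
        return self.specialists[k].act(obs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    obs_dim, act_dim, n_policies = 8, 2, 3     # toy sizes, not from the paper
    specialists = [LowLevelPolicy(rng.normal(size=(obs_dim, act_dim)))
                   for _ in range(n_policies)]
    selector = PolicySelector(rng.normal(size=(obs_dim, n_policies)), rng=rng)
    agent = HierarchicalAgent(selector, specialists)
    obs = rng.normal(size=obs_dim)             # dummy observation vector
    print("action:", agent.act(obs))
</code></pre>

<p>Keeping the specialists separate from the selector means each sub-policy faces a smaller, better-shaped learning problem, which is the usual motivation for this kind of hierarchical decomposition in large, continuous state spaces.</p>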