Tag Archives: Hierarchical Learning

Hierarchical RL with continuous options

Zhigang Huang, Quan Liu, Fei Zhu, Hierarchical reinforcement learning with adaptive scheduling for robot control, Engineering Applications of Artificial Intelligence, Volume 126, Part D, 2023, DOI: 10.1016/j.engappai.2023.107130.

Conventional hierarchical reinforcement learning (HRL) relies on discrete options to represent explicitly distinguishable knowledge, which may lead to severe performance bottlenecks. It is possible to represent richer knowledge through continuous options, but reliable scheduling methods are lacking. To design a workable scheduling method for continuous options, in this paper, the hierarchical reinforcement learning with adaptive scheduling (HAS) algorithm is proposed. Its low-level controller learns diverse options, while the high-level controller schedules options to learn solutions. It achieves an adaptive balance between exploration and exploitation during the frequent scheduling of continuous options, maximizing the representation potential of continuous options. It builds on multi-step static scheduling and makes switching decisions according to the relative advantages of the previous and the estimated continuous options, enabling the agent to focus on different behaviors at different phases of the task. The expected t-step distance is applied to demonstrate the superiority of adaptive scheduling in terms of exploration. Furthermore, an interruption incentive based on annealing is proposed to alleviate excessive exploration during the early training phase, accelerating the convergence rate. Finally, we apply HAS to robot control with sparse rewards in continuous spaces, and develop a comprehensive experimental analysis scheme. The experimental results not only demonstrate the high performance and robustness of HAS, but also provide evidence that the adaptive scheduling method has a positive effect on both the representation and the option policies.
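The abstract describes the switching rule only in words (hold an option as in multi-step static scheduling, but switch when a newly estimated option has a better relative advantage than the previous one, with an annealed interruption incentive early in training). The following is a minimal sketch under our own assumptions, not the authors' HAS algorithm: the critic `q(state, option)`, the holding window `horizon`, and the exponentially annealed switching margin (`margin0`, `decay`) are all hypothetical names and choices. Here the margin starts large, so early in training the agent rarely interrupts its current option, and it decays toward zero later; the paper's actual incentive may be defined differently.

```python
import math
from typing import Callable, List, Optional

Vector = List[float]
# Hypothetical high-level critic: estimated value of running option z in state s.
QFunction = Callable[[Vector, Vector], float]


class AdaptiveOptionScheduler:
    """Toy continuous-option scheduler (an illustration, not the HAS algorithm).

    Static part: an option is held for at most `horizon` steps, then replaced.
    Adaptive part: within that window, the option is interrupted early if the
    candidate's estimated advantage exceeds an annealed switching margin.
    """

    def __init__(self, q: QFunction, horizon: int = 10,
                 margin0: float = 1.0, decay: float = 1e-3) -> None:
        self.q = q
        self.horizon = horizon
        self.margin0 = margin0  # hypothetical initial margin
        self.decay = decay      # hypothetical annealing rate
        self.current: Optional[Vector] = None
        self.steps_held = 0

    def margin(self, train_step: int) -> float:
        # Assumed exponential annealing: a large margin early in training
        # suppresses frequent interruptions; it shrinks toward zero later.
        return self.margin0 * math.exp(-self.decay * train_step)

    def select(self, state: Vector, candidate: Vector, train_step: int) -> Vector:
        if self.current is None or self.steps_held >= self.horizon:
            # Static schedule: no option yet, or the holding window expired.
            self.current, self.steps_held = candidate, 0
        else:
            # Relative advantage of the candidate over the option in use.
            advantage = self.q(state, candidate) - self.q(state, self.current)
            if advantage > self.margin(train_step):
                self.current, self.steps_held = candidate, 0
        self.steps_held += 1
        return self.current


def toy_q(state: Vector, option: Vector) -> float:
    # Toy critic that prefers options close to the state.
    return -abs(state[0] - option[0])


sched = AdaptiveOptionScheduler(toy_q, horizon=3, margin0=1.0, decay=0.0)
print(sched.select([0.0], [5.0], 0))  # first call adopts the candidate: [5.0]
print(sched.select([0.0], [4.5], 0))  # advantage 0.5 <= margin 1.0: keeps [5.0]
print(sched.select([0.0], [0.0], 0))  # advantage 5.0 > 1.0: interrupts, [0.0]
```

Setting `decay=0.0` freezes the margin so the example is easy to trace; with a positive `decay`, the same candidate that is rejected early in training would eventually trigger an interruption.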

A hierarchical robot control architecture that supports learning of skills at different levels through “curriculum learning” and an interesting approach to mixing behaviours

Suro, F., Ferber, J., Stratulat, T. et al., A hierarchical representation of behaviour supporting open ended development and progressive learning for artificial agents, Auton Robot 45, 245–264 (2021), DOI: 10.1007/s10514-020-09960-7.

One of the challenging aspects of open ended or lifelong agent development is that the final behaviour for which an agent is trained at a given moment can be an element for the future creation of one, or even several, behaviours of greater complexity, whose purpose cannot be anticipated. In this paper, we present modular influence network design (MIND), an artificial agent control architecture suited to open ended and cumulative learning. The MIND architecture encapsulates sub behaviours into modules and combines them into a hierarchy reflecting the modular and hierarchical nature of complex tasks. Compared to similar research, the main original aspect of MIND is the multi-layered hierarchy using a generic control signal, the influence, to obtain an efficient global behaviour. This article shows the ability of MIND to learn a curriculum of independent didactic tasks of increasing complexity covering different aspects of a desired behaviour. In so doing, we demonstrate the contributions of MIND to open-ended development: encapsulation into modules allows for the preservation and re-usability of all the skills acquired during the curriculum and their focused retraining, and the modular structure serves the evolving topology by easing the coordination of new sensors, actuators and heterogeneous learning structures.
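To make the idea of a hierarchy driven by a generic "influence" signal more concrete, here is a minimal sketch under our own assumptions; it is not the MIND implementation. We assume, hypothetically, that each composite module outputs a non-negative influence weight per child module and that its command is the influence-weighted average of its children's commands, while leaf modules map observations directly to motor commands. The class `Module` and its fields are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Vector = List[float]


@dataclass
class Module:
    """Hypothetical behaviour module (an illustration, not the MIND implementation).

    Leaf modules map an observation to a motor command. Composite modules emit
    an "influence" weight per child and blend the children's commands.
    """
    name: str
    policy: Optional[Callable[[Vector], Vector]] = None
    influence: Optional[Callable[[Vector], Dict[str, float]]] = None
    children: List["Module"] = field(default_factory=list)

    def act(self, obs: Vector) -> Vector:
        if not self.children:
            if self.policy is None:
                raise ValueError(f"leaf module {self.name!r} has no policy")
            return self.policy(obs)
        if self.influence is None:
            raise ValueError(f"composite module {self.name!r} has no influence function")
        # Negative influences are clipped to zero; the rest are normalised.
        raw = self.influence(obs)
        weights = {c.name: max(raw.get(c.name, 0.0), 0.0) for c in self.children}
        total = sum(weights.values())
        if total == 0.0:
            raise ValueError(f"module {self.name!r} gave no positive influence")
        blended: Optional[Vector] = None
        for child in self.children:
            w = weights[child.name] / total
            if w == 0.0:
                continue  # an uninfluenced child is not evaluated at all
            command = child.act(obs)
            if blended is None:
                blended = [0.0] * len(command)
            blended = [b + w * c for b, c in zip(blended, command)]
        return blended


approach = Module("approach", policy=lambda obs: [1.0, 0.0])
avoid = Module("avoid", policy=lambda obs: [0.0, 1.0])
navigate = Module("navigate", children=[approach, avoid],
                  influence=lambda obs: {"approach": 3.0, "avoid": 1.0})
print(navigate.act([0.0]))  # [0.75, 0.25]
```

Because a module only needs its children's `act` method, a trained leaf such as `approach` can be reused unchanged under a new parent, which loosely mirrors the preservation and re-usability of skills that the abstract attributes to encapsulation.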

Layered learning: how to hierarchically learn more complex behaviors from simpler ones, applied to robot soccer

Patrick MacAlpine, Peter Stone, Overlapping layered learning, Artificial Intelligence, Volume 254, 2018, Pages 21-43, DOI: 10.1016/j.artint.2017.09.001.

Layered learning is a hierarchical machine learning paradigm that enables learning of complex behaviors by incrementally learning a series of sub-behaviors. A key feature of layered learning is that higher layers directly depend on the learned lower layers. In its original formulation, lower layers were frozen prior to learning higher layers. This article considers a major extension to the paradigm that allows learning certain behaviors independently, and then later stitching them together by learning at the “seams” where their influences overlap. The UT Austin Villa 2014 RoboCup 3D simulation team, using such overlapping layered learning, learned a total of 19 layered behaviors for a simulated soccer-playing robot, organized both in series and in parallel. To the best of our knowledge this is more than three times the number of layered behaviors in any prior layered learning system. Furthermore, the complete learning process is repeated on four additional robot body types, showcasing its generality as a paradigm for efficient behavior learning. The resulting team won the RoboCup 2014 championship with an undefeated record, scoring 52 goals and conceding none. This article includes a detailed experimental analysis of the team’s performance and the overlapping layered learning approach that led to its success.
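The difference between the original formulation (lower layers frozen before higher layers are learned) and letting learning continue at a "seam" can be illustrated with a toy example. This is a minimal sketch under our own assumptions, not the UT Austin Villa training pipeline: the two objectives, the parameter names `a` and `b`, the `seams` mapping, and the exhaustive `grid_search` optimizer are all hypothetical stand-ins chosen so the two variants give different answers.

```python
import itertools
from typing import Callable, Dict, Mapping, Optional, Sequence, Tuple

Params = Dict[str, float]
Objective = Callable[[Params], float]


def grid_search(objective: Objective, params: Params,
                trainable: Sequence[str], grid: Sequence[float]) -> Params:
    """Exhaustively tune only `trainable` parameters; all others stay fixed."""
    best, best_score = dict(params), objective(params)
    for values in itertools.product(grid, repeat=len(trainable)):
        candidate = dict(params)
        candidate.update(zip(trainable, values))
        score = objective(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best


def train_layers(layers: Sequence[Tuple[Objective, Sequence[str]]], init: Params,
                 grid: Sequence[float],
                 seams: Optional[Mapping[int, Sequence[str]]] = None) -> Params:
    """Train layers bottom-up. Layer i tunes its own parameters plus, if
    `seams` lists any, selected lower-layer parameters at the seam."""
    params = dict(init)
    for i, (objective, own) in enumerate(layers):
        trainable = list(own) + list((seams or {}).get(i, []))
        params = grid_search(objective, params, trainable, grid)
    return params


# Toy two-layer task with hypothetical objectives.
def layer1(p: Params) -> float:
    return -(p["a"] - 1.0) ** 2


def layer2(p: Params) -> float:
    return -(p["a"] + p["b"] - 3.0) ** 2 - (p["a"] - 2.0) ** 2


LAYERS = [(layer1, ["a"]), (layer2, ["b"])]
INIT = {"a": 0.0, "b": 0.0}
GRID = [0.5 * i for i in range(7)]  # 0.0, 0.5, ..., 3.0

frozen = train_layers(LAYERS, INIT, GRID)                     # a frozen after layer 1
overlap = train_layers(LAYERS, INIT, GRID, seams={1: ["a"]})  # a retuned at the seam
print(frozen)   # {'a': 1.0, 'b': 2.0}
print(overlap)  # {'a': 2.0, 'b': 1.0}
```

With `a` frozen at the value that suits layer 1, the best layer-2 score is -1; reopening `a` at the seam lets layer 2 reach its optimum of 0. In the article itself the overlap is handled per behavior and the optimizer is far more elaborate; the sketch only shows why unfreezing parameters where layers interact can help.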