Monthly Archives: December 2023

Limiting human intervention in the design of RL solutions (now called “Automated RL”)

Marco Mussi, Davide Lombarda, Alberto Maria Metelli, Francesco Trovò, Marcello Restelli, ARLO: A framework for Automated Reinforcement Learning, Expert Systems with Applications, Volume 224, 2023 DOI: 10.1016/j.eswa.2023.119883.

Automated Reinforcement Learning (AutoRL) is a relatively new area of research that is gaining increasing attention. The objective of AutoRL consists in easing the employment of Reinforcement Learning (RL) techniques for the broader public by alleviating some of its main challenges, including data collection, algorithm selection, and hyper-parameter tuning. In this work, we propose a general and flexible framework, namely ARLO: Automated Reinforcement Learning Optimizer, to construct automated pipelines for AutoRL. Based on this, we propose a pipeline for offline and one for online RL, discussing the components, interaction, and highlighting the difference between the two settings. Furthermore, we provide a Python implementation of such pipelines, released as an open-source library. Our implementation is tested on an illustrative LQG domain and on classic MuJoCo environments, showing the ability to reach competitive performances requiring limited human intervention. We also showcase the full pipeline on a realistic dam environment, automatically performing the feature selection and the model generation tasks.

Multi-task RL through common perceptions

Jinling Meng, Fei Zhu, Seek for commonalities: Shared features extraction for multi-task reinforcement learning via adversarial training, Expert Systems with Applications, Volume 224, 2023 DOI: 10.1016/j.eswa.2023.119975.

Multi-task reinforcement learning is promising to alleviate the low sample efficiency and high computation cost of reinforcement learning algorithms. However, current methods mostly focus on unique features that are not conducive to the transfer between tasks. Moreover, they usually lack a balance mechanism among tasks, which often leads to the unnecessary occupation of training resources by tasks that have already been trained. To address the problems, a simple yet effective method referred to as Adaptive Experience buffer with Shared Features Multi-Task Reinforcement Learning (AESF-MTRL) is proposed. In AESF-MTRL, input observation of the environment is divided into shared features and unique features, which are extracted using different feature extractors. Unique features are extracted by simple gradient descent, while shared features are extracted using adversarial training, with an additional discriminator trained to ensure that the extracted features are indeed shared features. AESF-MTRL also maintains a reward stack to adjust the sampling ratio of trajectories from different tasks dynamically during the update period to balance the learning process of different tasks. Experiments on multiple robotics control environments demonstrate the effectiveness of the proposed method.

Embedding actual knowledge into Deep Learning to improve its reliability

Lutter M, Peters J., Combining physics and deep learning to learn continuous-time dynamics models, The International Journal of Robotics Research. 2023;42(3):83-107 DOI: 10.1177/02783649231169492.

Deep learning has been widely used within learning algorithms for robotics. One disadvantage of deep networks is that these networks are black-box representations. Therefore, the learned approximations ignore the existing knowledge of physics or robotics. Especially for learning dynamics models, these black-box models are not desirable as the underlying principles are well understood and the standard deep networks can learn dynamics that violate these principles. To learn dynamics models with deep networks that guarantee physically plausible dynamics, we introduce physics-inspired deep networks that combine first principles from physics with deep learning. We incorporate Lagrangian mechanics within the model learning such that all approximated models adhere to the laws of physics and conserve energy. Deep Lagrangian Networks (DeLaN) parametrize the system energy using two networks. The parameters are obtained by minimizing the squared residual of the Euler–Lagrange differential equation. Therefore, the resulting model does not require specific knowledge of the individual system, is interpretable, and can be used as a forward, inverse, and energy model. Previously these properties were only obtained when using system identification techniques that require knowledge of the kinematic structure. We apply DeLaN to learning dynamics models and apply these models to control simulated and physical rigid body systems. The results show that the proposed approach obtains dynamics models that can be applied to physical systems for real-time control. Compared to standard deep networks, the physics-inspired models learn better models and capture the underlying structure of the dynamics.
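For readers who want the objective spelled out, the residual mentioned in the abstract can be sketched as below. This is the textbook form of the Euler–Lagrange equation, not a formula copied from the paper. The notation is an assumption made here: q are generalized coordinates, τ are generalized forces, θ are the network parameters, and the two networks are taken to represent the kinetic energy T and the potential energy V.

```latex
% Lagrangian built from the two learned energies (assumed split: kinetic T, potential V)
\mathcal{L}_{\theta}(q, \dot{q}) = T_{\theta}(q, \dot{q}) - V_{\theta}(q)

% Euler-Lagrange equation with generalized forces \tau
\frac{d}{dt}\frac{\partial \mathcal{L}_{\theta}}{\partial \dot{q}}
  - \frac{\partial \mathcal{L}_{\theta}}{\partial q} = \tau

% Training objective: squared residual of the equation above on observed data
\theta^{*} = \arg\min_{\theta}
  \left\| \frac{d}{dt}\frac{\partial \mathcal{L}_{\theta}}{\partial \dot{q}}
  - \frac{\partial \mathcal{L}_{\theta}}{\partial q} - \tau \right\|^{2}
```

Because the model is constrained to this form, the same learned energies can be rearranged to predict torques from motion (inverse model) or accelerations from torques (forward model), which matches the uses listed in the abstract.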

Using proprioceptive (internal) perceptions in robots, with RL

Agnese Augello, Salvatore Gaglio, Ignazio Infantino, Umberto Maniscalco, Giovanni Pilato, Filippo Vella, Roboception and adaptation in a cognitive robot, Robotics and Autonomous Systems, Volume 164, 2023 DOI: 10.1016/j.robot.2023.104400.

In robotics, perception is usually oriented at understanding what is happening in the external world, while few works pay attention to what is occurring in the robot’s body. In this work, we propose an artificial somatosensory system, embedded in a cognitive architecture, that enables a robot to perceive the sensations from its embodiment while executing a task. We called these perceptions roboceptions, and they let the robot act according to its own physical needs in addition to the task demands. Physical information is processed by the robot to behave in a balanced way, determining the most appropriate trade-off between the achievement of the task and its well being. The experiments show the integration of information from the somatosensory system and the choices that lead to the accomplishment of the task.

Measuring conceptual understanding of Systems & Signals university subjects

C. Crockett, H. C. Powell and C. J. Finelli, Conceptual Understanding of Signals and Systems in Senior Undergraduate Students, IEEE Transactions on Education, vol. 66, no. 2, pp. 113-122, April 2023 DOI: 10.1109/TE.2022.3199079.

Contribution: This article proposes a new definition of conceptual understanding (CU) specific to engineering. It then measures CU of signals and systems (S&S) in senior undergraduate students and describes how students approach conceptual problems. Background: Previous studies across multiple engineering subjects show students have low CU at the end of courses. However, little is known about CU semesters after a course. Research Questions: What is the CU of S&S concepts among electrical engineering senior students? Methodology: This mixed method study uses quantitative concept inventory data (n=467) and think-aloud interviews (n=12) to measure CU. The results come from two universities. Findings: Seniors’ scores on the concept inventory are typical of scores presented at the end of an S&S course. Many struggled with the concept of linearity, made a common error when finding the maximum value in graphical convolution, and had low confidence on relating frequencies in time to a Fourier transform representation, but seniors had relatively high CU of time invariance and filtering.

Further support for a multi-tool approach to consciousness

Biyu J. He, Towards a pluralistic neurobiological understanding of consciousness, Trends in Cognitive Sciences, Volume 27, Issue 5, 2023 DOI: 10.1016/j.tics.2023.02.001.

Theories of consciousness are often based on the assumption that a single, unified neurobiological account will explain different types of conscious awareness. However, recent findings show that, even within a single modality such as conscious visual perception, the anatomical location, timing, and information flow of neural activity related to conscious awareness vary depending on both external and internal factors. This suggests that the search for generic neural correlates of consciousness may not be fruitful. I argue that consciousness science requires a more pluralistic approach and propose a new framework: joint determinant theory (JDT). This theory may be capable of accommodating different brain circuit mechanisms for conscious contents as varied as percepts, wills, memories, emotions, and thoughts, as well as their integrated experience.

Active Inference and Behaviour Trees as alternatives to POMDPs and the like in the perception and action of robots

C. Pezzato, C. H. Corbato, S. Bonhof and M. Wisse, Active Inference and Behavior Trees for Reactive Action Planning and Execution in Robotics, IEEE Transactions on Robotics, vol. 39, no. 2, pp. 1050-1069, April 2023 DOI: 10.1109/TRO.2022.3226144.

In this article, we propose a hybrid combination of active inference and behavior trees (BTs) for reactive action planning and execution in dynamic environments, showing how robotic tasks can be formulated as a free-energy minimization problem. The proposed approach allows handling partially observable initial states and improves the robustness of classical BTs against unexpected contingencies while at the same time reducing the number of nodes in a tree. In this work, we specify the nominal behavior offline, through BTs. However, in contrast to previous approaches, we introduce a new type of leaf node to specify the desired state to be achieved rather than an action to execute. The decision of which action to execute to reach the desired state is performed online through active inference. This results in continual online planning and hierarchical deliberation. By doing so, an agent can follow a predefined offline plan while still keeping the ability to locally adapt and take autonomous decisions at runtime, respecting safety constraints. We provide proof of convergence and robustness analysis, and we validate our method in two different mobile manipulators performing similar tasks, both in a simulated and real retail environment. The results showed improved runtime adaptability with a fraction of the hand-coded nodes compared to classical BTs.

Real-time approach to POMDPs for robot navigation

P. Cai and D. Hsu, Closing the Planning–Learning Loop With Application to Autonomous Driving, IEEE Transactions on Robotics, vol. 39, no. 2, pp. 998-1011, April 2023 DOI: 10.1109/TRO.2022.3210767.

Real-time planning under uncertainty is critical for robots operating in complex dynamic environments. Consider, for example, an autonomous robot vehicle driving in dense, unregulated urban traffic of cars, motorcycles, buses, etc. The robot vehicle has to plan in both short and long terms, in order to interact with many traffic participants of uncertain intentions and drive effectively. Planning explicitly over a long time horizon, however, incurs prohibitive computational cost and is impractical under real-time constraints. To achieve real-time performance for large-scale planning, this work introduces a new algorithm Learning from Tree Search for Driving (LeTS-Drive), which integrates planning and learning in a closed loop, and applies it to autonomous driving in crowded urban traffic in simulation. Specifically, LeTS-Drive learns a policy and its value function from data provided by an online planner, which searches a sparsely sampled belief tree; the online planner in turn uses the learned policy and value functions as heuristics to scale up its run-time performance for real-time robot control. These two steps are repeated to form a closed loop so that the planner and the learner inform each other and improve in synchrony. The algorithm learns on its own in a self-supervised manner, without human effort on explicit data labeling. Experimental results demonstrate that LeTS-Drive outperforms either planning or learning alone, as well as open-loop integration of planning and learning.

Q-learning with a variation of ε-greedy to learn the optimal energy management in autonomous vehicle navigation

Mojgan Fayyazi, Monireh Abdoos, Duong Phan, Mohsen Golafrouz, Mahdi Jalili, Reza N. Jazar, Reza Langari, Hamid Khayyam, Real-time self-adaptive Q-learning controller for energy management of conventional autonomous vehicles, Expert Systems with Applications, Volume 222, 2023 DOI: 10.1016/j.eswa.2023.119770.

Reducing emissions and energy consumption of autonomous vehicles is critical in the modern era. This paper presents an intelligent energy management system based on Reinforcement Learning (RL) for conventional autonomous vehicles. Furthermore, in order to improve the efficiency, a new exploration strategy is proposed to replace the traditional decayed ε-greedy strategy in the Q-learning algorithm associated with RL. Unlike traditional Q-learning algorithms, the proposed self-adaptive Q-learning (SAQ-learning) can be applied in real-time. The learning capability of the controllers can help the vehicle deal with unknown situations in real-time. Numerical simulations show that compared to other controllers, Q-learning and SAQ-learning controllers can generate the desired engine torque based on the vehicle road power demand and control the air/fuel ratio by changing the throttle angle efficiently in real-time. Also, the proposed real-time SAQ-learning is shown to improve the operational time by 23% compared to standard Q-learning. Our simulations reveal the effectiveness of the proposed control system compared to other methods, namely dynamic programming and fuzzy logic methods.
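As a reminder of the baseline that the paper replaces, here is a minimal sketch of tabular Q-learning with a decayed epsilon-greedy exploration schedule. It is not the SAQ-learning method from the paper, whose exploration rule is not described in the abstract. The chain environment, the function names, and all hyper-parameter values are hypothetical and chosen only for illustration.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Pick a random action with probability epsilon, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    # Break ties between equally good actions at random.
    return rng.choice([a for a, v in enumerate(q_row) if v == best])


def step(state, action, n_states):
    """Hypothetical chain world: action 1 moves right, action 0 moves left.

    Reaching the right-most state ends the episode with reward 1.
    """
    nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    done = nxt == n_states - 1
    return nxt, (1.0 if done else 0.0), done


def train(n_states=5, episodes=300, alpha=0.5, gamma=0.9,
          eps_start=1.0, eps_min=0.05, decay=0.99, seed=0):
    """Tabular Q-learning; epsilon is multiplied by `decay` after each episode."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    epsilon = eps_start
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:  # cap episode length
            action = epsilon_greedy(q[state], epsilon, rng)
            nxt, reward, done = step(state, action, n_states)
            # Standard Q-learning target: bootstrap from the best next action.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            steps += 1
        # Decayed epsilon-greedy: explore less as training progresses.
        epsilon = max(eps_min, epsilon * decay)
    return q, epsilon


q_table, final_epsilon = train()
# Greedy action in every non-terminal state after training.
greedy_policy = [row.index(max(row)) for row in q_table[:-1]]
```

The fixed decay schedule is the part a self-adaptive strategy would change: here the exploration rate depends only on the episode count, not on how well the agent is currently doing.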

There are people working on robotic software engineering these days :-O ! (real-time included)

Arturo Laurenzi, Davide Antonucci, Nikos G. Tsagarakis, Luca Muratore, The XBot2 real-time middleware for robotics, Robotics and Autonomous Systems, Volume 163, 2023 DOI: 10.1016/j.robot.2023.104379.

This paper introduces XBot2, a novel real-time middleware for robotic applications with a strong focus on modularity and reusability of components, and seamless support for multi-threaded, mixed real-time (RT) and non-RT architectures. Compared to previous works, XBot2 focuses on providing a dynamic, ready-to-use hardware abstraction layer that allows users to make run-time queries about the robot topology, and act consequently, by leveraging an easy-to-use API that is fully RT-compatible. We provide an extensive description about implementation challenges and design decisions, and finally validate our architecture with multiple use-cases. These range from the integration of three popular simulation tools (i.e. Gazebo, PyBullet, and MuJoCo), to real-world tests involving complex, hybrid robotic platforms such as IIT’s CENTAURO and MoCA robots.