Finding the policy that generalizes best from a sample of possible real scenarios by leveraging PAC-Bayes

Majumdar A, Farid A, Sonar A, PAC-Bayes control: learning policies that provably generalize to novel environments. The International Journal of Robotics Research. 2021;40(2-3):574-593 DOI: 10.1177/0278364920959444.

Our goal is to learn control policies for robots that provably generalize well to novel environments given a dataset of example environments. The key technical idea behind our approach is to leverage tools from generalization theory in machine learning by exploiting a precise analogy (which we present in the form of a reduction) between generalization of control policies to novel environments and generalization of hypotheses in the supervised learning setting. In particular, we utilize the probably approximately correct (PAC)-Bayes framework, which allows us to obtain upper bounds that hold with high probability on the expected cost of (stochastic) control policies across novel environments. We propose policy learning algorithms that explicitly seek to minimize this upper bound. The corresponding optimization problem can be solved using convex optimization (relative entropy programming in particular) in the setting where we are optimizing over a finite policy space. In the more general setting of continuously parameterized policies (e.g., neural network policies), we minimize this upper bound using stochastic gradient descent. We present simulated results of our approach applied to learning (1) reactive obstacle avoidance policies and (2) neural network-based grasping policies. We also present hardware results for the Parrot Swing drone navigating through different obstacle environments. Our examples demonstrate the potential of our approach to provide strong generalization guarantees for robotic systems with continuous state and action spaces, complicated (e.g., nonlinear) dynamics, rich sensory inputs (e.g., depth images), and neural network-based policies.
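As an illustration of the kind of upper bound being minimized, the sketch below evaluates a PAC-Bayes bound for a distribution over a finite policy space. It assumes the McAllester/Maurer-style form, empirical cost plus sqrt((KL(P||P0) + log(2*sqrt(N)/delta)) / (2N)), with costs scaled to [0, 1]; the abstract does not state which form is used, so that choice is an assumption. The names `kl_divergence`, `pac_bayes_bound`, and the example cost table are hypothetical.

```python
import math

def kl_divergence(posterior, prior):
    """KL(posterior || prior) for two discrete distributions over the same policies."""
    return sum(p * math.log(p / q) for p, q in zip(posterior, prior) if p > 0.0)

def pac_bayes_bound(costs, posterior, prior, delta):
    """Return (empirical cost, upper bound) for a stochastic policy.

    costs[i][j] is the cost in [0, 1] of policy j on training environment i.
    Assumed bound form: empirical + sqrt((KL + log(2*sqrt(N)/delta)) / (2N)).
    """
    n_envs = len(costs)
    n_policies = len(posterior)
    # Average cost of each policy over the sampled environments.
    per_policy = [sum(row[j] for row in costs) / n_envs for j in range(n_policies)]
    # Expected training cost when a policy is drawn from the posterior.
    empirical = sum(p * c for p, c in zip(posterior, per_policy))
    kl = kl_divergence(posterior, prior)
    slack = math.sqrt((kl + math.log(2.0 * math.sqrt(n_envs) / delta)) / (2.0 * n_envs))
    return empirical, empirical + slack

# Hypothetical data: 4 training environments, 3 candidate policies.
example_costs = [
    [0.2, 0.6, 0.9],
    [0.1, 0.5, 0.8],
    [0.3, 0.7, 1.0],
    [0.2, 0.6, 0.9],
]
uniform_prior = [1 / 3, 1 / 3, 1 / 3]
posterior = [0.8, 0.15, 0.05]
empirical_cost, bound = pac_bayes_bound(example_costs, posterior, uniform_prior, delta=0.1)
print(f"empirical cost = {empirical_cost:.3f}, bound = {bound:.3f}")
```

With only four environments the slack term dominates, which is why a learning algorithm would trade off low training cost against staying close to the prior.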

Extracting video summaries from RL processes to explain and understand them

Pedro Sequeira, Melinda Gervasio, Interestingness elements for explainable reinforcement learning: Understanding agents’ capabilities and limitations. Artificial Intelligence, Volume 288, 2020 DOI: 10.1016/j.artint.2020.103367.

We propose an explainable reinforcement learning (XRL) framework that analyzes an agent’s history of interaction with the environment to extract interestingness elements that help explain its behavior. The framework relies on data readily available from standard RL algorithms, augmented with data that can easily be collected by the agent while learning. We describe how to create visual summaries of an agent’s behavior in the form of short video-clips highlighting key interaction moments, based on the proposed elements. We also report on a user study where we evaluated the ability of humans to correctly perceive the aptitude of agents with different characteristics, including their capabilities and limitations, given visual summaries automatically generated by our framework. The results show that the diversity of aspects captured by the different interestingness elements is crucial to help humans correctly understand an agent’s strengths and limitations in performing a task, and determine when it might need adjustments to improve its performance.

Expressing POMDP policies through knowledge-based programs

Bruno Zanuttini, Jérôme Lang, Abdallah Saffidine, François Schwarzentruber, Knowledge-based programs as succinct policies for partially observable domains. Artificial Intelligence, Volume 288, 2020 DOI: 10.1016/j.artint.2020.103365.

We propose expressing policies for contingent planning as knowledge-based programs (KBPs). KBPs, introduced by Fagin et al. (1995) [32], are high-level protocols describing the actions that the agent should perform as a function of their current knowledge: branching conditions are epistemic formulas that are interpretable by the agent. The main aim of our paper is to show that KBPs can be seen as a succinct language for expressing policies in single-agent contingent planning. KBPs are conceptually very close to languages used for expressing policies in the partially observable planning literature: like them, they have conditional and looping structures, with actions as atomic programs and Boolean formulas on beliefs for choosing the execution path. The distinguishing feature of KBPs is that branching conditions refer to the belief state and not to the observations. Because of their structural proximity, KBPs and standard languages for representing policies have the same expressive power: every standard policy can be expressed as a KBP, and every KBP can be “unfolded” into a standard policy. However, KBPs are more succinct, more readable, and more explainable than standard policies. On the other hand, they require more online computation time, but we show that this is an unavoidable tradeoff. We study knowledge-based programs along four criteria: expressivity, succinctness, complexity of online execution, and complexity of verification.
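To make the idea of branching on the belief state concrete, here is a minimal sketch on a toy domain that is not taken from the paper: a key is hidden in one of several boxes, and the agent can only look into one box at a time. The belief state is modelled as the set of worlds the agent considers possible, and the agent knows a formula when it holds in all of them. All names (`knows`, `sense`, `run_kbp`) and the domain itself are hypothetical.

```python
def knows(belief, formula):
    """The agent knows `formula` iff it holds in every world it considers possible."""
    return all(formula(world) for world in belief)

def sense(belief, true_world, test):
    """Sensing action: keep only the worlds that agree with the real outcome of `test`."""
    outcome = test(true_world)
    return frozenset(world for world in belief if test(world) == outcome)

def run_kbp(true_world, n_boxes):
    """Execute the program: while the key's location is not known, look in the next box;
    then open the box known to hold the key."""
    belief = frozenset(range(n_boxes))  # a world is the index of the box holding the key
    trace = []
    box = 0
    # The loop condition is epistemic: it inspects the belief state, not the last observation.
    while not any(knows(belief, lambda w, b=b: w == b) for b in range(n_boxes)):
        belief = sense(belief, true_world, lambda w, b=box: w == b)
        trace.append(f"look({box})")
        box += 1
    (location,) = belief  # exactly one world remains once the location is known
    trace.append(f"open({location})")
    return trace

print(run_kbp(true_world=2, n_boxes=3))
```

When the key is in the last box, the agent never looks into it: it comes to know the location by elimination, which is the kind of reasoning a condition on observations alone would have to spell out explicitly.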

A very detailed study of the performance of propellers

Scanavino, M., Vilardi, A. & Guglieri, G. An Experimental Analysis on Propeller Performance in a Climate-controlled Facility. J Intell Robot Syst 100, 505–517 (2020) DOI: 10.1007/s10846-019-01132-9.

Although many commercial applications make extensive use of Unmanned Aircraft Systems (UAS), there is still a lack of published data about their performance under unconventional weather conditions. In recent years, multirotors and fixed-wing vehicles, commonly referred to as drones, have been studied in wind environments so that stability and controllability have been improved. However, other important weather variables have an impact on UAS performance and they should be properly investigated for a deeper understanding of such vehicles. The primary objective of our study is the preliminary characterization of a propeller in a climate-controlled chamber. Mechanical and electrical data have been measured while testing the propeller at low pressure and cold temperatures. Test results point out that thrust and electric power are strongly affected by air density. A comparison between the experimental data and the results of the Blade Element Theory is carried out to assess the theory's capability to estimate thrust in unconventional environments. The agreement between experimental data and theoretical computation is appropriate despite geometrical uncertainties and corroborates the need for a reliable aerodynamic database. Propeller performance data under unconventional atmospheres will be leveraged to improve UAS design and propulsion system modelling, as well as to provide guidelines to certify operations in extreme environments.
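The dependence of thrust on air density can be illustrated with the standard propeller-coefficient relations T = C_T * rho * n^2 * D^4 and P = C_P * rho * n^3 * D^5, combined with the ideal-gas law for dry air. This is a rough sketch, not the paper's Blade Element Theory computation: it assumes the coefficients stay constant across conditions, and the coefficient values, propeller size, and chamber conditions below are hypothetical.

```python
R_DRY_AIR = 287.05  # specific gas constant of dry air, J/(kg*K)

def air_density(pressure_pa, temperature_k):
    """Ideal-gas density of dry air in kg/m^3."""
    return pressure_pa / (R_DRY_AIR * temperature_k)

def thrust(c_t, rho, rev_per_s, diameter_m):
    """Propeller thrust in newtons from the thrust coefficient c_t."""
    return c_t * rho * rev_per_s ** 2 * diameter_m ** 4

def shaft_power(c_p, rho, rev_per_s, diameter_m):
    """Mechanical shaft power in watts from the power coefficient c_p."""
    return c_p * rho * rev_per_s ** 3 * diameter_m ** 5

# Hypothetical propeller and operating point (not measured values).
C_T, C_P = 0.10, 0.04
DIAMETER_M, REV_PER_S = 0.25, 100.0

rho_sea_level = air_density(101325.0, 288.15)  # standard sea-level atmosphere
rho_chamber = air_density(50000.0, 253.15)     # assumed low-pressure, cold condition

t_sea = thrust(C_T, rho_sea_level, REV_PER_S, DIAMETER_M)
t_chamber = thrust(C_T, rho_chamber, REV_PER_S, DIAMETER_M)
print(f"density ratio = {rho_chamber / rho_sea_level:.3f}")
print(f"thrust ratio at equal rpm = {t_chamber / t_sea:.3f}")
```

Under the constant-coefficient assumption, thrust and shaft power at a fixed rotation speed scale linearly with density, so the two printed ratios coincide.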

It seems that consciousness is not a single one-dimensional scale, but multi-dimensional

Jonathan Birch, Alexandra K. Schnell, Nicola S. Clayton, Dimensions of Animal Consciousness. Trends in Cognitive Sciences, Volume 24, Issue 10, 2020, Pages 789-801 DOI: 10.1016/j.tics.2020.07.007.

How does consciousness vary across the animal kingdom? Are some animals ‘more conscious’ than others? This article presents a multidimensional framework for understanding interspecies variation in states of consciousness. The framework distinguishes five key dimensions of variation: perceptual richness, evaluative richness, integration at a time, integration across time, and self-consciousness. For each dimension, existing experiments that bear on it are reviewed and future experiments are suggested. By assessing a given species against each dimension, we can construct a consciousness profile for that species. On this framework, there is no single scale along which species can be ranked as more or less conscious. Rather, each species has its own distinctive consciousness profile.

It seems that our brain predicts semantic features of sensory stimuli to come

Friedemann Pulvermüller, Luigi Grisoni, Semantic Prediction in Brain and Mind. Trends in Cognitive Sciences, Volume 24, Issue 10, 2020, Pages 781-784 DOI: 10.1016/j.tics.2020.07.002.

We highlight a novel brain correlate of prediction, the prediction potential (or PP), a slow negative-going potential shift preceding visual, acoustic, and spoken or written verbal stimuli that can be predicted from their context. The cortical sources underlying the prediction potential reflect perceptual and semantic features of anticipated stimuli before these appear.

Combination of analytical models with NN learning for predicting action effects

Kloss A, Schaal S, Bohg J, Combining learned and analytical models for predicting action effects from sensory data. The International Journal of Robotics Research. 2022;41(8):778-797 DOI: 10.1177/0278364920954896.

One of the most basic skills a robot should possess is predicting the effect of physical interactions with objects in the environment. This enables optimal action selection to reach a certain goal state. Traditionally, dynamics are approximated by physics-based analytical models. These models rely on specific state representations that may be hard to obtain from raw sensory data, especially if no knowledge of the object shape is assumed. More recently, we have seen learning approaches that can predict the effect of complex physical interactions directly from sensory input. It is, however, an open question how far these models generalize beyond their training data. In this work, we investigate the advantages and limitations of neural-network-based learning approaches for predicting the effects of actions based on sensory input and show how analytical and learned models can be combined to leverage the best of both worlds. As a physical interaction task, we use planar pushing, for which there exists a well-known analytical model and a large real-world dataset. We propose the use of a convolutional neural network to convert raw depth images or organized point clouds into a suitable representation for the analytical model and compare this approach with using neural networks for both perception and prediction. A systematic evaluation of the proposed approach on a very large real-world dataset shows two main advantages of the hybrid architecture. Compared with a pure neural network, it significantly (i) reduces required training data and (ii) improves generalization to novel physical interaction.

“Early exit” deep neural networks (i.e., CNNs that provide outputs at intermediate points)

Scardapane, S., Scarpiniti, M., Baccarelli, E. et al., Why Should We Add Early Exits to Neural Networks? Cogn Comput 12, 954–966 (2020) DOI: 10.1007/s12559-020-09734-4.

Deep neural networks are generally designed as a stack of differentiable layers, in which a prediction is obtained only after running the full stack. Recently, some contributions have proposed techniques to endow the networks with early exits, allowing predictions to be obtained at intermediate points of the stack. These multi-output networks have a number of advantages, including (i) significant reductions of the inference time, (ii) reduced tendency to overfitting and vanishing gradients, and (iii) capability of being distributed over multi-tier computation platforms. In addition, they connect to the wider themes of biological plausibility and layered cognitive reasoning. In this paper, we provide a comprehensive introduction to this family of neural networks, by describing in a unified fashion the way these architectures can be designed, trained, and actually deployed in time-constrained scenarios. We also describe in depth their application scenarios in 5G and Fog computing environments, as well as some of the open research questions connected to them.
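The inference-time saving can be sketched as follows: each stage of the stack is paired with an auxiliary exit head, and the forward pass stops at the first exit whose prediction is confident enough. Thresholding the maximum softmax probability is one common exit rule and is assumed here; it is not necessarily the criterion the paper recommends. The toy layers and the names `early_exit_predict` and `toy_stages` are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def early_exit_predict(x, stages, threshold):
    """Run stages in order and stop at the first sufficiently confident exit.

    stages is a non-empty list of (layer_fn, exit_head_fn) pairs; the exit head
    maps the hidden state to class logits. Returns (label, depth used, probabilities).
    """
    if not stages:
        raise ValueError("at least one stage is required")
    hidden = x
    for depth, (layer, exit_head) in enumerate(stages, start=1):
        hidden = layer(hidden)
        probs = softmax(exit_head(hidden))
        if max(probs) >= threshold:
            break  # confident enough: skip the remaining layers
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, depth, probs

# Toy network: every layer doubles the features, every exit head reads them as logits.
toy_stages = [(lambda h: [2.0 * v for v in h], lambda h: list(h))] * 3

for sample in ([2.0, 0.0], [0.2, 0.0]):
    label, depth, _ = early_exit_predict(sample, toy_stages, threshold=0.65)
    print(f"input {sample}: class {label}, exited after stage {depth}")
```

Inputs that are easy to classify leave at a shallow exit, while harder ones travel deeper; if no exit reaches the threshold, the final stage's prediction is returned.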

A so-called universal approach for modelling and controlling robots

Tarokh, M., A unified kinematics modeling, optimization and control of universal robots: from serial and parallel manipulators to walking, rolling and hybrid robots. Auton Robot 44, 1233–1248 (2020) DOI: 10.1007/s10514-020-09929-6.

The paper develops a unified approach to kinematics modeling, optimization and control that is applicable to a wide range of autonomous and non-autonomous robots. These include hybrid robots that combine two or more modes of operation, such as a combination of walking and rolling, or rolling and manipulation, as well as parallel robots in various configurations. The equations of motion are derived in compact forms that embed an optimization criterion. These equations are used to obtain various useful forms of the robot kinematics such as recursive, body and limb-end kinematic forms. Using this modeling, actuation and control equations are derived that ensure traversing a desired path while maintaining balanced operations and tip-over avoidance. Various simulation results are provided for a hybrid rolling-walking robot, which demonstrate the capabilities and effectiveness of the developed methodologies.

Improving the realism of a simulator through deep learning

Allevato, A.D., Schaertl Short, E., Pryor, M. et al., Iterative residual tuning for system identification and sim-to-real robot learning. Auton Robot 44, 1167–1182 (2020) DOI: 10.1007/s10514-020-09925-w.

Robots are increasingly learning complex skills in simulation, increasing the need for realistic simulation environments. Existing techniques for approximating real-world physics with a simulation require extensive observation data and/or thousands of simulation samples. This paper presents iterative residual tuning (IRT), a deep learning system identification technique that modifies a simulator’s parameters to better match reality using minimal real-world observations. IRT learns to estimate the parameter difference between two parameterized models, allowing repeated iterations to converge on the true parameters similarly to gradient descent. In this paper, we develop and analyze IRT in depth, including its similarities and differences with gradient descent. Our IRT implementation, TuneNet, is pre-trained via supervised learning over an auto-generated simulated dataset. We show that TuneNet can perform rapid, efficient system identification even when the true parameter values lie well outside those in the network’s training data, and can also learn real-world parameter values from visual data. We apply TuneNet to a sim-to-real task transfer experiment, allowing a robot to perform a dynamic manipulation task with a new object after a single observation.
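The iteration described above can be sketched on a toy problem: a simulator with one parameter, a residual estimator that compares a simulated observation with a real one, and a loop that repeatedly adds the estimated difference to the simulator's parameter. The estimator below is a hand-written, deliberately imperfect stand-in for the trained network (it is not TuneNet), and the bouncing-ball simulator, the constant `DROP_HEIGHT`, and all function names are hypothetical.

```python
import math

DROP_HEIGHT = 1.0  # metres; assumed initial height of the dropped ball

def simulate_rebound(restitution):
    """Toy simulator: first rebound height for a given coefficient of restitution."""
    return restitution ** 2 * DROP_HEIGHT

def estimate_residual(sim_obs, real_obs):
    """Stand-in for the learned model: estimates (true parameter - simulator parameter)
    from the two observations, recovering only half of the gap on purpose."""
    return 0.5 * (math.sqrt(real_obs / DROP_HEIGHT) - math.sqrt(sim_obs / DROP_HEIGHT))

def iterative_residual_tuning(real_obs, initial_param, n_iters=10):
    """Repeatedly simulate, estimate the parameter residual, and update the parameter."""
    param = initial_param
    history = [param]
    for _ in range(n_iters):
        sim_obs = simulate_rebound(param)
        param += estimate_residual(sim_obs, real_obs)
        history.append(param)
    return param, history

# One "real" observation generated from a hidden true parameter of 0.8.
real_observation = simulate_rebound(0.8)
tuned, history = iterative_residual_tuning(real_observation, initial_param=0.3)
print(f"tuned restitution = {tuned:.4f}")
```

Because the estimator only needs to point in roughly the right direction with roughly the right size, repeated application can still converge, which is the sense in which the procedure resembles gradient descent.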