Author Archives: Juan-antonio Fernández-madrigal

Improving the simulation-to-real transfer of learning robotic skills by learning smaller skills and how to connect them in reality

Julian RC, Heiden E, He Z, et al., Scaling simulation-to-real transfer by learning a latent space of robot skills, The International Journal of Robotics Research. 2020;39(10-11):1259-1278 DOI: 10.1177/0278364920944474.

We present a strategy for simulation-to-real transfer, which builds on recent advances in robot skill decomposition. Rather than focusing on minimizing the simulation–reality gap, we propose a method for increasing the sample efficiency and robustness of existing simulation-to-real approaches which exploits hierarchy and online adaptation. Instead of learning a unique policy for each desired robotic task, we learn a diverse set of skills and their variations, and embed those skill variations in a continuously parameterized space. We then interpolate, search, and plan in this space to find a transferable policy which solves more complex, high-level tasks by combining low-level skills and their variations. In this work, we first characterize the behavior of this learned skill space, by experimenting with several techniques for composing pre-learned latent skills. We then discuss an algorithm which allows our method to perform long-horizon tasks never seen in simulation, by intelligently sequencing short-horizon latent skills. Our algorithm adapts to unseen tasks online by repeatedly choosing new skills from the latent space, using live sensor data and simulation to predict which latent skill will perform best next in the real world. Importantly, our method learns to control a real robot in joint-space to achieve these high-level tasks with little or no on-robot time, despite the fact that the low-level policies may not be perfectly transferable from simulation to real, and that the low-level skills were not trained on any examples of high-level tasks. In addition to our results indicating a lower sample complexity for families of tasks, we believe that our method provides a promising template for combining learning-based methods with proven classical robotics algorithms such as model-predictive control.

The problem of the initial state in filtering and its effects on the estimation

He Kong, Mao Shan, Daobilige Su, Yongliang Qiao, Abdullah Al-Azzawi, Salah Sukkarieh, Filtering for systems subject to unknown inputs without a priori initial information, Automatica, Volume 120, 2020 DOI: 10.1016/j.automatica.2020.109122.

The last few decades have witnessed much development in filtering of systems with Gaussian noises and arbitrary unknown inputs. Nonetheless, there are still some important design questions that warrant thorough discussions. Especially, the existing literature has shown that for unbiased and minimum variance estimation of the state and the unknown input, the initial guess of the state has to be unbiased. This clearly raises the question of whether and under what conditions one can design an unbiased and minimum variance filter, without making such a stringent assumption. The above-mentioned question will be investigated systematically in this paper, i.e., design of the filter is sought to be independent of a priori information about the initial conditions. In particular, for both cases with and without direct feedthrough, we establish necessary and sufficient conditions for unbiased and minimum variance estimation of the state/unknown input, independently of a priori initial conditions, respectively. When the former conditions do not hold, we carry out a thorough analysis of all possible scenarios. For each scenario, we present detailed discussions regarding whether and what can be achieved in terms of unbiased estimation, independently of a priori initial conditions. Extensions to the case with time-delays, conceptually like Kalman smoothing where future measurements are allowed in estimation, will also be presented, amongst others.

Shunyi Zhao, Biao Huang, Trial-and-error or avoiding a guess? Initialization of the Kalman filter, Automatica, Volume 121, 2020 DOI: 10.1016/j.automatica.2020.109184.

As a recursive state estimation algorithm, the Kalman filter (KF) assumes initial state distribution is known a priori, while in practice the initial distribution is commonly treated as design parameters. In this paper, we will answer three questions concerning initialization: (1) At each time step, how does the KF respond to measurements, control signals, and more importantly, initial states? (2) What is the price (in terms of accuracy) one has to pay if inaccurate initial states are used? and (3) Can we find a better strategy rather than through guessing to improve the performance of KF in the initial estimation phase when the initial condition is unknown? To these ends, the classical recursive KF is first transformed into an equivalent but batch form, from which the responses of the KF to measurements, control signal, and initial state can be clearly separated and observed. Based on this, we isolate the initial distribution by dividing the original state into two parts and reconstructing a new state-space model. An initialization algorithm is then proposed by employing the Bayesian inference technique to estimate all the unknown variables simultaneously. By analyzing its performance, an improved version is further developed. Two simulation examples demonstrate that the proposed initialization approaches can be considered as competitive alternatives of various existing initialization methods when initial condition is unknown.
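Question (2) above can be made concrete with a small experiment. The following is a minimal sketch, not the authors' batch-form derivation nor their Bayesian initialization algorithm: a scalar Kalman filter tracking a constant state, run twice with the same wrong initial guess but different initial variances. The model, the noise values, and the name `scalar_kf` are assumptions chosen for illustration.

```python
def scalar_kf(measurements, x0, p0, q=0.0, r=1.0):
    """Scalar Kalman filter for x_k = x_{k-1} + w, y_k = x_k + v.

    x0, p0: initial state guess and its variance (the design parameters).
    q, r: process and measurement noise variances (assumed values).
    Returns the list of posterior estimates, one per measurement.
    """
    x, p = x0, p0
    estimates = []
    for y in measurements:
        p = p + q                 # predict: variance grows by the process noise
        k = p / (p + r)           # Kalman gain
        x = x + k * (y - x)       # update with the innovation
        p = (1.0 - k) * p         # posterior variance
        estimates.append(x)
    return estimates


# Noise-free measurements of a true state of 5.0, to isolate the effect of x0.
ys = [5.0] * 10
confident_wrong = scalar_kf(ys, x0=0.0, p0=0.01)   # wrong guess, strongly trusted
vague_wrong = scalar_kf(ys, x0=0.0, p0=100.0)      # wrong guess, barely trusted

err_confident = abs(confident_wrong[-1] - 5.0)
err_vague = abs(vague_wrong[-1] - 5.0)
```

In this toy setting, the filter that trusts its wrong initial guess is still far from the true state after ten measurements, while the one with a large initial variance has essentially forgotten the guess. This is the usual "trial-and-error" workaround that the paper aims to replace with a principled initialization.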

A new theory: we are curious about tasks that increase our ability to solve as many future tasks as possible

Franziska Brändle, Charley M. Wu, Eric Schulz, What Are We Curious about?, Trends in Cognitive Sciences, Volume 24, Issue 9, 2020 DOI: 10.1016/j.tics.2020.05.010.

(no abstract).

Predicting optimistically seems to lead the agent to respond better in order to achieve the best goals

Zekun Sun, Chaz Firestone, Optimism and Pessimism in the Predictive Brain, Trends in Cognitive Sciences, Volume 24, Issue 9, 2020 DOI: 10.1016/j.tics.2020.06.001.

(no abstract).

Combination of RL with human-provided models for navigation

Amarildo Likmeta, Alberto Maria Metelli, Andrea Tirinzoni, Riccardo Giol, Marcello Restelli, Danilo Romano, Combining reinforcement learning with rule-based controllers for transparent and general decision-making in autonomous driving, Robotics and Autonomous Systems, Volume 131, 2020 DOI: 10.1016/j.robot.2020.103568.

The design of high-level decision-making systems is a topical problem in the field of autonomous driving. In this paper, we combine traditional rule-based strategies and reinforcement learning (RL) with the goal of achieving transparency and robustness. On the one hand, the use of handcrafted rule-based controllers allows for transparency, i.e., it is always possible to determine why a given decision was made, but they struggle to scale to complex driving scenarios, in which several objectives need to be considered. On the other hand, black-box RL approaches enable us to deal with more complex scenarios, but they are usually hardly interpretable. In this paper, we combine the best properties of these two worlds by designing parametric rule-based controllers, in which interpretable rules can be provided by domain experts and their parameters are learned via RL. After illustrating how to apply parameter-based RL methods (PGPE) to this setting, we present extensive numerical simulations in the highway and in two urban scenarios: intersection and roundabout. For each scenario, we show the formalization as an RL problem and we discuss the results of our approach in comparison with handcrafted rule-based controllers and black-box RL techniques.
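The idea of an expert-written rule whose parameter is tuned by PGPE can be sketched on a toy problem. The code below is only an illustration under strong assumptions: the rule, the quadratic `episode_return` standing in for a driving simulator, the "ideal" threshold of 2.0, and all function names are hypothetical and not taken from the paper. It uses the symmetric-sampling form of PGPE and keeps the exploration width `sigma` fixed, whereas full PGPE also adapts it.

```python
import random


def rule_based_controller(gap, threshold):
    """Interpretable rule: brake if the gap to the lead car is below a threshold."""
    return "brake" if gap < threshold else "keep_speed"


def episode_return(threshold, ideal=2.0):
    """Toy stand-in for a simulator rollout: the return peaks at an 'ideal' threshold.

    The quadratic shape and the value 2.0 are assumptions for illustration.
    """
    return -(threshold - ideal) ** 2


def pgpe(mu=0.0, sigma=0.5, lr=0.2, iterations=300, seed=0):
    """Symmetric-sampling PGPE update of the mean of a Gaussian over the parameter.

    sigma is kept fixed in this sketch; full PGPE also learns it.
    """
    rng = random.Random(seed)
    for _ in range(iterations):
        eps = rng.gauss(0.0, sigma)          # perturbation in parameter space
        r_plus = episode_return(mu + eps)    # return with parameter mu + eps
        r_minus = episode_return(mu - eps)   # return with the mirrored parameter
        mu += lr * eps * (r_plus - r_minus) / 2.0
    return mu


learned_threshold = pgpe()
```

The learned object stays readable: `rule_based_controller(gap, learned_threshold)` is still a single comparison that a domain expert can inspect, and only its numeric threshold came from the learning loop.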

Fast and more exact triangulation method for robot localization using range measurements

Pınar Oğuz-Ekim, TDOA based localization and its application to the initialization of LiDAR based autonomous robots, Robotics and Autonomous Systems, Volume 131, 2020, DOI: 10.1016/j.robot.2020.103590.

This work considers the problem of locating a single robot given a set of squared noisy range difference measurements to a set of points (anchors) whose positions are known. In the sequel, localization problem is solved in the Least-Squares (LS) sense by writing the robot position in polar/spherical coordinates. This representation transforms the original nonconvex/multimodal cost function into the quotient of two quadratic forms, whose constrained maximization is more tractable than the original problem. Simulation results indicate that the proposed method has similar accuracy to state-of-the-art optimization-based localization algorithms in its class, and the simple algorithmic structure and computational efficiency makes it appealing for applications with strong computational constraints. Additionally, location information is used to find the initial orientation of the robot with respect to the previously obtained map in scan matching. Thus, the crucial problem of the autonomous initialization and localization in robotics is solved.
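For readers unfamiliar with the setting, the least-squares formulation over range differences can be sketched as follows. This is not the paper's polar-coordinate reformulation: it uses plain (unsquared), noise-free range differences and a brute-force grid search, purely to show what cost is being minimized. The anchor layout, the true position, and the function names are assumptions.

```python
import math


def range_differences(pos, anchors):
    """Range differences to each anchor, relative to the first (reference) anchor."""
    ref = math.dist(pos, anchors[0])
    return [math.dist(pos, a) - ref for a in anchors[1:]]


def ls_cost(pos, anchors, measured):
    """Sum of squared residuals between measured and predicted range differences."""
    predicted = range_differences(pos, anchors)
    return sum((m - p) ** 2 for m, p in zip(measured, predicted))


def grid_search(anchors, measured, lo=0.0, hi=10.0, step=0.25):
    """Brute-force minimizer over a square grid (illustration only)."""
    n = int(round((hi - lo) / step)) + 1
    candidates = ((lo + i * step, lo + j * step) for i in range(n) for j in range(n))
    return min(candidates, key=lambda p: ls_cost(p, anchors, measured))


# Four anchors at the corners of a 10 x 10 area (hypothetical layout).
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
true_pos = (3.0, 7.0)
measured = range_differences(true_pos, anchors)  # noise-free for clarity
estimate = grid_search(anchors, measured)
```

The grid search evaluates the nonconvex cost everywhere, which is exactly the expense that a reformulation such as the one in the abstract is meant to avoid.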

A fast method to cluster networks that include both randomness and structure, with a nice summary of existing clustering algorithms

Blondel V.D., Guillaume J.-L., Lambiotte R., Lefebvre E., Fast unfolding of communities in large networks, J. Stat. Mech. Theory Exp., 2008 (10), Article P10008, DOI: 10.1088/1742-5468/2008/10/P10008.

We propose a simple method to extract the community structure of large networks. Our method is a heuristic method that is based on modularity optimization. It is shown to outperform all other known community detection methods in terms of computation time. Moreover, the quality of the communities detected is very good, as measured by the so-called modularity. This is shown first by identifying language communities in a Belgian mobile phone network of 2 million customers and by analysing a web graph of 118 million nodes and more than one billion links. The accuracy of our algorithm is also verified on ad hoc modular networks.
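The quantity being optimized, modularity, is easy to compute for a given partition. The sketch below evaluates it on a small hypothetical graph; it does not implement the paper's heuristic, only the score the heuristic tries to increase. It assumes an undirected, unweighted graph without self-loops, and the graph and function name are illustrative.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs; community: dict mapping node -> community label.
    Q = sum over communities c of [ L_c / m - (d_c / (2 m))^2 ],
    where L_c is the number of edges inside c, d_c the total degree of c,
    and m the total number of edges.
    """
    m = len(edges)
    internal = defaultdict(int)
    degree = defaultdict(int)
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge edge (a hypothetical toy graph).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {n: "all" for n in range(6)}

q_two = modularity(edges, two_groups)   # 5/14, about 0.357
q_one = modularity(edges, one_group)    # 0.0
```

Splitting the graph into its two triangles scores higher than lumping all nodes together, which matches the intuition that modularity rewards partitions with more internal edges than expected by chance.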

Towards the emergence of obstacle avoidance through collisions

Qian F, Koditschek DE., An obstacle disturbance selection framework: emergent robot steady states under repeated collisions, The International Journal of Robotics Research. 2020;39(13):1549-1566, DOI: 10.1177/0278364920935514.

Natural environments are often filled with obstacles and disturbances. Traditional navigation and planning approaches normally depend on finding a traversable “free space” for robots to avoid unexpected contact or collision. We hypothesize that with a better understanding of the robot–obstacle interactions, these collisions and disturbances can be exploited as opportunities to improve robot locomotion in complex environments. In this article, we propose a novel obstacle disturbance selection (ODS) framework with the aim of allowing robots to actively select disturbances to achieve environment-aided locomotion. Using an empirically characterized relationship between leg–obstacle contact position and robot trajectory deviation, we simplify the representation of the obstacle-filled physical environment to a horizontal-plane disturbance force field. We then treat each robot leg as a “disturbance force selector” for prediction of obstacle-modulated robot dynamics. Combining the two representations provides analytical insights into the effects of gaits on legged traversal in cluttered environments. We illustrate the predictive power of the ODS framework by studying the horizontal-plane dynamics of a quadrupedal robot traversing an array of evenly-spaced cylindrical obstacles with both bounding and trotting gaits. Experiments corroborate numerical simulations that reveal the emergence of a stable equilibrium orientation in the face of repeated obstacle disturbances. The ODS reduction yields closed-form analytical predictions of the equilibrium position for different robot body aspect ratios, gait patterns, and obstacle spacings. We conclude with speculative remarks bearing on the prospects for novel ODS-based gait control schemes for shaping robot navigation in perturbation-rich environments.

A new contribution along the DESPOT line focused on hybrid CPU+GPU platforms

Cai P, Luo Y, Hsu D, Lee WS., HyP-DESPOT: A hybrid parallel algorithm for online planning under uncertainty, The International Journal of Robotics Research. 2021;40(2-3):558-573, DOI: 10.1177/0278364920937074.

Robust planning under uncertainty is critical for robots in uncertain, dynamic environments, but incurs high computational cost. State-of-the-art online search algorithms, such as DESPOT, have vastly improved the computational efficiency of planning under uncertainty and made it a valuable tool for robotics in practice. This work takes one step further by leveraging both CPU and GPU parallelization in order to achieve real-time online planning performance for complex tasks with large state, action, and observation spaces. Specifically, Hybrid Parallel DESPOT (HyP-DESPOT) is a massively parallel online planning algorithm that integrates CPU and GPU parallelism in a multi-level scheme. It performs parallel DESPOT tree search by simultaneously traversing multiple independent paths using multi-core CPUs; it performs parallel Monte Carlo simulations at the leaf nodes of the search tree using GPUs. HyP-DESPOT provably converges in finite time under moderate conditions and guarantees near-optimality of the solution. Experimental results show that HyP-DESPOT speeds up online planning by up to a factor of several hundred in several challenging robotic tasks in simulation, compared with the original DESPOT algorithm. It also exhibits real-time performance on a robot vehicle navigating among many pedestrians.

Abstraction of controllers

Stanley W. Smith, Murat Arcak, Majid Zamani, Approximate abstractions of control systems with an application to aggregation, Automatica, 119 (2020) DOI: 10.1016/j.automatica.2020.109065.

Previous approaches to constructing abstractions for control systems rely on geometric conditions or, in the case of an interconnected control system, a condition on the interconnection topology. Since these conditions are not always satisfiable, we relax the restrictions on the choice of abstractions, instead opting to select ones which nearly satisfy such conditions via optimization-based approaches. To quantify the resulting effect on the error between the abstraction and concrete control system, we introduce the notions of practical simulation functions and practical storage functions. We show that our approach facilitates the procedure of aggregation, where one creates an abstraction by partitioning agents into aggregate areas. We demonstrate the results on an application where we regulate the temperature in three separate zones of a building.