Reinforcement learning for recovering legged robots from damage

Konstantinos Chatzilygeroudis, Vassilis Vassiliades, Jean-Baptiste Mouret, Reset-free Trial-and-Error Learning for Robot Damage Recovery, Robotics and Autonomous Systems, Volume 100, 2018, Pages 236-250, DOI: 10.1016/j.robot.2017.11.010.

The high probability of hardware failures prevents many advanced robots (e.g., legged robots) from being confidently deployed in real-world situations (e.g., post-disaster rescue). Instead of attempting to diagnose the failures, robots could adapt by trial-and-error in order to be able to complete their tasks. In this situation, damage recovery can be seen as a Reinforcement Learning (RL) problem. However, the best RL algorithms for robotics require the robot and the environment to be reset to an initial state after each episode, that is, the robot is not learning autonomously. In addition, most of the RL methods for robotics do not scale well with complex robots (e.g., walking robots) and either cannot be used at all or take too long to converge to a solution (e.g., hours of learning). In this paper, we introduce a novel learning algorithm called “Reset-free Trial-and-Error” (RTE) that (1) breaks the complexity by pre-generating hundreds of possible behaviors with a dynamics simulator of the intact robot, and (2) allows complex robots to quickly recover from damage while completing their tasks and taking the environment into account. We evaluate our algorithm on a simulated wheeled robot, a simulated six-legged robot, and a real six-legged walking robot that are damaged in several ways (e.g., a missing leg, a shortened leg, faulty motor, etc.) and whose objective is to reach a sequence of targets in an arena. Our experiments show that the robots can recover most of their locomotion abilities in an environment with obstacles, and without any human intervention.

SLAM based on intervals

Mohamed Mustafa, Alexandru Stancu, Nicolas Delanoue, Eduard Codres, Guaranteed SLAM—An interval approach, Robotics and Autonomous Systems, Volume 100, 2018, Pages 160-170, DOI: 10.1016/j.robot.2017.11.009.

This paper proposes a new approach, interval Simultaneous Localization and Mapping (i-SLAM), which addresses the robotic mapping problem in the context of interval methods, where the robot sensor noise is assumed bounded. With no prior knowledge about the noise distribution or its probability density function, we derive and present necessary conditions to guarantee the map convergence even in the presence of nonlinear observation and motion models. These conditions may require the presence of some anchoring landmarks with known locations. The performance of i-SLAM is compared with the probabilistic counterparts in terms of accuracy and efficiency.
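
The bounded-noise idea can be illustrated with a minimal one-dimensional sketch. This is not the i-SLAM algorithm from the paper: the `Interval` class, the `predict` and `correct` helpers, and the 1D setting are all assumptions made for illustration. It only shows how an interval that is guaranteed to contain the robot grows under bounded odometry error and shrinks when a landmark with a known location (an anchor) is observed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] guaranteed to contain the true value."""
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def intersect(self, other):
        lo, hi = max(self.lo, other.lo), min(self.hi, other.hi)
        if lo > hi:
            raise ValueError("empty intersection: the bounds are inconsistent")
        return Interval(lo, hi)

    def width(self):
        return self.hi - self.lo


def predict(robot, odometry, noise_bound):
    """Motion step: the robot interval grows by the bounded odometry error."""
    return robot + Interval(odometry - noise_bound, odometry + noise_bound)


def correct(robot, landmark, measured_offset, noise_bound):
    """Observation z = landmark - robot + e with |e| <= noise_bound.

    Both intervals are contracted by intersecting them with the set of
    values that are consistent with the measurement.
    """
    z = Interval(measured_offset - noise_bound, measured_offset + noise_bound)
    new_robot = robot.intersect(landmark - z)
    new_landmark = landmark.intersect(new_robot + z)
    return new_robot, new_landmark


if __name__ == "__main__":
    # Hypothetical numbers: the true robot position after the move is 1.1.
    robot = predict(Interval(0.0, 0.0), odometry=1.0, noise_bound=0.2)
    anchor = Interval(5.0, 5.0)  # anchoring landmark with known location
    robot, anchor = correct(robot, anchor, measured_offset=3.93, noise_bound=0.05)
    print(robot, robot.width())
```

No probability density is used anywhere in the sketch; the only assumption about the noise is its bound, which is the setting the abstract describes.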

Solving discounted MDPs to minimize reward variance under a constraint on the mean, instead of maximizing the expected (discounted) reward

Li Xia, Mean–variance optimization of discrete time discounted Markov decision processes, Automatica, Volume 88, 2018, Pages 76-82, DOI: 10.1016/j.automatica.2017.11.012.

In this paper, we study a mean–variance optimization problem in an infinite horizon discrete time discounted Markov decision process (MDP). The objective is to minimize the variance of system rewards with the constraint of mean performance. Unlike most works in the literature, which require the mean performance to have already reached its optimum, we can let the discounted performance equal any constant. The difficulty of this problem is caused by the quadratic form of the variance function, which makes the variance minimization problem not a standard MDP. By proving the decomposable structure of the feasible policy space, we transform this constrained variance minimization problem to an equivalent unconstrained MDP under a new discounted criterion and a new reward function. The difference of the variances of Markov chains under any two feasible policies is quantified by a difference formula. Based on the variance difference formula, a policy iteration algorithm is developed to find the optimal policy. We also prove the optimality of deterministic policy over the randomized policy generated in the mean-constrained policy space. Numerical experiments demonstrate the effectiveness of our approach.

A framework for the performance analysis of collaborative network clock synchronization

Y. Xiong, N. Wu, Y. Shen and M. Z. Win, Cooperative Network Synchronization: Asymptotic Analysis, IEEE Transactions on Signal Processing, vol. 66, no. 3, pp. 757-772, DOI: 10.1109/TSP.2017.2759098.

Accurate clock synchronization is required for collaborative operations among nodes across wireless networks. Compared with traditional layer-by-layer methods, cooperative network synchronization techniques lead to significant improvement in performance, efficiency, and robustness. This paper develops a framework for the performance analysis of cooperative network synchronization. We introduce the concepts of cooperative dilution intensity (CDI) and relative CDI to characterize the interaction between agents, which can be interpreted as properties of a random walk over the network. Our approach enables us to derive closed-form asymptotic expressions of performance limits, relating them to the quality of observations as well as the network topology.

Simultaneous localization and clock synchronization (apparently only offsets are estimated) in wireless networks

Y. Liu, Y. Shen, D. Guo and M. Z. Win, Network Localization and Synchronization Using Full-Duplex Radios, IEEE Transactions on Signal Processing, vol. 66, no. 3, pp. 714-728, DOI: 10.1109/TSP.2017.2770090.

Both localization and synchronization of mobile nodes are important for wireless networks. In this paper, we propose new methods for network localization and synchronization (NLS) using full-duplex radios through only two frames of transmission. Specifically, all nodes simultaneously transmit their signature signals in the first frame, while receiving others’ signals via full-duplex radios. In the second frame, nodes transmit either scrambled versions of their received signals in the first frame or a digital packet of the channel parameter estimates of the received signals. We develop distributed algorithms to estimate the arrival times of different components in the received signals. These arrival times are then used to determine the local network geometry and clock offsets. The Cramér-Rao lower bounds for internode distances and clock offsets are derived, and the former can be translated into error bounds of the node positions. Compared with conventional frequency division duplex or time-division duplex, we demonstrate the high efficiency of NLS using full-duplex radios, revealing its potential beyond data communications in future wireless networks.
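
How arrival times can yield both a distance and a clock offset is easiest to see in the classical two-node, two-way exchange, sketched below. This is not the paper's full-duplex, multi-node algorithm. The function name `two_way_estimate`, the symmetric propagation delay, and the absence of clock drift between the two transmissions are assumptions of this sketch.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def two_way_estimate(t1, t2, t3, t4):
    """Estimate distance and clock offset from four timestamps.

    t1: node A transmits (read on A's clock)
    t2: node B receives  (read on B's clock)
    t3: node B transmits (read on B's clock)
    t4: node A receives  (read on A's clock)

    Returns (distance in metres, offset in seconds), where the offset is
    B's clock minus A's clock.
    """
    forward = t2 - t1    # time of flight + offset
    backward = t4 - t3   # time of flight - offset
    time_of_flight = 0.5 * (forward + backward)
    offset = 0.5 * (forward - backward)
    return time_of_flight * SPEED_OF_LIGHT, offset


if __name__ == "__main__":
    # Hypothetical scenario: nodes 30 m apart, B's clock 2.5 microseconds ahead.
    tof = 30.0 / SPEED_OF_LIGHT
    true_offset = 2.5e-6
    t1 = 0.0
    t2 = t1 + tof + true_offset
    t3 = t2 + 1e-3                 # B replies one millisecond later
    t4 = t3 - true_offset + tof
    print(two_way_estimate(t1, t2, t3, t4))
```

Summing the two one-way differences cancels the offset and leaves the time of flight; subtracting them cancels the time of flight and leaves the offset. Only offsets are recovered here, which matches the remark in the note above.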

Resampling point clouds to reduce their size without compromising their utility for the tasks that use them

S. Chen, D. Tian, C. Feng, A. Vetro and J. Kovačević, Fast Resampling of Three-Dimensional Point Clouds via Graphs, IEEE Transactions on Signal Processing, vol. 66, no. 3, pp. 666-681, DOI: 10.1109/TSP.2017.2771730.

To reduce the cost of storing, processing, and visualizing a large-scale point cloud, we propose a randomized resampling strategy that selects a representative subset of points while preserving application-dependent features. The strategy is based on graphs, which can represent underlying surfaces and lend themselves well to efficient computation. We use a general feature-extraction operator to represent application-dependent features and propose a general reconstruction error to evaluate the quality of resampling; by minimizing the error, we obtain a general form of optimal resampling distribution. The proposed resampling distribution is guaranteed to be shift-, rotation- and scale-invariant in the three-dimensional space. We then specify the feature-extraction operator to be a graph filter and study specific resampling strategies based on all-pass, low-pass, high-pass graph filtering and graph filter banks. We validate the proposed methods on three applications: large-scale visualization, accurate registration, and robust shape modeling, demonstrating the effectiveness and efficiency of the proposed resampling methods.
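
A minimal sketch of high-pass, graph-based resampling follows. It is a simplified stand-in, not the authors' implementation: the brute-force k-nearest-neighbour graph, the particular filter (each point minus the mean of its neighbours), sampling weights proportional to the magnitude of the filter response, and sampling with replacement are all assumptions chosen for illustration.

```python
import math
import random


def high_pass_response(points, k=4):
    """Magnitude of (point - mean of its k nearest neighbours) per point.

    This acts as a simple high-pass graph filter on a k-NN graph: the
    response is near zero on flat, regularly sampled regions and large at
    edges, corners, and outliers.
    """
    responses = []
    for i, p in enumerate(points):
        neighbours = sorted(
            (q for j, q in enumerate(points) if j != i),
            key=lambda q: math.dist(p, q),
        )[:k]
        mean = tuple(sum(c) / len(neighbours) for c in zip(*neighbours))
        responses.append(math.dist(p, mean))
    return responses


def resample(points, n_samples, k=4, seed=0):
    """Draw point indices with probability proportional to the response."""
    weights = high_pass_response(points, k)
    if sum(weights) == 0.0:
        weights = [1.0] * len(points)  # featureless cloud: sample uniformly
    rng = random.Random(seed)
    return rng.choices(range(len(points)), weights=weights, k=n_samples)


if __name__ == "__main__":
    # Hypothetical cloud: a flat 5 x 5 grid plus one point above the plane.
    cloud = [(float(x), float(y), 0.0) for x in range(5) for y in range(5)]
    cloud.append((2.0, 2.0, 2.0))
    print(resample(cloud, n_samples=8))
```

Because the response depends only on differences between neighbouring points, it is unchanged by shifting or rotating the cloud, and uniform scaling multiplies every weight by the same factor, so the normalized sampling distribution is unchanged. The brute-force neighbour search is quadratic in the number of points and is only suitable for small examples.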

A new approach to SLAM based on KF but without linearization

Feng Tan, Winfried Lohmiller, and Jean-Jacques Slotine, Analytical SLAM without linearization, The International Journal of Robotics Research, Vol 36, Issue 13-14, pp. 1554–1578, DOI: 10.1177/0278364917710541.

This paper solves the classical problem of simultaneous localization and mapping (SLAM) in a fashion that avoids linearized approximations altogether. Based on the creation of virtual synthetic measurements, the algorithm uses a linear time-varying Kalman observer, bypassing errors and approximations brought by the linearization process in traditional extended Kalman filtering SLAM. Convergence rates of the algorithm are established using contraction analysis. Different combinations of sensor information can be exploited, such as bearing measurements, range measurements, optical flow, or time-to-contact. SLAM-DUNK, a more advanced version of the algorithm in global coordinates, exploits the conditional independence property of the SLAM problem, decoupling the covariance matrices between different landmarks and reducing computational complexity to O(n). As illustrated in simulations, the proposed algorithm can solve SLAM problems in both 2D and 3D scenarios with guaranteed convergence rates in a full nonlinear context.
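
The virtual-measurement idea can be sketched for a single bearing-only landmark. A bearing theta from robot position r to landmark p satisfies sin(theta)(p_x - r_x) - cos(theta)(p_y - r_y) = 0, so y = H(theta) r can be treated as a measurement of H(theta) p that is linear in p, with a time-varying row H(theta) = [sin(theta), -cos(theta)]. The sketch below is a simplification and not the paper's algorithm: it assumes the robot positions are known (mapping only), a static landmark, global-frame bearings, and a constant virtual-measurement variance `r_var`. All function names are hypothetical.

```python
import math


def bearing_update(x, P, robot, theta, r_var=1e-4):
    """One linear Kalman update using the virtual measurement y = H @ robot."""
    h = (math.sin(theta), -math.cos(theta))      # time-varying row H(theta)
    y = h[0] * robot[0] + h[1] * robot[1]        # virtual measurement
    innovation = y - (h[0] * x[0] + h[1] * x[1])
    Ph = (P[0][0] * h[0] + P[0][1] * h[1],
          P[1][0] * h[0] + P[1][1] * h[1])       # P @ H^T
    s = h[0] * Ph[0] + h[1] * Ph[1] + r_var      # innovation variance
    k = (Ph[0] / s, Ph[1] / s)                   # Kalman gain
    x_new = (x[0] + k[0] * innovation, x[1] + k[1] * innovation)
    # P - K H P, using the symmetry of P so that H @ P equals (P @ H^T)^T.
    P_new = [[P[i][j] - k[i] * Ph[j] for j in range(2)] for i in range(2)]
    return x_new, P_new


def estimate_landmark(landmark, robot_positions):
    """Simulate noise-free bearings and run the filter from a vague prior."""
    x, P = (0.0, 0.0), [[100.0, 0.0], [0.0, 100.0]]
    for robot in robot_positions:
        theta = math.atan2(landmark[1] - robot[1], landmark[0] - robot[0])
        x, P = bearing_update(x, P, robot, theta)
    return x


if __name__ == "__main__":
    path = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0), (3.0, -1.0), (0.0, 2.0)]
    print(estimate_landmark((4.0, 3.0), path))
```

No Jacobian of the bearing function is ever evaluated at the current estimate: the measured angle enters only through H(theta), which is why the update is that of a linear time-varying filter rather than an extended Kalman filter.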

A new method for estimating attitude from uncertain inertial sensor observations

M. Ghobadi, P. Singla and E. T. Esfahani, Robust Attitude Estimation from Uncertain Observations of Inertial Sensors Using Covariance Inflated Multiplicative Extended Kalman Filter, IEEE Transactions on Instrumentation and Measurement, vol. 67, no. 1, pp. 209-217, DOI: 10.1109/TIM.2017.2761230.

This paper presents an attitude estimation method from uncertain observations of inertial sensors, which is highly robust against different uncertainties. The proposed method of covariance inflated multiplicative extended Kalman filter (CI-MEKF) takes advantage of the non-singularity of covariance in MEKF as well as a novel covariance inflation (CI) approach to fuse inconsistent information. The proposed CI approach compensates for the undesired effect of magnetic distortion and body acceleration (as inherent biases of magnetometer and accelerometer sensor data, respectively) on the estimated attitude. Moreover, the CI-MEKF can accurately estimate the gyro bias. A number of simulation scenarios are designed to compare the performance of the proposed method with the state of the art in attitude estimation. The results show the proposed method outperforms the state of the art in terms of estimation accuracy and robustness. Moreover, the proposed CI-MEKF method is shown to be significantly robust against different uncertainties, such as large body acceleration, magnetic distortion, and errors in the initial condition of the attitude.

Deep reinforcement learning applied to learn both attention and classification in a task of vehicle classification

D. Zhao, Y. Chen and L. Lv, Deep Reinforcement Learning With Visual Attention for Vehicle Classification, IEEE Transactions on Cognitive and Developmental Systems, vol. 9, no. 4, pp. 356-367, DOI: 10.1109/TCDS.2016.2614675.

Automatic vehicle classification is crucial to intelligent transportation systems, especially for vehicle tracking by police. Due to the complex lighting and image capture conditions, image-based vehicle classification in real-world environments is still a challenging task and the performance is far from being satisfactory. However, owing to the mechanism of visual attention, the human vision system shows remarkable capability compared with the computer vision system, especially in distinguishing subtle differences. Inspired by this mechanism, we propose a convolutional neural network (CNN) model of visual attention for image classification. A visual attention-based image processing module is used to highlight one part of an image and weaken the others, generating a focused image. Then the focused image is input into the CNN to be classified. According to the classification probability distribution, we compute the information entropy to guide a reinforcement learning agent toward a better policy for selecting the key parts of an image. Systematic experiments on a surveillance-nature dataset, which contains images captured by surveillance cameras in the front view, demonstrate that the proposed model is more competitive than the large-scale CNN in vehicle classification tasks.
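
The entropy signal mentioned in the abstract can be sketched as follows. The Shannon entropy of the classifier's output distribution is standard; using the entropy reduction between the unfocused and the focused image as the agent's reward is an assumption of this sketch, since the abstract does not state the exact reward. The function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def attention_reward(logits_before, logits_after):
    """Hypothetical reward: how much the focused image reduced the entropy."""
    return entropy(softmax(logits_before)) - entropy(softmax(logits_after))


if __name__ == "__main__":
    # Made-up scores: an undecided classifier versus a confident one.
    print(attention_reward([0.1, 0.0, -0.1], [4.0, 0.0, -1.0]))
```

A low entropy means the CNN is confident about one class, so an attention action that lowers the entropy is, under this assumption, one that exposed a more discriminative part of the image.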

Using deep learning for extracting features from range data for place classification

Y. Liao, S. Kodagoda, Y. Wang, L. Shi and Y. Liu, Place Classification With a Graph Regularized Deep Neural Network, IEEE Transactions on Cognitive and Developmental Systems, vol. 9, no. 4, pp. 304-315, DOI: 10.1109/TCDS.2016.2586183.

Place classification is a fundamental ability that a robot should possess to carry out effective human-robot interactions. In recent years, artificial intelligence algorithms have been heavily exploited in robotics applications. Inspired by the recent successes of deep learning methods, we propose an end-to-end learning approach for the place classification problem. With deep architectures, this methodology automatically discovers features and contributes in general to higher classification accuracies. The pipeline of our approach is composed of three parts. First, we construct multiple layers of laser range data to represent the environment information in different levels of granularity. Second, each layer of data is fed into a deep neural network for classification, where a graph regularization is imposed on the deep architecture for keeping local consistency between adjacent samples. Finally, the predicted labels obtained from all layers are fused based on confidence trees to maximize the overall confidence. Experimental results validate the effectiveness of our end-to-end place classification framework, in which both the multilayer structure and the graph regularization promote the classification performance. Furthermore, results show that the features automatically learned from the raw input range data can achieve competitive results to the features constructed based on statistical and geometrical information.
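
The local-consistency idea behind graph regularization can be sketched as a penalty on the network outputs of adjacent samples. This is a generic graph-Laplacian-style regularizer, not necessarily the exact term used in the paper. The chain graph over consecutive scans, the unit weights, and the function names are assumptions of this sketch.

```python
def graph_regularizer(features, edges):
    """Sum of w * ||f_i - f_j||^2 over the weighted edges (i, j, w)."""
    total = 0.0
    for i, j, w in edges:
        total += w * sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
    return total


def chain_edges(n, weight=1.0):
    """Hypothetical adjacency: each sample is linked to the next one in time."""
    return [(i, i + 1, weight) for i in range(n - 1)]


if __name__ == "__main__":
    edges = chain_edges(3)
    smooth = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]]   # one change of place
    jumpy = [[0.0, 0.0], [1.0, 1.0], [0.0, 0.0]]    # label flickers back
    print(graph_regularizer(smooth, edges), graph_regularizer(jumpy, edges))
```

Adding such a term to the classification loss, scaled by a tunable coefficient, would penalize predictions that flicker between neighbouring range scans while still allowing a genuine transition between places.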