Filters with quaternions for localization

Rangaprasad Arun Srivatsan, Mengyun Xu, Nicolas Zevallos, and Howie Choset, Probabilistic pose estimation using a Bingham distribution-based linear filter, The International Journal of Robotics Research DOI: 10.1177/0278364918778353.

Pose estimation is central to several robotics applications such as registration, hand–eye calibration, and simultaneous localization and mapping (SLAM). Online pose estimation methods typically use Gaussian distributions to describe the uncertainty in the pose parameters. Such a description can be inadequate when using parameters such as unit quaternions that are not unimodally distributed. A Bingham distribution can effectively model the uncertainty in unit quaternions, as it has antipodal symmetry, and is defined on a unit hypersphere. A combination of Gaussian and Bingham distributions is used to develop a truly linear filter that accurately estimates the distribution of the pose parameters. The linear filter, however, comes at the cost of state-dependent measurement uncertainty. Using results from stochastic theory, we show that the state-dependent measurement uncertainty can be evaluated exactly. To show the broad applicability of this approach, we derive linear measurement models for applications that use position, surface-normal, and pose measurements. Experiments assert that this approach is robust to initial estimation errors as well as sensor noise. Compared with state-of-the-art methods, our approach takes fewer iterations to converge onto the correct pose estimate. The efficacy of the formulation is illustrated with a number of examples on standard datasets as well as real-world experiments.
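The antipodal symmetry mentioned in the abstract can be checked numerically. The sketch below is only an illustration, not the filter from the paper: it assumes the (w, x, y, z) quaternion convention and a Bingham density of the form exp(q^T Z q) with the orientation matrix taken as the identity. The names quat_rotate and bingham_log_unnormalized and the concentration values are hypothetical.

```python
import math

def quat_rotate(q, v):
    """Rotate the 3-vector v by the unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    # Standard rotation matrix of a unit quaternion; every entry is
    # quadratic in q, so q and -q give the same matrix.
    r = [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]
    return [sum(r[i][j] * v[j] for j in range(3)) for i in range(3)]

def bingham_log_unnormalized(q, z_diag):
    """Log of exp(q^T Z q) for diagonal Z (assumption: orientation matrix M = I)."""
    return sum(z * c * c for z, c in zip(z_diag, q))

# A 90 degree rotation about the z axis, and its antipode on the unit hypersphere.
q = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
neg_q = tuple(-c for c in q)

# Hypothetical concentration parameters (non-positive, one per component).
z_diag = (0.0, -10.0, -10.0, -10.0)

same_rotation = quat_rotate(q, [1.0, 0.0, 0.0]), quat_rotate(neg_q, [1.0, 0.0, 0.0])
same_density = bingham_log_unnormalized(q, z_diag), bingham_log_unnormalized(neg_q, z_diag)
```

Both pairs agree: q and -q encode the same rotation and receive the same density, which a single Gaussian over the four quaternion components cannot express.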

A robot architecture for humanoids able to coordinate different cognitive processes (perception, decision-making, etc.) in a hierarchical fashion

J. Hwang and J. Tani, Seamless Integration and Coordination of Cognitive Skills in Humanoid Robots: A Deep Learning Approach, IEEE Transactions on Cognitive and Developmental Systems, vol. 10, no. 2, pp. 345-358 DOI: 10.1109/TCDS.2017.2714170.

This paper investigates how adequate coordination among the different cognitive processes of a humanoid robot can be developed through end-to-end learning of direct perception of the visuomotor stream. We propose a deep dynamic neural network model built on a dynamic vision network, a motor generation network, and a higher-level network. The proposed model was designed to process and to integrate direct perception of dynamic visuomotor patterns in a hierarchical model characterized by different spatial and temporal constraints imposed on each level. We conducted synthetic robotic experiments in which a robot learned to read a human’s intention by observing gestures and then to generate the corresponding goal-directed actions. Results verify that the proposed model is able to learn the tutored skills and to generalize them to novel situations. The model showed synergic coordination of perception, action, and decision making, and it integrated and coordinated a set of cognitive skills including visual perception, intention reading, attention switching, working memory, action preparation, and execution in a seamless manner. Analysis reveals that coherent internal representations emerged at each level of the hierarchy. Higher-level representations reflecting actional intention developed by means of continuous integration of the lower-level visuo-proprioceptive stream.

An interesting model of the basal ganglia that performs similarly to Q-learning when applied to a robot

Y. Zeng, G. Wang and B. Xu, A Basal Ganglia Network Centric Reinforcement Learning Model and Its Application in Unmanned Aerial Vehicle, IEEE Transactions on Cognitive and Developmental Systems, vol. 10, no. 2, pp. 290-303 DOI: 10.1109/TCDS.2017.2649564.

Reinforcement learning brings flexibility and generality to machine learning, but most approaches are driven by mathematical optimization and lack cognitive and neural evidence. To provide a foundation driven more by cognitive and neural mechanisms, and to validate its applicability in complex tasks, we develop a basal ganglia (BG) network centric reinforcement learning model. Compared to existing work on modeling BG, this paper is unique in the following respects: 1) the orbitofrontal cortex (OFC) is taken into consideration. OFC is critical in decision making because of its responsibility for reward representation and is critical in controlling the learning process, yet most BG centric models do not include it; 2) to compensate for the inaccurate memory of numeric values, precise encoding is proposed to enable the working memory system to remember important values during the learning process. The method combines vector convolution with the idea of storage by digit bit and is efficient for accurate value storage; and 3) for information coding, the Hodgkin-Huxley model is used to obtain a more biologically plausible description of the action potential with plenty of ionic activities. To validate the effectiveness of the proposed model, we apply it to the unmanned aerial vehicle (UAV) autonomous learning process in a 3-D environment. Experimental results show that our model gives the UAV the ability to explore the environment freely and has a learning speed comparable to that of the Q learning algorithm, while its major advantage is a solid cognitive and neural basis.
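For readers unfamiliar with the Q learning baseline that the abstract compares against, the following is a minimal sketch of the standard tabular update rule. It is not the BG model from the paper; the function name q_learning_update, the state and action labels, and the values of alpha and gamma are assumptions chosen for illustration.

```python
def q_learning_update(q_table, state, action, reward, next_state, actions,
                      alpha=0.5, gamma=0.9):
    """Apply one tabular Q learning update in place and return the new value."""
    # Greedy estimate of the value of the next state (0.0 for unseen pairs).
    best_next = max(q_table.get((next_state, a), 0.0) for a in actions)
    old = q_table.get((state, action), 0.0)
    # Move the old estimate toward the bootstrapped target.
    new = old + alpha * (reward + gamma * best_next - old)
    q_table[(state, action)] = new
    return new

# Hypothetical two-action problem: the next state already has one known value.
actions = ["left", "right"]
q_table = {("s1", "right"): 2.0}
updated = q_learning_update(q_table, "s0", "left", reward=1.0,
                            next_state="s1", actions=actions)
```

With these numbers the target is 1.0 + 0.9 * 2.0 = 2.8, and the estimate for ("s0", "left") moves halfway from 0.0 toward it.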

Interesting study about how to quantify the uncertainty in SLAM and preserve its monotonic growth, which is needed for good decision making in active SLAM

M. L. Rodríguez-Arévalo, J. Neira and J. A. Castellanos, On the Importance of Uncertainty Representation in Active SLAM, IEEE Transactions on Robotics, vol. 34, no. 3, pp. 829-834 DOI: 10.1109/TRO.2018.2808902.

The purpose of this work is to highlight the paramount importance of representing and quantifying uncertainty to correctly report the associated confidence of the robot’s location estimate at each time step along its trajectory and therefore decide the correct course of action in an active SLAM mission. We analyze the monotonicity property of different decision-making criteria, both in 2-D and 3-D, with respect to the representation of uncertainty and of the orientation of the robot’s pose. Monotonicity, the property that uncertainty increases as the robot moves, is essential for adequate decision making. We analytically show that, by using differential representations to propagate spatial uncertainties, monotonicity is preserved for all optimality criteria, A-opt, D-opt, and E-opt, and for Shannon’s entropy. We also show that monotonicity does not hold for any criteria in absolute representations using Roll-Pitch-Yaw and Euler angles. Finally, using unit quaternions in absolute representations, the only criteria that preserve monotonicity are D-opt and Shannon’s entropy.
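The criteria named in the abstract are scalar summaries of a covariance matrix. The sketch below uses one common convention (A-opt as the trace, D-opt as the determinant, E-opt as the largest eigenvalue, and the Gaussian entropy), which may differ in detail from the paper's definitions. It is restricted to 2x2 symmetric matrices so the eigenvalue has a closed form, and the covariance and noise values are hypothetical.

```python
import math

def criteria_2x2(cov):
    """Uncertainty criteria for a symmetric 2x2 covariance [[a, b], [b, d]]."""
    (a, b), (_, d) = cov
    trace = a + d
    det = a * d - b * b
    # Largest eigenvalue of a symmetric 2x2 matrix in closed form.
    lam_max = trace / 2 + math.sqrt((a - d) ** 2 / 4 + b * b)
    # Differential entropy of a 2-D Gaussian with this covariance.
    entropy = 0.5 * math.log((2 * math.pi * math.e) ** 2 * det)
    return {"A-opt": trace, "D-opt": det, "E-opt": lam_max, "entropy": entropy}

def add_noise(cov, noise):
    """Toy open-loop propagation: add process noise to the covariance."""
    return [[c + n for c, n in zip(crow, nrow)] for crow, nrow in zip(cov, noise)]

before = criteria_2x2([[2.0, 0.0], [0.0, 1.0]])
after = criteria_2x2(add_noise([[2.0, 0.0], [0.0, 1.0]],
                               [[0.5, 0.0], [0.0, 0.5]]))
```

In this toy case every criterion grows after the noise is added, which is the monotonic behavior the paper studies. The paper's point is that this property can fail once poses are composed in absolute representations, which this additive example does not model.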

A new algorithm that provably converges to a global clock consensus in a network

Miloš S. Stanković, Srdjan S. Stanković, Karl Henrik Johansson, Distributed time synchronization for networks with random delays and measurement noise, Automatica, Volume 93, 2018, Pages 126-137 DOI: 10.1016/j.automatica.2018.03.054.

In this paper a new distributed asynchronous algorithm is proposed for time synchronization in networks with random communication delays, measurement noise and communication dropouts. Three different types of the drift correction algorithm are introduced, based on different kinds of local time increments. Under nonrestrictive conditions concerning network properties, it is proved that all the algorithm types provide convergence in the mean square sense and with probability one (w.p.1) of the corrected drifts of all the nodes to the same value (consensus). An estimate of the convergence rate of these algorithms is derived. For offset correction, a new algorithm is proposed containing a compensation parameter coping with the influence of random delays and special terms taking care of the influence of both linearly increasing time and drift correction. It is proved that the corrected offsets of all the nodes converge in the mean square sense and w.p.1. An efficient offset correction algorithm based on consensus on local compensation parameters is also proposed. It is shown that the overall time synchronization algorithm can also be implemented as a flooding algorithm with one reference node. It is proved that it is possible to achieve bounded error between local corrected clocks in the mean square sense and w.p.1. Simulation results provide an additional practical insight into the algorithm properties and show its advantage over the existing methods.
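As a rough intuition for what consensus on drifts means, the sketch below runs plain synchronous averaging on a small ring network. It deliberately omits everything that makes the paper's setting hard (asynchrony, random delays, measurement noise, dropouts), so it is not the proposed algorithm. The function consensus_step, the step size, the topology, and the drift values are all assumptions.

```python
def consensus_step(values, neighbors, step=0.3):
    """Move each node's value toward the values of its neighbors."""
    return [
        x + step * sum(values[j] - x for j in neighbors[i])
        for i, x in enumerate(values)
    ]

# Hypothetical ring of four nodes, each with its own clock drift estimate.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
drifts = [1.0, 1.2, 0.8, 1.1]
initial_mean = sum(drifts) / len(drifts)

for _ in range(50):
    drifts = consensus_step(drifts, neighbors)

spread = max(drifts) - min(drifts)
```

Because the links are symmetric, the average is preserved at every step, and with a step size below the reciprocal of the node degree the spread between nodes shrinks toward zero.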

Evaluating the safety of a motion plan for mobile robot navigation

Brian Axelrod, Leslie Pack Kaelbling, and Tomás Lozano-Pérez, Provably safe robot navigation with obstacle uncertainty, The International Journal of Robotics Research Vol 37, Issue 7 DOI: 10.1177/0278364918778338.

As drones and autonomous cars become more widespread, it is becoming increasingly important that robots can operate safely under realistic conditions. The noisy information fed into real systems means that robots must use estimates of the environment to plan navigation. Efficiently guaranteeing that the resulting motion plans are safe under these circumstances has proved difficult. We examine how to guarantee that a trajectory or policy has at most ϵ collision probability (ϵ-safe) with only imperfect observations of the environment. We examine the implications of various mathematical formalisms of safety and arrive at a mathematical notion of safety of a long-term execution, even when conditioned on observational information. We explore the idea of shadows that generalize the notion of a confidence set to estimated shapes and present a theorem that allows us to understand the relationship between shadows and their classical statistical equivalents such as confidence and credible sets. We present efficient algorithms that use shadows to prove that trajectories or policies are safe with much tighter bounds than in previous work. Notably, the complexity of the environment does not affect our method’s ability to evaluate whether a trajectory or policy is safe. We then use these safety-checking methods to design a safe variant of the rapidly exploring random tree (RRT) planning algorithm.

Shared autonomy where the target is predicted with POMDPs to cope with uncertain predictions

Shervin Javdani, Henny Admoni, Stefania Pellegrinelli, Siddhartha S. Srinivasa, and J. Andrew Bagnell, Shared autonomy via hindsight optimization for teleoperation and teaming, The International Journal of Robotics Research Vol 37, Issue 7, pp. 717-742 DOI: 10.1177/0278364918776060.

In shared autonomy, a user and autonomous system work together to achieve shared goals. To collaborate effectively, the autonomous system must know the user’s goal. As such, most prior works follow a predict-then-act model, first predicting the user’s goal with high confidence, then assisting given that goal. Unfortunately, confidently predicting the user’s goal may not be possible until they have nearly achieved it, causing predict-then-act methods to provide little assistance. However, the system can often provide useful assistance even when confidence for any single goal is low (e.g. move towards multiple goals). In this work, we formalize this insight by modeling shared autonomy as a partially observable Markov decision process (POMDP), providing assistance that minimizes the expected cost-to-go with an unknown goal. As solving this POMDP optimally is intractable, we use hindsight optimization to approximate. We apply our framework to both shared-control teleoperation and human–robot teaming. Compared with predict-then-act methods, our method achieves goals faster, requires less user input, decreases user idling time, and results in fewer user–robot collisions.
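The idea of assisting under an uncertain goal can be sketched by averaging a per-goal cost-to-go over the current belief and picking the action with the lowest expected value. This is only in the spirit of the approximation described in the abstract: the straight-line distance used as cost-to-go, the function best_action, and the goal layout are all hypothetical stand-ins.

```python
import math

def best_action(state, goals, belief, actions):
    """Pick the action minimizing the belief-weighted cost-to-go."""
    def expected_cost(action):
        nxt = (state[0] + action[0], state[1] + action[1])
        # Assumption: distance to the goal stands in for the true
        # cost-to-go of the single-goal problem.
        return sum(b * math.dist(nxt, g) for b, g in zip(belief, goals))
    return min(actions, key=expected_cost)

# Two candidate goals and a belief that cannot tell them apart.
goals = [(2.0, 1.0), (2.0, -1.0)]
belief = [0.5, 0.5]
actions = [(1.0, 0.0), (0.0, 1.0), (0.0, -1.0), (-1.0, 0.0)]

chosen = best_action((0.0, 0.0), goals, belief, actions)
```

Even with no confident prediction, the chosen action moves toward both goals at once, whereas a predict-then-act scheme would wait until one goal dominates the belief.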

Relation between optimization and reinforcement learning

Megumi Miyashita, Shiro Yano, Toshiyuki Kondo, Mirror descent search and its acceleration, Robotics and Autonomous Systems, Volume 106, 2018, Pages 107-116 DOI: 10.1016/j.robot.2018.04.009.

In recent years, attention has been focused on the relationship between black-box optimization problems and reinforcement learning problems. In this research, we propose the Mirror Descent Search (MDS) algorithm, which is applicable to both black-box optimization problems and reinforcement learning problems. Our method is based on the mirror descent method, which is a general optimization algorithm. The contribution of this research is roughly twofold. We propose two essential algorithms, called MDS and Accelerated Mirror Descent Search (AMDS), and two more approximate algorithms: Gaussian Mirror Descent Search (G-MDS) and Gaussian Accelerated Mirror Descent Search (G-AMDS). This research shows that the advanced methods developed in the context of mirror descent research can be applied to reinforcement learning problems. We also clarify the relationship between an existing reinforcement learning algorithm and our method. With two evaluation experiments, we show that our proposed algorithms converge faster than some state-of-the-art methods.
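The mirror descent method that MDS builds on can be illustrated with its best-known instance, the entropic (exponentiated gradient) update on the probability simplex. The sketch below uses an exact gradient of a linear cost, whereas the paper targets black-box and reinforcement learning settings, so this is not MDS itself. The function name, the cost vector, and the step size are assumptions.

```python
import math

def mirror_descent_step(x, grad, eta):
    """One entropic mirror descent step on the probability simplex."""
    # Multiplicative update followed by renormalization, which is the
    # mirror step induced by the negative entropy mirror map.
    w = [xi * math.exp(-eta * gi) for xi, gi in zip(x, grad)]
    total = sum(w)
    return [wi / total for wi in w]

# Minimize the linear cost <c, x> over the simplex; the gradient is c itself.
cost = [3.0, 1.0, 2.0]
x = [1.0 / 3.0] * 3
for _ in range(50):
    x = mirror_descent_step(x, cost, eta=0.5)
```

The iterate stays a valid probability vector throughout and concentrates on the coordinate with the lowest cost, without any explicit projection step.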

A new model of cognition

Howard, N. & Hussain, A. The Fundamental Code Unit of the Brain: Towards a New Model for Cognitive Geometry, Cogn Comput (2018) 10: 426 DOI: 10.1007/s12559-017-9538-5.

This paper discusses the problems arising from the multidisciplinary nature of cognitive research and the need to conceptually unify insights from multiple fields into the phenomena that drive cognition. Specifically, the Fundamental Code Unit (FCU) is proposed as a means to better quantify the intelligent thought process at multiple levels of analysis, from the linguistic and behavioral output that the FCU produces to the chemical and physical processes within the brain that drive it. The proposed method efficiently models the most complex decision-making process performed by the brain.

Adapting inverse reinforcement learning to include the risk aversion of the agent

Sumeet Singh, Jonathan Lacotte, Anirudha Majumdar, and Marco Pavone, Risk-sensitive inverse reinforcement learning via semi- and non-parametric methods, The International Journal of Robotics Research First Published May 22, 2018 DOI: 10.1177/0278364918772017.

The literature on inverse reinforcement learning (IRL) typically assumes that humans take actions to minimize the expected value of a cost function, i.e., that humans are risk neutral. Yet, in practice, humans are often far from being risk neutral. To fill this gap, the objective of this paper is to devise a framework for risk-sensitive (RS) IRL to explicitly account for a human’s risk sensitivity. To this end, we propose a flexible class of models based on coherent risk measures, which allow us to capture an entire spectrum of risk preferences from risk neutral to worst case. We propose efficient non-parametric algorithms based on linear programming and semi-parametric algorithms based on maximum likelihood for inferring a human’s underlying risk measure and cost function for a rich class of static and dynamic decision-making settings. The resulting approach is demonstrated on a simulated driving game with 10 human participants. Our method is able to infer and mimic a wide range of qualitatively different driving styles from highly risk averse to risk neutral in a data-efficient manner. Moreover, comparisons of the RS-IRL approach with a risk-neutral model show that the RS-IRL framework more accurately captures observed participant behavior both qualitatively and quantitatively, especially in scenarios where catastrophic outcomes such as collisions can occur.
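Conditional value at risk (CVaR) is a standard example of a coherent risk measure and shows how a single parameter can span the range from risk neutral to worst case. The sketch below is a simple empirical estimator over equally likely cost samples, not the inference procedure from the paper; the function name empirical_cvar and the sample costs are hypothetical.

```python
import math

def empirical_cvar(costs, alpha):
    """Average of the worst (1 - alpha) fraction of equally likely costs."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    ordered = sorted(costs, reverse=True)
    # Number of tail samples to average, at least one.
    k = max(1, math.ceil((1.0 - alpha) * len(ordered)))
    return sum(ordered[:k]) / k

# Hypothetical costs of one driving maneuver, including a rare bad outcome.
costs = [1.0, 2.0, 3.0, 10.0]

risk_neutral = empirical_cvar(costs, alpha=0.0)
risk_averse = empirical_cvar(costs, alpha=0.5)
worst_case = empirical_cvar(costs, alpha=0.75)
```

With alpha at 0 the measure is the plain expected cost, and as alpha grows it weighs only the worst outcomes, ending at the single worst sample for this small data set.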