Shared autonomy that models the user's unknown goal with a POMDP to cope with uncertain predictions

Shervin Javdani, Henny Admoni, Stefania Pellegrinelli, Siddhartha S. Srinivasa, and J. Andrew Bagnell, Shared autonomy via hindsight optimization for teleoperation and teaming, The International Journal of Robotics Research Vol 37, Issue 7, pp. 717 – 742 DOI: 10.1177/0278364918776060.

In shared autonomy, a user and autonomous system work together to achieve shared goals. To collaborate effectively, the autonomous system must know the user’s goal. As such, most prior works follow a predict-then-act model, first predicting the user’s goal with high confidence, then assisting given that goal. Unfortunately, confidently predicting the user’s goal may not be possible until they have nearly achieved it, causing predict-then-act methods to provide little assistance. However, the system can often provide useful assistance even when confidence for any single goal is low (e.g. move towards multiple goals). In this work, we formalize this insight by modeling shared autonomy as a partially observable Markov decision process (POMDP), providing assistance that minimizes the expected cost-to-go with an unknown goal. As solving this POMDP optimally is intractable, we use hindsight optimization to approximate. We apply our framework to both shared-control teleoperation and human–robot teaming. Compared with predict-then-act methods, our method achieves goals faster, requires less user input, decreases user idling time, and results in fewer user–robot collisions.
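
The idea of assisting under an uncertain goal can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes a 2D point robot, straight-line distance as the per-goal cost-to-go, and a small hypothetical action cost (`action_weight`). The assistant picks the action that minimizes the belief-weighted cost-to-go, while a predict-then-act baseline stays idle until one goal passes a confidence threshold.

```python
import math

def expected_cost(pos, action, goals, belief, action_weight=0.1):
    """Belief-weighted cost-to-go after taking `action` (distance is an assumed stand-in)."""
    nxt = (pos[0] + action[0], pos[1] + action[1])
    move_cost = action_weight * math.hypot(action[0], action[1])
    return move_cost + sum(b * math.dist(nxt, g) for g, b in zip(goals, belief))

def candidate_actions(step=0.1, n_dirs=16):
    """Discrete action set: stay still, or step in one of n_dirs directions."""
    angles = [2 * math.pi * k / n_dirs for k in range(n_dirs)]
    return [(0.0, 0.0)] + [(step * math.cos(t), step * math.sin(t)) for t in angles]

def assist(pos, goals, belief):
    """Act on the whole goal distribution instead of a single predicted goal."""
    return min(candidate_actions(),
               key=lambda a: expected_cost(pos, a, goals, belief))

def predict_then_act(pos, goals, belief, threshold=0.9):
    """Baseline: no assistance until one goal is predicted with high confidence."""
    best = max(range(len(goals)), key=lambda i: belief[i])
    if belief[best] < threshold:
        return (0.0, 0.0)
    one_hot = [1.0 if i == best else 0.0 for i in range(len(goals))]
    return assist(pos, goals, one_hot)

goals = [(1.0, 0.5), (1.0, -0.5)]
print(assist((0.0, 0.0), goals, [0.5, 0.5]))            # moves along +x, towards both goals
print(predict_then_act((0.0, 0.0), goals, [0.5, 0.5]))  # stays idle
```

With two equally likely goals ahead, the belief-weighted rule already makes progress towards both, whereas the baseline waits.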

Relation between optimization and reinforcement learning

Megumi Miyashita, Shiro Yano, Toshiyuki Kondo, Mirror descent search and its acceleration, Robotics and Autonomous Systems, Volume 106, 2018, Pages 107-116 DOI: 10.1016/j.robot.2018.04.009.

In recent years, attention has been focused on the relationship between black-box optimization problems and reinforcement learning problems. In this research, we propose the Mirror Descent Search (MDS) algorithm, which is applicable to both black-box optimization problems and reinforcement learning problems. Our method is based on the mirror descent method, which is a general optimization algorithm. The contribution of this research is roughly twofold. We propose two essential algorithms, called MDS and Accelerated Mirror Descent Search (AMDS), and two more approximate algorithms: Gaussian Mirror Descent Search (G-MDS) and Gaussian Accelerated Mirror Descent Search (G-AMDS). This research shows that the advanced methods developed in the context of mirror descent research can be applied to reinforcement learning problems. We also clarify the relationship between an existing reinforcement learning algorithm and our method. With two evaluation experiments, we show that our proposed algorithms converge faster than some state-of-the-art methods.
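
For readers unfamiliar with mirror descent, here is a minimal sketch of the textbook update on the probability simplex with the negative-entropy mirror map (the exponentiated-gradient rule). It is not the MDS algorithm of the paper; the cost vector and step size `eta` are illustrative assumptions.

```python
import math

def mirror_descent_step(p, grad, eta):
    """One entropic mirror descent step: multiplicative update, then renormalize."""
    weights = [pi * math.exp(-eta * gi) for pi, gi in zip(p, grad)]
    total = sum(weights)
    return [w / total for w in weights]

def minimize_expected_cost(costs, eta=0.5, iters=200):
    """Minimize the linear objective <p, costs> over probability vectors p."""
    p = [1.0 / len(costs)] * len(costs)   # start from the uniform distribution
    for _ in range(iters):
        # The gradient of <p, costs> with respect to p is simply `costs`.
        p = mirror_descent_step(p, costs, eta)
    return p

p = minimize_expected_cost([3.0, 1.0, 2.0])
print(p)  # probability mass concentrates on the cheapest option (index 1)
```

The multiplicative form keeps every iterate a valid distribution without an explicit projection, which is what makes this geometry convenient when searching over distributions.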

A new model of cognition

Howard, N. & Hussain, A. The Fundamental Code Unit of the Brain: Towards a New Model for Cognitive Geometry, Cogn Comput (2018) 10: 426 DOI: 10.1007/s12559-017-9538-5.

This paper discusses the problems arising from the multidisciplinary nature of cognitive research and the need to conceptually unify insights from multiple fields into the phenomena that drive cognition. Specifically, the Fundamental Code Unit (FCU) is proposed as a means to better quantify the intelligent thought process at multiple levels of analysis, from the linguistic and behavioral output that the FCU produces to the chemical and physical processes within the brain that drive it. The proposed method efficiently models the most complex decision-making processes performed by the brain.

Adapting inverse reinforcement learning for including the risk-aversion of the agent

Sumeet Singh, Jonathan Lacotte, Anirudha Majumdar, and Marco Pavone, Risk-sensitive inverse reinforcement learning via semi- and non-parametric methods, The International Journal of Robotics Research First Published May 22, 2018 DOI: 10.1177/0278364918772017.

The literature on inverse reinforcement learning (IRL) typically assumes that humans take actions to minimize the expected value of a cost function, i.e., that humans are risk neutral. Yet, in practice, humans are often far from being risk neutral. To fill this gap, the objective of this paper is to devise a framework for risk-sensitive (RS) IRL to explicitly account for a human’s risk sensitivity. To this end, we propose a flexible class of models based on coherent risk measures, which allow us to capture an entire spectrum of risk preferences from risk neutral to worst case. We propose efficient non-parametric algorithms based on linear programming and semi-parametric algorithms based on maximum likelihood for inferring a human’s underlying risk measure and cost function for a rich class of static and dynamic decision-making settings. The resulting approach is demonstrated on a simulated driving game with 10 human participants. Our method is able to infer and mimic a wide range of qualitatively different driving styles from highly risk averse to risk neutral in a data-efficient manner. Moreover, comparisons of the RS-IRL approach with a risk-neutral model show that the RS-IRL framework more accurately captures observed participant behavior both qualitatively and quantitatively, especially in scenarios where catastrophic outcomes such as collisions can occur.
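
To make the "spectrum from risk neutral to worst case" concrete, the sketch below computes conditional value-at-risk (CVaR), a standard coherent risk measure, for a discrete cost distribution. CVaR is used here only as a familiar example; the abstract does not say which specific risk measures the authors fit, and the cost values are made up for illustration.

```python
def cvar(costs, probs, alpha):
    """Mean cost over the worst `alpha` fraction of probability mass (0 < alpha <= 1)."""
    assert 0.0 < alpha <= 1.0
    remaining = alpha
    total = 0.0
    # Walk through outcomes from most to least costly, consuming tail mass.
    for cost, prob in sorted(zip(costs, probs), reverse=True):
        take = min(prob, remaining)
        total += take * cost
        remaining -= take
        if remaining <= 1e-12:
            break
    return total / alpha

costs = [0.0, 1.0, 10.0]   # e.g. safe, minor delay, collision (illustrative values)
probs = [0.5, 0.4, 0.1]
print(cvar(costs, probs, 1.0))   # risk neutral: the plain expected cost
print(cvar(costs, probs, 0.5))   # moderately risk averse
print(cvar(costs, probs, 0.1))   # close to worst case
```

Lowering `alpha` weights the rare, catastrophic outcome more heavily, so the same action looks progressively worse to a more risk-averse agent.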

A new mathematical formulation of manipulator motion that simplifies dynamics and kinematics

Ko Ayusawa and Eiichi Yoshida, Comprehensive theory of differential kinematics and dynamics towards extensive motion optimization framework, The International Journal of Robotics Research First Published May 20, 2018 DOI: 10.1177/0278364918772893.

This paper presents a novel unified theoretical framework for differential kinematics and dynamics for the optimization of complex robot motion. By introducing an 18×18 comprehensive motion transformation matrix, the forward differential kinematics and dynamics, including velocity and acceleration, can be written in a simple chain product similar to an ordinary rotational matrix. This formulation enables the analytical computation of derivatives of various physical quantities (e.g. link velocities, link accelerations, or joint torques) with respect to joint coordinates, velocities and accelerations for a robot trajectory in an efficient manner (O(N_J), where N_J is the number of the robot’s degrees of freedom), which is useful for motion optimization. Practical implementation of gradient computation is demonstrated together with simulation results of robot motion optimization to validate the effectiveness of the proposed framework.

Using short- and long-term memories in SLAM

Labbé, M. & Michaud, F., Long-term online multi-session graph-based SPLAM with memory management, Auton Robot (2018) 42: 1133. DOI: 10.1007/s10514-017-9682-5.

For long-term simultaneous planning, localization and mapping (SPLAM), a robot should be able to continuously update its map according to the dynamic changes of the environment and the new areas explored. With limited onboard computation capabilities, a robot should also be able to limit the size of the map used for online localization and mapping. This paper addresses these challenges using a memory management mechanism, which distinguishes locations that should remain in a Working Memory (WM) for online processing from locations that should be transferred to a Long-Term Memory (LTM). When revisiting previously mapped areas that are in LTM, the mechanism can retrieve these locations and place them back in WM for online SPLAM. The approach is tested on a robot equipped with a short-range laser rangefinder and an RGB-D camera, autonomously patrolling 10.5 km in an indoor environment over 11 sessions while encountering 139 people.
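
The WM/LTM mechanism can be pictured as a bounded cache with retrieval. The sketch below is a toy model under stated assumptions: the `MemoryManager` class, its visit-count weight, and the "lowest weight, then oldest" transfer rule are hypothetical choices for illustration, since the abstract does not specify the actual transfer criterion.

```python
class MemoryManager:
    """Toy WM/LTM manager: WM holds at most `wm_capacity` (>= 1) locations."""

    def __init__(self, wm_capacity):
        self.wm_capacity = wm_capacity
        self.wm = {}     # location id -> {"weight": visits, "last_seen": time}
        self.ltm = {}    # locations transferred out of online processing
        self.clock = 0

    def observe(self, loc_id):
        """Register a visit; retrieve the location from LTM if it was stored there."""
        self.clock += 1
        if loc_id in self.ltm:
            self.wm[loc_id] = self.ltm.pop(loc_id)   # bring back for online use
        node = self.wm.setdefault(loc_id, {"weight": 0, "last_seen": 0})
        node["weight"] += 1
        node["last_seen"] = self.clock
        self._transfer(protect=loc_id)

    def _transfer(self, protect):
        """Move low-weight, old locations to LTM until WM fits its budget."""
        while len(self.wm) > self.wm_capacity:
            candidates = [k for k in self.wm if k != protect]
            victim = min(candidates,
                         key=lambda k: (self.wm[k]["weight"], self.wm[k]["last_seen"]))
            self.ltm[victim] = self.wm.pop(victim)

memory = MemoryManager(wm_capacity=2)
for loc in ["A", "A", "B", "C", "B"]:
    memory.observe(loc)
print(sorted(memory.wm), sorted(memory.ltm))  # WM: A and B, LTM: C
```

Bounding WM keeps the per-update cost roughly constant, while retrieval on revisit means nothing mapped earlier is lost.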

A novel approach to avoid the minima problem in potential fields navigation

Fedele, G., D’Alfonso, L., Chiaravalloti, F. et al., Obstacles Avoidance Based on Switching Potential Functions, J Intell Robot Syst (2018) 90: 387. DOI: 10.1007/s10846-017-0687-2.

In this paper, a novel path planning and obstacle avoidance method for a mobile robot is proposed. This method makes use of a switching strategy between the attractive potential of the target and a new helicoidal potential field, which allows the robot to bypass an obstacle by driving around it. The new technique aims at overcoming the local minima problems of the well-known artificial potentials method, caused by the summation of two (or more) potential fields. In fact, in the proposed approach, only a single potential is used at a time. The resulting technique uses only local information and ensures high robustness, in terms of achieved performance and computational complexity, w.r.t. the number of obstacles. Numerical simulations, together with comparisons with existing methods, confirm a very robust behavior of the method, even in environments with multiple obstacles.
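
The switching idea can be illustrated with a small simulation. This is a sketch under assumptions, not the paper's controller: a simple tangential (vortex) field stands in for the helicoidal potential, whose exact form is not given in the abstract, and the switching rule, influence radius and step size are hypothetical. The point is that only one field is active at any time, so the two are never summed.

```python
import math

def unit(v):
    """Normalize a 2D vector (assumed non-zero)."""
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n)

def switching_direction(pos, target, obstacle, influence):
    """Return the direction of the single active field at `pos`."""
    to_target = (target[0] - pos[0], target[1] - pos[1])
    to_obs = (obstacle[0] - pos[0], obstacle[1] - pos[1])
    near = math.hypot(to_obs[0], to_obs[1]) < influence
    ahead = to_target[0] * to_obs[0] + to_target[1] * to_obs[1] > 0
    if near and ahead:
        # Rotational field: turn the obstacle-to-robot vector by +90 degrees,
        # which drives the robot around the obstacle instead of into it.
        away = (-to_obs[0], -to_obs[1])
        return unit((-away[1], away[0]))
    return unit(to_target)   # attractive field of the target

def simulate(start, target, obstacle, influence=1.5, step=0.1, max_steps=1000):
    """Integrate the switched field with fixed-length steps."""
    pos, path = start, [start]
    for _ in range(max_steps):
        if math.dist(pos, target) <= step:
            break
        d = switching_direction(pos, target, obstacle, influence)
        pos = (pos[0] + step * d[0], pos[1] + step * d[1])
        path.append(pos)
    return path

# The obstacle sits exactly on the straight line between start and target.
path = simulate(start=(0.0, 0.0), target=(10.0, 0.0), obstacle=(5.0, 0.0))
print(len(path), min(math.dist(p, (5.0, 0.0)) for p in path))
```

In this head-on configuration a summed attractive-plus-repulsive field can cancel out on the line of approach, whereas the switched field always has a well-defined, non-zero direction.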

Faster long-term SLAM through direct use of Lie groups in filtering

Kruno Lenac, Josip Ćesić, Ivan Marković, and Ivan Petrović, Exactly sparse delayed state filter on Lie groups for long-term pose graph SLAM, The International Journal of Robotics Research Vol 37, Issue 6, pp. 585 – 610 DOI: 10.1177/0278364918767756.

In this paper we propose a simultaneous localization and mapping (SLAM) back-end solution called the exactly sparse delayed state filter on Lie groups (LG-ESDSF). We derive LG-ESDSF and demonstrate that it retains all the good characteristics of the classic Euclidean ESDSF, the main advantage being the exact sparsity of the information matrix. The key advantage of LG-ESDSF in comparison with the classic ESDSF lies in the ability to respect the state space geometry by negotiating uncertainties and employing filtering equations directly on Lie groups. We also exploit the special structure of the information matrix in order to allow long-term operation while the robot is moving repeatedly through the same environment. To prove the effectiveness of the proposed SLAM solution, we conducted extensive experiments on two different publicly available datasets, namely the KITTI and EuRoC datasets, using two front-ends: one based on the stereo camera and the other on the 3D LIDAR. We compare LG-ESDSF with the general graph optimization framework (g2o) when coupled with the same front-ends. Similarly to g2o, the proposed LG-ESDSF is front-end agnostic, and the comparison demonstrates that our solution can match the accuracy of g2o while maintaining faster computation times. Furthermore, the proposed back-end coupled with the stereo camera front-end forms a complete visual SLAM solution dubbed LG-SLAM. Finally, we evaluated LG-SLAM using the online KITTI protocol and at the time of writing it achieved the second best result among the stereo odometry solutions and the best result among the tested SLAM algorithms.
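
Why respecting the state space geometry matters can be seen even on the simplest rotation group, SO(2). The sketch below is only an illustration of that point, not part of LG-ESDSF: it compares a naive Euclidean heading error with the error computed on the group, using hypothetical helper names.

```python
import math

def euclidean_error(a, b):
    """Naive difference that treats headings as ordinary real numbers."""
    return a - b

def so2_error(a, b):
    """Group-aware difference: the rotation angle taking heading b to heading a."""
    return math.atan2(math.sin(a - b), math.cos(a - b))

def so2_update(theta, delta):
    """Apply an increment and return the result wrapped to a valid heading."""
    return so2_error(theta + delta, 0.0)

a, b = math.radians(179.0), math.radians(-179.0)
print(math.degrees(euclidean_error(a, b)))   # about 358 degrees: misleadingly large
print(math.degrees(so2_error(a, b)))         # about -2 degrees: the true small rotation
```

A filter that uses the naive error near the wrap-around point would apply a huge, wrong correction; working on the group avoids this by construction.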

On how unsynchronized sensor and controller clocks affect the stability of a network-controlled plant

K. Okano, M. Wakaiki, G. Yang and J. P. Hespanha, Stabilization of Networked Control Systems Under Clock Offsets and Quantization, IEEE Transactions on Automatic Control, vol. 63, no. 6, pp. 1708-1723 DOI: 10.1109/TAC.2017.2753938.

This paper studies the impact of clock mismatches and quantization on networked control systems. We consider a scenario where the plant’s state is measured by a sensor that communicates with the controller through a network. Variable communication delays and clock jitter do not permit a perfect synchronization between the clocks of the sensor and controller. We investigate limitations on the clock offset tolerable for stabilization of the feedback system. For a process with a scalar-valued state, we show that there exists a tight bound on the offset above which the closed-loop system cannot be stabilized with any causal controllers. For higher dimensional plants, if the plant has two distinct poles, then the effect of clock mismatches can be canceled with a finite number of measurements, and hence there is no fundamental limitation. We also consider the case where the measurements are subject to quantization in addition to clock mismatches. For first-order plants, we present necessary conditions and sufficient conditions for stabilizability, which show that a larger clock offset requires a finer quantization.

All the information about our cognitive processes that can be deduced from our mouse movements

Paul E. Stillman, Xi Shen, Melissa J. Ferguson, How Mouse-tracking Can Advance Social Cognitive Theory, Trends in Cognitive Sciences, Volume 22, Issue 6, 2018, Pages 531-543 DOI: 10.1016/j.tics.2018.03.012.

Mouse-tracking – measuring computer-mouse movements made by participants while they choose between response options – is an emerging tool that offers an accessible, data-rich, and real-time window into how people categorize and make decisions. In the present article we review recent research in social cognition that uses mouse-tracking to test models and advance theory. In particular, mouse-tracking allows examination of nuanced predictions about both the nature of conflict (e.g., its antecedents and consequences) and how this conflict is resolved (e.g., how decisions evolve). We demonstrate how mouse-tracking can further our theoretical understanding by highlighting research in two domains: social categorization and self-control. We conclude with future directions and a discussion of the limitations of mouse-tracking as a method.