Tag Archives: Active Exploration

Interesting study about how to quantify the uncertainty in SLAM and the preservation of its monotonic growth, which is needed to good decision making in active SLAM

M. L. Rodríguez-Arévalo, J. Neira and J. A. Castellanos, On the Importance of Uncertainty Representation in Active SLAM, IEEE Transactions on Robotics, vol. 34, no. 3, pp. 829-834 DOI: 10.1109/TRO.2018.2808902.

The purpose of this work is to highlight the paramount importance of representing and quantifying uncertainty to correctly report the associated confidence of the robot’s location estimate at each time step along its trajectory and therefore decide the correct course of action in an active SLAM mission. We analyze the monotonicity property of different decision-making criteria, both in 2-D and 3-D, with respect to the representation of uncertainty and of the orientation of the robot’s pose. Monotonicity, the property that uncertainty increases as the robot moves, is essential for adequate decision making. We analytically show that, by using differential representations to propagate spatial uncertainties, monotonicity is preserved for all optimality criteria, A-opt, D-opt, and E-opt, and for Shannon’s entropy. We also show that monotonicity does not hold for any criteria in absolute representations using Roll-Pitch-Yaw and Euler angles. Finally, using unit quaternions in absolute representations, the only criteria that preserve monotonicity are D-opt and Shannon’s entropy.

Using two different environment representations: a detailed one for SLAM, a coarse one for selecting actions for active perception

Nelson, E., Corah, M. & Michael, N., Environment model adaptation for mobile robot exploration,Auton Robot (2018) 42: 257, DOI: 10.1007/s10514-017-9669-2.

In this work, we propose a methodology to adapt a mobile robot’s environment model during exploration as a means of decreasing the computational complexity associated with information metric evaluation and consequently increasing the speed at which the system is able to plan actions and travel through an unknown region given finite computational resources. Recent advances in exploration compute control actions by optimizing information-theoretic metrics on the robot’s map. These metrics are generally computationally expensive to evaluate, limiting the speed at which a robot is able to explore. To reduce computational cost, we propose keeping two representations of the environment: one full resolution representation for planning and collision checking, and another with a coarse resolution for rapidly evaluating the informativeness of planned actions. To generate the coarse representation, we employ the Principal of Relevant Information from rate distortion theory to compress a robot’s occupancy grid map. We then propose a method for selecting a coarse representation that sacrifices a minimal amount of information about expected future sensor measurements using the Information Bottleneck Method. We outline an adaptive strategy that changes the robot’s environment representation in response to its surroundings to maximize the computational efficiency of exploration. On computationally constrained systems, this reduction in complexity enables planning over longer predictive horizons, leading to faster navigation. We simulate and experimentally evaluate mutual information based exploration through cluttered indoor environments with exploration rates that adapt based on environment complexity leading to an order-of-magnitude increase in the maximum rate of exploration in contrast to non-adaptive techniques given the same finite computational resources.

Several strategies for exploring unknown environments based on graphs extracted from Voronoi diagrams

E. G. Tsardoulias, A. Iliakopoulou, A. Kargakos, L. Petrou, Cost-Based Target Selection Techniques Towards Full Space Exploration and Coverage for USAR applications in a Priori Unknown Environments, J Intell Robot Syst (2017) 87:313–340, DOI: 10.1007/s10846-016-0434-0.

Full coverage and exploration of an environment is essential in robot rescue operations where victim identification is required. Three methods of target selection towards full exploration and coverage of an unknown space oriented for Urban Search and Rescue (USAR) applications have been developed. These are the Selection of the closest topological node, the Selection of the minimum cost topological node and the Selection of the minimum cost sub-graph. All methods employ a topological graph extracted from the Generalized Voronoi Diagram (GVD), in order to select the next best target during exploration. The first method utilizes a distance metric for determining the next best target whereas the Selection of the minimum cost topological node method assigns four different weights on the graph’s nodes, based on certain environmental attributes. The Selection of the minimum cost sub-graph uses a similar technique, but instead of single nodes, sets of graph nodes are examined. In addition, a modification of A* algorithm for biased path creation towards uncovered areas, aiming at a faster spatial coverage, is introduced. The proposed methods’ performance is verified by experiments conducted in two heterogeneous simulated environments. Finally, the results are compared with two common exploration methods.

A new approach to solve POMDP-like problems through gradient descent and optimal control

Vadim Indelman, Luca Carlone, Frank Dellaert, Planning in the continuous domain: A generalized belief space approach for autonomous navigation in unknown environments, The International Journal of Robotics Research, vol. 34 no. 7, pp. 849-882, DOI: 10.1177/0278364914561102.

We investigate the problem of planning under uncertainty, with application to mobile robotics. We propose a probabilistic framework in which the robot bases its decisions on the generalized belief, which is a probabilistic description of its own state and of external variables of interest. The approach naturally leads to a dual-layer architecture: an inner estimation layer, which performs inference to predict the outcome of possible decisions; and an outer decisional layer which is in charge of deciding the best action to undertake. Decision making is entrusted to a model predictive control (MPC) scheme. The formulation is valid for general cost functions and does not discretize the state or control space, enabling planning in continuous domain. Moreover, it allows to relax the assumption of maximum likelihood observations: predicted measurements are treated as random variables, and binary random variables are used to model the event that a measurement is actually taken by the robot. We successfully apply our approach to the problem of uncertainty-constrained exploration, in which the robot has to perform tasks in an unknown environment, while maintaining localization uncertainty within given bounds. We present an extensive numerical analysis of the proposed approach and compare it against related work. In practice, our planning approach produces smooth and natural trajectories and is able to impose soft upper bounds on the uncertainty. Finally, we exploit the results of this analysis to identify current limitations and show that the proposed framework can accommodate several desirable extensions.

Active exploration strategy for RL in robots, and approximation of value function by Gaussian processes

Jen Jen Chung, Nicholas R.J. Lawrance, Salah Sukkarieh (2015), Learning to soar: Resource-constrained exploration in reinforcement learning, The International Journal of Robotics Research vol. 34, pp. 158-172. DOI: 10.1177/0278364914553683

This paper examines temporal difference reinforcement learning with adaptive and directed exploration for resource-limited missions. The scenario considered is that of an unpowered aerial glider learning to perform energy-gaining flight trajectories in a thermal updraft. The presented algorithm, eGP-SARSA(\u03bb), uses a Gaussian process regression model to estimate the value function in a reinforcement learning framework. The Gaussian process also provides a variance on these estimates that is used to measure the contribution of future observations to the Gaussian process value function model in terms of information gain. To avoid myopic exploration we developed a resource-weighted objective function that combines an estimate of the future information gain using an action rollout with the estimated value function to generate directed explorative action sequences. A number of modifications and computational speed-ups to the algorithm are presented along with a standard GP-SARSA(\u03bb) implementation with Formula -greedy exploration to compare the respective learning performances. The results show that under this objective function, the learning agent is able to continue exploring for better state-action trajectories when platform energy is high and follow conservative energy-gaining trajectories when platform energy is low.