Tag Archives: Exploration Vs. Exploitation

Active exploration strategy for RL in robots, and approximation of value function by Gaussian processes

Jen Jen Chung, Nicholas R.J. Lawrance, Salah Sukkarieh (2015), Learning to soar: Resource-constrained exploration in reinforcement learning, The International Journal of Robotics Research vol. 34, pp. 158-172. DOI: 10.1177/0278364914553683

This paper examines temporal difference reinforcement learning with adaptive and directed exploration for resource-limited missions. The scenario considered is that of an unpowered aerial glider learning to perform energy-gaining flight trajectories in a thermal updraft. The presented algorithm, eGP-SARSA(\u03bb), uses a Gaussian process regression model to estimate the value function in a reinforcement learning framework. The Gaussian process also provides a variance on these estimates that is used to measure the contribution of future observations to the Gaussian process value function model in terms of information gain. To avoid myopic exploration we developed a resource-weighted objective function that combines an estimate of the future information gain using an action rollout with the estimated value function to generate directed explorative action sequences. A number of modifications and computational speed-ups to the algorithm are presented along with a standard GP-SARSA(\u03bb) implementation with Formula -greedy exploration to compare the respective learning performances. The results show that under this objective function, the learning agent is able to continue exploring for better state-action trajectories when platform energy is high and follow conservative energy-gaining trajectories when platform energy is low.

On search as a consequence of the exploration-exploitation trade-off, and as a core element in human cognition

Thomas T. Hills, Peter M. Todd, David Lazer, A. David Redish, Iain D. Couzin, the Cognitive Search Research Group, Exploration versus exploitation in space, mind, and society, Trends in Cognitive Sciences, Volume 19, Issue 1, January 2015, Pages 46-54, ISSN 1364-6613, DOI: 10.1016/j.tics.2014.10.004.

Search is a ubiquitous property of life. Although diverse domains have worked on search problems largely in isolation, recent trends across disciplines indicate that the formal properties of these problems share similar structures and, often, similar solutions. Moreover, internal search (e.g., memory search) shows similar characteristics to external search (e.g., spatial foraging), including shared neural mechanisms consistent with a common evolutionary origin across species. Search problems and their solutions also scale from individuals to societies, underlying and constraining problem solving, memory, information search, and scientific and cultural innovation. In summary, search represents a core feature of cognition, with a vast influence on its evolution and processes across contexts and requiring input from multiple domains to understand its implications and scope.

A good summary and classification of state-of-the-art motion planning algorithms and proposal of a new one that improve the expected computational cost

Rickert, M.; Sieverling, A.; Brock, O., Balancing Exploration and Exploitation in Sampling-Based Motion Planning, Robotics, IEEE Transactions on , vol.30, no.6, pp.1305,1317, Dec. 2014. DOI: 10.1109/TRO.2014.2340191

We present the exploring/exploiting tree (EET) algorithm for motion planning. The EET planner deliberately trades probabilistic completeness for computational efficiency. This tradeoff enables the EET planner to outperform state-of-the-art sampling-based planners by up to three orders of magnitude. We show that these considerable speedups apply for a variety of challenging real-world motion planning problems. The performance improvements are achieved by leveraging work space information to continuously adjust the sampling behavior of the planner. When the available information captures the planning problem’s inherent structure, the planner’s sampler becomes increasingly exploitative. When the available information is less accurate, the planner automatically compensates by increasing local configuration space exploration. We show that active balancing of exploration and exploitation based on workspace information can be a key ingredient to enabling highly efficient motion planning in practical scenarios.