Tag Archives: Monte Carlo Pomdps

Monte Carlo Tree Search (MTCS) with hybrid discrete-continuous beliefs, applied to robotics

M. Barenboim, M. Shienman and V. Indelman, Monte Carlo Planning in Hybrid Belief POMDPs, IEEE Robotics and Automation Letters, vol. 8, no. 8, pp. 4410-4417, Aug. 2023 DOI: 10.1109/LRA.2023.3282773.

Real-world problems often require reasoning about hybrid beliefs, over both discrete and continuous random variables. Yet, such a setting has hardly been investigated in the context of planning. Moreover, existing online partially observable Markov decision processes (POMDPs) solvers do not support hybrid beliefs directly. In particular, these solvers do not address the added computational burden due to an increasing number of hypotheses with the planning horizon, which can grow exponentially. As part of this work, we present a novel algorithm, Hybrid Belief Monte Carlo Planning (HB-MCP) that utilizes the Monte Carlo Tree Search (MCTS) algorithm to solve a POMDP while maintaining a hybrid belief. We illustrate how the upper confidence bound (UCB) exploration bonus can be leveraged to guide the growth of hypotheses trees alongside the belief trees. We then evaluate our approach in highly aliased simulated environments where unresolved data association leads to multi-modal belief hypotheses.

POMDP Planner that uses multiple levels of approximation to the system dynamics to reduce the number and complexity of forward simulations

Hoerger M, Kurniawati H, Elfes A. , Multilevel Monte Carlo for solving POMDPs on-line, The International Journal of Robotics Research. 2023;42(4-5):196-213 DOI: 10.1177/02783649221093658.

Planning under partial observability is essential for autonomous robots. A principled way to address such planning problems is the Partially Observable Markov Decision Process (POMDP). Although solving POMDPs is computationally intractable, substantial advancements have been achieved in developing approximate POMDP solvers in the past two decades. However, computing robust solutions for systems with complex dynamics remains challenging. Most on-line solvers rely on a large number of forward simulations and standard Monte Carlo methods to compute the expected outcomes of actions the robot can perform. For systems with complex dynamics, for example, those with non-linear dynamics that admit no closed-form solution, even a single forward simulation can be prohibitively expensive. Of course, this issue exacerbates for problems with long planning horizons. This paper aims to alleviate the above difficulty. To this end, we propose a new on-line POMDP solver, called Multilevel POMDP Planner\u2009(MLPP), that combines the commonly known Monte-Carlo-Tree-Search with the concept of Multilevel Monte Carlo to speed up our capability in generating approximately optimal solutions for POMDPs with complex dynamics. Experiments on four different problems involving torque control, navigation and grasping indicate that MLPP\u2009substantially outperforms state-of-the-art POMDP solvers.

Mixing Monte-Carlo Tree Search with Q-learning for robot learning

Francesco Riccio, Roberto Capobianco, Daniele Nardi, LoOP: Iterative learning for optimistic planning on robots, . Robotics and Autonomous Systems, Volume 36, 2021 DOI: 10.1016/j.robot.2020.103693.

Efficient robotic behaviors require robustness and adaptation to dynamic changes of the environment, whose characteristics rapidly vary during robot operation. To generate effective robot action policies, planning and learning techniques have shown the most promising results. However, if considered individually, they present different limitations. Planning techniques lack generalization among similar states and require experts to define behavioral routines at different levels of abstraction. Conversely, learning methods usually require a considerable number of training samples and iterations of the algorithm. To overcome these issues, and to efficiently generate robot behaviors, we introduce LoOP, an iterative learning algorithm for optimistic planning that combines state-of-the-art planning and learning techniques to generate action policies. The main contribution of LoOP is the combination of Monte-Carlo Search Planning and Q-learning, which enables focused exploration during policy refinement in different robotic applications. We demonstrate the robustness and flexibility of LoOP in various domains and multiple robotic platforms, by validating the proposed approach with an extensive experimental evaluation.

A new contribution along the DESPOT line focused on hybrid CPU+GPU platforms

Cai P, Luo Y, Hsu D, Lee WS., HyP-DESPOT: A hybrid parallel algorithm for online planning under uncertainty, The International Journal of Robotics Research. 2021;40(2-3):558-573, DOI: 10.1177/0278364920937074.

Robust planning under uncertainty is critical for robots in uncertain, dynamic environments, but incurs high computational cost. State-of-the-art online search algorithms, such as DESPOT, have vastly improved the computational efficiency of planning under uncertainty and made it a valuable tool for robotics in practice. This work takes one step further by leveraging both CPU and GPU parallelization in order to achieve real-time online planning performance for complex tasks with large state, action, and observation spaces. Specifically, Hybrid Parallel DESPOT (HyP-DESPOT) is a massively parallel online planning algorithm that integrates CPU and GPU parallelism in a multi-level scheme. It performs parallel DESPOT tree search by simultaneously traversing multiple independent paths using multi-core CPUs; it performs parallel Monte Carlo simulations at the leaf nodes of the search tree using GPUs. HyP-DESPOT provably converges in finite time under moderate conditions and guarantees near-optimality of the solution. Experimental results show that HyP-DESPOT speeds up online planning by up to a factor of several hundred in several challenging robotic tasks in simulation, compared with the original DESPOT algorithm. It also exhibits real-time performance on a robot vehicle navigating among many pedestrians.

Improving on-line Monte Carlo POMDP (DESTOP in particular) in discrete spaces through the use of importance sampling, and a nice summary of the problem and of current on-line POMDP approaches

Luo, Y., Bai, H., Hsu, D., & Lee, W. S., Importance sampling for online planning under uncertainty, The International Journal of Robotics Research, 38(2–3), 162–181, 2019 DOI: 10.1177/0278364918780322.

The partially observable Markov decision process (POMDP) provides a principled general framework for robot planning under uncertainty. Leveraging the idea of Monte Carlo sampling, recent POMDP planning algorithms have scaled up to various challenging robotic tasks, including, real-time online planning for autonomous vehicles. To further improve online planning performance, this paper presents IS-DESPOT, which introduces importance sampling to DESPOT, a state-of-the-art sampling-based POMDP algorithm for planning under uncertainty. Importance sampling improves DESPOT’s performance when there are critical, but rare events, which are difficult to sample. We prove that IS-DESPOT retains the theoretical guarantee of DESPOT. We demonstrate empirically that importance sampling significantly improves the performance of online POMDP planning for suitable tasks. We also present a general method for learning the importance sampling distribution.