Tag Archives: POMDPs

A novel RL setting for non-Markovian systems

Ronen I. Brafman, Giuseppe De Giacomo, Regular decision processes, Artificial Intelligence, Volume 331, 2024 DOI: 10.1016/j.artint.2024.104113.

We introduce and study Regular Decision Processes (RDPs), a new, compact model for domains with non-Markovian dynamics and rewards, in which the dependence on the past is regular, in the language theoretic sense. RDPs are an intermediate model between MDPs and POMDPs. They generalize k-order MDPs and can be viewed as a POMDP in which the hidden state is a regular function of the entire history. In factored RDPs, transition and reward functions are specified using formulas in linear temporal logic over finite traces, or using regular expressions. This allows specifying complex dependence on the past using intuitive and compact formulas, and building models of partially observable domains without specifying an underlying state space.
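
To make the RDP idea concrete, here is a minimal sketch (mine, not the authors') of a reward with regular dependence on the past: a hand-built DFA accepts histories matching .*a.*b.*, and carrying the DFA state alongside the environment state restores the Markov property. All symbols and transitions are invented for illustration.

```python
# A non-Markovian reward whose dependence on history is regular,
# tracked by a hand-built DFA (symbols and transitions are invented).
DFA_TRANSITIONS = {  # accepts histories containing an 'a' followed later by a 'b'
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q2",
    ("q2", "a"): "q2", ("q2", "b"): "q2",
}
ACCEPTING = {"q2"}

def non_markovian_reward(history):
    """Reward 1 iff the regular pattern .*a.*b.* occurred in the history."""
    q = "q0"
    for symbol in history:
        q = DFA_TRANSITIONS[(q, symbol)]
    return 1.0 if q in ACCEPTING else 0.0

# Product construction: pairing the DFA state with the environment state
# makes the reward Markovian again, which is what enables compact solving.
print(non_markovian_reward(["b", "a", "a", "b"]))  # -> 1.0
```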

POMDPs aimed at obtaining legible policies, i.e., policies whose goal can be inferred just by observing the robot's actions

Miguel Faria, Francisco S. Melo, Ana Paiva, “Guess what I’m doing”: Extending legibility to sequential decision tasks, Artificial Intelligence, Volume 330, 2024 DOI: 10.1016/j.artint.2024.104107.

In this paper we investigate the notion of legibility in sequential decision tasks under uncertainty. Previous works that extend legibility to scenarios beyond robot motion either focus on deterministic settings or are computationally too expensive. Our proposed approach, dubbed PoLMDP, is able to handle uncertainty while remaining computationally tractable. We establish the advantages of our approach against state-of-the-art approaches in several scenarios of varying complexity. We also showcase the use of our legible policies as demonstrations in machine teaching scenarios, establishing their superiority in teaching new behaviours against the commonly used demonstrations based on the optimal policy. Finally, we assess the legibility of our computed policies through a user study, where people are asked to infer the goal of a mobile robot following a legible policy by observing its actions.
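
As a rough illustration of the quantity that legibility optimizes, the sketch below runs Bayesian goal inference over observed actions under a Boltzmann observer model. The Q-values and the two-goal scenario are hypothetical; this is not the authors' PoLMDP algorithm, only the observer-side inference it caters to.

```python
import numpy as np

def boltzmann(q_row, beta=2.0):
    """Observer model: softmax over the action values for one state."""
    p = np.exp(beta * (q_row - q_row.max()))
    return p / p.sum()

def goal_posterior(q_per_goal, trajectory, prior):
    """Bayes update of P(goal) from observed (state, action) pairs.
    q_per_goal[g][s] holds hypothetical action values under goal g."""
    post = prior.copy()
    for s, a in trajectory:
        lik = np.array([boltzmann(q[s])[a] for q in q_per_goal])
        post *= lik
        post /= post.sum()
    return post

# Two goals, two states, two actions; action 1 is attractive only under goal 1,
# so choosing it is the "legible" way to signal that goal.
q_per_goal = [
    {0: np.array([1.0, 0.0]), 1: np.array([1.0, 0.2])},   # goal 0
    {0: np.array([0.0, 1.0]), 1: np.array([0.2, 1.0])},   # goal 1
]
print(goal_posterior(q_per_goal, [(0, 1), (1, 1)], np.array([0.5, 0.5])))
```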

Integrating symbolic (common sense) reasoning and probabilistic planning (POMDPs) in robots

Shiqi Zhang, Piyush Khandelwal, Peter Stone, iCORPP: Interleaved commonsense reasoning and probabilistic planning on robots, Robotics and Autonomous Systems, Volume 174, 2024 DOI: 10.1016/j.robot.2023.104613.

Robot sequential decision-making in the real world is a challenge because it requires the robots to simultaneously reason about the current world state and dynamics, while planning actions to accomplish complex tasks. On the one hand, declarative languages and reasoning algorithms support representing and reasoning with commonsense knowledge. But these algorithms are not good at planning actions toward maximizing cumulative reward over a long, unspecified horizon. On the other hand, probabilistic planning frameworks, such as Markov decision processes (MDPs) and partially observable MDPs (POMDPs), support planning to achieve long-term goals under uncertainty. But they are ill-equipped to represent or reason about knowledge that is not directly related to actions. In this article, we present an algorithm, called iCORPP, to simultaneously estimate the current world state, reason about world dynamics, and construct task-oriented controllers. In this process, robot decision-making problems are decomposed into two interdependent (smaller) subproblems that focus on reasoning to “understand the world” and planning to “achieve the goal”, respectively. The developed algorithm has been implemented and evaluated both in simulation and on real robots using everyday service tasks, such as indoor navigation and dialog management. Results show significant improvements in scalability, efficiency, and adaptiveness, compared to competitive baselines including handcrafted action policies.
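
A toy rendering of that decomposition (not the iCORPP system itself, which builds on declarative reasoning formalisms): a "reasoning" step uses symbolic knowledge to prune the model down to what matters for the task, and a "planning" step solves the resulting small MDP. Rooms, facts, and rewards are invented.

```python
def reason(world_facts):
    """Commonsense stand-in: rooms reported closed are irrelevant to the task."""
    return [r for r in world_facts["rooms"] if r not in world_facts["closed"]]

def plan(rooms, goal, gamma=0.95, iters=200):
    """Tiny value iteration on a corridor MDP over the pruned room set."""
    v = {r: 0.0 for r in rooms}
    for _ in range(iters):
        for i, r in enumerate(rooms):
            neighbours = [rooms[max(i - 1, 0)], rooms[min(i + 1, len(rooms) - 1)]]
            v[r] = (1.0 if r == goal else 0.0) + gamma * max(v[n] for n in neighbours)
    return v

# Interleaving: whenever the reasoner learns something new (a door closes),
# the MDP is rebuilt from the updated facts and the plan is recomputed.
facts = {"rooms": ["r1", "r2", "r3", "r4"], "closed": {"r3"}}
relevant = reason(facts)             # reasoning: drop the closed room
values = plan(relevant, goal="r4")   # planning: solve the smaller MDP
print(max(values, key=values.get))   # -> 'r4'
```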

Active Inference and Behaviour Trees as alternatives to POMDPs and the like in the perception and action of robots

C. Pezzato, C. H. Corbato, S. Bonhof and M. Wisse, Active Inference and Behavior Trees for Reactive Action Planning and Execution in Robotics, IEEE Transactions on Robotics, vol. 39, no. 2, pp. 1050-1069, April 2023 DOI: 10.1109/TRO.2022.3226144.

In this article, we propose a hybrid combination of active inference and behavior trees (BTs) for reactive action planning and execution in dynamic environments, showing how robotic tasks can be formulated as a free-energy minimization problem. The proposed approach allows handling partially observable initial states and improves the robustness of classical BTs against unexpected contingencies while at the same time reducing the number of nodes in a tree. In this work, we specify the nominal behavior offline, through BTs. However, in contrast to previous approaches, we introduce a new type of leaf node to specify the desired state to be achieved rather than an action to execute. The decision of which action to execute to reach the desired state is performed online through active inference. This results in continual online planning and hierarchical deliberation. By doing so, an agent can follow a predefined offline plan while still keeping the ability to locally adapt and take autonomous decisions at runtime, respecting safety constraints. We provide proof of convergence and robustness analysis, and we validate our method on two different mobile manipulators performing similar tasks, both in a simulated and a real retail environment. The results showed improved runtime adaptability with a fraction of the hand-coded nodes compared to classical BTs.
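
The key architectural move, as I read it, is a BT leaf that stores a desired state rather than an action. A heavily simplified sketch, with a crude distance-based stand-in for expected-free-energy minimization and an invented 1-D state:

```python
class DesiredStateLeaf:
    """BT leaf holding a goal state; the action is chosen online at each tick."""
    def __init__(self, desired_state, actions, predict):
        self.desired = desired_state
        self.actions = actions    # candidate actions
        self.predict = predict    # predict(state, action) -> next state

    def tick(self, current_state):
        if current_state == self.desired:
            return "SUCCESS", None
        # Pick the action whose predicted outcome best matches the preferred
        # state (a crude stand-in for expected-free-energy minimization).
        best = min(self.actions,
                   key=lambda a: abs(self.predict(current_state, a) - self.desired))
        return "RUNNING", best    # the BT executes `best` this tick

# 1-D toy: state is a position, actions shift it by one step.
leaf = DesiredStateLeaf(desired_state=5, actions=[-1, 0, +1],
                        predict=lambda s, a: s + a)
print(leaf.tick(3))  # -> ('RUNNING', 1)
```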

Survey on POMDPs for robotics

M. Lauri, D. Hsu and J. Pajarinen, Partially Observable Markov Decision Processes in Robotics: A Survey, IEEE Transactions on Robotics, vol. 39, no. 1, pp. 21-40, Feb. 2023 DOI: 10.1109/TRO.2022.3200138.

Noisy sensing, imperfect control, and environment changes are defining characteristics of many real-world robot tasks. The partially observable Markov decision process (POMDP) provides a principled mathematical framework for modeling and solving robot decision and control tasks under uncertainty. Over the last decade, it has seen many successful applications, spanning localization and navigation, search and tracking, autonomous driving, multirobot systems, manipulation, and human–robot interaction. This survey aims to bridge the gap between the development of POMDP models and algorithms at one end and application to diverse robot decision tasks at the other. It analyzes the characteristics of these tasks and connects them with the mathematical and algorithmic properties of the POMDP framework for effective modeling and solution. For practitioners, the survey provides some of the key task characteristics in deciding when and how to apply POMDPs to robot tasks successfully. For POMDP algorithm designers, the survey provides new insights into the unique challenges of applying POMDPs to robot systems and points to promising new directions for further research.
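
For reference, the central computation any POMDP agent repeats is the Bayes belief update b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s). A minimal discrete version, with placeholder transition and observation matrices:

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """b: (S,), T: (A, S, S) with T[a, s, s'], O: (A, S, O) with O[a, s', o]."""
    predicted = b @ T[a]              # prediction: sum_s T(s'|s,a) b(s)
    posterior = O[a][:, o] * predicted  # correction by the observation likelihood
    return posterior / posterior.sum()

S, A, Obs = 3, 2, 2
rng = np.random.default_rng(0)
T = rng.dirichlet(np.ones(S), size=(A, S))    # each row sums to 1 over s'
O = rng.dirichlet(np.ones(Obs), size=(A, S))  # each row sums to 1 over o
b = np.ones(S) / S                            # uniform initial belief
print(belief_update(b, a=0, o=1, T=T, O=O))
```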

Using results from belief-based planning for Bayesian inference in robotics

Farhi, E.I., Indelman, V., Bayesian incremental inference update by re-using calculations from belief space planning: a new paradigm, Auton Robot 46, 783–816 (2022). DOI: 10.1007/s10514-022-10045-w.

Inference and decision making under uncertainty are key processes in every autonomous system and numerous robotic problems. In recent years, the similarities between inference and decision making triggered much work, from developing unified computational frameworks to pondering about the duality between the two. In spite of these efforts, inference and control, as well as inference and belief space planning (BSP), are still treated as two separate processes. In this paper we propose a paradigm shift, a novel approach which deviates from conventional Bayesian inference and utilizes the similarities between inference and BSP. We make the key observation that inference can be efficiently updated using predictions made during the decision making stage, even in light of inconsistent data association between the two. We developed a two-stage process that implements our novel approach and updates inference using calculations from the precursory planning phase. Using autonomous navigation in an unknown environment along with iSAM2 efficient methodologies as a test case, we benchmarked our novel approach against standard Bayesian inference, both with synthetic and real-world data (KITTI dataset). Results indicate that our approach not only improves running time by at least a factor of two while providing the same estimation accuracy, but also alleviates the computational burden of state dimensionality and loop closures.
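
A Kalman-filter toy of the reuse idea (the paper works with factor graphs and iSAM2, not this): the prediction computed while evaluating the chosen action during planning is cached, so at inference time only the measurement update remains to be done once the actual observation arrives.

```python
import numpy as np

def predict(mu, P, A, Q, u):
    """Propagation step; in BSP this is computed per candidate action anyway."""
    return A @ mu + u, A @ P @ A.T + Q

def measurement_update(mu, P, H, R, z):
    """Standard Kalman correction given observation z."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    return mu + K @ (z - H @ mu), (np.eye(len(mu)) - K @ H) @ P

A = np.eye(2); Q = 0.01 * np.eye(2); H = np.eye(2); R = 0.1 * np.eye(2)
mu, P = np.zeros(2), np.eye(2)

# Planning stage: evaluating the chosen action u already required predicting.
u = np.array([1.0, 0.0])
cache = predict(mu, P, A, Q, u)   # kept from belief space planning

# Inference stage: reuse the cached prediction; only the update is computed.
z = np.array([1.1, -0.05])
mu, P = measurement_update(*cache, H, R, z)
print(mu)
```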

POMDP Planner that uses multiple levels of approximation to the system dynamics to reduce the number and complexity of forward simulations

Hoerger M, Kurniawati H, Elfes A., Multilevel Monte Carlo for solving POMDPs on-line, The International Journal of Robotics Research. 2023;42(4-5):196-213 DOI: 10.1177/02783649221093658.

Planning under partial observability is essential for autonomous robots. A principled way to address such planning problems is the Partially Observable Markov Decision Process (POMDP). Although solving POMDPs is computationally intractable, substantial advancements have been achieved in developing approximate POMDP solvers in the past two decades. However, computing robust solutions for systems with complex dynamics remains challenging. Most on-line solvers rely on a large number of forward simulations and standard Monte Carlo methods to compute the expected outcomes of actions the robot can perform. For systems with complex dynamics, for example, those with non-linear dynamics that admit no closed-form solution, even a single forward simulation can be prohibitively expensive. Of course, this issue is exacerbated for problems with long planning horizons. This paper aims to alleviate the above difficulty. To this end, we propose a new on-line POMDP solver, called Multilevel POMDP Planner (MLPP), that combines the well-known Monte Carlo Tree Search with the concept of Multilevel Monte Carlo to speed up the generation of approximately optimal solutions for POMDPs with complex dynamics. Experiments on four different problems involving torque control, navigation and grasping indicate that MLPP substantially outperforms state-of-the-art POMDP solvers.
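
The multilevel Monte Carlo trick MLPP builds on can be shown generically: estimate an expectation with many cheap coarse simulations plus a few coupled fine-minus-coarse corrections, via the telescoping sum E[f_L] = E[f_0] + Σ_l E[f_l − f_{l−1}]. The simulators and costs below are invented stand-ins, not the paper's planner.

```python
import numpy as np

def rollout(level, seed):
    """Stand-in for a forward simulation at fidelity `level` (finer = less
    biased, more expensive); coupled across levels via a shared seed."""
    r = np.random.default_rng(seed)
    noise = r.normal(0.0, 1.0)
    bias = 2.0 ** -(level + 1)
    return 1.0 + bias + 0.1 * noise   # simulated return of one action

def mlmc_estimate(levels=3, n_per_level=(1000, 100, 10)):
    # Many cheap rollouts at the coarsest level...
    est = np.mean([rollout(0, s) for s in range(n_per_level[0])])
    # ...plus a few coupled correction terms per finer level.
    for l in range(1, levels):
        corr = [rollout(l, s) - rollout(l - 1, s) for s in range(n_per_level[l])]
        est += np.mean(corr)          # shared seeds make the noise cancel
    return est

print(mlmc_estimate())  # close to the finest-level expectation, 1.125
```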

Continuous POMDPs through belief state sparsification, applied to active SLAM

Elimelech K, Indelman V. Simplified decision making in the belief space using belief sparsification. The International Journal of Robotics Research. 2022;41(5):470-496 DOI: 10.1177/02783649221076381.

In this work, we introduce a new and efficient solution approach for the problem of decision making under uncertainty, which can be formulated as decision making in a belief space, over a possibly high-dimensional state space. Typically, to solve a decision problem, one should identify the optimal action from a set of candidates, according to some objective. We claim that one can often generate and solve an analogous yet simplified decision problem, which can be solved more efficiently. A wise simplification method can lead to the same action selection, or one for which the maximal loss in optimality can be guaranteed. Furthermore, such simplification is separated from the state inference and does not compromise its accuracy, as the selected action would finally be applied on the original state. First, we present the concept for general decision problems and provide a theoretical framework for a coherent formulation of the approach. We then practically apply these ideas to decision problems in the belief space, which can be simplified by considering a sparse approximation of their initial belief. The scalable belief sparsification algorithm we provide is able to yield solutions which are guaranteed to be consistent with the original problem. We demonstrate the benefits of the approach in the solution of a realistic active-SLAM problem and manage to significantly reduce computation time, with no loss in the quality of solution. This work is both fundamental and practical and holds numerous possible extensions.
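
A toy version of the simplification loop (without the consistency guarantees the paper provides): sparsify the belief's information matrix, rank candidate actions on the sparsified problem, and apply the winning action to the original belief. Matrices and candidate measurement updates are invented.

```python
import numpy as np

def sparsify(Lam, keep=1e-2):
    """Drop weak off-diagonal entries of an information matrix."""
    S = Lam.copy()
    weak = np.abs(S) < keep * np.abs(np.diag(S)).max()
    S[weak & ~np.eye(len(S), dtype=bool)] = 0.0
    return S

def objective(Lam):
    """Posterior entropy up to constants: minimize -log det(information)."""
    return -np.linalg.slogdet(Lam)[1]

def select_action(Lam, candidate_updates):
    """Each candidate action adds measurement information to the belief."""
    S = sparsify(Lam)                 # decide on the simplified belief...
    scores = [objective(S + dLam) for dLam in candidate_updates]
    return int(np.argmin(scores))     # ...then apply the winner to the original

rng = np.random.default_rng(2)
Lam = np.eye(4) + 0.001 * rng.standard_normal((4, 4))
Lam = (Lam + Lam.T) / 2               # symmetric toy information matrix
updates = [np.diag([1, 0, 0, 0]), np.diag([0, 0, 2, 0])]
print(select_action(Lam, updates))    # -> 1 (the more informative action)
```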

POMDPs in robotics: improving QMDP-Net, an end-to-end planning network for POMDPs whose transition, observation, and reward functions are initially unknown

Collins N, Kurniawati H. Locally connected interrelated network: A forward propagation primitive, The International Journal of Robotics Research. 2023;42(6):371-384 DOI: 10.1177/02783649221093092.

End-to-end learning for planning is a promising approach for finding good robot strategies in situations where the state transition, observation, and reward functions are initially unknown. Many neural network architectures for this approach have shown positive results. Across these networks, seemingly small components have been used repeatedly in different architectures, which means improving the efficiency of these components has great potential to improve the overall performance of the network. This paper aims to improve one such component: The forward propagation module. In particular, we propose Locally Connected Interrelated Network (LCI-Net) – a novel type of locally connected layer with unshared but interrelated weights – to improve the efficiency of learning stochastic transition models for planning and propagating information via the learned transition models. LCI-Net is a small differentiable neural network module that can be plugged into various existing architectures. For evaluation purposes, we apply LCI-Net to VIN and QMDP-Net. VIN is an end-to-end neural network for solving Markov Decision Processes (MDPs) whose transition and reward functions are initially unknown, while QMDP-Net is its counterpart for the Partially Observable Markov Decision Process (POMDP) whose transition, observation, and reward functions are initially unknown. Simulation tests on benchmark problems involving 2D and 3D navigation and grasping indicate promising results: Changing only the forward propagation module alone with LCI-Net improves VIN’s and QMDP-Net’s generalisation capability by more than 3× and 10×, respectively.
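
A small numpy sketch of the primitive in question: a locally connected propagation step with one unshared kernel per grid cell, of the kind used to push a belief through a learned transition model. The "interrelated" weight structure of LCI-Net is only gestured at here by generating every kernel from a shared base plus a local offset.

```python
import numpy as np

def locally_connected_propagate(belief, kernels):
    """belief: (H, W) grid; kernels: (H, W, 3, 3), one kernel per cell."""
    H, W = belief.shape
    padded = np.pad(belief, 1)        # zero padding keeps the output size
    out = np.zeros_like(belief)
    for i in range(H):
        for j in range(W):
            patch = padded[i:i + 3, j:j + 3]
            out[i, j] = np.sum(patch * kernels[i, j])  # unshared weights
    return out

H = W = 4
rng = np.random.default_rng(3)
base = rng.standard_normal((3, 3))               # shared component
offsets = 0.1 * rng.standard_normal((H, W, 3, 3))
kernels = base + offsets                          # interrelated, not identical
belief = np.full((H, W), 1.0 / (H * W))           # uniform belief grid
print(locally_connected_propagate(belief, kernels).shape)  # (4, 4)
```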

An MPC-based (non-POMDP) approach to sequential decision planning with partial observability in continuous time and space

Nishimura H, Schwager M., SACBP: Belief space planning for continuous-time dynamical systems via stochastic sequential action control, The International Journal of Robotics Research. 2021;40(10-11):1167-1195 DOI: 10.1177/02783649211037697.

We propose a novel belief space planning technique for continuous dynamics by viewing the belief system as a hybrid dynamical system with time-driven switching. Our approach is based on the perturbation theory of differential equations and extends sequential action control to stochastic dynamics. The resulting algorithm, which we name SACBP, does not require discretization of spaces or time and synthesizes control signals in near real-time. SACBP is an anytime algorithm that can handle general parametric Bayesian filters under certain assumptions. We demonstrate the effectiveness of our approach in an active sensing scenario and a model-based Bayesian reinforcement learning problem. In these challenging problems, we show that the algorithm significantly outperforms other existing solution techniques including approximate dynamic programming and local trajectory optimization.
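
A heavily simplified, SAC-flavoured step to convey the control scheme (no adjoint or mode-insertion gradient, unlike SACBP): propagate the Gaussian belief under a nominal control, try short perturbations of the first interval, and keep the one that most reduces a terminal belief-space cost. Dynamics and cost are invented.

```python
import numpy as np

def propagate(mu, P, u, dt=0.1):
    """Toy Gaussian belief dynamics: the mean drifts with the control and
    stronger controls also shrink uncertainty (an active-sensing flavour)."""
    mu = mu + dt * u
    P = P + dt * (0.05 - 0.1 * abs(u)) * np.eye(len(mu))
    return mu, P

def cost(mu, P, goal):
    """Belief-space cost: distance to the goal plus residual uncertainty."""
    return np.sum((mu - goal) ** 2) + np.trace(P)

def sac_step(mu, P, u_nom, goal, candidates=(-1.0, 0.0, 1.0), horizon=10):
    def rollout(u_first):
        m, C = propagate(mu, P, u_first)     # perturbed first interval
        for _ in range(horizon - 1):
            m, C = propagate(m, C, u_nom)    # nominal control thereafter
        return cost(m, C, goal)
    return min(candidates, key=rollout)      # best short perturbation

mu, P, goal = np.zeros(2), np.eye(2), np.array([1.0, 0.0])
print(sac_step(mu, P, u_nom=0.0, goal=goal))
```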