Tag Archives: Pomdps

Including the models into the state of a POMDP for learning them (using POMCPs in a robotic application)

Akinobu Hayashi, Dirk Ruiken, Tadaaki Hasegawa, Christian Goerick, Reasoning about uncertain parameters and agent behaviors through encoded experiences and belief planning, Artificial Intelligence, Volume 280, 2020 DOI: 10.1016/j.artint.2019.103228.

Robots are expected to handle increasingly complex tasks. Such tasks often include interaction with objects or collaboration with other agents. One of the key challenges for reasoning in such situations is the lack of accurate models that hinders the effectiveness of planners. We present a system for online model adaptation that continuously validates and improves models while solving tasks with a belief space planner. We employ the well known online belief planner POMCP. Particles are used to represent hypotheses about the current state and about models of the world. They are sufficient to configure a simulator to provide transition and observation models. We propose an enhanced particle reinvigoration process that leverages prior experiences encoded in a recurrent neural network (RNN). The network is trained through interaction with a large variety of object and agent parametrizations. The RNN is combined with a mixture density network (MDN) to process the current history of observations in order to propose suitable particles and models parametrizations. The proposed method also ensures that newly generated particles are consistent with the current history. These enhancements to the particle reinvigoration process help alleviate problems arising from poor sampling quality in large state spaces and enable handling of dynamics with discontinuities. The proposed approach can be applied to a variety of domains depending on what uncertainty the decision maker needs to reason about. We evaluate the approach with experiments in several domains and compare against other state-of-the-art methods. Experiments are done in a collaborative multi-agent and a single agent object manipulation domain. The experiments are performed both in simulation and on a real robot. The framework handles reasoning with uncertain agent behaviors and with unknown object and environment parametrizations well. The results show good performance and indicate that the proposed approach can improve existing state-of-the-art methods.

Quantizing a continuous POMDP into a finite MDP to preserve optimality

Naci Saldi; Serdar Yüksel; Tamás Linder, Asymptotic Optimality of Finite Model Approximations for Partially Observed Markov Decision Processes With Discounted Cost, IEEE Transactions on Automatic Control ( Volume: 65, Issue: 1, Jan. 2020), DOI: 10.1109/TAC.2019.2907172.

We consider finite model approximations of discrete-time partially observed Markov decision processes (POMDPs) under the discounted cost criterion. After converting the original partially observed stochastic control problem to a fully observed one on the belief space, the finite models are obtained through the uniform quantization of the state and action spaces of the belief space Markov decision process (MDP). Under mild assumptions on the components of the original model, it is established that the policies obtained from these finite models are nearly optimal for the belief space MDP, and so, for the original partially observed problem. The assumptions essentially require that the belief space MDP satisfies a mild weak continuity condition. We provide an example and introduce explicit approximation procedures for the quantization of the set of probability measures on the state space of POMDP (i.e., belief space).

Improving on-line Monte Carlo POMDP (DESTOP in particular) in discrete spaces through the use of importance sampling, and a nice summary of the problem and of current on-line POMDP approaches

Luo, Y., Bai, H., Hsu, D., & Lee, W. S., Importance sampling for online planning under uncertainty, The International Journal of Robotics Research, 38(2–3), 162–181, 2019 DOI: 10.1177/0278364918780322.

The partially observable Markov decision process (POMDP) provides a principled general framework for robot planning under uncertainty. Leveraging the idea of Monte Carlo sampling, recent POMDP planning algorithms have scaled up to various challenging robotic tasks, including, real-time online planning for autonomous vehicles. To further improve online planning performance, this paper presents IS-DESPOT, which introduces importance sampling to DESPOT, a state-of-the-art sampling-based POMDP algorithm for planning under uncertainty. Importance sampling improves DESPOT’s performance when there are critical, but rare events, which are difficult to sample. We prove that IS-DESPOT retains the theoretical guarantee of DESPOT. We demonstrate empirically that importance sampling significantly improves the performance of online POMDP planning for suitable tasks. We also present a general method for learning the importance sampling distribution.

On the need to replanning in POMDPs when applied to real systems, due to imperfect sensing and computational cost of online planning

Ali-akbar Agha-mohammadi et al., SLAP: Simultaneous Localization and Planning Under Uncertainty via Dynamic Replanning in Belief Space, IEEE Transactions on Robotics, vol. 34, no. 5, DOI: 10.1109/TRO.2018.2838556.

Simultaneous localization and planning (SLAP) is a crucial ability for an autonomous robot operating under uncertainty. In its most general form, SLAP induces a continuous partially observable Markov decision process (POMDP), which needs to be repeatedly solved online. This paper addresses this problem and proposes a dynamic replanning scheme in belief space. The underlying POMDP, which is continuous in state, action, and observation space, is approximated offline via sampling-based methods, but operates in a replanning loop online to admit local improvements to the coarse offline policy. This construct enables the proposed method to combat changing environments and large localization errors, even when the change alters the homotopy class of the optimal trajectory. It further outperforms the state-of-the-art Feedback-based Information RoadMap (FIRM) method by eliminating unnecessary stabilization steps. Applying belief space planning to physical systems brings with it a plethora of challenges. A key focus of this paper is to implement the proposed planner on a physical robot and show the SLAP solution performance under uncertainty, in changing environments and in the presence of large disturbances, such as a kidnapped robot situation.

Shared autonomy where the target is predicted with POMDPs to cope with uncertain predictions

Shervin Javdani, Henny Admoni, Stefania Pellegrinelli, Siddhartha S. Srinivasa, and J. Andrew Bagnell Shared autonomy via hindsight optimization for teleoperation and teaming, The International Journal of Robotics Research Vol 37, Issue 7, pp. 717 – 742 DOI: 10.1177/0278364918776060.

In shared autonomy, a user and autonomous system work together to achieve shared goals. To collaborate effectively, the autonomous system must know the user’s goal. As such, most prior works follow a predict-then-act model, first predicting the user’s goal with high confidence, then assisting given that goal. Unfortunately, confidently predicting the user’s goal may not be possible until they have nearly achieved it, causing predict-then-act methods to provide little assistance. However, the system can often provide useful assistance even when confidence for any single goal is low (e.g. move towards multiple goals). In this work, we formalize this insight by modeling shared autonomy as a partially observable Markov decision process (POMDP), providing assistance that minimizes the expected cost-to-go with an unknown goal. As solving this POMDP optimally is intractable, we use hindsight optimization to approximate. We apply our framework to both shared-control teleoperation and human–robot teaming. Compared with predict-then-act methods, our method achieves goals faster, requires less user input, decreases user idling time, and results in fewer user–robot collisions.

POMDPs aware of the data association problem

Shashank Pathak, Antony Thomas, and Vadim Indelman, A unified framework for data association aware robust belief space planning and perception, The International Journal of Robotics Research Vol 37, Issue 2-3, pp. 287 – 315, DOI: 10.1177/0278364918759606.

We develop a belief space planning approach that advances the state of the art by incorporating reasoning about data association within planning, while considering additional sources of uncertainty. Existing belief space planning approaches typically assume that data association is given and perfect, an assumption that can be harder to justify during operation in the presence of localization uncertainty, or in ambiguous and perceptually aliased environments. By contrast, our data association aware belief space planning (DA-BSP) approach explicitly reasons about data association within belief evolution owing to candidate actions, and as such can better accommodate these challenging real-world scenarios. In particular, we show that, owing to perceptual aliasing, a posterior belief can become a mixture of probability distribution functions and design cost functions, which measure the expected level of ambiguity and posterior uncertainty given candidate action. Furthermore, we also investigate more challenging situations, such as when prior belief is multimodal and when data association aware planning is performed over several look-ahead steps. Our framework models the belief as a Gaussian mixture model. Another unique aspect of this approach is that the number of components of this Gaussian mixture model can increase as well as decrease, thereby reflecting reality more accurately. Using these and standard costs (e.g. control penalty, distance to goal) within the objective function yields a general framework that reliably represents action impact and, in particular, is capable of active disambiguation. Our approach is thus applicable to both robust perception in a passive setting with data given a priori and in an active setting, such as in autonomous navigation in perceptually aliased environments. We demonstrate key aspects of DA-BSP in a theoretical example, in a Gazebo-based realistic simulation, and also on the real robotic platform using a Pioneer robot in an office environment.

A novel approach to use POMDP in practical active perception, where rewards are needed to penalize uncertainty and therefore reomve the piecewise-linear and convex property of the value function

Satsangi, Y., Whiteson, S., Oliehoek, F.A. et al., Exploiting submodular value functions for scaling up active perception, Auton Robot (2018) 42: 209, DOI: 10.1007/s10514-017-9666-5.

In active perception tasks, an agent aims to select sensory actions that reduce its uncertainty about one or more hidden variables. For example, a mobile robot takes sensory actions to efficiently navigate in a new environment. While partially observable Markov decision processes (POMDPs) provide a natural model for such problems, reward functions that directly penalize uncertainty in the agent’s belief can remove the piecewise-linear and convex (PWLC) property of the value function required by most POMDP planners. Furthermore, as the number of sensors available to the agent grows, the computational cost of POMDP planning grows exponentially with it, making POMDP planning infeasible with traditional methods. In this article, we address a twofold challenge of modeling and planning for active perception tasks. We analyze ρ POMDP and POMDP-IR, two frameworks for modeling active perception tasks, that restore the PWLC property of the value function. We show the mathematical equivalence of these two frameworks by showing that given a ρ POMDP along with a policy, they can be reduced to a POMDP-IR and an equivalent policy (and vice-versa). We prove that the value function for the given ρ POMDP (and the given policy) and the reduced POMDP-IR (and the reduced policy) is the same. To efficiently plan for active perception tasks, we identify and exploit the independence properties of POMDP-IR to reduce the computational cost of solving POMDP-IR (and ρ POMDP). We propose greedy point-based value iteration (PBVI), a new POMDP planning method that uses greedy maximization to greatly improve scalability in the action space of an active perception POMDP. Furthermore, we show that, under certain conditions, including submodularity, the value function computed using greedy PBVI is guaranteed to have bounded error with respect to the optimal value function. We establish the conditions under which the value function of an active perception POMDP is guaranteed to be submodular. Finally, we present a detailed empirical analysis on a dataset collected from a multi-camera tracking system employed in a shopping mall. Our method achieves similar performance to existing methods but at a fraction of the computational cost leading to better scalability for solving active perception tasks.

Improving efficiency of decision with POMDPs in high-dimension state spaces

Dmitry Kopitkov and Vadim Indelman, No belief propagation required: Belief space planning in high-dimensional state spaces via factor graphs, the matrix determinant lemma, and re-use of calculation, The International Journal of Robotics Research, Vol 36, Issue 10, pp. 1088 – 1130, DOI: 10.1177/0278364917721629.

We develop a computationally efficient approach for evaluating the information-theoretic term within belief space planning (BSP), where during belief propagation the state vector can be constant or augmented. We consider both unfocused and focused problem settings, whereas uncertainty reduction of the entire system or only of chosen variables is of interest, respectively. State-of-the-art approaches typically propagate the belief state, for each candidate action, through calculation of the posterior information (or covariance) matrix and subsequently compute its determinant (required for entropy). In contrast, our approach reduces runtime complexity by avoiding these calculations. We formulate the problem in terms of factor graphs and show that belief propagation is not needed, requiring instead a one-time calculation that depends on (the increasing with time) state dimensionality, and per-candidate calculations that are independent of the latter. To that end, we develop an augmented version of the matrix determinant lemma, and show that computations can be re-used when evaluating impact of different candidate actions. These two key ingredients and the factor graph representation of the problem result in a computationally efficient (augmented) BSP approach that accounts for different sources of uncertainty and can be used with various sensing modalities. We examine the unfocused and focused instances of our approach, and compare it with the state of the art, in simulation and using real-world data, considering problems such as autonomous navigation in unknown environments, measurement selection and sensor deployment. We show that our approach significantly reduces running time without any compromise in performance.

An application of POMDPs to robot surveillance

S. Witwicki et al., Autonomous Surveillance Robots: A Decision-Making Framework for Networked Muiltiagent Systems, IEEE Robotics & Automation Magazine, vol. 24, no. 3, pp. 52-64, DOI: 10.1109/MRA.2017.2662222.

This article proposes an architecture for an intelligent surveillance system, where the aim is to mitigate the burden on humans in conventional surveillance systems by incorporating intelligent interfaces, computer vision, and autonomous mobile robots. Central to the intelligent surveillance system is the application of research into planning and decision making in this novel context. In this article, we describe the robot surveillance decision problem and explain how the integration of components in our system supports fully automated decision making. Several concrete scenarios deployed in real surveillance environments exemplify both the flexibility of our system to experiment with different representations and algorithms and the portability of our system into a variety of problem contexts. Moreover, these scenarios demonstrate how planning enables robots to effectively balance surveillance objectives, autonomously performing the job of human patrols and responders.

POMDPs with multicriteria in the cost to optimize – a hierarchical approach

Seyedshams Feyzabadi, Stefano Carpin, Planning using hierarchical constrained Markov decision processes, Autonomous Robots, Volume 41, Issue 8, pp 1589–1607, DOI: 10.1007/s10514-017-9630-4.

Constrained Markov decision processes offer a principled method to determine policies for sequential stochastic decision problems where multiple costs are concurrently considered. Although they could be very valuable in numerous robotic applications, to date their use has been quite limited. Among the reasons for their limited adoption is their computational complexity, since policy computation requires the solution of constrained linear programs with an extremely large number of variables. To overcome this limitation, we propose a hierarchical method to solve large problem instances. States are clustered into macro states and the parameters defining the dynamic behavior and the costs of the clustered model are determined using a Monte Carlo approach. We show that the algorithm we propose to create clustered states maintains valuable properties of the original model, like the existence of a solution for the problem. Our algorithm is validated in various planning problems in simulation and on a mobile robot platform, and we experimentally show that the clustered approach significantly outperforms the non-hierarchical solution while experiencing only moderate losses in terms of objective functions.