Tag Archives: Continuous MDPs

Quantizing a continuous POMDP into a finite MDP while preserving near-optimality

Naci Saldi, Serdar Yüksel, and Tamás Linder, "Asymptotic Optimality of Finite Model Approximations for Partially Observed Markov Decision Processes With Discounted Cost," IEEE Transactions on Automatic Control, vol. 65, no. 1, Jan. 2020, DOI: 10.1109/TAC.2019.2907172.

We consider finite model approximations of discrete-time partially observed Markov decision processes (POMDPs) under the discounted cost criterion. After converting the original partially observed stochastic control problem to a fully observed one on the belief space, the finite models are obtained through uniform quantization of the state and action spaces of the belief space Markov decision process (MDP). Under mild assumptions on the components of the original model, it is established that the policies obtained from these finite models are nearly optimal for the belief space MDP and, hence, for the original partially observed problem. The assumptions essentially require that the belief space MDP satisfies a mild weak continuity condition. We provide an example and introduce explicit approximation procedures for the quantization of the set of probability measures on the state space of the POMDP (i.e., the belief space).
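To make the construction concrete, here is a minimal Python sketch of one plausible uniform quantizer for the belief simplex: it rounds a belief vector to the nearest point of the grid of probability vectors whose coordinates are multiples of 1/n. The function name, the resolution parameter n, and the L1-style rounding rule are illustrative assumptions, not the paper's exact construction.

    import numpy as np

    def quantize_belief(belief, n):
        # Round n*belief down, then hand the leftover probability mass
        # (in units of 1/n) to the coordinates with the largest
        # fractional parts; the result lies on the uniform grid and is
        # a nearest grid point in the L1 sense.
        scaled = n * np.asarray(belief, dtype=float)
        floors = np.floor(scaled).astype(int)
        deficit = n - floors.sum()
        order = np.argsort(scaled - floors)[::-1]
        floors[order[:deficit]] += 1
        return floors / n

    # Example: a 3-state belief quantized at resolution n = 10.
    b = np.array([0.137, 0.452, 0.411])
    print(quantize_belief(b, 10))  # [0.1 0.5 0.4]

Refining n trades the size of the finite MDP against the quantization error, which is the regime in which the paper's near-optimality guarantee applies.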

A universal approximator for the value function in continuous-state value iteration

William B. Haskell, Rahul Jain, Hiteshi Sharma, and Pengqian Yu, "A Universal Empirical Dynamic Programming Algorithm for Continuous State MDPs," IEEE Transactions on Automatic Control, vol. 65, no. 1, Jan. 2020, DOI: 10.1109/TAC.2019.2907414.

We propose universal randomized function approximation-based empirical value learning (EVL) algorithms for Markov decision processes. The "empirical" nature comes from performing each iteration with samples of the next state obtained from simulation, which makes the Bellman operator a random operator. Two methods for function approximation, a parametric one using a parametric function space and a nonparametric one using a reproducing kernel Hilbert space, are then combined with EVL. Both function spaces have the universal function approximation property, and the basis functions are picked randomly. Convergence analysis is performed using a random operator framework with techniques from the theory of stochastic dominance. Finite-time sample complexity bounds are derived for both universal approximate dynamic programming algorithms. Numerical experiments support the versatility and computational tractability of this approach.
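As a rough illustration of what one EVL iteration could look like for the RKHS variant, the sketch below uses random Fourier features (a randomly picked basis approximating a Gaussian-kernel RKHS): simulate next states, apply the empirical Bellman backup at each sampled state, and project the result back onto the random basis by least squares. All names, signatures, and modeling choices here are our own assumptions, not the authors' code.

    import numpy as np

    rng = np.random.default_rng(0)

    def features(states, omega, phase):
        # Random cosine features: a randomly drawn basis approximating
        # a Gaussian-kernel reproducing kernel Hilbert space.
        return np.cos(states @ omega + phase)

    def evl_iteration(weights, states, actions, step, reward, gamma,
                      omega, phase, n_samples=20):
        # Empirical Bellman backup at each sampled state, followed by a
        # least-squares projection onto the random feature space.
        targets = np.empty(len(states))
        for i, s in enumerate(states):
            backups = []
            for a in actions:
                nxt = np.array([step(s, a, rng) for _ in range(n_samples)])
                backups.append(reward(s, a)
                               + gamma * np.mean(features(nxt, omega, phase) @ weights))
            targets[i] = max(backups)
        phi = features(states, omega, phase)
        weights, *_ = np.linalg.lstsq(phi, targets, rcond=None)
        return weights

    # Toy 1-D usage (dynamics and reward are purely illustrative).
    d, m = 1, 50
    omega = rng.normal(scale=2.0, size=(d, m))
    phase = rng.uniform(0.0, 2.0 * np.pi, size=m)
    states = rng.uniform(0.0, 1.0, size=(40, d))
    step = lambda s, a, rng: np.clip(s + a + 0.05 * rng.normal(size=s.shape), 0.0, 1.0)
    reward = lambda s, a: -float(s[0] ** 2)
    w = np.zeros(m)
    for _ in range(30):
        w = evl_iteration(w, states, [-0.1, 0.1], step, reward, 0.9, omega, phase)

Because each backup uses fresh simulated samples, the operator applied here is indeed a random operator, which is what the paper's stochastic-dominance analysis is built to handle.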

Incremental (hierarchical) search for the optimal policy on Markov decision processes

Vu Anh Huynh, Sertac Karaman, and Emilio Frazzoli, "An Incremental Sampling-based Algorithm for Stochastic Optimal Control," The International Journal of Robotics Research, vol. 35, pp. 305-333, Apr. 2016, DOI: 10.1177/0278364915616866.

In this paper, we consider a class of continuous-time, continuous-space stochastic optimal control problems. Using the Markov chain approximation method and recent advances in sampling-based algorithms for deterministic path planning, we propose a novel algorithm called the incremental Markov Decision Process to incrementally compute control policies that approximate arbitrarily well an optimal policy in terms of the expected cost. The main idea behind the algorithm is to generate a sequence of finite discretizations of the original problem through random sampling of the state space. At each iteration, the discretized problem is a Markov Decision Process that serves as an incrementally refined model of the original problem. We show that with probability one, (i) the sequence of the optimal value functions for each of the discretized problems converges uniformly to the optimal value function of the original stochastic optimal control problem, and (ii) the original optimal value function can be computed efficiently in an incremental manner using asynchronous value iterations. Thus, the proposed algorithm provides an anytime approach to the computation of optimal control policies of the continuous problem. The effectiveness of the proposed approach is demonstrated on motion planning and control problems in cluttered environments in the presence of process noise.
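The refine-then-backup loop can be caricatured in a few lines of Python. The sketch below replaces the paper's Markov chain approximation with a crude Monte Carlo transition estimate projected onto the nearest sampled state, and it updates only a small random subset of states per round (the asynchronous part); every name and modeling choice is an illustrative assumption rather than the authors' implementation.

    import numpy as np

    rng = np.random.default_rng(1)

    def nearest(states, x):
        # Index of the sampled state closest to x in Euclidean norm.
        return int(np.argmin(np.linalg.norm(states - x, axis=1)))

    def incremental_mdp(sample_state, simulate, reward, actions, gamma,
                        n_rounds=200, n_sims=10, n_backups=5):
        states = np.array([sample_state(rng)])
        values = np.zeros(1)
        for _ in range(n_rounds):
            # Refine: add a newly sampled state, seeding its value from
            # its nearest existing neighbor.
            x = sample_state(rng)
            values = np.append(values, values[nearest(states, x)])
            states = np.vstack([states, x])
            # Asynchronous value iteration: back up a random subset.
            for i in rng.choice(len(states), size=min(n_backups, len(states)),
                                replace=False):
                backups = []
                for a in actions:
                    # Monte Carlo transitions, projected onto the samples.
                    nxt = [nearest(states, simulate(states[i], a, rng))
                           for _ in range(n_sims)]
                    backups.append(reward(states[i], a)
                                   + gamma * np.mean(values[nxt]))
                values[i] = max(backups)
        return states, values

    # Illustrative 1-D setup (dynamics and reward chosen arbitrarily).
    sample_state = lambda rng: rng.uniform(-1.0, 1.0, size=1)
    simulate = lambda s, a, rng: np.clip(s + 0.1 * a + 0.05 * rng.normal(size=1), -1.0, 1.0)
    reward = lambda s, a: -float(s[0] ** 2)
    states, values = incremental_mdp(sample_state, simulate, reward,
                                     actions=[-1.0, 1.0], gamma=0.95)

Because every round both refines the discretization and improves the value estimates, the loop can be stopped at any time with a usable approximation, which mirrors the anytime property the paper establishes with probability one.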