Bo Wu, Xiaobin Zhang, Hai Lin, Supervisor synthesis of POMDP via automata learning, Automatica, Volume 129, 2021, DOI: 10.1016/j.automatica.2021.109654.
The partially observable Markov decision process (POMDP) is a comprehensive modeling framework that captures uncertainty from sensing noise, actuation errors, and the environment. Traditional POMDP planning seeks an optimal policy that maximizes reward. However, for safety-critical applications, it is often necessary to guarantee system performance described by high-level temporal logic specifications. Hence, we are motivated to develop a supervisor synthesis framework for POMDPs with respect to given formal specifications. We propose an iterative learning-based algorithm that learns a permissive policy in the form of a deterministic finite automaton. A human–robot collaboration case study validates the proposed algorithm.
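To make the idea of a permissive policy encoded as a deterministic finite automaton more concrete, the following is a minimal sketch and not the authors' construction or learning algorithm. It assumes the automaton reads (observation, action) pairs and that an action is permitted whenever a transition for it is defined in the current state. The class `DFASupervisor`, the helper `build_toy_supervisor`, and the observation and action names are all hypothetical and chosen only for illustration.

```python
class DFASupervisor:
    """Toy permissive supervisor: a DFA over (observation, action) pairs.

    Assumption for illustration only: an action is permitted in the
    current state exactly when a transition for it is defined.
    """

    def __init__(self, initial, transitions):
        self.initial = initial
        # Maps (state, (observation, action)) -> next state.
        self.transitions = dict(transitions)
        self.state = initial

    def permitted_actions(self, observation):
        """Return every action the supervisor allows after this observation."""
        return {
            action
            for (state, (obs, action)) in self.transitions
            if state == self.state and obs == observation
        }

    def advance(self, observation, action):
        """Move to the next DFA state, rejecting actions that are not permitted."""
        key = (self.state, (observation, action))
        if key not in self.transitions:
            raise ValueError(f"action {action!r} is not permitted here")
        self.state = self.transitions[key]


def build_toy_supervisor():
    """Hypothetical example: the robot must wait once a human has come near."""
    transitions = {
        ("q0", ("human_far", "move")): "q0",
        ("q0", ("human_far", "wait")): "q0",
        ("q0", ("human_near", "wait")): "q1",
        ("q1", ("human_near", "wait")): "q1",
        ("q1", ("human_far", "wait")): "q0",
    }
    return DFASupervisor("q0", transitions)


if __name__ == "__main__":
    supervisor = build_toy_supervisor()
    print(supervisor.permitted_actions("human_far"))
    supervisor.advance("human_near", "wait")
    print(supervisor.permitted_actions("human_far"))
```

The supervisor is permissive in the sense that it returns a set of allowed actions rather than a single one, which leaves the underlying controller free to choose among them. How such an automaton is learned and verified against a temporal logic specification is the subject of the paper and is not modeled here.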