Tag Archives: Actor-critic

Application of Deep RL to person following by a robot, reducing the training effort of the network by reusing simple state situations in many artificially generated states

Pang, L., Zhang, Y., Coleman, S. et al., Efficient Hybrid-Supervised Deep Reinforcement Learning for Person Following Robot, J Intell Robot Syst 97, 299–312 (2020), DOI: 10.1007/s10846-019-01030-0.

Traditional person following robots usually need hand-crafted features and a well-designed controller to follow the assigned person. Normally it is difficult to be applied in outdoor situations due to variability and complexity of the environment. In this paper, we propose an approach in which an agent is trained by hybrid-supervised deep reinforcement learning (DRL) to perform a person following task in end-to-end manner. The approach enables the robot to learn features autonomously from monocular images and to enhance performance via robot-environment interaction. Experiments show that the proposed approach is adaptive to complex situations with significant illumination variation, object occlusion, target disappearance, pose change, and pedestrian interference. In order to speed up the training process to ensure easy application of DRL to real-world robotic follower controls, we apply an integration method through which the agent receives prior knowledge from a supervised learning (SL) policy network and reinforces its performance with a value-based or policy-based (including actor-critic method) DRL model. We also utilize an efficient data collection approach for supervised learning in the context of person following. Experimental results not only verify the robustness of the proposed DRL-based person following robot system, but also indicate how easily the robot can learn from mistakes and improve performance.

A good intro about actor-critic and decision making without model on MDPs

J. Wang and I. C. Paschalidis, “An Actor-Critic Algorithm With Second-Order Actor and Critic,” in IEEE Transactions on Automatic Control, vol. 62, no. 6, pp. 2689-2703, June 2017.DOI: 10.1109/TAC.2016.2616384.

Actor-critic algorithms solve dynamic decision making problems by optimizing a performance metric of interest over a user-specified parametric class of policies. They employ a combination of an actor, making policy improvement steps, and a critic, computing policy improvement directions. Many existing algorithms use a steepest ascent method to improve the policy, which is known to suffer from slow convergence for ill-conditioned problems. In this paper, we first develop an estimate of the (Hessian) matrix containing the second derivatives of the performance metric with respect to policy parameters. Using this estimate, we introduce a new second-order policy improvement method and couple it with a critic using a second-order learning method. We establish almost sure convergence of the new method to a neighborhood of a policy parameter stationary point. We compare the new algorithm with some existing algorithms in two applications and demonstrate that it leads to significantly faster convergence.