More intelligent exploration in RL by measuring uncertainty through prediction

Xiaoshu Zhou, Fei Zhu, Peiyao Zhao, Within the scope of prediction: Shaping intrinsic rewards via evaluating uncertainty, Expert Systems with Applications, Volume 206, 2022, DOI: 10.1016/j.eswa.2022.117775.

An agent in reinforcement-learning-based approaches needs to explore the environment to learn more about it and find an optimal policy. However, simply increasing the frequency of stochastic exploration sometimes fails to help and can even cause the agent to fall into traps. Solving this problem requires improving the quality of exploration, not just its quantity. The paper proposes an approach, the scope of prediction based on uncertainty exploration (SPE), which takes advantage of an uncertainty mechanism while accounting for the stochasticity of exploration. Under this mechanism, unexpected states arouse more curiosity: the model derives higher uncertainty by projecting future scenarios and comparing them with what actually happens. Concretely, SPE uses a prediction network to predict subsequent observations and takes the mean squared difference between the predicted and real observations as its measure of uncertainty, encouraging the agent to explore unknown regions more effectively. Moreover, to reduce the interference caused by noise that inflates uncertainty, a reward-penalty model is developed that discriminates noise by predicting future rewards from current observations and actions, improving robustness to noise so that the agent can escape noisy regions. Experimental results showed that deep reinforcement learning approaches equipped with SPE achieved significant improvements in simulated environments.
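The mechanism the abstract describes maps onto a fairly small amount of code. Below is a minimal PyTorch sketch of the two ingredients: a forward prediction network whose per-transition mean squared error serves as the uncertainty bonus, and a reward-penalty model that predicts future reward from the current observation and action to flag noisy regions. This is not the authors' implementation; all class and function names, network architectures, and hyperparameters (`hidden_dim`, `beta`, `penalty`, `noise_threshold`) are hypothetical choices for illustration, and the paper's exact formulation of the reward-penalty model may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PredictionNetwork(nn.Module):
    """Forward model: predicts the next observation from (obs, action).

    Actions are assumed to be vectors (e.g. one-hot for discrete spaces).
    """

    def __init__(self, obs_dim: int, action_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + action_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, obs_dim),
        )

    def forward(self, obs: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, action], dim=-1))


def uncertainty_bonus(pred_net: PredictionNetwork,
                      obs: torch.Tensor,
                      action: torch.Tensor,
                      next_obs: torch.Tensor) -> torch.Tensor:
    """Intrinsic reward: per-sample MSE between the predicted and the
    actual next observation. High error marks less-explored regions."""
    with torch.no_grad():
        predicted = pred_net(obs, action)
    return F.mse_loss(predicted, next_obs, reduction="none").mean(dim=-1)


class RewardPredictor(nn.Module):
    """Reward-penalty model (hypothetical form): predicts future reward
    from (obs, action); persistent prediction failure flags noise."""

    def __init__(self, obs_dim: int, action_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + action_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, obs: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, action], dim=-1)).squeeze(-1)


def shaped_reward(extrinsic: torch.Tensor,
                  bonus: torch.Tensor,
                  reward_pred_error: torch.Tensor,
                  beta: float = 0.1,
                  penalty: float = 0.05,
                  noise_threshold: float = 1.0) -> torch.Tensor:
    """Combine rewards: add the scaled uncertainty bonus, but subtract a
    penalty where reward prediction keeps failing (likely noise)."""
    noisy = (reward_pred_error > noise_threshold).float()
    return extrinsic + beta * bonus - penalty * noisy
```

In a training loop, the forward model and reward predictor would be fit on observed transitions while `shaped_reward` replaces the raw environment reward, so the agent keeps an exploration bonus in genuinely novel regions but is nudged out of regions where the prediction error stems from irreducible noise.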
