Author Archives: Juan-Antonio Fernández-Madrigal

A new variant of Q-learning that alleviates its slow learning speed (with a brief review of reinforcement learning algorithms)

J.C. van Rooijen, I. Grondman, R. Babuška, Learning rate free reinforcement learning for real-time motion control using a value-gradient based policy, Mechatronics, Volume 24, Issue 8, December 2014, Pages 966-974, ISSN 0957-4158. DOI: 10.1016/j.mechatronics.2014.05.007

Reinforcement learning (RL) is a framework that enables a controller to find an optimal control policy for a task in an unknown environment. Although RL has been successfully used to solve optimal control problems, learning is generally slow. The main causes are the inefficient use of information collected during interaction with the system and the inability to use prior knowledge on the system or the control task. In addition, the learning speed heavily depends on the learning rate parameter, which is difficult to tune.
In this paper, we present a sample-efficient, learning-rate-free version of the Value-Gradient Based Policy (VGBP) algorithm. The main difference between VGBP and other frequently used algorithms, such as Sarsa, is that in VGBP the learning agent has direct access to the reward function, rather than just to the immediate reward values. Furthermore, the agent learns a process model. This enables the algorithm to select control actions by optimizing over the right-hand side of the Bellman equation. We demonstrate fast learning convergence in simulations and experiments with the underactuated pendulum swing-up task. In addition, we present experimental results for a more complex 2-DOF robotic manipulator.
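The contrast the abstract draws can be sketched in a few lines: Sarsa needs a tuned learning rate for its temporal-difference update, whereas a VGBP-style agent, given the reward function and a learned process model, can pick actions by directly maximizing the right-hand side of the Bellman equation. The toy 1-D system, the function names, and the perfect model below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

GAMMA = 0.95  # discount factor

def sarsa_update(Q, s, a, r, s2, a2, alpha):
    """Classic Sarsa TD update: its behavior depends on the learning rate alpha."""
    Q[s, a] += alpha * (r + GAMMA * Q[s2, a2] - Q[s, a])
    return Q

def vgbp_action(state, actions, model, reward_fn, value_fn):
    """VGBP-style selection: maximize r(s,a) + gamma * V(f(s,a)) over actions,
    using direct access to the reward function and a learned process model f.
    No learning rate is involved in choosing the action."""
    returns = [reward_fn(state, a) + GAMMA * value_fn(model(state, a))
               for a in actions]
    return actions[int(np.argmax(returns))]

# Toy example: drive a 1-D state toward the origin.
model = lambda s, a: s + a            # learned process model (assumed exact here)
reward = lambda s, a: -(s + a) ** 2   # known reward: penalize distance after action
value = lambda s: -abs(s)             # current value-function estimate

best = vgbp_action(2.0, [-1.0, 0.0, 1.0], model, reward, value)
print(best)  # → -1.0, the action moving the state toward 0
```

Because the action comes from an optimization over the model rather than from incremental value corrections, each interaction sample is exploited more fully, which is the sample-efficiency argument the paper makes.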

Estimating the states of a human teleoperator and studying their influence on control performance

Yunyi Jia, Ning Xi, Shuang Liu, Yunxia Wang, Xin Li, and Sheng Bi, Quality of teleoperator adaptive control for telerobotic operations, The International Journal of Robotics Research, December 2014, 33: 1765-1781, first published on November 13, 2014. DOI: 10.1177/0278364914556124

Extensive studies have been conducted on telerobotic operations for decades due to their widespread applications in a variety of areas. Most studies have focused on two major issues: stability and telepresence. Few have studied the influence of the operation status of the teleoperator on the performance of telerobotic operations. As a subnormal operation status of the teleoperator may result in insufficient or even incorrect operations, the quality of teleoperator (QoT) has an important impact on the performance of telerobotic operations in terms of efficiency and safety, even if both stability and telepresence are guaranteed. Therefore, this paper investigates the online identification of the QoT and its application to telerobotic operations. The QoT is identified from five QoT indicators, which are generated from the teleoperator’s brain EEG signals. A QoT adaptive control method is designed to adapt the velocity and responsivity of the robotic system to the operation status of the teleoperator, such that teleoperation efficiency and safety can be enhanced. The online QoT identification method was evaluated on various teleoperators, and the QoT adaptive control method was implemented on a mobile manipulator teleoperation system. The experimental results demonstrate the effectiveness and advantages of the proposed methods.
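The core idea of QoT-adaptive control can be sketched as a simple gain adaptation: combine the per-indicator scores into one QoT value and use it to attenuate the operator's velocity commands. The mean combination, the [0, 1] indicator range, and the function names below are assumptions for illustration; the paper derives the five indicators from EEG signals and designs a more elaborate adaptation law.

```python
def qot_score(indicators):
    """Combine per-indicator scores (each assumed in [0, 1]) into one QoT value.
    A plain mean is used here purely for illustration."""
    return sum(indicators) / len(indicators)

def adapt_command(v_cmd, qot, v_max=1.0):
    """Attenuate the operator's velocity command when QoT is low, so a
    fatigued or distracted operator drives the robot more conservatively."""
    return max(-v_max, min(v_max, v_cmd * qot))

# Example: an alert operator vs. a fatigued one issuing the same command.
alert = qot_score([0.9, 0.95, 0.85, 0.9, 0.9])   # QoT = 0.9
tired = qot_score([0.3, 0.25, 0.35, 0.3, 0.3])   # QoT = 0.3
print(adapt_command(0.8, alert), adapt_command(0.8, tired))
```

The same scalar can scale responsivity (e.g. controller gains) as well as velocity, which is the two-sided adaptation the abstract describes.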

Probabilistic models of several sensors, plus a method for distinguishing the different hypotheses in the posterior of a particle filter (PF)

V. Alvarez-Santos, A. Canedo-Rodriguez, R. Iglesias, X.M. Pardo, C.V. Regueiro, M. Fernandez-Delgado, Route learning and reproduction in a tour-guide robot, Robotics and Autonomous Systems, Volume 63, Part 2, January 2015, Pages 206-213, ISSN 0921-8890. DOI: 10.1016/j.robot.2014.07.013

Traditionally, route information is introduced into tour-guide robots by experts in robotics. In the tour-guide robot that we are developing, we allow the robot to learn new routes while following an instructor. In this paper we describe the route recording process that takes place while following a human, as well as how those routes are later reproduced.

A key element of both route recording and reproduction is a robust multi-sensor localization algorithm that we have designed, which is able to combine various sources of information to obtain an estimate of the robot’s pose. In this work we detail how the algorithm works and how we use it to record routes. Moreover, we describe how our robot reproduces routes, including path planning between route points and dynamic obstacle avoidance for safe navigation. Finally, we show through several trajectories how the robot was able to learn and reproduce different routes.
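The multi-sensor fusion step behind such a particle-filter localizer can be sketched compactly: assuming the sensors are conditionally independent given the pose, each particle's weight is the product of per-sensor likelihoods, and the normalized weights form the posterior from which competing pose hypotheses can be compared. The Gaussian sensor models, noise levels, and 1-D toy setup below are invented for illustration; the paper's robot fuses its actual sensor sources in an analogous multiplicative way.

```python
import math

def gaussian_likelihood(measured, expected, sigma):
    """Unnormalized Gaussian likelihood of a measurement given a prediction."""
    return math.exp(-0.5 * ((measured - expected) / sigma) ** 2)

def update_weights(particles, measurements, sensor_models, sigmas):
    """One PF correction step fusing all sensors.
    particles: pose hypotheses; sensor_models: functions mapping a pose to the
    measurement each sensor would expect from that pose."""
    weights = []
    for p in particles:
        w = 1.0
        for z, h, sigma in zip(measurements, sensor_models, sigmas):
            w *= gaussian_likelihood(z, h(p), sigma)  # independent sensors multiply
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]  # normalized posterior weights

# Toy 1-D example: two sensors both observe the robot's x position.
particles = [0.0, 1.0, 2.0]
models = [lambda p: p, lambda p: p]  # both sensors measure x directly
weights = update_weights(particles, [1.1, 0.9], models, [0.5, 0.5])
print(max(range(len(weights)), key=lambda i: weights[i]))  # → 1, nearest the readings
```

Clusters of high-weight particles in this posterior correspond to the distinct pose hypotheses that the heading above refers to distinguishing.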