Category Archives: Developmental Robotics

Bayesian estimation of the model in model-based RL for robots

Senda, Kei, Hishinuma, Toru, Tani, Yurika, Approximate Bayesian reinforcement learning based on estimation of plant, Autonomous Robots 44(5), DOI: 10.1007/s10514-020-09901-4.

This study proposes an approximate parametric model-based Bayesian reinforcement learning approach for robots, based on online Bayesian estimation and online planning for an estimated model. The proposed approach is designed to learn a robotic task with a few real-world samples and to be robust against model uncertainty, within feasible computational resources. The proposed approach employs two-stage modeling, which is composed of (1) a parametric differential equation model with a few parameters based on prior knowledge such as equations of motion, and (2) a parametric model that interpolates a finite number of transition probability models for online estimation and planning. The proposed approach modifies the online Bayesian estimation to be robust against approximation errors of the parametric model to a real plant. The policy planned for the interpolating model is proven to have a form of theoretical robustness. Numerical simulation and hardware experiments of a planar peg-in-hole task demonstrate the effectiveness of the proposed approach.
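As a rough illustration of what online Bayesian estimation over "a finite number of transition probability models" can look like, here is a minimal sketch. It applies plain Bayes' rule to a discrete set of candidate models; it is not the paper's robustified estimator, and the states, actions, probabilities, and the name `bayes_update` are all made up for the example.

```python
def bayes_update(belief, likelihoods):
    """Posterior over a finite set of candidate models after one observed transition.

    belief[i]      : prior probability of candidate model i
    likelihoods[i] : probability that model i assigns to the observed transition
    """
    unnormalized = [b * l for b, l in zip(belief, likelihoods)]
    total = sum(unnormalized)
    if total == 0.0:
        # No candidate explains the observation: keep the prior instead of dividing by zero.
        return list(belief)
    return [u / total for u in unnormalized]


# Hypothetical candidates: each maps (state, action) to P(next_state | state, action).
models = [
    {("s0", "push"): {"s1": 0.9, "s0": 0.1}},
    {("s0", "push"): {"s1": 0.5, "s0": 0.5}},
    {("s0", "push"): {"s1": 0.1, "s0": 0.9}},
]
belief = [1 / 3, 1 / 3, 1 / 3]  # uniform prior over the three candidates

# Hypothetical transitions observed on the plant: (state, action, next_state).
observed = [("s0", "push", "s1"), ("s0", "push", "s1"), ("s0", "push", "s0")]
for state, action, next_state in observed:
    likelihoods = [m[(state, action)][next_state] for m in models]
    belief = bayes_update(belief, likelihoods)
```

After each real-world sample the belief shifts toward the candidates that explain the data best, and a planner can then act on the current belief. That is the sense in which estimation and planning are interleaved online.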

Including the models into the state of a POMDP for learning them (using POMCPs in a robotic application)

Akinobu Hayashi, Dirk Ruiken, Tadaaki Hasegawa, Christian Goerick, Reasoning about uncertain parameters and agent behaviors through encoded experiences and belief planning, Artificial Intelligence, Volume 280, 2020 DOI: 10.1016/j.artint.2019.103228.

Robots are expected to handle increasingly complex tasks. Such tasks often include interaction with objects or collaboration with other agents. One of the key challenges for reasoning in such situations is the lack of accurate models that hinders the effectiveness of planners. We present a system for online model adaptation that continuously validates and improves models while solving tasks with a belief space planner. We employ the well-known online belief planner POMCP. Particles are used to represent hypotheses about the current state and about models of the world. They are sufficient to configure a simulator to provide transition and observation models. We propose an enhanced particle reinvigoration process that leverages prior experiences encoded in a recurrent neural network (RNN). The network is trained through interaction with a large variety of object and agent parametrizations. The RNN is combined with a mixture density network (MDN) to process the current history of observations in order to propose suitable particles and model parametrizations. The proposed method also ensures that newly generated particles are consistent with the current history. These enhancements to the particle reinvigoration process help alleviate problems arising from poor sampling quality in large state spaces and enable handling of dynamics with discontinuities. The proposed approach can be applied to a variety of domains depending on what uncertainty the decision maker needs to reason about. We evaluate the approach with experiments in several domains and compare against other state-of-the-art methods. Experiments are done in a collaborative multi-agent and a single agent object manipulation domain. The experiments are performed both in simulation and on a real robot. The framework handles reasoning with uncertain agent behaviors and with unknown object and environment parametrizations well. The results show good performance and indicate that the proposed approach can improve existing state-of-the-art methods.
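To make the particle idea concrete, here is a minimal sketch of a rejection-style belief update with reinvigoration, where each particle carries both a state and a model parameter. Everything in it is an assumption for illustration: the toy simulator, the `FRICTIONS` candidates, and especially `propose`, which is a uniform stand-in for the paper's learned RNN/MDN proposal.

```python
import random

FRICTIONS = (0.2, 0.5, 0.8)  # hypothetical candidate values of an unknown model parameter


def simulate(particle, action):
    """Toy deterministic simulator. A particle is (position, friction)."""
    position, friction = particle
    new_position = position + action * (1.0 - friction)
    return (new_position, friction), round(new_position, 2)


def replay(particle, history):
    """Advance a particle through the whole history; None if it contradicts an observation."""
    for action, observation in history:
        particle, simulated = simulate(particle, action)
        if simulated != observation:
            return None
    return particle


def propose(rng):
    """Stand-in proposal: uniform over candidate frictions, starting at position 0."""
    return (0.0, rng.choice(FRICTIONS))


def update_belief(particles, history, n_target, rng, max_tries=10_000):
    """`particles` is the belief before the last (action, observation) pair in `history`."""
    action, observation = history[-1]
    survivors = []
    for particle in particles:
        next_particle, simulated = simulate(particle, action)
        if simulated == observation:  # keep only particles that explain the observation
            survivors.append(next_particle)
    # Reinvigoration: top up with fresh particles that are consistent with the full history.
    tries = 0
    while len(survivors) < n_target and tries < max_tries:
        tries += 1
        candidate = replay(propose(rng), history)
        if candidate is not None:
            survivors.append(candidate)
    return survivors


rng = random.Random(0)
particles = [propose(rng) for _ in range(30)]
history = [(1.0, 0.5)]  # applied action 1.0, then observed position 0.5
particles = update_belief(particles, history, n_target=30, rng=rng)
```

With a uniform proposal, most fresh candidates are rejected once the history grows or the parameter space gets large, which is the sampling problem a learned proposal is meant to ease.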

Application of Deep RL to person following by a robot, reducing the training effort of the network by reusing simple state situations in many artificially generated states

Pang, L., Zhang, Y., Coleman, S. et al., Efficient Hybrid-Supervised Deep Reinforcement Learning for Person Following Robot, J Intell Robot Syst 97, 299–312 (2020), DOI: 10.1007/s10846-019-01030-0.

Traditional person following robots usually need hand-crafted features and a well-designed controller to follow the assigned person. They are normally difficult to apply in outdoor situations due to the variability and complexity of the environment. In this paper, we propose an approach in which an agent is trained by hybrid-supervised deep reinforcement learning (DRL) to perform a person following task in an end-to-end manner. The approach enables the robot to learn features autonomously from monocular images and to enhance performance via robot-environment interaction. Experiments show that the proposed approach is adaptive to complex situations with significant illumination variation, object occlusion, target disappearance, pose change, and pedestrian interference. In order to speed up the training process to ensure easy application of DRL to real-world robotic follower controls, we apply an integration method through which the agent receives prior knowledge from a supervised learning (SL) policy network and reinforces its performance with a value-based or policy-based (including actor-critic method) DRL model. We also utilize an efficient data collection approach for supervised learning in the context of person following. Experimental results not only verify the robustness of the proposed DRL-based person following robot system, but also indicate how easily the robot can learn from mistakes and improve performance.

On the importance of dynamics and diversity in (cognitive) symbol systems

Tadahiro Taniguchi, Emre Ugur, Matej Hoffmann, Lorenzo Jamone, Takayuki Nagai, Benjamin Rosman, Symbol Emergence in Cognitive Developmental Systems: A Survey, IEEE Transactions on Cognitive and Developmental Systems, Volume 11, Issue 4, Dec. 2019, DOI: 10.1109/TCDS.2018.2867772.

Humans use signs, e.g., sentences in a spoken language, for communication and thought. Hence, symbol systems like language are crucial for our communication with other agents and adaptation to our real-world environment. The symbol systems we use in our human society adaptively and dynamically change over time. In the context of artificial intelligence (AI) and cognitive systems, the symbol grounding problem has been regarded as one of the central problems related to symbols. However, the symbol grounding problem was originally posed to connect symbolic AI and sensorimotor information and did not consider many interdisciplinary phenomena in human communication and dynamic symbol systems in our society, which semiotics considered. In this paper, we focus on the symbol emergence problem, addressing not only cognitive dynamics but also the dynamics of symbol systems in society, rather than the symbol grounding problem. We first introduce the notion of a symbol in semiotics from the humanities, to leave the very narrow idea of symbols in symbolic AI. Furthermore, over the years, it became more and more clear that symbol emergence has to be regarded as a multifaceted problem. Therefore, second, we review the history of the symbol emergence problem in different fields, including both biological and artificial systems, showing their mutual relations. We summarize the discussion and provide an integrative viewpoint and comprehensive overview of symbol emergence in cognitive systems. Additionally, we describe the challenges facing the creation of cognitive systems that can be part of symbol emergence systems.

Interesting related work on internal models for action prediction and on the exploration/exploitation trade-off

Simón C. Smith, J. Michael Herrmann, Evaluation of Internal Models in Autonomous Learning, IEEE Transactions on Cognitive and Developmental Systems, Volume 11, Issue 4, Dec. 2019, DOI: 10.1109/TCDS.2018.2865999.

Internal models (IMs) can represent relations between sensors and actuators in natural and artificial agents. In autonomous robots, the adaptation of IMs and the adaptation of the behavior are interdependent processes which have been studied under paradigms for self-organization of behavior such as homeokinesis. We compare the effect of various types of IMs on the generation of behavior in order to evaluate model quality across different behaviors. The considered IMs differ in the degree of flexibility and expressivity related to, respectively, learning speed and structural complexity of the model. We show that the different IMs generate different error characteristics which in turn lead to variations of the self-generated behavior of the robot. Due to the tradeoff between error minimization and complexity of the explored environment, we compare the models in the sense of Pareto optimality. Among the linear and nonlinear models that we analyze, echo-state networks achieve a particularly high performance which we explain as a result of the combination of fast learning and complex internal dynamics. More generally, we provide evidence that Pareto optimization is preferable in autonomous learning as it allows that a special solution can be negotiated in any particular environment.
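For readers unfamiliar with comparing models "in the sense of Pareto optimality", a minimal sketch follows. It assumes two criteria per model, a prediction error to minimize and a measure of explored complexity to maximize. The model names and numbers are invented placeholders, not results from the paper.

```python
def pareto_front(candidates):
    """Return the names of non-dominated candidates.

    candidates: dict name -> (error, coverage); lower error and higher coverage are better.
    A candidate is dominated if another one is at least as good on both criteria
    and strictly better on at least one.
    """
    front = []
    for name, (err, cov) in candidates.items():
        dominated = any(
            (e <= err and c >= cov) and (e < err or c > cov)
            for other, (e, c) in candidates.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


# Placeholder values for illustration only.
candidates = {
    "model_a": (0.10, 0.30),
    "model_b": (0.20, 0.60),
    "model_c": (0.25, 0.50),  # worse than model_b on both criteria
    "model_d": (0.40, 0.90),
}
front = pareto_front(candidates)
```

Every model on the front is a defensible choice; which one to pick depends on how a particular environment weighs low error against rich exploration.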

Mixing human advice and reward functions for improving reinforcement learning of motor skills in robots with a nice related work on interactive RL

Carlos Celemin, Guilherme Maeda, Javier Ruiz-del-Solar, Jan Peters, Jens Kober, Reinforcement learning of motor skills using Policy Search and human corrective advice, The International Journal of Robotics Research, Vol 38, Issue 14, 2019, DOI: 10.1177/0278364919871998.

Robot learning problems are limited by physical constraints, which make learning successful policies for complex motor skills on real systems unfeasible. Some reinforcement learning methods, like Policy Search, offer stable convergence toward locally optimal solutions, whereas interactive machine learning or learning-from-demonstration methods allow fast transfer of human knowledge to the agents. However, most methods require expert demonstrations. In this work, we propose the use of human corrective advice in the actions domain for learning motor trajectories. Additionally, we combine this human feedback with reward functions in a Policy Search learning scheme. The use of both sources of information speeds up the learning process, since the intuitive knowledge of the human teacher can be easily transferred to the agent, while the Policy Search method with the cost/reward function takes over for supervising the process and reducing the influence of occasional wrong human corrections. This interactive approach has been validated for learning movement primitives with simulated arms with several degrees of freedom in reaching via-point movements, and also using real robots in such tasks as “writing characters” and the ball-in-a-cup game. Compared with standard reinforcement learning without human advice, the results show that the proposed method not only converges to higher rewards when learning movement primitives, but also that the learning is sped up by a factor of 4–40, depending on the task.

Reinforcement learning for improving autonomy of mobile robots in calibrating visual sensors

Fernando Nobre, Christoffer Heckman, Learning to calibrate: Reinforcement learning for guided calibration of visual–inertial rigs, The International Journal of Robotics Research, 38(12–13), 1352–1374, DOI: 10.1177/0278364919844824.

We present a new approach to assisted intrinsic and extrinsic calibration with an observability-aware visual–inertial calibration system that guides the user through the calibration procedure by suggesting easy-to-perform motions that render the calibration parameters observable. This is done by identifying which subset of the parameter space is rendered observable with a rank-revealing decomposition of the Fisher information matrix, modeling calibration as a Markov decision process and using reinforcement learning to establish which discrete sequence of motions optimizes for the regression of the desired parameters. The goal is to address the assumption common to most calibration solutions: that sufficiently informative motions are provided by the operator. We do not make use of a process model and instead leverage an experience-based approach that is broadly applicable to any platform in the context of simultaneous localization and mapping. This is a step in the direction of long-term autonomy and “power-on-and-go” robotic systems, making repeatable and reliable calibration accessible to the non-expert operator.
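The observability step can be sketched with an SVD, which is one common rank-revealing decomposition (the abstract does not say which one the authors use). The sketch assumes i.i.d. unit-variance measurement noise, so the Fisher information matrix reduces to JᵀJ for a measurement Jacobian J. The Jacobian values, the tolerance, and the function name are hypothetical.

```python
import numpy as np


def observable_subspace(jacobian, rel_tol=1e-6):
    """Split parameter space into observable and unobservable directions.

    jacobian: (n_measurements, n_params) derivative of measurements w.r.t. parameters.
    Assumes i.i.d. unit-variance noise, so the Fisher information matrix is J^T J.
    """
    fim = jacobian.T @ jacobian
    _, singular_values, vt = np.linalg.svd(fim)
    # Directions with negligible information cannot be estimated from this data.
    keep = singular_values > rel_tol * singular_values[0]
    return vt[keep], vt[~keep], singular_values


# Hypothetical motion segment that never excites the third parameter.
J = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [1.0, 1.0, 0.0],
])
observable, unobservable, sv = observable_subspace(J)
```

Here the third parameter lies entirely in the unobservable subspace, so a guidance system would want to suggest a motion whose Jacobian has a nonzero third column.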

A kind of reinforcement learning that decouples modelling from planning using Gaussian Processes for the former

Rakicevic, N. & Kormushev, P., Active learning via informed search in movement parameter space for efficient robot task learning and transfer. Auton Robot (2019) 43: 1917, DOI: 10.1007/s10514-019-09842-7.

Learning complex physical tasks via trial-and-error is still challenging for high-degree-of-freedom robots. The greatest challenges are devising a suitable objective function that defines the task, and the high sample complexity of learning the task. We propose a novel active learning framework, consisting of decoupled task model and exploration components, which does not require an objective function. The task model is specific to a task and maps the parameter space, defining a trial, to the trial outcome space. The exploration component enables efficient search in the trial-parameter space to generate the subsequent most informative trials, by simultaneously exploiting all the information gained from previous trials and reducing the task model’s overall uncertainty. We analyse the performance of our framework in a simulation environment and further validate it on a challenging bimanual-robot puck-passing task. Results show that the robot successfully acquires the necessary skills after only 100 trials without any prior information about the task or target positions. Decoupling the framework’s components also enables efficient skill transfer to new environments which is validated experimentally.

Human interaction with the RL process

Celemin, C., Ruiz-del-Solar, J. & Kober, J., A fast hybrid reinforcement learning framework with human corrective feedback, Auton Robot (2019) 43: 1173, DOI: 10.1007/s10514-018-9786-6.

Reinforcement Learning agents can be supported by feedback from human teachers in the learning loop that guides the learning process. In this work we propose two hybrid strategies of Policy Search Reinforcement Learning and Interactive Machine Learning that benefit from both sources of information, the cost function and the human corrective feedback, for accelerating the convergence and improving the final performance of the learning process. Experiments with simulated and real systems of balancing tasks and a 3 DoF robot arm validate the advantages of the proposed learning strategies: (i) they speed up the convergence of the learning process between 3 and 30 times, saving considerable time during the agent adaptation, and (ii) they allow including non-expert feedback because they have low sensitivity to erroneous human advice.

Improving Q-learning by initialization of the Q matrix and a nice related work of that approach

Ee Soong Low, Pauline Ong, Kah Chun Cheah, Solving the optimal path planning of a mobile robot using improved Q-learning, Robotics and Autonomous Systems, Volume 115, 2019, Pages 143-161, DOI: 10.1016/j.robot.2019.02.013.

Q-learning, a type of reinforcement learning, has recently gained increasing popularity in autonomous mobile robot path planning, due to its self-learning ability without requiring an a priori model of the environment. Yet, despite this advantage, Q-learning exhibits slow convergence to the optimal solution. In order to address this limitation, the concept of partially guided Q-learning is introduced, wherein the flower pollination algorithm (FPA) is utilized to improve the initialization of Q-learning. Experimental evaluation of the proposed improved Q-learning in challenging environments with different obstacle layouts shows that the convergence of Q-learning can be accelerated when Q-values are initialized appropriately using the FPA. Additionally, the effectiveness of the proposed algorithm is validated in a real-world experiment using a three-wheeled mobile robot.
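To show where initialization enters tabular Q-learning, here is a minimal sketch on an obstacle-free grid. It does not implement the flower pollination algorithm: a simple Manhattan-distance heuristic is swapped in as a stand-in initializer. The grid size, rewards, and hyperparameters are likewise assumptions for illustration.

```python
import random

ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action, size, goal):
    """Move on a size x size grid, clipping at the walls."""
    dr, dc = ACTIONS[action]
    next_state = (min(max(state[0] + dr, 0), size - 1),
                  min(max(state[1] + dc, 0), size - 1))
    reward = 1.0 if next_state == goal else -0.01
    return next_state, reward, next_state == goal


def init_q(size, goal, informed):
    """Zero table, or (stand-in for FPA) values that favor moves toward the goal."""
    q = {}
    for r in range(size):
        for c in range(size):
            for a in ACTIONS:
                value = 0.0
                if informed:
                    (nr, nc), _, _ = step((r, c), a, size, goal)
                    value = -0.01 * (abs(nr - goal[0]) + abs(nc - goal[1]))
                q[((r, c), a)] = value
    return q


def q_learning(q, size, goal, episodes, rng,
               alpha=0.5, gamma=0.95, epsilon=0.1, max_steps=200):
    """Standard epsilon-greedy Q-learning; returns the number of steps per episode."""
    steps_per_episode = []
    for _ in range(episodes):
        state, steps, done = (0, 0), 0, False
        while not done and steps < max_steps:
            if rng.random() < epsilon:
                action = rng.choice(list(ACTIONS))
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action, size, goal)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state, steps = next_state, steps + 1
        steps_per_episode.append(steps)
    return steps_per_episode


size, goal = 5, (4, 4)
plain = q_learning(init_q(size, goal, informed=False), size, goal, 50, random.Random(0))
guided = q_learning(init_q(size, goal, informed=True), size, goal, 50, random.Random(0))
```

Comparing `plain` and `guided` episode lengths gives a feel for how much early wandering a sensible starting table can save. The update rule itself is unchanged; only the starting Q-values differ.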