Category Archives: Developmental Robotics

A robot architecture for humanoids able to coordinate different cognitive processes (perception, decision-making, etc.) in a hierarchical fashion

J. Hwang and J. Tani, Seamless Integration and Coordination of Cognitive Skills in Humanoid Robots: A Deep Learning Approach, IEEE Transactions on Cognitive and Developmental Systems, vol. 10, no. 2, pp. 345-358, DOI: 10.1109/TCDS.2017.2714170.

This paper investigates how adequate coordination among the different cognitive processes of a humanoid robot can be developed through end-to-end learning of direct perception of the visuomotor stream. We propose a deep dynamic neural network model built on a dynamic vision network, a motor generation network, and a higher-level network. The proposed model was designed to process and integrate direct perception of dynamic visuomotor patterns in a hierarchical model characterized by different spatial and temporal constraints imposed on each level. We conducted synthetic robotic experiments in which a robot learned to read a human’s intention by observing gestures and then to generate the corresponding goal-directed actions. Results verify that the proposed model is able to learn the tutored skills and to generalize them to novel situations. The model showed synergic coordination of perception, action, and decision making, and it integrated and coordinated a set of cognitive skills including visual perception, intention reading, attention switching, working memory, action preparation, and execution in a seamless manner. Analysis reveals that coherent internal representations emerged at each level of the hierarchy, and that the higher-level representation reflecting actional intention developed through continuous integration of the lower-level visuo-proprioceptive stream.
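The abstract does not give the equations, but the kind of hierarchy it describes, with a different temporal constraint imposed on each level, is commonly expressed as a multiple-timescale (leaky-integrator) recurrent network in Tani's line of work. The sketch below, in Python with NumPy, is only an illustrative two-level update under that assumption; the names, sizes, and time constants are mine, not the paper's actual architecture.

```python
import numpy as np

def mtrnn_step(h_fast, h_slow, x, params, tau_fast=2.0, tau_slow=50.0):
    """One leaky-integrator update of a two-level multiple-timescale RNN.

    The fast level processes the visuomotor input x; the slow level
    integrates the fast level's state over a much longer time constant.
    """
    W_ff, W_fs, W_fx, W_ss, W_sf = params
    u_fast = W_ff @ np.tanh(h_fast) + W_fs @ np.tanh(h_slow) + W_fx @ x
    u_slow = W_ss @ np.tanh(h_slow) + W_sf @ np.tanh(h_fast)
    # A larger tau means the level changes less per step (slower dynamics).
    h_fast = (1 - 1 / tau_fast) * h_fast + (1 / tau_fast) * u_fast
    h_slow = (1 - 1 / tau_slow) * h_slow + (1 / tau_slow) * u_slow
    return h_fast, h_slow

# Toy usage with random weights and a dummy visuomotor stream.
rng = np.random.default_rng(0)
nf, ns, nx = 20, 10, 6
params = (0.1 * rng.standard_normal((nf, nf)),
          0.1 * rng.standard_normal((nf, ns)),
          0.1 * rng.standard_normal((nf, nx)),
          0.1 * rng.standard_normal((ns, ns)),
          0.1 * rng.standard_normal((ns, nf)))
h_fast, h_slow = np.zeros(nf), np.zeros(ns)
for x in rng.standard_normal((100, nx)):
    h_fast, h_slow = mtrnn_step(h_fast, h_slow, x, params)
```

Because the slow level can only track slowly varying structure, it is the natural place for an intention-like representation to emerge, which matches the analysis reported in the paper.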

An interesting model of the basal ganglia that performs similarly to Q-learning when applied to a robot

Y. Zeng, G. Wang and B. Xu, A Basal Ganglia Network Centric Reinforcement Learning Model and Its Application in Unmanned Aerial Vehicle, IEEE Transactions on Cognitive and Developmental Systems, vol. 10, no. 2, pp. 290-303, DOI: 10.1109/TCDS.2017.2649564.

Reinforcement learning brings flexibility and generality to machine learning, but most approaches are driven by mathematical optimization and lack cognitive and neural evidence. In order to provide a foundation driven by cognitive and neural mechanisms and to validate its applicability to complex tasks, we develop a basal ganglia (BG) network centric reinforcement learning model. Compared to existing work on modeling the BG, this paper is unique from the following perspectives: 1) the orbitofrontal cortex (OFC) is taken into consideration. The OFC is critical in decision making because of its responsibility for reward representation and for controlling the learning process, yet most BG-centric models do not include it; 2) to compensate for inaccurate memory of numeric values, precise encoding is proposed to enable the working memory system to remember important values during the learning process. The method combines vector convolution with the idea of storage by digit bit and is efficient for accurate value storage; and 3) for information coding, the Hodgkin-Huxley model is used to obtain a more biologically plausible description of the action potential with its many ionic activities. To validate the effectiveness of the proposed model, we apply it to the autonomous learning process of an unmanned aerial vehicle (UAV) in a 3-D environment. Experimental results show that our model gives the UAV the ability to explore the environment freely and learns at a speed comparable to the Q-learning algorithm, while its major advantage is its solid cognitive and neural basis.
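The paper's model is spiking (Hodgkin-Huxley neurons) and anatomically detailed; the sketch below is only a rate-based caricature of the underlying idea, a dopamine-like temporal-difference error gating updates of striatal action preferences and OFC-like value estimates. All names and parameters here are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def td_error(reward, v_next, v_curr, gamma=0.95):
    # Dopamine-like reward prediction error that gates corticostriatal updates.
    return reward + gamma * v_next - v_curr

class SimpleBgActorCritic:
    """Rate-based caricature of BG-style reinforcement learning."""
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95):
        self.V = np.zeros(n_states)                # OFC/critic-like value estimates
        self.P = np.zeros((n_states, n_actions))   # striatal action preferences
        self.alpha, self.gamma = alpha, gamma

    def act(self, s, rng):
        prefs = self.P[s] - self.P[s].max()
        probs = np.exp(prefs) / np.exp(prefs).sum()
        return int(rng.choice(len(probs), p=probs))

    def update(self, s, a, r, s_next):
        delta = td_error(r, self.V[s_next], self.V[s], self.gamma)
        self.V[s] += self.alpha * delta            # critic update
        self.P[s, a] += self.alpha * delta         # direct-pathway-like preference update
```

A learner of this general shape behaves much like tabular Q-learning on simple tasks, which is consistent with the comparable learning speeds reported in the UAV experiments.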

How a robot can learn to recognize itself in a mirror

Zeng, Y., Zhao, Y., Bai, J. et al., Toward Robot Self-Consciousness (II): Brain-Inspired Robot Bodily Self Model for Self-Recognition, Cogn Comput (2018) 10: 307, DOI: 10.1007/s12559-017-9505-1.

The neural correlates and nature of self-consciousness is an advanced topic in cognitive neuroscience, and only a few animal species have been shown to possess this cognitive ability. From the artificial intelligence and robotics point of view, few efforts are deeply rooted in the neural correlates and brain mechanisms of biological self-consciousness. Although the scientific understanding of biological self-consciousness is still at a preliminary stage, we integrate and adopt known biological findings of self-consciousness to build a brain-inspired model for robot self-consciousness. In this paper, we propose a brain-inspired robot bodily self model based on extensions to the primate mirror neuron system and apply it to a humanoid robot for self-recognition. In this model, the robot first learns the correlations between self-generated actions and visual feedback in motion through spike-timing-dependent plasticity (STDP), and then learns the appearance of its body parts with the expectation that the visual feedback is consistent with its motion. Based on this model, the robot uses multisensory integration to learn its own body in the real world and in the mirror, and can then distinguish itself from others. In a mirror test setting with three robots of identical appearance, each equipped with the proposed bodily self model, every robot can recognize itself in the mirror after all of them make random movements at the same time. The theoretical modeling and experimental validations indicate that the brain-inspired robot bodily self model is biologically inspired and computationally feasible as a foundation for robot self-recognition.
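The core learning rule named in the abstract, spike-timing-dependent plasticity, is standard and easy to state. Below is a minimal pairwise STDP sketch; pairing motor-command spikes (pre) with visual-feedback spikes (post), as the paper describes, would strengthen connections whenever the visual consequence reliably follows the self-generated action. Amplitudes and time constants are illustrative assumptions.

```python
import numpy as np

def stdp_delta_w(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Pairwise STDP weight change for one pre/post spike pair (times in ms).

    If the pre-synaptic (motor) spike precedes the post-synaptic (visual
    feedback) spike, the synapse is potentiated; otherwise it is depressed.
    """
    dt = t_post - t_pre
    if dt > 0:
        return a_plus * np.exp(-dt / tau)    # pre before post -> LTP
    return -a_minus * np.exp(dt / tau)       # post before pre -> LTD

# Example: visual feedback arriving 8 ms after a motor command strengthens
# the action-to-feedback association; feedback arriving earlier weakens it.
print(stdp_delta_w(t_pre=100.0, t_post=108.0))   # positive change
print(stdp_delta_w(t_pre=100.0, t_post=95.0))    # negative change
```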

Multi-agent reinforcement learning for working with high-dimensional action spaces

David L. Leottau, Javier Ruiz-del-Solar, Robert Babuška, Decentralized Reinforcement Learning of Robot Behaviors, Artificial Intelligence, Volume 256, 2018, Pages 130-159, DOI: 10.1016/j.artint.2017.12.001.

A multi-agent methodology is proposed for Decentralized Reinforcement Learning (DRL) of individual behaviors in problems where multi-dimensional action spaces are involved. When using this methodology, sub-tasks are learned in parallel by individual agents working toward a common goal. In addition to proposing this methodology, three specific multi-agent DRL approaches are considered: DRL-Independent, DRL Cooperative-Adaptive (CA), and DRL-Lenient. These approaches are validated and analyzed with an extensive empirical study using four different problems: 3D Mountain Car, SCARA Real-Time Trajectory Generation, Ball-Dribbling in humanoid soccer robotics, and Ball-Pushing using differential drive robots. The experimental validation provides evidence that DRL implementations achieve better performance and faster learning times than their centralized counterparts, while using fewer computational resources. The DRL-Lenient and DRL-CA algorithms achieve the best final performances on the four tested problems, outperforming their DRL-Independent counterparts. Furthermore, the benefits of DRL-Lenient and DRL-CA are more noticeable as the problem complexity increases and the centralized scheme becomes intractable given the available computational resources and training time.
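The DRL-Independent scheme in the abstract amounts to one learner per action dimension, each updated from the same global state and reward. A minimal tabular sketch of that decomposition follows; it is not the authors' code, and the lenient and cooperative-adaptive variants add mechanisms that this sketch omits.

```python
import numpy as np

class DecentralizedQLearners:
    """One independent tabular Q-learner per action dimension (DRL-Independent style)."""
    def __init__(self, n_states, actions_per_dim, alpha=0.1, gamma=0.99, eps=0.1):
        # One Q-table per dimension, e.g. actions_per_dim = [5, 5, 3].
        self.Q = [np.zeros((n_states, n)) for n in actions_per_dim]
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, s, rng):
        joint = []
        for Q_d in self.Q:                       # each agent picks its own sub-action
            if rng.random() < self.eps:
                joint.append(int(rng.integers(Q_d.shape[1])))
            else:
                joint.append(int(Q_d[s].argmax()))
        return joint                             # executed simultaneously as one joint action

    def update(self, s, joint_a, r, s_next):
        # Every agent observes the same global state and reward signal.
        for Q_d, a in zip(self.Q, joint_a):
            td = r + self.gamma * Q_d[s_next].max() - Q_d[s, a]
            Q_d[s, a] += self.alpha * td
```

The appeal of the decomposition is that the joint action space grows additively (sum of per-dimension tables) rather than multiplicatively, which is why the decentralized learners scale to problems where the centralized table is intractable.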

Using interactive reinforcement learning with the advisor being another reinforcement learning agent

Francisco Cruz, Sven Magg, Yukie Nagai & Stefan Wermter, Improving interactive reinforcement learning: What makes a good teacher?, Connection Science, DOI: 10.1080/09540091.2018.1443318.

Interactive reinforcement learning (IRL) has become an important apprenticeship approach to speed up convergence in classic reinforcement learning (RL) problems. In this regard, a variant of IRL is policy shaping, which uses a parent-like trainer to propose the next action to be performed, thereby reducing the search space through advice. On some occasions, the trainer may be another artificial agent which was itself trained using RL methods before becoming an advisor for other learner-agents. In this work, we analyse internal representations and characteristics of artificial agents to determine which agent may outperform others as a trainer-agent. Using a polymath agent as an advisor, compared to a specialist agent, leads to a larger reward, faster convergence of the reward signal, and more stable behaviour in terms of the state-visit frequency of the learner-agents. Moreover, we analyse system interaction parameters in order to determine how influential they are in the apprenticeship process, and find that the consistency of feedback is much more relevant when dealing with different learner obedience parameters.
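The interaction parameters analysed in the paper (probability of feedback, consistency of the advice, learner obedience) can be made concrete with a small action-selection sketch. The function below is a hedged illustration, not the authors' implementation: q_table, advisor_policy, and all default probabilities are assumed names and values.

```python
import numpy as np

def choose_action(q_table, advisor_policy, state, rng,
                  p_feedback=0.3, consistency=0.9, obedience=0.8, eps=0.1):
    """One interactive-RL action choice with probabilistic, possibly noisy advice."""
    n_actions = q_table.shape[1]
    if rng.random() < p_feedback:                  # the trainer-agent offers advice
        advised = advisor_policy(state)
        if rng.random() > consistency:             # inconsistent advice: a random action
            advised = int(rng.integers(n_actions))
        if rng.random() < obedience:               # the learner may still ignore the advice
            return int(advised)
    if rng.random() < eps:                         # otherwise epsilon-greedy on its own Q
        return int(rng.integers(n_actions))
    return int(q_table[state].argmax())
```

Sweeping consistency against obedience in a loop around this function is one simple way to reproduce the kind of parameter analysis the abstract refers to.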

Reinforcement learning to recover legged robots from damage

Konstantinos Chatzilygeroudis, Vassilis Vassiliades, Jean-Baptiste Mouret, Reset-free Trial-and-Error Learning for Robot Damage Recovery, Robotics and Autonomous Systems, Volume 100, 2018, Pages 236-250, DOI: 10.1016/j.robot.2017.11.010.

The high probability of hardware failures prevents many advanced robots (e.g., legged robots) from being confidently deployed in real-world situations (e.g., post-disaster rescue). Instead of attempting to diagnose the failures, robots could adapt by trial-and-error in order to be able to complete their tasks. In this situation, damage recovery can be seen as a Reinforcement Learning (RL) problem. However, the best RL algorithms for robotics require the robot and the environment to be reset to an initial state after each episode, that is, the robot is not learning autonomously. In addition, most of the RL methods for robotics do not scale well with complex robots (e.g., walking robots) and either cannot be used at all or take too long to converge to a solution (e.g., hours of learning). In this paper, we introduce a novel learning algorithm called “Reset-free Trial-and-Error” (RTE) that (1) breaks the complexity by pre-generating hundreds of possible behaviors with a dynamics simulator of the intact robot, and (2) allows complex robots to quickly recover from damage while completing their tasks and taking the environment into account. We evaluate our algorithm on a simulated wheeled robot, a simulated six-legged robot, and a real six-legged walking robot that are damaged in several ways (e.g., a missing leg, a shortened leg, faulty motor, etc.) and whose objective is to reach a sequence of targets in an arena. Our experiments show that the robots can recover most of their locomotion abilities in an environment with obstacles, and without any human intervention.
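RTE, as summarized here, pre-generates a behavior repertoire with a simulator of the intact robot and then corrects the simulator's predictions online as the damaged robot acts. The sketch below compresses this into a 2-D displacement repertoire with a running-mean correction; the actual algorithm builds the repertoire with MAP-Elites, learns the correction with Gaussian process regression, and plans over sequences of behaviors, so treat this purely as an illustration of the select-execute-correct loop.

```python
import numpy as np

def select_behavior(repertoire, corrections, robot_xy, target_xy):
    """Pick the pre-generated behavior whose corrected predicted displacement
    brings the (possibly damaged) robot closest to the current target.

    repertoire:  (N, 2) displacements predicted by the intact-robot simulator
    corrections: (N, 2) running estimate of the sim-vs-real error per behavior
    """
    predicted = repertoire + corrections                 # corrected outcome model
    candidates = robot_xy + predicted
    dists = np.linalg.norm(candidates - target_xy, axis=1)
    return int(dists.argmin())

def update_correction(corrections, counts, idx, observed, predicted_sim):
    """After executing behavior idx, refine its correction from the observed outcome."""
    counts[idx] += 1
    err = observed - predicted_sim
    corrections[idx] += (err - corrections[idx]) / counts[idx]   # running mean
    return corrections, counts
```

Even in this stripped-down form the key property is visible: no reset is needed, because every executed behavior both advances the task and improves the outcome model for the damaged robot.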

Layered learning: how to learn hierarchically more complex behaviors based on simpler ones, applied to robot soccer

Patrick MacAlpine, Peter Stone, Overlapping layered learning, Artificial Intelligence, Volume 254, 2018, Pages 21-43, DOI: 10.1016/j.artint.2017.09.001.

Layered learning is a hierarchical machine learning paradigm that enables learning of complex behaviors by incrementally learning a series of sub-behaviors. A key feature of layered learning is that higher layers directly depend on the learned lower layers. In its original formulation, lower layers were frozen prior to learning higher layers. This article considers a major extension to the paradigm that allows learning certain behaviors independently, and then later stitching them together by learning at the “seams” where their influences overlap. The UT Austin Villa 2014 RoboCup 3D simulation team, using such overlapping layered learning, learned a total of 19 layered behaviors for a simulated soccer-playing robot, organized both in series and in parallel. To the best of our knowledge this is more than three times the number of layered behaviors in any prior layered learning system. Furthermore, the complete learning process is repeated on four additional robot body types, showcasing its generality as a paradigm for efficient behavior learning. The resulting team won the RoboCup 2014 championship with an undefeated record, scoring 52 goals and conceding none. This article includes a detailed experimental analysis of the team’s performance and the overlapping layered learning approach that led to its success.
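Overlapping layered learning, as described above, freezes behaviors that were learned independently and optimizes only the parameters at the "seam" where they interact. A toy sketch of that idea follows; the team actually used CMA-ES with task-specific fitness functions, whereas this uses a simple (1+1) hill climber and assumed names (evaluate, seam_idx).

```python
import numpy as np

def learn_seam(theta_a, theta_b, seam_idx, evaluate, rng, iters=200, sigma=0.05):
    """Re-optimize only the overlapping 'seam' parameters of two frozen behaviors.

    theta_a, theta_b: parameter vectors of behaviors learned independently
    seam_idx:         indices (into the concatenated vector) that both behaviors touch
    evaluate:         user-supplied fitness function over the full parameter vector
    """
    theta = np.concatenate([theta_a, theta_b])
    best = evaluate(theta)
    for _ in range(iters):
        candidate = theta.copy()
        # Perturb only the seam; everything else stays frozen.
        candidate[seam_idx] += sigma * rng.standard_normal(len(seam_idx))
        score = evaluate(candidate)
        if score > best:
            theta, best = candidate, score
    return theta, best
```

The point of learning only at the seams is that the search space per optimization stays small even as the total number of layered behaviors grows, which is what made nineteen layers tractable in the RoboCup system.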

An interesting approach to learning the sensorimotor behavior of a robot and its predictive capabilities through neural networks

R. Santos, R. Ferreira, Â. Cardoso and A. Bernardino, SNet: Co-Developing Artificial Retinas and Predictive Internal Models for Real Robots, IEEE Transactions on Cognitive and Developmental Systems, vol. 9, no. 3, pp. 213-222, DOI: 10.1109/TCDS.2016.2638885.

This paper focuses on a recently developed biologically inspired architecture, here denoted as sensorimotor network (SNet), able to co-develop sensorimotor structures directly from data acquired by a robot interacting with its environment. Such networks learn efficient internal models of the sensorimotor system, developing simultaneously sensor and motor representations as well as predictive models of the sensorimotor relationships adapted to their operating environment. Here, we describe our recent model of sensorimotor development and compare its performance with neural network models in predicting self-induced stimuli. In addition, we illustrate the influence of available resources and environment characteristics in the development of the SNet structures. Finally, an SNet is trained using real data recorded during a quadricopter drone flight.
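The abstract compares SNet against neural-network predictors of self-induced stimuli. The sketch below is not SNet itself but a minimal linear predictive internal model of that comparison kind: it predicts the next sensory reading from the current sensors and motor command and is updated online with a delta rule. All dimensions and the learning rate are illustrative assumptions.

```python
import numpy as np

class PredictiveInternalModel:
    """Minimal linear forward model of the sensorimotor loop."""
    def __init__(self, n_sensor, n_motor, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((n_sensor, n_sensor + n_motor))
        self.lr = lr

    def predict(self, sensor, motor):
        # Predict the next sensory state from the current state and motor command.
        return self.W @ np.concatenate([sensor, motor])

    def update(self, sensor, motor, next_sensor):
        pred = self.predict(sensor, motor)
        err = next_sensor - pred
        x = np.concatenate([sensor, motor])
        self.W += self.lr * np.outer(err, x)     # delta-rule (LMS) update
        return float(np.mean(err ** 2))          # prediction error on self-induced stimuli
```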

Chaos theory for modeling the behavior of mobile robots evolved to solve tasks

Federico Da Rold, Chaotic analysis of embodied and situated agents, Robotics and Autonomous Systems, Volume 95, 2017, Pages 143-159, DOI: 10.1016/j.robot.2017.06.004.

The embodied and situated view of cognition is a transdisciplinary framework which stresses the importance of real-time, dynamical interaction of an agent with the surrounding environment. This article presents a series of evolutionary robotics experiments that operationalize this concept, training miniature two-wheeled mobile robots to autonomously solve a temporal task. In order to provide a numerical description of the robots’ behavior, chaotic measures are estimated on the attractor reconstructed from the recorded positions of the agent. Chaos theory provides a rigorous mathematical framework consistent with an antireductionist approach, useful for understanding embodied and situated systems while avoiding a decomposition of the integrated brain–body–environment system. The time series are analyzed in detail using nonlinear mathematical tools in order to verify the presence of low-dimensional deterministic dynamical systems, a fundamental prerequisite for chaos theory. In particular, the recorded time series are evaluated with nonlinear prediction error to unveil deterministic dynamics, cross-prediction error to determine the stationarity of the signal, and surrogate data testing to verify the existence of nonlinear components in the underlying system. Estimators quantifying the level of chaos and the fractal dimension are applied to suitable datasets. Results show that robots governed by chaotic dynamics are more efficient at adapting to environments never experienced during evolution, demonstrating robustness towards novel and unpredictable situations. Furthermore, chaotic measures, in particular the fractal dimension, are correlated with performance when robots exhibit a similar behavioral strategy.
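Two of the tools named in the abstract, delay-coordinate attractor reconstruction and nonlinear prediction error, can be sketched compactly. The code below is a generic zeroth-order (nearest-neighbour) implementation with assumed embedding parameters, not the analysis pipeline used in the paper.

```python
import numpy as np

def delay_embed(x, dim=3, tau=5):
    """Reconstruct the attractor from a scalar time series via delay embedding."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau:i * tau + n] for i in range(dim)])

def nonlinear_prediction_error(x, dim=3, tau=5, horizon=1):
    """Zeroth-order nonlinear prediction: forecast each point's future from the
    future of its nearest neighbour in the reconstructed state space."""
    x = np.asarray(x, dtype=float)
    emb = delay_embed(x, dim, tau)
    m = len(emb) - horizon
    errs = []
    for i in range(m):
        d = np.linalg.norm(emb[:m] - emb[i], axis=1)
        d[i] = np.inf                              # exclude the point itself
        j = int(d.argmin())
        true_future = x[i + (dim - 1) * tau + horizon]
        pred_future = x[j + (dim - 1) * tau + horizon]
        errs.append((true_future - pred_future) ** 2)
    return float(np.sqrt(np.mean(errs)))
```

A low prediction error on the real series compared with its surrogates is the kind of evidence the article uses to argue for deterministic, low-dimensional dynamics in the robots' trajectories.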

Using bad results during policy search, and not only good ones, to improve the learning process

A. Colomé and C. Torras, Dual REPS: A Generalization of Relative Entropy Policy Search Exploiting Bad Experiences, IEEE Transactions on Robotics, vol. 33, no. 4, pp. 978-985, DOI: 10.1109/TRO.2017.2679202.

Policy search (PS) algorithms are widely used for their simplicity and effectiveness in finding solutions for robotic problems. However, most current PS algorithms derive policies by statistically fitting the data from the best experiments only. This means that experiments yielding a poor performance are usually discarded or given too little influence on the policy update. In this paper, we propose a generalization of the relative entropy policy search (REPS) algorithm that takes bad experiences into consideration when computing a policy. The proposed approach, named dual REPS (DREPS) following the philosophical interpretation of the duality between good and bad, finds clusters of experimental data yielding a poor behavior and adds them to the optimization problem as a repulsive constraint. Thus, considering that there is a duality between good and bad data samples, both are taken into account in the stochastic search for a policy. Additionally, a cluster with the best samples may be included as an attractor to enforce faster convergence to a single optimal solution in multimodal problems. We first tested our proposed approach in a simulated reinforcement learning setting and found that DREPS considerably speeds up the learning process, especially during the early optimization steps and in cases where other approaches get trapped between several alternative maxima. Further experiments in which a real robot had to learn a task with a multimodal reward function confirm the advantages of our proposed approach with respect to REPS.
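For context, standard REPS reweights rollouts by an exponential of their return, with the temperature set by a KL-divergence bound solved in a one-dimensional dual problem. The sketch below implements only that baseline step, using a crude grid search over the temperature; DREPS additionally clusters the worst rollouts and adds them as repulsive constraints in the same optimization, which this sketch does not include.

```python
import numpy as np

def reps_weights(returns, epsilon=0.5):
    """Standard REPS sample weights.

    Solves the 1-D dual for the temperature eta under the KL bound epsilon,
    then weights each rollout by exp(R / eta).
    """
    R = np.asarray(returns, dtype=float)
    R = R - R.max()                               # shift for numerical stability
    def dual(eta):
        return eta * epsilon + eta * np.log(np.mean(np.exp(R / eta)))
    etas = np.logspace(-3, 3, 2000)               # crude 1-D search over eta
    eta = etas[np.argmin([dual(e) for e in etas])]
    w = np.exp(R / eta)
    return w / w.sum()

# The resulting weights would then be used for a weighted maximum-likelihood
# update of the policy parameters (e.g., a weighted mean of the sampled parameters).
```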