Category Archives: Developmental Robotics

Improving Q-learning by initializing the Q matrix, with a nice review of related work on that approach

Ee Soong Low, Pauline Ong, Kah Chun Cheah, Solving the optimal path planning of a mobile robot using improved Q-learning, Robotics and Autonomous Systems, Volume 115, 2019, Pages 143-161, DOI: 10.1016/j.robot.2019.02.013.

Q-learning, a type of reinforcement learning, has recently gained popularity in autonomous mobile robot path planning due to its self-learning ability, which requires no a priori model of the environment. Yet, despite this advantage, Q-learning exhibits slow convergence to the optimal solution. In order to address this limitation, the concept of partially guided Q-learning is introduced, wherein the flower pollination algorithm (FPA) is utilized to improve the initialization of Q-learning. Experimental evaluation of the proposed improved Q-learning in challenging environments with different obstacle layouts shows that the convergence of Q-learning can be accelerated when Q-values are initialized appropriately using the FPA. Additionally, the effectiveness of the proposed algorithm is validated in a real-world experiment using a three-wheeled mobile robot.
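
The core idea is easy to prototype. Below is a minimal sketch of partially guided Q-learning on a toy grid world, where the Q-table is seeded with an optimistic distance-to-goal prior instead of zeros. The heuristic prior merely stands in for the FPA-computed initialization (the FPA itself is not reproduced here), and the grid, rewards, and hyperparameters are illustrative assumptions rather than the authors' setup.

```python
import numpy as np

SIZE, GOAL = 10, (9, 9)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def seeded_q_table():
    """Q-table seeded with an optimistic prior (stand-in for the FPA step)."""
    q = np.zeros((SIZE, SIZE, len(ACTIONS)))
    for r in range(SIZE):
        for c in range(SIZE):
            for a, (dr, dc) in enumerate(ACTIONS):
                nr = min(max(r + dr, 0), SIZE - 1)
                nc = min(max(c + dc, 0), SIZE - 1)
                # Closer to the goal -> larger initial Q-value.
                q[r, c, a] = -(abs(GOAL[0] - nr) + abs(GOAL[1] - nc))
    return q

def train(q, episodes=200, alpha=0.1, gamma=0.95, eps=0.1):
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        r, c = 0, 0
        while (r, c) != GOAL:
            a = int(rng.integers(len(ACTIONS))) if rng.random() < eps \
                else int(np.argmax(q[r, c]))
            dr, dc = ACTIONS[a]
            nr = min(max(r + dr, 0), SIZE - 1)
            nc = min(max(c + dc, 0), SIZE - 1)
            reward = 100.0 if (nr, nc) == GOAL else -1.0
            q[r, c, a] += alpha * (reward + gamma * q[nr, nc].max() - q[r, c, a])
            r, c = nr, nc
    return q

q = train(seeded_q_table())
```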

RL and Inverse RL based on MDPs for autonomous vehicles, and a nice historical review of the topic of autonomous vehicles

Changxi You, Jianbo Lu, Dimitar Filev, Panagiotis Tsiotras, Advanced planning for autonomous vehicles using reinforcement learning and deep inverse reinforcement learning, Robotics and Autonomous Systems, Volume 114, 2019, Pages 1-18, DOI: 10.1016/j.robot.2019.01.003.

Autonomous vehicles promise to improve traffic safety while, at the same time, increasing fuel efficiency and reducing congestion. They represent the main trend in future intelligent transportation systems. This paper concentrates on the planning problem of autonomous vehicles in traffic. We model the interaction between the autonomous vehicle and the environment as a stochastic Markov decision process (MDP) and consider the driving style of an expert driver as the target to be learned. The road geometry is taken into consideration in the MDP model in order to incorporate more diverse driving styles. The desired, expert-like driving behavior of the autonomous vehicle is obtained as follows: First, we design the reward function of the corresponding MDP and determine the optimal driving strategy for the autonomous vehicle using reinforcement learning techniques. Second, we collect a number of demonstrations from an expert driver and learn the optimal driving strategy based on data using inverse reinforcement learning. The unknown reward function of the expert driver is approximated using a deep neural network (DNN). We clarify and validate the application of the maximum entropy principle (MEP) to learn the DNN reward function, and provide the necessary derivations for using the maximum entropy principle to learn a parameterized feature (reward) function. Simulation results demonstrate the desired driving behaviors of an autonomous vehicle using both the reinforcement learning and inverse reinforcement learning techniques.
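
To make the inverse RL side concrete, here is a minimal maximum-entropy IRL sketch on a toy chain MDP with a linear reward r(s) = theta . phi(s). The paper instead learns a deep-network reward from real driving demonstrations, so the toy dynamics, features, and demonstrations below are purely illustrative assumptions that keep the MEP gradient structure visible: the update is expert feature counts minus the learner's expected feature counts.

```python
import numpy as np

N_S, N_A, T = 5, 2, 10               # states, actions (left/right), horizon
P = np.zeros((N_S, N_A, N_S))        # deterministic chain dynamics
for s in range(N_S):
    P[s, 0, max(s - 1, 0)] = 1.0
    P[s, 1, min(s + 1, N_S - 1)] = 1.0
phi = np.eye(N_S)                    # one-hot state features

def soft_policy(theta, gamma=0.9):
    """Soft value iteration; returns pi[s, a] under reward theta . phi."""
    r = phi @ theta
    v = np.zeros(N_S)
    for _ in range(100):
        q = r[:, None] + gamma * np.einsum('sap,p->sa', P, v)
        m = q.max(axis=1)
        v = m + np.log(np.exp(q - m[:, None]).sum(axis=1))  # stable logsumexp
    return np.exp(q - v[:, None])

def expected_features(pi, start=0):
    """Expected feature counts over T steps under policy pi."""
    d = np.zeros(N_S); d[start] = 1.0
    total = np.zeros(N_S)
    for _ in range(T):
        total += d
        d = np.einsum('s,sa,sap->p', d, pi, P)
    return total @ phi

# Toy "expert" demonstrations that drift to the rightmost state.
expert_states = [[0, 1, 2, 3, 4, 4, 4, 4, 4, 4]] * 3
expert_feats = sum(phi[s] for traj in expert_states for s in traj) / len(expert_states)

theta = np.zeros(N_S)
for _ in range(100):                 # gradient ascent on the MaxEnt likelihood
    pi = soft_policy(theta)
    theta += 0.1 * (expert_feats - expected_features(pi))
print(np.round(theta, 2))            # learned reward should peak at the goal state
```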

A developmental architecture for sensory-motor skills based on predictors, and a nice state-of-the-art in cognitive architectures for sensory-motor skill learning

E. Wieser and G. Cheng, A Self-Verifying Cognitive Architecture for Robust Bootstrapping of Sensory-Motor Skills via Multipurpose Predictors, IEEE Transactions on Cognitive and Developmental Systems, vol. 10, no. 4, pp. 1081-1095, DOI: 10.1109/TCDS.2018.2871857.

The autonomous acquisition of sensory-motor skills along multiple developmental stages is one of the current challenges in robotics. To this end, we propose a new developmental cognitive architecture that combines multipurpose predictors and principles of self-verification for the robust bootstrapping of sensory-motor skills. Our architecture operates with loops formed by both mental simulation of sensory-motor sequences and their subsequent physical trial on a robot. During these loops, verification algorithms monitor the predicted and the physically observed sensory-motor data. Multiple types of predictors are acquired through several developmental stages. As a result, the architecture can select and plan actions, adapt to various robot platforms by adjusting proprioceptive feedback, predict the risk of self-collision, learn from a previous interaction stage by validating and extracting sensory-motor data for training the predictor of a subsequent stage, and finally acquire an internal representation for evaluating the performance of its predictors. These cognitive capabilities in turn realize the bootstrapping of early hand-eye coordination and its improvement. We validate the cognitive capabilities experimentally and, in particular, show an improvement of reaching as an example skill.
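
As a rough illustration of the mental-simulation/physical-trial loop with self-verification, the sketch below pairs a learned linear forward model with a stand-in "physical robot" and only trusts low-error experience as training data for a next stage. The dynamics, error threshold, and learning rule are our assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(2, 2))   # learned forward model: s' ~ W @ a

def robot_step(action):
    """Stand-in for the physical robot: true (unknown) dynamics + noise."""
    true_W = np.array([[1.0, 0.2], [-0.3, 0.8]])
    return true_W @ action + rng.normal(scale=0.01, size=2)

verified = []                             # data validated for the next stage
for trial in range(500):
    action = rng.uniform(-1, 1, size=2)
    predicted = W @ action                # mental simulation
    observed = robot_step(action)         # physical trial
    error = np.linalg.norm(predicted - observed)
    # Verification: only low-error experience is trusted as training data
    # for a subsequent developmental stage.
    if error < 0.05:
        verified.append((action, observed))
    W += 0.1 * np.outer(observed - predicted, action)   # online regression update

print(f"{len(verified)} verified samples; last prediction error {error:.3f}")
```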

A definition of emergence and its application to symbol emergence in robots

R. L. Sturdivant and E. K. P. Chong, The Necessary and Sufficient Conditions for Emergence in Systems Applied to Symbol Emergence in Robots, IEEE Transactions on Cognitive and Developmental Systems, vol. 10, no. 4, pp. 1035-1042, DOI: 10.1109/TCDS.2017.2731361.

A conceptual model for emergence with downward causation is developed, and the necessary and sufficient conditions are identified for a phenomenon to be considered emergent in a complex system. The model is then applied to symbol emergence in robots. This paper is motivated by the usefulness of emergence for explaining a wide variety of phenomena in systems, and cognition in natural and artificial creatures. Downward causation is shown to be a critical requirement for potentially emergent phenomena to be considered actually emergent. Models of emergence with and without downward causation are described, as well as how weak emergence can include downward causation. A process flow is developed for distinguishing emergence from nonemergence, based on the application of reductionism and the detection of downward causation. Examples show how the necessary and sufficient conditions separate actually emergent phenomena from nonemergent ones. Finally, this approach for detecting emergence is applied to complex projects and symbol emergence in robots.
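
Under our reading of the paper's process flow, the classification logic can be caricatured in a few lines; the predicate names are our own shorthand, not the authors' terminology.

```python
# Toy encoding of the decision flow: a phenomenon counts as actually emergent
# only when it resists reductionist explanation AND downward causation (the
# whole constraining its parts) is detected.
def classify(reducible_to_parts: bool, downward_causation: bool) -> str:
    if reducible_to_parts and not downward_causation:
        return "nonemergent (fully reducible)"
    if downward_causation:
        return "actually emergent"
    return "potentially emergent only (no downward causation detected)"

print(classify(reducible_to_parts=False, downward_causation=True))
```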

A cognitive architecture for self-development in robots that interact with humans, with a nice state-of-the-art of robot cognitive architectures

C. Moulin-Frier et al., DAC-h3: A Proactive Robot Cognitive Architecture to Acquire and Express Knowledge About the World and the Self, IEEE Transactions on Cognitive and Developmental Systems, vol. 10, no. 4, pp. 1005-1022, DOI: 10.1109/TCDS.2017.2754143.

This paper introduces a cognitive architecture for a humanoid robot to engage in a proactive, mixed-initiative exploration and manipulation of its environment, where the initiative can originate from both human and robot. The framework, based on a biologically grounded theory of the brain and mind, integrates a reactive interaction engine, a number of state-of-the-art perceptual and motor learning algorithms, as well as planning abilities and an autobiographical memory. The architecture as a whole drives the robot behavior to solve the symbol grounding problem, acquire language capabilities, execute goal-oriented behavior, and express a verbal narrative of its own experience in the world. We validate our approach in human-robot interaction experiments with the iCub humanoid robot, showing that the proposed cognitive architecture can be applied in real time within a realistic scenario and that it can be used with naive users.
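
Purely as an illustration of how such components might be wired into one perception-action loop, here is a bare skeleton; all class and method names are our invention for illustration, not the DAC-h3 codebase.

```python
class DACh3Like:
    """Hypothetical wiring of the components named in the abstract."""

    def __init__(self, reactive_engine, perception, motor, planner, memory):
        self.reactive = reactive_engine      # drives proactive, mixed-initiative behavior
        self.perception = perception         # perceptual learning algorithms
        self.motor = motor                   # motor learning / execution
        self.planner = planner               # goal-oriented planning
        self.memory = memory                 # autobiographical memory

    def step(self, sensors):
        percepts = self.perception.process(sensors)
        self.memory.record(percepts)         # experience for grounding and narrative
        goal = self.reactive.propose_goal(percepts) or \
               self.planner.next_goal(self.memory)
        return self.motor.act(goal, percepts)
```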

A new model of reinforcement learning based on the human brain that copes with continuous spaces through continuous rewards, with a short but nice state-of-the-art of RL applied to large, continuous spaces

Feifei Zhao, Yi Zeng, Guixiang Wang, Jun Bai, Bo Xu, A Brain-Inspired Decision Making Model Based on Top-Down Biasing of Prefrontal Cortex to Basal Ganglia and Its Application in Autonomous UAV Explorations, Cognitive Computation, Volume 10, Issue 2, pp 296–306, DOI: 10.1007/s12559-017-9511-3.

Decision making is a fundamental ability for intelligent agents (e.g., humanoid robots and unmanned aerial vehicles). During the decision making process, agents can improve their strategy for interacting with the dynamic environment through reinforcement learning. Many state-of-the-art reinforcement learning models, such as Q-learning and Actor-Critic algorithms, deal with a relatively small number of state-action pairs and prefer discrete states. In practice, however, in many scenarios the states are continuous and hard to discretize properly. Better autonomous decision making methods need to be proposed to handle these problems. Inspired by the mechanism of decision making in the human brain, we propose a general computational model, named the prefrontal cortex-basal ganglia (PFC-BG) algorithm. The proposed model is inspired by the biological reinforcement learning pathway and mechanisms from the following perspectives: (1) dopamine signals continuously update reward-relevant information for both the basal ganglia and working memory in the prefrontal cortex; (2) contextual reward information is maintained in working memory, which has a top-down biasing effect on reinforcement learning in the basal ganglia. The proposed model separates the continuous states into smaller distinguishable states and introduces a continuous reward function for each state to obtain reward information at different times. To verify the performance of our model, we apply it to several UAV decision making experiments, such as avoiding obstacles and flying through windows and doors, and the experiments support the effectiveness of the model. Compared with traditional Q-learning and Actor-Critic algorithms, the proposed model is more biologically inspired, and it makes decisions faster and more accurately.
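
A minimal sketch of the two mechanisms the abstract highlights, under our reading: carving a continuous state space into distinguishable discrete cells, and attaching a dense, continuous reward to each state instead of a sparse one. The grid resolution and reward shape are illustrative assumptions, not the authors' design.

```python
import numpy as np

GOAL = np.array([9.0, 9.0])
CELL = 1.0                                   # discretization resolution (assumed)

def discretize(pos):
    """Map a continuous 2-D position to a discrete state id."""
    return tuple((pos // CELL).astype(int))

def continuous_reward(pos):
    """Dense reward that grows smoothly as the agent nears the goal."""
    return -np.linalg.norm(GOAL - pos)

q = {}                                       # Q-values over discretized states
pos = np.array([0.0, 0.0])
state = discretize(pos)
q.setdefault(state, np.zeros(4))             # one action-value row per cell
print(state, continuous_reward(pos))
```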

A robot architecture for humanoids able to coordinate different cognitive processes (perception, decision-making, etc.) in a hierarchical fashion

J. Hwang and J. Tani, Seamless Integration and Coordination of Cognitive Skills in Humanoid Robots: A Deep Learning Approach, IEEE Transactions on Cognitive and Developmental Systems, vol. 10, no. 2, pp. 345-358, DOI: 10.1109/TCDS.2017.2714170.

This paper investigates how adequate coordination among the different cognitive processes of a humanoid robot can be developed through end-to-end learning of direct perception of the visuomotor stream. We propose a deep dynamic neural network model built on a dynamic vision network, a motor generation network, and a higher-level network. The proposed model was designed to process and to integrate direct perception of dynamic visuomotor patterns in a hierarchical model characterized by different spatial and temporal constraints imposed on each level. We conducted synthetic robotic experiments in which a robot learned to read a human's intention by observing gestures and then to generate the corresponding goal-directed actions. Results verify that the proposed model is able to learn the tutored skills and to generalize them to novel situations. The model showed synergic coordination of perception, action, and decision making, and it integrated and coordinated a set of cognitive skills, including visual perception, intention reading, attention switching, working memory, action preparation, and execution, in a seamless manner. Analysis reveals that coherent internal representations emerged at each level of the hierarchy, with a higher-level representation reflecting actional intention developing through continuous integration of the lower-level visuo-proprioceptive stream.
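
The "different temporal constraints per level" idea can be illustrated with leaky (continuous-time) recurrent units whose time constants differ per level, so lower levels track fast visuomotor detail while higher levels integrate slow, intention-level context. The sketch below is a generic multiple-timescales cell in this spirit; sizes, time constants, and random weights are assumptions, not the paper's trained network.

```python
import numpy as np

rng = np.random.default_rng(0)

class LeakyRNNLevel:
    """One level of a multiple-timescales hierarchy (continuous-time RNN)."""

    def __init__(self, n_in, n_units, tau):
        self.tau = tau                        # large tau -> slow dynamics
        self.W_in = rng.normal(scale=0.1, size=(n_units, n_in))
        self.W_rec = rng.normal(scale=0.1, size=(n_units, n_units))
        self.u = np.zeros(n_units)            # internal state

    def step(self, x):
        du = -self.u + self.W_rec @ np.tanh(self.u) + self.W_in @ x
        self.u += du / self.tau               # leaky Euler update
        return np.tanh(self.u)

fast = LeakyRNNLevel(n_in=8, n_units=16, tau=2.0)    # visuomotor level
slow = LeakyRNNLevel(n_in=16, n_units=8, tau=20.0)   # intention level
for t in range(100):
    h_fast = fast.step(rng.normal(size=8))    # stand-in for visuomotor input
    h_slow = slow.step(h_fast)                # slow context over fast features
```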

An interesting model of the basal ganglia that performs similarly to Q-learning when applied to a robot

Y. Zeng, G. Wang and B. Xu, A Basal Ganglia Network Centric Reinforcement Learning Model and Its Application in Unmanned Aerial Vehicle, IEEE Transactions on Cognitive and Developmental Systems, vol. 10, no. 2, pp. 290-303, DOI: 10.1109/TCDS.2017.2649564.

Reinforcement learning brings flexibility and generality to machine learning, yet most approaches are driven by mathematical optimization and lack cognitive and neural evidence. In order to provide a foundation driven more by cognitive and neural mechanisms, and to validate its applicability to complex tasks, we develop a basal ganglia (BG) network centric reinforcement learning model. Compared to existing work on modeling the BG, this paper is unique from the following perspectives: 1) the orbitofrontal cortex (OFC) is taken into consideration. The OFC is critical in decision making because of its responsibility for reward representation and its role in controlling the learning process, yet most BG-centric models do not include it; 2) to compensate for inaccurate memory of numeric values, precise encoding is proposed to enable the working memory system to remember important values during the learning process. The method combines vector convolution with the idea of storage by digit bit and is efficient for accurate value storage; and 3) for information coding, the Hodgkin-Huxley model is used to obtain a more biologically plausible description of the action potential, including its ionic activities. To validate the effectiveness of the proposed model, we apply it to the autonomous learning process of an unmanned aerial vehicle (UAV) in a 3-D environment. Experimental results show that our model gives the UAV the ability to explore the environment freely and learns at a speed comparable to the Q-learning algorithm, while its major advance is its solid cognitive and neural basis.
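
Since the abstract names the Hodgkin-Huxley model for information coding, the snippet below integrates a single textbook HH neuron (classic squid-axon parameters, forward Euler) to show the action-potential dynamics involved. This is a standard HH implementation, not the authors' network.

```python
import numpy as np

C, gNa, gK, gL = 1.0, 120.0, 36.0, 0.3      # uF/cm^2, mS/cm^2
ENa, EK, EL = 50.0, -77.0, -54.387          # reversal potentials, mV

def alpha_beta(V):
    """Standard HH gating rate functions."""
    an = 0.01 * (V + 55) / (1 - np.exp(-(V + 55) / 10))
    bn = 0.125 * np.exp(-(V + 65) / 80)
    am = 0.1 * (V + 40) / (1 - np.exp(-(V + 40) / 10))
    bm = 4.0 * np.exp(-(V + 65) / 18)
    ah = 0.07 * np.exp(-(V + 65) / 20)
    bh = 1.0 / (1 + np.exp(-(V + 35) / 10))
    return an, bn, am, bm, ah, bh

dt, T = 0.01, 50.0                          # time step and duration, ms
V, n, m, h = -65.0, 0.317, 0.052, 0.596     # resting state
spikes = 0
for step in range(int(T / dt)):
    I_ext = 10.0 if step * dt > 5.0 else 0.0          # step current, uA/cm^2
    an, bn, am, bm, ah, bh = alpha_beta(V)
    n += dt * (an * (1 - n) - bn * n)
    m += dt * (am * (1 - m) - bm * m)
    h += dt * (ah * (1 - h) - bh * h)
    I_ion = gNa * m**3 * h * (V - ENa) + gK * n**4 * (V - EK) + gL * (V - EL)
    V_new = V + dt * (I_ext - I_ion) / C
    if V < 0.0 <= V_new:
        spikes += 1                                   # count upward crossings
    V = V_new
print(f"{spikes} spikes in {T} ms")
```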

How a robot can learn to recognize itself in a mirror

Zeng, Y., Zhao, Y., Bai, J., et al., Toward Robot Self-Consciousness (II): Brain-Inspired Robot Bodily Self Model for Self-Recognition, Cognitive Computation, Volume 10, Issue 2, p. 307, DOI: 10.1007/s12559-017-9505-1.

The neural correlates and nature of self-consciousness are an advanced topic in cognitive neuroscience, and only a few animal species have been shown to possess this cognitive ability. From the artificial intelligence and robotics point of view, few efforts are deeply rooted in the neural correlates and brain mechanisms of biological self-consciousness. Although the scientific understanding of biological self-consciousness is still at a preliminary stage, we integrate and adopt known biological findings on self-consciousness to build a brain-inspired model for robot self-consciousness. In this paper, we propose a brain-inspired robot bodily self model based on extensions to the primate mirror neuron system and apply it to a humanoid robot for self-recognition. In this model, the robot first learns the correlations between self-generated actions and visual feedback in motion using spike-timing-dependent plasticity (STDP), and then learns the appearance of its body parts with the expectation that the visual feedback is consistent with its motion. Based on this model, the robot uses multisensory integration to learn its own body in the real world and in a mirror, and can then distinguish itself from others. In a mirror test setting with three robots of identical appearance, each robot equipped with the proposed brain-inspired bodily self model can recognize itself in the mirror after the robots make random movements at the same time. The theoretical modeling and experimental validation indicate that the brain-inspired robot bodily self model is biologically inspired and computationally feasible as a foundation for robot self-recognition.
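
The correlation-learning step can be illustrated with a minimal pair-based STDP rule: a synapse from a "motor command" neuron to a "visual feedback" neuron strengthens when motor spikes consistently precede visual spikes, as happens for self-generated motion. The spike timing, feedback latency, and STDP constants below are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

A_plus, A_minus, tau = 0.05, 0.025, 20.0     # STDP amplitudes, time constant (ms)
w = 0.5                                       # synaptic weight, motor -> visual

rng = np.random.default_rng(0)
for trial in range(200):
    t_motor = rng.uniform(0, 100)             # time of a self-generated action
    t_visual = t_motor + rng.normal(15.0, 2.0)  # feedback follows the action
    dt = t_visual - t_motor                   # pre-before-post if dt > 0
    if dt > 0:
        w += A_plus * np.exp(-dt / tau)       # potentiation
    else:
        w -= A_minus * np.exp(dt / tau)       # depression
    w = min(max(w, 0.0), 1.0)                 # keep the weight bounded

print(f"final weight: {w:.2f}")               # grows when feedback is causal
```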

Multi-agent reinforcement learning for working with high-dimensional spaces

David L. Leottau, Javier Ruiz-del-Solar, Robert Babuška, Decentralized Reinforcement Learning of Robot Behaviors, Artificial Intelligence, Volume 256, 2018, Pages 130-159, DOI: 10.1016/j.artint.2017.12.001.

A multi-agent methodology is proposed for Decentralized Reinforcement Learning (DRL) of individual behaviors in problems where multi-dimensional action spaces are involved. When using this methodology, sub-tasks are learned in parallel by individual agents working toward a common goal. In addition to proposing this methodology, three specific multi-agent DRL approaches are considered: DRL-Independent, DRL Cooperative-Adaptive (CA), and DRL-Lenient. These approaches are validated and analyzed with an extensive empirical study using four different problems: 3D Mountain Car, SCARA Real-Time Trajectory Generation, Ball-Dribbling in humanoid soccer robotics, and Ball-Pushing using differential drive robots. The experimental validation provides evidence that DRL implementations show better performance and faster learning times than their centralized counterparts, while using fewer computational resources. The DRL-Lenient and DRL-CA algorithms achieve the best final performances on the four tested problems, outperforming their DRL-Independent counterparts. Furthermore, the benefits of DRL-Lenient and DRL-CA are more noticeable as the problem complexity increases and the centralized scheme becomes intractable given the available computational resources and training time.
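
In its simplest (DRL-Independent) form, the decentralized idea amounts to giving each action dimension its own independent learner whose updates share a global reward. The toy sketch below assumes tabular Q-learners and made-up problem sizes; it is a sketch of the scheme, not the authors' implementation.

```python
import numpy as np

N_STATES, N_DIMS, N_ACTIONS = 20, 3, 5        # e.g. 3 actuators, 5 settings each
agents = [np.zeros((N_STATES, N_ACTIONS)) for _ in range(N_DIMS)]

def select_joint_action(state, eps, rng):
    """Joint action = tuple of each agent's own eps-greedy choice."""
    return [int(rng.integers(N_ACTIONS)) if rng.random() < eps
            else int(np.argmax(q[state])) for q in agents]

def update(state, joint_action, reward, next_state, alpha=0.1, gamma=0.95):
    # Every agent updates its own table from the shared (global) reward.
    for q, a in zip(agents, joint_action):
        q[state, a] += alpha * (reward + gamma * q[next_state].max() - q[state, a])

rng = np.random.default_rng(0)
state = 0
joint = select_joint_action(state, eps=0.1, rng=rng)
update(state, joint, reward=1.0, next_state=1)
```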