Category Archives: Developmental Robotics

Developmental approach for a robot manipulator that learns in several bootstrapped stages, strongly inspired by infant development

Ugur, E.; Nagai, Y.; Sahin, E.; Oztop, E., Staged Development of Robot Skills: Behavior Formation, Affordance Learning and Imitation with Motionese, IEEE Transactions on Autonomous Mental Development, vol. 7, no. 2, pp. 119-139, June 2015. DOI: 10.1109/TAMD.2015.2426192.

Inspired by infant development, we propose a three-staged developmental framework for an anthropomorphic robot manipulator. In the first stage, the robot is initialized with a basic reach-and-enclose-on-contact movement capability, and discovers a set of behavior primitives by exploring its movement parameter space. In the next stage, the robot exercises the discovered behaviors on different objects, and learns the caused effects; effectively building a library of affordances and associated predictors. Finally, in the third stage, the learned structures and predictors are used to bootstrap complex imitation and action learning with the help of a cooperative tutor. The main contribution of this paper is the realization of an integrated developmental system where the structures emerging from the sensorimotor experience of an interacting real robot are used as the sole building blocks of the subsequent stages that generate increasingly more complex cognitive capabilities. The proposed framework shares a number of features with infant sensorimotor development. Furthermore, the findings obtained from the self-exploration and motionese-guided human-robot interaction experiments allow us to reason about the underlying mechanisms of simple-to-complex sensorimotor skill progression in human infants.
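
As a rough illustration of what such a staged bootstrap can look like in code, here is a toy sketch of my own (the effect model, the greedy primitive selection and all names are assumptions, not the authors' implementation): stage 1 discovers primitives by motor babbling, stage 2 tabulates their effects on objects, and stage 3 reuses those predictions to imitate a demonstrated effect.

```python
"""Toy three-stage bootstrapped pipeline; every structure below is illustrative."""
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))                       # hidden toy "world" mapping

def execute(params, obj_feature):
    """Toy stand-in for executing a movement and observing its effect."""
    return np.tanh(params @ W) * obj_feature

# Stage 1: motor babbling -> a few behaviour primitives whose effects are
# maximally distinct (greedy farthest-point selection on observed effects).
def discover_behaviors(n_samples=300, n_primitives=4):
    params = rng.uniform(-1, 1, size=(n_samples, 3))
    effects = np.array([execute(p, 1.0) for p in params])
    chosen = [0]
    while len(chosen) < n_primitives:
        d = np.min([np.linalg.norm(effects - effects[i], axis=1) for i in chosen], axis=0)
        chosen.append(int(np.argmax(d)))
    return params[chosen]

# Stage 2: exercise each primitive on objects with different features and
# store (behaviour, object) -> observed effect as a simple affordance table.
def learn_affordances(primitives, object_features):
    return {(b, o): execute(primitives[b], object_features[o])
            for b in range(len(primitives)) for o in range(len(object_features))}

# Stage 3: imitation as search over the affordance predictions for the
# primitive whose predicted effect best matches the demonstrated effect.
def imitate(affordances, obj, demonstrated_effect):
    candidates = {b: e for (b, o), e in affordances.items() if o == obj}
    return min(candidates, key=lambda b: np.linalg.norm(candidates[b] - demonstrated_effect))

primitives = discover_behaviors()
affordances = learn_affordances(primitives, object_features=[0.5, 1.0, 2.0])
print("imitate with primitive:", imitate(affordances, obj=1, demonstrated_effect=np.array([0.3, -0.2])))
```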

Efficient sampling of the agent-world interaction in reinforcement learning through the use of simulators with diverse fidelity to the real system

Cutler, M.; Walsh, T.J.; How, J.P., Real-World Reinforcement Learning via Multifidelity Simulators, IEEE Transactions on Robotics, vol. 31, no. 3, pp. 655-671, June 2015. DOI: 10.1109/TRO.2015.2419431.

Reinforcement learning (RL) can be a tool for designing policies and controllers for robotic systems. However, the cost of real-world samples remains prohibitive as many RL algorithms require a large number of samples before learning useful policies. Simulators are one way to decrease the number of required real-world samples, but imperfect models make deciding when and how to trust samples from a simulator difficult. We present a framework for efficient RL in a scenario where multiple simulators of a target task are available, each with varying levels of fidelity. The framework is designed to limit the number of samples used in each successively higher-fidelity/cost simulator by allowing a learning agent to choose to run trajectories at the lowest level simulator that will still provide it with useful information. Theoretical proofs of the framework’s sample complexity are given and empirical results are demonstrated on a remote-controlled car with multiple simulators. The approach enables RL algorithms to find near-optimal policies in a physical robot domain with fewer expensive real-world samples than previous transfer approaches or learning without simulators.
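
The core loop, as I read the abstract, is to run each state-action pair at the cheapest simulator that is still uncertain there, escalating to a costlier one only once the cheaper level is "known". The sketch below is a toy rendering of that idea with a made-up knownness threshold and toy dynamics, not the paper's multifidelity RL algorithm.

```python
"""Toy sketch of sampling across a chain of simulators ordered by fidelity/cost."""
import numpy as np

rng = np.random.default_rng(1)

class Simulator:
    """Toy 1-D dynamics; lower-fidelity levels are cheaper but biased."""
    def __init__(self, bias, cost):
        self.bias, self.cost = bias, cost
    def step(self, s, a):
        return s + 0.1 * a + self.bias + 0.01 * rng.normal()

# chain of environments from cheap/low fidelity to the expensive real system
chain = [Simulator(bias=0.05, cost=1), Simulator(bias=0.01, cost=10), Simulator(bias=0.0, cost=100)]
counts = [dict() for _ in chain]          # visit counts per fidelity level
KNOWN_AFTER = 5                           # toy stand-in for a knownness criterion
total_cost = 0.0

def known(level, key):
    return counts[level].get(key, 0) >= KNOWN_AFTER

s = 0.0
for _ in range(200):
    a = rng.choice([-1.0, 1.0])
    key = (round(s, 1), a)
    # run at the lowest-fidelity level that is still uncertain about this
    # state-action; escalate to a costlier simulator only once it is "known"
    level = 0
    while level < len(chain) - 1 and known(level, key):
        level += 1
    s = chain[level].step(s, a)
    counts[level][key] = counts[level].get(key, 0) + 1
    total_cost += chain[level].cost

print("samples per level:", [sum(c.values()) for c in counts], "| total cost:", total_cost)
```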

Deducing the concept of space from the sensorimotor behaviour of a robot, and interesting related work on uninterpreted sensors and actuators in developmental robotics that deserves a deeper look

Alban Laflaquière, J. Kevin O’Regan, Sylvain Argentieri, Bruno Gas, Alexander V. Terekhov, Learning agent’s spatial configuration from sensorimotor invariants, Robotics and Autonomous Systems, Volume 71, September 2015, Pages 49-59, ISSN 0921-8890, DOI: 10.1016/j.robot.2015.01.003.

The design of robotic systems is largely dictated by our purely human intuition about how we perceive the world. This intuition has been proven incorrect with regard to a number of critical issues, such as visual change blindness. In order to develop truly autonomous robots, we must step away from this intuition and let robotic agents develop their own way of perceiving. The robot should start from scratch and gradually develop perceptual notions, under no prior assumptions, exclusively by looking into its sensorimotor experience and identifying repetitive patterns and invariants. One of the most fundamental perceptual notions, space, cannot be an exception to this requirement. In this paper we look into the prerequisites for the emergence of simplified spatial notions on the basis of a robot’s sensorimotor flow. We show that the notion of space as environment-independent cannot be deduced solely from exteroceptive information, which is highly variable and is mainly determined by the contents of the environment. The environment-independent definition of space can be approached by looking into the functions that link the motor commands to changes in exteroceptive inputs. In a sufficiently rich environment, the kernels of these functions correspond uniquely to the spatial configuration of the agent’s exteroceptors. We simulate a redundant robotic arm with a retina installed at its end-point and show how this agent can learn the configuration space of its retina. The resulting manifold has the topology of the Cartesian product of a plane and a circle, and corresponds to the planar position and orientation of the retina.
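
A toy numerical illustration of the invariant being exploited (my own construction, not the paper's arm/retina simulation): motor configurations that place the exteroceptor at the same pose produce identical sensory responses whatever the environment contains, so embedding the distances between such response signatures should recover the exteroceptor's configuration manifold, here a 1-D coordinate.

```python
"""Toy recovery of an exteroceptor's configuration from sensorimotor invariants."""
import numpy as np

rng = np.random.default_rng(2)

def retina_pos(m):
    """Redundant toy kinematics: three 'joints' map onto one retina coordinate."""
    return np.sin(m[0]) + np.sin(m[1]) + np.sin(m[2])

# probe each sampled motor configuration in many random environments
motors = rng.uniform(-1, 1, size=(60, 3))
envs = [np.poly1d(rng.normal(size=4)) for _ in range(40)]   # random environment "contents"
signatures = np.array([[env(retina_pos(m)) for env in envs] for m in motors])

# configurations that place the retina at the same spot give identical
# signatures regardless of the environment, so distances between signatures
# depend only on the retina's spatial configuration
D = np.linalg.norm(signatures[:, None, :] - signatures[None, :, :], axis=2)

# classical MDS on those distances embeds the 1-D configuration manifold
n = len(motors)
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
vals, vecs = np.linalg.eigh(B)
recovered = vecs[:, -1] * np.sqrt(vals[-1])

true_pos = np.array([retina_pos(m) for m in motors])
print("|correlation| with the true retina coordinate:",
      round(abs(np.corrcoef(recovered, true_pos)[0, 1]), 3))
```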

Reinforcement learning used for an adaptive attention mechanism, integrated into an architecture with both top-down and bottom-up vision processing

Ognibene, D.; Baldassarre, G., Ecological Active Vision: Four Bioinspired Principles to Integrate Bottom-Up and Adaptive Top-Down Attention Tested With a Simple Camera-Arm Robot, IEEE Transactions on Autonomous Mental Development, vol. 7, no. 1, pp. 3-25, March 2015. DOI: 10.1109/TAMD.2014.2341351.

Vision gives primates a wealth of information useful to manipulate the environment, but at the same time it can easily overwhelm their computational resources. Active vision is a key solution found by nature to solve this problem: a limited fovea actively displaced in space to collect only relevant information. Here we highlight that in ecological conditions this solution encounters four problems: 1) the agent needs to learn where to look based on its goals; 2) manipulation causes learning feedback in areas of space possibly outside the attention focus; 3) good visual actions are needed to guide manipulation actions, but only the latter generate learning feedback; and 4) a limited fovea causes aliasing problems. We then propose a computational architecture (“BITPIC”) to overcome the four problems, integrating four bioinspired key ingredients: 1) reinforcement-learning fovea-based top-down attention; 2) a strong vision-manipulation coupling; 3) bottom-up periphery-based attention; and 4) a novel action-oriented memory. The system is tested with a simple simulated camera-arm robot solving a class of search-and-reach tasks involving color-blob “objects.” The results show that the architecture solves the problems, and hence the tasks, very efficiently, and highlight how the architecture principles can contribute to a full exploitation of the advantages of active vision in ecological conditions.
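
A minimal sketch of how the first and third ingredients can interact, under loud assumptions: a bottom-up saliency map from the periphery is summed with a top-down map learned from reach rewards, and the fixation goes to the maximum of the combined score. The grid size, learning rule and reward scheme below are my own choices, not the BITPIC implementation.

```python
"""Toy mixing of bottom-up saliency with a reward-learned top-down attention map."""
import numpy as np

rng = np.random.default_rng(3)
GRID = 8
target = (5, 2)                                  # hidden location of the "object"

top_down = np.zeros((GRID, GRID))                # learned via reinforcement
alpha, eps = 0.3, 0.1

for episode in range(500):
    # bottom-up saliency from the periphery: noisy hint around the target
    saliency = rng.random((GRID, GRID)) * 0.2
    saliency[target] += 0.5
    score = saliency + top_down                  # combine the two attention cues
    if rng.random() < eps:                       # occasional random fixation
        fix = (rng.integers(GRID), rng.integers(GRID))
    else:
        fix = np.unravel_index(np.argmax(score), score.shape)
    reward = 1.0 if fix == target else 0.0       # the reach succeeds on the fovea
    top_down[fix] += alpha * (reward - top_down[fix])

print("learned top-down peak:", np.unravel_index(np.argmax(top_down), top_down.shape))
```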

Mental imagery for a mobile robot that learns obstacle avoidance

Wilmer Gaona, Esaú Escobar, Jorge Hermosillo, Bruno Lara (2015), Anticipation by multi-modal association through an artificial mental imagery process, Connection Science, 27:1, 68-88, DOI: 10.1080/09540091.2014.95628

Mental imagery has become a central issue in research laboratories seeking to emulate basic cognitive abilities in artificial agents. In this work, we propose a computational model to produce anticipatory behaviour by means of a multi-modal off-line Hebbian association. Unlike the current state of the art, we propose to apply Hebbian learning during an internal sensorimotor simulation, emulating a process of mental imagery. We associate visual and tactile stimuli re-enacted by a long-term predictive simulation chain motivated by covert actions. As a result, we obtain a neural network which provides a robot with a mechanism to produce a visually conditioned obstacle avoidance behaviour. We implemented our system on a physical Pioneer 3-DX robot and carried out two experiments. In the first experiment we test our model on one individual navigating in two different mazes. In the second experiment we assess the robustness of the model by testing, in a single environment, five individuals trained under different conditions. We believe that our work offers an underpinning mechanism in cognitive robotics for the study of motor control strategies based on internal simulations. These strategies can be seen as analogous to the mental imagery process known in humans, thus opening interesting pathways to the construction of upper-level grounded cognitive abilities.
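
The mechanism, as I understand it, is ordinary Hebbian co-activation applied to sensory states that are only re-enacted by a forward model rather than actually experienced. The toy sketch below (my own abstraction: a 1-D distance state, a three-cell visual code and a single bumper cell) shows how covert rollouts alone can wire a vision-to-touch association that later supports anticipatory avoidance.

```python
"""Off-line Hebbian association over internally simulated (covert) rollouts."""
import numpy as np

rng = np.random.default_rng(4)

def forward_model(distance, action):
    """Toy predictor: covert forward motion reduces the distance to an obstacle."""
    return max(distance - action, 0.0)

def visual(distance):                  # coarse visual code: near / mid / far
    v = np.zeros(3)
    v[min(int(distance), 2)] = 1.0
    return v

def tactile(distance):                 # bumper cell fires only on contact
    return np.array([1.0 if distance <= 0.0 else 0.0])

W = np.zeros((1, 3))                   # tactile <- visual association weights
eta = 0.1

# mental imagery: roll the forward model out from imagined start states and
# apply Hebbian learning to the *predicted* (re-enacted) sensory activations
for _ in range(200):
    d = rng.uniform(0.0, 3.0)
    for _ in range(5):                 # covert actions, nothing is executed
        d = forward_model(d, action=0.5)
        W += eta * np.outer(tactile(d), visual(d))

# the association now anticipates contact from vision alone (only the "near"
# visual cell acquires a strong weight onto the bumper cell)
print("vision-to-touch weights (near, mid, far):", W.round(2))
```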

Active exploration strategy for RL in robots, and approximation of the value function by Gaussian processes

Jen Jen Chung, Nicholas R.J. Lawrance, Salah Sukkarieh (2015), Learning to soar: Resource-constrained exploration in reinforcement learning, The International Journal of Robotics Research vol. 34, pp. 158-172. DOI: 10.1177/0278364914553683

This paper examines temporal difference reinforcement learning with adaptive and directed exploration for resource-limited missions. The scenario considered is that of an unpowered aerial glider learning to perform energy-gaining flight trajectories in a thermal updraft. The presented algorithm, eGP-SARSA(λ), uses a Gaussian process regression model to estimate the value function in a reinforcement learning framework. The Gaussian process also provides a variance on these estimates that is used to measure the contribution of future observations to the Gaussian process value function model in terms of information gain. To avoid myopic exploration we developed a resource-weighted objective function that combines an estimate of the future information gain using an action rollout with the estimated value function to generate directed explorative action sequences. A number of modifications and computational speed-ups to the algorithm are presented along with a standard GP-SARSA(λ) implementation with ε-greedy exploration to compare the respective learning performances. The results show that under this objective function, the learning agent is able to continue exploring for better state-action trajectories when platform energy is high and follow conservative energy-gaining trajectories when platform energy is low.
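
A sketch of the resource-weighted action scoring as I read it: the Gaussian process supplies both a value estimate and a variance, and the variance (information) term is weighted by the remaining platform energy so that exploration fades as resources run low. The kernel, the weighting and the toy data below are my assumptions, not the published eGP-SARSA(λ) code.

```python
"""Toy resource-weighted trade-off between GP value and GP uncertainty."""
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(5)

# toy experience: state-action features and sampled returns
X = rng.uniform(-1, 1, size=(30, 2))
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1] + 0.05 * rng.normal(size=30)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-2).fit(X, y)

def pick_action(state, candidate_actions, energy, e_max=1.0):
    """Blend exploitation and exploration: explore more when energy is high."""
    sa = np.array([[state, a] for a in candidate_actions])
    value, std = gp.predict(sa, return_std=True)
    w = energy / e_max                       # resource weight in [0, 1]
    score = value + w * std                  # high energy -> reward information gain
    return candidate_actions[int(np.argmax(score))]

actions = np.linspace(-1, 1, 9)
print("high-energy choice:", pick_action(0.2, actions, energy=0.9))
print("low-energy choice :", pick_action(0.2, actions, energy=0.1))
```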

Solving the problem of the slow learning rate of reinforcement learning by acquiring the transition model from data

Deisenroth, M.P.; Fox, D.; Rasmussen, C.E., Gaussian Processes for Data-Efficient Learning in Robotics and Control, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 2, pp. 408-423, Feb. 2015. DOI: 10.1109/TPAMI.2013.218.

Autonomous learning has been a promising direction in control and robotics for more than a decade, since data-driven learning makes it possible to reduce the amount of engineering knowledge that is otherwise required. However, autonomous reinforcement learning (RL) approaches typically require many interactions with the system to learn controllers, which is a practical limitation in real systems, such as robots, where many interactions can be impractical and time consuming. To address this problem, current learning approaches typically require task-specific knowledge in the form of expert demonstrations, realistic simulators, pre-shaped policies, or specific knowledge about the underlying dynamics. In this paper, we follow a different approach and speed up learning by extracting more information from data. In particular, we learn a probabilistic, non-parametric Gaussian process transition model of the system. By explicitly incorporating model uncertainty into long-term planning and controller learning, our approach reduces the effects of model errors, a key problem in model-based learning. Compared to state-of-the-art RL, our model-based policy search method achieves an unprecedented speed of learning. We demonstrate its applicability to autonomous learning in real robot and control tasks.
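
A compressed sketch of the model-based idea, under loud assumptions: the paper incorporates model uncertainty into long-term planning analytically, whereas the toy below substitutes a crude Monte-Carlo rollout through a learned Gaussian process transition model and uses it to score a one-parameter linear controller. The toy system, the policy class and the sklearn GP are my own stand-ins.

```python
"""Toy model-based policy evaluation with a learned GP dynamics model."""
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(6)

def true_dynamics(s, a):                       # unknown to the learner
    return 0.9 * s + 0.2 * a + 0.02 * rng.normal()

# a few real interactions are enough to fit the probabilistic transition model
SA = rng.uniform(-1, 1, size=(25, 2))
S_next = np.array([true_dynamics(s, a) for s, a in SA])
model = GaussianProcessRegressor(kernel=RBF(1.0), alpha=1e-3).fit(SA, S_next)

def expected_cost(policy_gain, horizon=20, n_particles=30):
    """Roll the GP model forward, sampling from its predictive uncertainty."""
    s = np.full(n_particles, 1.0)
    cost = 0.0
    for _ in range(horizon):
        a = -policy_gain * s                   # simple linear state-feedback policy
        mean, std = model.predict(np.column_stack([s, a]), return_std=True)
        s = mean + std * rng.normal(size=n_particles)   # model uncertainty enters here
        cost += np.mean(s ** 2)
    return cost

gains = np.linspace(0.0, 1.0, 11)
best = gains[int(np.argmin([expected_cost(g) for g in gains]))]
print("best gain under the learned model:", best)
```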

A new variant of Q-learning that alleviates its slow learning speed (with a brief review of reinforcement learning algorithms)

J.C. van Rooijen, I. Grondman, R. Babuška, Learning rate free reinforcement learning for real-time motion control using a value-gradient based policy, Mechatronics, Volume 24, Issue 8, December 2014, Pages 966-974, ISSN 0957-4158. DOI: 10.1016/j.mechatronics.2014.05.007

Reinforcement learning (RL) is a framework that enables a controller to find an optimal control policy for a task in an unknown environment. Although RL has been successfully used to solve optimal control problems, learning is generally slow. The main causes are the inefficient use of information collected during interaction with the system and the inability to use prior knowledge on the system or the control task. In addition, the learning speed heavily depends on the learning rate parameter, which is difficult to tune.
In this paper, we present a sample-efficient, learning-rate-free version of the Value-Gradient Based Policy (VGBP) algorithm. The main difference between VGBP and other frequently used algorithms, such as Sarsa, is that in VGBP the learning agent has direct access to the reward function, rather than just the immediate reward values. Furthermore, the agent learns a process model. This enables the algorithm to select control actions by optimizing over the right-hand side of the Bellman equation. We demonstrate the fast learning convergence in simulations and experiments with the underactuated pendulum swing-up task. In addition, we present experimental results for a more complex 2-DOF robotic manipulator.
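
A toy rendering of the defining step as described in the abstract: because the reward function is known and a process model is learned, the control action is obtained by optimizing the right-hand side of the Bellman equation directly, a = argmax_a [r(x, a) + γ V(f(x, a))]. The grid-interpolated value function, the exact toy model and the candidate-action grid below are my own simplifications, not the VGBP implementation.

```python
"""Toy action selection by optimizing the Bellman right-hand side."""
import numpy as np

GAMMA = 0.95
xs = np.linspace(-1, 1, 41)          # state grid for the value function V
V = np.zeros_like(xs)
ACTIONS = np.linspace(-1, 1, 21)     # candidate actions to optimize over

def reward(x, a):                    # reward function, known to the agent
    return -(x ** 2) - 0.01 * a ** 2

def model(x, a):                     # learned process model (here an exact toy)
    return float(np.clip(0.95 * x + 0.1 * a, -1, 1))

def v_of(x):                         # linear interpolation of V on the grid
    return np.interp(x, xs, V)

def act(x):
    """Select the action that maximizes r(x, a) + gamma * V(f(x, a))."""
    q = [reward(x, a) + GAMMA * v_of(model(x, a)) for a in ACTIONS]
    return ACTIONS[int(np.argmax(q))]

# value-iteration-style sweeps that reuse the same Bellman right-hand side
for _ in range(100):
    for i, x in enumerate(xs):
        a = act(x)
        V[i] = reward(x, a) + GAMMA * v_of(model(x, a))

print(f"action at x=0.8: {act(0.8):+.2f}, next state: {model(0.8, act(0.8)):+.2f}")
```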