Semantically and syntactically bootstrapped learning for robots, inspired by similar processes in humans, which uses language as a scaffolding mechanism to improve learning in unknown situations

Worgotter, F.; Geib, C.; Tamosiunaite, M.; Aksoy, E.E.; Piater, J.; Hanchen Xiong; Ude, A.; Nemec, B.; Kraft, D.; Kruger, N.; Wachter, M.; Asfour, T., Structural Bootstrapping—A Novel, Generative Mechanism for Faster and More Efficient Acquisition of Action-Knowledge, IEEE Transactions on Autonomous Mental Development, vol. 7, no. 2, pp. 140-154, June 2015, DOI: 10.1109/TAMD.2015.2427233.

Humans, but also robots, learn to improve their behavior. Without existing knowledge, learning either needs to be explorative and, thus, slow or, to be more efficient, it needs to rely on supervision, which may not always be available. However, once some knowledge base exists, an agent can make use of it to improve learning efficiency and speed. This happens for our children at the age of around three, when they very quickly begin to assimilate new information by making guided guesses about how it fits their prior knowledge. This is a very efficient generative learning mechanism in the sense that the existing knowledge is generalized into as-yet unexplored, novel domains. So far, generative learning has not been employed for robots, and robot learning remains a slow and tedious process. The goal of the current study is to devise for the first time a general framework for a generative process that will improve learning and which can be applied at all different levels of the robot's cognitive architecture. To this end, we introduce the concept of structural bootstrapping (borrowed and modified from child language acquisition) to define a probabilistic process that uses existing knowledge together with new observations to supplement our robot's database with missing information about planning-, object-, and action-relevant entities. In a kitchen scenario, we use the example of making batter by pouring and mixing two components and show that the agent can efficiently acquire new knowledge about planning operators, objects, and the required motor pattern for stirring by structural bootstrapping. Benchmarks are also shown that demonstrate how structural bootstrapping improves performance.

Developmental approach for a robot manipulator that learns in several bootstrapped stages, strongly inspired by infant development

Ugur, E.; Nagai, Y.; Sahin, E.; Oztop, E., Staged Development of Robot Skills: Behavior Formation, Affordance Learning and Imitation with Motionese, IEEE Transactions on Autonomous Mental Development, vol. 7, no. 2, pp. 119-139, June 2015, DOI: 10.1109/TAMD.2015.2426192.

Inspired by infant development, we propose a three-stage developmental framework for an anthropomorphic robot manipulator. In the first stage, the robot is initialized with a basic reach-and-enclose-on-contact movement capability, and discovers a set of behavior primitives by exploring its movement parameter space. In the next stage, the robot exercises the discovered behaviors on different objects and learns the caused effects, effectively building a library of affordances and associated predictors. Finally, in the third stage, the learned structures and predictors are used to bootstrap complex imitation and action learning with the help of a cooperative tutor. The main contribution of this paper is the realization of an integrated developmental system where the structures emerging from the sensorimotor experience of an interacting real robot are used as the sole building blocks of the subsequent stages that generate increasingly more complex cognitive capabilities. The proposed framework shares a number of features with infant sensorimotor development. Furthermore, the findings obtained from the self-exploration and motionese-guided human-robot interaction experiments allow us to reason about the underlying mechanisms of simple-to-complex sensorimotor skill progression in human infants.

Finding the common utility of actions across several tasks learnt in the same domain, in order to reduce the cost of reinforcement learning

Rosman, B.; Ramamoorthy, S., Action Priors for Learning Domain Invariances, IEEE Transactions on Autonomous Mental Development, vol. 7, no. 2, pp. 107-118, June 2015, DOI: 10.1109/TAMD.2015.2419715.

An agent tasked with solving a number of different decision making problems in similar environments has an opportunity to learn over a longer timescale than each individual task. Through examining solutions to different tasks, it can uncover behavioral invariances in the domain, by identifying actions to be prioritized in local contexts, invariant to task details. This information greatly increases the speed of solving new problems. We formalise this notion as action priors, defined as distributions over the action space, conditioned on environment state, and show how these can be learnt from a set of value functions. We apply action priors in the setting of reinforcement learning, to bias action selection during exploration. Aggressive use of action priors performs context based pruning of the available actions, thus reducing the complexity of lookahead during search. We additionally define action priors over observation features, rather than states, which provides further flexibility and generalizability, with the additional benefit of enabling feature selection. Action priors are demonstrated in experiments in a simulated factory environment and a large random graph domain, and show significant speed-ups in learning new tasks. Furthermore, we argue that this mechanism is cognitively plausible, and is compatible with findings from cognitive psychology.
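A minimal sketch of the idea, assuming tabular Q-functions; the function names and the simple count-based scheme are illustrative stand-ins, not the paper's exact formulation:

```python
import numpy as np

def learn_action_priors(q_tables, alpha0=1.0):
    """Build state-conditioned action priors from solved tasks.

    q_tables: list of (n_states, n_actions) arrays, one optimal
    Q-function per previously solved task. Returns a row-stochastic
    (n_states, n_actions) matrix of action probabilities built from
    Dirichlet pseudo-counts over each task's greedy action.
    """
    n_states, n_actions = q_tables[0].shape
    counts = np.full((n_states, n_actions), alpha0)
    for q in q_tables:
        greedy = q.argmax(axis=1)          # action each task prefers per state
        counts[np.arange(n_states), greedy] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)

def explore_action(prior, state, rng):
    """Sample an exploratory action from the prior instead of uniformly."""
    return rng.choice(len(prior[state]), p=prior[state])

# Usage: inside epsilon-greedy, replace the uniform random branch with
# explore_action(prior, s, rng); actions with near-zero prior mass are
# then effectively pruned from exploration and lookahead.
```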

A brief general explanation of Rao-Blackwellization and a new way of applying it to reduce the variance of point estimates in a sequential Bayesian setting

Petetin, Y.; Desbouvries, F., Bayesian Conditional Monte Carlo Algorithms for Nonlinear Time-Series State Estimation, IEEE Transactions on Signal Processing, vol. 63, no. 14, pp. 3586-3598, DOI: 10.1109/TSP.2015.2423251.

Bayesian filtering aims at estimating sequentially a hidden process from an observed one. In particular, sequential Monte Carlo (SMC) techniques propagate weighted trajectories in time, which represent the posterior probability density function (pdf) of the hidden process given the available observations. On the other hand, conditional Monte Carlo (CMC) is a variance reduction technique which replaces the estimator of a moment of interest by its conditional expectation given another variable. In this paper, we show that up to some adaptations, one can make use of the time recursive nature of SMC algorithms in order to propose natural temporal CMC estimators of some point estimates of the hidden process, which outperform the associated crude Monte Carlo (MC) estimator whatever the number of samples. We next show that our Bayesian CMC estimators can be computed exactly, or approximated efficiently, in some hidden Markov chain (HMC) models; in some jump Markov state-space systems (JMSS); as well as in multitarget filtering. Finally, our algorithms are validated via simulations.
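The variance-reduction mechanism at the heart of the paper is classical conditioning (Rao-Blackwellization). Here is a minimal sketch of it outside the SMC setting, on an invented toy model where the conditional expectation is available in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000

# Toy model: Y ~ N(0,1), X | Y ~ N(Y, 1); estimate E[X^2] (true value 2).
y = rng.normal(size=N)
x = y + rng.normal(size=N)

crude = x**2          # crude MC estimator of E[X^2]
cmc = y**2 + 1.0      # conditional estimator: E[X^2 | Y] = Y^2 + 1

print("crude MC:", crude.mean(), "estimator var:", crude.var() / N)
print("cond. MC:", cmc.mean(), "estimator var:", cmc.var() / N)
# By the law of total variance, Var(E[X^2|Y]) <= Var(X^2), so the
# conditional estimator never does worse for the same sample budget —
# the "whatever the number of samples" property claimed in the abstract.
```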

Efficient sampling of the agent-world interaction in reinforcement learning through the use of simulators of varying fidelity to the real system

Cutler, M.; Walsh, T.J.; How, J.P., Real-World Reinforcement Learning via Multifidelity Simulators, IEEE Transactions on Robotics, vol. 31, no. 3, pp. 655-671, June 2015, DOI: 10.1109/TRO.2015.2419431.

Reinforcement learning (RL) can be a tool for designing policies and controllers for robotic systems. However, the cost of real-world samples remains prohibitive as many RL algorithms require a large number of samples before learning useful policies. Simulators are one way to decrease the number of required real-world samples, but imperfect models make deciding when and how to trust samples from a simulator difficult. We present a framework for efficient RL in a scenario where multiple simulators of a target task are available, each with varying levels of fidelity. The framework is designed to limit the number of samples used in each successively higher-fidelity/cost simulator by allowing a learning agent to choose to run trajectories at the lowest-level simulator that will still provide it with useful information. Theoretical proofs of the framework's sample complexity are given and empirical results are demonstrated on a remote-controlled car with multiple simulators. The approach enables RL algorithms to find near-optimal policies in a physical robot domain with fewer expensive real-world samples than previous transfer approaches or learning without simulators.
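A toy sketch of the control flow only (not the paper's algorithm, which uses principled "knownness" tests and carries sample-complexity bounds): spend cheap, biased simulator samples first, carry the value estimates upward, and let each costlier level correct the bias. The bandit stand-in and all numbers are invented:

```python
import numpy as np

rng = np.random.default_rng(1)

# A 5-armed bandit "real world" plus two cheaper, biased simulators of it,
# ordered lowest fidelity first: (reward means, cost per pull, pull budget).
true_means = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
levels = [
    (true_means + rng.normal(0, 0.3, 5), 1,   300),  # low fidelity: cheap, biased
    (true_means + rng.normal(0, 0.1, 5), 10,  100),  # mid fidelity
    (true_means,                         100, 30),   # real world: exact, costly
]

est = np.zeros(5)          # value estimates carried up across fidelity levels
total_cost = 0
for means, cost, budget in levels:
    n = np.zeros(5)        # fresh counts so this level can correct lower-level bias
    for _ in range(budget):
        a = int(np.argmax(est + 0.5 / np.sqrt(n + 1)))   # optimism bonus
        r = rng.normal(means[a], 0.05)
        n[a] += 1
        est[a] += (r - est[a]) / n[a]
        total_cost += cost
print("chosen arm:", int(np.argmax(est)), "total cost:", total_cost)
```

Most of the budget is spent at cost 1 rather than cost 100, which is the point: the expensive level only refines an estimate that is already roughly correct.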

Checking the behavior of robotic software (i.e., verification), and of embedded software in general, with a good related-work section on the issue

Lyons, D.M.; Arkin, R.C.; Shu Jiang; Tsung-Ming Liu; Nirmal, P., Performance Verification for Behavior-Based Robot Missions, IEEE Transactions on Robotics, vol. 31, no. 3, pp. 619-636, June 2015, DOI: 10.1109/TRO.2015.2418592.

Certain robot missions need to perform predictably in a physical environment that may have significant uncertainty. One approach is to leverage automatic software verification techniques to establish a performance guarantee. The addition of an environment model and uncertainty in both program and environment, however, means that the state space of a model-checking solution to the problem can be prohibitively large. An approach based on behavior-based controllers in a process-algebra framework that avoids state-space combinatorics is presented here. In this approach, verification of the robot program in the uncertain environment is reduced to a filtering problem for a Bayesian network. Validation results are presented for the verification of a multiple-waypoint and an autonomous exploration robot mission.
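A minimal sketch of the "verification as probability propagation" flavor, with an invented four-state mission model: instead of enumerating executions as explicit model checking would, push a belief over system states through the combined program/environment transition model and read off the success probability within the time budget:

```python
import numpy as np

# States and transition numbers are invented for illustration.
states = ["at_start", "en_route", "at_waypoint", "stuck"]
T = np.array([              # row = current state, column = next state
    [0.0, 0.9, 0.0, 0.1],
    [0.0, 0.6, 0.3, 0.1],
    [0.0, 0.0, 1.0, 0.0],   # waypoint reached: absorbing success
    [0.0, 0.0, 0.0, 1.0],   # stuck: absorbing failure
])
belief = np.array([1.0, 0.0, 0.0, 0.0])
for _ in range(20):         # mission time budget of 20 steps
    belief = belief @ T     # one filtering step, O(|states|^2)
print(f"P({states[2]} within 20 steps): {belief[2]:.3f}")
```

The cost per step is polynomial in the number of states rather than exponential in the number of execution paths, which is the combinatorics the abstract says the approach avoids.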

An example of applying Bayesian network learning and inference to robotics, with a brief but useful related-work section on learning by imitation

Dan Song; Ek, C.H.; Huebner, K.; Kragic, D., Task-Based Robot Grasp Planning Using Probabilistic Inference, IEEE Transactions on Robotics, vol. 31, no. 3, pp. 546-561, June 2015, DOI: 10.1109/TRO.2015.2409912.

Grasping and manipulating everyday objects in a goal-directed manner is an important ability of a service robot. The robot needs to reason about task requirements and ground these in the sensorimotor information. Grasping and interaction with objects are challenging in real-world scenarios, where sensorimotor uncertainty is prevalent. This paper presents a probabilistic framework for the representation and modeling of robot-grasping tasks. The framework consists of Gaussian mixture models for generic data discretization, and discrete Bayesian networks for encoding the probabilistic relations among various task-relevant variables, including object and action features as well as task constraints. We evaluate the framework using a grasp database generated in a simulated environment including a human and two robot hand models. The generative modeling approach allows the prediction of grasping tasks given uncertain sensory data, as well as object and grasp selection in a task-oriented manner. Furthermore, the graphical model framework provides insights into dependencies between variables and features relevant for object grasping.
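A minimal sketch of the two modeling ingredients (GMM discretization feeding a discrete conditional probability table); the variable names and synthetic data are invented, and the paper's network relates many more task, object, and action variables:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-ins: one continuous grasp feature (e.g., an approach
# angle) and a binary task label (0 = hand-over, 1 = pouring).
task = rng.integers(0, 2, size=500)
feature = rng.normal(loc=np.where(task == 0, -1.0, 1.0), scale=0.7)[:, None]

# Step 1: GMM discretization of the continuous feature into symbols.
gmm = GaussianMixture(n_components=4, random_state=0).fit(feature)
symbol = gmm.predict(feature)

# Step 2: discrete CPT P(symbol | task) by counting, with Laplace
# smoothing — a one-edge "Bayesian network".
cpt = np.ones((2, 4))
for t, s in zip(task, symbol):
    cpt[t, s] += 1
cpt /= cpt.sum(axis=1, keepdims=True)

# Step 3: generative inference, P(task | symbol) ∝ P(symbol | task) P(task).
new_symbol = gmm.predict([[0.9]])[0]
prior = np.bincount(task, minlength=2) / len(task)
post = cpt[:, new_symbol] * prior
print("P(task | feature):", post / post.sum())
```

The same table can be read in the other direction (condition on the task, pick the feature symbol with highest probability), which is the task-oriented grasp selection the abstract describes.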

The problem of monitoring events that can only be predicted stochastically, applied to mobile sensors

Jingjin Yu; Karaman, S.; Rus, D., Persistent Monitoring of Events With Stochastic Arrivals at Multiple Stations, IEEE Transactions on Robotics, vol. 31, no. 3, pp. 521-535, June 2015, DOI: 10.1109/TRO.2015.2409453.

This paper introduces a new mobile sensor scheduling problem involving a single robot tasked to monitor several events of interest that are occurring at different locations (stations). Of particular interest is the monitoring of transient events of a stochastic nature, with applications ranging from natural phenomena (e.g., monitoring abnormal seismic activity around a volcano using a ground robot) to urban activities (e.g., monitoring early formations of traffic congestion using an aerial robot). Motivated by examples like these, this paper focuses on problems in which the precise occurrence times of the events are unknown a priori, but statistics for their interarrival times are available. In monitoring such events, the robot seeks to: (1) maximize the number of events observed and (2) minimize the delay between two consecutive observations of events occurring at the same location. This paper considers the case when a robot is tasked with optimizing the event observations in a balanced manner, following a cyclic patrolling route. To tackle this problem, first, assuming that the cyclic ordering of stations is known, we prove the existence and uniqueness of the optimal solution and show that the solution has desirable convergence rate and robustness. Our constructive proof also yields an efficient algorithm for computing the unique optimal solution with O(n) time complexity, in which n is the number of stations, with O(log n) time complexity for incrementally adding or removing stations. Except for the algorithm, our analysis remains valid when the cyclic order is unknown. We then provide a polynomial-time approximation scheme that computes, for any ε > 0, a (1 + ε)-optimal solution for this more general, NP-hard problem.
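A toy simulation of the quantity being optimized (not the paper's O(n) algorithm): for Poisson arrivals and a fixed cyclic route, the long-run fraction of events observed at a station is its dwell time divided by the cycle length, which the sketch below confirms empirically. The rates, travel times, and dwell times are invented:

```python
import numpy as np

rng = np.random.default_rng(0)

# A robot patrols 3 stations in a fixed cyclic order, pausing dwell[i] at
# station i; transient events arrive at station i as a Poisson process
# with rate rates[i] and are observed only while the robot is present.
rates = np.array([0.5, 2.0, 1.0])
travel = 1.0                          # travel time between consecutive stations
dwell = np.array([1.0, 4.0, 2.0])
horizon = 10_000.0

cycle = dwell.sum() + 3 * travel
starts = np.concatenate(([0.0], np.cumsum(dwell[:-1] + travel)))  # visit offsets

for i in range(3):
    gaps = rng.exponential(1.0 / rates[i], size=int(2 * rates[i] * horizon))
    events = np.cumsum(gaps)
    events = events[events < horizon]
    phase = (events - starts[i]) % cycle     # event time within the patrol cycle
    observed = (phase < dwell[i]).mean()     # robot is dwelling at station i then
    print(f"station {i}: observed {observed:.3f}, predicted {dwell[i] / cycle:.3f}")
```

Choosing the dwell times (and the cyclic order) to balance these fractions against the inter-observation delays is exactly the optimization the paper solves.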

A brief but nice related-work section about structured prediction (MRFs, CRFs, etc.)

Bratieres, S.; Quadrianto, N.; Ghahramani, Z., GPstruct: Bayesian Structured Prediction Using Gaussian Processes, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 7, pp. 1514-1520, July 2015, DOI: 10.1109/TPAMI.2014.2366151.

We introduce a conceptually novel structured prediction model, GPstruct, which is kernelized, non-parametric and Bayesian, by design. We motivate the model with respect to existing approaches, among others, conditional random fields (CRFs), maximum margin Markov networks (M^3N), and structured support vector machines (SVMstruct), which embody only a subset of its properties. We present an inference procedure based on Markov Chain Monte Carlo. The framework can be instantiated for a wide range of structured objects such as linear chains, trees, grids, and other general graphs. As a proof of concept, the model is benchmarked on several natural language processing tasks and a video gesture segmentation task involving a linear chain structure. We show prediction accuracies for GPstruct which are comparable to or exceeding those of CRFs and SVMstruct.

Accelerating the update stage of a particle filter (PF) by selecting a few representative particles and interpolating their weights to the rest, with interesting methods for selection and interpolation, and a nice related-work section on efficiency-improved PFs

Shabat, G.; Shmueli, Y.; Bermanis, A.; Averbuch, A., Accelerating Particle Filter Using Randomized Multiscale and Fast Multipole Type Methods, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 7, pp. 1396-1407, July 2015, DOI: 10.1109/TPAMI.2015.2392754.

The particle filter is a powerful tool for state tracking using non-linear observations. We present a multiscale-based method that accelerates the tracking computation by particle filters. Unlike the conventional way, which calculates weights over all particles in each cycle of the algorithm, we sample a small subset from the source particles using matrix decomposition methods. Then, we apply a function extension algorithm that uses the particle subset to recover the density function for all the remaining particles not included in the chosen subset. The computational effort is substantial, especially when multiple objects are tracked concurrently; the proposed algorithm significantly reduces this load. By using the Fast Gaussian Transform, the complexity of the particle selection step is reduced to linear time in n and k, where n is the number of particles and k is the number of particles in the selected subset. We demonstrate our method on both simulated and real data, such as object tracking in video sequences.
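A naive sketch of the weight-extension step on an invented 1-D model; random subset selection and a dense O(nk) kernel extension stand in for the paper's matrix-decomposition-based selection and Fast Gaussian Transform:

```python
import numpy as np

rng = np.random.default_rng(0)

def extend_weights(particles, idx, logw_subset, bandwidth=0.5):
    """Gaussian-kernel extension of weights computed on a particle subset.

    The (expensive) likelihood is evaluated only at particles[idx];
    weights for all other particles are recovered by normalized
    kernel interpolation from the subset.
    """
    d2 = (particles[:, None] - particles[idx][None, :]) ** 2
    K = np.exp(-d2 / (2 * bandwidth**2))
    w = K @ np.exp(logw_subset) / K.sum(axis=1)
    return w / w.sum()

# One PF update on a toy 1-D model: x' = 0.9 x + noise, y = x + noise.
n, k = 5_000, 100
particles = 0.9 * rng.normal(size=n) + 0.3 * rng.normal(size=n)
y = 1.2                                        # current observation
idx = rng.choice(n, size=k, replace=False)     # cheap stand-in for subset selection
logw_subset = -0.5 * (y - particles[idx]) ** 2 # likelihood on the subset only
weights = extend_weights(particles, idx, logw_subset)
resampled = particles[rng.choice(n, size=n, p=weights)]
```

The saving comes from evaluating the observation model k times instead of n; the kernel extension is what the paper then accelerates from O(nk) to linear time.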