Efficient sampling of the agent-world interaction in reinforcement learning by using simulators of varying fidelity to the real system

Cutler, M.; Walsh, T.J.; How, J.P., Real-World Reinforcement Learning via Multifidelity Simulators, Robotics, IEEE Transactions on , vol.31, no.3, pp.655,671, June 2015, DOI: 10.1109/TRO.2015.2419431.

Reinforcement learning (RL) can be a tool for designing policies and controllers for robotic systems. However, the cost of real-world samples remains prohibitive as many RL algorithms require a large number of samples before learning useful policies. Simulators are one way to decrease the number of required real-world samples, but imperfect models make deciding when and how to trust samples from a simulator difficult. We present a framework for efficient RL in a scenario where multiple simulators of a target task are available, each with varying levels of fidelity. The framework is designed to limit the number of samples used in each successively higher-fidelity/cost simulator by allowing a learning agent to choose to run trajectories at the lowest level simulator that will still provide it with useful information. Theoretical proofs of the framework’s sample complexity are given and empirical results are demonstrated on a remote-controlled car with multiple simulators. The approach enables RL algorithms to find near-optimal policies in a physical robot domain with fewer expensive real-world samples than previous transfer approaches or learning without simulators.

Checking the behavior of robotic software (i.e., verification), and of embedded software in general, with a good related-work section on the issue

Lyons, D.M.; Arkin, R.C.; Shu Jiang; Tsung-Ming Liu; Nirmal, P., Performance Verification for Behavior-Based Robot Missions, Robotics, IEEE Transactions on , vol.31, no.3, pp.619,636, June 2015, DOI: 10.1109/TRO.2015.2418592.

Certain robot missions need to perform predictably in a physical environment that may have significant uncertainty. One approach is to leverage automatic software verification techniques to establish a performance guarantee. The addition of an environment model and uncertainty in both program and environment, however, means that the state space of a model-checking solution to the problem can be prohibitively large. An approach based on behavior-based controllers in a process-algebra framework that avoids state-space combinatorics is presented here. In this approach, verification of the robot program in the uncertain environment is reduced to a filtering problem for a Bayesian network. Validation results are presented for the verification of a multiple-waypoint and an autonomous exploration robot mission.
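The filtering the abstract refers to can be illustrated with a generic discrete Bayes filter step. This is only a minimal sketch of filtering in general, not the paper's process-algebra construction; the function name, the two-state example and all the numbers are assumptions made for illustration.

```python
def bayes_filter_step(belief, transition, likelihood):
    """One predict/update cycle of a discrete Bayes filter.

    belief[i]        : prior probability of state i
    transition[i][j] : probability of moving from state i to state j
    likelihood[j]    : probability of the current observation in state j
    """
    n = len(belief)
    # Predict: push the prior through the transition model.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    # Update: weight each state by how well it explains the observation.
    unnormalized = [predicted[j] * likelihood[j] for j in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnormalized]


# Hypothetical two-state example (e.g., "waypoint reached" vs "not reached").
posterior = bayes_filter_step(
    belief=[0.5, 0.5],
    transition=[[0.9, 0.1], [0.2, 0.8]],
    likelihood=[0.7, 0.1],
)
```

Repeating this step over time gives the probability of each state at every instant, which is the kind of quantity a performance guarantee can be checked against.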

Example of the application of Bayesian network learning and inference to robotics, with a brief but useful related-work section on learning by imitation

Dan Song; Ek, C.H.; Huebner, K.; Kragic, D., Task-Based Robot Grasp Planning Using Probabilistic Inference, Robotics, IEEE Transactions on , vol.31, no.3, pp.546,561, June 2015, DOI: 10.1109/TRO.2015.2409912.

Grasping and manipulating everyday objects in a goal-directed manner is an important ability of a service robot. The robot needs to reason about task requirements and ground these in the sensorimotor information. Grasping and interaction with objects are challenging in real-world scenarios, where sensorimotor uncertainty is prevalent. This paper presents a probabilistic framework for the representation and modeling of robot-grasping tasks. The framework consists of Gaussian mixture models for generic data discretization, and discrete Bayesian networks for encoding the probabilistic relations among various task-relevant variables, including object and action features as well as task constraints. We evaluate the framework using a grasp database generated in a simulated environment including a human and two robot hand models. The generative modeling approach allows the prediction of grasping tasks given uncertain sensory data, as well as object and grasp selection in a task-oriented manner. Furthermore, the graphical model framework provides insights into dependencies between variables and features relevant for object grasping.

The problem of monitoring events that can only be predicted stochastically, applied to mobile sensors for monitoring

Jingjin Yu; Karaman, S.; Rus, D., Persistent Monitoring of Events With Stochastic Arrivals at Multiple Stations, Robotics, IEEE Transactions on , vol.31, no.3, pp.521,535, June 2015, DOI: 10.1109/TRO.2015.2409453.

This paper introduces a new mobile sensor scheduling problem involving a single robot tasked to monitor several events of interest that are occurring at different locations (stations). Of particular interest is the monitoring of transient events of a stochastic nature, with applications ranging from natural phenomena (e.g., monitoring abnormal seismic activity around a volcano using a ground robot) to urban activities (e.g., monitoring early formations of traffic congestion using an aerial robot). Motivated by examples like these, this paper focuses on problems in which the precise occurrence times of the events are unknown a priori, but statistics for their interarrival times are available. In monitoring such events, the robot seeks to: (1) maximize the number of events observed and (2) minimize the delay between two consecutive observations of events occurring at the same location. This paper considers the case when a robot is tasked with optimizing the event observations in a balanced manner, following a cyclic patrolling route. To tackle this problem, first, assuming that the cyclic ordering of stations is known, we prove the existence and uniqueness of the optimal solution and show that the solution has desirable convergence rate and robustness. Our constructive proof also yields an efficient algorithm for computing the unique optimal solution with O(n) time complexity, in which n is the number of stations, with O(log n) time complexity for incrementally adding or removing stations. Except for the algorithm, our analysis remains valid when the cyclic order is unknown. We then provide a polynomial-time approximation scheme that computes for any ε > 0 a (1 + ε)-optimal solution for this more general, NP-hard problem.

Brief but nice related-work section on structured prediction (MRFs, CRFs, etc.)

Bratieres, S.; Quadrianto, N.; Ghahramani, Z., GPstruct: Bayesian Structured Prediction Using Gaussian Processes, Pattern Analysis and Machine Intelligence, IEEE Transactions on , vol.37, no.7, pp.1514,1520, July 1 2015, DOI: 10.1109/TPAMI.2014.2366151.

We introduce a conceptually novel structured prediction model, GPstruct, which is kernelized, non-parametric and Bayesian, by design. We motivate the model with respect to existing approaches, among others, conditional random fields (CRFs), maximum margin Markov networks (M^3N), and structured support vector machines (SVMstruct), which embody only a subset of its properties. We present an inference procedure based on Markov Chain Monte Carlo. The framework can be instantiated for a wide range of structured objects such as linear chains, trees, grids, and other general graphs. As a proof of concept, the model is benchmarked on several natural language processing tasks and a video gesture segmentation task involving a linear chain structure. We show prediction accuracies for GPstruct which are comparable to or exceeding those of CRFs and SVMstruct.

Accelerating the update stage of a particle filter (PF) by selecting a few representative particles and interpolating their weights to the rest, with interesting methods for selection and interpolation and a nice related-work section on efficiency-improved PFs

Shabat, G.; Shmueli, Y.; Bermanis, A.; Averbuch, A., Accelerating Particle Filter Using Randomized Multiscale and Fast Multipole Type Methods, Pattern Analysis and Machine Intelligence, IEEE Transactions on , vol.37, no.7, pp.1396,1407, July 1 2015, DOI: 10.1109/TPAMI.2015.2392754.

Particle filter is a powerful tool for state tracking using non-linear observations. We present a multiscale based method that accelerates the tracking computation by particle filters. Unlike the conventional way, which calculates weights over all particles in each cycle of the algorithm, we sample a small subset from the source particles using matrix decomposition methods. Then, we apply a function extension algorithm that uses a particle subset to recover the density function for all the rest of the particles not included in the chosen subset. The computational effort is substantial especially when multiple objects are tracked concurrently. The proposed algorithm significantly reduces the computational load. By using the Fast Gaussian Transform, the complexity of the particle selection step is reduced to a linear time in n and k, where n is the number of particles and k is the number of particles in the selected subset. We demonstrate our method on both simulated and on real data such as object tracking in video sequences.
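For reference, the conventional cycle that the abstract contrasts itself with (weights computed over all n particles) can be sketched as a plain bootstrap particle filter. This is the baseline only, not the accelerated multiscale method of the paper; the 1-D state, the random-walk motion model, the Gaussian observation model and all parameter values are assumptions made for illustration.

```python
import math
import random


def particle_filter_step(particles, observation,
                         motion_noise=0.5, obs_noise=1.0, rng=random):
    """One predict/update/resample cycle of a 1-D bootstrap particle filter."""
    # Predict: random-walk motion model (assumed for this sketch).
    predicted = [x + rng.gauss(0.0, motion_noise) for x in particles]
    # Update: Gaussian likelihood evaluated for every particle. This
    # per-particle loop is the cost the paper aims to reduce.
    weights = [math.exp(-0.5 * ((observation - x) / obs_noise) ** 2)
               for x in predicted]
    total = sum(weights)
    if total == 0.0:
        # All likelihoods underflowed: fall back to uniform weights.
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]
    # Resample: multinomial resampling proportional to the weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


# Hypothetical usage: track a constant observation of 3.0.
rng = random.Random(0)
particles = [rng.uniform(-10.0, 10.0) for _ in range(500)]
for _ in range(10):
    particles = particle_filter_step(particles, 3.0, rng=rng)
estimate = sum(particles) / len(particles)
```

In the accelerated scheme described above, the likelihood would be evaluated only for a selected subset of k particles and then extended to the remaining ones, instead of looping over all of them.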

Deducing the concept of space from the sensorimotor behaviour of a robot, with an interesting related-work section on uninterpreted sensors and actuators in developmental robotics that deserves a deeper look

Alban Laflaquière, J. Kevin O’Regan, Sylvain Argentieri, Bruno Gas, Alexander V. Terekhov, Learning agent’s spatial configuration from sensorimotor invariants, Robotics and Autonomous Systems, Volume 71, September 2015, Pages 49-59, ISSN 0921-8890, DOI: 10.1016/j.robot.2015.01.003.

The design of robotic systems is largely dictated by our purely human intuition about how we perceive the world. This intuition has been proven incorrect with regard to a number of critical issues, such as visual change blindness. In order to develop truly autonomous robots, we must step away from this intuition and let robotic agents develop their own way of perceiving. The robot should start from scratch and gradually develop perceptual notions, under no prior assumptions, exclusively by looking into its sensorimotor experience and identifying repetitive patterns and invariants. One of the most fundamental perceptual notions, space, cannot be an exception to this requirement. In this paper we look into the prerequisites for the emergence of simplified spatial notions on the basis of a robot’s sensorimotor flow. We show that the notion of space as environment-independent cannot be deduced solely from exteroceptive information, which is highly variable and is mainly determined by the contents of the environment. The environment-independent definition of space can be approached by looking into the functions that link the motor commands to changes in exteroceptive inputs. In a sufficiently rich environment, the kernels of these functions correspond uniquely to the spatial configuration of the agent’s exteroceptors. We simulate a redundant robotic arm with a retina installed at its end-point and show how this agent can learn the configuration space of its retina. The resulting manifold has the topology of the Cartesian product of a plane and a circle, and corresponds to the planar position and orientation of the retina.

A new approach to solve POMDP-like problems through gradient descent and optimal control

Vadim Indelman, Luca Carlone, Frank Dellaert, Planning in the continuous domain: A generalized belief space approach for autonomous navigation in unknown environments, The International Journal of Robotics Research, vol. 34 no. 7, pp. 849-882, DOI: 10.1177/0278364914561102.

We investigate the problem of planning under uncertainty, with application to mobile robotics. We propose a probabilistic framework in which the robot bases its decisions on the generalized belief, which is a probabilistic description of its own state and of external variables of interest. The approach naturally leads to a dual-layer architecture: an inner estimation layer, which performs inference to predict the outcome of possible decisions; and an outer decisional layer which is in charge of deciding the best action to undertake. Decision making is entrusted to a model predictive control (MPC) scheme. The formulation is valid for general cost functions and does not discretize the state or control space, enabling planning in continuous domain. Moreover, it allows to relax the assumption of maximum likelihood observations: predicted measurements are treated as random variables, and binary random variables are used to model the event that a measurement is actually taken by the robot. We successfully apply our approach to the problem of uncertainty-constrained exploration, in which the robot has to perform tasks in an unknown environment, while maintaining localization uncertainty within given bounds. We present an extensive numerical analysis of the proposed approach and compare it against related work. In practice, our planning approach produces smooth and natural trajectories and is able to impose soft upper bounds on the uncertainty. Finally, we exploit the results of this analysis to identify current limitations and show that the proposed framework can accommodate several desirable extensions.

Neural support for the cognitive map: place cells and grid cells

Kate J. Jeffery, Distorting the metric fabric of the cognitive map, Trends in Cognitive Sciences, Volume 19, Issue 6, June 2015, Pages 300-301, ISSN 1364-6613, DOI: 10.1016/j.tics.2015.04.001.

Grid cells are neurons whose regularly spaced firing fields form apparently symmetric arrays, or grids, that are thought to collectively provide an environment-independent metric framework for the brain’s cognitive map of space. However, two recent studies show that grids are naturally distorted, revealing greater local environment-specific effects than previously recognized.

A quick, formal explanation of the PageRank algorithm and its existing variants

Lei, J.; Chen, H., Distributed Randomized PageRank Algorithm Based on Stochastic Approximation, Automatic Control, IEEE Transactions on , vol.60, no.6, pp.1641,1646, June 2015. DOI: 10.1109/TAC.2014.2359311.

A distributed randomized PageRank algorithm based on stochastic approximation (SA) is proposed to estimate the importance scores of web pages. Compared with the existing methods, the algorithm given here has wider applications in the sense that it can deal with a larger class of randomizations. The strong consistency of the estimates is proved, and the robustness of the PageRank value is analyzed as well. Numerical examples are given to verify the obtained theoretic results.
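As background for the importance scores mentioned in the abstract, the classic centralized PageRank can be computed by power iteration. This is a minimal sketch of the textbook algorithm, not the distributed randomized stochastic-approximation scheme of the paper; the damping factor of 0.85, the function name and the three-page graph are assumptions made for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration on a dict mapping each page to the pages it links to.

    Every link target is assumed to appear as a key of `links`.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by dangling pages (no out-links) is spread uniformly.
        dangling = sum(rank[p] for p in pages if not links[p])
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n
                    for p in pages}
        # Each page shares its rank equally among the pages it links to.
        for p in pages:
            for q in links[p]:
                new_rank[q] += damping * rank[p] / len(links[p])
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical three-page web: a links to b and c, b links to c, c links to a.
scores = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```

In this toy graph page c receives links from both a and b, so it ends up with the highest score, followed by a and then b.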