Author Archives: Juan-antonio Fernández-madrigal

On how value of actions (in the RL sense) can be coded in the brain

Rory J. Bufacchi, Gian Domenico Iannetti, The Value of Actions, in Time and Space, Trends in Cognitive Sciences, Volume 23, Issue 4, 2019, Pages 270-271, DOI: 10.1016/j.tics.2019.01.011.

This value-output function can be a neural network, in which case the assumptions about the future are stored in the precise network configuration. The values that such a network outputs, or at least the intermediate steps necessary for calculating the final values, are the ‘action relevances’ we mention in our original paper (in the case of the brain, the inputs to such a value-calculating network should be state estimators, which likely include activity coming from the ventral stream, frontal areas, and limbic regions [3]). Our claim was thus that PPS-related measures reflect the instantaneous value of particular types of actions, and not that PPS measures explicitly reflect the value of any possible action at any given time (i.e., for any possible state): PPS measures reflect the instantaneous output of a function rather than the infinite array of values that the output of this function could take. We might have contributed to this misunderstanding when claiming that a field is ‘a quantity that has a magnitude for each point in space and time’. We should have clarified that the magnitude of a PPS measure can be seen as a specific sample from a field in the here and now rather than as a database containing all possible field values.

Improving Q-learning by initialization of the Q matrix and a nice related work of that approach

Ee Soong Low, Pauline Ong, Kah Chun Cheah, Solving the optimal path planning of a mobile robot using improved Q-learning, Robotics and Autonomous Systems, Volume 115, 2019, Pages 143-161, DOI: 10.1016/j.robot.2019.02.013.

Q-learning, a type of reinforcement learning, has gained increasing popularity in autonomous mobile robot path planning recently, due to its self-learning ability without requiring a priori model of the environment. Yet, despite such advantage, Q-learning exhibits slow convergence to the optimal solution. In order to address this limitation, the concept of partially guided Q-learning is introduced wherein, the flower pollination algorithm (FPA) is utilized to improve the initialization of Q-learning. Experimental evaluation of the proposed improved Q-learning under the challenging environment with a different layout of obstacles shows that the convergence of Q-learning can be accelerated when Q-values are initialized appropriately using the FPA. Additionally, the effectiveness of the proposed algorithm is validated in a real-world experiment using a three-wheeled mobile robot.

A MATLAB toolbox for controlling and programming KUKA robots and a list of robotics toolboxes

M. Safeea and P. Neto, KUKA Sunrise Toolbox: Interfacing Collaborative Robots With MATLAB, IEEE Robotics & Automation Magazine, vol. 26, no. 1, pp. 91-96, 2019 DOI: 10.1109/MRA.2018.2877776.

Collaborative robots are increasingly present in our lives. The KUKA LBR iiwa, equipped with the KUKA Sunrise.OS controller, is one example of a collaborative/sensitive robot. This tutorial presents the KUKA Sunrise Toolbox (KST), a MATLAB toolbox that interfaces with KUKA Sunrise.OS. KST contains functionalities for networking, soft control in real time, point-to-point motion, parameter setters/getters, general purpose, and physical interaction. It includes approximately 100 functions and runs on a remote computer connected with the KUKA Sunrise controller via Transmission Control Protocol/Internet Protocol (TCP/IP). The potentialities of the KST are demonstrated in nine application examples.

A new pose-graph optimization algorithm for SLAM and other problems whose, through a formulation as global optimization in SE(3), results are certifiable and more robust than standard approaches, and a curious relation between this problem and the clock synchronization problem

Rosen, D. M., Carlone, L., Bandeira, A. S., & Leonard, J. J., SE-Sync: A certifiably correct algorithm for synchronization over the special Euclidean group, The International Journal of Robotics Research, 38(2–3), 95–125, 2019 DOI: 10.1177/0278364918784361.

Many important geometric estimation problems naturally take the form of synchronization over the special Euclidean group: estimate the values of a set of unknown group elements x1,…,xn∈SE(d) given noisy measurements of a subset of their pairwise relative transforms x−1ixj. Examples of this class include the foundational problems of pose-graph simultaneous localization and mapping (SLAM) (in robotics), camera motion estimation (in computer vision), and sensor network localization (in distributed sensing), among others. This inference problem is typically formulated as a non-convex maximum-likelihood estimation that is computationally hard to solve in general. Nevertheless, in this paper we present an algorithm that is able to efficiently recover certifiably globally optimal solutions of the special Euclidean synchronization problem in a non-adversarial noise regime. The crux of our approach is the development of a semidefinite relaxation of the maximum-likelihood estimation (MLE) whose minimizer provides an exact maximum-likelihood estimate so long as the magnitude of the noise corrupting the available measurements falls below a certain critical threshold; furthermore, whenever exactness obtains, it is possible to verify this fact a posteriori, thereby certifying the optimality of the recovered estimate. We develop a specialized optimization scheme for solving large-scale instances of this semidefinite relaxation by exploiting its low-rank, geometric, and graph-theoretic structure to reduce it to an equivalent optimization problem defined on a low-dimensional Riemannian manifold, and then design a Riemannian truncated-Newton trust-region method to solve this reduction efficiently. Finally, we combine this fast optimization approach with a simple rounding procedure to produce our algorithm, SE-Sync. Experimental evaluation on a variety of simulated and real-world pose-graph SLAM datasets shows that SE-Sync is capable of recovering certifiably globally optimal solutions when the available measurements are corrupted by noise up to an order of magnitude greater than that typically encountered in robotics and computer vision applications, and does so significantly faster than the Gauss–Newton-based approach that forms the basis of current state-of-the-art techniques.

Improving on-line Monte Carlo POMDP (DESTOP in particular) in discrete spaces through the use of importance sampling, and a nice summary of the problem and of current on-line POMDP approaches

Luo, Y., Bai, H., Hsu, D., & Lee, W. S., Importance sampling for online planning under uncertainty, The International Journal of Robotics Research, 38(2–3), 162–181, 2019 DOI: 10.1177/0278364918780322.

The partially observable Markov decision process (POMDP) provides a principled general framework for robot planning under uncertainty. Leveraging the idea of Monte Carlo sampling, recent POMDP planning algorithms have scaled up to various challenging robotic tasks, including, real-time online planning for autonomous vehicles. To further improve online planning performance, this paper presents IS-DESPOT, which introduces importance sampling to DESPOT, a state-of-the-art sampling-based POMDP algorithm for planning under uncertainty. Importance sampling improves DESPOT’s performance when there are critical, but rare events, which are difficult to sample. We prove that IS-DESPOT retains the theoretical guarantee of DESPOT. We demonstrate empirically that importance sampling significantly improves the performance of online POMDP planning for suitable tasks. We also present a general method for learning the importance sampling distribution.

On the use of ubiquous supercomputing for robotics

Leonardo Camargo-Forero, Pablo Royo, Xavier Prats, The ARCHADE: Ubiquitous Supercomputing for robotics. Part I: Philosophy, Robotics and Autonomous Systems, Volume 114, 2019, Pages 187-198 DOI: 10.1016/j.robot.2019.01.006.

In this work, we introduce Ubiquitous Supercomputing for robotics with the objective of opening our imagination to the development of new powerful heterogeneous multi-robot systems able to perform all kind of missions. Supercomputing, also known as High Performance computing (HPC) is the tool that allows us to predict the weather, understand the origins of the universe, create incredibly realistic fantasy movies, send personalized advertisement to millions of users worldwide and much more. Robotics has been mostly absent in its use of HPC but some previous works have lightly flirted with it. With the findings presented in here, we propose a ubiquitous supercomputing ontology, which allows describing systems made up of robots, traditional HPC infrastructures, sensors, actuators and people and exhibiting scalability, user-transparency and ultimately higher computing efficiency. Moreover, we present a technology called The ARCHADE, which facilitates the development, implementation and operation of such systems, and we propose a mechanism to define and automatize missions carried out by ubiquitous supercomputing systems. As a proof of concept, we present a system depicted as Tigers VS Hunters, which illustrates the potential of this technology. The results presented in here are part of a two series work introducing The ARCHADE. This first delivery presents its philosophy and main features. Correspondingly the second part will present a set of use cases and a complete performance benchmark. Supercomputing is part of our lives and it can be found in many research and industrial endeavors. With the ubiquitous supercomputing ontology and The ARCHADE, supercomputing will become part of robotics as well, bringing it therefore everywhere.

RL and Inverse RL based on MDPs for autonomous vehicles, and a nice historical review of the topic of a.v.

Changxi You, Jianbo Lu, Dimitar Filev, Panagiotis Tsiotras, Advanced planning for autonomous vehicles using reinforcement learning and deep inverse reinforcement learning, Robotics and Autonomous Systems, Volume 114, 2019, Pages 1-18 DOI: 10.1016/j.robot.2019.01.003.

Autonomous vehicles promise to improve traffic safety while, at the same time, increase fuel efficiency and reduce congestion. They represent the main trend in future intelligent transportation systems. This paper concentrates on the planning problem of autonomous vehicles in traffic. We model the interaction between the autonomous vehicle and the environment as a stochastic Markov decision process (MDP) and consider the driving style of an expert driver as the target to be learned. The road geometry is taken into consideration in the MDP model in order to incorporate more diverse driving styles. The desired, expert-like driving behavior of the autonomous vehicle is obtained as follows: First, we design the reward function of the corresponding MDP and determine the optimal driving strategy for the autonomous vehicle using reinforcement learning techniques. Second, we collect a number of demonstrations from an expert driver and learn the optimal driving strategy based on data using inverse reinforcement learning. The unknown reward function of the expert driver is approximated using a deep neural-network (DNN). We clarify and validate the application of the maximum entropy principle (MEP) to learn the DNN reward function, and provide the necessary derivations for using the maximum entropy principle to learn a parameterized feature (reward) function. Simulated results demonstrate the desired driving behaviors of an autonomous vehicle using both the reinforcement learning and inverse reinforcement learning techniques.

Comparison of map-matching methods

Héber Sobreira, Carlos M. Costa, Ivo Sousa, Luis Rocha, José Lima, P. C. M. A. Farias, Paulo Costa, A. Paulo Moreira, Map-Matching Algorithms for Robot Self-Localization: A Comparison Between Perfect Match, Iterative Closest Point and Normal Distributions Transform, Journal of Intelligent & Robotic Systems, March 2019, Volume 93, Issue 3–4, pp 533–546 DOI: 10.1007/s10846-017-0765-5.

The self-localization of mobile robots in the environment is one of the most fundamental problems in the robotics navigation field. It is a complex and challenging problem due to the high requirements of autonomous mobile vehicles, particularly with regard to the algorithms accuracy, robustness and computational efficiency. In this paper, we present a comparison of three of the most used map-matching algorithms applied in localization based on natural landmarks: our implementation of the Perfect Match (PM) and the Point Cloud Library (PCL) implementation of the Iterative Closest Point (ICP) and the Normal Distribution Transform (NDT). For the purpose of this comparison we have considered a set of representative metrics, such as pose estimation accuracy, computational efficiency, convergence speed, maximum admissible initialization error and robustness to the presence of outliers in the robots sensors data. The test results were retrieved using our ROS natural landmark public dataset, containing several tests with simulated and real sensor data. The performance and robustness of the Perfect Match is highlighted throughout this article and is of paramount importance for real-time embedded systems with limited computing power that require accurate pose estimation and fast reaction times for high speed navigation. Moreover, we added to PCL a new algorithm for performing correspondence estimation using lookup tables that was inspired by the PM approach to solve this problem. This new method for computing the closest map point to a given sensor reading proved to be 40 to 60 times faster than the existing k-d tree approach in PCL and allowed the Iterative Closest Point algorithm to perform point cloud registration 5 to 9 times faster.

Models of brain based on artificial neural networks

James C.R. Whittington, Rafal Bogacz, Theories of Error Back-Propagation in the Brain, Trends in Cognitive Sciences, Volume 23, Issue 3, 2019, Pages 235-250 DOI: 10.1016/j.tics.2018.12.005.

This review article summarises recently proposed theories on how neural circuits in the brain could approximate the error back-propagation algorithm used by artificial neural networks. Computational models implementing these theories achieve learning as efficient as artificial neural networks, but they use simple synaptic plasticity rules based on activity of presynaptic and postsynaptic neurons. The models have similarities, such as including both feedforward and feedback connections, allowing information about error to propagate throughout the network. Furthermore, they incorporate experimental evidence on neural connectivity, responses, and plasticity. These models provide insights on how brain networks might be organised such that modification of synaptic weights on multiple levels of cortical hierarchy leads to improved performance on tasks.

Model-based RL for controling a soft manipulator arm

T. G. Thuruthel, E. Falotico, F. Renda and C. Laschi, Model-Based Reinforcement Learning for Closed-Loop Dynamic Control of Soft Robotic Manipulators, IEEE Transactions on Robotics, vol. 35, no. 1, pp. 124-134, Feb. 2019. DOI: 10.1109/TRO.2018.2878318.

Dynamic control of soft robotic manipulators is an open problem yet to be well explored and analyzed. Most of the current applications of soft robotic manipulators utilize static or quasi-dynamic controllers based on kinematic models or linearity in the joint space. However, such approaches are not truly exploiting the rich dynamics of a soft-bodied system. In this paper, we present a model-based policy learning algorithm for closed-loop predictive control of a soft robotic manipulator. The forward dynamic model is represented using a recurrent neural network. The closed-loop policy is derived using trajectory optimization and supervised learning. The approach is verified first on a simulated piecewise constant strain model of a cable driven under-actuated soft manipulator. Furthermore, we experimentally demonstrate on a soft pneumatically actuated manipulator how closed-loop control policies can be derived that can accommodate variable frequency control and unmodeled external loads.