Author Archives: Juan-antonio Fernández-madrigal

Model-based reinforcement learning with a reduced number of basis functions to aproximate the value function, a study of its convergence guarantees, and a nice state of the art on the use of (mdel-based) reinforcement learning for automatic control

Rushikesh Kamalapurkar, Joel A. Rosenfeld, Warren E. Dixon, Efficient model-based reinforcement learning for approximate online optimal control, Automatica, Volume 74, 2016, Pages 247-258, ISSN 0005-1098, DOI: 10.1016/j.automatica.2016.08.004.

An infinite horizon optimal regulation problem is solved online for a deterministic control-affine nonlinear dynamical system using a state following (StaF) kernel method to approximate the value function. Unlike traditional methods that aim to approximate a function over a large compact set, the StaF kernel method aims to approximate a function in a small neighborhood of a state that travels within a compact set. Simulation results demonstrate that stability and approximate optimality of the control system can be achieved with significantly fewer basis functions than may be required for global approximation methods.

A novel particle filter algorithm with an adaptive number of particles, and a curious and interesting table I about the pros and cons of different sensors

T. de J. Mateo Sanguino and F. Ponce Gómez, “Toward Simple Strategy for Optimal Tracking and Localization of Robots With Adaptive Particle Filtering,” in IEEE/ASME Transactions on Mechatronics, vol. 21, no. 6, pp. 2793-2804, Dec. 2016.DOI: 10.1109/TMECH.2016.2531629.

The ability of robotic systems to autonomously understand and/or navigate in uncertain environments is critically dependent on fairly accurate strategies, which are not always optimally achieved due to effectiveness, computational cost, and parameter settings. In this paper, we propose a novel and simple adaptive strategy to increase the efficiency and drastically reduce the computational effort in particle filters (PFs). The purpose of the adaptive approach (dispersion-based adaptive particle filter – DAPF) is to provide higher number of particles during the initial searching state (when the localization presents greater uncertainty) and fewer particles during the subsequent state (when the localization exhibits less uncertainty). With the aim of studying the dynamical PF behavior regarding others and putting the proposed algorithm into practice, we designed a methodology based on different target applications and a Kinect sensor. The various experiments conducted for both color tracking and mobile robot localization problems served to demonstrate that the DAPF algorithm can be further generalized. As a result, the DAPF approach significantly improved the computational performance over two well-known filtering strategies: 1) the classical PF with fixed particle set sizes, and 2) the adaptive technique named Kullback-Leiber distance.

How hierarchical reinforcement learning resembles human creativity, i.e., matching the psychological aspects with the engineering ones

Thomas R. Colin, Tony Belpaeme, Angelo Cangelosi, Nikolas Hemion, Hierarchical reinforcement learning as creative problem solving, Robotics and Autonomous Systems, Volume 86, 2016, Pages 196-206, ISSN 0921-8890, DOI: 10.1016/j.robot.2016.08.021.

Although creativity is studied from philosophy to cognitive robotics, a definition has proven elusive. We argue for emphasizing the creative process (the cognition of the creative agent), rather than the creative product (the artifact or behavior). Owing to developments in experimental psychology, the process approach has become an increasingly attractive way of characterizing creative problem solving. In particular, the phenomenon of insight, in which an individual arrives at a solution through a sudden change in perspective, is a crucial component of the process of creativity. These developments resonate with advances in machine learning, in particular hierarchical and modular approaches, as the field of artificial intelligence aims for general solutions to problems that typically rely on creativity in humans or other animals. We draw a parallel between the properties of insight according to psychology and the properties of Hierarchical Reinforcement Learning (HRL) systems for embodied agents. Using the Creative Systems Framework developed by Wiggins and Ritchie, we analyze both insight and HRL, establishing that they are creative in similar ways. We highlight the key challenges to be met in order to call an artificial system “insightful”.

Survey and taxonomy of path planning algorithms

Thi Thoa Mac, Cosmin Copot, Duc Trung Tran, Robin De Keyser, Heuristic approaches in robot path planning: A survey, Robotics and Autonomous Systems, Volume 86, 2016, Pages 13-28, ISSN 0921-8890, DOI: 10.1016/j.robot.2016.08.001.

Autonomous navigation of a robot is a promising research domain due to its extensive applications. The navigation consists of four essential requirements known as perception, localization, cognition and path planning, and motion control in which path planning is the most important and interesting part. The proposed path planning techniques are classified into two main categories: classical methods and heuristic methods. The classical methods consist of cell decomposition, potential field method, subgoal network and road map. The approaches are simple; however, they commonly consume expensive computation and may possibly fail when the robot confronts with uncertainty. This survey concentrates on heuristic-based algorithms in robot path planning which are comprised of neural network, fuzzy logic, nature-inspired algorithms and hybrid algorithms. In addition, potential field method is also considered due to the good results. The strengths and drawbacks of each algorithm are discussed and future outline is provided.

Interesting mixture of automated planning with reinforcement learning

Matteo Leonetti, Luca Iocchi, Peter Stone, A synthesis of automated planning and reinforcement learning for efficient, robust decision-making, Artificial Intelligence, Volume 241, 2016, Pages 103-130, ISSN 0004-3702, DOI: 10.1016/j.artint.2016.07.004.

Automated planning and reinforcement learning are characterized by complementary views on decision making: the former relies on previous knowledge and computation, while the latter on interaction with the world, and experience. Planning allows robots to carry out different tasks in the same domain, without the need to acquire knowledge about each one of them, but relies strongly on the accuracy of the model. Reinforcement learning, on the other hand, does not require previous knowledge, and allows robots to robustly adapt to the environment, but often necessitates an infeasible amount of experience. We present Domain Approximation for Reinforcement LearnING (DARLING), a method that takes advantage of planning to constrain the behavior of the agent to reasonable choices, and of reinforcement learning to adapt to the environment, and increase the reliability of the decision making process. We demonstrate the effectiveness of the proposed method on a service robot, carrying out a variety of tasks in an office building. We find that when the robot makes decisions by planning alone on a given model it often fails, and when it makes decisions by reinforcement learning alone it often cannot complete its tasks in a reasonable amount of time. When employing DARLING, even when seeded with the same model that was used for planning alone, however, the robot can quickly learn a behavior to carry out all the tasks, improves over time, and adapts to the environment as it changes.

Reducing error in time synchronization for multisensor arrangements in aerial applications, with interesting formulae for the clock drift of IMUs

J. Li, L. Jia and G. Liu, “Multisensor Time Synchronization Error Modeling and Compensation Method for Distributed POS,” in IEEE Transactions on Instrumentation and Measurement, vol. 65, no. 11, pp. 2637-2645, Nov. 2016. DOI: 10.1109/TIM.2016.2598020.

An airborne distributed position and orientation system (POS) is high-precision measurement equipment that can accurately provide multinode time-spatial reference for novel remote sensing system as multitask imaging sensors and array antenna synthetic aperture radar. However, it is difficult for multisensor to precisely acquire information at the same moment and result in data fusion error. Thus, the measurement precision is severely degraded. To solve the problem, a multisensor time synchronization error modeling and compensation method is proposed. Based on the component and operation principles of distributed POS, the time synchronization mechanism is analyzed. Multisensor time synchronization error models that include time delay error, random error, and time-varying error are established. A time synchronization error compensation method of the distributed POS is proposed. The experiment results show that the proposed method can accurately calibrate and compensate for the time synchronization error, and improve the measurement precision of the distributed POS. It verified the validity of the proposed method.

Globally optimal ICP

J. Yang, H. Li, D. Campbell and Y. Jia, “Go-ICP: A Globally Optimal Solution to 3D ICP Point-Set Registration,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 11, pp. 2241-2254, Nov. 1 2016. DOI: 10.1109/TPAMI.2015.2513405.

The Iterative Closest Point (ICP) algorithm is one of the most widely used methods for point-set registration. However, being based on local iterative optimization, ICP is known to be susceptible to local minima. Its performance critically relies on the quality of the initialization and only local optimality is guaranteed. This paper presents the first globally optimal algorithm, named Go-ICP, for Euclidean (rigid) registration of two 3D point-sets under the $L_2$ error metric defined in ICP. The Go-ICP method is based on a branch-and-bound scheme that searches the entire 3D motion space $SE(3)$ . By exploiting the special structure of $SE(3)$ geometry, we derive novel upper and lower bounds for the registration error function. Local ICP is integrated into the BnB scheme, which speeds up the new method while guaranteeing global optimality. We also discuss extensions, addressing the issue of outlier robustness. The evaluation demonstrates that the proposed method is able to produce reliable registration results regardless of the initialization. Go-ICP can be applied in scenarios where an optimal solution is desirable or where a good initialization is not always available.

Learning from demonstration through inverse reinforcement learning enhaced with neural network for generalizing demonstrations and improve visiting of states

Chen Xia, Abdelkader El Kamel, Neural inverse reinforcement learning in autonomous navigation, Robotics and Autonomous Systems, Volume 84, 2016, Pages 1-14, ISSN 0921-8890, DOI: 10.1016/j.robot.2016.06.003.

Designing intelligent and robust autonomous navigation systems remains a great challenge in mobile robotics. Inverse reinforcement learning (IRL) offers an efficient learning technique from expert demonstrations to teach robots how to perform specific tasks without manually specifying the reward function. Most of existing IRL algorithms assume the expert policy to be optimal and deterministic, and are applied to experiments with relatively small-size state spaces. However, in autonomous navigation tasks, the state spaces are frequently large and demonstrations can hardly visit all the states. Meanwhile the expert policy may be non-optimal and stochastic. In this paper, we focus on IRL with large-scale and high-dimensional state spaces by introducing the neural network to generalize the expert’s behaviors to unvisited regions of the state space and an explicit policy representation is easily expressed by neural network, even for the stochastic expert policy. An efficient and convenient algorithm, Neural Inverse Reinforcement Learning (NIRL), is proposed. Experimental results on simulated autonomous navigation tasks show that a mobile robot using our approach can successfully navigate to the target position without colliding with unpredicted obstacles, largely reduce the learning time, and has a good generalization performance on undemonstrated states. Hence prove the robot intelligence of autonomous navigation transplanted from limited demonstrations to completely unknown tasks.

Estimating the execution time of programs before compiling

Peter Altenbernd, Jan Gustafsson, Björn Lisper, Friedhelm Stappert, Early execution time-estimation through automatically generated timing models,, Real-Time Systems, November 2016, Volume 52, Issue 6, pp 731–760, DOI: 10.1007/s11241-016-9250-7.

Traditional timing analysis, such as worst-case execution time analysis, is normally applied only in the late stages of embedded system software development, when the hardware is available and the code is compiled and linked. However, preliminary timing estimates are often needed in early stages of system development as an essential prerequisite for the configuration of the hardware setup and dimensioning of the system. During this phase the hardware is often not available, and the code might not be ready to link. This article describes an approach to predict the execution time of software through an early, source-level timing analysis. A timing model for source code is automatically derived from a given combination of hardware architecture and compiler. The model is identified from measured execution times for a set of synthetic training programs, compiled for the hardware platform in question. It can be used to estimate the execution time for code running on the platform: the estimation is then done directly from the source code, without compiling and running it. Our experiments show that, using this model, we can predict the execution times of the final, compiled code surprisingly well. For instance, we achieve an average deviation of 8 % for a set of benchmark programs for the ARM7 architecture.

Survey of Cognitive Offloading

Evan F. Risko, Sam J. Gilbert, Cognitive Offloading, Trends in Cognitive Sciences, Volume 20, Issue 9, 2016, Pages 676-688, ISSN 1364-6613, DOI: 10.1016/j.tics.2016.07.002.

If you have ever tilted your head to perceive a rotated image, or programmed a smartphone to remind you of an upcoming appointment, you have engaged in cognitive offloading: the use of physical action to alter the information processing requirements of a task so as to reduce cognitive demand. Despite the ubiquity of this type of behavior, it has only recently become the target of systematic investigation in and of itself. We review research from several domains that focuses on two main questions: (i) what mechanisms trigger cognitive offloading, and (ii) what are the cognitive consequences of this behavior? We offer a novel metacognitive framework that integrates results from diverse domains and suggests avenues for future research.