Author Archives: Juan-Antonio Fernández-Madrigal

How hierarchical reinforcement learning resembles human creativity, i.e., matching the psychological aspects with the engineering ones

Thomas R. Colin, Tony Belpaeme, Angelo Cangelosi, Nikolas Hemion, Hierarchical reinforcement learning as creative problem solving, Robotics and Autonomous Systems, Volume 86, 2016, Pages 196-206, ISSN 0921-8890, DOI: 10.1016/j.robot.2016.08.021.

Although creativity is studied from philosophy to cognitive robotics, a definition has proven elusive. We argue for emphasizing the creative process (the cognition of the creative agent), rather than the creative product (the artifact or behavior). Owing to developments in experimental psychology, the process approach has become an increasingly attractive way of characterizing creative problem solving. In particular, the phenomenon of insight, in which an individual arrives at a solution through a sudden change in perspective, is a crucial component of the process of creativity. These developments resonate with advances in machine learning, in particular hierarchical and modular approaches, as the field of artificial intelligence aims for general solutions to problems that typically rely on creativity in humans or other animals. We draw a parallel between the properties of insight according to psychology and the properties of Hierarchical Reinforcement Learning (HRL) systems for embodied agents. Using the Creative Systems Framework developed by Wiggins and Ritchie, we analyze both insight and HRL, establishing that they are creative in similar ways. We highlight the key challenges to be met in order to call an artificial system “insightful”.
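For readers less familiar with HRL, here is a minimal sketch (illustrative names, not the architecture from the paper) of the "options" formulation commonly used in hierarchical RL: a temporally extended action bundles an initiation set, an internal policy, and a termination condition, and switching between such sub-behaviours is the kind of structural change the authors relate to a shift in perspective.

```python
# Minimal sketch of the "options" formulation often used in HRL (hypothetical
# names; not the paper's architecture). An option bundles an initiation set,
# an intra-option policy, and a termination condition.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Option:
    can_start: Callable[[Any], bool]      # initiation set I(s)
    policy: Callable[[Any], Any]          # intra-option policy pi(s) -> action
    should_stop: Callable[[Any], bool]    # termination condition beta(s)

def run_option(env_step, state, option, max_steps=100):
    """Execute one option until it terminates, returning the resulting state."""
    for _ in range(max_steps):
        state = env_step(state, option.policy(state))
        if option.should_stop(state):
            break
    return state
```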

Survey and taxonomy of path planning algorithms

Thi Thoa Mac, Cosmin Copot, Duc Trung Tran, Robin De Keyser, Heuristic approaches in robot path planning: A survey, Robotics and Autonomous Systems, Volume 86, 2016, Pages 13-28, ISSN 0921-8890, DOI: 10.1016/j.robot.2016.08.001.

Autonomous navigation of a robot is a promising research domain due to its extensive applications. Navigation involves four essential components: perception, localization, cognition and path planning, and motion control, of which path planning is the most important and interesting part. The path planning techniques proposed in the literature fall into two main categories: classical methods and heuristic methods. The classical methods comprise cell decomposition, the potential field method, subgoal networks and road maps. These approaches are simple; however, they are commonly computationally expensive and may fail when the robot faces uncertainty. This survey concentrates on heuristic-based algorithms for robot path planning, which comprise neural networks, fuzzy logic, nature-inspired algorithms and hybrid algorithms. In addition, the potential field method is also considered due to its good results. The strengths and drawbacks of each algorithm are discussed and an outline of future work is provided.
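As a reference point for the classical baseline the survey keeps alongside the heuristic methods, here is a hedged sketch of the artificial potential field idea: an attractive force pulls the robot toward the goal, repulsive forces push it away from nearby obstacles, and the robot descends the combined gradient. Gains and the influence radius are illustrative values, not taken from the survey.

```python
# Hedged sketch of the classical artificial potential field method.
import numpy as np

def potential_field_step(pos, goal, obstacles, k_att=1.0, k_rep=100.0,
                         influence=2.0, step=0.05):
    pos, goal = np.asarray(pos, float), np.asarray(goal, float)
    force = k_att * (goal - pos)                      # attractive force
    for obs in obstacles:
        diff = pos - np.asarray(obs, float)
        d = np.linalg.norm(diff)
        if 1e-9 < d < influence:                      # repulsion only nearby
            force += k_rep * (1.0 / d - 1.0 / influence) / d**3 * diff
    return pos + step * force / (np.linalg.norm(force) + 1e-9)

# Example: one step toward the goal while avoiding an obstacle at (1, 1).
print(potential_field_step([0, 0], [3, 3], [[1, 1]]))
```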

Interesting mixture of automated planning with reinforcement learning

Matteo Leonetti, Luca Iocchi, Peter Stone, A synthesis of automated planning and reinforcement learning for efficient, robust decision-making, Artificial Intelligence, Volume 241, 2016, Pages 103-130, ISSN 0004-3702, DOI: 10.1016/j.artint.2016.07.004.

Automated planning and reinforcement learning are characterized by complementary views on decision making: the former relies on previous knowledge and computation, while the latter relies on interaction with the world and experience. Planning allows robots to carry out different tasks in the same domain without the need to acquire knowledge about each one of them, but relies strongly on the accuracy of the model. Reinforcement learning, on the other hand, does not require previous knowledge and allows robots to robustly adapt to the environment, but often necessitates an infeasible amount of experience. We present Domain Approximation for Reinforcement LearnING (DARLING), a method that takes advantage of planning to constrain the behavior of the agent to reasonable choices, and of reinforcement learning to adapt to the environment and increase the reliability of the decision-making process. We demonstrate the effectiveness of the proposed method on a service robot carrying out a variety of tasks in an office building. We find that when the robot makes decisions by planning alone on a given model it often fails, and when it makes decisions by reinforcement learning alone it often cannot complete its tasks in a reasonable amount of time. When employing DARLING, however, even when seeded with the same model that was used for planning alone, the robot can quickly learn a behavior to carry out all the tasks, improves over time, and adapts to the environment as it changes.
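To make the general principle concrete (this is an illustration of the idea, not the DARLING algorithm itself), the sketch below lets a planner restrict each state to a set of "reasonable" actions and then runs tabular Q-learning within that restricted set. The env object with reset()/step() methods and the allowed_actions() oracle are assumed interfaces introduced for the example.

```python
# Hedged sketch: planning constrains the action set, RL adapts within it.
import random
from collections import defaultdict

def constrained_q_learning(env, allowed_actions, episodes=500,
                           alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular Q-learning restricted to the planner-approved actions per state."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            acts = allowed_actions(s)                     # planner-approved set
            a = (random.choice(acts) if random.random() < eps
                 else max(acts, key=lambda x: Q[(s, x)]))
            s2, r, done = env.step(a)
            best_next = 0.0 if done else max(Q[(s2, x)] for x in allowed_actions(s2))
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q
```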

Reducing error in time synchronization for multisensor arrangements in aerial applications, with interesting formulae for the clock drift of IMUs

J. Li, L. Jia and G. Liu, “Multisensor Time Synchronization Error Modeling and Compensation Method for Distributed POS,” in IEEE Transactions on Instrumentation and Measurement, vol. 65, no. 11, pp. 2637-2645, Nov. 2016. DOI: 10.1109/TIM.2016.2598020.

An airborne distributed position and orientation system (POS) is high-precision measurement equipment that can accurately provide a multinode time-spatial reference for novel remote sensing systems such as multitask imaging sensors and array-antenna synthetic aperture radar. However, it is difficult for multiple sensors to acquire information at precisely the same moment, which results in data fusion errors and severely degrades the measurement precision. To solve this problem, a multisensor time synchronization error modeling and compensation method is proposed. Based on the components and operation principles of the distributed POS, the time synchronization mechanism is analyzed. Multisensor time synchronization error models that include time delay error, random error, and time-varying error are established, and a time synchronization error compensation method for the distributed POS is proposed. The experimental results show that the proposed method can accurately calibrate and compensate for the time synchronization error and improve the measurement precision of the distributed POS, which verifies the validity of the proposed method.
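As a rough illustration of what a clock-error model can look like (a simplified sketch, not the formulae from the paper), one can assume a fixed offset plus a linear drift between a sub-sensor clock and the reference clock and identify both by least squares from matched timestamp pairs:

```python
# Hedged sketch: t_sensor ~= (1 + drift) * t_ref + offset + noise.
import numpy as np

def fit_clock_model(t_ref, t_sensor):
    """Fit the offset and drift of a sensor clock by linear least squares."""
    A = np.column_stack([t_ref, np.ones_like(t_ref)])
    (slope, offset), *_ = np.linalg.lstsq(A, t_sensor, rcond=None)
    return slope - 1.0, offset                       # (drift, offset)

def compensate(t_sensor, drift, offset):
    """Map sensor timestamps back onto the reference time base."""
    return (t_sensor - offset) / (1.0 + drift)

# Example with synthetic data: 50 ppm drift and a 2 ms offset.
t_ref = np.linspace(0, 10, 200)
t_sensor = t_ref * 1.00005 + 0.002 + np.random.normal(0, 1e-5, t_ref.size)
drift, offset = fit_clock_model(t_ref, t_sensor)
print(drift, offset)
```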

Globally optimal ICP

J. Yang, H. Li, D. Campbell and Y. Jia, “Go-ICP: A Globally Optimal Solution to 3D ICP Point-Set Registration,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 11, pp. 2241-2254, Nov. 1 2016. DOI: 10.1109/TPAMI.2015.2513405.

The Iterative Closest Point (ICP) algorithm is one of the most widely used methods for point-set registration. However, being based on local iterative optimization, ICP is known to be susceptible to local minima. Its performance critically relies on the quality of the initialization and only local optimality is guaranteed. This paper presents the first globally optimal algorithm, named Go-ICP, for Euclidean (rigid) registration of two 3D point-sets under the $L_2$ error metric defined in ICP. The Go-ICP method is based on a branch-and-bound scheme that searches the entire 3D motion space $SE(3)$. By exploiting the special structure of $SE(3)$ geometry, we derive novel upper and lower bounds for the registration error function. Local ICP is integrated into the BnB scheme, which speeds up the new method while guaranteeing global optimality. We also discuss extensions, addressing the issue of outlier robustness. The evaluation demonstrates that the proposed method is able to produce reliable registration results regardless of the initialization. Go-ICP can be applied in scenarios where an optimal solution is desirable or where a good initialization is not always available.
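The branch-and-bound machinery of Go-ICP is beyond a short snippet, but the local ICP step it embeds is easy to sketch. The version below is the textbook loop, assuming SciPy's KD-tree for nearest-neighbour matching and the standard SVD closed form for the rigid transform under the $L_2$ error; it is not the Go-ICP implementation itself.

```python
# Hedged sketch of a standard local ICP iteration (not Go-ICP's BnB bounds).
import numpy as np
from scipy.spatial import cKDTree

def icp(source, target, iters=50):
    """Align an (N, 3) source cloud to an (M, 3) target cloud."""
    R, t = np.eye(3), np.zeros(3)
    tree = cKDTree(target)
    for _ in range(iters):
        moved = source @ R.T + t
        matched = target[tree.query(moved)[1]]        # closest target points
        mu_s, mu_m = moved.mean(0), matched.mean(0)
        H = (moved - mu_s).T @ (matched - mu_m)       # cross-covariance
        U, _, Vt = np.linalg.svd(H)
        R_step = Vt.T @ np.diag([1, 1, np.linalg.det(Vt.T @ U.T)]) @ U.T
        R, t = R_step @ R, R_step @ t + mu_m - R_step @ mu_s
    return R, t   # source @ R.T + t approximates target
```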

Learning from demonstration through inverse reinforcement learning, enhanced with a neural network to generalize demonstrations and improve coverage of unvisited states

Chen Xia, Abdelkader El Kamel, Neural inverse reinforcement learning in autonomous navigation, Robotics and Autonomous Systems, Volume 84, 2016, Pages 1-14, ISSN 0921-8890, DOI: 10.1016/j.robot.2016.06.003.

Designing intelligent and robust autonomous navigation systems remains a great challenge in mobile robotics. Inverse reinforcement learning (IRL) offers an efficient way to learn from expert demonstrations and teach robots to perform specific tasks without manually specifying the reward function. Most existing IRL algorithms assume the expert policy to be optimal and deterministic, and are applied to experiments with relatively small state spaces. However, in autonomous navigation tasks, the state spaces are frequently large and demonstrations can hardly visit all the states; meanwhile, the expert policy may be non-optimal and stochastic. In this paper, we focus on IRL with large-scale and high-dimensional state spaces by introducing a neural network to generalize the expert’s behaviors to unvisited regions of the state space; the neural network also provides an explicit policy representation, even for a stochastic expert policy. An efficient and convenient algorithm, Neural Inverse Reinforcement Learning (NIRL), is proposed. Experimental results on simulated autonomous navigation tasks show that a mobile robot using our approach can successfully navigate to the target position without colliding with unpredicted obstacles, largely reduces the learning time, and generalizes well to undemonstrated states. This demonstrates that navigation skills learned from limited demonstrations can be transferred to completely unknown tasks.
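To make the idea of a neural reward representation concrete, here is a hedged sketch (architecture and names are illustrative, not the paper's NIRL): a small MLP maps state features to a scalar reward, so the reward generalizes to states never visited in the demonstrations. The IRL outer loop that would fit the weights against the demonstrations is omitted.

```python
# Hedged sketch of a neural reward representation for IRL.
import numpy as np

class RewardNet:
    def __init__(self, n_features, n_hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.1, (n_features, n_hidden))
        self.W2 = rng.normal(0, 0.1, (n_hidden, 1))

    def reward(self, state_features):
        h = np.tanh(np.asarray(state_features) @ self.W1)   # hidden layer
        return float(h @ self.W2)                           # scalar reward

net = RewardNet(n_features=4)
print(net.reward([0.2, -1.0, 0.5, 0.0]))   # reward for an unvisited state
```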

Estimating the execution time of programs before compiling

Peter Altenbernd, Jan Gustafsson, Björn Lisper, Friedhelm Stappert, Early execution time-estimation through automatically generated timing models, Real-Time Systems, November 2016, Volume 52, Issue 6, pp 731–760, DOI: 10.1007/s11241-016-9250-7.

Traditional timing analysis, such as worst-case execution time analysis, is normally applied only in the late stages of embedded system software development, when the hardware is available and the code is compiled and linked. However, preliminary timing estimates are often needed in early stages of system development as an essential prerequisite for the configuration of the hardware setup and dimensioning of the system. During this phase the hardware is often not available, and the code might not be ready to link. This article describes an approach to predict the execution time of software through an early, source-level timing analysis. A timing model for source code is automatically derived from a given combination of hardware architecture and compiler. The model is identified from measured execution times for a set of synthetic training programs, compiled for the hardware platform in question. It can be used to estimate the execution time for code running on the platform: the estimation is then done directly from the source code, without compiling and running it. Our experiments show that, using this model, we can predict the execution times of the final, compiled code surprisingly well. For instance, we achieve an average deviation of 8 % for a set of benchmark programs for the ARM7 architecture.
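A toy version of the identification idea (much simpler than the paper's model) is easy to write down: summarize each training program by counts of source-level operation types, fit a cost per operation by least squares against measured execution times, and then predict unseen programs from their counts alone. The operation categories and numbers below are made up for illustration.

```python
# Hedged sketch of identifying a source-level timing model by least squares.
import numpy as np

op_types = ["arith", "mem", "branch"]                       # illustrative set

def fit_timing_model(op_counts, measured_times):
    """op_counts: (n_programs, n_op_types); measured_times: seconds."""
    costs, *_ = np.linalg.lstsq(op_counts, measured_times, rcond=None)
    return costs                                            # seconds per op type

def predict_time(costs, counts):
    return float(np.asarray(counts) @ costs)

# Example with made-up counts and timings for four synthetic training programs.
counts = np.array([[100.0, 40, 10], [20, 200, 30], [300, 50, 90], [80, 80, 80]])
times = np.array([3.1e-4, 5.2e-4, 9.8e-4, 5.0e-4])
costs = fit_timing_model(counts, times)
print(predict_time(costs, [150, 60, 20]))   # estimate for an unseen program
```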

Survey of Cognitive Offloading

Evan F. Risko, Sam J. Gilbert, Cognitive Offloading, Trends in Cognitive Sciences, Volume 20, Issue 9, 2016, Pages 676-688, ISSN 1364-6613, DOI: 10.1016/j.tics.2016.07.002.

If you have ever tilted your head to perceive a rotated image, or programmed a smartphone to remind you of an upcoming appointment, you have engaged in cognitive offloading: the use of physical action to alter the information processing requirements of a task so as to reduce cognitive demand. Despite the ubiquity of this type of behavior, it has only recently become the target of systematic investigation in and of itself. We review research from several domains that focuses on two main questions: (i) what mechanisms trigger cognitive offloading, and (ii) what are the cognitive consequences of this behavior? We offer a novel metacognitive framework that integrates results from diverse domains and suggests avenues for future research.

Mapping (and navigating in) outdoor unstructured environments with low-cost and few sensors, using relations between landmarks instead of absolute or metrical positions

Mark McClelland, Mark Campbell, Tara Estlin, Qualitative relational mapping and navigation for planetary rovers, Robotics and Autonomous Systems, Volume 83, 2016, Pages 73-86, ISSN 0921-8890, DOI: 10.1016/j.robot.2016.05.017.

This paper presents a novel method for qualitative mapping of large scale spaces which decouples the mapping problem from that of position estimation. The proposed framework makes use of a graphical representation of the world in order to build a map consisting of qualitative constraints on the geometric relationships between landmark triplets. This process allows a mobile robot to extract information about landmark positions using a set of minimal sensors in the absence of GPS. A novel measurement method based on camera imagery is presented which extends previous work from the field of Qualitative Spatial Reasoning. A Branch-and-Bound approach is taken to solve a set of non-convex feasibility problems required for generating off-line operator lookup tables and on-line measurements, which are fused into the map using an iterative graph update. A navigation approach for travel between distant landmarks is developed, using estimates of the Relative Neighborhood Graph extracted from the qualitative map in order to generate a sequence of landmark objectives based on proximity. Average and asymptotic performance of the mapping algorithm is evaluated using Monte Carlo tests on randomly generated maps, and a data-driven simulation is presented for a robot traversing the Jet Propulsion Laboratory Mars Yard while building a relational map. These results demonstrate that the system can be effectively used to build a map sufficiently complete and accurate for long-distance navigation as well as other applications.
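The Relative Neighborhood Graph used for the navigation step is simple to compute when metric positions are known; the paper instead estimates it from the qualitative map, so the sketch below only illustrates the graph's definition: two landmarks are connected iff no third landmark is closer to both of them than they are to each other.

```python
# Hedged sketch of the Relative Neighborhood Graph from metric positions.
import numpy as np

def relative_neighborhood_graph(points):
    pts = np.asarray(points, float)
    d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)   # pairwise dists
    n, edges = len(pts), []
    for p in range(n):
        for q in range(p + 1, n):
            blocked = any(max(d[p, r], d[q, r]) < d[p, q]
                          for r in range(n) if r not in (p, q))
            if not blocked:
                edges.append((p, q))
    return edges

print(relative_neighborhood_graph([[0, 0], [1, 0], [0.5, 2], [3, 3]]))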

Sample-based approximation to POMDPs integrated with forward simulation for robot active exploration, with a nice related work about active exploration in robotics

Mikko Lauri, Risto Ritala, Planning for robotic exploration based on forward simulation, Robotics and Autonomous Systems, Volume 83, 2016, Pages 15-31, ISSN 0921-8890, DOI: 10.1016/j.robot.2016.06.008.

We address the problem of controlling a mobile robot to explore a partially known environment. The robot’s objective is the maximization of the amount of information collected about the environment. We formulate the problem as a partially observable Markov decision process (POMDP) with an information-theoretic objective function, and solve it applying forward simulation algorithms with an open-loop approximation. We present a new sample-based approximation for mutual information useful in mobile robotics. The approximation can be seamlessly integrated with forward simulation planning algorithms. We investigate the usefulness of POMDP-based planning for exploration and, to alleviate some of its weaknesses, propose a combination with frontier-based exploration. Experimental results in simulated and real environments show that, depending on the environment, applying POMDP-based planning for exploration can improve performance over frontier exploration.
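As an illustration of what a sample-based mutual information estimate can look like (the paper's estimator and the surrounding POMDP machinery are more involved), the following Monte Carlo sketch scores how informative a measurement would be under a particle belief. The 1-D range-sensor example at the end, including its noise level, is entirely made up.

```python
# Hedged sketch: Monte Carlo estimate of I(X; Z) = E[log p(z|x) - log p(z)].
import numpy as np

def sample_mi(particles, meas_model, likelihood, n_meas=50, rng=None):
    """particles: samples of X; meas_model(x, rng) -> z; likelihood(z, x) -> p."""
    rng = rng or np.random.default_rng(0)
    xs = particles[rng.integers(len(particles), size=n_meas)]
    zs = np.array([meas_model(x, rng) for x in xs])
    # p(z) approximated by averaging the likelihood over all belief particles.
    p_z = np.array([np.mean([likelihood(z, xp) for xp in particles]) for z in zs])
    p_z_given_x = np.array([likelihood(z, x) for z, x in zip(zs, xs)])
    return float(np.mean(np.log(p_z_given_x + 1e-12) - np.log(p_z + 1e-12)))

# Example: 1-D position belief, range sensor with Gaussian noise sigma = 0.3.
particles = np.random.default_rng(1).uniform(0, 5, size=200)
meas = lambda x, rng: x + rng.normal(0, 0.3)
lik = lambda z, x: np.exp(-0.5 * ((z - x) / 0.3) ** 2) / (0.3 * np.sqrt(2 * np.pi))
print(sample_mi(particles, meas, lik))
```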