Heuristic, real-time search with weighted heuristic function

Nicolás Rivera, Jorge A. Baier, Carlos Hernández, Incorporating weights into real-time heuristic search, Artificial Intelligence, Volume 225, August 2015, Pages 1-23, ISSN 0004-3702, DOI: 10.1016/j.artint.2015.03.008.

Multiplying the heuristic function by a weight greater than one is a well-known technique in heuristic search. When this technique is applied to A* with an admissible heuristic it yields substantial runtime savings, at the expense of sacrificing solution optimality. Its applicability to real-time heuristic search, a search approach that builds upon heuristic search, however, has only been explored by a few studies. In this article we present two new approaches to using weights in real-time heuristic search, applicable to a wide range of algorithms. The first one, weighted lookahead, is a variant of an existing approach by Shimbo and Ishida, and utilizes the weight while the algorithm performs lookahead search. The second one, weighted update, incorporates the weight to the edges of the search graph during the learning phase. We implemented both techniques within LSS-LRTA* and evaluated them in path-planning benchmarks. We show that weighted lookahead outperforms an existing approach by Shimbo and Ishida but that it does not improve over existing approaches that do not use weights. Weighted update, on the other hand, yields performance improvements of up to one order of magnitude both in solution cost and total search time. To illustrate further the generality of weighted update, we incorporate the technique in two other well-known real-time heuristic search algorithms: LRTA*-LS and daLSS-LRTA*, and we empirically show significant improvements for LRTA*-LS and modest but still important improvements for daLSS-LRTA*. We analyze the properties of weighted update in depth, showing, among other things, that it guarantees termination. Convergence behavior of LSS-LRTA*, modified to use weighted update, is also analyzed. In such a setting, we prove solutions are w-optimal, and provide additional bounds on solution quality that in practice are tighter than w-optimality.

Reinforcement learning applied to select which parts of a Neural Turing Machine are to be updated with backpropagation during learning

Wojciech Zaremba, Ilya Sutskever, Reinforcement Learning Neural Turing Machines, arXiv.org, arXiv:1505.00521.

The expressive power of a machine learning model is closely related to the number of sequential computational steps it can learn. For example, Deep Neural Networks have been more successful than shallow networks because they can perform a greater number of sequential computational steps (each highly parallel). The Neural Turing Machine (NTM) is a model that can compactly express an even greater number of sequential computational steps, so it is even more powerful than a DNN. Its memory addressing operations are designed to be differentiable; thus the NTM can be trained with backpropagation.
While differentiable memory is relatively easy to implement and train, it necessitates accessing the entire memory content at each computational step. This makes it difficult to implement a fast NTM. In this work, we use the Reinforce algorithm to learn where to access the memory, while using backpropagation to learn what to write to the memory. We call this model the RL-NTM. Reinforce allows our model to access a constant number of memory cells at each computational step, so its implementation can be faster. The RL-NTM is the first model that can, in principle, learn programs of unbounded running time. We successfully trained the RL-NTM to solve a number of algorithmic tasks that are simpler than the ones solvable by the fully differentiable NTM.
As the RL-NTM is a fairly intricate model, we needed a method for verifying the correctness of our implementation. To do so, we developed a simple technique for numerically checking arbitrary implementations of models that use Reinforce, which may be of independent interest.

A bayesian framework to explain magnitude estimation in the human mind

Frederike H. Petzschner, Stefan Glasauer, Klaas E. Stephan, A Bayesian perspective on magnitude estimation, Trends in Cognitive Sciences, Volume 19, Issue 5, May 2015, Pages 285-293, ISSN 1364-6613, DOI: 10.1016/j.tics.2015.03.002.

Our representation of the physical world requires judgments of magnitudes, such as loudness, distance, or time. Interestingly, magnitude estimates are often not veridical but subject to characteristic biases. These biases are strikingly similar across different sensory modalities, suggesting common processing mechanisms that are shared by different sensory systems. However, the search for universal neurobiological principles of magnitude judgments requires guidance by formal theories. Here, we discuss a unifying Bayesian framework for understanding biases in magnitude estimation. This Bayesian perspective enables a re-interpretation of a range of established psychophysical findings, reconciles seemingly incompatible classical views on magnitude estimation, and can guide future investigations of magnitude estimation and its neurobiological mechanisms in health and in psychiatric diseases, such as schizophrenia.

Reinforcement learning to explain emotions

Joost Broekensa, Elmer Jacobsa & Catholijn M. Jonker, A reinforcement learning model of joy, distress, hope and fear, Connection Science, DOI: 10.1080/09540091.2015.1031081.

In this paper we computationally study the relation between adaptive behaviour and emotion. Using the reinforcement learning framework, we propose that learned state utility, V(s), models fear (negative) and hope (positive) based on the fact that both signals are about anticipation of loss or gain. Further, we propose that joy/distress is a signal similar to the error signal. We present agent-based simulation experiments that show that this model replicates psychological and behavioural dynamics of emotion. This work distinguishes itself by assessing the dynamics of emotion in an adaptive agent framework – coupling it to the literature on habituation, development, extinction and hope theory. Our results support the idea that the function of emotion is to provide a complex feedback signal for an organism to adapt its behaviour. Our work is relevant for understanding the relation between emotion and adaptation in animals, as well as for human–robot interaction, in particular how emotional signals can be used to communicate between adaptive agents and humans.

Analysis of the deterioration of several Kalman Filters depending on the amount of uncertainty in the observations, when the observation model is non-linear

Mark R. Morelande and Ángel F. García-Fernández, Analysis of Kalman Filter Approximations for Nonlinear Measurements, IEEE Transactions on signal processing, vol. 61, no. 22, 2013 DOI: 10.1109/TSP.2013.2279367.

A theoretical analysis is presented of the correction step of the Kalman filter (KF) and its various approximations for the case of a nonlinear measurement equation with additive Gaussian noise. The KF is based on a Gaussian app roximation to the joint density of the state and the measurement. The analysis metric is the Kullback-Leibler divergence of this approximation from the true joint density. The purpose of the analysis is to provide a quantitative tool for understanding and assessing the performance of the KF and its variants in nonlinear scenarios. This is illustrated using a numerical example.

Application of reinforcement learning to the defense against attacks on communication networks

Kleanthis Malialisa, Sam Devlina & Daniel Kudenkoa, Distributed reinforcement learning for adaptive and robust network intrusion response, Connection Science, DOI: 0.1080/09540091.2015.1031082.

Distributed denial of service (DDoS) attacks constitute a rapidly evolving threat in the current Internet. Multiagent Router Throttling is a novel approach to defend against DDoS attacks where multiple reinforcement learning agents are installed on a set of routers and learn to rate-limit or throttle traffic towards a victim server. The focus of this paper is on online learning and scalability. We propose an approach that incorporates task decomposition, team rewards and a form of reward shaping called difference rewards. One of the novel characteristics of the proposed system is that it provides a decentralised coordinated response to the DDoS problem, thus being resilient to DDoS attacks themselves. The proposed system learns remarkably fast, thus being suitable for online learning. Furthermore, its scalability is successfully demonstrated in experiments involving 1000 learning agents. We compare our approach against a baseline and a popular state-of-the-art throttling technique from the network security literature and show that the proposed approach is more effective, adaptive to sophisticated attack rate dynamics and robust to agent failures.

Example of both bottom-up and top-down processes that are integrated in a solution for the recognition of shapes

Ching L. Teo, Cornelia Fermüller, and Yiannis Aloimonos, A Gestaltist approach to contour-based object recognition: Combining bottom-up and top-down cues, The International Journal of Robotics Research April 2015 34: 627-652, first published on March 25, 2015, DOI: 10.1177/0278364914558493.

This paper proposes a method for detecting generic classes of objects from their representative contours that can be used by a robot with vision to find objects in cluttered environments. The approach uses a mid-level image operator to group edges into contours which likely correspond to object boundaries. This mid-level operator is used in two ways, bottom-up on simple edges and top-down incorporating object shape information, thus acting as the intermediary between low-level and high-level information. First, the mid-level operator, called the image torque, is applied to simple edges to extract likely fixation locations of objects. Using the operator’s output, a novel contour-based descriptor is created that extends the shape context descriptor to include boundary ownership information and accounts for rotation. This descriptor is then used in a multi-scale matching approach to modulate the torque operator towards the target, so it indicates its location and size. Unlike other approaches that use edges directly to guide the independent edge grouping and matching processes for recognition, both of these steps are effectively combined using the proposed method. We evaluate the performance of our approach using four diverse datasets containing a variety of object categories in clutter, occlusion and viewpoint changes. Compared with current state-of-the-art approaches, our approach is able to detect the target with fewer false alarms in most object categories. The performance is further improved when we exploit depth information available from the Kinect RGB-Depth sensor by imposing depth consistency when applying the image torque.

Robot kidnapping detection based on support vector machines

Dylan Campbell, Mark Whitty, Metric-based detection of robot kidnapping with an SVM classifier, Robotics and Autonomous Systems, Volume 69, July 2015, Pages 40-51, ISSN 0921-8890, DOI: 10.1016/j.robot.2014.08.004.

Kidnapping occurs when a robot is unaware that it has not correctly ascertained its position, potentially causing severe map deformation and reducing the robot’s functionality. This paper presents metric-based techniques for real-time kidnap detection, utilising either linear or SVM classifiers to identify all kidnapping events during the autonomous operation of a mobile robot. In contrast, existing techniques either solve specific cases of kidnapping, such as elevator motion, without addressing the general case or remove dependence on local pose estimation entirely, an inefficient and computationally expensive approach. Three metrics that measured the quality of a pose estimate were evaluated and a joint classifier was constructed by combining the most discriminative quality metric with a fourth metric that measured the discrepancy between two independent pose estimates. A multi-class Support Vector Machine classifier was also trained using all four metrics and produced better classification results than the simpler joint classifier, at the cost of requiring a larger training dataset. While metrics specific to 3D point clouds were used, the approach can be generalised to other forms of data, including visual, provided that two independent ways of estimating pose are available.

A nice SLAM approach based on hybrid Normal Distribution Transform (NDT) + occupancy grid maps intended for long term operation in dynamic environments

Erik Einhorn, Horst-Michael Gross, Generic NDT mapping in dynamic environments and its application for lifelong SLAM, Robotics and Autonomous Systems, Volume 69, July 2015, Pages 28-39, ISSN 0921-8890, DOI: 10.1016/j.robot.2014.08.008.

In this paper, we present a new, generic approach for Simultaneous Localization and Mapping (SLAM). First of all, we propose an abstraction of the underlying sensor data using Normal Distribution Transform (NDT) maps that are suitable for making our approach independent from the used sensor and the dimension of the generated maps. We present several modifications for the original NDT mapping to handle free-space measurements explicitly. We additionally describe a method to detect and handle dynamic objects such as moving persons. This enables the usage of the proposed approach in highly dynamic environments. In the second part of this paper we describe our graph-based SLAM approach that is designed for lifelong usage. Therefore, the memory and computational complexity is limited by pruning the pose graph in an appropriate way.

Study of the explanation of probability and reasoning in the human mind through mental models, probability logic and classical logic

P.N. Johnson-Laird, Sangeet S. Khemlani, Geoffrey P. Goodwin, Logic, probability, and human reasoning, Trends in Cognitive Sciences, Volume 19, Issue 4, April 2015, Pages 201-214, ISSN 1364-6613, DOI: 10.1016/j.tics.2015.02.006.

This review addresses the long-standing puzzle of how logic and probability fit together in human reasoning. Many cognitive scientists argue that conventional logic cannot underlie deductions, because it never requires valid conclusions to be withdrawn – not even if they are false; it treats conditional assertions implausibly; and it yields many vapid, although valid, conclusions. A new paradigm of probability logic allows conclusions to be withdrawn and treats conditionals more plausibly, although it does not address the problem of vapidity. The theory of mental models solves all of these problems. It explains how people reason about probabilities and postulates that the machinery for reasoning is itself probabilistic. Recent investigations accordingly suggest a way to integrate probability and deduction.