Tag Archives: Deep Neural Networks

A nice analysis of the particularities of deep learning when applied to robotics, where the need to act is principal (unlike in other disciplines such as computer vision)

Niko Sünderhauf, Oliver Brock, Walter Scheirer, Raia Hadsell, Dieter Fox, Jürgen Leitner, Ben Upcroft, Pieter Abbeel, Wolfram Burgard, Michael Milford, and Peter Corke, The limits and potentials of deep learning for robotics, The International Journal of Robotics Research Vol 37, Issue 4-5, pp. 405 – 420, DOI: 10.1177/0278364918770733.

The application of deep learning in robotics leads to very specific problems and research questions that are typically not addressed by the computer vision and machine learning communities. In this paper we discuss a number of robotics-specific learning, reasoning, and embodiment challenges for deep learning. We explain the need for better evaluation metrics, highlight the importance and unique challenges for deep robotic learning in simulation, and explore the spectrum between purely data-driven and model-driven approaches. We hope this paper provides a motivating overview of important research directions to overcome the current limitations, and helps to fulfill the promising potentials of deep learning in robotics.

Some quotes beyond the abstract:

Deep learning systems, e.g. for classification or detection, typically return scores from their softmax layers that are proportional to the system’s confidence, but are not calibrated probabilities, and therefore not useable in a Bayesian sensor fusion framework

If, for example, an object detection system is fooled by data outside of its training data distribution (Goodfellow et al., 2014; Nguyen et al., 2015a), the consequences for a robot acting on false, but high-confidence detections can be catastrophic

As the robot moves in its environment, the camera will observe the scene from different viewpoints, which poses both challenges and opportunities to a robotic vision system […] One of the biggest advantages robotic vision can draw from its embodiment is the potential to control the camera, move it, and change its viewpoint to improve its perception or gather additional information about the scene. This is in stark contrast to most computer vision scenarios […] As an extension of active vision, a robotic system could purposefully manipulate the scene to aid its perception

In his influential 1867 book on physiological optics, Von Helmholtz (1867) formulated the idea that humans use unconscious reasoning, inference or conclusion, when processing visual information. Since then, psychologists have devised various experiments to investigate these unconscious mechanisms (Goldstein and Brockmole, 2016), modernized Helmholtz’s original ideas (Rock, 1983), and reformulated them in the framework of Bayesian inference (Kersten et al., 2004).

The properties of model-based and deep-learned approaches can be measured along multiple dimensions, including the kind of representations used for reasoning, how generally applicable their solutions are, how robust they are in real-world settings, how efficiently they make use of data, and how computationally efficient they are during operation. Model-based approaches often rely on explicit models of objects and their shape, surface, and mass properties, and use these to predict and control motion through time. In deep learning, models are typically implicitly encoded via networks and their parameters. As a consequence, model-based approaches have wide applicability, since the physics underlying them are universal. However, at the same time, the parameters of these models are difficult to estimate from perception, resulting in rather brittle performance operating only in local basins of convergence. Deep learning, on the other hand, enables highly robust performance when trained on sufficiently large data sets that are representative of the operating regime of the system. However, the implicit models learned by current deep learning techniques do not have the general applicability of physics-based reasoning. Model-based approaches are significantly more data efficient, related to their smaller number of parameters. The optimizations required for model-based approaches can be performed efficiently, but the basin of convergence can be rather small. In contrast, deep-learned solutions are often very fast and can have very large basins of convergence. However, they do not perform well if applied in a regime outside the training data

Deep reinforcement learning applied to learn both attention and classification in a task of vehicle classification

D. Zhao, Y. Chen and L. Lv, Deep Reinforcement Learning With Visual Attention for Vehicle Classification, IEEE Transactions on Cognitive and Developmental Systems, vol. 9, no. 4, pp. 356-367, DOI: 10.1109/TCDS.2016.2614675.

Automatic vehicle classification is crucial to intelligent transportation system, especially for vehicle-tracking by police. Due to the complex lighting and image capture conditions, image-based vehicle classification in real-world environments is still a challenging task and the performance is far from being satisfactory. However, owing to the mechanism of visual attention, the human vision system shows remarkable capability compared with the computer vision system, especially in distinguishing nuances processing. Inspired by this mechanism, we propose a convolutional neural network (CNN) model of visual attention for image classification. A visual attention-based image processing module is used to highlight one part of an image and weaken the others, generating a focused image. Then the focused image is input into the CNN to be classified. According to the classification probability distribution, we compute the information entropy to guide a reinforcement learning agent to achieve a better policy for image classification to select the key parts of an image. Systematic experiments on a surveillance-nature dataset which contains images captured by surveillance cameras in the front view, demonstrate that the proposed model is more competitive than the large-scale CNN in vehicle classification tasks.

Using deep learning for extracting features from range data

Y. Liao, S. Kodagoda, Y. Wang, L. Shi and Y. Liu, Place Classification With a Graph Regularized Deep Neural Network, IEEE Transactions on Cognitive and Developmental Systems, vol. 9, no. 4, pp. 304-315, DOI: 10.1109/TCDS.2016.2586183.

Place classification is a fundamental ability that a robot should possess to carry out effective human-robot interactions. In recent years, there is a high exploitation of artificial intelligence algorithms in robotics applications. Inspired by the recent successes of deep learning methods, we propose an end-to-end learning approach for the place classification problem. With deep architectures, this methodology automatically discovers features and contributes in general to higher classification accuracies. The pipeline of our approach is composed of three parts. First, we construct multiple layers of laser range data to represent the environment information in different levels of granularity. Second, each layer of data are fed into a deep neural network for classification, where a graph regularization is imposed to the deep architecture for keeping local consistency between adjacent samples. Finally, the predicted labels obtained from all layers are fused based on confidence trees to maximize the overall confidence. Experimental results validate the effectiveness of our end-to-end place classification framework in which both the multilayer structure and the graph regularization promote the classification performance. Furthermore, results show that the features automatically learned from the raw input range data can achieve competitive results to the features constructed based on statistical and geometrical information.

On how psychologists realize that the brain, after all, may be creating symbols (concepts), like deep neural networks show

Jeffrey S. Bowers, Parallel Distributed Processing Theory in the Age of Deep Networks, Trends in Cognitive Sciences, Volume 21, Issue 12, 2017, Pages 950-961, DOI: 10.1016/j.tics.2017.09.013.

Parallel distributed processing (PDP) models in psychology are the precursors of deep networks used in computer science. However, only PDP models are associated with two core psychological claims, namely that all knowledge is coded in a distributed format and cognition is mediated by non-symbolic computations. These claims have long been debated in cognitive science, and recent work with deep networks speaks to this debate. Specifically, single-unit recordings show that deep networks learn units that respond selectively to meaningful categories, and researchers are finding that deep networks need to be supplemented with symbolic systems to perform some tasks. Given the close links between PDP and deep networks, it is surprising that research with deep networks is challenging PDP theory.

First end-to-end implementation of (monocular) Visual Odometry with deep neural networks, including output with the uncertainty of the result

Sen Wang, Ronald Clark, Hongkai Wen, and Niki Trigoni, End-to-end, sequence-to-sequence probabilistic visual odometry through deep neural networks, The International Journal of Robotics Research Vol 37, Issue 4-5, pp. 513 – 542, DOI: 0.1177/0278364917734298.

This paper studies visual odometry (VO) from the perspective of deep learning. After tremendous efforts in the robotics and computer vision communities over the past few decades, state-of-the-art VO algorithms have demonstrated incredible performance. However, since the VO problem is typically formulated as a pure geometric problem, one of the key features still missing from current VO systems is the capability to automatically gain knowledge and improve performance through learning. In this paper, we investigate whether deep neural networks can be effective and beneficial to the VO problem. An end-to-end, sequence-to-sequence probabilistic visual odometry (ESP-VO) framework is proposed for the monocular VO based on deep recurrent convolutional neural networks. It is trained and deployed in an end-to-end manner, that is, directly inferring poses and uncertainties from a sequence of raw images (video) without adopting any modules from the conventional VO pipeline. It can not only automatically learn effective feature representation encapsulating geometric information through convolutional neural networks, but also implicitly model sequential dynamics and relation for VO using deep recurrent neural networks. Uncertainty is also derived along with the VO estimation without introducing much extra computation. Extensive experiments on several datasets representing driving, flying and walking scenarios show competitive performance of the proposed ESP-VO to the state-of-the-art methods, demonstrating a promising potential of the deep learning technique for VO and verifying that it can be a viable complement to current VO systems.

A formal study of the guarantees that deep neural network offer for classification

R. Giryes, G. Sapiro and A. M. Bronstein, “Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?,” in IEEE Transactions on Signal Processing, vol. 64, no. 13, pp. 3444-3457, July1, 1 2016. DOI: 10.1109/TSP.2016.2546221.

Three important properties of a classification machinery are i) the system preserves the core information of the input data; ii) the training examples convey information about unseen data; and iii) the system is able to treat differently points from different classes. In this paper, we show that these fundamental properties are satisfied by the architecture of deep neural networks. We formally prove that these networks with random Gaussian weights perform a distance-preserving embedding of the data, with a special treatment for in-class and out-of-class data. Similar points at the input of the network are likely to have a similar output. The theoretical analysis of deep networks here presented exploits tools used in the compressed sensing and dictionary learning literature, thereby making a formal connection between these important topics. The derived results allow drawing conclusions on the metric learning properties of the network and their relation to its structure, as well as providing bounds on the required size of the training set such that the training examples would represent faithfully the unseen data. The results are validated with state-of-the-art trained networks.

Application of deep learning and reinforcement learning to an industrial process, with a gentle introduction to both and a clear explanation of the process and decisions made to build the whole control system

Johannes Günther, Patrick M. Pilarski, Gerhard Helfrich, Hao Shen, Klaus Diepold, Intelligent laser welding through representation, prediction, and control learning: An architecture with deep neural networks and reinforcement learning, Mechatronics, Volume 34, March 2016, Pages 1-11, ISSN 0957-4158, DOI: 10.1016/j.mechatronics.2015.09.004.

Laser welding is a widely used but complex industrial process. In this work, we propose the use of an integrated machine intelligence architecture to help address the significant control difficulties that prevent laser welding from seeing its full potential in process engineering and production. This architecture combines three contemporary machine learning techniques to allow a laser welding controller to learn and improve in a self-directed manner. As a first contribution of this work, we show how a deep, auto-encoding neural network is capable of extracting salient, low-dimensional features from real high-dimensional laser welding data. As a second contribution and novel integration step, these features are then used as input to a temporal-difference learning algorithm (in this case a general-value-function learner) to acquire important real-time information about the process of laser welding; temporally extended predictions are used in combination with deep learning to directly map sensor data to the final quality of a welding seam. As a third contribution and final part of our proposed architecture, we suggest that deep learning features and general-value-function predictions can be beneficially combined with actor–critic reinforcement learning to learn context-appropriate control policies to govern welding power in real time. Preliminary control results are demonstrated using multiple runs with a laser-welding simulator. The proposed intelligent laser-welding architecture combines representation, prediction, and control learning: three of the main hallmarks of an intelligent system. As such, we suggest that an integration approach like the one described in this work has the capacity to improve laser welding performance without ongoing and time-intensive human assistance. Our architecture therefore promises to address several key requirements of modern industry. To our knowledge, this architecture is the first demonstrated combination of deep learning and general value functions. It also represents the first use of deep learning for laser welding specifically and production engineering in general. We believe that it would be straightforward to adapt our architecture for use in other industrial and production engineering settings.