Tag Archives: Attention

Deep reinforcement learning applied to learn both attention and classification in a task of vehicle classification

D. Zhao, Y. Chen and L. Lv, Deep Reinforcement Learning With Visual Attention for Vehicle Classification, IEEE Transactions on Cognitive and Developmental Systems, vol. 9, no. 4, pp. 356-367, DOI: 10.1109/TCDS.2016.2614675.

Automatic vehicle classification is crucial to intelligent transportation systems, especially for vehicle tracking by police. Due to the complex lighting and image capture conditions, image-based vehicle classification in real-world environments is still a challenging task and the performance is far from being satisfactory. However, owing to the mechanism of visual attention, the human vision system shows remarkable capability compared with the computer vision system, especially in distinguishing nuances. Inspired by this mechanism, we propose a convolutional neural network (CNN) model of visual attention for image classification. A visual attention-based image processing module is used to highlight one part of an image and weaken the others, generating a focused image. Then the focused image is input into the CNN to be classified. According to the classification probability distribution, we compute the information entropy to guide a reinforcement learning agent to achieve a better policy for image classification to select the key parts of an image. Systematic experiments on a surveillance-nature dataset, which contains images captured by surveillance cameras in the front view, demonstrate that the proposed model is more competitive than the large-scale CNN in vehicle classification tasks.
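
The abstract does not say how the entropy is turned into a learning signal, so the following is only a minimal sketch of the general idea, not the authors' method. It computes the Shannon entropy of a classification probability distribution and uses the entropy reduction as a reward. The function names, the reward definition, and the example probabilities are all assumptions made for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    # Terms with zero probability contribute nothing to the sum.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_reward(probs_before, probs_after):
    """Hypothetical reward: positive when the focused image makes the
    classifier more certain, i.e. lowers the entropy of its output."""
    return entropy(probs_before) - entropy(probs_after)


# Illustrative values only: class probabilities for the original image
# (uncertain) and for a focused image (confident).
uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.97, 0.01, 0.01, 0.01]

reward = entropy_reward(uniform, peaked)
print(f"entropy before: {entropy(uniform):.3f} nats")
print(f"entropy after:  {entropy(peaked):.3f} nats")
print(f"reward:         {reward:.3f}")
```

A low entropy means the classifier concentrates its probability on few classes, so under this assumed reward an attention action is encouraged whenever it yields a more decisive classification.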

Empirical evidence of the negative correlation between cognitive workload and attention in humans

Kyle J. Jaquess, Rodolphe J. Gentili, Li-Chuan Lo, Hyuk Oh, Jing Zhang, Jeremy C. Rietschel, Matthew W. Miller, Ying Ying Tan, Bradley D. Hatfield, Empirical evidence for the relationship between cognitive workload and attentional reserve, International Journal of Psychophysiology, Volume 121, 2017, Pages 46-55, DOI: 10.1016/j.ijpsycho.2017.09.007.

While the concepts of cognitive workload and attentional reserve have been thought to have an inverse relationship for some time, such a relationship has never been empirically tested. This was the purpose of the present study. Aspects of the electroencephalogram were used to assess both cognitive workload and attentional reserve. Specifically, spectral measures of cortical activation were used to assess cognitive workload, while amplitudes of the event-related potential from the presentation of unattended “novel” sounds were used to assess attentional reserve. The relationship between these two families of measures was assessed using canonical correlation. Twenty-seven participants performed a flight simulator task under three levels of challenge. Verification of manipulation was performed using self-report measures of task demand, objective task performance, and heart rate variability using electrocardiography. Results revealed a strong, negative relationship between the spectral measures of cortical activation, believed to be representative of cognitive workload, and ERP amplitudes, believed to be representative of attentional reserve. This finding provides support for the theoretical and intuitive notion that cognitive workload and attentional reserve are inversely related. The practical implications of this result include improved state classification using advanced machine learning techniques, enhanced personnel selection/recruitment/placement, and augmented learning/training.

Survey on visual attention in 3D for robotics

Ekaterina Potapova, Michael Zillich, and Markus Vincze, Survey of recent advances in 3D visual attention for robotics, The International Journal of Robotics Research, Vol 36, Issue 11, pp. 1159 – 1176, DOI: 10.1177/0278364917726587.

3D visual attention plays an important role in both human and robotic perception, yet it has still to be explored in full detail. However, the majority of computer vision and robotics methods are concerned only with 2D visual attention. This survey presents findings and approaches that cover 3D visual attention in both human and robot vision, summarizing the last 30 years of research and also looking beyond computational methods. First, we present work in such fields as biological vision and neurophysiology, studying 3D attention in human observers. This provides a view of the role attention plays at the system level for biological vision. Then, we cover computer and robot vision approaches that take 3D visual attention into account. We compare approaches with respect to different categories, such as feature-based, data-based, or depth-based visual attention, and draw conclusions on what advances will help robotics to cope better with complex real-world settings and tasks.

A very good survey of visual saliency methods, with a list of robotic tasks that have benefited from attention

Ali Borji, Dicky N. Sihite, and Laurent Itti, Quantitative Analysis of Human-Model Agreement in Visual Saliency Modeling: A Comparative Study, IEEE Transactions on Image Processing, V. 22, N. 1, 2013, DOI: 10.1109/TIP.2012.2210727.

Visual attention is a process that enables biological and machine vision systems to select the most relevant regions from a scene. Relevance is determined by two components: 1) top-down factors driven by task and 2) bottom-up factors that highlight image regions that are different from their surroundings. The latter are often referred to as “visual saliency.” Modeling bottom-up visual saliency has been the subject of numerous research efforts during the past 20 years, with many successful applications in computer vision and robotics. Available models have been tested with different datasets (e.g., synthetic psychological search arrays, natural images or videos) using different evaluation scores (e.g., search slopes, comparison to human eye tracking) and parameter settings. This has made direct comparison of models difficult. Here, we perform an exhaustive comparison of 35 state-of-the-art saliency models over 54 challenging synthetic patterns, three natural image datasets, and two video datasets, using three evaluation scores. We find that although model rankings vary, some models consistently perform better. Analysis of datasets reveals that existing datasets are highly center-biased, which influences some of the evaluation scores. Computational complexity analysis shows that some models are very fast, yet yield competitive eye movement prediction accuracy. Different models often have common easy/difficult stimuli. Furthermore, several concerns in visual saliency modeling, eye movement datasets, and evaluation scores are discussed and insights for future work are provided. Our study allows one to assess the state-of-the-art, helps to organize this rapidly growing field, and sets a unified comparison framework for gauging future efforts, similar to the PASCAL VOC challenge in the object recognition and detection domains.

Approach to explain gaze: gaze is directed to task- and goal-relevant scene regions

John M. Henderson, Gaze Control as Prediction, Trends in Cognitive Sciences, Volume 21, Issue 1, January 2017, Pages 15-23, ISSN 1364-6613, DOI: 10.1016/j.tics.2016.11.003.

The recent study of overt attention during complex scene viewing has emphasized explaining gaze behavior in terms of image properties and image salience independently of the viewer’s intentions and understanding of the scene. In this Opinion article, I outline an alternative approach proposing that gaze control in natural scenes can be characterized as the result of knowledge-driven prediction. This view provides a theoretical context for integrating and unifying many of the disparate phenomena observed in active scene viewing, offers the potential for integrating the behavioral study of gaze with the neurobiological study of eye movements, and provides a theoretical framework for bridging gaze control and other related areas of perception and cognition at both computational and neurobiological levels of analysis.

Including selective attention and cortical magnification to improve computer vision

Ala Aboudib, Vincent Gripon, Gilles Coppin, A Biologically Inspired Framework for Visual Information Processing and an Application on Modeling Bottom-Up Visual Attention, Cognitive Computation, December 2016, Volume 8, Issue 6, pp 1007–1026, DOI: 10.1007/s12559-016-9430-8.

An emerging trend in visual information processing is toward incorporating some interesting properties of the ventral stream in order to account for some limitations of machine learning algorithms. Selective attention and cortical magnification are two such important phenomena that have been the subject of a large body of research in recent years. In this paper, we focus on designing a new model for visual acquisition that takes these important properties into account. We propose a new framework for visual information acquisition and representation that emulates the architecture of the primate visual system by integrating features such as retinal sampling and cortical magnification while avoiding spatial deformations and other side effects produced by models that tried to implement these two features. It also explicitly integrates the notion of visual angle, which is rarely taken into account by vision models. We argue that this framework can provide the infrastructure for implementing vision tasks such as object recognition and computational visual attention algorithms. To demonstrate the utility of the proposed vision framework, we propose an algorithm for bottom-up saliency prediction implemented using the proposed architecture. We evaluate the performance of the proposed model on the MIT saliency benchmark and show that it attains state-of-the-art performance, while providing some advantages over other models.

Physiological evidence that visual attention is based on predictions

Martin Rolfs, Martin Szinte, Remapping Attention Pointers: Linking Physiology and Behavior, Trends in Cognitive Sciences, Volume 20, Issue 6, 2016, Pages 399-401, ISSN 1364-6613, DOI: 10.1016/j.tics.2016.04.003.

Our eyes rapidly scan visual scenes, displacing the projection on the retina with every move. Yet these frequent retinal image shifts do not appear to hamper vision. Two recent physiological studies shed new light on the role of attention in visual processing across saccadic eye movements.

Cognitive control: a nice bunch of definitions and state-of-the-art

S. Haykin, M. Fatemi, P. Setoodeh and Y. Xue, Cognitive Control, in Proceedings of the IEEE, vol. 100, no. 12, pp. 3156-3169, Dec. 2012, DOI: 10.1109/JPROC.2012.2215773.

This paper is inspired by how cognitive control manifests itself in the human brain and does so in a remarkable way. It addresses the many facets involved in the control of directed information flow in a dynamic system, culminating in the notion of information gap, defined as the difference between relevant information (useful part of what is extracted from the incoming measurements) and sufficient information representing the information needed for achieving minimal risk. The notion of information gap leads naturally to how cognitive control can itself be defined. Then, another important idea is described, namely the two-state model, in which one is the system’s state and the other is the entropic state that provides an essential metric for quantifying the information gap. The entropic state is computed in the perceptual part (i.e., perceptor) of the dynamic system and sent to the controller directly as feedback information. This feedback information provides the cognitive controller the information needed about the environment and the system to bring reinforcement learning into play; reinforcement learning (RL), incorporating planning as an integral part, is at the very heart of cognitive control. The stage is now set for a computational experiment, involving cognitive radar wherein the cognitive controller is enabled to control the receiver via the environment. The experiment demonstrates how RL provides the mechanism for improved utilization of computational resources, and yet is able to deliver good performance through the use of planning. The paper finishes with concluding remarks.