Tag Archives: Attention

Including attention mechanisms in long short-term memory

Lin, X., Zhong, G., Chen, K. et al., Attention-Augmented Machine Memory, Cogn Comput 13, 751–760 (2021), DOI: 10.1007/s12559-021-09854-5.

The attention mechanism plays an important role in the perception and cognition of human beings. Many machine learning models have been developed to memorize sequential data, such as the Long Short-Term Memory (LSTM) network and its extensions. However, due to the lack of an attention mechanism, they cannot pay special attention to the important parts of the sequences. In this paper, we present a novel machine learning method called attention-augmented machine memory (AAMM). It seamlessly integrates the attention mechanism into the memory cell of LSTM. As a result, it facilitates the network to focus on valuable information in the sequences and ignore irrelevant information during its learning. We have conducted experiments on two sequence classification tasks for pattern classification and sentiment analysis, respectively. The experimental results demonstrate the advantages of AAMM over LSTM and some other related approaches. Hence, AAMM can be considered a substitute for LSTM in sequence learning applications.
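The abstract does not give AAMM's equations, so the sketch below does not reproduce them. For contrast, here is a minimal NumPy sketch of a more common arrangement, in which soft attention sits on top of the LSTM (pooling the hidden states it has already produced) rather than inside its memory cell. The scoring vector `w` and the function names are hypothetical; in a trained model `w` would be learned.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def attention_pool(hidden_states, w):
    """Soft attention over a sequence of hidden states.

    hidden_states: array of shape (T, d), one row per time step
                   (e.g. the outputs of an LSTM).
    w:             hypothetical learned scoring vector of shape (d,).
    Returns (context, weights): the attention-weighted sum of the hidden
    states and the weight assigned to each time step.
    """
    scores = hidden_states @ w          # one relevance score per step
    weights = softmax(scores)           # non-negative, sums to 1
    context = weights @ hidden_states   # convex combination of the rows
    return context, weights

# Toy usage: the last step aligns best with w, so it gets the largest weight.
H = np.array([[0.1, 0.0],
              [0.0, 0.2],
              [2.0, 1.0]])
w = np.array([1.0, 1.0])
context, weights = attention_pool(H, w)
print(weights.round(3), context.round(3))
```

Steps with low scores are not discarded, only down-weighted, which is what lets the model "ignore" irrelevant parts of a sequence while staying differentiable.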

Adapting the resolution of depth sensors and the location of the high-resolution area (fovea) as a possible attention mechanism in robots

Tasneem Z, Adhivarahan C, Wang D, Xie H, Dantu K, Koppal SJ, Adaptive fovea for scanning depth sensors, The International Journal of Robotics Research. 2020;39(7):837-855, DOI: 10.1177/0278364920920931.

Depth sensors have been used extensively for perception in robotics. Typically these sensors have a fixed angular resolution and field of view (FOV). This is in contrast to human perception, which involves foveating: scanning with the eyes’ highest angular resolution over regions of interest (ROIs). We build a scanning depth sensor that can control its angular resolution over the FOV. This opens up new directions for robotics research, because many algorithms in localization, mapping, exploration, and manipulation make implicit assumptions about the fixed resolution of a depth sensor, impacting latency, energy efficiency, and accuracy. Our algorithms increase resolution in ROIs either through deconvolutions or intelligent sample distribution across the FOV. The areas of high resolution in the sensor FOV act as artificial fovea and we adaptively vary the fovea locations to maximize a well-known information theoretic measure. We demonstrate novel applications such as adaptive time-of-flight (TOF) sensing, LiDAR zoom, gradient-based LiDAR sensing, and energy-efficient LiDAR scanning. As a proof of concept, we mount the sensor on a ground robot platform, showing how to reduce robot motion to obtain a desired scanning resolution. We also present a ROS wrapper for active simulation for our novel sensor in Gazebo. Finally, we provide extensive empirical analysis of all our algorithms, demonstrating trade-offs between time, resolution and stand-off distance.
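The abstract says the fovea locations are varied to maximize "a well-known information theoretic measure" without naming it here. As a hypothetical stand-in, the sketch below scores every candidate k×k fovea window on an occupancy grid by the total binary entropy of its cells and picks the most uncertain one. The grid, `best_fovea`, and the entropy criterion are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def binary_entropy(p):
    # Entropy (bits) of each cell's occupancy probability; clip to avoid log(0).
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def best_fovea(occupancy, k):
    """Hypothetical fovea placement: return the top-left (row, col) of the
    k x k window with the largest total entropy, plus that total."""
    H = binary_entropy(occupancy)
    rows, cols = H.shape
    best, best_score = None, -np.inf
    for r in range(rows - k + 1):
        for c in range(cols - k + 1):
            score = H[r:r + k, c:c + k].sum()
            if score > best_score:          # ties keep the first window found
                best, best_score = (r, c), score
    return best, best_score

grid = np.full((4, 4), 0.95)   # mostly well-observed cells
grid[2:4, 1:3] = 0.5           # one maximally uncertain patch
pos, score = best_fovea(grid, 2)
print(pos, score)
```

Concentrating samples where the map is most uncertain is one simple reading of "intelligent sample distribution across the FOV"; the exhaustive window search is fine for a toy grid but would need something smarter for a full-resolution sensor.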

Deep reinforcement learning applied to learn both attention and classification in a task of vehicle classification

D. Zhao, Y. Chen and L. Lv, Deep Reinforcement Learning With Visual Attention for Vehicle Classification, IEEE Transactions on Cognitive and Developmental Systems, vol. 9, no. 4, pp. 356-367, DOI: 10.1109/TCDS.2016.2614675.

Automatic vehicle classification is crucial to intelligent transportation systems, especially for vehicle tracking by police. Due to the complex lighting and image capture conditions, image-based vehicle classification in real-world environments is still a challenging task and the performance is far from satisfactory. However, owing to the mechanism of visual attention, the human vision system shows remarkable capability compared with the computer vision system, especially in distinguishing nuances. Inspired by this mechanism, we propose a convolutional neural network (CNN) model of visual attention for image classification. A visual attention-based image processing module is used to highlight one part of an image and weaken the others, generating a focused image. Then the focused image is input into the CNN to be classified. According to the classification probability distribution, we compute the information entropy to guide a reinforcement learning agent to achieve a better policy for image classification to select the key parts of an image. Systematic experiments on a surveillance-nature dataset, which contains images captured by surveillance cameras in the front view, demonstrate that the proposed model is more competitive than the large-scale CNN in vehicle classification tasks.
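The abstract says the entropy of the classifier's output distribution guides the reinforcement learning agent, but not exactly how it enters the reward. A minimal sketch, assuming (hypothetically) that the reward is the drop in entropy produced by the region the agent chose to focus on:

```python
import numpy as np

def entropy(probs):
    # Shannon entropy in nats; zero-probability classes contribute nothing.
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def entropy_reward(probs_before, probs_after):
    # Hypothetical reward: positive when the focused image made the
    # classifier more confident (lower entropy), negative when less.
    return entropy(probs_before) - entropy(probs_after)

uniform = [0.25, 0.25, 0.25, 0.25]     # classifier unsure among 4 vehicle types
confident = [0.85, 0.05, 0.05, 0.05]   # after attending to a discriminative part
print(round(entropy_reward(uniform, confident), 3))
```

Entropy is a convenient signal here because it needs no ground-truth label at decision time: the agent is rewarded for finding regions that sharpen the class distribution.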

Empirical evidence of the negative correlation between cognitive workload and attentional reserve in humans

Kyle J. Jaquess, Rodolphe J. Gentili, Li-Chuan Lo, Hyuk Oh, Jing Zhang, Jeremy C. Rietschel, Matthew W. Miller, Ying Ying Tan, Bradley D. Hatfield, Empirical evidence for the relationship between cognitive workload and attentional reserve, International Journal of Psychophysiology, Volume 121, 2017, Pages 46-55, DOI: 10.1016/j.ijpsycho.2017.09.007.

While the concepts of cognitive workload and attentional reserve have been thought to have an inverse relationship for some time, such a relationship has never been empirically tested. This was the purpose of the present study. Aspects of the electroencephalogram were used to assess both cognitive workload and attentional reserve. Specifically, spectral measures of cortical activation were used to assess cognitive workload, while amplitudes of the event-related potential from the presentation of unattended “novel” sounds were used to assess attentional reserve. The relationship between these two families of measures was assessed using canonical correlation. Twenty-seven participants performed a flight simulator task under three levels of challenge. Verification of manipulation was performed using self-report measures of task demand, objective task performance, and heart rate variability using electrocardiography. Results revealed a strong, negative relationship between the spectral measures of cortical activation, believed to be representative of cognitive workload, and ERP amplitudes, believed to be representative of attentional reserve. This finding provides support for the theoretical and intuitive notion that cognitive workload and attentional reserve are inversely related. The practical implications of this result include improved state classification using advanced machine learning techniques, enhanced personnel selection/recruitment/placement, and augmented learning/training.
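The study relates two families of measures (spectral EEG measures and ERP amplitudes) with canonical correlation. A minimal NumPy sketch of how canonical correlations can be computed, using QR bases for the two centered data sets and the SVD of their cross-product. The data below are synthetic stand-ins, not the study's measurements, and the sketch assumes each data matrix has full column rank.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between two sets of variables.

    X: (n, p) array, e.g. spectral EEG measures per observation.
    Y: (n, q) array, e.g. ERP amplitudes per observation.
    Returns the min(p, q) canonical correlations in decreasing order.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered data.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the cosines of the principal angles
    # between the two subspaces, i.e. the canonical correlations.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))  # synthetic stand-in for 3 spectral measures
# Synthetic Y: first column is a negative linear function of X[:, 0].
Y = np.column_stack([-2.0 * X[:, 0] + 0.5, rng.normal(size=200)])
r = canonical_correlations(X, Y)
print(r.round(3))
```

Canonical correlations are reported as non-negative by convention; a "negative relationship" like the one in the abstract shows up in the signs of the canonical weights (here the weights on `X[:, 0]` and `Y[:, 0]` would have opposite signs), which this sketch does not return.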

Survey on visual attention in 3D for robotics

Ekaterina Potapova, Michael Zillich, and Markus Vincze, Survey of recent advances in 3D visual attention for robotics, The International Journal of Robotics Research, Vol 36, Issue 11, pp. 1159–1176, DOI: 10.1177/0278364917726587.

3D visual attention plays an important role in both human and robotics perception that yet has to be explored in full detail. However, the majority of computer vision and robotics methods are concerned only with 2D visual attention. This survey presents findings and approaches that cover 3D visual attention in both human and robot vision, summarizing the last 30 years of research and also looking beyond computational methods. First, we present work in such fields as biological vision and neurophysiology, studying 3D attention in human observers. This provides a view of the role attention plays at the system level for biological vision. Then, we cover computer and robot vision approaches that take 3D visual attention into account. We compare approaches with respect to different categories, such as feature-based, data-based, or depth-based visual attention, and draw conclusions on what advances will help robotics to cope better with complex real-world settings and tasks.

A very good survey of visual saliency methods, with a list of robotic tasks that have benefited from attention

Ali Borji, Dicky N. Sihite, and Laurent Itti, Quantitative Analysis of Human-Model Agreement in Visual Saliency Modeling: A Comparative Study, IEEE Transactions on Image Processing, V. 22, N. 1, 2013, DOI: 10.1109/TIP.2012.2210727.

Visual attention is a process that enables biological and machine vision systems to select the most relevant regions from a scene. Relevance is determined by two components: 1) top-down factors driven by task and 2) bottom-up factors that highlight image regions that are different from their surroundings. The latter are often referred to as “visual saliency.” Modeling bottom-up visual saliency has been the subject of numerous research efforts during the past 20 years, with many successful applications in computer vision and robotics. Available models have been tested with different datasets (e.g., synthetic psychological search arrays, natural images or videos) using different evaluation scores (e.g., search slopes, comparison to human eye tracking) and parameter settings. This has made direct comparison of models difficult. Here, we perform an exhaustive comparison of 35 state-of-the-art saliency models over 54 challenging synthetic patterns, three natural image datasets, and two video datasets, using three evaluation scores. We find that although model rankings vary, some models consistently perform better. Analysis of datasets reveals that existing datasets are highly center-biased, which influences some of the evaluation scores. Computational complexity analysis shows that some models are very fast, yet yield competitive eye movement prediction accuracy. Different models often have common easy/difficult stimuli. Furthermore, several concerns in visual saliency modeling, eye movement datasets, and evaluation scores are discussed and insights for future work are provided. Our study allows one to assess the state of the art, helps to organize this rapidly growing field, and sets a unified comparison framework for gauging future efforts, similar to the PASCAL VOC challenge in the object recognition and detection domains.
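To make "comparison to human eye tracking" and the center-bias remark concrete, here is a minimal sketch of Normalized Scanpath Saliency (NSS), one widely used score of this kind (this sketch makes no claim about which three scores the paper used). It also shows why center-biased fixations inflate scores: a map that is nothing but a centered Gaussian scores well when people look near the middle. The map size, blob width, and fixation coordinates are illustrative.

```python
import numpy as np

def nss(saliency_map, fixations):
    """Normalized Scanpath Saliency.

    saliency_map: 2-D array of model predictions (must not be constant).
    fixations:    list of (row, col) human fixation coordinates.
    The map is standardized to zero mean and unit standard deviation,
    then averaged at the fixated pixels: about 0 is chance level, and
    higher means fixations land where the model predicted salience.
    """
    s = np.asarray(saliency_map, dtype=float)
    s = (s - s.mean()) / s.std()
    rows, cols = zip(*fixations)
    return float(s[list(rows), list(cols)].mean())

# A centered Gaussian blob: a pure "center bias" baseline with no image content.
yy, xx = np.mgrid[0:64, 0:64]
center_map = np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / (2 * 10 ** 2))
near_center = nss(center_map, [(30, 33), (34, 31)])
in_corners = nss(center_map, [(2, 2), (60, 5)])
print(near_center, in_corners)
```

On a dataset where most fixations fall near the image center, this content-free baseline would look competitive, which is one way center bias can distort model rankings.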

Approach to explain gaze: gaze is directed to task- and goal-relevant scene regions

John M. Henderson, Gaze Control as Prediction, Trends in Cognitive Sciences, Volume 21, Issue 1, January 2017, Pages 15-23, ISSN 1364-6613, DOI: 10.1016/j.tics.2016.11.003.

The recent study of overt attention during complex scene viewing has emphasized explaining gaze behavior in terms of image properties and image salience independently of the viewer’s intentions and understanding of the scene. In this Opinion article, I outline an alternative approach proposing that gaze control in natural scenes can be characterized as the result of knowledge-driven prediction. This view provides a theoretical context for integrating and unifying many of the disparate phenomena observed in active scene viewing, offers the potential for integrating the behavioral study of gaze with the neurobiological study of eye movements, and provides a theoretical framework for bridging gaze control and other related areas of perception and cognition at both computational and neurobiological levels of analysis.

Including selective attention and cortical magnification to improve computer vision

Ala Aboudib, Vincent Gripon, Gilles Coppin, A Biologically Inspired Framework for Visual Information Processing and an Application on Modeling Bottom-Up Visual Attention, Cognitive Computation, December 2016, Volume 8, Issue 6, pp 1007–1026, DOI: 10.1007/s12559-016-9430-8.

An emerging trend in visual information processing is toward incorporating some interesting properties of the ventral stream in order to account for some limitations of machine learning algorithms. Selective attention and cortical magnification are two such important phenomena that have been the subject of a large body of research in recent years. In this paper, we focus on designing a new model for visual acquisition that takes these important properties into account. We propose a new framework for visual information acquisition and representation that emulates the architecture of the primate visual system by integrating features such as retinal sampling and cortical magnification while avoiding spatial deformations and other side effects produced by models that tried to implement these two features. It also explicitly integrates the notion of visual angle, which is rarely taken into account by vision models. We argue that this framework can provide the infrastructure for implementing vision tasks such as object recognition and computational visual attention algorithms. To demonstrate the utility of the proposed vision framework, we propose an algorithm for bottom-up saliency prediction implemented using the proposed architecture. We evaluate the performance of the proposed model on the MIT saliency benchmark and show that it attains state-of-the-art performance, while providing some advantages over other models.

Physiological evidence that visual attention is based on predictions

Martin Rolfs, Martin Szinte, Remapping Attention Pointers: Linking Physiology and Behavior, Trends in Cognitive Sciences, Volume 20, Issue 6, 2016, Pages 399-401, ISSN 1364-6613, DOI: 10.1016/j.tics.2016.04.003.

Our eyes rapidly scan visual scenes, displacing the projection on the retina with every move. Yet these frequent retinal image shifts do not appear to hamper vision. Two recent physiological studies shed new light on the role of attention in visual processing across saccadic eye movements.

Cognitive control: a nice bunch of definitions and state-of-the-art

S. Haykin, M. Fatemi, P. Setoodeh and Y. Xue, Cognitive Control, in Proceedings of the IEEE, vol. 100, no. 12, pp. 3156-3169, Dec. 2012, DOI: 10.1109/JPROC.2012.2215773.

This paper is inspired by how cognitive control manifests itself in the human brain and does so in a remarkable way. It addresses the many facets involved in the control of directed information flow in a dynamic system, culminating in the notion of information gap, defined as the difference between relevant information (useful part of what is extracted from the incoming measurements) and sufficient information representing the information needed for achieving minimal risk. The notion of information gap leads naturally to how cognitive control can itself be defined. Then, another important idea is described, namely the two-state model, in which one is the system’s state and the other is the entropic state that provides an essential metric for quantifying the information gap. The entropic state is computed in the perceptual part (i.e., perceptor) of the dynamic system and sent to the controller directly as feedback information. This feedback information provides the cognitive controller the information needed about the environment and the system to bring reinforcement learning into play; reinforcement learning (RL), incorporating planning as an integral part, is at the very heart of cognitive control. The stage is now set for a computational experiment, involving cognitive radar wherein the cognitive controller is enabled to control the receiver via the environment. The experiment demonstrates how RL provides the mechanism for improved utilization of computational resources, and yet is able to deliver good performance through the use of planning. The paper finishes with concluding remarks.
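A minimal sketch of the entropic state fed back from perceptor to controller, assuming (not stated in the abstract) that the perceptor is a Kalman-type filter with a Gaussian posterior, so the entropic state is its differential entropy, 0.5 · log det(2πeP). For brevity the controller here is a greedy one-step choice, not the RL-with-planning the paper describes; the action names and noise values are hypothetical.

```python
import numpy as np

def entropic_state(P):
    """Differential entropy (nats) of a Gaussian posterior with covariance P:
    H = 0.5 * log det(2*pi*e*P). Assumes a Gaussian (Kalman-type) perceptor."""
    P = np.asarray(P, dtype=float)
    sign, logdet = np.linalg.slogdet(2 * np.pi * np.e * P)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    return 0.5 * logdet

def posterior_cov(P_prior, H, R):
    # Information-form Kalman update: P = (P_prior^-1 + H^T R^-1 H)^-1.
    info = np.linalg.inv(P_prior) + H.T @ np.linalg.inv(R) @ H
    return np.linalg.inv(info)

def pick_action(P_prior, H, noise_options):
    """Hypothetical greedy controller: pick the measurement-noise setting
    (standing in for a transmit waveform) that minimizes the entropic state."""
    scores = {name: entropic_state(posterior_cov(P_prior, H, R))
              for name, R in noise_options.items()}
    return min(scores, key=scores.get), scores

P_prior = np.diag([4.0, 1.0])   # prior uncertainty on [range, range-rate]
H = np.eye(2)                   # both components observed
options = {"short_pulse": np.diag([0.1, 2.0]),   # sharp range, poor rate
           "long_pulse": np.diag([2.0, 0.1])}    # poor range, sharp rate
best, scores = pick_action(P_prior, H, options)
print(best, {k: round(v, 3) for k, v in scores.items()})
```

A lower entropic state means less residual uncertainty after the measurement, so minimizing it is a simple proxy for shrinking the information gap; the paper's controller instead learns and plans over such choices.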