Tag Archives: Visual Slam

A nice review of visual SLAM with deep learning, and its evolution from non-learning visual SLAM

Ruihao Li, Sen Wang, DongBing Gu, Ongoing Evolution of Visual SLAM from Geometry to Deep Learning: Challenges and Opportunities, Cognitive Computation, December 2018, Volume 10, Issue 6, pp 875–889, DOI: 10.1007/s12559-018-9591-8.

Visual simultaneous localization and mapping (SLAM) has been investigated in the robotics community for decades. Significant progress and achievements on visual SLAM have been made, with geometric model-based techniques becoming increasingly mature and accurate. However, they tend to be fragile under challenging environments. Recently, there is a trend to develop data-driven approaches, e.g., deep learning, for visual SLAM problems with more robust performance. This paper aims to witness the ongoing evolution of visual SLAM techniques from geometric model-based to data-driven approaches by providing a comprehensive technical review. Our contribution is not only just a compilation of state-of-the-art end-to-end deep learning SLAM work, but also an insight into the underlying mechanism of deep learning SLAM. For such a purpose, we provide a concise overview of geometric model-based approaches first. Next, we identify visual depth estimation using deep learning is a starting point of the evolution. It is from depth estimation that ego-motion or pose estimation techniques using deep learning flourish rapidly. In addition, we strive to link semantic segmentation using deep learning with emergent semantic SLAM techniques to shed light on simultaneous estimation of ego-motion and high-level understanding. Finally, we visualize some further opportunities in this research direction.

Using sequences of images for loop closure instead of only one

Loukas Bampis, Angelos Amanatiadis, and Antonios Gasteratos, Fast loop-closure detection using visual-word-vectors from image sequences, The International Journal of Robotics Research Vol 37, Issue 1, pp. 62 – 82, DOI: 10.1177/0278364917740639.

In this paper, a novel pipeline for loop-closure detection is proposed. We base our work on a bag of binary feature words and we produce a description vector capable of characterizing a physical scene as a whole. Instead of relying on single camera measurements, the robot’s trajectory is dynamically segmented into image sequences according to its content. The visual word occurrences from each sequence are then combined to create sequence-visual-word-vectors and provide additional information to the matching functionality. In this way, scenes with considerable visual differences are firstly discarded, while the respective image-to-image associations are provided subsequently. With the purpose of further enhancing the system’s performance, a novel temporal consistency filter (trained offline) is also introduced to advance matches that persist over time. Evaluation results prove that the presented method compares favorably with other state-of-the-art techniques, while our algorithm is tested on a tablet device, verifying the computational efficiency of the approach.

Interesting survey on Visual SLAM without filtering and of its future lines of research

Georges Younes, Daniel Asmar, Elie Shammas, John Zelek, Keyframe-based monocular SLAM: design, survey, and future directions, Robotics and Autonomous Systems, Volume 98, 2017, Pages 67-88, DOI: 10.1016/j.robot.2017.09.010.

Extensive research in the field of monocular SLAM for the past fifteen years has yielded workable systems that found their way into various applications in robotics and augmented reality. Although filter-based monocular SLAM systems were common at some time, the more efficient keyframe-based solutions are becoming the de facto methodology for building a monocular SLAM system. The objective of this paper is threefold: first, the paper serves as a guideline for people seeking to design their own monocular SLAM according to specific environmental constraints. Second, it presents a survey that covers the various keyframe-based monocular SLAM systems in the literature, detailing the components of their implementation, and critically assessing the specific strategies made in each proposed solution. Third, the paper provides insight into the direction of future research in this field, to address the major limitations still facing monocular SLAM; namely, in the issues of illumination changes, initialization, highly dynamic motion, poorly textured scenes, repetitive textures, map maintenance, and failure recovery.

An open-source implementation of visual SLAM with a very nice related-work section

R. Mur-Artal and J. D. Tardós, ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras, IEEE Transactions on Robotics, vol. 33, no. 5, pp. 1255-1262, DOI: 10.1109/TRO.2017.2705103.

We present ORB-SLAM2, a complete simultaneous localization and mapping (SLAM) system for monocular, stereo and RGB-D cameras, including map reuse, loop closing, and relocalization capabilities. The system works in real time on standard central processing units in a wide variety of environments from small hand-held indoors sequences, to drones flying in industrial environments and cars driving around a city. Our back-end, based on bundle adjustment with monocular and stereo observations, allows for accurate trajectory estimation with metric scale. Our system includes a lightweight localization mode that leverages visual odometry tracks for unmapped regions and matches with map points that allow for zero-drift localization. The evaluation on 29 popular public sequences shows that our method achieves state-of-the-art accuracy, being in most cases the most accurate SLAM solution. We publish the source code, not only for the benefit of the SLAM community, but with the aim of being an out-of-the-box SLAM solution for researchers in other fields.

Interesting implementation of visual graph SLAM in C++ for educational purposes

Dominik Schlegel, Mirco Colosi, Giorgio Grisetti, ProSLAM: Graph SLAM from a Programmer’s Perspective/strong>, arXiv:1709.04377.

In this paper we present ProSLAM, a lightweight stereo visual SLAM system designed with simplicity in mind. Our work stems from the experience gathered by the authors while teaching SLAM to students and aims at providing a highly modular system that can be easily implemented and understood. Rather than focusing on the well known mathematical aspects of Stereo Visual SLAM, in this work we highlight the data structures and the algorithmic aspects that one needs to tackle during the design of such a system. We implemented ProSLAM using the C++ programming language in combination with a minimal set of well known used external libraries. In addition to an open source implementation, we provide several code snippets that address the core aspects of our approach directly in this paper. The results of a thorough validation performed on standard benchmark datasets show that our approach achieves accuracy comparable to state of the art methods, while requiring substantially less computational resources.

Very interesting survey on visual place recognition, including historical background, physio-psychological bases and a definition of “place” in robotics

S. Lowry et al., Visual Place Recognition: A Survey, in IEEE Transactions on Robotics, vol. 32, no. 1, pp. 1-19, Feb. 2016. DOI: 10.1109/TRO.2015.2496823.

Visual place recognition is a challenging problem due to the vast range of ways in which the appearance of real-world places can vary. In recent years, improvements in visual sensing capabilities, an ever-increasing focus on long-term mobile robot autonomy, and the ability to draw on state-of-the-art research in other disciplines-particularly recognition in computer vision and animal navigation in neuroscience-have all contributed to significant advances in visual place recognition systems. This paper presents a survey of the visual place recognition research landscape. We start by introducing the concepts behind place recognition-the role of place recognition in the animal kingdom, how a “place” is defined in a robotics context, and the major components of a place recognition system. Long-term robot operations have revealed that changing appearance can be a significant factor in visual place recognition failure; therefore, we discuss how place recognition solutions can implicitly or explicitly account for appearance change within the environment. Finally, we close with a discussion on the future of visual place recognition, in particular with respect to the rapid advances being made in the related fields of deep learning, semantic scene understanding, and video description.