1. 程式人生 > >Visual SLAM Introduction In Detail

Visual SLAM Introduction In Detail

SLAM概述

  • SLAM一般處理流程包括track和map兩部分。所謂的track是用來估計相機的位姿,也叫front-end。而map部分(back-end)則是深度的構建,通過前面的跟蹤模組估計得到相機的位姿,採用三角法(triangulation)計算相應特徵點的深度,進行當前環境map的重建,重建出的map同時為front-end提供更好的姿態估計,並可以用於例如閉環檢測.
  • 單目slam根據構建地圖的稀疏程度可以大致分為:稀疏法(特徵點),半稠密法,稠密法
  • 根據匹配方法,可分為:直接法和特徵點法
  • 根據系統採用的優化策略,可分為Keyframe-based和filter-based方法;
  1. Strasdat
     H,Montiel J M M,Davison A J.Visual SLAM: why filter?[J].ImageandVisionComputing,2012,30(2):65-77.
  2. For all these scenarios, we conclude that keyframe bundle adjustment outperforms filtering, since it gives the most accuracy per unit of computing time.​

典型的單目slam系統

  • SceneLib2: SLAM originally designed and implemented by Professor Andrew Davison at Imperial College London

     > MonoSLAM: Real-Time Single Camera SLAM (PDF format), Andrew J. Davison, Ian Reid, Nicholas Molton and Olivier Stasse, IEEE Trans. PAMI 2007.

         > Georg Klein and David Murray, "Parallel Tracking and Mapping for Small AR Workspaces", Proc. ISMAR 2007

          A novel, direct monocular SLAM technique: Instead of using keypoints, it directly operates on image intensities both for tracking and mapping. The camera is tracked using direct image alignment

, while geometry is estimated in the form of semi-dense depth maps, obtained by filtering over many pixelwise stereo comparisons. We then build a Sim(3) pose-graph of keyframes, which allows to build scale-drift corrected, large-scale maps including loop-closures. LSD-SLAM runs in real-time on a CPU, and even on a modern smartphone.

          > LSD-SLAM: Large-Scale Direct Monocular SLAM , In European Conference on Computer Vision (ECCV), 2014. [bib] [pdf] [video]

  • SVO: Fast Semi-Direct Monocular Visual Odometry (ICRA 2014)

        ORB-SLAM是西班牙Zaragoza大學的Raul Mur-Artal編寫的視覺SLAM系統。他的論文“ORB-SLAM: a versatile and accurate monocular SLAM system"發表在2015年的IEEE Trans. on Robotics上。開原始碼包括前期的ORB-SLAM[1]和後期的ORB-SLAM2[2]。第一個版本主要用於單目SLAM,而第二個版本支援單目、雙目和RGBD三種介面。

        ORB-SLAM是一個完整的SLAM系統,包括視覺里程計、跟蹤、迴環檢測。它是一種完全基於稀疏特徵點的單目SLAM系統,其核心是使用ORB(Orinted FAST and BRIEF)作為整個視覺SLAM中的核心特徵。具體體現在兩個方面:

  • 提取和跟蹤的特徵點使用ORB。ORB特徵的提取過程非常快,適合用於實時性強的系統。
  • 迴環檢測使用詞袋模型,其字典是一個大型的ORB字典。
  • 介面豐富,支援單目、雙目、RGBD多種感測器輸入,編譯時ROS可選,使得其應用十分輕便。代價是為了支援各種介面,程式碼邏輯稍為複雜。
  • 在PC機以30ms/幀的速度進行實時計算,但在嵌入式平臺上表現不佳。

      它主要有三個執行緒組成:跟蹤、Local Mapping(又稱小圖)、Loop Closing(又稱大圖)。跟蹤執行緒相當於一個視覺里程計,流程如下:

  • 首先,對原始影象提取ORB特徵並計算描述子。
  • 根據特徵描述,在影象間進行特徵匹配。
  • 根據匹配特徵點估計相機運動。
  • 根據關鍵幀判別準則,判斷當前幀是否為關鍵幀。

相比於多數視覺SLAM中利用幀間運動大小來取關鍵幀的做法,ORB_SLAM的關鍵幀判別準則較為複雜。

      > Raúl Mur-ArtalJ. M. M. Montiel and Juan D. TardósORB-SLAM: A Versatile and Accurate Monocular SLAM System.  IEEE Transactions on Robotics, vol. 31, no. 5, pp. 1147-1163, October 2015. [pdf]

      > Raúl Mur-Artal and Juan D. TardósProbabilistic Semi-Dense Mapping from Highly Accurate Feature-Based Monocular SLAM. Robotics: Science and Systems. Rome, Italy, July 2015. [pdf] [poster]

基於單目的稠密slam系統 

  • DPPTAM: DPPTAM is a direct monocular odometry algorithm that estimates a dense reconstruction of a scene in real-time on a CPU. Highly textured image areas are mapped using standard direct mapping techniques, that minimize the photometric error across different views. We make the assumption that homogeneous-color regions belong to approximately planar areas. Related Publication:

          > Alejo Concha, Javier Civera. DPPTAM: Dense Piecewise Planar Tracking and Mapping from a Monocular Sequence IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS15), Hamburg, Germany, 2015

基於RGBD的稠密slam系統

          ElasticFusion: Dense SLAM Without A Pose GraphT. Whelan, S. Leutenegger, R. F. Salas-Moreno, B. Glocker and A. J. Davison, RSS '15

  • RGBDSLAMv2:  a state-of-the-art SLAM system for RGB-D cameras, e.g., the Microsoft Kinect or the Asus Xtion Pro Live. You can use it to create 3D point clouds or OctoMaps.

          > "3D Mapping with an RGB-D Camera", F. Endres, J. Hess, J. Sturm, D. Cremers, W. Burgard, IEEE Transactions on Robotics, 2014.

          The loop closure detector uses a bag-of-words approach to determinate how likely a new image comes from a previous location or a new location. When a loop closure hypothesis is accepted, a new constraint is added to the map's graph, then a graph optimizer minimizes the errors in the map. A memory management approach is used to limit the number of locations used for loop closure detection and graph optimization, so that real-time constraints on large-scale environnements are always respected. RTAB-Map can be used alone with a hand-held Kinect or stereo camera for 6DoF RGB-D mapping, or on a robot equipped with a laser rangefinder for 3DoF mapping.

          > Dense Visual SLAM for RGB-D Cameras , In Proc. of the Int. Conf. on Intelligent Robot Systems (IROS), 2013.

          > Robust Odometry Estimation for RGB-D Cameras , In Int. Conf. on Robotics and Automation, 2013.

Visual-Inertial Slam系統

          Stefan Leutenegger, Simon Lynen, Michael Bosse, Roland Siegwart and Paul Timothy Furgale. Keyframe-based visual–inertial odometry using nonlinear optimization. The International Journal of Robotics Research, 2015.

最新單目slam系統

        REBVO tracks a camera in Realtime using edges. The system is split in 2 components. An on-board part (rebvo itself) doing all the processing and sending data over UDP and an OpenGL visualizer.

        > Tarrio, J. J., & Pedre, S. (2015). Realtime Edge-Based Visual Odometry for a Monocular Camera. In Proceedings of the IEEE International Conference on Computer Vision (pp. 702-710).

          A novel direct and sparse formulation for Visual Odometry. It combines a fully direct probabilistic model (minimizing a photometric error) with consistent, joint optimization of all model parameters, including geometry - represented as inverse depth in a reference frame - and camera motion. This is achieved in real time by omitting the smoothness prior used in other direct methods and instead sampling pixels evenly throughout the images. DSO does not depend on keypoint detectors or descriptors, thus it can naturally sample pixels from across all image regions that have intensity gradient, including edges or smooth intensity variations on mostly white walls. The proposed model integrates a full photometric calibration, accounting for exposure time, lens vignetting, and non-linear response functions. We thoroughly evaluate our method on three different datasets comprising several hours of video. The experiments show that the presented approach significantly outperforms state-of-the-art direct and indirect methods in a variety of real-world settings, both in terms of tracking accuracy and robustness.

         > Direct Sparse Odometry , In arXiv:1607.02565, 2016. [bib] [pdf]

         > A Photometrically Calibrated Benchmark For Monocular Visual Odometry , In arXiv:1607.02555, 2016. [bib] [pdf]

  • svo 2.0

        > C. Forster, Z. Zhang, M. Gassner, M. Werlberger, and D. Scaramuzza. Svo 2.0: Semi-direct visual odometry for monocular and multi-camera systems. IEEE Trans- actions on Robotics, accepted, January 2016.

        > C. Forster, M. Pizzoli, and D. Scaramuzza. SVO: Fast Semi-Direct Monocular Visual Odometry. In IEEE Intl. Conf. on Robotics and Automation (ICRA), 2014. doi:10.1109/ICRA.2014.6906584.

典型的雙目slam系統

          LIBVISO2 (Library for Visual Odometry 2) is a very fast cross-platfrom (Linux, Windows) C++ library with MATLAB wrappers for computing the 6 DOF motion of a moving mono/stereo camera. The stereo version is based on minimizing the reprojection error of sparse feature matches and is rather general (no motion model or setup restrictions except that the input images must be rectified and calibration parameters are known). The monocular version is still very experimental and uses the 8-point algorithm for fundamental matrix estimation. It further assumes that the camera is moving at a known and fixed height over ground (for estimating the scale). Due to the 8 correspondences needed for the 8-point algorithm, many more RANSAC samples need to be drawn, which makes the monocular algorithm slower than the stereo algorithm, for which 3 correspondences are sufficent to estimate parameters. 

         > Geiger A, Ziegler J, Stiller C. Stereoscan: Dense 3d reconstruction in real-time[C]//Intelligent Vehicles Symposium (IV), 2011 IEEE. IEEE, 2011: 963-968.

          ORB-SLAM2 is a real-time SLAM library for Monocular, Stereo and RGB-D cameras that computes the camera trajectory and a sparse 3D reconstruction (in the stereo and RGB-D case with true scale). It is able to detect loops and relocalize the camera in real time. We provide examples to run the SLAM system in the KITTI dataset as stereo or monocular, and in theTUM dataset as RGB-D or monocular.

          S-PTAM is a Stereo SLAM system able to compute the camera trajectory in real-time. It heavily exploits the parallel nature of the SLAM problem, separating the time-constrained pose estimation from less pressing matters such as map building and refinement tasks. On the other hand, the stereo setting allows to reconstruct a metric 3D map for each frame of stereo images, improving the accuracy of the mapping process with respect to monocular SLAM and avoiding the well-known bootstrapping problem. Also, the real scale of the environment is an essential feature for robots which have to interact with their surrounding workspace.

          > Taihú Pire, Thomas Fischer, Javier Civera, Pablo De Cristóforis and Julio Jacobo Berlles. Stereo Parallel Tracking and Mapping for Robot Localization Proc. of The International Conference on Intelligent Robots and Systems (IROS) (Accepted), Hamburg, Germany, 2015.

   ORBSLAM_DWO is developed on top of ORB-SLAM with double window optimization by Jianzhu Huai. The major differences from ORB-SLAM are: (1) it can run with or without ROS, (2) it does not use the modified version of g2o shipped in ORB-SLAM, instead it uses the g2o from github, (3) it uses Eigen vectors and Sophus members instead of OpenCV Mat to represent pose entities, (4) it incorporates the pinhole camera model from rpg_vikit and a decay velocity motion model fromStereo PTAM, (5) currently, it supports monocular, stereo, and stereo + inertial input for SLAM, note it does not work with monocular + inertial input.

         A library for (semi-dense) real-time visual odometry from stereo data using direct alignment of feature descriptors. There are descriptors implemented. First, is raw intensity (no descriptor), which runs in real-time or faster. Second, is an implementation of the Bit-Planes descriptor designed for robust performance under challenging illumination conditions as described here andhere.

        >Gómez-Ojeda R, González-Jiménez J. Robust Stereo Visual Odometry through a Probabilistic Combination of Points and Line Segments[J]. 2016.

  • This is a general and scalable framework for visual SLAM. It employs  "Double Window Optimization" (DWO) as described in our ICCV paper:

    > H. Strasdat, A.J. Davison, J.M.M. Montiel, and K. Konolige "Double Window Optimisation for Constant Time Visual SLAM" Proceedings of the IEEE International Conference on Computer Vision, 2011.​

閉環檢測 

  • DLoopDetector:DLoopDetector is an open source C++ library to detect loops in a sequence of images collected by a mobile robot. It implements the algorithm presented in GalvezTRO12, based on a bag-of-words database created from image local descriptors, and temporal and geometrical constraints. The current implementation includes versions to work with SURF64 and BRIEF descriptors. DLoopDetector is based on the DBoW2 library, so that it can work with any other type of descriptor with little effort.

          > Bags of Binary Words for Fast Place Recognition in Image Sequences. D Gálvez-López, JD Tardos. IEEE Transactions on Robotics 28 (5), 1188-1197, 2012.

           > DBoW2: DBoW2 is an improved version of the DBow library, an open source C++ library for indexing and converting images into a bag-of-word representation.

  • FAB-MAP: FAB-MAP is a Simultaneous Localisation and Mapping algorithm which operates solely in appearance space. FAB-MAP performs location matching between places that have been visited within the world as well as providing a measure of the probability of being at a new, previously unvisited location. Camera images form the sole input to the system, from which OpenCV's feature extraction methods are used to develop bag-of-words representations for the Bayesian comparison technique.

優化工具庫

  • g2o:g2o is an open-source C++ framework for optimizing graph-based nonlinear error functions. g2o has been designed to be easily extensible to a wide range of problems and a new problem typically can be specified in a few lines of code. The current implementation provides solutions to several variants of SLAM and BA.
  • Ceres Solver

    Ceres Solver is an open source C++ library for modeling and solving large, complicated optimization problems. It is a feature rich, mature and performant library which has been used in production at Google since 2010. Ceres Solver can solve two kinds of problems.

    1. General unconstrained optimization problems.
  • GTSAM: GTSAM is a library of C++ classes that implement smoothing and mapping (SAM) in robotics and vision, using factor graphs and Bayes networks as the underlying computing paradigm rather than sparse matrices. On top of the C++ library, GTSAM includes a MATLAB interface (enable GTSAM_INSTALL_MATLAB_TOOLBOX in CMake to build it). A Python interface is under development.
  • 各大主流的vo和slam系統的精度效能評估網站

SLAM資料集

SLAM綜述相關References

[2] Strasdat H, Montiel J M M, Davison A J. Visual SLAM: why filter?[J]. Image and Vision Computing, 2012, 30(2): 65-77. 

[7] Aulinas J, Petillot Y R, Salvi J, et al. The SLAM problem: a survey[C]//CCIA. 2008: 363-371.

[8] Grisetti G, Kummerle R, Stachniss C, et al. A tutorial on graph-based SLAM[J]. IEEE Intelligent Transportation Systems Magazine, 2010, 2(4): 31-43.

[10] Lowry S, Sünderhauf N, Newman P, et al. Visual place recognition: A survey[J]. IEEE Transactions on Robotics, 2016, 32(1): 1-19.

[11] Georges Younes, Daniel Asmar, Elie Shammas. A survey on non-filter-based monocular Visual SLAM systemsComputer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO), 2016. (針對目前開源的單目slam系統[ PTAM, SVO, DT SLAM, LSD SLAM, ORB SLAM, and DPPTAM] 每個模組採用的方法進行整理)