Towards Accurate Kidnap Resolution Through Deep Learning

Kent Sommer

July 19, 2017

Transcript

  1. Towards Accurate Kidnap Resolution Through Deep Learning
     Kent Sommer, Keonhee Kim, Youngji Kim, Sungho Jo
     Korea Advanced Institute of Science and Technology
  2. Introduction
     "Using sensory information to locate the robot in its environment is the most fundamental problem to provide a mobile robot with autonomous capabilities" [1]
     • Position tracking (bounded uncertainty)
     • Global localization (unbounded uncertainty)
     • Kidnapping (recovery from failure)
  3. Introduction
     Why use visual information (cameras)?
     • Low-cost sensors
     • Provide a massive amount of information
     • Passive sensors (no interference)
     • Some biological motivation
     We all easily know where we are if we see this.
  4. Introduction
     Current state-of-the-art approaches to the kidnap problem rely heavily on hand-crafted image features [3, 4, 6–8] such as:
     • SIFT - Scale-Invariant Feature Transform
     • ORB - Oriented FAST and Rotated BRIEF
  5. Scene Coordinate Regression Forests
     • Based on regression forests
     • Formulates localization task as regression problem
     • Requires RGB-D inputs
     • High 6-DOF accuracy
     • High runtime cost for large areas
     [Figure: SCoRe Forest optimizes regressed position for maximum inliers]
  6. PlaNet
     • Convolutional neural network
     • Formulates localization task as classification problem
     • Requires only monocular inputs
     • Cannot provide 6-DOF position
     • Constant runtime cost
     [Figure: PlaNet provides a probability of position over cells]
  7. PoseNet
     • Convolutional neural network
     • Formulates localization task as regression problem
     • Requires only monocular inputs
     • High 6-DOF accuracy
     • Constant runtime cost
     [Figure: Training (green), testing (blue), predicted (red)]
  8. Approach: Overview
     Task: localize given a known map and current monocular view:
     1. Obtain a semi-fine 6-DOF position estimate
     2. Use the position estimate as initialization for a probabilistic localization framework
     [Figure: Task input-to-output visualization]
  9. Approach: Position Regression
     • Convolutional neural network based on Inception-V4 [5]
     • Removed the final softmax classification layer
     • Added two regression layers (no activation functions)
     • CNN learns the function $f(I_n) \Rightarrow [x, y, z, w, p, q, r]^T$
     • Loss function (sketched in code below): $L_i = \lVert \hat{x} - x \rVert_2 + \beta \left\lVert \hat{q} - \frac{q}{\lVert q \rVert} \right\rVert_2$
     [Figure: Position regression model overview]
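The slides do not name a framework, so here is a minimal PyTorch sketch of the modified network and loss; the `backbone` argument stands in for an Inception-V4 trunk with its classifier removed, the 1536-dimensional feature size is an assumption based on Inception-V4's final features, and `beta` is the dataset-dependent weight from the loss above.

```python
import torch
import torch.nn as nn

class PoseRegressor(nn.Module):
    """Inception-V4 trunk with the softmax classifier removed and two
    activation-free regression heads: position [x, y, z] and
    orientation quaternion [w, p, q, r]."""

    def __init__(self, backbone: nn.Module, feat_dim: int = 1536):
        super().__init__()
        self.backbone = backbone               # feature extractor, softmax removed
        self.fc_xyz = nn.Linear(feat_dim, 3)   # position head (no activation)
        self.fc_wpqr = nn.Linear(feat_dim, 4)  # orientation head (no activation)

    def forward(self, images):
        features = self.backbone(images)
        return self.fc_xyz(features), self.fc_wpqr(features)


def pose_loss(xyz_hat, q_hat, xyz, q, beta):
    """L_i = ||x_hat - x||_2 + beta * ||q_hat - q/||q|| ||_2,
    averaged over the batch; beta trades position vs. orientation error."""
    q_unit = q / q.norm(dim=-1, keepdim=True)  # normalize ground-truth quaternion
    pos_err = (xyz_hat - xyz).norm(dim=-1)
    rot_err = (q_hat - q_unit).norm(dim=-1)
    return (pos_err + beta * rot_err).mean()
```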
  10. Approach: Full System
      Once our network is trained:
      • Pass the current view to our localization CNN
      • (Optionally) repeat this while turning in place
      • Average the results and send them to AMCL (Adaptive Monte Carlo Localization); see the sketch below
      • AMCL places the robot in the map
      [Figure: Full system overview]
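A rough sketch of that hand-off, assuming ROS (AMCL is the ROS Adaptive Monte Carlo Localization node, which accepts a pose reset on its /initialpose topic). The quaternion-averaging shortcut and both helper names are ours, not from the slides.

```python
import numpy as np
import rospy
from geometry_msgs.msg import PoseWithCovarianceStamped

def average_poses(poses):
    """Average several (xyz, wpqr) estimates gathered while turning in place.
    Normalizing the mean quaternion is an acceptable approximation when the
    individual orientation estimates are close to one another."""
    xyz = np.mean([p for p, _ in poses], axis=0)
    quats = np.array([q for _, q in poses])
    quats[quats @ quats[0] < 0] *= -1          # flip into a common hemisphere
    q = quats.mean(axis=0)
    return xyz, q / np.linalg.norm(q)

def send_initial_pose(xyz, q):
    """Publishing on /initialpose re-seeds AMCL's particle filter.
    Covariance is left at its default here for brevity."""
    pub = rospy.Publisher('/initialpose', PoseWithCovarianceStamped,
                          queue_size=1, latch=True)
    msg = PoseWithCovarianceStamped()
    msg.header.frame_id = 'map'
    msg.header.stamp = rospy.Time.now()
    msg.pose.pose.position.x, msg.pose.pose.position.y, msg.pose.pose.position.z = xyz
    (msg.pose.pose.orientation.w, msg.pose.pose.orientation.x,
     msg.pose.pose.orientation.y, msg.pose.pose.orientation.z) = q
    pub.publish(msg)
```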
  11. Experiments: Overview
      Two distinct experiments are required to show the efficacy of our approach:
      1. Localization accuracy
      2. Number of successful kidnap resolutions
      [Figure: Kings College dataset, 7-Scenes dataset, Hallway dataset (ours)]
  12. Experiments: Hallway Dataset Overview
      • Dataset location (a hallway) was chosen to minimize sunlight interference with the IR sensor
      • Total size of dataset: 1535 images, with 383 (roughly 25%) reserved for evaluation
      [Figure: Samples from our collected dataset]
  13. Experiments: Hallway Dataset Collection
      Our collection method for the hallway dataset used a TurtleBot 2 robotics platform:
      • Kobuki base
      • Laptop computer
      • RGB-D camera
      To label each image with its 6-DOF position, we used Cartographer SLAM [2]
  14. Experiments: Hallway Dataset Difficulties
      There were certain limitations to the ground-truth accuracy of the dataset:
      • The RGB-D depth map is often very noisy, leading to difficulties with SLAM
      • Odometry accuracy suffered during turns due to the slippery hallway surface
      However:
      • Cartographer SLAM was still able to label images accurately enough
      • Our CNN position regression model was accurate enough to initialize AMCL successfully
  15. Experiments: Localization Accuracy

      Model                    Kings College    Chess           Office          Stairs
      Conv. Nearest Neighbor   3.34m, 5.92°     0.41m, 11.2°    0.49m, 12.0°    0.56m, 15.4°
      PoseNet                  1.92m, 5.40°     0.32m, 8.12°    0.48m, 7.68°    0.47m, 13.8°
      Ours                     1.46m, 2.67°     0.25m, 4.02°    0.38m, 3.69°    0.38m, 10.2°
      (relative improvement)   +24%, +51%       +21%, +50%      +21%, +52%      +19%, +26%

      Comparison to the state of the art on the Kings College and 7-Scenes datasets
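The percentage rows are consistent with improvement measured relative to PoseNet; on Kings College, for instance:

```latex
\frac{1.92\,\mathrm{m} - 1.46\,\mathrm{m}}{1.92\,\mathrm{m}} \approx 24\%,
\qquad
\frac{5.40^\circ - 2.67^\circ}{5.40^\circ} \approx 51\%
```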
  16. Experiments: Kidnap Resolution Performance

      Prediction Method    Successful   Unsuccessful   Success Rate
      Single View          16           4              80%
      Multiple Crops       17           3              85%
      Average Over Views   19           1              95%

      [Figure: single-crop, multi-crop, and multi-view prediction schemes]
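A sketch of the multi-crop variant, reusing the hypothetical PoseRegressor from the sketch above; the five-crop layout and the 299-pixel Inception-V4 input size are assumptions, since the slide does not specify the crops.

```python
import torch

def multicrop_pose(model, image, crop=299):
    """Average the regressed pose over four corner crops plus a center crop
    of a single view, in one batched forward pass."""
    _, h, w = image.shape                       # image: (C, H, W) tensor
    offsets = [(0, 0), (0, w - crop), (h - crop, 0),
               (h - crop, w - crop), ((h - crop) // 2, (w - crop) // 2)]
    crops = torch.stack([image[:, t:t + crop, l:l + crop] for t, l in offsets])
    with torch.no_grad():
        xyz, q = model(crops)
    q_mean = q.mean(dim=0)
    return xyz.mean(dim=0), q_mean / q_mean.norm()
```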
  17. Experiments: Run-time
      Unlike alternative kidnap solutions such as scan matching or content-based image retrieval, we achieve a low, constant run-time (depending on method) on modern consumer hardware:
      • A forward pass through any CNN requires constant time

      Method           Runtime
      Single Image     ≈ 11 ms
      Multiple Crops   ≈ 66 ms
      Multiple Views   ≈ 66 ms

      Runtime comparison by method
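Since each method is a fixed number of CNN forward passes, its latency can be measured directly; a generic benchmark helper (ours, not the authors' code) might look like:

```python
import time
import torch

def mean_latency_ms(model, batch, warmup=10, iters=100):
    """Wall-clock milliseconds per forward pass. CUDA synchronization keeps
    asynchronous GPU execution from skewing the measurement."""
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):                # warm up kernels and caches
            model(batch)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            model(batch)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters * 1e3
```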
  18. Summary
      • New CNN position regression architecture
      • 22% relative improvement in position accuracy over SoTA
      • 51% relative improvement in orientation accuracy over SoTA
      • Novel framework to allow for consistent kidnap resolution
        • 80% success rate from single image
        • 85% success rate from multi-crop
        • 95% success rate from multi-view
  19. Future Directions: Loop Closure Proposals
      We have done preliminary work toward CNN-based loop closure proposals:
      • Utilize iSAM for pose graph optimization
      • The position regression CNN provides loop closure proposals
      [Figure: Point cloud registration without vs. with loop closure proposals]
  20. References I
      [1] I. J. Cox. Blanche: An experiment in guidance and navigation of an autonomous robot vehicle. IEEE Transactions on Robotics and Automation, 7(2):193–204, 1991.
      [2] W. Hess, D. Kohler, H. Rapp, and D. Andor. Real-time loop closure in 2D LIDAR SLAM. In Robotics and Automation (ICRA), 2016 IEEE International Conference on, pages 1271–1278. IEEE, 2016.
      [3] Y. Li, N. Snavely, D. P. Huttenlocher, and P. Fua. Worldwide pose estimation using 3D point clouds. In Large-Scale Visual Geo-Localization, pages 147–163. Springer, 2016.
  21. References II
      [4] T. Sattler, B. Leibe, and L. Kobbelt. Efficient & effective prioritized matching for large-scale image-based localization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016.
      [5] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi. Inception-v4, Inception-ResNet and the impact of residual connections on learning. arXiv preprint arXiv:1602.07261, 2016.
      [6] J. Wang, H. Zha, and R. Cipolla. Coarse-to-fine vision-based localization by indexing scale-invariant features. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 36(2):413–422, 2006.
      [7] B. Zeisl, T. Sattler, and M. Pollefeys. Camera pose voting for large-scale image-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 2704–2712, 2015.
  22. References III
      [8] B. Zhang, Q. Zhao, W. Feng, M. Sun, and W. Jia. SIFT-based indoor localization for older adults using wearable camera. In Biomedical Engineering Conference (NEBEC), 2015 41st Annual Northeast, pages 1–2. IEEE, 2015.