Towards Accurate Kidnap Resolution Through Deep Learning

Kent Sommer

July 19, 2017

Transcript

  1. Towards Accurate Kidnap Resolution Through Deep Learning
     Kent Sommer, Keonhee Kim, Youngji Kim, Sungho Jo
     Korea Advanced Institute of Science and Technology
  2. Introduction
     "Using sensory information to locate the robot in its environment is the most fundamental problem to provide a mobile robot with autonomous capabilities" [1]
     • Position tracking (bounded uncertainty)
     • Global localization (unbounded uncertainty)
     • Kidnapping (recovery from failure)
  3. Introduction
     Why use visual information (cameras)?
     • Low-cost sensors
     • Provide a massive amount of information
     • Passive sensors (no interference)
     • Some biological motivation
     We all easily know where we are if we see this.
  4. Introduction
     Current state-of-the-art approaches to the kidnap problem rely heavily on hand-crafted image features [3, 4, 6–8] such as:
     • SIFT - Scale-Invariant Feature Transform
     • ORB - Oriented FAST and Rotated BRIEF
  5. Scene Coordinate Regression Forests
     • Based on regression forests
     • Formulates localization task as regression problem
     • Requires RGB-D inputs
     • High 6-DOF accuracy
     • High runtime cost for large areas
     [Figure: SCoRe Forest optimizes regressed position for maximum inliers]
  6. PlaNet
     • Convolutional neural network
     • Formulates localization task as classification problem
     • Requires only monocular inputs
     • Cannot provide 6-DOF position
     • Constant runtime cost
     [Figure: PlaNet provides a probability of position over cells]
  7. PoseNet
     • Convolutional neural network
     • Formulates localization task as regression problem
     • Requires only monocular inputs
     • High 6-DOF accuracy
     • Constant runtime cost
     [Figure: Training (green), testing (blue), predicted (red)]
  8. Approach: Overview
     Task: localize given a known map and current monocular view:
     1. Obtain a semi-fine 6-DOF position estimate
     2. Use the position estimate as initialization for a probabilistic localization framework
     [Figure: Task input-to-output visualization]
  9. Approach: Position Regression
     • Convolutional neural network based on Inception-V4 [5]
     • Removed the final softmax classification layer
     • Added two regression layers (no activation functions)
     • CNN learns the function $f(I_n) \Rightarrow [x, y, z, w, p, q, r]^T$
     • Loss function (sketched in code below): $L_i = \lVert \hat{x} - x \rVert_2 + \beta \left\lVert \hat{q} - \frac{q}{\lVert q \rVert} \right\rVert_2$
     [Figure: Position regression model overview]
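The slides do not name a framework, so here is a minimal PyTorch sketch of the modified network and loss; the `backbone` argument stands in for an Inception-V4 trunk with its classifier removed, the 1536-dimensional feature size is an assumption based on Inception-V4's final features, and `beta` is the dataset-dependent weight from the loss above.

```python
import torch
import torch.nn as nn

class PoseRegressor(nn.Module):
    """Inception-V4 trunk with the softmax classifier removed and two
    activation-free regression heads: position [x, y, z] and
    orientation quaternion [w, p, q, r]."""

    def __init__(self, backbone: nn.Module, feat_dim: int = 1536):
        super().__init__()
        self.backbone = backbone               # feature extractor, softmax removed
        self.fc_xyz = nn.Linear(feat_dim, 3)   # position head (no activation)
        self.fc_wpqr = nn.Linear(feat_dim, 4)  # orientation head (no activation)

    def forward(self, images):
        features = self.backbone(images)
        return self.fc_xyz(features), self.fc_wpqr(features)


def pose_loss(xyz_hat, q_hat, xyz, q, beta):
    """L_i = ||x_hat - x||_2 + beta * ||q_hat - q/||q|| ||_2,
    averaged over the batch; beta trades position vs. orientation error."""
    q_unit = q / q.norm(dim=-1, keepdim=True)  # normalize ground-truth quaternion
    pos_err = (xyz_hat - xyz).norm(dim=-1)
    rot_err = (q_hat - q_unit).norm(dim=-1)
    return (pos_err + beta * rot_err).mean()
```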
  10. Approach: Full System
      Once our network is trained:
      • Pass the current view to our localization CNN
      • (Optionally) repeat this while turning in place
      • Average the results and send them to AMCL (Adaptive Monte Carlo Localization); see the sketch below
      • AMCL places the robot in the map
      [Figure: Full system overview]
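A rough sketch of that hand-off, assuming ROS (AMCL is the ROS Adaptive Monte Carlo Localization node, which accepts a pose reset on its /initialpose topic). The quaternion-averaging shortcut and both helper names are ours, not from the slides.

```python
import numpy as np
import rospy
from geometry_msgs.msg import PoseWithCovarianceStamped

def average_poses(poses):
    """Average several (xyz, wpqr) estimates gathered while turning in place.
    Normalizing the mean quaternion is an acceptable approximation when the
    individual orientation estimates are close to one another."""
    xyz = np.mean([p for p, _ in poses], axis=0)
    quats = np.array([q for _, q in poses])
    quats[quats @ quats[0] < 0] *= -1          # flip into a common hemisphere
    q = quats.mean(axis=0)
    return xyz, q / np.linalg.norm(q)

def send_initial_pose(xyz, q):
    """Publishing on /initialpose re-seeds AMCL's particle filter.
    Covariance is left at its default here for brevity."""
    pub = rospy.Publisher('/initialpose', PoseWithCovarianceStamped,
                          queue_size=1, latch=True)
    msg = PoseWithCovarianceStamped()
    msg.header.frame_id = 'map'
    msg.header.stamp = rospy.Time.now()
    msg.pose.pose.position.x, msg.pose.pose.position.y, msg.pose.pose.position.z = xyz
    (msg.pose.pose.orientation.w, msg.pose.pose.orientation.x,
     msg.pose.pose.orientation.y, msg.pose.pose.orientation.z) = q
    pub.publish(msg)
```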
  11. Experiments: Overview
      Two distinct experiments are required to show the efficacy of our approach:
      1. Localization accuracy
      2. Number of successful kidnap resolutions
      [Figure: Kings College dataset, 7-Scenes dataset, Hallway dataset (ours)]
  12. Experiments: Hallway Dataset Overview
      • Dataset location (a hallway) was chosen to minimize sunlight interference with the IR sensor
      • Total size of dataset: 1535 images, with 383 (roughly 25%) reserved for evaluation
      [Figure: Samples from our collected dataset]
  13. Experiments: Hallway Dataset Collection
      Our collection method for the hallway dataset used a TurtleBot 2 robotics platform:
      • Kobuki base
      • Laptop computer
      • RGB-D camera
      To label each image with its 6-DOF position, we used Cartographer SLAM [2]
  14. Experiments: Hallway Dataset Difficulties
      There were certain limitations to the ground-truth accuracy of the dataset:
      • The RGB-D depth map is often very noisy, leading to difficulties with SLAM
      • Odometry accuracy suffered during turns due to the slippery hallway surface
      However:
      • Cartographer SLAM was still able to label images accurately enough
      • Our CNN position regression model was accurate enough to initialize AMCL successfully
  15. Experiments: Localization Accuracy

      Model                    Kings College    Chess           Office          Stairs
      Conv. Nearest Neighbor   3.34m, 5.92°     0.41m, 11.2°    0.49m, 12.0°    0.56m, 15.4°
      PoseNet                  1.92m, 5.40°     0.32m, 8.12°    0.48m, 7.68°    0.47m, 13.8°
      Ours                     1.46m, 2.67°     0.25m, 4.02°    0.38m, 3.69°    0.38m, 10.2°
      (relative improvement)   +24%, +51%       +21%, +50%      +21%, +52%      +19%, +26%

      Comparison to the state of the art on the Kings College and 7-Scenes datasets
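The percentage rows are consistent with improvement measured relative to PoseNet; on Kings College, for instance:

```latex
\frac{1.92\,\mathrm{m} - 1.46\,\mathrm{m}}{1.92\,\mathrm{m}} \approx 24\%,
\qquad
\frac{5.40^\circ - 2.67^\circ}{5.40^\circ} \approx 51\%
```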
  16. Experiments: Kidnap Resolution Performance

      Prediction Method    Successful   Unsuccessful   Success Rate
      Single View          16           4              80%
      Multiple Crops       17           3              85%
      Average Over Views   19           1              95%

      [Figure: single-crop, multi-crop, and multi-view prediction schemes]
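A sketch of the multi-crop variant, reusing the hypothetical PoseRegressor from the sketch above; the five-crop layout and the 299-pixel Inception-V4 input size are assumptions, since the slide does not specify the crops.

```python
import torch

def multicrop_pose(model, image, crop=299):
    """Average the regressed pose over four corner crops plus a center crop
    of a single view, in one batched forward pass."""
    _, h, w = image.shape                       # image: (C, H, W) tensor
    offsets = [(0, 0), (0, w - crop), (h - crop, 0),
               (h - crop, w - crop), ((h - crop) // 2, (w - crop) // 2)]
    crops = torch.stack([image[:, t:t + crop, l:l + crop] for t, l in offsets])
    with torch.no_grad():
        xyz, q = model(crops)
    q_mean = q.mean(dim=0)
    return xyz.mean(dim=0), q_mean / q_mean.norm()
```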
  17. Experiments: Run-time
      Unlike alternative kidnap solutions such as scan matching or content-based image retrieval, we achieve a low, constant run-time (depending on method) on modern consumer hardware:
      • A forward pass through any CNN requires constant time

      Method           Runtime
      Single Image     ≈ 11 ms
      Multiple Crops   ≈ 66 ms
      Multiple Views   ≈ 66 ms

      Runtime comparison by method
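Since each method is a fixed number of CNN forward passes, its latency can be measured directly; a generic benchmark helper (ours, not the authors' code) might look like:

```python
import time
import torch

def mean_latency_ms(model, batch, warmup=10, iters=100):
    """Wall-clock milliseconds per forward pass. CUDA synchronization keeps
    asynchronous GPU execution from skewing the measurement."""
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):                # warm up kernels and caches
            model(batch)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            model(batch)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters * 1e3
```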
  18. Summary
      • New CNN position regression architecture
      • 22% relative improvement in position accuracy over SoTA
      • 51% relative improvement in orientation accuracy over SoTA
      • Novel framework to allow for consistent kidnap resolution
        • 80% success rate from single image
        • 85% success rate from multi-crop
        • 95% success rate from multi-view
  19. Future Directions: Loop Closure Proposals
      We have done preliminary work toward CNN-based loop closure proposals:
      • Utilize iSAM for pose graph optimization
      • The position regression CNN provides loop closure proposals
      [Figure: Point cloud registration without vs. with loop closure proposals]
  20. References I
      [1] I. J. Cox. Blanche: An experiment in guidance and navigation of an autonomous robot vehicle. IEEE Transactions on Robotics and Automation, 7(2):193–204, 1991.
      [2] W. Hess, D. Kohler, H. Rapp, and D. Andor. Real-time loop closure in 2D LIDAR SLAM. In Robotics and Automation (ICRA), 2016 IEEE International Conference on, pages 1271–1278. IEEE, 2016.
      [3] Y. Li, N. Snavely, D. P. Huttenlocher, and P. Fua. Worldwide pose estimation using 3D point clouds. In Large-Scale Visual Geo-Localization, pages 147–163. Springer, 2016.
  21. References II
      [4] T. Sattler, B. Leibe, and L. Kobbelt. Efficient & effective prioritized matching for large-scale image-based localization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016.
      [5] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi. Inception-v4, Inception-ResNet and the impact of residual connections on learning. arXiv preprint arXiv:1602.07261, 2016.
      [6] J. Wang, H. Zha, and R. Cipolla. Coarse-to-fine vision-based localization by indexing scale-invariant features. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 36(2):413–422, 2006.
      [7] B. Zeisl, T. Sattler, and M. Pollefeys. Camera pose voting for large-scale image-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 2704–2712, 2015.
  22. References III
      [8] B. Zhang, Q. Zhao, W. Feng, M. Sun, and W. Jia. SIFT-based indoor localization for older adults using wearable camera. In Biomedical Engineering Conference (NEBEC), 2015 41st Annual Northeast, pages 1–2. IEEE, 2015.