Transfer Learning for Fun and Profit

Transfer learning is exciting because it unlocks solutions that weren't feasible a few years ago. Pre-trained models for computer vision tasks have become abundant, and so have the choices for composing solutions from them. In this talk, we will explore how to make these choices for image classification and feature extraction.
The analysis is inspired by practical use cases where human supervision and compute time are often limited. The results are presented for two datasets across PyTorch's model zoo: first, a toy dataset where scale invariance is important; second, a dataset from an object detection pipeline where rotation invariance is important. Lastly, we will cover the human success factors of such a project.

MunichDataGeeks

October 07, 2017

Transcript

  1. Transfer Learning for Fun and Profit

    Data Geeks Day 2017 · Alexander Hirner (@cybertreiber) · 07.10.2017
    Diagram labels: Labelling, Compute, Accuracy, Transfer Learning
  2. Intro: DishTracker automates food checkout

    Conceived on a napkin in Lapland
    Problem:
    • Manual order control
    • Highly trusted individuals
    • Fatigued after 2h (rotate or assume error rate)
    • “The 2nd most important people”
    • Limited throughput
  3. Product

    • 98.2% agreement for frequent dishes
    • >20 types of dishes
    • 28h of sample video (first two days)
    • In operation 1 day later
    • < 1 sec. latency from network source to marked-up video stream
    • 3x20fps with 20% GPU load
    • Strong response from industry
    Stress tested at Oktoberfest 2017
  4. Content

    Stress tested at Oktoberfest 2017
    1. What are the challenges?
    2. How to utilize transfer learning for a subset of these challenges
    3. Wrap-up and outlook
  5. Core Challenges

    From Video to Detection:
    • Annotation time
    • Label quality, taxonomy, completeness
    • Class imbalance (fat tail)
    From Detection to Realtime Tracking:
    • Blur
    • Occlusion
    • Noisy detections
    • Compute time
    [cf. “Tubelets” approach, Wang, CVPR 2017, https://youtu.be/pK6XAk95kUY?t=35m40s]
  6. Architecture

    [Architecture diagram. Recoverable labels: [Apache] Superset; Flask; Stream; Offline; both; Acquisition; Science; Management and Views; 20%, 40%, 40%; [embedded systems cam]; Data..]
  7. Core Solutions

    From Video to Detection:
    • Scene/shot extraction that maximizes pose variance
    • Automated labelling tool:
      • Region proposals
      • Label proposals
      • Multi-tenant collaboration
    • Training strategy for incomplete and noisy labels
    From Detection to Tracking in Realtime:
    • Occlusion logic
    • Aggregate state over object lifetime
    • Fusion with physical model, motion flow
  8. Label Proposals: model pre-selection

    squeezenet, alexnet, (resnet34):
    • Robust to retrain
    • Quick to retrain
    • Computationally feasible
    Toy dataset: two classes, scale variance
  9. Embedding Quality

    Process:
    1. Model [squeezenet, alexnet, resnet]
    2. Layers [e.g. ‘features.1’, ‘features.2’]
    3. Reduction to <2000 dimensions with avg_pool, kernel size [3,4,5]
    4. Assessment:
       1. NN-ranking
       2. Plausible false positives
    Example:
    [Distance matrix figure. Axis labels: empty, f1, dessert4, dessert1, f2, other. Caption: alexnet on 12 images, 7 categories, darker = higher cosine distance]
    [cf. Yosinski et al. 2014, https://arxiv.org/abs/1411.1792]
  10. Embedding Quality

    Process and example figure: same as on slide 9 (alexnet on 12 images, 7 categories, darker = higher cosine distance)
    Result:
    • alexnet/resnet give a more accurate embedding than squeezenet
    • alexnet additionally:
      • Most plausible false and true positives (column 1)
      • Highest degree of separation (last column)
    Choice: alexnet
    • Layer: ‘features’ (#1)
    • Kernel size for dimensionality reduction: 3
    • Resulting dimensionality: 1024
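The process on the two slides above (pool a layer's activations, then rank images by cosine distance) can be sketched roughly as follows. This is a minimal pure-Python illustration, not the code used in the talk: the tiny feature maps, the function names, and the choice of stride equal to the kernel size are all assumptions made for the example. In practice the feature maps would come from a pre-trained network layer.

```python
import math

def avg_pool_2d(channel, k):
    """Average-pool one H x W channel with a k x k window and stride k (assumed)."""
    h, w = len(channel), len(channel[0])
    return [
        [
            sum(channel[i + di][j + dj] for di in range(k) for dj in range(k)) / (k * k)
            for j in range(0, w - k + 1, k)
        ]
        for i in range(0, h - k + 1, k)
    ]

def embed(feature_maps, k):
    """Pool every channel and flatten the result into one embedding vector."""
    return [v for channel in feature_maps for row in avg_pool_2d(channel, k) for v in row]

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def rank_neighbours(query, embeddings):
    """Sort all other images by cosine distance to the query, nearest first."""
    q = embeddings[query]
    others = [(name, cosine_distance(q, vec))
              for name, vec in embeddings.items() if name != query]
    return sorted(others, key=lambda item: item[1])

# Hypothetical single-channel 2x2 "activations"; real ones are far larger.
feature_maps = {
    "dessert1": [[[0.9, 0.8], [0.1, 0.0]], [[0.1, 0.1], [0.1, 0.1]]],
    "dessert4": [[[0.8, 0.9], [0.2, 0.1]], [[0.2, 0.1], [0.1, 0.2]]],
    "empty":    [[[0.0, 0.0], [0.1, 0.0]], [[0.9, 0.8], [0.9, 1.0]]],
}
embeddings = {name: embed(maps, k=2) for name, maps in feature_maps.items()}
ranking = rank_neighbours("dessert1", embeddings)
for name, dist in ranking:
    print(f"{name}: {dist:.3f}")
```

Inspecting such rankings per query image is one way to judge whether false positives are plausible (a similar dish) or not (an empty tray).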
  11. Labelling Tool - Effects

    Instant feedback motivates, best practices emerge collaboratively
    “--that the program then recognizes dishes is clear. But [parts of the body]… I'm impressed”
  12. Label Proposals: model re-selection

    Dish and body parts: many classes, rotation and blur variance
    WIP, but:
    • Deep retraining wins over shallow retraining, given the real-world data now available
    • Warrants new qualitative assessment along the Pareto curve
    • Cyclical LR helps some models (resnet, densenet)
    [Plot panels: Constant LR w/ momentum; Cyclical Learning Rate]
    [Smith 2017, arxiv.org/abs/1506.01186]
    [https://github.com/ahirner/pytorch-retraining]
    [https://medium.com/towards-data-science/transfer-learning-with-pytorch-72a052297c51]
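For readers unfamiliar with the cyclical learning rate mentioned on this slide, here is a small sketch of the triangular schedule described in the cited Smith paper. The function name and the example values for `base_lr`, `max_lr` and `step_size` are illustrative assumptions, not settings from the talk.

```python
import math

def triangular_lr(iteration, base_lr, max_lr, step_size):
    """Triangular cyclical learning rate.

    The rate rises linearly from base_lr to max_lr over step_size
    iterations, then falls back to base_lr over the next step_size.
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)

# Illustrative values only: one full cycle spans 200 iterations.
schedule = [triangular_lr(it, base_lr=0.001, max_lr=0.01, step_size=100)
            for it in range(0, 401, 50)]
print([round(lr, 4) for lr in schedule])
```

In a training loop, the returned value would be written into the optimizer's learning rate before each step. Newer PyTorch releases also provide `torch.optim.lr_scheduler.CyclicLR`, which may be more convenient than a hand-rolled schedule.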
  13. One more thing (Training Process)

    Overfitting = Unit Test of Machine Learning
    Loss: decreasing monotonically (almost)
    [Plot labels: Different Eval; Bug; No Bug]
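The idea of using overfitting as a unit test can be illustrated with a deliberately tiny example: if a training loop cannot drive the loss towards zero on a handful of separable samples, the pipeline probably has a bug. The sketch below uses a hand-written logistic regression in pure Python purely as a stand-in for a real network; the data, learning rate and thresholds are assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean_loss(w, b, xs, ys):
    """Mean binary cross-entropy of a 1-d logistic model."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(xs)

def overfit_tiny_batch(xs, ys, lr=0.5, steps=500):
    """Run plain gradient descent on one tiny batch and record the loss."""
    w, b = 0.0, 0.0
    losses = []
    for _ in range(steps):
        losses.append(mean_loss(w, b, xs, ys))
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / len(xs)
        grad_b = sum(errors) / len(xs)
        w -= lr * grad_w
        b -= lr * grad_b
    return losses

# Four trivially separable samples: a healthy pipeline must memorise them.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
losses = overfit_tiny_batch(xs, ys)

# The "unit test": loss falls (almost) monotonically and ends near zero.
assert all(later <= earlier + 1e-9 for earlier, later in zip(losses, losses[1:]))
assert losses[-1] < 0.05
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.4f}")
```

With a real model, the same check would run a few hundred optimizer steps on one small fixed batch and assert on the final training loss.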
  14. Transfer (not) all the things

    E.g. Learning 2 Learn: $2 million of compute
    Tradeoff without…: Labelling Costs, Compute Costs, Accuracy
    Win/Win with Transfer Learning: Have all three!
    + Partial confidentiality
    + Stepping stone for composable AI
    + Technology transfer between industry and academia
    [Sharing diagram, labels as extracted: Share not necessarily; Predictions / Generator; Share maybe; Optimization Method; Share; Share not necessarily; Ground Truth Data; Parameters; Compute Graph; Simulation]
    https://news.ycombinator.com/item?id=14950122
    → Join OpenMined to be on the frontier of federated learning with confidentiality guarantees
  15. Takeaways

    • One-shot learning = ultimate goal
      • … where machines ask the right questions
      • … where models are learnt from private data
    • Data science is 20% work, but payback is highly non-linear
    • Make iteration of your analysis pipeline:
      • Collaborative
      • Effortless
    • Work with us! [alexander.hirner@moonvision.io]