Slide 1

Transfer Learning for Fun and Profit
Alexander Hirner (@cybertreiber)
DATA GEEKS DAY 2017, 07.10.2017

[Title graphic: Labelling, Compute, Accuracy, connected by Transfer Learning]

Slide 2

Intro

DishTracker automates food checkout. Conceived on a napkin in Lapland.

Problem:
• Manual order control
• Highly trusted individuals
• Fatigued after 2 h (rotate staff or assume an error rate)
• “The 2nd most important people”
• Limited throughput

Slide 3

Product

Stress-tested at Oktoberfest 2017:
• 98.2% agreement for frequent dishes
• >20 types of dishes
• 28 h of sample video (first two days)
• In operation 1 day later
• <1 s latency from network source to marked-up video stream
• 3×20 fps with 20% GPU load
• Strong interest from industry

Slide 4

Content

[Image: stress test at Oktoberfest 2017]

1. What are the challenges?
2. How to use transfer learning for a subset of these challenges
3. Wrap-up and outlook

Slide 5

Core Challenges

From video to detection:
• Annotation time
• Label quality, taxonomy, completeness
• Class imbalance (fat tail)

From detection to real-time tracking:
• Blur
• Occlusion
• Noisy detections
• Compute time

[cf. the “Tubelets” approach, Wang, CVPR 2017: https://youtu.be/pK6XAk95kUY?t=35m40s]

Slide 6

Architecture

[Architecture diagram: embedded camera systems feed data acquisition (stream and offline); Apache Superset and Flask serve science, management, and views; effort split roughly 20% / 40% / 40%]

Slide 7

Core Solutions

From video to detection:
• Scene/shot extraction that maximizes pose variance
• Automated labelling tool:
  • Region proposals
  • Label proposals
  • Multi-tenant collaboration
• Training strategy for incomplete and noisy labels

From detection to tracking in real time:
• Occlusion logic
• Aggregate state over an object's lifetime (see the sketch after this list)
• Fusion with a physical model and motion flow
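As an illustration of aggregating state over an object's lifetime, here is a minimal, hypothetical sketch: each tracked object sums its per-frame class scores, so a few blurred or occluded frames do not flip the final label. The `Track` class, its fields, and the score values are invented for this example and are not taken from the talk.

```python
# Hypothetical sketch only: aggregate per-frame class scores over a track's lifetime.
from collections import defaultdict

class Track:
    def __init__(self, track_id):
        self.track_id = track_id
        self.class_scores = defaultdict(float)   # aggregated over the track's lifetime
        self.frames_seen = 0

    def update(self, detection_scores):
        """detection_scores: dict mapping class name -> confidence for one frame."""
        for cls, score in detection_scores.items():
            self.class_scores[cls] += score
        self.frames_seen += 1

    def label(self):
        """Final label = class with the highest aggregated score (None if empty)."""
        return max(self.class_scores, key=self.class_scores.get) if self.class_scores else None

track = Track(track_id=7)
track.update({"dessert1": 0.4, "other": 0.3})    # noisy, partially occluded frame
track.update({"dessert1": 0.9})                  # confident frame
print(track.label())                             # -> "dessert1"
```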

Slide 8

Label Proposals: model pre-selection

Candidate backbones: squeezenet, alexnet, (resnet34):
• Robust to retrain
• Quick to retrain
• Computationally feasible

Toy dataset: two classes, scale variance
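To make “quick to retrain” concrete, here is a minimal sketch assuming a standard shallow-retraining setup with torchvision: freeze the pretrained backbone and train only a new two-class head. The architectures follow the slide; the hyperparameters and the two-class head are placeholders.

```python
# Sketch of shallow retraining for label proposals (placeholder hyperparameters).
import torch
import torch.nn as nn
from torchvision import models

def build_shallow_retrain(arch="alexnet", num_classes=2):
    """Freeze the pretrained backbone and replace only the final classifier."""
    if arch == "alexnet":
        model = models.alexnet(pretrained=True)
        for p in model.parameters():
            p.requires_grad = False                          # freeze pretrained weights
        model.classifier[6] = nn.Linear(4096, num_classes)   # new, trainable head
    elif arch == "resnet34":
        model = models.resnet34(pretrained=True)
        for p in model.parameters():
            p.requires_grad = False
        model.fc = nn.Linear(model.fc.in_features, num_classes)
    else:
        raise ValueError(f"unsupported arch: {arch}")
    return model

model = build_shallow_retrain("alexnet", num_classes=2)
# Only the new head receives gradients, which keeps retraining quick and robust.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3, momentum=0.9
)
```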

Slide 9

Embedding Quality

Process:
1. Model: [squeezenet, alexnet, resnet]
2. Layers: [e.g. ‘features.1’, ‘features.2’]
3. Reduction to <2000 dimensions with avg_pool, kernel size [3, 4, 5] (see the sketch after this slide)
4. Assessment:
   1. NN-ranking
   2. Plausible false positives

[Example figure: alexnet on 12 images, 7 categories (empty, f1, dessert4, dessert1, f2, other, ...); darker = higher cosine distance]

[cf. Yosinski et al. 2014, https://arxiv.org/abs/1411.1792]
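A minimal sketch of this process for the alexnet branch, assuming preprocessed 224×224 crops (`batch` is a placeholder tensor): take the ‘features’ activations, shrink them with average pooling (kernel size 3 yields 1024 dimensions), and compare images by pairwise cosine distance.

```python
# Sketch: layer activations -> avg_pool reduction -> pairwise cosine distances.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.alexnet(pretrained=True).eval()

def embed(batch, kernel_size=3):
    """Embed images with alexnet 'features' + avg_pool (kernel 3 -> 1024 dims)."""
    with torch.no_grad():
        feats = model.features(batch)              # (N, 256, 6, 6) for 224x224 inputs
        feats = F.avg_pool2d(feats, kernel_size)   # (N, 256, 2, 2)
        return feats.flatten(1)                    # (N, 1024)

def cosine_distance_matrix(emb):
    """Pairwise cosine distances; darker cells in the slide = larger values."""
    emb = F.normalize(emb, dim=1)
    return 1.0 - emb @ emb.t()

batch = torch.randn(12, 3, 224, 224)   # placeholder for 12 preprocessed crops
dist = cosine_distance_matrix(embed(batch))
print(dist.shape)                      # torch.Size([12, 12])
```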

Slide 10

Embedding Quality

Result:
• alexnet/resnet give more accurate embeddings than squeezenet
• alexnet additionally has:
  • The most plausible false and true positives (column 1)
  • The highest degree of separation (last column)

Choice: alexnet
• Layer: ‘features’ (#1)
• Kernel size for dimensionality reduction: 3
• Resulting dimensionality: 1024

[Same process and example figure as on the previous slide: alexnet on 12 images, 7 categories; darker = higher cosine distance]

Slide 11

Labelling Tool - Effects

Instant feedback motivates; best practices emerge collaboratively.

“... that the program then recognizes dishes is clear. But [parts of the body]... I‘m impressed.”

Slide 12

Label Proposals: model re-selection

Dishes and body parts: many classes, rotation and blur variance.

Work in progress, but:
• Deep retraining wins over shallow retraining given the now-available real-world data
• This warrants a new qualitative assessment along the Pareto curve
• A cyclical LR helps some models (resnet, densenet); see the sketch after this slide

[Charts: constant LR with momentum vs. cyclical learning rate]

[Smith 2017, arxiv.org/abs/1506.01186]
[https://github.com/ahirner/pytorch-retraining]
[https://medium.com/towards-data-science/transfer-learning-with-pytorch-72a052297c51]
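For reference, a minimal sketch of the triangular cyclical learning-rate policy from Smith 2017; the bounds and step size here are invented for illustration, and the talk's actual schedules live in the pytorch-retraining repository linked above.

```python
# Sketch of the triangular cyclical LR policy (Smith 2017); placeholder bounds.
import torch

def triangular_lr(iteration, base_lr=1e-4, max_lr=6e-3, step_size=500):
    """Linearly cycle the LR between base_lr and max_lr every 2*step_size iterations."""
    cycle = iteration // (2 * step_size)
    x = abs(iteration / step_size - 2 * cycle - 1)         # goes 1 -> 0 -> 1 per cycle
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)

model = torch.nn.Linear(10, 2)                             # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)

for it in range(2000):                                     # training-loop skeleton
    for group in optimizer.param_groups:
        group["lr"] = triangular_lr(it)
    # ... forward pass, loss.backward(), optimizer.step(), optimizer.zero_grad()
```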

Slide 13

One more thing (Training Process)

Overfitting = the unit test of machine learning.
The loss should decrease (almost) monotonically.

[Charts: loss curves labelled “Different eval”, “Bug”, “No bug”]
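A minimal sketch of this idea, with a made-up model, data, and threshold: repeatedly fit one tiny fixed batch; if the pipeline is bug-free, the loss should collapse almost monotonically.

```python
# "Overfitting as a unit test": memorize one small batch and assert the loss collapses.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 4))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

x = torch.randn(8, 32)                  # one tiny, fixed batch
y = torch.randint(0, 4, (8,))

losses = []
for _ in range(500):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

# The "unit test": training should have memorized the batch.
assert losses[-1] < 0.05, "loss did not collapse -- suspect a bug in the pipeline"
```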

Slide 14

Transfer (not) all the things

E.g. Learning 2 Learn: $2 million of compute
(https://news.ycombinator.com/item?id=14950122)

Trade-off without transfer learning: labelling costs vs. compute costs vs. accuracy.
Win/win with transfer learning: have all three!

[Diagram: what to share - ground-truth data, parameters, compute graph, predictions/generator, optimization method, simulation - ranging from “share” over “share maybe” to “not necessarily”]

+ Partial confidentiality
+ Stepping stone for composable AI
+ Technology transfer between industry and academia

→ Join OpenMined to be on the frontier of federated learning with confidentiality guarantees

Slide 15

Takeaways

• One-shot learning = the ultimate goal
  • ... where machines ask the right questions
  • ... where models are learned from private data
• Data science is 20% of the work, but the payback is highly non-linear
• Make iterating on your analysis pipeline:
  • Collaborative
  • Effortless
• Work with us! [[email protected]]