Dataset Culling: Towards Efficient Training of Distillation-based Domain Specific Models

© 2019 Kentaro Yoshioka
K. Yoshioka(1)(2), E. Lee(2), S. Wong(2), M. Horowitz(2)
(1) Toshiba (2) Stanford University
IEEE ICIP 2019, Sept. 25

Slide 1: Introduction
• Deep learning-based object detection achieves excellent accuracy.
• Applications: security, infrastructure, transportation.
Image credit: [Nest.com]

Slide 2: Introduction
• What does it cost?
• Requires many GPU-hours and is difficult to scale.
• There is an accuracy-cost tradeoff:
  101-layer ResNet: ImageNet accuracy 78%
  10-layer ResNet: ImageNet accuracy 60%
• How can we break this tradeoff?

Slide 3: Introduction: Domain Specific Models
• Training compact domain specific models (DSMs) [1,2]
• A DSM is a model specialized for a specific environment (conference room, your house, your office, etc.).
• Cuts computation cost by 5-20x.
[Figure: surveillance camera data compared with a general dataset. Images from MS-COCO (http://cocodataset.org/)]
[1] D. Kang, “Noscope: optimizing neural network queries over video at scale,”
[2] R. Mullapudi, “Online model distillation for efficient video inference,”

Slide 4: Introduction: What is Distillation?
• A large teacher model teaches a small student model.
• Works without human intervention: the teacher provides the “answers”.
[Figure: the teacher model (large, general) labels domain data, which is used to train the domain specific model (small, specialized)]
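The labeling step of distillation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: `pseudo_label` and `toy_teacher` are hypothetical names, and a detection is modeled as a plain (label, confidence) tuple rather than a full bounding-box record.

```python
from typing import Callable, Iterable, List, Tuple

# Simplifying assumption: a detection is just (class label, confidence).
Detection = Tuple[str, float]


def pseudo_label(
    images: Iterable[str],
    teacher: Callable[[str], List[Detection]],
) -> List[Tuple[str, List[Detection]]]:
    """Pair each domain image with the teacher's predictions.

    The pairs serve as training targets for the student, so no human
    annotation is involved.
    """
    return [(image, teacher(image)) for image in images]


# Hypothetical stand-in for a large, general teacher model.
def toy_teacher(image: str) -> List[Detection]:
    return [("car", 0.98)] if "road" in image else []


training_set = pseudo_label(["road_001.jpg", "room_001.jpg"], toy_teacher)
```

In a real system the teacher would be a detector and the student would then be fine-tuned on `training_set`; both steps are left out here.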

Slide 5: Introduction: The Problem
• Large amounts of training data are easy to gather.
• A day's worth of surveillance data = 86,400 images at 1 FPS.
• Training on 86,400 images requires over 100 GPU-hours (Nvidia K80 on AWS).
• This does not scale to deploying DSMs on thousands of cameras.
• Reducing the DSM training cost has not been explored.
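The scale of the problem follows from simple arithmetic, sketched below. The helper `gpu_hours_for_fleet` is a hypothetical name, and 100 GPU-hours per camera is used as a lower bound because the slide says "over 100".

```python
SECONDS_PER_DAY = 24 * 60 * 60
FPS = 1  # frame rate stated on the slide

# One frame per second for a full day.
images_per_day = SECONDS_PER_DAY * FPS


def gpu_hours_for_fleet(num_cameras: int, gpu_hours_per_camera: float = 100.0) -> float:
    """Lower-bound training cost if every camera gets its own DSM (hypothetical helper)."""
    return num_cameras * gpu_hours_per_camera


fleet_cost = gpu_hours_for_fleet(1000)
```

Under these assumptions, one day of training data for each of 1,000 cameras already costs at least 100,000 GPU-hours.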

Slide 7: Basic Idea of Dataset Culling
• Reduces the dataset size by 300x.
• Culls only “easy” data, so model accuracy is not harmed.
• Total training time: 104 → 2.2 GPU-hours, a 47x improvement ☺

Slide 8: What is good training data?
• “Difficult” data, on which the model makes many mistakes.
• If the model already predicts an image perfectly, the loss yields no gradient to backpropagate, so the image does not contribute to training.
• Comparing teacher and student predictions is costly.
• Can we assess difficulty from student predictions only?

Slide 9: Difficulty assessment from confidence
• Quantify good data with the proposed “confidence loss”.
• It assesses the difficulty of a prediction from the output probability.
• Utilizes a “pretrained” model.
[Figure: example detections labeled Car=0.49, Car=0.69, Car=0.79, Car=1.0, Person=0.19]

Slide 10: Difficulty assessment from confidence
• Quantify good data with the proposed “confidence loss”.
• It assesses the difficulty of a prediction from the output probability.
• Compute the loss for all detections and sum (Σ) them per image.
• Image conf. loss: 3.79 → cull images with low loss.
[Figure: example detections labeled Car=0.49, Car=0.69, Car=0.79, Car=1.0, Person=0.19]
[Plot: Lconf versus detection confidence, both axes from 0 to 1]
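The idea of scoring an image by summing a per-detection loss can be sketched as below. The slides do not give the formula for Lconf, so this sketch substitutes binary entropy (in bits) as an assumed stand-in: it is zero for fully confident detections and largest at a confidence of 0.5. The names `detection_loss` and `image_confidence_loss` are hypothetical, and the stand-in does not reproduce the value 3.79 shown on the slide.

```python
import math
from typing import Iterable


def detection_loss(confidence: float) -> float:
    """ASSUMED stand-in for the per-detection confidence loss.

    Binary entropy in bits: 0 at confidence 0 or 1, maximum of 1 at 0.5.
    """
    if confidence <= 0.0 or confidence >= 1.0:
        return 0.0
    return -(
        confidence * math.log2(confidence)
        + (1.0 - confidence) * math.log2(1.0 - confidence)
    )


def image_confidence_loss(confidences: Iterable[float]) -> float:
    """Sum the per-detection loss over every detection in one image."""
    return sum(detection_loss(c) for c in confidences)


# Confidences taken from the slide's example image.
hard = image_confidence_loss([0.49, 0.69, 0.79, 1.0, 0.19])
# A hypothetical image where the student is already certain.
easy = image_confidence_loss([1.0, 0.99])
```

Images with a low summed loss, like `easy` here, are the ones a culling step would discard first.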

Slide 12: Dataset culling pipeline
• First, cull the dataset using only the student model.
• This stage removes the majority of the data (50x).

Slide 13: Dataset culling pipeline
• Then, conduct a secondary culling using both teacher and student predictions.
• This stage helps boost the accuracy of the trained model.
• The pipeline culls the data by up to 300x overall.

Slide 14: Dataset culling pipeline
• Details are in the paper.
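The two-stage pipeline can be sketched as a pair of top-k selections. This is an illustrative sketch, not the authors' implementation: `keep_hardest`, `cull_dataset`, and both toy scoring functions are hypothetical, and real scores would come from the student's confidence loss and a teacher-student comparison. The ratios 50 and 300 are taken from the slides.

```python
from typing import Callable, List, Sequence


def keep_hardest(items: Sequence[str], score: Callable[[str], float], keep: int) -> List[str]:
    """Return the `keep` items with the highest difficulty score."""
    return sorted(items, key=score, reverse=True)[:keep]


def cull_dataset(
    images: Sequence[str],
    student_score: Callable[[str], float],
    teacher_student_score: Callable[[str], float],
    first_ratio: int = 50,
    total_ratio: int = 300,
) -> List[str]:
    """Stage 1 uses the cheap student-only score; stage 2 scores survivors with the teacher."""
    stage1_size = max(1, len(images) // first_ratio)
    final_size = max(1, len(images) // total_ratio)
    survivors = keep_hardest(images, student_score, stage1_size)
    return keep_hardest(survivors, teacher_student_score, final_size)


teacher_calls = {"count": 0}


def toy_student_score(name: str) -> float:
    # Hypothetical score derived from the frame index, for demonstration only.
    return (int(name.split("_")[1]) * 7919) % 1000 / 1000.0


def toy_teacher_student_score(name: str) -> float:
    # Counts invocations to show how few frames reach the expensive stage.
    teacher_calls["count"] += 1
    return (int(name.split("_")[1]) * 104729) % 1000 / 1000.0


frames = [f"frame_{i:05d}" for i in range(86_400)]
culled = cull_dataset(frames, toy_student_score, toy_teacher_student_score)
```

With one day of frames, the expensive score is evaluated on 1,728 survivors instead of all 86,400 frames, and 288 frames remain for training.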

Slide 16: Experiment setups
• Models pretrained on MS-COCO:
  • Student: ResNet-18 based Faster R-CNN
  • Teacher: ResNet-101 based Faster R-CNN
• Dataset: 5 custom videos acquired from YouTube.
  • Train: first 24 hours
  • Validation: subsequent 6 hours
• Teacher outputs are used as ground truth.

Slide 17: Qualitative results
[Figure: detection results for RawStudent, TrainStudent, TrainStudent+optResolution and Teacher. Panel labels as extracted: mAP=90.2, 94.8, 81.6 (comp=28G each); mAP=78.2, 71.3, 52.7 (comp=28G each); mAP=89.6 (comp=7G), 93.2 (comp=18G), 80.7 (comp=18G); Oracle (comp=128G)]

Slide 18: Quantitative Results

Culled dataset size          64             128            256            Full (86,400)  No Training
Mean accuracy [mAP]          85.56 (-3.0%)  88.3 (-0.3%)   89.3 (+0.8%)   88.5           58.6
Total train time [hours]     1.9 (54x)      2.0 (50x)      2.2 (47x)      104            -
  Student predictions        1.54           1.54           1.54           -              -
  Student training           0.07           0.14           0.28           96             -
  Teacher predictions        0.33           0.33           0.33           8              -

• The dataset can be culled by up to 300x without accuracy drops, or even with slight improvements.
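The time breakdown can be checked with a few lines of arithmetic. The numbers below are copied from the slide; the variable and function names are hypothetical. Speedups computed from the unrounded component sums come out close to, though not identical to, the rounded 54x, 50x and 47x figures on the slide.

```python
FULL_DATASET_SIZE = 86_400
FULL_TRAIN_HOURS = 96 + 8  # student training + teacher predictions on the full set

# Hours per stage for each culled dataset size, as reported on the slide.
breakdown = {
    64: {"student_predictions": 1.54, "student_training": 0.07, "teacher_predictions": 0.33},
    128: {"student_predictions": 1.54, "student_training": 0.14, "teacher_predictions": 0.33},
    256: {"student_predictions": 1.54, "student_training": 0.28, "teacher_predictions": 0.33},
}


def total_hours(parts: dict) -> float:
    """Sum the per-stage hours for one configuration."""
    return sum(parts.values())


def speedup(parts: dict) -> float:
    """Training-time speedup relative to using the full dataset."""
    return FULL_TRAIN_HOURS / total_hours(parts)


totals = {size: total_hours(parts) for size, parts in breakdown.items()}
speedups = {size: speedup(parts) for size, parts in breakdown.items()}
reduction_at_256 = FULL_DATASET_SIZE / 256  # dataset size reduction factor
```

Note that student predictions dominate the culled totals: they must run over every frame regardless of how many frames are kept.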

Slide 19: Conclusions
• While DSMs can reduce the inference cost, training them can take many GPU-hours.
• We proposed Dataset Culling, which reduces the DSM training cost by 47x.
• We found that culling easy-to-predict data minimizes the accuracy drop.
• Evaluated on our long-duration dataset, models trained on culled datasets showed little accuracy penalty.
• One step towards deploying DSMs in the real world ☺
Code and dataset available: https://github.com/kentaroy47/DatasetCulling

Slide 20: Ablation study
• Entropy implements the loss function used for active learning.
• Using teacher-student comparisons (Precision) achieves the best accuracy.
• Our dataset culling pipeline, which combines Confidence + Precision, has the best tradeoff between accuracy and training time.


Yoshioka Lab (Keio CSG)


Slide 6: Dataset Culling


Slide 11: Examples
[Figure: examples of data kept and data culled]


Slide 15: Experiments
