Dataset Culling: Towards Efficient Training of Distillation-based Domain Specific Models

ICIP 2019

Yoshioka Lab (Keio CSG)

January 07, 2022

Transcript

  1. © 2019 Kentaro Yoshioka

    Dataset Culling: Towards Efficient Training of Distillation-based Domain Specific Models
    K. Yoshioka (1)(2), E. Lee (2), S. Wong (2), M. Horowitz (2)
    (1) Toshiba (2) Stanford University
    IEEE ICIP 2019, Sept. 25
  2. Introduction

    • Deep learning based object detection has excellent accuracy.
    • Apps: security, infrastructure, transportation, ...
    Image credit: [Nest.com]
  3. Introduction

    • Cost?
    • Requires many GPU-hours, difficult to scale.
    • Has an accuracy-cost tradeoff.
        • 101-layer ResNet: ImageNet accuracy 78%
        • 10-layer ResNet: ImageNet accuracy 60%
    • How can we break this tradeoff?
  4. Introduction: Domain Specific Models

    • Training compact domain specific models (DSMs) [1,2]
    • DSMs: models specialized for a specific environment {conference room, your house, your office, etc.}
    • Cuts down computation cost 5-20x
    [Figure: surveillance camera data vs. general dataset. Images from MS-COCO (http://cocodataset.org/)]
    [1] D. Kang, “NoScope: optimizing neural network queries over video at scale.”
    [2] R. Mullapudi, “Online model distillation for efficient video inference.”
  5. Introduction: What is Distillation?

    • The teacher model teaches the small student model.
    • Works without human intervention: the teacher provides the “answers”.
    [Figure labels: domain data; teacher model (large, general); train model; domain specific model (small, specialized)]
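The labeling step on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: `Detection`, `toy_teacher`, `pseudo_label`, and the 0.5 score threshold are hypothetical names and values chosen only for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Detection:
    label: str
    score: float  # teacher confidence in [0, 1]


def pseudo_label(
    frames: List[str],
    teacher: Callable[[str], List[Detection]],
    min_score: float = 0.5,  # hypothetical threshold, not from the slides
) -> List[Tuple[str, List[Detection]]]:
    """Pair each frame with the teacher's confident detections (no human labels)."""
    labeled = []
    for frame in frames:
        targets = [d for d in teacher(frame) if d.score >= min_score]
        labeled.append((frame, targets))
    return labeled


def toy_teacher(frame: str) -> List[Detection]:
    # Stand-in for a large, general detector; returns fixed detections.
    return [Detection("car", 0.9), Detection("person", 0.3)]


# The student would then be trained on (frame, targets) pairs.
train_set = pseudo_label(["frame_000.jpg", "frame_001.jpg"], toy_teacher)
```

In a real setup the toy teacher would be replaced by the large pretrained detector, and the resulting pairs would feed the student's usual detection training loop.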
  6. Introduction: The Problem

    • Can gather lots of training data easily.
    • A day’s worth of surveillance data = 86,400 images @ 1 FPS
    • Training on 86,400 images requires over 100 GPU-hours (Nvidia K80 on AWS).
    • This does not scale to deploying DSMs to thousands of cameras.
    • Reducing the DSM training cost has not been explored.
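The arithmetic behind these numbers is easy to check. The sketch below treats the slide's "over 100 GPU-hours" as a lower bound of 100; the 1,000-camera fleet is a hypothetical example size, not a figure from the talk.

```python
SECONDS_PER_DAY = 24 * 60 * 60
FPS = 1  # sampling rate stated on the slide

frames_per_day = SECONDS_PER_DAY * FPS

# Lower bound: the slide says "over 100 GPU-hours" per day of data.
GPU_HOURS_PER_CAMERA = 100


def fleet_gpu_hours(num_cameras: int) -> int:
    """Lower bound on GPU-hours to train one DSM per camera."""
    return num_cameras * GPU_HOURS_PER_CAMERA


# Implied training cost per image, in GPU-seconds.
seconds_per_image = GPU_HOURS_PER_CAMERA * 3600 / frames_per_day

print(frames_per_day)          # frames in one day at 1 FPS
print(fleet_gpu_hours(1000))   # hypothetical 1,000-camera deployment
print(round(seconds_per_image, 2))
```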
  7. Dataset Culling

  8. Basic Idea of Dataset Culling

    • Reduces the dataset size 300x
    • Culls only “easy” data; model accuracy is not harmed
    • Total training time: 104 → 2.2 GPU-hours (47x improvement) ☺
  9. What is good training data?

    • “Difficult” data, on which the model makes many mistakes.
    • If the model already predicts perfectly, there is nothing to backpropagate → such data does not contribute to training.
    • Comparing teacher-student predictions is costly.
    • Can we assess difficulty from student predictions only?
  10. Difficulty assessment from confidence

    • Quantify good data with the proposed “confidence loss”
    • Assesses the difficulty of a prediction from the output probability.
    • Utilizes a “pretrained” model.
    [Figure: example detections Car=0.49, Car=0.69, Car=0.79, Car=1.0, Person=0.19]
  11. Difficulty assessment from confidence

    • Quantify good data with the proposed “confidence loss”
    • Assesses the difficulty of a prediction from the output probability
    • Compute the loss for all detections and sum (Σ): image conf. loss = 3.79
    → Cull images with low loss.
    [Figure: example detections Car=0.49, Car=0.69, Car=0.79, Car=1.0, Person=0.19]
    [Plot: Lconf vs. detection confidence, both axes from 0 to 1]
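The scoring idea on this slide (a per-detection loss, summed per image, with low-loss images culled) can be sketched as below. The slides do not state the formula for the confidence loss, so binary entropy is used here purely as an assumed stand-in that is high for uncertain detections and zero for fully confident ones. It will not reproduce the slide's value of 3.79; the paper defines the actual loss. The file names and the two "easy" images are hypothetical.

```python
import math
from typing import Dict, List


def detection_loss(confidence: float) -> float:
    """ASSUMED stand-in loss: binary entropy in bits (not the paper's formula)."""
    if confidence <= 0.0 or confidence >= 1.0:
        return 0.0  # fully confident detections contribute nothing
    q = confidence
    return -(q * math.log2(q) + (1 - q) * math.log2(1 - q))


def image_confidence_loss(confidences: List[float]) -> float:
    """Sum the per-detection loss over all detections in one image."""
    return sum(detection_loss(c) for c in confidences)


def keep_hardest(scores: Dict[str, List[float]], keep: int) -> List[str]:
    """Keep the `keep` images with the highest loss; the rest are culled."""
    ranked = sorted(
        scores, key=lambda name: image_confidence_loss(scores[name]), reverse=True
    )
    return ranked[:keep]


detections = {
    "busy_street.jpg": [0.49, 0.69, 0.79, 1.0, 0.19],  # confidences from the slide
    "empty_road.jpg": [1.0],                           # hypothetical easy image
    "parked_car.jpg": [0.98, 0.99],                    # hypothetical easy image
}
kept = keep_hardest(detections, keep=1)
```

Because the score needs only the student's own output probabilities, this pass requires no teacher inference.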
  12. Examples. Data kept. Data culled.

  13. Dataset culling pipeline

    • First, cull the dataset using only the student model
    • Culls out the majority of the data first (50x).
  14. Dataset culling pipeline

    • Then, conduct a secondary culling using both teacher and student predictions.
    • Contributes to boosting the trained model accuracy.
    • Data is culled up to 300x by the pipeline.
  15. Dataset culling pipeline (details in paper)

    • Then, conduct a secondary culling using both teacher and student predictions.
    • Contributes to boosting the trained model accuracy.
    • Data is culled up to 300x by the pipeline.
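The two-stage structure can be sketched as a pair of top-k selections. This is a rough illustration, not the authors' implementation: the scoring functions below are arbitrary placeholders for the student-only difficulty score and the teacher-student comparison, and `cull`, `top_k`, `stage1_factor`, and `final_size` are hypothetical names. The point it shows is that the costly teacher-based score runs only on the frames that survive the first stage.

```python
from typing import Callable, List, Sequence


def top_k(items: Sequence[int], score: Callable[[int], float], k: int) -> List[int]:
    """Return the k highest-scoring items (the 'hardest' frames)."""
    return sorted(items, key=score, reverse=True)[:k]


def cull(
    frames: Sequence[int],
    student_score: Callable[[int], float],
    teacher_student_score: Callable[[int], float],
    stage1_factor: int = 50,  # first-stage reduction from the slides
    final_size: int = 256,    # one of the culled sizes reported in the results
) -> List[int]:
    stage1_size = max(final_size, len(frames) // stage1_factor)
    # Stage 1: cheap, uses only student predictions.
    candidates = top_k(frames, student_score, stage1_size)
    # Stage 2: costly, needs teacher predictions, but only on the candidates.
    return top_k(candidates, teacher_student_score, final_size)


teacher_calls: List[int] = []  # records which frames the teacher had to process


def student_score(i: int) -> float:
    return (i * 7919) % 1000  # placeholder difficulty score


def teacher_student_score(i: int) -> float:
    teacher_calls.append(i)
    return (i * 104729) % 1000  # placeholder disagreement score


frames = list(range(86_400))  # one day of frames at 1 FPS
kept = cull(frames, student_score, teacher_student_score)
print(len(teacher_calls), len(kept))
```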
  16. Experiments

  17. Experiment setups

    • Models pretrained on MS-COCO:
        • Student: ResNet-18 based Faster-RCNN
        • Teacher: ResNet-101 based Faster-RCNN
    • Dataset: 5 custom videos acquired from YouTube.
        • Train: first 24 hours
        • Validation: subsequent 6 hours
    • Teacher output is used as ground truth.
  18. Qualitative results

    [Figure: example detections. Panel titles: RawStudent, TrainStudent, TrainStudent+optResolution, Teacher]
    Panel annotations:
    • mAP=78.2, 71.3, 52.7 (comp=28G each)
    • mAP=90.2, 94.8, 81.6 (comp=28G each)
    • mAP=89.6 (comp=7G), 93.2 (comp=18G), 80.7 (comp=18G)
    • Oracle (comp=128G)
  19. Quantitative Results

    Culled dataset size        64             128            256            Full (86,400)  No training
    Mean accuracy [mAP]        85.56 (-3.0%)  88.3 (-0.3%)   89.3 (+0.8%)   88.5           58.6
    Total train time [hours]   1.9 (54x)      2.0 (50x)      2.2 (47x)      104            -
      Student predictions      1.54           1.54           1.54           -              -
      Student training         0.07           0.14           0.28           96             -
      Teacher predictions      0.33           0.33           0.33           8              -

    • Can cull the dataset by up to 300x without accuracy drops, or even with slight improvements.
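As a sanity check, the timing figures on this slide can be recombined in a few lines. The component times are copied from the slide; the totals and ratios are recomputed from them, so they land close to, but not exactly on, the slide's rounded values. The variable names are illustrative only.

```python
FULL_TRAIN_HOURS = 96 + 8  # full dataset: student training + teacher predictions
FULL_DATASET = 86_400

# culled size -> (student predictions, student training, teacher predictions), in hours
components = {
    64: (1.54, 0.07, 0.33),
    128: (1.54, 0.14, 0.33),
    256: (1.54, 0.28, 0.33),
}

totals = {size: sum(parts) for size, parts in components.items()}
speedups = {size: FULL_TRAIN_HOURS / total for size, total in totals.items()}
reductions = {size: FULL_DATASET / size for size in components}

for size in components:
    print(
        f"{size:>4} images: {totals[size]:.2f} h total, "
        f"{speedups[size]:.0f}x faster, dataset {reductions[size]:.0f}x smaller"
    )
```

With every culled size, the student-prediction pass over the full day of data accounts for most of the remaining time, which is why the total barely changes between 64 and 256 images.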
  20. Conclusions

    • While DSMs can reduce the inference cost, training them can take many GPU-hours.
    • We proposed Dataset Culling, which reduces the DSM training cost by 47x.
    • We found that by culling easy-to-predict data, the accuracy drop can be minimized.
    • On our long-duration dataset, models trained with culled datasets showed little accuracy penalty.
    • One step towards deploying DSMs to the real world ☺
    Code and dataset available: https://github.com/kentaroy47/DatasetCulling
  21. Ablation study

    • “Entropy” implements the loss function used in active learning.
    • Using teacher-student comparisons (“Precision”) achieves the best accuracy.
    • Our dataset culling pipeline with Confidence + Precision has the best tradeoff between accuracy and training time.