Slide 1

DAY 1 "技" Developer Day
Object Detector: WHY Do You Need and HOW Can You Own
Yaping Sun, ABEJA, Inc.

Slide 2

Self-Introduction
Yaping Sun, http://muchuanyun.github.io/
• Majored in Computer Engineering and Microelectronics
• Data Engineer @ABEJA, Inc.
• Interested in real-world applications of Deep Learning

Slide 3

• Object Detection and Applications
• Object Detection in Machine Learning
  • Datasets
  • Evaluation Criteria
  • Representative Architectures
• Experience with ABEJA Platform

Slide 4

• Object Detection and Applications
• Object Detection in Machine Learning
  • Datasets
  • Evaluation Criteria
  • Representative Architectures
• Experience with ABEJA Platform

Slide 5

Computer Vision Tasks
Classification ("CAT"), Object Detection ("DOG, DOG, CAT"), Instance Segmentation ("DOG, DOG, CAT")
http://cs231n.stanford.edu/slides/2018/cs231n_2018_lecture11.pdf

Slide 6

Object Detection
• A basic concept from human intelligence
• Cornerstone of true AI
• Initial step for tracking, identification, human-computer interaction, etc.
https://github.com/tensorflow/models/tree/master/research/object_detection

Slide 7

Why Do You Need It?
• Face Detection
• People Counting
• Self-Driving Cars
• Pedestrian/Vehicle Detection
• Video Surveillance
• Anomaly Detection
• …

Slide 8

Case 1: Visual Search
• Users upload a photo to discover similar-looking products
• Usually multiple objects exist in one image
• Object detection reduces computational cost and improves accuracy in a visual search system
• Wide application in the fashion business
https://labs.pinterest.com/assets/paper/visual_search_at_pinterest.pdf

Slide 9

Case 2: Analysis of Drone Imagery
• Remote monitoring of a housing construction project via drone
• Routine inspection of solar farms
• Early plant disease detection in agriculture
https://medium.com/nanonets/how-we-flew-a-drone-to-monitor-construction-projects-in-africa-using-deep-learning-b792f5c9c471

Slide 10

Case 3: Behavior Observation
• Analyzing users' behavior helps improve the product
• Use object detection to track the movement of items in a kitchen
• About 1/100 the time cost of manual work
https://six2018.abejainc.com/docs/b3_six2018.pdf

Slide 11

Practice: Deconstruct a Problem
Example: Unmanned Store
• Required Functions
  • e.g. track what a customer picks from a shelf
  • e.g. checkout within shopping carts
• Possible Approach
  • e.g. track hands
  • e.g. detect products
• Feasibility Evaluation
  • cameras (resolution, position, …)
  • accuracy expectation
  • cost vs. RFID?

Slide 12

• Object Detection and Applications
• Object Detection in Machine Learning
  • Datasets
  • Evaluation Criteria
  • Representative Architectures
• Experience with ABEJA Platform

Slide 13

Object Detection: Problem Definition
Input:
• Image (RGB)
Output: a list of detections (cj, bj, pj), i.e. class, bounding box, confidence score:
• class 0, (x1, y1, w1, h1), p1
• class 0, (x2, y2, w2, h2), p2
• class 1, (x3, y3, w3, h3), p3
• …
Each box is given by its top-left corner (x, y), width w, and height h, e.g. the box around 'cat'.
(Example image is CC0 public domain.)
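The output format above maps naturally onto a small record type. A minimal sketch in Python (illustrative only; the `Detection` class is an assumption for this example, not part of any library mentioned in the talk):

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One detector output (cj, bj, pj): class id, box (x, y, w, h), confidence."""
    class_id: int   # cj
    x: float        # top-left corner, horizontal
    y: float        # top-left corner, vertical
    w: float        # box width
    h: float        # box height
    score: float    # pj, confidence in [0, 1]

# e.g. a 'cat' (class 0) at (30, 40), 200x180 pixels, with confidence 0.92
det = Detection(class_id=0, x=30, y=40, w=200, h=180, score=0.92)
```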

Slide 14

Famous Challenges

                    PASCAL VOC  ImageNet ILSVRC  MS COCO   Open Images
                    (2007)      (2013)           (2015)    (2018)
# Classes           20          200              80        500
# Training Images   11K         476K             200K      1.7M
# Objects           27K         534K             1.5M      12M
Note                standard    scaled-up VOC    harder    broader range
                                                 than VOC  of classes

http://host.robots.ox.ac.uk/pascal/VOC/
http://www.image-net.org/challenges/LSVRC/
http://cocodataset.org/#home
https://www.kaggle.com/c/google-ai-open-images-object-detection-track

Slide 15

Object Detection: Evaluation (1)
• AP (Average Precision): the average of the maximum precision at different recall values
• mAP (mean Average Precision): the mean of AP over all categories
• AP@IoU: average precision over all IoU thresholds in [0.5:0.05:0.95]
• AP@Scales: average precision for different object sizes (small, medium, large)
• AR (Average Recall): the maximum recall, averaged over a fixed number of detections per image

Slide 16

Object Detection: Evaluation (2)
• TP (True Positive): correct class and IoU > 0.5
• FP (False Positive): wrong class or IoU < 0.5
• FN (False Negative): missed ground-truth object

Intersection over Union (between prediction and ground truth):
IoU = area of overlap / area of union

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)

AP: the average of the maximum precision at each recall level, over the set R of recall levels in [0, 1]:
AP = (1 / |R|) · Σ_{r ∈ R} max precision at recall ≥ r
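The IoU formula can be computed directly from two boxes in the slide's (x, y, w, h) convention; a minimal sketch:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x, y, w, h),
    with (x, y) the top-left corner."""
    ax1, ay1, ax2, ay2 = box_a[0], box_a[1], box_a[0] + box_a[2], box_a[1] + box_a[3]
    bx1, by1, bx2, by2 = box_b[0], box_b[1], box_b[0] + box_b[2], box_b[1] + box_b[3]
    # Overlap rectangle (clamped to zero if the boxes are disjoint)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0
```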

Slide 17

Object Detection: Evaluation (3)
Example, category 'cat': # ground truth = 5, # predictions = 10 (ranked by confidence)

Rank  Correct?  Precision  Recall
1     TRUE      1.0        0.2

At rank 1: TP=1, FP=0, FN=4 → Precision = 1/1 = 1.0, Recall = 1/5 = 0.2

Slide 18

Object Detection: Evaluation (3)
Example, category 'cat': # ground truth = 5, # predictions = 10

Rank  Correct?  Precision  Recall
1     TRUE      1.0        0.2
2     TRUE      1.0        0.4

At rank 2: TP=2, FP=0, FN=3 → Precision = 2/2 = 1.0, Recall = 2/5 = 0.4

Slide 19

Object Detection: Evaluation (3)
Example, category 'cat': # ground truth = 5, # predictions = 10

Rank  Correct?  Precision  Recall
1     TRUE      1.0        0.2
2     TRUE      1.0        0.4
3     FALSE     0.67       0.4

At rank 3: TP=2, FP=1, FN=3 → Precision = 2/3 ≈ 0.67, Recall = 2/5 = 0.4

Slide 20

Object Detection: Evaluation (3)
Example, category 'cat': # ground truth = 5, # predictions = 10

Rank  Correct?  Precision  Recall
1     TRUE      1.0        0.2
2     TRUE      1.0        0.4
3     FALSE     0.67       0.4
4     FALSE     0.5        0.4
5     FALSE     0.4        0.4
6     TRUE      0.5        0.6
7     TRUE      0.57       0.8
8     FALSE     0.5        0.8
9     FALSE     0.44       0.8
10    TRUE      0.5        1.0

AP: average of the maximum precision at all recall levels

Slide 21

Object Detection: Evaluation (3)
Example, category 'cat': # ground truth = 5, # predictions = 10

Rank  Correct?  Precision  Recall
1     TRUE      1.0        0.2
2     TRUE      1.0        0.4
3     FALSE     0.67       0.4
4     FALSE     0.5        0.4
5     FALSE     0.4        0.4
6     TRUE      0.5        0.6
7     TRUE      0.57       0.8
8     FALSE     0.5        0.8
9     FALSE     0.44       0.8
10    TRUE      0.5        1.0

Plot precision against recall, then read off the interpolated precision (Precision*) at each recall level Recall* in {0, 0.1, …, 1.0}.

Slide 22

Object Detection: Evaluation (3)
Example, category 'cat': # ground truth = 5, # predictions = 10

Interpolated precision at each recall level:

Recall*  Precision*
0.0      1.0
0.1      1.0
0.2      1.0
0.3      1.0
0.4      1.0
0.5      0.57
0.6      0.57
0.7      0.57
0.8      0.57
0.9      0.5
1.0      0.5

AP = (5 × 1.0 + 4 × 0.57 + 2 × 0.5) / 11 ≈ 0.75
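The worked example can be reproduced with a short script. This is a sketch of the PASCAL VOC 2007-style 11-point interpolated AP, applied to the slide's 'cat' example (5 ground-truth boxes, 10 ranked predictions):

```python
def ap_11point(correct, n_gt):
    """11-point interpolated Average Precision (PASCAL VOC 2007 style).

    `correct` lists, per prediction ordered by descending confidence,
    whether it was a true positive; `n_gt` is the number of ground-truth boxes."""
    tp, precisions, recalls = 0, [], []
    for rank, is_tp in enumerate(correct, start=1):
        tp += is_tp
        precisions.append(tp / rank)   # Precision = TP / (TP + FP)
        recalls.append(tp / n_gt)      # Recall = TP / (TP + FN)
    # For each recall level r in {0.0, 0.1, ..., 1.0}, take the maximum
    # precision achieved at recall >= r, then average the 11 values.
    levels = [i / 10 for i in range(11)]
    return sum(
        max((p for p, r in zip(precisions, recalls) if r >= level), default=0.0)
        for level in levels
    ) / 11

# The slide's 'cat' example: 5 ground-truth boxes, 10 ranked predictions
correct = [True, True, False, False, False, True, True, False, False, True]
ap = ap_11point(correct, n_gt=5)  # (5*1.0 + 4*0.571 + 2*0.5) / 11 ≈ 0.753
```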

Slide 23

• Object Detection and Applications
• Object Detection in Machine Learning
  • Datasets
  • Evaluation Criteria
  • Representative Architectures
• Experience with ABEJA Platform

Slide 24

Challenges We Are Facing
• Illumination
• Blur & Motion
• Occlusion
• Scale, Size, Pose, Clutter
• Deformation

Slide 25

Think Intuitively…
• Use a sliding window to go over the full image
• Crop each window and run a classifier on it
• Repeat for different window sizes
But…
• Returns multiple overlapping detections for the same object
• Far too slow
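The naive sliding-window approach above can be sketched as follows; `classify_crop` is a hypothetical per-crop classifier (any image classifier would do). The sketch also makes the speed problem obvious: the classifier runs once per window position and size.

```python
# Illustrative only: `classify_crop(x, y, w, h)` is a hypothetical classifier
# returning (label, score) for the crop at (x, y, w, h), or (None, 0.0).
def sliding_window_detect(image_w, image_h, classify_crop,
                          window_sizes=((64, 64), (128, 128)), stride=32):
    detections = []
    for w, h in window_sizes:                       # repeat for each window size
        for y in range(0, image_h - h + 1, stride): # slide over the full image
            for x in range(0, image_w - w + 1, stride):
                label, score = classify_crop(x, y, w, h)
                if label is not None:
                    detections.append((label, (x, y, w, h), score))
    return detections
```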

Slide 26

Non-Maximum Suppression (NMS)
• Start with the detection with the highest confidence score
• Measure its IoU with every other detection
• Remove detections with IoU > threshold (e.g. 0.5)
• Repeat the steps with the remaining detections
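The steps above map directly to a short greedy loop; this sketch takes boxes in corner format (x1, y1, x2, y2) together with their confidence scores:

```python
def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression.

    boxes: list of (x1, y1, x2, y2); scores: matching confidence scores.
    Returns the indices of the detections that survive."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter)

    # Process detections in order of descending confidence
    order = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    keep = []
    while order:
        best, rest = order[0], order[1:]
        keep.append(best)
        # Drop every remaining detection that overlaps the kept one too much
        order = [i for i in rest if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```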

Slide 27

Milestones of Object Detection
• Before 2012: handcrafted features
• After 2012: detectors benefit from deep CNNs (DCNNs)
https://arxiv.org/abs/1809.02165

Slide 28

Representative Object Detection Architectures
• Two-Stage Detectors
  • RCNN series
  • R-FCN
• One-Stage Detectors
  • YOLO series
  • SSD

Slide 29

RCNN / Fast RCNN / Faster RCNN
RCNN highlights:
• Region proposals for 'blob-like' regions
• CNN-based classifier
• State of the art in 2014
Problems:
• Multi-stage pipeline
• Training is very heavy
• Detection is slow (47 s/image on GPU)
https://arxiv.org/abs/1809.02165
https://arxiv.org/abs/1311.2524

Slide 30

RCNN / Fast RCNN / Faster RCNN
Fast RCNN highlights:
• Features are computed only once per image
• Multi-task loss for classification and box regression
• Faster than RCNN
Problems:
• Region proposal is still the bottleneck
https://arxiv.org/abs/1809.02165
https://arxiv.org/abs/1504.08083

Slide 31

RCNN / Fast RCNN / Faster RCNN
Faster RCNN highlights:
• Uses a CNN for region proposals (the Region Proposal Network, RPN); the rest works like Fast RCNN
• Introduces anchors
• Joint training
Problems:
• Still slow
https://arxiv.org/abs/1809.02165
https://arxiv.org/abs/1506.01497

Slide 32

R-FCN (Region-based Fully Convolutional Network)
Highlights:
• Shared RoI subnet
• Position-sensitive RoI pooling
• Faster than Faster RCNN
Problems:
• More computational cost than single-stage detectors
https://arxiv.org/abs/1809.02165
https://arxiv.org/abs/1605.06409

Slide 33

Can We Drop the Region Proposal Step?

Slide 34

YOLO (You Only Look Once)
Highlights:
• Super fast
• Uses features from the entire image
Problems:
• Weak on small objects
• Many localization errors
https://arxiv.org/abs/1809.02165
https://arxiv.org/abs/1506.02640

Slide 35

SSD (Single-Shot Detector)
Highlights:
• Uses multiple convolutional feature maps at different scales
• Accuracy competitive with Faster RCNN
• Faster than YOLO-v1
Problems:
• Poor performance on small objects
https://arxiv.org/abs/1809.02165
https://arxiv.org/abs/1512.02325

Slide 36

Which Is the Best?
Given the application and platform, trade off speed, memory, and accuracy.
Examples:
• Mobile device: small memory footprint
• Real-time application: test-time inference speed
• Server-side system: accuracy (subject to throughput constraints)

Slide 37


Configuration: Feature Extractor https://arxiv.org/pdf/1611.10012.pdf

Slide 38


Configuration: Input Image Size https://arxiv.org/pdf/1611.10012.pdf

Slide 39


mAP@Scales https://arxiv.org/pdf/1611.10012.pdf

Slide 40


Latest SOTA https://arxiv.org/abs/1811.04533

Slide 41

Before Getting Your Hands Dirty…
• Prepare a proper dataset
  • Collect good-quality images
  • Annotation work is necessary
  • Understand the data
• Clarify the deployment environment
  • Edge device / local machine / cloud
  • Real-time?
• Pick a model

Slide 42

• Object Detection and Applications
• Object Detection in Machine Learning
  • Datasets
  • Evaluation Criteria
  • Representative Architectures
• Experience with ABEJA Platform

Slide 43

A Glimpse into the ABEJA Platform
• Data
  • Accumulation
  • Management
  • Annotation
• ML/DL Model
  • Training
  • Deployment
  • Serving and Inference
  • Version Management

Slide 44

Technical Tutorials
• Sample code for classification, object detection, and semantic segmentation: https://github.com/abeja-inc/abeja-platform-samples
• Tech blogs on the ABEJA Platform: https://qiita.com/advent-calendar/2018/abejaplatform
• ABEJA's general tech blog: https://tech-blog.abeja.asia/

Slide 45

• Object Detection and Applications
• Object Detection in Machine Learning
  • Datasets
  • Evaluation Criteria
  • Representative Architectures
• Experience with ABEJA Platform

Slide 46

Ask the Speaker
After the lecture, we will be waiting at the "Ask the Speaker" corner of the exhibition area (3F Hall). If you have any questions, please come by after the session ends. See you there!

Slide 47

The products and services that support the contents introduced today behind the scenes are presented at our booth in the 3F exhibition hall. Please drop by between sessions.
Floor maps: 1F / 2F / 3F (Rooms A-E, Hall)

Slide 48

Tomorrow, many sessions will show how the technology introduced today is actually used by clients. Please come!
GO Day 2!! - for ABEJA Platform

Slide 49

Please give us feedback on this session.
Session ID: dev-e-2
Object Detector: WHY Do You Need and HOW Can You Own
Feedback will be used to develop our products and deliver more information.
https://goo.gl/forms/erEBAsrQK4XKEv352

Slide 50


Thank you.