Image Classification Intro

Image Classification
Yasser Souri
Image Processing Lab, Computer Engineering Department, Sharif University of Technology

Outline
• Intro
  – Reconstruction, Recognition, Reorganization
• Image Classification
• Image Classification Systems
• Fine-Grained Visual Recognition

Classical View of Computer Vision
Inspired by the brain:
• Low level
  – Image formation
  – Filtering
  – Edge detection
• Mid level
  – Shape
  – Texture
  – Segmentation
• High level
  – Pattern recognition

Computer Vision as 3 R’s [1]
Recognition, Reorganization, Reconstruction
[1] Jitendra Malik, “Three R’s of Vision”, CVML 2013.

The Interconnections
Recognition, Reorganization, Reconstruction [1]

Visual Recognition
Recognition, Reorganization, Reconstruction

Goal of Recognition
• Object recognition
• Semantic segmentation
• Pose estimation
• Action recognition
• Attribute classification
Slide credit: Jitendra Malik

Recognition: Object / Scene
• Object
• Scene
New recognition tasks have recently been introduced: attributes [2], actions [3], memorability [4], popularity [5], etc.
[2] Ali Farhadi et al., “Describing Objects by Their Attributes”, CVPR 2009.
[3] Bangpeng Yao et al., “Human Action Recognition by Learning Bases of Action Attributes and Parts”, ICCV 2011.
[4] Aditya Khosla et al., “Memorability of Image Regions”, NIPS 2012.
[5] Aditya Khosla et al., “What Makes an Image Popular?”, WWW 2014.

Object Recognition: 2D / 3D
• 2D objects
• 3D objects

Object Recognition Tasks
• Object instance recognition
Slide credit: Cordelia Schmid

Object Recognition Tasks
• Object category recognition
  – Image classification: Cow ✓, Car ✓, Bike ✗, Horse ✗, …
Slide credit: Cordelia Schmid

Object Recognition Tasks
• Object category recognition
  – Object detection: Cow, (x, y, w, h); Car, (x', y', w', h')
Slide credit: Cordelia Schmid

Object Recognition Tasks
• Object instance recognition
• Object category recognition
  – Image classification
  – Object detection

Image Classification
• Given
  – Positive training images containing an object class
  – Negative training images not containing that object class
• Classify whether a test image contains the object class
Slide credit: Cordelia Schmid

Datasets
• Datasets are important in computer vision research
  – Comparing methods
  – Measuring progress
• But they have drawbacks
  – Bias [6]
  – They differ from “the goal”
[6] Antonio Torralba et al., “Unbiased Look at Dataset Bias”, CVPR 2011.

3D Image Classification Datasets
• Before 2004
  – Fewer than 10 classes, few images
• Caltech 101 (2004)
  – 101 classes, one object per image
• PASCAL VOC (2005–2012)
  – 20 classes, many objects per image
• ImageNet (2009–now)
  – More than 1000 classes!

Datasets: Before 2004
• Mostly few (fewer than 10) classes
• Low clutter and variation
• A single instance of the class present in each image
• The dataset of [7] has 7 classes (faces, buildings, trees, cars, phones, bikes, books) and 1776 images
• [8] uses 6 classes (faces, airplanes, cars (rear), cars (side), motorbikes, spotted cats) and 3821 images
[7] Gabriella Csurka et al., “Visual Categorization with Bags of Keypoints”, ECCV Workshops 2004.
[8] Rob Fergus et al., “Object Class Recognition by Unsupervised Scale-Invariant Learning”, CVPR 2003.

Caltech 101
• Introduced in [9] in 2004
• 101 widely varied classes plus a clutter class
• Images of about 200 × 300 pixels
• 9144 images in total
[9] Fei-Fei Li et al., “Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories”, CVPR Workshops 2004.

Caltech 101
• Images per category

Caltech 101
• Evaluation (varying the number of training examples)
Credit: Caltech 101 website
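The Caltech evaluation can be sketched as follows, assuming the common protocol of drawing a fixed number of training images per class, testing on the remaining images, and reporting the mean of the per-class accuracies; repeating this for several training-set sizes gives an accuracy-versus-training-size curve. The helper names `split_per_class` and `mean_per_class_accuracy` are hypothetical, not part of any Caltech toolkit.

```python
import random
from collections import defaultdict


def split_per_class(labels, n_train, seed=0):
    """Randomly pick n_train training indices per class; the rest become test indices."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train, test = [], []
    for label in sorted(by_class):
        idx = by_class[label][:]
        rng.shuffle(idx)
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return sorted(train), sorted(test)


def mean_per_class_accuracy(y_true, y_pred):
    """Average the accuracy of each class, so large classes do not dominate."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Repeating the evaluation for several n_train values (e.g. 5, 15, 30)
# gives the accuracy-vs-training-size curve.
labels = ["cat"] * 5 + ["dog"] * 3
train, test = split_per_class(labels, n_train=2)
print(len(train), len(test))  # 4 4
print(mean_per_class_accuracy(["a", "a", "a", "b"], ["a", "a", "b", "b"]))
```

In the last line overall accuracy would be 3/4, while the mean per-class accuracy is (2/3 + 1)/2 = 5/6, because the single "b" image carries as much weight as the three "a" images.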

Caltech 101
• Low clutter (makes it easy)
Credit: Antonio Torralba

Caltech 101
• Drawbacks
  – Small number of training images (fewer than 30 per class)
  – Single object per image
  – Left-right aligned
  – Rotation artifacts
• Successor: Caltech 256
Slide credit: Greg Griffin

Caltech 256
• Introduced in [10] in 2006
• Addresses most of the drawbacks of Caltech 101
Credit: Caltech 256 website
[10] Greg Griffin et al., “The Caltech 256”, Caltech Technical Report, 2006.

Caltech 256
[Figure: clutter class of Caltech 101 vs. clutter class of Caltech 256]
Slide credit: Greg Griffin

Caltech 256
• Higher variation, but still a single object per image
Slide credit: Greg Griffin

Caltech 256
• About half the performance obtained on Caltech 101
Credit: Caltech 256 website

PASCAL VOC
• PASCAL Visual Object Classes [11]
• From the UK (Oxford, Edinburgh, ...)
• Two parts
  – Public dataset
  – Yearly competition
    • Classification
    • Detection
    • Others (segmentation, action recognition, etc.)
• Updated each year (2005–2012)
[11] Mark Everingham et al., “The PASCAL Visual Object Classes (VOC) Challenge”, IJCV 2010.

PASCAL VOC 2005
• 4 classes
  – Person: person
  – Vehicle: bicycle, car, motorbike
• 2445 images containing 3348 objects
• 1.37 objects/image

PASCAL VOC 2007
• 20 classes
  – Person: person
  – Animal: bird, cat, cow, dog, horse, sheep
  – Vehicle: airplane, bicycle, boat, bus, car, motorbike, train
  – Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor
• 9963 images containing 24640 objects
• 2.47 objects/image

PASCAL VOC 2012
• 20 classes
  – Person: person
  – Animal: bird, cat, cow, dog, horse, sheep
  – Vehicle: airplane, bicycle, boat, bus, car, motorbike, train
  – Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor
• 11450 images containing 27450 objects
• 2.39 objects/image

PASCAL VOC vs. Caltech
• Caltech
  – Many categories (101–256)
  – Small number of training images
  – One object per image, centered in the image
• VOC
  – Few categories (20)
  – Many training examples
  – Objects in general, uncentered scenes

PASCAL VOC 2012 vs. Caltech 256
• tv/monitor (PASCAL VOC 2012)
• computer-monitor (Caltech 256)

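PASCAL VOC
• Evaluation
  – Compute the precision/recall curve
  – Summarize it by the Average Precision (AP), the mean interpolated precision at 11 equally spaced recall levels:
$$\mathrm{AP} = \frac{1}{11} \sum_{r \in \{0, 0.1, \ldots, 1\}} p_{\mathrm{interp}}(r), \qquad p_{\mathrm{interp}}(r) = \max_{\tilde{r} \ge r} p(\tilde{r})$$
    where p(r̃) is the measured precision at recall r̃.
[Figure: precision/recall curve (x-axis: Recall, y-axis: Precision)]
Credit: Mark Everingham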
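A minimal sketch of the 11-point interpolated AP for a single class, assuming one confidence score per test image and at least one positive image: rank the images by score, record precision and recall at every rank, then average, over the recall levels 0, 0.1, ..., 1, the best precision reached at any recall at least that high. `eleven_point_ap` is a hypothetical helper, not the official VOC devkit code.

```python
def eleven_point_ap(scores, is_positive):
    """11-point interpolated average precision (PASCAL VOC style).

    scores: classifier confidence for each test image.
    is_positive: True if the image really contains the class (needs >= 1 positive).
    """
    n_pos = sum(is_positive)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    precisions, recalls = [], []
    true_pos = 0
    for rank, i in enumerate(order, start=1):
        true_pos += int(is_positive[i])
        precisions.append(true_pos / rank)
        recalls.append(true_pos / n_pos)
    ap = 0.0
    for k in range(11):
        r = k / 10
        # p_interp(r): best precision reached at any recall >= r (0 if never reached)
        p_interp = max((p for p, rec in zip(precisions, recalls) if rec >= r), default=0.0)
        ap += p_interp / 11
    return ap


# Ranking: positive, negative, positive.
print(eleven_point_ap([0.9, 0.8, 0.7], [True, False, True]))  # ≈ 0.848 (= 28/33)
```

The interpolation makes the curve monotone: the dip in precision caused by the negative at rank 2 does not count for recall levels up to 0.5, because precision 1.0 was already reached there.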

ImageNet
• Introduced in 2009 [12] (Princeton, later Stanford)
• Based on WordNet
Slide credit: Jia Deng
[12] Jia Deng et al., “ImageNet: A Large-Scale Hierarchical Image Database”, CVPR 2009.

ImageNet
• Statistics as of April 2010
  – 21,841 synsets (classes); WordNet has about 80,000 noun synsets
  – 14,197,122 images
  – 1,034,908 images with bounding boxes
  – 50% of synsets have more than 500 images

ImageNet Diversity
Slide credit: Jia Deng

ILSVRC
• ImageNet Large Scale Visual Recognition Challenge
  – A challenge each year (2010–present)
  – Subset of ImageNet
  – Replacement for PASCAL VOC
  – 1000 object classes
  – 1,431,167 images

ILSVRC - Variety
Slide credit: Fei-Fei Li

ILSVRC - Evaluation

ILSVRC - Results

ILSVRC - Results
• Lowest error rate (1 − accuracy)
  – 2010: winner 0.2819, runner-up 0.3364
  – 2011: winner 0.2577, runner-up 0.3101
  – 2012: winner 0.1531, runner-up 0.2617
  – 2013: winner 0.1174, runner-up 0.1253
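ILSVRC classification let each entry submit five ranked guesses per image, and an image counted as an error only if none of the five matched the ground truth (top-5 error). The sketch below assumes that convention; `top_k_error` and the toy labels are hypothetical, not the official evaluation script or data.

```python
def top_k_error(ranked_guesses, true_labels, k=5):
    """Fraction of images whose true label is missing from the first k guesses."""
    misses = sum(1 for guesses, label in zip(ranked_guesses, true_labels)
                 if label not in guesses[:k])
    return misses / len(true_labels)


guesses = [["cat", "dog", "car", "bus", "cow"],
           ["tree", "boat", "sofa", "bird", "cat"],
           ["horse", "dog", "cow", "sheep", "car"]]
truth = ["cow", "train", "horse"]
print(top_k_error(guesses, truth, k=5))  # 1 of 3 images missed -> 0.333...
print(top_k_error(guesses, truth, k=1))  # 2 of 3 images missed -> 0.666...
```

Allowing several guesses avoids penalizing a correct label that is merely not the top-ranked one, which matters when an image contains several objects but has only one ground-truth label.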

Image Classification Systems
• Bag of features
• Nearest-neighbor classifier
• Spatial Pyramid Matching
• Fisher Kernel
• Deep Learning / Convolutional Neural Networks

Image Classification Systems
• Bag of features
• Spatial Pyramid Matching
• Deep Learning / Convolutional Neural Networks

Bag of features - Origin
• Think of documents (bag of words)
[Figures: word-count histograms over a dictionary of words for documents of three categories: Football, Politics, Machine Learning]
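The document analogy can be made concrete with a toy word counter: a document becomes a vector of word counts over a fixed dictionary, and word order is thrown away. The dictionary and sentences are made up for illustration, and lowercasing plus whitespace splitting stands in for real tokenization.

```python
from collections import Counter


def bag_of_words(text, dictionary):
    """Count how often each dictionary word occurs; word order is discarded."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in dictionary]


dictionary = ["goal", "match", "election", "vote", "learning", "data"]
print(bag_of_words("The match ended with a late goal and another goal", dictionary))
# [2, 1, 0, 0, 0, 0]
```

A "Football" document puts its mass on the first bins, a "Politics" document on the middle ones, and so on; the category can be read off the histogram shape without knowing where each word appeared.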

Bag of features
• Document → Image
• Word → ?

Bag of features – Visual Words
• A word can be a small patch of an image
Slide credit: Fei-Fei Li

Finding Visual Words
• Use keypoint detectors
  – SIFT
  – Harris-Affine
  – etc.

Visual Words - Issue
• Text words: frequencies are easy to calculate
• Visual words: ?

Do Visual Words Repeat?
• Do visual words repeat in natural images?
  – Texture images

Do Visual Words Repeat?
• Do visual words repeat in natural images?
  – Object images
Slide credit: Bastian Leibe

Bag of features model
• Introduced in [13]
[13] Gabriella Csurka et al., “Visual Categorization with Bags of Keypoints”, ECCV Workshops 2004.

1. Feature detection and representation
• Detect patches [Mikolajczyk and Schmid ’02] [Matas et al. ’02] [Sivic et al. ’03]
• Normalize patch
• Compute SIFT descriptor [Lowe ’99]
Slide credit: Josef Sivic

2. Codewords dictionary formation
• Vector quantization of the descriptors
Slide credit: Josef Sivic (figure: Fei-Fei et al. 2005)
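Codebook formation is usually done by clustering local descriptors. The sketch below assumes plain k-means (Lloyd's algorithm) with random initialization and runs on toy 2-D points rather than 128-D SIFT descriptors; `kmeans_codebook` is a hypothetical helper, and a real system would use an optimized library implementation on far more data.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_codebook(descriptors, k, iters=20, seed=0):
    """Lloyd's k-means: cluster descriptors and return the k cluster centers (codewords)."""
    rng = random.Random(seed)
    centers = [list(d) for d in rng.sample(descriptors, k)]  # init from random descriptors
    for _ in range(iters):
        # Assignment step: each descriptor goes to its nearest center.
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            j = min(range(k), key=lambda c: sq_dist(d, centers[c]))
            clusters[j].append(d)
        # Update step: move each center to the mean of its members.
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(coords) / len(members) for coords in zip(*members)]
    return centers


# Toy 2-D "descriptors" forming two groups (real SIFT descriptors are 128-D).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(sorted(kmeans_codebook(points, k=2)))
```

Each resulting center is one codeword; the number of clusters k is the codebook size, which is exactly the quantity the quantization-error trade-off is about.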

Image patch examples of codewords (Sivic et al. 2005)

3. Image representation
• Histogram of codeword frequencies (x-axis: codewords, y-axis: frequency)
Slide credit: Fei-Fei Li
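Step 3 can be sketched as hard assignment plus counting: each descriptor votes for its nearest codeword and the votes form the histogram. This assumes Euclidean distance and optional L1 normalization; `bof_histogram` and the two-word codebook are hypothetical.

```python
def bof_histogram(descriptors, codebook, normalize=True):
    """Bag-of-features vector: count how many descriptors fall nearest to each codeword."""
    hist = [0] * len(codebook)
    for d in descriptors:
        # Hard assignment to the nearest codeword (squared Euclidean distance).
        nearest = min(range(len(codebook)),
                      key=lambda c: sum((x - y) ** 2 for x, y in zip(d, codebook[c])))
        hist[nearest] += 1
    if normalize and descriptors:
        # L1-normalize so images with different numbers of features are comparable.
        hist = [h / len(descriptors) for h in hist]
    return hist


codebook = [(0.0, 0.0), (10.0, 10.0)]  # hypothetical 2-word codebook
image_descriptors = [(1.0, 1.0), (0.0, 2.0), (9.0, 9.0)]
print(bof_histogram(image_descriptors, codebook, normalize=False))  # [2, 1]
```

Note that the positions of the descriptors never enter the computation, which is the source of both the invariance and the lost spatial information discussed next.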

Bag of features
Slide credit: Tom Funkhouser

Bag of features - Issues
• Spatial information is lost
  – Good: invariance
  – Bad: every spatial arrangement of the same visual words is treated as equally likely

Bag of features - Issues
• Quantization error
  – To obtain a compact representation (histogram), codebooks are kept small
  – This lowers the discriminative power of the descriptors
  – O(10^6) visual words → O(10^2) codewords
  – Highly frequent words have low discriminative power [14]
[14] Oren Boiman et al., “In Defense of Nearest-Neighbor Based Image Classification”, CVPR 2008.

Bag of features - Issues
• Quantization error
  – Bin density is long-tailed [14]
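The effect of codebook size on quantization error can be checked numerically. This sketch assumes random 8-D vectors as stand-ins for descriptors and, instead of k-means, uses nested random samples as codebooks so that a larger codebook always contains the smaller one; it only illustrates that coarser codebooks discard more information, and `mean_quantization_error` is a hypothetical helper.

```python
import random


def mean_quantization_error(descriptors, codebook):
    """Mean squared distance from each descriptor to its nearest codeword."""
    total = 0.0
    for d in descriptors:
        total += min(sum((x - y) ** 2 for x, y in zip(d, c)) for c in codebook)
    return total / len(descriptors)


rng = random.Random(0)
# Hypothetical stand-ins for local descriptors: 500 random 8-D vectors.
descriptors = [[rng.random() for _ in range(8)] for _ in range(500)]
# Nested codebooks: the first k vectors of one random sample, so every larger
# codebook contains the smaller ones and the error can only go down.
sample = rng.sample(descriptors, 256)
errors = {k: mean_quantization_error(descriptors, sample[:k]) for k in (4, 16, 64, 256)}
for k, err in errors.items():
    print(f"codebook size {k:3d}: mean squared quantization error {err:.4f}")
```

Every descriptor mapped to the same codeword becomes indistinguishable in the histogram, so the larger errors of small codebooks translate directly into lost discriminative power.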

Spatial Pyramid Matching [15]
[15] Svetlana Lazebnik et al., “Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories”, CVPR 2006.
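A sketch of the spatial pyramid representation, assuming feature positions normalized to [0, 1) and codewords already assigned: level l splits the image into a 2^l × 2^l grid, the per-cell histograms are weighted as in [15] (1/2^L for level 0, 1/2^(L−l+1) for level l ≥ 1) and concatenated, and histogram intersection on the concatenated vectors then equals the pyramid match kernel. The function names are hypothetical, and any further normalization used in [15] is omitted.

```python
def spm_vector(points, words, vocab_size, levels=2):
    """Concatenated, weighted spatial-pyramid histogram.

    points: (x, y) feature positions normalized to [0, 1); words: codeword index of each feature.
    Level l splits the image into a 2^l x 2^l grid; level 0 is weighted 1/2^L and
    level l >= 1 is weighted 1/2^(L - l + 1).
    """
    L = levels
    vec = []
    for l in range(L + 1):
        cells = 2 ** l
        weight = 1 / 2 ** L if l == 0 else 1 / 2 ** (L - l + 1)
        hist = [0.0] * (cells * cells * vocab_size)
        for (x, y), w in zip(points, words):
            cx = min(int(x * cells), cells - 1)  # grid column of this feature
            cy = min(int(y * cells), cells - 1)  # grid row of this feature
            hist[(cy * cells + cx) * vocab_size + w] += weight
        vec.extend(hist)
    return vec


def intersection_kernel(a, b):
    """Histogram intersection; on these weighted vectors it equals the pyramid match kernel."""
    return sum(min(x, y) for x, y in zip(a, b))


img1 = spm_vector([(0.1, 0.1), (0.9, 0.2), (0.4, 0.8)], [0, 1, 0], vocab_size=2)
img2 = spm_vector([(0.9, 0.9), (0.1, 0.8), (0.6, 0.1)], [0, 1, 0], vocab_size=2)
print(len(img1))                        # 42 = 2 words x (1 + 4 + 16) cells
print(intersection_kernel(img1, img1))  # 3.0: all three features match at every level
print(intersection_kernel(img1, img2))  # 0.75: same words, but they match only at level 0
```

The two toy images have identical bag-of-features histograms (level 0), yet the pyramid kernel separates them because their words occupy different grid cells; this is also why a changed background can "pollute" the finer levels.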

SPM - Issues
• Background pollution
  – Same object, different backgrounds
• Mainly appropriate for scene recognition
• Still improves object recognition performance

Features in Visual Recognition
• Why are humans so good at visual recognition,
• but machines are not?
• Research [16, 17] has shown that features are the weak spot of computer vision
[16] Devi Parikh et al., “The Role of Features, Algorithms and Data in Visual Recognition”, CVPR 2010.
[17] Xiangxin Zhu et al., “Do We Need More Training Data or Better Models for Object Detection?”, BMVC 2012.

Deep Learning / Feature Learning
• As a black box for feature extraction
• An SVM on these features achieves state-of-the-art results on several datasets [18]
[18] Ali Sharif Razavian et al., “CNN Features off-the-shelf: an Astounding Baseline for Recognition”, CVPR Workshops 2014.
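The "SVM on off-the-shelf features" recipe can be sketched with a linear SVM trained by Pegasos-style stochastic subgradient descent on the regularized hinge loss. This assumes the CNN features are already extracted (replaced here by hypothetical 2-D vectors) and that the SVM is linear; the solver is an illustrative choice, not necessarily the one used in [18], and `train_linear_svm` / `predict` are hypothetical names.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss.

    X: feature vectors (e.g. precomputed CNN activations); y: labels in {-1, +1}.
    A constant 1.0 is appended to every vector so the bias is learned as a weight.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularizer shrinks w
            if margin < 1.0:  # hinge loss active: move toward y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w, x):
    """Sign of the linear score, using the same appended constant feature."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


X = [(2.0, 2.0), (3.0, 1.0), (1.0, 3.0), (-2.0, -2.0), (-3.0, -1.0), (-1.0, -3.0)]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

For a multi-class dataset the same classifier would be trained once per class in a one-vs-rest fashion, matching the positive/negative setup of image classification described earlier.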

CNN Off-the-shelf
• Results
  – http://www.csc.kth.se/cvap/cvg/DL/ots/

Next time, very soon, inshallah!

Miserable Life of an Image Classifier
What is the positive class? Hint: it is not person.
Slide credit: Jitendra Malik

Thank you
Summer 2014
