Image Classification Intro

Description of the Problem and its datasets

Yasser Souri

July 09, 2014

Transcript

  1. Outline
    •  Intro – Reconstruction, Recognition, Reorganization
    •  Image Classification
    •  Image Classification Systems
    •  Fine-Grained Visual Recognition
    Image Processing Lab - Sharif
  3. Classical View of Computer Vision (inspired by the brain)
    •  Low level
      –  Image Formation
      –  Filtering
      –  Edge Detection
    •  Mid level
      –  Shape
      –  Texture
      –  Segmentation
    •  High level
      –  Pattern Recognition
  4. Computer Vision as 3 R’s [1]
    Recognition, Reorganization, Reconstruction
    [1] Jitendra Malik, Three R’s of Vision, CVML 2013
  9. The Interconnections [1]
    Recognition, Reorganization, Reconstruction
    [1] Jitendra Malik, Three R’s of Vision, CVML 2013
  10-13. Goal of Recognition
    •  Object Recognition
    •  Semantic Segmentation
    •  Pose Estimation
    •  Action Recognition
    •  Attribute Classification
    Slide Credit: Jitendra Malik
  14. Recognition: Object / Scene
    •  Object
    •  Scene
    Recently, new recognition tasks have been introduced: Attribute [2], Action [3], Memorability [4], Popularity [5], etc.
    [2] Ali Farhadi, et al, Describing objects by their attributes, CVPR 2009
    [3] Bangpeng Yao, et al, Human action recognition by learning bases of action attributes and parts, ICCV 2011
    [4] Aditya Khosla, et al, Memorability of Image Regions, NIPS 2012
    [5] Aditya Khosla, et al, What makes an image popular?, WWW 2014
  15. Object Recognition: 2D / 3D
    •  2D Objects
    •  3D Objects
  16. Object Recognition Tasks
    •  Object Category Recognition
      –  Image Classification
    Example output: Cow: ✓  Car: ✓  Bike: ✗  Horse: ✗ …
    Slide Credit: Cordelia Schmid
  17. Object Recognition Tasks
    •  Object Category Recognition
      –  Object Detection
    Example output: Cow, (x, y, w, h); Car, (x’, y’, w’, h’)
    Slide Credit: Cordelia Schmid
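
The two task outputs shown on the classification and detection slides can be sketched as data structures. This is only an illustration: the `Detection` class, the helper name, and the box coordinates are hypothetical and do not come from the deck. It shows that a detection result (label plus box) carries enough information to derive the classification result (per-class presence flags), but not the other way round.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Detection:
    """One detected object: a class label plus a box (x, y, w, h) in pixels."""
    label: str
    x: float
    y: float
    w: float
    h: float


def classification_from_detections(
    detections: List[Detection], classes: List[str]
) -> Dict[str, bool]:
    """Collapse detections into per-class presence flags (the classification output)."""
    present = {d.label for d in detections}
    return {name: name in present for name in classes}


# Hypothetical boxes for an image containing a cow and a car.
detections = [
    Detection("cow", 40, 60, 120, 90),
    Detection("car", 200, 80, 150, 70),
]
labels = classification_from_detections(detections, ["cow", "car", "bike", "horse"])
print(labels)
```
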
  18. Object Recognition Tasks
    •  Object Instance Recognition
    •  Object Category Recognition
      –  Image Classification
      –  Object Detection
  22. Image Classification
    •  Given
      –  Positive training images containing an object class
      –  Negative training images not containing that object class
    •  Classify whether a test image contains the object class
    Slide Credit: Cordelia Schmid
  23. Datasets
    •  Datasets are important in Computer Vision Research
      –  Comparing Methods
      –  Progress
    •  But they have some drawbacks
      –  Bias [6]
      –  Differ from “the goal”
    [6] Antonio Torralba, et al, Unbiased look at dataset bias, CVPR 2011
  24. 3D Image Classification Datasets
    •  Before 2004 – fewer than 10 classes, few images
    •  Caltech 101 – 2004 – 101 classes, one object per image
    •  PASCAL VOC – 2005 – 2012 – 20 classes, many objects per image
    •  Imagenet – 2009 – now – More than 1000 classes!
  25. Datasets: Before 2004
    •  Mostly few (fewer than 10) classes
    •  Low clutter and variation
    •  Single instance of the class present in each image
    •  Dataset of [7] has 7 classes: faces, buildings, trees, cars, phones, bikes, books
      –  1776 images
    •  [8] uses 6 classes: faces, airplanes, cars (rear), cars (side), motorbikes, spotted cats
      –  3821 images
    [7] Gabriella Csurka, et al, Visual Categorization with Bags of Keypoints, ECCV Workshops 2004
    [8] Rob Fergus, et al, Object Class Recognition by Unsupervised Scale-Invariant Learning, CVPR 2003
  26. Caltech 101
    •  Introduced with [9] in 2004
    •  101 widely varied classes + clutter class
    •  Images ~ 200 x 300 pixels
    •  Total of 9144 images
    [9] Fei-Fei Li, et al, Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories, CVPR Workshops 2004
  27. Caltech 101
    •  Evaluation (vary number of training examples)
    Credit: Caltech 101 Website
  28. Caltech 101
    •  Low clutter (makes it easy)
    Credit: Antonio Torralba
  29. Caltech 101
    •  Drawbacks
      –  Small number of training images (< 30)
      –  Single object per image
      –  Left-right aligned
      –  Rotation artifacts
    •  Caltech 256
    Slide Credit: Greg Griffin
  30. Caltech 256
    •  Introduced with [10] in 2006
    •  Without the drawbacks of Caltech 101
    Credit: Caltech 256 Website
    [10] Greg Griffin, et al, The Caltech 256, Caltech Technical Report, 2006
  31. Caltech 256
    [Figure: clutter category examples, panels “101 clutter” and “256 clutter”]
    Slide Credit: Greg Griffin
  32. Caltech 256
    •  Higher variation, still a single object per image
    Slide Credit: Greg Griffin
  33. Caltech 256
    •  Half the performance (compared with Caltech 101)
    Credit: Caltech 256 Website
  34. PASCAL VOC
    •  PASCAL Visual Object Classes [11]
    •  From the UK (Oxford, Edinburgh, ...)
    •  Two parts
      –  Public dataset
      –  Yearly competition
        •  Classification
        •  Detection
        •  Others (segmentation, action recognition, etc)
    •  Updated each year (2005 – 2012)
    [11] Mark Everingham, et al, The PASCAL Visual Object Classes (VOC) Challenge, IJCV, 2010
  35. PASCAL VOC 2005
    •  4 classes
      –  Person: person
      –  Vehicle: bicycle, car, motorbike
    •  2445 images, containing 3348 objects
    •  1.37 objects/image
  36. PASCAL VOC 2007
    •  20 classes
      –  Person: person
      –  Animal: bird, cat, cow, dog, horse, sheep
      –  Vehicle: airplane, bicycle, boat, bus, car, motorbike, train
      –  Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor
    •  9963 images, containing 24640 objects
    •  2.47 objects/image
  37. PASCAL VOC 2012
    •  20 classes
      –  Person: person
      –  Animal: bird, cat, cow, dog, horse, sheep
      –  Vehicle: airplane, bicycle, boat, bus, car, motorbike, train
      –  Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor
    •  11450 images, containing 27450 objects
    •  2.39 objects/image
  38. PASCAL VOC vs Caltech
    •  Caltech
      –  Many categories (101 - 256)
      –  Small number of training images
      –  1 object/image, centered in the image
    •  VOC
      –  Few categories (20)
      –  Many training examples
      –  Objects in general images
  39. PASCAL VOC
    •  Evaluation
      –  Make Precision/Recall Curve
      –  Average Precision (AP)
    $$AP = \frac{1}{11} \sum_{rec \in \{0, 0.1, \ldots, 1\}} \mathrm{Per}(rec)$$
    [Figure: precision/recall curve; x-axis Recall, y-axis Precision, both from 0 to 1]
    Credit: Mark Everingham
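
The 11-point AP sum on the slide above can be sketched in a few lines. This is a minimal sketch under one assumption that the transcript does not state: the term inside the sum is read as the interpolated precision used in the VOC 2007 protocol, that is, the highest precision reached at any recall greater than or equal to the given level. The function name and the toy scores are made up for illustration.

```python
def eleven_point_ap(scores, labels):
    """11-point interpolated average precision for a single class.

    scores: classifier confidences, one per test image.
    labels: 1 (or True) if the image contains the class, else 0.
    """
    # Rank test images from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(1 for _, y in ranked if y)
    if total_pos == 0:
        return 0.0

    precisions, recalls = [], []
    true_pos = 0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y:
            true_pos += 1
        precisions.append(true_pos / rank)
        recalls.append(true_pos / total_pos)

    # Average the interpolated precision over recall levels 0, 0.1, ..., 1.
    ap = 0.0
    for step in range(11):
        level = step / 10
        candidates = [p for p, r in zip(precisions, recalls) if r >= level]
        ap += max(candidates) if candidates else 0.0
    return ap / 11


perfect = eleven_point_ap([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
reversed_ranking = eleven_point_ap([0.9, 0.8, 0.3, 0.1], [0, 0, 1, 1])
print(perfect, reversed_ranking)
```

A ranking that places every positive image first scores 1.0; in the reversed toy ranking the best precision at any recall level is 2/4, so the AP drops to 0.5.
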
  40. Imagenet
    •  Introduced in 2009 [12] from Stanford
    •  Based on WordNet
    Slide Credit: Jia Deng
    [12] Jia Deng, et al, Imagenet: A Large-Scale Hierarchical Image Database, CVPR 2009
  41. Imagenet
    •  Statistics from April 2010
      –  21841 synsets (classes)
      –  WordNet has 80,000+ noun synsets
      –  14,197,122 images
      –  1,034,908 images with bounding boxes
      –  50% of synsets have more than 500 images
  42. ILSVRC
    •  Imagenet Large Scale Visual Recognition Challenge
      –  A challenge each year (2010 – current)
      –  Subset of Imagenet
      –  PASCAL VOC replacement
      –  1000 object classes
      –  1,431,167 images
  43. ILSVRC - Results
    •  Lowest error rate (1 - Accuracy)
      –  2010: Winner: 0.2819, Runner-up: 0.3364
      –  2011: Winner: 0.2577, Runner-up: 0.3101
      –  2012: Winner: 0.1531, Runner-up: 0.2617
      –  2013: Winner: 0.1174, Runner-up: 0.1253
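
The error rates above are easier to compare once the year-over-year change is computed. The sketch below only does arithmetic on the numbers from the results slide; the variable and function names are illustrative.

```python
# Error rates (1 - accuracy) copied from the results slide: (winner, runner-up).
results = {
    2010: (0.2819, 0.3364),
    2011: (0.2577, 0.3101),
    2012: (0.1531, 0.2617),
    2013: (0.1174, 0.1253),
}


def winner_drop(results):
    """Absolute drop in the winner's error rate relative to the previous year."""
    years = sorted(results)
    return {
        year: results[prev][0] - results[year][0]
        for prev, year in zip(years, years[1:])
    }


drops = winner_drop(results)
for year in sorted(drops):
    winner, runner_up = results[year]
    print(f"{year}: winner error fell by {drops[year]:.4f}, "
          f"gap to runner-up {runner_up - winner:.4f}")
```

The largest single-year drop in the winner's error (0.2577 to 0.1531) and the widest gap to the runner-up both fall in 2012.
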
  46. Image Classification Systems
    •  Bag of features
    •  Nearest neighbor classifier
    •  Spatial Pyramid Matching
    •  Fisher Kernel
    •  Deep Learning / Convolutional Neural Networks
  47. Image Classification Systems
    •  Bag of features
    •  Spatial Pyramid Matching
    •  Deep Learning / Convolutional Neural Networks
  48-50. Bag of features - Origin
    •  Think of documents (Bag of Words)
    [Figure: word-count histograms over a dictionary of words, one per document category: Football, Politics, Machine Learning]
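
The document analogy can be made concrete with a small sketch: a document is reduced to a histogram of counts over a fixed dictionary, and word order is discarded. The dictionary and the sentence are invented for illustration and are not taken from the slides.

```python
from collections import Counter


def bag_of_words(document, dictionary):
    """Count how often each dictionary word occurs in the document."""
    counts = Counter(document.lower().split())
    return [counts[word] for word in dictionary]


# Hypothetical dictionary mixing football, politics and machine learning terms.
dictionary = ["goal", "match", "vote", "election", "kernel", "gradient"]
doc = "The match ended after a late goal and a second goal"
hist = bag_of_words(doc, dictionary)
print(hist)  # prints [2, 1, 0, 0, 0, 0]
```

A classifier would then see only this histogram; a football report and a politics article differ in which bins are filled.
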
  51. Bag of features – Visual Words
    •  A word can be a small patch of an image
    Slide Credit: Fei-Fei Li
  52. Visual Words - Issue
    •  Text words: easy to calculate frequencies
    •  Visual words?
  53. Do Visual Words Repeat
    •  Do visual words repeat in natural images?
      –  Texture images
  54. Do Visual Words Repeat
    •  Do visual words repeat in natural images?
      –  Object images
    Slide Credit: Bastian Leibe
  55. Bag of features model
    •  Introduced in [13]
    [13] Gabriella Csurka, et al, “Visual Categorization with Bags of Keypoints”, ECCV Workshops 2004
  56. 1. Feature detection and representation
    •  Detect patches [Mikolajczyk and Schmid ’02] [Matas et al. ’02] [Sivic et al. ’03]
    •  Normalize patch
    •  Compute SIFT descriptor [Lowe ’99]
    Slide credit: Josef Sivic
  57. Bag of features - Issues
    •  Spatial info is lost
      –  Good: invariance
      –  Bad: all spatial arrangements of the same features are treated as equally probable
  58. Bag of features - Issues
    •  Quantization Error
      –  To obtain a compact representation (histogram)
      –  Small size of codebooks
      –  Results in lower discriminative power of descriptors
      –  O(10^6) visual words → O(10^2) code-words
      –  Highly frequent words have low discriminative power [14]
    [14] Oren Boiman, et al, “In Defense of Nearest-Neighbor Based Image Classification”, CVPR 2008
  59. Bag of features - Issues
    •  Quantization Error
      –  Bin density is long-tailed [14]
    [14] Oren Boiman, et al, “In Defense of Nearest-Neighbor Based Image Classification”, CVPR 2008
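
The quantization step discussed in these slides can be sketched as follows: each local descriptor is replaced by the index of its nearest code-word, and the image becomes a normalized histogram over the codebook. This is a toy sketch, not the pipeline of [13]: the 2-D descriptors and the three-entry codebook are invented (real descriptors such as SIFT are 128-dimensional, and codebooks are typically learned by clustering, for example k-means).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(descriptor, codebook):
    """Index of the nearest code-word; all finer detail is discarded here."""
    return min(
        range(len(codebook)),
        key=lambda k: squared_distance(descriptor, codebook[k]),
    )


def bof_histogram(descriptors, codebook):
    """Normalized histogram of code-word assignments for one image."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[quantize(d, codebook)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Hypothetical 2-D codebook and descriptors, chosen only to keep the example small.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
descriptors = [[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [4.0, 6.0]]
hist = bof_histogram(descriptors, codebook)
print(hist)  # prints [0.25, 0.5, 0.25]
```

The descriptors `[0.9, 1.2]` and `[1.1, 0.8]` fall into the same bin although they differ, which is the quantization error the slides refer to; a smaller codebook merges more descriptors in this way.
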
  61. Spatial Pyramid Matching [15]
    [15] Svetlana Lazebnik, et al, “Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories”, CVPR 2006
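
The transcript keeps only the title of the spatial pyramid slides, so the following is a hedged sketch of the general idea rather than the exact method of [15]: the image is divided into 1x1, 2x2, 4x4, ... grids, a visual-word histogram is computed in every cell, and all histograms are concatenated. The function name, its parameters, and the toy points are assumptions, and the per-level weighting used in the paper is left out for brevity.

```python
def spatial_pyramid(points, num_words, width, height, levels=2):
    """Concatenate per-cell visual-word histograms over a pyramid of grids.

    points: list of (x, y, word_index) for the local features of one image.
    Level l splits the image into 2**l x 2**l cells.
    """
    feature = []
    for level in range(levels + 1):
        cells = 2 ** level
        hists = [[0] * num_words for _ in range(cells * cells)]
        for x, y, word in points:
            # Clamp so that points on the right or bottom border stay in range.
            col = min(int(x * cells / width), cells - 1)
            row = min(int(y * cells / height), cells - 1)
            hists[row * cells + col][word] += 1
        for h in hists:
            feature.extend(h)
    return feature


# Three hypothetical features in a 100 x 100 image, with a 2-word codebook.
points = [(10, 10, 0), (90, 10, 1), (90, 90, 1)]
feature = spatial_pyramid(points, num_words=2, width=100, height=100, levels=1)
print(feature)  # prints [1, 2, 1, 0, 0, 1, 0, 0, 0, 1]
```

The first two entries are the plain bag-of-features histogram (level 0); the remaining eight record where in the image each word occurred, which restores part of the spatial information that the orderless histogram discards.
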
  64. SPM - Issues
    •  Background pollution
      –  Same object, different backgrounds
    •  Only appropriate for scene recognition
    •  Still improves object recognition performance
  66. Features in Visual Recognition
    •  Why are humans so good at visual recognition
    •  But machines are not?
    •  Research [16, 17] has shown that features are the weak spot of computer vision
    [16] Devi Parikh, et al, “The Role of Features, Algorithms and Data in Visual Recognition”, CVPR 2010
    [17] Xiangxin Zhu, et al, “Do We Need More Training Data or Better Models for Object Detection?”, BMVC 2012
  67. Deep Learning / Feature Learning
    •  As a black box for feature extraction
    •  An SVM on these features achieves state-of-the-art results on several datasets [18]
    [18] Ali Sharif Razavian, et al, CNN Features off-the-shelf: an Astounding Baseline for Recognition, CVPR Workshops 2014
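
The "features from a black box, then an SVM" recipe can be illustrated without any deep learning library. In this sketch the feature vectors are hypothetical 2-D stand-ins for CNN activations (real ones have thousands of dimensions), and the trainer is a plain sub-gradient descent on the hinge loss, chosen for brevity; it is not the setup used in [18].

```python
def train_linear_svm(features, labels, epochs=50, lr=0.1, reg=0.01):
    """Linear SVM trained by sub-gradient descent on the regularized hinge loss.

    features: list of feature vectors (here: stand-ins for CNN activations).
    labels: +1 for the positive class, -1 for the negative class.
    """
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Shrink the weights (L2 regularization).
            w = [wi * (1.0 - lr * reg) for wi in w]
            # Move towards the example only if it violates the margin.
            if margin < 1.0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the decision boundary."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical precomputed features: two positive and two negative images.
features = [[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]]
labels = [1, 1, -1, -1]
w, b = train_linear_svm(features, labels)
print([predict(w, b, x) for x in features])
```

Only the classifier is trained here; the feature extractor is treated as fixed, which is what "black box" means on the slide.
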
  70. Miserable life of an Image Classifier
    What is the positive class? Hint: it is not person.
    Slide Credit: Jitendra Malik