A Short History of Computer Vision

Johannes Kolbe
November 12, 2024

This slide deck takes us through the history of Computer Vision by tracking one task through time: classifying and detecting a bird in an image. Starting from keypoint detection, moving through Convolutional Neural Networks and Transformers, and ending with modern multimodal models and their capabilities, it shows a steady development in the models' abilities.

Transcript

  1. About Me Johannes Kolbe • Data Scientist at ◦ Focus on Computer Vision ◦ Some Expertise in NLP • M.Sc. in Computer Science at TU Berlin • Automotive Systems Research Background • Conference Speaker (PyCon Africa, EuroPython, …) • Hugging Face Fellow @[email protected] linkedin.com/in/johko github.com/johko huggingface.co/johko
  2. Overview 1. From SIFT to AlexNet 2. Convolutional Neural Networks 3. Transformers 4. Multimodality 5. The Future
  3. The Bird Problem - 2024 • Classify 9000 animals • Share Photos/Videos via App • Take HD Photos/Videos • Swarovski Optik AX Visio
  4. A Short History of CNNs • 1979 Neocognitron (Kunihiko Fukushima) • 1989 LeNet-1 (Yann LeCun) • 1994 Convolutional Dynamic Systems • 1998 LeNet-5 (Yann LeCun) • 2012 AlexNet (Krizhevsky, Sutskever, Hinton) • Rumelhart, Hinton, Williams
  5. The CNN Boom • 2014 GoogLeNet (Christian Szegedy et al.) • 2014 VGG (Simonyan, Zisserman) • 2015 ResNet (Kaiming He et al.) • 2017 MobileNet (Andrew Howard et al.) • 2019 EfficientNet (Tan, Le) • 2022 ConvNeXt (Zhuang Liu et al.)
  6. CNNs for everyone! • Classification: VGG, ResNet, EfficientNet, MobileNet, ConvNeXt • Detection: Faster R-CNN, YOLO, SSD, RetinaNet • Segmentation: Mask R-CNN, DeepLab, UNet, FCN
  7. Which Domain? Every Domain! CNN • Augmented and Virtual Reality • Aerial and Satellite Imagery Analysis • Manufacturing and Industrial Automation • Security and Surveillance • Autonomous Vehicles • Medical Imaging • Multimedia Content Analysis • Social Media Analytics • Financial Services • Retail and E-commerce • Text Mining and Natural Language Processing
  8. NLP “Natural language processing (NLP) is the discipline of building machines that can manipulate human language — or data that resembles human language — in the way that it is written, spoken, and organized.” deeplearning.ai
  9. Transformers for everyone! Classification Detection Segmentation ViT Swin DeiT MAE DINO MaskFormer SAM DETR Mask DINO YOLOS Deformable DETR SegFormer
  10. CNN • Transformer • Parameter Efficiency • Global Context • Few-Shot Adaptation • Interpretability • Multimodality • Industry Adoption • Local Receptive Fields • Training Time