Tic-tac defect detection is based on a use case we developed to automate pill production quality control and save a pharmaceutical company millions of euros per year. When a pill press breaks, it begins to produce defective pills, and the fault can take hours or sometimes days to recognize. With each pill press producing hundreds of pills per second, the monetary stakes of potentially losing days of product are high. To address the issue, we prototyped a deep learning pipeline that uses transfer learning to detect defective pills. One particular challenge was to avoid false positives, because they lead to unnecessary and costly production disruptions. Some false positives were caused by noise in images captured under practical but imperfect conditions (e.g. dust, unwanted reflections, CCD shot noise). Our pretrained lower layers (InceptionV3) were not as effective as we needed them to be at extracting features that could differentiate dust from pill defects, and we did not have enough labelled data to retrain them. We instead show how pre-processing techniques with Gaussian mixture models, autoencoders, and non-local means can be used to improve the quality of our transfer-learning-based image recognition pipeline. Cloud platforms allowed the team to quickly and cheaply explore and compare a wide range of machine learning techniques, leading to an effective working prototype within a short development cycle.



February 28, 2018


  1. PILL/CANDY DEFECT DETECTION Alex Loosley, Data Reply GmbH


  5. the use case

  6. Vicky Qian, Peter Schaab, Wadim Pessin, Johannes Oberreuter

  7. NOT ALL PILLS ARE CREATED PERFECT • quality control in pill production • broken pill presses • image classification
  8. SETUP • Monitor tic-tac pills after production in real time • 2 high-speed cameras • Pill classifier
  11. CLASSIFYING PILLS • input → deep learning → output (non-defective or defective)

  12. WHY DEEP LEARNING? • automatic feature extraction • robustness to variation • reusability

  13. CONVOLUTIONAL NEURAL NETWORK • emulate visual cortex • translationally invariant • exploit spatial correlations • features increasingly non-local (source: Wikipedia)
  14. CONVOLUTIONAL NEURAL NETWORK Typical Numbers • 5 convolutional layers, 3 layers in top neural network • 500,000 neurons • 50,000,000 parameters • ~1 week to train (GPUs) • features increasingly non-local (source: http://learning.eng.cam.ac.uk/pub/Public/Turner/Teaching/ml-lecture-3-slides.pdf)
  15. TRANSFER LEARNING • pretrained neural network (general information on edges & shapes) + labelled dataset • Tic-tac classifier: • Pretrained InceptionV3 network • GPU instance in Google Cloud
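
A minimal sketch of the transfer-learning setup named on this slide, assuming a Keras/TensorFlow workflow (the deck does not say which framework was used). The function name `build_pill_classifier`, the single sigmoid output, the pooling layer, and the optimizer are illustrative assumptions, not the authors' configuration.

```python
def build_pill_classifier(input_shape=(299, 299, 3)):
    """Return a binary defective/non-defective classifier on a frozen InceptionV3 base.

    Hypothetical helper: the head architecture and training settings are
    assumptions made for illustration only.
    """
    # Imported lazily so that this module can be loaded without TensorFlow installed.
    import tensorflow as tf

    # Pretrained lower layers: general information on edges and shapes.
    base = tf.keras.applications.InceptionV3(
        weights="imagenet", include_top=False, input_shape=input_shape
    )
    base.trainable = False  # too little labelled data to retrain these layers

    # Small trainable head fitted on the labelled pill dataset.
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # P(defective)
    ])
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=[tf.keras.metrics.AUC(name="auc")],
    )
    return model
```

Freezing the base means only the small head is trained, which is what keeps the labelled-data and compute requirements low.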

  17. WHY GOOGLE CLOUD? (AND CLOUD ML) GPUs as a service: • fail fast (10x speed-up) • cost efficient • full modeling control • flexible • buy it: $4,349.99 • use it: $0.783/h • train it: $0.30 [Chart: s/epoch, CPU vs. NVIDIA Tesla K80, for 3 layers and 4 layers]
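
The buy/use/train figures on this slide can be put side by side with a little arithmetic. The sketch below assumes that all three prices refer to the same GPU and that "train it" is the rental cost of a single training run; both readings are assumptions, since the slide only lists the numbers.

```python
# Figures quoted on the slide.
PURCHASE_PRICE_USD = 4349.99    # "buy it"
RENTAL_RATE_USD_PER_H = 0.783   # "use it"
TRAINING_COST_USD = 0.30        # "train it"


def break_even_hours(purchase_price, hourly_rate):
    """Rental hours at which renting has cost as much as buying."""
    return purchase_price / hourly_rate


def rental_hours_for(cost, hourly_rate):
    """Rental time that a given spend buys at the hourly rate."""
    return cost / hourly_rate


if __name__ == "__main__":
    hours = break_even_hours(PURCHASE_PRICE_USD, RENTAL_RATE_USD_PER_H)
    minutes = 60 * rental_hours_for(TRAINING_COST_USD, RENTAL_RATE_USD_PER_H)
    print(f"Renting matches the purchase price after about {hours:.0f} GPU hours")
    print(f"A $0.30 run corresponds to about {minutes:.0f} minutes of GPU time")
```

Under these assumptions, renting only becomes more expensive than buying after several thousand GPU hours, which is the sense in which the service is cost efficient for short prototyping cycles.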
  18. CLASSIFYING PILLS • non-defective: true negative • defective: true positive • defective pills • non-defective pills (the vast majority)
  19. IMAGE DENOISING • clear image • noisy image

  21. AUTOENCODER • noisy image → encoder → compressed encoding → decoder → reconstructed image (Source: Benjamin Irving)
  22. IMAGE DENOISING • clear image: AUC = 0.99 • noisy image: AUC = 0.89 • denoising with autoencoder: AUC = 0.91
  23. NON-LOCAL MEANS • replaces the value of a pixel by an average of a selection of other pixels' values (Source: scikit-image)
  24. IMAGE DENOISING • clear image: AUC = 0.99 • noisy image: AUC = 0.89 • denoising with autoencoder: AUC = 0.91 • denoising with non-local means: AUC = 0.97
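
For readers unfamiliar with the metric used to compare the denoising variants, the following sketch computes the area under the ROC curve from its rank interpretation: the probability that a randomly chosen defective pill receives a higher score than a randomly chosen non-defective one. The function name `roc_auc` and the example labels and scores are made up for illustration and are not data from the talk.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve for binary labels (1 = defective, 0 = non-defective).

    Counts, over all (defective, non-defective) pairs, how often the defective
    pill gets the higher score; ties count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one example of each class")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Illustrative values only.
    example_labels = [0, 0, 0, 1, 1]
    example_scores = [0.1, 0.4, 0.8, 0.7, 0.9]
    print(roc_auc(example_labels, example_scores))
```

An AUC of 1.0 means every defective pill outranks every non-defective one, and 0.5 corresponds to random scoring. Because it does not depend on a single decision threshold, it suits a dataset where non-defective pills are the vast majority.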
  25. FINAL RESULTS • image with or without dust → denoising → classification: 0.97 [Histogram: # of test pills vs. probability]
  26. CONCLUSION • Preprocessing improves classification results with transfer learning • Cheap to train/deploy with Google Cloud Platform (and Cloud ML) • Transfer learning saves time and training compute resources • Pills taste better with Maple Syrup
  27. THANK YOU www.reply.com


  30. CONVOLUTIONAL NEURAL NETWORK • emulate visual cortex • translationally invariant • exploit spatial correlations • receptive field • features increasingly non-local (source: Wikipedia)
  31. NON-LOCAL MEANS • Unlike “local mean” filters, which take the mean value of a group of pixels surrounding a target pixel to smooth the image, non-local means filtering takes a mean of all the pixels in the image, weighted by how similar these pixels are to the target pixel.
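
The description on this slide can be sketched in plain Python. This is an unoptimized illustration of the idea, not the scikit-image implementation cited earlier (available there as `skimage.restoration.denoise_nl_means`). The function name `nl_means`, the patch-based similarity measure, and the default values of `patch_radius` and `h` are assumptions chosen for the example.

```python
import math


def nl_means(image, patch_radius=1, h=0.1):
    """Denoise a 2D grayscale image (list of rows of floats) with non-local means.

    Each output pixel is a weighted mean of ALL pixels in the image. The weight
    of a pixel depends on how similar its surrounding patch is to the patch
    around the target pixel; h controls how fast weights decay with distance.
    """
    rows, cols = len(image), len(image[0])

    def pixel(r, c):
        # Clamp coordinates so that patches at the border reuse edge pixels.
        r = min(max(r, 0), rows - 1)
        c = min(max(c, 0), cols - 1)
        return image[r][c]

    def patch(r, c):
        offsets = range(-patch_radius, patch_radius + 1)
        return [pixel(r + dr, c + dc) for dr in offsets for dc in offsets]

    patches = [[patch(r, c) for c in range(cols)] for r in range(rows)]
    result = [[0.0] * cols for _ in range(rows)]

    for r in range(rows):
        for c in range(cols):
            target = patches[r][c]
            weight_sum = 0.0
            value_sum = 0.0
            for rr in range(rows):
                for cc in range(cols):
                    other = patches[rr][cc]
                    # Mean squared difference between the two patches.
                    dist2 = sum((a - b) ** 2 for a, b in zip(target, other)) / len(target)
                    weight = math.exp(-dist2 / (h * h))
                    weight_sum += weight
                    value_sum += weight * image[rr][cc]
            # weight_sum >= 1 because each patch matches itself with weight 1.
            result[r][c] = value_sum / weight_sum
    return result
```

Because dissimilar patches receive almost no weight, an isolated speck is averaged away by the many clean patches that resemble its surroundings, while sharp edges are left largely intact. Practical implementations usually restrict the comparison to a search window around each pixel to keep the cost manageable.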
  32. SUGGESTED DECISION WINDOW • 120 pills/s • count 7200 pills (1 minute) • FPR: 0.2% • TPR: 62% • 4 broken/s • 160 pills/minute detected • 15 pills/minute expected • chance for false alarm: < 10^-10
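
The numbers on this slide can be reproduced approximately with a short calculation. The sketch below assumes that each pill is classified independently, so that the number of flagged pills in a window follows a binomial distribution. The alarm threshold of 80 flagged pills is hypothetical: the slide does not state which threshold was used.

```python
import math

# Figures quoted on the slide.
PILLS_PER_SECOND = 120
WINDOW_SECONDS = 60
FPR = 0.002   # false positive rate of the classifier
TPR = 0.62    # true positive rate of the classifier

# Hypothetical alarm threshold (flagged pills per window); not from the slides.
ALARM_THRESHOLD = 80


def expected_flags(broken_per_second):
    """Expected number of pills flagged as defective in one decision window."""
    window_pills = PILLS_PER_SECOND * WINDOW_SECONDS
    broken = broken_per_second * WINDOW_SECONDS
    good = window_pills - broken
    return broken * TPR + good * FPR


def binomial_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p) with 0 < p < 1, summed via log-terms."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)  # underflows harmlessly to 0.0 for tiny terms
    return total


if __name__ == "__main__":
    window_pills = PILLS_PER_SECOND * WINDOW_SECONDS
    print("flags/window, healthy press:", expected_flags(0))
    print("flags/window, 4 broken/s:", expected_flags(4))
    print("false alarm chance:", binomial_tail(window_pills, FPR, ALARM_THRESHOLD))
```

Counting over a full minute is what makes a modest per-pill classifier usable: roughly 14 flags are expected from a healthy press against roughly 160 from a broken one, so a threshold placed between the two is very unlikely to be crossed by chance.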
  33. CONVOLUTIONAL NEURAL NETWORK • benchmarking different architectures (3 Layers, 4 Layers) • trained on dusty and non-dusty images
  34. AUTOENCODER FOR DENOISING • Convolutional Neural Network • Inception V3