Pandora Ad Classification

Summary of a grad class project built to match advertisements to their companies via Machine Learning algorithms.

Greg Ziegan

December 18, 2014

Transcript

  1. PANDORA AD CLASSIFICATION Greg Ziegan & MJ Harkins

  2. WHY PANDORA? The company I work for, Vertical Knowledge, works on consulting projects for hedge funds, government departments, and other private agencies. Hedge funds want something from us: insights.
  3. A company like Pandora is a freemium service. It provides a usable platform for free users and incentives to premium members.
  4. Pandora needs to sustain and profit from even its free service. The premium members are charged a fee, but this revenue does not provide the company with large enough profits.
  5. In order to sustain its free service, Pandora will show advertisements from companies who believe these ads will somehow coax the user into visiting their company site.
  6. If we can discover the distribution of ads shown by these external companies, we may have a roughly accurate view of which companies have invested in Pandora, and how much each has invested compared to the others.
  7. And that leaves us with thousands of screenshots of advertisements to classify.
  8. BUT WAIT, WHY NOT JUST FOLLOW THE AD'S LINK? We do not want to alter traffic to these sites: we are classifying across hundreds of stations and geographically distributed IPs. Why not just look at the link's URL? The ads are embedded in the audio player (making the links hard to find), and the URLs are often shortened and made unrecognizable.
  9. CLASSIFICATION: THE APPROACH We will take a moment here to cite Gary Doran, as he has helped us greatly in understanding unsupervised feature detection and multiple instance algorithms.
  10. STEPS TO SUCCESS 1. Image Segmentation 2. Feature Preparation 3. Kernel Selection 4. SVM and MISVM 5. Profit
  11. IMAGE SEGMENTATION

  12. Here are some example advertisements:

  13. (Image: example advertisement screenshot)
  14. (Image: example advertisement screenshot)
  15. We used each pixel's RGB color values as features for clustering.
  16. We sent the pixel data for each image through an implementation of the k-means clustering algorithm.
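A rough sketch of that pixel-clustering step (not our exact code; the filename and cluster count are placeholders, and scikit-learn's KMeans stands in for whichever implementation is used):

    from skimage import io
    from sklearn.cluster import KMeans

    # Load one ad screenshot and keep only the RGB channels.
    image = io.imread("ad_screenshot.png")[:, :, :3]
    pixels = image.reshape(-1, 3).astype(float)  # one (R, G, B) row per pixel

    # Group the pixels into color clusters; 8 clusters is an arbitrary choice.
    kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(pixels)
    labels = kmeans.labels_.reshape(image.shape[:2])  # cluster id per pixel
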
  17. We tried three segmentation algorithms from the scikit-image library: Quickshift, SLIC, and Felzenszwalb's.
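All three are single calls in scikit-image; a sketch of the invocations, with illustrative parameter values rather than our actual settings:

    from skimage import io
    import skimage.segmentation as seg

    image = io.imread("ad_screenshot.png")[:, :, :3]

    # Each call returns an integer label image: one segment id per pixel.
    labels_quick = seg.quickshift(image, kernel_size=5, max_dist=10, ratio=1.0)
    labels_slic = seg.slic(image, n_segments=100, compactness=10.0)
    labels_felz = seg.felzenszwalb(image, scale=100, sigma=0.5, min_size=50)
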
  18. QUICKSHIFT "Segments image using quickshift clustering in Color-(x,y) space. Produces an oversegmentation of the image using the quickshift mode-seeking algorithm."
  19. (Image: quickshift segmentation result)
  20. We thought quickshift meant quick. It was not. The algorithm took ~5 seconds to segment an image... total clustering time: 40 minutes :(
  21. We tried another of the three, SLIC, and it gave fantastic results.
  22. Better yet, it took less than a quarter of a second to cluster.
  23. SLIC "Segments image using k-means clustering in Color-(x,y,z) space."

  24. (Image: SLIC segmentation result)
  25. (Image: SLIC segmentation result)
  26. We were very happy with SLIC, and after reviewing the third algorithm we decided against it since we couldn't pronounce it.
  27. FEATURE PREPARATION

  28. Once we had clusters, we took the average RGB value for each cluster as a feature for the training set.
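A sketch of that feature step, assuming a label image like the labels_slic array above (the helper name and feature layout are our own illustration):

    import numpy as np

    def mean_rgb_per_segment(image, labels):
        """Return one averaged (R, G, B) vector per segment."""
        return np.array([
            image[labels == segment_id].mean(axis=0)
            for segment_id in np.unique(labels)
        ])

    features = mean_rgb_per_segment(image, labels_slic)  # shape: (n_segments, 3)
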
  29. AN ADDITIONAL APPROACH

  30. We discussed adding Gabor wavelets to the clustering algorithms and using the refined clusters' RGB values/texture features as the example set.
  31. This proved unnecessary due to excellent results with vanilla SLIC segmentation.
  32. However, another task at Vertical Knowledge would deal with recognizing objects with texture, orientation, and depth. It is very likely that more complicated features, including Gabor wavelets, would be needed to classify such images.
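For reference, scikit-image exposes Gabor filtering directly; a sketch of pulling simple texture responses, reusing the image array from the sketches above, with an arbitrary frequency and orientation:

    import numpy as np
    from skimage.color import rgb2gray
    from skimage.filters import gabor

    gray = rgb2gray(image)  # Gabor filtering operates on a single channel
    # Real and imaginary responses for one (frequency, orientation) pair.
    real, imag = gabor(gray, frequency=0.6, theta=0.0)
    texture_features = np.array([real.mean(), real.var()])
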
  33. SVM

  34. Using a standard support vector machine, we classified the examples with the following results:
  35. Accuracy: 0.95 Precision: 0.90 Recall: 0.98 AROC: 0.99
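A minimal sketch of that evaluation with scikit-learn, where X holds the per-cluster RGB features and y the per-example labels; the split and SVM parameters here are assumptions, not our actual settings:

    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from sklearn.metrics import (accuracy_score, precision_score,
                                 recall_score, roc_auc_score)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    clf = SVC(kernel="rbf", C=1.0, probability=True).fit(X_train, y_train)
    y_pred = clf.predict(X_test)

    print("Accuracy: ", accuracy_score(y_test, y_pred))
    print("Precision:", precision_score(y_test, y_pred))
    print("Recall:   ", recall_score(y_test, y_pred))
    print("AROC:     ", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
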

  36. MISVM (Results Pending)

  37. The multiple instance learner is still being tested on the data set. We're currently getting warnings and all zeroes for predictions.
  38. It's not nearly as quick as the SVM, since it treats each image as a bag and must consider all of that image's instances together.
  39. The results suggest that some parameter is not tuned correctly. We're confident the algorithm will perform at least as well as the SVM, since Gary's implementation has been used before and reached 99% accuracy on a similar data set.
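Gary distributes his multiple-instance SVMs as the misvm Python package; fitting looks roughly like the sketch below (the API shown follows our reading of that package's README, and bags/y are hypothetical):

    import misvm  # Gary Doran's package: https://github.com/garydoranjr/misvm

    # bags: a list of (n_instances, n_features) arrays, one per image,
    # where each instance is one segment's feature vector.
    # y: +1 if the image advertises the target company, -1 otherwise.
    classifier = misvm.MISVM(kernel="linear", C=1.0, max_iters=50)
    classifier.fit(bags, y)
    bag_predictions = classifier.predict(bags)  # sign gives the bag label
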
  40. CONCLUSIONS We were both extremely excited to see such high accuracy from a standard SVM.
  41. However, this result is only from one company, where we had ~190 example images.
  42. There are 20 other companies with more than 30 examples each, but many of the rest have under 5 examples to train on.
  43. This project will need to shift its implementation focus toward active learning.
  44. We discussed having the SVM retrained on each newly labeled instance. Another possibility was to add a feature for the time the ad was found, weighting more recent ads as more important while still keeping a reasonable training set size.
  45. But while the MISVM took orders of magnitude longer, the SVM took less than a minute on 515 images. This means the first suggestion, retraining, is very feasible for the time being.
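The retrain-on-each-label idea from slide 44 is simple enough to sketch; the stream of newly labeled examples here is hypothetical:

    import numpy as np
    from sklearn.svm import SVC

    X_train, y_train = list(X), list(y)  # existing labeled features and labels
    clf = SVC(kernel="rbf", C=1.0).fit(np.array(X_train), np.array(y_train))

    for features, label in newly_labeled_examples():  # hypothetical stream
        X_train.append(features)
        y_train.append(label)
        # A full refit on ~515 images takes under a minute, so retraining
        # from scratch on every new label is affordable for now.
        clf = SVC(kernel="rbf", C=1.0).fit(np.array(X_train), np.array(y_train))
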
  46. FIN