Analyzing TV Series/Movies with Python - part 2

MunichDataGeeks

October 08, 2016

Transcript

  1. Analyzing TV Series/Movies with Python – part 2

  2. About me
     • Big data architect in the Data Technology team @ Prosieben
     • Data science education @ France (NLP, image recognition, …)
     • Several hacks (Kaggle, datamining a flat, robotax, whatsappcli, …)
     • Python enthusiast
     • Still trying to figure out JavaScript and booting up Eclipse
  3. Some background
     • Find a good series/movie to watch next
     • Using the content
     • and Python and my cat
  4. Initial approach: Subtitles
     • Amount of data

  5. Initial approach: Subtitles
     • Information in the data

  6. Goal: Movie/Series recommendation system
     • What should I watch next?

  7. What is a recommendation system and why should I care?

  8. Item-based recommendation

  9. Similarity: Feature definition (text only)
     • Talk time
     • Talk frequency
     • Episode/Movie duration
     • Idle time
     • Number of words
     • Number of sentences
     • Most used words
     • Word length
     • Sentence length
     • Vocabulary richness
     • Time to read
     • SMOG grade
     • Topic modelling
     • Summary
     • Polarity
     • Word usage
     • Sentence beginnings
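A few of the text-only features above can be computed from raw subtitle dialogue with nothing but the standard library. The following is a minimal sketch, not the speaker's implementation: the function name `text_features`, the regex-based sentence split, and the use of a type-token ratio as a stand-in for "vocabulary richness" are all assumptions made for illustration.

```python
import re
from collections import Counter


def text_features(text, top_n=3):
    """Compute a few simple text-only features from subtitle dialogue."""
    # Rough sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Lowercased word tokens; apostrophes are kept so "don't" stays one word.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    n_words = len(words)
    return {
        "n_words": n_words,
        "n_sentences": len(sentences),
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "avg_sentence_length": n_words / len(sentences) if sentences else 0.0,
        # Type-token ratio: one simple proxy for vocabulary richness (assumption).
        "vocabulary_richness": len(counts) / n_words if n_words else 0.0,
        "most_used_words": [w for w, _ in counts.most_common(top_n)],
    }


# Made-up dialogue, only to show the call.
features = text_features("I am Dexter. I am not a killer. Am I?")
print(features)
```

Timing features such as talk time or idle time would additionally need the cue start and end times from the subtitle file, which this sketch leaves out.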
  10. Let's do it

  11. Recommender using the subtitles

  12. Unprocessed/Not analyzed data
      • Audio
      • Video
      • Most important information!
      • Let's play a bit with it
  13. Unprocessed/Not analyzed data
      • Audio
      • Video
      • Most important information!
      • Let's play a bit with it
  14. Video analysis and processing
      • Succession of frames
      • Has different formats
      • Easy to process with Python
      • Contains more information than we can analyze today
      So we need to know what to look for in videos
  15. Ideas?

  16. Sex & NSFW stuff in Series and Movies

  17. Sex & NSFW stuff in Series and Movies
      • How?!
  18. Sex & NSFW stuff in Series and Movies
      • How?! + Open NSFW =
  19. open NSFW

  20. open NSFW

  21. open NSFW
      root@7bb4e0bc0da0:/workspace/open_nsfw# python ./classify_nsfw.py \
          --model_def nsfw_model/deploy.prototxt \
          --pretrained_model nsfw_model/resnet_50_1by2_nsfw.caffemodel \
          the_image
      Images shown: IMG_5983.JPG, christy_mack_test.JPG
      NSFW scores shown: 0.0174936428666, 0.852280557156
  22. Let's apply this to series/movies
      Tools used:
      • Python
      • Caffe
      • NSFW model
      • MoviePy
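Applying an image classifier to a whole episode comes down to sampling frames at regular intervals, scoring each one, and looking at the scores over time. The sketch below shows only that control flow and is an assumption about how the pieces fit together, not the code behind the slides. `get_frame` and `score_frame` are hypothetical callables: in practice the first could wrap MoviePy's `VideoFileClip.get_frame(t)` and the second a Caffe forward pass through the open_nsfw model. The 0.8 threshold is an illustrative cut-off.

```python
import math


def sample_times(duration_s, every_s=1.0):
    """Timestamps (in seconds) at which to grab a frame, starting at 0."""
    n = math.ceil(duration_s / every_s)
    return [i * every_s for i in range(n)]


def score_timeline(duration_s, get_frame, score_frame, every_s=1.0):
    """Return (timestamp, score) pairs for the sampled frames."""
    return [(t, score_frame(get_frame(t))) for t in sample_times(duration_s, every_s)]


def flag_segments(timeline, threshold=0.8):
    """Merge consecutive samples at or above the threshold into (start, end) spans."""
    segments, start, last = [], None, None
    for t, score in timeline:
        if score >= threshold:
            if start is None:
                start = t
            last = t
        elif start is not None:
            segments.append((start, last))
            start = None
    if start is not None:
        segments.append((start, last))
    return segments


# Stand-ins so the sketch runs without Caffe or MoviePy installed.
fake_scores = {0.0: 0.1, 1.0: 0.9, 2.0: 0.95, 3.0: 0.2, 4.0: 0.85}
timeline = score_timeline(5, get_frame=lambda t: t, score_frame=lambda f: fake_scores[f])
segments = flag_segments(timeline)
print(segments)
```

Sampling one frame per second keeps the number of classifier calls manageable for a full episode; a denser sampling rate trades runtime for finer time resolution.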
  23. None
  24. None
  25. Results

  26. Desperate Housewives

  27. Desperate Housewives: NSFW, SFW, ??

  28. Desperate Housewives

  29. Desperate Housewives

  30. None
  31. Desperate Housewives

  32. None
  33. Dexter

  34. Dexter

  35. Game of Thrones

  36. Game of Thrones

  37. Narcos

  38. Sex and the City

  39. Californication

  40. Californication
      10/10 would watch!

  41. Recap
      • Automatic tagging of videos
      • Not limited to NSFW
      • Generating more usable data
      • Models can be trained easily
  42. Similarity: Feature definition for videos
      • NSFW score
      • Color palette
      • Brightness
      • Length
      • Scenes & rhythm
      • ...
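Two of the video features above, brightness and color palette, can be estimated per frame from raw RGB pixel values. This is a rough sketch under stated assumptions: brightness is taken as mean luma with the Rec. 601 weights, the palette as the most frequent colors after coarse quantization, and a frame is represented as a plain list of `(r, g, b)` tuples. The function names and the bucket size are illustrative and do not come from the talk.

```python
from collections import Counter


def brightness(pixels):
    """Mean luma (0-255) of (r, g, b) pixels, using Rec. 601 weights."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    return sum(0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels) / len(pixels)


def color_palette(pixels, n_colors=3, bucket=64):
    """Most common colors after quantizing each channel into coarse buckets."""
    quantized = Counter(
        (r // bucket * bucket, g // bucket * bucket, b // bucket * bucket)
        for r, g, b in pixels
    )
    return [color for color, _ in quantized.most_common(n_colors)]


# A tiny made-up "frame" of four pixels, just to show the calls.
frame = [(255, 255, 255), (250, 250, 250), (0, 0, 0), (10, 200, 30)]
print(brightness(frame), color_palette(frame, n_colors=2))
```

Averaging these per-frame values over an episode gives one number per title, which can then sit next to the text features in a similarity comparison.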
  43. Ok, but what else?
      • Let's put video and text together in a practical example
  44. Video summarization
      • Let's generate trailers

  45. IBM Watson did it
      • Using AI
      • Complex
      • https://www.youtube.com/watch?v=gJEzuYynaiw
  46. Ok
      • What do you do when you don't have a supercomputer?
  47. This

  48. Libraries used
      • pyAudioAnalysis
      • PySceneDetect
      • youtube-dl
      • MoviePy
      • pysrt
      • nltk
      • re
  49. Output

  50. Demo
      • Original: https://www.youtube.com/watch?v=9VDvgL58h_Y
      • Summarized version using subtitles and Python: https://youtu.be/6Yvy1wHItSA
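The slides do not spell out how the summarized version was produced, so the following is only one plausible approach, sketched under assumptions: score each subtitle cue by how frequent its words are across the whole subtitle file, keep the top-scoring cues, and return their time spans in chronological order. The names `pick_highlights` and `STOPWORDS` are hypothetical. Cues are plain `(start_s, end_s, text)` tuples here; in practice they could be read with pysrt, and the resulting spans cut and joined with MoviePy.

```python
import re
from collections import Counter

# Tiny stop-word list for illustration; nltk's stop-word corpus would be a fuller choice.
STOPWORDS = {"the", "a", "an", "and", "to", "of", "is", "it", "i", "you", "in", "that"}


def tokenize(text):
    """Lowercased word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def pick_highlights(cues, k=2):
    """cues: list of (start_s, end_s, text). Returns the k best (start, end) spans in time order."""
    freq = Counter(w for _, _, text in cues for w in tokenize(text))

    def score(cue):
        words = tokenize(cue[2])
        # Average corpus frequency of the cue's words; empty cues score zero.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    best = sorted(cues, key=score, reverse=True)[:k]
    return sorted((start, end) for start, end, _ in best)


# Made-up cues, only to show the call.
cues = [
    (0.0, 2.0, "Hello there."),
    (5.0, 8.0, "The dragon burned the city."),
    (9.0, 11.0, "Nice weather."),
    (12.0, 15.0, "The dragon returns to the city tonight."),
]
spans = pick_highlights(cues, k=2)
print(spans)
```

Averaging rather than summing word frequencies keeps long cues from winning on length alone. Scene boundaries or audio energy could be added as further signals.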
  51. Now let's try to
      • Boil down the series to their NSFW content
  52. Now let's try to
      • Boil down the series to their NSFW content
      ... No, just kidding :)
  53. Back to business...

  54. An easy recommendation engine concept
      • Goal: Similarity

  55. An easy recommendation engine concept
      • Goal: Similarity
      • Fast hack: Use Elasticsearch's built-in "More Like This" feature
      • Better option: build own similarity
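For the "build own similarity" option, a common starting point is cosine similarity between per-title feature vectors. The sketch below is an illustration, not the approach taken in the talk: the titles `show_a` to `show_c` and their three-component vectors are placeholder values, and `most_similar` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(title, features, top_n=2):
    """Rank all other titles by cosine similarity to the given one."""
    target = features[title]
    scored = [
        (cosine_similarity(target, vec), other)
        for other, vec in features.items()
        if other != title
    ]
    return [other for _, other in sorted(scored, reverse=True)[:top_n]]


# Placeholder feature vectors (for example, scaled text and video features).
features = {
    "show_a": [0.9, 0.1, 0.5],
    "show_b": [0.8, 0.2, 0.4],
    "show_c": [0.1, 0.9, 0.2],
}
print(most_similar("show_a", features, top_n=1))
```

Features measured on very different scales (word counts versus scores between 0 and 1) would normally be normalized first, otherwise the largest-valued feature dominates the result.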
  56. Elasticsearch
      • Powerful search engine
      • Very easy to use and to install
      • Similar to Solr
      • Has lots of hidden features
  57. An easy recommendation engine concept
      • MLT selects a set of representative terms of these input documents, forms a query using these terms, executes the query and returns the results. The user controls the input documents, how the terms should be selected and how the query is formed. more_like_this can be shortened to mlt.
  58. An easy recommendation engine concept
      1. Create an index based on the calculated features
  59. An easy recommendation engine
      2. Query it
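The two steps can be illustrated by building the JSON bodies that would be sent to Elasticsearch. This is a hedged sketch: the index name `series`, the document id `example-show`, and the field names are made up for illustration, and the exact request shape depends on the Elasticsearch version in use. Only the dictionaries are constructed here, so no running cluster is needed.

```python
import json


def feature_document(title, keywords, nsfw_score):
    """One document per title; the field names are illustrative, not from the slides."""
    return {"title": title, "keywords": " ".join(keywords), "nsfw_score": nsfw_score}


def more_like_this_query(index, doc_id, fields, size=5):
    """Request body asking for documents similar to an already indexed one."""
    return {
        "size": size,
        "query": {
            "more_like_this": {
                "fields": fields,
                # "like" may reference indexed documents or contain free text.
                "like": [{"_index": index, "_id": doc_id}],
                "min_term_freq": 1,
                "max_query_terms": 25,
            }
        },
    }


# Step 1: the document that would be indexed (placeholder values).
doc = feature_document("Example Show", ["forensics", "miami", "police"], 0.1)
# Step 2: the query asking for titles similar to that document.
body = more_like_this_query("series", "example-show", ["keywords"])
print(json.dumps(body, indent=2))
```

More Like This works on the terms of text fields, so numeric features such as an NSFW score would need either a text encoding or a separate scoring step.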

  60. First results: What should I watch after Dexter?

  61. Conclusion
      • We test the capabilities of machine learning methods to enhance our dataset
      • Size of the data does matter, but variety matters more
      • Trying out out-of-the-box solutions is always rewarding and motivating
      • Stay tuned for part 3 on http://funnybretzel.com or @KarimJDDA
      Interested in analyzing unique data in an innovative environment and working on super cool projects?
      • Big Data Engineer x1
      • Data Scientist x1
      karim.jedda@gmail.com
  62. Thank you!

  63. Combining Audio, Video and Text
      Building and scaling a unique content recommendation system with open source tools