Karim Jedda - Analyzing TV Series/Movies with Python – part 2

MunichDataGeeks

October 08, 2016

Transcript

  1. About me • Big data architect in the Data Technology team @ ProSieben • Data science education in France (NLP, image recognition, …) • Several hacks (Kaggle, datamining a flat, robotax, whatsappcli, …) • Python enthusiast • Still trying to figure out JavaScript and booting up Eclipse
  2. Some background • Find a good series/movie to watch next • Using the content • … and Python and my cat
  3. Similarity: Feature definition (text only) • Talk time • Talk frequency • Episode/movie duration • Idle time • Number of words • Number of sentences • Most used words • Word length • Sentence length • Vocabulary richness • Time to read • SMOG grade • Topic modelling • Summary • Polarity • Word usage • Sentence beginnings
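Several of the text features listed above can be computed directly from subtitle text. The sketch below is a minimal, stdlib-only illustration and not the author's implementation: it assumes plain subtitle text as input, uses a vowel-group heuristic for syllables (an assumption; proper SMOG scoring counts syllables more carefully), a hypothetical reading speed of 200 words per minute, and the type-token ratio as the "vocabulary richness" measure.

```python
import math
import re
from collections import Counter
from typing import Dict

WORD_RE = re.compile(r"[A-Za-z']+")
SENTENCE_END_RE = re.compile(r"[.!?]+")


def count_syllables(word: str) -> int:
    # Heuristic (assumption): one syllable per run of vowels, at least one per word.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def text_features(text: str, words_per_minute: int = 200) -> Dict[str, object]:
    words = WORD_RE.findall(text)
    sentences = [s for s in SENTENCE_END_RE.split(text) if s.strip()]
    n_words = len(words)
    n_sentences = max(1, len(sentences))  # avoid division by zero on empty input
    counts = Counter(w.lower() for w in words)
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return {
        "num_words": n_words,
        "num_sentences": len(sentences),
        "avg_word_length": sum(map(len, words)) / n_words if n_words else 0.0,
        "avg_sentence_length": n_words / n_sentences,
        # Type-token ratio: distinct words divided by total words.
        "vocabulary_richness": len(counts) / n_words if n_words else 0.0,
        "time_to_read_min": n_words / words_per_minute,
        # SMOG: 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291
        "smog_grade": 1.0430 * math.sqrt(polysyllables * 30 / n_sentences) + 3.1291,
        "most_used_words": counts.most_common(5),
    }
```

Talk time, idle time and talk frequency would additionally need the subtitle timestamps, which this sketch ignores.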
  4. Unprocessed/not analyzed data • Audio • Video • Most important information! • Let's play a bit with it
  6. Video analysis and processing • Succession of frames • Has different formats • Easy to process with Python • Contains more information than we can analyze today, so we need to know what to look for in videos
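Because a video is a succession of frames, a common first step is to sample frames at a fixed interval instead of analyzing every one. A minimal sketch, assuming OpenCV (`opencv-python`) as the decoder and a hypothetical one-second interval; the index arithmetic is kept separate so it can be checked without a video file. Note that the frame count OpenCV reports can be an estimate for some containers.

```python
from typing import Iterator, List, Tuple


def sample_frame_indices(frame_count: int, fps: float, every_seconds: float = 1.0) -> List[int]:
    """Frame indices spaced roughly every_seconds apart, starting at frame 0."""
    if fps <= 0 or every_seconds <= 0:
        raise ValueError("fps and every_seconds must be positive")
    step = max(1, round(fps * every_seconds))
    return list(range(0, frame_count, step))


def read_sampled_frames(path: str, every_seconds: float = 1.0) -> Iterator[Tuple[int, object]]:
    """Yield (frame_index, frame) pairs; frames are BGR NumPy arrays from OpenCV."""
    import cv2  # assumption: opencv-python is installed; imported lazily

    cap = cv2.VideoCapture(path)
    try:
        fps = cap.get(cv2.CAP_PROP_FPS)
        frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        for idx in sample_frame_indices(frame_count, fps, every_seconds):
            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek to the sampled frame
            ok, frame = cap.read()
            if not ok:
                break
            yield idx, frame
    finally:
        cap.release()
```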
  7. open_nsfw • root@7bb4e0bc0da0:/workspace/open_nsfw# python ./classify_nsfw.py \
       --model_def nsfw_model/deploy.prototxt \
       --pretrained_model nsfw_model/resnet_50_1by2_nsfw.caffemodel \
       the_image
     • Test images: IMG_5983.JPG, christy_mack_test.JPG • NSFW scores shown: 0.0174936428666 and 0.852280557156
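To score many images (for example, sampled video frames), the command above can be wrapped from Python. A minimal sketch, assuming it runs from the open_nsfw checkout with the same model paths, in an environment where the script's Caffe dependency is available, and that the script prints a line starting with `NSFW score:` as in the output above. The helper names and the `python_bin` parameter are hypothetical.

```python
import re
import subprocess
from typing import Optional

SCORE_RE = re.compile(r"NSFW score:\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)")


def parse_nsfw_score(output: str) -> Optional[float]:
    """Extract the score from classify_nsfw.py's printed output, if present."""
    match = SCORE_RE.search(output)
    return float(match.group(1)) if match else None


def nsfw_score(image_path: str, repo_dir: str = ".", python_bin: str = "python") -> Optional[float]:
    cmd = [
        python_bin, "./classify_nsfw.py",
        "--model_def", "nsfw_model/deploy.prototxt",
        "--pretrained_model", "nsfw_model/resnet_50_1by2_nsfw.caffemodel",
        image_path,
    ]
    # check=True raises CalledProcessError if the script exits with an error.
    result = subprocess.run(cmd, cwd=repo_dir, capture_output=True, text=True, check=True)
    return parse_nsfw_score(result.stdout)
```

Keeping the parser separate from the subprocess call makes it testable without Caffe or the model files.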
  8. Recap • Automatic tagging of videos • Not limited to NSFW • Generating more usable data • Models can be trained easily
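Tagging that is not limited to NSFW can be illustrated by aggregating per-frame scores from any classifier into video-level tags. A minimal sketch: the score threshold, the minimum share of frames, and the label names are hypothetical parameters, not values from the talk.

```python
from statistics import mean
from typing import Dict, List


def tag_video(
    frame_scores: Dict[str, List[float]],
    threshold: float = 0.8,
    min_share: float = 0.05,
) -> Dict[str, dict]:
    """Summarize per-frame scores per label and decide whether to tag the video."""
    tags = {}
    for label, scores in frame_scores.items():
        if not scores:
            continue  # no sampled frames for this label
        share_above = sum(s >= threshold for s in scores) / len(scores)
        tags[label] = {
            "max": max(scores),
            "mean": mean(scores),
            "share_above": share_above,
            # Tag the whole video if enough sampled frames exceed the threshold.
            "tagged": share_above >= min_share,
        }
    return tags
```

Storing the summary statistics alongside the boolean tag keeps them available later as numeric similarity features.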
  9. Similarity: Feature definition for videos • NSFW score • Color palette • Brightness • Length • Scenes & rhythm • …
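Several of these video features reduce to simple per-frame arithmetic. A minimal stdlib sketch, assuming each frame has already been decoded into a list of (R, G, B) pixel tuples. Brightness uses the Rec. 601 luma weights, the palette is a coarse color histogram, and scene cuts come from a naive brightness-jump heuristic (a stand-in, not necessarily the method used in the talk). The bucket size and cut threshold are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def frame_brightness(pixels: Sequence[Pixel]) -> float:
    # Rec. 601 luma (0.299 R + 0.587 G + 0.114 B), averaged over the frame.
    return sum(0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels) / len(pixels)


def color_palette(pixels: Sequence[Pixel], k: int = 3, bucket: int = 64) -> List[Pixel]:
    # Quantize each channel into buckets and return the k most frequent colors.
    quantized = Counter(
        (r // bucket * bucket, g // bucket * bucket, b // bucket * bucket) for r, g, b in pixels
    )
    return [color for color, _ in quantized.most_common(k)]


def scene_cuts(brightness: Sequence[float], threshold: float = 30.0) -> List[int]:
    # Assume a cut wherever mean brightness jumps by more than threshold between frames.
    return [
        i for i in range(1, len(brightness))
        if abs(brightness[i] - brightness[i - 1]) > threshold
    ]


def cuts_per_minute(n_cuts: int, duration_seconds: float) -> float:
    # "Rhythm" expressed as cut frequency over the video's duration.
    minutes = duration_seconds / 60
    return n_cuts / minutes if minutes else 0.0
```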
  10. OK, but what else? • Let's put video and text together in a practical example
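One simple way to put video and text together is to merge both feature sets per title and rescale every feature to [0, 1], so that, for example, word counts in the thousands do not drown out NSFW scores between 0 and 1. A minimal sketch using min-max scaling over the whole catalogue; the feature names are whatever the caller passes in `keys` and are hypothetical here.

```python
from typing import Dict, List, Sequence


def combine_features(
    text_feats: Sequence[Dict[str, float]],
    video_feats: Sequence[Dict[str, float]],
    keys: Sequence[str],
) -> List[List[float]]:
    """Merge per-title text and video features and min-max scale each one to [0, 1]."""
    rows = [{**t, **v} for t, v in zip(text_feats, video_feats)]
    lo = {k: min(row[k] for row in rows) for k in keys}
    hi = {k: max(row[k] for row in rows) for k in keys}
    return [
        # A feature that is constant across the catalogue cannot separate titles: map it to 0.0.
        [(row[k] - lo[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 0.0 for k in keys]
        for row in rows
    ]
```

Min-max scaling is sensitive to outliers; a z-score or rank transform would be an alternative choice under the same interface.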
  11. IBM Watson did it • Using AI • Complex • https://www.youtube.com/watch?v=gJEzuYynaiw
  12. Now let's try to • Boil down the series to their NSFW content… No, just kidding :)
  13. An easy recommendation engine concept • Goal: similarity • Fast hack: use Elasticsearch's built-in "More Like This" feature • Better option: build our own similarity measure
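For the "better option", a self-built similarity can be as simple as cosine similarity between each title's feature vector. A minimal sketch, assuming every title has already been turned into an equal-length numeric vector (for example, scaled text and video features). Cosine similarity is one common choice, not necessarily the one the author settled on, and the titles below are placeholders.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat an all-zero vector as dissimilar to everything.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(
    target: str, catalogue: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Rank all other titles by cosine similarity to the target title."""
    scores = [
        (title, cosine(catalogue[target], vec))
        for title, vec in catalogue.items()
        if title != target
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:k]
```

With `{"Series A": [1.0, 0.0], "Series B": [0.9, 0.1], "Series C": [0.0, 1.0]}`, `most_similar("Series A", ...)` ranks "Series B" ahead of "Series C".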
  14. Elasticsearch • Powerful search engine • Very easy to use and to install • Similar to Solr • Has lots of hidden features
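Part of what makes Elasticsearch easy to use is its JSON-over-HTTP API: indexing a title is a single PUT request. A minimal stdlib sketch, assuming a node on `localhost:9200`, a hypothetical index named `series`, and the `/{index}/_doc/{id}` endpoint of Elasticsearch 7 and later (earlier releases, including those current in 2016, used `/{index}/{type}/{id}`). The document fields are illustrative.

```python
import json
import urllib.request
from typing import Any, Dict


def index_request(
    doc_id: str,
    doc: Dict[str, Any],
    index: str = "series",
    host: str = "http://localhost:9200",
) -> urllib.request.Request:
    """Build (but do not send) a request that indexes doc under doc_id."""
    return urllib.request.Request(
        f"{host}/{index}/_doc/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request: urllib.request.Request) -> Dict[str, Any]:
    # Requires a running Elasticsearch node; returns the parsed JSON response.
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```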
  15. An easy recommendation engine concept • MLT selects a set of representative terms from the input documents, forms a query using these terms, executes the query and returns the results. The user controls the input documents, how the terms are selected and how the query is formed. more_like_this can be shortened to mlt.
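A More Like This search for "titles similar to this one" can be expressed as a query body whose `like` entry points at an already indexed document. A minimal sketch, assuming the same hypothetical `series` index with a `subtitles` text field. `min_term_freq`, `min_doc_freq` and `max_query_terms` are real MLT parameters, but the values here are illustrative; releases that still had mapping types also required `_type` in each `like` item.

```python
import json
import urllib.request
from typing import Any, Dict, Sequence


def mlt_query(
    doc_id: str,
    index: str = "series",
    fields: Sequence[str] = ("subtitles",),
    size: int = 5,
) -> Dict[str, Any]:
    """Query body: find documents whose representative terms resemble doc_id's."""
    return {
        "size": size,
        "query": {
            "more_like_this": {
                "fields": list(fields),
                "like": [{"_index": index, "_id": doc_id}],
                "min_term_freq": 1,     # ignore input terms rarer than this
                "min_doc_freq": 1,      # ignore terms found in fewer documents than this
                "max_query_terms": 25,  # cap on representative terms in the generated query
            }
        },
    }


def search_request(
    body: Dict[str, Any], index: str = "series", host: str = "http://localhost:9200"
) -> urllib.request.Request:
    """Build (but do not send) a POST to the index's _search endpoint."""
    return urllib.request.Request(
        f"{host}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Tuning these three parameters is where "the user controls how the terms are selected" comes in: lower thresholds pull in more, rarer terms.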
  16. Conclusion • We tested the capabilities of machine learning methods to enhance our dataset • Size of the data does matter, but variety matters more • Trying out out-of-the-box solutions is always rewarding and motivating • Stay tuned for part 3 on http://funnybretzel.com or @KarimJDDA • Interested in analyzing unique data in an innovative environment and working on super cool projects? • Big Data Engineer x1 • Data Scientist x1 • [email protected]
  17. Combining Audio, Video and Text: Building and scaling a unique content recommendation system with open source tools