Slide 1

Slide 1 text

Natural Language Processing: Opportunities & Challenges in Media
Machine Learning Tech Session, 7th Dec
Despicable Me 3
Xiaolan Sha, Lead Data Scientist

Slide 2

Slide 2 text

About myself
• PhD in CS
• Lead Data Scientist @Sky
• Insight & Decision Science Dept
LinkedIn: linkedin.com/in/xiaolan-sha
Email: [email protected]

Slide 3

Slide 3 text

Data @Sky
What we have:
• Media Content
• Customer Consumption
• Customer Services
• Hardware Diagnostics
• 3rd Party Data
• …
What we do:
• Identify Key Drivers for KPIs
• Customer Preference Modelling
• Content Personalisation in Marketing
• Churn Prediction
• Anomaly Detection
• Text Analytics
• Customer Analytics
• Recommender Systems
• …

Slide 4

Slide 4 text

Data Platform @Sky (architecture diagram, built on Google Cloud Platform)
• Data Acquisition: Batch, SFTP, ETL Hub
• Storage: Cloud Storage, Cloud Bigtable
• Analytics: BigQuery, Cloud Dataproc, Cloud Datalab, Cloud ML
• Exploration: Business Objects, Data Marts

Slide 5

Slide 5 text

Case Study: Tag suggestions based on Movie Synopses

Slide 6

Slide 6 text

When we talk about movies, we talk about…
Synopsis: A woman involved in a car accident wakes up in a cramped underground bunker with two men she's never met. Tense thriller with Mary Elizabeth Winstead and John Goodman. (2016)(109 mins) Also in HD
Tags: action_thriller, alien_thriller, apocalyptic_thriller, disaster_action, disaster_movie, horror_thriller, kidnapped_thriller, modern_great, monster_horror-thriller, monster_sci-fi-horror, sci-fi_horror, sci-fi_mystery, sci-fi_thriller
Genres: Sci-fi, thriller
Director: Dan Trachtenberg
Stars: Mary Elizabeth Winstead, John Goodman, John Gallagher Jr.
Producer: J.J. Abrams

Slide 7

Slide 7 text

Drama, Comedy and Thriller are the big themes
18,000+ movies, 17 genres, 2,800+ free-text tags
• Each movie can have one or more genres.
• Each movie can have one or more free-text tags, e.g. based_on_a_novel, comedy_drama.
• More than 40% of movies are flagged as Drama.
• Comedy is the second-biggest genre (less than half the size of Drama), followed by Thriller, Action, Adventure, Crime, etc.
• Musical and Documentary are the genres with the fewest titles.
Chart: number of movies per genre (axis 0 to 10,000), largest to smallest: Drama, Comedy, Thriller, Action, Adventure, Crime, Romance, Family, Horror, Sci-Fi, Western, Fantasy, War, Animation, Kids, Musical, Documentary
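Because genres are multi-label, per-genre shares do not add up to 100%. A minimal sketch of how such counts could be computed, assuming each movie's genres are available as a plain list of strings (the function name and the toy data are hypothetical, not the real catalogue):

```python
from collections import Counter

def genre_counts(movie_genres):
    """Return {genre: (n_titles, share_of_all_titles)}, most frequent first.

    Movies are multi-label, so the shares can add up to more than 100%.
    """
    n_movies = len(movie_genres)
    # set() so a genre repeated on one movie is counted once
    counts = Counter(g for genres in movie_genres for g in set(genres))
    return {g: (c, c / n_movies) for g, c in counts.most_common()}

# Hypothetical toy data, not the real catalogue
movies = [
    ["Drama"],
    ["Drama", "Romance"],
    ["Comedy", "Drama"],
    ["Thriller", "Sci-Fi"],
    ["Comedy"],
]
stats = genre_counts(movies)
for genre, (n, share) in stats.items():
    print(f"{genre:<10}{n:>3}{share:>6.0%}")
```

Here Drama covers 60% of the toy titles and the shares sum to 160%, which is why a statement like "more than 40% of movies are Drama" says nothing about how much room is left for other genres.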

Slide 8

Slide 8 text

Movies of different genres are tagged differently
Chart: occurrences of the tags based_on_a_novel, crime_thriller and romantic_drama in each genre. Note: the shorter the bar, the more popular the tag; occurrence counts are incremented by 1 for visualisation.
• All three tags are popular among Drama titles.
• A few documentaries are tagged based_on_a_novel, but none are tagged crime_thriller or romantic_drama.
• Crime movies are very often tagged crime_thriller and frequently based_on_a_novel, but not romantic_drama.
• Tags complement genres by providing finer-grained descriptions.
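A genre-by-tag comparison like this comes down to a co-occurrence count table. The sketch below assumes each movie is a (genres, tags) pair and that the "+1" exists so zero counts survive a logarithmic axis; the slide does not say which scale was used, so the log10 step is an assumption, and all names and data are hypothetical:

```python
import math
from collections import defaultdict

def genre_tag_counts(movies, tags_of_interest):
    """Count, per genre, how many movies carry each tag of interest.

    movies: iterable of (genres, tags) pairs; multi-label on both sides.
    """
    wanted = set(tags_of_interest)
    counts = defaultdict(lambda: dict.fromkeys(tags_of_interest, 0))
    for genres, tags in movies:
        for genre in set(genres):
            row = counts[genre]  # touching the key keeps all-zero rows
            for tag in set(tags) & wanted:
                row[tag] += 1
    return dict(counts)

def plot_values(counts):
    """Assumed log scale: shift every count by +1 so zeros map to 0.0."""
    return {g: {t: math.log10(c + 1) for t, c in row.items()}
            for g, row in counts.items()}

# Hypothetical toy data, not the real catalogue
movies = [
    (["Drama"], ["based_on_a_novel", "romantic_drama"]),
    (["Drama", "Crime"], ["crime_thriller", "based_on_a_novel"]),
    (["Documentary"], ["based_on_a_novel"]),
    (["Crime"], ["crime_thriller"]),
]
tags = ["based_on_a_novel", "crime_thriller", "romantic_drama"]
counts = genre_tag_counts(movies, tags)
print(counts["Crime"])  # {'based_on_a_novel': 1, 'crime_thriller': 2, 'romantic_drama': 0}
```

Keeping zero entries (e.g. Documentary with no crime_thriller) is what lets the chart show a tag as absent rather than silently missing.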

Slide 9

Slide 9 text

Movies of the same genre are tagged differently
Number of tags used per genre: Drama 1,700+, Comedy 1,400+, Thriller 1,100+
Charts: tag counts within each genre (labels as shown on the slide; some are truncated)
• Comedy (axis 0 to 800): friendship_come…, dysfunctional_co…, british_film, eccentric_comedy, crime_comedy, family_comedy, relationship_co…, classic_comedy, comedy-drama, rom-com
• Drama (axis 0 to 1,500): melodrama, comedy-drama, drama_thriller, based_on_a_novel, romantic_drama, classic_drama, relationship_dra…, based_on_real_e…, crime_drama, tv_movie
• Thriller (axis 0 to 1,500): based_on_a_novel, tv_movie, crime_drama, psychological_th…, horror_thriller, murder_mystery, mystery_thriller, drama_thriller, action_thriller, crime_thriller

Slide 10

Slide 10 text

Synopses vocabulary distribution is long-tailed
18,000+ movies, synopsis vocabulary of 32,000+ words, average synopsis length: 18
Chart: most frequent words in synopses: starring, drama, hd, stars, mins, also, comedy, family, young, life, thriller, man, story, two, strong, love, woman, new, film, language, one, action, john, find, star, contains, world, directed, father, war, adventure, horror, murder, wife, girl, finds, crime, 2014, must, 2013, son, mother, 2012, group, 2011, based, true, western, daughter, get
• Synopses are very concise.
• The distribution of words used in synopses is long-tailed.
• The most frequent words include starring, drama, HD, stars, mins, comedy, etc.
• Synopses tend to describe cast, genre, micro-genre, duration, release year, etc.
• Example: "A woman involved in a car accident wakes up in a cramped underground bunker with two men she's never met. Tense thriller with Mary Elizabeth Winstead and John Goodman. (2016)(109 mins) Also in HD"
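Statistics such as vocabulary size, average length and the most frequent words can be computed with a tokeniser and a counter. A minimal sketch, assuming a simple regex tokeniser and a small hypothetical stopword list (the slides do not say how synopses were tokenised, or whether length is counted before or after stopword removal):

```python
import re
from collections import Counter

# Hypothetical stopword list; the slides do not say which one was used.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "with"}

def tokenize(text):
    """Lower-case, keep runs of letters/digits/apostrophes, drop stopwords."""
    return [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOPWORDS]

def vocabulary_stats(synopses, top_k=10):
    docs = [tokenize(s) for s in synopses]
    freq = Counter(w for doc in docs for w in doc)
    avg_len = sum(len(doc) for doc in docs) / len(docs)
    # Share of the vocabulary seen exactly once: a simple long-tail indicator
    singleton_share = sum(1 for c in freq.values() if c == 1) / len(freq)
    return len(freq), avg_len, freq.most_common(top_k), singleton_share

synopses = [
    "Tense thriller with Mary Elizabeth Winstead and John Goodman. (2016)(109 mins) Also in HD",
    "Romantic comedy starring a young chef. (2001)(90 mins) Also in HD",  # made-up example
]
vocab_size, avg_len, top_words, singleton_share = vocabulary_stats(synopses, top_k=3)
print(vocab_size, avg_len)  # 19 11.0
```

Even on two synopses the boilerplate words (mins, also, hd) float to the top while most other words occur once, which mirrors the long tail described above.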

Slide 11

Slide 11 text

Summarising synopses with topics (LDA)
• Example: "A woman involved in a car accident wakes up in a cramped underground bunker with two men she's never met. Tense thriller with Mary Elizabeth Winstead and John Goodman. (2016)(109 mins) Also in HD"
• …
Example topics:
• Christmas/Family: christmas, santa, case, family, woman, everyone, yul, 2008, told, strangers, dad, husband, four, brown, bobby, father, relationship, convicted, maverick, must, etc.
• Crime/Thriller/Supernatural: horror, supernatural, thriller, killer, los, angeles, events, group, starring, 2009, friends, remote, bloody, discover, girl, turns, birth, victim, abandoned, violence, etc.
• Other topics
Chart: Topic Distributions (axis 0 to 0.25)
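LDA represents each synopsis as a distribution over topics and each topic as a distribution over words. The slides do not say which LDA implementation was used; the sketch below swaps in collapsed Gibbs sampling written from scratch in NumPy (library versions, such as scikit-learn's `LatentDirichletAllocation`, use variational inference instead). The hyperparameters and toy documents are hypothetical:

```python
import numpy as np

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """LDA fitted by collapsed Gibbs sampling.

    docs: list of token lists. Returns (doc_topic, topic_word, vocab);
    each row of doc_topic is one document's topic distribution.
    """
    rng = np.random.default_rng(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    words = [[index[w] for w in doc] for doc in docs]
    ndk = np.zeros((len(docs), n_topics))  # doc-topic counts
    nkw = np.zeros((n_topics, V))          # topic-word counts
    nk = np.zeros(n_topics)                # tokens per topic
    # Random initial topic for every token
    z = [rng.integers(n_topics, size=len(doc)) for doc in words]
    for d, doc in enumerate(words):
        for w, k in zip(doc, z[d]):
            ndk[d, k] += 1
            nkw[k, w] += 1
            nk[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(words):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Take the token out of the counts ...
                ndk[d, k] -= 1
                nkw[k, w] -= 1
                nk[k] -= 1
                # ... resample its topic from p(z = k | all other assignments) ...
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                # ... and add it back under the sampled topic
                z[d][i] = k
                ndk[d, k] += 1
                nkw[k, w] += 1
                nk[k] += 1
    doc_topic = (ndk + alpha) / (ndk + alpha).sum(axis=1, keepdims=True)
    topic_word = (nkw + beta) / (nkw + beta).sum(axis=1, keepdims=True)
    return doc_topic, topic_word, vocab

docs = [
    "christmas santa family father christmas".split(),
    "santa christmas family dad husband".split(),
    "horror supernatural killer victim bloody".split(),
    "horror killer supernatural remote girl".split(),
]
doc_topic, topic_word, vocab = lda_gibbs(docs, n_topics=2)
for k in range(2):
    top = [vocab[i] for i in np.argsort(topic_word[k])[::-1][:4]]
    print(f"topic {k}: {', '.join(top)}")
```

Each row of `doc_topic` plays the role of a per-synopsis topic distribution, and the top words of each row of `topic_word` are what a human then labels (e.g. "Christmas/Family").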

Slide 12

Slide 12 text

Q: Is it possible to summarise synopses with tags?

Slide 13

Slide 13 text

Sequence to Sequence Learning
• Seq2Seq models handle NLP tasks where the expected output is a sequence rather than a single label.
• They have proven effective in applications such as:
  • Machine translation
  • Chatbots (one-to-one sentence conversation)
  • Text summarisation
• Suggesting tags from synopses is analogous to text summarisation.
Reference: Text Summarization with Amazon Reviews, David Currie; diagram from Google Research Blog, "Computer, respond to this email"
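Treating tag suggestion as summarisation means turning each movie into a (source sequence, target sequence) pair. The slides do not describe this preprocessing; the sketch below is one hypothetical scheme in which the synopsis words form the encoder input and the words of the underscore-joined tags, de-duplicated and sorted, form the decoder target. The stopword list and the `<go>`/`<eos>` markers are assumptions:

```python
import re

# Hypothetical choices: the slides do not specify stopwords or special tokens.
STOPWORDS = {"a", "an", "and", "in", "of", "on", "the", "with"}
GO, EOS = "<go>", "<eos>"

def synopsis_to_source(synopsis):
    """Encoder input: lower-cased word tokens of the synopsis, minus stopwords."""
    return [w for w in re.findall(r"[a-z0-9'-]+", synopsis.lower()) if w not in STOPWORDS]

def tags_to_target(tags):
    """Decoder target: words from underscore-joined tags, de-duplicated and
    sorted, wrapped in start/end markers."""
    words = {w for tag in tags for w in tag.lower().split("_") if w and w not in STOPWORDS}
    return [GO] + sorted(words) + [EOS]

source = synopsis_to_source(
    "A woman involved in a car accident wakes up in a cramped underground bunker.")
target = tags_to_target(["action_thriller", "disaster_movie", "sci-fi_thriller"])
print(source[:4])  # ['woman', 'involved', 'car', 'accident']
print(target)      # ['<go>', 'action', 'disaster', 'movie', 'sci-fi', 'thriller', '<eos>']
```

Sorting the target words gives the decoder a canonical order to learn, since the tag set itself has no natural sequence.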

Slide 14

Slide 14 text

Automated Tag Suggestions
Diagram: an encoder LSTM with word embedding lookup reads the synopsis tokens (t1 woman, t2 involved, t3 car, t4 accident, t5 ","); a decoder LSTM with word embedding lookup then emits tags at t6 to t8, e.g. action, disaster.
• A Seq2Seq model is composed of two RNNs.
• The encoder RNN takes a sequence of words as input and learns a fixed-length "context vector".
• The decoder RNN takes the "context vector" as its initial state and generates the output sequence word by word.
• Word embeddings reduce the dimensionality of the word-level input representation (compared with one-hot vectors over the whole vocabulary).
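The encoder/decoder flow above can be traced in a forward pass. A minimal NumPy sketch, assuming randomly initialised (untrained) weights, hypothetical vocabularies and greedy decoding; training (e.g. with cross-entropy and teacher forcing) is omitted, and the slides do not say which framework was actually used. With LSTMs, the "context vector" handed to the decoder is the encoder's final hidden and cell state:

```python
import numpy as np

class LSTMCell:
    """Single LSTM cell; all four gates come from one matrix over [x; h]."""
    def __init__(self, input_dim, hidden_dim, rng):
        self.W = rng.normal(0, 0.1, (4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.H = hidden_dim

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        H = self.H
        i = 1 / (1 + np.exp(-z[:H]))            # input gate
        f = 1 / (1 + np.exp(-z[H:2 * H]))       # forget gate
        o = 1 / (1 + np.exp(-z[2 * H:3 * H]))   # output gate
        g = np.tanh(z[3 * H:])                  # candidate cell state
        c = f * c + i * g
        return o * np.tanh(c), c

class Seq2Seq:
    def __init__(self, src_vocab, tgt_vocab, emb_dim=16, hidden_dim=32, seed=0):
        rng = np.random.default_rng(seed)
        self.src_id = {w: i for i, w in enumerate(src_vocab)}
        self.tgt_vocab = list(tgt_vocab)
        self.tgt_id = {w: i for i, w in enumerate(self.tgt_vocab)}
        # Embedding lookup tables: dense, low-dimensional word vectors
        self.src_emb = rng.normal(0, 0.1, (len(src_vocab), emb_dim))
        self.tgt_emb = rng.normal(0, 0.1, (len(self.tgt_vocab), emb_dim))
        self.encoder = LSTMCell(emb_dim, hidden_dim, rng)
        self.decoder = LSTMCell(emb_dim, hidden_dim, rng)
        self.out = rng.normal(0, 0.1, (len(self.tgt_vocab), hidden_dim))
        self.H = hidden_dim

    def encode(self, tokens):
        h = c = np.zeros(self.H)
        for w in tokens:
            h, c = self.encoder.step(self.src_emb[self.src_id[w]], h, c)
        return h, c  # fixed-length context, whatever the input length

    def decode(self, context, go="<go>", eos="<eos>", max_len=8):
        h, c = context  # decoder starts from the encoder's final state
        word, output = go, []
        for _ in range(max_len):
            h, c = self.decoder.step(self.tgt_emb[self.tgt_id[word]], h, c)
            word = self.tgt_vocab[int(np.argmax(self.out @ h))]  # greedy choice
            if word == eos:
                break
            output.append(word)
        return output

src_vocab = ["woman", "involved", "car", "accident"]
tgt_vocab = ["<go>", "<eos>", "action", "disaster", "thriller"]  # hypothetical
model = Seq2Seq(src_vocab, tgt_vocab)
context = model.encode(["woman", "involved", "car", "accident"])
suggested = model.decode(context)
print(suggested)  # weights are untrained, so this output is arbitrary
```

The fixed-size `(h, c)` pair is the bottleneck the slide describes: a 4-word and a 40-word synopsis both reach the decoder as the same-sized vectors.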

Slide 15

Slide 15 text

Automated Tag Suggestions
Source: synopses are from the IMDb website
Blade Runner 2049
IMDb Genre: Mystery, Sci-Fi, Thriller
Synopsis: A young blade runner's discovery of a long-buried secret leads him to track down former blade runner Rick Deckard, who's been missing for thirty years.
Suggested Tags: action, adventure, classic, crime, horror, novel, sci-fi
Arrival
IMDb Genre: Drama, Mystery, Sci-Fi
Synopsis: When twelve mysterious spacecrafts appear around the world, linguistics professor Louise Banks is tasked with interpreting the language of the apparent alien visitors.
Suggested Tags: action, adventure, horror, mystery, sci-fi, thriller
Manchester by the Sea
IMDb Genre: Drama
Synopsis: A depressed uncle is asked to take care of his teenage nephew after the boy's father dies.
Suggested Tags: drama, friendship, movie, relationship, tv

Slide 16

Slide 16 text

Summary
Opportunities
• Effective at automating content metadata tagging.
• Potential for automating the summarisation of synopses from production commission text material.
• Potential for one-to-one, retrieval-based customer service chatbots.
Challenges
• Does not learn the creativity and beauty of human language.
• Not ready for writing TV script storylines.
• Not ready for consistent conversational customer support tasks.

Slide 17

Slide 17 text

Questions?
