Slide 34
Representation: Bag of Words
In this model, a text (such as a sentence or a document) is represented as the bag (multi-set) of its words,
disregarding grammar and even word order but keeping multiplicity. Example:
D1: John likes to watch movies. Mary likes movies too.
D2: John also likes to watch football games.
Vocabulary {Word : Index}
{ "John": 1, "likes": 2, "to": 3, "watch": 4, "movies": 5, "also": 6, "football": 7, "games": 8,
"Mary": 9, "too": 10 }
There are 10 distinct words; using the vocabulary indexes, each document is represented by a 10-entry
vector:
D1: [1, 2, 1, 1, 2, 0, 0, 0, 1, 1]
D2: [1, 1, 1, 1, 0, 1, 1, 1, 0, 0]
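As an illustration (not code from the slide), a minimal Python sketch that reproduces these vectors by counting words against the slide's vocabulary:

from collections import Counter

docs = [
    "John likes to watch movies. Mary likes movies too.",
    "John also likes to watch football games.",
]

# The vocabulary from the slide (word -> 1-based index).
vocab = {"John": 1, "likes": 2, "to": 3, "watch": 4, "movies": 5,
         "also": 6, "football": 7, "games": 8, "Mary": 9, "too": 10}

def tokenize(text):
    # Strip the trailing period so "movies." and "movies" count as the same word.
    return [w.strip(".") for w in text.split()]

for doc in docs:
    counts = Counter(tokenize(doc))
    # One entry per vocabulary word, ordered by its index.
    vector = [counts[w] for w in sorted(vocab, key=vocab.get)]
    print(vector)

# Output:
# [1, 2, 1, 1, 2, 0, 0, 0, 1, 1]
# [1, 1, 1, 1, 0, 1, 1, 1, 0, 0]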
Note: Scikit-Learn supports this vector representation directly through CountVectorizer; similar support is
available for TF-IDF via TfidfVectorizer.
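A short usage sketch (assuming scikit-learn 1.0 or later for get_feature_names_out). Note that CountVectorizer lowercases the text and orders the vocabulary alphabetically, so the column order differs from the slide's indexing even though the counts are the same:

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = [
    "John likes to watch movies. Mary likes movies too.",
    "John also likes to watch football games.",
]

# Raw counts (bag of words).
count_vec = CountVectorizer()
X_counts = count_vec.fit_transform(docs)
print(count_vec.get_feature_names_out())
# ['also' 'football' 'games' 'john' 'likes' 'mary' 'movies' 'to' 'too' 'watch']
print(X_counts.toarray())
# [[0 0 0 1 2 1 2 1 1 1]
#  [1 1 1 1 1 0 0 1 0 1]]

# TF-IDF weighting with the same interface.
tfidf_vec = TfidfVectorizer()
X_tfidf = tfidf_vec.fit_transform(docs)
print(X_tfidf.toarray().round(2))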