Predictive Analytics

Ankit Bahuguna, Teradata, Munich
ankit.bahuguna@teradata.com

Apple Watch http://blogs-images.forbes.com/anthonykosner/files/2014/10/apple-watch-selling-points.jpg

In the 24 hours since the launch of the Apple Watch on 9 March, Hotwire’s social media analysis picked up 981,021 mentions of the device using the terms Apple Watch, #AppleWatch, and #AppleWatchEvent. Of these mentions, a massive 42 per cent were found to contain negative sentiment towards the device; 58 per cent were, however, positive.
Source: http://www.thedrum.com/news/2015/03/10/apple-watch-sees-42-negative-response-twitter-battery-life-and-price-being-main

Predictive Analytics in Media
o Hewlett-Packard “Flight Risk” score: Gitali Halder, Hewlett-Packard, and Anindya Dey, Hewlett-Packard, “Attrition Driver Analysis,” Predictive Analytics World London Conference, November 30, 2011, London, UK.
o “Target Knew Teen Was Pregnant Before Her Dad.” Fox News, February 24, 2012.
o NETFLIX: Clive Thompson, “If You Liked This, You’re Sure to Love That,” New York Times, November 21, 2008. www.nytimes.com/2008/11/23/magazine/23Netflix-t.html
o IBM, “IBM Watson: Ushering in a new era of computing,” IBM Innovations, April 11, 2012. www-03.ibm.com/innovation/us/watson/
o Google: Peter van der Graff, “How Search Engines Use Machine Learning for Pattern Detection,” Search Engine Watch, December 1, 2011. http://searchenginewatch.com/article/2129359/How-Search-Engines-Use-Machine-Learning-for-Pattern-Detection

The Prediction Effect: A little prediction goes a long way.
Predictive analytics (PA): Technology that learns from experience (data) to predict the future behavior of individuals in order to drive better decisions.

Predictions Define a Functional Society

The Process

Predictive Model
o A mechanism that predicts a behavior of an individual, such as click, buy, lie, or die.
o It takes characteristics of the individual as input, and provides a predictive score as output.
o The higher the score, the more likely it is that the individual will exhibit the predicted behavior.
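The input/output contract of a predictive model can be sketched in a few lines. This is only an illustration: the feature names, the weights, and the choice of a logistic function are assumptions, not something taken from the slides. A real model would learn its weights from data.

```python
import math

# Hypothetical weights for illustration only; a real model learns them from data.
WEIGHTS = {"visits_last_month": 0.4, "opened_last_email": 1.2}
BIAS = -3.0


def predictive_score(characteristics):
    """Map an individual's characteristics to a score between 0 and 1."""
    z = BIAS + sum(
        weight * characteristics.get(name, 0.0) for name, weight in WEIGHTS.items()
    )
    return 1.0 / (1.0 + math.exp(-z))  # logistic squashing (an assumed choice)


frequent_visitor = {"visits_last_month": 6, "opened_last_email": 1}
rare_visitor = {"visits_last_month": 1, "opened_last_email": 0}

# The higher score marks the individual more likely to show the behavior.
print(predictive_score(frequent_visitor) > predictive_score(rare_visitor))
```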

PA APPLICATION: TARGETING DIRECT MARKETING
What’s predicted: Which customers will respond to marketing contact.
What’s done about it: Contact customers more likely to respond.

Mailing List with a Million Prospects
o 1 million unique customers.
o $2 to mail each one.
o Say 1 out of 100 buys your product (10,000 responses).
o Say you make a profit of $220 on each of these rare positive responses.
o Profit = Revenue - Cost
o ($220 x 10,000 responses) – ($2 x 1 million)
o Profit = $200,000
Are you happy yet?

PA Gives: Most Likely Responders
o It earmarks a quarter of the entire list.
o Says “These folks are three times more likely to respond than average!”
o Now you have a short list of 250,000 customers, of which 3 percent will respond: 7,500 responses.
o But again, only a 3% response rate?
o Let’s do some math again!

Working out the Math on the PA Solution
o If we send mail to only this short list, then our profit is:
o Profit = Revenue - Cost
o ($220 x 7,500 responses) – ($2 x 250,000)
o Profit = $1,150,000
o We just improved the profit 5.75 times over by mailing to fewer people (and, in so doing, expending fewer trees).
o In particular, you predicted who wasn’t worth contacting and simply left them alone.
Aggregate bottom line for PA is huge!
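The two mailing scenarios can be checked with a few lines of arithmetic. The function and variable names below are made up for illustration; the figures ($220 profit per response, $2 per mailing, 1% and 3% response rates) are the ones used on the slides.

```python
def campaign_profit(num_mailed, response_percent, profit_per_response=220, cost_per_mail=2):
    """Profit = revenue from responders minus the cost of all mailings."""
    responses = num_mailed * response_percent // 100
    return profit_per_response * responses - cost_per_mail * num_mailed


mass_mailing_profit = campaign_profit(1_000_000, 1)  # mail everyone, 1% respond
targeted_profit = campaign_profit(250_000, 3)        # mail the top quarter, 3% respond

print(mass_mailing_profit)                    # 200000
print(targeted_profit)                        # 1150000
print(targeted_profit / mass_mailing_profit)  # 5.75
```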

PA Application: PREDICTIVE ADVERTISEMENT TARGETING
What’s predicted: Which ad each customer is most likely to click.
What’s done about it: Display the best ad (based on the likelihood of a click as well as the bounty paid by its sponsor).
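One plausible way to combine click likelihood and sponsor bounty is to rank ads by expected payout (probability times bounty). That rule is an assumption made here for illustration, and the ads and numbers below are invented.

```python
# Hypothetical candidates: (ad name, estimated click probability, bounty per click in dollars).
ads = [
    ("Ad A", 0.135, 1.00),
    ("Ad B", 0.040, 5.00),
    ("Ad C", 0.200, 0.50),
]


def expected_payout(ad):
    """Expected revenue of showing the ad once: P(click) x bounty."""
    _, click_probability, bounty = ad
    return click_probability * bounty


best_ad = max(ads, key=expected_payout)
print(best_ad[0])  # Ad B
```

Note that the ad with the highest click probability (Ad C) is not the one selected; the rarely clicked but well-paid Ad B has the highest expected payout.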

Probability of Clicking on the Advert
IF the individual
◦ is still in high school
◦ AND expects to graduate college within three years
◦ AND indicates certain military interest
◦ AND has not been shown this ad yet
THEN the probability of clicking on the ad for the Art Institute is 13.5 percent.
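The rule above translates directly into code. In this sketch the dictionary keys are hypothetical field names, and returning None when the rule does not fire is an assumption: the slide only gives the probability for the matching case.

```python
def art_institute_click_probability(person):
    """Return 0.135 if all four conditions of the rule hold, otherwise None."""
    rule_applies = bool(
        person["in_high_school"]
        and person["expects_college_graduation_within_3_years"]
        and person["indicates_military_interest"]
        and not person["already_shown_ad"]
    )
    return 0.135 if rule_applies else None


matching_person = {
    "in_high_school": True,
    "expects_college_graduation_within_3_years": True,
    "indicates_military_interest": True,
    "already_shown_ad": False,
}
# Same person, but the ad has already been shown, so the rule no longer applies.
seen_ad_person = dict(matching_person, already_shown_ad=True)

print(art_institute_click_probability(matching_person))  # 0.135
print(art_institute_click_probability(seen_ad_person))   # None
```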

PA Application: Recommendation Systems

Identify users Most Similar to Alice

Identify Candidate Items and Predict Ratings
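The two steps (find similar users, then predict ratings for candidate items) can be sketched with user-based collaborative filtering. Everything here apart from the name Alice is an assumption: the ratings are invented, and cosine similarity with a similarity-weighted average is just one common choice, not necessarily the method shown on the slides.

```python
import math

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "Alice": {"Movie A": 5, "Movie B": 3, "Movie C": 4},
    "Bob": {"Movie A": 5, "Movie B": 3, "Movie C": 4, "Movie D": 5},
    "Carol": {"Movie A": 1, "Movie B": 5, "Movie C": 2, "Movie D": 1},
}


def cosine_similarity(a, b):
    """Cosine similarity over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = math.sqrt(sum(a[item] ** 2 for item in shared))
    norm_b = math.sqrt(sum(b[item] ** 2 for item in shared))
    return dot / (norm_a * norm_b)


def predict_rating(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    weighted = [
        (cosine_similarity(ratings[user], other), other[item])
        for name, other in ratings.items()
        if name != user and item in other
    ]
    total_similarity = sum(sim for sim, _ in weighted)
    if total_similarity == 0:
        return None
    return sum(sim * rating for sim, rating in weighted) / total_similarity


# Movie D is a candidate item: Alice has not rated it, but her neighbours have.
predicted = predict_rating("Alice", "Movie D")
print(round(predicted, 2))
```

Bob rates the shared movies exactly as Alice does, so his rating of Movie D weighs more in the prediction than Carol's.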

PA APPLICATION: BLACK BOX TRADING
What’s predicted: Whether a stock will go up or down.
What’s done about it: Buy stocks that will go up; sell those that will go down.
Several pointers and strategies: One of them is financial sentiment analysis, which we cover later.

Machine Learning: Classification
In classification, we use an object's characteristics to identify which class (or group) it belongs to.
Source: Wikipedia

Machine Learning: Clustering
Clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters).
Source: Wikipedia

Machine Learning: Supervised vs Unsupervised
Supervised learning is the machine learning task of inferring a function from labeled training data.
TRAINING DATA: LABELED DATA
Unsupervised learning, by contrast, is the problem of trying to find hidden structure in unlabeled data.
TRAINING DATA: UNLABELED DATA

Application Demo SENTIMENT ANALYSIS

Game of Thrones Season-4 DVD (Amazon) Source: Amazon.com

Reviews – What do the people say?
Valar Morghulis! (All men must die) Valar Dohaeris! (All men must serve) Love this series! Great actors and I love the characters. I am always on pins and needles waiting for each new season of this show! I think this is a fantastic series. Although the fourth year was not as exciting to me as the first three, I still look forward to seasons 5 and 6. I felt this season was not the strongest of the series. Love the series, a little disappointed that it will have to end one day!

Sentiment Analysis
A basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level: whether the expressed opinion in a document, a sentence, or an entity feature/aspect is positive, negative, or neutral.
"Beyond polarity" sentiment classification looks at emotional states such as "angry," "sad," and "happy."

Let’s Brainstorm – Predict Sentiment?
1. This movie was fantastic.
2. This is the worst movie I have seen in my entire life!
3. The direction in this movie was not very good.

Getting the Data https://www.kaggle.com/c/sentiment-analysis-on-movie-reviews/data

Information about Kaggle Data-set
◦ In-domain data, originally from Rotten Tomatoes.
◦ Training Data: Kaggle Movie Reviews, 156,060 phrases
◦ Testing Data: Kaggle Movie Reviews, 66,292 phrases
◦ Task: Classify test phrases into one of the five categories:
  ◦ negative (0)
  ◦ somewhat negative (1)
  ◦ neutral (2)
  ◦ somewhat positive (3)
  ◦ positive (4)

Data Format – Tab Separated Values

Input Training Data
PhraseId    SentenceId    Phrase    Sentiment
64    2    This quiet , introspective and entertaining independent is worth seeking .    4

Input Testing Data
PhraseId    SentenceId    Phrase
156250    8550    All ends well , sort of , but the frenzied comic moments never click .

Output – Analyzed Test Data (Comma Separated)
PhraseId,Sentiment
156061,2
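Reading the tab-separated input and writing the comma-separated output needs only the standard library. This sketch uses the sample rows from the slide held in memory rather than real files, and the variable names are made up for illustration.

```python
import csv
import io

# The sample training row from the slide, laid out as tab-separated text.
train_tsv = (
    "PhraseId\tSentenceId\tPhrase\tSentiment\n"
    "64\t2\tThis quiet , introspective and entertaining independent is worth seeking .\t4\n"
)

# QUOTE_NONE keeps any quote characters inside phrases as literal text.
reader = csv.DictReader(io.StringIO(train_tsv), delimiter="\t", quoting=csv.QUOTE_NONE)
rows = list(reader)
phrases = [row["Phrase"] for row in rows]
labels = [int(row["Sentiment"]) for row in rows]

# Write predictions in the comma-separated output format.
output = io.StringIO()
writer = csv.writer(output, lineterminator="\n")
writer.writerow(["PhraseId", "Sentiment"])
writer.writerow([156061, 2])
submission = output.getvalue()

print(labels)      # [4]
print(submission)
```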

Steps
◦ Lowercase the input text.
◦ Remove stop words (a, an, the, etc.) from the text.
◦ Apply a TF-IDF or Count Vectorizer (i.e. Bag of Words counts).
◦ Normalize the vectors (L2).
◦ Feed the training data to a LibLinear SVM (the machine learning model).
◦ Write the output in the pre-defined format.
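The steps above map onto scikit-learn roughly as follows. This is a minimal sketch, not the code from the talk: the four training phrases are invented stand-ins for the 156,060 Kaggle phrases, and all parameters beyond those named in the steps are left at library defaults. `LinearSVC` is scikit-learn's LibLinear-backed linear SVM.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny made-up training set using only the two extreme labels (0 and 4).
train_phrases = [
    "A fantastic and entertaining movie",
    "Wonderful acting and a great story",
    "The worst movie of the year",
    "A dull and terrible script",
]
train_labels = [4, 4, 0, 0]

model = make_pipeline(
    # Lowercasing, stop word removal, TF-IDF weighting and L2 normalization.
    TfidfVectorizer(lowercase=True, stop_words="english", norm="l2"),
    # Linear SVM trained with the LibLinear solver.
    LinearSVC(),
)
model.fit(train_phrases, train_labels)

test_phrases = ["A great and entertaining story", "A terrible script"]
predictions = model.predict(test_phrases)
print(list(predictions))
```

Swapping `TfidfVectorizer` for `CountVectorizer` gives the plain Bag of Words variant of the same pipeline.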

Representation: Bag of Words
In this model, a text (such as a sentence or a document) is represented as the bag (multi-set) of its words, disregarding grammar and even word order but keeping multiplicity.
Example:
D1: John likes to watch movies. Mary likes movies too.
D2: John also likes to watch football games.
Vocabulary {Word : Index}
{ "John": 1, "likes": 2, "to": 3, "watch": 4, "movies": 5, "also": 6, "football": 7, "games": 8, "Mary": 9, "too": 10 }
There are 10 distinct words, and using the indexes of the vocabulary, each document is represented by a 10-entry vector:
D1: [1, 2, 1, 1, 2, 0, 0, 0, 1, 1]
D2: [1, 1, 1, 1, 0, 1, 1, 1, 0, 0]
Note: Scikit-Learn has direct support for this vector representation through CountVectorizer. Similar support is available for TF-IDF.
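The two example vectors can be reproduced in plain Python. The function name and the simple word-splitting regular expression are choices made for this sketch; the vocabulary and documents are the ones from the slide.

```python
import re

# Vocabulary from the slide, with 1-based indexes.
vocabulary = {
    "John": 1, "likes": 2, "to": 3, "watch": 4, "movies": 5,
    "also": 6, "football": 7, "games": 8, "Mary": 9, "too": 10,
}


def bag_of_words(text):
    """Count how often each vocabulary word occurs, ignoring word order."""
    vector = [0] * len(vocabulary)
    for word in re.findall(r"\w+", text):
        vector[vocabulary[word] - 1] += 1  # shift 1-based index to list position
    return vector


d1 = bag_of_words("John likes to watch movies. Mary likes movies too.")
d2 = bag_of_words("John also likes to watch football games.")

print(d1)  # [1, 2, 1, 1, 2, 0, 0, 0, 1, 1]
print(d2)  # [1, 1, 1, 1, 0, 1, 1, 1, 0, 0]
```

With scikit-learn's CountVectorizer the counts are the same, but by default it lowercases the text and orders the vocabulary alphabetically, so the columns appear in a different order.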

Code Walk-through Python + Scikit Learn

And we are done  https://hayleyandjoelblog.files.wordpress.com/2015/02/hurray.png

Or wait… Are we? http://www.stepupleader.com/wp-content/uploads/2013/06/curious.jpg

Deep Learning (Word Vectors - NLP)

Word2Vec: CBOW and Skip-gram Models (Mikolov T. et al. 2013)

Recursive Neural Tensor Networks
A new composition function ‘p’ was introduced in a new compositional model called the RNTN, along with a new sentiment tree-bank, which allows training and evaluation with compositional information.
More expressive than any other recursive neural network so far!
Idea: Allow more interaction of vectors.
Image Courtesy: Socher et al. 2013, EMNLP

Conclusion
o Important to understand: “Data is always predictive!”
o In real-world data science, no single model fits all problems, so one needs to constantly learn about new techniques.
o Real-world data is notoriously messy, and one constantly faces new challenges in handling new problems.
o “With more power comes more responsibility.”
o The Prediction Effect: A little prediction goes a long way.

Predictive Analytics – Eric Siegel
“The power to predict who will click, buy, lie or die.”

Links and References
Yongzheng Zhang, Dan Shen and Catherine Baudin, Sentiment Analysis in Practice, Tutorial delivered at ICDM 2011
Scikit Learn Supervised Learning: http://scikit-learn.org/stable/supervised_learning.html#supervised-learning
Scikit Learn Working with Text: http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html
Andrew Ng’s Machine Learning Course: https://www.coursera.org/course/ml
Manning and Jurafsky, Natural Language Processing Course: https://www.coursera.org/course/nlp
Learning Scikit-Learn: Machine Learning in Python: http://www.amazon.com/Learning-scikit-learn-Machine-Python/dp/1783281936

THANK YOU! You can write to me at: ankit.bahuguna@teradata.com
