Predictive Analytics

An Introduction to Predictive Analytics, delivered at Teradata GmbH, Munich, Germany.

Ankit Bahuguna

July 22, 2015

Transcript

  1. In the 24 hours after the launch of the Apple Watch on 9 March, Hotwire’s social media analysis picked up 981,021 mentions of the device using the terms Apple Watch, #AppleWatch, and #AppleWatchEvent. Of these mentions, 42 per cent contained negative sentiment towards the device, while the remaining 58 per cent were positive.

    Source: http://www.thedrum.com/news/2015/03/10/apple-watch-sees-42-negative-response-twitter-battery-life-and-price-being-main
  2. Predictive Analytics in Media

    o Hewlett-Packard “Flight Risk” score: Gitali Halder, Hewlett-Packard, and Anindya Dey, Hewlett-Packard, “Attrition Driver Analysis,” Predictive Analytics World London Conference, November 30, 2011, London, UK.
    o “Target Knew Teen Was Pregnant Before Her Dad.” Fox News, February 24, 2012.
    o Netflix: Clive Thompson, “If You Liked This, You’re Sure to Love That,” New York Times, November 21, 2008. www.nytimes.com/2008/11/23/magazine/23Netflix-t.html
    o IBM, “IBM Watson: Ushering in a new era of computing,” IBM Innovations, April 11, 2012. www-03.ibm.com/innovation/us/watson/
    o Google: Peter van der Graff, “How Search Engines Use Machine Learning for Pattern Detection,” Search Engine Watch, December 1, 2011. http://searchenginewatch.com/article/2129359/How-Search-Engines-Use-Machine-Learning-for-Pattern-Detection
  3. The Prediction Effect: A little prediction goes a long way.

    Predictive analytics (PA): technology that learns from experience (data) to predict the future behavior of individuals in order to drive better decisions.
  4. Predictive Model

    o A mechanism that predicts a behavior of an individual, such as click, buy, lie, or die.
    o It takes characteristics of the individual as input, and provides a predictive score as output.
    o The higher the score, the more likely it is that the individual will exhibit the predicted behavior.
  5. PA APPLICATION: TARGETING DIRECT MARKETING

    What’s predicted: Which customers will respond to marketing contact.
    What’s done about it: Contact customers more likely to respond.
  6. Mailing List with a Million Prospects

    o 1 million unique customers.
    o $2 to mail each one.
    o Say 1 out of 100 buys your product (10,000 responses).
    o Say you make $220 on each of these rare positive responses.
    o Profit = Revenue - Cost
    o ($220 x 10,000 responses) - ($2 x 1 million)
    o Profit = $200,000
    Are you happy yet?
  7. PA Gives: Most Likely Responders

    o It earmarks a quarter of the entire list.
    o It says, “These folks are three times more likely to respond than average!”
    o Now you have a short list of 250,000 customers, of which 3 percent will respond: 7,500 responses.
    o But is a response rate of only 3 percent worth it?
    o Let’s do some math again!
  8. Working out the Math on the PA Solution

    o If we send mail to only this short list, then we profit:
    o Profit = Revenue - Cost
    o ($220 x 7,500 responses) - ($2 x 250,000)
    o Profit = $1,150,000
    o We just improved your profit 5.75 times over by mailing to fewer people (and, in so doing, using up fewer trees).
    o In particular, you predicted who wasn’t worth contacting and simply left them alone.
    The aggregate bottom line for PA is huge!
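
The arithmetic on slides 6 and 8 can be checked with a few lines of Python. This is only a sketch: the function name `campaign_profit` and its parameter names are made up for illustration, and the figures are the ones quoted on the slides.

```python
def campaign_profit(contacts, response_rate,
                    earned_per_response=220.0, cost_per_contact=2.0):
    """Profit = revenue - cost for a single mailing campaign."""
    responses = contacts * response_rate
    revenue = earned_per_response * responses
    cost = cost_per_contact * contacts
    return revenue - cost


# Slide 6: mail all 1 million prospects, 1 percent respond.
mass_mailing = campaign_profit(1_000_000, 0.01)

# Slide 8: mail only the 250,000 most likely responders, 3 percent respond.
targeted = campaign_profit(250_000, 0.03)

print(f"mass mailing: ${mass_mailing:,.0f}")
print(f"targeted:     ${targeted:,.0f}")
print(f"improvement:  {targeted / mass_mailing:.2f}x")
```

The targeted campaign earns less revenue in total (7,500 responses instead of 10,000), but it cuts the mailing cost from $2 million to $500,000, which is where the improvement comes from.
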
  9. PA APPLICATION: PREDICTIVE ADVERTISEMENT TARGETING

    What’s predicted: Which ad each customer is most likely to click.
    What’s done about it: Display the best ad (based on the likelihood of a click as well as the bounty paid by its sponsor).
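
The slide says the best ad depends on both the click likelihood and the sponsor's bounty, but not how the two are combined. One plausible reading, assumed here, is to rank ads by expected payout (click probability times bounty). The ad names, probabilities, and bounties below are invented for illustration.

```python
def best_ad(ads):
    """Pick the ad with the highest expected payout.

    Assumption: expected payout = P(click) * bounty. The talk does not
    state the exact rule, so this is one reasonable interpretation.
    """
    return max(ads, key=lambda ad: ad["p_click"] * ad["bounty"])


# Hypothetical candidates for one visitor.
candidate_ads = [
    {"name": "ad_a", "p_click": 0.135, "bounty": 1.00},
    {"name": "ad_b", "p_click": 0.050, "bounty": 4.00},
    {"name": "ad_c", "p_click": 0.200, "bounty": 0.50},
]

chosen = best_ad(candidate_ads)
print(chosen["name"])
```

Under this rule the ad that is most likely to be clicked is not necessarily the one displayed: a less clickable ad with a higher bounty can win.
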
  10. Probability of Clicking on the Advert

    IF the individual
    ◦ is still in high school
    ◦ AND expects to graduate college within three years
    ◦ AND indicates certain military interest
    ◦ AND has not been shown this ad yet
    THEN the probability of clicking on the ad for the Art Institute is 13.5 percent.
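
Written as code, the rule above is a single conjunction of conditions. The function and parameter names are hypothetical; only the four conditions and the 13.5 percent figure come from the slide, which gives no probability for individuals outside this segment.

```python
from typing import Optional


def art_institute_click_probability(in_high_school: bool,
                                    years_to_college_graduation: float,
                                    military_interest: bool,
                                    already_shown_ad: bool) -> Optional[float]:
    """Return the click probability from the slide's rule, or None if the rule does not apply."""
    if (in_high_school
            and years_to_college_graduation <= 3
            and military_interest
            and not already_shown_ad):
        return 0.135
    # The slide only covers one segment; other segments would need their own rules.
    return None


print(art_institute_click_probability(True, 3, True, False))
print(art_institute_click_probability(True, 3, True, True))
```
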
  11. PA APPLICATION: BLACK BOX TRADING

    What’s predicted: Whether a stock will go up or down.
    What’s done about it: Buy stocks that will go up; sell those that will go down.
    There are several pointers and strategies; one of them is financial sentiment analysis, which we cover later.
  12. Machine Learning: Classification

    In classification, we use an object's characteristics to identify which class (or group) it belongs to. Source: Wikipedia
  13. Machine Learning: Clustering

    Clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). Source: Wikipedia
  14. Machine Learning: Supervised vs Unsupervised

    Supervised learning is the machine learning task of inferring a function from labeled training data.
    TRAINING DATA: LABELED DATA
    Unsupervised learning, by contrast, is the problem of finding hidden structure in unlabeled data.
    TRAINING DATA: UNLABELED DATA
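
The difference between the two settings can be seen on a toy one-dimensional data set. The sketch below uses a nearest-centroid classifier for the supervised case and a two-cluster k-means for the unsupervised case; both techniques, the data, and the label names are chosen for illustration and are not taken from the talk.

```python
def fit_centroids(points, labels):
    """Supervised: use the known labels to average the points of each class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Classify x as the label whose centroid is closest."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def two_means(points, iterations=10):
    """Unsupervised: 1-D k-means with k=2; no labels are used.

    Needs at least two distinct values in `points`.
    """
    lo, hi = min(points), max(points)  # initial cluster centres
    for _ in range(iterations):
        group_lo = [x for x in points if abs(x - lo) <= abs(x - hi)]
        group_hi = [x for x in points if abs(x - lo) > abs(x - hi)]
        lo = sum(group_lo) / len(group_lo)
        hi = sum(group_hi) / len(group_hi)
    return group_lo, group_hi


points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
labels = ["low", "low", "low", "high", "high", "high"]

# Labeled training data: learn a function from characteristics to class.
centroids = fit_centroids(points, labels)
print(predict(centroids, 4.1))

# Unlabeled training data: discover the two groups from the points alone.
print(two_means(points))
```

The clustering step recovers the same two groups, but it cannot name them: without labels there is no "low" or "high", only "group one" and "group two".
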
  15. Reviews – What do the people say?

    Valar Morghulis! (All men must die) Valar Dohaeris! (All men must serve) Love this series! Great actors and I love the characters. I am always on pins and needles waiting for each new season of this show! I think this is a fantastic series. Although the fourth year was not as exciting to me as the first three, I still look forward to seasons 5 and 6. I felt this season was not the strongest of the series. Love the series, a little disappointed that it will have to end one day!
  16. Sentiment Analysis

    A basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level: whether the opinion expressed in a document, a sentence, or an entity feature/aspect is positive, negative, or neutral. “Beyond polarity” sentiment classification looks at emotional states such as “angry,” “sad,” and “happy.”
  17. Let’s Brainstorm – Predict Sentiment?

    1. This movie was fantastic.
    2. This is the worst movie I have seen in my entire life!
    3. The direction in this movie was not very good.
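
To make the brainstorm concrete, here is a deliberately naive lexicon-based baseline. It is not the approach used later in the talk (a trained classifier); the word lists are tiny hand-picked assumptions. It shows why sentence 3 is the tricky one: "good" is a positive word, and only the preceding "not" flips the polarity.

```python
import re

# Tiny hand-made lexicons, for illustration only.
POSITIVE = {"fantastic", "good", "great", "love"}
NEGATIVE = {"worst", "bad", "boring", "terrible"}
NEGATORS = {"not", "never", "no"}
INTENSIFIERS = {"very", "so", "really"}


def polarity(text):
    """Return 'positive', 'negative', or 'neutral' using word lists and simple negation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    negate = False
    for token in tokens:
        if token in NEGATORS:
            negate = True
        elif token in INTENSIFIERS:
            continue  # keep a pending negation alive across words like "very"
        elif token in POSITIVE or token in NEGATIVE:
            value = 1 if token in POSITIVE else -1
            score += -value if negate else value
            negate = False
        else:
            negate = False  # negation only reaches the next sentiment word
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


sentences = [
    "This movie was fantastic.",
    "This is the worst movie, I have seen my entire life!",
    "The direction in this movie was not very good.",
]
for sentence in sentences:
    print(polarity(sentence), "|", sentence)
```

A plain word-count model without the negation rule would score sentence 3 as positive, which is one motivation for learned and compositional models.
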
  19. Information about the Kaggle Data Set

    ◦ In-domain data, originally from Rotten Tomatoes.
    ◦ Training data: Kaggle Movie Reviews, 156,060 phrases.
    ◦ Testing data: Kaggle Movie Reviews, 66,292 phrases.
    ◦ Task: Classify test phrases into one of five categories:
      ◦ negative (0)
      ◦ somewhat negative (1)
      ◦ neutral (2)
      ◦ somewhat positive (3)
      ◦ positive (4)
  20. Data Format – Tab Separated Values

    Input training data (tab separated):
      PhraseId    SentenceId    Phrase    Sentiment
      64    2    This quiet , introspective and entertaining independent is worth seeking .    4
    Input testing data (tab separated):
      PhraseId    SentenceId    Phrase
      156250    8550    All ends well , sort of , but the frenzied comic moments never click .
    Output, analyzed test data (comma separated):
      PhraseId,Sentiment
      156061,2
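
Reading and writing these formats needs nothing beyond Python's standard `csv` module. The sketch below parses the training row shown on the slide from an in-memory string and produces the comma-separated output; the helper names `read_tsv` and `write_submission` are made up, and real use would read from and write to files instead.

```python
import csv
import io

# The training sample from the slide, as tab-separated text.
TRAIN_TSV = (
    "PhraseId\tSentenceId\tPhrase\tSentiment\n"
    "64\t2\tThis quiet , introspective and entertaining independent "
    "is worth seeking .\t4\n"
)


def read_tsv(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    return list(reader)


def write_submission(predictions):
    """Format (PhraseId, Sentiment) pairs as comma-separated output."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["PhraseId", "Sentiment"])
    for phrase_id, sentiment in predictions:
        writer.writerow([phrase_id, sentiment])
    return buffer.getvalue()


rows = read_tsv(TRAIN_TSV)
print(rows[0]["Phrase"], "->", int(rows[0]["Sentiment"]))
print(write_submission([(156061, 2)]))
```

`QUOTE_NONE` is used because the phrases are raw text in which quotation marks are ordinary characters, not field delimiters.
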
  21. Steps

    ◦ Lowercase the input text.
    ◦ Remove stop words (a, an, the, etc.) from the text.
    ◦ Vectorize with TF-IDF or a count vectorizer (bag-of-words counts).
    ◦ Normalize the vectors (L2).
    ◦ Feed the training data to a LIBLINEAR SVM (machine learning model).
    ◦ Write the output in the pre-defined format.
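
The first four steps can be sketched from scratch in plain Python. This is not the scikit-learn code used in the talk: the stop-word list is reduced to the three words named on the slide, and the weighting `count * (ln(N / df) + 1)` is one common TF-IDF variant chosen here as an assumption (scikit-learn's `TfidfVectorizer` uses a smoothed form by default). The SVM training step is left out.

```python
import math
from collections import Counter

STOP_WORDS = {"a", "an", "the"}  # only the examples named on the slide


def tokenize(text):
    """Steps 1 and 2: lowercase, split on whitespace, drop stop words."""
    return [token for token in text.lower().split() if token not in STOP_WORDS]


def tfidf_l2(documents):
    """Steps 3 and 4: TF-IDF weights per document, scaled to unit L2 length."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {term: count * (math.log(n_docs / doc_freq[term]) + 1.0)
                   for term, count in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({term: w / norm for term, w in weights.items()}
                       if norm else {})
    return vectors


vectors = tfidf_l2(["The movie was fantastic", "The movie was boring"])
for vector in vectors:
    print({term: round(weight, 3) for term, weight in vector.items()})
```

Words that appear in every document ("movie", "was") get a lower weight than the words that distinguish the documents ("fantastic", "boring"). The resulting sparse vectors are what would then be passed to a linear SVM.
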
  22. Representation: Bag of Words

    In this model, a text (such as a sentence or a document) is represented as the bag (multi-set) of its words, disregarding grammar and even word order but keeping multiplicity.
    Example:
      D1: John likes to watch movies. Mary likes movies too.
      D2: John also likes to watch football games.
    Vocabulary {Word : Index}:
      { "John": 1, "likes": 2, "to": 3, "watch": 4, "movies": 5, "also": 6, "football": 7, "games": 8, "Mary": 9, "too": 10 }
    There are 10 distinct words, and using the indexes of the vocabulary, each document is represented by a 10-entry vector:
      D1: [1, 2, 1, 1, 2, 0, 0, 0, 1, 1]
      D2: [1, 1, 1, 1, 0, 1, 1, 1, 0, 0]
    Note: Scikit-Learn directly supports this vector representation through its CountVectorizer. Similar support is available for TF-IDF.
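
The two vectors can be reproduced with a short pure-Python function. The vocabulary is copied from the slide (with its 1-based indexes); the function name `bag_of_words` is made up, and the sketch is a stand-in for, not a copy of, what CountVectorizer does internally.

```python
import re

# Vocabulary and 1-based indexes exactly as given on the slide.
VOCABULARY = {
    "John": 1, "likes": 2, "to": 3, "watch": 4, "movies": 5,
    "also": 6, "football": 7, "games": 8, "Mary": 9, "too": 10,
}


def bag_of_words(document, vocabulary):
    """Count how often each vocabulary word occurs, ignoring word order."""
    vector = [0] * len(vocabulary)
    for word in re.findall(r"\w+", document):
        index = vocabulary.get(word)
        if index is not None:  # words outside the vocabulary are skipped
            vector[index - 1] += 1
    return vector


d1 = "John likes to watch movies. Mary likes movies too."
d2 = "John also likes to watch football games."

print(bag_of_words(d1, VOCABULARY))  # [1, 2, 1, 1, 2, 0, 0, 0, 1, 1]
print(bag_of_words(d2, VOCABULARY))  # [1, 1, 1, 1, 0, 1, 1, 1, 0, 0]
```

Because only counts are kept, "Mary likes John" and "John likes Mary" produce the same vector.
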
  23. Recursive Neural Tensor Networks

    A new composition function ‘p’ was introduced in a new compositional model called the RNTN, along with a new sentiment treebank, which allows training and evaluation with compositional information.
    More expressive than any other recursive neural network so far!
    Idea: Allow more interaction of vectors.
    Image courtesy: Socher et al., EMNLP 2013
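
The slide does not spell out the composition function. The sketch below assumes the form usually given for the RNTN in Socher et al. (2013): the two child vectors are concatenated into x, and each entry of the parent is tanh(xᵀ V_k x + W_k x), where the tensor slices V_k supply the extra "interaction of vectors". The dimensions and all numeric values are toy assumptions.

```python
import math


def rntn_compose(left, right, V, W):
    """Compose two d-dimensional child vectors into one d-dimensional parent.

    V is a list of d matrices of size 2d x 2d (the tensor slices),
    W is a d x 2d matrix (the standard recursive-network weights).
    """
    x = left + right  # concatenation [left; right], length 2d
    size = len(x)
    parent = []
    for V_k, W_k in zip(V, W):
        # Bilinear term: lets every entry of x interact with every other entry.
        bilinear = sum(x[i] * V_k[i][j] * x[j]
                       for i in range(size) for j in range(size))
        # Linear term: the ordinary recursive neural network layer.
        linear = sum(w * value for w, value in zip(W_k, x))
        parent.append(math.tanh(bilinear + linear))
    return parent


d = 2
left, right = [1.0, 0.0], [0.0, 1.0]
W = [[0.5, 0.0, 0.0, 0.5],
     [0.0, 0.0, 0.0, 0.0]]
V = [[[0.0] * (2 * d) for _ in range(2 * d)] for _ in range(d)]
V[0][0][3] = 1.0  # one interaction between left[0] and right[1]

print(rntn_compose(left, right, V, W))
```

With all tensor slices set to zero, the function reduces to a plain recursive neural network layer, which is a quick way to see what the tensor adds.
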
  24. Conclusion

    o Important to understand: “Data is always predictive!”
    o In real-world data science, no single model fits all problems, so one needs to constantly learn about new techniques.
    o Real-world data is notoriously difficult, and one constantly faces new challenges and new problems to handle.
    o “With more power comes more responsibility.”
    o The Prediction Effect: A little prediction goes a long way.
  25. Links and References

    o Yongzheng Zhang, Dan Shen and Catherine Baudin, Sentiment Analysis in Practice, tutorial delivered at ICDM 2011.
    o Scikit-Learn, Supervised Learning: http://scikit-learn.org/stable/supervised_learning.html#supervised-learning
    o Scikit-Learn, Working with Text: http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html
    o Andrew Ng’s Machine Learning Course: https://www.coursera.org/course/ml
    o Manning and Jurafsky, Natural Language Processing Course: https://www.coursera.org/course/nlp
    o Learning Scikit-Learn: Machine Learning in Python: http://www.amazon.com/Learning-scikit-learn-Machine-Python/dp/1783281936