Slide 1

Slide 1 text

Social Media Intelligence: Text, Network Mining and Predictive Analytics Combined. Phil Winters, Customer Perspective Champion, Data Whisperer, www.knime.com

Slide 2

Slide 2 text

NOTE: Examples, workflows (i.e., the complete programs), and white papers are available for download at www.knime.com.

Slide 3

Slide 3 text

Copyright © 2013 by KNIME.com AG. All Rights Reserved - Confidential. The KNIME Platform

Slide 4

Slide 4 text

KNIME Selected Node Highlights: • Statistics • Data Mining • Time Series • Image Processing • Neighborgrams • Web Analytics • Text Mining • Network Analysis • Social Media Analysis • WEKA • R
Over 1000 native and embedded nodes included: • Database Support • ETL • Text Processing • Data Generation • XML Read/Write • PMML Read/Write • Social Media Analysis • Business Intelligence • Community Nodes • 3rd Party Nodes
Advanced Visualization

Slide 5

Slide 5 text

KNIME rated #1 in satisfaction for open source analytics platforms

Slide 6

Slide 6 text

Hot Topics: Recommendation Engine

Slide 7

Slide 7 text

Hot Topics: Realtime Scoring

Slide 8

Slide 8 text

Social Media Analysis: Water, Water Everywhere, and Not a Drop to Drink
Approaches and Challenges:
• Cloud-based approach: no access to the data
• In-house dashboard: no analytics
• In-house text mining: sentiment but no relevance
• In-house network mining: relevance but no sentiment

Slide 9

Slide 9 text

Case Study: Major European Telco
Very rich new data sources about customers!
Combine:
– Text mining
– Network analysis
– Classic predictive analytics
  • Modeling, clustering, time series, etc.
Combining with internal data makes the text "relevant":
– Include product names/categories
– Exclude staff members
– Include number of web hits per page
– Include existing marketing positioning
– Include major campaign information

Slide 10

Slide 10 text

Social Media Intelligence: Major European Telco

Slide 11

Slide 11 text

Our Goal in Social Media Analysis
• Text mining for sentiment
• Drill down on special cases
• Network mining for relevance
• Analytics for prediction

Slide 12

Slide 12 text

Case Study Example: Slashdot Data ("News for Nerds, Stuff that Matters")
Basic Facts:
• 24532 users
• 491 threads, each with 15 to 843 responses from 12 to 507 users
• 113505 posts (text mining on posts)
• 60 main topics

Slide 13

Slide 13 text

Text Mining workflow (diagram): remove anonymous users and group by PostID; tag words as positive or negative using the MPQA corpus word lists; build a bag of words (BoW); apply the Standard Named Entity Filter; compute word frequencies; bin users; draw a word cloud for selected users.

Slide 14

Slide 14 text

Slashdot – Text Mining
• List of negative and positive words (MPQA Opinion Corpus)
• Tag positive and negative words
• Count words in posts
• Aggregate over users
• Label each user as negative or positive
Most positive user: dada21 (2838 positive / 1725 negative words)
Most negative user: pNutz (43 positive / 109 negative words)
16016 positive users, 7107 negative users
Which topics do positive users have in common?
– Government
– People
– Law/s
– Money
– Market
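The counting steps above can be sketched in Python. This is a minimal sketch under stated assumptions: the tiny `POSITIVE`/`NEGATIVE` word sets are hypothetical stand-ins for the MPQA Opinion Corpus lists, anonymous posts are represented by a `None` user, and a user's attitude level is taken as positive count minus negative count. That last assumption is consistent with the figures on this slide: dada21 gives 2838 - 1725 = 1113 and pNutz gives 43 - 109 = -66, exactly the attitude range listed on the Normalization slide.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicons; the talk uses the MPQA Opinion Corpus word lists instead.
POSITIVE = {"good", "great", "agree", "free"}
NEGATIVE = {"bad", "wrong", "fail", "hate"}

TOKEN = re.compile(r"[a-z']+")


def tag_counts(text):
    """Count positive and negative lexicon words in one post."""
    tokens = TOKEN.findall(text.lower())
    pos = sum(1 for t in tokens if t in POSITIVE)
    neg = sum(1 for t in tokens if t in NEGATIVE)
    return pos, neg


def attitude_per_user(posts):
    """Aggregate word counts over all posts of each non-anonymous user.

    posts: iterable of (user, text) pairs; user is None for anonymous posts.
    Returns {user: (positive, negative, attitude)} where the attitude is
    assumed to be positive - negative.
    """
    totals = defaultdict(lambda: [0, 0])
    for user, text in posts:
        if user is None:  # remove anonymous users, as in the workflow
            continue
        pos, neg = tag_counts(text)
        totals[user][0] += pos
        totals[user][1] += neg
    return {u: (p, n, p - n) for u, (p, n) in totals.items()}


posts = [
    ("alice", "Great idea, I agree"),
    ("bob", "This is bad and wrong"),
    (None, "good good"),
    ("alice", "bad"),
]
print(attitude_per_user(posts))
```

Positive users would then be those with an attitude above zero and negative users those below; where exactly the workflow draws that line is not stated on the slide.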

Slide 15

Slide 15 text

Slashdot – Text Mining User tag cloud: pNutz

Slide 16

Slide 16 text

Slashdot – Text Mining Most negative post:

Slide 17

Slide 17 text

Slashdot – Text Mining: Positive vs. negative word frequency per user (figure; axes: positive word count, negative word count; labeled users: dada21, 99BottlesOfBeer, dbIII, pNutz)

Slide 18

Slide 18 text

Network Creation (diagram: example user network with nodes User1 to User6)
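One way to build such a network from Slashdot posts is sketched below. The slides do not specify how edges are defined, so this assumes (hypothetically) a directed edge from each replying user to the author of the post being replied to, weighted by the number of replies; anonymous users and self-replies are dropped. The input layout `(post_id, user, parent_id)` and the function name are also assumptions for illustration.

```python
from collections import Counter


def build_reply_network(posts):
    """Build a weighted, directed user-to-user reply network.

    posts: list of (post_id, user, parent_id) tuples; parent_id is None for
    thread starters and user is None for anonymous posts.
    Returns a Counter mapping (replier, replied_to) -> number of replies.
    """
    author = {post_id: user for post_id, user, _ in posts}
    edges = Counter()
    for post_id, user, parent_id in posts:
        if parent_id is None or user is None:  # thread starter or anonymous replier
            continue
        target = author.get(parent_id)
        if target is None or target == user:  # anonymous target or self-reply
            continue
        edges[(user, target)] += 1
    return edges


posts = [
    (1, "User1", None),
    (2, "User2", 1),
    (3, "User3", 1),
    (4, "User2", 3),
    (5, None, 1),
    (6, "User1", 5),
    (7, "User2", 1),
]
print(build_reply_network(posts))
```

Keeping reply counts as weights leaves the choice open for later steps: they can be used as edge weights or ignored.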

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

Topic Graphs

Slide 21

Slide 21 text

Topic Graphs

Slide 22

Slide 22 text

Topic Graph: NASA

Slide 23

Slide 23 text

Hubs & Authorities
• Hubs = followers
• Authorities = leaders
Workflow: filter anonymous users and create the network; use a centrality index to compute each user's hub weight and authority weight; output users with hub and authority weights and other features.
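Hubs and authorities are the two scores of Kleinberg's HITS algorithm. The slide only says that a centrality index was used, so the following is a sketch of plain HITS by power iteration, not necessarily the exact implementation in the KNIME workflow. It assumes edges point from the replying user to the user being replied to, so a user who replies to strong authorities gets a high hub (follower) score and a user who receives replies from strong hubs gets a high authority (leader) score. Each score vector is rescaled to unit Euclidean length per iteration, which keeps every score within [0, 1].

```python
import math


def hits(edges, iterations=50):
    """Compute HITS hub and authority scores by power iteration.

    edges: iterable of (source, target) pairs, read as "source points to target"
    (assumed here: source replied to target). Returns (hub, authority) dicts.
    """
    edges = list(edges)
    nodes = {n for e in edges for n in e}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the users pointing to you.
        auth = {n: 0.0 for n in nodes}
        for s, t in edges:
            auth[t] += hub[s]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub: sum of the authority scores of the users you point to.
        hub = {n: 0.0 for n in nodes}
        for s, t in edges:
            hub[s] += auth[t]
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


hub, auth = hits([("A", "C"), ("B", "C")])
print(hub, auth)
```

A `Counter` of `(replier, replied_to)` pairs can be passed directly, since iterating it yields its keys; reply counts are ignored in this unweighted version.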

Slide 24

Slide 24 text

Hubs & Authorities (chart; labeled users: dada21, Doc Ruby, Carl Bialik, pNutz, 99BottlesOfBeerInMyF, Tube Steak)

Slide 25

Slide 25 text

Combining Text and Network Mining
• Network analysis: hub and authority score per user
• Text analysis: attitude level per user

Slide 26

Slide 26 text

Hubs, Authorities & Attitudes from the WSJ (chart; labeled users: Carl Bialik, dada21, Doc Ruby, 99BottlesOfBeerInMyF, WebHosting Guy, pNutz, Tube Steak, Catbeller)

Slide 27

Slide 27 text

What we have found...
- The positive leaders
- The neutral leaders
- The negative leaders
- The inactive users
What identifies each group? How do I identify a new user? How do I handle each user?

Slide 28

Slide 28 text

Why Clustering?
- No a priori knowledge (not even on a subset of users)
- Prediction and interpretation capabilities required
Algorithm: k-Means
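The k-Means step can be sketched as Lloyd's algorithm in plain Python. This is an illustrative sketch, not the KNIME k-Means node: it assumes each user is a tuple of already normalized features such as (authority, hub, attitude), picks k initial centroids by seeded random sampling, and stops when no user changes cluster. The returned centroids are what give the model its prediction capability, because a new user can be assigned to the nearest one.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda c: squared_distance(point, centroids[c]))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means. Returns (centroids, labels) for a list of numeric tuples."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```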

Slide 29

Slide 29 text

Normalization
• (Authority score, Hub score) in [0,1] × [0,1]
• Attitude level in [-66, 1113]
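Without rescaling, the attitude level (a span of 1179 units) would dominate the Euclidean distances used by k-Means, while the network scores would barely matter. The slide does not name the rescaling method; a minimal sketch assuming simple min-max normalization to [0, 1] looks like this:

```python
def min_max(values):
    """Rescale a list of numbers linearly to [0, 1] (min -> 0.0, max -> 1.0)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:  # all values equal: map everything to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


print(min_max([-66, 1113, 0]))
```

With the slide's range, pNutz's attitude of -66 maps to 0.0 and dada21's 1113 maps to 1.0.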

Slide 30

Slide 30 text

Re-sampling the Training Set: k = 10

Slide 31

Slide 31 text

Number of Clusters
Users with a negative attitude are hard to catch!
• k = 30: 10 clusters with more than 1000 users; 2 clusters with a clearly negative attitude (< 0.4)
• k = 20: 5 clusters with more than 1000 users; 2 clusters with a negative attitude (< 0.4)
• k = 10: 2 clusters with more than 5000 users, and no cluster with a negative attitude anymore
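To compare different values of k in this way, one can tabulate each cluster's size and mean attitude. The sketch below assumes the "< 0.4" criterion refers to a cluster's mean normalized attitude and uses 1000 users as the size threshold; both are exposed as parameters. The function name and its inputs are hypothetical.

```python
from collections import defaultdict


def summarize_clusters(labels, attitudes, negative_threshold=0.4, large_size=1000):
    """Per-cluster size and mean normalized attitude, with negative/large flags.

    labels: cluster index per user; attitudes: normalized attitude per user.
    """
    groups = defaultdict(list)
    for lab, att in zip(labels, attitudes):
        groups[lab].append(att)
    summary = {}
    for lab, atts in sorted(groups.items()):
        mean_att = sum(atts) / len(atts)
        summary[lab] = {
            "size": len(atts),
            "mean_attitude": mean_att,
            "negative": mean_att < negative_threshold,
            "large": len(atts) > large_size,
        }
    return summary


print(summarize_clusters([0, 0, 1, 1, 1], [0.1, 0.3, 0.8, 0.9, 0.7]))
```

Running this for k = 10, 20 and 30 and counting the clusters flagged "negative" reproduces the kind of comparison shown on this slide.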

Slide 32

Slide 32 text

Additional Discoveries
• There are only very few real leaders! Authority and hub scores identify active participants rather than leaders.
• Superfans can be found in cluster_3.
• Negative and (sigh!) active users are collected in cluster_1.
• Neutral users are usually inactive (cluster_2, cluster_7, and cluster_8).
• Positive users with different degrees of activity are scattered across the remaining clusters.

Slide 33

Slide 33 text

The k-Means Clusters (chart labels: Superfans, Negative users, Neutral users, Fans)

Slide 34

Slide 34 text

The Operational Workflow: Pre-processing, Cluster Extraction, Assignment of New Data
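The last step, assigning new data, can be sketched as a nearest-centroid lookup. This assumes (hypothetically) that the new user's raw (authority, hub, attitude) values are min-max rescaled with the minima and maxima saved from the training data, so that new and existing users share one scale; values outside the training range simply land outside [0, 1]. The function name and parameters are illustrative.

```python
def assign_new_user(features, centroids, mins, maxs):
    """Return the index of the k-means cluster closest to a new user.

    features: raw (authority, hub, attitude) values for the new user.
    mins/maxs: per-feature minima and maxima saved from the training data.
    """
    scaled = [
        (f - lo) / (hi - lo) if hi > lo else 0.0
        for f, lo, hi in zip(features, mins, maxs)
    ]
    distances = [sum((a - b) ** 2 for a, b in zip(scaled, c)) for c in centroids]
    return distances.index(min(distances))


centroids = [(0.1, 0.1, 0.2), (0.5, 0.5, 0.9)]
mins, maxs = (0, 0, -66), (1, 1, 1113)
print(assign_new_user((0.05, 0.1, -20), centroids, mins, maxs))
```

For example, a new user with an attitude of -20 is rescaled to an attitude of about 0.039 before the distances are compared.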

Slide 35

Slide 35 text

Lessons Learned
• Data manipulation is the key; the decision science flows from that.
• Sentiment analysis is all about the corpus!
(Diagram labels: Network Analysis, Sentiment Analysis)

Slide 36

Slide 36 text

Capturing the Data: Options available, from fee-paying to open source!

Slide 37

Slide 37 text

Big Data... The Science

Slide 38

Slide 38 text

NOTE: Examples, workflows (i.e., the complete programs), and white papers are available for download at www.knime.com.

Slide 39

Slide 39 text

Mark Your Calendars: KNIME's 6th User Group Meeting, March 6-7, 2013, Zurich, Switzerland. www.KNIME.com