Slide 1

Slide 1 text

Twitter @lgleasain GitHub lgleasain www.lancegleason.com www.polyglotprogramminginc.com [email protected] 1 Sunday, April 14, 13

Slide 2

Slide 2 text

Introductions

Slide 3

Slide 3 text


Slide 4

Slide 4 text


Slide 5

Slide 5 text


Slide 6

Slide 6 text

http://www.polyglotprogramminginc.com/purr-programming-2-0/

Slide 7

Slide 7 text

Data Science

Slide 8

Slide 8 text


Slide 9

Slide 9 text


Slide 10

Slide 10 text


Slide 11

Slide 11 text


Slide 12

Slide 12 text


Slide 13

Slide 13 text

Analytics

Slide 14

Slide 14 text


Slide 15

Slide 15 text


Slide 16

Slide 16 text

http://www.torlaune.de/euro-2012/spieler-relationen/
http://www.nytimes.com/interactive/2012/05/17/business/dealbook/how-the-facebook-offering-compares.html?_r=0
http://www.nytimes.com/interactive/2012/08/24/us/drought-crops.html

Slide 17

Slide 17 text

?

Slide 18

Slide 18 text


Slide 19

Slide 19 text

Which Customers Bought the Most?

Slide 20

Slide 20 text

Which Will Buy the Most?

Slide 21

Slide 21 text

http://www.kaggle.com/c/titanic-gettingStarted

Slide 22

Slide 22 text


Slide 23

Slide 23 text

Coupon App

Slide 24

Slide 24 text


Slide 25

Slide 25 text

• Users can search for deals via proximity to location

Slide 26

Slide 26 text

• Users can search for deals via proximity to location
• Customers can save coupons

Slide 27

Slide 27 text

• Users can search for deals via proximity to location
• Customers can save coupons
• When a coupon is used it is no longer available for use

Slide 28

Slide 28 text

• Users can search for deals via proximity to location
• Customers can save coupons
• When a coupon is used it is no longer available for use
• Customers can search via categories

Slide 29

Slide 29 text

• Users can search for deals via proximity to location
• Customers can save coupons
• When a coupon is used it is no longer available for use
• Customers can search via categories
• Advertisers pay for data and preferential ad placement

Slide 30

Slide 30 text

How many sales are being generated by this app?

Slide 31

Slide 31 text

Am I getting more active users?

Slide 32

Slide 32 text

How many users are saving coupons?

Slide 33

Slide 33 text

How many users are saving coupons?
• Which stores have the most saves?

Slide 34

Slide 34 text

How many users are saving coupons?
• Which stores have the most saves?
• Do certain categories get more saves?

Slide 35

Slide 35 text

How many are using coupons?

Slide 36

Slide 36 text

How many are using coupons?
• Which stores have the most uses?

Slide 37

Slide 37 text

How many are using coupons?
• Which stores have the most uses?
• Do certain categories get used more?

Slide 38

Slide 38 text

How many are using coupons?
• Which stores have the most uses?
• Do certain categories get used more?
• Which physical store is the user in?

Slide 39

Slide 39 text

Which stores are they at?

Slide 40

Slide 40 text

Hidden Insights

Slide 41

Slide 41 text


Slide 42

Slide 42 text


Slide 43

Slide 43 text

Appstore Data

Slide 44

Slide 44 text


Slide 45

Slide 45 text


Slide 46

Slide 46 text


Slide 47

Slide 47 text


Slide 48

Slide 48 text


Slide 49

Slide 49 text


Slide 50

Slide 50 text


Slide 51

Slide 51 text


Slide 52

Slide 52 text


Slide 53

Slide 53 text


Slide 54

Slide 54 text

Logging (Papertrail/Loggly)

Slide 55

Slide 55 text

Logging (Papertrail/Loggly)
Amazon S3

Slide 56

Slide 56 text

{"measure":"instance","instance":"stores","store_id":64696,"company_id":210,"store_name":"bebe","controller":"api/v1/stores","action":"index"}
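A minimal sketch of how an application might emit an event like the one above as a single JSON log line. The helper name `measure_line` and its signature are assumptions made for illustration; the talk does not show the app's logging code.

```python
import json

def measure_line(instance, controller, action, **fields):
    """Serialize one analytics event as a single-line JSON string.

    Hypothetical helper: the field names follow the slide's example,
    everything else is an illustrative assumption.
    """
    event = {"measure": "instance", "instance": instance}
    event.update(fields)  # e.g. store_id, company_id, store_name
    event["controller"] = controller
    event["action"] = action
    # Compact separators keep the whole event on one log line.
    return json.dumps(event, separators=(",", ":"))

line = measure_line(
    "stores", "api/v1/stores", "index",
    store_id=64696, company_id=210, store_name="bebe",
)
print(line)
```

Writing one self-describing JSON object per line means the log service only has to ship text, and the fields can be pulled out later at query time.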

Slide 57

Slide 57 text


Slide 58

Slide 58 text

Amazon Elastic MapReduce

Slide 59

Slide 59 text

Amazon Elastic MapReduce

Slide 60

Slide 60 text

DynamoDB

Slide 61

Slide 61 text

CREATE EXTERNAL TABLE events_1 (
  id bigint,
  received_at string,
  generated_at string,
  source_id bigint,
  source_name string,
  source_ip string,
  facility string,
  severity string,
  program string,
  message string
)
PARTITIONED BY (dt string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION 's3://mybucket/papertrail/logs/production';
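To make the table layout concrete, here is a small sketch that splits one tab-delimited log line into the ten columns declared for events_1. The `Event` type, the `parse_event` helper, and the sample line are hypothetical. Unlike Hive's delimited format, this sketch keeps any extra tabs inside the final message field.

```python
from typing import NamedTuple

class Event(NamedTuple):
    """Columns of events_1, in declaration order (hypothetical helper type)."""
    id: int
    received_at: str
    generated_at: str
    source_id: int
    source_name: str
    source_ip: str
    facility: str
    severity: str
    program: str
    message: str

def parse_event(line: str) -> Event:
    # Split on at most 9 tabs so a tab inside the message stays in the last field.
    parts = line.rstrip("\n").split("\t", 9)
    if len(parts) != 10:
        raise ValueError(f"expected 10 tab-separated fields, got {len(parts)}")
    # id and source_id are declared bigint; the remaining columns are strings.
    return Event(int(parts[0]), parts[1], parts[2], int(parts[3]), *parts[4:])

# Made-up sample line, for illustration only.
sample = "\t".join([
    "1", "2013-04-14T10:00:00Z", "2013-04-14T10:00:00Z", "42",
    "web-1", "10.0.0.1", "User", "Info", "app",
    '{"measure":"instance","instance":"stores"}',
])
event = parse_event(sample)
print(event.source_name, event.message)
```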

Slide 62

Slide 62 text

ALTER TABLE events_1 RECOVER PARTITIONS;

Slide 63

Slide 63 text

CREATE EXTERNAL TABLE promotions_1 (
  id string,
  received_at string,
  source_id string,
  source_ip string,
  source_name string,
  measure string,
  instance string,
  promotion_id string,
  company_id string,
  controller string,
  action string
)
stored by 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
  "dynamodb.table.name" = "sh_promotions_latest",
  "dynamodb.column.mapping" = "id:id,received_at:received_at,source_id:source_id,source_ip:source_ip,source_name:source_name,measure:measure,instance:instance,promotion_id:promotion_id,company_id:company_id,controller:controller,action:action"
);

Slide 64

Slide 64 text

alter table promotions_1 recover partitions;

Slide 65

Slide 65 text

insert overwrite table promotions_1
select
  id,
  received_at,
  source_id,
  source_ip,
  source_name,
  get_json_object(message, '$.measure') as measure,
  get_json_object(message, '$.instance') as instance,
  get_json_object(message, '$.promotion_id') as promotion_id,
  get_json_object(message, '$.company_id') as company_id,
  get_json_object(message, '$.controller') as controller,
  get_json_object(message, '$.action') as action
from events_1
where message like '%"promotion"%';
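For readers less familiar with Hive, the following sketch mirrors in plain Python what the query above does to each row: keep only messages containing the text "promotion" in quotes, then pull the named top-level fields out of the JSON. The function name `extract_promotion` and the sample message are assumptions. Note that get_json_object returns strings, while this sketch keeps the JSON types.

```python
import json

FIELDS = ("measure", "instance", "promotion_id", "company_id", "controller", "action")

def extract_promotion(message: str):
    """Return the selected JSON fields, or None if the row would be filtered out."""
    # Mirrors: where message like '%"promotion"%'
    if '"promotion"' not in message:
        return None
    try:
        payload = json.loads(message)
    except ValueError:
        payload = None
    if not isinstance(payload, dict):
        # get_json_object yields NULL when the input is not valid JSON.
        return {name: None for name in FIELDS}
    # Missing keys become None, like a NULL column in the Hive result.
    return {name: payload.get(name) for name in FIELDS}

# Made-up message, for illustration only.
sample = (
    '{"measure":"instance","instance":"promotion","promotion_id":17,'
    '"company_id":210,"controller":"api/v1/promotions","action":"show"}'
)
row = extract_promotion(sample)
print(row)
```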

Slide 66

Slide 66 text


Slide 67

Slide 67 text


Slide 68

Slide 68 text

d3js.org

Slide 69

Slide 69 text


Slide 70

Slide 70 text


Slide 71

Slide 71 text


Slide 72

Slide 72 text

False Positives

Slide 73

Slide 73 text


Slide 74

Slide 74 text

Nearly ALL sick people have eaten PEAS (obviously then, the effects are cumulative).

Slide 75

Slide 75 text

An estimated 99.9% of all people who die from cancer or heart attacks have eaten PEAS.

Slide 76

Slide 76 text

Another 99.9% of people involved in auto accidents ate PEAS within the 60 days before the accident.

Slide 77

Slide 77 text

Among people born in 1839 who later dined on PEAS, there has been a 100% mortality rate.

Slide 78

Slide 78 text

Peas Will Kill You

Slide 79

Slide 79 text

We had 4,000 app downloads this month. We are doing great...

Slide 80

Slide 80 text


Slide 81

Slide 81 text

Most people use the app once and then uninstall it.
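Download counts alone cannot reveal this pattern; a per-user session count can. The sketch below computes the share of users who opened the app exactly once. The function name and the session data are made up for illustration, and it assumes one user id is logged per session.

```python
from collections import Counter

def one_time_user_share(session_user_ids):
    """Fraction of distinct users with exactly one recorded session."""
    per_user = Counter(session_user_ids)
    if not per_user:
        return 0.0
    one_timers = sum(1 for count in per_user.values() if count == 1)
    return one_timers / len(per_user)

# Made-up log: one user id per app session.
sessions = ["a", "b", "b", "c", "d"]
share = one_time_user_share(sessions)
print(f"{share:.0%} of users had a single session")
```

Tracking this share next to downloads shows whether new installs turn into repeat use.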

Slide 82

Slide 82 text


Slide 83

Slide 83 text

My shopping app just saw a spike in weekly usage after I made UI changes.

Slide 84

Slide 84 text

That UI change led to more users!

Slide 85

Slide 85 text


Slide 86

Slide 86 text

The change went live during the last week of November.

Slide 87

Slide 87 text


Slide 88

Slide 88 text

Be Wary of N of 1 Experiments

Slide 89

Slide 89 text

Segmentation

Slide 90

Slide 90 text

Sparse Data

Slide 91

Slide 91 text

To Get Statistically Meaningful Results You Will Need Thousands of Data Points
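One way to see where "thousands" comes from is the standard normal-approximation formula for comparing two proportions. The sketch below is an illustration, not the speaker's method; the 10% and 12% coupon-use rates, the 5% significance level, and the 80% power are assumed values.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(p1, p2, alpha=0.05, power=0.8):
    """Approximate users needed per group to tell rate p1 from rate p2.

    Uses n = (z_{alpha/2} + z_beta)^2 * (p1(1-p1) + p2(1-p2)) / (p1 - p2)^2,
    the usual two-sided, two-proportion normal approximation.
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Assumed example: detecting a lift in coupon use from 10% to 12%.
n = sample_size_per_group(0.10, 0.12)
print(n)
```

Under these assumptions the answer is a few thousand users per group. Larger differences need far fewer, which is why small effects are the ones that demand large data sets.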

Slide 92

Slide 92 text

The Results Need to Pass the Smell Test

Slide 93

Slide 93 text


Slide 94

Slide 94 text

Collect Lots of Data Early

Slide 95

Slide 95 text

Go for Low-Hanging Fruit First

Slide 96

Slide 96 text

Try to Gather Data-Rich Data Points (Whenever Possible)

Slide 97

Slide 97 text


Slide 98

Slide 98 text

Twitter @lgleasain GitHub lgleasain www.lancegleason.com www.polyglotprogramminginc.com [email protected]