Slide 1

Slide 1 text

Behavioral Analytics: Understanding the How & Why of Users. Ben Johnson, Skyland Labs

Slide 2

Slide 2 text

My Background

Slide 3

Slide 3 text

Former Oracle DBA, Data Visualization, Behavioral Analytics

Slide 4

Slide 4 text

What is Behavioral Analytics?

Slide 5

Slide 5 text

What is Behavioral Analytics? Let’s start with an example.

Slide 6

Slide 6 text

Our Example: You are a SaaS company with a web app.

Slide 7

Slide 7 text

Our Example: You are a SaaS company with a web app. SQL database?

Slide 8

Slide 8 text

Our Example: You are a SaaS company with a web app. SQL database? check.

Slide 9

Slide 9 text

Our Example: You are a SaaS company with a web app. SQL database? check. Tons of Logs?

Slide 10

Slide 10 text

Our Example: You are a SaaS company with a web app. SQL database? check. Tons of Logs? hell yeah!

Slide 11

Slide 11 text

Let’s use SQL to understand our users!

Slide 12

Slide 12 text

What are users doing?

Slide 13

Slide 13 text

What are users doing? SELECT url, count(*) FROM page_views GROUP BY url;

Slide 14

Slide 14 text

What are users doing? [Bar chart: page views per URL for /index.html, /about.html, /pricing.html, /signup.html]

Slide 15

Slide 15 text

Who is doing it?

Slide 16

Slide 16 text

Who is doing it? SELECT url, gender, count(*) FROM page_views INNER JOIN users ON page_views.user_id = users.id GROUP BY url, gender;

Slide 17

Slide 17 text

Who is doing it? [Bar chart: page views per URL for /index.html, /about.html, /pricing.html, /signup.html, split into Men and Women]

Slide 18

Slide 18 text

Where are they doing it?

Slide 19

Slide 19 text

Where are they doing it? SELECT url, city, count(*) FROM page_views INNER JOIN users ON page_views.user_id = users.id GROUP BY url, city;

Slide 20

Slide 20 text

Where are they doing it? [Bar chart: page views per URL for /index.html, /about.html, /pricing.html, /signup.html, split by city: San Francisco, Denver, London]

Slide 21

Slide 21 text

When are they doing it?

Slide 22

Slide 22 text

When are they doing it? SELECT url, DATE_FORMAT(date, '%Y-%m'), count(*) FROM page_views GROUP BY url, ...;

Slide 23

Slide 23 text

When are they doing it? [Bar chart: page views per URL for /index.html, /about.html, /pricing.html, /signup.html, split by month: Jan 2013, Feb 2013, Mar 2013, April 2013]

Slide 24

Slide 24 text

SQL works great for these questions:

Slide 25

Slide 25 text

SQL works great for these questions: WHAT

Slide 26

Slide 26 text

SQL works great for these questions: WHAT WHO

Slide 27

Slide 27 text

SQL works great for these questions: WHO WHEN WHAT

Slide 28

Slide 28 text

SQL works great for these questions: WHEN WHERE WHO WHAT

Slide 29

Slide 29 text

SQL analyzes state.

Slide 30

Slide 30 text

How do we understand actions?

Slide 31

Slide 31 text

How are they doing it?

Slide 32

Slide 32 text

How are they doing it? “How” means knowing what happened during an action or task.

Slide 33

Slide 33 text

How are they doing it? “How” means knowing what happened during an action or task. Example: How do users checkout?

Slide 34

Slide 34 text

How are they doing it? SELECT t1.url, (SELECT t2.url FROM page_views t2 WHERE t2.id > t1.id AND t2.user_id = t1.user_id ORDER BY t2.id ASC LIMIT 1) AS next_url, count(*) FROM page_views t1 WHERE t1.url = '/checkout.html' GROUP BY t1.url, next_url;
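To try the next-page question end to end, here is a minimal sketch. It assumes SQLite through Python's sqlite3 module, a hypothetical three-column page_views table, and made-up sample rows; none of these come from the talk. The next-page lookup is phrased as a correlated scalar subquery, which SQLite accepts.

```python
import sqlite3

# Hypothetical sample data: (id, user_id, url); id increases over time.
ROWS = [
    (1, 1, "/checkout.html"),
    (2, 2, "/checkout.html"),
    (3, 1, "/enter_cc_info.html"),
    (4, 2, "/index.html"),
    (5, 3, "/checkout.html"),  # user 3 never views another page
]

# For every checkout view, look up the next page the same user viewed.
NEXT_PAGE_SQL = """
SELECT t1.url,
       (SELECT t2.url
          FROM page_views t2
         WHERE t2.id > t1.id AND t2.user_id = t1.user_id
         ORDER BY t2.id ASC
         LIMIT 1) AS next_url,
       count(*) AS views
  FROM page_views t1
 WHERE t1.url = '/checkout.html'
 GROUP BY t1.url, next_url
"""


def next_pages(rows):
    """Return {next_url: count} for pages viewed right after checkout."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page_views (id INTEGER PRIMARY KEY, user_id INTEGER, url TEXT)"
    )
    conn.executemany("INSERT INTO page_views VALUES (?, ?, ?)", rows)
    result = {next_url: views for _, next_url, views in conn.execute(NEXT_PAGE_SQL)}
    conn.close()
    return result


if __name__ == "__main__":
    print(next_pages(ROWS))
```

A missing next page comes back as None (SQL NULL), which is the bucket the talk reads as an abandoned cart.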

Slide 35

Slide 35 text

How are they doing it? [Bar chart: next page viewed after /checkout.html: /enter_cc_info.html, /product.html, /index.html, NULL]

Slide 36

Slide 36 text

How are they doing it? [Bar chart: next page viewed after /checkout.html: /enter_cc_info.html, /product.html, /index.html, NULL] Abandoned Shopping Carts!

Slide 37

Slide 37 text

Why are they doing it?

Slide 38

Slide 38 text

Why are they doing it? “Why” means knowing what came before an action.

Slide 39

Slide 39 text

Why are they doing it? “Why” means knowing what came before an action. Example: Why do users cancel accounts?

Slide 40

Slide 40 text

Why are they doing it? SELECT t1.url, (SELECT t2.url FROM page_views t2 WHERE t2.id < t1.id AND t2.user_id = t1.user_id ORDER BY t2.id DESC LIMIT 1) AS prev_url, count(*) FROM page_views t1 WHERE t1.url = '/cancel_account.html' GROUP BY t1.url, prev_url;

Slide 41

Slide 41 text

Why are they doing it? [Bar chart: page viewed before /cancel_account.html: /help.html, /index.html, /404.html, /contact_us.html]

Slide 42

Slide 42 text

Why are they doing it? [Bar chart: page viewed before /cancel_account.html: /help.html, /index.html, /404.html, /contact_us.html] Your “help” pages don’t help.

Slide 43

Slide 43 text

Why are they doing it? [Bar chart: page viewed before /cancel_account.html: /help.html, /index.html, /404.html, /contact_us.html] Users coming to your site to quit.

Slide 44

Slide 44 text

Why are they doing it? [Bar chart: page viewed before /cancel_account.html: /help.html, /index.html, /404.html, /contact_us.html] Your web site is broken.

Slide 45

Slide 45 text

Why are they doing it? [Bar chart: page viewed before /cancel_account.html: /help.html, /index.html, /404.html, /contact_us.html] Your customer support sucks.

Slide 46

Slide 46 text

Behavioral analytics is about understanding the “how” and the “why”.

Slide 47

Slide 47 text

Now let’s use Hadoop to understand our users

Slide 48

Slide 48 text

Building Funnel Analysis in Hadoop

Slide 49

Slide 49 text

Building Funnel Analysis in Hadoop. Step 1: Parse logs.

Slide 50

Slide 50 text

Building Funnel Analysis in Hadoop. Step 1: Parse logs. Step 2: Group by user.

Slide 51

Slide 51 text

Building Funnel Analysis in Hadoop. Step 1: Parse logs. Step 2: Group by user. Step 3: Sessionize.

Slide 52

Slide 52 text

Building Funnel Analysis in Hadoop. Step 1: Parse logs. Step 2: Group by user. Step 3: Sessionize. Step 4: Apply Pattern Matching.

Slide 53

Slide 53 text

Building Funnel Analysis in Hadoop. Step 1: Parse logs. Step 2: Group by user. Step 3: Sessionize. Step 4: Apply Pattern Matching. Step 5: Aggregate.

Slide 54

Slide 54 text

Building Funnel Analysis in Hadoop. Step 1: Parse logs. Step 2: Group by user. Step 3: Sessionize. Step 4: Apply Pattern Matching. Step 5: Aggregate. Step 6: Pull hair repeatedly.
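To make the five real steps concrete, here is a rough single-machine sketch in plain Python rather than Hadoop. The log format ("timestamp user_id url"), the 30-minute session timeout, and every function name are assumptions for illustration, not details from the talk.

```python
from collections import Counter, defaultdict

SESSION_GAP = 30 * 60  # assumed idle timeout in seconds; not from the talk


def parse(line):
    """Step 1: parse a 'timestamp user_id url' log line (hypothetical format)."""
    ts, user_id, url = line.split()
    return int(ts), user_id, url


def group_by_user(events):
    """Step 2: collect each user's events in time order."""
    by_user = defaultdict(list)
    for ts, user_id, url in sorted(events):
        by_user[user_id].append((ts, url))
    return by_user


def sessionize(events):
    """Step 3: start a new session whenever the idle gap exceeds SESSION_GAP."""
    sessions = []
    last_ts = None
    for ts, url in events:
        if last_ts is None or ts - last_ts > SESSION_GAP:
            sessions.append([])
        sessions[-1].append(url)
        last_ts = ts
    return sessions


def funnel_depth(session, steps):
    """Step 4: longest run of consecutive funnel steps found in one session."""
    best = 0
    for start in range(len(session)):
        depth = 0
        while (depth < len(steps) and start + depth < len(session)
               and session[start + depth] == steps[depth]):
            depth += 1
        best = max(best, depth)
    return best


def funnel(lines, steps):
    """Step 5: count the sessions that reached each funnel step."""
    counts = Counter()
    for events in group_by_user(map(parse, lines)).values():
        for session in sessionize(events):
            for i in range(funnel_depth(session, steps)):
                counts[steps[i]] += 1
    return [counts[step] for step in steps]


LOG = [
    "0 bob /checkout.html",
    "60 bob /enter_cc.html",
    "120 bob /confirm.html",
    "0 susy /checkout.html",
    "30 susy /index.html",
    "10000 susy /checkout.html",
    "10060 susy /enter_cc.html",
]
STEPS = ["/checkout.html", "/enter_cc.html", "/confirm.html"]

if __name__ == "__main__":
    print(funnel(LOG, STEPS))
```

Even at this size, most of the code is plumbing (parsing, grouping, sessionizing) rather than the funnel question itself.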

Slide 55

Slide 55 text

Why not make a system & language for describing behavior?

Slide 56

Slide 56 text

Behavior is...

Slide 57

Slide 57 text

Actions

Slide 58

Slide 58 text

Actions & State

Slide 59

Slide 59 text

Actions + Time & State

Slide 60

Slide 60 text

You need a way to describe time:

Slide 61

Slide 61 text

“WHEN”

Slide 62

Slide 62 text

“WHEN” Let’s see some examples

Slide 63

Slide 63 text

Simple Hit Count: How many people click on my home page?

Slide 64

Slide 64 text

Simple Hit Count: How many people click on my home page? WHEN action == '/index.html' THEN SELECT count(); END

Slide 65

Slide 65 text

Simple Hit Count: How many people click on my home page? [Bar chart: hit count for /index.html]

Slide 66

Slide 66 text

Simple Hit Count w/ Demographics: How many men & women click on my home page?

Slide 67

Slide 67 text

Simple Hit Count w/ Demographics: How many men & women click on my home page? WHEN action == '/index.html' THEN SELECT count() GROUP BY gender; END

Slide 68

Slide 68 text

Simple Hit Count w/ Demographics: How many men & women click on my home page? [Bar chart: hit count for Men and Women]

Slide 69

Slide 69 text

Answering the “How”: How do users checkout?

Slide 70

Slide 70 text

Answering the “How”: How do users checkout? WHEN action == '/checkout.html' THEN WHEN WITHIN 1 STEP THEN SELECT count() GROUP BY action; END END
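One way to read this query: for every occurrence of the checkout action in a user's timeline, look one step ahead and tally whatever action is found there. The sketch below emulates that reading in plain Python over in-memory timelines. The function name, the sample data, and the use of None for "no next action" are assumptions; it is not Sky's implementation.

```python
from collections import Counter


def next_action_counts(timelines, anchor):
    """Tally the action that immediately follows each occurrence of `anchor`.

    `timelines` maps a user to that user's actions in time order.
    A missing next action is counted under None.
    """
    counts = Counter()
    for actions in timelines.values():
        for i, action in enumerate(actions):
            if action == anchor:
                following = actions[i + 1] if i + 1 < len(actions) else None
                counts[following] += 1
    return counts


# Hypothetical timelines, one ordered list of actions per user.
TIMELINES = {
    "bob": ["/index.html", "/checkout.html", "/enter_cc.html"],
    "susy": ["/checkout.html", "/product.html", "/checkout.html"],
}

if __name__ == "__main__":
    print(next_action_counts(TIMELINES, "/checkout.html"))
```

Because each timeline is already in order, "the next step" is an index lookup instead of a self-join.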

Slide 71

Slide 71 text

Answering the “How”: How do users checkout? [Bar chart: next action after /checkout.html: /enter_cc.html, /product.html, /index.html, NULL]

Slide 72

Slide 72 text

Answering the “How”: How do users checkout (two steps in)?

Slide 73

Slide 73 text

Answering the “How”: How do users checkout (two steps in)? WHEN action == '/checkout.html' THEN WHEN action == '/enter_cc.html' WITHIN 1 STEP THEN WHEN WITHIN 1 STEP THEN SELECT count() GROUP BY action; END END END

Slide 74

Slide 74 text

Answering the “How”: How do users checkout (two steps in)? [Bar chart: next action after /enter_cc.html: /confirm.html, NULL]

Slide 75

Slide 75 text

Combine these to make a funnel:

Slide 76

Slide 76 text

Funnel Query:

Slide 77

Slide 77 text

Funnel Query: WHEN action == '/checkout.html' THEN SELECT count() INTO "step0"; WHEN action == '/enter_cc.html' WITHIN 1 STEP THEN SELECT count() INTO "step1"; WHEN action == '/confirm.html' WITHIN 1 STEP THEN SELECT count() INTO "step2"; END END END

Slide 78

Slide 78 text

Funnel Output:

Slide 79

Slide 79 text

Funnel Output: { "step0":{"count":100}, "step1":{"count":40}, "step2":{"count":32} }
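As a sanity check on the shape of this output, the nested funnel can be emulated in a few lines of Python. This is a sketch under assumptions: each funnel step must be the very next action after the previous one (one reading of WITHIN 1 STEP), and the function name and sample timelines are made up, so the counts differ from the slide.

```python
import json


def funnel_counts(timelines, steps):
    """Count how often each funnel prefix occurs as consecutive actions.

    Returns a dict shaped like the funnel output, keyed "step0", "step1", ...
    """
    counts = {f"step{i}": {"count": 0} for i in range(len(steps))}
    for actions in timelines.values():
        for start in range(len(actions)):
            for depth, step in enumerate(steps):
                pos = start + depth
                if pos >= len(actions) or actions[pos] != step:
                    break  # the funnel ends here for this starting point
                counts[f"step{depth}"]["count"] += 1
    return counts


STEPS = ["/checkout.html", "/enter_cc.html", "/confirm.html"]

# Hypothetical timelines, one ordered list of actions per user.
TIMELINES = {
    "bob": ["/checkout.html", "/enter_cc.html", "/confirm.html"],
    "susy": ["/checkout.html", "/enter_cc.html", "/index.html"],
    "amy": ["/index.html", "/checkout.html"],
}

if __name__ == "__main__":
    print(json.dumps(funnel_counts(TIMELINES, STEPS)))
```

Each deeper step can only be counted when the shallower one matched, which is why funnel counts never increase from one step to the next.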

Slide 80

Slide 80 text

Answering the “How”: How do users checkout? [Bar chart: funnel counts for /checkout.html (step0), /enter_cc.html (step1), /confirm.html (step2)]

Slide 81

Slide 81 text

Answering the “How”: How do users checkout? [Bar chart: funnel counts for /checkout.html (step0), /enter_cc.html (step1), /confirm.html (step2)]

Slide 82

Slide 82 text

Answering the “How”: How do users checkout? [Bar chart: funnel counts for /checkout.html (step0), /enter_cc.html (step1), /confirm.html (step2)]

Slide 83

Slide 83 text

Answering the “Why”: Why do users cancel accounts?

Slide 84

Slide 84 text

Answering the “Why”: Why do users cancel accounts? WHEN action == '/cancel_account' THEN WHEN WITHIN -1 STEP THEN SELECT count() GROUP BY action; END END

Slide 85

Slide 85 text

Answering the “Why”: Why do users cancel accounts? WHEN action == '/cancel_account' THEN WHEN WITHIN -1 STEP THEN SELECT count() GROUP BY action; END END

Slide 86

Slide 86 text

What can we do with “within”?

Slide 87

Slide 87 text

Within +/- n Steps

Slide 88

Slide 88 text

Within 1 Step

Slide 89

Slide 89 text

Within 2 Steps

Slide 90

Slide 90 text

Within 1..3 Steps

Slide 91

Slide 91 text

Within -1 Steps

Slide 92

Slide 92 text

Within -2 Steps
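The step offsets can be pictured as a window over a single timeline. The slides do not spell out whether "Within 2 Steps" means exactly two steps away or up to two, so this sketch takes an explicit inclusive range and leaves that choice to the caller. The function name and sample timeline are assumptions for illustration.

```python
def actions_within(actions, index, first, last):
    """Return the actions between `first` and `last` steps away from `index`.

    Offsets are inclusive. Positive offsets look forward, negative offsets
    look backward, and offsets that fall outside the timeline are skipped.
    """
    low, high = sorted((first, last))
    return [
        actions[index + offset]
        for offset in range(low, high + 1)
        if offset != 0 and 0 <= index + offset < len(actions)
    ]


# Hypothetical timeline for one user, in time order.
TIMELINE = [
    "/index.html",
    "/product.html",
    "/checkout.html",
    "/enter_cc.html",
    "/confirm.html",
]

if __name__ == "__main__":
    here = TIMELINE.index("/checkout.html")
    print(actions_within(TIMELINE, here, 1, 1))    # one step forward
    print(actions_within(TIMELINE, here, 1, 3))    # the 1..3 range
    print(actions_within(TIMELINE, here, -1, -1))  # one step back
```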

Slide 93

Slide 93 text

Within +/- Sessions

Slide 94

Slide 94 text

Within +/- Sessions

Slide 95

Slide 95 text

Within 0 Sessions (current session only)

Slide 96

Slide 96 text

Within 1 Session

Slide 97

Slide 97 text

Within -1 Session

Slide 98

Slide 98 text

Within also works on dates: Days, Weeks, Months, etc.

Slide 99

Slide 99 text

Sky: An open source, behavioral analytics database.

Slide 100

Slide 100 text

Why Build a Database?

Slide 101

Slide 101 text

Data Model

Slide 102

Slide 102 text

Data Model, Performance

Slide 103

Slide 103 text

Data Model, Performance, Scaling

Slide 104

Slide 104 text

The Data Model

Slide 105

Slide 105 text

The Data Model: Relational databases look like this:

Slide 106

Slide 106 text

The Data Model: Relational databases look like this:

users
ID | name | gender | state
1 | Bob | M | CO
2 | Susy | F | CA

Slide 107

Slide 107 text

The Data Model: Relational databases look like this:

users
ID | name | gender | state
1 | Bob | M | CO
2 | Susy | F | CA

page_views
User ID | url | date
1 | /home | ...
1 | /signup | ...
2 | /home | ...

Slide 108

Slide 108 text

The Data Model: Logs look like this:

Slide 109

Slide 109 text

The Data Model: Logs look like this:

User ID | url | date
1 | /home | ...
2 | /home | ...
2 | /signup | ...
1 | /product/123 | ...
3 | /login | ...
1 | /checkout | ...

Slide 110

Slide 110 text

The Data Model: Logs look like this:

User ID | url | date
1 | /home | ...
2 | /home | ...
2 | /signup | ...
1 | /product/123 | ...
3 | /login | ...
1 | /checkout | ...

Slide 111

Slide 111 text

The Data Model: We think of behavior in terms of timelines.

Slide 112

Slide 112 text

The Data Model: We think of behavior in terms of timelines. Bob:

Slide 113

Slide 113 text

The Data Model: We think of behavior in terms of timelines. Bob: /home

Slide 114

Slide 114 text

The Data Model: We think of behavior in terms of timelines. Bob: /home → /product/123

Slide 115

Slide 115 text

The Data Model: We think of behavior in terms of timelines. Bob: /home → /product/123 → /checkout

Slide 116

Slide 116 text

The Data Model: We think of behavior in terms of timelines. Bob: /home → /product/123 → /checkout. Susy:

Slide 117

Slide 117 text

The Data Model: We think of behavior in terms of timelines. Bob: /home → /product/123 → /checkout. Susy: /signup

Slide 118

Slide 118 text

The Data Model: We think of behavior in terms of timelines. Bob: /home → /product/123 → /checkout. Susy: /signup → /welcome

Slide 119

Slide 119 text

The Data Model: So that’s how Sky stores it. Bob: /home → /product/123 → /checkout. Susy: /signup → /welcome.
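The shift from interleaved log rows to per-user timelines can be sketched in a few lines. The rows below are the ones from the log slide with the elided dates left out; the function name and the in-memory dict are assumptions and say nothing about Sky's actual on-disk layout.

```python
from collections import defaultdict

# (user_id, url) rows in the order they appear on the log slide.
LOG_ROWS = [
    (1, "/home"),
    (2, "/home"),
    (2, "/signup"),
    (1, "/product/123"),
    (3, "/login"),
    (1, "/checkout"),
]


def build_timelines(rows):
    """Group interleaved log rows into one ordered list of URLs per user.

    Rows are assumed to arrive in time order, so appending keeps each
    timeline sorted.
    """
    timelines = defaultdict(list)
    for user_id, url in rows:
        timelines[user_id].append(url)
    return dict(timelines)


if __name__ == "__main__":
    for user_id, urls in build_timelines(LOG_ROWS).items():
        print(user_id, " -> ".join(urls))
```

User 1 (Bob in the users table) comes out as /home, /product/123, /checkout, the same timeline the slides draw for him.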

Slide 120

Slide 120 text

Performance

Slide 121

Slide 121 text

Performance: Organize data how you’ll analyze it.

Slide 122

Slide 122 text

Performance: Organize data how you’ll analyze it. Minimize data movement.

Slide 123

Slide 123 text

Performance: Organize data how you’ll analyze it. Minimize data movement. Tightly pack data (so more stays in memory).

Slide 124

Slide 124 text

Performance: Expect to query about 10,000,000 events / core / second

Slide 125

Slide 125 text

Performance: Expect to query about 10,000,000 events / core / second (with linear scaling)
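Taken at face value, that figure gives a quick back-of-envelope estimate of query time. The helper below is only arithmetic on the quoted number; the function name and the example workload of one billion events on eight cores are assumptions.

```python
EVENTS_PER_CORE_PER_SECOND = 10_000_000  # figure quoted in the talk


def estimated_query_seconds(total_events, cores):
    """Estimate scan time, assuming throughput scales linearly with cores."""
    return total_events / (EVENTS_PER_CORE_PER_SECOND * cores)


if __name__ == "__main__":
    # Hypothetical workload: one billion events on an eight-core machine.
    print(estimated_query_seconds(1_000_000_000, 8))  # 12.5
```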

Slide 126

Slide 126 text

Scaling

Slide 127

Slide 127 text

Scaling: Timelines are analyzed in isolation.

Slide 128

Slide 128 text

Scaling: Timelines are analyzed in isolation. Automatically shards data across cores.

Slide 129

Slide 129 text

Scaling: Timelines are analyzed in isolation. Automatically shards data across cores. Automatically shards across nodes (v0.3.1).
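Because each timeline is analyzed in isolation, a query can run on every shard independently and the partial results can be merged afterwards. The sketch below shows the idea with a CRC32 hash of the user ID choosing the shard. The hashing scheme, the function names, and the sequential loop standing in for parallel workers are all assumptions, not a description of how Sky shards.

```python
import zlib


def shard_for(user_id, shard_count):
    """Pick a shard with a stable hash, so a user's whole timeline stays together."""
    return zlib.crc32(str(user_id).encode("utf-8")) % shard_count


def shard_timelines(timelines, shard_count):
    """Split a {user_id: actions} mapping into `shard_count` smaller mappings."""
    shards = [{} for _ in range(shard_count)]
    for user_id, actions in timelines.items():
        shards[shard_for(user_id, shard_count)][user_id] = actions
    return shards


def count_action(shard, action):
    """Per-shard work: count occurrences of `action` across the shard's timelines."""
    return sum(actions.count(action) for actions in shard.values())


def sharded_count(timelines, action, shard_count):
    """Run the count on each shard, then merge the partial results by summing."""
    shards = shard_timelines(timelines, shard_count)
    return sum(count_action(shard, action) for shard in shards)


# Hypothetical timelines keyed by user ID.
TIMELINES = {
    1: ["/home", "/product/123", "/checkout"],
    2: ["/home", "/signup"],
    3: ["/login"],
}

if __name__ == "__main__":
    print(sharded_count(TIMELINES, "/home", 4))
```

The merged total is the same for any shard count, which is what makes the per-shard work safe to hand to separate cores or nodes.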

Slide 130

Slide 130 text

Sky Internals: A quick peek under the hood

Slide 131

Slide 131 text

Sky Internals

Slide 132

Slide 132 text

Sky Internals: Recently ported to Go (originally C).

Slide 133

Slide 133 text

Sky Internals: Recently ported to Go (originally C). Query Engine still written in C.

Slide 134

Slide 134 text

Sky Internals: Recently ported to Go (originally C). Query Engine still written in C. Storage handled with LevelDB.

Slide 135

Slide 135 text

Sky Internals: Recently ported to Go (originally C). Query Engine still written in C. Storage handled with LevelDB. RESTful JSON over HTTP protocol.

Slide 136

Slide 136 text

Sky Internals: Loose Schema, No Limit On “Columns”.

Slide 137

Slide 137 text

Sky Internals: Loose Schema, No Limit On “Columns”. Types: Strings, Integers, Floats & Booleans.

Slide 138

Slide 138 text

Sky Internals: Loose Schema, No Limit On “Columns”. Types: Strings, Integers, Floats & Booleans. Possible Future Types: Map, Array, Dates, Lat/Long.

Slide 139

Slide 139 text

Sky Internals: Loose Schema, No Limit On “Columns”. Types: Strings, Integers, Floats & Booleans. Possible Future Types: Map, Array, Dates, Lat/Long. Flexible Query System Built On LuaJIT.

Slide 140

Slide 140 text

Other Rad Stuff

Slide 141

Slide 141 text

Other Rad Stuff (Coming Soon)

Slide 142

Slide 142 text

SQL-like Query Language (Currently uses a JSON query interface)

Slide 143

Slide 143 text

Multi-node Distribution (coming in next version, v0.3.1)

Slide 144

Slide 144 text

Predictive Behavioral Analytics

Slide 145

Slide 145 text

Landmark

Slide 146

Slide 146 text

Landmark Hosted Behavioral Analytics

Slide 147

Slide 147 text

Landmark Hosted Behavioral Analytics Sign up at http://landmark.io

Slide 148

Slide 148 text

Demo

Slide 149

Slide 149 text

Questions? http://skydb.io. Google Group: skydb. Twitter: @benbjohnson

Slide 150

Slide 150 text

Image Attribution:
Database designed by Sergey Shmidt from The Noun Project
Line Graph designed by Cris Dobbins from The Noun Project
Computer designed by Olivier Guin from The Noun Project
Database designed by Cees de Vries from The Noun Project
Document designed by Piotrek Chuchla from The Noun Project
Worker designed by Bart Laugs from The Noun Project
People designed by Studio Het Mes from The Noun Project
Clock designed by Infinity Kim from The Noun Project
Question designed by Greg Pabst from The Noun Project
Hammer designed by John Caserta from The Noun Project
Rocket designed by James Fenton from The Noun Project
Expand designed by Dmitry Baranovskiy from The Noun Project
Rock n Roll designed by Cengiz SARI from The Noun Project
Robot designed by Simon Child from The Noun Project
Profanity designed by Juan Pablo Bravo from The Noun Project