Former Oracle DBA
Data Visualization
Behavioral Analytics
Slide 4
Slide 4 text
What is
Behavioral Analytics?
Slide 5
Slide 5 text
What is
Behavioral Analytics?
Let’s start with an example.
Slide 10
Slide 10 text
Our Example:
You are a SaaS company
with a web app.
SQL database?
check.
Tons of Logs?
hell yeah!
Slide 11
Slide 11 text
Let’s use SQL to
understand our users!
Slide 12
Slide 12 text
What are users doing?
Slide 13
Slide 13 text
What are users doing?
SELECT url, count(*)
FROM page_views
GROUP BY url;
Slide 14
Slide 14 text
What are users doing?
[Bar chart: page views per URL (/index.html, /about.html, /pricing.html, /signup.html); y-axis 0 to 50]
Slide 15
Slide 15 text
Who is doing it?
Slide 16
Slide 16 text
Who is doing it?
SELECT url, gender, count(*)
FROM page_views
INNER JOIN users ON users.id = page_views.user_id
GROUP BY url, gender;
Slide 17
Slide 17 text
Who is doing it?
[Bar chart: page views per URL (/index.html, /about.html, /pricing.html, /signup.html), split by gender (Men, Women); y-axis 0 to 60]
Slide 18
Slide 18 text
Where are they doing it?
Slide 19
Slide 19 text
Where are they doing it?
SELECT url, city, count(*)
FROM page_views
INNER JOIN users ON users.id = page_views.user_id
GROUP BY url, city;
Slide 20
Slide 20 text
Where are they doing it?
[Bar chart: page views per URL (/index.html, /about.html, /pricing.html, /signup.html), split by city (San Francisco, Denver, London); y-axis 0 to 60]
Slide 21
Slide 21 text
When are they doing it?
Slide 22
Slide 22 text
When are they doing it?
SELECT url,
DATE_FORMAT(date, '%Y-%m') AS month,
count(*)
FROM page_views
GROUP BY url, month;
Slide 23
Slide 23 text
When are they doing it?
[Bar chart: page views per URL (/index.html, /about.html, /pricing.html, /signup.html), split by month (Jan 2013, Feb 2013, Mar 2013, Apr 2013); y-axis 0 to 40]
Slide 28
Slide 28 text
SQL works great
for these questions:
WHEN WHERE
WHO
WHAT
Slide 29
Slide 29 text
SQL analyzes state.
Slide 30
Slide 30 text
How do we understand actions?
Slide 31
Slide 31 text
How are they doing it?
Slide 32
Slide 32 text
How are they doing it?
“How” means knowing what
happened during an action or task.
Slide 33
Slide 33 text
How are they doing it?
“How” means knowing what
happened during an action or task.
Example:
How do users checkout?
Slide 34
Slide 34 text
How are they doing it?
SELECT url, next_url, count(*)
FROM (
  SELECT t1.url,
         (SELECT t2.url
          FROM page_views t2
          WHERE t2.id > t1.id AND t2.user_id = t1.user_id
          ORDER BY t2.id ASC
          LIMIT 1) AS next_url
  FROM page_views t1
  WHERE t1.url = '/checkout.html'
) AS transitions
GROUP BY url, next_url;
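To see what this kind of "next page" query returns, here is a minimal sketch that runs a correlated-subquery version of it against an in-memory SQLite database. The sample rows, the `views` column alias, and the `next_pages` helper are made up for illustration; only the `page_views` columns `id`, `user_id`, and `url` come from the slide.

```python
import sqlite3

# Hypothetical sample data: (id, user_id, url). Ids increase with time.
ROWS = [
    (1, 1, "/product.html"),
    (2, 1, "/checkout.html"),
    (3, 1, "/enter_cc_info.html"),
    (4, 2, "/checkout.html"),
    (5, 2, "/index.html"),
    (6, 3, "/checkout.html"),  # user 3 never views another page
]

# For every checkout view, look up the same user's next page view,
# then count how often each next page occurs.
QUERY = """
SELECT next_url, count(*) AS views
FROM (
  SELECT (SELECT t2.url
          FROM page_views t2
          WHERE t2.id > t1.id AND t2.user_id = t1.user_id
          ORDER BY t2.id ASC
          LIMIT 1) AS next_url
  FROM page_views t1
  WHERE t1.url = '/checkout.html'
) AS transitions
GROUP BY next_url
"""


def next_pages(rows):
    """Return {next_url: count} for page views that follow /checkout.html."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page_views (id INTEGER PRIMARY KEY, user_id INTEGER, url TEXT)"
    )
    conn.executemany("INSERT INTO page_views VALUES (?, ?, ?)", rows)
    result = {next_url: views for next_url, views in conn.execute(QUERY)}
    conn.close()
    return result
```

Calling `next_pages(ROWS)` gives one count per following page. A `None` key corresponds to the NULL bucket: checkout views with no later page view by the same user.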
Slide 35
Slide 35 text
How are they doing it?
[Bar chart: count by next URL after /checkout.html (/enter_cc_info.html, /product.html, /index.html, NULL); y-axis 0 to 150]
Slide 36
Slide 36 text
How are they doing it?
[Bar chart: count by next URL after /checkout.html (/enter_cc_info.html, /product.html, /index.html, NULL); y-axis 0 to 150]
Abandoned Shopping Carts!
Slide 37
Slide 37 text
Why are they doing it?
Slide 38
Slide 38 text
Why are they doing it?
“Why” means knowing what
came before an action.
Slide 39
Slide 39 text
Why are they doing it?
“Why” means knowing what
came before an action.
Example:
Why do users cancel accounts?
Slide 40
Slide 40 text
Why are they doing it?
SELECT url, prev_url, count(*)
FROM (
  SELECT t1.url,
         (SELECT t2.url
          FROM page_views t2
          WHERE t2.id < t1.id AND t2.user_id = t1.user_id
          ORDER BY t2.id DESC
          LIMIT 1) AS prev_url
  FROM page_views t1
  WHERE t1.url = '/cancel_account.html'
) AS transitions
GROUP BY url, prev_url;
Slide 41
Slide 41 text
Why are they doing it?
[Bar chart: count by previous URL before /cancel_account.html (/help.html, /index.html, /404.html, /contact_us.html); y-axis 0 to 80]
Slide 42
Slide 42 text
Why are they doing it?
[Bar chart: count by previous URL before /cancel_account.html (/help.html, /index.html, /404.html, /contact_us.html); y-axis 0 to 80]
Your “help” pages don’t help.
Answering the “How”
How do users checkout (two steps in)?
Slide 73
Slide 73 text
Answering the “How”
WHEN action == '/checkout.html' THEN
  WHEN action == '/enter_cc.html' WITHIN 1 STEP THEN
    WHEN WITHIN 1 STEP THEN
      SELECT count() GROUP BY action;
    END
  END
END
How do users checkout (two steps in)?
Funnel Query:
WHEN action == '/checkout.html' THEN
  SELECT count() INTO "step0";
  WHEN action == '/enter_cc.html' WITHIN 1 STEP THEN
    SELECT count() INTO "step1";
    WHEN action == '/confirm.html' WITHIN 1 STEP THEN
      SELECT count() INTO "step2";
    END
  END
END
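The funnel logic can be sketched in plain Python to show what the three counters would hold. This is only an illustration of the semantics as read from the slide, not Sky's implementation: the `funnel_counts` function, the `TIMELINES` sample data, and the reading of "WITHIN 1 STEP" as "the very next action" are all assumptions.

```python
FUNNEL = ["/checkout.html", "/enter_cc.html", "/confirm.html"]

# Hypothetical per-user timelines (ordered lists of actions).
TIMELINES = {
    "bob": ["/index.html", "/checkout.html", "/enter_cc.html", "/confirm.html"],
    "susy": ["/checkout.html", "/enter_cc.html", "/index.html"],
    "ann": ["/checkout.html"],
}


def funnel_counts(timelines, funnel=FUNNEL):
    """Count how often each funnel step is reached.

    Step 0 matches anywhere in a timeline. Each later step must be the
    action exactly one position after the previous step (assumed meaning
    of WITHIN 1 STEP).
    """
    counts = {f"step{i}": 0 for i in range(len(funnel))}
    for actions in timelines.values():
        for start, action in enumerate(actions):
            if action != funnel[0]:
                continue
            # Walk forward one action at a time until the funnel breaks.
            for i, expected in enumerate(funnel):
                pos = start + i
                if pos >= len(actions) or actions[pos] != expected:
                    break
                counts[f"step{i}"] += 1
    return counts
```

With the sample data, `funnel_counts(TIMELINES)` returns `{"step0": 3, "step1": 2, "step2": 1}`: three checkouts started, two reached the credit card page, one confirmed.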
Answering the “Why”
WHEN action == '/cancel_account' THEN
  WHEN WITHIN -1 STEP THEN
    SELECT count() GROUP BY action;
  END
END
Why do users cancel accounts?
Slide 85
Slide 85 text
Answering the “Why”
WHEN action == '/cancel_account' THEN
  WHEN WITHIN -1 STEP THEN
    SELECT count() GROUP BY action;
  END
END
Why do users cancel accounts?
Slide 86
Slide 86 text
What can we do with “within”?
Slide 87
Slide 87 text
Within +/- n Steps
Slide 88
Slide 88 text
Within 1 Step
Slide 89
Slide 89 text
Within 2 Steps
Slide 90
Slide 90 text
Within 1..3 Steps
Slide 91
Slide 91 text
Within -1 Steps
Slide 92
Slide 92 text
Within -2 Steps
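One way to picture step windows is as offsets into a single user's timeline. The sketch below is an assumption about the semantics, not Sky's implementation: a single number is read as "exactly that many steps away", a pair as an inclusive range, and negative numbers look backward. The `within_steps` name and the sample timeline are hypothetical.

```python
def within_steps(actions, index, first, last=None):
    """Return the actions `first`..`last` steps away from actions[index].

    Positive offsets look forward in the timeline, negative offsets look
    backward. With only `first` given, exactly that offset is used.
    Offsets that fall outside the timeline are skipped.
    """
    if last is None:
        last = first
    lo, hi = sorted((first, last))
    found = []
    for offset in range(lo, hi + 1):
        pos = index + offset
        if offset != 0 and 0 <= pos < len(actions):
            found.append(actions[pos])
    return found


# Hypothetical timeline for one user; index 2 is the checkout action.
TIMELINE = ["/home", "/product", "/checkout", "/enter_cc", "/confirm"]
```

For example, `within_steps(TIMELINE, 2, 1)` yields `["/enter_cc"]`, `within_steps(TIMELINE, 2, 1, 3)` yields `["/enter_cc", "/confirm"]`, and `within_steps(TIMELINE, 2, -2)` yields `["/home"]`.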
Slide 93
Slide 93 text
Within +/- Sessions
Slide 94
Slide 94 text
Within +/- Sessions
Slide 95
Slide 95 text
Within 0 Sessions
(current session only)
Slide 96
Slide 96 text
Within 1 Session
Slide 97
Slide 97 text
Within -1 Session
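Session windows need some definition of a session, which the slides do not give. The sketch below assumes a session ends after 30 minutes of inactivity; that threshold, the function names, and the sample events are all hypothetical and only meant to show how "within 0 / 1 / -1 sessions" could select events.

```python
def split_sessions(events, idle_seconds=1800):
    """Group (timestamp, action) pairs into sessions.

    A new session starts whenever the gap since the previous event
    exceeds `idle_seconds` (an assumed definition of a session).
    """
    sessions = []
    for ts, action in sorted(events):
        if sessions and ts - sessions[-1][-1][0] <= idle_seconds:
            sessions[-1].append((ts, action))
        else:
            sessions.append([(ts, action)])
    return sessions


def session_at_offset(sessions, current, offset):
    """Return the session `offset` sessions away from `current`.

    Offset 0 is the current session, 1 the next, -1 the previous.
    An empty list is returned when no such session exists.
    """
    pos = current + offset
    return sessions[pos] if 0 <= pos < len(sessions) else []


# Hypothetical events for one user: (seconds since first event, action).
EVENTS = [
    (0, "/home"),
    (60, "/pricing"),
    (7200, "/login"),
    (7260, "/checkout"),
    (20000, "/cancel_account"),
]
```

Here `split_sessions(EVENTS)` produces three sessions, and `session_at_offset(sessions, 1, -1)` returns the first one, the session before the login.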
Slide 98
Slide 98 text
Within also works on dates
Days, Weeks, Months, etc.
Slide 99
Slide 99 text
Sky
An open source, behavioral analytics database.
Slide 100
Slide 100 text
Why Build a Database?
Slide 103
Slide 103 text
Performance
Scaling
Data Model
Slide 104
Slide 104 text
The Data Model
Slide 107
Slide 107 text
The Data Model
Relational databases look like this:
users
ID   name   gender   state
1    Bob    M        CO
2    Susy   F        CA

page_views
User ID   url       date
1         /home     ...
1         /signup   ...
2         /home     ...
Slide 108
Slide 108 text
The Data Model
Logs look like this:
Slide 109
Slide 109 text
The Data Model
Logs look like this:
User ID   url            date
1         /home          ...
2         /home          ...
2         /signup        ...
1         /product/123   ...
3         /login         ...
1         /checkout      ...
Slide 118
Slide 118 text
The Data Model
We think of behavior in terms of timelines
Bob: /home -> /product/123 -> /checkout
Susy: /signup -> /welcome
Slide 119
Slide 119 text
The Data Model
So that’s how Sky stores it.
Bob: /home -> /product/123 -> /checkout
Susy: /signup -> /welcome
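The step from an interleaved log to per-user timelines can be sketched as a simple group-and-sort. This is an illustration of the data model only, not how Sky stores data internally. The rows mirror the log slide, but the integer timestamps are hypothetical stand-ins for the dates elided there, and `build_timelines` is a made-up helper name.

```python
from collections import defaultdict

# (user_id, url, timestamp). Timestamps are hypothetical sequence numbers.
LOG = [
    (1, "/home", 1),
    (2, "/home", 2),
    (2, "/signup", 3),
    (1, "/product/123", 4),
    (3, "/login", 5),
    (1, "/checkout", 6),
]


def build_timelines(log):
    """Turn an interleaved event log into one ordered timeline per user."""
    timelines = defaultdict(list)
    # Sort by timestamp so each user's actions end up in time order.
    for user_id, url, _ts in sorted(log, key=lambda event: event[2]):
        timelines[user_id].append(url)
    return dict(timelines)
```

`build_timelines(LOG)` returns `{1: ["/home", "/product/123", "/checkout"], 2: ["/home", "/signup"], 3: ["/login"]}`, so a question such as "what came after checkout?" becomes a lookup within one list rather than a self-join.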
Slide 120
Slide 120 text
Performance
Slide 123
Slide 123 text
Organize data how you’ll analyze it
Performance
Minimize data movement
Tightly pack data (so more stays in memory)
Slide 124
Slide 124 text
Performance
Expect to query about
10,000,000 events / core / second
Slide 125
Slide 125 text
Performance
Expect to query about
10,000,000 events / core / second
(with linear scaling)
Slide 126
Slide 126 text
Scaling
Slide 129
Slide 129 text
Timelines are analyzed in isolation
Automatically shards data across cores
Automatically shards across nodes (v0.3.1)
Scaling
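Because each timeline is analyzed on its own, timelines can be partitioned by user and the partial results merged afterwards. The sketch below shows that partition-and-merge pattern under assumed details: hashing the user id with CRC32, four shards, and a thread pool are illustrative choices, not Sky's actual sharding scheme, and Python threads here demonstrate the structure rather than real multi-core speedup.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def shard_for(object_id, shard_count):
    """Pick a shard deterministically from the object (user) id."""
    return zlib.crc32(str(object_id).encode("utf-8")) % shard_count


def count_actions(timelines):
    """Count actions across a set of timelines, one timeline at a time."""
    counts = {}
    for actions in timelines.values():
        for action in actions:
            counts[action] = counts.get(action, 0) + 1
    return counts


def sharded_count(timelines, shard_count=4):
    """Split timelines into shards, count each shard, merge the results."""
    shards = [{} for _ in range(shard_count)]
    for object_id, actions in timelines.items():
        shards[shard_for(object_id, shard_count)][object_id] = actions
    # Each shard is processed independently; no timeline spans two shards.
    with ThreadPoolExecutor(max_workers=shard_count) as pool:
        partials = list(pool.map(count_actions, shards))
    total = {}
    for partial in partials:
        for action, n in partial.items():
            total[action] = total.get(action, 0) + n
    return total
```

The merged result equals what a single pass over all timelines would produce, which is what lets the same scheme extend from cores to nodes.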
Slide 130
Slide 130 text
Sky Internals
A quick peek under the hood
Slide 135
Slide 135 text
Sky Internals
Recently ported to Go (originally C)
Query Engine still written in C
Storage handled with LevelDB
RESTful JSON over HTTP protocol
Slide 139
Slide 139 text
Sky Internals
Loose Schema, No Limit On “Columns”
Types: Strings, Integers, Floats & Booleans
Possible Future Types: Map, Array, Dates, Lat/Long
Flexible Query System Built On LuaJIT
Slide 140
Slide 140 text
Other Rad Stuff
Slide 141
Slide 141 text
Other Rad Stuff
(Coming Soon)
Slide 142
Slide 142 text
SQL-like Query Language
(Currently uses a JSON query interface)
Slide 143
Slide 143 text
Multi-node Distribution
(coming in next version, v0.3.1)
Slide 144
Slide 144 text
Predictive Behavioral
Analytics
Slide 147
Slide 147 text
Landmark
Hosted Behavioral Analytics
Sign up at http://landmark.io
Slide 148
Slide 148 text
Demo
Slide 149
Slide 149 text
Questions?
http://skydb.io
Google Group: skydb
Twitter: @benbjohnson
Slide 150
Slide 150 text
Image Attribution
Database designed by Sergey Shmidt from The Noun Project
Line Graph designed by Cris Dobbins from The Noun Project
Computer designed by Olivier Guin from The Noun Project
Database designed by Cees de Vries from The Noun Project
Document designed by Piotrek Chuchla from The Noun Project
Worker designed by Bart Laugs from The Noun Project
People designed by Studio Het Mes from The Noun Project
Clock designed by Infinity Kim from The Noun Project
Question designed by Greg Pabst from The Noun Project
Hammer designed by John Caserta from The Noun Project
Rocket designed by James Fenton from The Noun Project
Expand designed by Dmitry Baranovskiy from The Noun Project
Rock n Roll designed by Cengiz SARI from The Noun Project
Robot designed by Simon Child from The Noun Project
Profanity designed by Juan Pablo Bravo from The Noun Project