Behavioral Analytics - Understanding the "why" and "how" of your users.

The slides for my talk at the Boulder/Denver Big Data Meetup on April 24th, 2013.

Video of the talk can be found on YouTube: http://www.youtube.com/watch?v=STtjSm-5A-8

benbjohnson

April 19, 2013

Transcript

  1. Behavioral Analytics
    Understanding the How & Why of Users
    Ben Johnson, Skyland Labs

  2. My Background

  3. Former Oracle DBA
    Data Visualization
    Behavioral Analytics

  4. What is
    Behavioral Analytics?

  5. What is
    Behavioral Analytics?
    Let’s start with an example.

  6. Our Example:
    You are a SaaS company
    with a web app.

  7. Our Example:
    You are a SaaS company
    with a web app.
    SQL database?

  8. Our Example:
    You are a SaaS company
    with a web app.
    SQL database?
    check.

  9. Our Example:
    You are a SaaS company
    with a web app.
    SQL database?
    check.
    Tons of Logs?

  10. Our Example:
    You are a SaaS company
    with a web app.
    SQL database?
    check.
    Tons of Logs?
    hell yeah!

  11. Let’s use SQL to
    understand our users!

  12. What are users doing?

  13. What are users doing?
    SELECT url, count(*)
    FROM page_views
    GROUP BY url;

  14. What are users doing?
    [Bar chart: page view counts for /index.html, /about.html, /pricing.html, /signup.html]

  15. Who is doing it?

  16. Who is doing it?
    SELECT url, gender, count(*)
    FROM page_views
    INNER JOIN users ON users.id = page_views.user_id
    GROUP BY url, gender;
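
A minimal sketch of running this kind of query, assuming an in-memory SQLite database. The sample rows and the join condition `users.id = page_views.user_id` are assumptions for illustration; only the table and column names come from the slides.

```python
import sqlite3

# Build a tiny, hypothetical dataset in memory.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, gender TEXT);
    CREATE TABLE page_views (id INTEGER PRIMARY KEY, user_id INTEGER, url TEXT);
    INSERT INTO users (id, name, gender) VALUES (1, 'Bob', 'M'), (2, 'Susy', 'F');
    INSERT INTO page_views (user_id, url) VALUES
        (1, '/index.html'), (1, '/signup.html'),
        (2, '/index.html'), (2, '/index.html');
""")

# Count page views per URL and gender; ORDER BY only makes the output stable.
query = """
    SELECT url, gender, count(*)
    FROM page_views
    INNER JOIN users ON users.id = page_views.user_id
    GROUP BY url, gender
    ORDER BY url, gender
"""
counts = conn.execute(query).fetchall()
for url, gender, total in counts:
    print(url, gender, total)
```

The join needs an explicit condition linking each page view to its user; without one, every page view would be paired with every user.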

  17. Who is doing it?
    [Bar chart: page view counts for /index.html, /about.html, /pricing.html, /signup.html, split into Men and Women]

  18. Where are they doing it?

  19. Where are they doing it?
    SELECT url, city, count(*)
    FROM page_views
    INNER JOIN users ON users.id = page_views.user_id
    GROUP BY url, city;

  20. Where are they doing it?
    [Bar chart: page view counts for /index.html, /about.html, /pricing.html, /signup.html, split by city: San Francisco, Denver, London]

  21. When are they doing it?

  22. When are they doing it?
    SELECT url,
      DATE_FORMAT(date, '%Y-%m'),
      count(*)
    FROM page_views
    GROUP BY url, ...;

  23. When are they doing it?
    [Bar chart: page view counts for /index.html, /about.html, /pricing.html, /signup.html, split by month: Jan 2013, Feb 2013, Mar 2013, April 2013]

  24. SQL works great
    for these questions:

  25. SQL works great
    for these questions:
    WHAT

  26. SQL works great
    for these questions:
    WHAT WHO

  27. SQL works great
    for these questions:
    WHO
    WHEN
    WHAT

  28. SQL works great
    for these questions:
    WHEN WHERE
    WHO
    WHAT

  29. SQL analyzes state.

  30. How do we understand actions?

  31. How are they doing it?

  32. How are they doing it?
    “How” means knowing what
    happened during an action or task.

  33. How are they doing it?
    “How” means knowing what
    happened during an action or task.
    Example:
    How do users checkout?

  34. How are they doing it?
    SELECT t1.url, t2.url AS next_url, count(*)
    FROM page_views t1
    LEFT JOIN LATERAL (
      SELECT url
      FROM page_views
      WHERE id > t1.id AND user_id = t1.user_id
      ORDER BY id ASC
      LIMIT 1) AS t2 ON TRUE
    WHERE t1.url = '/checkout.html'
    GROUP BY t1.url, t2.url;
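
For comparison, the same "what came next" question can be sketched in a few lines of procedural code. This is an illustration only, assuming page views arrive as `(user_id, url)` pairs already sorted by id; the sample rows and the function name `next_url_counts` are hypothetical.

```python
from collections import Counter

# Hypothetical (user_id, url) rows in id order, standing in for page_views.
page_views = [
    (1, "/checkout.html"), (2, "/checkout.html"), (1, "/enter_cc_info.html"),
    (3, "/checkout.html"), (2, "/index.html"), (1, "/confirm.html"),
]

def next_url_counts(rows, target):
    """Count the URL each user viewed immediately after `target`.

    A view of `target` with nothing after it is counted under None,
    mirroring the NULL bucket produced by the LEFT JOIN.
    """
    waiting = set()  # users whose most recent view was `target`
    counts = Counter()
    for user_id, url in rows:
        if user_id in waiting:
            waiting.discard(user_id)
            counts[url] += 1
        if url == target:
            waiting.add(user_id)
    if waiting:
        counts[None] = len(waiting)
    return counts

result = next_url_counts(page_views, "/checkout.html")
print(dict(result))
```

Walking each user's views in order is the natural shape of the problem, which is why the SQL version has to simulate it with a self-join.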

  35. How are they doing it?
    [Bar chart: counts for /enter_cc_info.html, /product.html, /index.html, NULL]

  36. How are they doing it?
    [Bar chart: counts for /enter_cc_info.html, /product.html, /index.html, NULL]
    Abandoned Shopping Carts!

  37. Why are they doing it?

  38. Why are they doing it?
    “Why” means knowing what
    came before an action.

  39. Why are they doing it?
    “Why” means knowing what
    came before an action.
    Example:
    Why do users cancel accounts?

  40. Why are they doing it?
    SELECT t1.url, t2.url AS prev_url, count(*)
    FROM page_views t1
    LEFT JOIN LATERAL (
      SELECT url
      FROM page_views
      WHERE id < t1.id AND user_id = t1.user_id
      ORDER BY id DESC
      LIMIT 1) AS t2 ON TRUE
    WHERE t1.url = '/cancel_account.html'
    GROUP BY t1.url, t2.url;
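
The "what came before" direction can be sketched the same way. This assumes `(user_id, url)` pairs in id order; the sample rows and the name `prev_url_counts` are made up for illustration.

```python
from collections import Counter

# Hypothetical (user_id, url) rows in id order.
page_views = [
    (1, "/help.html"), (2, "/404.html"), (1, "/cancel_account.html"),
    (2, "/help.html"), (2, "/cancel_account.html"), (3, "/cancel_account.html"),
]

def prev_url_counts(rows, target):
    """Count the URL each user viewed immediately before `target`.

    A `target` view with no earlier view by that user is counted under None.
    """
    last_url = {}  # user_id -> most recent URL seen so far
    counts = Counter()
    for user_id, url in rows:
        if url == target:
            counts[last_url.get(user_id)] += 1
        last_url[user_id] = url
    return counts

result = prev_url_counts(page_views, "/cancel_account.html")
print(dict(result))
```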

  41. Why are they doing it?
    [Bar chart: counts for /help.html, /index.html, /404.html, /contact_us.html]

  42. Why are they doing it?
    [Bar chart: counts for /help.html, /index.html, /404.html, /contact_us.html]
    Your “help” pages don’t help.

  43. Why are they doing it?
    [Bar chart: counts for /help.html, /index.html, /404.html, /contact_us.html]
    Users coming to your site to quit

  44. Why are they doing it?
    [Bar chart: counts for /help.html, /index.html, /404.html, /contact_us.html]
    Your web site is broken

  45. Why are they doing it?
    [Bar chart: counts for /help.html, /index.html, /404.html, /contact_us.html]
    Your customer support sucks.

  46. Behavioral analytics is about
    understanding the “how”
    and the “why”

  47. Now let’s use Hadoop to
    understand our users

  48. Building Funnel Analysis in Hadoop

  49. Building Funnel Analysis in Hadoop
    Step 1: Parse logs

  50. Building Funnel Analysis in Hadoop
    Step 1: Parse logs
    Step 2: Group by user

  51. Building Funnel Analysis in Hadoop
    Step 1: Parse logs
    Step 2: Group by user
    Step 3: Sessionize

  52. Building Funnel Analysis in Hadoop
    Step 1: Parse logs
    Step 2: Group by user
    Step 3: Sessionize
    Step 4: Apply Pattern Matching

  53. Building Funnel Analysis in Hadoop
    Step 1: Parse logs
    Step 2: Group by user
    Step 3: Sessionize
    Step 4: Apply Pattern Matching
    Step 5: Aggregate

  54. Building Funnel Analysis in Hadoop
    Step 1: Parse logs
    Step 2: Group by user
    Step 3: Sessionize
    Step 4: Apply Pattern Matching
    Step 5: Aggregate
    Step 6: Pull hair repeatedly.
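
Steps 1 through 5 can be sketched on a single machine to show what each stage does. This is not a Hadoop job: the log format (`timestamp user_id url`, timestamp in seconds), the 30-minute session gap, and every function name below are assumptions made for illustration.

```python
from collections import defaultdict

SESSION_GAP = 30 * 60  # seconds of inactivity that ends a session (assumed)

def parse(line):
    """Step 1: parse one 'timestamp user_id url' log line (hypothetical format)."""
    ts, user_id, url = line.split()
    return int(ts), user_id, url

def group_by_user(events):
    """Step 2: build one time-ordered list of (timestamp, url) per user."""
    timelines = defaultdict(list)
    for ts, user_id, url in sorted(events):
        timelines[user_id].append((ts, url))
    return timelines

def sessionize(timeline, gap=SESSION_GAP):
    """Step 3: split a timeline wherever two events are more than `gap` apart."""
    sessions = []
    for ts, url in timeline:
        if not sessions or ts - sessions[-1][-1][0] > gap:
            sessions.append([])
        sessions[-1].append((ts, url))
    return sessions

def matches(session, pattern):
    """Step 4: true if the URLs in `pattern` occur consecutively in the session."""
    urls = [url for _, url in session]
    n = len(pattern)
    return any(urls[i:i + n] == pattern for i in range(len(urls) - n + 1))

def funnel_count(lines, pattern):
    """Step 5: count the sessions, across all users, that contain the pattern."""
    timelines = group_by_user(parse(line) for line in lines)
    return sum(
        1
        for timeline in timelines.values()
        for session in sessionize(timeline)
        if matches(session, pattern)
    )

logs = [
    "0 bob /checkout.html",
    "60 bob /enter_cc.html",
    "10 susy /checkout.html",
    "4000 susy /enter_cc.html",  # more than 30 minutes later: a new session
    "100 ann /index.html",
]
matched = funnel_count(logs, ["/checkout.html", "/enter_cc.html"])
print(matched)
```

Each stage here is a few lines, but in a distributed batch system each one typically becomes its own job with its own intermediate data, which is where the hair pulling starts.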

  55. Why not make a system &
    language for describing
    behavior?

  56. Behavior is...

  57. Actions

  58. Actions & State

  59. Actions + Time
    & State

  60. You need a way to describe time:

  61. “WHEN”

  62. “WHEN”
    Let’s see some examples

  63. Simple Hit Count
    How many people click on my home page?

  64. Simple Hit Count
    WHEN action == '/index.html' THEN
      SELECT count();
    END
    How many people click on my home page?

  65. Simple Hit Count
    How many people click on my home page?
    [Bar chart: count for /index.html]

  66. Simple Hit Count w/ Demographics
    How many men & women click on my home page?

  67. Simple Hit Count w/ Demographics
    WHEN action == '/index.html' THEN
      SELECT count() GROUP BY gender;
    END
    How many men & women click on my home page?

  68. Simple Hit Count w/ Demographics
    How many men & women click on my home page?
    [Bar chart: counts for Men and Women]

  69. Answering the “How”
    How do users checkout?

  70. Answering the “How”
    WHEN action == '/checkout.html' THEN
      WHEN WITHIN 1 STEP THEN
        SELECT count() GROUP BY action;
      END
    END
    How do users checkout?

  71. Answering the “How”
    How do users checkout?
    [Bar chart: counts for /enter_cc.html, /product.html, /index.html, NULL]

  72. Answering the “How”
    How do users checkout (two steps in)?

  73. Answering the “How”
    WHEN action == '/checkout.html' THEN
      WHEN action == '/enter_cc.html' WITHIN 1 STEP THEN
        WHEN WITHIN 1 STEP THEN
          SELECT count() GROUP BY action;
        END
      END
    END
    How do users checkout (two steps in)?

  74. Answering the “How”
    How do users checkout (two steps in)?
    [Bar chart: counts for /confirm.html, NULL]

  75. Combine these to make a funnel:

  76. Funnel Query:

  77. Funnel Query:
    WHEN action == '/checkout.html' THEN
      SELECT count() INTO "step0";
      WHEN action == '/enter_cc.html' WITHIN 1 STEP THEN
        SELECT count() INTO "step1";
        WHEN action == '/confirm.html' WITHIN 1 STEP THEN
          SELECT count() INTO "step2";
        END
      END
    END

  78. Funnel Output:

  79. Funnel Output:
    {
      "step0": {"count": 100},
      "step1": {"count": 40},
      "step2": {"count": 32}
    }
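
A rough sketch of how a funnel like this could be computed over per-user timelines. It assumes that `WITHIN 1 STEP` means "the very next action in the same timeline", which is an interpretation of the slides and not Sky's documented semantics; the function name `funnel` and the sample timelines are hypothetical.

```python
def funnel(timelines, steps):
    """Count how often each prefix of `steps` occurs as consecutive actions.

    Returns a dict shaped like the funnel output on the slide:
    {"step0": {"count": ...}, "step1": {"count": ...}, ...}
    """
    counts = {f"step{i}": {"count": 0} for i in range(len(steps))}
    for actions in timelines.values():
        for start in range(len(actions)):
            # Walk forward from `start`, stopping at the first mismatch.
            for i, step in enumerate(steps):
                if start + i >= len(actions) or actions[start + i] != step:
                    break
                counts[f"step{i}"]["count"] += 1
    return counts

# Hypothetical timelines: one list of actions per user, in time order.
timelines = {
    "bob": ["/index.html", "/checkout.html", "/enter_cc.html", "/confirm.html"],
    "susy": ["/checkout.html", "/enter_cc.html", "/index.html"],
    "ann": ["/checkout.html", "/product.html"],
}
steps = ["/checkout.html", "/enter_cc.html", "/confirm.html"]
result = funnel(timelines, steps)
print(result)
```

Each step's count can only be as large as the one before it, which is what gives the funnel its shape.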

  80. Answering the “How”
    How do users checkout?
    [Bar chart: counts for /checkout.html (step0), /enter_cc.html (step1), /confirm.html (step2)]

  83. Answering the “Why”
    Why do users cancel accounts?

  84. Answering the “Why”
    WHEN action == '/cancel_account' THEN
      WHEN WITHIN -1 STEP THEN
        SELECT count() GROUP BY action;
      END
    END
    Why do users cancel accounts?

  86. What can we do with “within”?

  87. Within +/- n Steps

  88. Within 1 Step

  89. Within 2 Steps

  90. Within 1..3 Steps

  91. Within -1 Steps

  92. Within -2 Steps

  93. Within +/- Sessions

  95. Within 0 Sessions
    (current session only)

  96. Within 1 Session

  97. Within -1 Session

  98. Within also works on dates
    Days, Weeks, Months, etc.
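
One possible reading of the step-based "within" ranges, sketched as a window over a timeline. The exact semantics in Sky may differ (for example, whether "2 steps" means exactly two or up to two), so the explicit `lo`/`hi` offsets and the name `within_steps` are assumptions made for illustration.

```python
def within_steps(actions, index, lo, hi):
    """Return the actions whose offset from `index` lies in [lo, hi].

    Positive offsets look forward in the timeline, negative offsets look
    backward, and offsets that fall off either end are ignored.
    """
    return [
        actions[i]
        for i in range(index + lo, index + hi + 1)
        if 0 <= i < len(actions)
    ]

# Hypothetical timeline; index 2 is the '/checkout' action.
actions = ["/home", "/product/123", "/checkout", "/enter_cc", "/confirm"]

print(within_steps(actions, 2, 1, 1))    # the next step
print(within_steps(actions, 2, 1, 3))    # steps 1..3 ahead (clipped at the end)
print(within_steps(actions, 2, -1, -1))  # the previous step
```

Session-based and date-based windows could follow the same idea, with the offset measured in sessions or calendar units instead of steps.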

  99. Sky
    An open source, behavioral analytics database.

  100. Why Build a Database?

  101. Data Model

  102. Performance
    Data Model

  103. Performance
    Scaling
    Data Model

  104. The Data Model

  105. The Data Model
    Relational databases look like this:

  106. The Data Model
    Relational databases look like this:
    ID name gender state
    1 Bob M CO
    2 Susy F CA
    users

  107. The Data Model
    Relational databases look like this:
    ID name gender state
    1 Bob M CO
    2 Susy F CA
    users
    User ID url date
    1 /home ...
    1 /signup ...
    2 /home ...
    page_views

  108. The Data Model
    Logs look like this:

  109. The Data Model
    Logs look like this:
    User ID url date
    1 /home ...
    2 /home ...
    2 /signup ...
    1 /product/123 ...
    3 /login ...
    1 /checkout ...

  111. The Data Model
    We think of behavior in terms of timelines

  112. The Data Model
    We think of behavior in terms of timelines
    Bob

  113. The Data Model
    We think of behavior in terms of timelines
    Bob
    /home

  114. The Data Model
    We think of behavior in terms of timelines
    Bob
    /home /product/123

  115. The Data Model
    We think of behavior in terms of timelines
    Bob
    /home /product/123 /checkout

  116. The Data Model
    We think of behavior in terms of timelines
    Bob
    /home /product/123 /checkout
    Susy

  117. The Data Model
    We think of behavior in terms of timelines
    Bob
    /home /product/123 /checkout
    Susy
    /signup

  118. The Data Model
    We think of behavior in terms of timelines
    Bob
    /home /product/123 /checkout
    Susy
    /signup /welcome

  119. The Data Model
    So that’s how Sky stores it.
    Bob
    /home /product/123 /checkout
    Susy
    /signup /welcome
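
Turning interleaved log rows into per-user timelines can be sketched as a single grouping pass. This is an illustration of the data model only, not Sky's storage code; it assumes the rows are already in time order, and the name `build_timelines` is hypothetical. The rows are the ones from the log example, with the elided dates left out.

```python
from collections import defaultdict

# (user_id, url) rows in the order they appear in the log.
log_rows = [
    (1, "/home"), (2, "/home"), (2, "/signup"),
    (1, "/product/123"), (3, "/login"), (1, "/checkout"),
]

def build_timelines(rows):
    """Group log rows by user, keeping each user's actions in log order."""
    timelines = defaultdict(list)
    for user_id, url in rows:
        timelines[user_id].append(url)
    return dict(timelines)

timelines = build_timelines(log_rows)
for user_id, actions in timelines.items():
    print(user_id, " -> ".join(actions))
```

Once the data is stored this way, "what happened next" becomes a lookup of the adjacent element instead of a self-join.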

  120. Performance

  121. Organize data how you’ll analyze it
    Performance

  122. Organize data how you’ll analyze it
    Performance
    Minimize data movement

  123. Organize data how you’ll analyze it
    Performance
    Minimize data movement
    Tightly pack data (so more stays in memory)

  124. Expect to query about
    Performance
    10,000,000 events / core / second

  125. Expect to query about
    Performance
    10,000,000 events / core / second
    (with linear scaling)
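
As a back-of-the-envelope illustration of what that throughput implies, the sketch below estimates scan time under the linear-scaling assumption. The workload size and core count are made-up inputs, and the function name is hypothetical.

```python
EVENTS_PER_CORE_PER_SECOND = 10_000_000  # throughput figure quoted on the slide

def estimated_scan_seconds(event_count, cores):
    """Estimate scan time, assuming throughput scales linearly with cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return event_count / (EVENTS_PER_CORE_PER_SECOND * cores)

# Hypothetical workload: one billion events on an 8-core machine.
seconds = estimated_scan_seconds(1_000_000_000, 8)
print(seconds)  # 12.5
```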

  126. Scaling

  127. Timelines are analyzed in isolation
    Scaling

  128. Timelines are analyzed in isolation
    Automatically shards data across cores
    Scaling

  129. Timelines are analyzed in isolation
    Automatically shards data across cores
    Automatically shards across nodes (v0.3.1)
    Scaling
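
Because each timeline is analyzed in isolation, the work can be split by user and the partial results summed. The sketch below illustrates that idea only; it is not how Sky implements sharding. The CRC32-based shard choice, the function names, and the use of threads as a stand-in for cores are all assumptions.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def shard_for(user_id, shard_count):
    """Pick a shard with a stable hash so a user's whole timeline stays together."""
    return zlib.crc32(str(user_id).encode("utf-8")) % shard_count

def count_hits(shard, url):
    """Count views of `url` inside one shard, without looking at other shards."""
    return sum(actions.count(url) for actions in shard.values())

def sharded_count(timelines, url, shard_count=4):
    """Split timelines into shards, count each shard independently, then sum."""
    shards = [{} for _ in range(shard_count)]
    for user_id, actions in timelines.items():
        shards[shard_for(user_id, shard_count)][user_id] = actions
    # Threads stand in for cores here; no shard needs data from another.
    with ThreadPoolExecutor(max_workers=shard_count) as pool:
        return sum(pool.map(lambda shard: count_hits(shard, url), shards))

# Hypothetical timelines, one list of actions per user.
timelines = {
    "Bob": ["/home", "/product/123", "/checkout"],
    "Susy": ["/signup", "/welcome", "/home"],
}
total = sharded_count(timelines, "/home")
print(total)
```

The result does not depend on the number of shards, since no timeline is ever split across them.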

  130. Sky Internals
    A quick peek under the hood

  131. Sky Internals

  132. Sky Internals
    Recently ported to Go (originally C)

  133. Sky Internals
    Recently ported to Go (originally C)
    Query Engine still written in C

  134. Sky Internals
    Recently ported to Go (originally C)
    Query Engine still written in C
    Storage handled with LevelDB

  135. Sky Internals
    Recently ported to Go (originally C)
    Query Engine still written in C
    Storage handled with LevelDB
    RESTful JSON over HTTP protocol

  136. Sky Internals
    Loose Schema, No Limit On “Columns”

  137. Sky Internals
    Loose Schema, No Limit On “Columns”
    Types: Strings, Integers, Floats & Booleans

  138. Sky Internals
    Loose Schema, No Limit On “Columns”
    Types: Strings, Integers, Floats & Booleans
    Possible Future Types: Map, Array, Dates, Lat/Long

  139. Sky Internals
    Loose Schema, No Limit On “Columns”
    Types: Strings, Integers, Floats & Booleans
    Possible Future Types: Map, Array, Dates, Lat/Long
    Flexible Query System Built On LuaJIT

  140. Other Rad Stuff

  141. Other Rad Stuff
    (Coming Soon)

  142. SQL-like Query Language
    (Currently uses a JSON query interface)

  143. Multi-node Distribution
    (coming in next version, v0.3.1)

  144. Predictive Behavioral
    Analytics

  145. Landmark

  146. Landmark
    Hosted Behavioral Analytics

  147. Landmark
    Hosted Behavioral Analytics
    Sign up at http://landmark.io

  148. Demo

  149. Questions?
    http://skydb.io
    Google Group: skydb
    Twitter: @benbjohnson
    [email protected]

  150. Image Attribution
    Database designed by Sergey Shmidt from The Noun Project
    Line Graph designed by Cris Dobbins from The Noun Project
    Computer designed by Olivier Guin from The Noun Project
    Database designed by Cees de Vries from The Noun Project
    Document designed by Piotrek Chuchla from The Noun Project
    Worker designed by Bart Laugs from The Noun Project
    People designed by Studio Het Mes from The Noun Project
    Clock designed by Infinity Kim from The Noun Project
    Question designed by Greg Pabst from The Noun Project
    Hammer designed by John Caserta from The Noun Project
    Rocket designed by James Fenton from The Noun Project
    Expand designed by Dmitry Baranovskiy from The Noun Project
    Rock n Roll designed by Cengiz SARI from The Noun Project
    Robot designed by Simon Child from The Noun Project
    Profanity designed by Juan Pablo Bravo from The Noun Project
