Behavioral Analytics - Understanding the "why" and "how" of your users.

The slides for my talk at the Boulder/Denver Big Data Meetup on April 24th, 2013.

Video of the talk can be found on YouTube: http://www.youtube.com/watch?v=STtjSm-5A-8

benbjohnson

April 19, 2013

Transcript

  1. Our Example: You are a SaaS company with a web app. SQL database? check. Tons of Logs?
  2. Our Example: You are a SaaS company with a web app. SQL database? check. Tons of Logs? hell yeah!
  3. What are users doing? [Bar chart by URL: /index.html, /about.html, /pricing.html, /signup.html]
  4. Who is doing it? SELECT url, gender, count(*) FROM page_views INNER JOIN users ON users.id = page_views.user_id GROUP BY url, gender;
  5. Who is doing it? [Bar chart by URL (/index.html, /about.html, /pricing.html, /signup.html), split by gender: Men, Women]
  6. Where are they doing it? SELECT url, city, count(*) FROM page_views INNER JOIN users ON users.id = page_views.user_id GROUP BY url, city;
  7. Where are they doing it? [Bar chart by URL (/index.html, /about.html, /pricing.html, /signup.html), split by city: San Francisco, Denver, London]
  8. When are they doing it? [Bar chart by URL (/index.html, /about.html, /pricing.html, /signup.html), split by month: Jan 2013, Feb 2013, Mar 2013, April 2013]
  9. How are they doing it? “How” means knowing what happened during an action or task. Example: How do users checkout?
  10. How are they doing it?
      SELECT t1.url,
        (SELECT t2.url FROM page_views t2 WHERE t2.id > t1.id AND t2.user_id = t1.user_id ORDER BY t2.id ASC LIMIT 1) AS next_url,
        count(*)
      FROM page_views t1 WHERE t1.url = '/checkout.html' GROUP BY t1.url, next_url;
  11. How are they doing it? [Bar chart by next_url: /enter_cc_info.html, /product.html, /index.html, NULL]
  12. How are they doing it? [Bar chart by next_url: /enter_cc_info.html, /product.html, /index.html, NULL] Abandoned Shopping Carts!
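
The "what came next after checkout" question from slide 10 can be tried end to end against an in-memory SQLite database. This is a minimal sketch, not the talk's own code: the sample rows are made up, the `hits` alias and the `next_pages` helper are hypothetical names, and the next page is looked up with a correlated subquery in the SELECT list, since a derived table in a plain JOIN generally cannot refer back to `t1`.

```python
import sqlite3

# Hypothetical sample data: (id, user_id, url). Row ids increase with time.
ROWS = [
    (1, 1, "/index.html"),
    (2, 1, "/checkout.html"),
    (3, 2, "/checkout.html"),
    (4, 1, "/enter_cc_info.html"),
    (5, 3, "/checkout.html"),
    (6, 2, "/enter_cc_info.html"),
    (7, 3, "/product.html"),
    (8, 4, "/checkout.html"),  # user 4 never views another page
]

# For every checkout view, find the same user's next page view (if any),
# then count checkout views per next page.
NEXT_PAGE_SQL = """
SELECT t1.url,
       (SELECT t2.url
          FROM page_views t2
         WHERE t2.id > t1.id AND t2.user_id = t1.user_id
         ORDER BY t2.id ASC
         LIMIT 1) AS next_url,
       count(*) AS hits
  FROM page_views t1
 WHERE t1.url = '/checkout.html'
 GROUP BY t1.url, next_url
"""


def next_pages(rows):
    """Return {next_url: hits} for page views that follow /checkout.html."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE page_views (id INTEGER PRIMARY KEY, user_id INTEGER, url TEXT)"
    )
    conn.executemany("INSERT INTO page_views VALUES (?, ?, ?)", rows)
    result = {next_url: hits for _url, next_url, hits in conn.execute(NEXT_PAGE_SQL)}
    conn.close()
    return result


if __name__ == "__main__":
    print(next_pages(ROWS))
```

A checkout view with nothing after it comes back under `None`, which corresponds to the NULL bar in the chart.
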
  13. Why are they doing it? “Why” means knowing what came before an action. Example: Why do users cancel accounts?
  14. Why are they doing it?
      SELECT t1.url,
        (SELECT t2.url FROM page_views t2 WHERE t2.id < t1.id AND t2.user_id = t1.user_id ORDER BY t2.id DESC LIMIT 1) AS prev_url,
        count(*)
      FROM page_views t1 WHERE t1.url = '/cancel_account.html' GROUP BY t1.url, prev_url;
  15. Why are they doing it? [Bar chart by prev_url: /help.html, /index.html, /404.html, /contact_us.html]
  16. Why are they doing it? [Bar chart by prev_url: /help.html, /index.html, /404.html, /contact_us.html] Your “help” pages don’t help.
  17. Why are they doing it? [Bar chart by prev_url: /help.html, /index.html, /404.html, /contact_us.html] Users coming to your site to quit
  18. Why are they doing it? [Bar chart by prev_url: /help.html, /index.html, /404.html, /contact_us.html] Your web site is broken
  19. Why are they doing it? [Bar chart by prev_url: /help.html, /index.html, /404.html, /contact_us.html] Your customer support sucks.
  20. Building Funnel Analysis in Hadoop. Step 1: Parse logs. Step 2: Group by user. Step 3: Sessionize. Step 4: Apply Pattern Matching.
  21. Building Funnel Analysis in Hadoop. Step 1: Parse logs. Step 2: Group by user. Step 3: Sessionize. Step 4: Apply Pattern Matching. Step 5: Aggregate.
  22. Building Funnel Analysis in Hadoop. Step 1: Parse logs. Step 2: Group by user. Step 3: Sessionize. Step 4: Apply Pattern Matching. Step 5: Aggregate. Step 6: Pull hair repeatedly.
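
To make steps 1 through 5 concrete, here is a rough single-process sketch in plain Python. It is not Hadoop or MapReduce code, and several things are assumptions: the tab-separated log format (`user_id`, epoch seconds, URL), the 30-minute inactivity cutoff used to sessionize, and every function name. The funnel pages are the ones used in the funnel query later in the deck.

```python
from collections import Counter, defaultdict

SESSION_GAP = 30 * 60  # assumed inactivity cutoff, in seconds
FUNNEL = ["/checkout.html", "/enter_cc.html", "/confirm.html"]


def parse(line):
    """Step 1: parse a line of the assumed form 'user_id<TAB>epoch_seconds<TAB>url'."""
    user_id, ts, url = line.rstrip("\n").split("\t")
    return user_id, int(ts), url


def group_by_user(events):
    """Step 2: collect each user's events and sort them by time."""
    by_user = defaultdict(list)
    for user_id, ts, url in events:
        by_user[user_id].append((ts, url))
    for user_events in by_user.values():
        user_events.sort()
    return by_user


def sessionize(user_events, gap=SESSION_GAP):
    """Step 3: split one user's sorted events wherever the pause exceeds `gap`."""
    sessions, current, last_ts = [], [], None
    for ts, url in user_events:
        if last_ts is not None and ts - last_ts > gap:
            sessions.append(current)
            current = []
        current.append(url)
        last_ts = ts
    if current:
        sessions.append(current)
    return sessions


def funnel_depth(session, funnel=FUNNEL):
    """Step 4: longest run of consecutive funnel steps found in the session."""
    best = 0
    for start, url in enumerate(session):
        if url != funnel[0]:
            continue
        depth = 0
        while (depth < len(funnel) and start + depth < len(session)
               and session[start + depth] == funnel[depth]):
            depth += 1
        best = max(best, depth)
    return best


def funnel_counts(lines):
    """Step 5: count how many sessions reached each funnel step."""
    counts = Counter()
    for user_events in group_by_user(map(parse, lines)).values():
        for session in sessionize(user_events):
            for step in range(funnel_depth(session)):
                counts[FUNNEL[step]] += 1
    return counts


# Hypothetical log: user 2 pauses too long before /confirm.html, user 3 stops early.
LOG = [
    "1\t100\t/index.html",
    "2\t110\t/checkout.html",
    "1\t120\t/checkout.html",
    "1\t130\t/enter_cc.html",
    "2\t140\t/enter_cc.html",
    "1\t150\t/confirm.html",
    "2\t9000\t/confirm.html",
    "3\t200\t/checkout.html",
]

if __name__ == "__main__":
    print(funnel_counts(LOG))
```

Even in this toy form, every new question means touching the matching and aggregation code, which is the pain that step 6 jokes about.
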
  23. Simple Hit Count. How many people click on my home page? [Bar chart: /index.html]
  24. Simple Hit Count w/ Demographics. How many men & women click on my home page? WHEN action == '/index.html' THEN SELECT count() GROUP BY gender; END
  25. Simple Hit Count w/ Demographics. How many men & women click on my home page? [Bar chart: Men, Women]
  26. Answering the “How”. How do users checkout? WHEN action == '/checkout.html' THEN WHEN WITHIN 1 STEP THEN SELECT count() GROUP BY action; END END
  27. Answering the “How”. How do users checkout (two steps in)? WHEN action == '/checkout.html' THEN WHEN action == '/enter_cc.html' WITHIN 1 STEP THEN WHEN WITHIN 1 STEP THEN SELECT count() GROUP BY action; END END END
  28. Answering the “How”. How do users checkout (two steps in)? [Bar chart: /confirm.html, NULL]
  29. Funnel Query: WHEN action == '/checkout.html' THEN SELECT count() INTO "step0"; WHEN action == '/enter_cc.html' WITHIN 1 STEP THEN SELECT count() INTO "step1"; WHEN action == '/confirm.html' WITHIN 1 STEP THEN SELECT count() INTO "step2"; END END
  30. Answering the “How”. How do users checkout? [Bar chart: /checkout.html (step0), /enter_cc.html (step1), /confirm.html (step2)]
  31. Answering the “How”. How do users checkout? [Bar chart: /checkout.html (step0), /enter_cc.html (step1), /confirm.html (step2)]
  32. Answering the “How”. How do users checkout? [Bar chart: /checkout.html (step0), /enter_cc.html (step1), /confirm.html (step2)]
  33. Answering the “Why”. Why do users cancel accounts? WHEN action == '/cancel_account' THEN WHEN WITHIN -1 STEP THEN SELECT count() GROUP BY action; END END
  34. Answering the “Why”. Why do users cancel accounts? WHEN action == '/cancel_account' THEN WHEN WITHIN -1 STEP THEN SELECT count() GROUP BY action; END END
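
One way to read the "Why" query is: for every `/cancel_account` action, look one step back in the same user's timeline and count what is found there. The Python below is only a sketch of that reading, not Sky's implementation; the interpretation of `WITHIN -1 STEP`, the `previous_actions` helper, and the sample timelines are all assumptions.

```python
from collections import Counter


def previous_actions(timelines, target):
    """Count the action one step before each occurrence of `target`.

    `timelines` maps a user to that user's actions in time order. A target
    with nothing before it is counted under None, like NULL in the SQL version.
    """
    counts = Counter()
    for actions in timelines.values():
        for i, action in enumerate(actions):
            if action == target:
                counts[actions[i - 1] if i > 0 else None] += 1
    return counts


# Hypothetical timelines, one list of actions per user.
TIMELINES = {
    "bob": ["/index.html", "/help.html", "/cancel_account"],
    "susy": ["/404.html", "/cancel_account"],
    "ann": ["/help.html", "/cancel_account"],
    "joe": ["/cancel_account"],
}

if __name__ == "__main__":
    print(previous_actions(TIMELINES, "/cancel_account"))
```

Because each timeline is already in order, looking backward is an index lookup rather than a self-join on the whole table.
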
  35. The Data Model. Relational databases look like this:
      users
      ID  name  gender  state
      1   Bob   M       CO
      2   Susy  F       CA
  36. The Data Model. Relational databases look like this:
      users
      ID  name  gender  state
      1   Bob   M       CO
      2   Susy  F       CA
      page_views
      User ID  url      date
      1        /home    ...
      1        /signup  ...
      2        /home    ...
  37. The Data Model. Logs look like this:
      User ID  url           date
      1        /home         ...
      2        /home         ...
      2        /signup       ...
      1        /product/123  ...
      3        /login        ...
      1        /checkout     ...
  38. The Data Model. Logs look like this:
      User ID  url           date
      1        /home         ...
      2        /home         ...
      2        /signup       ...
      1        /product/123  ...
      3        /login        ...
      1        /checkout     ...
  39. The Data Model. We think of behavior in terms of timelines.
      Bob: /home, /product/123
  40. The Data Model. We think of behavior in terms of timelines.
      Bob: /home, /product/123, /checkout
  41. The Data Model. We think of behavior in terms of timelines.
      Bob: /home, /product/123, /checkout
      Susy:
  42. The Data Model. We think of behavior in terms of timelines.
      Bob: /home, /product/123, /checkout
      Susy: /signup
  43. The Data Model. We think of behavior in terms of timelines.
      Bob: /home, /product/123, /checkout
      Susy: /signup, /welcome
  44. The Data Model. So that’s how Sky stores it.
      Bob: /home, /product/123, /checkout
      Susy: /signup, /welcome
  45. Scaling. Timelines are analyzed in isolation. Automatically shards data across cores. Automatically shards across nodes (v0.3.1).
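
Analyzing timelines in isolation is what makes sharding straightforward: if a whole timeline lives on one shard, each shard can be processed without seeing the others and the partial results can simply be added up. The sketch below illustrates that idea only. Assigning users to shards by a CRC32 hash is an assumption for illustration, not a description of how Sky shards, and all names and data are hypothetical.

```python
import zlib
from collections import Counter


def shard_for(user_id, shard_count):
    """Pick a shard from a stable hash of the user id (crc32 is an arbitrary choice)."""
    return zlib.crc32(user_id.encode("utf-8")) % shard_count


def split_into_shards(timelines, shard_count):
    """Place every user's whole timeline on exactly one shard."""
    shards = [{} for _ in range(shard_count)]
    for user_id, actions in timelines.items():
        shards[shard_for(user_id, shard_count)][user_id] = actions
    return shards


def count_actions(timelines):
    """Per-shard work: it only looks at the timelines it is given."""
    counts = Counter()
    for actions in timelines.values():
        counts.update(actions)
    return counts


def sharded_count(timelines, shard_count):
    """Process each shard independently, then merge by summing the counts."""
    total = Counter()
    for shard in split_into_shards(timelines, shard_count):
        total += count_actions(shard)
    return total


# Hypothetical timelines keyed by user name.
USERS = {
    "bob": ["/home", "/product/123", "/checkout"],
    "susy": ["/signup", "/welcome"],
    "ann": ["/home", "/login"],
}

if __name__ == "__main__":
    print(sharded_count(USERS, 4))
```

The loop over shards runs sequentially here; since no shard reads another shard's data, the same per-shard function could be handed to separate cores or nodes.
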
  46. Sky Internals. Recently ported to Go (originally C). Query Engine still written in C. Storage handled with LevelDB.
  47. Sky Internals. Recently ported to Go (originally C). Query Engine still written in C. Storage handled with LevelDB. RESTful JSON over HTTP protocol.
  48. Sky Internals. Loose Schema, No Limit On “Columns”. Types: Strings, Integers, Floats & Booleans. Possible Future Types: Map, Array, Dates, Lat/Long.
  49. Sky Internals. Loose Schema, No Limit On “Columns”. Types: Strings, Integers, Floats & Booleans. Possible Future Types: Map, Array, Dates, Lat/Long. Flexible Query System Built On LuaJIT.
  50. Image Attribution. Database designed by Sergey Shmidt from The Noun Project; Line Graph designed by Cris Dobbins from The Noun Project; Computer designed by Olivier Guin from The Noun Project; Database designed by Cees de Vries from The Noun Project; Document designed by Piotrek Chuchla from The Noun Project; Worker designed by Bart Laugs from The Noun Project; People designed by Studio Het Mes from The Noun Project; Clock designed by Infinity Kim from The Noun Project; Question designed by Greg Pabst from The Noun Project; Hammer designed by John Caserta from The Noun Project; Rocket designed by James Fenton from The Noun Project; Expand designed by Dmitry Baranovskiy from The Noun Project; Rock n Roll designed by Cengiz SARI from The Noun Project; Robot designed by Simon Child from The Noun Project; Profanity designed by Juan Pablo Bravo from The Noun Project.