Frank van Lankvelt - HGT14

Hippo CMS
June 06, 2014

Frank van Lankvelt on Hippo's Big Data experiments.

Presented at the Hippo GetTogether, Hippo's famous annual developer conference.
Please see http://onehippo.com for more information.

Transcript

  1. 1.

    Big Data @ Hippo
    Hippo GetTogether 2014 - Trouw
    Frank van Lankvelt
    follow the Hippo trail
  2. 2.

    Co-occurrence: Relating Attributes
  3. 4.

    Contingency Table

    Documents A, B

                 A         not A      total
    B            x         20 - x     20
    not B        40 - x    140 + x    180
    total        40        160        200

    200: total # visitors
    40: visitors of A
    20: visitors of B
    x: visitors of A & B

    P(x >= 8) ≈ 3%
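
The roughly 3% figure on this slide can be checked with a hypergeometric tail probability. The following is a minimal sketch, assuming that under the null hypothesis the 20 visitors of B are a random draw from all 200 visitors; the function name `tail_probability` is illustrative and does not come from the talk.

```python
from math import comb

def tail_probability(total, visitors_a, visitors_b, observed_both):
    """P(X >= observed_both) when visitors_b visitors are drawn at random
    from `total`, of which visitors_a visited A (hypergeometric null)."""
    denominator = comb(total, visitors_b)
    upper = min(visitors_a, visitors_b)
    numerator = sum(
        comb(visitors_a, k) * comb(total - visitors_a, visitors_b - k)
        for k in range(observed_both, upper + 1)
    )
    return numerator / denominator

# Margins from the slide: 200 visitors, 40 saw A, 20 saw B.
p = tail_probability(total=200, visitors_a=40, visitors_b=20, observed_both=8)
print(f"P(x >= 8) = {p:.3f}")
```

A small tail probability means that so many shared visitors would be unlikely if A and B were unrelated, which is what marks the pair as co-occurring.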
  4. 5.

    Co-occurrence Insights

    • attribute / url relatedness

    Insight: a high cohesion of page visits in the partner section, standing out from the regular ‘.com’ visitor cluster, suggests that visitors looking for a partner go through every single page and probably can’t find what they’re looking for.

    Action: Hippo suggests improving navigation, search or filtering.

    [Figure labels: find partner, /fr, .com, .org, generic, release notes]
  5. 6.

    Recommendations

    which documents are interesting for ME?

    collaborative filtering: user - item (rating)
    • users: Alice, Bob, Charlie
    • Star Wars: 3, 4
    • Finding Nemo: 3, 4
    • Sound of Music: 5, 1, 2
    find docs co-occurring with visited documents

    content (meta) data
                      genre        stars
    Star Wars         sci-fi       Portman
    Finding Nemo      animation    DeGeneres
    Sound of Music    musical      Andrews
    find docs similar to visited documents
  6. 7.

    Implementation

    combine in search index:
    • Content-based: (meta) data
    • Collaborative Filtering: co-occurrence
    Recommendation Query
  7. 9.

    Recommended For You

    1. Collect ID of viewed content
    2. Calculate co-occurrences
    3. Index, along with content
       ◦ IDs of co-viewed documents
    4. Search with recent IDs, similarity
    5. Repeat with other collected data
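
Steps 1 to 4 can be sketched in a few lines. This is an illustration only: a plain dictionary stands in for the search index, raw co-view counts stand in for the co-occurrence test, and the session data, function names and parameters (`co_viewed_index`, `recommend`, `top_n`, `limit`) are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations

def co_viewed_index(sessions, top_n=3):
    """Steps 2-3: for each document, keep the IDs most often viewed with it."""
    counts = defaultdict(Counter)
    for viewed in sessions:
        for doc, other in permutations(set(viewed), 2):
            counts[doc][other] += 1
    return {doc: [d for d, _ in c.most_common(top_n)] for doc, c in counts.items()}

def recommend(index, recent_ids, limit=2):
    """Step 4: rank unseen documents by the overlap between their co-viewed IDs
    and the visitor's recently viewed IDs."""
    recent = set(recent_ids)
    scores = {
        doc: len(recent.intersection(co_viewed))
        for doc, co_viewed in index.items()
        if doc not in recent
    }
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return [d for d in ranked if scores[d] > 0][:limit]

# Step 1: hypothetical collected data, one list of viewed document IDs per visitor.
sessions = [
    ["demo", "pricing", "whitepaper"],
    ["demo", "pricing"],
    ["blog", "whitepaper"],
    ["demo", "whitepaper"],
]
index = co_viewed_index(sessions)
print(recommend(index, ["demo"]))
```

In a real setup the co-viewed IDs would be stored as an extra field next to the document content, so that one search query can combine them with content similarity.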
  8. 10.

    Patterns: Beyond Co-occurrence
  9. 11.

    Patterns in the Data

    customers that buy diapers often buy beer as well
    (young dads rewarding themselves?)
  10. 12.

    Itemsets, Rules

    Find the patterns (association rule mining):
    1. sets of items that are bought together
       P(beer,diapers) > 1% (support)
    2. subsets that are good predictors
       P(beer,diapers) / (P(beer) P(diapers)) > 4 (lift)
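
Support and lift as defined on this slide can be computed directly from a list of transactions. A minimal sketch follows; the baskets are made up for illustration, and the helper names `support` and `lift` are not from the talk.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)

def lift(transactions, a, b):
    """P(a, b) / (P(a) * P(b)); values above 1 mean a and b occur together
    more often than independence would predict."""
    return support(transactions, [a, b]) / (
        support(transactions, [a]) * support(transactions, [b])
    )

# Hypothetical baskets, for illustration only.
baskets = [
    {"beer", "diapers"},
    {"beer", "diapers", "chips"},
    {"milk"}, {"bread"}, {"chips"}, {"milk", "bread"},
    {"milk", "chips"}, {"bread", "chips"}, {"eggs"}, {"eggs", "milk"},
]
pair_support = support(baskets, ["beer", "diapers"])
pair_lift = lift(baskets, "beer", "diapers")
print(pair_support, pair_lift)
```

With these baskets the pair passes both thresholds from the slide: its support is well above 1% and its lift is above 4.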
  11. 13.

    http://www.onehippo.com/en/thankyou - Thank You

    Beer? Diapers?
    Conversions!!!
  12. 14.

    http://www.onehippo.com/en/thankyou

    • will a visitor go there? P(conversion|request log)
    • what are the relevant “signals”?
    • which configuration performs best?
  13. 15.

    Patterns For Conversion

    single item:
    • referrer www.google.com
    pattern/itemset:
    • visited demo
    • 2014 week 4
    correlations
  14. 17.

    Finding Frequent Itemsets

    1. Build Frequent Prefix Tree (FPGrowth)
    2. Extract patterns relevant for conversion (using contingencies)
  15. 18.

    Pattern Contingency Table

                              converted    not converted
    pattern matches
    pattern does not match

    converted
    • visited /thankyou
    sample pattern
    • visited demo
    • in 2014 week 4
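
The table on this slide can be filled by counting visitors, after which any 2x2 independence test applies. The talk does not say which test was used; the sketch below assumes a Pearson chi-square statistic, and the visitor records are invented for illustration.

```python
def contingency(visits, pattern):
    """2x2 counts: rows = pattern matches / does not match,
    columns = converted / not converted."""
    pattern = set(pattern)
    table = [[0, 0], [0, 0]]
    for items, converted in visits:
        row = 0 if pattern <= set(items) else 1
        col = 0 if converted else 1
        table[row][col] += 1
    return table

def chi_square(table):
    """Pearson chi-square statistic for a 2x2 table (no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # an empty row or column carries no evidence
    return n * (a * d - b * c) ** 2 / denominator

# Hypothetical visitors: (set of observed items, reached /thankyou?).
visits = [
    ({"visited demo", "2014 week 4"}, True),
    ({"visited demo", "2014 week 4"}, True),
    ({"visited demo", "2014 week 4"}, False),
    ({"visited demo"}, False),
    ({"2014 week 4"}, False),
    (set(), False), (set(), False), (set(), False),
]
table = contingency(visits, {"visited demo", "2014 week 4"})
print(table, chi_square(table))
```

A statistic above about 3.84 corresponds to the 5% level with one degree of freedom, although with counts this small an exact test would be the safer choice.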
  16. 19.

    Sub-Pattern Filtering

    Problem: when pattern (A, B, C) is relevant, patterns (A), (B), (C), (A, B), (A, C), (B, C) (likely) also match. E.g. with C meta-data on page B.

    Solution: test for independence using contingency!
  17. 20.

    Actionable Insights?

    The found itemsets are quite numerous and seem to contain a lot of redundancy. But they are certainly interesting, e.g. for a periodic evaluation.
  18. 21.

    Personalization: Putting Patterns to Use
  19. 22.

    Naive A/B Testing

    The naive solution:
    • route some traffic to alternative configuration
      ◦ A (old config): 80%
      ◦ B (new config): 20%
    • run for some time
    • see if B has relatively more conversions
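
The naive procedure amounts to a random 80/20 split followed by a comparison of conversion rates. A minimal sketch, with a made-up request log and illustrative function names (`assign_variant`, `conversion_rates`):

```python
import random

def assign_variant(rng, share_b=0.2):
    """Route a request to the new config B with probability share_b, else to A."""
    return "B" if rng.random() < share_b else "A"

def conversion_rates(log):
    """log: iterable of (variant, converted) pairs -> {variant: conversion rate}."""
    visits, conversions = {}, {}
    for variant, converted in log:
        visits[variant] = visits.get(variant, 0) + 1
        conversions[variant] = conversions.get(variant, 0) + int(converted)
    return {v: conversions[v] / visits[v] for v in visits}

# Hypothetical request log, for illustration only.
log = (
    [("A", False)] * 78 + [("A", True)] * 2
    + [("B", False)] * 18 + [("B", True)] * 2
)
rates = conversion_rates(log)
print(rates, "B looks better" if rates["B"] > rates["A"] else "A looks better")
```

Comparing the two rates directly says nothing about whether the difference is significant, so someone still has to decide when to stop the experiment.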
  20. 23.

    Problems With Naive Solution

    • if B is drastically worse, 20% of traffic is LOST
    • marketer must regularly check and decide
      ◦ when has a new config PROVEN itself?
    • number of concurrent experiments is LOW
    • no user context
  21. 25.

    Predict Conversion

    Conversion rate depends on context:
    • x: the patterns
    • w: the “weights”
    • ϕ: cdf of normal dist.
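
The formula itself did not survive in the transcript, but the three symbols suggest a probit model of the form P(conversion | x) = ϕ(w · x). The sketch below works under that assumption; the bias term, the weights and the pattern names are hypothetical and not taken from the talk.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def conversion_probability(weights, bias, matched_patterns):
    """Assumed probit model: Phi(bias + sum of weights of the matched patterns).
    Matching a pattern corresponds to x_i = 1, not matching to x_i = 0."""
    score = bias + sum(weights.get(p, 0.0) for p in matched_patterns)
    return normal_cdf(score)

# Hypothetical weights; a negative bias keeps the baseline conversion rate low.
weights = {"visited demo": 1.2, "referer.www.google.com": 0.3}
bias = -2.5
baseline = conversion_probability(weights, bias, [])
with_demo = conversion_probability(weights, bias, ["visited demo"])
print(f"{baseline:.4f} -> {with_demo:.4f}")
```

Each matched pattern shifts the score by its weight, so a positive weight raises the predicted conversion rate for visitors in that context.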
  22. 26.

    Experimental Setup

    Split data set (.org + .com)
    1. training set: 189660 visitors, 435 conversions
    2. test set: 27013 visitors, 40 conversions
  23. 27.

    Can We Predict Conversion?

    • 1260 itemsets
    • ROC curve: TPR versus FPR
    • @ false positive rate 10%: 96% true positive rate
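
A point on the ROC curve such as "TPR at 10% FPR" can be read off by sweeping the decision threshold over the predicted scores. A minimal sketch, using invented scores and labels rather than the data set from the talk; the function name `tpr_at_fpr` is illustrative.

```python
def tpr_at_fpr(scores, labels, max_fpr):
    """Highest true positive rate reachable while the false positive rate
    stays <= max_fpr, sweeping the threshold over the observed scores."""
    positives = sum(labels)
    negatives = len(labels) - positives
    best = 0.0
    for threshold in sorted(set(scores), reverse=True):
        predicted = [s >= threshold for s in scores]
        tp = sum(p and y for p, y in zip(predicted, labels))
        fp = sum(p and not y for p, y in zip(predicted, labels))
        if fp / negatives <= max_fpr:
            best = max(best, tp / positives)
    return best

# Hypothetical predicted scores: 2 converting and 10 non-converting visitors.
scores = [0.9, 0.8, 0.6, 0.5, 0.4, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1, 0.05]
labels = [True, False, True] + [False] * 9
print(tpr_at_fpr(scores, labels, max_fpr=0.10))
```

Because conversions are rare, even a 10% false positive rate still flags far more non-converting visitors than converting ones, which is worth keeping in mind when reading the 96% figure.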
  24. 28.

    Towards Actionable Insights

    Use Automatic Relevance Determination to prune the patterns (optimize the prior)

    [Figure labels: σ, μ, relevant, irrelevant, weights (w)]
  25. 29.

    Top 20 Patterns For Conversion

    referer.go.onehippo.com
    .pathInfo./resources/whitepapers/forrester-market-overview-web-content-management-systems.html
    .pathInfo./resources/whitepapers/cms---a-critical-solution-for-todays-ecommerce.html
    .pathInfo./resources/whitepapers/hippo-cms-for-the-enterprise.html
    .pathInfo./resources/whitepapers/web-content-management-in-the-cloud.html
    .collectorData.channel.One Hippo English Site
    .collectorData.audience.terms.
    referer.www.onehippo.com
    .collectorData.categories.terms.cms
    .pathInfo./mobile-cms
    .collectorData.channel.One Hippo English Site
    .pathInfo./ressourcen/demo
    .pathInfo./resources/videos/hippo-cms-grand-tour.html
    .collectorData.channel.One Hippo English Site
    .collectorData.audience.terms.
    .collectorData.categories.terms.cms
    .pathInfo./ressources/demo
    .pathInfo./what_to_buy/compare.html
    referer.www.cmswire.com
    .pathInfo./resources/demo
    .collectorData.categories.terms.mobile
    .pathInfo./resources/whitepapers/understanding-hippo-cms-7-software-architecture.html
    .pathInfo./resources/whitepapers/selecting-today’s-enterprise-web-content-management-system.html
    .collectorData.channel.One Hippo English Site
    referer.www.google.nl
    referer.www.onehippo.com
    .pathInfo./resources/videos/a-quick-overview-of-hippo-cms-in-just-under-3-minutes.html
    .collectorData.categories.terms.repository
    .pathInfo./resources/whitepapers/selecting-today’s-enterprise-web-content-management-system.html
    .collectorData.categories.terms.
    .collectorData.categories.terms.relevance
  26. 30.

    Actionable Insights!

    we can find a small model that can be used for human interpretation and automated personalization
  27. 31.

    Product Challenge

    KISS
    # parameters should be minimal
  28. 32.

    Parameters

    • Recommendations: 1 hyper-param
    • Personalization: idem
    NICE!
  29. 34.

    Fonts & Colors

    • Use either Georgia (for headers)
    • or Proxima Nova (for body) as replacement for interstate
    • Use the colors from the styleguide:
    • Bright
      ◦ #F585466 #EF3E42 #00A5E3
      ◦ #9AC13C #38B9AB #8C64AB
    • Neutral
      ◦ #F0EFE8 #D3D2C6 #405168
      ◦ #8D98A9 #FFFFFF ;-)
  30. 35.

    Visuals

    • Use the stock images from drive at collateral/visuals/stockimages
      ◦ Ask marketing if you need others from istockphoto.com, we’re usually happy to buy them for you