Building a recommender from a big behavior graph over Cassandra

Magazine Luiza recommender system, presented at Strata 2014 San Jose with Arthur Grava.

Gleicon Moraes

March 16, 2017

Transcript

  1. Building a recommender from a big behavior graph over Cassandra

    Arthur Grava, Technical Lead - Big Data, @arthur_grava, [email protected] • Gleicon Moraes, Director of Data Engineering, https://luc.id, [email protected]
  2. • 786 stores • 8 distribution centers • +18k employees

    • +40 million clients • 16 million unique visitors per month
  3.–8. POC

  9. Why a Graph Database? • Intuitive schema modeling • Abstraction

    over customer and product relations • Easy to iterate over entities and their relations through the Gremlin DSL • Simple way to compute common customer behaviours • No complex matrix calculations • Cassandra + Titan + Rexster + Python (see the sketch below)
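
    As an illustration of the kind of traversal this stack enables, below is a minimal "who viewed, also viewed" sketch in Python against Rexster's Gremlin REST extension. The graph name, the 'viewed' edge label, the 'sku' property key, the example product id, and the endpoint URL are all assumptions for illustration, not the deck's actual schema.

    # Minimal sketch: "who viewed, also viewed" over Titan via Rexster's
    # Gremlin REST extension (TinkerPop 2-era Gremlin). All names assumed.
    import requests

    REXSTER = "http://localhost:8182/graphs/titan/tp/gremlin"  # assumed URL

    # Start at a product, walk back to the customers who viewed it, then
    # out to the other products those customers viewed, and rank by count.
    script = """
    v = g.V('sku', '12345').next()
    m = [:]
    v.in('viewed').out('viewed').except([v]).groupCount(m).iterate()
    m.sort { -it.value }.take(10)
    """

    resp = requests.get(REXSTER, params={"script": script})
    print(resp.json()["results"])
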
  10. Results • Running an A/B test against the current solution

    using WVAV ("who viewed, also viewed") recommendations: 30% increase in sales
  11. Limitations • Unnecessary Python layer • This layer was significantly

    increasing response time • Recommendations were calculated directly on the graph • High computational cost when doing several traversals on the graph at once (more on that later) • Supernodes • Hard to add tags or non-graph attributes (multiple email addresses referring to a single customer) without significantly increasing the graph size • Events collected server-side • Implemented a tracker (1x1 pixel) and an async pipeline (sketched below)
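
    The deck does not show the tracker itself; the following is a rough sketch of the pattern it describes (a 1x1 pixel whose request parameters are pushed onto an async pipeline), assuming Flask and Redis. The endpoint, parameter, and queue names are all hypothetical.

    # Sketch of a 1x1 tracking pixel feeding an async pipeline: the
    # endpoint only enqueues the raw event and returns immediately;
    # a separate worker consumes the queue. All names are illustrative.
    import base64
    import json
    import time

    import redis
    from flask import Flask, Response, request

    app = Flask(__name__)
    queue = redis.StrictRedis()

    # Smallest transparent GIF, returned to the browser right away.
    PIXEL = base64.b64decode(
        "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
    )

    @app.route("/t.gif")
    def track():
        event = {
            "ts": time.time(),
            "customer": request.args.get("c"),
            "sku": request.args.get("sku"),
            "action": request.args.get("a", "view"),
        }
        # Fire and forget: collection never blocks on downstream processing.
        queue.rpush("events", json.dumps(event))
        return Response(PIXEL, mimetype="image/gif")
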
  12.–14. Production • No more proxies, direct access to the graph

    (using Java) • Implemented our own pixel tag to collect information directly from the browser • Also developed our own analytics system • Recommendations calculated outside the graph (see the sketch below)
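
    As a hedged sketch of what "calculated outside the graph" can look like, here is a batch job that counts co-views from the collected events instead of traversing per request. The event shape and the top-N cutoff are assumptions.

    # Sketch: precompute "who viewed, also viewed" from view events in a
    # batch job, so serving never touches the graph. Names are assumed.
    from collections import Counter, defaultdict
    from itertools import combinations

    def coview_recommendations(events, top_n=10):
        """events: iterable of (customer_id, sku) view pairs."""
        viewed_by = defaultdict(set)
        for customer, sku in events:
            viewed_by[customer].add(sku)

        # Count how often two products are viewed by the same customer.
        co_counts = defaultdict(Counter)
        for skus in viewed_by.values():
            for a, b in combinations(sorted(skus), 2):
                co_counts[a][b] += 1
                co_counts[b][a] += 1

        # Keep only the strongest co-viewed products per sku.
        return {sku: [s for s, _ in counter.most_common(top_n)]
                for sku, counter in co_counts.items()}
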
  15. Results • Average response time cut in half • Reduced load on Cassandra,

    which enabled • User-centric recommendations • Emails and push notifications • 25% share on emails • 23.8% of recommendations as push notifications
  16. Problems • Too many responsibilities in the API module •

    Hard-to-maintain code • Too many disk accesses for personalized recommendations • The email and push notification API went down at the most important times
  17. Microservices • Separation of concerns: event collection and serving recommendations •

    Scale and isolate these behaviours: we can serve recommendations even if we are not collecting events • Small and simpler codebases: refactoring won't affect the overall system too much, and deploys are not huge switches • Faster to add new features and try new algorithms • Better application profiling
  18. Microservices • Moving recommendations from Cassandra to Elasticsearch • Pre-calculated

    recommendations are stored in Elasticsearch, easy to rebuild and query (see the sketch below) • Recommendation latency dropped (personalized) from 400 ms to 50 ms, which matters for conversion and customer bail-out • Fewer traversals on the graph, less overall load on the system
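
    A sketch of that storage pattern with the Python Elasticsearch client, assuming one document per product holding its precomputed list; the index and field names are illustrative, not the deck's actual schema.

    # Sketch: keep precomputed recommendations in Elasticsearch, one
    # document per product, so serving is a single GET by id rather
    # than a graph traversal. Index and field names are assumptions.
    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://localhost:9200"])

    def store(sku, recommended_skus):
        es.index(index="recommendations", id=sku,
                 document={"sku": sku, "recommended": recommended_skus})

    def serve(sku):
        # One cheap lookup at request time; the slide attributes the
        # 400 ms to 50 ms drop to avoiding traversals on this path.
        doc = es.get(index="recommendations", id=sku)
        return doc["_source"]["recommended"]
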
  19. Business impact • From 6% to 15% share of sales • 35%

    share of email sales • 31.6% of recommendations as push notifications