Building a recommender from a big behavior graph over Cassandra

Magazine Luiza recommender system, presented at Strata 2014 San Jose with Arthur Grava.

Gleicon Moraes

March 16, 2017

Transcript

  1. Building a recommender from a big behavior graph over Cassandra

    Arthur Grava, Technical Lead - Big Data, @arthur_grava, [email protected] • Gleicon Moraes, Director of Data Engineering, https://luc.id, [email protected]
  2. • 786 stores • 8 distribution centers • +18k employees

    • +40 million clients • 16 million unique visitors per month
  3.–8. POC

  9. Why a Graph Database? • Intuitive schema modeling • Abstraction

    over customer and product relations • Easy to iterate over entities and their relations through the Gremlin DSL • Simple way to compute common customer behaviours • No complex matrix calculations • Cassandra + Titan + Rexster + Python (see the sketch below)
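
    As an illustration of the kind of traversal this stack enables, below is a minimal "who viewed, also viewed" sketch in Python against Rexster's Gremlin REST extension. The graph name, the 'viewed' edge label, the 'sku' property key, the example product id, and the endpoint URL are all assumptions for illustration, not the deck's actual schema.

    # Minimal sketch: "who viewed, also viewed" over Titan via Rexster's
    # Gremlin REST extension (TinkerPop 2-era Gremlin). All names assumed.
    import requests

    REXSTER = "http://localhost:8182/graphs/titan/tp/gremlin"  # assumed URL

    # Start at a product, walk back to the customers who viewed it, then
    # out to the other products those customers viewed, and rank by count.
    script = """
    v = g.V('sku', '12345').next()
    m = [:]
    v.in('viewed').out('viewed').except([v]).groupCount(m).iterate()
    m.sort { -it.value }.take(10)
    """

    resp = requests.get(REXSTER, params={"script": script})
    print(resp.json()["results"])
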
  10. Results • Running an A/B test against the current solution

    using WVAV ("who viewed, also viewed") recommendations: 30% increase in sales
  11. Limitations • Unnecessary Python layer • This layer was significantly

    increasing response time • Recommendations were calculated directly on the graph • High computational cost when doing several traversals on the graph at once (more on that later) • Supernodes • Hard to add tags or non-graph attributes (multiple email addresses referring to a single customer) without significantly increasing the graph size • Events collected server-side • Implemented a tracker (1x1 pixel) and an async pipeline (sketched below)
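
    The deck does not show the tracker itself; the following is a rough sketch of the pattern it describes (a 1x1 pixel whose request parameters are pushed onto an async pipeline), assuming Flask and Redis. The endpoint, parameter, and queue names are all hypothetical.

    # Sketch of a 1x1 tracking pixel feeding an async pipeline: the
    # endpoint only enqueues the raw event and returns immediately;
    # a separate worker consumes the queue. All names are illustrative.
    import base64
    import json
    import time

    import redis
    from flask import Flask, Response, request

    app = Flask(__name__)
    queue = redis.StrictRedis()

    # Smallest transparent GIF, returned to the browser right away.
    PIXEL = base64.b64decode(
        "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
    )

    @app.route("/t.gif")
    def track():
        event = {
            "ts": time.time(),
            "customer": request.args.get("c"),
            "sku": request.args.get("sku"),
            "action": request.args.get("a", "view"),
        }
        # Fire and forget: collection never blocks on downstream processing.
        queue.rpush("events", json.dumps(event))
        return Response(PIXEL, mimetype="image/gif")
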
  12.–14. Production • No more proxies, direct access to the graph

    (using Java) • Implemented our own pixel tag to collect information directly from the browser • Also developed our own analytics system • Recommendations calculated outside the graph (see the sketch below)
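
    As a hedged sketch of what "calculated outside the graph" can look like, here is a batch job that counts co-views from the collected events instead of traversing per request. The event shape and the top-N cutoff are assumptions.

    # Sketch: precompute "who viewed, also viewed" from view events in a
    # batch job, so serving never touches the graph. Names are assumed.
    from collections import Counter, defaultdict
    from itertools import combinations

    def coview_recommendations(events, top_n=10):
        """events: iterable of (customer_id, sku) view pairs."""
        viewed_by = defaultdict(set)
        for customer, sku in events:
            viewed_by[customer].add(sku)

        # Count how often two products are viewed by the same customer.
        co_counts = defaultdict(Counter)
        for skus in viewed_by.values():
            for a, b in combinations(sorted(skus), 2):
                co_counts[a][b] += 1
                co_counts[b][a] += 1

        # Keep only the strongest co-viewed products per sku.
        return {sku: [s for s, _ in counter.most_common(top_n)]
                for sku, counter in co_counts.items()}
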
  15. Results • Average response time cut in half • Reduced load on Cassandra,

    which enabled • User-centric recommendations • Emails and push notifications • 25% share on emails • 23.8% of recommendations as push notifications
  16. Problems • Too many responsibilities in the API module •

    Hard-to-maintain code • Too many disk accesses for personalized recommendations • The email and push notification API went down at the most important times
  17. Microservices • Separation of concerns: event collection and serving recommendations •

    Scale and isolate these behaviours: we can serve recommendations even if we are not collecting events • Small and simpler codebases: refactoring won't affect the overall system too much, and deploys are not huge switches • Faster to add new features and try new algorithms • Better application profiling
  18. Microservices • Moving recommendations from Cassandra to Elasticsearch • Pre-calculated

    recommendations are stored in Elasticsearch, easy to rebuild and query (see the sketch below) • Recommendation latency dropped (personalized) from 400 ms to 50 ms, which matters for conversion and customer bail-out • Fewer traversals on the graph, less overall load on the system
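
    A sketch of that storage pattern with the Python Elasticsearch client, assuming one document per product holding its precomputed list; the index and field names are illustrative, not the deck's actual schema.

    # Sketch: keep precomputed recommendations in Elasticsearch, one
    # document per product, so serving is a single GET by id rather
    # than a graph traversal. Index and field names are assumptions.
    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://localhost:9200"])

    def store(sku, recommended_skus):
        es.index(index="recommendations", id=sku,
                 document={"sku": sku, "recommended": recommended_skus})

    def serve(sku):
        # One cheap lookup at request time; the slide attributes the
        # 400 ms to 50 ms drop to avoiding traversals on this path.
        doc = es.get(index="recommendations", id=sku)
        return doc["_source"]["recommended"]
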
  19. Business impact • From 6% to 15% share of sales • 35%

    share of email sales • 31.6% of recommendations as push notifications