
In-Memory, Component-Based Recommender Architecture


https://careers.sorabel.io/

In this talk, we will share the under-the-hood details of our newly re-architected recommender, which we started building last year.

To put it simply: we eschewed the accepted practice for building web-scale services of structuring your service and data into neatly divided stateless and stateful layers. Instead, we loaded most of our recommendation-relevant signals, including the entirety of our product catalogue and the complex multi-dimensional data of our user behaviour and other engineered user features, into a single-layer stateful service's resident memory. We then wrapped those data, alongside our data science models, in an intelligently caching architecture reminiscent of React's virtual DOM, coupled with on-the-fly, real-time custom logic, all running on the Go runtime.

Over the half year it has been live, the new approach has given us a major boost in runtime performance, a substantial increase in productivity for both engineers and data scientists, and, most importantly, improvements in business metrics across the board, leading to some of the best conversion rates in the industry.

Speaker:
Ahmad Zaky is a senior backend engineer at Sorabel (previously named Sale Stock). He has been there for over two years, working on the notification service, the order gRPC microservice, and the customer service system before joining the recommender team. Before Sale Stock, he had a summer internship at Palantir in 2015, and he has actively participated in competitive programming contests. He currently resides in Bandung, where he organizes Facebook Developer Circle Bandung, a monthly meetup for developers in the city.

Sale Stock Engineering

April 16, 2019

Transcript

  1. Who are We?
     • E-commerce founded in late 2014
       ◦ Internal Engineering founded in early 2015
       ◦ Launched our in-house website in mid-2015, our app in late 2015
     • Concentrated on women's fashion
       ◦ Around the Rp 100K range, as opposed to the > Rp 200K range
       ◦ One of the top fashion e-commerce players in Indonesia
     • Recently rebranded to Sorabel, in January 2019
  2. Talk Overview
     • Previous Recommender Development Workflow
       ◦ Exploration, Computation, Serving
     • New Recommender Development Workflow
       ◦ Exploration, Computation, Serving
       ◦ The Impact
     • In-Memory Recommender Architecture
       ◦ In-Depth Architecture
       ◦ Pros & Cons
  3. What is a Recommender?
     • Software that tries to optimize the future actions of its users
       ◦ Personalized: based on a specific user's past behavior
       ◦ Non-personalized: based on an aggregate, non-user contextual model
  4. Recommenders in Sorabel
     • We have recommenders for many things:
       ◦ For our buyers: Restock, Budget, and Trend Recommenders
       ◦ For our warehouse: Order-Item Picking Route Optimizer, Item Placement Optimizer, Logistics Partner Order Allocator
       ◦ For our customers: Product Recommender
     • In this talk, we focus on the Product Recommender
  5. Uses of Product Recommender
     • Home Feed
     • Catalogue Feed
     • Similar Products
     • Search, and others!
  6. Development Workflow
     • Exploration
       ◦ Data scientists sift through the data to derive deeper insights by looking beyond the basic metrics
     • Computation
       ◦ Which classes of algorithm are computationally feasible, and how each model should be built, fit, validated, and then recomputed regularly
     • Serving
       ◦ How models are ultimately stored post-computation and served in production to our millions of users
  7. Previous Recommender Architecture
     • BigQuery: data warehouse, fed from multiple data sources (MySQL, Cassandra, Kafka)
     • Apache Spark: engine for large-scale data processing
     • Dataproc: runs Apache Spark jobs inside GCP
     • ElasticSearch: an open-source, distributed search engine
  8. Exploration
     • Data scientists explore and analyze the data, trying to build the right model / heuristics
       ◦ Most of their exploration is done in Python
     • This is usually done locally → limited hardware resources
       ◦ Inherently lower limit on the size of data during experimentation → data scientists are limited to "toy" datasets at this stage
       ◦ Harder for data scientists to collaborate
  9. Computation
     • Data engineers translate data scientists' work into appropriate Spark jobs
       ◦ Computation was mostly done inside Google Dataproc
     • Data engineers make the changes necessary to get a model ready for production
       ◦ For example, dummy dataset vs production-scale dataset
       ◦ Long back-and-forth feedback loop between data scientists & engineers
     • Recommendations were largely precomputed at a less-than-optimal scope: at the feed level, daily
       ◦ Computation and disk writes take a long time (+ storage costs!)
       ◦ Low usage rate → not everyone visits their precomputed feed on a daily basis
  10. Serving
     • The production read path that serves the actual models / recommendations to our users
     • A dual-layer architecture:
       ◦ Highly stateful data storage layer: ElasticSearch
       ◦ Stateless (horizontally scaled) REST API servers that mostly read from the stateful layer with minimal post-processing
     • Implemented and maintained by backend engineers
  11. Recap: Previous Workflow Problems
     • Exploration is usually done locally
       ◦ Local resources are limited
       ◦ Data scientists usually play with tiny subsets of the data to get the work done locally
       ◦ Harder for data scientists to collaborate
     • Going back and forth between data scientists and data engineers took longer than it should
     • Long indexing time (the daily job took ~4-8 hrs)
     • Non-trivial cost and complexity in the serving infra (Dataproc + ElasticSearch)
  12. Exploration
     • Data scientists can utilize Sciencebox
       ◦ JupyterLab running in dedicated containers on top of Kubernetes, one per data scientist
       ◦ Instant access to powerful resources
         ▪ Large core count & RAM
         ▪ GPU if needed
     • No longer need to play around with tiny subsets of the data
     • Easier for data scientists to collaborate, share, and evaluate each other's work
  13. Computation
     • Introducing DataQuery
       ◦ A platform where anyone can easily build their own derived tables
       ◦ A derived table is a composite-data table, i.e. a table whose data is composed from multiple tables
         ▪ From raw tables / other derived tables
         ▪ Mostly defined by a SQL query
         ▪ Editable data refresh frequency
       ◦ Built on top of Google BigQuery
  14. Computation
     • Frequently, a simpler model can be realized using DataQuery alone
       ◦ No need for any Spark jobs in most cases
       ◦ Data scientists can do this independently
  15. Serving
     • The serving infrastructure is now a single-layer, Go-based in-memory service (IM for short)
       ◦ We load the "models" from DataQuery (or any other data) into the service's resident memory as "Components", either at startup or as needed
       ◦ Components are built on top of each other to form a more complete and capable "component tree" that then serves the actual recommendations as a group
  16. Serving
     • The serving infrastructure is now a single-layer, Go-based in-memory service (IM for short)
       ◦ Additional computations (including, but not limited to, inter-model component stitching, data re-sorting, realtime-data-sensitive logic, etc.) can be done within the Components on-request, on the fly
       ◦ A centralized "component registry" handles caching / re-computation of different parts of the component tree for best performance with little manual work, not dissimilar to the Virtual DOM concept React uses for user interfaces
       ◦ A much larger chunk of the user-specific recommendation computation can now be done on the fly, only when the user comes to the site
  17. Serving
     • Backend engineers implement the components
       ◦ However, thanks to its simplicity, data scientists often implement components themselves
       ◦ A data scientist's workflow for a new feature is now very simple:
         i. Play around with algorithms & data in Sciencebox
         ii. Write the production version of the algorithm as a DataQuery table
         iii. "Wrap" the DataQuery model in an IM `Component`
       ◦ We'll talk about how this works in more depth later on
  18. Workflow Comparison (diagram)
     • Previous workflow: data scientists handled Exploration; data engineers / backend engineers handled Computation and Serving
     • New workflow: data scientists handle Exploration, Computation, and Serving themselves
  19. The IM Architecture
     • Data Component
     • Registry
     • Cache
     • Endpoint
     • All within a single service
  20. Data Component
     • Responsible for processing data on the fly
     • May depend on external data sources (BigQuery, etc.), other Data Components, or both
     • The resulting data can be used directly by other components
  21. Data Component
     • CachePolicy defines the caching configuration -- more on this later
     • GetKey serializes the args into a string -- used as the cache key
     • GetData processes data with the given args
       ◦ Fetcher is a utility interface for interacting with other components and data sources
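To make these pieces concrete, here is a minimal Go sketch of what such a Component interface could look like. All names and signatures are assumptions for illustration, following the terms on the slide, not Sorabel's actual API.

```go
package main

import "context"

// CachePolicy mirrors the fields described on the Cache Policy slide.
type CachePolicy struct {
	ExpirationSeconds int  // data TTL
	MaxEntrySize      int  // maximum number of cached data entries
	NoCache           bool // GetData will be called at all times
	MustPresent       bool // determines the cache type (persistent vs volatile)
}

// Fetcher lets a component pull data from other components (through the
// registry) or from external data sources.
type Fetcher interface {
	Fetch(ctx context.Context, component string, args map[string]string) (interface{}, error)
}

// Component is the unit of the component tree.
type Component interface {
	CachePolicy() CachePolicy
	GetKey(args map[string]string) string // serializes args into a cache key
	GetData(ctx context.Context, f Fetcher, args map[string]string) (interface{}, error)
}

// ProductScoreComponent is a toy implementation of the interface.
type ProductScoreComponent struct{}

func (ProductScoreComponent) CachePolicy() CachePolicy {
	return CachePolicy{ExpirationSeconds: 3600, MaxEntrySize: 10000}
}

func (ProductScoreComponent) GetKey(args map[string]string) string {
	return "product_score:" + args["productID"]
}

func (ProductScoreComponent) GetData(ctx context.Context, f Fetcher, args map[string]string) (interface{}, error) {
	// Stub score; a real component would fetch its dependencies via f.
	return 0.5, nil
}
```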
  22. Registry
     • The Registry is the central manager responsible for ensuring Components are computed, cached, and pre-computed at the right times
     • It handles data caching according to each component's CachePolicy
       ◦ Uses an LRU (least recently used) eviction policy
     • All fetcher.Fetch (component sub-dependency) calls go through the registry
       ◦ It checks the cache with the key returned by the Component's GetKey method
       ◦ If an entry exists, it returns the cached data
       ◦ Otherwise, it calls the component's GetData method and updates the cache
  23. Cache Policy
     • Expiration: data TTL
     • MaxEntrySize: maximum number of cached data entries
     • NoCache: GetData will be called at all times
     • MustPresent: determines the cache type
  24. Cache Types
     • Persistent Cache: data must be present at all times
       ◦ Critical data components are of this type, i.e. the service cannot run without their data. Example: ProductMapComponent
       ◦ They are initially fetched during startup
       ◦ If the cached data is expired, the registry returns the expired data while fetching the new data in the background
     • Volatile Cache: it is okay for the data to be expired
       ◦ Most of these are components that depend on other components
  25. Cache Stampede Problem
     • A cache stampede is a type of cascading failure that can occur when massively parallel computing systems with caching mechanisms come under very high load
     • Example: a sudden jump in traffic to a page (e.g. from a TVC) ends up producing mostly identical data-component request trees -- wasting CPU cycles and making useless downstream requests to service dependencies
  26. Cache Stampede Problem
     • Solution: make sure only one of the identical GetData requests is executed at a time
       ◦ Try to acquire a lock whenever there's a cache miss
       ◦ If acquiring the lock fails, wait until the lock is released; by then, the data should already be cached
  27. Member Affinity Routing
     • We load member-specific data only when that member arrives
     • Meanwhile, we scale our service horizontally to multiple servers
       ◦ Since data is cached locally, the state of each server can differ
     • Multiple requests by the same member can end up being served by different servers
       ◦ The same computation is done redundantly across multiple servers
       ◦ Reducing the cache hit rate
  28. Member Affinity Routing
     • We employ custom header-based routing through Envoy, a microservice proxy
       ◦ Consistent routing via the X-SS-Member-ID header
     • The same member will always be served by the same server
     • Example:
       ◦ A member opens the home page. One of the servers then computes the home feed specific to that member.
       ◦ As the member scrolls down, the next requests come to the same server. Since the result is already cached, it returns almost instantly.
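The consistency property this routing provides can be illustrated in a few lines of Go (a plain modulo hash here; the real routing is done by Envoy in the proxy layer, not in application code, and uses more robust load-balancer-aware hashing):

```go
package main

import "hash/fnv"

// pickServer deterministically maps a member ID (e.g. the X-SS-Member-ID
// header value) to one of the upstream servers, so repeated requests by the
// same member always land on the same server.
func pickServer(memberID string, servers []string) string {
	h := fnv.New32a()
	h.Write([]byte(memberID))
	return servers[h.Sum32()%uint32(len(servers))]
}
```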
  29. Putting It All Together
     • When we implement a feature, we think in terms of the building blocks
       ◦ What data components do we need? What is the data source? How should the data be cached?
       ◦ What are the dependencies between the components?
         ▪ In most cases, a feature can depend on our existing components
  30. Example: Search Endpoint
     • Suppose we want to create a search endpoint:
       ◦ Users can search by keyword
       ◦ Users can also sort the results, by price, popularity, etc.
  31. Example: Search Endpoint
     • The MatchedProductsForKeywords component filters the products from the Product component, and sorts them based on the score obtained from the ProductScore component
     • But sorting is a CPU-heavy operation. How can we improve?
  32. Example: Search Endpoint
     • We can introduce an intermediary component: ScoreSortedProducts
     • The ScoreSortedProducts component pre-sorts the composite catalogue made out of Product and ProductScore. At runtime, MatchedProductsForKeywords then only needs to filter based on the keyword -- without re-sorting for every new keyword, thus saving CPU time.
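This split can be sketched in Go with hypothetical types mirroring the component names on the slide: the expensive sort happens once in ScoreSortedProducts (so the registry can cache it), and the per-keyword work is a cheap, order-preserving filter.

```go
package main

import (
	"sort"
	"strings"
)

// Product pairs a catalogue item with its recommendation score; a stand-in
// for the composite of the Product and ProductScore components.
type Product struct {
	Name  string
	Score float64
}

// ScoreSortedProducts sorts the catalogue by score (descending) once.
// Done inside a cached component, this cost is not paid per request.
func ScoreSortedProducts(products []Product) []Product {
	sorted := append([]Product(nil), products...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Score > sorted[j].Score })
	return sorted
}

// MatchedProductsForKeyword only filters; the output order is inherited
// from the pre-sorted input, so no per-keyword re-sort is needed.
func MatchedProductsForKeyword(sorted []Product, keyword string) []Product {
	var out []Product
	kw := strings.ToLower(keyword)
	for _, p := range sorted {
		if strings.Contains(strings.ToLower(p.Name), kw) {
			out = append(out, p)
		}
	}
	return out
}
```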
  33. Example: Search Endpoint
     • What if we want to make the search results personalized?
  34. Pros (#1)
     • It is a simple yet powerful abstraction. Problems become much simpler when thought of in terms of the building blocks / components required to build the recommendation context.
     • Programming overhead is further reduced when dependencies can be assumed to be in-memory and precomputed most of the time
       ◦ No need to worry about network hops for every component dependency fetch (vs an alternative approach where, say, components are stored in a Redis Cluster)
       ◦ A more approachable style of programming for data scientists → more end-to-end ownership by data scientists
  35. Pros (#2)
     • Granular components:
       ◦ Flexible and composable
       ◦ Easier to understand and debug
       ◦ Work can be divided across engineers & scientists
       ◦ Smaller units of work → better parallelization
     • In short, it boosts engineers' and data scientists' productivity
  36. Pros (#3)
     • Reduced cost (~50%): fewer Dataproc workers, and ElasticSearch instances eliminated
       ◦ Even though the cost of computing instances increased
     • User-specific computations are done only when the user arrives -- inherently less wasteful precomputation
     • Better runtime performance (in-memory → no network latency)
       ◦ Sub-100ms p95 latency in search and home feed
       ◦ The automatic caching infrastructure makes us more resilient to traffic bursts
  37. Pros -- the most important
     • Improved business metrics in record time
       ◦ Much faster iteration in experimentation and deployment of new recommendation algorithms
     • It is fun!
  38. Cons
     • Slow startup time, since we fetch all the required data first
       ◦ It now takes ~4 minutes per startup (to load the prerequisite data components)
       ◦ We implemented a local file cache to speed up local development
     • More sensitive to memory-leak problems
     • It requires a lot of RAM (~4 GB upon startup, increasing as cached data builds up)
       ◦ May pose a problem when developing locally with limited resources
  39. Cons
     • The slow startup time and resource limitations can be mitigated by tuning the prerequisites configuration
       ◦ We can choose a minimal subset of data to be fetched during startup
       ◦ Other data can be fetched gradually as the service runs -- or disabled completely during development
  40. Final Notes
     • This approach gives us a powerful programming and architecture model. Coupled with relevant tools like Sciencebox and DataQuery, it has:
       ◦ greatly improved the productivity of our engineers
       ◦ enabled the self-sufficiency of our data scientists
       ◦ improved the rate of iteration on our algorithms -- and thus our metrics and business performance
  41. We're hiring!
     • Various positions are open at https://careers.sorabel.io
       ◦ Software Engineers
       ◦ Data Scientists
       ◦ Product Designers
     • Projects:
       ◦ Massively diverse kinds of projects in a vertically-integrated company
       ◦ Many different companies in one
       ◦ Flexibility to choose projects
  42. We're hiring!
     • Our engineering team is 5-15x smaller than those of the next biggest e-commerce players in the rankings (we only have ~30 engineers!)
       ◦ Each of you joining will have a massive impact on the direction of the company
     • Be immediately productive with:
       ◦ Best-in-class, SV-level infrastructure
       ◦ An infrastructure & tooling team who cares