
In-Memory, Component-Based Recommender Architecture


https://careers.sorabel.io/

In this talk, we will share the under-the-hood details of our newly re-architected recommender, which we started building last year.

To put it simply: we eschewed the accepted practice for building web-scale services of structuring your service and data into neatly divided stateless and stateful layers. Instead, we loaded most of our recommendation-relevant signals, including the entirety of our product catalogue and the complex multi-dimensional data of our user behaviour and other engineered user features, into a single-layer stateful service's resident memory. We then wrapped those data, alongside our data science models, in an intelligently caching architecture reminiscent of React's virtual DOM, coupled with on-the-fly, real-time custom logic, all running on the Go runtime.

Over the half year it has been live, the new approach has given us a major boost in runtime performance, a substantial increase in productivity for both engineers and data scientists, and, most importantly, improvements in business metrics across the board, leading to some of the best conversion rates in the industry.

Speaker:
Ahmad Zaky is a senior backend engineer at Sorabel (previously named Sale Stock). He has been there for over two years, working on the notification service, the order gRPC microservice, and the customer service system before joining the recommender team. Before Sale Stock, he had a summer internship at Palantir in 2015, and he has actively participated in competitive programming contests. He currently resides in Bandung, where he organizes Facebook Developer Circle Bandung, a monthly meetup for developers in the city.

Sale Stock Engineering

April 16, 2019

Transcript

  1. Who are We?
     • E-commerce founded in late 2014
       ◦ Internal Engineering founded in early 2015
       ◦ Launched our in-house website in mid-2015, our app in late 2015
     • Concentrated on women's fashion
       ◦ Around the Rp 100K range, as opposed to the > Rp 200K range
       ◦ One of the top fashion e-commerce players in Indonesia
     • Recently rebranded to Sorabel, in January 2019
  2. Talk Overview
     • Previous Recommender Development Workflow
       ◦ Exploration, Computation, Serving
     • New Recommender Development Workflow
       ◦ Exploration, Computation, Serving
       ◦ The Impact
     • In-Memory Recommender Architecture
       ◦ In-Depth Architecture
       ◦ Pros & Cons
  3. What is a Recommender?
     • Software that tries to optimize the future actions of its users
       ◦ Personalized: based on a specific user's past behavior
       ◦ Non-personalized: based on an aggregate, non-user contextual model
  4. Recommenders in Sorabel
     • We have recommenders for many things:
       ◦ For our buyers: Restock, Budget, and Trend Recommenders
       ◦ For our warehouse: Order-Item Picking Route Optimizer, Item Placement Optimizer, Logistics Partner Order Allocator
       ◦ For our customers: Product Recommender
     • In this talk, we focus on the Product Recommender
  5. Uses of Product Recommender
     • Home Feed
     • Catalogue Feed
     • Similar Products
     • Search, and others!
  6. Development Workflow
     • Exploration
       ◦ Data scientists sift through the data to derive deeper insights by looking beyond the basic metrics
     • Computation
       ◦ Which classes of algorithm are computationally feasible, and how each model should be built, fit, validated, and then recomputed regularly
     • Serving
       ◦ How models are ultimately stored post-computation and served in production to our millions of users
  7. Previous Recommender Architecture
     • BigQuery: data warehouse, fed from multiple data sources (MySQL, Cassandra, Kafka)
     • Apache Spark: engine for large-scale data processing
     • Dataproc: runs Apache Spark jobs inside GCP
     • ElasticSearch: an open-source, distributed search engine
  8. Exploration
     • Data scientists explore and analyze the data, trying to build the right model / heuristics
       ◦ Most of their exploration is done in Python
     • This is usually done locally → limited hardware resources
       ◦ Inherently lower limit on the size of data during experimentation → data scientists are limited to "toy" datasets at this stage
       ◦ Harder for data scientists to collaborate
  9. Computation
     • Data engineers translate data scientists' work into appropriate Spark jobs
       ◦ Computation was mostly done inside Google Dataproc
     • Data engineers make the changes necessary to get a model ready for production
       ◦ For example, dummy dataset vs production-scale dataset
       ◦ Long back-and-forth feedback loop between data scientists & engineers
     • Recommendations were largely precomputed at a less-than-optimal scope: at the feed level, daily
       ◦ Computation and disk writes take a long time (+ storage costs!)
       ◦ Low usage rate → not everyone visits their precomputed feed on a daily basis
  10. Serving
     • The production read path that serves the actual models / recommendations to our users
     • A dual-layer architecture:
       ◦ Highly stateful data storage layer: ElasticSearch
       ◦ Stateless (horizontally scaled) REST API servers that mostly read from the stateful layer with minimal post-processing
     • Implemented and maintained by backend engineers
  11. Recap: Previous Workflow Problems
     • Exploration is usually done locally
       ◦ Local resources are limited
       ◦ Data scientists usually play with tiny subsets of the data to get the work done locally
       ◦ Harder for data scientists to collaborate
     • Going back and forth between data scientists and data engineers took longer than it should
     • Long indexing time (the daily job took ~4-8 hrs)
     • Non-trivial cost and complexity in the serving infra (Dataproc + ElasticSearch)
  12. Exploration
     • Data scientists can utilize Sciencebox
       ◦ JupyterLab running in dedicated containers on top of Kubernetes, one per data scientist
       ◦ Instant access to powerful resources
         ▪ Large core count & RAM
         ▪ GPU if needed
     • No longer need to play around with tiny subsets of the data
     • Easier for data scientists to collaborate, share, and evaluate each other's work
  13. Computation
     • Introducing DataQuery
       ◦ A platform where anyone can easily build their own derived tables
       ◦ A derived table is a composite-data table, i.e. a table whose data is composed from multiple tables
         ▪ From raw tables / other derived tables
         ▪ Mostly defined by a SQL query
         ▪ Editable data refresh frequency
       ◦ Built on top of Google BigQuery
  14. Computation
     • Frequently, a simpler model can be realized using DataQuery alone
       ◦ No need for any Spark jobs in most cases
       ◦ Data scientists can do this independently
  15. Serving
     • The serving infrastructure is now a single-layer, Go-based in-memory service (IM for short)
       ◦ We load the "models" from DataQuery (or any other data) into the service's resident memory as "Components", either at startup or as needed
       ◦ Components are built on top of each other to form a more complete and capable "component tree" that then serves the actual recommendations as a group
  16. Serving
     • The serving infrastructure is now a single-layer, Go-based in-memory service (IM for short)
       ◦ Additional computations (including, but not limited to, inter-model component stitching, data re-sorting, realtime-data-sensitive logic, etc.) can be done within the Components on-request, on the fly
       ◦ A centralized "component registry" handles caching / re-computation of different parts of the component tree for best performance with little manual work, not dissimilar to the Virtual DOM concept React uses for user interfaces
       ◦ A much larger chunk of the user-specific recommendation computation can now be done on the fly, only when the user comes to the site
  17. Serving
     • Backend engineers implement the components
       ◦ However, thanks to its simplicity, data scientists often implement components themselves
       ◦ A data scientist's workflow for a new feature is now very simple:
         i. Play around with algorithms & data in Sciencebox
         ii. Write the production version of the algorithm as a DataQuery table
         iii. "Wrap" the DataQuery model in an IM `Component`
       ◦ We'll talk about how this works in more depth later on
  18. Workflow Comparison (diagram)
     • Previous workflow: data scientists handled Exploration; data engineers / backend engineers handled Computation and Serving
     • New workflow: data scientists handle Exploration, Computation, and Serving themselves
  19. The IM Architecture
     • Data Component
     • Registry
     • Cache
     • Endpoint
     • All within a single service
  20. Data Component
     • Responsible for processing data on the fly
     • May depend on external data sources (BigQuery, etc.), other Data Components, or both
     • The resulting data can be used directly by other components
  21. Data Component
     • CachePolicy defines the caching configuration -- more on this later
     • GetKey serializes the args into a string -- used as the cache key
     • GetData processes data with the given args
       ◦ Fetcher is a utility interface for interacting with other components and data sources
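To make these pieces concrete, here is a minimal Go sketch of what such a Component interface could look like. All names and signatures are assumptions for illustration, following the terms on the slide, not Sorabel's actual API.

```go
package main

import "context"

// CachePolicy mirrors the fields described on the Cache Policy slide.
type CachePolicy struct {
	ExpirationSeconds int  // data TTL
	MaxEntrySize      int  // maximum number of cached data entries
	NoCache           bool // GetData will be called at all times
	MustPresent       bool // determines the cache type (persistent vs volatile)
}

// Fetcher lets a component pull data from other components (through the
// registry) or from external data sources.
type Fetcher interface {
	Fetch(ctx context.Context, component string, args map[string]string) (interface{}, error)
}

// Component is the unit of the component tree.
type Component interface {
	CachePolicy() CachePolicy
	GetKey(args map[string]string) string // serializes args into a cache key
	GetData(ctx context.Context, f Fetcher, args map[string]string) (interface{}, error)
}

// ProductScoreComponent is a toy implementation of the interface.
type ProductScoreComponent struct{}

func (ProductScoreComponent) CachePolicy() CachePolicy {
	return CachePolicy{ExpirationSeconds: 3600, MaxEntrySize: 10000}
}

func (ProductScoreComponent) GetKey(args map[string]string) string {
	return "product_score:" + args["productID"]
}

func (ProductScoreComponent) GetData(ctx context.Context, f Fetcher, args map[string]string) (interface{}, error) {
	// Stub score; a real component would fetch its dependencies via f.
	return 0.5, nil
}
```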
  22. Registry
     • The Registry is the central manager responsible for ensuring Components are computed, cached, and pre-computed at the right times
     • It handles data caching according to each component's CachePolicy
       ◦ Uses an LRU (least recently used) eviction policy
     • All fetcher.Fetch (component sub-dependency) calls go through the registry
       ◦ It checks the cache with the key returned by the Component's GetKey method
       ◦ If an entry exists, it returns the cached data
       ◦ Otherwise, it calls the component's GetData method and updates the cache
  23. Cache Policy
     • Expiration: data TTL
     • MaxEntrySize: maximum number of cached data entries
     • NoCache: GetData will be called at all times
     • MustPresent: determines the cache type
  24. Cache Types
     • Persistent Cache: data must be present at all times
       ◦ Critical data components are of this type, i.e. the service cannot run without their data. Example: ProductMapComponent
       ◦ They are initially fetched during startup
       ◦ If the cached data is expired, the registry returns the expired data while fetching the new data in the background
     • Volatile Cache: it is okay for the data to be expired
       ◦ Most of these are components that depend on other components
  25. Cache Stampede Problem
     • A cache stampede is a type of cascading failure that can occur when massively parallel computing systems with caching mechanisms come under very high load
     • Example: a sudden jump in traffic to a page (e.g. from a TVC) ends up producing mostly identical data-component request trees -- wasting CPU cycles and making useless downstream requests to service dependencies
  26. Cache Stampede Problem
     • Solution: make sure only one of the identical GetData requests is executed at a time
       ◦ Try to acquire a lock whenever there's a cache miss
       ◦ If acquiring the lock fails, wait until the lock is released; by then, the data should already be cached
  27. Member Affinity Routing
     • We load member-specific data only when that member arrives
     • Meanwhile, we scale our service horizontally to multiple servers
       ◦ Since data is cached locally, the state of each server can differ
     • Multiple requests by the same member can end up being served by different servers
       ◦ The same computation is done redundantly across multiple servers
       ◦ Reducing the cache hit rate
  28. Member Affinity Routing
     • We employ custom header-based routing through Envoy, a microservice proxy
       ◦ Consistent routing via the X-SS-Member-ID header
     • The same member will always be served by the same server
     • Example:
       ◦ A member opens the home page. One of the servers then computes the home feed specific to that member.
       ◦ As the member scrolls down, the next requests come to the same server. Since the result is already cached, it returns almost instantly.
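The consistency property this routing provides can be illustrated in a few lines of Go (a plain modulo hash here; the real routing is done by Envoy in the proxy layer, not in application code, and uses more robust load-balancer-aware hashing):

```go
package main

import "hash/fnv"

// pickServer deterministically maps a member ID (e.g. the X-SS-Member-ID
// header value) to one of the upstream servers, so repeated requests by the
// same member always land on the same server.
func pickServer(memberID string, servers []string) string {
	h := fnv.New32a()
	h.Write([]byte(memberID))
	return servers[h.Sum32()%uint32(len(servers))]
}
```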
  29. Putting It All Together
     • When we implement a feature, we think in terms of the building blocks
       ◦ What data components do we need? What is the data source? How should the data be cached?
       ◦ What are the dependencies between the components?
         ▪ In most cases, a feature can depend on our existing components
  30. Example: Search Endpoint
     • Suppose we want to create a search endpoint:
       ◦ Users can search by keyword
       ◦ Users can also sort the results, by price, popularity, etc.
  31. Example: Search Endpoint
     • The MatchedProductsForKeywords component filters the products from the Product component, and sorts them based on the score obtained from the ProductScore component
     • But sorting is a CPU-heavy operation. How can we improve?
  32. Example: Search Endpoint
     • We can introduce an intermediary component: ScoreSortedProducts
     • The ScoreSortedProducts component pre-sorts the composite catalogue made out of Product and ProductScore. At runtime, MatchedProductsForKeywords then only needs to filter based on the keyword -- without re-sorting for every new keyword, thus saving CPU time.
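This split can be sketched in Go with hypothetical types mirroring the component names on the slide: the expensive sort happens once in ScoreSortedProducts (so the registry can cache it), and the per-keyword work is a cheap, order-preserving filter.

```go
package main

import (
	"sort"
	"strings"
)

// Product pairs a catalogue item with its recommendation score; a stand-in
// for the composite of the Product and ProductScore components.
type Product struct {
	Name  string
	Score float64
}

// ScoreSortedProducts sorts the catalogue by score (descending) once.
// Done inside a cached component, this cost is not paid per request.
func ScoreSortedProducts(products []Product) []Product {
	sorted := append([]Product(nil), products...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Score > sorted[j].Score })
	return sorted
}

// MatchedProductsForKeyword only filters; the output order is inherited
// from the pre-sorted input, so no per-keyword re-sort is needed.
func MatchedProductsForKeyword(sorted []Product, keyword string) []Product {
	var out []Product
	kw := strings.ToLower(keyword)
	for _, p := range sorted {
		if strings.Contains(strings.ToLower(p.Name), kw) {
			out = append(out, p)
		}
	}
	return out
}
```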
  33. Example: Search Endpoint
     • What if we want to make the search results personalized?
  34. Pros (#1)
     • It is a simple yet powerful abstraction. Problems become much simpler when thought of in terms of the building blocks / components required to build the recommendation context.
     • Programming overhead is further reduced when dependencies can be assumed to be in-memory and precomputed most of the time
       ◦ No need to worry about network hops for every component dependency fetch (vs an alternative approach where, say, components are stored in a Redis Cluster)
       ◦ A more approachable style of programming for data scientists → more end-to-end ownership by data scientists
  35. Pros (#2)
     • Granular components:
       ◦ Flexible and composable
       ◦ Easier to understand and debug
       ◦ Work can be divided across engineers & scientists
       ◦ Smaller units of work → better parallelization
     • In short, it boosts engineers' and data scientists' productivity
  36. Pros (#3)
     • Reduced cost (~50%): fewer Dataproc workers, and ElasticSearch instances eliminated
       ◦ Even though the cost of computing instances increased
     • User-specific computations are done only when the user arrives -- inherently less wasteful precomputation
     • Better runtime performance (in-memory → no network latency)
       ◦ Sub-100ms p95 latency in search and home feed
       ◦ The automatic caching infrastructure makes us more resilient to traffic bursts
  37. Pros -- the most important
     • Improved business metrics in record time
       ◦ Much faster iteration in experimentation and deployment of new recommendation algorithms
     • It is fun!
  38. Cons
     • Slow startup time, since we fetch all the required data first
       ◦ It now takes ~4 minutes per startup (to load the prerequisite data components)
       ◦ We implemented a local file cache to speed up local development
     • More sensitive to memory-leak problems
     • It requires a lot of RAM (~4 GB upon startup, increasing as cached data builds up)
       ◦ May pose a problem when developing locally with limited resources
  39. Cons
     • The slow startup time and resource limitations can be mitigated by tuning the prerequisites configuration
       ◦ We can choose a minimal subset of data to be fetched during startup
       ◦ Other data can be fetched gradually as the service runs -- or disabled completely during development
  40. Final Notes
     • This approach gives us a powerful programming and architecture model. Coupled with relevant tools like Sciencebox and DataQuery, it has:
       ◦ greatly improved the productivity of our engineers
       ◦ enabled the self-sufficiency of our data scientists
       ◦ improved the rate of iteration on our algorithms -- and thus our metrics and business performance
  41. We're hiring!
     • Various positions are open at https://careers.sorabel.io
       ◦ Software Engineers
       ◦ Data Scientists
       ◦ Product Designers
     • Projects:
       ◦ Massively diverse kinds of projects in a vertically-integrated company
       ◦ Many different companies in one
       ◦ Flexibility to choose projects
  42. We're hiring!
     • Our engineering team is 5-15x smaller than those of the next biggest e-commerce players in the rankings (we only have ~30 engineers!)
       ◦ Each of you joining will have a massive impact on the direction of the company
     • Be immediately productive with:
       ◦ Best-in-class, SV-level infrastructure
       ◦ An infrastructure & tooling team who cares