Building Prefix Search as a Service

BUILDING PREFIX SEARCH AS A SERVICE - HOW TO BUILD A GOOGLE STYLE AUTOCOMPLETE ENGINE FOR POWERING SUGGESTIONS

WHAT IS PREFIXY?
• A hosted prefix search engine that powers autocomplete suggestions
• Dynamically updates and ranks autocomplete suggestions based on user input
• Easy-to-use service that app developers can implement on any search field in their app

OVERVIEW
• What we prioritized during the research and development stages of the project
• How we thought about tradeoffs as we designed and built a system from scratch
• How we chose the right data structures, algorithms, and data stores for the system
• How we built a system with the flexibility to scale as we get more users

IMPORTANT TERMS
Terms labeled on the slide: prefix, completions, score, suggestions, selection
prefix: ca
completions (with score): cade maggio 99, caleb runte 50, camilia wintheiser 49, camryn hauck 40, cara block 38, cameron brown 10, catie leeroy 10, ...
suggestions (the top five completions shown for “ca”): cade maggio 99, caleb runte 50, camilia wintheiser 49, camryn hauck 40, cara block 38

DESIGN GOALS
Requirements
• Must have lightning fast reads
• Suggestions should be dynamically ranked and relevant to the app user
Implications & Approach
• We want to prioritize speed of reads
• We need a ranking algorithm

DATA STRUCTURES & ALGORITHMS

THREE MEASUREMENTS TO CONSIDER FOR BIG O
• N - the number of keys/nodes (e.g. prefixes) in our dataset
• K - the number of completions for a given prefix
• L - the length of the string we are looking up

SOME COMPLETIONS WITH SCORES
“car” → 30
“cat” → 90
“cod” → 10
“cart” → 10
“coin” → 1
“cold” → 5
How should we store these?

TRIE IS A NATURAL FIT FOR PREFIX SEARCH
• Allows for O(L) lookup of a prefix.
• All descendants of a given node share that node as a common prefix, so the shared characters are stored once and consume less space.

root
  c
    a
      r: 30
        t: 10
      t: 90
    o
      d: 10
      i
        n: 1
      l
        d: 5

SEARCHING FOR COMPLETIONS IN A TRIE
Search for all completions that start with “ca”:
1. Find the prefix: O(L)
2. Find all completions (and put them into an array): O(N)
3. Sort the completions: O(K log K)
Total: O(L) + O(N) + O(K log K)
Completions collected before sorting: [car:30, cart:10, cat:90]
(Diagram: the trie with nodes root, c, ca, co, car: 30, cat: 90, cart: 10, cod: 10, coi, coin: 1, col, cold: 5, following the path root → c → ca.)
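The three steps above can be sketched in a few lines of JavaScript. This is only an illustration, not Prefixy's code: the names TrieNode, insert, and search are made up for the example, and scores are stored on the node that ends each completion.

```javascript
// Illustrative trie; TrieNode, insert and search are hypothetical names.
class TrieNode {
  constructor() {
    this.children = new Map(); // next character -> child node
    this.score = null;         // non-null when a completion ends at this node
  }
}

function insert(root, completion, score) {
  let node = root;
  for (const ch of completion) {
    if (!node.children.has(ch)) node.children.set(ch, new TrieNode());
    node = node.children.get(ch);
  }
  node.score = score;
}

function search(root, prefix) {
  // 1. Find the prefix node: O(L)
  let node = root;
  for (const ch of prefix) {
    node = node.children.get(ch);
    if (!node) return [];
  }
  // 2. Collect every completion at or below that node: O(N) in the worst case
  const found = [];
  const stack = [[node, prefix]];
  while (stack.length > 0) {
    const [current, text] = stack.pop();
    if (current.score !== null) found.push({ completion: text, score: current.score });
    for (const [ch, child] of current.children) stack.push([child, text + ch]);
  }
  // 3. Sort by score, highest first: O(K log K)
  return found.sort((a, b) => b.score - a.score);
}

const root = new TrieNode();
const data = [['car', 30], ['cat', 90], ['cod', 10], ['cart', 10], ['coin', 1], ['cold', 5]];
for (const [completion, score] of data) insert(root, completion, score);

const caResults = search(root, 'ca'); // completions that start with "ca"
```

Step 2 is the expensive part: every read walks the whole subtree under the prefix, which is what the next slide sets out to avoid.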

STORING COMPLETIONS
We can store, in each node, all completions that begin with that node's prefix.
1. Find the prefix: O(L)
2. Find all completions: O(N) (no longer needed, since the node already holds them)
3. Sort the completions: O(K log K)
Total: O(L) + O(K log K)
But this comes at a cost! More space consumed and more writes.
Example buckets*:
c → [cold, coin, cod, cat, cart, car]
car → [car, cart]
cart → [cart]
* Scores omitted for presentation purposes

FURTHER OPTIMIZATIONS
Diminishing returns of large L and K
Limit max length of prefixes we store (hold L constant)
• “Tyrannosaurus Rex lived during the late Cretaceous period”
Limit completion bucket size in node (hold K constant)
• Only need enough to support suggestions and ranking
O(L) + O(K log K) → O(1) + O(1) → O(1)

PREFIX HASH TREE
• One-step access, as we no longer need to traverse.
• Easy to implement with a key-value NoSQL data store (like Redis!)

key     value
c       [car:30, cat…]
ca      [car:30, cat…]
cat     [cat:90]
car     [car:30, cart:10]
cart    [cart:10]
co      [cod:10, coin…]
coi     [coin:1]
coin    [coin:1]
col     [cold:5]
cold    [cold:5]
cod     [cod:10]
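A rough in-memory sketch of the prefix hash tree, with a plain Map standing in for the key-value store. The caps MAX_PREFIX_LENGTH and BUCKET_LIMIT and the function names are assumptions chosen for the example, and the sketch does not deduplicate a completion that is inserted twice.

```javascript
// Assumed caps for illustration only; the slides do not give the real values.
const MAX_PREFIX_LENGTH = 15; // holds L constant
const BUCKET_LIMIT = 5;       // holds K constant

// prefix -> array of { completion, score }, kept sorted by score (highest first)
const store = new Map();

// "car" -> ["c", "ca", "car"], never longer than MAX_PREFIX_LENGTH
function extractPrefixes(completion) {
  const prefixes = [];
  const max = Math.min(completion.length, MAX_PREFIX_LENGTH);
  for (let i = 1; i <= max; i++) prefixes.push(completion.slice(0, i));
  return prefixes;
}

// Write path: the completion is copied into the bucket of every one of its prefixes.
function insertCompletion(completion, score) {
  for (const prefix of extractPrefixes(completion)) {
    const bucket = store.get(prefix) || [];
    bucket.push({ completion, score });
    bucket.sort((a, b) => b.score - a.score);
    store.set(prefix, bucket.slice(0, BUCKET_LIMIT)); // drop anything past the cap
  }
}

// Read path: a single hash lookup, no traversal.
function searchPrefix(prefix) {
  return store.get(prefix) || [];
}

const data = [['car', 30], ['cat', 90], ['cod', 10], ['cart', 10], ['coin', 1], ['cold', 5]];
for (const [completion, score] of data) insertCompletion(completion, score);
```

Reads become one lookup, while each write touches up to MAX_PREFIX_LENGTH buckets. That is the space and write cost the deck accepts in exchange for fast reads.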

BIG O SUMMARY
Structure | Search | Insert | Update / Delete | Space consideration
Trie | O(L) + O(N) + O(K log K) | O(L) | O(L) | completions share prefixes
Trie with completions | O(L) + O(K log K) | O(L) | O(LK) | bucket of completions at each prefix node
  hold L constant | O(K log K) | O(1) | O(K) | reduces the number of prefixes we store
  hold K constant | O(1) | O(1) | O(1) | caps the size of each completions bucket
Prefix hash tree w/ completions, constant L & K | O(1), no traversal! | O(1) | O(1) | slightly more space to accommodate hash table allocation, plus prefixes have to be duplicated (“c”, “ca”, “car”)

REDIS IMPLEMENTATION

WHY REDIS?
• Entirely in-memory, which meets our performance requirement
• Native in-memory data structures for managing completions

key     value
c       [car:30, cat…]
ca      [car:30, cat…]
cat     [cat:90]
car     [car:30, cart:10]
cart    [cart:10]
co      [cod:10, coin…]
coi     [coin:1]
coin    [coin:1]
col     [cold:5]
cold    [cold:5]
cod     [cod:10]

Which Redis data structure to use for completions?

REDIS LIST
• Lists in Redis are a type of linked list
• Access head and tail nodes in O(1)
• Access other nodes in O(K)
Key: h
Value: hi:50 → hello:44 → help:30 → how are you:30 → happy:29 → how many:29 → here:28 → …

REDIS LIST - SEARCH
1. LRANGE to get first 5 nodes: O(1)
Result: [‘hi:50’, ‘hello:44’, ‘help:30’, ‘how are you:30’, ‘happy:29’]

REDIS LIST - INCREMENT
1. LRANGE to get entire list: O(K)
   [‘hi:50’, ‘hello:44’, ‘help:30’, ‘how are you:30’, ‘happy:29’, ‘how many:29’, ‘here:28’, ...]
2. Binary search of returned array to find and increment completion: O(log K)
   [‘hi:50’, ‘hello:44’, ‘help:30’, ‘how are you:30’, ‘happy:30’, ‘how many:29’, ‘here:28’, ...]
3. Binary search for insertion in new location: O(log K)
   [‘hi:50’, ‘hello:44’, ‘happy:30’, ‘help:30’, ‘how are you:30’, ‘how many:29’, ‘here:28’, ...]
4. LREM to remove completion from its current position: O(K)
5. LINSERT to insert completion in its new position: O(K)
   hi:50 → hello:44 → happy:30 → help:30 → how are you:30 → how many:29 → here:28 → …
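To make the cost of this write path concrete, here is an in-memory imitation of the increment, with a plain array of ‘completion:score’ strings standing in for the Redis list. The helper names are invented, no Redis calls are made, and the order of the steps is slightly simplified: the entry is located with a linear scan and removed before the binary search for its new position.

```javascript
// Split "how are you:30" into { completion: "how are you", score: 30 }.
function parse(entry) {
  const i = entry.lastIndexOf(':');
  return { completion: entry.slice(0, i), score: Number(entry.slice(i + 1)) };
}

// `list` is sorted by score, highest first, like the list on the slides.
function incrementInList(list, completion) {
  // Step 1 stand-in: the whole list has been fetched (LRANGE 0 -1), O(K).
  const entries = list.map(parse);

  // Step 2 stand-in: find the completion and bump its score (linear scan here).
  const oldIndex = entries.findIndex((e) => e.completion === completion);
  if (oldIndex === -1) return list.slice();
  const updated = { completion, score: entries[oldIndex].score + 1 };

  // Step 4 stand-in: remove it from its current position (LREM), O(K).
  entries.splice(oldIndex, 1);

  // Step 3 stand-in: binary search for the first slot whose score is <= the new score.
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (entries[mid].score > updated.score) lo = mid + 1;
    else hi = mid;
  }

  // Step 5 stand-in: insert it at the new position (LINSERT), O(K).
  entries.splice(lo, 0, updated);
  return entries.map((e) => `${e.completion}:${e.score}`);
}

const before = ['hi:50', 'hello:44', 'help:30', 'how are you:30', 'happy:29', 'how many:29', 'here:28'];
const after = incrementInList(before, 'happy');
```

Against a real Redis list, the read and the write are separate round trips, so another client could change the list in between.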

REDIS LIST
Reads: O(1)
Writes: O(K)
Round trips: 2
Pros
• O(1) search
Cons
• 2 round trips per update
• May have concurrency issues
• Large payload
• No uniqueness guarantee
• Have to sort in JavaScript instead of on DB level

REDIS SORTED SET
• Sorted sets in Redis are implemented with skip lists
• Handles uniqueness and order
• Most operations are O(log K)
Key: h
Value (completion, score): hi 50, hello 44, help 30, how are you 30, happy 29, how many 29, here 28, ...

REDIS SORTED SET - SEARCH
1. ZRANGE to get first 5 elements: O(log K)
Result: [‘hi’, ‘hello’, ‘help’, ‘how are you’, ‘happy’]

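REDIS SORTED SET - INCREMENT
1. ZINCRBY to increment score: O(log K)
Redis handles the order and uniqueness constraints.
Before (completion, score): hi 50, hello 44, help 30, how are you 30, happy 29, how many 29, here 28, ...
After incrementing “happy”: hi 50, hello 44, happy 30, help 30, how are you 30, how many 29, here 28, ...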

REDIS SORTED SET
Reads: O(log K)
Writes: O(log K)
Round trips: 1
Pros
• Fewer round trips
• Less chance of concurrency issues
• Smaller payloads
• Uniqueness guarantee
• Faster than doing it in JS
• Non-blocking
Cons
• Search is technically O(log K)

MAINTAINING BUCKET LIMIT (K)
1. User submits a search for “javascript”
2. Since our bucket is full, we remove the lowest ranked completion
3. Insert “javascript” with “jshint”’s score plus 1
Before (completion, score): ..., java 15, jquery 10, jshint 10
After (completion, score): ..., java 15, javascript 11, jquery 10
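The eviction rule on this slide can be imitated in memory with a Map from completion to score. BUCKET_LIMIT is set to 3 only so that the example matches the slide, a starting score of 1 for a non-full bucket is an assumption, and ties are broken alphabetically so that the entry ranked last is the one removed.

```javascript
const BUCKET_LIMIT = 3; // assumed value, small enough to mirror the slide

// Highest score first; ties ordered alphabetically.
function ranked(bucket) {
  return [...bucket.entries()].sort(
    (a, b) => b[1] - a[1] || (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0)
  );
}

// Record that a user selected `completion` for the prefix this bucket belongs to.
function recordSelection(bucket, completion) {
  if (bucket.has(completion)) {
    // Already in the bucket: plain increment.
    bucket.set(completion, bucket.get(completion) + 1);
  } else if (bucket.size >= BUCKET_LIMIT) {
    // Bucket full: evict the lowest ranked entry, then insert the newcomer
    // with the evicted entry's score plus 1.
    const order = ranked(bucket);
    const [lowest, lowestScore] = order[order.length - 1];
    bucket.delete(lowest);
    bucket.set(completion, lowestScore + 1);
  } else {
    // Room left: start at 1 (assumption, the slides do not say).
    bucket.set(completion, 1);
  }
  return ranked(bucket);
}

const bucket = new Map([['java', 15], ['jquery', 10], ['jshint', 10]]);
const result = recordSelection(bucket, 'javascript');
```

Giving the newcomer the evicted score plus 1 lets a fresh completion outrank the entry it replaced instead of being evicted again on the next write.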

WHAT IF WE RUN OUT OF MEMORY?
• Set an LRU policy in Redis
• Persist to MongoDB: able to store more than what we can fit in memory
• Reads still fast: generally 1 trip per search, more trips for updates
1. Always check Redis first
2. If we have a cache miss, check Mongo
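The read path described here is a cache lookup with a fallback to the persistent store. The sketch below uses two Maps as stand-ins for Redis and MongoDB and a synchronous function to keep it short. The names are hypothetical and real client calls would be asynchronous.

```javascript
// Stand-ins only: in the real system these would be Redis and MongoDB.
const redisStandIn = new Map();
const mongoStandIn = new Map([['ca', ['cat', 'car', 'cart']]]);

function searchWithFallback(prefix) {
  // 1. Always check the in-memory store first.
  if (redisStandIn.has(prefix)) {
    return { source: 'redis', completions: redisStandIn.get(prefix) };
  }

  // 2. On a cache miss, check the persistent store.
  const persisted = mongoStandIn.get(prefix);
  if (persisted === undefined) {
    return { source: 'none', completions: [] };
  }

  // Load the bucket back into the cache so the next read takes one trip.
  redisStandIn.set(prefix, persisted);
  return { source: 'mongo', completions: persisted };
}

const firstRead = searchWithFallback('ca');  // miss in the cache, served from the persistent store
const secondRead = searchWithFallback('ca'); // now served from the cache
```

With an LRU policy, rarely used prefixes are the ones evicted, so most searches should still be answered in a single trip.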

SYSTEM ARCHITECTURE SO FAR
(Diagram: client.js and the CLI exchange requests and responses with the Prefixy app server, which is backed by Redis and MongoDB.)

BUILDING THE SERVICE

TOKEN GENERATION + AUTHENTICATION WORKFLOW
1. Jane, an app developer, visits the token generator to get her JWT + custom script
2. The token generator creates a unique tenant ID, and then encrypts it into a JWT
3. Jane includes her custom script in the frontend code of her web application
4. Now any request to Prefixy sent from Jane’s site will include her JWT
5. The server decrypts the JWT to get the tenant ID
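As a rough illustration of steps 2 and 5, the sketch below builds and checks a token by hand with Node's built-in crypto module. Note the swap: the slides say the tenant ID is encrypted into the JWT, while this example uses a signed (HS256) token, which is the common JWT form and leaves the payload readable. The secret, the `tenant` claim name, and the function names are all assumptions.

```javascript
const nodeCrypto = require('crypto');

const SECRET = 'replace-with-a-real-secret'; // hypothetical signing secret

function base64url(input) {
  return Buffer.from(input).toString('base64')
    .replace(/=+$/, '').replace(/\+/g, '-').replace(/\//g, '_');
}

function sign(data) {
  return base64url(nodeCrypto.createHmac('sha256', SECRET).update(data).digest());
}

// Step 2: wrap a tenant ID in a signed token (header.payload.signature).
function generateToken(tenantId) {
  const header = base64url(JSON.stringify({ alg: 'HS256', typ: 'JWT' }));
  const payload = base64url(JSON.stringify({ tenant: tenantId }));
  return `${header}.${payload}.${sign(`${header}.${payload}`)}`;
}

// Step 5: check the signature and return the tenant ID, or null if the token is invalid.
function verifyToken(token) {
  const [header, payload, signature] = token.split('.');
  if (!header || !payload || !signature) return null;
  const actual = Buffer.from(signature);
  const expected = Buffer.from(sign(`${header}.${payload}`));
  if (actual.length !== expected.length || !nodeCrypto.timingSafeEqual(actual, expected)) {
    return null;
  }
  const json = Buffer.from(payload.replace(/-/g, '+').replace(/_/g, '/'), 'base64').toString('utf8');
  return JSON.parse(json).tenant;
}

const tenantId = nodeCrypto.randomBytes(8).toString('hex'); // unique-looking tenant ID
const token = generateToken(tenantId);
```

In practice a maintained JWT library would normally replace the hand-rolled encoding. The point here is only that the server can recover the tenant ID from the token without a database lookup.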

MULTI-TENANCY WORKFLOW
1. Searches from Jane’s client.js send the request to Prefixy with Jane’s JWT
2. Prefixy decrypts the JWT to get Jane’s tenant ID
3. Prefixy uses Jane’s tenant ID to get the data that is specific to her site
(Diagram: Jane’s client.js → Prefixy → Redis and MongoDB.)

MULTI-TENANCY ON THE BACK-END
Redis
• In Redis, we prepend every key of Jane’s data with her tenant ID

key               value
<tenantId>:c      ...
<tenantId>:ca     ...
<tenantId>:cam    ...

Mongo
• We allocate a Mongo collection to Jane, and the name of this collection is her tenant ID
• Collection <tenantId> holds documents of the form { prefix, completions }, e.g. { prefix: ‘c’, completions: [...] }
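The two naming rules above amount to a pair of small helpers. The function names and the tenant ID below are invented for the example and no database calls are made.

```javascript
// Redis: every key is the tenant ID, a colon, then the prefix.
function redisKeyFor(tenantId, prefix) {
  return `${tenantId}:${prefix}`;
}

// Mongo: the collection is named after the tenant, one document per prefix.
function mongoLocationFor(tenantId, prefix) {
  return { collection: tenantId, filter: { prefix } };
}

const janeTenantId = 'tenant-123'; // hypothetical tenant ID
const redisKey = redisKeyFor(janeTenantId, 'ca');
const mongoLocation = mongoLocationFor(janeTenantId, 'ca');
```

Because the tenant ID is part of every key and collection name, two tenants can store the same prefix without seeing each other's completions.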

SYSTEM ARCHITECTURE
(Diagram: client.js, the CLI, and the Token Generator exchange requests and responses with Prefixy, which is backed by Redis and MongoDB.)

KEY FUNCTIONS

CLIENT.JS
3. … and show suggestions to user

    function draw() {
      ...
      this.suggestions.forEach((suggestion, index) => {
        const li = document.createElement('li');
        const span1 = document.createElement('span');
        const span2 = document.createElement('span');

        // span1 holds the part the user typed, span2 the rest of the suggestion.
        span1.classList.add('suggestion', 'typed');
        span2.classList.add('suggestion');
        span1.textContent = suggestion.match(typed);
        span2.textContent = suggestion.slice(span1.textContent.length);

        li.appendChild(span1);
        li.appendChild(span2);
        this.listUI.appendChild(li);
      });
    }

SEARCH

    async function search(prefixQuery, tenant, opts = {}) {
      const defaultOpts = { limit: this.suggestionCount, withScores: false };
      opts = { ...defaultOpts, ...opts };
      // ZRANGE's stop index is inclusive, so ranks 0..limit-1 give `limit` items.
      const limit = opts.limit - 1;
      const prefix = this.normalizePrefix(prefixQuery);
      const prefixWithTenant = this.addTenant(prefix, tenant);

      let args = [prefixWithTenant, 0, limit];
      if (opts.withScores) args = args.concat('WITHSCORES');

      let result = await this.redisClient.zrangeAsync(...args);

      // Cache miss: load the prefix from MongoDB into Redis, then read again.
      if (result.length === 0) {
        await this.mongoLoad(prefix, tenant);
        result = await this.redisClient.zrangeAsync(...args);
      }

      return result;
    }

INCREMENT

    async function increment(completion, tenant) {
      const prefixes = this.extractPrefixes(completion);
      const commands = [];

      for (let i = 0; i < prefixes.length; i++) {
        let prefixWithTenant = this.addTenant(prefixes[i], tenant);
        let count = await this.getCompletionsCount(prefixes[i], tenant, prefixWithTenant);
        const includesCompletion = await this.redisClient.zscoreAsync(prefixWithTenant, completion);

        if (count >= this.limit && !includesCompletion) {
          // Bucket is full and the completion is new: replace the last ranked entry.
          const lastPosition = this.limit - 1;
          // ZRANGE needs both a start and a stop index.
          const lastElement = await this.redisClient.zrangeAsync(prefixWithTenant, lastPosition, lastPosition, 'WITHSCORES');
          // Scores appear to be stored as negatives, so subtracting 1 ranks higher.
          const newScore = Number(lastElement[1]) - 1;
          commands.push(['zremrangebyrank', prefixWithTenant, lastPosition, -1]);
          commands.push(['zadd', prefixWithTenant, newScore, completion]);
        } else {
          commands.push(['zincrby', prefixWithTenant, -1, completion]);
        }
      }

      // Send all queued commands in one batch, then persist the touched prefixes.
      await this.redisClient.batch(commands).execAsync();
      await this.persistPrefixes(prefixes, tenant);
    }

FUTURE PLANS
• Custom configurability of L and K
• Scaling Redis to minimize cache misses
• Scripting Redis to reduce network requests (e.g. write-through cache)
• API rate limiter

QUESTIONS?
jayshenk.com
@jay_shenk
