Slide 1

Slide 1 text

Asynq: Asynchronous Programming @ Quora
Riley Patterson, Platform Team @ Quora

Slide 2

Slide 2 text

What is Quora?

Slide 3

Slide 3 text

Share & Grow the World’s Knowledge

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

Everyone Has Knowledge

Slide 9

Slide 9 text

Today's Talk
1. Asynq
   a. Motivation
   b. Design
   c. Applications
   d. Side Notes
2. Q & A

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

Motivation

Slide 12

Slide 12 text

● Most of our data is stored in MySQL or HBase
● These services are reliable, but they are also slow (10-100ms per MySQL fetch)

Slide 13

Slide 13 text

● Solution: Caching
● We extensively use memcache, an in-memory cache store

Slide 14

Slide 14 text

Most of our data fetches look like:
quora.com → webserver → memcache (1-2ms) → MySQL (10-100ms)

Slide 15

Slide 15 text

● Only go to MySQL in case of a cache miss
● Miss rate: < 5%
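The read path above is the classic cache-aside pattern. A minimal sketch, assuming a plain dict as a stand-in for memcache and a hypothetical `FakeDB` class standing in for MySQL (neither is Quora's actual client), shows how only misses reach the database:

```python
class FakeDB:
    """Hypothetical stand-in for MySQL that counts how often it is queried."""

    def __init__(self, rows):
        self.rows = rows
        self.queries = 0

    def query(self, key):
        self.queries += 1
        return self.rows.get(key)


def cached_fetch(key, cache, db):
    """Cache-aside read: try the cache first, fall back to the database on a miss."""
    value = cache.get(key)
    if value is None:  # cache miss: pay for the slow database query
        value = db.query(key)
        if value is not None:
            cache[key] = value  # populate the cache so the next read is a hit
    return value


cache = {}  # dict standing in for memcache
db = FakeDB({'user-name:1': 'Ada'})
print(cached_fetch('user-name:1', cache, db))  # Ada (miss: goes to the database)
print(cached_fetch('user-name:1', cache, db))  # Ada (hit: served from the cache)
print(db.queries)  # 1
```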

Slide 16

Slide 16 text

Next Optimization: Network Requests

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

Profile card fields: Image URL, Name, Short Bio, Follower Count

Slide 19

Slide 19 text

def render_profile(uid):
    name = render_name(uid)
    photo = render_profile_photo(uid)
    short_bio = render_short_bio(uid)
    follow_button = render_follow_button(uid)
    ...

Slide 20

Slide 20 text

def render_name(uid):
    name = mc.get('user-name:' + uid)
    return '' + name + ''

Slide 22

Slide 22 text

render_profile(uid)
    render_name(uid)   → mc.get(...)
    render_photo(uid)  → mc.get(...)
    render_follow(uid) → mc.get(...)

Slide 23

Slide 23 text

Serial gets: webserver → memcache, one request per key ('user-name', then 'image-url'); network: 1ms, 1ms; memcache lookup: 1μs

Slide 24

Slide 24 text

render_profile(uid)
    render_photo(uid)     → mc.get('image-url')
    render_name(uid)      → mc.get('name')
    render_short_bio(uid) → mc.get('short-bio')

Slide 25

Slide 25 text

Batched get: webserver → memcache, a single multi-get('user-name', 'image-url', …, 'short-bio'); network: 1ms, 1ms; memcache lookup: 10μs
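The difference between the serial and batched diagrams is the number of network round trips. A minimal sketch, assuming a hypothetical in-memory `FakeCache` whose `get` and `get_multi` methods each count as one round trip (the method names loosely mirror the python-memcached client, but this is not a real client), makes that concrete:

```python
class FakeCache:
    """Hypothetical in-memory stand-in for a memcache client.

    Every call to get() or get_multi() counts as one network round trip.
    """

    def __init__(self, data):
        self.data = data
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self.data.get(key)

    def get_multi(self, keys):
        self.round_trips += 1
        return {k: self.data[k] for k in keys if k in self.data}


data = {'user-name:1': 'Ada', 'image-url:1': '/ada.png', 'short-bio:1': 'Mathematician'}
keys = ['user-name:1', 'image-url:1', 'short-bio:1']

serial = FakeCache(data)
serial_values = [serial.get(k) for k in keys]  # one round trip per key

batched = FakeCache(data)
batched_values = batched.get_multi(keys)  # one round trip for all keys

print(serial.round_trips, batched.round_trips)  # 3 1
```

Network time scales with `round_trips`, so the batched version's cost stays flat as keys are added while the serial version's grows linearly.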

Slide 26

Slide 26 text

def prime_render_profile(uid):
    mc.multiget([
        'user-name',
        'short-bio',
        'image-url',
        'follow-count',
        ...
    ])

def render_profile(uid):
    name = render_name(uid)
    photo = render_profile_photo(uid)
    short_bio = render_short_bio(uid)
    follow_button = render_follow_button(uid)
    ...
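Priming only pays off if the later `mc.get` calls inside the render functions can be answered from what was primed. A minimal sketch of that, assuming a hypothetical per-request `PrimedClient` wrapper that keeps primed values locally (the slide does not show how Quora's client does this) and keys that include the uid:

```python
class FakeCache:
    """Hypothetical memcache stand-in; each get/get_multi is one round trip."""

    def __init__(self, data):
        self.data = data
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self.data.get(key)

    def get_multi(self, keys):
        self.round_trips += 1
        return {k: self.data[k] for k in keys if k in self.data}


class PrimedClient:
    """Hypothetical per-request wrapper: primed values are served without a round trip."""

    def __init__(self, backend):
        self.backend = backend
        self.local = {}

    def multiget(self, keys):
        missing = [k for k in keys if k not in self.local]
        if missing:
            self.local.update(self.backend.get_multi(missing))

    def get(self, key):
        if key not in self.local:
            self.local[key] = self.backend.get(key)  # not primed: extra round trip
        return self.local[key]


def prime_render_profile(mc, uid):
    # Must list every key render_profile will read.
    mc.multiget(['user-name:' + uid, 'short-bio:' + uid])


def render_profile(mc, uid):
    return mc.get('user-name:' + uid) + ': ' + mc.get('short-bio:' + uid)


backend = FakeCache({'user-name:1': 'Ada', 'short-bio:1': 'Mathematician'})
mc = PrimedClient(backend)
prime_render_profile(mc, '1')
print(render_profile(mc, '1'))  # Ada: Mathematician
print(backend.round_trips)  # 1
```

If `render_profile` starts reading a key that `prime_render_profile` does not list, the page still renders but quietly pays an extra round trip per missing key. Keeping the two lists in sync by hand is the duplication problem the talk turns to next.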

Slide 27

Slide 27 text

What do we have now?

Slide 28

Slide 28 text

● Caching for fast data fetches
● Batched network requests to decrease network roundtrips

Slide 29

Slide 29 text

But…

Slide 30

Slide 30 text

● Developers have to write most application logic twice
  ○ Once to prime the data it needs
  ○ Once for the actual logic that uses that data
● And we have to maintain code in two places

Slide 31

Slide 31 text

Detour: Python Generators

Slide 32

Slide 32 text

def gen(arg):
    g1 = yield arg * 2
    g2 = yield g1 + 1
    yield g2 / 2

def caller():
    obj = gen(10)
    v1 = next(obj)
    v2 = obj.send(v1 - 10)
    return obj.send(v2 + 1)
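Stepping through the generator example makes the two-way flow concrete: each `yield` hands a value to the caller, and each `send()` resumes the generator with a value that becomes the result of that `yield` expression. The version below (Python 3, using `next(obj)` rather than Python 2's `obj.next()`; `trace` is a hypothetical helper) records the values the caller sees at each step:

```python
def gen(arg):
    g1 = yield arg * 2  # first value out is 20; g1 becomes whatever send() passes next
    g2 = yield g1 + 1   # g1 == 10, so this yields 11
    yield g2 / 2        # g2 == 12, so this yields 6.0 (true division in Python 3)


def trace():
    obj = gen(10)
    v1 = next(obj)          # runs gen up to its first yield: 20
    v2 = obj.send(v1 - 10)  # resumes with g1 = 10: 11
    v3 = obj.send(v2 + 1)   # resumes with g2 = 12: 6.0
    return [v1, v2, v3]


print(trace())  # [20, 11, 6.0]
```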

Slide 37

Slide 37 text

Back to Asynq...

Slide 38

Slide 38 text

Back to Asynq... Design

Slide 39

Slide 39 text

def render_name(uid):
    name = yield Future('user-name:' + uid)
    return '' + name + ''

Slide 40

Slide 40 text

@async()
def render_name(uid):
    name = yield Future('user-name:' + uid)
    return '' + name + ''

Slide 41

Slide 41 text

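@async()
def render_name(uid):
    name = yield Future('user-name:' + uid)
    return '' + name + ''

def scheduler(uid):
    gen = render_name(uid)
    obj = next(gen)        # run until the first yield: a Future naming the key
    val = mc.get(obj.key)  # the scheduler, not render_name, does the fetch
    try:
        gen.send(val)      # resume render_name with the fetched value
    except StopIteration as e:
        return e.value     # render_name's return value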
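The scheduler above answers a single `yield`. A minimal sketch of the same idea as a loop, assuming a hypothetical `Future` class with a `key` attribute, a hypothetical `render_greeting` task, and a plain dict in place of memcache (this is not asynq's actual scheduler), keeps answering requests until the generator returns:

```python
class Future:
    """Hypothetical stand-in for a pending cache read identified by its key."""

    def __init__(self, key):
        self.key = key


def render_greeting(uid):
    # A task that needs two cache reads; it only describes what it wants.
    name = yield Future('user-name:' + uid)
    bio = yield Future('short-bio:' + uid)
    return name + ' (' + bio + ')'


def run(task, cache):
    """Drive a generator task: fetch each requested key and send the value back."""
    try:
        future = next(task)  # run until the first request
        while True:
            future = task.send(cache.get(future.key))  # answer it, get the next request
    except StopIteration as stop:
        return stop.value  # the generator's return value


cache = {'user-name:1': 'Ada', 'short-bio:1': 'Mathematician'}
print(run(render_greeting('1'), cache))  # Ada (Mathematician)
```

Because the task only yields descriptions of the data it needs, the scheduler is free to decide how and when to fetch it, which is what makes batching possible.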

Slide 42

Slide 42 text

Applications

Slide 43

Slide 43 text

def render_profile(uid):
    name = render_name(uid)
    photo = render_profile_photo(uid)
    bio = render_bio(uid)
    follow_button = render_follow_button(uid)
    ...

Slide 44

Slide 44 text

@async()
def render_profile(uid):
    (photo, name, bio, follow_button) = yield (
        render_profile_photo(uid),
        render_name(uid),
        render_bio(uid),
        render_follow_button(uid),
    )

Slide 45

Slide 45 text

render_profile(uid)
    render_photo(uid) → mc.get('profile-img')
    render_name(uid)  → mc.get('name')
    render_bio(uid)   → mc.get('bio')

Slide 46

Slide 46 text

● Batching logically distinct units of code
● Batching email pipelines
● … your application?

Slide 47

Slide 47 text

No content

Slide 48

Slide 48 text

Side Notes

Slide 49

Slide 49 text

● Relationship to asyncio
  ○ More constrained API for users
  ○ Only a minority of our code is the scheduler, which we originally built in Python 2.7; it could largely be replaced with asyncio

Slide 50

Slide 50 text

● Adventures with making this intuitive
  ○ At Quora, developers with a wide variety of backgrounds work in our Python codebase, including designers with relatively little coding experience
  ○ Several API decisions were made to make using this as easy as, or easier than, priming
  ○ Fun story about returning from generators

Slide 51

Slide 51 text

● Migrating a huge codebase to the new data-fetching API
  ○ Made heavy use of static-analysis/AST-based auto-migration scripts
  ○ Saved 50%+ of the time (including developing those scripts) vs. an estimated manual approach
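The migration scripts themselves are not shown in the talk. As a rough illustration of the static-analysis step such a script might start with, the sketch below uses Python's standard `ast` module to locate call sites shaped like `mc.get(...)`; `McGetFinder` and the sample source are hypothetical, and a real migration would also need to rewrite the calls and their callers.

```python
import ast


class McGetFinder(ast.NodeVisitor):
    """Collect the line numbers of calls shaped like mc.get(...)."""

    def __init__(self):
        self.hits = []

    def visit_Call(self, node):
        func = node.func
        if (isinstance(func, ast.Attribute) and func.attr == 'get'
                and isinstance(func.value, ast.Name) and func.value.id == 'mc'):
            self.hits.append(node.lineno)
        self.generic_visit(node)  # keep walking into nested calls


SOURCE = '''
def render_name(uid):
    name = mc.get('user-name:' + uid)
    return name

def render_bio(uid):
    return mc.get('short-bio:' + uid)
'''

finder = McGetFinder()
finder.visit(ast.parse(SOURCE))
print(finder.hits)  # [3, 7]
```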

Slide 52

Slide 52 text

Learn More ...

Slide 53

Slide 53 text

● https://engineering.quora.com/Asynchronous-Programming-in-Python
● https://github.com/quora/asynq

Slide 54

Slide 54 text

Q & A