2017 - Performant Asynchronous Programming at Quora

Asynq: Asynchronous Programming @ Quora
Riley Patterson, Platform Team @ Quora

What is Quora?

Share & Grow the World’s Knowledge

Everyone Has Knowledge

Today’s Talk

1. Asynq
   a. Motivation
   b. Design
   c. Applications
   d. Side Notes
2. Q & A

Motivation

• Most of our data is stored in MySQL or HBase
• These services are reliable, but they are also slow

• Solution: caching
• We extensively use memcache, an in-memory cache store

Most of our data fetches (quora.com) look like:
webserver → memcache (1-2 ms) → MySQL (10-100 ms)

• Only go to MySQL in case of cache miss
• Miss rate: < 5%
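The read path above (try memcache first, go to MySQL only on a miss) is the cache-aside pattern. A minimal sketch, assuming plain dicts stand in for memcache and MySQL; the `CacheAside` class is hypothetical, not Quora's code:

```python
class CacheAside:
    """Hypothetical cache-aside reader: cache first, backing store on a miss."""

    def __init__(self, backing_store):
        self.cache = {}                 # stands in for memcache
        self.store = backing_store      # stands in for MySQL
        self.misses = 0

    def get(self, key):
        if key in self.cache:           # hit: the fast path (~1-2 ms in the talk)
            return self.cache[key]
        self.misses += 1                # miss: the slow path (~10-100 ms in the talk)
        value = self.store[key]
        self.cache[key] = value         # populate so later reads hit the cache
        return value


users = CacheAside({'user-name:42': 'Ada'})
users.get('user-name:42')   # miss: reads the backing store
users.get('user-name:42')   # hit: served from the cache
print(users.misses)         # 1
```

With a miss rate under 5%, almost every read takes the fast branch.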

Next Optimization: Network Requests

(Figure: a user profile made up of an image URL, name, short bio, and follower count.)

```python
def render_profile(uid):
    name = render_name(uid)
    photo = render_profile_photo(uid)
    short_bio = render_short_bio(uid)
    follow_button = render_follow_button(uid)
    ...
```

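```python
def render_name(uid):
    name = mc.get('user-name:' + uid)   # one memcache round trip per call
    return '<b>' + name + '</b>'        # wrap in HTML markup (tag is illustrative)
```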

(Diagram: render_profile(uid) calls render_name(uid), render_photo(uid), and render_follow(uid); each of them issues its own mc.get(...).)

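(Diagram, serial gets: the webserver asks memcache for 'user-name', then 'image-url', one request at a time; each get spends about 1 ms on the network in each direction but only about 1 μs inside memcache.)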

(Diagram: render_profile(uid) calls render_photo(uid), render_name(uid), and render_short_bio(uid), which issue mc.get('image-url'), mc.get('name'), and mc.get('short-bio').)

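(Diagram, batched get: the webserver sends a single multi-get('user-name', 'image-url', …, 'short-bio'); the round trip still costs about 1 ms in each direction, with about 10 μs of work inside memcache.)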
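The difference between the serial and batched diagrams is the number of network round trips. A minimal sketch, assuming a hypothetical `CountingClient` whose `get` and `multiget` methods (named after the slides, not taken from a real memcache library) simply count trips against a dict:

```python
class CountingClient:
    """Hypothetical memcache-like client that counts network round trips."""

    def __init__(self, data):
        self.data = data
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1                   # one round trip per key
        return self.data.get(key)

    def multiget(self, keys):
        self.round_trips += 1                   # one round trip for the whole batch
        return {key: self.data[key] for key in keys if key in self.data}


keys = ['user-name', 'image-url', 'short-bio', 'follow-count']
data = {key: 'value of ' + key for key in keys}

serial = CountingClient(data)
serial_values = [serial.get(key) for key in keys]

batched = CountingClient(data)
batched_values = batched.multiget(keys)

# Assuming ~1 ms each way per trip, as in the diagrams:
# serial is about 4 x 2 ms = 8 ms, batched about 1 x 2 ms = 2 ms.
print(serial.round_trips, batched.round_trips)   # 4 1
```

Because the network dominates and memcache's own work is measured in microseconds, the batched cost stays roughly flat as the number of keys grows.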

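```python
def prime_render_profile(uid):
    # Fetch every key the profile will need in one batched round trip
    mc.multiget(['user-name', 'short-bio', 'image-url', 'follow-count', ...])

def render_profile(uid):
    # The individual gets inside these calls no longer need their own round trips
    name = render_name(uid)
    photo = render_profile_photo(uid)
    short_bio = render_short_bio(uid)
    follow_button = render_follow_button(uid)
    ...
```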
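Priming only pays off if the later individual gets are answered locally. A minimal sketch, assuming a hypothetical `PrimedClient` that keeps a per-request dict filled by `multiget`; the uid-suffixed keys are illustrative:

```python
class PrimedClient:
    """Hypothetical client with a per-request local cache filled by priming."""

    def __init__(self, remote):
        self.remote = remote        # dict standing in for memcache
        self.local = {}             # per-request cache filled by multiget
        self.round_trips = 0

    def multiget(self, keys):
        self.round_trips += 1       # one round trip for all primed keys
        for key in keys:
            if key in self.remote:
                self.local[key] = self.remote[key]

    def get(self, key):
        if key not in self.local:   # not primed: pay for an extra round trip
            self.round_trips += 1
            self.local[key] = self.remote.get(key)
        return self.local[key]


mc = PrimedClient({'user-name:7': 'Ada', 'short-bio:7': 'Mathematician'})

def prime_render_profile(uid):
    # This key list must mirror what render_profile reads below.
    mc.multiget(['user-name:' + uid, 'short-bio:' + uid])

def render_profile(uid):
    return mc.get('user-name:' + uid) + ': ' + mc.get('short-bio:' + uid)


prime_render_profile('7')
print(render_profile('7'))   # Ada: Mathematician
print(mc.round_trips)        # 1
```

Nothing ties the primed key list to the keys the render code actually reads: a key that is read but not primed silently costs another round trip.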

What do we have now?

• Caching for fast data fetches
• Batched network requests to decrease network round trips

But…

• Developers have to write most application logic twice
  ◦ Once to prime (batch-fetch) the data the logic needs
  ◦ Once for the actual logic that uses that data
• And we have to maintain code in two places

Detour: Python Generators

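```python
def gen(arg):
    g1 = yield arg * 2
    g2 = yield g1 + 1
    yield g2 / 2

def caller():
    obj = gen(10)
    v1 = next(obj)              # run to the first yield: v1 == 20
    v2 = obj.send(v1 - 10)      # g1 = 10, run to the next yield: v2 == 11
    return obj.send(v2 + 1)     # g2 = 12, returns 12 / 2
```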
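The generator above ends with a third `yield`. In Python 3.3+ a generator can instead `return` a value, which the caller receives as `StopIteration.value`; in Python 2.7, `return` with a value inside a generator is a SyntaxError. A short sketch of that variant, where `gen_with_return` is simply a renamed copy of the example above:

```python
def gen_with_return(arg):
    g1 = yield arg * 2
    g2 = yield g1 + 1
    return g2 / 2               # Python 3.3+: delivered as StopIteration.value


obj = gen_with_return(10)
v1 = next(obj)                  # 20
v2 = obj.send(v1 - 10)          # g1 = 10, yields 11
try:
    obj.send(v2 + 1)            # g2 = 12, the generator returns
except StopIteration as stop:
    result = stop.value         # 6.0
print(v1, v2, result)           # 20 11 6.0
```

The two-way channel (`yield` hands a value out, `send` pushes one back in) is what lets a generator pause while something else fetches its data.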

Back to Asynq: Design

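```python
def render_name(uid):
    # yield hands the data request to the caller and pauses until it is answered
    name = yield Future('user-name:' + uid)
    return '<b>' + name + '</b>'        # wrap in HTML markup (tag is illustrative)
```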


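```python
@async()  # current asynq releases spell this @asynq(), since async is a keyword in Python 3.7+
def render_name(uid):
    name = yield Future('user-name:' + uid)
    return '<b>' + name + '</b>'        # wrap in HTML markup (tag is illustrative)

def scheduler(uid):
    gen = render_name(uid)
    future = next(gen)                  # run render_name up to its yield
    val = mc.get(future.key)            # the scheduler, not render_name, does the fetch
    gen.send(val)                       # resume render_name with the fetched value
```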
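The scheduler above handles a single yield. A minimal sketch of the full loop, assuming a hypothetical `Future` namedtuple with a `key` field and a dict standing in for memcache (this is not asynq's API): keep fetching whatever the generator yields and sending it back until the generator returns.

```python
from collections import namedtuple

# Hypothetical stand-in for the slide's Future: a request for one cache key.
Future = namedtuple('Future', ['key'])


def render_name_and_bio(uid):
    name = yield Future('user-name:' + uid)
    bio = yield Future('short-bio:' + uid)      # a second, sequential fetch
    return name + ' (' + bio + ')'


def scheduler(generator, mc):
    """Fetch whatever the generator yields and send it back until it returns."""
    try:
        future = next(generator)                # run up to the first yield
        while True:
            value = mc.get(future.key)          # the scheduler does the I/O
            future = generator.send(value)      # resume with the fetched value
    except StopIteration as stop:
        return stop.value                       # the generator's return value


mc = {'user-name:7': 'Ada', 'short-bio:7': 'Mathematician'}   # stands in for memcache
print(scheduler(render_name_and_bio('7'), mc))   # Ada (Mathematician)
```

The rendering code never touches the cache client; it only describes what it needs, which is what gives the scheduler room to batch.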

Applications

```python
def render_profile(uid):
    name = render_name(uid)
    photo = render_profile_photo(uid)
    bio = render_bio(uid)
    follow_button = render_follow_button(uid)
    ...
```

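```python
@async()
def render_profile(uid):
    # Yielding a tuple runs all four together, so the scheduler can
    # combine their memcache fetches into one batch
    (photo, name, bio, follow_button) = yield (
        render_profile_photo(uid),
        render_name(uid),
        render_bio(uid),
        render_follow_button(uid),
    )
```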

(Diagram: render_profile(uid) runs render_photo(uid), render_name(uid), and render_bio(uid) together; their mc.get('profile-img'), mc.get('name'), and mc.get('bio') requests go out as one batch.)
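One way a scheduler can turn a yielded tuple into a single batch is to run in rounds: advance every runnable generator until it blocks on a `Future`, fetch all pending keys with one multi-get, then resume the waiting generators. A minimal sketch, assuming hypothetical `Future`, `Task`, and `BatchingScheduler` classes, a dict-backed `multiget` function, and generators that yield either a `Future` or a tuple of child generators; this is not asynq's implementation:

```python
from collections import namedtuple

# Hypothetical: a request for the value stored under one cache key.
Future = namedtuple('Future', ['key'])


class Task:
    """A generator plus the bookkeeping needed to resume it."""

    def __init__(self, gen, parent=None, index=None):
        self.gen, self.parent, self.index = gen, parent, index
        self.child_results, self.remaining, self.result = [], 0, None


class BatchingScheduler:
    """Runs a task tree; every Future pending in a round shares one multiget."""

    def __init__(self, multiget):
        self.multiget = multiget        # callable: list of keys -> {key: value}
        self.waiting = []               # (task, future) pairs blocked on a fetch
        self.round_trips = 0

    def advance(self, task, value):
        try:
            yielded = task.gen.send(value)
        except StopIteration as stop:   # the task returned a value
            task.result = stop.value
            parent = task.parent
            if parent is not None:
                parent.child_results[task.index] = stop.value
                parent.remaining -= 1
                if parent.remaining == 0:       # all siblings are done
                    self.advance(parent, tuple(parent.child_results))
            return
        if isinstance(yielded, Future):
            self.waiting.append((task, yielded))
            return
        children = list(yielded)                # a tuple of child generators
        if not children:
            self.advance(task, ())
            return
        task.child_results = [None] * len(children)
        task.remaining = len(children)
        for i, child in enumerate(children):
            self.advance(Task(child, task, i), None)

    def run(self, gen):
        root = Task(gen)
        self.advance(root, None)
        while self.waiting:
            batch, self.waiting = self.waiting, []
            keys = sorted({future.key for _, future in batch})
            values = self.multiget(keys)        # one round trip per round
            self.round_trips += 1
            for task, future in batch:
                self.advance(task, values.get(future.key))
        return root.result


DATA = {'user-name:7': 'Ada', 'short-bio:7': 'Mathematician', 'image-url:7': '/ada.png'}

def multiget(keys):
    """Dict-backed stand-in for one batched memcache request."""
    return {key: DATA[key] for key in keys if key in DATA}

def render_field(field, uid):
    value = yield Future(field + ':' + uid)
    return value

def render_profile(uid):
    name, bio, photo = yield (
        render_field('user-name', uid),
        render_field('short-bio', uid),
        render_field('image-url', uid),
    )
    return {'name': name, 'bio': bio, 'photo': photo}


scheduler = BatchingScheduler(multiget)
profile = scheduler.run(render_profile('7'))
print(profile, scheduler.round_trips)   # one round trip for all three fields
```

A second round trip only happens when a generator yields again after receiving data, for example a fetch whose key depends on an earlier result.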

• Batching logically distinct units of code
• Batching email pipelines
• … your application?

Side Notes

• Relationship to asyncio
  ◦ A more constrained API for users
  ◦ Only a small part of our code is the scheduler, which we originally built for Python 2.7 and which could largely be replaced with asyncio
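asyncio provides the event loop and `gather`, but it does not batch data fetches on its own; something has to coalesce requests. A minimal sketch, assuming a hypothetical `BatchLoader` (not part of asyncio) that collects every `get()` made in the same event-loop iteration and answers them with one simulated multi-get:

```python
import asyncio


class BatchLoader:
    """Hypothetical loader that coalesces get() calls made in the same
    event-loop iteration into one batched fetch (not part of asyncio)."""

    def __init__(self, data):
        self.data = data                # dict standing in for memcache
        self.pending = {}               # key -> futures waiting for it
        self.round_trips = 0

    def get(self, key):
        loop = asyncio.get_running_loop()
        if not self.pending:
            # call_soon queues _flush behind callbacks already scheduled, so
            # tasks started by the same gather() can add their keys first.
            loop.call_soon(self._flush)
        future = loop.create_future()
        self.pending.setdefault(key, []).append(future)
        return future

    def _flush(self):
        batch, self.pending = self.pending, {}
        self.round_trips += 1           # one multi-get for every queued key
        for key, futures in batch.items():
            value = self.data.get(key)
            for future in futures:
                if not future.done():
                    future.set_result(value)


async def render_field(loader, key):
    return await loader.get(key)


async def render_profile(loader, uid):
    # gather() plays the role of yielding a tuple in asynq
    name, bio = await asyncio.gather(
        render_field(loader, 'user-name:' + uid),
        render_field(loader, 'short-bio:' + uid),
    )
    return name + ': ' + bio


async def main():
    loader = BatchLoader({'user-name:7': 'Ada', 'short-bio:7': 'Mathematician'})
    text = await render_profile(loader, '7')
    return text, loader.round_trips


result = asyncio.run(main())
print(result)   # ('Ada: Mathematician', 1)
```

The batching depends on timing within the event loop rather than on an explicit yielded tuple, which is one reason a more constrained API can be easier to reason about.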

• Adventures in making this intuitive
  ◦ At Quora, developers with a wide variety of backgrounds work in our Python codebase, including designers with relatively little coding experience
  ◦ Several API decisions were made so that using asynq is as easy as, or easier than, priming
  ◦ Fun story about returning from generators

• Migrating a huge codebase to the new data-fetching API
  ◦ Made heavy use of static-analysis/AST-based automatic migration scripts
  ◦ Saved 50%+ of the time, including the time spent developing those scripts, compared with an estimated manual approach
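The talk does not show Quora's migration scripts. As an illustration of the first step such a script might take, here is a minimal sketch using Python's standard `ast` module to locate priming functions and the call sites that would need rewriting; the sample source, the function names, and the helper names `find_prime_functions` and `find_calls` are hypothetical:

```python
import ast

SOURCE = '''
def prime_render_profile(uid):
    mc.multiget(['user-name', 'short-bio'])

def render_profile(uid):
    name = render_name(uid)
    bio = render_short_bio(uid)
'''


def find_prime_functions(source):
    """Names of functions following the prime_* naming used on the slides."""
    return [node.name for node in ast.walk(ast.parse(source))
            if isinstance(node, ast.FunctionDef) and node.name.startswith('prime_')]


def find_calls(source, names):
    """(line, callee) for every plain-name call to a function in `names`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in names):
            hits.append((node.lineno, node.func.id))
    return sorted(hits)


print(find_prime_functions(SOURCE))                             # ['prime_render_profile']
print(find_calls(SOURCE, {'render_name', 'render_short_bio'}))  # [(6, 'render_name'), (7, 'render_short_bio')]
```

Working on the syntax tree rather than on text avoids false matches in strings and comments; a real migration would also need to rewrite the code while preserving its formatting, which this sketch does not attempt.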

Learn More ...

• https://engineering.quora.com/Asynchronous-Programming-in-Python
• https://github.com/quora/asynq
