2017 - Performant Asynchronous Programming at Quora

PyBay
August 21, 2017

Description
In this talk, we will discuss the design of Quora's asynq framework, which provides an asynchronous API to a global scheduler for data requests. We will explore in depth the common problem that motivated it, the design of the framework, and how it has been used in practice to make both the product and development faster at Quora.

Abstract
In order to provide a fast distributed web application to millions of Quora users, we need to be smart about batching data requests to minimize the time spent blocked on network I/O. Moreover, it's important to accomplish this batching in a general way that doesn't require repetitive work every time we make a change or require new data. In this talk, we will discuss the design of Quora's asynq framework, which provides an asynchronous API to a global scheduler for data requests. We will explore in depth the common problem that motivated it, the design of the framework, and how it has been used in practice to make both the product and development faster at Quora.

Bio
Riley Patterson is a software engineering manager on the Platform Frameworks Team at Quora. Our core web application platform is built in Python on top of a web framework that we built on the core of Pylons. As such, the Platform Team uses and builds a wide variety of Python tools and abstractions to enable faster, more effective, and more enjoyable development across the entire team at Quora.

https://www.youtube.com/watch?v=0iqibyfxw3w

Transcript

  1. Asynq: Asynchronous Programming @ Quora. Riley Patterson, Platform Team @ Quora
  2. What is Quora?

  3. Share & Grow the World’s Knowledge

  4. None
  5. None
  6. None
  7. None
  8. Everyone Has Knowledge

  9. Today’s Talk
      1. Asynq
         a. Motivation
         b. Design
         c. Applications
         d. Side Notes
      2. Q & A
  10. None
  11. Motivation

  12. • Most of our data is stored in MySQL or HBase
      • These services are reliable, but they are also slow
  13. • Solution: Caching
      • We extensively use memcache, an in-memory cache store
  14. Most of our data fetches look like:
      quora.com webserver -> memcache (1-2ms) -> MySQL (10-100ms)
  15. • Only go to MySQL in case of a cache miss
      • Miss rate: < 5%
  16. Next Optimization: Network Requests

  17. None
  18. Image URL Name Short Bio Follower Count

  19. def render_profile(uid):
          name = render_name(uid)
          photo = render_profile_photo(uid)
          short_bio = render_short_bio(uid)
          follow_button = render_follow_button(uid)
          ...
  20-21. def render_name(uid):
             name = mc.get('user-name:' + uid)
             return '<b>' + name + '</b>'
  22. render_profile(uid)
          render_name(uid) -> mc.get(...)
          render_photo(uid) -> mc.get(...)
          render_follow(uid) -> mc.get(...)

  23. Serial gets: webserver <-> memcache, one request per key ('user-name', 'image-url'). Diagram labels: 1ms, 1ms, 1μs

  24. render_profile(uid)
          render_photo(uid) -> mc.get('image-url')
          render_name(uid) -> mc.get('name')
          render_short_bio(uid) -> mc.get('short-bio')

  25. Batched get: webserver <-> memcache, multi-get('user-name', 'image-url', …, 'short-bio'). Diagram labels: 1ms, 1ms, 10μs
  26. def prime_render_profile(uid):
          mc.multiget([
              'user-name', 'short-bio', 'image-url', 'follow-count', ...
          ])

      def render_profile(uid):
          name = render_name(uid)
          photo = render_profile_photo(uid)
          short_bio = render_short_bio(uid)
          follow_button = render_follow_button(uid)
          ...
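
The difference between serial gets and a primed multi-get can be sketched as below. This is a minimal illustration, not Quora's code: `FakeMemcache`, `profile_keys`, `fetch_serial`, and `fetch_batched` are hypothetical names, and the "network" is just a counter on an in-memory dict.

```python
class FakeMemcache:
    """Hypothetical in-memory stand-in for a memcache client that counts round trips."""

    def __init__(self, data):
        self._data = dict(data)
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1  # one simulated network round trip per key
        return self._data.get(key)

    def multiget(self, keys):
        self.round_trips += 1  # one simulated round trip for the whole batch
        return {key: self._data.get(key) for key in keys}


def profile_keys(uid):
    # Key names are assumptions modelled on the slides.
    return ["user-name:" + uid, "image-url:" + uid, "short-bio:" + uid]


def fetch_serial(mc, uid):
    # One get per key, as in the un-primed render_* functions.
    return [mc.get(key) for key in profile_keys(uid)]


def fetch_batched(mc, uid):
    # One multiget for all keys, as in prime_render_profile.
    keys = profile_keys(uid)
    values = mc.multiget(keys)
    return [values[key] for key in keys]


data = {
    "user-name:42": "Riley",
    "image-url:42": "/img/42.png",
    "short-bio:42": "Engineer",
}
serial_mc = FakeMemcache(data)
batched_mc = FakeMemcache(data)
serial_result = fetch_serial(serial_mc, "42")
batched_result = fetch_batched(batched_mc, "42")
print(serial_mc.round_trips, batched_mc.round_trips)
```

Both paths return the same values; only the number of simulated round trips differs (three versus one here).
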
  27. What do we have now?

  28. • Caching for fast data fetches
      • Batched network requests to decrease network roundtrips
  29. But….

  30. • Developers have to write most application logic twice
        ◦ Once to prime the data with a batched fetch
        ◦ Once for the actual logic that uses that data
      • And we have to maintain code in two places
  31. Detour: Python Generators

  32-36. def gen(arg):
             g1 = yield arg * 2
             g2 = yield g1 + 1
             yield g2 / 2

         def caller():
             obj = gen(10)
             v1 = next(obj)
             v2 = obj.send(v1 - 10)
             return obj.send(v2 + 1)
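
The generator detour above can be traced step by step. The sketch below is the same `gen`/`caller` pair written for Python 3, with comments showing the value that crosses each `yield`; the `result` variable is added only for illustration.

```python
def gen(arg):
    g1 = yield arg * 2   # hands 20 to the caller, then waits for a sent value
    g2 = yield g1 + 1    # g1 is whatever the caller sent (10); hands back 11
    yield g2 / 2         # g2 is the next sent value (12); hands back 6.0


def caller():
    obj = gen(10)
    v1 = next(obj)            # runs gen up to the first yield: v1 == 20
    v2 = obj.send(v1 - 10)    # resumes gen with g1 = 10: v2 == 11
    return obj.send(v2 + 1)   # resumes gen with g2 = 12: returns 6.0


result = caller()
print(result)
```

The point that matters for the design that follows is that `send` lets the caller decide what each `yield` expression evaluates to inside the generator.
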
  37. Back to Asynq...

  38. Back to Asynq... Design

  39. def render_name(uid):
          name = yield Future('user-name:' + uid)
          return '<b>' + name + '</b>'
  40. @async()
      def render_name(uid):
          name = yield Future('user-name:' + uid)
          return '<b>' + name + '</b>'
  41. @async()
      def render_name(uid):
          name = yield Future('user-name:' + uid)
          return '<b>' + name + '</b>'

      def scheduler():
          gen = render_name(uid)
          obj = next(gen)
          val = mc.get(obj.key)
          gen.send(val)
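
The scheduler idea on the slide above can be sketched as a small runnable loop. This is an illustration under stated assumptions, not the asynq implementation: `Future`, `CACHE`, and `run` are hypothetical stand-ins, the cache is a plain dict, and the decorator is omitted because `async` is a reserved word in Python 3.7 and later.

```python
class Future:
    """Hypothetical placeholder for a pending cache read."""

    def __init__(self, key):
        self.key = key


# Stand-in for memcache; the key format follows the slides.
CACHE = {"user-name:42": "Riley"}


def render_name(uid):
    # The function only describes what it needs; it never touches the cache.
    name = yield Future("user-name:" + uid)
    return "<b>" + name + "</b>"


def run(gen):
    """Drive one generator to completion, resolving each yielded Future."""
    try:
        future = next(gen)
        while True:
            # Fetch the requested value and resume the generator with it.
            future = gen.send(CACHE.get(future.key))
    except StopIteration as stop:
        # In Python 3, a generator's return value arrives on StopIteration.
        return stop.value


html = run(render_name("42"))
print(html)
```

Because the generator yields a description of the data it needs instead of fetching it, the code that drives it is free to decide when and how the fetch happens.
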
  42. Applications

  43. def render_profile(uid):
          name = render_name(uid)
          photo = render_profile_photo(uid)
          bio = render_bio(uid)
          follow_button = render_follow_button(uid)
          ...
  44. @async()
      def render_profile(uid):
          (photo, name, bio, follow_button) = yield (
              render_profile_photo(uid),
              render_name(uid),
              render_bio(uid),
              render_follow_button(uid)
          )
  45. render_profile(uid)
          render_photo(uid) -> mc.get('profile-img')
          render_name(uid) -> mc.get('name')
          render_bio(uid) -> mc.get('bio')
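
How yielding several calls at once lets a scheduler batch their fetches can be sketched as follows. This is a simplified illustration, not asynq's scheduler: `Future`, `FakeMemcache`, and `run_batched` are hypothetical, and the list passed to `run_batched` plays the role of the tuple yielded by `render_profile`.

```python
class Future:
    """Hypothetical placeholder for a pending cache read."""

    def __init__(self, key):
        self.key = key


class FakeMemcache:
    """Hypothetical in-memory cache that counts multiget round trips."""

    def __init__(self, data):
        self._data = dict(data)
        self.round_trips = 0

    def multiget(self, keys):
        self.round_trips += 1
        return {key: self._data.get(key) for key in keys}


def render_name(uid):
    name = yield Future("user-name:" + uid)
    return "<b>" + name + "</b>"


def render_bio(uid):
    bio = yield Future("short-bio:" + uid)
    return "<i>" + bio + "</i>"


def run_batched(mc, gens):
    """Advance all generators in lockstep, issuing one multiget per round."""
    results = [None] * len(gens)
    pending = {}  # generator index -> Future it is currently waiting on
    for i, gen in enumerate(gens):
        try:
            pending[i] = next(gen)
        except StopIteration as stop:
            results[i] = stop.value
    while pending:
        # Collect every key requested in this round into a single fetch.
        values = mc.multiget({future.key for future in pending.values()})
        next_pending = {}
        for i, future in pending.items():
            try:
                next_pending[i] = gens[i].send(values[future.key])
            except StopIteration as stop:
                results[i] = stop.value
        pending = next_pending
    return results


mc = FakeMemcache({"user-name:42": "Riley", "short-bio:42": "Engineer"})
rendered = run_batched(mc, [render_name("42"), render_bio("42")])
print(rendered, mc.round_trips)
```

Each render function is written once, with no separate priming function, and the two fetches still share a single simulated round trip.
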

  46. • Batching logically-distinct units of code
      • Batching email pipelines
      • … your application?
  47. None
  48. Side Notes

  49. • Relationship to asyncio
        ◦ More constrained API for users
        ◦ A minority of our code is the scheduler, which we originally built in Python 2.7 and which could largely be replaced with asyncio
  50. • Adventures with making this intuitive
        ◦ At Quora, developers with a wide variety of backgrounds work in our Python codebase, including designers with relatively little coding experience.
        ◦ Several API decisions were made to make using this as easy as, or easier than, priming
        ◦ Fun story about returning from generators
  51. • Migrating a huge codebase to the new data fetching API
        ◦ Made heavy use of static-analysis/AST-based auto migration scripts
        ◦ Saved 50%+ time (including developing those scripts) vs. an estimated manual approach
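
As a rough idea of what an AST-based migration script might start from, the sketch below uses Python's standard `ast` module to list functions that call `mc.get` directly. It is an assumption-laden illustration, not Quora's tooling: `SOURCE` and `functions_calling_mc_get` are hypothetical, and a real migration would also have to rewrite the code, not just find it.

```python
import ast

# Hypothetical module source to analyze.
SOURCE = '''
def render_name(uid):
    name = mc.get("user-name:" + uid)
    return "<b>" + name + "</b>"

def helper(x):
    return x + 1
'''


def functions_calling_mc_get(source):
    """Return names of functions whose bodies contain a direct mc.get(...) call."""
    tree = ast.parse(source)
    found = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.FunctionDef):
            continue
        for child in ast.walk(node):
            if (
                isinstance(child, ast.Call)
                and isinstance(child.func, ast.Attribute)
                and child.func.attr == "get"
                and isinstance(child.func.value, ast.Name)
                and child.func.value.id == "mc"
            ):
                found.append(node.name)
                break
    return found


candidates = functions_calling_mc_get(SOURCE)
print(candidates)
```
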
  52. Learn More ...

  53. • https://engineering.quora.com/Asynchronous-Programming-in-Python
      • https://github.com/quora/asynq

  54. Q & A