Nicholas Tollervey - Lessons learned with asyncio ("Look ma, I wrote a distributed hash table!")

This talk introduces the asyncio module. I'll cover what it's for, how it works and describe how I used it to write a real-world networked application (a distributed hash table).
We'll explore the event loop, co-routines, futures and networking with examples from my code.
This won't be an exhaustive exposition. Rather, attendees will grasp enough of asyncio to continue with their own studies.

https://us.pycon.org/2015/schedule/presentation/387/


PyCon 2015

April 18, 2015

Transcript

  1. LESSONS LEARNED WITH ASYNCIO (“LOOK MA, I WROTE A DISTRIBUTED

    HASH TABLE!”) / Nicholas H. Tollervey @ntoll
  2. WHAT DOES ASYNCIO DO..?

  3. None
  4. None
  5. A PROBLEM CLEARLY STATED: Messages arrive and depart via the

    network at unpredictable times - asyncio lets you deal with such interactions simultaneously.
  6. WHAT IS A DISTRIBUTED HASH TABLE?

  7. HASH TABLE = DICT (IN PYTHON)

      >>> home = {}
      >>> home['ntoll'] = 'Towcester'
      >>> home['voidspace'] = 'Bugbrooke'
      >>> home['pinner'] = 'Coventry'
      >>> home
      {'ntoll': 'Towcester', 'voidspace': 'Bugbrooke', 'pinner': 'Coventry'}
      >>> home['ntoll']
      'Towcester'

    A very simple key / value data store.
  8. DISTRIBUTED

  9. DECENTRALIZED

  10. A DISTRIBUTED HASH TABLE (DHT) IS A PEER-TO-PEER KEY /

    VALUE DATA STORE
  11. HOW?

  12. CORE CONCEPT #1 THE EVENT LOOP

  13. (Based on real events - participants have been replaced by

    unreasonably happy actors)
  14. IMPORTANT! PEP 3156 states that callbacks are... “[...] strictly serialized:

    one callback must finish before the next one will be called. This is an important guarantee: when two or more callbacks use or modify shared state, each callback is guaranteed that while it is running, the shared state isn't changed by another callback.”
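    This serialization guarantee is easy to see with loop.call_soon; a minimal sketch (not from the talk) in which two callbacks append to shared state without any locking:

    ```python
    import asyncio

    log = []

    def callback(name):
        # Each callback runs to completion before the next one starts,
        # so appends to the shared list can never interleave.
        log.append(name)

    loop = asyncio.new_event_loop()
    loop.call_soon(callback, "first")
    loop.call_soon(callback, "second")
    loop.call_soon(loop.stop)  # stop the loop once both callbacks have run
    loop.run_forever()
    loop.close()
    # log is now ["first", "second"], in scheduling order.
    ```

    No locks are needed because only one callback ever touches `log` at a time.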
  15. HANG ON A MINUTE..? THAT DOESN'T SOUND VERY CONCURRENT!

  16. None
  17. CONCURRENT TASKS INTERFERE WITH SHARED RESOURCES 1. Task A reads

    a record. 2. Task B reads a record. 3. Both A and B change the retrieved data in different ways. 4. Task B writes its changes. 5. Task A writes its changes. Task A overwrites the record containing task B's changes.
  18. ACT SYNCHRONOUSLY TO AVOID INTERFERENCE! 1. First do A, then

    B followed by C (and so on). 2. Easy to understand and deterministic. 3. What happens if A needs to wait for something, for example, a reply from a machine on the network? 4. The program waits until A's network call completes. It can't get on with other stuff while waiting for A. :-(
  19. WELCOME TO THE MOST IMPORTANT SLIDE OF THIS TALK The

    program does not wait for a reply from network calls before continuing. Programmers define callbacks to be run when the result of a network call is known. In the meantime the program continues to poll for and respond to other network related I/O events. Callbacks execute during the iteration of the event loop immediately after the expected network I/O event is detected.
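    A toy demonstration of that idea, with asyncio.sleep standing in for a slow network call (modern async/await syntax is used here for brevity; the talk's own code uses the older @asyncio.coroutine / yield from style):

    ```python
    import asyncio

    events = []

    async def network_call():
        # Stand-in for a slow network request: awaiting yields control
        # back to the event loop instead of blocking the whole program.
        events.append("request sent")
        await asyncio.sleep(0.01)
        events.append("reply handled")

    async def other_work():
        # Runs while network_call is still waiting for its "reply".
        events.append("other work done")

    async def main():
        await asyncio.gather(network_call(), other_work())

    asyncio.run(main())
    # events: ["request sent", "other work done", "reply handled"]
    ```

    The program gets on with other work during the wait, exactly as the slide describes.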
  20. CONFUSED..? DON'T BE, IT'S EXACTLY HOW HUMANS THINK ABOUT CONCURRENCY.

  21. We make plans: when the washing machine finishes, take the

    clothes and hang them out to dry.
  22. As humans we work on concurrent tasks (like preparing breakfast)

    in a similar non-blocking manner.
  23. asyncio avoids potentially confusing and complicated

    “threaded” concurrency while retaining the benefits of strictly sequential code.
  24. QUESTIONS: How are asynchronous concurrent tasks created? How do such

    tasks pause while waiting for non-blocking network based I/O? How are callbacks defined (to handle the eventual result)? You need to understand coroutines, futures and tasks.
  25. CORE CONCEPT #2 COROUTINES (Are FUN!)

  26. Coroutines are generators

    They may be suspended (yield from)
    They 'yield from' other objects
    At the end of the chain is an object that returns a result or raises an exception
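    The chain described above can be seen with plain generators, no asyncio required; a minimal sketch with invented names:

    ```python
    def inner():
        # At the end of the chain: a generator that suspends once and
        # then returns a result (carried by StopIteration).
        yield "suspended in inner"
        return "final result"

    def outer():
        # 'yield from' delegates to inner: outer is suspended wherever
        # inner is suspended, and captures inner's return value.
        result = yield from inner()
        return "outer saw: " + result

    gen = outer()
    first = next(gen)        # the suspension point inside inner
    try:
        next(gen)
    except StopIteration as exc:
        final = exc.value    # the value returned at the end of the chain
    ```

    asyncio's coroutines work the same way: suspension points propagate up the yield from chain until a result or exception comes back.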
  27. @asyncio.coroutine

      def handle_request(self, message, payload):
          """Handle an incoming HTTP request."""
          response_code = 405  # Method Not Allowed
          response_data = None
          if message.method == 'POST':
              try:
                  raw_data = yield from payload.read()
                  response_data = yield from self.process_data(raw_data)
                  response_code = 200  # OK
              except Exception as ex:
                  # Log all errors
                  log.error(ex)
                  response_code = 500  # Internal Server Error
          # etc...
          return response
  28. BUT WHAT ABOUT CALLBACKS? How do I handle the result

    of a coroutine?
  29. CORE CONCEPTS #3 & #4 FUTURES AND TASKS (Are also

    FUN!)
  30. def handle_resolved_future(future):

          """
          This function is a callback. Its only argument is the
          resolved future whose result it logs.
          """
          log.info(future.result())

      # Instantiate the future we're going to use to represent the
      # as-yet unknown result.
      my_future = asyncio.Future()

      # Add the callback to the list of things to do when the
      # result is known (the future is resolved).
      my_future.add_done_callback(handle_resolved_future)

    (Time passes)

      # in some coroutine that has the Future referenced
      my_future.set_result('A result set sometime later!')
  31. def handle_resolved_task(task):

          """
          This function is a callback. Its only argument is the
          resolved task whose result it logs.
          """
          log.info(task.result())

      task = asyncio.Task(slow_coroutine_operation())
      task.add_done_callback(handle_resolved_task)
      loop = asyncio.get_event_loop()
      try:
          loop.run_until_complete(task)
      finally:
          loop.close()

    No need to resolve the task in a coroutine!
  32. FIRST CLASS FUNCTIONS

      my_future.add_done_callback(handle_resolved_future)

    FIRST CLASS FUNCTION CALLS

      add_generic_callbacks_to(my_future_or_task)
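    The slide only names add_generic_callbacks_to; one plausible sketch of such a helper (the helper body and log_result are invented for illustration), exploiting the fact that Futures and Tasks share the same callback interface:

    ```python
    import asyncio

    results = []

    def log_result(future):
        # A callback: receives the resolved future/task as its only argument.
        results.append(future.result())

    def add_generic_callbacks_to(future_or_task):
        # Futures and Tasks expose the same add_done_callback interface,
        # so one helper can attach a standard set of callbacks to either.
        future_or_task.add_done_callback(log_result)
        return future_or_task

    loop = asyncio.new_event_loop()
    fut = add_generic_callbacks_to(loop.create_future())
    fut.set_result("hello")
    loop.run_until_complete(fut)  # runs the pending callbacks, then returns
    loop.close()
    ```

    Because callbacks are first-class functions, the same helper works unchanged whether it is handed a Future or a Task.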
  33. RECAP... (THE STORY SO FAR)

  34. A DHT EXAMPLE HASHING, DISTANCE AND LOOKUPS

  35. A CLOCK FACE OF NODES

  36. NODE ID IS DERIVED FROM A HASH AND INDICATES ITS

    LOCATION
  37. ITEMS ARE A KEY / VALUE PAIR

      >>> from hashlib import sha512
      >>> item = {
      ...     'my_key': 'Some value I want to store'
      ... }
      >>> sha512('my_key').hexdigest()
      '176b1c65a58c69bb83cf0f9e06695c4094bc35e69f2576464a027fa52fa53a7ab35c2b4a39203aff98606aed641f45abbc0d39d2be0723f44cc04e9b3e7e0f87'
  38. AARDVARK BELONGS...

  39. ... UNDER "A"

  40. BUT, ZEBRA BELONGS...

  41. ... UNDER "Z"

  42. TRACKING VIA THE ROUTING TABLE

  43. INTERACTIONS GIVE TRACKING DATA (ID, IP address and port etc...)

  44. PEERS STORED IN FIXED SIZE BUCKETS
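    A toy sketch of the fixed-size bucket idea (Kademlia-style; class name, capacity and eviction policy here are illustrative, not the talk's actual code):

    ```python
    K = 3  # bucket capacity for this toy; Kademlia traditionally uses k = 20

    class KBucket:
        """A fixed-size, oldest-first list of known peers."""

        def __init__(self, k=K):
            self.peers = []
            self.k = k

        def add(self, peer):
            if peer in self.peers:
                # A known peer: move it to the most-recently-seen end.
                self.peers.remove(peer)
                self.peers.append(peer)
            elif len(self.peers) < self.k:
                self.peers.append(peer)
            # else: bucket is full. A real DHT pings the least recently
            # seen peer and evicts it only if it is unresponsive (elided).

    bucket = KBucket()
    for peer in ["a", "b", "c", "d"]:
        bucket.add(peer)
    # The bucket keeps the three peers it saw first; "d" is not added.
    ```

    Fixed-size buckets bound memory per region of the key space and favour long-lived, proven peers over newcomers.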

  45. SIMPLE RULES

    For the purposes of housekeeping:
    Reply with a value or X closest peers
    Ignore unresponsive peers
    Refresh the Routing Table
    Re-publish items
    etc...
  46. GET() & SET() REQUIRE A LOOKUP. All interactions are asynchronous.

    Lookups are also parallel (concurrent).
  47. RECURSIVE LOOKUP

  48. SIX DEGREES OF SEPARATION

  49. ASK CLOSEST KNOWN PEERS

  50. THEY REPLY WITH CLOSER PEERS

  51. THEY REPLY WITH THE TARGET
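    The three steps above (ask the closest known peers, follow their replies to closer peers, stop when nobody is closer) can be sketched as a synchronous toy lookup; the peer IDs, the network dict and the ask callable are all invented for illustration, and the real code is asynchronous:

    ```python
    def xor_distance(a, b):
        # Kademlia-style metric: XOR the two IDs and read the result
        # as an integer. Smaller means "closer".
        return a ^ b

    def lookup(target, start_peer, ask):
        """Iteratively chase closer peers until none improves on the best.

        ask(peer, target) stands in for a network call and returns
        the peers that `peer` knows about.
        """
        best = start_peer
        while True:
            candidates = ask(best, target)
            closer = min(candidates, key=lambda p: xor_distance(p, target),
                         default=best)
            if xor_distance(closer, target) >= xor_distance(best, target):
                return best  # converged: nobody knows anyone closer
            best = closer

    # Toy "network": each peer ID maps to the peer IDs it knows about.
    network = {1: [4, 9], 4: [9, 12], 9: [12], 12: []}
    found = lookup(13, 1, lambda peer, target: network[peer])
    # found == 12: the reachable peer whose ID is closest to 13 (12 ^ 13 == 1)
    ```

    Because each hop at least halves the distance to the target, lookups finish in O(log n) steps - the "six degrees of separation" effect.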

  52. GET() & SET() REQUIRE A LOOKUP. All interactions are asynchronous.

    Lookups are also parallel (concurrent).
  53. LOOKUP IS A FUTURE

      class Lookup(asyncio.Future):
          """
          Encapsulates a lookup in the DHT given a particular target
          key and message type. Will resolve when a result is found
          or errback otherwise.
          """

          def __init__(self, key, message_type, node, event_loop):
              """
              key - sha512 of target key.
              message_type - class to create inter-node messages.
              node - the local node in the DHT.
              event_loop - the event loop.
              """
              ... etc...
  54. LOOKUP IS A FUTURE

      my_lookup = Lookup(key, FindValue, my_node, my_event_loop)

      def got_result(lookup):
          """Naive callback"""
          result = lookup.result()
          if isinstance(lookup.message_type, FindValue):
              for remote_node in result:
                  # result is a list of closest nodes to "key".
                  # PUT the value at these nodes.
                  ... etc...
          else:
              # result is a value stored at the location of "key".
              ... etc...

      my_lookup.add_done_callback(got_result)
  55. WHAT ABOUT NETWORKING?

  56. CORE CONCEPTS #5 & #6 TRANSPORTS AND PROTOCOLS (Are also

    a lot of FUN!)
  57. TRANSPORTS

  58. PROTOCOLS

  59. class NetstringProtocol(asyncio.Protocol):

          """
          http://cr.yp.to/proto/netstrings.txt
          """

          def data_received(self, data):
              """
              Called whenever the local node receives data from the
              remote peer.
              """
              self.__data = data
              try:
                  while self.__data:
                      if self._reader_state == DATA:
                          self.handle_data()
                      elif self._reader_state == COMMA:
                          self.handle_comma()
                      elif self._reader_state == LENGTH:
                          self.handle_length()
                      else:
                          msg = 'Invalid Netstring mode'
                          raise RuntimeError(msg)
              except NetstringParseError:
                  self.transport.close()
  60. FINAL THOUGHTS...

    Twisted..?
    100% unit test coverage
    DHT < 1000 LOC
    I/O vs CPU bound
  61. FIN github.com/ntoll/drogulus twitter.com/ntoll

  62. QUESTIONS..? github.com/ntoll/drogulus twitter.com/ntoll