Slide 1

Slide 1 text

LESSONS LEARNED WITH ASYNCIO (“LOOK MA, I WROTE A DISTRIBUTED HASH TABLE!”) / Nicholas H. Tollervey @ntoll

Slide 2

Slide 2 text

WHAT DOES ASYNCIO DO..?

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

A PROBLEM CLEARLY STATED: Messages arrive and depart via the network at unpredictable times; asyncio lets you deal with such interactions concurrently.

Slide 6

Slide 6 text

WHAT IS A DISTRIBUTED HASH TABLE?

Slide 7

Slide 7 text

HASH TABLE = DICT (IN PYTHON)

```python
>>> home = {}
>>> home['ntoll'] = 'Towcester'
>>> home['voidspace'] = 'Bugbrooke'
>>> home['pinner'] = 'Coventry'
>>> home
{'ntoll': 'Towcester', 'voidspace': 'Bugbrooke', 'pinner': 'Coventry'}
>>> home['ntoll']
'Towcester'
```

A very simple key / value data store.

Slide 8

Slide 8 text

DISTRIBUTED

Slide 9

Slide 9 text

DECENTRALIZED

Slide 10

Slide 10 text

A DISTRIBUTED HASH TABLE (DHT) IS A PEER-TO-PEER KEY / VALUE DATA STORE

Slide 11

Slide 11 text

HOW?

Slide 12

Slide 12 text

CORE CONCEPT #1 THE EVENT LOOP

Slide 13

Slide 13 text

(Based on real events - participants have been replaced by unreasonably happy actors)

Slide 14

Slide 14 text

IMPORTANT! PEP 3156 states that callbacks are... “[...] strictly serialized: one callback must finish before the next one will be called. This is an important guarantee: when two or more callbacks use or modify shared state, each callback is guaranteed that while it is running, the shared state isn't changed by another callback.”
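A minimal sketch of that guarantee, written against the current asyncio API (`asyncio.run`, `loop.call_soon`) rather than the Python 3.4-era calls shown later in this deck. Three plain callbacks each do a read-modify-write on shared state; since the loop runs them one at a time, in the order they were scheduled, none of them can see the state change halfway through. The names `increment`, `counter` and `trace` are illustrative.

```python
import asyncio

# Shared state touched by several callbacks (illustrative names).
counter = {"value": 0}
trace = []

def increment(name):
    """A callback that reads, modifies and writes shared state.

    The loop runs callbacks one at a time, so no other callback can
    run between the read and the write below.
    """
    current = counter["value"]      # read
    counter["value"] = current + 1  # write
    trace.append(name)

async def main():
    loop = asyncio.get_running_loop()
    for name in ("a", "b", "c"):
        loop.call_soon(increment, name)  # run in registration order
    await asyncio.sleep(0.01)  # yield to the loop so the callbacks run

asyncio.run(main())
print(counter["value"], trace)  # 3 ['a', 'b', 'c']
```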

Slide 15

Slide 15 text

HANG ON A MINUTE..? THAT DOESN'T SOUND VERY CONCURRENT!

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

CONCURRENT TASKS INTERFERE WITH SHARED RESOURCES
1. Task A reads a record.
2. Task B reads a record.
3. Both A and B change the retrieved data in different ways.
4. Task B writes its changes.
5. Task A writes its changes.
Task A overwrites the record containing task B's changes.
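The five steps above can be replayed in plain Python, with no concurrency at all, to show exactly where the update goes missing. This is a sketch: the two-field `record` and the `read` / `write` helpers are made up for illustration, and the interleaving is fixed by hand rather than chosen by a scheduler.

```python
# The shared record: think of a row in a database (illustrative data).
record = {"views": 0, "likes": 0}

def read():
    """Return a private copy of the record."""
    return dict(record)

def write(new_version):
    """Overwrite the whole record with new_version."""
    record.clear()
    record.update(new_version)

a = read()            # 1. Task A reads the record.
b = read()            # 2. Task B reads the record.
a["views"] += 1       # 3. A and B change their copies
b["likes"] += 1       #    in different ways.
write(b)              # 4. Task B writes its changes.
write(a)              # 5. Task A writes its changes.

# A's copy was read before B wrote, so B's "like" has been lost.
print(record)  # {'views': 1, 'likes': 0}
```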

Slide 18

Slide 18 text

ACT SYNCHRONOUSLY TO AVOID INTERFERENCE!
1. First do A, then B followed by C (and so on).
2. Easy to understand and deterministic.
3. What happens if A needs to wait for something, for example, a reply from a machine on the network?
4. The program waits until A's network call completes. It can't get on with other stuff while waiting for A. :-(

Slide 19

Slide 19 text

WELCOME TO THE MOST IMPORTANT SLIDE OF THIS TALK The program does not wait for a reply from network calls before continuing. Programmers define callbacks to be run when the result of a network call is known. In the meantime the program continues to poll for and respond to other network related I/O events. Callbacks execute during the iteration of the event loop immediately after the expected network I/O event is detected.
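A small sketch of that pattern using the current asyncio API. There is no real network here: the hypothetical `fake_network_request` returns a future straight away and uses `loop.call_later` to resolve it a little later, standing in for a reply arriving. The callback is attached up front, the program carries on with other work, and the loop runs the callback as soon as the "reply" lands.

```python
import asyncio

events = []  # records what happened, in order

def on_reply(future):
    """Callback: runs once the (simulated) reply is known."""
    events.append("callback: " + future.result())

def fake_network_request(loop, delay):
    """Hypothetical stand-in for a network call.

    Returns a future immediately; the loop resolves it after `delay`
    seconds, as if a reply had arrived from a remote machine.
    """
    future = loop.create_future()
    loop.call_later(delay, future.set_result, "reply")
    return future

async def other_work():
    """Work the program gets on with while the reply is outstanding."""
    for i in range(3):
        events.append(f"other work {i}")
        await asyncio.sleep(0.05)

async def main():
    loop = asyncio.get_running_loop()
    reply = fake_network_request(loop, 0.025)
    reply.add_done_callback(on_reply)  # what to do when the result is known
    await other_work()                 # no waiting for the reply here
    await reply

asyncio.run(main())
print(events)
# ['other work 0', 'callback: reply', 'other work 1', 'other work 2']
```

The callback fires between the first and second pieces of other work, even though nothing ever blocked waiting for the reply.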

Slide 20

Slide 20 text

CONFUSED..? DON'T BE, IT'S EXACTLY HOW HUMANS THINK ABOUT CONCURRENCY.

Slide 21

Slide 21 text

We make plans: when the washing machine finishes, take the clothes and hang them out to dry.

Slide 22

Slide 22 text

As humans we work on concurrent tasks (like preparing breakfast) in a similar non-blocking manner.

Slide 23

Slide 23 text

asyncio avoids potentially confusing and complicated “threaded” concurrency while retaining the benefits of strictly sequential code.

Slide 24

Slide 24 text

QUESTIONS:
- How are asynchronous concurrent tasks created?
- How do such tasks pause while waiting for non-blocking network-based I/O?
- How are callbacks defined (to handle the eventual result)?

You need to understand coroutines, futures and tasks.

Slide 25

Slide 25 text

CORE CONCEPT #2 COROUTINES (Are FUN!)

Slide 26

Slide 26 text

- Coroutines are generators
- They may be suspended (yield from)
- They 'yield from' other objects
- At the end of the chain is an object that returns a result or raises an exception
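The suspension and chaining described above are ordinary generator behaviour, so they can be shown without asyncio at all. In this sketch (the generator names are hypothetical) the outer generator delegates with `yield from`, the value yielded by the innermost generator surfaces at the top, and the value passed in with `send()` travels back down. Modern Python (3.5+) spells coroutines with `async def` and `await` instead; the `@asyncio.coroutine` decorator used on these slides was removed in Python 3.11.

```python
def read_payload():
    """Innermost generator: suspends to ask for data, then returns a result."""
    data = yield "waiting for data"   # suspended here until .send()
    return data.upper()

def handle_request():
    """Outer generator: delegates to read_payload() with 'yield from'."""
    payload = yield from read_payload()  # the inner return value lands here
    return "handled " + payload

# Drive the chain by hand, playing the part of the event loop.
coro = handle_request()
first = next(coro)       # the inner yield surfaces through the chain
print(first)             # waiting for data
try:
    coro.send("hello")   # resume the innermost generator
except StopIteration as stop:
    result = stop.value  # the outer generator's return value
print(result)            # handled HELLO
```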

Slide 27

Slide 27 text

```python
@asyncio.coroutine
def handle_request(self, message, payload):
    """
    Handle an incoming HTTP request.
    """
    response_code = 405  # Method Not Allowed
    response_data = None
    if message.method == 'POST':
        try:
            raw_data = yield from payload.read()
            response_data = yield from self.process_data(raw_data)
            response_code = 200  # OK
        except Exception as ex:
            # Log all errors
            log.error(ex)
            response_code = 500  # Internal Server Error
    # etc...
    return response
```
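For comparison, here is a runnable sketch of the same handler written with `async def` / `await` (Python 3.5+). The `FakeMessage`, `FakePayload`, `Handler` and `process_data` pieces are hypothetical stand-ins for the HTTP library objects the slide assumes, and returning a `(code, data)` tuple is also an assumption, since the slide elides how `response` is built.

```python
import asyncio
import logging

log = logging.getLogger(__name__)

class FakePayload:
    """Hypothetical stand-in for an HTTP request body stream."""
    def __init__(self, body):
        self._body = body

    async def read(self):
        await asyncio.sleep(0)  # pretend to wait on the network
        return self._body

class FakeMessage:
    """Hypothetical stand-in for the parsed request line and headers."""
    def __init__(self, method):
        self.method = method

class Handler:
    async def process_data(self, raw_data):
        """Hypothetical processing step: reverses the body."""
        await asyncio.sleep(0)
        if not raw_data:
            raise ValueError("empty body")
        return raw_data[::-1]

    async def handle_request(self, message, payload):
        """Handle an incoming HTTP request (async/await version)."""
        response_code = 405  # Method Not Allowed
        response_data = None
        if message.method == 'POST':
            try:
                raw_data = await payload.read()
                response_data = await self.process_data(raw_data)
                response_code = 200  # OK
            except Exception as ex:
                log.error(ex)  # Log all errors
                response_code = 500  # Internal Server Error
        # Assumption: the slide elides how the response is built, so a
        # simple (code, data) tuple is returned here.
        return response_code, response_data

handler = Handler()
print(asyncio.run(handler.handle_request(FakeMessage('POST'), FakePayload(b'abc'))))
# (200, b'cba')
print(asyncio.run(handler.handle_request(FakeMessage('GET'), FakePayload(b'abc'))))
# (405, None)
```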

Slide 28

Slide 28 text

BUT WHAT ABOUT CALLBACKS? How do I handle the result of a coroutine?

Slide 29

Slide 29 text

CORE CONCEPTS #3 & #4 FUTURES AND TASKS (Are also FUN!)

Slide 30

Slide 30 text

```python
def handle_resolved_future(future):
    """
    This function is a callback. Its only argument is the resolved
    future whose result it logs.
    """
    log.info(future.result())

# Instantiate the future we're going to use to represent the
# as-yet unknown result.
my_future = asyncio.Future()

# Add the callback to the list of things to do when the
# result is known (the future is resolved).
my_future.add_done_callback(handle_resolved_future)
```

(Time passes)

```python
# in some coroutine that has the Future referenced
my_future.set_result('A result set sometime later!')
```
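A runnable sketch of the same future-plus-callback idea on current Python. It uses `loop.create_future()`, which the asyncio documentation recommends over calling `asyncio.Future()` directly, and a hypothetical `resolve_later` coroutine plays the part of "some coroutine that has the Future referenced". The `results` list exists only so the outcome can be checked.

```python
import asyncio
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)

results = []  # collected so the example can be checked

def handle_resolved_future(future):
    """Callback: its only argument is the resolved future."""
    results.append(future.result())
    log.info(future.result())

async def resolve_later(future, delay):
    """Hypothetical coroutine that has the Future referenced."""
    await asyncio.sleep(delay)  # (time passes)
    future.set_result('A result set sometime later!')

async def main():
    loop = asyncio.get_running_loop()
    my_future = loop.create_future()  # the as-yet unknown result
    my_future.add_done_callback(handle_resolved_future)
    resolver = asyncio.create_task(resolve_later(my_future, 0.01))
    value = await my_future
    await resolver
    return value

value = asyncio.run(main())
print(value)
```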

Slide 31

Slide 31 text

```python
def handle_resolved_task(task):
    """
    This function is a callback. Its only argument is the resolved
    task whose result it logs.
    """
    log.info(task.result())

task = asyncio.Task(slow_coroutine_operation())
task.add_done_callback(handle_resolved_task)

loop = asyncio.get_event_loop()
try:
    loop.run_until_complete(task)
finally:
    loop.close()
```

No need to resolve the task in a coroutine!
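The equivalent on current Python, as a sketch: `asyncio.create_task` replaces direct `asyncio.Task(...)` instantiation (which the asyncio docs discourage) and `asyncio.run` creates and closes the loop. `slow_coroutine_operation` is a hypothetical stand-in that just sleeps and returns 42; the task resolves itself when that coroutine returns.

```python
import asyncio

results = []  # collected so the example can be checked

async def slow_coroutine_operation():
    """Hypothetical slow operation standing in for real network I/O."""
    await asyncio.sleep(0.01)
    return 42

def handle_resolved_task(task):
    """Callback: its only argument is the finished task."""
    results.append(task.result())

async def main():
    task = asyncio.create_task(slow_coroutine_operation())
    task.add_done_callback(handle_resolved_task)
    return await task  # resolved automatically when the coroutine returns

value = asyncio.run(main())
print(value, results)  # 42 [42]
```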

Slide 32

Slide 32 text

FIRST CLASS FUNCTIONS

```python
my_future.add_done_callback(handle_resolved_future)
```

FIRST CLASS FUNCTION CALLS

```python
add_generic_callbacks_to(my_future_or_task)
```
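Because callbacks are just functions, a helper can take a future or task and wire up standard behaviour in one call. The `add_generic_callbacks_to` name comes from the slide, but its body here is a guess: a hypothetical sketch that attaches a generic `log_result` callback (also hypothetical) to anything with `add_done_callback`, which covers both futures and tasks since a Task is a Future subclass.

```python
import asyncio

log_lines = []  # collected so the example can be checked

def log_result(fut):
    """Hypothetical generic callback: record the outcome of any future."""
    if fut.cancelled():
        log_lines.append("cancelled")
    elif fut.exception() is not None:
        log_lines.append(f"error: {fut.exception()!r}")
    else:
        log_lines.append(f"result: {fut.result()!r}")

def add_generic_callbacks_to(future_or_task, callbacks=(log_result,)):
    """Hypothetical helper: attach every callback in `callbacks`.

    Works for Futures and Tasks alike, and returns its argument so the
    call can wrap the expression that creates the future or task.
    """
    for callback in callbacks:
        future_or_task.add_done_callback(callback)
    return future_or_task

async def fails():
    raise ValueError("boom")

async def main():
    loop = asyncio.get_running_loop()
    fut = add_generic_callbacks_to(loop.create_future())
    task = add_generic_callbacks_to(asyncio.create_task(fails()))
    fut.set_result("ok")
    await asyncio.gather(task, return_exceptions=True)
    await asyncio.sleep(0)  # let any remaining callbacks run

asyncio.run(main())
print(log_lines)
```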

Slide 33

Slide 33 text

RECAP... (THE STORY SO FAR)

Slide 34

Slide 34 text

A DHT EXAMPLE: HASHING, DISTANCE AND LOOKUPS

Slide 35

Slide 35 text

A CLOCK FACE OF NODES

Slide 36

Slide 36 text

NODE ID IS DERIVED FROM A HASH AND INDICATES ITS LOCATION
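A minimal sketch of the clock-face picture, assuming node IDs are SHA-512 hashes read as 512-bit integers. What drogulus actually hashes to get an ID isn't stated on these slides; `node_id`, `clock_position` and the sample byte strings are hypothetical. Dividing the ID space into twelve equal slices shows which "hour" of the circle a node sits at.

```python
from hashlib import sha512

ID_SPACE = 2 ** 512  # SHA-512 produces 512-bit values

def node_id(identity: bytes) -> int:
    """Hypothetical: a node ID is the SHA-512 hash of some identifying
    bytes (for example a public key), read as an integer."""
    return int(sha512(identity).hexdigest(), 16)

def clock_position(identifier: int, hours: int = 12) -> int:
    """Which of `hours` equal slices of the ID space (the clock face)
    the identifier falls in, counting from 0."""
    return identifier * hours // ID_SPACE

for name in (b"node-a", b"node-b", b"node-c"):
    ident = node_id(name)
    print(name.decode(), hex(ident)[:12], "-> hour", clock_position(ident))
```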

Slide 37

Slide 37 text

ITEMS ARE A KEY / VALUE PAIR

```python
>>> from hashlib import sha512
>>> item = {
...     'my_key': 'Some value I want to store'
... }
>>> sha512('my_key'.encode('utf-8')).hexdigest()
'176b1c65a58c69bb83cf0f9e06695c4094bc35e69f2576464a027fa52fa53a7ab35c2b4a39203aff98606aed641f45abbc0d39d2be0723f44cc04e9b3e7e0f87'
```

Slide 38

Slide 38 text

AARDVARK BELONGS...

Slide 39

Slide 39 text

... UNDER "A"

Slide 40

Slide 40 text

BUT, ZEBRA BELONGS...

Slide 41

Slide 41 text

... UNDER "Z"
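With hashing, "belonging" is positional rather than alphabetical: a key lives at whichever node's ID is closest to the key's hash, so "aardvark" ends up wherever its SHA-512 hash happens to land. This sketch assumes the simplest clock-face metric, the shortest way round the ring; many DHTs (Kademlia, for instance) use XOR distance instead, and the slides don't show which metric drogulus uses. The twelve evenly spaced `nodes` are hypothetical.

```python
from hashlib import sha512

RING = 2 ** 512  # size of the SHA-512 ID space

def key_id(key: str) -> int:
    """Position of a key on the ring: its SHA-512 hash as an integer."""
    return int(sha512(key.encode("utf-8")).hexdigest(), 16)

def ring_distance(a: int, b: int) -> int:
    """Shortest distance between two points on the ring, either way round."""
    d = abs(a - b) % RING
    return min(d, RING - d)

def responsible_node(key: str, node_ids):
    """Return the node ID closest to the key's hash (assumed metric)."""
    target = key_id(key)
    return min(node_ids, key=lambda n: ring_distance(n, target))

# Twelve hypothetical nodes spaced like the hours on a clock face.
nodes = [hour * RING // 12 for hour in range(12)]
for word in ("aardvark", "zebra"):
    print(word, "-> hour", nodes.index(responsible_node(word, nodes)))
```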

Slide 42

Slide 42 text

TRACKING VIA THE ROUTING TABLE

Slide 43

Slide 43 text

INTERACTIONS GIVE TRACKING DATA (ID, IP address and port etc...)

Slide 44

Slide 44 text

PEERS STORED IN FIXED SIZE BUCKETS
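A sketch of one fixed-size bucket, assuming peers are kept in least-recently-seen order. The eviction rule is an assumption: when the bucket is full a newcomer is simply turned away, which simplifies Kademlia's rule (Kademlia first pings the least recently seen peer and only evicts it if it fails to answer). `size=20` mirrors the value the Kademlia paper suggests for k; the `Bucket` class and its methods are hypothetical.

```python
from collections import OrderedDict

class Bucket:
    """Hypothetical fixed-size bucket of peers, least recently seen first."""

    def __init__(self, size=20):
        self.size = size
        self.peers = OrderedDict()  # node_id -> (ip, port)

    def add(self, node_id, address):
        """Record an interaction with a peer. Returns True if stored."""
        if node_id in self.peers:
            self.peers.move_to_end(node_id)  # now the most recently seen
            self.peers[node_id] = address
            return True
        if len(self.peers) < self.size:
            self.peers[node_id] = address
            return True
        return False  # full: keep the existing (long-lived) peers

    def remove(self, node_id):
        """Drop an unresponsive peer, freeing a slot."""
        self.peers.pop(node_id, None)

bucket = Bucket(size=2)
bucket.add(1, ("127.0.0.1", 9000))
bucket.add(2, ("127.0.0.1", 9001))
print(bucket.add(3, ("127.0.0.1", 9002)))  # False: bucket is full
```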

Slide 45

Slide 45 text

SIMPLE RULES
For the purposes of housekeeping:
- Reply with a value or X closest peers
- Ignore unresponsive peers
- Refresh the Routing Table
- Re-publish items
- etc...

Slide 46

Slide 46 text

GET() & SET() REQUIRE A LOOKUP. All interactions are asynchronous. Lookups are also parallel (concurrent).

Slide 47

Slide 47 text

RECURSIVE LOOKUP

Slide 48

Slide 48 text

SIX DEGREES OF SEPARATION

Slide 49

Slide 49 text

ASK CLOSEST KNOWN PEERS

Slide 50

Slide 50 text

THEY REPLY WITH CLOSER PEERS

Slide 51

Slide 51 text

THEY REPLY WITH THE TARGET
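Those three steps can be sketched as a coroutine that asks the closest known peers in parallel with `asyncio.gather` and repeats with whatever closer peers come back. Everything here is an assumption for illustration: a hard-coded toy `NETWORK`, small integer IDs with absolute difference as the distance, and the `ALPHA` and `K` values. The originating node drives every round itself; the Kademlia paper describes its "recursive" lookup the same way, with the initiator re-sending requests to the closer nodes it learns about. The slides don't show drogulus's own lookup code.

```python
import asyncio

# Hypothetical toy network: node ID -> IDs of the peers that node knows.
# Small integers and absolute difference stand in for real hashes and
# the DHT's distance metric, purely to keep the trace readable.
NETWORK = {
    1: [5, 9], 5: [1, 9, 12], 9: [12, 20], 12: [20, 27],
    20: [27, 31], 27: [31], 31: [27],
}
ALPHA = 2  # peers asked in parallel per round (assumed value)
K = 3      # peers returned per reply (assumed value)

def distance(a, b):
    return abs(a - b)

async def ask(peer, target):
    """Simulated request: the peer replies with its K closest known peers."""
    await asyncio.sleep(0.001)  # pretend to wait on the network
    return sorted(NETWORK[peer], key=lambda p: distance(p, target))[:K]

async def lookup(start, target):
    """Ask the closest known peers, in parallel, until the target turns up."""
    known = set(NETWORK[start])
    asked = {start}
    while target not in known:
        candidates = sorted(known - asked, key=lambda p: distance(p, target))
        candidates = candidates[:ALPHA]
        if not candidates:
            raise LookupError(f"{target} not found")
        asked.update(candidates)
        # All requests in this round are in flight at the same time.
        replies = await asyncio.gather(*(ask(p, target) for p in candidates))
        for peers in replies:
            known.update(peers)
    return sorted(asked - {start})  # the peers contacted along the way

print(asyncio.run(lookup(1, 31)))  # [5, 9, 12, 20]
```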

Slide 52

Slide 52 text

GET() & SET() REQUIRE A LOOKUP. All interactions are asynchronous. Lookups are also parallel (concurrent).

Slide 53

Slide 53 text

LOOKUP IS A FUTURE

```python
class Lookup(asyncio.Future):
    """
    Encapsulates a lookup in the DHT given a particular target key and
    message type. Will resolve when a result is found or errback
    otherwise.
    """

    def __init__(self, key, message_type, node, event_loop):
        """
        key - sha512 of target key.
        message_type - class to create inter-node messages.
        node - the local node in the DHT.
        event_loop - the event loop.
        """
        ...  # etc...
```
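A toy version of the idea, as a sketch: subclassing `asyncio.Future` means a lookup can be awaited or given done callbacks like any other future. The body is hypothetical, since the slide elides the real one: `found()` and `failed()` are invented entry points, `FindValue` is an empty placeholder class, and `loop.call_later` fakes a peer's reply. Setting an exception is asyncio's counterpart to Twisted's errback.

```python
import asyncio

class FindValue:
    """Hypothetical placeholder for the message class named on the slides."""

class Lookup(asyncio.Future):
    """Toy lookup: a Future that resolves with a value or an exception."""

    def __init__(self, key, message_type, node, event_loop):
        super().__init__(loop=event_loop)
        self.key = key
        self.message_type = message_type
        self.node = node

    def found(self, value):
        """Hypothetical: resolve the lookup when a peer returns the result."""
        if not self.done():
            self.set_result(value)

    def failed(self, reason):
        """Hypothetical: fail the lookup (asyncio's version of an errback)."""
        if not self.done():
            self.set_exception(LookupError(reason))

seen = []  # collected so the example can be checked

def got_result(lookup):
    """Done callback: inspect the finished lookup."""
    if lookup.exception() is None:
        seen.append((lookup.message_type.__name__, lookup.result()))

async def main(succeed=True):
    loop = asyncio.get_running_loop()
    lookup = Lookup('my_key', FindValue, None, loop)
    lookup.add_done_callback(got_result)
    if succeed:  # fake a peer replying a little later
        loop.call_later(0.01, lookup.found, 'Some value I want to store')
    else:
        loop.call_later(0.01, lookup.failed, 'no peers replied')
    return await lookup

print(asyncio.run(main()))  # Some value I want to store
```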

Slide 54

Slide 54 text

LOOKUP IS A FUTURE

```python
my_lookup = Lookup(key, FindValue, my_node, my_event_loop)

def got_result(lookup):
    """
    Naive callback
    """
    result = lookup.result()
    # message_type is a class, so compare it directly (isinstance would
    # always be False here).
    if lookup.message_type is FindValue:
        # result is a value stored at the location of "key".
        ...  # etc...
    else:
        for remote_node in result:
            # result is a list of closest nodes to "key".
            # PUT the value at these nodes.
            ...  # etc...

my_lookup.add_done_callback(got_result)
```

Slide 55

Slide 55 text

WHAT ABOUT NETWORKING?

Slide 56

Slide 56 text

CORE CONCEPTS #5 & #6 TRANSPORTS AND PROTOCOLS (Are also a lot of FUN!)

Slide 57

Slide 57 text

TRANSPORTS

Slide 58

Slide 58 text

PROTOCOLS

Slide 59

Slide 59 text

```python
class NetstringProtocol(asyncio.Protocol):
    """
    http://cr.yp.to/proto/netstrings.txt
    """

    def data_received(self, data):
        """
        Called whenever the local node receives data from the remote peer.
        """
        self.__data = data
        try:
            while self.__data:
                if self._reader_state == DATA:
                    self.handle_data()
                elif self._reader_state == COMMA:
                    self.handle_comma()
                elif self._reader_state == LENGTH:
                    self.handle_length()
                else:
                    msg = 'Invalid Netstring mode'
                    raise RuntimeError(msg)
        except NetstringParseError:
            self.transport.close()
```
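For experimenting with the protocol side without the elided `handle_data` / `handle_comma` / `handle_length` methods, here is a simplified, self-contained sketch. Instead of the state machine above it buffers incoming bytes and parses every complete netstring (format `b'<length>:<data>,'`, e.g. `b'5:hello,'`) out of the buffer. `SimpleNetstringProtocol`, `string_received`, `MAX_LENGTH` and `FakeTransport` are hypothetical, and the sketch does not attempt full spec compliance. Because `data_received` is just a method, it can be driven directly with a fake transport, no sockets required.

```python
import asyncio

class NetstringParseError(ValueError):
    """Raised when incoming bytes are not a valid netstring."""

class SimpleNetstringProtocol(asyncio.Protocol):
    """Hypothetical buffering netstring reader."""

    MAX_LENGTH = 99999  # assumed limit on a single netstring

    def __init__(self):
        self.buffer = b""
        self.strings = []  # received payloads, for demonstration
        self.transport = None

    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        self.buffer += data
        try:
            while True:
                colon = self.buffer.find(b":")
                if colon == -1:
                    if len(self.buffer) > len(str(self.MAX_LENGTH)):
                        raise NetstringParseError("length prefix too long")
                    return  # wait for more data
                digits = self.buffer[:colon]
                if not digits.isdigit():
                    raise NetstringParseError("invalid length prefix")
                length = int(digits)
                if length > self.MAX_LENGTH:
                    raise NetstringParseError("netstring too long")
                end = colon + 1 + length
                if len(self.buffer) < end + 1:
                    return  # wait for the rest of the payload and comma
                if self.buffer[end:end + 1] != b",":
                    raise NetstringParseError("missing trailing comma")
                self.string_received(self.buffer[colon + 1:end])
                self.buffer = self.buffer[end + 1:]
        except NetstringParseError:
            self.transport.close()

    def string_received(self, payload):
        """Hypothetical hook: called once per complete netstring."""
        self.strings.append(payload)

class FakeTransport:
    """Stands in for a real asyncio transport in this demonstration."""
    closed = False

    def close(self):
        self.closed = True

proto = SimpleNetstringProtocol()
proto.connection_made(FakeTransport())
proto.data_received(b"5:hello,3:a")  # one complete netstring, one partial
proto.data_received(b"bc,")          # the rest of the second one
print(proto.strings)  # [b'hello', b'abc']
```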

Slide 60

Slide 60 text

FINAL THOUGHTS...
- Twisted..?
- 100% unit test coverage
- DHT < 1000 lines of code
- IO vs CPU bound

Slide 61

Slide 61 text

FIN github.com/ntoll/drogulus twitter.com/ntoll

Slide 62

Slide 62 text

QUESTIONS..? github.com/ntoll/drogulus twitter.com/ntoll