Slide 1

Slide 1 text

Connecting Applications to Distributed Systems Sean Cribbs @seancribbs Just Open a Socket Tuesday, May 14, 13

Slide 2

Slide 2 text

http://www.quora.com/What-is-the-background-of-the-just-open-a-socket-meme

Slide 3

Slide 3 text


Slide 4

Slide 4 text


Slide 5

Slide 5 text


Slide 6

Slide 6 text


Slide 7

Slide 7 text


Slide 8

Slide 8 text

TCP slow-start, window scaling, fast-retransmit, Nagle’s algorithm, delayed ACK, MTU, incast, exponential backoff, connection states, thread safety, starvation, resource control, deadlock

Slide 9

Slide 9 text

“The more you know, you know you don't know shit” - Ben Folds, “Bastard”

Slide 10

Slide 10 text

“The more you know, you know you don't know shit” - Ben Folds, “Bastard” “...so why you gotta act like you know, when you don’t know?”

Slide 11

Slide 11 text

http://thecatchandthehatch.com/pages/river-resources/ Islands in the Stream

Slide 12

Slide 12 text

Streaming Ops client server stream_me

Slide 13

Slide 13 text

Streaming Ops client server stream_me result result result done

Slide 14

Slide 14 text

Streaming Ops client server list-keys & MapReduce stream_me result result result done

Slide 15

Slide 15 text

Stream in Ruby

    # Request a streamed operation
    client.stream_something do |result|
      process(result)
    end

Slide 16

Slide 16 text

Stream in Ruby

    # Request a streamed operation
    client.stream_something do |result|
      process(result)
    end

    # Stream via curb
    if block_given?
      curl.on_body {|c| yield c; c.size }
    else
      curl.on_body # Clear out the callback
    end
    curl.http(method) # Perform request

Slide 17

Slide 17 text

Curl::Err::MultiBadEasyHandle: Invalid easy handle

Slide 18

Slide 18 text

Curling Back

    def curl
      @curl ||= Curl::Easy.new
    end

Slide 19

Slide 19 text

Curling Back

    def curl
      @curl ||= Curl::Easy.new
    end

    def curl
      Thread.current[:curl] ||= Curl::Easy.new
    end

Slide 20

Slide 20 text


Slide 21

Slide 21 text

Re-entrant

"In computing, a computer program or subroutine is called reentrant if it can be interrupted in the middle of its execution and then safely called again ("re-entered") before its previous invocation's complete execution." - Wikipedia

Slide 22

Slide 22 text

Even in a single thread, I was concurrent. I had to remove the implicit “global state”.

Slide 23

Slide 23 text

Use multiple connections. Distributed systems were designed for this!

Slide 24

Slide 24 text

Ensure safety. Don’t let a connection be reused while processing.
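
One way to act on this, sketched in Ruby under some assumptions: a small pool hands each caller its own connection and takes it back only when the caller's block finishes, so a nested or concurrent call gets a different connection instead of the one still in use. `ConnectionPool`, `with_connection`, and the factory block are hypothetical names for illustration, not the API of any Riak client.

```ruby
require 'thread' # Queue; a no-op on modern Rubies where it is built in

# Hypothetical pool: names are illustrative, not taken from a real client library.
class ConnectionPool
  def initialize(size, &factory)
    @idle = Queue.new                  # thread-safe FIFO of idle connections
    size.times { @idle << factory.call }
  end

  # Check a connection out for the duration of the block. While the block
  # runs, the connection is not in the idle queue, so nobody else can get it.
  def with_connection
    conn = @idle.pop                   # blocks until a connection is free
    begin
      yield conn
    ensure
      @idle << conn                    # always hand it back, even on error
    end
  end
end
```

A streaming callback that issues another request simply calls `with_connection` again and receives a second connection. Note that nesting deeper than the pool size blocks forever in this sketch; a real pool would likely grow on demand or time out.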

Slide 25

Slide 25 text

Bin-packing

Slide 26

Slide 26 text

Problem
• Customer claims secondary indexes were slower on 1.2 by 20%
• Our benchmarks: inconclusive
• Customer uses Python and Protobuffs (2I over MapReduce)

Slide 27

Slide 27 text

Ryan Zezeski
DTrace showed most new work was in PBC interface

Call counts:
  dict:find/2                                3
  lists:keyreplace3/4                        3
  prim_inet:encode_opt_val/1                 3
  riak_api_pb_server:handle_info/2           3
  riak_api_pb_server:process_stream/5        3
  riak_index:parse_fields/1                  3
  riak_pipe:exec/2                           3
  sets:on_bucket/3                           3
  riak_kv_pb:decode/3                        6
  riak_kv_mapred_term:parse_request/1        9
  riak_index:parse_field/3                  12
  lists:foldl/3                             18
  string:'-to_lower/1-lc$^0/1-0-'/1         66
  dict:fetch/2                           23487
  dict:store_bkt_val/3                   23487
  riak_kv_pb:iolist/2                    70443
  protobuffs:encode_varint/2             93918

Total Time (ms):
  lists:keyreplace3/4                        0
  sets:on_bucket/3                           0
  riak_api_pb_server:process_stream/5        0
  lists:foldl/3                              0
  riak_index:parse_field/3                   0
  dict:find/2                                0
  riak_api_pb_server:handle_info/2           0
  riak_kv_mapred_term:parse_request/1        0
  riak_pipe:exec/2                           0
  string:'-to_lower/1-lc$^0/1-0-'/1          0
  riak_kv_pb:decode/3                        0
  riak_index:parse_fields/1                  1
  dict:store_bkt_val/3                     195
  dict:fetch/2                             224
  protobuffs:encode_varint/2               998
  riak_kv_pb:iolist/2                     5685

Slide 28

Slide 28 text

http://sidoxia.files.wordpress.com/2011/09/whack-a-mole.jpg Performance Whack-a-Mole

Slide 29

Slide 29 text

http://sidoxia.files.wordpress.com/2011/09/whack-a-mole.jpg Andrew Thompson Performance Whack-a-Mole

Slide 30

Slide 30 text

http://sidoxia.files.wordpress.com/2011/09/whack-a-mole.jpg Andrew Thompson Performance Whack-a-Mole

Slide 31

Slide 31 text

Mole #1: iolists
• Representations of character streams
• Lists (linked)
• Binaries (byte arrays/buffers)
• list_to_binary/1 binary_to_list/1 iolist_to_binary/1
• Code called iolist_to_binary dozens of times!

Slide 32

Slide 32 text

Mole #2: dict
• dict is a hash-map data structure used in the server code to track request state

    {dict,0,16,16,8,80,48,
     {[],[],[],[],[],[],[],[],
      [],[],[],[],[],[],[],[]},
     {{[],[],[],[],[],[],[],[],
       [],[],[],[],[],[],[],[]}}}

• Inefficient for small key counts! Updates require copying.

Slide 33

Slide 33 text

Mole #3: gen_tcp
• Server code called gen_tcp:send() MANY times per response.
• Each call took ~30ms, with very small payloads!

Slide 34

Slide 34 text

Eureka!
• MapReduce results were tiny, so TCP sends were tiny!
• Paid cost to call TCP stack with little data (and probably Nagle)

Slide 35

Slide 35 text

MTU = Maximum Transmission Unit
How many bytes can I send in a single frame?
Buffer outgoing messages until 1KB has accumulated or the message queue is empty.
Performance improved to pre-1.2 levels!
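
The buffering idea can be sketched as follows. This is a Ruby illustration of the technique, not the Erlang server code that was actually changed: `BufferedWriter` and its `flushes` counter are hypothetical, the 1024-byte limit is the 1KB figure from the slide, and the caller is assumed to call `flush` once it has no more messages queued.

```ruby
require 'stringio'

# Hypothetical helper: coalesces many tiny writes into fewer, larger ones.
class BufferedWriter
  attr_reader :flushes

  def initialize(io, limit = 1024)
    @io = io
    @limit = limit            # flush threshold in bytes (1KB, per the slide)
    @pending = String.new     # empty binary buffer
    @flushes = 0              # number of writes that reached the socket
  end

  # Queue a message; only touch the underlying IO once enough has piled up.
  def write(chunk)
    @pending << chunk.b
    flush if @pending.bytesize >= @limit
  end

  # Push out whatever is pending. Call this when the message queue is empty.
  def flush
    return if @pending.empty?
    @io.write(@pending)
    @pending = String.new
    @flushes += 1
  end
end

# Usage: 200 tiny results reach the IO in a couple of writes, not 200.
out = StringIO.new            # stands in for a TCP socket
writer = BufferedWriter.new(out)
200.times { writer.write("result\n") }
writer.flush                  # nothing left to send, so drain the remainder
```

The trade-off is a little latency on each small message in exchange for far fewer trips through the TCP stack.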

Slide 36

Slide 36 text

Understand the network stack. TCP works, if you know what you’re doing.

Slide 37

Slide 37 text

All Aboard the Failed Request!

Slide 38

Slide 38 text

Distributed systems fail in strange ways.

Slide 39

Slide 39 text

Failures
• Down node
• Slow node
• Inter-node delay
• Dropped packets
• Flappy links
• Incast
• Corruption
• Logical failures

Slide 40

Slide 40 text

It’s really hard for a client to tell the difference!

Slide 41

Slide 41 text

System/Net Errors R

Slide 42

Slide 42 text

System/Net Errors Refused R

Slide 43

Slide 43 text

System/Net Errors Refused Aborted R

Slide 44

Slide 44 text

System/Net Errors Refused Aborted Reset R

Slide 45

Slide 45 text

System/Net Errors Refused Aborted Reset Network unreachable R

Slide 46

Slide 46 text

System/Net Errors Refused Aborted Reset Network unreachable Network down R

Slide 47

Slide 47 text

System/Net Errors Refused Aborted Reset Network unreachable Network down Kernel/errno R

Slide 48

Slide 48 text

System/Net Errors Unexpected Results Refused Aborted Reset Network unreachable Network down Kernel/errno R

Slide 49

Slide 49 text

System/Net Errors Unexpected Results Refused Aborted Reset Network unreachable Network down Kernel/errno Not found R

Slide 50

Slide 50 text

System/Net Errors Unexpected Results Refused Aborted Reset Network unreachable Network down Kernel/errno Not found Quorum unmet R

Slide 51

Slide 51 text

System/Net Errors Unexpected Results Refused Aborted Reset Network unreachable Network down Kernel/errno Not found Quorum unmet Quorum errors R

Slide 52

Slide 52 text

System/Net Errors Unexpected Results Refused Aborted Reset Network unreachable Network down Kernel/errno Not found Quorum unmet Quorum errors Bad requests R

Slide 53

Slide 53 text

System/Net Errors Unexpected Results Refused Aborted Reset Network unreachable Network down Kernel/errno Not found Quorum unmet Quorum errors Bad requests Server-side code errored R

Slide 54

Slide 54 text

System/Net Errors Unexpected Results Refused Aborted Reset Network unreachable Network down Kernel/errno Not found Quorum unmet Quorum errors Bad requests Server-side code errored Timeout R

Slide 55

Slide 55 text

System/Net Errors Unexpected Results Refused Aborted Reset Network unreachable Network down Kernel/errno Not found Quorum unmet Quorum errors Bad requests Server-side code errored Timeout Retry R

Slide 56

Slide 56 text

System/Net Errors Unexpected Results Refused Aborted Reset Network unreachable Network down Kernel/errno Not found Quorum unmet Quorum errors Bad requests Server-side code errored Timeout Retry Retry if idem. R

Slide 57

Slide 57 text

System/Net Errors Unexpected Results Refused Aborted Reset Network unreachable Network down Kernel/errno Not found Quorum unmet Quorum errors Bad requests Server-side code errored Timeout Retry Retry if idem. R Retry later?

Slide 58

Slide 58 text

Prepare for failure. Distinguish between failure types and recover appropriately. Don’t retry indefinitely.
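
A minimal Ruby sketch of this advice, under stated assumptions: the mapping from error to action below is one plausible reading of the preceding diagram, not a reconstruction of it. `with_retries`, `RETRYABLE`, and the `base_delay` parameter are hypothetical names. The sketch assumes a refused connection is always safe to retry (the request was never sent), other network errors and timeouts are retried only for idempotent requests, and anything else is treated as a result to report rather than retry.

```ruby
require 'timeout'

# Errors that may be transient. Everything else propagates immediately.
RETRYABLE = [Errno::ECONNREFUSED, Errno::ECONNRESET, Errno::ECONNABORTED,
             Errno::ENETUNREACH, Errno::ENETDOWN, Timeout::Error].freeze

# Run the block, retrying transient failures a bounded number of times.
def with_retries(max_attempts: 3, idempotent: true, base_delay: 0.1)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue *RETRYABLE => e
    # Refused: the connection never opened, so nothing reached the server.
    # For the rest, the server may already have acted on the request.
    safe = idempotent || e.is_a?(Errno::ECONNREFUSED)
    raise unless safe && attempts < max_attempts   # give up: never loop forever
    sleep(base_delay * 2**(attempts - 1))          # exponential backoff
    retry
  end
end
```

A caller wraps each request, for example `with_retries(idempotent: false) { ... }` around a write that must not be applied twice.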

Slide 59

Slide 59 text

Spread the Load

Slide 60

Slide 60 text

client

Slide 61

Slide 61 text

client

Slide 62

Slide 62 text

client client

Slide 63

Slide 63 text

client client client

Slide 64

Slide 64 text

client client client

Slide 65

Slide 65 text

client proxy LB

Slide 66

Slide 66 text

client

Slide 67

Slide 67 text


Slide 68

Slide 68 text

Spread connections around. If you have more than one of something, use them all.
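
One simple way a client can use every node is round-robin selection, sketched here in Ruby. `NodeRing` and `next_node` are hypothetical names, and the node addresses in the usage lines are placeholders. A real client would likely also skip nodes that recently failed.

```ruby
# Hypothetical helper: hands out nodes in rotation so load spreads evenly.
class NodeRing
  def initialize(nodes)
    raise ArgumentError, "need at least one node" if nodes.empty?
    @nodes = nodes.dup.freeze
    @index = 0
    @lock = Mutex.new          # keeps the counter consistent across threads
  end

  # Return the next node in rotation, wrapping around at the end of the list.
  def next_node
    @lock.synchronize do
      node = @nodes[@index % @nodes.size]
      @index += 1
      node
    end
  end
end

# Usage: each new connection goes to the next node in the list.
ring = NodeRing.new(%w[node1:8087 node2:8087 node3:8087])
targets = Array.new(6) { ring.next_node }
```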

Slide 69

Slide 69 text

Summary
• Open multiple connections, protect them from concurrent use.
• Know the structure of the network stack.
• Categorize failures and decide whether and how to recover.
• Connect to as many nodes as you can.

Slide 70

Slide 70 text

On Passivity

Slide 71

Slide 71 text

The Observer Effect

Slide 72

Slide 72 text


Slide 73

Slide 73 text


Slide 74

Slide 74 text


Slide 75

Slide 75 text

Not here

Slide 76

Slide 76 text

Here!

Slide 77

Slide 77 text

Thanks! #clients #riconeast @seancribbs