Just Open a Socket - Connecting Applications to Distributed Systems

Client-server programming is a well-known discipline, as old as computer networks. Just connect a socket to the server and send some bytes back and forth, right?

Au contraire! Building reliable, robust client libraries and applications is actually quite difficult, and exposes a lot of classic distributed and concurrent programming problems. From understanding and manipulating the TCP/IP network stack, to multiplexing connections across worker threads, to handling partial failures, to juggling protocols and encodings, there are many different angles one must cover.

In this talk, we'll discuss how Basho has addressed these problems and others in our client libraries and server-side interfaces for Riak, and how being a good client means being a participant in the distributed system, rather than just a spectator.

Sean Cribbs

May 13, 2013

Transcript

  1. TCP slow-start, window scaling, fast-retransmit, Nagle’s algorithm, delayed ACK, MTU, incast, exponential backoff, connection states, thread safety, starvation, resource control, deadlock

    Tuesday, May 14, 13
  2. “The more you know, you know you don't know shit”

    - Ben Folds, “Bastard”
  3. “The more you know, you know you don't know shit”

    - Ben Folds, “Bastard”

    “...so why you gotta act like you know, when you don’t know?”
  4. Stream in Ruby

        # Request a streamed operation
        client.stream_something do |result|
          process(result)
        end
  5. Stream in Ruby

        # Request a streamed operation
        client.stream_something do |result|
          process(result)
        end

        # Stream via curb
        if block_given?
          curl.on_body {|c| yield c; c.size }
        else
          curl.on_body # Clear out the callback
        end
        curl.http(method) # Perform request
  6. Curling Back

        def curl
          @curl ||= Curl::Easy.new
        end

        def curl
          Thread.current[:curl] ||= Curl::Easy.new
        end
  7. Re-entrant

    "In computing, a computer program or subroutine is called reentrant if it can be interrupted in the middle of its execution and then safely called again ("re-entered") before its previous invocation's complete execution."

    Wikipedia
  8. Even in a single thread, I was concurrent. I had to remove the implicit “global state”.
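
One way to remove that implicit shared state is to check a handle out of a pool for the duration of each request, so a request issued from inside a streaming callback gets a different handle than the one still in use. The following is a minimal sketch under stated assumptions, not the Riak client's actual code: `Handle` is a hypothetical stand-in for something like `Curl::Easy`, and `HandlePool` and this `stream_something` are illustrative names.

```ruby
# Hypothetical stand-in for an HTTP handle such as Curl::Easy.
class Handle
  attr_accessor :callback
end

# A tiny pool: every request checks out its own handle, so no handle is
# shared between threads or between nested calls on the same thread.
class HandlePool
  def initialize
    @idle = []
    @lock = Mutex.new
  end

  # Yields a handle nobody else is using, then returns it to the pool.
  def take
    handle = @lock.synchronize { @idle.pop } || Handle.new
    begin
      yield handle
    ensure
      handle.callback = nil # clear per-request state before reuse
      @lock.synchronize { @idle.push(handle) }
    end
  end

  def idle_count
    @lock.synchronize { @idle.size }
  end
end

# Hypothetical streamed operation; the caller's block may itself issue
# another request while this one is still in progress.
def stream_something(pool)
  pool.take do |handle|
    yield "chunk", handle
  end
end

pool = HandlePool.new
stream_something(pool) do |_chunk, outer|
  stream_something(pool) do |_chunk, inner|
    puts "nested call reused the outer handle? #{outer.equal?(inner)}"
  end
end
```

The lock is held only while popping or pushing a handle, never while the caller's block runs, so nested calls cannot deadlock on it.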
  9. Problem

    • Customer claims secondary indexes were slower on 1.2 by 20%
    • Our benchmarks: inconclusive
    • Customer uses Python and Protobuffs (2I over MapReduce)
  10. Ryan Zezeski DTrace showed most new work was in PBC interface

        Call counts:
        dict:find/2                                3
        lists:keyreplace3/4                        3
        prim_inet:encode_opt_val/1                 3
        riak_api_pb_server:handle_info/2           3
        riak_api_pb_server:process_stream/5        3
        riak_index:parse_fields/1                  3
        riak_pipe:exec/2                           3
        sets:on_bucket/3                           3
        riak_kv_pb:decode/3                        6
        riak_kv_mapred_term:parse_request/1        9
        riak_index:parse_field/3                  12
        lists:foldl/3                             18
        string:'-to_lower/1-lc$^0/1-0-'/1         66
        dict:fetch/2                           23487
        dict:store_bkt_val/3                   23487
        riak_kv_pb:iolist/2                    70443
        protobuffs:encode_varint/2             93918

        Total Time (ms):
        lists:keyreplace3/4                        0
        sets:on_bucket/3                           0
        riak_api_pb_server:process_stream/5        0
        lists:foldl/3                              0
        riak_index:parse_field/3                   0
        dict:find/2                                0
        riak_api_pb_server:handle_info/2           0
        riak_kv_mapred_term:parse_request/1        0
        riak_pipe:exec/2                           0
        string:'-to_lower/1-lc$^0/1-0-'/1          0
        riak_kv_pb:decode/3                        0
        riak_index:parse_fields/1                  1
        dict:store_bkt_val/3                     195
        dict:fetch/2                             224
        protobuffs:encode_varint/2               998
        riak_kv_pb:iolist/2                     5685
  11. Mole #1: iolists

    • Representations of character streams
    • Lists (linked)
    • Binaries (byte arrays/buffers)
    • list_to_binary/1, binary_to_list/1, iolist_to_binary/1
    • Code called iolist_to_binary dozens of times!
  12. Mole #2: dict

    • dict is a hash-map data structure used in the server code to track request state

        {dict,0,16,16,8,80,48,
         {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
         {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}

    • Inefficient for small key counts! Updates require copying.
  13. Mole #3: gen_tcp

    • Server code called gen_tcp:send() MANY times per response.
    • Each call took ~30ms, with very small payloads!
  14. Eureka!

    • MapReduce results were tiny, so TCP sends were tiny!
    • Paid cost to call TCP stack with little data (and probably Nagle)
  15. MTU = Maximum Transmission Unit

    How many bytes can I send in a single frame?

    Buffer outgoing messages up to 1KB or an empty message queue.

    Performance improved to pre-1.2 levels!
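
The buffering rule on this slide (flush at 1KB or when the message queue is empty) can be sketched in Ruby as follows. This is an illustration only: the actual fix was made in Riak's Erlang server code, and `CoalescingWriter` and its method names are hypothetical.

```ruby
require "stringio"

# Coalesces many small messages into fewer, larger writes.
class CoalescingWriter
  FLUSH_THRESHOLD = 1024 # bytes; the 1KB figure from the slide

  attr_reader :writes

  def initialize(io, threshold: FLUSH_THRESHOLD)
    @io = io
    @threshold = threshold
    @pending = String.new(encoding: Encoding::BINARY)
    @writes = 0
  end

  # Sends every queued message, flushing whenever the buffer reaches the
  # threshold and once more when the queue has been drained.
  def drain(queue)
    until queue.empty?
      @pending << queue.shift.b
      flush if @pending.bytesize >= @threshold
    end
    flush
  end

  # Writes the buffered bytes in a single call, if there are any.
  def flush
    return if @pending.empty?
    @io.write(@pending)
    @writes += 1
    @pending.clear
  end
end

out = StringIO.new
writer = CoalescingWriter.new(out)
writer.drain(Array.new(100) { "tiny result\n" })
puts "#{out.string.bytesize} bytes sent in #{writer.writes} writes"
```

Flushing on an empty queue keeps latency low when traffic is light, while the size threshold bounds how much data waits in the buffer under load.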
  16. Understand the network stack. TCP works, if you know what you’re doing.
  17. Failures

    • Down node
    • Slow node
    • Inter-node delay
    • Dropped packets
    • Flappy links
    • Incast
    • Corruption
    • Logical failures
  25. System/Net Errors: Refused, Aborted, Reset, Network unreachable, Network down, Kernel/errno

    Unexpected Results: Not found, Quorum unmet

    Quorum errors, Bad requests, Server-side code errored, Timeout

    Retry, Retry if idempotent, Retry later?
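
The slide's retry decisions can be sketched as a small wrapper. The grouping below is an assumption made for illustration, since the transcript does not record which failure maps to which action: connection-level errors where the request most likely never ran are always retried, while resets, aborts and timeouts are retried only when the caller declares the request idempotent. `RequestTimeout` and `with_retries` are hypothetical names.

```ruby
# Hypothetical error representing a timed-out request.
class RequestTimeout < StandardError; end

# Assumed grouping: the request most likely never reached the server.
RETRY_ALWAYS = [Errno::ECONNREFUSED, Errno::ENETUNREACH, Errno::ENETDOWN].freeze
# Assumed grouping: the request may already have been applied.
RETRY_IF_IDEMPOTENT = [Errno::ECONNRESET, Errno::ECONNABORTED, RequestTimeout].freeze

# Runs the block, retrying recoverable failures with exponential backoff.
# `sleeper` is injectable so the delay can be observed or skipped in tests.
def with_retries(idempotent:, attempts: 3, base_delay: 0.05, sleeper: ->(s) { sleep(s) })
  tries = 0
  begin
    tries += 1
    yield tries
  rescue *RETRY_ALWAYS, *RETRY_IF_IDEMPOTENT => e
    retryable = RETRY_ALWAYS.any? { |klass| e.is_a?(klass) } || idempotent
    raise unless retryable && tries < attempts
    sleeper.call(base_delay * (2**(tries - 1))) # delay doubles on each attempt
    retry
  end
end

# Usage: a read is idempotent, so a reset connection is worth another try.
value = with_retries(idempotent: true) do |attempt|
  raise Errno::ECONNRESET if attempt == 1
  "value fetched on attempt #{attempt}"
end
puts value
```

Errors outside both lists, such as bad requests, propagate immediately because retrying them would not change the outcome.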
  26. Spread connections around. If you have more than one of something, use them all.
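
A minimal sketch of spreading requests across nodes: rotate through every known node and temporarily skip any that was recently marked as failed. `NodeRing`, its methods, and the host names are hypothetical, and the fixed cool-down period is an assumption rather than anything stated in the talk.

```ruby
# Rotates requests across every known node, skipping nodes marked down.
class NodeRing
  def initialize(nodes)
    @nodes = nodes.dup
    @down_until = {}
    @cursor = 0
    @lock = Mutex.new
  end

  # Returns the next node in rotation that is not currently marked down.
  def next_node(now: Time.now)
    @lock.synchronize do
      @nodes.size.times do
        node = @nodes[@cursor % @nodes.size]
        @cursor += 1
        return node unless down?(node, now)
      end
      raise "no nodes available"
    end
  end

  # Takes a node out of rotation for a cool-down period.
  def mark_down(node, seconds: 5, now: Time.now)
    @lock.synchronize { @down_until[node] = now + seconds }
  end

  private

  def down?(node, now)
    deadline = @down_until[node]
    !deadline.nil? && now < deadline
  end
end

ring = NodeRing.new(%w[riak1.example riak2.example riak3.example])
ring.mark_down("riak2.example")
4.times { puts ring.next_node }
```

Letting a failed node rejoin the rotation after the cool-down means the client recovers on its own once the node comes back.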
  27. Summary

    • Open multiple connections, protect them from concurrent use.
    • Know the structure of the network stack.
    • Categorize failures and decide whether and how to recover.
    • Connect to as many nodes as you can.