
Stripe CTF3 wrap-up

Stripe
January 30, 2014


CTF3, Stripe's third Capture-the-Flag, focused on distributed systems engineering with a goal of learning to build fault-tolerant, performant software while playing around with a bunch of cool cutting-edge technologies.

More here: https://stripe.com/blog/ctf3-launch.


Transcript

  1. Greg Brockman Andy Brody Christian Anderson Philipp Antoni Carl Jackson

    Jonas Schneider Siddarth Chandrasekaran Ludwig Pettersson Nelson Elhage Steve Woodrow Jorge Ortiz
  2. Participation
    •  ~7.5k participants from 97 countries, 6 continents
    •  ~9.5k unique IP logins
    •  216 capturers
  3. Solvers

  4. CAPTURE THE FLAG ARCHITECTURE Or, an illustrated guide to flowcharts

  5. What’s the holy grail of scaling?

  6. Challenges
    •  Scaling up to unknown capacity - 53 instances, 800 ECU at peak
    •  User isolation
    •  Reliability and Availability (*-abilities)
  7. git push: what happens?

  8. [Architecture diagram] Entry points: git push → lvl0-asdf@stripe-ctf.com:level0 and https://stripe-ctf.com/ (stripe-ctf.com. IN A → gate).
    Components: gate (nginx, haproxy), submitters (poseidon), ctfweb (sinatra), ctfdb (mongo), queue (RabbitMQ), colossus (LDAP, scoring), fleets of test case generators, workers (docker), build (docker), gitcoin.
  9. What Went Wrong – containerization – garbage collection - containers,

    filesystems, disk space – system stability – bugs, misconfiguration
  10. What Went Right – containerization – service architecture - queueing,

    separation of roles – load balancing – horizontal scaling
  11. Level 0 The Mysterious Program

  12. Level 0: Sets

  13. Level 0: mmap •  mmap, munmap - map or unmap files

    or devices into memory •  mmap the dictionary into memory •  You can actually mmap stdin as well! •  Binary search!
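
A minimal sketch (in Go, not the reference solution) of the mmap-plus-binary-search approach: map a sorted, newline-separated dictionary into memory and binary-search it in place, snapping each probe to a line boundary. The dictionary path is a placeholder, and the file is assumed to be sorted in byte order.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"syscall"
)

// mmapFile maps a file read-only into memory (Unix only).
func mmapFile(path string) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	st, err := f.Stat()
	if err != nil {
		return nil, err
	}
	return syscall.Mmap(int(f.Fd()), 0, int(st.Size()),
		syscall.PROT_READ, syscall.MAP_PRIVATE)
}

// contains binary-searches the mmap'd bytes for word, snapping each probe
// back to the start of its line before comparing.
func contains(dict []byte, word []byte) bool {
	lo, hi := 0, len(dict)
	for lo < hi {
		mid := (lo + hi) / 2
		start := bytes.LastIndexByte(dict[:mid], '\n') + 1
		end := bytes.IndexByte(dict[start:], '\n')
		if end < 0 {
			end = len(dict)
		} else {
			end += start
		}
		switch bytes.Compare(dict[start:end], word) {
		case 0:
			return true
		case -1:
			lo = end + 1 // word sorts after this line
		default:
			hi = start // word sorts before this line
		}
	}
	return false
}

func main() {
	dict, err := mmapFile("/usr/share/dict/words") // placeholder path
	if err != nil {
		panic(err)
	}
	fmt.Println(contains(dict, []byte("hello")))
}
```
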
  14. Level 0: Bloom filters •  Hash function: f(str) => int

    •  Look at the result of N hash functions. •  Probabilistic. •  False positives, but no false negatives. •  If you run into false positives, just push again!
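
The Bloom-filter approach, as a toy Go sketch. The bit-array size, the FNV-based double hashing, and k=7 are illustrative choices, not the contest parameters; the important property is the one on the slide: false positives are possible, false negatives are not.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

type Bloom struct {
	bits []uint64
	m    uint32 // number of bits
	k    uint32 // number of hash functions
}

func NewBloom(m, k uint32) *Bloom {
	return &Bloom{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// hashes derives k indexes from two halves of one FNV hash (h1 + i*h2),
// a common trick to simulate k independent hash functions.
func (b *Bloom) hashes(s string) []uint32 {
	h := fnv.New64a()
	h.Write([]byte(s))
	sum := h.Sum64()
	h1, h2 := uint32(sum), uint32(sum>>32)
	idx := make([]uint32, b.k)
	for i := uint32(0); i < b.k; i++ {
		idx[i] = (h1 + i*h2) % b.m
	}
	return idx
}

func (b *Bloom) Add(s string) {
	for _, i := range b.hashes(s) {
		b.bits[i/64] |= 1 << (i % 64)
	}
}

func (b *Bloom) MayContain(s string) bool {
	for _, i := range b.hashes(s) {
		if b.bits[i/64]&(1<<(i%64)) == 0 {
			return false // definitely not in the set
		}
	}
	return true // probably in the set (false positive possible)
}

func main() {
	bf := NewBloom(1<<20, 7)
	bf.Add("hello")
	fmt.Println(bf.MayContain("hello"), bf.MayContain("xyzzy"))
}
```
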
  15. Level 0: Minimal perfect hashing •  Given dictionary D =

    {w₁, w₂, …, wₙ} •  use MATH to generate a hash function f •  f : D → {0..n-1} is one-to-one •  aka every word hashes to a different small integer •  So you can build a no-collisions hash table •  CMPH - C Minimal Perfect Hashing Library •  Build this ahead of time, link it to the binary
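
A toy Go illustration of the idea. The hash function below is hand-picked to be collision-free for a four-word dictionary; CMPH constructs such a function for a real dictionary ahead of time, and you link the generated table into the binary.

```go
package main

import "fmt"

// For this tiny dictionary, f(w) = (len(w) + w[0]) mod 4 happens to be a
// minimal perfect hash: ant→0, bee→1, cat→2, dog→3.
var table = [4]string{"ant", "bee", "cat", "dog"} // slot i holds the unique word with f(word) == i

func f(w string) int {
	return (len(w) + int(w[0])) % len(table)
}

// inDictionary needs exactly one probe and one comparison: no collision chains.
func inDictionary(w string) bool {
	if w == "" {
		return false
	}
	return table[f(w)] == w
}

func main() {
	fmt.Println(inDictionary("cat"), inDictionary("owl")) // true false
}
```
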
  16. Level 1 Gitcoin

  17. Gitcoin
    commit 000000216ba61aecaafb11135ee43b1674855d6ff7
    Author: Alyssa P Hacker <alyssa@example.com>
    Date:   Wed Jan 22 14:10:15 2014 -0800

        Give myself a Gitcoin!
        nonce: tahf8buC

    diff --git a/LEDGER.txt b/LEDGER.txt
    index 3890681..41980b2 100644
    --- a/LEDGER.txt
    +++ b/LEDGER.txt
    @@ -7,3 +7,4 @@ carl: 30
     gdb: 12
     nelhage: 45
     jorge: 30
    +user-aph123: 1
  18. Why crypto currencies? – distributed currency system – git security

    model – massively parallel problems – using the right tools for the job
  19. Git object model

  20. Git object model
    $ git cat-file -p 000000effe7d920b391a24633e7298469dcf51b5
    tree 7da86a5b10ff6db916598b653ce63e1dc0cb73c8
    parent 0000000df4815161b72f4c5ed23e9fbf5deed922
    author Alyssa P Hacker <alyssa@example.com> 1391129313 +0000
    committer Alyssa P Hacker <alyssa@example.com> 1391129313 +0000

    Mined a Gitcoin! nonce 0302d1e2

    $ git show 000000
    error: short SHA1 000000 is ambiguous.
    error: short SHA1 000000 is ambiguous.
  21. Remember wafflecopter? You want to do as few rounds of SHA1

    as possible: compute the hash state over the fixed prefix once, then update only for each nonce
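
A Go sketch of the prefix trick, assuming a simplified commit body (a real Gitcoin miner hashes the full git commit object, header included, and compares against the difficulty threshold). It relies on crypto/sha1's MarshalBinary/UnmarshalBinary (Go 1.8+) to snapshot the hash state after the fixed prefix, so each candidate nonce only re-hashes a few bytes.

```go
package main

import (
	"bytes"
	"crypto/sha1"
	"encoding"
	"fmt"
)

func main() {
	prefix := []byte("tree 7da8...\nparent 0000...\n\nMined a Gitcoin! nonce ")
	difficulty := []byte{0x00, 0x00} // require two leading zero bytes (illustrative)

	h := sha1.New()
	h.Write(prefix)
	state, _ := h.(encoding.BinaryMarshaler).MarshalBinary() // snapshot after the fixed prefix

	for nonce := 0; ; nonce++ {
		h2 := sha1.New()
		h2.(encoding.BinaryUnmarshaler).UnmarshalBinary(state) // restore prefix state
		fmt.Fprintf(h2, "%08x", nonce)                         // only the nonce is re-hashed
		sum := h2.Sum(nil)
		if bytes.HasPrefix(sum, difficulty) {
			fmt.Printf("nonce %08x -> %x\n", nonce, sum)
			return
		}
	}
}
```
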
  22. SHA1 is Embarrassingly Parallel – Each miner can be totally

    independent – Each miner requires little memory – Each miner requires little code
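
And the "embarrassingly parallel" part, sketched with goroutines: each miner walks a disjoint slice of the nonce space (nonce = id, id+N, id+2N, ...) and shares nothing but a result channel. hashMeetsTarget stands in for the SHA-1 check from the previous sketch.

```go
package main

import (
	"crypto/sha1"
	"fmt"
	"runtime"
)

func hashMeetsTarget(nonce int) bool {
	sum := sha1.Sum([]byte(fmt.Sprintf("prefix... nonce %08x", nonce)))
	return sum[0] == 0 && sum[1] == 0 // illustrative difficulty
}

func main() {
	n := runtime.NumCPU()
	found := make(chan int, n)
	for id := 0; id < n; id++ {
		go func(start int) {
			for nonce := start; ; nonce += n { // disjoint nonce ranges, no coordination
				if hashMeetsTarget(nonce) {
					found <- nonce
					return
				}
			}
		}(id)
	}
	fmt.Printf("winning nonce: %08x\n", <-found)
}
```
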
  23. Tools: a spoon (bash)

  24. Tools: a shovel (go)

  25. Tools: an army of backhoes (GPU)

  26. Tools: big machines (ASIC)

  27. Bonus Round Hash Rates
    •  bash: 400 H/s
    •  our Go miners: 1.9 MH/s
    •  100 cores on EC2: 130 MH/s
    •  GPU: 1-2 GH/s
    •  Network: ~10 GH/s
  28. WE DID IT! Dogecoin is only at 80 GH/s

  29. Level 2 DDOS Defense

  30. Elephants and mice: a DDOS model

  31. Elephants and mice: a DDOS model

  32. The stub of a Node.js proxy

  33. Balance across the backends

  34. Balance across the backends 1. Round robin

  35. Balance across the backends 1. Round robin 2. Choose backend with min.

    load
  36. Balance across the backends 1. Round robin 2. Choose backend with min.

    load 3. Randomize
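
A minimal Go sketch of the three strategies above (the level's starting point was a Node.js proxy stub); names like Pool and Backend are illustrative.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync/atomic"
)

type Backend struct {
	Addr     string
	InFlight int64 // current outstanding requests
}

type Pool struct {
	backends []*Backend
	next     uint64
}

// RoundRobin cycles through backends in order.
func (p *Pool) RoundRobin() *Backend {
	i := atomic.AddUint64(&p.next, 1)
	return p.backends[i%uint64(len(p.backends))]
}

// LeastLoaded picks the backend with the fewest in-flight requests.
func (p *Pool) LeastLoaded() *Backend {
	best := p.backends[0]
	for _, b := range p.backends[1:] {
		if atomic.LoadInt64(&b.InFlight) < atomic.LoadInt64(&best.InFlight) {
			best = b
		}
	}
	return best
}

// Random picks a backend uniformly at random.
func (p *Pool) Random() *Backend {
	return p.backends[rand.Intn(len(p.backends))]
}

func main() {
	p := &Pool{backends: []*Backend{{Addr: "backend-0"}, {Addr: "backend-1"}, {Addr: "backend-2"}}}
	fmt.Println(p.RoundRobin().Addr, p.LeastLoaded().Addr, p.Random().Addr)
}
```
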
  37. Let the mice through

  38. Let the mice through 1. Reduce the overall load

  39. Let the mice through 1. Reduce the overall load 2. Use an

    off-the-shelf solution
  40. Let the mice through 1. Reduce the overall load 2. Use an

    off-the-shelf solution 3. Learn to recognize a mouse
  41. If you had a global view

  42. Recognize a mouse 1. Threshold rate or number 2. Learn by hand

    or with automation
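
One way to implement the "threshold rate or number" heuristic, sketched in Go: count requests per client IP over a short window and treat anything over the threshold as an elephant. The window and threshold values are illustrative.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type RateTracker struct {
	mu        sync.Mutex
	counts    map[string]int
	threshold int
}

func NewRateTracker(window time.Duration, threshold int) *RateTracker {
	t := &RateTracker{counts: make(map[string]int), threshold: threshold}
	go func() { // reset the counters at the start of every window
		for range time.Tick(window) {
			t.mu.Lock()
			t.counts = make(map[string]int)
			t.mu.Unlock()
		}
	}()
	return t
}

// IsMouse returns true while the client stays under the per-window threshold.
func (t *RateTracker) IsMouse(ip string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.counts[ip]++
	return t.counts[ip] <= t.threshold
}

func main() {
	tr := NewRateTracker(time.Second, 5)
	for i := 0; i < 7; i++ {
		fmt.Println("10.0.0.1 allowed:", tr.IsMouse("10.0.0.1"))
	}
}
```
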
  43. The top solutions ➔ Balance load in a simple way ➔ Learn

    to recognize a mouse ➔ Keep an eye on the backends
  44. Level 3 Instant Code Search

  45. We’re sorry about the Scala

  46. The problem •  Text search over ~100 MB of text •  Arbitrary substring

    search (not just whole words) •  There is a “free” indexing stage •  Distribute across up to 4 nodes •  Each node is limited to 500 MB of RAM
  47. Search 101: Inverted Index /tree/A: “the quick brown fox jumps

    over …” /tree/B: “the fox is red” “the” [A, B] “quick” [A] “brown” [A] “fox” [A, B] “red” [B]
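
The inverted index from this slide, as a small Go sketch with naive whitespace tokenization:

```go
package main

import (
	"fmt"
	"strings"
)

// buildIndex maps each token to the list of documents containing it.
func buildIndex(docs map[string]string) map[string][]string {
	index := make(map[string][]string)
	for doc, text := range docs {
		seen := make(map[string]bool)
		for _, word := range strings.Fields(strings.ToLower(text)) {
			if !seen[word] { // record each doc at most once per word
				index[word] = append(index[word], doc)
				seen[word] = true
			}
		}
	}
	return index
}

func main() {
	index := buildIndex(map[string]string{
		"/tree/A": "the quick brown fox jumps over",
		"/tree/B": "the fox is red",
	})
	fmt.Println(index["fox"]) // [/tree/A /tree/B] (order depends on map iteration)
	fmt.Println(index["red"]) // [/tree/B]
}
```
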
  48. Search 102: Arbitrary Substring •  “trigram index” •  Store an

    inverted index of trigrams •  “the quick brown fox …” → “the”, “he_”, “e_q”, “_qu”, “qui”, “uic”, … •  To query, look up all trigrams in the search term, and intersect •  Search(“rown”) → index[“row”] ∩ index[“own”] ◦  Check each result to verify the match
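
And a Go sketch of the trigram-index query: index every 3-byte window of each document, answer Search(q) by intersecting the posting lists of q's trigrams, then verify each candidate with a real substring check (trigram hits are necessary but not sufficient). Assumes queries of at least three bytes.

```go
package main

import (
	"fmt"
	"strings"
)

type TrigramIndex struct {
	docs     map[string]string
	postings map[string]map[string]bool // trigram -> set of doc ids
}

func NewTrigramIndex(docs map[string]string) *TrigramIndex {
	idx := &TrigramIndex{docs: docs, postings: make(map[string]map[string]bool)}
	for id, text := range docs {
		for i := 0; i+3 <= len(text); i++ {
			tri := text[i : i+3]
			if idx.postings[tri] == nil {
				idx.postings[tri] = make(map[string]bool)
			}
			idx.postings[tri][id] = true
		}
	}
	return idx
}

func (idx *TrigramIndex) Search(q string) []string {
	// Candidates: docs that contain every trigram of q.
	candidates := map[string]bool{}
	for id := range idx.postings[q[:3]] {
		candidates[id] = true
	}
	for i := 1; i+3 <= len(q); i++ {
		next := idx.postings[q[i:i+3]]
		for id := range candidates {
			if !next[id] {
				delete(candidates, id)
			}
		}
	}
	// Verify: check each surviving candidate for the actual substring.
	var out []string
	for id := range candidates {
		if strings.Contains(idx.docs[id], q) {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	idx := NewTrigramIndex(map[string]string{
		"A": "the quick brown fox",
		"B": "the fox is red",
	})
	fmt.Println(idx.Search("rown")) // [A]
}
```
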
  49. Sharding •  We give you four nodes •  …but they

    all run on the same physical host during grading •  And we didn’t resource-limit grading containers (other than memory) •  So you don’t actually get more CPU, disk I/O, or memory bandwidth •  Sharding ended up not really mattering
  50. Winning the contest •  The spec is for arbitrary substring

    search •  But we only generate/query words from a dictionary •  Some words are substrings of other words… •  … but not too many •  Use an inverted index over dictionary words
  51. Handling substrings •  Option A ◦  substrings : word →

    [all words containing that word] ◦  index : word → [list of lines containing that full word] ◦  for word in substrings[query]: ◦  results += index[word] ◦  Can compute the substrings table by brute-force search ◦  When indexing lines, just split into words •  Option B ◦  index : word → [all lines containing that word, including as a substring] ◦  Need to do the substring search as you index each line
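
Option A as a runnable Go sketch; the dictionary and input lines are tiny placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

var dictionary = []string{"cat", "cattle", "concatenate", "dog"}

// buildSubstrings precomputes, by brute force: word -> all dictionary words containing it.
func buildSubstrings() map[string][]string {
	sub := make(map[string][]string)
	for _, q := range dictionary {
		for _, w := range dictionary {
			if strings.Contains(w, q) {
				sub[q] = append(sub[q], w)
			}
		}
	}
	return sub
}

// buildIndex maps each full word to the lines containing it (split on whitespace only).
func buildIndex(lines []string) map[string][]int {
	index := make(map[string][]int)
	for i, line := range lines {
		for _, w := range strings.Fields(line) {
			index[w] = append(index[w], i)
		}
	}
	return index
}

func main() {
	lines := []string{"the cattle graze", "concatenate the strings", "walk the dog"}
	sub := buildSubstrings()
	index := buildIndex(lines)

	query := "cat"
	var results []int
	for _, w := range sub[query] { // every dictionary word that contains "cat"
		results = append(results, index[w]...)
	}
	fmt.Println(results) // lines 0 and 1
}
```
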
  52. Other ways to beat the level •  Slurp the entire

    tree into RAM and use a decent string search ◦  (not java.lang.String.indexOf() -- that’s slow!) •  Shell out to grep ◦  GNU grep is fast
  53. Level 4 SQLCluster

  54. SQLCluster •  5 SQLite nodes •  Queries submitted to all

    nodes •  Random network and node failures •  Must maintain full linearizability of queries
  55. Octopus http://wallpaperswide.com/angry_octopus-wallpapers.html

  56. Octopus •  (Grumpy) network simulator •  Submits queries and checks

    for correctness •  Several “monkeys” manipulate the network: ◦  Netsplit monkey ◦  Lagsplit monkey ◦  SPOF monkey ◦  etc.
  57. Consensus Algorithms •  Raft (“In Search of an Understandable Consensus

    Algorithm”, Diego Ongaro and John Ousterhout, 2013) •  Zab (“Zab: High-performance broadcast for primary-backup systems”, Flavio P. Junqueira, Benjamin C. Reed, and Marco Serafini, 2011) •  Paxos (“The Part-Time Parliament”, Leslie Lamport, 1998, originally submitted 1990) •  Viewstamped Replication (“Viewstamped Replication: A New Primary Copy Method to Support Highly Available Distributed Systems”, Brian M. Oki and Barbara H. Liskov, 1988) etc. etc.
  58. Consensus Algorithms Almost everyone chose Raft https://github.com/goraft/raft

  59. Gotchas: Idempotency •  Octopus sends node1 a commit •  Node1

    forwards it to the leader, node0 •  Node0 processes it and sends it back •  Octopus kills the node0 ↔ node1 link vs. •  Octopus sends node1 a commit •  Node1 forwards it to the leader, node0 •  Octopus kills the node0 ↔ node1 link
  60. Gotchas: Idempotency •  How do you tell these two

    cases apart? •  Naive: resubmit the query to find out! •  If the query was processed, return the old result •  Common trick: Idempotency tokens UPDATE ctf3 SET friendCount=friendCount+10, requestCount=requestCount+1, favoriteWord="jjfqcjamhpghnqq" WHERE name="carl"; SELECT * FROM ctf3
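
The idempotency-token trick, sketched in Go. The Cluster/Execute names are illustrative; the point is that a resubmitted token returns the remembered result instead of applying the write twice.

```go
package main

import "fmt"

type Cluster struct {
	results map[string]string // token -> result of the already-applied query
	applied int
}

func (c *Cluster) Execute(token, query string) string {
	if res, ok := c.results[token]; ok {
		return res // already applied: safe to return the old result, don't re-run
	}
	c.applied++
	res := fmt.Sprintf("applied %q as op #%d", query, c.applied)
	c.results[token] = res
	return res
}

func main() {
	c := &Cluster{results: make(map[string]string)}
	q := `UPDATE ctf3 SET friendCount=friendCount+10 WHERE name="carl"`
	fmt.Println(c.Execute("jjfqcjamhpghnqq", q))
	// The client never saw the reply (the link was cut), so it resubmits the same token:
	fmt.Println(c.Execute("jjfqcjamhpghnqq", q))
	fmt.Println("ops applied:", c.applied) // still 1
}
```
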
  61. Making it Fast •  Every top solution replaced sql.go • 

    Two main strategies: ◦  Write your own sqlite (or enough of it to pass) ◦  Use sqlite bindings, :memory: database •  These perform roughly equally well •  Raft has a few timers you can tune •  Golf network traffic
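
A sketch of the second strategy, assuming the github.com/mattn/go-sqlite3 driver (any SQLite binding works): keep the whole database at ":memory:" so the query path never touches disk.

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/mattn/go-sqlite3" // registers the "sqlite3" driver
)

func main() {
	db, err := sql.Open("sqlite3", ":memory:") // entire database lives in RAM
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if _, err := db.Exec(`CREATE TABLE ctf3 (name TEXT, friendCount INT, requestCount INT, favoriteWord TEXT)`); err != nil {
		log.Fatal(err)
	}
	if _, err := db.Exec(`INSERT INTO ctf3 VALUES ('carl', 0, 0, '')`); err != nil {
		log.Fatal(err)
	}

	var friends int
	if err := db.QueryRow(`SELECT friendCount FROM ctf3 WHERE name = 'carl'`).Scan(&friends); err != nil {
		log.Fatal(err)
	}
	fmt.Println("carl's friendCount:", friends)
}
```
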
  62. Octopus Vulnerabilities 4 0 3 1 2 http://sweetclipart.com/octopus-line-art-756

  63. First Solve: Single Master 4 0 3 1 2

  64. Forward-to-Master 4 0 3 1 2

  65. Single Point of Failure Detection 4 0 3 1 2

  66. Redirect-to-Master 4 0 3 1 2

  67. Redirect-to-Master 4 0 3 1 2 302 http://node0/ sql

  68. Redirect-to-Master 4 0 3 1 2
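
Redirect-to-master in a few lines of Go: a follower answers /sql with a 302 to the current leader rather than proxying the query itself. leaderAddr is a placeholder for whatever the consensus layer reports.

```go
package main

import (
	"fmt"
	"net/http"
)

func leaderAddr() string { return "http://node0" } // placeholder: ask Raft who leads

var self = "http://node2"

func sqlHandler(w http.ResponseWriter, r *http.Request) {
	if leader := leaderAddr(); leader != self {
		// 302: the client follows the redirect and resubmits to the leader.
		http.Redirect(w, r, leader+"/sql", http.StatusFound)
		return
	}
	fmt.Fprintln(w, "executing query on the leader...")
}

func main() {
	http.HandleFunc("/sql", sqlHandler)
	http.ListenAndServe(":8080", nil)
}
```
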

  69. Leaderboard Top 10 •  Everyone changed SQLite “bindings” •  Seven

    solutions used go-raft •  Two solutions used redirect-to-master •  One solution implemented Raft in C++ •  If at first you don’t succeed… ◦  Mean submissions: 1444 (stddev 1031) ◦  Max: 3946 ◦  Min: 58 (the C++ solution)
  70. Speculative: Time-based consensus •  All nodes were run on the

    same host •  Local clock: total ordering on events •  Leaderless state machine •  ??? •  Profit? •  Similar ideas to Spanner (Google)
  71. Greg Brockman Andy Brody Christian Anderson Philipp Antoni Carl Jackson

    Jonas Schneider Siddarth Chandrasekaran Ludwig Pettersson Nelson Elhage Steve Woodrow Jorge Ortiz