Inside Cloudbleed

majek04
December 07, 2017

Transcript

  1. Reverse proxy

    Diagram: Eyeball, Reverse proxy, Origin server
    • Caching
    • Security
    • DDoS protection
    • Optimizations
  4. Thursday, 23 Feb

    “Our edge servers were running past the end of a buffer and returning memory that contained private information such as HTTP cookies, authentication tokens, HTTP POST bodies, and other sensitive data. And some of that data had been cached by search engines.”
  6. Heartbleed / Cloudbleed

                                    Heartbleed                      Cloudbleed
    ----------------------------    ----------------------------    ---------------------------------
    When?                           1 Apr 2014                      17 Feb 2017
    Uninitialized memory during:    TLS Heartbeat Requests          Malformed HTML passing Cloudflare
    Affected:                       All OpenSSL servers globally    All Cloudflare customers
    Worst problem                   SSL keys leaked                 Cached by search engines
  10. 16:40 Friday, February 17 PT (T+00:08)

    • Team assembled in San Francisco
    • Initial impact assessed
  11. 17:19 Friday, Feb 17 PT (T+00:47)

    • "Email Obfuscation" feature disabled globally
    • Project Zero confirm they no longer see leaking data
  12. 23:22 Friday, Feb 17 PT (T+06:50)

    • Implemented and deployed kill switch for Server-Side Excludes
  13. 13:59 Monday, 20 Feb PT (T+3d)

    • SAFE_CHAR fix deployed globally
    2017/02/19 13:47:34 [crit] 27558#0: *2 email filter tried to access char past EOF while sending response to client, client: 127.0.0.1, server: localhost, request: "GET /malformed-test.html HTTP/1.1"
  14. 10:03 Tuesday, Feb 21 PT (T+4d)

    • Automatic HTTPS Rewrites, Server-Side Excludes and Email Obfuscation re-enabled worldwide
  15. Unclosed HTML attribute at end of page

    • Domain had to have one of:
      • Email Obfuscation
      • Server-Side Excludes + other feature
      • Automatic HTTPS Rewrites + other feature
    • Page had to end with something like:
      <script type="
      <img height="50px" width="200px" src="
  17. Buffer overrun

    /* generated C code */
    if ( ++p == pe )
        goto _test_eof;
    p: current position
    pe: end of buffer
  18. Diagram (shown on three consecutive slides): buffer contents < s c r i p t   t y p e = " followed by ☐ - - - - - - - - - - - - - - -, with the pointers p and pe marked
  21. Ragel parser generator

    script_consume_attr := \
      ((unquoted_attr_char)* :>> (space|'/'|'>'))
      @{ fhold; fgoto script_tag_parse; }        Ok!
      $lerr{ fgoto script_consume_attr; };       Error! Missing fhold....
    fhold --> p--
    fgoto --> p++
  22. Diagram: old parser and new parser

    Feature labels: email obfuscation (shown twice), https rewrites, other features, server-side excludes
  23. Triggering conditions

    • Buffer smaller than 4KB
    • End with malformed script or img tag
    • Enabled features using both new and old parser
  24. Bad: leaking sensitive data

    • Customer:
      • HTTP headers, including cookies; POST data (passwords, potentially credit card numbers, SSNs); URI parameters; JSON blobs for API calls; API authentication secrets; OAuth keys
    • Private Cloudflare:
      • Keys, authentication secrets
  25. Project Zero timelines

    • 90 days
    • 7 days - critical vulnerabilities under active exploitation
  28. Impact statistics

    • SAFE_CHAR logs allowed us to estimate the impact.
    • September 22, 2016 -> February 13, 2017: 605,307
    • February 13, 2017 -> February 18, 2017: 637,034
  29. Each leaked page contained (based on search engine caches)

    • 67.54 Internal Cloudflare HTTP Headers
    • 0.44 Cookies
    • 0.04 Authorization Headers / Tokens
    • No passwords, credit cards, SSNs found in the wild
  30. Estimated customer impact

    Requests per Month    Estimated Leaks
    ------------------    -----------------
    200B – 300B           22,356 – 33,534
    100B – 200B           11,427 – 22,356
    50B – 100B            5,962 – 11,427
    10B – 50B             1,118 – 5,926
    1B – 10B              112 – 1,118
    500M – 1B             56 – 112
    250M – 500M           25 – 56
    100M – 250M           11 – 25
    50M – 100M            6 – 11
    10M – 50M             1 – 6
    <10M                  < 1

  31. It's been going on for months

    • September 22, 2016: Automatic HTTPS Rewrites enabled new parser
    • January 30, 2017: Server-Side Excludes migrated to new parser
    • February 13, 2017: Email Obfuscation partially migrated to new parser
    • February 18, 2017: Google reports problem to Cloudflare and leak is stopped
    Slide annotations: 180 sites; 6500 sites
  33. All crashes will be investigated

    From: SRE
    To: nginx-dev
    On 2017-02-22 between 3:00am and 3:30am UTC we notice 17 core dumps in SJC, ORD, IAD, and one in DOG.
  36. Mystery crash

    • Not: “I don’t understand why the program did that”
    • Rather: “I believe it is impossible for the program to reach this state, if executed correctly”
    • At a low level, computers are deterministic!
  37. Mystery crashes

    • On average, ~1 mystery core dump a day
    • Scattered over all servers, all datacenters
    • Per server, 1 in 10 years
    • Can’t reproduce
    • Hard to try any potential fix
  38. Mystery crashes

    • Cosmic rays?
    • Memory error? (we use ECC)
    • Faulty CPU? (mostly Intel)
    • Core dumps get corrupted somewhere?
    • OS bug generating core dumps?
    • OS virtual memory bug? TLB?
  39. Broadwell trail

    From: SRE
    To: nginx-dev
    During further investigation, Josh suggested to check if the generations of hardware could be relevant. Surprisingly all 18 nginx SIGSEGV crashes happened on Intel Broadwell servers. Given that broadwell are on a 1/3rd of our fleet, we suspect the crashes might be related to hardware.
  42. Microcode update

    • Firmware that controls the lowest-level operation of the processor
    • Can be updated by the BIOS (from system vendor) or the OS
    • Microcode updates can change the behaviour of the processor to some extent, e.g. to fix errata