
Service Denied! Understanding How Regex DoS Attacks Work


Did you know that people can knock your Rails application offline just by submitting specially formatted strings in a form or API request? In this talk, we’ll take a look at what’s really going on with a regex denial of service (DoS) attack. We’ll take a peek into the CRuby regex engine to see what it’s really doing when we ask it to match against a string. With a basic understanding of how regular expressions work, we can better understand what these attacks do, why they tie up so much CPU, and what we can do to guard against them.


Kevin Menard

June 15, 2022


Transcript

  1. Service Denied! Understanding How Regex DoS Attacks Work
    Kevin Menard, kevin.menard@shopify.com. GitHub: @nirvdrum, Twitter: @nirvdrum
  2. I work at Shopify. It’s a fun place to be and I get to do interesting work. We’re hiring. Please feel free to reach out if you’re interested.
  3. None
  4. • High performance implementation of Ruby
    • Focuses on peak performance
    • Designed to optimize idiomatic Ruby
    • Intended to be compatible with CRuby
    • Even runs native extensions
  5. • Sample benchmarks from yjit-bench
      • https://github.com/Shopify/yjit-bench
    • Left: activerecord benchmark
    • Right: erubi benchmark
    Benchmark data from https://eregon.me/blog/2022/01/06/benchmarking-cruby-mjit-yjit-jruby-truffleruby.html
  6. Service Denied! Understanding How Regex DoS Attacks Work

  7. Breakdown
    1. Context for topic
    2. Intro to Denial of Service
    3. Define ReDoS
    4. Crash course in performance analysis (Real world “Big-O” notation!)
    5. Dive into regular expressions (Learn about state machines!)
    6. Bring it on home
  8. © 2021, Matthew Henry https://burst.shopify.com/photos/the-year-2021-in-black-ink

  9. None
  10. None
  11. A Fortunate Series of Events
    • I spent a chunk of 2021 improving regex performance in TruffleRuby, in order to speed up the browser_sniffer gem
    • TruffleRuby picked up a second regex engine earlier in the year to JIT common expressions; the old engine was retained as a fallback
    • To ensure compatibility, I spent a lot of time looking into CRuby’s regex engine (Onigmo)
  12. None
  13. Regular Expression Denial of Service (ReDoS)

  14. Denial of Service
    • Attacker is simply trying to mess with you
    • For Rails: “service” usually means a web request
    • DoS prevents you from achieving performance and availability objectives
    • Attacker wastes (finite) compute resources on junk requests
      • Either by specially crafted input or by volume
    • Can sometimes be difficult to distinguish malice from corner cases
      • You can get surprisingly far with “N + 1” queries on small inputs
  15. Regular Expression Denial of Service (ReDoS)

  16. “The Regular expression Denial of Service (ReDoS) is a Denial of Service attack, that exploits the fact that most Regular Expression implementations may reach extreme situations that cause them to work very slowly (exponentially related to input size).”
    - Open Web Application Security Project® (OWASP) (https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS)
  17. “A regular expression denial of service (ReDoS) is an algorithmic complexity attack that produces a denial-of-service by providing a regular expression and/or an input that takes a long time to evaluate.”
    - Wikipedia https://en.wikipedia.org/wiki/ReDoS
  18. “The attack exploits the fact that many regular expression implementations have super-linear worst-case complexity; on certain regex-input pairs, the time taken can grow polynomially or exponentially in relation to the input size.”
    - Wikipedia https://en.wikipedia.org/wiki/ReDoS
  19. Performance

  20. Performance: Benchmarks Edition

  21. • Results are easy to understand
    • Considerations:
      • Require code AND inputs to reproduce
      • Very much tied to benchmark environment
      • Quite tricky to normalize or eliminate system effects
    Benchmark data from https://eregon.me/blog/2022/01/06/benchmarking-cruby-mjit-yjit-jruby-truffleruby.html
  22. Performance: Algorithmic Complexity Edition

  23. Algorithmic Complexity
    • Idea: Count key operations to measure and compare, doable by hand
      • Example: You have 10 doors and 10 keys
      • How many times would you have to turn a key before you found each pair?
      • Now, how about 20 doors and 20 keys?
    • Input Data: Those operations are always relative to some sort of input:
      • Array size for array sorting
      • Match string length for regex matching
    • Context: Best case? Worst case? Average case? Something else?
      • Usually, we’re talking about worst case
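
The doors-and-keys count can be worked through in a few lines of Ruby. This is a minimal sketch under one assumption that the slide does not spell out: in the worst case, the key that fits each door is the last unpaired key we try, and we count that final turn too. The method name `worst_case_key_turns` is made up for this illustration.

```ruby
# Worst case for pairing doors with keys: at every door, the key that
# fits is the last one we try among the keys still unpaired.
def worst_case_key_turns(door_count)
  unpaired_keys = door_count
  turns = 0
  door_count.times do
    turns += unpaired_keys # try every unpaired key; the last one opens the door
    unpaired_keys -= 1     # set the matched key aside
  end
  turns
end

puts worst_case_key_turns(10) # 55
puts worst_case_key_turns(20) # 210
```

Doubling the number of doors roughly quadruples the number of key turns, which is the signature of quadratic growth.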
  24. Asymptotic Complexity Notation    Descriptive Name
    O(1)                              Constant Time
    O(log₂(n))                        Logarithmic Time
    O(n)                              Linear Time
    O(n⋅lg(n))                        Linearithmic Time
    O(n²)                             Quadratic Time
    O(n³)                             Cubic Time
    O(2ⁿ)                             Exponential Time
    Polynomial Time: label on a brace that groups several of the rows above on the slide
  25. Chart of growth curves: Constant Time, Logarithmic Time, Linear Time, Linearithmic Time, Quadratic Time, Exponential Time
  26. ReDoS Example

  27. Adapted from Russ Cox’s Regular Expression Matching Can Be Simple and Fast… https://swtch.com/~rsc/regexp/regexp1.html (2007)

    # Largest n to test: first command-line argument, else the MAX_COUNT
    # environment variable. Both arrive as strings, so convert either one.
    max_count = (ARGV.empty? ? ENV['MAX_COUNT'] : ARGV.first).to_i

    # Time a single match of /a?ⁿaⁿ/ against "a" * n, in seconds.
    def run_iteration(count)
      # /a?ⁿaⁿ/ definition from https://swtch.com/~rsc/regexp/regexp1.html
      r = Regexp.compile("a?" * count + "a" * count)

      t1 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      r.match?("a" * count)
      t2 = Process.clock_gettime(Process::CLOCK_MONOTONIC)

      t2 - t1
    end

    # Emit CSV, flushing after each row so progress is visible as runs slow down.
    puts "Count,Time (s)"
    (1..max_count).each do |count|
      time = run_iteration(count)
      puts "#{count},#{time}"
      $stdout.flush
    end
  28. Adapted from Russ Cox’s Regular Expression Matching Can Be Simple and Fast… https://swtch.com/~rsc/regexp/regexp1.html (2007)

    # /a?ⁿaⁿ/ defn from https://swtch.com/~rsc/regexp/regexp1.html
    # E.g., n = 3 #=> /a?a?a?aaa/
    r = Regexp.compile("a?" * count + "a" * count)

    # Pathological case:
    # E.g., n = 3 #=> "aaa"
    r.match?("a" * count)
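
To see why this pairing of pattern and input is pathological, it can help to count the work a backtracking matcher does. The sketch below is a simplified model, not Onigmo’s actual algorithm: `backtracking_steps` is a made-up helper that greedily lets each `a?` consume a character, backs off on failure, and counts every state it visits. It also assumes the check for the n mandatory `a`s costs a single step.

```ruby
# Simplified model of a backtracking matcher for /a?ⁿaⁿ/ against "a" * n.
# Returns [matched, steps], where steps counts every state the matcher visits.
def backtracking_steps(n)
  steps = 0

  # optional_left: how many a? terms are still unprocessed
  # pos: how many characters of the input have been consumed so far
  visit = lambda do |optional_left, pos|
    steps += 1
    # All a? terms handled: the n mandatory a's need n remaining characters.
    # (A real engine would check them one at a time; this model shortcuts that.)
    next n - pos >= n if optional_left.zero?

    # Greedy: let this a? consume an 'a' first, and only skip it on failure.
    (pos < n && visit.call(optional_left - 1, pos + 1)) ||
      visit.call(optional_left - 1, pos)
  end

  matched = visit.call(n, 0)
  [matched, steps]
end

[1, 2, 5, 10, 15].each do |n|
  matched, steps = backtracking_steps(n)
  puts "n=#{n} matched=#{matched} steps=#{steps}"
end
```

In this model the match only succeeds once every `a?` has given its character back, so the step count roughly doubles each time n grows by one.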
  29. None
  30. None
  31. Regular Expression Denial of Service (ReDoS)

  32. What are regular expressions?
    • A compact DSL for:
      • Writing programs to pattern match strings
      • Instructions for building a state machine
      • Describing a regular language
        • Sort of: Ruby adds extensions that aren’t regular
  33. “Welcome, my son. Welcome to the machine.” - Pink Floyd

  34. State Machines
    • Abstraction for recording the state an object is in and how it transitions to other states
    • States represented by circles, called nodes
    • Transitions represented by directional arrows
    • Sometimes transitions have labels to make their selection conditional
  35. Regex State Machines
    • Two types: NFA and DFA
    • NFA: Nondeterministic Finite Automaton
      • States can have overlapping transitions
      • What’s nondeterministic is which one is chosen
    • DFA: Deterministic Finite Automaton
      • For each (state, input_character) pair, there is only one possible transition
  36. State Machines for /a?a?aa/ (diagrams: NFA, DFA)

  37. Possible Transitions on First ‘a’ (diagrams: NFA, DFA)

  38. So… how do we decide which transition to take in an NFA?
  39. “Greed, for lack of a better word, is good. Greed is right. Greed works.” - Gordon Gekko (Wall Street)
  40. Greed in Action

  41. DFA for /a?a?a?a?aaaa/
    “aa”: Start ➡ S1 ➡ S2 ➡ 🛑
    “abc”: Start ➡ S1 ➡ Start ➡ Start ➡ 🛑
    “defg”: Start ➡ Start ➡ Start ➡ Start ➡ 🛑
  42. DFA for /a?a?a?a?aaaa/
    O(n): linear in the size of the match string, even in the worst case!
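
The linear bound can be checked with a hand-built DFA. The sketch below is a reconstruction from the traces on the previous slide, not the diagram itself: it assumes state k means “the last k characters read were all `a`” (so states 1 and 2 correspond to S1 and S2) and that reaching state 4 is a match, since four consecutive `a`s satisfy the pattern. The name `dfa_search` is made up for this example.

```ruby
# Hand-built DFA for /a?a?a?a?aaaa/ used as an unanchored search.
# Assumption: state k means "the last k characters read were all 'a'";
# state 4 is accepting, since four consecutive 'a's satisfy the pattern.
ACCEPTING_STATE = 4

def dfa_search(input)
  state = 0
  steps = 0
  input.each_char do |char|
    steps += 1
    # Exactly one transition per (state, character) pair: advance on 'a',
    # otherwise fall back to the start state.
    state = char == "a" ? state + 1 : 0
    return { matched: true, steps: steps } if state == ACCEPTING_STATE
  end
  { matched: false, steps: steps }
end

%w[aa abc defg aaaa xxaaaay].each do |input|
  result = dfa_search(input)
  puts "#{input.inspect}: matched=#{result[:matched]} steps=#{result[:steps]}"
end
```

Each character is examined exactly once and never revisited, so the step count can never exceed the length of the input.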
  43. None
  44. None
  45. How are we Affected?

  46. We Handle Untrusted 3rd Party Data
    • Form submission
      • Can’t rely on front-end validation or restrictions (e.g., max form field length)
      • Attacker can just POST the data directly to your server
      • Attacker could adjust element attributes in web developer tools
    • API calls made by users
    • Sourcing data from other locations
  47. What Can We Do About It?
    • Never build a Regexp object from user-supplied data
    • Process text data before feeding it to a regex
      • E.g., truncate it if excessively long
      • Ruby does this for Date parsing methods as of 3.0.3
    • Don’t rely on front-end validation or restrictions!
    • Upgrade Rails and Ruby when security releases are cut
    • Ruby 3.2 introduces Regexp time limits, both global and per-Regexp
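
Two of these guards, truncating input and setting a time limit, can be combined in a small wrapper. This is a minimal sketch, not a recommended production helper: `guarded_match?` and `MAX_INPUT_LENGTH` are made-up names, the 256-character cap and 1-second limit are arbitrary, and the global timeout is only set when the running Ruby supports it (3.2 and later).

```ruby
MAX_INPUT_LENGTH = 256 # hypothetical cap; choose one that fits your data

# Ruby 3.2+ only: global limit, in seconds, on any single regex match.
Regexp.timeout = 1.0 if Regexp.respond_to?(:timeout=)

# Hypothetical helper: truncate untrusted input, then match with the
# timeout (where available) as a backstop.
def guarded_match?(pattern, input, max_length: MAX_INPUT_LENGTH)
  pattern.match?(input[0, max_length])
rescue RegexpError
  # On Ruby 3.2+, Regexp::TimeoutError is a RegexpError; treat it as "no match".
  false
end

puts guarded_match?(/\A\d+\z/, "12345")        # ordinary input
puts guarded_match?(/x\z/, "a" * 10_000 + "x") # the trailing "x" is lost to truncation
```

Truncation changes what the regex sees, as the second call shows, so rejecting oversized input outright may suit some applications better. On Ruby 3.2 and later a limit can also be attached to a single pattern with `Regexp.new(source, timeout: seconds)`.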
  48. Think Like an Attacker
    • Try to adjust your regex to something equivalent, but with limited backtracking:
      • Don’t nest quantifiers (e.g., avoid something like /a(.+)*b/)
      • Specify upper bounds on quantifiers if you can
        • E.g., \w{1,5} instead of \w+
      • Use atomic grouping: (?>...)
    • Try to think through your pathological cases and test them
      • Remember to test progressively longer match strings
      • Make sure your input strings exercise the backtracking behavior
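
Here is one way that advice might look in practice. The two patterns are made up for this sketch: both are meant to accept words separated by single whitespace characters, one with nested quantifiers and one rewritten with an atomic group. The script first checks that they agree on ordinary input, then times both against progressively longer strings that end in a character forcing the match to fail. Timings depend on the Ruby version and machine, so none are shown.

```ruby
# Hypothetical patterns: words separated by single whitespace characters.
NESTED = /\A(\w+\s?)+\z/   # nested quantifiers: \w+ inside (...)+
ATOMIC = /\A(?>\w+\s?)+\z/ # atomic group: no backtracking into a finished iteration

def seconds_to_match(pattern, input)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  pattern.match?(input)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# The rewrite should agree with the original on ordinary input.
["hello world", "one two three ", "bad input!"].each do |sample|
  raise "patterns disagree on #{sample.inspect}" unless NESTED.match?(sample) == ATOMIC.match?(sample)
end

# Progressively longer inputs that force backtracking: the final "!" makes
# the overall match fail only after the word characters have been consumed.
[10, 14, 18].each do |length|
  input = "a" * length + "!"
  puts format("length=%d nested=%.6fs atomic=%.6fs",
              length, seconds_to_match(NESTED, input), seconds_to_match(ATOMIC, input))
end
```

The lengths are kept small on purpose. If the nested pattern’s time climbs steeply from one row to the next, that is the backtracking behavior an attacker would try to trigger.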
  49. We made it

  50. What We’ve Learned
    • How to talk about performance without benchmarking
    • How a regex engine compiles a pattern into a state machine
    • How two classes of state machines operate: NFA vs DFA
    • How attackers trigger a ReDoS and how to guard against them
  51. Helpful Resources
    • regex101.com: Debugger to see how your regex works
    • regexper.com: Build a visual representation of a regex state machine
    • Test your regex with a growing number of ReDoS detection tools:
      • regexploit: https://github.com/doyensec/regexploit
      • ReDoS checker: https://devina.io/redos-checker
    • Try the regexp_parser gem to dissect a regex and match on its components
      • Helpful for writing tests
      • https://github.com/ammar/regexp_parser
  52. Educational Resources
    • regular_expression: https://github.com/kddnewton/regular_expression
      • A regex engine written in Ruby (Shopify Hack Days project)
    • Russ Cox’s regex articles: https://swtch.com/~rsc/regexp/
    • TRegex: https://github.com/oracle/graal/tree/master/regex
    • “Just-in-Time Compiling Ruby Regexps on TruffleRuby” RubyConf 2021 talk
  53. Thank you for your time
    Kevin Menard, kevin.menard@shopify.com. Twitter: @nirvdrum, GitHub: @nirvdrum
  54. Image Licenses
    • Ruby logo: © 2006, Yukihiro Matsumoto.
      • Licensed under CC BY-SA 2.5: https://creativecommons.org/licenses/by-sa/2.5/
    • TruffleRuby logo: © 2017 Talkdesk, Inc.
      • Licensed under CC BY 4.0: https://creativecommons.org/licenses/by/4.0/
    • Rails logo is in the public domain
      • CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
    • YJIT logo: © 2021, Shopify, Inc.
    • Tapioca logo: © Shopify, Inc.
    • Sorbet logo: © Stripe
    • “2021” picture: © 2021, Matthew Henry https://burst.shopify.com/photos/the-year-2021-in-black-ink