Slide 1

Slide 1 text

Service Denied! Understanding How Regex DoS Attacks Work
Kevin Menard [email protected]
GitHub: @nirvdrum, Twitter: @nirvdrum

Slide 2

Slide 2 text

I work at Shopify. It’s a fun place to be and I get to do interesting work. We’re hiring. Please feel free to reach out if you’re interested.

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

• High performance implementation of Ruby
• Focuses on peak performance
• Designed to optimize idiomatic Ruby
• Intended to be compatible with CRuby
• Even runs native extensions

Slide 5

Slide 5 text

• Sample benchmarks from yjit-bench
  • https://github.com/Shopify/yjit-bench
• Left: activerecord benchmark
• Right: erubi benchmark
Benchmark data from https://eregon.me/blog/2022/01/06/benchmarking-cruby-mjit-yjit-jruby-truffleruby.html

Slide 6

Slide 6 text

Service Denied! Understanding How Regex DoS Attacks Work

Slide 7

Slide 7 text

Breakdown
1. Context for topic
2. Intro to Denial of Service
3. Define ReDoS
4. Crash course in performance analysis (Real world “Big-O” notation!)
5. Dive into regular expressions (Learn about state machines!)
6. Bring it on home

Slide 8

Slide 8 text

© 2021, Matthew Henry https://burst.shopify.com/photos/the-year-2021-in-black-ink

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

A Fortunate Series of Events
I spent a chunk of 2021 improving regex performance in TruffleRuby in order to speed up the browser_sniffer gem.
TruffleRuby picked up a second regex engine earlier in the year to JIT common expressions; the old engine was retained as a fallback.
To ensure compatibility, I spent a lot of time looking into CRuby’s regex engine (Onigmo).

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

Regular Expression Denial of Service (ReDoS)

Slide 14

Slide 14 text

Denial of Service
• Attacker is simply trying to mess with you
• For Rails: “service” usually means a web request
• DoS prevents you from achieving performance and availability objectives
• Attacker wastes (finite) compute resources on junk requests
  • Either by specially crafted input or by volume
• Can sometimes be difficult to distinguish malice from corner cases
• You can get surprisingly far with “N + 1” queries on small inputs

Slide 15

Slide 15 text

Regular Expression Denial of Service (ReDoS)

Slide 16

Slide 16 text

“The Regular expression Denial of Service (ReDoS) is a Denial of Service attack, that exploits the fact that most Regular Expression implementations may reach extreme situations that cause them to work very slowly (exponentially related to input size).”
- Open Web Application Security Project® (OWASP), https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS

Slide 17

Slide 17 text

“A regular expression denial of service (ReDoS) is an algorithmic complexity attack that produces a denial-of-service by providing a regular expression and/or an input that takes a long time to evaluate.”
- Wikipedia, https://en.wikipedia.org/wiki/ReDoS

Slide 18

Slide 18 text

“The attack exploits the fact that many regular expression implementations have super-linear worst-case complexity; on certain regex-input pairs, the time taken can grow polynomially or exponentially in relation to the input size.”
- Wikipedia, https://en.wikipedia.org/wiki/ReDoS

Slide 19

Slide 19 text

Performance

Slide 20

Slide 20 text

Performance: Benchmarks Edition

Slide 21

Slide 21 text

• Results are easy to understand
• Considerations:
  • Require code AND inputs to reproduce
  • Very much tied to benchmark environment
  • Quite tricky to normalize or eliminate system effects
Benchmark data from https://eregon.me/blog/2022/01/06/benchmarking-cruby-mjit-yjit-jruby-truffleruby.html

Slide 22

Slide 22 text

Performance: Algorithmic Complexity Edition

Slide 23

Slide 23 text

Algorithmic Complexity
• Idea: Count key operations to measure and compare, doable by hand
• Example: You have 10 doors and 10 keys
  • How many times would you have to turn a key before you found each pair?
  • Now, how about 20 doors and 20 keys?
• Input Data: Those operations are always relative to some sort of input:
  • Array size for array sorting
  • Match string length for a regex
• Context: Best case? Worst case? Average case? Something else?
  • Usually, we’re talking about worst case
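
As a rough illustration of counting key operations by hand, the sketch below counts key turns for the doors-and-keys example. It assumes a worst-case strategy in which the matching key for each door is always the last one tried and opened pairs are set aside; the method name `worst_case_key_turns` is made up for this example.

```ruby
# Worst case: for each door, the matching key is the last one we try.
# Once a key opens a door it is set aside, so the pile shrinks by one each time.
def worst_case_key_turns(doors)
  remaining_keys = doors
  turns = 0
  doors.times do
    turns += remaining_keys # every remaining key gets turned once
    remaining_keys -= 1
  end
  turns
end

[10, 20, 40].each do |n|
  puts "#{n} doors: #{worst_case_key_turns(n)} key turns"
end
```

Under these assumptions, doubling the number of doors roughly quadruples the number of key turns, which is the signature of quadratic growth.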

Slide 24

Slide 24 text

Asymptotic Complexity
Notation      Descriptive Name
O(1)          Constant Time
O(log2(n))    Logarithmic Time
O(n)          Linear Time
O(n⋅lg(n))    Linearithmic Time
O(n^2)        Quadratic Time
O(n^3)        Cubic Time
O(2^n)        Exponential Time
Polynomial Time: O(1) through O(n^3) are all bounded by a polynomial in n; O(2^n) is not.
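
To get a feel for how quickly these classes diverge, the following sketch evaluates one representative function per row for a few input sizes. The `GROWTH` table and the chosen sizes are illustrative assumptions; real algorithms also carry constant factors that Big-O notation ignores.

```ruby
# One representative function per complexity class from the table above.
GROWTH = {
  "O(1)"       => ->(_n) { 1 },
  "O(log2(n))" => ->(n) { Math.log2(n).ceil },
  "O(n)"       => ->(n) { n },
  "O(n*lg(n))" => ->(n) { (n * Math.log2(n)).ceil },
  "O(n^2)"     => ->(n) { n**2 },
  "O(n^3)"     => ->(n) { n**3 },
  "O(2^n)"     => ->(n) { 2**n },
}.freeze

[8, 16, 32, 64].each do |n|
  puts "n = #{n}"
  GROWTH.each do |name, fn|
    puts format("  %-11s %d", name, fn.call(n))
  end
end
```

The exponential row is the one that matters for ReDoS: each additional input character doubles the operation count.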

Slide 25

Slide 25 text

[Chart comparing growth curves: Constant Time, Logarithmic Time, Linear Time, Linearithmic Time, Quadratic Time, Exponential Time]

Slide 26

Slide 26 text

ReDoS Example

Slide 27

Slide 27 text

Adapted from Russ Cox’s Regular Expression Matching Can Be Simple and Fast… (2007)
https://swtch.com/~rsc/regexp/regexp1.html

# The maximum count comes from the first argument or the MAX_COUNT variable.
# Convert after choosing the source so both paths yield an Integer.
max_count = (ARGV.empty? ? ENV['MAX_COUNT'] : ARGV.first).to_i

def run_iteration(count)
  # /a?ⁿaⁿ/ definition from https://swtch.com/~rsc/regexp/regexp1.html
  r = Regexp.compile("a?" * count + "a" * count)
  t1 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  r.match?("a" * count)
  t2 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  t2 - t1
end

puts "Count,Time (s)"
(1..max_count).each do |count|
  time = run_iteration(count)
  puts "#{count},#{time}"
  $stdout.flush
end

Slide 28

Slide 28 text

Adapted from Russ Cox’s Regular Expression Matching Can Be Simple and Fast… (2007)
https://swtch.com/~rsc/regexp/regexp1.html

# /a?ⁿaⁿ/ defn from https://swtch.com/~rsc/regexp/regexp1.html
# E.g., n = 3 #=> /a?a?a?aaa/
r = Regexp.compile("a?" * count + "a" * count)

# Pathological case:
# E.g., n = 3 #=> "aaa"
r.match?("a" * count)

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

Regular Expression Denial of Service (ReDoS)

Slide 32

Slide 32 text

What are regular expressions?
• A compact DSL for:
  • Writing programs to pattern match strings
  • Instructions for building a state machine
  • Describing a regular language
    • Sort of: Ruby adds extensions that aren’t regular

Slide 33

Slide 33 text

“Welcome, my son. Welcome to the machine.”
- Pink Floyd

Slide 34

Slide 34 text

State Machines
• Abstraction for recording the state an object is in and how it transitions to other states
• States are represented by circles, called nodes
• Transitions are represented by directional arrows
• Sometimes transitions have labels to make their selection conditional

Slide 35

Slide 35 text

Regex State Machines
• Two types: NFA and DFA
• NFA: Nondeterministic Finite Automaton
  • States can have overlapping transitions
  • What’s nondeterministic is which one is chosen
• DFA: Deterministic Finite Automaton
  • For each (state, input_character) pair, there is only one possible transition
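
The "only one possible transition" property of a DFA can be sketched as a plain lookup table. The example below is a hand-built table for an anchored variant of a small pattern, /\Aa?a?aa\z/, which accepts two to four "a" characters. The state names, `DFA_TRANSITIONS`, and `dfa_match?` are invented for illustration; this is not the DFA drawn in the talk, nor what Onigmo or TRegex generate.

```ruby
# States are named after how many "a" characters have been consumed so far.
# :dead is a trap state: once entered, the input can never match.
DFA_TRANSITIONS = {
  s0: { "a" => :s1 },
  s1: { "a" => :s2 },
  s2: { "a" => :s3 },
  s3: { "a" => :s4 },
  s4: {}, # a fifth "a" is one too many
  dead: {},
}.freeze

ACCEPTING = %i[s2 s3 s4].freeze

def dfa_match?(input)
  state = :s0
  input.each_char do |char|
    # Exactly one lookup per character: no choices, so nothing to backtrack over.
    state = DFA_TRANSITIONS.fetch(state).fetch(char, :dead)
  end
  ACCEPTING.include?(state)
end

%w[a aa aaaa aaaaa ab].each do |text|
  puts "#{text.inspect}: #{dfa_match?(text)}"
end
```

Because every (state, character) pair maps to a single next state, the loop does a fixed amount of work per input character.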

Slide 36

Slide 36 text

State Machines for /a?a?aa/ [diagrams: NFA, DFA]

Slide 37

Slide 37 text

Possible Transitions on First ‘a’ [diagrams: NFA, DFA]

Slide 38

Slide 38 text

So… how do we decide which transition to take in an NFA?

Slide 39

Slide 39 text

“Greed, for lack of a better word, is good. Greed is right. Greed works.”
- Gordon Gekko (Wall Street)

Slide 40

Slide 40 text

Greed in Action
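
A toy model can make the cost of greedy backtracking concrete. The sketch below is not how Onigmo is implemented; it is a minimal recursive matcher for the /a?ⁿaⁿ/ family only, written under the assumption that each optional "a?" first tries to consume a character and falls back to skipping when the rest of the match fails. The names `backtrack?` and `count_steps` are made up for this example.

```ruby
# tokens: an array of :optional ("a?") and :required ("a") entries.
# stats[:steps] counts every call, i.e. every state the matcher visits.
def backtrack?(tokens, input, token_index, pos, stats)
  stats[:steps] += 1
  return true if token_index == tokens.length # whole pattern satisfied

  can_consume = pos < input.length && input[pos] == "a"
  case tokens[token_index]
  when :optional
    # Greedy: try to consume first, and only skip if that path fails later.
    (can_consume && backtrack?(tokens, input, token_index + 1, pos + 1, stats)) ||
      backtrack?(tokens, input, token_index + 1, pos, stats)
  when :required
    can_consume && backtrack?(tokens, input, token_index + 1, pos + 1, stats)
  end
end

def count_steps(n)
  tokens = [:optional] * n + [:required] * n
  stats = { steps: 0 }
  matched = backtrack?(tokens, "a" * n, 0, 0, stats)
  [matched, stats[:steps]]
end

[4, 8, 12, 16].each do |n|
  matched, steps = count_steps(n)
  puts "n = #{n}: matched = #{matched}, steps = #{steps}"
end
```

In this model the only successful path is the one where every optional token is skipped, and the greedy strategy tries it last, so every combination of consume/skip choices is explored before the match succeeds.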

Slide 41

Slide 41 text

DFA for /a?a?a?a?aaaa/
“aa”:   Start ➡ S1 ➡ S2 ➡ 🛑
“abc”:  Start ➡ S1 ➡ Start ➡ Start ➡ 🛑
“defg”: Start ➡ Start ➡ Start ➡ Start ➡ 🛑

Slide 42

Slide 42 text

DFA for /a?a?a?a?aaaa/
O(n): linear in the size of the match string, even in the worst case!
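
One way to get DFA-like behavior without building the DFA up front is to track the set of NFA states the matcher could be in and advance all of them together, the approach described in the Russ Cox article cited earlier in the talk. The sketch below applies that idea to the /a?ⁿaⁿ/ family only, as an anchored full match; `build_tokens`, `closure`, and `lockstep_match?` are hypothetical names, and this is a simplified model rather than any engine's actual code.

```ruby
require "set"

# Toy pattern a?ⁿaⁿ as tokens; state i means "the first i tokens are done".
def build_tokens(n)
  [:optional] * n + [:required] * n
end

# Follow "skip" edges: an optional token may be passed without consuming input.
def closure(tokens, states)
  reachable = Set.new
  pending = states.to_a
  until pending.empty?
    state = pending.pop
    next unless reachable.add?(state) # add? returns nil if already present
    pending << state + 1 if tokens[state] == :optional
  end
  reachable
end

def lockstep_match?(tokens, input)
  final = tokens.length
  current = closure(tokens, [0])
  input.each_char do |char|
    # Advance every live state at once; each character is read exactly once.
    advanced = char == "a" ? current.reject { |s| s == final }.map { |s| s + 1 } : []
    current = closure(tokens, advanced)
    return false if current.empty?
  end
  current.include?(final)
end

puts lockstep_match?(build_tokens(30), "a" * 30)
```

The work per input character is bounded by the number of states in the pattern, so the total cost grows with input length times pattern size instead of doubling with each extra character.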

Slide 43

Slide 43 text

No content

Slide 44

Slide 44 text

No content

Slide 45

Slide 45 text

How Are We Affected?

Slide 46

Slide 46 text

We Handle Untrusted 3rd Party Data
• Form submission
  • Can’t rely on front-end validation or restrictions (e.g., max form field length)
  • Attacker can just POST the data directly to your server
  • Attacker could adjust element attributes in web developer tools
• API calls made by users
• Sourcing data from other locations

Slide 47

Slide 47 text

What Can We Do About It?
• Never build a Regexp object from user-supplied data
• Process text data before feeding it to a regex
  • E.g., truncate it if excessively long
  • Ruby does this for Date parsing methods as of 3.0.3
• Don’t rely on front-end validation or restrictions!
• Upgrade Rails and Ruby when security releases are cut
• Ruby 3.2 introduces Regexp time limits, both global and per-Regexp
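
Two of the mitigations above, truncating input and using Ruby 3.2's Regexp time limits, might be combined along these lines. This is a minimal sketch, not a recommended library API: `guarded_match?`, `MAX_INPUT_LENGTH`, and the chosen limits are hypothetical, and treating a timeout as "no match" is a policy assumption. The code checks for timeout support at runtime so it also loads on older Rubies.

```ruby
MAX_INPUT_LENGTH = 256 # hypothetical limit; choose one that fits your data

# Regexp::TimeoutError only exists on Ruby 3.2+. On older Rubies fall back to
# a placeholder class that is never raised, so the rescue below is a no-op.
REGEX_TIMEOUT_ERROR =
  defined?(Regexp::TimeoutError) ? Regexp::TimeoutError : Class.new(StandardError)

# `source` must be a pattern written by the developer, never user input.
def guarded_match?(source, input, max_length: MAX_INPUT_LENGTH, timeout: 0.5)
  text = input[0, max_length] # truncate excessively long input first
  regex =
    if Regexp.respond_to?(:timeout)
      Regexp.new(source, timeout: timeout) # per-Regexp limit (Ruby 3.2+)
    else
      Regexp.new(source)
    end
  regex.match?(text)
rescue REGEX_TIMEOUT_ERROR
  false # assumption: a timed-out match is treated as "no match"
end

# A process-wide default can also be set on Ruby 3.2+:
# Regexp.timeout = 1.0

puts guarded_match?("\\A\\w+@\\w+\\z", "user@example")
```

Truncation changes what the regex sees, so a pattern anchored at the end of the string may behave differently on cut-off input; whether to truncate or reject outright depends on the application.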

Slide 48

Slide 48 text

Think Like an Attacker
• Try to adjust your regex to something equivalent, but with limited backtracking:
  • Don’t nest quantifiers (e.g., avoid something like /a(.+)*b/)
  • Specify upper bounds on quantifiers if you can
    • E.g., \w{1,5} instead of \w+
  • Use atomic grouping: (?>...)
• Try to think through your pathological cases and test them
  • Remember to test progressively longer match strings
  • Make sure your input strings exercise the backtracking behavior
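
The rewrites above could look something like the following. The constant names and sample inputs are illustrative assumptions, and the timings depend heavily on the Ruby version and engine, so no particular numbers should be expected.

```ruby
NESTED  = /a(.+)*b/       # nested quantifiers: the shape to avoid
FLAT    = /a.*b/          # accepts the same strings without the nesting
BOUNDED = /\A\w{1,5}\z/   # upper bound instead of an open-ended \w+

# An atomic group never gives characters back once it has matched them.
GREEDY_PLUS = /\Aa+a/       # a+ can back off one "a" so the final "a" matches
ATOMIC_PLUS = /\A(?>a+)a/   # (?>a+) keeps every "a", so the final "a" fails

def seconds_to_match(regex, input)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  regex.match?(input)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Test progressively longer inputs that exercise backtracking:
# there is no trailing "b", so the match has to fail.
(4..16).step(4) do |n|
  attack = "a" + "x" * n
  puts format("n=%2d nested=%.6fs flat=%.6fs", n,
              seconds_to_match(NESTED, attack),
              seconds_to_match(FLAT, attack))
end

puts GREEDY_PLUS.match?("aaa") # the greedy version can backtrack into a+
puts ATOMIC_PLUS.match?("aaa") # the atomic version cannot
```

Atomic grouping changes what a pattern accepts, as the last two lines show, so a rewrite needs tests for both the intended matches and the pathological inputs.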

Slide 49

Slide 49 text

We made it

Slide 50

Slide 50 text

What We’ve Learned
• How to talk about performance without benchmarking
• How a regex engine compiles a pattern into a state machine
• How two classes of state machines operate: NFA vs DFA
• How attackers trigger a ReDoS and how to guard against them

Slide 51

Slide 51 text

Helpful Resources
• regex101.com: Debugger to see how your regex works
• regexper.com: Builds a visual representation of a regex state machine
• Test your regex with the growing number of ReDoS detection tools:
  • regexploit: https://github.com/doyensec/regexploit
  • ReDoS checker: https://devina.io/redos-checker
• Try the regexp_parser gem to dissect a regex and match on its components
  • Helpful for writing tests
  • https://github.com/ammar/regexp_parser

Slide 52

Slide 52 text

Educational Resources
• regular_expression: https://github.com/kddnewton/regular_expression
  • A regex engine written in Ruby (Shopify Hack Days project)
• Russ Cox’s regex articles: https://swtch.com/~rsc/regexp/
• TRegex: https://github.com/oracle/graal/tree/master/regex
• “Just-in-Time Compiling Ruby Regexps on TruffleRuby” RubyConf 2021 talk

Slide 53

Slide 53 text

Thank you for your time
Kevin Menard [email protected]
Twitter: @nirvdrum
GitHub: @nirvdrum

Slide 54

Slide 54 text

Image Licenses
• Ruby logo: © 2006, Yukihiro Matsumoto.
  • Licensed under CC BY-SA 2.5: https://creativecommons.org/licenses/by-sa/2.5/
• TruffleRuby logo: © 2017 Talkdesk, Inc.
  • Licensed under CC BY 4.0: https://creativecommons.org/licenses/by/4.0/
• Rails logo is in the public domain
  • CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
• YJIT logo: © 2021, Shopify, Inc.
• Tapioca logo: © Shopify, Inc.
• Sorbet logo: © Stripe
• “2021” picture: © 2021, Matthew Henry https://burst.shopify.com/photos/the-year-2021-in-black-ink