Slide 1

Slide 1 text

Make Regexp#match Much Faster
Hiroya Fujinami (makenowjust)
2023/5/11 RubyKaigi 2023 at Matsumoto, Japan

Slide 2

Slide 2 text

Make Regexp#match much faster - Hiroya Fujinami (@makenowjust) 2023/5/11 RubyKaigi 2023 at Matsumoto, Japan
Hiroya Fujinami
‣ Ph.D. student at SOKENDAI (NII)
‣ Twitter: @make_now_just, GitHub: @makenowjust
‣ Ruby committer since 2022/12
  - I contributed to speeding up Regexp#match in CRuby ⚡ during an internship at Cookpad Inc.

Slide 3

Slide 3 text

Agenda
‣ Regexp is powerful
  Powerful features and applications in Ruby's Regexp
‣ ReDoS: a vulnerability in Regexp
  What kind of vulnerability is ReDoS (Regular Expression Denial of Service)?
‣ How to speed up Regexp matching
  Optimization by memoization makes Regexp matching faster.
‣ Future work
  A future vision for Regexp matching

Slide 4

Slide 4 text

Regexp is powerful
Powerful features and applications in Ruby's Regexp

Slide 5

Slide 5 text

Ruby's Regexp is powerful!
‣ Ruby's Regexp has some rich features:
  - multiple character encoding support,
  - look-around operators, atomic groups, conditional branches, absent operator,
  - subexpression calls, and back-references (irregular extensions).
→ These features help us in both development and hobby projects.
(?=foo) (?!foo) (?<=foo) (?<!foo)   look-around operators
(?>foo)   atomic group
(?(x)foo|bar)   conditional branches
(?<x>foo){0}\g<x>   subexpression call
(?<x>foo*)\k<x>   back-reference
(?~foo)   absent operator

Slide 6

Slide 6 text

How powerful is it?
‣ The following Regexp computes (verifies) a sum equation of two binary numbers.
  - e.g. it matches "01+01=10" and "110+001=111", but not "01+01=00".
/(?[01](?=(?:(?(?<=0))|(?(?<=1)))[01]*\+(?(?:\k|(?!\k)) [01])(?:(?(?<=0))|(?(?<=1)))[01]*=(?(?:\k|(?!\k))[01])(?:(? (?<=0))|(?(?<=1)))[01]*\z(?:(?\k\k\k| \k\k\k|\k\k\k|\k\k\k)| \k\k\k|\k\k\k|\k\k\k| \k\k\k)(?:\k(?:\k(?:\k(?!\k)|\k(?! \k))|\k(?:\k\k|\k(?!\k)))|\k(?:\k(?: \k\k|\k(?!\k))|\k(?:\k\k| \k\k))))\g|(?!\k)\+\k=\k){0}\A\g\z/
Theoretically, this language (a set of strings) is neither regular nor context-free.

Slide 7

Slide 7 text

How powerful is it?
‣ A brainf*ck implementation in Ruby's Regexp exists.
  - https://github.com/shinh/hack/blob/master/bf_rb_reg/bf.rb
  - BF_REG =~ bf + BF_SUFFIX executes a bf program during Regexp matching
    (loops may stop after 256 iterations.)
‣ Conclusion → ultimately powerful!

Slide 8

Slide 8 text

With great power comes great responsibility.

Slide 9

Slide 9 text

ReDoS: a vulnerability in Regexp
What kind of vulnerability is ReDoS (Regular Expression Denial of Service)?

Slide 10

Slide 10 text

ReDoS (Regular Expression Denial of Service)
‣ A vulnerability in Regexp matching.
‣ The time taken for Regexp matching can explode.
  - time ruby -e '/\A(a|a)*\z/ =~ "a" * 30 + "b"'
    30.99s user 0.09s system 99% cpu 31.188 total
  - The Regexp /\A(a|a)*\z/ takes exponential matching time
    against an input string of the form "a" * n + "b".
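The growth described above can be observed with a small timing script. This is a minimal sketch, not part of the talk: the helper name `time_failing_match` and the chosen values of n are assumptions made for illustration, and n is kept small so the script finishes quickly even on a Ruby without the optimization. On Ruby 3.2+ the timings are expected to stay nearly flat, because the memoization discussed later applies to this pattern.

```ruby
require "benchmark"

# The pattern from the slide: two identical alternatives inside a loop.
PATTERN = /\A(a|a)*\z/

# Hypothetical helper: matches PATTERN against "a" * n + "b" (which can
# never match because of the trailing "b") and returns the match result
# together with the elapsed wall-clock time in seconds.
def time_failing_match(n)
  input = "a" * n + "b"
  result = nil
  seconds = Benchmark.realtime { result = PATTERN =~ input }
  [result, seconds]
end

# Small sizes only: without memoization the work roughly doubles per extra "a".
[10, 14, 18].each do |n|
  result, seconds = time_failing_match(n)
  puts format("n=%2d  match=%p  %.6fs", n, result, seconds)
end
```

Comparing the printed times across Ruby versions (for example 3.1 and 3.2) is one way to see the effect of the improvement.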

Slide 11

Slide 11 text

Why is Regexp matching explosion a problem?
One scenario:
‣ A web app uses /\A[^@]+@[^@]+([.][^@]+)+\z/ to validate e-mail addresses.
‣ Then, a user sends "a@" + "." * 50 + "@" to this web app.
‣ To process this request, the web app takes several minutes because the Regexp matching time explodes.
‣ Other users see a loading screen or a 500 Internal Server Error ¯\_(ツ)_/¯.
→ ReDoS gives users a bad experience and costs business opportunities.

Slide 12

Slide 12 text

ReDoS is a threat in the real world.
‣ For example, Cloudflare was down for 27 minutes due to ReDoS.
  https://blog.cloudflare.com/details-of-the-cloudflare-outage-on-july-2-2019/
‣ Recently, many ReDoS vulnerabilities in Ruby gems have been reported.
  - CVE-2022-24836 (Nokogiri), CVE-2022-30122 (Rack), etc.
    (See details in Japanese, https://zenn.dev/ooooooo_q/articles/ruby_3_2_redos)
‣ ReDoS vulnerabilities in Ruby's core libraries have also been reported.
→ We need to prevent ReDoS in Ruby itself.

Slide 13

Slide 13 text

How to prevent ReDoS
‣ Some regexes cause ReDoS and some do not.
‣ ReDoS can be prevented if we are careful...?
‣ One way: using a ReDoS detection tool.
  - https://github.com/makenowjust-labs/recheck
‣ Another way: improving the Regexp matching implementation (today's talk).
  - Ruby 3.2+ has an improved Regexp implementation that prevents ReDoS.

Slide 14

Slide 14 text

Improvement result (1): polynomial case
(Figure from https://www.ruby-lang.org/ja/news/2022/12/25/ruby-3-2-0-released/)

Slide 15

Slide 15 text

Improvement result (2): exponential case
(Figure from https://www.ruby-lang.org/ja/news/2022/12/25/ruby-3-2-0-released/)

Slide 16

Slide 16 text

How to speed up Regexp matching ⚡
Optimization by memoization makes Regexp matching faster.

Slide 17

Slide 17 text

VM-based Regexp engine
‣ CRuby's Regexp engine (Onigmo) is VM-based.
‣ VM-based means
  - a Regexp is compiled into byte codes, and
  - byte code execution (matching) uses backtracking.
‣ push @label pushes @label and the current position onto the stack for backtracking.
Regexp: /\A(?:a|a)*\z/
Byte codes (after compilation):
    begin-buf
  @loop-begin
    push @loop-end
    push @branch
    exact1 'a'
    jump @branch-end
  @branch
    exact1 'a'
  @branch-end
    jump @loop-begin
  @loop-end
    end-buf
    end
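To make the byte codes concrete, here is a toy interpreter for them. It is only a sketch under stated assumptions: the instruction encoding (arrays of symbols), the resolution of labels to array indexes, and the names `PROGRAM` and `backtrack_match` are invented for illustration and are not Onigmo's actual data structures.

```ruby
# Byte codes for /\A(?:a|a)*\z/; labels are resolved to instruction indexes.
PROGRAM = [
  [:begin_buf],    # 0: \A
  [:push, 7],      # 1: @loop-begin: push @loop-end
  [:push, 5],      # 2:              push @branch
  [:exact1, "a"],  # 3:              exact1 'a'
  [:jump, 6],      # 4:              jump @branch-end
  [:exact1, "a"],  # 5: @branch:     exact1 'a'
  [:jump, 1],      # 6: @branch-end: jump @loop-begin
  [:end_buf],      # 7: @loop-end:   \z
  [:end]           # 8: matching succeeded
].freeze

# Runs the program with plain backtracking (no memoization).
# Returns [matched?, number of executed instructions].
def backtrack_match(program, input)
  pc = 0
  pos = 0
  stack = []
  steps = 0
  loop do
    steps += 1
    op, arg = program[pc]
    failed = false
    case op
    when :push
      stack.push([arg, pos]) # remember where to resume on failure
      pc += 1
    when :jump
      pc = arg
    when :exact1
      if input[pos] == arg
        pos += 1
        pc += 1
      else
        failed = true
      end
    when :begin_buf
      pos.zero? ? (pc += 1) : (failed = true)
    when :end_buf
      pos == input.length ? (pc += 1) : (failed = true)
    when :end
      return [true, steps]
    end
    next unless failed
    return [false, steps] if stack.empty? # nothing left to try
    pc, pos = stack.pop                   # backtrack
  end
end

["aa", "aab"].each do |s|
  matched, steps = backtrack_match(PROGRAM, s)
  puts "#{s.inspect}: matched=#{matched} steps=#{steps}"
end
```

The step counter is there only to make the amount of backtracking visible; it has no counterpart in the slides.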

Slide 18

Slide 18 text

Memoization
‣ Memoization is a technique that records states and their results, and skips recomputation for a state that has been seen before.
‣ In VM-based matching, a pair of the current PC (program counter) and the position in the input string is recorded as a state.
  - It is not necessary to record the result: if the same state is reached again, the matching has already failed once from that state.
‣ Starting on the next slide, I will explain how VM-based matching works.

Slide 19

Slide 19 text

Example: VM-based Regexp matching (1)
‣ Matching against the string "aab" starts.
‣ Initially, the position is 0 and the stack is empty.
Input: "aab"
Position: 0
Stack: []

Slide 20

Slide 20 text

Example: VM-based Regexp matching (2)
‣ The position is 0, so the begin-buf (\A) test passes.
Input: "aab"
Position: 0
Stack: []

Slide 21

Slide 21 text

Example: VM-based Regexp matching (3)
‣ push @loop-end and push @branch each push a pair of a label and the current position onto the stack.
Input: "aab"
Position: 0
Stack: [[@loop-end, 0], [@branch, 0]]

Slide 22

Slide 22 text

Example: VM-based Regexp matching (4)
‣ Position 0 points to the character 'a', so the exact1 'a' test passes.
‣ exact1 'a' then advances the position.
Input: "aab"
Position: 0 → 1
Stack: [[@loop-end, 0], [@branch, 0]]

Slide 23

Slide 23 text

Example: VM-based Regexp matching (5)
‣ jump @branch-end and jump @loop-begin update the PC (program counter).
‣ The PC is now @loop-begin (push @loop-end).
Input: "aab"
Position: 1
Stack: [[@loop-end, 0], [@branch, 0]]

Slide 24

Slide 24 text

Example: VM-based Regexp matching (6)
‣ push @loop-end and push @branch each push a pair of a label and the current position onto the stack.
Input: "aab"
Position: 1
Stack: [[@loop-end, 0], [@branch, 0], [@loop-end, 1], [@branch, 1]]

Slide 25

Slide 25 text

Example: VM-based Regexp matching (7)
‣ Position 1 points to the character 'a', so the exact1 'a' test passes.
‣ exact1 'a' then advances the position.
Input: "aab"
Position: 1 → 2
Stack: [[@loop-end, 0], [@branch, 0], [@loop-end, 1], [@branch, 1]]

Slide 26

Slide 26 text

Example: VM-based Regexp matching (8)
‣ jump @branch-end and jump @loop-begin update the PC.
‣ The PC is now @loop-begin (push @loop-end).
Input: "aab"
Position: 2
Stack: [[@loop-end, 0], [@branch, 0], [@loop-end, 1], [@branch, 1]]

Slide 27

Slide 27 text

Example: VM-based Regexp matching (9)
‣ push @loop-end and push @branch each push a pair of a label and the current position onto the stack.
Input: "aab"
Position: 2
Stack: [[@loop-end, 0], [@branch, 0], [@loop-end, 1], [@branch, 1], [@loop-end, 2], [@branch, 2]]

Slide 28

Slide 28 text

Example: VM-based Regexp matching (10)
‣ Position 2 points to the character 'b', so the exact1 'a' test fails.
‣ The VM pops a label and a position from the stack and backtracks.
Input: "aab"
Position: 2
Backtrack: pop [@branch, 2]
Stack: [[@loop-end, 0], [@branch, 0], [@loop-end, 1], [@branch, 1], [@loop-end, 2]]

Slide 29

Slide 29 text

Example: VM-based Regexp matching (11)
‣ Position 2 points to the character 'b', so the exact1 'a' test fails.
‣ The VM pops a label and a position from the stack and backtracks.
Input: "aab"
Position: 2
Backtrack: pop [@loop-end, 2]
Stack: [[@loop-end, 0], [@branch, 0], [@loop-end, 1], [@branch, 1]]

Slide 30

Slide 30 text

Example: VM-based Regexp matching (12)
‣ Position 2 is not the end of the input, so the end-buf (\z) test fails.
‣ The VM pops a label and a position from the stack and backtracks.
Input: "aab"
Position: 2 → 1
Backtrack: pop [@branch, 1]
Stack: [[@loop-end, 0], [@branch, 0], [@loop-end, 1]]

Slide 31

Slide 31 text

Example: VM-based Regexp matching (13)
‣ Position 1 points to the character 'a', so the exact1 'a' test passes.
‣ exact1 'a' then advances the position.
Input: "aab"
Position: 1 → 2
Stack: [[@loop-end, 0], [@branch, 0], [@loop-end, 1]]

Slide 32

Slide 32 text

Example: VM-based Regexp matching (14)
‣ jump @loop-begin updates the PC.
‣ The PC is now @loop-begin (push @loop-end).
Input: "aab"
Position: 2
Stack: [[@loop-end, 0], [@branch, 0], [@loop-end, 1]]

Slide 33

Slide 33 text

Example: VM-based Regexp matching (15)
‣ push @loop-end and push @branch each push a pair of a label and the current position onto the stack.
‣ However, we have already reached this situation at (9).
  → Memoization
Input: "aab"
Position: 2
Stack: [[@loop-end, 0], [@branch, 0], [@loop-end, 1], [@loop-end, 2], [@branch, 2]]

Slide 34

Slide 34 text

Memoization
‣ In the previous example, (9) and (15) are the same situation except for the stack.
  - Both PCs are @loop-begin (push @loop-end), and both positions are 2.
‣ That is, we already know that the matching will fail from (15).
‣ By recording each pair of PC and position once it is reached, we can avoid unnecessary backtracking.
→ Let's introduce memoization!

Slide 35

Slide 35 text

After introducing memoization
‣ [push @loop-end, 2] was already reached (memoized) at (9).
‣ Thus, the VM backtracks immediately.
‣ After this, many backtracks are skipped thanks to memoization, and the matching fails.
‣ Details are described in this article (in Japanese).
  https://techlife.cookpad.com/entry/2022/12/12/162023
Input: "aab"
Position: 2 → 1
Backtrack: pop [@loop-end, 1]
Stack: [[@loop-end, 0], [@branch, 0]]
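The effect of memoization in the walkthrough can be reproduced with a toy VM. This is a sketch under stated assumptions, not the CRuby implementation: the instruction encoding, the names `BYTECODE` and `run_vm`, and the choice to check the memo table when a push instruction is executed are all made up for illustration. A Hash keyed by [pc, position] stands in for the bit array, and it is assumed that every loop iteration consumes input, as it does for this pattern.

```ruby
# Byte codes for /\A(?:a|a)*\z/ with labels resolved to indexes.
BYTECODE = [
  [:begin_buf],    # 0: \A
  [:push, 7],      # 1: @loop-begin: push @loop-end
  [:push, 5],      # 2:              push @branch
  [:exact1, "a"],  # 3
  [:jump, 6],      # 4:              jump @branch-end
  [:exact1, "a"],  # 5: @branch
  [:jump, 1],      # 6: @branch-end: jump @loop-begin
  [:end_buf],      # 7: @loop-end:   \z
  [:end]           # 8
].freeze

# Returns [matched?, executed instruction count]. With memoize: true, a
# push instruction that is reached a second time at the same input
# position fails immediately instead of being explored again.
def run_vm(program, input, memoize:)
  pc = 0
  pos = 0
  stack = []
  seen = {} # [pc, pos] => true; stands in for the bit array
  steps = 0
  loop do
    steps += 1
    op, arg = program[pc]
    failed = false
    case op
    when :push
      if memoize && seen[[pc, pos]]
        failed = true # already tried from this state; it failed before
      else
        seen[[pc, pos]] = true if memoize
        stack.push([arg, pos])
        pc += 1
      end
    when :jump then pc = arg
    when :exact1
      if input[pos] == arg
        pos += 1
        pc += 1
      else
        failed = true
      end
    when :begin_buf then pos.zero? ? (pc += 1) : (failed = true)
    when :end_buf then pos == input.length ? (pc += 1) : (failed = true)
    when :end then return [true, steps]
    end
    next unless failed
    return [false, steps] if stack.empty?
    pc, pos = stack.pop
  end
end

input = "a" * 12 + "b"
puts "plain:    #{run_vm(BYTECODE, input, memoize: false).inspect}"
puts "memoized: #{run_vm(BYTECODE, input, memoize: true).inspect}"
```

Both runs report a failed match; only the number of executed instructions differs, which is the point of the optimization.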

Slide 36

Slide 36 text

Theoretical background of memoization
‣ [Davis et al., SP '21] studied memoization for Regexp matching.
‣ According to the paper, memoization makes matching run in linear time O(n) in the input string length n.
  - Matching by backtracking may take exponential or polynomial time.
  - Thus, linear time is much faster.

Slide 37

Slide 37 text

Pros of memoization
‣ Memoization is implemented directly in the original VM.
‣ Therefore,
  - we can guarantee a high level of backward compatibility, and
  - the optimization can be implemented with fewer modifications.
‣ Actual PR: https://github.com/ruby/ruby/pull/6486
  - Diff +744 -12

Slide 38

Slide 38 text

Cons of memoization (1): Memory consumption
‣ Memoization consumes more memory than matching without it.
  - The memoization table is a bit array, so memory consumption increases by (string length × number of branch instructions) / 8 bytes.
    • Typically, the number of branch instructions is about 80 at most.
  - Allocation of the memoization table is deferred until the number of backtracks exceeds a certain threshold.
‣ Thus, we conclude that memory consumption is not a big problem.
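As a quick worked example of the formula above, the following sketch estimates the table size for a given input. The helper name `memo_table_bytes` and the sample sizes are assumptions chosen for illustration; the result is rounded up to whole bytes.

```ruby
# Estimated size in bytes of a bit-array memoization table:
# one bit per (input position, branch instruction) pair, rounded up.
def memo_table_bytes(string_length, branch_count)
  bits = string_length * branch_count
  (bits + 7) / 8
end

# A 1 MB input with 80 branch instructions needs about 10 MB of table.
puts memo_table_bytes(1_000_000, 80) # => 10000000
```

So even at the upper end of the typical branch count, the table is roughly ten times the input size.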

Slide 39

Slide 39 text

Cons of memoization (2): Unsupported features
‣ Memoization is not enabled when the following features are used.
  - look-around operators, atomic groups, conditional branches, absent operator,
  - subexpression calls, back-references, etc.
‣ There are also some odd limitations due to the implementation.
  - Nested ranged repetitions are not supported.
  - Nested null loops are not supported either.
‣ Not all Regexps can be optimized by memoization.

Slide 40

Slide 40 text

Regexp.linear_time?
‣ Ruby 3.2 also introduced Regexp.linear_time?.
‣ Regexp.linear_time?(re) checks whether the given re can be optimized by memoization.
  - e.g. Regexp.linear_time?(/^(a|a)*$/) #=> true
  - e.g. Regexp.linear_time?(/^(a*)\1*$/) #=> false
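One possible use of this method is to audit a list of patterns and flag the ones that do not get the optimization. This is a minimal sketch: the helper name `linear_time_report` is hypothetical, and the `respond_to?` guard is an assumption added so the script also runs on Ruby versions before 3.2, where the method does not exist.

```ruby
# Returns a Hash mapping each pattern's source to true/false, or to nil
# when the running Ruby does not provide Regexp.linear_time?.
def linear_time_report(patterns)
  supported = Regexp.respond_to?(:linear_time?)
  patterns.to_h do |re|
    [re.source, supported ? Regexp.linear_time?(re) : nil]
  end
end

patterns = [
  /^(a|a)*$/,  # plain alternation in a loop: can be memoized
  /^(a*)\1*$/  # uses a back-reference: cannot be memoized
]

linear_time_report(patterns).each do |source, linear|
  puts "#{source}: #{linear.inspect}"
end
```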

Slide 41

Slide 41 text

Future work
A future vision for Regexp matching

Slide 42

Slide 42 text

Future work
‣ Showing a warning if memoization is not enabled.
  - Adding a new RuboCop rule.
  - Adding a new warning to Ruby.
‣ Introducing a new backtrack-less Regexp engine.
  → DFA-based Regexp engine

Slide 43

Slide 43 text

DFA-based Regexp engine
‣ DFA (Deterministic Finite-state Automaton)
‣ A DFA can perform matching in linear time.
‣ Go and Rust (modern languages) use such Regexp engines.
‣ Irregular extensions (subexpression calls, back-references) cannot be supported.
‣ We would like to use a DFA-based engine in Ruby if possible.
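To illustrate why DFA matching is linear, here is a hand-written DFA that accepts the same strings as /\A(?:a|a)*\z/. It is only a sketch: the table layout and the names `DFA` and `dfa_match?` are assumptions for illustration, and a real engine would construct the automaton from the Regexp rather than by hand.

```ruby
# A DFA with a single state q0 that loops on "a" and is accepting.
# Any other character has no transition, which means rejection.
DFA = {
  start: :q0,
  accepting: [:q0],
  transitions: { q0: { "a" => :q0 } }
}.freeze

# Reads each character exactly once and never backtracks, so the running
# time is proportional to the input length.
def dfa_match?(dfa, input)
  state = dfa[:start]
  input.each_char do |ch|
    state = dfa[:transitions].fetch(state, {})[ch]
    return false if state.nil? # no transition: reject immediately
  end
  dfa[:accepting].include?(state)
end

puts dfa_match?(DFA, "aaa")           # accepted
puts dfa_match?(DFA, "a" * 30 + "b")  # rejected after a single pass
```

The input that took about 31 seconds with backtracking on the earlier slide is rejected here after reading 31 characters.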

Slide 44

Slide 44 text

Implementation plan (Just a personal vision...)
‣ We would reuse existing components as much as possible for backward compatibility.
  - Regexp parser, character class inclusion test, ignore-case expansion, etc.
(Diagram components: Parser, Compiler, Matcher, ignore-case expansion, char-class inclusion, DFA-based Matcher)

Slide 45

Slide 45 text

Implementation plan (Just a personal vision...)
‣ The recent paper [Moseley et al., PLDI '23] uses complex data structures.
  - It seems hard to implement them in C.
‣ Therefore, we would like to be able to implement a Regexp engine in Ruby.
  - Ruby is becoming ready for writing fast code (e.g. RJIT).
‣ To-do for implementing a Regexp engine in Ruby
  - Expose Regexp internal APIs (parser, character-class, etc.) to Ruby.
  - Allow the Regexp engine to be replaced from the Ruby side.

Slide 46

Slide 46 text

TRegex
‣ TRegex is a Regexp matching engine used by TruffleRuby.
‣ It is DFA-based and JIT-enabled [Daloze and Haider, RubyKaigi '21].
‣ However, TRegex is
  - implemented in Java (on GraalVM), and
  - not based on CRuby's Regexp engine
    (Thus, a backward compatibility issue may exist...?)

Slide 47

Slide 47 text

Summary of today's talk
‣ ReDoS is a vulnerability in Regexp matching.
‣ To prevent ReDoS, optimization by memoization was introduced by makenowjust (me).
‣ Memoization speeds up matching in many cases, but in some cases memoization is not enabled.
  - Some extensions (look-around, back-references, etc.) are not supported.
  - There are some limitations (nested null loops, etc.) due to the implementation.
‣ Next: a new DFA-based Regexp engine written in Ruby...?
Thank you!

Slide 48

Slide 48 text

References
‣ James C. Davis, Francisco Servant, and Dongyoon Lee. "Using selective memoization to defeat regular expression denial of service (ReDoS)." 2021 IEEE Symposium on Security and Privacy (SP). IEEE, 2021.
‣ Dan Moseley, Mario Nishio, Jose Perez Rodriguez, Olli Saarikivi, Stephen Toub, Margus Veanes, Tiki Wan, and Eric Xu. "Derivative Based Nonbacktracking Real-World Regex Matching with Backtracking Semantics." Proceedings of the ACM SIGPLAN 2023 Conference on Programming Language Design and Implementation. 2023.
‣ Benoit Daloze and Josef Haider. "Just-in-Time Compiling Ruby Regexps on TruffleRuby." RubyKaigi 2021. 2021.