Slide 1

Slide 1 text

Hiroya Fujinami - 2024/8/31 @ RubyKaigi 2024 follow-up
Regular Expressions, REXML, Automata Learning

Slide 2

Slide 2 text

Who am I?
• Hiroya Fujinami / Ph.D. student at NII / STORES, Inc.
• @make_now_just, @makenowjust
• Ruby committer (regex memoization)
• RubyKaigi 2024 talk "Make Your Own Regex Engine!"

Slide 3

Slide 3 text

Regular Expressions

Slide 4

Slide 4 text

Fact: regex matching may not terminate.
Regular Expressions / REXML / Automata Learning - Hiroya Fujinami

Slide 5

Slide 5 text

Null-Loop with Lookahead
• Why does this match not terminate?
• ruby -e '/((?=(\3a|a))(?=(\2a|a)))*/ =~ "aaaa"'
• Both lookaheads are zero-width, so each loop iteration consumes no input (a null loop), yet the capture state keeps oscillating:
• (\2="a", \3="aa"), (\2="aaa", \3="aaaa"), (\2="a", \3="aa"), ...
• Hence the match never terminates.
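To see why the captures never settle, one iteration of the loop body can be replayed by hand. This is a toy model, not Onigmo's actual matching algorithm: it assumes the loop stays at position 0 (the lookaheads consume nothing), that each alternation tries its left branch first, and that a backreference to a group that has not been set yet fails. The names `INPUT`, `iterate`, and `history` are illustrative.

```ruby
INPUT = "aaaa"

# Toy replay of one iteration of ((?=(\3a|a))(?=(\2a|a)))* at position 0.
# Returns the new [\2, \3]; nil means "group not set yet".
def iterate(g2, g3)
  # (?=(\3a|a)): try \3 followed by "a" first; an unset \3 makes that branch fail.
  try2 = g3 && "#{g3}a"
  new_g2 = (try2 && INPUT.start_with?(try2)) ? try2 : "a"
  # (?=(\2a|a)): \2 now holds the value captured just above.
  try3 = "#{new_g2}a"
  new_g3 = INPUT.start_with?(try3) ? try3 : "a"
  [new_g2, new_g3]
end

captures = [nil, nil]
history = Array.new(4) { captures = iterate(*captures) }
# history alternates between ["a", "aa"] and ["aaa", "aaaa"]
```

The replayed captures repeat with period 2, which is the oscillation listed on the slide: the loop body never reaches a state where another iteration would change nothing.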

Slide 6

Slide 6 text

How to Fix?
• It was recently shown that the class of languages described by regexes with backreferences and lookaheads is exactly NLOG.
• Since NLOG ⊆ P, such matching can in principle always be done in polynomial time (in the input length, for a fixed regex). We expect that approach to be slow in practice, and it is not implemented yet.
Yuya Uezato. "Regular Expressions with Backreferences and Lookaheads Capture NLOG." ICALP 2024.

Slide 7

Slide 7 text

REXML

Slide 8

Slide 8 text

My Work on REXML

Slide 9

Slide 9 text

rexml-css_selector
• A REXML extension that adds CSS selector support.
• Example CSS selector: p > a:first-child
• The implementation is generic: as the next slide shows, it also works on Prism syntax trees.
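The meaning of `p > a:first-child` can be spelled out with plain REXML. This is a hand-written matcher for this one selector, not the rexml-css_selector API; the helper name `select_p_first_child_a` is hypothetical. Note that `:first-child` counts element children only, so a leading text node does not count.

```ruby
require "rexml/document"

# Hypothetical helper spelling out `p > a:first-child` by hand:
# an <a> whose parent is a <p> and which is that parent's first *element* child.
def select_p_first_child_a(doc)
  REXML::XPath.match(doc, "//a").select do |a|
    parent = a.parent
    parent.is_a?(REXML::Element) &&
      parent.name == "p" &&
      parent.elements[1].equal?(a) # REXML element indices are 1-based
  end
end

doc = REXML::Document.new(<<~XML)
  <body>
    <p><a id="x"/><a id="y"/></p>
    <div><a id="z"/></div>
    <p>text <a id="w"/></p>
  </body>
XML

ids = select_p_first_child_a(doc).map { |a| a.attributes["id"] }
# "y" is not the first child, "z" sits under <div>, and the text before "w" is not an element.
```

A selector library generalizes exactly this kind of parent/position check to arbitrary selectors.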

Slide 10

Slide 10 text

rexml-css_selector + Prism
Selector over a Prism syntax tree: call[name="require"] > arguments > string:first-child
(matches a string literal that is the first argument of a require call)
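For comparison, the same query can be hand-coded with Prism's visitor API. This sketch does not use rexml-css_selector; it assumes the `prism` gem (a default gem since Ruby 3.3), and the class name `RequireFinder` is hypothetical. It collects the first argument of each `require` call when that argument is a string literal.

```ruby
require "prism"

# Hypothetical visitor: collects the first argument of every `require` call
# when that argument is a plain string literal.
class RequireFinder < Prism::Visitor
  attr_reader :paths

  def initialize
    super
    @paths = []
  end

  def visit_call_node(node)
    # call[name="require"] > arguments > string:first-child
    if node.name == :require && node.arguments
      first = node.arguments.arguments.first
      @paths << first.unescaped if first.is_a?(Prism::StringNode)
    end
    super # keep walking into nested nodes
  end
end

source = <<~RUBY
  require "json"
  require_relative "helper"
  require "set"
RUBY

finder = RequireFinder.new
Prism.parse(source).value.accept(finder)
```

The selector expresses in one line what the visitor needs a class for, which is the appeal of querying syntax trees with CSS selectors.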

Slide 11

Slide 11 text

Automata Learning

Slide 12

Slide 12 text

Infer an Automaton from a Program

Slide 13

Slide 13 text

Infer an Automaton from a Program

Slide 14

Slide 14 text

Infer an Automaton from a Program
????

Slide 15

Slide 15 text

The Ultimate Goal
[Figure: an automaton (????) learned from Prism and one (????) learned from parse.y are compared by diff (model checking): either a bug is found, or they are exactly the same (a mathematical result).]
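If both parsers are approximated by automata, the "diff" step amounts to an equivalence check. A standard way to do this for complete DFAs is a breadth-first search over the product automaton, which returns a shortest word accepted by exactly one of them, or nil if they are equivalent. This is a minimal sketch under that assumption; models of real parsers would be VPAs rather than DFAs, and the names `DFA` and `find_difference` are hypothetical.

```ruby
# A complete DFA: delta maps [state, letter] to the next state.
DFA = Struct.new(:initial, :delta, :accepting)

# Breadth-first search over pairs of states (the product automaton).
# Returns a shortest word on which the two DFAs disagree, or nil if none exists.
def find_difference(a, b, alphabet)
  start = [a.initial, b.initial]
  queue = [[start, ""]]
  seen = { start => true }
  until queue.empty?
    (qa, qb), word = queue.shift
    return word if a.accepting.include?(qa) != b.accepting.include?(qb)

    alphabet.each do |c|
      succ = [a.delta.fetch([qa, c]), b.delta.fetch([qb, c])]
      next if seen[succ]

      seen[succ] = true
      queue << [succ, word + c]
    end
  end
  nil
end

# Illustrative DFAs over "0"/"1": an even number of "1"s.
even_ones = DFA.new(:even, {
  [:even, "0"] => :even, [:even, "1"] => :odd,
  [:odd, "0"] => :odd, [:odd, "1"] => :even,
}, [:even])
# Same language with a redundant state: 0 and 2 both mean "even".
even_ones_redundant = DFA.new(0, {
  [0, "0"] => 0, [0, "1"] => 1,
  [1, "0"] => 1, [1, "1"] => 2,
  [2, "0"] => 2, [2, "1"] => 1,
}, [0, 2])
# Accepts only words without any "1".
no_ones = DFA.new(:clean, {
  [:clean, "0"] => :clean, [:clean, "1"] => :dirty,
  [:dirty, "0"] => :dirty, [:dirty, "1"] => :dirty,
}, [:clean])
```

Here `find_difference(even_ones, even_ones_redundant, %w[0 1])` finds no difference, while comparing `even_ones` with `no_ones` yields the witness `"11"`. A non-nil result plays the role of "bug"; nil plays the role of "exactly same".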

Slide 16

Slide 16 text

The Ultimate Goal: Demo

Slide 17

Slide 17 text

The Ultimate Goal: Demo

Slide 18

Slide 18 text

The Ultimate Goal: Demo

Slide 19

Slide 19 text

Angluin's L* Algorithm
Automata learning algorithm:
  def lstar: (
    program: ^(String) -> bool,
    equivalence: ^(Automaton) -> true | String,
  ) -> Automaton

Slide 20

Slide 20 text

Angluin's L* Algorithm
Automata learning algorithm:
  def lstar: (
    program: ^(String) -> bool,                   # a program to learn (answers membership queries)
    equivalence: ^(Automaton) -> true | String,
  ) -> Automaton

Slide 21

Slide 21 text

Angluin's L* Algorithm
Automata learning algorithm:
  def lstar: (
    program: ^(String) -> bool,                   # a program to learn (answers membership queries)
    equivalence: ^(Automaton) -> true | String,   # an equivalence query (true, or a counterexample)
  ) -> Automaton

Slide 22

Slide 22 text

Angluin's L* Algorithm
Automata learning algorithm:
  def lstar: (
    program: ^(String) -> bool,                   # a program to learn (answers membership queries)
    equivalence: ^(Automaton) -> true | String,   # an equivalence query (true, or a counterexample)
  ) -> Automaton
Together, program and equivalence form the teacher.

Slide 23

Slide 23 text

Teacher and Learner
• h1

Slide 24

Slide 24 text

Teacher and Learner
• h1
• equivalence(h1) → "111" (counterexample)

Slide 25

Slide 25 text

Teacher and Learner
• h1
• equivalence(h1) → "111" (counterexample)
• h2

Slide 26

Slide 26 text

Teacher and Learner
• h1
• equivalence(h1) → "111" (counterexample)
• h2
• program("11") → false

Slide 27

Slide 27 text

Teacher and Learner
• h1
• equivalence(h1) → "111" (counterexample)
• h2
• program("11") → false
• h3

Slide 28

Slide 28 text

Teacher and Learner
• h1
• equivalence(h1) → "111" (counterexample)
• h2
• program("11") → false
• program("10") → false
• h3
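The query loop above is what L* automates. Below is a compact sketch of a common simplification of Angluin's L*: each counterexample's suffixes are added as new columns instead of its prefixes as new rows, which keeps the rows of S distinct and makes the separate consistency check unnecessary. It follows the slide's `program:`/`equivalence:` keywords, but the positional `alphabet` parameter, the `Hypothesis` struct, and the bounded exhaustive equivalence check (a stand-in for a real equivalence oracle) are assumptions, not Lernen's API. The target language is a hypothetical example.

```ruby
# Hypothesis DFA; its states are the access strings in S ("" is the initial state).
Hypothesis = Struct.new(:states, :delta, :accepting) do
  def accept?(word)
    final = word.each_char.reduce("") { |q, a| delta.fetch([q, a]) }
    accepting.include?(final)
  end
end

def lstar(alphabet, program:, equivalence:)
  cache = {}
  member = ->(w) { cache.fetch(w) { cache[w] = program.call(w) } } # membership query
  s = [""] # access strings, one per state (rows stay pairwise distinct)
  e = [""] # distinguishing suffixes (columns)
  row = ->(u) { e.map { |v| member.call(u + v) } }

  loop do
    # Close the table: every one-letter extension of S must share a row with S.
    loop do
      ext = s.product(alphabet).map(&:join).find { |ua| s.none? { |t| row.call(t) == row.call(ua) } }
      break unless ext

      s << ext
    end

    by_row = s.to_h { |u| [row.call(u), u] }
    delta = s.product(alphabet).to_h { |u, a| [[u, a], by_row.fetch(row.call(u + a))] }
    hyp = Hypothesis.new(s.dup, delta, s.select { |u| member.call(u) })

    cex = equivalence.call(hyp) # true, or a counterexample String
    return hyp if cex == true

    # Add every suffix of the counterexample as a new column.
    (0...cex.size).each { |i| e << cex[i..] unless e.include?(cex[i..]) }
  end
end

# Hypothetical target: the number of "0"s is 3 modulo 4.
target = ->(w) { w.count("0") % 4 == 3 }
# Stand-in equivalence query: exhaustive testing up to length 8 (an approximation).
bounded_eq = lambda do |hyp|
  words = (0..8).lazy.flat_map { |n| %w[0 1].repeated_permutation(n).map(&:join) }
  words.find { |w| hyp.accept?(w) != target.call(w) } || true
end

learned = lstar(%w[0 1], program: target, equivalence: bounded_eq)
```

With this target, the first hypothesis rejects everything, the counterexample "000" comes back, and the second hypothesis already has the four states of a mod-4 counter.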

Slide 29

Slide 29 text

Problems of the L* Algorithm
• L* can infer only regular languages.
• However, programming-language syntax is context-free, which goes beyond regular languages.
• → Learn VPAs or systems of procedural automata (SPAs) instead.
[Figure: Chomsky hierarchy (Noam Chomsky)]

Slide 30

Slide 30 text

VPA (Visibly Pushdown Automata)
• Pushdown automata whose nesting start (call) and end (return) characters are fixed in advance are learnable.
• In such a PDA, the input character alone decides the stack operation: a call character pushes, a return character pops, and any other (internal) character leaves the stack unchanged.
• This kind of PDA is called a VPA (visibly pushdown automaton).

Slide 31

Slide 31 text

VPA: Example
alphabet: %w[1 +], call: %w[(], return: %w[)]
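The VPA idea can be made concrete with a hand-written recognizer for the example's letter partition: "(" is a call letter, ")" a return letter, and "1" and "+" are internal letters. The target language (well-parenthesized sums of 1s such as `1+(1+1)`) is an assumption for illustration, the method name `vpa_accept?` is hypothetical, and this is not Lernen's API. The point is that the stack is touched only by call and return letters.

```ruby
# Letter classes from the slide: "(" is a call, ")" is a return, "1" and "+" are internal.
CALLS = %w[(].freeze
RETURNS = %w[)].freeze

# Hypothetical hand-written VPA for well-parenthesized sums such as "1+(1+1)".
# States: :operand (expecting "1" or "("), :operator (just read an operand).
def vpa_accept?(word)
  state = :operand
  stack = []
  word.each_char do |c|
    if CALLS.include?(c)
      return false unless state == :operand
      stack.push(:paren) # a call letter always pushes
    elsif RETURNS.include?(c)
      # A return letter always pops; it is valid only right after an operand.
      return false unless state == :operator && stack.pop == :paren
    elsif c == "1" && state == :operand
      state = :operator
    elsif c == "+" && state == :operator
      state = :operand
    else
      return false # any other letter/state combination goes to a rejecting sink
    end
  end
  state == :operator && stack.empty?
end
```

Because the letter alone decides push, pop, or neither, a learner only has to infer the finite-state part, which is what makes VPAs learnable.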

Slide 32

Slide 32 text

SPA (System of Procedural Automata)
• A system of automata (procedures) that can call each other recursively.
• A learning algorithm for SPAs is known.
[Figure: procedural automata A and B]

Slide 33

Slide 33 text

Lernen: automata learning library written in Ruby
• https://github.com/makenowjust/lernen
• Implemented algorithms: L*, KV (Kearns-Vazirani), and L#, for DFAs, Moore machines, Mealy machines, and VPAs
• Very easy to use!!

Slide 34

Slide 34 text

Lernen: Examples (DFA)

Slide 35

Slide 35 text

Lernen: Examples (DFA)
flowchart TD
  0((0))
  1((1))
  2((2))
  3(((3)))
  0 -- "'0'" --> 1
  0 -- "'1'" --> 0
  1 -- "'0'" --> 2
  1 -- "'1'" --> 1
  2 -- "'0'" --> 3
  2 -- "'1'" --> 2
  3 -- "'0'" --> 0
  3 -- "'1'" --> 3
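The flowchart can be checked mechanically. This sketch transcribes its transition table, assuming state 0 is initial (the diagram does not mark it) and that the double-circle node `3(((3)))` is the only accepting state. It then compares the DFA against the reading "the number of `0`s is 3 modulo 4" on every word up to length 8. The names are illustrative, not Lernen's output format.

```ruby
# Transition table read off the flowchart: DELTA[state][letter] => next state.
DELTA = {
  0 => { "0" => 1, "1" => 0 },
  1 => { "0" => 2, "1" => 1 },
  2 => { "0" => 3, "1" => 2 },
  3 => { "0" => 0, "1" => 3 },
}.freeze
INITIAL = 0            # assumption: the diagram does not mark the initial state
ACCEPTING = [3].freeze # the double-circle node

def dfa_accept?(word)
  final = word.each_char.reduce(INITIAL) { |q, c| DELTA.fetch(q).fetch(c) }
  ACCEPTING.include?(final)
end

# Compare with "the number of 0s is 3 modulo 4" on all words up to length 8.
words = (0..8).flat_map { |n| %w[0 1].repeated_permutation(n).map(&:join) }
mismatches = words.reject { |w| dfa_accept?(w) == (w.count("0") % 4 == 3) }
```

An empty `mismatches` list supports that reading: "0" advances a mod-4 counter and "1" leaves it unchanged.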

Slide 36

Slide 36 text

Lernen: Examples (VPA)

Slide 37

Slide 37 text

Lernen: Examples (VPA)

Slide 38

Slide 38 text

Lernen: Examples (comparing a program with a regexp)

Slide 39

Slide 39 text

Lernen: Examples (comparing a program with a regexp)

Slide 40

Slide 40 text

Lernen: Examples (comparing a program with a regexp)
  ...check_equivalence(...) # => "http://%"
  URI.parse("http://%") # raises URI::InvalidURIError
  URI.regexp(%w[http https]).match?("http://%") # => true
• The counterexample "http://%" shows the two disagree: URI.parse rejects it, but URI.regexp accepts it.

Slide 41

Slide 41 text

Lernen: automata learning library written in Ruby
• https://github.com/makenowjust/lernen
• Implemented algorithms: L*, KV (Kearns-Vazirani), and L#, for DFAs, Moore machines, Mealy machines, and VPAs
• Very easy to use!!
• Let's try it!!! →→→

Slide 42

Slide 42 text

Summary
• Regular expressions: a null loop with lookaheads can make matching non-terminating.
• REXML: I created the rexml-css_selector library.
• Automata learning: I created the lernen library.
I believe automata learning is one of the future forms of programming. Let's try it!

Slide 43

Slide 43 text

uri_parse_automaton

Slide 44

Slide 44 text

uri_regexp_automaton