Regular Expressions, REXML,
Automata Learning

Slides presented at the RubyKaigi 2024 follow-up event
https://rhc.connpass.com/event/320709/

makenowjust/lernen - GitHub
https://github.com/makenowjust/lernen

TSUYUSATO Kitsune

August 31, 2024
Transcript

  1. Who am I?

    • Hiroya Fujinami / Ph.D. student at NII / STORES, Inc.
    • @make_now_just, @makenowjust
    • Ruby committer (regex memoization)
    • RubyKaigi 2024 talk "Make Your Own Regex Engine!"
  2. The fact: Regex matching may not terminate.

    Regular Expressions / REXML / Automata Learning, Hiroya Fujinami
  3. Null-Loop with Lookahead

    • Why does this matching not terminate?
    • ruby -e '/((?=(\3a|a))(?=(\2a|a)))*/ =~ "aaaa"'
    • During this matching, the capture state oscillates.
    • (\2="a",\3="aa"), (\2="aaa",\3="aaaa"), (\2="a",\3="aa"), ...
    • As a result, the matching does not terminate.
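The oscillation on this slide can be replayed by hand without running the regex itself (which would hang). The following is a minimal sketch, not the engine's actual algorithm: `step` and `capture_trace` are hypothetical helpers, and the sketch assumes that a backreference to a group that has not matched yet fails, so `\3a` is skipped on the first iteration.

```ruby
# Simulates one pass through the loop body ((?=(\3a|a))(?=(\2a|a))) at position 0.
# Both lookaheads are zero-width, so the position never advances; only the
# captures for \2 and \3 change. Assumes the input starts with "a".
def step(input, captures)
  _old_g2, old_g3 = captures
  # First lookahead (\3a|a): try \3 followed by "a", otherwise fall back to "a".
  g2 = old_g3 && input.start_with?(old_g3 + "a") ? old_g3 + "a" : "a"
  # Second lookahead (\2a|a): try \2 followed by "a", otherwise fall back to "a".
  g3 = input.start_with?(g2 + "a") ? g2 + "a" : "a"
  [g2, g3]
end

# Returns the capture state after each of the first `iterations` passes.
def capture_trace(input, iterations)
  captures = [nil, nil] # nothing captured before the first pass
  Array.new(iterations) { captures = step(input, captures) }
end

capture_trace("aaaa", 4).each do |g2, g3|
  puts "\\2=#{g2.inspect}, \\3=#{g3.inspect}"
end
```

Because the position stays at 0 while the captures keep alternating between two states, a check that only stops a loop when "nothing changed" never fires.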
  4. How to Fix?

    • Recently, the class "regex + back-reference + lookahead" was shown to capture NLOG.
    • NLOG can be simulated in polynomial time. We expect this to be slow, but it is not yet implemented.

    Yuya Uezato. "Regular Expressions with Backreferences and Lookaheads Capture NLOG." ICALP 2024
  5. My REXML Works
  6. rexml-css_selector

    • A REXML extension for supporting CSS selectors.
    • CSS selector: p > a:first-child
    • This implementation is more generic.
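For comparison, the selector `p > a:first-child` can be approximated with plain REXML. This sketch does not use rexml-css_selector (its API is not shown on the slide); `first_child_links` and `SAMPLE_XML` are hypothetical names, and the code relies only on `REXML::Document`, `REXML::XPath.match`, and the 1-indexed `Element#elements[]`.

```ruby
require "rexml/document"

# Returns every <a> element that is the first element child of a <p>,
# which is what the CSS selector `p > a:first-child` selects.
def first_child_links(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//p").filter_map do |paragraph|
    first = paragraph.elements[1] # first child element (text nodes are skipped)
    first if first && first.name == "a"
  end
end

SAMPLE_XML = <<~DOC
  <div>
    <p><a href="x">first</a><a href="y">second</a></p>
    <p><span>text</span><a href="z">not the first child</a></p>
  </div>
DOC

puts first_child_links(SAMPLE_XML).map { |a| a.attributes["href"] }.inspect
```

Only the link with `href="x"` qualifies: `y` is a second child, and `z` follows a `<span>`.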
  7. rexml-css_selector + Prism

    call[name="require"] > arguments > string:first-child
  8. Infer an Automaton from a Program
  9. Infer an Automaton from a Program
  10. Infer an Automaton from a Program

    ????
  11. The Ultimate Goal

    [Diagram labels: ???? / ???? / Prism / parse.y / diff (model checking) / bug / exactly same (mathematical result)]
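The "diff" step between two inferred automata can be illustrated for the simplest case, two DFAs. This is a sketch under assumptions, not the method used in the talk: the hash-based DFA representation and the names `distinguishing_word`, `EVEN_ONES`, and `ONES_MOD_4` are hypothetical. It walks the product of both automata breadth-first and returns a shortest word on which they disagree, or `nil` if they accept exactly the same language.

```ruby
# A DFA is { initial:, accepting:, delta: } with delta[[state, char]] => state.
def distinguishing_word(a, b, alphabet)
  start = [a[:initial], b[:initial]]
  seen = { start => true }
  queue = [[start, ""]]
  until queue.empty?
    (p, q), word = queue.shift
    # The automata disagree if exactly one of them accepts here.
    return word if a[:accepting].include?(p) != b[:accepting].include?(q)
    alphabet.each do |c|
      pair = [a[:delta].fetch([p, c]), b[:delta].fetch([q, c])]
      next if seen[pair]
      seen[pair] = true
      queue << [pair, word + c]
    end
  end
  nil # no reachable pair of states disagrees: the languages are equal
end

# Accepts words with an even number of "1"s.
EVEN_ONES = {
  initial: 0, accepting: [0],
  delta: { [0, "0"] => 0, [0, "1"] => 1, [1, "0"] => 1, [1, "1"] => 0 }
}.freeze

# Accepts words whose number of "1"s is divisible by 4.
ONES_MOD_4 = {
  initial: 0, accepting: [0],
  delta: (0..3).flat_map { |q| [[[q, "0"], q], [[q, "1"], (q + 1) % 4]] }.to_h
}.freeze

puts distinguishing_word(EVEN_ONES, ONES_MOD_4, %w[0 1]).inspect
puts distinguishing_word(EVEN_ONES, EVEN_ONES, %w[0 1]).inspect
```

A non-`nil` result is a concrete input that exposes the difference; `nil` is a proof of equality for the two models being compared.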
  12. The Ultimate Goal: Demo
  13. The Ultimate Goal: Demo
  14. The Ultimate Goal: Demo
  15. Angluin's L* algorithm

    Automata Learning Algorithm
    def lstar: (
      program: ^(String) -> bool,
      equivalence: ^(Automaton) -> true | String,
    ) -> Automaton
  16. Angluin's L* algorithm

    (same signature) Annotation: a program to learn
  17. Angluin's L* algorithm

    (same signature) Annotations: a program to learn; an equivalence query
  18. Angluin's L* algorithm

    (same signature) Annotations: a program to learn; an equivalence query; teacher
  19. teacher / learner

    h1
  20. teacher / learner

    h1; equivalence(h1) → "111"
  21. teacher / learner

    h1; equivalence(h1) → "111"; h2
  22. teacher / learner

    h1; equivalence(h1) → "111"; h2; program("11") → false
  23. teacher / learner

    h1; equivalence(h1) → "111"; h2; program("11") → false; h3
  24. teacher / learner

    h1; equivalence(h1) → "111"; h2; program("11") → false; program("10") → false; h3
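The dialogue above can be made concrete with a compact L* sketch. This is an illustration under assumptions, not Lernen's implementation: it uses the variant that adds every suffix of a counterexample as a new table column, the `alphabet:` keyword and all helper names are hypothetical, and the equivalence query is only approximated by testing all words up to a fixed length. The target program (number of `1`s divisible by 3) is a guess; it happens to reproduce the first counterexample `"111"` from the slides, but the program used in the talk is not stated.

```ruby
# Runs a DFA { initial:, accepting:, delta: } on a word.
def accepts?(dfa, word)
  state = word.each_char.reduce(dfa[:initial]) { |q, c| dfa[:delta].fetch([q, c]) }
  dfa[:accepting].include?(state)
end

# Bounded stand-in for an equivalence query: returns a word on which the
# program and the hypothesis disagree, or true if none exists up to max_len.
def find_counterexample(alphabet, program, dfa, max_len)
  (0..max_len).each do |n|
    alphabet.repeated_permutation(n) do |chars|
      word = chars.join
      return word if program.call(word) != accepts?(dfa, word)
    end
  end
  true
end

def lstar(alphabet:, program:, equivalence:)
  prefixes = [""] # one access word per discovered state
  suffixes = [""] # distinguishing experiments (table columns)
  cache = {}
  member = ->(w) { cache.fetch(w) { cache[w] = program.call(w) } }
  row = ->(s) { suffixes.map { |e| member.call(s + e) } }

  loop do
    # Close the table: every one-letter extension must behave like a known state.
    loop do
      rows = prefixes.map(&row)
      missing = prefixes.product(alphabet).map(&:join)
                        .find { |w| !rows.include?(row.call(w)) }
      break unless missing
      prefixes << missing
    end

    # Build the hypothesis: states are indices into `prefixes`.
    rows = prefixes.map(&row)
    delta = {}
    prefixes.each_with_index do |s, i|
      alphabet.each { |a| delta[[i, a]] = rows.index(row.call(s + a)) }
    end
    accepting = prefixes.each_index.select { |i| member.call(prefixes[i]) }
    dfa = { initial: 0, accepting: accepting, delta: delta }

    counterexample = equivalence.call(dfa)
    return dfa if counterexample == true
    # Refine: add every non-empty suffix of the counterexample as an experiment.
    (0...counterexample.size).each { |i| suffixes |= [counterexample[i..-1]] }
  end
end

SIGMA = %w[0 1].freeze
program = ->(w) { (w.count("1") % 3).zero? }
equivalence = ->(dfa) { find_counterexample(SIGMA, program, dfa, 8) }
dfa = lstar(alphabet: SIGMA, program: program, equivalence: equivalence)
puts "states: #{dfa[:delta].size / SIGMA.size}, accepting: #{dfa[:accepting].inspect}"
```

The learner only ever calls `program` (membership queries) and `equivalence`; everything it knows about the target comes from those two answers, which is the teacher role shown on the slides.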
  25. Problems of L* algorithm

    • L* can infer only regular languages.
    • However, programming language syntax is context-free.
    • → VPA or procedural automata

    [Images: Noam Chomsky; the Chomsky hierarchy]
  26. VPA (Visibly Pushdown Automata)

    • Pushdown automata whose nesting is marked by dedicated start and end characters are learnable.
    • This kind of PDA is called a VPA (visibly pushdown automaton).
  27. VPA: Example

    alphabet: %w[1 +]
    call: %w[(], return: %w[)]
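To show what the split into ordinary, call, and return symbols buys, here is a small hand-written recognizer over that alphabet. It is a sketch under assumptions: the slide gives only the alphabet, so the language (sums of `1`s with parentheses, such as `1+(1+1)`) and the names `vpa_accepts?`, `CALLS`, and `RETURNS` are hypothetical. The stack is pushed only on a call symbol and popped only on a return symbol, which is the defining restriction of a VPA.

```ruby
CALLS   = %w[(].freeze
RETURNS = %w[)].freeze

# Accepts expressions E ::= 1 | (E) | E+E over the alphabet 1 + ( ).
def vpa_accepts?(word)
  state = :want_operand
  stack = []
  word.each_char do |c|
    case
    when CALLS.include?(c) # call symbol: always push
      return false unless state == :want_operand
      stack.push(:paren)
      state = :want_operand
    when RETURNS.include?(c) # return symbol: always pop
      return false unless state == :have_operand && stack.pop
      state = :have_operand
    when c == "1" # ordinary symbols never touch the stack
      return false unless state == :want_operand
      state = :have_operand
    when c == "+"
      return false unless state == :have_operand
      state = :want_operand
    else
      return false # symbol outside the alphabet
    end
  end
  state == :have_operand && stack.empty?
end

%w[1+(1+1) ((1)) (1 1+ )].each do |w|
  puts "#{w.inspect}: #{vpa_accepts?(w)}"
end
```

Because the input symbol alone decides whether the stack grows or shrinks, the stack height after any prefix is visible from the word itself, which is what makes this class learnable.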
  28. SPA (System of Procedural Automata)

    • A system of automata; they can call each other recursively.
    • A learning algorithm for SPA is known.

    [Diagram: automata A and B]
  29. Lernen: automata learning library written in Ruby

    • https://github.com/makenowjust/lernen
    • Implemented algorithms: L*, KV, and L#, for DFA, Moore, Mealy, and VPA
    • Very easy to use!!
  30. Lernen: Examples (DFA)
  31. Lernen: Examples (DFA)

    flowchart TD
      0((0))
      1((1))
      2((2))
      3(((3)))
      0 -- "'0'" --> 1
      0 -- "'1'" --> 0
      1 -- "'0'" --> 2
      1 -- "'1'" --> 1
      2 -- "'0'" --> 3
      2 -- "'1'" --> 2
      3 -- "'0'" --> 0
      3 -- "'1'" --> 3
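The flowchart can be read as a transition table and executed directly. This is a sketch under two assumptions that the diagram does not state explicitly: state 0 is taken as the initial state, and state 3 (drawn with a double circle) as the only accepting state. `DELTA`, `ACCEPTING`, and `dfa_accepts?` are hypothetical names, not Lernen's API.

```ruby
# Transitions copied from the flowchart: "0" advances around a cycle of
# four states, "1" stays in place.
DELTA = {
  [0, "0"] => 1, [0, "1"] => 0,
  [1, "0"] => 2, [1, "1"] => 1,
  [2, "0"] => 3, [2, "1"] => 2,
  [3, "0"] => 0, [3, "1"] => 3
}.freeze
ACCEPTING = [3].freeze # assumption: the double-circled state

def dfa_accepts?(word)
  state = word.each_char.reduce(0) { |q, c| DELTA.fetch([q, c]) } # assumption: start in 0
  ACCEPTING.include?(state)
end

%w[000 0100 10101 0000000 00].each do |w|
  puts "#{w}: #{dfa_accepts?(w)}"
end
```

Under these assumptions the automaton accepts exactly the words whose number of `0`s leaves remainder 3 when divided by 4.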
  32. Lernen: Examples (VPA)
  33. Lernen: Examples (VPA)
  34. Lernen: Examples (compare a program and a regexp)
  35. Lernen: Examples (compare a program and a regexp)
  36. Lernen: Examples (compare a program and a regexp)

    ...check_equivalence(...) # => "http://%"
    URI.parse("http://%") # raises URI::InvalidURIError
    URI.regexp(%w[http https]).match?("http://%") # => true
  37. Lernen: automata learning library written in Ruby

    • https://github.com/makenowjust/lernen
    • Implemented algorithms: L*, KV, and L#, for DFA, Moore, Mealy, and VPA
    • Very easy to use!!
    • Let's try it!!! →→→
  38. Summary

    • Regular Expressions: Null-loop with lookahead is problematic.
    • REXML: I created the rexml-css_selector library.
    • Automata Learning: I created the lernen library.

    I believe automata learning is one of the future forms of programming. Let's try it!