A parser based syntax highlighter

RubyKaigi 2018
May 31st


May 31, 2018


  1. self.inspect
    • Masataka Pocke Kuwabara
    • Software Engineer for Cookpad Inc.
    • Technical Advisor for SideCi, Inc.
    • A Vimmer
  2. At VimConf
    • I talked about “Iro.vim”.
      ◦ “Iro” is Japanese for “color”.
    • https://github.com/pocke/iro.vim
    • It is a Vim plugin for syntax highlighting.
    • It is written mainly in Ruby.
  3. Regional RubyKaigi in Okinawa #2
    • I talked about the same theme at that conference.
      ◦ VimConf: for Vim users.
      ◦ Okinawa RubyKaigi: for Ruby users.
  4. Today’s Agenda
    • How existing syntax highlighters are implemented.
    • Problems of existing syntax highlighters.
    • Introducing Iro.
    • Advantages of Iro.
    • Implementation of Iro.
    • The future of Iro.
  5. Explore Highlighter definitions for Ruby
    • Look at Atom's highlighter definitions for Ruby code.
    • Atom uses “cson” for highlighter definitions.
      ◦ “cson” is “CoffeeScript-Object-Notation”
  6. Syntax highlighters have two problems
    • The code of highlighters is hard to read.
    • They do not highlight correctly.
  7. Regular Expression is difficult.
    • Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.
      Jamie Zawinski http://regex.info/blog/2006-09-15/247
  8. Other examples
    • Complex here documents
      ◦ e.g. a here document in a string interpolation in a here document
    • Tricky code
      ◦ e.g. ????::?:, % %s%% %%%% or def end(def:def def;end)end
      ◦ You can find them in my CFP.
        ▪ http://pocke.hatenablog.com/entry/2018/05/27/152708
  9. Why do they highlight incorrectly?
    • They re-implement the parser.
      ◦ So the highlighter's implementation differs from the language's own parser.
    • Regexp is not enough to parse programming languages.
      ◦ Regexp: for regular languages
      ◦ Many programming languages: context-free languages
      ◦ But many editors' implementations extend regexp, so it is probably enough.
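
The gap between a regexp and the real lexer can be sketched with Ruby's standard Ripper library. This is only an illustration with a made-up input line, not code from Iro or from any editor's grammar: a naive word-boundary regexp treats `def` as a keyword wherever it appears, while Ripper reports it as a label when it is used as a keyword-argument name.

```ruby
require 'ripper'

# Hypothetical input: `def` appears only as a keyword-argument label here.
source = "foo(def: 1)\n"

# Naive regexp approach: every bare word `def` looks like a keyword.
regexp_hits = source.scan(/\bdef\b/)

# Parser-based approach: ask Ruby's own lexer how it classified each token.
tokens = Ripper.lex(source)
keyword_tokens = tokens.select { |t| t[1] == :on_kw }
label_tokens   = tokens.select { |t| t[1] == :on_label }

p regexp_hits.size                  # how many "keywords" the regexp found
p keyword_tokens.map { |t| t[2] }   # keywords according to Ripper
p label_tokens.map { |t| t[2] }     # labels according to Ripper
```

Real editor grammars use far more elaborate patterns than this single regexp, so the sketch only shows the kind of mismatch that re-implementing the lexer invites.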
  10. Introducing Iro
    • Iro is a gem that provides a Ripper-based syntax highlighter.
    • https://github.com/pocke/iro
    • $ gem install iro
    • Live demo: https://ruby-highlight.herokuapp.com/
  11. What's Ripper?
    • A Ruby standard library.
      ◦ So you can use Ripper without gem install.
    • A parser for Ruby.
    • It shares parse.y with the Ruby interpreter.
      ◦ So Ripper understands Ruby syntax correctly.
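
As a quick check that Ripper ships with Ruby, the following minimal sketch only needs `require 'ripper'`. The input string is an arbitrary example; `Ripper.tokenize` splits it into the raw token strings.

```ruby
require 'ripper' # part of Ruby's standard library; no gem install needed

source = "puts 'hello'"

# Ripper.tokenize returns the source split into token strings.
tokens = Ripper.tokenize(source)
p tokens
```

Joining the returned tokens gives back the original source, which is what makes position-based highlighting possible.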
  12. Difference between Iro and Iro.vim
    • Iro is a gem.
      ◦ Ruby code -> Iro -> highlighting information
    • Iro.vim is a Vim plugin.
      ◦ Iro.vim passes Ruby code to Iro, and gets the information from Iro.
      ◦ Iro.vim also supports Python and YAML.
  13. [Diagram] When code is changed, Iro.vim sends the Ruby source code to Iro; Iro parses the code and returns highlight positions; Iro.vim highlights.
  14. Advantages of Iro
    • Iro's code is easy to read.
    • Iro can highlight correctly.
    • Highlighting local variables.
    • One implementation for many editors.
  15. Iro is written in Ruby
    • Iro uses Ripper instead of regexp.
    • So I do not need to re-implement the parser with regular expressions.
    • The parser is detached from the syntax highlighter.
  16. Iro can highlight correctly.
    • Iro uses Ripper.
    • Ripper is the CRuby parser.
    • So Iro highlights code the same way CRuby parses it.
  17. Iro highlights local variables
    • Iro can highlight local variables.
      ◦ Vim, Atom and VSCode don’t have this feature, but RubyMine can highlight them.
    [Screenshots: Iro / Existing Highlighter]
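
A parser can tell local variables apart from method calls, which is what makes this feature possible. The sketch below is not Iro's implementation; it assumes a hypothetical helper `local_variables_in` that walks the output of `Ripper.sexp`, and it only covers assignments and references (method parameters and block parameters are left out).

```ruby
require 'ripper'

# Recursively collect [name, [line, column]] for local variables.
# :var_field wraps an assignment target and :var_ref wraps a reference;
# an :@ident inside either one is a local variable. A bare method call
# without arguments shows up as :vcall instead, so it is not collected.
def local_variables_in(sexp, found = [])
  return found unless sexp.is_a?(Array)

  if %i[var_field var_ref].include?(sexp[0]) &&
     sexp[1].is_a?(Array) && sexp[1][0] == :@ident
    found << [sexp[1][1], sexp[1][2]]
  end
  sexp.each { |child| local_variables_in(child, found) }
  found
end

source = "x = 1\nputs x\nfoo\n"
locals = local_variables_in(Ripper.sexp(source))
p locals
```

Here `x` is reported twice (the assignment and the reference), while `foo` is skipped because Ruby parses it as a method call.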
  18. One implementation for many editors
    • Each existing syntax highlighter has a different regexp evaluator.
    • So each highlighter has to define its own syntax definition.
  19. One implementation for many editors
    • Iro is a gem, and Iro is a protocol.
    • So you can implement a highlighter for your editor with the gem!
    • But currently only the Vim implementation exists.
  20. Iro.vim and Iro
    [Diagram] When code is changed, Iro.vim sends the Ruby source code to Iro; Iro parses the code and returns highlight positions; Iro.vim highlights.
  21. Iro Emacs and Iro
    [Diagram] When code is changed, Iro Emacs sends the Ruby source code to Iro; Iro parses the code and returns highlight positions; Iro Emacs highlights.
  22. Iro VSCode and Iro
    [Diagram] When code is changed, Iro VSCode sends the Ruby source code to Iro; Iro parses the code and returns highlight positions; Iro VSCode highlights.
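  23. Iro.vim and Iro.py
    [Diagram, editor box labeled “Iro VSCode”] When code is changed, the editor plugin sends the Python source code to Iro.py; Iro.py parses the code and returns highlight positions; the editor plugin highlights.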
  24. Understanding Ripper behaviour
    • Let's learn Ripper's behaviour with the Ripper.lex and Ripper.sexp methods.
      ◦ Ripper.lex: tokenizes with meta information
      ◦ Ripper.sexp: parses to an S-expression
    • And the event-driven API.
  25. Ripper.lex
    $ ruby -rripper -e 'pp Ripper.lex(File.read("hello.rb"))'
    [[[1, 0], :on_kw, "def", EXPR_FNAME],
     [[1, 3], :on_sp, " ", EXPR_FNAME],
     [[1, 4], :on_ident, "hello", EXPR_ENDFN],
     [[1, 9], :on_lparen, "(", EXPR_BEG|EXPR_LABEL],
     [[1, 10], :on_ident, "name", EXPR_ARG],
     [[1, 14], :on_rparen, ")", EXPR_ENDFN],
     [[1, 15], :on_ignored_nl, "\n", EXPR_BEG],
     [[2, 0], :on_sp, "  ", EXPR_BEG],
     [[2, 2], :on_ident, "puts", EXPR_CMDARG],
     [[2, 6], :on_sp, " ", EXPR_CMDARG],
     [[2, 7], :on_tstring_beg, "\"", EXPR_CMDARG],
     [[2, 8], :on_tstring_content, "hello, ", EXPR_CMDARG],
  26. Ripper.lex output
    [[[1, 0], :on_kw, "def", EXPR_FNAME],
     [[1, 3], :on_sp, " ", EXPR_FNAME],
     [[1, 4], :on_ident, "hello", EXPR_ENDFN], …
    • It is an array of entries, each containing:
      ◦ Position
      ◦ Scanner event name
        ▪ pp Ripper::SCANNER_EVENTS
      ◦ Source code
      ◦ Lex state (since Ruby 2.5)
  27. What does Iro use?
    • Iro uses the position and the source code to determine where to highlight.
    • Iro uses the event name to determine the highlighting group.
    Example: [[1, 0], :on_kw, "def", EXPR_FNAME]
      :on_kw -> Iro highlights it as Keyword.
      [1, 0] and "def" -> Iro highlights line 1, column 0, size 3.
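
The mapping described above can be sketched on top of `Ripper.lex`. This is an illustration under stated assumptions rather than Iro's actual code: the `GROUPS` table and the `highlights` helper are hypothetical names, and the group names are made up for the example.

```ruby
require 'ripper'

# Hypothetical mapping from scanner events to highlight group names.
GROUPS = {
  on_kw: "Keyword",
  on_tstring_content: "String",
  on_comment: "Comment",
}.freeze

# Turn each known token into { group, line, column, size }.
# The position comes from the token, the group from the event name.
def highlights(source)
  Ripper.lex(source).each_with_object([]) do |((line, column), event, token), result|
    group = GROUPS[event]
    next unless group

    result << { group: group, line: line, column: column, size: token.size }
  end
end

source = <<~'RUBY'
  def hello(name)
    puts "hello, #{name}"
  end
RUBY

result = highlights(source)
result.each { |h| p h }
```

Tokens that span several lines would need extra handling, since a single line/column/size triple cannot describe them; the sketch ignores that case.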
  28. Ripper.sexp
    $ ruby -rripper -e 'pp Ripper.sexp(File.read("test.rb"))'
    [:program,
     [[:def,
       [:@ident, "hello", [1, 4]],
       [:paren,
        [:params, [[:@ident, "name", [1, 10]]], nil, nil, nil, nil, nil, nil]],
       [:bodystmt,
        [[:command,
          [:@ident, "puts", [2, 2]],
          [:args_add_block,
           [[:string_literal,
             [:string_content,
              [:@tstring_content, "hello, ", [2, 8]],
  29. Ripper.sexp
    • It is an S-expression.
    • [:TYPE, child_sexp1, child_sexp2, ...]
    • Each child_sexp is an S-expression or a scanner token such as [:@ident, "hello", [1, 4]].
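
Because every node has the shape [:TYPE, child, ...], a short recursive walk is enough to pull information out of the tree. The sketch below assumes a hypothetical helper `method_definitions` that collects the name and position of each `:def` node; it is only meant to show the structure.

```ruby
require 'ripper'

# Walk the S-expression and collect [name, [line, column]] for each method.
# A :def node looks like [:def, [:@ident, name, [line, column]], params, body].
def method_definitions(sexp, found = [])
  return found unless sexp.is_a?(Array)

  if sexp[0] == :def && sexp[1].is_a?(Array)
    found << [sexp[1][1], sexp[1][2]]
  end
  sexp.each { |child| method_definitions(child, found) }
  found
end

source = "def hello(name)\n  puts name\nend\n"
defs = method_definitions(Ripper.sexp(source))
p defs
```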
  30. Iro does not use #sexp and #lex
    • I described them, but Iro does not use them.
    • Iro uses the event-driven API instead.
  31. Event Driven Ripper API
    • Ripper provides an event-driven API.
    • Ripper calls the on_TYPE method when it visits TYPE.
      ◦ e.g. Ripper calls the on_kw method when it reads a kw token.
  32. Example: scanner event
    require 'ripper'

    class MyRipper < Ripper
      def on_kw(tok)
        puts "type: kw, source: #{tok}, position: #{lineno}, #{column}"
      end
    end

    MyRipper.parse(ARGF.read)
  33. Output
    $ ruby myripper.rb hello.rb
    type: kw, source: def, position: 1, 0
    type: kw, source: end, position: 3, 0
  34. Example: parser event
    require 'ripper'

    class MyRipper < Ripper::SexpBuilderPP
      def on_def(name, params, body)
        p name
        p params
        p body
      end
    end

    MyRipper.parse(ARGF.read)
  35. Output
    $ ruby test.rb hello.rb
    [:@ident, "hello", [1, 4]]
    [:paren, [:params, [[:@ident, "name", [1, 10]]], nil, nil, nil, nil, nil, nil]]
    [:bodystmt, [[:command, [:@ident, "puts", [2, 2]], [:args_add_block, [[:string_literal, [:string_content, [:@tstring_content, "hello, ", ...
  36. Inline highlighting
    • Highlight code in code
      ◦ For example:
    # Here is Ruby code
    <<~SQL
      # Here is SQL
      SELECT * FROM users;
    SQL
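
One possible first step toward inline highlighting is locating the embedded region. The sketch below is an assumption about how that could be done with `Ripper.lex`, not a description of Iro: the hypothetical helper `sql_heredoc_bodies` gathers the text of here documents whose tag is SQL, so that the text could later be handed to an SQL highlighter. It does not handle nested here documents or several here documents started on the same line.

```ruby
require 'ripper'

# Collect the body text of every here document tagged SQL.
def sql_heredoc_bodies(source)
  bodies = []
  current = nil
  Ripper.lex(source).each do |_pos, event, token|
    case event
    when :on_heredoc_beg
      # Matches <<SQL, <<-SQL, <<~SQL and their quoted forms.
      current = +"" if token.match?(/\A<<[~-]?["'`]?SQL["'`]?\z/)
    when :on_tstring_content
      current << token if current
    when :on_heredoc_end
      bodies << current if current
      current = nil
    end
  end
  bodies
end

source = <<~'RUBY'
  query = <<~SQL
    SELECT * FROM users;
  SQL
RUBY

bodies = sql_heredoc_bodies(source)
p bodies
```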
  37. Support for more languages
    • Currently Iro.vim supports Ruby, YAML and Python.
      ◦ “Iro.vim” has Python and YAML support, so I’d like to extract the implementation to a gem or something similar.
    • I’d like to add support for Slim and Markdown.
      ◦ Because Slim is a difficult language.
      ◦ Markdown has inline code blocks.
  38. Support for more editors
    • Currently Iro supports Vim and HTML only.
    • I believe we can use Iro in other editors.
      ◦ e.g. Emacs, Atom, and others.
  39. FAQ

  40. How about performance?
    • I can use Iro.vim comfortably.
      ◦ ⭕ 3,000 lines
      ◦ ❌ 30,000 lines
    • But I haven't compared its performance with other implementations.
      ◦ Because I'm not sure how to compare highlighters' performance.
  41. Does Iro work on a broken Ruby file?
    • Code has syntax errors while it is being edited.
    • So syntax highlighters should be able to highlight broken code.
    • Iro can highlight it in almost all cases.
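
Ripper itself shows why highlighting can survive a syntax error. The sketch below uses plain Ripper calls on a made-up broken snippet and makes no claim about how Iro handles errors internally: building a full tree fails, but the scanner has still reported the tokens it read.

```ruby
require 'ripper'

# Hypothetical broken input: the closing `end` is missing.
broken = "def hello(name)\n  puts name\n"

# Ripper.sexp cannot build a tree for invalid code and returns nil.
tree = Ripper.sexp(broken)
p tree

# The scanner tokens are still available, so keywords can be located.
tokens = Ripper.lex(broken)
keywords = tokens.select { |t| t[1] == :on_kw }.map { |t| t[2] }
p keywords
```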
  42. Conclusion
    • Iro is a Ripper-based syntax highlighter.
      ◦ It can highlight correctly.
    • You can try using Iro now!
      ◦ For Vimmers: https://github.com/pocke/iro.vim
      ◦ Web demo: https://ruby-highlight.herokuapp.com
    Thank you for listening!
  43. Events at our Booth
    【Day 2】
    • 12:00~13:00 Q&A with @pocke
    • 15:20~15:50 Global Office Hours
    【Day 3】
    • 12:00~13:00 Q&A with @wyhaines
    • 15:20~15:50 Ruby interpreter development live by @ko1 & @mame
    Cookpad X RubyKaigi 2018: Day 2 Party
    • ⏰ June 1st, 19:30 - 21:30 (opens 19:00)
    • Free (Registration required)
    • Show up to this booth at 18:40 if you want to head with us!