A parser based syntax highlighter

RubyKaigi 2018
May 31st

pocke

May 31, 2018

Transcript

  1. A parser based syntax highlighter RubyKaigi 2018 May 31st

  2. self.inspect • Masataka Pocke Kuwabara • Software Engineer for Cookpad Inc. • Technical Advisor for SideCI, Inc. • A Vimmer
  3. Previous talks

  4. None
  5. I talked about a syntax highlighter at VimConf 2017 https://speakerdeck.com/pocke/the-new-syntax-highlighter-for-vim

  6. At VimConf • I talked about “Iro.vim”. ◦ “Iro” means “color” in English • https://github.com/pocke/iro.vim • It is a Vim plugin for syntax highlighting. • It is mainly written in Ruby.
  7. Regional RubyKaigi in Okinawa #2 http://ruby.okinawa/okrk02/

  8. Regional RubyKaigi in Okinawa #2 • I talked about the same theme at that conference. ◦ VimConf: For Vim users. ◦ Okinawa RubyKaigi: For Ruby users.
  9. Today

  10. Today: RubyKaigi 2018 • I’m talking about the same content (Iro) as in Okinawa, but in more depth.
  11. Today’s Agenda • Implementation of existing syntax highlighters. • Problems of existing syntax highlighters. • Introducing Iro. • Advantages of Iro. • Implementation of Iro. • The future of Iro.
  12. Existing Syntax Highlighters Implementation

  13. Regular Expression • Many syntax highlighters are implemented with regular expressions. ◦ Vim, Emacs, Atom, etc.
  14. Explore Highlighter definitions for Ruby • Look at Atom's highlighter definitions for Ruby code. • Atom uses “cson” for highlighter definitions. ◦ “cson” is “CoffeeScript-Object-Notation”
  15. atom/language-ruby: method definition • https://github.com/atom/language-ruby

  16. Problems of Existing syntax highlighters

  17. Syntax highlighters have two problems • The code of highlighters is hard to read. • They do not highlight correctly.
  18. Problem: Hard to Read

  19. Regular expressions are difficult. • “Some people, when confronted with a problem, think ‘I know, I’ll use regular expressions.’ Now they have two problems.” (Jamie Zawinski) http://regex.info/blog/2006-09-15/247
  20. Do you understand the definitions easily? • https://github.com/vim-ruby/vim-ruby/blob/master/syntax/ruby.vim • https://github.com/atom/language-ruby/blob/master/grammars/ruby.cson
  21. Problem: not correct

  22. Existing highlighters sometimes highlight incorrectly • If highlighting is broken, we cannot easily understand the meaning of the code.
  23. Example: broken highlighting • It is hard to understand.

  24. Other examples • Complex here documents ◦ e.g. a here document in a string interpolation in a here document • Tricky code ◦ e.g. ????::?:, % %s%% %%%% or def end(def:def def;end)end ◦ You can find them in my CFP. ▪ http://pocke.hatenablog.com/entry/2018/05/27/152708
  25. Why do they highlight incorrectly? • They re-implement the parser. ◦ So the highlighter's implementation differs from the language's parser. • Regexp is not enough to parse programming languages. ◦ Regexp: for regular languages ◦ Many programming languages: context-free languages ◦ But many editors' implementations extend regexp, so it is probably enough.
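As a small illustration of the mismatch described above, the following sketch compares a naive keyword regexp with Ruby's own lexer (Ripper, from the standard library). The regexp and the sample source are assumptions made up for this example; they are not taken from any editor's actual definition.

```ruby
require 'ripper'

# Sample source (made up): the words "def" and "end" appear only inside a string.
source = 'puts "def end"'

# A naive, hypothetical keyword regexp, as a regexp-based highlighter might start out.
naive_keywords = source.scan(/\b(?:def|end)\b/)

# Ruby's own lexer: keep only the tokens it reports as keywords.
ripper_keywords = Ripper.lex(source)
                        .select { |_pos, event, _text| event == :on_kw }
                        .map { |_pos, _event, text| text }

p naive_keywords   # the regexp matches inside the string literal
p ripper_keywords  # the lexer sees string content, not keywords
```

Real regexp-based definitions handle this simple case with extra rules for strings; the point is only that every such rule re-implements a piece of the parser.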
  26. Introduce Iro

  27. Introduce Iro • Iro is a gem that provides a Ripper-based syntax highlighter. • https://github.com/pocke/iro • $ gem install iro • Live demo: https://ruby-highlight.herokuapp.com/
  28. What's Ripper? • A Ruby standard library. ◦ So you can use Ripper without gem install. • A parser for Ruby. • It shares parse.y with the Ruby interpreter. ◦ So Ripper understands Ruby syntax correctly.
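Because Ripper ships with Ruby, trying it needs nothing beyond `require 'ripper'`. A minimal sketch, with the input expression chosen arbitrarily for illustration:

```ruby
require 'ripper'

# Parse a tiny expression into an S-expression (nested arrays).
sexp = Ripper.sexp('1 + 2')
p sexp
# => [:program, [[:binary, [:@int, "1", [1, 0]], :+, [:@int, "2", [1, 4]]]]]
```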
  29. Difference between Iro and Iro.vim • Iro is a gem. ◦ Ruby code -> Iro -> highlighting information • Iro.vim is a Vim plugin. ◦ Iro.vim passes Ruby code to Iro, and gets the information from Iro. ◦ Iro.vim also has Python and YAML support.
  30. Diagram: when the code is changed, Iro.vim sends the Ruby source code to Iro; Iro parses the code and returns the highlight positions; Iro.vim highlights.
  31. Advantages of Iro

  32. Advantages of Iro • Iro's code is easy to read. • Iro can highlight correctly. • Highlighting local variables. • One implementation for many editors.
  33. Easy to read Iro’s code

  34. Iro is written in Ruby • Iro uses Ripper instead of regexp. • So I do not need to re-implement the parser with regular expressions. • The parser is decoupled from the syntax highlighter.
  35. Iro can highlight correctly

  36. Iro can highlight correctly. • Iro uses Ripper. • Ripper is the CRuby parser. • So Iro highlights code the same way CRuby parses it.
  37. Highlighting local variables

  38. Iro highlights local variables • Iro can highlight local variables. ◦ Vim, Atom and VSCode don’t have this feature, but RubyMine can highlight them. (Screenshots: Iro vs. an existing highlighter)
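To make the idea concrete: Ripper's S-expression reports a reference to a known local variable as `:var_ref`, while an unknown bare identifier is reported as `:vcall`. The sketch below relies on that distinction. The helper name `local_variable_refs` is hypothetical, and this is only one possible approach, not necessarily how Iro does it.

```ruby
require 'ripper'

# Recursively collect [name, line, column] for every local variable reference.
def local_variable_refs(sexp, found = [])
  return found unless sexp.is_a?(Array)

  # [:var_ref, [:@ident, "x", [line, col]]] is a reference to a local variable.
  if sexp[0] == :var_ref && sexp[1].is_a?(Array) && sexp[1][0] == :@ident
    _type, name, (line, col) = sexp[1]
    found << [name, line, col]
  end
  sexp.each { |child| local_variable_refs(child, found) }
  found
end

# x is assigned, so "puts x" refers to a local; y is unknown, so it is a vcall.
source = "x = 1\nputs x\nputs y\n"
refs = local_variable_refs(Ripper.sexp(source))
p refs # => [["x", 2, 5]]
```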
  39. One implementation for many editors

  40. One implementation for many editors • Each existing syntax highlighter has a different regexp engine. • So each highlighter has to define its own syntax definition.
  41. One implementation for many editors • Iro is a gem, and Iro is a protocol. • So you can implement a highlighter for your editor with the gem! • But currently only the Vim implementation exists.
  42. Iro.vim and Iro • Diagram: when the code is changed, Iro.vim sends the Ruby source code to Iro; Iro parses the code and returns the highlight positions; Iro.vim highlights.
  43. Iro Emacs and Iro • Diagram: when the code is changed, Iro Emacs sends the Ruby source code to Iro; Iro parses the code and returns the highlight positions; Iro Emacs highlights.
  44. Iro VSCode and Iro • Diagram: when the code is changed, Iro VSCode sends the Ruby source code to Iro; Iro parses the code and returns the highlight positions; Iro VSCode highlights.
  45. Iro.vim and Iro.py • Diagram: when the code is changed, Iro VSCode sends the Python source code to Iro.py; Iro.py parses the code and returns the highlight positions; Iro VSCode highlights.
  46. Implementation of Iro

  47. Highlighting hello.rb with Iro
      def hello(name)
        puts "hello, #{name}"
      end

  48. Understanding Ripper behaviour • Let's learn Ripper's behaviour with the Ripper.lex and Ripper.sexp methods. ◦ Ripper.lex: tokenizes with meta information ◦ Ripper.sexp: parses to an S-expression • And the event-driven API.
  49. Ripper.lex

  50. Ripper.lex
      $ ruby -rripper -e 'pp Ripper.lex(File.read("hello.rb"))'
      [[[1, 0], :on_kw, "def", EXPR_FNAME],
       [[1, 3], :on_sp, " ", EXPR_FNAME],
       [[1, 4], :on_ident, "hello", EXPR_ENDFN],
       [[1, 9], :on_lparen, "(", EXPR_BEG|EXPR_LABEL],
       [[1, 10], :on_ident, "name", EXPR_ARG],
       [[1, 14], :on_rparen, ")", EXPR_ENDFN],
       [[1, 15], :on_ignored_nl, "\n", EXPR_BEG],
       [[2, 0], :on_sp, "  ", EXPR_BEG],
       [[2, 2], :on_ident, "puts", EXPR_CMDARG],
       [[2, 6], :on_sp, " ", EXPR_CMDARG],
       [[2, 7], :on_tstring_beg, "\"", EXPR_CMDARG],
       [[2, 8], :on_tstring_content, "hello, ", EXPR_CMDARG],
  51. Ripper.lex output • [[[1, 0], :on_kw, "def", EXPR_FNAME], [[1, 3], :on_sp, " ", EXPR_FNAME], [[1, 4], :on_ident, "hello", EXPR_ENDFN], … • It is an array of tokens, each made of ◦ Position ◦ Scanner event name ▪ pp Ripper::SCANNER_EVENTS ◦ Source code ◦ Lex state (since Ruby 2.5)
  52. What does Iro use? • Iro uses the position and source code to decide where to highlight. • Iro uses the event name to decide the highlighting group. Example: [[1, 0], :on_kw, "def", EXPR_FNAME] • :on_kw -> Iro highlights it as Keyword. • [1, 0] and "def" -> Iro highlights line 1, column 0, size 3.
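The mapping described above can be sketched with `Ripper.lex` alone. Everything named here (the `GROUPS` table, the group names other than Keyword, and `highlight_positions`) is hypothetical and chosen for illustration; it is not Iro's actual API, and the talk notes that Iro itself uses the event-driven API rather than `Ripper.lex`.

```ruby
require 'ripper'

# Hypothetical mapping from scanner event names to highlight groups.
GROUPS = {
  on_kw: 'Keyword',
  on_tstring_content: 'String',
  on_comment: 'Comment'
}.freeze

# Returns { group => [[line, column, size], ...] } for the given source.
def highlight_positions(source)
  result = Hash.new { |hash, key| hash[key] = [] }
  Ripper.lex(source).each do |(line, col), event, text, _state|
    group = GROUPS[event]
    next unless group

    # Size is counted in characters here; Ripper columns are byte offsets,
    # so non-ASCII source would need extra care.
    result[group] << [line, col, text.length]
  end
  result
end

source = <<~'RUBY'
  def hello(name)
    puts "hello, #{name}"
  end
RUBY

positions = highlight_positions(source)
p positions['Keyword'] # => [[1, 0, 3], [3, 0, 3]]
p positions['String']  # => [[2, 8, 7]]
```

A plain structure like this is easy to hand to an editor plugin, which then only has to apply the groups at the given positions.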
  53. Ripper.sexp

  54. Ripper.sexp
      $ ruby -rripper -e 'pp Ripper.sexp(File.read("test.rb"))'
      [:program, [[:def, [:@ident, "hello", [1, 4]], [:paren, [:params, [[:@ident, "name", [1, 10]]], nil, nil, nil, nil, nil, nil]], [:bodystmt, [[:command, [:@ident, "puts", [2, 2]], [:args_add_block, [[:string_literal, [:string_content, [:@tstring_content, "hello, ", [2, 8]],
  55. Ripper.sexp • It is an S-expression. • [:TYPE, child_sexp1, child_sexp2, ...] • child_sexp is an S-expression or a lex output.
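Since the S-expression is just nested arrays, a short recursive walk is enough to reach the scanner tokens at its leaves (the nodes whose type starts with `@`). A minimal sketch; the helper name `each_scanner_node` is hypothetical:

```ruby
require 'ripper'

# Yield every scanner node ([:@type, text, [line, col]]) found in the tree.
def each_scanner_node(sexp, &block)
  return unless sexp.is_a?(Array)

  if sexp[0].is_a?(Symbol) && sexp[0].to_s.start_with?('@')
    block.call(sexp)
  else
    sexp.each { |child| each_scanner_node(child, &block) }
  end
end

source = <<~'RUBY'
  def hello(name)
    puts "hello, #{name}"
  end
RUBY

leaves = []
each_scanner_node(Ripper.sexp(source)) do |type, text, (line, col)|
  leaves << [type, text, line, col]
end

# Names of all identifier tokens, in source order.
idents = leaves.select { |type, *| type == :@ident }.map { |_type, text, *| text }
p idents # => ["hello", "name", "puts", "name"]
```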
  56. Event driven Ripper API

  57. Iro does not use #sexp and #lex • I described them, but Iro does not use them. • Iro uses the event-driven API instead.
  58. Event Driven Ripper API • Ripper provides an event-driven API. • Ripper calls the on_TYPE method when it visits TYPE. ◦ e.g. Ripper calls the on_kw method when it reads a kw token.
  59. Example: scanner event
      require 'ripper'
      class MyRipper < Ripper
        def on_kw(tok)
          puts "type: kw, source: #{tok}, position: #{lineno}, #{column}"
        end
      end
      MyRipper.parse(ARGF.read)
  60. Output
      $ ruby myripper.rb hello.rb
      type: kw, source: def, position: 1, 0
      type: kw, source: end, position: 3, 0
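For a highlighter, the same scanner event is more useful when it records positions instead of printing them. The following is a sketch under that assumption; the class name `KeywordCollector` and its `keywords` accessor are hypothetical and not taken from Iro.

```ruby
require 'ripper'

# Collects [line, column, size] for every keyword token Ripper reads.
class KeywordCollector < Ripper
  def keywords
    @keywords ||= []
  end

  # Called by Ripper for each keyword token; lineno and column describe
  # where the current token starts.
  def on_kw(tok)
    keywords << [lineno, column, tok.length]
    tok
  end
end

source = <<~'RUBY'
  def hello(name)
    puts "hello, #{name}"
  end
RUBY

collector = KeywordCollector.new(source)
collector.parse
p collector.keywords # => [[1, 0, 3], [3, 0, 3]]
```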
  61. Example: parser event
      require 'ripper'
      class MyRipper < Ripper::SexpBuilderPP
        def on_def(name, params, body)
          p name
          p params
          p body
        end
      end
      MyRipper.parse(ARGF.read)
  62. Output
      $ ruby test.rb hello.rb
      [:@ident, "hello", [1, 4]]
      [:paren, [:params, [[:@ident, "name", [1, 10]]], nil, nil, nil, nil, nil, nil]]
      [:bodystmt, [[:command, [:@ident, "puts", [2, 2]], [:args_add_block, [[:string_literal, [:string_content, [:@tstring_content, "hello, ", ...
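A parser event can feed highlighting in the same way. The sketch below records where each method name appears in a `def` and then calls `super` so the S-expression is still built. The class name `DefCollector` and its `method_names` accessor are hypothetical, introduced only for this example.

```ruby
require 'ripper'

# Records [name, line, column] for every method definition.
class DefCollector < Ripper::SexpBuilderPP
  def method_names
    @method_names ||= []
  end

  # name arrives as a scanner node such as [:@ident, "hello", [1, 4]].
  def on_def(name, *rest)
    _type, text, (line, col) = name
    method_names << [text, line, col]
    super # keep building the [:def, ...] node as usual
  end
end

source = <<~'RUBY'
  def hello(name)
    puts "hello, #{name}"
  end
RUBY

collector = DefCollector.new(source)
collector.parse
p collector.method_names # => [["hello", 1, 4]]
```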
  63. Implementation of Iro

  64. See pocke/iro/lib/iro/ruby/parser.rb https://github.com/pocke/iro/blob/master/lib/iro/ruby/parser.rb

  65. The future of Iro

  66. Inline highlighting • Highlight code in code ◦ For example:
      # Here is Ruby code
      <<~SQL
        # Here is SQL
        SELECT * FROM users;
      SQL
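One possible starting point for this future work is to let Ripper locate here documents and read the embedded language from the tag. This is only a sketch of that idea, not something Iro is stated to implement; treating the heredoc tag as a language name is an assumption.

```ruby
require 'ripper'

source = <<~'RUBY'
  # Here is Ruby code
  <<~SQL
    # Here is SQL
    SELECT * FROM users;
  SQL
RUBY

# Find every heredoc opener and extract its tag plus position.
heredoc_tags = Ripper.lex(source)
                     .select { |_pos, event, _text| event == :on_heredoc_beg }
                     .map { |(line, col), _event, text| [text[/\w+/], line, col] }

p heredoc_tags # => [["SQL", 2, 0]]
```

A highlighter could then pass the heredoc body to a highlighter for the tagged language.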
  67. Support for more languages • Currently Iro.vim supports Ruby, YAML and Python. ◦ “Iro.vim” has Python and YAML support, so I’d like to extract the implementation to a gem or something. • I’d like to add support for Slim and Markdown. ◦ Because Slim is a difficult language. ◦ Markdown has inline code blocks.
  68. Support for more editors • Currently Iro supports only Vim and HTML. • I believe we can use Iro in other editors. ◦ e.g. Emacs, Atom or something.
  69. FAQ

  70. How about performance? • I can use Iro.vim comfortably. ◦ ⭕ 3,000 lines ◦ ❌ 30,000 lines • But I haven't compared performance with other implementations. ◦ Because I'm not sure how to compare highlighters' performance.
  71. Does Iro work on a broken Ruby file? • Code has syntax errors while it is being edited. • So syntax highlighters should be able to highlight broken code. • Iro can highlight it in almost all cases.
  72. Example • It has a syntax error, but Iro highlights it.
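The reason token-level highlighting survives broken code can be seen with the standard library alone: `Ripper.sexp` returns `nil` when the source has a syntax error, while the lexer still reports the tokens it read. A minimal sketch with a made-up snippet that is missing its `end`:

```ruby
require 'ripper'

# A method being typed: the closing "end" is not there yet.
broken = "def hello(name)\n  puts name\n"

# The full parse fails, so no S-expression is available.
sexp = Ripper.sexp(broken)
p sexp # => nil

# The lexer still reports tokens, so keywords can be highlighted.
keywords = Ripper.lex(broken)
                 .select { |_pos, event, _text| event == :on_kw }
                 .map { |_pos, _event, text| text }
p keywords # => ["def"]
```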
  73. Conclusion

  74. Conclusion • Iro is a Ripper-based syntax highlighter. ◦ It can highlight correctly. • You can try using Iro now! ◦ For Vimmers: https://github.com/pocke/iro.vim ◦ Web demo: https://ruby-highlight.herokuapp.com • Thank you for listening!
  75. Events at our Booth 【Day 2】 12:00~13:00 Q&A with @pocke

    15:20~15:50 Global Office Hours 【Day 3】 12:00~13:00 Q&A with @wyhaines 15:20~15:50 Ruby interpreter development live by @ko1 & @mame Cookpad X RubyKaigi 2018: Day 2 Party ⏰ June 1st, 19:30 - 21:30 (opens 19:00) Free (Registration required) Show up to this booth at 18:40 if you want to head with us!