A parser based syntax highlighter

RubyKaigi 2018
May 31st

pocke

May 31, 2018
Transcript

  1. A parser based
    syntax
    highlighter
    RubyKaigi 2018
    May 31st

  2. self.inspect
    ● Masataka Pocke Kuwabara
    ● Software Engineer for Cookpad Inc.
    ● Technical Advisor for SideCI, Inc.
    ● A Vimmer

  3. Previous talks

  5. I talked about a syntax highlighter
    at VimConf 2017
    https://speakerdeck.com/pocke/the-new-syntax-highlighter-for-vim

  6. At VimConf
    ● I talked about “Iro.vim”.
    ○ “Iro” means “color” in English.
    ● https://github.com/pocke/iro.vim
    ● It is a Vim plugin for syntax highlighting.
    ● It is mainly written in Ruby.

  7. Regional RubyKaigi in Okinawa #2
    http://ruby.okinawa/okrk02/

  8. Regional RubyKaigi in Okinawa #2
    ● I talked about the same theme at that conference.
    ○ VimConf: For Vim users.
    ○ Okinawa RubyKaigi: For Ruby users.

  9. Today

  10. Today: RubyKaigi 2018
    ● I’m talking about the same topic (Iro) as in Okinawa,
    but in more depth.

  11. Today’s Agenda
    ● Implementation of existing syntax highlighters.
    ● Problems of existing syntax highlighters.
    ● Introducing Iro.
    ● Advantages of Iro.
    ● Implementation of Iro.
    ● The future of Iro.

  12. Existing Syntax Highlighters
    Implementation

  13. Regular Expression
    ● Many syntax highlighters are implemented with
    regular expressions.
    ○ Vim, Emacs, Atom, etc.

  14. Explore Highlighter definitions for Ruby
    ● Look at Atom's highlighter definitions for Ruby
    code.
    ● Atom uses “cson” for highlighter definition.
    ○ “cson” is “CoffeeScript-Object-Notation”

  15. atom/language-ruby: method definition
    ● https://github.com/atom/language-ruby

  16. Problems of Existing syntax
    highlighters

  17. Syntax highlighters have two problems
    ● The code of highlighters is hard to read.
    ● They do not highlight correctly.

  18. Problem:
    Hard to Read

  19. Regular expressions are difficult.
    ● Some people, when confronted with a problem,
    think “I know, I’ll use regular expressions.” Now
    they have two problems.
    Jamie Zawinski
    http://regex.info/blog/2006-09-15/247

  20. Do you understand the definitions easily?
    ● https://github.com/vim-ruby/vim-ruby/blob/master/syntax/ruby.vim
    ● https://github.com/atom/language-ruby/blob/master/grammars/ruby.cson

  21. Problem:
    not correct

  22. Existing highlighters sometimes highlight
    incorrectly
    ● If highlighting is broken, we cannot easily understand
    the meaning of the code.

  23. Example: broken highlighting
    ● It is hard to understand.

  24. Other examples
    ● Complex here document
    ○ e.g. Here document in a string interpolation in a here
    document
    ● Tricky code
    ○ e.g. ????::?:, % %s%% %%%% or def end(def:def def;end)end
    ○ You can find them in my CFP.
    ■ http://pocke.hatenablog.com/entry/2018/05/27/152708

  25. Why do they highlight incorrectly?
    ● They re-implement the parser.
    ○ So the highlighter's implementation differs from the
    language's own parser.
    ● Regexp is not enough to parse programming
    languages.
    ○ Regexp: for regular languages
    ○ Many programming languages: context-free languages
    ○ But many editors' implementations extend regexp, so
    it is probably enough.
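A minimal sketch of the mismatch described above, assuming a deliberately naive keyword regexp (not taken from any real editor's definition): the regexp finds "keywords" inside a string literal, while Ripper's lexer does not.

```ruby
require 'ripper'

source = 'puts "def end"'

# Naive regexp approach: any word that looks like a keyword is treated as one.
regexp_hits = source.scan(/\b(?:def|end)\b/)

# Ripper approach: ask the real lexer which tokens are keywords.
ripper_hits = Ripper.lex(source)
                    .select { |_pos, event, _tok, _state| event == :on_kw }
                    .map { |_pos, _event, tok, _state| tok }

p regexp_hits # the regexp reports two "keywords" inside the string
p ripper_hits # Ripper reports none, because both words are string content
```

Real highlighter definitions are far more careful than this regexp, but they still have to imitate the parser by hand.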

  26. Introduce Iro

  27. Introduce Iro
    ● Iro is a gem that provides a Ripper-based syntax
    highlighter.
    ● https://github.com/pocke/iro
    ● $ gem install iro
    ● Live demo: https://ruby-highlight.herokuapp.com/

  28. What's Ripper?
    ● A Ruby standard library.
    ○ So you can use Ripper without gem install.
    ● A parser for Ruby.
    ● It shares parse.y with the Ruby interpreter.
    ○ So Ripper understands Ruby syntax correctly.

  29. Difference between Iro and Iro.vim
    ● Iro is a gem.
    ○ Ruby code -> Iro -> highlighting information
    ● Iro.vim is a Vim plugin.
    ○ Iro.vim passes Ruby code to Iro, and gets the
    information from Iro.
    ○ Iro.vim also has Python and YAML support.

  30. Iro.vim
    ○ Iro.vim (when code is changed) -> Ruby source code -> Iro (parse the code)
    ○ Iro -> Highlight positions -> Iro.vim (highlight)

  31. Advantages of Iro

  32. Advantages of Iro
    ● Iro's code is easy to read.
    ● Iro can highlight correctly.
    ● Highlighting local variables.
    ● One implementation for many editors.

  33. Easy to read
    Iro’s code

  34. Iro is written in Ruby
    ● Iro uses Ripper instead of regexp.
    ● So I do not need to re-implement the parser with
    regular expressions.
    ● The parser is decoupled from the syntax highlighter.

  35. Iro can
    highlight
    correctly

  36. Iro can highlight correctly.
    ● Iro uses Ripper.
    ● Ripper is the CRuby parser.
    ● So Iro can highlight code the same way CRuby parses it.

  37. Highlighting
    local variables

  38. Iro highlights local variables
    ● Iro can highlight local variables.
    ○ Vim, Atom and VSCode don’t have this feature, but
    RubyMine can highlight them.
    (Screenshots: Iro vs. existing highlighter)
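A rough sketch of why a parser makes this possible, using only Ripper.sexp from the standard library. It is not Iro's actual implementation; `local_variable_positions` is a hypothetical helper. Ripper wraps an identifier in :var_ref or :var_field when it is a known local variable, and in :vcall when it could be a method call.

```ruby
require 'ripper'

# Recursively collect [name, [line, column]] for identifiers that Ripper
# wraps in :var_ref (a read) or :var_field (an assignment target).
# Method parameters are not covered by this sketch.
def local_variable_positions(node, found = [])
  return found unless node.is_a?(Array)

  type, child = node
  if %i[var_ref var_field].include?(type) && child.is_a?(Array) && child[0] == :@ident
    found << [child[1], child[2]]
  end
  node.each { |n| local_variable_positions(n, found) }
  found
end

source = <<~RUBY
  foo = 1
  bar(foo)
  baz
RUBY

locals = local_variable_positions(Ripper.sexp(source))
p locals # => [["foo", [1, 0]], ["foo", [2, 4]]]
```

Here `baz` is not reported, because no local variable of that name has been assigned; a regexp alone cannot make that distinction.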

  39. One
    implementation
    for many
    editors

  40. One implementation for many editors
    ● Each existing syntax highlighter has a different
    regexp evaluator.
    ● So each highlighter needs its own syntax
    definition.

  41. One implementation for many editors
    ● Iro is a gem, and Iro is a protocol.
    ● So you can implement a highlighter for your
    editor with the gem!
    ● But currently only the Vim implementation exists.

  42. Iro.vim and Iro
    ○ Iro.vim (when code is changed) -> Ruby source code -> Iro (parse the code)
    ○ Iro -> Highlight positions -> Iro.vim (highlight)

  43. Iro Emacs and Iro
    ○ Iro Emacs (when code is changed) -> Ruby source code -> Iro (parse the code)
    ○ Iro -> Highlight positions -> Iro Emacs (highlight)

  44. Iro VSCode and Iro
    ○ Iro VSCode (when code is changed) -> Ruby source code -> Iro (parse the code)
    ○ Iro -> Highlight positions -> Iro VSCode (highlight)

  45. Iro.vim and Iro.py
    ○ Iro VSCode (when code is changed) -> Python source code -> Iro.py (parse the code)
    ○ Iro.py -> Highlight positions -> Iro VSCode (highlight)

  46. Implementation of Iro

  47. Highlighting hello.rb by Iro
    def hello(name)
      puts "hello, #{name}"
    end

  48. Understanding Ripper behaviour
    ● Let's learn Ripper's behaviour with the Ripper.lex
    and Ripper.sexp methods.
    ○ Ripper.lex: Tokenize with meta information
    ○ Ripper.sexp: Parse to an S-expression
    ● And the event-driven API.

  49. Ripper.lex

  50. Ripper.lex
    $ ruby -rripper -e 'pp Ripper.lex(File.read("hello.rb"))'
    [[[1, 0], :on_kw, "def", EXPR_FNAME],
     [[1, 3], :on_sp, " ", EXPR_FNAME],
     [[1, 4], :on_ident, "hello", EXPR_ENDFN],
     [[1, 9], :on_lparen, "(", EXPR_BEG|EXPR_LABEL],
     [[1, 10], :on_ident, "name", EXPR_ARG],
     [[1, 14], :on_rparen, ")", EXPR_ENDFN],
     [[1, 15], :on_ignored_nl, "\n", EXPR_BEG],
     [[2, 0], :on_sp, "  ", EXPR_BEG],
     [[2, 2], :on_ident, "puts", EXPR_CMDARG],
     [[2, 6], :on_sp, " ", EXPR_CMDARG],
     [[2, 7], :on_tstring_beg, "\"", EXPR_CMDARG],
     [[2, 8], :on_tstring_content, "hello, ", EXPR_CMDARG],

  51. Ripper.lex output
    [[[1, 0], :on_kw, "def", EXPR_FNAME],
     [[1, 3], :on_sp, " ", EXPR_FNAME],
     [[1, 4], :on_ident, "hello", EXPR_ENDFN], …
    ● Each element is an array of
    ○ Position
    ○ Scanner event name
    ■ pp Ripper::SCANNER_EVENTS
    ○ Source code
    ○ Lex state (since Ruby 2.5)
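As a small illustration of that structure, each element of the Ripper.lex result can be destructured directly in a block. The input string here is only an example.

```ruby
require 'ripper'

tokens = Ripper.lex("def hello; end")

tokens.each do |(line, column), event, token, state|
  # state is nil on Ruby versions older than 2.5
  puts format('%d:%d %-10s %p %s', line, column, event, token, state)
end

# Scanner event names are listed without the "on_" prefix.
p Ripper::SCANNER_EVENTS.include?(:kw)
```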

  52. What does Iro use?
    ● Iro uses the position and source code for
    the highlighting position.
    ● Iro uses the event name for the highlighting group.
    Example:
    [[1, 0], :on_kw, "def", EXPR_FNAME]
    :on_kw -> Iro highlights it as Keyword.
    [1, 0] and "def" -> Iro highlights line 1, column 0,
    size 3.
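The mapping above can be sketched with Ripper.lex. This is only an illustration: the `GROUPS` table and `highlight_positions` are hypothetical names, not Iro's API, and later slides note that Iro itself uses the event-driven API rather than Ripper.lex.

```ruby
require 'ripper'

# Hypothetical mapping from scanner events to highlight group names.
GROUPS = {
  on_kw: 'Keyword',
  on_tstring_content: 'String',
  on_comment: 'Comment'
}.freeze

# Returns { group => [[line, column, length], ...] }.
def highlight_positions(source)
  result = Hash.new { |hash, key| hash[key] = [] }
  Ripper.lex(source).each do |(line, column), event, token|
    group = GROUPS[event]
    # Multi-line tokens are not handled in this sketch.
    result[group] << [line, column, token.bytesize] if group
  end
  result
end

source = <<~'RUBY'
  def hello(name)
    puts "hello, #{name}"
  end
RUBY

positions = highlight_positions(source)
p positions
```

For hello.rb this yields two Keyword entries (`def` and `end`) and one String entry for `hello, `.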

  53. Ripper.sexp

  54. Ripper.sexp
    $ ruby -rripper -e 'pp Ripper.sexp(File.read("test.rb"))'
    [:program,
     [[:def,
       [:@ident, "hello", [1, 4]],
       [:paren,
        [:params, [[:@ident, "name", [1, 10]]], nil, nil, nil, nil, nil, nil]],
       [:bodystmt,
        [[:command,
          [:@ident, "puts", [2, 2]],
          [:args_add_block,
           [[:string_literal,
             [:string_content,
              [:@tstring_content, "hello, ", [2, 8]],

  55. Ripper.sexp
    ● It is an S-expression.
    ● [:TYPE, child_sexp1, child_sexp2, ...]
    ● Each child_sexp is an S-expression or a token like the lex output.
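Because the structure is recursive, a short walker is enough to visit it. The sketch below uses a hypothetical helper, `each_token`, to pick out the token leaves, which are arrays whose first element is a symbol starting with "@".

```ruby
require 'ripper'

# Yield every token leaf ([:@type, "source", [line, column]])
# found anywhere inside a Ripper.sexp result.
def each_token(node, &block)
  return unless node.is_a?(Array)

  if node[0].is_a?(Symbol) && node[0].to_s.start_with?('@')
    block.call(node)
  else
    node.each { |child| each_token(child, &block) }
  end
end

tokens = []
each_token(Ripper.sexp('puts "hello"')) { |type, source, pos| tokens << [type, source, pos] }
p tokens # => [[:@ident, "puts", [1, 0]], [:@tstring_content, "hello", [1, 6]]]
```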

  56. Event driven
    Ripper API

  57. Iro does not use #sexp and #lex
    ● I described them, but Iro does not use them.
    ● Iro uses the event-driven API instead.

  58. Event Driven Ripper API
    ● Ripper provides an event-driven API.
    ● Ripper calls the on_TYPE method when it visits
    TYPE.
    ○ e.g. Ripper calls the on_kw method when it reads a kw token.

  59. Example: scanner event
    require 'ripper'
    class MyRipper < Ripper
      # Called for every keyword token; lineno and column give its position.
      def on_kw(tok)
        puts "type: kw, source: #{tok}, position: #{lineno}, #{column}"
      end
    end
    MyRipper.parse(ARGF.read)

  60. Output
    $ ruby myripper.rb hello.rb
    type: kw, source: def, position: 1, 0
    type: kw, source: end, position: 3, 0
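A highlighter needs to collect positions rather than print them. The following is a minimal sketch under that assumption; `TokenCollector` is a hypothetical class, not part of Ripper or Iro, that defines a handler for every scanner event.

```ruby
require 'ripper'

# Hypothetical collector: records a position for every scanner event
# instead of printing it.
class TokenCollector < Ripper
  attr_reader :tokens

  def initialize(*)
    super
    @tokens = []
  end

  Ripper::SCANNER_EVENTS.each do |event|
    define_method(:"on_#{event}") do |tok|
      @tokens << [event, lineno, column, tok]
      tok
    end
  end
end

collector = TokenCollector.new("def hello; end")
collector.parse
keywords = collector.tokens.select { |event, *| event == :kw }
p keywords # => [[:kw, 1, 0, "def"], [:kw, 1, 11, "end"]]
```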

  61. Example: parser event
    require 'ripper'
    class MyRipper < Ripper::SexpBuilderPP
      # Called for every method definition with its name, params and body.
      def on_def(name, params, body)
        p name
        p params
        p body
      end
    end
    MyRipper.parse(ARGF.read)

  62. Output
    $ ruby test.rb hello.rb
    [:@ident, "hello", [1, 4]]
    [:paren, [:params, [[:@ident, "name", [1, 10]]], nil, nil, nil, nil, nil, nil]]
    [:bodystmt, [[:command, [:@ident, "puts", [2, 2]], [:args_add_block, [[:string_literal, [:string_content, [:@tstring_content, "hello, ", ...

  63. Implementation
    of Iro

  64. See pocke/iro/lib/iro/ruby/parser.rb
    https://github.com/pocke/iro/blob/master/lib/iro/ruby/parser.rb

  65. The future of Iro

  66. Inline highlighting
    ● Highlight code in code
    ○ For example:
    # Here is Ruby code
    <<~SQL
    # Here is SQL
    SELECT * FROM users;
    SQL
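One possible starting point for this idea, sketched with Ripper.lex only: find the identifier of each here document, which a highlighter could then use to pick an embedded highlighter. This is a hypothetical approach, not a description of how Iro will implement it.

```ruby
require 'ripper'

source = <<~'RUBY'
  query = <<~SQL
    SELECT * FROM users;
  SQL
RUBY

# Collect the identifier of every here document, e.g. "SQL".
heredoc_tags = Ripper.lex(source)
                     .select { |_pos, event, _tok| event == :on_heredoc_beg }
                     .map { |_pos, _event, tok| tok[/\w+/] }

p heredoc_tags # => ["SQL"]
```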

  67. More languages support
    ● Currently Iro.vim supports Ruby, YAML and
    Python.
    ○ “Iro.vim” has Python and YAML support, so I’d like to
    extract the implementation to a gem or something.
    ● I’d like to add support for Slim and Markdown.
    ○ Because Slim is a difficult language.
    ○ Markdown has inline code blocks.

  68. More editors support
    ● Currently Iro supports Vim and HTML only.
    ● I believe we can use Iro in other editors.
    ○ e.g. Emacs, Atom or something.

  69. FAQ

  70. How about performance?
    ● I can use Iro.vim comfortably.
    ○ ⭕ 3,000 lines
    ○ ❌ 30,000 lines
    ● But I haven't compared performance with other
    implementations.
    ○ Because I'm not sure how to compare highlighters'
    performance.

  71. Does Iro work on a broken Ruby file?
    ● Code has syntax errors while editing.
    ● So syntax highlighters should be able to
    highlight broken code.
    ● Iro can highlight in almost all cases.
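A small sketch of why this can work, using plain Ripper rather than Iro: for code with a missing `end`, Ripper.sexp gives up and returns nil, but the scanner has already reported the tokens it read.

```ruby
require 'ripper'

# The closing `end` is missing, so this is not valid Ruby.
broken = "def hello(name)\n  puts name\n"

sexp = Ripper.sexp(broken)
keywords = Ripper.lex(broken)
                 .select { |_pos, event, _tok| event == :on_kw }
                 .map { |_pos, _event, tok| tok }

p sexp     # nil, because the code does not parse
p keywords # the keyword tokens that were still scanned
```

How much can be recovered depends on where the error is, which matches the "almost all cases" caveat above.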

  72. Example
    ● It has a syntax error, but Iro highlights it.

  73. Conclusion

  74. Conclusion
    ● Iro is a Ripper-based syntax highlighter.
    ○ It can highlight correctly.
    ● You can try using Iro now!
    ○ For Vimmer: https://github.com/pocke/iro.vim
    ○ Web demo: https://ruby-highlight.herokuapp.com
    Thank you for listening!

  75. Events at our Booth
    【Day 2】
    12:00~13:00 Q&A with @pocke
    15:20~15:50 Global Office Hours
    【Day 3】
    12:00~13:00 Q&A with @wyhaines
    15:20~15:50 Ruby interpreter development live by @ko1 & @mame
    Cookpad X RubyKaigi 2018: Day 2 Party
    ⏰ June 1st, 19:30 - 21:30 (opens 19:00)
    Free (Registration required)
    Show up to this booth at 18:40 if you want to head with us!
