Slide 1

Slide 1 text

A parser based syntax highlighter RubyKaigi 2018 May 31st

Slide 2

Slide 2 text

self.inspect ● Masataka Pocke Kuwabara ● Software Engineer for Cookpad Inc. ● Technical Advisor for SideCi, Inc. ● A Vimmer

Slide 3

Slide 3 text

Previous talks

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

I talked about a syntax highlighter at VimConf 2017 https://speakerdeck.com/pocke/the-new-syntax-highlighter-for-vim

Slide 6

Slide 6 text

At VimConf ● I introduced “Iro.vim”. ○ “Iro” is Japanese for “color”. ● https://github.com/pocke/iro.vim ● It is a Vim plugin for syntax highlighting. ● It is written mainly in Ruby.

Slide 7

Slide 7 text

Regional Rubykaigi in Okinawa #2 http://ruby.okinawa/okrk02/

Slide 8

Slide 8 text

Regional Rubykaigi in Okinawa #2 ● I gave a talk on the same theme at that conference. ○ VimConf: For Vim users. ○ Okinawa RubyKaigi: For Ruby users.

Slide 9

Slide 9 text

Today

Slide 10

Slide 10 text

Today: RubyKaigi 2018 ● I’m covering the same topic (Iro) as in Okinawa, but in more depth.

Slide 11

Slide 11 text

Today’s Agenda ● How existing syntax highlighters are implemented. ● Problems of existing syntax highlighters. ● Introducing Iro. ● Advantages of Iro. ● Implementation of Iro. ● The future of Iro.

Slide 12

Slide 12 text

Existing Syntax Highlighters Implementation

Slide 13

Slide 13 text

Regular Expression ● Many syntax highlighters are implemented with regular expressions. ○ Vim, Emacs, Atom, etc.

Slide 14

Slide 14 text

Explore Highlighter definitions for Ruby ● Let's look at Atom's highlighter definitions for Ruby code. ● Atom uses “cson” for highlighter definitions. ○ “cson” is “CoffeeScript-Object-Notation”

Slide 15

Slide 15 text

atom/language-ruby: method definition ● https://github.com/atom/language-ruby

Slide 16

Slide 16 text

Problems of Existing syntax highlighters

Slide 17

Slide 17 text

Syntax highlighters have two problems ● The highlighter code is hard to read. ● They do not highlight correctly.

Slide 18

Slide 18 text

Problem: Hard to Read

Slide 19

Slide 19 text

Regular Expression is difficult. ● Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems. Jamie Zawinski http://regex.info/blog/2006-09-15/247

Slide 20

Slide 20 text

Do you understand the definitions easily? ● https://github.com/vim-ruby/vim-ruby/blob/master/syntax/ruby.vim ● https://github.com/atom/language-ruby/blob/master/grammars/ruby.cson

Slide 21

Slide 21 text

Problem: not correct

Slide 22

Slide 22 text

Existing highlighters sometimes highlight incorrectly ● If highlighting is broken, we cannot easily understand the meaning of the code.

Slide 23

Slide 23 text

Example: broken highlighting ● It is hard to understand.

Slide 24

Slide 24 text

Other examples ● Complex here documents ○ e.g. A here document in a string interpolation in a here document ● Tricky code ○ e.g. ????::?:, % %s%% %%%% or def end(def:def def;end)end ○ You can find them in my CFP. ■ http://pocke.hatenablog.com/entry/2018/05/27/152708

Slide 25

Slide 25 text

Why do they highlight incorrectly? ● They re-implement the parser. ○ So the highlighter's behaviour differs from that of the language's own parser. ● Regexp is not enough to parse programming languages. ○ Regexp: for regular languages ○ Many programming languages: context-free languages ○ But many editors' implementations extend regexp, so it is probably enough in practice.
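
The gap between a re-implemented parser and the real one can be illustrated with a minimal sketch. The one-line regexp below is a deliberately naive stand-in for an editor's syntax definition (an assumption for illustration, not taken from any real highlighter); it is compared with what Ripper, Ruby's own lexer, reports for the same source.

```ruby
require 'ripper'

source = 'puts "def is just text here"'

# Naive regexp-based approach: any bare word "def" is treated as a keyword.
regexp_keywords = source.scan(/\bdef\b/)

# Parser-based approach: ask Ripper which tokens really are keywords.
ripper_keywords = Ripper.lex(source)
                        .select { |_pos, event, _tok| event == :on_kw }
                        .map { |_pos, _event, tok| tok }

p regexp_keywords # the regexp finds a "keyword" inside the string
p ripper_keywords # Ripper finds none, because "def" is string content here
```

Real syntax definitions are far more careful than this regexp, but they face the same kind of mismatch on harder inputs such as nested here documents.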

Slide 26

Slide 26 text

Introduce Iro

Slide 27

Slide 27 text

Introduce Iro ● Iro is a gem that provides a Ripper-based syntax highlighter. ● https://github.com/pocke/iro ● $ gem install iro ● Live demo: https://ruby-highlight.herokuapp.com/

Slide 28

Slide 28 text

What's Ripper? ● A Ruby standard library. ○ So you can use Ripper without gem install. ● A parser for Ruby. ● It shares parse.y with the Ruby interpreter. ○ So Ripper understands Ruby syntax correctly.
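
As a small illustration of Ripper understanding Ruby syntax, the sketch below asks Ripper how it classifies the "/" character in two contexts. The helper name slash_events is made up for this example; only Ripper.lex comes from the standard library.

```ruby
require 'ripper'

# Returns the scanner event names of every token that contains a "/".
# (slash_events is a hypothetical helper, not part of Ripper or Iro.)
def slash_events(source)
  Ripper.lex(source)
        .select { |_pos, _event, tok| tok.include?('/') }
        .map { |_pos, event, _tok| event }
end

division = slash_events("x = 4\ny = x / 2\n") # "/" after a local variable
regexp   = slash_events("puts(/2/)\n")        # "/" at the start of an argument

p division
p regexp
```

The same character is reported as an operator in the first case and as regexp delimiters in the second, which is the kind of context a regexp-based definition has to approximate.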

Slide 29

Slide 29 text

Difference between Iro and Iro.vim ● Iro is a gem. ○ Ruby code -> Iro -> highlighting information ● Iro.vim is a Vim plugin. ○ Iro.vim passes Ruby code to Iro, and gets the information from Iro. ○ Iro.vim also supports Python and YAML.

Slide 30

Slide 30 text

[Diagram] Iro.vim (when code is changed) -> Ruby source code -> Iro (parse the code) -> highlight positions -> Iro.vim (highlight)

Slide 31

Slide 31 text

Advantages of Iro

Slide 32

Slide 32 text

Advantages of Iro ● Easy to read Iro's code. ● Iro can highlight correctly. ● Highlighting local variables. ● One implementation for many editors.

Slide 33

Slide 33 text

Easy to read Iro’s code

Slide 34

Slide 34 text

Iro is written in Ruby ● Iro uses Ripper instead of regexp. ● So I do not need to re-implement a parser with regular expressions. ● The parser is separated from the syntax highlighter.

Slide 35

Slide 35 text

Iro can highlight correctly

Slide 36

Slide 36 text

Iro can highlight correctly. ● Iro uses Ripper. ● Ripper is the CRuby parser. ● So Iro reads code the same way CRuby does.

Slide 37

Slide 37 text

Highlighting local variables

Slide 38

Slide 38 text

Iro highlights local variables ● Iro can highlight local variables. ○ Vim, Atom and VSCode don’t have this feature, but RubyMine can highlight them.
[Screenshots: Iro vs. existing highlighter]
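
One way a parser-based tool can find local variables is sketched below. It relies on Ripper.sexp wrapping references to known local variables in :var_ref nodes, while bare identifiers that are not local variables become :vcall nodes. The helper local_variable_refs is hypothetical and is not Iro's actual implementation.

```ruby
require 'ripper'

# Collects [name, line, column] for every local variable reference.
# (local_variable_refs is a made-up helper for illustration only.)
def local_variable_refs(source)
  found = []
  walk = lambda do |node|
    next unless node.is_a?(Array)

    # [:var_ref, [:@ident, "name", [line, column]]] is a local variable read.
    if node[0] == :var_ref && node[1].is_a?(Array) && node[1][0] == :@ident
      _, name, (line, column) = node[1]
      found << [name, line, column]
    end
    node.each { |child| walk.call(child) }
  end
  walk.call(Ripper.sexp(source)) # Ripper.sexp returns nil on a syntax error
  found
end

source = <<~'RUBY'
  def hello(name)
    puts "hello, #{name}"
  end
RUBY

p local_variable_refs(source)
```

A regexp cannot make this distinction in general, because whether an identifier is a local variable depends on earlier assignments and parameters in the same scope.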

Slide 39

Slide 39 text

One implementation for many editors

Slide 40

Slide 40 text

One implementation for many editors ● Each existing syntax highlighter has a different regexp evaluator. ● So each highlighter has to define its own syntax definition.

Slide 41

Slide 41 text

One implementation for many editors ● Iro is a gem, and Iro is a protocol. ● So you can implement a highlighter for your editor with the gem! ● But currently only the Vim implementation exists.

Slide 42

Slide 42 text

Iro.vim and Iro
[Diagram] Iro.vim (when code is changed) -> Ruby source code -> Iro (parse the code) -> highlight positions -> Iro.vim (highlight)

Slide 43

Slide 43 text

Iro Emacs and Iro
[Diagram] Iro Emacs (when code is changed) -> Ruby source code -> Iro (parse the code) -> highlight positions -> Iro Emacs (highlight)

Slide 44

Slide 44 text

Iro VSCode and Iro
[Diagram] Iro VSCode (when code is changed) -> Ruby source code -> Iro (parse the code) -> highlight positions -> Iro VSCode (highlight)

Slide 45

Slide 45 text

Iro.vim and Iro.py
[Diagram] Iro VSCode (when code is changed) -> Python source code -> Iro.py (parse the code) -> highlight positions -> Iro VSCode (highlight)

Slide 46

Slide 46 text

Implementation of Iro

Slide 47

Slide 47 text

Highlighting hello.rb by Iro

    def hello(name)
      puts "hello, #{name}"
    end

Slide 48

Slide 48 text

Understanding Ripper behaviour ● Let's look at Ripper's behaviour using the Ripper.lex and Ripper.sexp methods. ○ Ripper.lex: tokenizes with meta information ○ Ripper.sexp: parses to an S-expression ● And the event-driven API.

Slide 49

Slide 49 text

Ripper.lex

Slide 50

Slide 50 text

Ripper.lex

    $ ruby -rripper -e 'pp Ripper.lex(File.read("hello.rb"))'
    [[[1, 0], :on_kw, "def", EXPR_FNAME],
     [[1, 3], :on_sp, " ", EXPR_FNAME],
     [[1, 4], :on_ident, "hello", EXPR_ENDFN],
     [[1, 9], :on_lparen, "(", EXPR_BEG|EXPR_LABEL],
     [[1, 10], :on_ident, "name", EXPR_ARG],
     [[1, 14], :on_rparen, ")", EXPR_ENDFN],
     [[1, 15], :on_ignored_nl, "\n", EXPR_BEG],
     [[2, 0], :on_sp, "  ", EXPR_BEG],
     [[2, 2], :on_ident, "puts", EXPR_CMDARG],
     [[2, 6], :on_sp, " ", EXPR_CMDARG],
     [[2, 7], :on_tstring_beg, "\"", EXPR_CMDARG],
     [[2, 8], :on_tstring_content, "hello, ", EXPR_CMDARG],

Slide 51

Slide 51 text

Ripper.lex output

    [[[1, 0], :on_kw, "def", EXPR_FNAME],
     [[1, 3], :on_sp, " ", EXPR_FNAME],
     [[1, 4], :on_ident, "hello", EXPR_ENDFN],
     …

● Each element is an array of ○ Position ○ Scanner event name ■ pp Ripper::SCANNER_EVENTS ○ Source code ○ Lex state (since Ruby 2.5)

Slide 52

Slide 52 text

What does Iro use? ● Iro uses the position and the source code to decide where to highlight. ● Iro uses the event name to decide the highlighting group.

Example: [[1, 0], :on_kw, "def", EXPR_FNAME]
:on_kw -> Iro highlights it as Keyword.
[1, 0] and "def" -> Iro highlights line 1, column 0, size 3.
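
The mapping described above can be sketched with Ripper.lex. This is a simplified illustration, not Iro's code: the GROUPS table, the highlight_positions helper and the hash layout of each entry are assumptions made for this example.

```ruby
require 'ripper'

# Hypothetical mapping from scanner events to highlight group names.
GROUPS = {
  on_kw: 'Keyword',
  on_tstring_content: 'String',
  on_comment: 'Comment'
}.freeze

# Turns source code into a list of highlight entries.
def highlight_positions(source)
  Ripper.lex(source).each_with_object([]) do |(pos, event, tok), acc|
    group = GROUPS[event]
    next unless group # skip tokens we do not highlight

    acc << { group: group, line: pos[0], column: pos[1], size: tok.size }
  end
end

source = <<~'RUBY'
  def hello(name)
    puts "hello, #{name}"
  end
RUBY

highlight_positions(source).each { |entry| p entry }
```

An editor plugin would then translate each entry into its own highlighting call. Tokens spanning several lines would need extra handling that this sketch leaves out.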

Slide 53

Slide 53 text

Ripper.sexp

Slide 54

Slide 54 text

Ripper.sexp

    $ ruby -rripper -e 'pp Ripper.sexp(File.read("test.rb"))'
    [:program,
     [[:def,
       [:@ident, "hello", [1, 4]],
       [:paren,
        [:params, [[:@ident, "name", [1, 10]]], nil, nil, nil, nil, nil, nil]],
       [:bodystmt,
        [[:command,
          [:@ident, "puts", [2, 2]],
          [:args_add_block,
           [[:string_literal,
             [:string_content,
              [:@tstring_content, "hello, ", [2, 8]],

Slide 55

Slide 55 text

Ripper.sexp ● It is an S-expression. ● [:TYPE, child_sexp1, child_sexp2, ...] ● Each child_sexp is an S-expression or a lex output.

Slide 56

Slide 56 text

Event driven Ripper API

Slide 57

Slide 57 text

Iro does not use #sexp and #lex ● I described them, but Iro does not use them. ● Iro uses the event-driven API instead.

Slide 58

Slide 58 text

Event Driven Ripper API ● Ripper provides an event-driven API. ● Ripper calls the on_TYPE method when it visits TYPE. ○ e.g. Ripper calls the on_kw method when it reads a kw token.

Slide 59

Slide 59 text

Example: scanner event

    require 'ripper'

    class MyRipper < Ripper
      def on_kw(tok)
        puts "type: kw, source: #{tok}, position: #{lineno}, #{column}"
      end
    end

    MyRipper.parse(ARGF.read)

Slide 60

Slide 60 text

Output

    $ ruby myripper.rb hello.rb
    type: kw, source: def, position: 1, 0
    type: kw, source: end, position: 3, 0

Slide 61

Slide 61 text

Example: parser event

    require 'ripper'

    class MyRipper < Ripper::SexpBuilderPP
      def on_def(name, params, body)
        p name
        p params
        p body
      end
    end

    MyRipper.parse(ARGF.read)

Slide 62

Slide 62 text

Output

    $ ruby test.rb hello.rb
    [:@ident, "hello", [1, 4]]
    [:paren, [:params, [[:@ident, "name", [1, 10]]], nil, nil, nil, nil, nil, nil]]
    [:bodystmt, [[:command, [:@ident, "puts", [2, 2]], [:args_add_block, [[:string_literal, [:string_content, [:@tstring_content, "hello, ", ...

Slide 63

Slide 63 text

Implementation of Iro

Slide 64

Slide 64 text

See pocke/iro/lib/iro/ruby/parser.rb https://github.com/pocke/iro/blob/master/lib/iro/ruby/parser.rb

Slide 65

Slide 65 text

The future of Iro

Slide 66

Slide 66 text

Inline highlighting ● Highlight code in code ○ For example:

    # Here is Ruby code
    <<~SQL
      # Here is SQL
      SELECT * FROM users;
    SQL
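
A possible first step toward inline highlighting is to locate here document bodies and their labels, so that the body could be handed to a highlighter for another language. The sketch below uses Ripper.lex for that. The helper heredoc_bodies is hypothetical, handles only simple, non-nested here documents, and is not a feature of Iro today.

```ruby
require 'ripper'

# Returns [label, body] pairs for each here document in the source.
# (heredoc_bodies is a made-up helper; nested heredocs are not handled.)
def heredoc_bodies(source)
  result = []
  label = nil     # label of the heredoc currently being collected
  in_body = false # true once the line containing the opener has ended
  body = +''

  Ripper.lex(source).each do |_pos, event, tok|
    case event
    when :on_heredoc_beg
      label = tok[/\w+/] # "<<~SQL" -> "SQL"
      in_body = false
      body = +''
    when :on_nl, :on_ignored_nl
      in_body = true if label
    when :on_tstring_content
      body << tok if label && in_body
    when :on_heredoc_end
      result << [label, body] if label
      label = nil
    end
  end
  result
end

source = <<~'RUBY'
  query = <<~SQL
    SELECT * FROM users;
  SQL
RUBY

heredoc_bodies(source).each { |label, body| puts "#{label}: #{body}" }
```

The label ("SQL" here) could then select which language's highlighter to run on the body.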

Slide 67

Slide 67 text

More languages support ● Currently Iro.vim supports Ruby, YAML and Python. ○ “Iro.vim” has Python and YAML support, so I’d like to extract the implementation to a gem or something. ● I’d like to add support for Slim and Markdown. ○ Because Slim is a difficult language. ○ Markdown has inline code blocks.

Slide 68

Slide 68 text

More editors support ● Currently Iro supports Vim and HTML only. ● I believe we can use Iro in other editors. ○ e.g. Emacs, Atom or something.

Slide 69

Slide 69 text

FAQ

Slide 70

Slide 70 text

How about performance? ● I can use Iro.vim comfortably. ○ ⭕ 3,000 lines ○ ❌ 30,000 lines ● But I haven't compared its performance with other implementations. ○ Because I'm not sure how to compare highlighters' performance.

Slide 71

Slide 71 text

Does Iro work on a broken Ruby file? ● Code has syntax errors while it is being edited. ● So syntax highlighters should be able to highlight broken code. ● Iro can highlight it in almost all cases.
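
Why a Ripper-based tool can still cope with broken code can be seen from a small experiment. This is a sketch of Ripper's general behaviour and does not show how Iro itself handles errors: for a method that is missing its closing "end", Ripper.sexp gives up, but the scanner events are still produced.

```ruby
require 'ripper'

# Missing the closing "end", as often happens while typing.
broken = "def hello(name)\n  puts 'hi'\n"

tree   = Ripper.sexp(broken) # nil, because the code has a syntax error
events = Ripper.lex(broken).map { |_pos, event, _tok| event }

p tree
p events
```

Because tokens such as keywords and string contents are still reported, token-level highlighting remains possible even when no complete syntax tree exists.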

Slide 72

Slide 72 text

Example ● It has a syntax error, but Iro still highlights it.

Slide 73

Slide 73 text

Conclusion

Slide 74

Slide 74 text

Conclusion ● Iro is a Ripper based syntax highlighter. ○ It can highlight correctly. ● You can try using Iro now! ○ For Vimmer: https://github.com/pocke/iro.vim ○ Web demo: https://ruby-highlight.herokuapp.com Thank you for listening!

Slide 75

Slide 75 text

Events at our Booth
【Day 2】 12:00~13:00 Q&A with @pocke ● 15:20~15:50 Global Office Hours
【Day 3】 12:00~13:00 Q&A with @wyhaines ● 15:20~15:50 Ruby interpreter development live by @ko1 & @mame
Cookpad X RubyKaigi 2018: Day 2 Party ⏰ June 1st, 19:30 - 21:30 (opens 19:00) ● Free (Registration required) ● Show up to this booth at 18:40 if you want to head with us!