Type Profiler: An Analysis to guess type signatures

Slide 1

Slide 1 text

Type Profiler: An analysis to guess type signatures Yusuke Endoh (@mametter) Cookpad Inc. RubyKaigi 2018 (2018/06/01)

Slide 2

Slide 2 text

Yusuke Endoh (@mametter) • A full-time MRI committer @ Cookpad – w/ Koichi Sasada

Slide 3

Slide 3 text

Recent achievement for Ruby 2.6 • Endless range [Feature #12912] (1..) Endless!

Slide 4

Slide 4 text

Endless range • Take an array without the first element ary=["a","b","c"] ary[1..-1] #=> ["b","c"] ary.drop(1) #=> ["b","c"] ary[1..] #=> ["b","c"]

Slide 5

Slide 5 text

Endless range • Loop from 1 to infinity i=1; loop { ……; i+=1 } (1..Float::INFINITY).each {……} 1.step {|i|……} (1..).each {|i|……}

Slide 6

Slide 6 text

Endless range • each_with_index from index 1 i=1; ary.each { ……; i+=1 } ary.each.with_index(1){|x,i|……} ary.zip(1..) {|x,i|……}

Slide 7

Slide 7 text

Endless range ✓Has already been committed to trunk ✓Will be included in Ruby 2.6 • Stay tuned! ary[1..] (1..).each {……} ary.zip(1..) {|x,i|……}
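
The three idioms from the endless-range slides can be tried together on Ruby 2.6 or later. The sketch below wraps each one in a small helper; the helper names are made up for illustration and are not part of Ruby.

```ruby
# Requires Ruby 2.6+ (endless ranges). Helper names are hypothetical.

# Take an array without its first element.
def without_first(ary)
  ary[1..]
end

# Loop from 1 upward; lazy evaluation lets us stop after n results.
def first_squares(n)
  (1..).lazy.map { |i| i * i }.first(n)
end

# Pair each element with an index that starts at 1.
def with_index_from_one(ary)
  ary.zip(1..)
end

p without_first(%w[a b c])      # => ["b", "c"]
p first_squares(3)              # => [1, 4, 9]
p with_index_from_one(%w[a b])  # => [["a", 1], ["b", 2]]
```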

Slide 8

Slide 8 text

Beginless range...? • Implemented just yesterday [Feature #14799] (..1) Beginless!

Slide 9

Slide 9 text

Today’s theme • Ruby3's type. • Some people held some meetings to discuss Ruby3's type – Matz, soutaro, akr, ko1, mame – Main objective: clarify matz's hidden requirements (and compromises) for Ruby3's type • (Not to decide everything behind closed doors) • We'll explain the (current) requirements

Slide 10

Slide 10 text

Agenda • A whirlwind tour of already-proposed "type systems" for Ruby • Type DB: A key concept of Ruby3's type system • A missing part: Type profiler

Slide 11

Slide 11 text

A whirlwind tour of already-proposed "type systems" for Ruby

Slide 12

Slide 12 text

Type-related systems for Ruby • Steep – Static type check • RDL – (Semi) static type check • contracts.ruby – Only dynamic check of arguments/return values • dry-types – Only dynamic checks of typed structs • RubyTypeInference (by JetBrains) – Type information extractor by dynamic analysis • Sorbet (by Stripe)

Slide 13

Slide 13 text

RDL: Types for Ruby • Most famous in the academic world – Jeff Foster at Univ. of Maryland – Accepted at OOPSLA, PLDI, and POPL! • The gem is available – https://github.com/plum-umd/rdl • We evaluated RDL – through writing type annotations for OptCarrot

Slide 14

Slide 14 text

Basis for RDL # load RDL library require "rdl" class NES # activate type annotations for RDL extend RDL::Annotate # type annotation before method definition type "(?Array) -> self", typecheck: :call def initialize(conf = ARGV) ...

Slide 15

Slide 15 text

RDL type annotation • Accepts one optional parameter typed Array of String • Returns self – Always "self" for initialize method type "(?Array) -> self", typecheck: :call def initialize(conf = ARGV) ...

Slide 16

Slide 16 text

RDL type annotation • "typecheck" controls type check timing – :call: when this method is called – :now: when this method is defined – :XXX: when "RDL.do_typecheck :XXX" is done – nil: no "static check" is done • Used to type-check code that uses the method • A "run-time check" is still done type "(?Array) -> self", typecheck: :call def initialize(conf = ARGV) ...

Slide 17

Slide 17 text

Annotation for instance variables • Needs type annotations for all instance variables class NES # activate type annotations for RDL extend RDL::Annotate var_type :@cpu, "%any" type "() -> %any", typecheck: :call def reset @cpu.reset #=> receiver type %any not supported yet ...

Slide 18

Slide 18 text

Annotation for instance variables • Needs type annotations for all instance variables class NES # activate type annotations for RDL extend RDL::Annotate var_type :@cpu, "[reset: () -> %any]" type "() -> %any", typecheck: :call def reset @cpu.reset #=> receiver type [reset: () -> %any] not sup ...

Slide 19

Slide 19 text

Annotation for instance variables • Needs type annotations for all instance variables class NES # activate type annotations for RDL extend RDL::Annotate var_type :@cpu, "Optcarrot::CPU" type "() -> %any", typecheck: :call def reset @cpu.reset # error: no type information for # instance method `Optcarrot::CPU#reset'

Slide 20

Slide 20 text

Annotation for instance variables • Type check succeeded class NES # activate type annotations for RDL extend RDL::Annotate type "Optcarrot::CPU","reset","()->%any" var_type :@cpu, "Optcarrot::CPU" type "() -> %any", typecheck: :call def reset @cpu.reset ...

Slide 21

Slide 21 text

Requires many annotations... type "() -> %bot", typecheck: :call def reset @cpu.reset @apu.reset @ppu.reset @rom.reset @pads.reset @cpu.boot @rom.load_battery end

Slide 22

Slide 22 text

Requires many annotations... type "() -> %bot", typecheck: nil def reset @cpu.reset @apu.reset @ppu.reset @rom.reset @pads.reset @cpu.boot @rom.load_battery end No static check

Slide 23

Slide 23 text

… still does not work type "() -> %bot", typecheck: nil def reset ... @rom.load_battery #=> [65533] end # Optcarrot::CPU#reset: Return type error.… # Method type: # *() -> %bot # Actual return type: # Array # Actual return value: # [65533]

Slide 24

Slide 24 text

Why? • typecheck:nil doesn't mean no check – A dynamic check is still done • %bot means "no-return" – Always raises an exception, exits the process, etc. – But this method returns [65533] – In short, this is my bug in the annotation type "() -> %bot", typecheck: nil def reset ... @rom.load_battery #=> [65533] end

Slide 25

Slide 25 text

Lessons: void type • In Ruby, a lot of methods return a meaningless value – No intention to allow users to use the value • What type should we use in this case? – %any, or return nil explicitly? • We need a "void" type – %any for the method; it can return anything – "don't use" for users of the method def reset LIBRARY_INTERNAL_ARRAY.each { … } end
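
To make the "meaningless return value" point concrete, here is a runnable version of the slide's reset example. The array contents and the block body are placeholders; the only fact relied on is that Array#each returns its receiver.

```ruby
# Placeholder data standing in for some library-internal state.
LIBRARY_INTERNAL_ARRAY = [1, 2, 3].freeze

def reset
  # The last expression is the each call, so its value leaks out as the
  # return value even though the author never meant callers to use it.
  LIBRARY_INTERNAL_ARRAY.each { |item| item }
end

p reset.equal?(LIBRARY_INTERNAL_ARRAY)  # => true
```

A dynamic recorder would see Array here, while the author's intent is closer to "void".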

Slide 26

Slide 26 text

RDL's programmable annotation • RDL supports meta-programming symbols.each do |id| attr_reader_type id, "String" attr_reader id end

Slide 27

Slide 27 text

RDL's programmable annotation • RDL supports pre-condition checks – This can also be used to generate type annotations automatically • I like this feature, but matz doesn't – He wants to avoid type annotations embedded in the code – He likes a separated, non-Ruby type definition language (like Steep's) pre(:belongs_to) do |name| …… type name, "() -> #{klass}" end

Slide 28

Slide 28 text

Summary: RDL • Semi-static type check – The timing is configurable • It checks the method body – Not only dynamic check of arguments/return values • The implementation is mature – Many features actually work, great! • Needs type annotations • Supports meta-programming

Slide 29

Slide 29 text

Steep • Snip: You did listen to soutaro's talk • Completely static type check • Separated type definition language – .rbi – But also requires (minimal?) type annotation embedded in .rb files

Slide 30

Slide 30 text

Digest: contracts.ruby require 'contracts' class Example include Contracts::Core include Contracts::Builtin Contract Num => Num def double(x) x * 2 end end • RDL-like type annotation – Run-time type check
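
The idea of a purely dynamic check on arguments and return values can be sketched without any gem. The Checked module below is a hypothetical helper written for this note; it is not the contracts.ruby API and only handles one-argument methods.

```ruby
# Hypothetical helper, NOT the contracts.ruby API: wraps a one-argument
# method with run-time class checks on its argument and return value.
module Checked
  def checked(name, arg_class, ret_class)
    original = instance_method(name)
    define_method(name) do |arg|
      unless arg.is_a?(arg_class)
        raise TypeError, "#{name}: expected #{arg_class}, got #{arg.class}"
      end
      result = original.bind(self).call(arg)
      unless result.is_a?(ret_class)
        raise TypeError, "#{name}: expected to return #{ret_class}, got #{result.class}"
      end
      result
    end
  end
end

class CheckedExample
  extend Checked

  def double(x)
    x * 2
  end
  checked :double, Numeric, Numeric
end

p CheckedExample.new.double(21)  # => 42
```

Calling double with a String raises TypeError before the method body runs, which is the behavior the slide summarizes as a run-time type check.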

Slide 31

Slide 31 text

Digest: dry-types require 'dry-types' require 'dry-struct' module Types include Dry::Types.module end class User < Dry::Struct attribute :name, Types::String attribute :age, Types::Integer end • Can define structs with typed fields – Run-time type check – "type_struct" gem is similar

Slide 32

Slide 32 text

Digest: RubyTypeInference • Type information extractor by dynamic analysis – Run test suites under monitoring of TracePoint API – Hooks method call/return events, logs the passed values, and aggregate them to type information – Used by RubyMine IDE
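
The hook-and-log approach can be sketched with the TracePoint API alone. This is a minimal illustration, not RubyTypeInference's implementation; record_types and Calc are made-up names, and tp.parameters assumes Ruby 2.6 or later.

```ruby
# Minimal sketch of TracePoint-based recording (Ruby 2.6+ for tp.parameters).
# record_types and Calc are hypothetical names for illustration.
def record_types
  log = Hash.new { |h, k| h[k] = { args: [], ret: [] } }
  trace = TracePoint.new(:call, :return) do |tp|
    entry = log["#{tp.defined_class}##{tp.method_id}"]
    if tp.event == :call
      # Look only at named required/optional parameters.
      named = tp.parameters.select { |kind, name| name && %i[req opt].include?(kind) }
      classes = named.map { |_kind, name| tp.binding.local_variable_get(name).class }
      entry[:args] |= [classes]
    else
      entry[:ret] |= [tp.return_value.class]
    end
  end
  trace.enable { yield }
  log
end

class Calc
  def double(x)
    x * 2
  end
end

calc = Calc.new
log = record_types do
  calc.double(21)
  calc.double(1.5)
end
p log["Calc#double"]
# => {:args=>[[Integer], [Float]], :ret=>[Integer, Float]} (inspect format varies by Ruby version)
```

Aggregating the observed classes into a signature such as (Integer|Float) -> (Integer|Float) is then a separate step.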

Slide 33

Slide 33 text

Digest: RubyTypeInference https://speakerdeck.com/valich/automated-type-contracts-generation-1

Slide 34

Slide 34 text

Summary of Type Systems
| System | Objective | Targets | Annotations |
| Steep | Static type check | Method body | Separated (mainly) |
| RDL | Semi-static type check | Method body | Embedded in code |
| contracts.ruby | Dynamic type check | Arguments and return values | Embedded in code |
| dry-types | Typed structs | Only Dry::Struct classes | Embedded in code |
| RubyTypeInference | Extract type information | Arguments and return values | N/A |

Slide 35

Slide 35 text

Type DB: A key concept of Ruby3's Type System

Slide 36

Slide 36 text

Idea • Separated type definition file is good • But meta-programming like attr_* is difficult to support – Users will try to generate it programmatically • We may want to keep code position – To show lineno of code in type error report – Hard to manually keep the correspondence between type definition and code position in .rbi file – We may also want to keep other information

Slide 37

Slide 37 text

Type DB • [Diagram centered on the Type DB. Labels, in extraction order: Steep type definition; typecheck; Steep; RDL/Sorbet type annotation; RDL; typecheck; better error report; Ruby interpreter; IDE]

Slide 38

Slide 38 text

How to create Type DB • [Diagram of sources feeding the Type DB. Labels, in extraction order: Steep type definition; Ruby code; write manually; compile; stdlib (already included); RubyTypeInference (automatically extract by dynamic analysis); Type Profiler]

Slide 39

Slide 39 text

A missing part: Type Profiler

Slide 40

Slide 40 text

Type Profiler • Another way to extract type information from Ruby code – An alternative to "RubyTypeInference" • Is not a type inference – Type inference of Ruby is hopeless – Conservative static type inference can extract little information • Type profiler "guesses" type information – It may extract wrong type information – Assumes that the user checks the result

Slide 41

Slide 41 text

Type Profilers • There is no "one-for-all" type profiler – Static type profiling cannot handle ActiveRecord – Dynamic type profiling cannot extract syntactic features (like void type) • We need a variety of type profilers – For ActiveRecord by reading DB schema – Extracting from RDoc/YARD

Slide 42

Slide 42 text

In this talk • We prototyped three more generic type profilers – Static analysis 1 (SA1) • Mainly for user-defined classes – Static analysis 2 (SA2) • Mainly for builtin classes – Dynamic analysis (DA) • Enhancement of "RubyTypeInference"

Slide 43

Slide 43 text

SA1: Idea • Guess a type of formal parameters based on called method names class FooBar def foo(...); ...; end def bar(...); ...; end end def func(x) #=> x:FooBar x.foo(1) x.bar(2) end

Slide 44

Slide 44 text

SA1: Prototyped algorithm • Gather method definitions in each class/module – FooBar={foo,bar} • Gather method calls for each parameter – x={foo,bar} – Remove general methods (like #[] and #+) to reduce false positives – Arity, parameter and return types aren't used • Assign a class that all methods match class FooBar def foo(...);...;end def bar(...);...;end end def func(x) x.foo(1) x.bar(2) end
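
The matching step of SA1 can be sketched over plain data, leaving out the parsing that gathers definitions and calls. The function name, the GENERAL_METHODS list, and the sample DEFINITIONS table are assumptions for illustration, not the prototype's actual code.

```ruby
require "set"

# Methods too common to identify a class (assumed list, following the slide).
GENERAL_METHODS = Set[:[], :+].freeze

# class_methods: { class name => [method names defined there] }
# called:        method names called on one formal parameter
def guess_param_type(class_methods, called)
  specific = called.reject { |m| GENERAL_METHODS.include?(m) }
  return [] if specific.empty?  # nothing to go on: the "#{}" case
  class_methods.select { |_klass, names| (specific - names).empty? }.keys
end

# Hypothetical sample data.
DEFINITIONS = { "FooBar" => %i[foo bar], "Foo" => %i[foo] }.freeze

p guess_param_type(DEFINITIONS, %i[foo bar])  # => ["FooBar"]
p guess_param_type(DEFINITIONS, %i[foo])      # => ["FooBar", "Foo"]
p guess_param_type(DEFINITIONS, %i[+])        # => []
```

The second call already shows how a parameter with few distinguishing calls yields several candidates, and the third shows the empty result for a parameter that is never used as a distinctive receiver.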

Slide 45

Slide 45 text

SA1: Evaluation • Experimented with SA1 on WEBrick – As sample code that has many user-defined classes • Manually checked the guessed result – Found some common guessing failures • Wrong result / no-match result – No quantitative evaluation yet

Slide 46

Slide 46 text

SA1: Problem 1 • A parameter is not used • Many methods are affected def do_GET(req, res) raise HTTPStatus::NotFound, "not found." end DefaultFileHandler#do_GET(req:#{}, res:HTTPResponse) FileHandler#do_GET(req:#{}, res:#{}) AbstractServlet#do_GET(req:#{}, res:#{}) ProcHandler#do_GET(request:#{}, response:#{}) ERBHandler#do_GET(req:#{}, res:HTTPResponse)

Slide 47

Slide 47 text

SA1: Problem 2 • Incomplete guessing • Cause – the method calls req.request_uri – Both HTTPResponse and HTTPRequest provide request_uri HTTPProxyServer#perform_proxy_request( req: HTTPResponse | HTTPRequest, res: WEBrick::HTTPResponse, req_class:#{new}, :nil)

Slide 48

Slide 48 text

(Arguable) solution? • Exploit the name of parameter – Create a mapping from parameter name to type after profiling • "req" → HTTPRequest – Revise guessed types using the mapping • Fixed! DefaultFileHandler#do_GET(req:HTTPRequest, res:HTTPResponse) FileHandler#do_GET(req:HTTPRequest, res:HTTPResponse) AbstractServlet#do_GET(req:HTTPRequest, res:HTTPResponse) ProcHandler#do_GET(request:#{}, response:#{}) ERBHandler#do_GET(req:HTTPRequest, res:HTTPResponse) CGIHandler#do_GET(req:HTTPRequest, res:HTTPResponse)
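
One way to read the name-based heuristic is as a two-pass vote: first collect the unambiguous guesses per parameter name, then reuse the winner wherever a guess was empty or ambiguous. The sketch below assumes that reading; the function names and the sample data (including the "HTTPServlet#service" entry) are invented for illustration.

```ruby
# guesses: { "Class#method" => { parameter name => [candidate types] } }

# Pass 1: for each parameter name, pick the most frequent unambiguous guess.
def build_name_mapping(guesses)
  votes = Hash.new { |h, k| h[k] = Hash.new(0) }
  guesses.each_value do |params|
    params.each do |name, types|
      votes[name][types.first] += 1 if types.size == 1
    end
  end
  votes.transform_values { |counts| counts.max_by { |_type, n| n }.first }
end

# Pass 2: replace empty or ambiguous guesses with the mapped type, if any.
def revise(guesses, mapping)
  guesses.transform_values do |params|
    params.map do |name, types|
      if types.size != 1 && mapping.key?(name)
        [name, [mapping[name]]]
      else
        [name, types]
      end
    end.to_h
  end
end

# Hypothetical profiling result.
SAMPLE_GUESSES = {
  "DefaultFileHandler#do_GET" => { "req" => [], "res" => ["HTTPResponse"] },
  "HTTPServlet#service"       => { "req" => ["HTTPRequest"], "res" => ["HTTPResponse"] },
  "ProcHandler#do_GET"        => { "request" => [], "response" => [] },
}.freeze

mapping = build_name_mapping(SAMPLE_GUESSES)
p revise(SAMPLE_GUESSES, mapping)["DefaultFileHandler#do_GET"]
# => {"req"=>["HTTPRequest"], "res"=>["HTTPResponse"]}
```

As on the slide, ProcHandler#do_GET stays unresolved because its parameter names ("request", "response") never received an unambiguous guess.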

Slide 49

Slide 49 text

SA1: Problem 3 • Cannot guess return type • Can guess only in limited cases – Returns a formal parameter – Returns a literal or "Foo.new" – Returns an expression which is already included in Type DB • See actual usage of the method? – Requires inter-procedural or whole-program analysis!

Slide 50

Slide 50 text

SA1: Pros/Cons • Pros – No need to run tests – Can guess void type • Cons – Hard when parameters are not used • This is not a rare case – Heuristics may work, but can cause wrong guesses

Slide 51

Slide 51 text

SA2: Idea • I believe this method expects Numeric! def add_42(x) #=> (x:Num)=>Num x + 42 end

Slide 52

Slide 52 text

SA2: Prototyped algorithm • Limited type DB of stdlib – Num#+(Num) → Num – Str#+(Str) → Str, etc. • "Unification-based type-inference" inspired algorithm – searches "α#+(Num) → β" – Matches "Num#+(Num) → Num" • Type substitution: α=Num, β=Num x + 42

Slide 53

Slide 53 text

SA2: Prototyped algorithm (2) • When multiple candidates are found – matches: • Num#<<(Num) → Num • Str#<<(Num) → Str • Array[α]#<<(α) → Array[α] – Just take union types of them • (Overloaded types might be better) def push_42(x) x << 42 end #=> (x:(Num|Str|Array))=>(Num|Str|Array) x << 42
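
The lookup described on the two SA2 slides can be approximated with a table scan: find every DB entry whose method name and argument type match the call, then take the union of the receiver and return types. This is a simplified sketch, not real unification; TYPE_DB and guess_call are hypothetical, and :alpha stands in for the type variable α.

```ruby
# Toy type DB rows: [receiver, method, argument, return].
# :alpha is a stand-in for a type variable that matches any argument.
TYPE_DB = [
  [:Num,   :+,  :Num,   :Num],
  [:Str,   :+,  :Str,   :Str],
  [:Num,   :<<, :Num,   :Num],
  [:Str,   :<<, :Num,   :Str],
  [:Array, :<<, :alpha, :Array],
].freeze

# Given "x.meth(arg)" with a known argument type, guess x's type and the result type.
def guess_call(meth, arg_type)
  matches = TYPE_DB.select do |_recv, m, arg, _ret|
    m == meth && (arg == :alpha || arg == arg_type)
  end
  { recv: matches.map(&:first).uniq, ret: matches.map(&:last).uniq }
end

p guess_call(:+, :Num)   # x + 42  => recv [:Num], ret [:Num]
p guess_call(:<<, :Num)  # x << 42 => recv and ret are both [:Num, :Str, :Array]
```

Taking independent unions loses the link between receiver and return type (Str in, Str out), which is why the slide notes that overloaded types might be better.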

Slide 54

Slide 54 text

SA2: Evaluation • Experimented with SA2 on OptCarrot – As sample code that uses many builtin types • Manually checked the guessed result – Found some common guessing failures • Wrong result / no-match result – No quantitative evaluation yet

Slide 55

Slide 55 text

SA2: Problem 1 • Surprising result – Counterintuitive, but actually it works with @fetch:Array[Num|Str] def peek16(addr) @fetch[addr] + (@fetch[addr + 1] << 8) end # Optcarrot::CPU#peek16(Num) => (Num|Str)

Slide 56

Slide 56 text

SA2: Problem 2 • Difficult to handle type parameters – Requires constraint-based type-inference @ary = [] # Array[α] @ary[0] = 1 # unified to Array[Num] @ary[1] = "str" # cannot unify Num and Str
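
The failure on this slide can be reproduced with a toy unifier over symbols. The unify function below is an illustrative stand-in (not the prototype's code) in which :alpha plays the role of the unbound type variable α.

```ruby
# Toy unification over symbols; :alpha is an unbound type variable.
def unify(a, b)
  return b if a == :alpha
  return a if b == :alpha || a == b
  raise TypeError, "cannot unify #{a} and #{b}"
end

elem = :alpha             # @ary = []        element type is α
elem = unify(elem, :Num)  # @ary[0] = 1      α is bound to Num
begin
  unify(elem, :Str)       # @ary[1] = "str"  Num vs Str
rescue TypeError => e
  puts e.message          # prints "cannot unify Num and Str"
end
```

The Ruby program itself is perfectly valid and yields [1, "str"]; it is only the first-binding-wins strategy that fails, which is why a constraint-based approach that can widen to Num|Str is needed.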

Slide 57

Slide 57 text

SA2: Pros/Cons • Pros – No need to run tests – Can guess void type – Can guess parameters that are not used as a receiver • Cons – Can cause wrong guesses – Hard to handle type parameters (Array[α]) – Hard to scale • The bigger the type DB is, the more wrong results will happen

Slide 58

Slide 58 text

DA: Idea • Recording actual inputs/output of methods by using TracePoint API – The same as RubyTypeInference • Additional features – Support block types • Required enhancement of TracePoint API – Support container types: Array[Int] • By sampling elements
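
Container types by element sampling could look like the following. The function name, the choice of sampling the first few elements, and the default sample size are assumptions; the slides do not say how the prototype picks its samples.

```ruby
# Describe a value's type, inspecting at most sample_size elements of an Array.
# Sampling the leading elements is an assumption made for this sketch.
def describe_type(value, sample_size: 3)
  return value.class.name unless value.is_a?(Array)
  elems = value.first(sample_size)
               .map { |v| describe_type(v, sample_size: sample_size) }
               .uniq
  "Array[#{elems.join('|')}]"
end

p describe_type(42)                  # => "Integer"
p describe_type([1, 2, 3])           # => "Array[Integer]"
p describe_type([1, "a", 2.0, nil])  # => "Array[Integer|String|Float]"
```

The last call shows the trade-off: the nil in fourth position falls outside the sample, so the recorded type is cheaper to compute but incomplete.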

Slide 59

Slide 59 text

DA: Evaluation • Evaluated with OptCarrot and WEBrick • It works easily and robustly

Slide 60

Slide 60 text

DA: Problem 1 • Very slow (in some cases) – Recording OptCarrot may take hours – Element-sampling for Array made it faster, but it still takes a few minutes • Without tracing, it runs in a few seconds – It may depend on the application • Profiling WEBrick is not so slow

Slide 61

Slide 61 text

DA: Problem 2 • Cannot guess void type – Many methods return garbage – DA cannot distinguish garbage from an intended return value • SA can guess void type heuristically – Integer#times, Array#each, etc. – if statement that has no "else" – while and until statements – Multiple assignment • (Steep scaffold now supports some of them)

Slide 62

Slide 62 text

DA: Problem 3 • Some tests confuse the result – Need to ignore error-handling tests by cooperating with the test framework assert_raise(TypeError) { … }

Slide 63

Slide 63 text

DA: Pros/Cons • Pros – Easy to implement, and robust – It can profile any programs • Including meta-programming like ActiveRecord • Cons – Need to run tests; it might be very slow – Hard to handle void type – TracePoint API is not enough yet – Need to cooperate with test frameworks

Slide 64

Slide 64 text

Conclusion • Reviewed already-proposed type systems for Ruby – Whose implementations are available • Type DB: Ruby3's key concept • Some prototypes and experiments of type profilers – Need more improvements / experiments!