Do LLMs dream of Type Inference?

Presented at RubyConf 2024 in Chicago.
https://www.rubyevents.org/events/rubyconf-2024

kokuyouwind

November 15, 2024

Transcript

  1. Why Type Inference with LLMs?

        class Bird; end

        class Duck < Bird
          def cry; puts "Quack"; end
        end

        class Goose < Bird
          def cry; puts "Gabble"; end
        end

        def make_sound(bird)
          bird.cry
        end

        make_sound(Duck.new)
        make_sound(Goose.new)

    What is the argument type of the make_sound method?
  2. Traditional Approach: Algorithmic

        make_sound(Duck.new)
        make_sound(Goose.new)

    Called with Duck
    Called with Goose
    The argument type of make_sound is (Duck | Goose)
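A minimal sketch of the call-site idea above. TypeProf analyzes code statically; this stand-in instead records argument classes at runtime with TracePoint, which is an assumption made to keep the example short. The helper names observe_argument_types and union_type are hypothetical.

```ruby
class Bird; end

class Duck < Bird
  def cry; "Quack"; end
end

class Goose < Bird
  def cry; "Gabble"; end
end

def make_sound(bird)
  bird.cry
end

# Hypothetical helper: record the class of each argument passed to
# `method_name` while the given block runs.
def observe_argument_types(method_name)
  observed = Hash.new { |hash, key| hash[key] = [] }
  trace = TracePoint.new(:call) do |tp|
    next unless tp.method_id == method_name

    tp.parameters.each do |_kind, name|
      observed[name] << tp.binding.local_variable_get(name).class
    end
  end
  trace.enable { yield }
  observed
end

# Hypothetical helper: render the observed classes as an RBS-style union.
def union_type(classes)
  names = classes.map(&:name).uniq
  names.size == 1 ? names.first : "(#{names.join(' | ')})"
end

observed = observe_argument_types(:make_sound) do
  make_sound(Duck.new)
  make_sound(Goose.new)
end

puts union_type(observed[:bird])
```

Because only the observed call sites are considered, the result is a union of the concrete subclasses rather than their shared parent.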
  3. Human Approach: Heuristics

        class Bird; end

        def make_sound(bird)
          bird.cry
        end

    The argument name is bird.
    There is a Bird class.
    The argument type of make_sound is Bird
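The naming heuristic above could be approximated mechanically as follows. This is only an illustrative sketch: guess_type_from_name is a hypothetical helper, and a human (or an LLM) draws on far more context than a name lookup.

```ruby
class Bird; end

# Hypothetical heuristic: turn a parameter name into a constant name
# ("bird" becomes "Bird", "http_client" becomes "HttpClient") and return
# the class or module of that name if one is defined.
def guess_type_from_name(param_name, namespace = Object)
  candidate = param_name.to_s.split('_').map(&:capitalize).join
  return nil if candidate.empty?
  return nil unless namespace.const_defined?(candidate)

  constant = namespace.const_get(candidate)
  constant.is_a?(Module) ? constant : nil
end

p guess_type_from_name(:bird)
p guess_type_from_name(:quantity)
```

The first call finds the Bird class; the second returns nil because no Quantity constant exists in this snippet.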
  4. Algorithms are great at logic, but lack heuristic understanding.

    LLMs offer the potential for human-like type inference.
  5. I developed RBS Goose as a tool to guess RBS types using LLMs.

    https://rubykaigi.org/2024/presentations/kokuyouwind.html#day1
  6. RBS Goose: Current State

    [Diagram: Ruby, RBS]
    How capable is RBS Goose?
    We will need some metrics of RBS Goose performance.
  7. Today's Focus

    Explain how RBS Goose works with LLMs and evaluate it
      Better results than traditional methods in several cases
    Share the idea of a type inference benchmark I planned
      Referring to previous studies
  8. Outline

    Basics of Type System and Type Inference
    RBS Goose Architecture and Evaluation
    Evaluation Method in Previous Studies
    The idea of TypeEvalRb
    Conclusion
  9. Outline

    Basics of Type System and Type Inference
    RBS Goose Architecture and Evaluation
    Evaluation Method in Previous Studies
    The idea of TypeEvalRb
    Conclusion
  10. Type System

    A mechanism to classify the components of a program
      Strings, numbers, etc.
      To prevent invalid operations
    Ruby is a dynamically typed language
      1 + 'a' : TypeError is raised at runtime
      1 + 'a' if false : TypeError is not raised
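The two cases above can be checked directly. In this small sketch, type_error? is a hypothetical helper that reports whether running a block raises TypeError.

```ruby
# Hypothetical helper: true if the block raises TypeError when executed.
def type_error?
  yield
  false
rescue TypeError
  true
end

raised_when_executed = type_error? { 1 + 'a' }
raised_when_skipped  = type_error? { 1 + 'a' if false }

puts "1 + 'a'          raises TypeError: #{raised_when_executed}"
puts "1 + 'a' if false raises TypeError: #{raised_when_skipped}"
```

The invalid addition is only detected when that line actually runs, which is the gap static type checking aims to close.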
  11. Static Type Checking

    A mechanism to detect type errors before execution
      Need to know the type of each part of the code
    Ruby does not use type annotations in its code
      Define types with RBS / Checking with Steep
      (Other options include RBI / Sorbet and RDL, but we will not cover them in this session)
  12. Static Type Checking: Examples

    For 1 + 'a', we can detect a type error if we know...
      1 is an Integer
      'a' is a String
      Integer#+ cannot accept a String

        class Integer
          def +: (Integer) -> Integer
          # ...
        end

    https://github.com/ruby/rbs/blob/d9000d23/core/integer.rbs
  13. Type Inference

    Mechanism to infer types of code without explicit annotations
      For performing static type checks
      To generate types for Ruby code without type definitions
    TypeProf: Ruby / RBS type inference tool
      Tracking data flow in variable assignments and method calls (Dataflow Analysis)
  14. Tricky Case - Generalization

    lib/bird.rb

        class Bird; end

        class Duck < Bird
          def cry; puts "Quack"; end
        end

        class Goose < Bird
          def cry; puts "Gabble"; end
        end

        def make_sound(bird)
          bird.cry
        end

        make_sound(Duck.new)
        make_sound(Goose.new)

    sig/bird.rbs (TypeProf)

        class Bird
        end

        class Duck < Bird
          def cry: -> nil
        end

        class Goose < Bird
          def cry: -> nil
        end

        class Object
          def make_sound: (Duck | Goose) -> nil
        end

    The argument type of make_sound is inferred as a union of subtypes.
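For contrast with the union above, here is a sketch of one mechanical way to generalize a union to a shared parent: take the nearest common superclass. This is not something the slides attribute to TypeProf or RBS Goose; common_superclass is a hypothetical helper.

```ruby
class Bird; end
class Duck < Bird; end
class Goose < Bird; end

# Hypothetical helper: the most specific class that every given class
# inherits from. `ancestors` is ordered from most to least specific, and
# Array#& keeps the order of the first array.
def common_superclass(classes)
  classes
    .map { |klass| klass.ancestors.grep(Class) }
    .reduce(:&)
    .first
end

p common_superclass([Duck, Goose])
p common_superclass([Duck, String])
```

The second call shows why this cannot be applied blindly: unrelated classes collapse to Object, so deciding when to generalize still takes judgment.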
  15. Tricky Case - Dynamic definition

    lib/dynamic.rb

        class Dynamic
          ['foo', 'bar'].each do |x|
            define_method("print_#{x}") do
              puts x
            end
          end
        end

        d = Dynamic.new
        d.print_foo #=> 'foo'
        d.print_bar #=> 'bar'

    sig/dynamic.rbs (TypeProf)

        class Dynamic
        end

    [error] undefined method: Dynamic#print_foo
    [error] undefined method: Dynamic#print_bar
  16. Outline

    Basics of Type System and Type Inference
    RBS Goose Architecture and Evaluation
    Evaluation Method in Previous Studies
    The idea of TypeEvalRb
    Conclusion
  17. RBS Goose

    Generate RBS type definitions from Ruby code using LLMs

    lib/bird.rb

        class Bird; end

        class Duck < Bird
          def cry; puts "Quack"; end
        end

        class Goose < Bird
          def cry; puts "Gabble"; end
        end

        def make_sound(bird)
          bird.cry
        end

        make_sound(Duck.new)
        make_sound(Goose.new)

    sig/bird.rbs

        class Bird
        end

        class Duck < Bird
          def cry: () -> void
        end

        class Goose < Bird
          def cry: () -> void
        end

        class Object
          def make_sound: (Bird arg) -> void
        end
  18. LLM Technique: Few-shot Prompting

    Few-shot Prompt (provide some examples)

        (Prompt)
        Answer color code.

        Q: red
        A: #FF0000

        Q: blue
        A:

        (Output)
        #0000FF

    Zero-shot Prompt (provide no examples)

        (Prompt)
        Answer color code for blue.

        (Output)
        The color code for blue depends on the system you're using:

        HEX: #0000FF
        RGB: (0, 0, 255)
        CMYK: (100%, 100%, 0%, 0%)
        HSL: (240°, 100%, 50%)
        Pantone: PMS 2935 C (approximation)

        Would you like codes for a specific shade of blue?
  19. Step.1 Generate RBS prototype

    [Pipeline diagram: Ruby, rbs prototype, RBS, examples, Prompt, LLM (e.g. ChatGPT), Refined RBS]

    sig/bird.rbs

        class Bird
        end

        class Duck < Bird
          def cry: () -> untyped
        end

        class Goose < Bird
          def cry: () -> untyped
        end

        class Object
          def make_sound: (untyped bird) -> untyped
        end
  20. Step.2 Load Few-shot Examples

    [Pipeline diagram: Ruby, rbs prototype (or other tools), RBS, examples, Prompt, LLM (e.g. ChatGPT), Refined RBS]

    lib/example1.rb

        class Example1
          attr_reader :quantity

          def initialize(quantity:)
            @quantity = quantity
          end

          def quantity=(quantity)
            @quantity = quantity
          end
        end

    sig/example1.rbs

        class Example1
          @quantity: untyped

          attr_reader quantity: untyped

          def initialize: (quantity: untyped) -> void

          def quantity=: (untyped quantity) -> void
        end

    refined/sig/example1.rbs

        class Example1
          @quantity: Integer

          attr_reader quantity: Integer

          def initialize: (quantity: Integer) -> void

          def quantity=: (Integer quantity) -> void
        end
  21. Step.3 Construct Prompt

    [Pipeline diagram: Ruby, rbs prototype, RBS, examples, Prompt, LLM (e.g. ChatGPT), Refined RBS]

    Prompt parts labeled on the slide: Examples, Ruby Code, RBS Prototype, LLM Infer

        When ruby source codes and RBS type signatures
        are given, refine each RBS type signatures.

        ======== Input ========

        ```lib/example1.rb
        ...
        ```

        ```sig/example1.rbs
        ...
        ```

        ======== Output ========

        ```sig/example1.rbs
        ...
        ```

        ======== Input ========

        ```lib/bird.rb
        ...
        ```

        ```sig/bird.rbs
        ...
        ```

        ======== Output ========
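The prompt layout above could be assembled along these lines. This is a sketch under stated assumptions, not RBS Goose's actual implementation: Sample, fenced, input_section, and build_prompt are hypothetical names, and the sample contents are shortened.

```ruby
FENCE = '`' * 3
INSTRUCTION = 'When ruby source codes and RBS type signatures are given, ' \
              'refine each RBS type signatures.'

# Hypothetical container for one Ruby file, its RBS prototype, and
# (for few-shot examples only) the hand-refined RBS.
Sample = Struct.new(:ruby_path, :ruby_code, :rbs_path, :rbs_prototype, :refined_rbs,
                    keyword_init: true)

# Wrap file contents in a fence labeled with the file path.
def fenced(path, body)
  "#{FENCE}#{path}\n#{body.chomp}\n#{FENCE}"
end

# The Input part shared by examples and the target, ending at the Output header.
def input_section(sample)
  ['======== Input ========',
   fenced(sample.ruby_path, sample.ruby_code),
   fenced(sample.rbs_path, sample.rbs_prototype),
   '======== Output ========']
end

# Examples include their refined output; the target's output is left open
# for the LLM to complete.
def build_prompt(examples, target)
  parts = [INSTRUCTION]
  examples.each do |example|
    parts.concat(input_section(example))
    parts << fenced(example.rbs_path, example.refined_rbs)
  end
  parts.concat(input_section(target))
  parts.join("\n\n")
end

example = Sample.new(
  ruby_path: 'lib/example1.rb',
  ruby_code: "class Example1\n  attr_reader :quantity\nend\n",
  rbs_path: 'sig/example1.rbs',
  rbs_prototype: "class Example1\n  attr_reader quantity: untyped\nend\n",
  refined_rbs: "class Example1\n  attr_reader quantity: Integer\nend\n"
)
target = Sample.new(
  ruby_path: 'lib/bird.rb',
  ruby_code: "def make_sound(bird)\n  bird.cry\nend\n",
  rbs_path: 'sig/bird.rbs',
  rbs_prototype: "class Object\n  def make_sound: (untyped bird) -> untyped\nend\n"
)

prompt = build_prompt([example], target)
puts prompt
```

Ending the prompt at the final Output header nudges the model to continue with a fenced RBS file in the same shape as the example.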
  22. Step.4 Parse response and output

    [Pipeline diagram: Ruby, rbs prototype, RBS, examples, Prompt, LLM (e.g. ChatGPT), Refined RBS]

        ```sig/bird.rbs
        class Bird
        end

        class Duck < Bird
          def cry: () -> void
        end

        class Goose < Bird
          def cry: () -> void
        end

        class Object
          def make_sound: (Bird arg) -> void
        end
        ```
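A response in this shape can be split back into files with a regular expression. The following is a minimal sketch, assuming every fence is labeled with a file path as in the prompt; parse_response is a hypothetical helper, not RBS Goose's parser.

```ruby
FENCE = '`' * 3

# A fence line carrying a path, the body (lazily), then a closing fence line.
BLOCK_PATTERN = /^#{FENCE}(\S+)\n(.*?)^#{FENCE}[ \t]*$/m

# Hypothetical helper: map each fenced path to the text inside its fence.
def parse_response(text)
  text.scan(BLOCK_PATTERN).to_h
end

response = [
  "#{FENCE}sig/bird.rbs",
  'class Bird',
  'end',
  '',
  'class Object',
  '  def make_sound: (Bird arg) -> void',
  'end',
  FENCE
].join("\n")

files = parse_response(response)
files.each do |path, body|
  puts "== #{path}"
  puts body
end
```

Keeping the output format identical to the few-shot examples is what makes such a simple parser workable.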
  23. Key Points

    LLMs are not inherently familiar with RBS grammar
    Pre-generate RBS prototypes
      Framing the task as a fill-in-the-blanks problem for untyped
    Use Few-shot prompting
      To format the output for easy parsing
      Illustrate RBS unique grammar (such as attr_reader)
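One way to picture the fill-in-the-blanks framing is to count how many untyped placeholders remain before and after refinement. This is an illustrative sketch using the Example1 signatures from the few-shot example; untyped_count is a hypothetical helper and is not described in the talk as part of RBS Goose.

```ruby
# Hypothetical helper: number of `untyped` placeholders in an RBS source.
def untyped_count(rbs_source)
  rbs_source.scan(/\buntyped\b/).size
end

prototype = <<~RBS
  class Example1
    @quantity: untyped

    attr_reader quantity: untyped

    def initialize: (quantity: untyped) -> void

    def quantity=: (untyped quantity) -> void
  end
RBS

refined = <<~RBS
  class Example1
    @quantity: Integer

    attr_reader quantity: Integer

    def initialize: (quantity: Integer) -> void

    def quantity=: (Integer quantity) -> void
  end
RBS

puts "blanks in prototype: #{untyped_count(prototype)}"
puts "blanks after refinement: #{untyped_count(refined)}"
```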
  24. RBS Goose Results - Generalization

    lib/bird.rb

        class Bird; end

        class Duck < Bird
          def cry; puts "Quack"; end
        end

        class Goose < Bird
          def cry; puts "Gabble"; end
        end

        def make_sound(bird)
          bird.cry
        end

        # The following is not
        # provided to RBS Goose
        # make_sound(Duck.new)
        # make_sound(Goose.new)

    sig/bird.rbs

        class Bird
        end

        class Duck < Bird
          def cry: () -> void
        end

        class Goose < Bird
          def cry: () -> void
        end

        class Object
          def make_sound: (Bird arg) -> void
        end

    The argument of make_sound is inferred to be Bird.
  25. RBS Goose Results - Dynamic definition

    lib/dynamic.rb

        class Dynamic
          ['foo', 'bar'].each do |x|
            define_method("print_#{x}") do
              puts x
            end
          end
        end

        # The following is not
        # provided to RBS Goose
        # d = Dynamic.new
        # d.print_foo #=> 'foo'
        # d.print_bar #=> 'bar'

    sig/dynamic.rbs

        class Dynamic
          def print_foo: () -> void
          def print_bar: () -> void
        end

    Correctly infer dynamic method definitions
  26. RBS Goose Results - Proc Arguments

    lib/call.rb

        def call(f)
          f.call()
        end

        f = -> { 'hello' }
        p call(f)

    Inferred signatures

        # Wrong Syntax
        def call: (() -> String f) -> String

        # Correct Syntax
        def call: (^-> String f) -> String

    [Slide labels: TypeProf, Correct Syntax, Wrong Syntax]
  27. Manual evaluation has limitations

    ProcType, OptionalType, RecordType, TupleType, AttributeDefinition, Generics,
    Mixin, Member Visibility, Ruby on Rails, ActiveSupport, ActiveModel,
    Refinement, Quine, method_missing, delegate
  28. We need better methods to evaluate type inference.

    Let's look at how previous studies have evaluated this.
  29. Outline

    Basics of Type System and Type Inference
    RBS Goose Architecture and Evaluation
    Evaluation Method in Previous Studies
    The idea of TypeEvalRb
    Conclusion
  30. Evaluation Method in Previous Studies

    This session will focus on the following two studies
      Study 1: Evaluation of SimTyper (Ruby Type Inference Tool)
      Study 2: TypeEvalPy (Python Type Inference Benchmark)
  31. Previous Study 1: SimTyper

    Ruby type inference tool
      Constraint-based inference
    Built on RDL, one of the Ruby type checkers
      Incompatible with RBS
    Kazerounian, SimTyper: sound type inference for Ruby using type equality prediction, 2021, OOPSLA 2021
    https://dl.acm.org/doi/10.1145/3485483
  32. SimTyper - Evaluation Method

    Compare expected and inferred types for each argument, return value, and variable

        expected: def foo: (Array[String], Array[Integer]) -> Array[String]
        inferred: def foo: (Array[String], Array[String]) -> void

    Match: Array[String] vs. Array[String] (first argument)
    Match up to Parameter: Array[Integer] vs. Array[String] (second argument)
    Different: Array[String] vs. void (return value)
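The three outcomes above can be expressed as a small classifier. This sketch compares type strings only and treats everything before the first "[" as the outer type, a simplification assumed here for brevity; compare_types is a hypothetical helper, not SimTyper's code.

```ruby
# Hypothetical classifier over RBS-like type strings.
#   :match                  the types are identical
#   :match_up_to_parameter  same outer type, different type arguments
#   :different              anything else
def compare_types(expected, inferred)
  return :match if expected == inferred

  outer = ->(type) { type[/\A[^\[]+/] }
  outer.call(expected) == outer.call(inferred) ? :match_up_to_parameter : :different
end

expected = { 'arg1' => 'Array[String]', 'arg2' => 'Array[Integer]', 'return' => 'Array[String]' }
inferred = { 'arg1' => 'Array[String]', 'arg2' => 'Array[String]',  'return' => 'void' }

verdicts = expected.to_h { |slot, type| [slot, compare_types(type, inferred[slot])] }
verdicts.each { |slot, verdict| puts "#{slot}: #{verdict}" }
```

A real comparator would work on parsed types rather than strings, so that unions, optionals, and aliases are handled consistently.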
  33. SimTyper - Test Data

    The following were used as type inference test data
      Four Ruby on Rails apps typed in RDL (code.org, Discourse, Journey, Talks)
      Four Ruby libraries with YARD documentation (TZInfo, MiniMagick, Ronin, Money)
  34. SimTyper - Artifacts

    The reproduction data is provided ... as a VM image 😢
    https://zenodo.org/records/5449078
  35. TypeEvalPy - Abstract

    Micro-benchmarks for type inference in Python
      Small test cases, categorized by grammatical elements, etc.
    Evaluation method is almost the same as SimTyper
      Compares each parameter, return value, and variable
      Only exact matches are counted
    Venkatesh, TypeEvalPy: A Micro-benchmarking Framework for Python Type Inference Tools, 2023, ICSE 2024
    https://arxiv.org/abs/2312.16882
  36. TypeEvalPy: TestCase

    https://github.com/secure-software-engineering/TypeEvalPy/tree/main/micro-benchmark/python_features/args/assigned_call

    main.py

        def param_func():
            return "Hello from param_func"


        def func(a):
            return a()


        b = param_func
        c = func(b)

    main_gt.json

        [{"file": "main.py",
          "line_number": 4,
          "col_offset": 5,
          "function": "param_func",
          "type": ["str"]},
         {"file": "main.py",
          "line_number": 8,
          "col_offset": 10,
          "parameter": "a",
          "function": "func",
          "type": ["callable"]},
         // ...
  37. TypeEvalPy: Results

    Category      Total facts  Scalpel
    args                   43       15
    assignments            82       23
    builtins               68        0
    classes               122       24
    decorators             58       19
    ...

    Aggregate by category, measure strengths and weaknesses.
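The per-category aggregation can be turned into match rates. This short sketch uses only the five rows shown in the table above; match_rates is a hypothetical helper.

```ruby
# Rows copied from the table above: total facts and facts matched by Scalpel.
RESULTS = {
  'args'        => { total: 43,  matched: 15 },
  'assignments' => { total: 82,  matched: 23 },
  'builtins'    => { total: 68,  matched: 0 },
  'classes'     => { total: 122, matched: 24 },
  'decorators'  => { total: 58,  matched: 19 }
}.freeze

# Hypothetical helper: percentage of matched facts per category.
def match_rates(results)
  results.transform_values { |row| (100.0 * row[:matched] / row[:total]).round(1) }
end

rates = match_rates(RESULTS)
rates.sort_by { |_category, rate| -rate }.each do |category, rate|
  puts format('%-12s %5.1f%%', category, rate)
end
```

Sorting by rate puts the strongest categories first and the weakest last.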
  38. What we can learn from Previous Studies

    Compare expected and inferred types for each argument, return value, and variable
      The number of matches can be used as a metric
    Two types of test data
      Real-world code: measures practical performance
      Micro-benchmark: clarifies the strengths and weaknesses
  39. Outline

    Basics of Type System and Type Inference
    RBS Goose Architecture and Evaluation
    Evaluation Method in Previous Studies
    The idea of TypeEvalRb
    Conclusion
  40. TypeEvalRb - Architecture

    [Architecture diagram: Test data (Ruby Code, Expected RBS Types), Inferred RBS Types, Comparator (Match / Unmatch), Aggregate, Benchmark Result]
  41. TypeEvalRb - Comparison

    Construct a Comparison Tree from two RBS::Environment instances

        # load expected/sig/bird.rbs to RBS::Environment
        > loader = RBS::EnvironmentLoader.new
        > loader.add(path: Pathname('expected/sig/bird.rbs'))
        > env = RBS::Environment.from_loader(loader).resolve_type_names
        => #<RBS::Environment @declarations=(409 items)...>

        # RBS::Environment contains ALL types, including stdlib, etc.
        > env.class_decls.count
        => 330

        # Extract Goose class
        > goose = env.class_decls[RBS::Namespace.parse('::Goose').to_type_name]
        => #<RBS::Environment::ClassEntry:0x000000011e478d70 @decls=...>
  42. TypeEvalRb - Comparison

    Goose's ClassEntry is... so deeply nested 😅

    > pp goose #<RBS::Environment::ClassEntry:0x000000011f239a40 @decls= [#<struct RBS::Environment::MultiEntry::D decl= #<RBS::AST::Declarations::Class:0x0000000128d7dd08 @annotations=[], @comment=nil, @location= #<RBS::Location:371300 buffer=/Users/kokuyou/repos/type_eval_rb/spec/fixtures/examples/bird/refined/sig/bird.rbs, start=8:0, pos=61...105, children=keyword,name,end,?type_params,?lt source="class Goose < Bird">, @members= [#<RBS::AST::Members::MethodDefinition:0x0000000128d7dd58 @annotations=[], @comment=nil, @kind=:instance, @location= #<RBS::Location:371360 buffer=/Users/kokuyou/repos/type_eval_rb/spec/fixtures/examples/bird/refined/sig/bird.rbs, start=9:2, pos=82...101, children=keyword,name,?kind,?overloading,?visibility source="def cry: () -> void">, @name=:cry, @overloading=false, @overloads= [#<RBS::AST::Members::MethodDefinition::Overload:0x000000011f23a968 @annotations=[], @method_type= #<RBS::MethodType:0x0000000128d7dda8 @block=nil, @location= #<RBS::Location:371420 buffer=/Users/kokuyou/repos/type_eval_rb/spec/fixtures/examples/bird/refined/sig/bird.rbs, start=9:11, pos=91...101, children=type,?type_params source="() -> void">, @type= #<RBS::Types::Function:0x0000000128d7ddf8 @optional_keywords={}, @optional_positionals=[], @required_keywords={}, @required_positionals=[], @rest_keywords=nil, @rest_positionals=nil, @return_type= #<RBS::Types::Bases::Void:0x0000000128892af0 @location= #<RBS::Location:371440 buffer=/Users/kokuyou/repos/type_eval_rb/spec/fixtures/examples/bird/refined/sig/bird.rbs, start=9:17, pos=97...101, children= source="void">>, @trailing_positionals=[]>, @type_params=[]>>], @visibility=nil>], @name=#<RBS::TypeName:0x000000011f23abc0 @kind=:class, @name=:Goose, @namespace=#<RBS::Namespace:0x000000011f23abe8 @absolute=true, @path=[]>>, @super_class= #<RBS::AST::Declarations::Class::Super:0x000000011f23a9e0 @args=[], @location= #<RBS::Location:371540 
buffer=/Users/kokuyou/repos/type_eval_rb/spec/fixtures/examples/bird/refined/sig/bird.rbs, start=8:14, pos=75...79, children=name,?args source="Bird">, @name= #<RBS::TypeName:0x000000011f23b160 @kind=:class, @name=:Bird, @namespace=#<RBS::Namespace:0x0000000100cdf6a8 @absolute=true, @path=[]>>>, @type_params=[]>, outer=[]>], @name=#<RBS::TypeName:0x000000011f239a68 @kind=:class, @name=:Goose, @namespace=#<RBS::Namespace:0x000000011f23abe8 @absolute=true, @path=[]>>, @primary=nil>
  43. TypeEvalRb - Comparison

    Take only defined classes and build a tree structure (many things are still missing)

        > compare_bird
        => ComparisonTree(
          class_nodes=[
            ClassNode(typename=::Bird,
              instance_variables=[ ],
              methods=[ ])
            ClassNode(typename=::Duck,
              instance_variables=[ ],
              methods=[
                MethodNode(name=cry,
                  parameters=[ ],
                  return_type=TypeNode(
                    expected="void",
                    actual="untyped") )])
            ClassNode(typename=::Goose,
              instance_variables=[ ],
              methods=[
                MethodNode(name=cry,
                  parameters=[ ],
                  return_type=TypeNode(
                    expected="void",
                    actual="untyped") )])
          ])
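A comparison tree like the one above can be scored by walking it and counting matching type nodes. This is a sketch with plain Structs, not TypeEvalRb's actual classes; the member names ivar_nodes and method_nodes are hypothetical and chosen to avoid shadowing Object#instance_variables and Object#methods.

```ruby
TypeNode = Struct.new(:expected, :actual) do
  def match?
    expected == actual
  end
end

MethodNode = Struct.new(:name, :parameters, :return_type) do
  # Every type slot of the method: its parameters plus the return type.
  def type_nodes
    parameters + [return_type]
  end
end

ClassNode = Struct.new(:typename, :ivar_nodes, :method_nodes) do
  def type_nodes
    ivar_nodes + method_nodes.flat_map(&:type_nodes)
  end
end

# Count how many type slots exist in the tree and how many match.
def score(class_nodes)
  nodes = class_nodes.flat_map(&:type_nodes)
  { total: nodes.size, matched: nodes.count(&:match?) }
end

# Hypothetical constructor for a `cry` method whose return type is compared.
cry = ->(actual) { MethodNode.new(:cry, [], TypeNode.new('void', actual)) }

tree = [
  ClassNode.new('::Bird',  [], []),
  ClassNode.new('::Duck',  [], [cry.call('untyped')]),
  ClassNode.new('::Goose', [], [cry.call('untyped')])
]

p score(tree)
```

With the values from the slide, neither return type matches, since the expected void is compared against an inferred untyped.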
  44. TypeEvalRb - Test Data

    Micro-benchmark data like TypeEvalPy
      Small test data classified by grammatical elements, etc.
      For detailed evaluation of strengths and weaknesses
    Real-world data, similar to that used to evaluate SimTyper
      Libraries and Rails applications with RBS type definitions
      For evaluation of practical performance
  45. TypeEvalRb - Microbenchmark Test Data

    Exploring the possibility of using the GitHub Copilot Workspace for data preparation.
    https://githubnext.com/projects/copilot-workspace
  46. Outline

    Basics of Type System and Type Inference
    RBS Goose Architecture and Evaluation
    Evaluation Method in Previous Studies
    The idea of TypeEvalRb
    Conclusion
  47. Conclusion

    Shared how RBS Goose works and its evaluation results
      Better results than traditional methods in some cases
    Surveyed evaluation methods in previous studies
      Count matches between expected and inferred types
      Both micro-benchmark and real-world data are useful
    Shared the idea of TypeEvalRb, a type inference benchmark
      To reveal inference performance and for future improvement