Safe development with type checking
- Invalid method calls and similar errors can be detected before execution, using steep check and the like.
Improved development experience, including completion
- Completion becomes more accurate when based on types; steep-vscode or TypeProf for IDE can be used.
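To make the first point concrete, here is a minimal hypothetical example (the `Greeter` class is illustrative, not from the talk): without a type checker, the typo below only surfaces as a `NoMethodError` at runtime, whereas `steep check` with a matching RBS file would report it before the code runs.

```ruby
# greeter.rb -- hypothetical example; with a matching RBS file,
# `steep check` would flag the typo below without running the code.
class Greeter
  def greet(name)
    "Hello, #{name}!"
  end
end

g = Greeter.new
g.greet('Ruby')       # OK
begin
  g.gret('Ruby')      # typo: without a type checker, caught only at runtime
rescue NoMethodError => e
  puts "Runtime failure: #{e.message}"
end
```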
  [:hoge, :fuga, :piyo].each do |v|
    attr_accessor v
  end
end

# Required for Type Level Exec
config = Config.configure do |c|
  c.hoge = 1
  c.fuga = 'a'
  c.piyo = :piyo
end

(config.rb)

class Config
  def self.configure: { (Config) -> :piyo }
end

# Typed, but:
# - no attribute accessors
# - (and I want the return value to be void)

(config.rbs)
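For comparison, a hand-written signature covering the dynamically defined accessors, with the block typed as void, might look like the following sketch (the `-> Config` return type is an assumption based on how `configure` is used above):

```rbs
# Hand-written target signature (sketch, not generated output)
class Config
  def self.configure: { (Config) -> void } -> Config

  attr_accessor hoge: Integer
  attr_accessor fuga: String
  attr_accessor piyo: Symbol
end
```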
(lib/person.rb)

class PersonName
  attr_reader :value
end

(lib/person_name.rb)

class Person
  # Not a String
  @name: PersonName
end

(sig/person.rbs)
Infer all Ruby files together at once
- The AI can make comprehensive decisions from seeing all the code.
- A small project fits in 128K tokens.
- This was unrealistic as of last year, when 4K tokens was the maximum.
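As a rough back-of-the-envelope check (the ~4 characters per token ratio is a common heuristic, not an exact tokenizer; the helper names are mine), you can estimate whether a project fits in a 128K-token context:

```ruby
# Rough estimate of whether a project's Ruby sources fit in one context window.
# Assumes ~4 characters per token (a common heuristic, not a real tokenizer).
CHARS_PER_TOKEN = 4.0
CONTEXT_LIMIT = 128_000

def estimated_tokens(sources)
  (sources.sum(&:length) / CHARS_PER_TOKEN).ceil
end

def fits_in_context?(sources, limit: CONTEXT_LIMIT)
  estimated_tokens(sources) <= limit
end

# A small project: fifty 2KB files is about 25K tokens, well under 128K.
sources = Array.new(50) { 'x' * 2_000 }
puts estimated_tokens(sources)   # => 25000
puts fits_in_context?(sources)   # => true
```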
api_key = ENV.fetch('OPENAI_ACCESS_TOKEN')

RbsGoose.configure do |c|
  # Use the provided configuration methods
  c.use_open_ai(api_key)

  # ...or directly configure an instance of Langchain::LLM
  c.llm.client = ::Langchain::LLM::OpenAI.new(api_key:)

  # ...or a local server such as Ollama
  c.llm.client = ::Langchain::LLM::Ollama.new(
    url: 'http://localhost:11434'
  )
end

ref: RbsGooseTest Rakefile

Setup
Testing RBS Goose
- LLM APIs are expensive and slow to respond, which makes them fatally incompatible with CI.
- Web mocks such as the VCR gem can be used instead.
- Specify exact request matching, including the request body.
- For reproducibility, set temperature to 0.
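These tips can be combined in a VCR configuration along the following lines (a sketch; the cassette directory and the webmock hook are assumptions, not taken from the talk):

```ruby
# spec/support/vcr.rb -- configuration sketch (paths are assumptions)
require 'vcr'

VCR.configure do |config|
  config.cassette_library_dir = 'spec/cassettes'
  config.hook_into :webmock
  # Match strictly, including the request body, so a changed prompt fails
  # to match instead of silently replaying a stale cassette.
  config.default_cassette_options = {
    match_requests_on: %i[method uri body]
  }
end
```

Combined with `temperature: 0` in the API request, replayed and freshly recorded responses stay comparable.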
Using VCR

# spec/rbs_goose/type_inferrer_spec.rb
RSpec.describe RbsGoose::TypeInferrer, :configure do
  it 'returns refined rbs' do
    VCR.use_cassette('openai/infer') do
      expect(described_class.new.infer).to eq(refined_rbs_list)
    end
  end
end

ref: spec/rbs_goose/type_inferrer_spec.rb
Case study: kokuyouwind/rbs_goose_test case1
- Let RBS Goose infer a small example involving metaprogramming.
- The base RBS was generated with each of the three methods described earlier.
- Tried the OpenAI and Anthropic models, plus CodeGemma (a local LLM).
- In addition to running steep check, quality was checked by reading the generated RBS, e.g. whether any untyped remained that could be made more specific.
Platform        Model            Size    prototype rb base  prototype runtime base  Typeprof base
OpenAI          GPT-3.5 Turbo    Small   Perfect            Perfect                 Almost
OpenAI          GPT-4 Turbo      Large   Perfect            Perfect                 Perfect
OpenAI          GPT-4 Omni       Large   Perfect            Perfect                 Perfect
Anthropic       Claude 3 Haiku   Small   Almost             Almost                  Almost
Anthropic       Claude 3 Sonnet  Medium  Not Good           Perfect                 Almost
Anthropic       Claude 3 Opus    Large   Almost             Almost                  Not Good
Ollama (local)  CodeGemma        Small   Not Good           Not Good                Not Good
rbs prototype rb + Anthropic Claude 3 Sonnet / rbs prototype rb + Anthropic Claude 3 Opus
- There were cases where these fabricated the return type of LangChain::LLM::OpenAI#chat.
Platform        Model            Size    prototype rb base  prototype runtime base  Typeprof base
OpenAI          GPT-3.5 Turbo    Small    2.2                 2.2                    4.8
OpenAI          GPT-4 Turbo      Large    7.6                11.4                    7.2
OpenAI          GPT-4 Omni       Large    1.8                 1.7                    1.9
Anthropic       Claude 3 Haiku   Small    3.3                 3.3                    2.9
Anthropic       Claude 3 Sonnet  Medium   3.5                 8.6                    3.0
Anthropic       Claude 3 Opus    Large   14.6                13.0                   13.1
Ollama (local)  CodeGemma        Small    7.1                 7.4                    4.1
Result 1: Execution time (Perfect or Almost results only)

Platform   Model            Size    prototype rb base  prototype runtime base  Typeprof base
OpenAI     GPT-3.5 Turbo    Small   (2.2)              (2.2)                   (4.8)
OpenAI     GPT-4 Turbo      Large   (7.6)              (11.4)                  (7.2)
OpenAI     GPT-4 Omni       Large   (1.8)              (1.7)                   (1.9)
Anthropic  Claude 3 Haiku   Small   (3.3)              (3.3)                   (2.9)
Anthropic  Claude 3 Sonnet  Medium  (3.5)              -                       (3.0)
Anthropic  Claude 3 Opus    Large   (14.6)             (7.4)                   (4.1)
- The choice of base RBS generation method made little difference.
- Restricting to rbs prototype rb looks reasonable, since it is easy and fast to run.
- Among the models, the GPT family clearly performed better.
- GPT-4 Omni was the fastest, yet produced ideal output.
- The rbs prototype rb + GPT-4 Omni combination looks good.
Result 2: Generated RBS quality

Platform        Model            Size    Time   Cost    Quality
OpenAI          GPT-3.5 Turbo    Small    4.3    0.44   Poor
OpenAI          GPT-4 Turbo      Large   69.2   12.6    Almost
OpenAI          GPT-4 Omni       Large   52.5    7.86   Almost
Anthropic       Claude 3 Haiku   Small   33.4    0.65   Poor
Anthropic       Claude 3 Sonnet  Medium  55.5    7.88   Almost
Anthropic       Claude 3 Opus    Large   90.7   35.72   Almost
Ollama (local)  codegemma        Small   95.9    N/A    Subtle
- Special cases of RBS such as Struct are not handled well.
  - It may be necessary to include such cases in the examples, or to do fine-tuning.
- The 1:1 assumption between ruby and rbs files was not a good one.
  - rbs_rails, typeprof, and the like generate RBS at the top level, so no correspondence can be made.
- I still want automatic fixing of type errors.
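A concrete instance of the Struct problem (a hypothetical example, not from the talk): `Struct.new` defines the class body and its accessors at runtime, so per-file signature generation has nothing static to read.

```ruby
# point.rb -- Struct builds the class dynamically at runtime, which is
# why naive per-file RBS generation struggles with cases like this.
Point = Struct.new(:x, :y) do
  def distance_from_origin
    Math.sqrt(x**2 + y**2)
  end
end

p = Point.new(3, 4)
puts p.x                      # => 3
puts p.distance_from_origin   # => 5.0

# The RBS one would want here (sketch):
#   class Point < Struct[Integer]
#     attr_accessor x: Integer
#     attr_accessor y: Integer
#     def distance_from_origin: () -> Float
#   end
```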
- Presented the example of building RBS Goose.
- Explained how the prompt is composed and the intent behind it.
- Presented some tips for developing with LLMs.
- RBS Goose is still experimental.
- I hope this conveys that interesting things can be done with LLMs.
- I tried GitHub Copilot, and it already completes quite well.
- Strategies that edit an entire project with AI, such as Open Interpreter or Copilot Workspace, should also become easier to apply.
I don't yet know whether RBS Goose will turn out to be a dead duck (end in failure) or a goose that lays golden eggs, so I'll keep at it a little longer before I cook my own goose (throw away my own chance of success).