Learning structured outputs through citation fuzzy match

Slide 1

Slide 1 text

Learning structured outputs through citation fuzzy match
Atsushi Shimizu (清水厚志)

Slide 2

Slide 2 text

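Self-introduction

Atsushi Shimizu (清水厚志), @shimizuxa
HBA Corporation (株式会社HBA), ICT Solutions Division, Technical Expert
Joined in 2005 (20th year) / 45 years old, father of three

Interests
• Search technology and AI

Community involvement
• Elasticsearch勉強会 (since 2023/04)
• JAWS-UG (since 2023/09)
• ChatGPT Community(JP), and others

This presentation reflects my personal views and does not represent my employer.

As of this month, I have joined a project that uses LangChain.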

Slide 3

Slide 3 text

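Overview of citation fuzzy match

Overview
• A mechanism built on OpenAI function calling. Given a question and a reference document, it splits the answer obtained from the LLM into parts and structures each part together with its source quote from the reference document.

https://python.langchain.com/api_reference/langchain/chains/langchain.chains.openai_functions.citation_fuzzy_match.create_citation_fuzzy_match_runnable.html

(Figure: example of an execution result)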

Slide 4

Slide 4 text

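Overview of citation fuzzy match: how the chain is implemented

https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/chains/openai_functions/citation_fuzzy_match.py

Callouts on the source code:
• Chain definition: the output is structured with structured outputs
• The prompt that makes the citation behavior work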

Slide 5

Slide 5 text

Overview of citation fuzzy match: structure of the execution result

https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/chains/openai_functions/citation_fuzzy_match.py

QuestionAnswer
- question: the original question
- answer: a list of FactWithEvidence

FactWithEvidence
- fact: one part of the answer
- substring_quote: a list of source quotes

The result comes back as Python instances.
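The result structure above can be mirrored with plain dataclasses. This is only a sketch to show the nesting: the field names come from the slide, while `from_args` and the sample data are hypothetical, and the library's own classes are, as far as I know, Pydantic models rather than dataclasses.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class FactWithEvidence:
    fact: str                   # one part of the answer
    substring_quote: List[str]  # quotes from the reference document that support it


@dataclass
class QuestionAnswer:
    question: str                   # the original question
    answer: List[FactWithEvidence]  # the answer, split into facts

    @classmethod
    def from_args(cls, args: Dict[str, Any]) -> "QuestionAnswer":
        """Hypothetical helper: build the nested objects from a plain dict."""
        facts = [FactWithEvidence(**item) for item in args["answer"]]
        return cls(question=args["question"], answer=facts)


# Hypothetical sample data, not taken from a real run.
example = QuestionAnswer.from_args({
    "question": "Who wrote the report?",
    "answer": [
        {"fact": "Alice wrote the report.", "substring_quote": ["written by Alice"]},
    ],
})
```

Each fact carries its own quotes, so the caller can show a citation next to every sentence of the answer instead of one list at the end.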

Slide 6

Slide 6 text

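Overview of structured outputs

How structured output works (the approach implemented in citation fuzzy match)

https://docs.llamaindex.ai/en/stable/module_guides/querying/structured_outputs/

Callouts on the diagram:
• Input: the question and the reference document
• Through the function calling mechanism, the model reads the type definition of the requested response
• Output: a JSON response that follows the requested format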
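The flow above can be sketched without any LLM library: a type definition is handed to the model as a function (tool) description, and the JSON that comes back is parsed against it. The schema below is hand-written from the QuestionAnswer fields on the previous slide; `QUESTION_ANSWER_TOOL`, its description text, and `parse_tool_arguments` are hypothetical names for illustration, and the check is deliberately minimal (top-level required keys only).

```python
import json

# Hand-written JSON Schema for the requested response shape (an assumption
# for illustration; a library would normally generate this from the class).
QUESTION_ANSWER_TOOL = {
    "name": "QuestionAnswer",
    "description": "Answer the question as a list of facts with supporting quotes.",
    "parameters": {
        "type": "object",
        "properties": {
            "question": {"type": "string"},
            "answer": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "fact": {"type": "string"},
                        "substring_quote": {"type": "array", "items": {"type": "string"}},
                    },
                    "required": ["fact", "substring_quote"],
                },
            },
        },
        "required": ["question", "answer"],
    },
}


def parse_tool_arguments(raw_json: str, tool: dict) -> dict:
    """Parse the model's JSON arguments and check the top-level required keys."""
    args = json.loads(raw_json)
    missing = [key for key in tool["parameters"]["required"] if key not in args]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return args
```

Because the model is asked to fill in function arguments rather than write free text, the caller gets JSON it can parse directly instead of scraping an answer out of prose.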

Slide 7

Slide 7 text

Models that can use citation fuzzy match

I checked on Amazon Bedrock whether it works with models other than OpenAI's. The models were picked from those that support tool use (= function calling).

Model             Model ID                                     Result
Claude3.5 Sonnet  anthropic.claude-3-5-sonnet-20241022-v2:0    ○
Claude3.5 Haiku   anthropic.claude-3-5-haiku-20241022-v1:0     ○
Llama3.1 8B       meta.llama3-1-8b-instruct-v1:0               ×
Llama3.1 70B      meta.llama3-1-70b-instruct-v1:0              ×
Llama3.1 405B     meta.llama3-1-405b-instruct-v1:0             ×
Command R         cohere.command-r-v1:0                        ×
Command R+        cohere.command-r-plus-v1:0                   ×
Mistral Large     mistral.mistral-large-2402-v1:0              ×
Mistral Small     mistral.mistral-small-2402-v1:0              ×

(○ = worked, × = failed)

ChatBedrock.with_structured_output raises an error when the model name does not contain "claude-3".
The Claude models worked correctly!
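The pattern in the table matches the rule noted on the slide: only model IDs containing "claude-3" got through. The sketch below restates that rule as a standalone check. It is not the langchain-aws source, the function names are hypothetical, and the rule reflects what was observed at the time of this talk, so it may not hold for other library versions.

```python
from typing import Dict, Iterable


def assert_structured_output_supported(model_id: str) -> None:
    """Hypothetical guard mirroring the observed rule: reject non-Claude-3 IDs."""
    if "claude-3" not in model_id:
        raise ValueError(f"structured output is not supported for {model_id}")


def check_models(model_ids: Iterable[str]) -> Dict[str, bool]:
    """Return True for each model ID that passes the guard, False otherwise."""
    results = {}
    for model_id in model_ids:
        try:
            assert_structured_output_supported(model_id)
            results[model_id] = True
        except ValueError:
            results[model_id] = False
    return results


# Model IDs taken from the table above.
results = check_models([
    "anthropic.claude-3-5-sonnet-20241022-v2:0",
    "anthropic.claude-3-5-haiku-20241022-v1:0",
    "meta.llama3-1-8b-instruct-v1:0",
    "cohere.command-r-v1:0",
    "mistral.mistral-large-2402-v1:0",
])
```

A check on the name alone says nothing about whether the model could actually handle tool use, which is why the other tool-use-capable models in the table still fail.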

Slide 8

Slide 8 text

Test Run with Amazon Bedrock

Execution result (Claude3.5 Sonnet)

[llm/end] [chain:RunnableSequence > llm:ChatBedrock] [3.60s] Exiting LLM run with output:
{
  "generations": [
    [
      {
        "text": "",
        "generation_info": null,
        "type": "ChatGeneration",
        "message": {
          "lc": 1,
          "type": "constructor",
          "id": ["langchain", "schema", "messages", "AIMessage"],
          "kwargs": {
            "content": "",
            "additional_kwargs": {
              "usage": {
                "prompt_tokens": 790,
                "completion_tokens": 146,
                "total_tokens": 936
              },
              "stop_reason": "tool_use",
              "model_id": "anthropic.claude-3-5-sonnet-20241022-v2:0"
            },
            "response_metadata": {
              "usage": {
                "prompt_tokens": 790,
                "completion_tokens": 146,
                "total_tokens": 936
              },
              "stop_reason": "tool_use",
              "model_id": "anthropic.claude-3-5-sonnet-20241022-v2:0"
            },
            "type": "ai",
            "id": "run-af4752c3-92f8-44e7-a0b3-448d21bb139a-0",
            "tool_calls": [
              {
                "name": "QuestionAnswer",
                "args": {
                  "question": "今の日本の総理大臣は、何代目？",
                  "answer": [
                    {
                      "fact": "現在の日本の総理大臣は第101代の岸田文雄である。",
                      "substring_quote": [
                        "第101代岸田文雄（在任: 2021年〈令和3年〉11月10日 - ）"
                      ]
                    }
                  ]
                },
                "id": "toolu_bdrk_01NRBB4L9nYCXkrkfJiCGGE6",
                "type": "tool_call"
              }
            ],
            "usage_metadata": {
              "input_tokens": 790,
              "output_tokens": 146,
              "total_tokens": 936
            },
            "invalid_tool_calls": []
          }
        }
      }
    ]
  ],
  "llm_output": {
    "usage": {
      "prompt_tokens": 790,
      "completion_tokens": 146,
      "total_tokens": 936
    },
    "stop_reason": "tool_use",
    "model_id": "anthropic.claude-3-5-sonnet-20241022-v2:0"
  },
  "run": null,
  "type": "LLMResult"
}

Callouts on the log:
• The structured result looks good: the "args" of the tool call hold the question ("What number prime minister is Japan's current prime minister?"), the fact ("The current prime minister of Japan is Fumio Kishida, the 101st."), and the supporting quote from the reference document.
• The model decided to call a tool ("stop_reason": "tool_use").
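To read a log like this programmatically, the structured result sits under the message's "tool_calls" entry. A minimal sketch, assuming the message `kwargs` have already been loaded into a dict. `extract_tool_args` is a hypothetical helper, and `logged_kwargs` is a trimmed copy of the log above rather than a fresh run.

```python
from typing import Any, Dict, Optional


def extract_tool_args(message_kwargs: Dict[str, Any], tool_name: str) -> Optional[Dict[str, Any]]:
    """Return the arguments of the first tool call with the given name, if any."""
    for call in message_kwargs.get("tool_calls", []):
        if call.get("name") == tool_name:
            return call.get("args")
    return None


# Trimmed copy of the "kwargs" section of the log above.
logged_kwargs = {
    "content": "",
    "response_metadata": {"stop_reason": "tool_use"},
    "tool_calls": [
        {
            "name": "QuestionAnswer",
            "args": {
                "question": "今の日本の総理大臣は、何代目？",
                "answer": [
                    {
                        "fact": "現在の日本の総理大臣は第101代の岸田文雄である。",
                        "substring_quote": [
                            "第101代岸田文雄（在任: 2021年〈令和3年〉11月10日 - ）"
                        ],
                    }
                ],
            },
            "type": "tool_call",
        }
    ],
}

args = extract_tool_args(logged_kwargs, "QuestionAnswer")
used_tool = logged_kwargs["response_metadata"]["stop_reason"] == "tool_use"
```

Note that "content" is empty in the log: the whole answer travels in the tool call arguments, so code that only reads the message text would see nothing.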

Slide 9

Slide 9 text

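Summary

・citation fuzzy match is a handy chain that makes it easy to get answers with source citations.
・The structured output mechanism used in its implementation lets you receive the LLM's output as a structured Python object.
・It was built for OpenAI, but it can now also be used with some models from other vendors, such as the Claude 3 family.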
