Agent Ops through Bedrock Agents Response Analysis

©Mitsubishi Electric Corporation
Agent Ops through Bedrock Agents Response Analysis
AI Strategy Project Group, Masaki Tsukada
2025/2/14

Self-introduction
@m_tsukada
• 2024 Japan AWS All Certifications Engineers
• Name: Masaki Tsukada (塚田真規)
• Affiliation: Mitsubishi Electric Corporation, AI Strategy Project Group (Minato Mirai, Yokohama)
※ Answered the Bedrock 〇 quiz correctly!

What is an AI agent?
• An AI system that accomplishes a task by repeatedly planning and acting
  1. Plan what to do (Reasoning)
  2. Act on the plan (Acting)
• Amazon Bedrock Agents
  • Provides AI agent functionality as a fully managed service
  • Operations the agent can choose from
    1. Information retrieval using Bedrock Knowledge bases
    2. Action groups (Lambda functions)
    3. Follow-up questions to the user
    4. Invoking specialist agents (※ Multi-agent collaboration feature)
[Diagram: AWS Cloud containing Lambda, Bedrock Agents, OpenSearch Service, Bedrock Knowledge bases, Lambda, DynamoDB, Web search]
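The plan-then-act cycle described above can be sketched as a small loop. This is a toy illustration only, not how Bedrock Agents is implemented: `run_agent`, `toy_plan`, and the two tools are hypothetical names, and the store ID and sales figure are placeholder values.

```python
from typing import Callable, Dict, List, Optional, Tuple

# One planned step: (tool name, tool argument), or None when the planner is done.
Plan = Optional[Tuple[str, str]]


def run_agent(task: str,
              plan: Callable[[str, List[str]], Plan],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 5) -> List[str]:
    """Alternate Reasoning (plan) and Acting (tool call) until the planner stops."""
    observations: List[str] = []
    for _ in range(max_steps):
        step = plan(task, observations)   # Reasoning: decide the next action
        if step is None:                  # the planner judges the task complete
            break
        tool_name, argument = step
        observations.append(tools[tool_name](argument))  # Acting: run the tool
    return observations


# Hypothetical planner: look up the store first, then fetch its sales, then stop.
def toy_plan(task: str, observations: List[str]) -> Plan:
    if not observations:
        return ("search_store", "Sunset Books")
    if len(observations) == 1:
        return ("get_sales", observations[0])
    return None


toy_tools = {
    "search_store": lambda name: "2",          # placeholder store ID
    "get_sales": lambda store_id: "7800 USD",  # placeholder sales figure
}

result = run_agent("Sunset Books sales at 2024/11", toy_plan, toy_tools)
print(result)
```

In a real agent the planner is an LLM call and the tools are knowledge base queries or Lambda functions, so which steps ran is not visible from the final answer alone.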

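Making Bedrock Agents behavior visible
[Diagram: the same agent setup (AWS Cloud: Lambda, Bedrock Agents, OpenSearch Service, Bedrock Knowledge bases, Lambda, DynamoDB, Web search) can answer along different paths]
• Path via Bedrock Knowledge bases, OpenSearch Service, Lambda, and DynamoDB: checks the related documents and the database, then answers
• Path via Lambda and Web search: returns the web search result as-is
• User: "Tell me how to catch fish!" Agent: "Spear the fish with a harpoon or a spear."
• The user cannot see how Bedrock Agents acted
• Visualize and evaluate Bedrock Agents behavior, and take on Ops for agents!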

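Verifying Bedrock Agents behavior
Analyze the Bedrock Agents response, then:
• Trace and visualize the Bedrock Agent's behavior with an LLM application development support tool
  • Web dashboard
  • Trace management
  • Prompt management
  • Dataset management
  • LLM as a Judge evaluation (paid version), etc.
• Evaluate the Bedrock Agent's results with an LLM application evaluation support tool
  • RAG
  • Agents/Tool use cases
  • Natural Language Comparison
  • SQL, etc.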

Agent evaluation with Ragas
Ragas provides three metrics for agent evaluation:
• Topic Adherence (out of scope this time)
• Tool Call Accuracy
  • Scores the accuracy of the tools the agent called, in the range 0 to 1
  • Checks whether the required tools were called in the correct order
  • Tool names and arguments are compared as strings, using either exact match or Semantic Similarity (Semantic Similarity is used here)
• Agent Goal Accuracy
  • Compares the expected result with the agent's output and scores it as a binary value, 0 or 1
Example for Tool Call Accuracy:
  Expected tool calls
    get_sales {"id": "1"}
    update_sales {"id": "1", "sales": "12,345"}
  Actual tool calls
    get_sales {"id": "1"}
    update_sales {"id": "1", "sales": "12,345"}
    check_sales {"id": "1", "sales": "10,000"}
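To make the "required tools, in the correct order" check concrete, here is a simplified sketch. It is not the Ragas implementation: it compares arguments by exact match only (the deck uses Semantic Similarity), it tolerates extra actual calls as long as the expected ones appear in order, and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

ToolCall = Tuple[str, Dict[str, str]]  # (tool name, arguments)


def in_order(expected: List[ToolCall], actual: List[ToolCall]) -> bool:
    """True if the expected tool names appear in `actual` in the same order."""
    names = iter(name for name, _ in actual)
    # `x in iterator` consumes the iterator, which enforces the ordering.
    return all(exp_name in names for exp_name, _ in expected)


def arg_score(expected_args: Dict[str, str], actual_args: Dict[str, str]) -> float:
    """Fraction of expected arguments reproduced exactly (assumption: exact match)."""
    if not expected_args:
        return 1.0
    hits = sum(1 for key, value in expected_args.items() if actual_args.get(key) == value)
    return hits / len(expected_args)


def tool_call_accuracy(expected: List[ToolCall], actual: List[ToolCall]) -> float:
    """Score in [0, 1]; 0.0 when an expected tool is missing or out of order."""
    if not expected or not in_order(expected, actual):
        return 0.0
    total, start = 0.0, 0
    for exp_name, exp_args in expected:
        # in_order() guarantees a matching call exists at or after `start`.
        idx = next(i for i in range(start, len(actual)) if actual[i][0] == exp_name)
        total += arg_score(exp_args, actual[idx][1])
        start = idx + 1
    return total / len(expected)


expected_calls = [
    ("get_sales", {"id": "1"}),
    ("update_sales", {"id": "1", "sales": "12,345"}),
]
actual_calls = expected_calls + [("check_sales", {"id": "1", "sales": "10,000"})]

score_same_order = tool_call_accuracy(expected_calls, actual_calls)
score_reversed = tool_call_accuracy(expected_calls, list(reversed(actual_calls)))
print(score_same_order, score_reversed)
```

Swapping the exact-match comparison in `arg_score` for a similarity measure is what lets a nearly correct argument earn partial credit instead of zero.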

Bedrock Agents response analysis ~1~
• Check how Bedrock Agents behaved from the InvokeAgent() response
Reference: https://docs.aws.amazon.com/ja_jp/bedrock/latest/userguide/trace-events.html

InvokeAgent response body:
  {
    "accessDeniedException": { },
    "badGatewayException": { },
    "chunk": { … },
    ……..
    "trace": { <TracePart object> },
    …
  }

TracePart object:
  {
    "agentId": "string",
    "agentName": "string",
    "collaboratorName": "string",
    "agentAliasId": "string",
    "sessionId": "string",
    "agentVersion": "string",
    "trace": { <Trace object> },
    "callerChain": [{ "agentAliasArn": "agent alias arn" }]
  }

OrchestrationTrace of the Trace object:
  {
    "modelInvocationInput": { … },
    "modelInvocationOutput": {
      "metadata": {
        "usage": { "inputToken": int, "outputToken": int }
      },
      "rawResponse": { "content": "…." }
    },
    "rationale": { ... },
    "invocationInput": { ... },
    "observation": { ... }
  }
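As a minimal sketch of reading this response, the function below separates answer text from trace data. It assumes the stream yields dictionaries keyed by `chunk` (with a `bytes` payload) or `trace`, as boto3 does for `invoke_agent`; `split_completion` and the sample events are illustrative names and values, not taken from the deck.

```python
from typing import Any, Dict, Iterable, List, Tuple


def split_completion(events: Iterable[Dict[str, Any]]) -> Tuple[str, List[Dict[str, Any]]]:
    """Separate answer text (chunk events) from TracePart objects (trace events)."""
    answer_parts: List[str] = []
    trace_parts: List[Dict[str, Any]] = []
    for event in events:
        if "chunk" in event:
            answer_parts.append(event["chunk"]["bytes"].decode("utf-8"))
        elif "trace" in event:
            trace_parts.append(event["trace"])
    return "".join(answer_parts), trace_parts


# With boto3 the events would come from the real stream (traces need enableTrace=True):
#   client = boto3.client("bedrock-agent-runtime")
#   response = client.invoke_agent(agentId=..., agentAliasId=..., sessionId=...,
#                                  inputText=..., enableTrace=True)
#   answer, traces = split_completion(response["completion"])

# Stand-in events so the sketch runs without AWS access (values are made up).
sample_events = [
    {"trace": {"agentId": "AGENT", "sessionId": "s-1",
               "trace": {"orchestrationTrace": {"rationale": {"text": "look up the store"}}}}},
    {"chunk": {"bytes": b"The sales were $7800."}},
]

answer, traces = split_completion(sample_events)
print(answer, len(traces))
```

Collecting the TracePart objects in arrival order preserves the sequence of steps the agent took, which is what the later analysis relies on.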

Bedrock Agents response analysis ~2~
• Check the Bedrock Agent's Tool Use

OrchestrationTrace of the Trace object:
  { "modelInvocationInput": { … }, "modelInvocationOutput": { "metadata": { "usage": { "inputToken": int, "outputToken": int } }, "rawResponse": { "content": "…." } }, "rationale": { ... }, "invocationInput": { ... }, "observation": { ... } }

Tool Use information:
  {
    "message": {
      "role": "assistant",
      "content": [
        { "text": "….." },
        {
          "text": null,
          "toolUse": {
            "name": "GET__x_amz_knowledgebase_HKT0EAPIPL__Search",
            "input": { "searchQuery": "Sunset Books" },
            "toolUseId": "tooluse_lImR0tzVRZ62YdMEoREiQA"
          },
          …
        }
      ]
    }
  }
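A sketch of pulling the Tool Use entries out of one OrchestrationTrace follows. It assumes the Tool Use JSON shown on the slide is what `modelInvocationOutput.rawResponse.content` holds, serialized as a JSON string; that location, the helper name `extract_tool_uses`, and the sample text block are assumptions.

```python
import json
from typing import Any, Dict, List


def extract_tool_uses(orchestration_trace: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the toolUse blocks found in the raw model response, in order."""
    raw = (orchestration_trace.get("modelInvocationOutput", {})
           .get("rawResponse", {})
           .get("content"))
    if not raw:
        return []
    # Assumption: content is a JSON string shaped like {"message": {"content": [...]}}.
    payload = json.loads(raw) if isinstance(raw, str) else raw
    blocks = payload.get("message", {}).get("content", [])
    return [block["toolUse"] for block in blocks if block.get("toolUse")]


sample_trace = {
    "modelInvocationOutput": {
        "rawResponse": {
            "content": json.dumps({
                "message": {
                    "role": "assistant",
                    "content": [
                        {"text": "Looking up the store first."},  # made-up text
                        {"text": None, "toolUse": {
                            "name": "GET__x_amz_knowledgebase_HKT0EAPIPL__Search",
                            "input": {"searchQuery": "Sunset Books"},
                            "toolUseId": "tooluse_lImR0tzVRZ62YdMEoREiQA",
                        }},
                    ],
                }
            })
        }
    }
}

tool_uses = extract_tool_uses(sample_trace)
print(tool_uses[0]["name"], tool_uses[0]["input"])
```

Traces without a model output, such as those carrying only an observation, simply yield an empty list.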

Bedrock Agents response analysis ~3~
• Check the Bedrock Agent's final output

OrchestrationTrace of the Trace object:
  { "modelInvocationInput": { … }, "modelInvocationOutput": { "metadata": { "usage": { "inputToken": int, "outputToken": int } }, "rawResponse": { "content": "…." } }, "rationale": { ... }, "invocationInput": { ... }, "observation": { ... } }

The agent's final response:
  {
    "type": "FINISH",
    "traceId": "46c795d7-6e43-4b38-8b41-2c21e3f72c03-2",
    "finalResponse": {
      "text": "The sales of Sunset Books in November 2024 were $7800."
    }
  }
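Reading the final answer from the trace can be sketched as below, assuming the FINISH object shown above sits in the `observation` field of the OrchestrationTrace. `extract_final_response` is a hypothetical helper name.

```python
from typing import Any, Dict, Optional


def extract_final_response(orchestration_trace: Dict[str, Any]) -> Optional[str]:
    """Return the agent's final answer text, or None if this trace is not the FINISH step."""
    observation = orchestration_trace.get("observation", {})
    if observation.get("type") != "FINISH":
        return None
    return observation.get("finalResponse", {}).get("text")


finish_trace = {
    "observation": {
        "type": "FINISH",
        "traceId": "46c795d7-6e43-4b38-8b41-2c21e3f72c03-2",
        "finalResponse": {
            "text": "The sales of Sunset Books in November 2024 were $7800."
        },
    }
}

final_text = extract_final_response(finish_trace)
print(final_text)
```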

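Bedrock Agents configuration and procedure
[Diagram: AWS Cloud containing Lambda, Bedrock Agents, OpenSearch Service, Bedrock Knowledge bases, Lambda, Web search]
• get_sales: gets sales from a store ID and a year/month
• get_weather: gets the weather from the Web
• store-info: holds store ID, name, location, and owner information
Procedure:
  ① Run the agent (input: "Could you tell me the Sunset Books sales at 2024/11?")
  ② Collect the trace data
  ③ Convert it into a message list: HumanMessage, AIMessage, ToolMessage, AIMessage, …
  ④ Evaluate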
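Step ③, turning collected trace data into a message list, could look like the sketch below. The dataclasses are stand-ins named after the message types on the slide, not the evaluation library's own classes, and `build_messages` and the tool result strings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple, Union


@dataclass
class ToolCall:
    name: str
    args: Dict[str, Any]


@dataclass
class HumanMessage:
    content: str


@dataclass
class AIMessage:
    content: str
    tool_calls: List[ToolCall] = field(default_factory=list)


@dataclass
class ToolMessage:
    content: str


Message = Union[HumanMessage, AIMessage, ToolMessage]


def build_messages(user_input: str,
                   steps: List[Tuple[Dict[str, Any], str]],
                   final_answer: str) -> List[Message]:
    """steps: (toolUse dict, tool result text) pairs in the order the agent ran them."""
    messages: List[Message] = [HumanMessage(user_input)]
    for tool_use, result in steps:
        # Each tool call becomes an AIMessage, followed by the tool's reply.
        messages.append(AIMessage("", [ToolCall(tool_use["name"], tool_use["input"])]))
        messages.append(ToolMessage(result))
    messages.append(AIMessage(final_answer))
    return messages


steps = [
    ({"name": "GET__x_amz_knowledgebase_HKT0EAPIPL__Search",
      "input": {"searchQuery": "Sunset Books"}}, "store id 2"),   # made-up tool result
    ({"name": "get_sales__get_store_sales",
      "input": {"id": "2", "year": "2024", "month": "11"}}, "7800"),  # made-up tool result
]
messages = build_messages(
    "Could you tell me the Sunset Books sales at 2024/11?",
    steps,
    "The sales for Sunset Books in November 2024 were $7800.",
)
print([type(m).__name__ for m in messages])
```

Keeping the tool calls on the AIMessage entries is what lets a tool-call metric read the call sequence straight from the conversation.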

Evaluation case 1: Correct result
Execution result
• Input to the agent: Could you tell me the Sunset Books sales at 2024/11?
• Agent response: The sales for Sunset Books in November 2024 were $7800.
• toolUse:
  GET__x_amz_knowledgebase_HKT0EAPIPL__Search {"searchQuery": "Sunset Books"}
  get_sales__get_store_sales {"id":"2","year":"2024","month":"11"}
Expected result
• Expected answer: 7800 USD
• Expected toolUse:
  GET__x_amz_knowledgebase_HKT0EAPIPL__Search {"searchQuery": "Sunset Books"}
  get_sales__get_store_sales {"id":"2","year":"2024","month":"11"}
Evaluation: case 1
• Agent Goal Accuracy: 1.0
• Tool Call Accuracy: 1.0000000000000002
Only the expected result is changed between cases, to observe how the scores change.

Evaluation case 2: Error in the Tool Use arguments
Execution result
• Input to the agent: Could you tell me the Sunset Books sales at 2024/11?
• Agent response: The sales for Sunset Books in November 2024 were $7800.
• toolUse:
  GET__x_amz_knowledgebase_HKT0EAPIPL__Search {"searchQuery": "Sunset Books"}
  get_sales__get_store_sales {"id":"2","year":"2024","month":"11"}
Expected result (※ a Tool Use argument changed to an incorrect value)
• Expected answer: 7800 USD
• Expected toolUse:
  GET__x_amz_knowledgebase_HKT0EAPIPL__Search {"searchQuery": "Sunset Books"}
  get_sales__get_store_sales {"id":"3","year":"2024","month":"11"}
Evaluation: case 2
• Agent Goal Accuracy: 1.0
• Tool Call Accuracy: 0.9219615108186483

Evaluation case 3: A different tool is used
Execution result
• Input to the agent: Could you tell me the Sunset Books sales at 2024/11?
• Agent response: The sales for Sunset Books in November 2024 were $7800.
• toolUse:
  GET__x_amz_knowledgebase_HKT0EAPIPL__Search {"searchQuery": "Sunset Books"}
  get_sales__get_store_sales {"id":"2","year":"2024","month":"11"}
Expected result (※ the tool name in Tool Use changed to a different tool)
• Expected answer: 7800 USD
• Expected toolUse:
  GET__x_amz_knowledgebase_HKT0EAPIPL__Search {"searchQuery": "Sunset Books"}
  get_weather__get_weather {"city": "New York"}
Evaluation: case 3
• Agent Goal Accuracy: 1.0
• Tool Call Accuracy: 0.0

Evaluation case 4: Error in the agent response
Execution result
• Input to the agent: Could you tell me the Sunset Books sales at 2024/11?
• Agent response: The sales for Sunset Books in November 2024 were $7800.
• toolUse:
  GET__x_amz_knowledgebase_HKT0EAPIPL__Search {"searchQuery": "Sunset Books"}
  get_sales__get_store_sales {"id":"2","year":"2024","month":"11"}
Expected result (※ the expected agent response changed to a different one)
• Expected answer: 7800 Yen
• Expected toolUse:
  GET__x_amz_knowledgebase_HKT0EAPIPL__Search {"searchQuery": "Sunset Books"}
  get_sales__get_store_sales {"id":"2","year":"2024","month":"11"}
Evaluation: case 4
• Agent Goal Accuracy: 0.0
• Tool Call Accuracy: 1.0000000000000002

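Summary
• Analyzed Bedrock Agents responses to visualize how the agent behaved and to evaluate whether it behaved as expected
  • Tracing and visualizing Bedrock Agent behavior: visualized what Bedrock Agents did and in what order
  • Evaluating Bedrock Agent results: quantitatively evaluated whether the agent's actions and final response were as expected
• Used two frameworks to achieve the visualization and the evaluation
• Realized one part of Agent Ops: visualizing and evaluating autonomously operating AI agents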
