Peeking at the chunks
Input: a photo of a slide from a re:Invent session (ANT 339)
[
  {
    "key": "e0ce1044-2bc5-4961-bd51-010de2178217",
    "metadata": {
      "AMAZON_BEDROCK_TEXT": "This image shows a presentation slide titled \"Amazon Bedrock Guardrails\" displayed on a large screen. The slide outlines several key features and functionalities of Amazon Bedrock, including configuring thresholds to filter undesirable content, identifying and correcting factual claims using automated reasoning, defining and disallowing denied topics, removing personally identifiable information, and filtering hallucinations. The slide is being viewed by a group of people, likely in a professional or academic setting, as indicated by the formal attire and attentive posture of the audience. The background includes architectural elements such as columns and a decorative ceiling.",
      "x-amz-bedrock-kb-source-file-mime-type": "image/jpeg",
      "x-amz-bedrock-kb-document-page-number": 0,
      "AMAZON_BEDROCK_METADATA": "(omitted)",
      "x-amz-bedrock-kb-data-source-id": "90YRNHYGUJ",
      "x-amz-bedrock-kb-source-file-modality": "IMAGE"
    }
  }
]
Findings
• The chunk is an overview of the slide
• It also includes meta-level details about the scene: "The background includes architectural elements such as columns and a decorative ceiling."
• It summarizes the image itself rather than the text on the slide
• Number of chunks: 1
• For a single image, the chunking strategy makes little difference
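To check findings such as "Number of chunks: 1" across many files, the exported chunks can be tallied programmatically. Below is a minimal Python sketch, assuming the chunks have already been exported as JSON in the shape shown above (how they are fetched is not covered here). `summarize_chunks` and the constant names are hypothetical; only the metadata keys come from the sample.

```python
import json
from collections import Counter

# Abbreviated copy of the chunk above (text shortened, AMAZON_BEDROCK_METADATA left out).
SAMPLE = """
[
  {
    "key": "e0ce1044-2bc5-4961-bd51-010de2178217",
    "metadata": {
      "AMAZON_BEDROCK_TEXT": "This image shows a presentation slide titled Amazon Bedrock Guardrails ...",
      "x-amz-bedrock-kb-source-file-mime-type": "image/jpeg",
      "x-amz-bedrock-kb-document-page-number": 0,
      "x-amz-bedrock-kb-data-source-id": "90YRNHYGUJ",
      "x-amz-bedrock-kb-source-file-modality": "IMAGE"
    }
  }
]
"""

MODALITY = "x-amz-bedrock-kb-source-file-modality"
MIME = "x-amz-bedrock-kb-source-file-mime-type"
TEXT = "AMAZON_BEDROCK_TEXT"


def summarize_chunks(chunks, preview_len=60):
    """Count chunks per modality and collect (key, MIME type, text preview) rows."""
    counts = Counter()
    rows = []
    for chunk in chunks:
        meta = chunk.get("metadata", {})
        counts[meta.get(MODALITY, "UNKNOWN")] += 1
        rows.append((chunk.get("key"), meta.get(MIME), meta.get(TEXT, "")[:preview_len]))
    return counts, rows


chunks = json.loads(SAMPLE)
counts, rows = summarize_chunks(chunks)
print(dict(counts))  # {'IMAGE': 1}
for key, mime, preview in rows:
    print(key, mime, preview)
```

Missing keys fall back to defaults (`"UNKNOWN"`, `None`, empty text), so the same helper can be run over audio and video chunks, which carry a different set of metadata keys.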
Peeking at the chunks
Input: a 10-second video shot at the re:Invent Sports Forum
[
  {
    "metadata": {
      "x-amz-bedrock-kb-chunk-start-time-in-millis": 0,
      "AMAZON_BEDROCK_TEXT": "[spk_0]: Go, go, go.",
      "x-amz-bedrock-kb-chunk-end-time-in-millis": 10366
    }
  },
  {
    "metadata": {
      "x-amz-bedrock-kb-chunk-start-time-in-millis": 0,
      "AMAZON_BEDROCK_TEXT": "At a technology conference, a Formula 1-style racing car with \"AWS\" branding is displayed as part of a sports safety demonstration. The car is surrounded by a red carpet and professional lighting, with \"Sports Forum\" and \"Evolution of Safety\" banners visible. Four men in casual work attire are actively working on the vehicle, focusing on the tires and undercarriage. One man uses a power tool while others observe and assist. A voice commands \"Go, go, go,\" indicating a timed demonstration. The scene showcases a collaboration between AWS, Formula 1 racing, and sports safety organizations, highlighting how advanced technology and data are being used to improve player safety in high-risk sports. The professional setup and focused teamwork emphasize the serious nature of this technological advancement in sports safety.",
      "x-amz-bedrock-kb-source-file-mime-type": "video/quicktime",
      "AMAZON_BEDROCK_METADATA": "(omitted)",
      "x-amz-bedrock-kb-chunk-end-time-in-millis": 10366
    }
  }
]
Findings
• A separate video of about 2 minutes was also processed
• Its chunks had the same structure: a summary chunk and an audio chunk
• Number of chunks: 2
  • Video summary
  • Audio (transcript)
• The "5, 4, 3, 2, 1" countdown shout was not tokenized; only "Go, go, go." made it into the transcript
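Because the video's summary chunk and its audio chunk carry the same start and end times, they can be lined up by time window. Below is a minimal Python sketch. `group_by_window` and `fmt_ms` are hypothetical helpers, and telling a transcript from a summary by the `[spk_` speaker prefix is an assumption based only on the sample above.

```python
from collections import defaultdict

START = "x-amz-bedrock-kb-chunk-start-time-in-millis"
END = "x-amz-bedrock-kb-chunk-end-time-in-millis"
TEXT = "AMAZON_BEDROCK_TEXT"


def fmt_ms(ms):
    """Format a millisecond offset as m:ss.mmm."""
    minutes, rest = divmod(ms, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{minutes}:{seconds:02d}.{millis:03d}"


def group_by_window(chunks):
    """Group chunk texts by (start, end) window, split into transcript vs. summary.

    Heuristic (assumption): transcript chunks begin with a speaker label like "[spk_0]:".
    """
    windows = defaultdict(lambda: {"transcript": [], "summary": []})
    for chunk in chunks:
        meta = chunk["metadata"]
        text = meta[TEXT]
        kind = "transcript" if text.startswith("[spk_") else "summary"
        windows[(meta[START], meta[END])][kind].append(text)
    return dict(windows)


# The two chunks above (summary text shortened).
chunks = [
    {"metadata": {START: 0, END: 10366, TEXT: "[spk_0]: Go, go, go."}},
    {"metadata": {START: 0, END: 10366, TEXT: "At a technology conference, a Formula 1-style racing car ..."}},
]
for (start, end), parts in group_by_window(chunks).items():
    print(f"{fmt_ms(start)}-{fmt_ms(end)}", parts["transcript"], len(parts["summary"]))
# 0:00.000-0:10.366 ['[spk_0]: Go, go, go.'] 1
```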
Peeking at the chunks
Input: a recording of roughly the first 30 minutes of a re:Invent session (SAS403-R)
{
  "key": "7c4a3adb-0f93-4f07-a8bb-cf1910708d8a",
  "metadata": {
    "x-amz-bedrock-kb-data-source-id": "90YRNHYGUJ",
    "x-amz-bedrock-kb-chunk-start-time-in-millis": 406690,
    "x-amz-bedrock-kb-source-file-modality": "AUDIO",
    "x-amz-bedrock-kb-chunk-end-time-in-millis": 458510,
    "AMAZON_BEDROCK_TEXT": "Makes an MCPO MCP makes a call to a pool agent which is uh. To acknowledge this in terms The chief The multi-tenered rack system which you built in the previous slide which I put. You're kind of like making that, The database on that ask question if you're trying to pull that. And then gives back that response to the orchestrated. The orchestrated agent does not see immediate response from that knowledge based agent.",
    "AMAZON_BEDROCK_METADATA": "(omitted)",
    "x-amz-bedrock-kb-source-file-mime-type": "audio/mpeg"
  }
}
Findings
• Typos and dropped words are noticeable in places
• The chunks appear to follow context boundaries
• Number of chunks: 16
• The entire recording has been chunked
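Whether the whole recording is chunked can be checked from the start/end metadata alone: sort the intervals and look for uncovered spans. Below is a minimal Python sketch. `chunk_durations` and `find_gaps` are hypothetical helpers, and the recording length must be supplied by the caller, since it is not part of the chunk metadata shown.

```python
def chunk_durations(intervals):
    """Duration in milliseconds of each (start, end) chunk."""
    return [end - start for start, end in intervals]


def find_gaps(intervals, total_ms):
    """Return the (start, end) spans of [0, total_ms) not covered by any chunk."""
    gaps = []
    cursor = 0  # end of the covered region so far
    for start, end in sorted(intervals):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)  # tolerate overlapping chunks
    if cursor < total_ms:
        gaps.append((cursor, total_ms))
    return gaps


# The chunk shown above runs from 406,690 ms to 458,510 ms.
print(chunk_durations([(406690, 458510)]))  # [51820]

# Toy intervals (not from the recording) with a hole between 5,000 and 6,000 ms.
print(find_gaps([(0, 5000), (6000, 10000)], 10000))  # [(5000, 6000)]
```

The sample chunk therefore spans about 51.8 seconds. An empty result from `find_gaps` over all 16 chunks would support the observation that the entire recording was chunked.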