⼤規模⾔語モデルの拡張（RAG）が終わったかも知れない件について

by NearMeの技術発表資料です

Slide 1

Slide 1 text

0 ⼤規模⾔語モデルの拡張（RAG）が終わったかも知れない件について 2024-04-05 第85回NearMe技術勉強会 @yujiosaka

Slide 2

Slide 2 text

1 RAG 1 = Retrieved Augmented Generation

Slide 3

Slide 3 text

2 LLMの最適化 RAG ハイブリッドファインチューニングプロンプトエンジニアリング

Slide 4

Slide 4 text

3 プロンプトエンジニアリングとは System ニュース記事が提示されます。あなたのタスクは、政府に対して表明された意見とその感情を特定することです。このタスクにはステップバイステップで取り組み、時間をかけて、ステップを飛ばさないでください： 1. ニュース記事の段落を読む 2. その段落で意見が表明されているかどうかを判断する。そうでない場合は、次の段落に進む 3. 意見がある場合は、以下のキーを持つJSONを抽出する • opinion: 許容される値は "positive"、"negative"、"neutral" のいずれか • evidence: 意見を裏付ける文字列のリストを含む • speaker: 意見を表明した人物または政府機関 4. 意見がすでに述べられている場合でも、出来るだけ多くの証拠を集めること明瞭な指示考える時間を与える複雑なタスクを分解する

Slide 5

Slide 5 text

4 LLMの最適化 RAG ハイブリッドファインチューニングプロンプトエンジニアリング

Slide 6

Slide 6 text

5 ファインチューニングとは学習済みのモデルから1つ以上のパラメータを追加でトレーニングすること https://learnopencv.com/fine-tuning-llms-using-peft/

Slide 7

Slide 7 text

6 LLMの最適化 RAG ハイブリッドファインチューニングプロンプトエンジニアリング今日のお話しする内容

Slide 8

Slide 8 text

7 去年作ったやつ 7

Slide 9

Slide 9 text

8 ChatIQ https://github.com/yujiosaka/ChatIQ

Slide 10

Slide 10 text

Slide 11

Slide 11 text

10 ChatIQの仕組み 1. Post message 6. Reply message 2. Event API 5. Chat API 3. Chat API 4. Response

Slide 12

Slide 12 text

11 ChatIQの仕組み 1. Post message 10. Reply message 2. Event API 9. Chat API 3. Chat API 4. Response RAG 5. Question 6. Answer 7. Chat API 8. Response

Slide 13

Slide 13 text

12 ChatIQの仕組み 1. Post message 10. Reply message 2. Event API 9. Chat API 3. Chat API 4. Response 5. Question 6. Answer 7. Chat API 8. Response RAG Retrieve Documents Weaviate (VectorDB)

Slide 14

Slide 14 text

13 RAGのプロンプト System Given the following extracted parts of a long document and a question, create a final answer. Consider the timestamp, channel and user when providing your answer. Always include the permalink in your response. If you don't know the answer, just say that you don't know. Don't try to make up an answer. ______________________ {documents} Human {question}

Slide 15

Slide 15 text

14 RAGのプロンプト System Given the following extracted parts of a long document and a question, create a final answer. Consider the timestamp, channel and user when providing your answer. Always include the permalink in your response. If you don't know the answer, just say that you don't know. Don't try to make up an answer. ______________________ {“user”:“F0JD6RZU6”,“message”:“<@U06FKAYEHF> is using 14 inches Mac Book Pro”,“channel”: “C024BE91L”,“timestamp”:“2024-01-01T12:34:56.000200+00:00”} Human What’s the spec of <@U06FKAYEHF>’s laptop?

Slide 16

Slide 16 text

15 RAGの回答 <@U06FKAYEHF> is using Mac Book Pro

Slide 17

Slide 17 text

16 Document RAGの処理 VectorDB Documents Question Answer Indexing Retrieval Generation

Slide 18

Slide 18 text

17 VectorDB 17

Slide 19

Slide 19 text

18 Document 0.404 0.594 0.472 0.921 0.739 0.952 0.287 0.648 0.131 0.835 … VectorDBの仕組み Vector Documents Text Model Vector DB … … … … … … … …

Slide 20

Slide 20 text

19 Text model • Word2Vec • Sentence Transformer • OpenAI embedding API • Cohere embedding API cat dog house car game python … cat dog house car game python … x 99999 x 99999 x 300 Word2Vec Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do 0.28 0.31 0.11

Slide 21

Slide 21 text

20 Document Splitting • ⽂字数で分割 • トークン数で分割 • HTMLの要素で分割 • セマンティック（センテンス間の類似度）で分割 • ドキュメント間のオーバーラッピング • etc. Document Document Documents Documents ドキュメントが大きすぎると ● コンテキストウィンドウ（トークン数上限）に収まらない ● 詳細な情報がベクトルから失われるここのチューニングはかなりだるい

Slide 22

Slide 22 text

21 Document 0.404 0.594 0.472 0.921 0.739 0.952 0.287 0.648 0.131 0.835 … VectorDBの仕組み Vector Documents Text Model Vector DB … … … … … … … …

Slide 23

Slide 23 text

22 VectorDB • Pinecone（Proprietary） • Weaviate（OSS） • Milvus（OSS） • Qdrant（OSS） • Vespa（OSS） • Chroma（OSS） • Redis（OSS?） dog pet dog pet cat LSH, HNSW, IVH, Annoy, etc. k個の類似ドキュメントを見つける

Slide 24

Slide 24 text

23 VectorDBブーム

Slide 25

Slide 25 text

24 VectorDBブーム

Slide 26

Slide 26 text

Slide 27

Slide 27 text

26 RAG might be dead https://x.com/agishaun/status/1758561862764122191?s=20

Slide 28

Slide 28 text

27 https://docs.google.com/presentation/d/1mJUiPBdtf58NfuSEQ7pVSEQ2Oqmek7F1i4gBwR6JDss/edit#slide=id.g26c0cb8dc66_0_0

Slide 29

Slide 29 text

Slide 30

Slide 30 text

29 何故なのか 29

Slide 31

Slide 31 text

30 Context Window https://ogre51.medium.com/context-window-of-language-models-a530ffa49989

Slide 32

Slide 32 text

31 Context Window Gemini 1.5 Claude 3 https://ogre51.medium.com/context-window-of-language-models-a530ffa49989

Slide 33

Slide 33 text

32 全部プロンプトに突っ込めば良くね？ 32

Slide 34

Slide 34 text

33 1. ポールグレアムのエッセイに料理のレシピの秘密の⾷材の⽂章をランダムに挿⼊する 2. LLMに秘密の⾷材を答えさせ、いくつ正解するかを調べる Needle In A Haystack Test https://github.com/gkamradt/LLMTest_NeedleInAHaystack

Slide 35

Slide 35 text

34 Needles In A Haystack Test https://github.com/gkamradt/LLMTest_NeedleInAHaystack

Slide 36

Slide 36 text

35 親近性バイアス（Recency Bias） https://arxiv.org/pdf/2310.01427.pdf

Slide 37

Slide 37 text

36 RAGだって間違えるのでは？ 36

Slide 38

Slide 38 text

37 RAGによる間違い Document Question Retrieval Generation 三菱UFJ銀行はいつ誕生しましたか？ 2001年3月31日、大阪市此花区の西部臨海エリアにがオープンした。米国の映画会社「ユニバーサル・スタジオ」のテーマパークが初め… 三菱UFJ銀行は2021年 3月31日に開業しました

Slide 39

Slide 39 text

38 RAGの精度向上 38

Slide 40

Slide 40 text

39 Multi-Vector Retriever 1. 「テキストの全⽂」と「テキストの要約」をEmbedding（＝ベクトル化）する 2. 「テキストの要約」によって「テキストの全⽂」をIndexingする 3. 「テキストの要約」がヒットしたら、代わりに「テキストの全⽂」を返却する https://arxiv.org/pdf/2312.06648.pdf

Slide 41

Slide 41 text

40 Recursive Abstractive Processing for Tree-Organized Retrieval (RAPTOR) 1. ドキュメントをEmbeddingによってまとめてIndexingする 2. まとめたドキュメントを要約してIndexingする 3. 1-2を何度も繰り返す https://arxiv.org/abs/2401.18059

Slide 42

Slide 42 text

41 Self-Reﬂection RAG LLMに取得したドキュメントの関連性を評価させ、答えが出るまで何度も質問を書き直す https://arxiv.org/abs/2310.11511

Slide 43

Slide 43 text

42 Corrective Retrieval Augumented Generation (cRAG) 1. Retrieval Evaluatorによって、取得したドキュメントを「正しい」「曖昧」「誤り」の3つに分類する 2. 「正しい」ドキュメントは、分解‧再構築して回答する 3. 「誤り」ドキュメントは Web検索にかけて、検索結果を使って回答する 4. 「曖昧」ドキュメントは上記2つの結果を融合させて回答する https://arxiv.org/abs/2401.15884

Slide 44

Slide 44 text

43 RAGのこれから 43

Slide 45

Slide 45 text

44 何もかもプロンプトに突っ込むのはそもそも無理がある Question Routing VectorDB RelationalDB Internet 事前にありとあらゆるデータを取得しておくことはできないから、必要なデータソースを LLMに判断させる必要が自ずと手でくる

Slide 46

Slide 46 text

45 RAGの成功事例

Slide 47

Slide 47 text

46 RAG One Shot コードの複雑さ複雑になりがち簡単に実装できる拡張性データソースが増えても拡張できる単一のデータソースでないと難しい応答時間正確性を求めれば求めるほど遅くなる速い実行コスト必要なドキュメントのトークンだけ消費する不要なドキュメントのトークンを消費する正確性間違いはあるが、訂正することができる Hallucinationを防ぐことが難しいメンテナンス性 Chunk Size、Chunk Overlap等チューニングが必要何も考えずに突っ込めばいいセキュリティ権限のないデータソースへのアクセスを拒否できる権限を無視してアクセスできてしまう RAG vs. One Shot

Slide 48

Slide 48 text

47 ChatIQの実装 • 同⼀スレッド内の会話は全てプロンプトに突っ込む • 異なるスレッドの会話はRAGから取得 • Botが招待されているPublicチャンネルの会話にはアクセスできる • Privateであっても、同⼀チャンネルの会話にはアクセスできる

Slide 49

Slide 49 text

48 LLMの最適化ハイブリッドファインチューニング RAG プロンプトエンジニアリングモデルの振る舞いの最適化モデルの知識の最適化

Slide 50

Slide 50 text

49 知識の最適化にファインチューニングを使ってしまった失敗例 14万のSlackメッセージを使ってGPT3.5-turboをファインチューニングした結果 500文字のプロンプトエンジニアリングに関するブログ記事を書いて。分かった。午前中にやっとくわ。今すぐやれよ。 OK。

Slide 51

Slide 51 text

50 LLMの最適化ハイブリッドファインチューニング RAG プロンプトエンジニアリングモデルの振る舞いの最適化モデルの知識の最適化

Slide 52

Slide 52 text

51 Retrieval Augmented Fine-Tuning (RAFT) • RAGを前提としたファインチューニング⼿法 • ファインチューニングが「勉強して教科書持ち込みなし」、RAGが「勉強しないで教科書持ち込み」だとすれば、RAFTは「正しく教科書を引くための勉強」を指す • 正しいドキュメントに撹乱ドキュメントを混ぜて推論させた回答をファインチューニングに使⽤する https://arxiv.org/abs/2403.10131

Slide 53

Slide 53 text

52 RAFTのファインチューニングに使⽤する⽂章 Question: The Oberoi family is part of a hotel company that has a head office in what city? context: [The Oberoi family is an Indian family that is famous for its involvement in hotels, namely through The Oberoi Group]...[It is located in city center of Jakarta, near Mega Kuningan, adjacent to the sister JW Marriott Hotel. It is operated by The Ritz-Carlton Hotel Company. The complex has two towers that comprises a hotel and the Airlangga Apartment respectively]...[The Oberoi Group is a hotel company with its head office in Delhi.] Instruction: Given the question, context and answer above, provide a logical reasoning for that answer. Please use the format of: ##Reason: {reason} ##Answer: {answer}. CoT Answer: ##Reason: The document ##begin_quote## The Oberoi family is an Indian family that is famous for its involvement in hotels, namely through The Oberoi Group. ##end_quote## establishes that the Oberoi family is involved in the Oberoi group, and the document ##begin_quote## The Oberoi Group is a hotel company with its head office in Delhi. ##end_quote## establishes the head office of The Oberoi Group. Therefore, the Oberoi family is part of a hotel company whose head office is in Delhi. ##Answer: Delhi 正しいドキュメント残りは撹乱ドキュメント推論ステップも書かせる

Slide 54

Slide 54 text

53 LLMの最適化に銀の弾丸は（今のとこ）ない 53

Slide 55

Slide 55 text

54 参考 • A Survey of Techniques for Maximizing LLM Performance • Fine-tuning Large Language Models (LLMs) • Vectoring Words (Word Embeddings) • RAG from Scratch

Slide 56

Slide 56 text

55 Thank you