
Foundation Models で オンデバイスRAGを試みる (Trying On-Device RAG with Foundation Models)

p0dee
October 01, 2025


This is the uncut version of the talk given at extension DC 2025 Day 1 @ DeNA (2025/10/01), including the slides that were skipped.

Announcement tweet: https://x.com/ShunTakeishi/status/1958363232068128879
Speaker's trial-and-error blog posts: https://p0dee.com/blog/tag/foundation-models/


Transcript

  1. (Title slide)

  2. TOOL CALLING. Diagram: question → search query → matching entries → answer.
     Instruction: "The user asks questions about their own diary. You search the diary data using related keywords and answer the question."
     Fixed dummy data (diary entries, bodies omitted):
     - July 18, 2025: an entry about being unsure whether to get a haircut
     - June 9, 2025: an entry about going for a haircut
     - April 8, 2025: an entry about a friend noticing the haircut
  3. TOOL CALLING. The same fixed dummy data, searched with the generated query "散髪" (haircut):
     - July 18, 2025: an entry about being unsure whether to get a haircut
     - June 9, 2025: an entry about going for a haircut
     - April 8, 2025: an entry about a friend noticing the haircut
     (bodies omitted)
     The model interpreted the question correctly and generated an appropriate search query ("散髪"), and it interpreted the retrieved diary entries correctly and answered ("髪を切った", "I got a haircut").
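A minimal sketch of how this tool-calling setup might look with the Foundation Models framework. DiarySearchTool, its keyword argument, and the dummy entries below are invented for illustration, and the exact Tool protocol requirements (for example the return type of call(arguments:)) may differ between SDK versions:

      import FoundationModels

      // Hypothetical tool that searches fixed dummy diary entries by keyword.
      struct DiarySearchTool: Tool {
          let name = "searchDiary"
          let description = "Searches diary entries that contain a given keyword."

          @Generable
          struct Arguments {
              @Guide(description: "A keyword to search the diary for")
              let keyword: String
          }

          // Fixed dummy data, as on the slide (titles only; bodies omitted).
          let entries = [
              "2025-07-18: unsure whether to get a haircut",
              "2025-06-09: went for a haircut",
              "2025-04-08: a friend noticed the haircut",
          ]

          func call(arguments: Arguments) async throws -> String {
              let hits = entries.filter { $0.contains(arguments.keyword) }
              return hits.isEmpty ? "No matching entries." : hits.joined(separator: "\n")
          }
      }

      // The session gets the tool plus the instruction from the slide.
      let session = LanguageModelSession(
          tools: [DiarySearchTool()],
          instructions: "The user asks questions about their own diary. Search the diary data using related keywords and answer the question."
      )
      let answer = try await session.respond(to: "When did I last get a haircut?")
      print(answer.content)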
  4. Tokenization. The example sentence (the opening of Kenji Miyazawa's "Polano no Hiroba") split into tokens:
     あの / イー / ハ / トー / ヴォ / の / すき / と / おっ / た / 風 / 、 / 夏 / でも / 底 / に / 冷 / た / さ / を / もつ / 青い / そら / 、 / う / つ / く / しい / 森 / で / 飾ら / れ / た / モ / リー / オ / 市 / 、 / 郊外 / の / ぎ / ら / ぎ / ら / ひかる / 草 / の / 波 / 。
     ("That clear wind of Ihatov, the blue sky that keeps a coolness at its depths even in summer, the city of Morio adorned with beautiful forests, the glittering waves of grass in the suburbs.")
  5. Each token is mapped to an embedding vector:
     あの    [-0.231,  0.027,  0.210 ... -0.128,  0.172, -0.151]
     イー    [ 0.035, -0.122,  0.064 ... -0.127, -0.109, -0.100]
     ハ      [-0.052,  0.195,  0.281 ...  0.003,  0.064, -0.048]
     トー    [ 0.240,  0.309,  0.173 ...  0.095,  0.054, -0.172]
     …
     ひかる  [ 0.054,  0.104, -0.062 ...  0.048,  0.160, -0.001]
     草      [ 0.055, -0.059,  0.142 ... -0.024, -0.082, -0.045]
     の      [-0.160, -0.062,  0.464 ... -0.056,  0.215, -0.091]
     波      [ 0.041,  0.101,  0.223 ...  0.092,  0.027,  0.069]
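Per-token vectors like these come from NLContextualEmbedding. A minimal sketch of inspecting them, assuming the Japanese contextual-embedding assets are already downloaded to the device (the loading code here is illustrative, not taken from the deck):

      import NaturalLanguage

      let sentence = "あのイーハトーヴォのすきとおった風、夏でも底に冷たさをもつ青いそら、うつくしい森で飾られたモリーオ市、郊外のぎらぎらひかる草の波。"

      if let embedding = NLContextualEmbedding(language: .japanese) {
          do {
              try embedding.load()
              let result = try embedding.embeddingResult(for: sentence, language: .japanese)
              // One vector of embedding.dimension Doubles per token range.
              result.enumerateTokenVectors(in: sentence.startIndex..<sentence.endIndex) { vector, range in
                  print(sentence[range], Array(vector.prefix(3)), "...")
                  return true // continue with the next token
              }
          } catch {
              print("Embedding failed:", error)
          }
      }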
  6. Pooling the token vectors into one sentence vector:
     あの    [-0.231,  0.027,  0.210 ... -0.128,  0.172, -0.151]
     イー    [ 0.035, -0.122,  0.064 ... -0.127, -0.109, -0.100]
     ハ      [-0.052,  0.195,  0.281 ...  0.003,  0.064, -0.048]
     トー    [ 0.240,  0.309,  0.173 ...  0.095,  0.054, -0.172]
     …
     ひかる  [ 0.054,  0.104, -0.062 ...  0.048,  0.160, -0.001]
     草      [ 0.055, -0.059,  0.142 ... -0.024, -0.082, -0.045]
     の      [-0.160, -0.062,  0.464 ... -0.056,  0.215, -0.091]
     波      [ 0.041,  0.101,  0.223 ...  0.092,  0.027,  0.069]
             ↓ pooling (e.g., mean pooling) + normalization (L2 normalization)
             [ 0.022, -0.000,  0.034 ...  0.007,  0.019, -0.007]
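Conceptually, this pooling-and-normalization step is an element-wise average of the token vectors followed by division by the vector's Euclidean length. A plain-Swift sketch (the Accelerate-based version actually used appears on slide 11); the toy input reuses the first three components of the first three token vectors above:

      // Mean pooling: average the per-token vectors element-wise into one sentence vector.
      func meanPool(_ tokenVectors: [[Float]]) -> [Float] {
          guard let dim = tokenVectors.first?.count else { return [] }
          var pooled = [Float](repeating: 0, count: dim)
          for vector in tokenVectors {
              for i in 0..<dim { pooled[i] += vector[i] }
          }
          let n = Float(tokenVectors.count)
          return pooled.map { $0 / n }
      }

      // L2 normalization: scale the vector to unit Euclidean length, so that a plain
      // dot product between two such vectors equals their cosine similarity.
      func l2Normalized(_ v: [Float]) -> [Float] {
          let norm = v.reduce(0) { $0 + $1 * $1 }.squareRoot() + 1e-12
          return v.map { $0 / norm }
      }

      // Toy input: first three components of the first three token vectors on the slide.
      let tokenVectors: [[Float]] = [
          [-0.231,  0.027, 0.210],
          [ 0.035, -0.122, 0.064],
          [-0.052,  0.195, 0.281],
      ]
      let sentenceVector = l2Normalized(meanPool(tokenVectors))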
  7. Step 1: vectorize each document (one diary entry each) with NLContextualEmbedding:
     (doc)  [ 0.022, -0.000,  0.034 ...  0.007,  0.019, -0.007]
     (doc)  [ 0.081, -0.033, -0.040 ...  0.056, -0.015,  0.017]
     (doc)  [ 0.015, -0.003,  0.040 ...  0.054, -0.005, -0.004]
     …
     (doc)  [ 0.057,  0.012, -0.050 ...  0.082, -0.002,  0.057]
     (doc)  [ 0.018,  0.021, -0.023 ... -0.070, -0.017,  0.006]
     (doc)  [-0.022,  0.005, -0.008 ... -0.020,  0.021, -0.044]
  8. Step 2: vectorize the question query and compute its similarity with each document vector.
     Query: "美しい森で飾られた市は？" ("Which city is adorned with beautiful forests?")
     (doc)   [ 0.022, -0.000,  0.034 ...  0.007,  0.019, -0.007]
     (doc)   [ 0.081, -0.033, -0.040 ...  0.056, -0.015,  0.017]
     (doc)   [ 0.015, -0.003,  0.040 ...  0.054, -0.005, -0.004]
     …
     (doc)   [ 0.057,  0.012, -0.050 ...  0.082, -0.002,  0.057]
     (doc)   [ 0.018,  0.021, -0.023 ... -0.070, -0.017,  0.006]
     (doc)   [-0.022,  0.005, -0.008 ... -0.020,  0.021, -0.044]
     (query) [-0.062, -0.036, -0.046 ... -0.009, -0.020,  0.011]
  9. Step 2: vectorize the question query and compute its similarity with each document vector.
     Document matrix (one row per document):
       [ 0.022 -0.000 ...  0.019 -0.007]
       [ 0.081 -0.033 ... -0.015  0.017]
       [ 0.015 -0.003 ... -0.005 -0.004]
        …
       [ 0.057  0.012 ... -0.002  0.057]
       [ 0.018  0.021 ... -0.017  0.006]
       [-0.022  0.005 ...  0.021 -0.044]
     Query (column vector): [-0.062 -0.036 -0.046 ... -0.009 -0.020  0.011]
     Product = one score per document: [-0.012  0.802  0.029 ...  0.510 -0.001 -0.910]
     The dot product of the row vector for each document with the column vector for the question query is the cosine similarity d (the closer to 1, the more similar; the closer to -1, the more dissimilar).
  10. Step 3: sort by similarity score and take the top-ranked documents as hits.
      The per-document scores from the previous slide: [-0.012  0.802  0.029 ...  0.510 -0.001 -0.910]
      The dot product of the row vector for each document with the column vector for the question query is the cosine similarity d (the closer to 1, the more similar; the closer to -1, the more dissimilar).
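Because every document vector and the query vector are already L2-normalized, the cosine similarity in Step 2 reduces to a plain dot product, and Step 3 is a sort over those scores. A minimal plain-Swift sketch (the function names and the String document IDs are illustrative):

      // Cosine similarity of two L2-normalized vectors is just their dot product.
      func dot(_ a: [Float], _ b: [Float]) -> Float {
          zip(a, b).reduce(0) { $0 + $1.0 * $1.1 }
      }

      // Step 3 in miniature: score every document against the query, sort, keep the top k.
      func topHits(query: [Float], documents: [String: [Float]], k: Int) -> [(id: String, score: Float)] {
          let scored = documents.map { (id: $0.key, score: dot(query, $0.value)) }
          return Array(scored.sorted { $0.score > $1.score }.prefix(k))
      }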
  11. import Accelerate
      import CoreML
      internal import NaturalLanguage

      // `embedding` (and the `language` used below) are configured elsewhere;
      // a setup sketch follows after this slide.
      let embedding: NLContextualEmbedding = …

      // Vectorize one text (mean pooling + L2 normalization)
      func tensorize(text: String, asColumnVector: Bool = false) throws -> MLTensor? {
          let result = try embedding.embeddingResult(for: text, language: language)
          let dim = embedding.dimension
          var sum = [Float](repeating: 0, count: dim)
          var count = 0
          var index = text.startIndex
          result.enumerateTokenVectors(in: text.startIndex..<text.endIndex) { vecD, range in
              // Convert Double -> Float in one call
              var vecF = [Float](repeating: 0, count: dim)
              vDSP_vdpsp(vecD, 1, &vecF, 1, vDSP_Length(dim))
              // sum += vecF
              vDSP_vadd(sum, 1, vecF, 1, &sum, 1, vDSP_Length(dim))
              count += 1
              index = range.upperBound
              return true // keep enumerating all token vectors
          }
          guard count > 0 else { return nil }
          // Mean pooling (sum /= count)
          var invN = 1.0 / Float(count)
          vDSP_vsmul(sum, 1, &invN, &sum, 1, vDSP_Length(dim))
          // L2 normalization
          let normVector = l2Normalize(sum)
          let shape = asColumnVector ? [normVector.count, 1] : [1, normVector.count]
          return MLTensor(shape: shape, scalars: normVector)
      }

      // L2 normalization
      private func l2Normalize(_ v: [Float]) -> [Float] {
          var vec = v
          var norm: Float = 0
          vDSP_svesq(vec, 1, &norm, vDSP_Length(vec.count))
          norm = sqrtf(norm) + 1e-12
          vDSP_vsdiv(vec, 1, &norm, &vec, 1, vDSP_Length(vec.count))
          return vec
      }
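The elided `embedding` and `language` values above have to come from a loaded NLContextualEmbedding. A sketch of how they might be set up, assuming the Japanese model; the asset-download handling is illustrative and not taken from the deck:

      import NaturalLanguage

      let language: NLLanguage = .japanese

      // Pick the contextual-embedding model for the language; its assets may
      // need to be downloaded once before load() succeeds.
      guard let embedding = NLContextualEmbedding(language: language) else {
          fatalError("No contextual embedding model for \(language)")
      }
      if !embedding.hasAvailableAssets {
          embedding.requestAssets { result, error in
              // Handle .available / .notAvailable and errors as needed.
          }
      }
      try embedding.load()

      // The resulting sentence vector is a [1, dimension] (or [dimension, 1]) MLTensor.
      let sentenceTensor = try tensorize(text: "あのイーハトーヴォのすきとおった風")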
  12. var contents: [UUID : String] = [:]
      var tensors: [UUID : MLTensor] = [:]

      func embed(items: [String]) throws {
          var ret: [UUID : String] = [:]
          do {
              try items.forEach { item in
                  // Vectorize one entry (encode plays the role of tensorize from the previous slide)
                  let tensor = try encode(text: item)
                  let uuid = UUID()
                  tensors[uuid] = tensor
                  ret[uuid] = item
              }
          } catch {
              // error handling
          }
          contents = ret
      }

      func search(query: String, topCount: Int) async -> [String] {
          // D×M matrix stacking the document vectors (D documents × M embedding dimensions)
          let flatteneds = tensors.values.map { $0.flattened() }
          let docsTensor = MLTensor(stacking: flatteneds)
          // Query vector (as a column vector)
          guard let queryTensor = try? encode(text: query, asColumnVector: true) else { return [] }
          let product = docsTensor.matmul(queryTensor)
          let calcScores = await product.shapedArray(of: Float.self).scalars
          // Pair each document's UUID with its similarity score
          let arr = Array(zip(tensors.map(\.key), calcScores))
          // Sort by similarity and keep the top N entries
          let sorted = arr.sorted { $0.1 > $1.1 }.prefix(topCount)
          // Look up the document for each top UUID in `contents`
          return ...
      }
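Putting the two functions together, a caller indexes the diary entries once with embed(items:) and then retrieves the entries most similar to a question with search(query:topCount:). The entry strings and the Task wrapper below are illustrative; in the talk's RAG setup the retrieved hits would then be passed back to the language model as context for the final answer:

      // Illustrative diary entries (cf. the dummy data earlier in the deck).
      let diaryEntries = [
          "散髪に行こうか迷っている。",
          "今日は散髪に行った。",
          "髪を切ったことに友達が気づいてくれた。",
      ]

      Task {
          do {
              try embed(items: diaryEntries)
              let hits = await search(query: "最近髪を切ったのはいつ？", topCount: 2)
              print(hits) // top-2 most similar entries
          } catch {
              print("Indexing failed:", error)
          }
      }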