The Current State of Local LLMs on iOS

Slides from the talk at "Mobile勉強会 Wantedly × チームラボ × Sansan #13" (Mobile Study Meetup, Wantedly × teamLab × Sansan #13).

Presentation video:
https://youtu.be/yyYVFpxbO1Q

shu223

March 05, 2024

Transcript

  1. Self-introduction • Shuichi Tsutsumi • @shu223 (GitHub, Qiita, Zenn, note, 𝕏, YouTube, Podcast, etc...) • Books (4 commercially published, many self-published @BOOTH):
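  2. Terminology • [LLM]: Large Language Models • [Local LLM]: an LLM that runs in a local environment • [On-device processing]: processing that completes entirely inside the device (no dependence on the cloud or external servers)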
  3. Local LLM on iOS • Processing runs on-device • No API calls are made • Running on an iPhone 15 Pro • The video is not sped up • 8.6 tokens/sec
  4. llama.cpp and Apple Silicon • Optimized for Apple Silicon via ARM NEON, Accelerate, and the Metal framework • Many of the macOS apps that can run local LLMs use llama.cpp internally • Ollama, LM Studio, LLMFarm, etc... • Uses models in the GGUF format
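Since slide 4 notes that llama.cpp consumes models in the GGUF format, a quick way to sanity-check a downloaded file is to read its fixed-size header. The following is a minimal sketch, assuming the little-endian layout used by GGUF version 2 and later (4-byte magic `GGUF`, a 32-bit version, then 64-bit tensor and metadata counts). The function name `read_gguf_header` and the file path in the usage comment are hypothetical, not part of llama.cpp.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4 (magic) + 4 (version) + 8 (tensor count) + 8 (metadata count)


def read_gguf_header(data: bytes) -> dict:
    """Parse the first 24 bytes of a GGUF file (assumes the version 2+ layout)."""
    if len(data) < HEADER_SIZE:
        raise ValueError(f"need at least {HEADER_SIZE} bytes, got {len(data)}")
    if data[:4] != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file: magic={data[:4]!r}")
    # "<" = little-endian without padding; I = uint32, Q = uint64.
    version, tensor_count, metadata_kv_count = struct.unpack_from("<IQQ", data, 4)
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": metadata_kv_count,
    }


# Usage (hypothetical path):
# with open("model.Q4_K_S.gguf", "rb") as f:
#     print(read_gguf_header(f.read(HEADER_SIZE)))
```

A file that fails the magic check is likely in a different format (for example a Core ML package or an older GGML file) and would need conversion before llama.cpp-based apps can load it.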
  5. What is Core ML? • Apple's framework and model format for integrating machine learning models into iOS, macOS, etc. • Designed to use the CPU, GPU, and Neural Engine to maximize performance while minimizing memory footprint and power consumption
  6. !

  7. Core ML vs llama.cpp • Does Core ML have the edge because it can take advantage of the Neural Engine? • For being able to try out various local LLMs right away, llama.cpp wins by a wide margin
  8. How to find models • Search on Hugging Face Hub (GGUF / Core ML) • Here in LLMFarm: models verified to work are listed together with their sizes • "Supported models" in the llama.cpp README, and here: benchmarks of various models on iPhone
  9. Examples of models I tried • Mistral 7B v0.1 (excellent for its relatively small size) • Q3_K_S (3.16GB) • Q4_K_S (4.14GB) • Calm 2 7B Chat (Japanese LLM) • Q3_K_S (3.47GB) • Q4_K_S (3.12GB) • Q4_K_M (3.47GB): crashes on iPhone 15 Pro
  10. Mistral 7B v0.1 • Q4_K_S • 4-bit quantization • 4.14GB • Loading: about 15 seconds • Text generation speed: 8.66 t/s
  11. Calm2 7B Chat • Q3_K_S • 3-bit quantization • 3.12GB • Loading: about 25 seconds • Text generation speed: 1.89 t/s
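To put the figures from slides 10 and 11 in user-facing terms, the sketch below estimates how long a reply would take given the reported load time and generation speed. The load times and t/s values come from the slides; the reply length of 200 tokens and the helper name `estimated_response_seconds` are assumptions for illustration, and the estimate ignores prompt processing time.

```python
def estimated_response_seconds(num_tokens: int, tokens_per_sec: float,
                               load_seconds: float = 0.0) -> float:
    """Rough wall-clock time: model load plus token generation (prompt eval ignored)."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return load_seconds + num_tokens / tokens_per_sec


# Measurements reported in the slides (iPhone 15 Pro).
MEASURED = {
    "Mistral 7B v0.1 Q4_K_S": {"load_seconds": 15.0, "tokens_per_sec": 8.66},
    "Calm2 7B Chat Q3_K_S": {"load_seconds": 25.0, "tokens_per_sec": 1.89},
}

REPLY_TOKENS = 200  # assumed reply length, not from the slides

for name, m in MEASURED.items():
    cold = estimated_response_seconds(REPLY_TOKENS, m["tokens_per_sec"], m["load_seconds"])
    warm = estimated_response_seconds(REPLY_TOKENS, m["tokens_per_sec"])
    print(f"{name}: cold start {cold:.1f}s, model already loaded {warm:.1f}s")
```

Under these assumptions the slower model needs well over a minute for a single reply even when it is already loaded, which is the practical gap the following slide points to.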
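  12. Practical use in products still looks tough for now • Size: even models quantized to 3-bit or 4-bit are 3GB or more • They cannot be bundled into the app, and asking users to download them is not realistic either • Processing speed: loading the model takes time, and inference speed is still rough → calling an API is faster • Memory usage: several GB required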
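The size concern in slide 12 follows from simple arithmetic: a model's weight storage is roughly parameter count times bits per weight. The sketch below works this out, assuming "7B" means a round 7e9 parameters and that GB means 10^9 bytes (both are assumptions, not figures from the slides). The function names are hypothetical.

```python
def nominal_model_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Weight storage in GB (10^9 bytes) if every parameter used exactly this many bits."""
    return num_params * bits_per_weight / 8 / 1e9


def effective_bits_per_weight(file_size_gb: float, num_params: float) -> float:
    """Average bits per parameter implied by an actual file size."""
    return file_size_gb * 1e9 * 8 / num_params


PARAMS_7B = 7e9  # assumption: "7B" taken as a round 7 billion parameters

for bits in (16, 4, 3):
    print(f"{bits}-bit nominal: {nominal_model_size_gb(PARAMS_7B, bits):.2f} GB")

# File sizes reported in the slides for Mistral 7B v0.1.
for label, size_gb in (("Q3_K_S", 3.16), ("Q4_K_S", 4.14)):
    print(f"{label}: about {effective_bits_per_weight(size_gb, PARAMS_7B):.2f} bits/weight")
```

The reported files come out larger than the nominal 3-bit and 4-bit sizes, presumably because the quantized formats also store per-block scaling data and keep some tensors at higher precision. Either way, a 7B model stays in the multi-gigabyte range at these bit widths.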
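  13. Nothing but excitement • Model performance: high-performing models with fewer parameters appear every day • Quantization methods: evolving year by year, and something called BitNet has appeared • Device performance: memory capacity and GPU / Neural Engine performance improve every year • Variety of pre-converted models: LLMs already converted to Core ML should become more plentiful (hopefully)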
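  14. Wrap up • There are broadly two ways to run a local LLM on iOS • llama.cpp: a rich selection of pre-quantized, pre-converted models • Core ML: uses the Neural Engine • "For now," the models are too big and too heavy to run on iOS devices • But there is plenty of hope!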