Summary of the Current State of Local LLMs on iOS
shu223
March 05, 2024
Technology
Slides presented at "Mobile勉強会 Wantedly × チームラボ × Sansan #13".
Video of the talk:
https://youtu.be/yyYVFpxbO1Q
Transcript
Summary of the Current State of Local LLMs on iOS
Shuichi Tsutsumi (@shu223)
About me
• Shuichi Tsutsumi
• @shu223 (GitHub, Qiita, Zenn, note, 𝕏, YouTube, Podcast, etc...)
• Books: 4 commercially published, many self-published (@BOOTH)
Today's agenda
• How to run an LLM on-device on iOS
• Embedding one in your own app
• Current state and outlook
Terminology
• LLM: Large Language Model
• Local LLM: an LLM that runs in a local environment
• On-device processing: processing completes entirely inside the device (no dependency on the cloud or external servers)
Local LLM on iOS (demo)
• Processed on-device
• No API calls
• Running on an iPhone 15 Pro
• Playback is not sped up
• 8.6 tokens/sec
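Benefits of a local LLM
• Works offline
• Privacy is preserved (data is not uploaded anywhere)
• Free no matter how much you use it
There is always something romantic about cutting-edge features running standalone on a mobile device.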
How to run an LLM on-device on iOS
Ways to run a local LLM on iOS
Broadly, there are two:
• llama.cpp
• Core ML
llama.cpp
• A runtime that runs LLMs fast
• Written in C/C++
• Developed by Georgi Gerganov (GG)
• Model format: GGML → GGUF
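As a rough illustration of what "GGUF format" means at the file level, here is a minimal Python sketch that reads the fixed-size start of a GGUF header. It assumes the v2/v3 layout described in the GGUF specification (4-byte magic `GGUF`, then a little-endian `uint32` version, `uint64` tensor count, and `uint64` metadata key/value count). The function name `read_gguf_header` and the sample values are hypothetical and not taken from the talk or from any real model file.

```python
import struct

GGUF_MAGIC = b"GGUF"  # first four bytes of a GGUF file


def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size part of a GGUF header (assumes v2/v3 layout)."""
    if len(data) < 24:
        raise ValueError("need at least 24 bytes for a GGUF v2/v3 header")
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file (bad magic)")
    # Little-endian: uint32 version, uint64 tensor count, uint64 metadata KV count.
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


# Synthetic header for illustration only (values are made up).
sample = GGUF_MAGIC + struct.pack("<IQQ", 3, 291, 24)
header = read_gguf_header(sample)
print(header)
```

With a real file, the same function could be fed the first 24 bytes, for example `read_gguf_header(open(path, "rb").read(24))`, where `path` points to a downloaded `.gguf` model.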
llama.cpp and Apple Silicon
• Optimized for Apple Silicon via ARM NEON, Accelerate, and the Metal framework
• Many of the macOS apps that can run local LLMs use llama.cpp internally
  • Ollama, LM Studio, LLMFarm, etc...
• Uses models in GGUF format
llama.cpp and iOS
• The "optimized for Apple Silicon" part covers not only the M series but also the iPhone's A series
Ways to run a local LLM on iOS
Broadly, there are two:
• llama.cpp
• Core ML
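What is Core ML?
• Apple's framework and model format for integrating machine learning models into iOS, macOS, etc.
• Designed to use the CPU, GPU, and Neural Engine to maximize performance while keeping memory footprint and power consumption to a minimum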
The Neural Engine has no API
• The Neural Engine can be used only through Core ML
• → Core ML is what gets the most out of Apple Silicon (including the iPhone's A series)!
Core ML vs llama.cpp
• Since it can take advantage of the Neural Engine, does Core ML have the edge?
How to convert an LLM to Core ML
• Use coremltools
  • Difficult (example: "Converting to a Core ML model with coremltools", Sansan Tech Blog)
• Use exporters, the conversion tool published by Hugging Face
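exporters (Hugging Face)
• A tool for converting Transformers models to Core ML
• It wraps coremltools, but the tool absorbs many of the problems that come up during conversion
• In short, using this tool makes it easier to convert a Transformers model to a Core ML model than using coremltools directly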
(Supplementary material) How to use exporters
• Article: "Trying out exporters, a tool for converting Transformers models to Core ML"
• Succeeded in converting an LLM to Core ML
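(Skipping ahead, since this is getting long.)
The point of this part: conversion tools for Core ML models exist, but hardly any pre-converted models are published, and you also have to work out quantization on your own.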
What about models for llama.cpp?
Most local LLMs are published in GGUF format, quantized in a variety of patterns (TheBloke is a well-known publisher).
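Core ML vs llama.cpp
• Since it can take advantage of the Neural Engine, does Core ML have the edge?
• When it comes to being able to try various local LLMs right away, llama.cpp wins overwhelmingly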
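Summary so far
• There are two ways to run a local LLM on iOS: llama.cpp and Core ML
• Both are optimized for Apple Silicon, but only Core ML can take advantage of the Neural Engine
• llama.cpp has a rich selection of pre-quantized, pre-converted models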
Embedding a local LLM in your own app
llama.cpp
• The upstream repository provides a Swift Package so that it can be embedded in an app easily
• A sample showing how to use that Swift Package is also provided in the same repository
  • examples/llama.swiftui
Core ML
• A Swift Package called swift-transformers is provided as a wrapper library for running Core ML models converted with exporters in an app
• Its sample app is also published
Current state of LLMs that run on-device on iOS
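How to find models
• Search the Hugging Face Hub (GGUF / Core ML)
• The LLMFarm page linked from the slide: models verified to work are listed along with their sizes
• "Supported models" in the llama.cpp README, and another page linked from the slide: iPhone benchmarks for various models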
Examples of models I tried
• Mistral 7B v0.1 (performs well for its relatively small size)
  • Q3_K_S (3.16GB)
  • Q4_K_S (4.14GB)
• Calm 2 7B Chat (a Japanese-language LLM)
  • Q3_K_S (3.47GB)
  • Q4_K_S (3.12GB)
  • Q4_K_M (3.47GB): crashes on iPhone 15 Pro
Demo
Mistral 7B v0.1
• Q4_K_S
  • 4-bit quantization
  • 4.14GB
• Loading: 15 seconds
• Text generation: 8.66 t/s
Calm2 7B Chat
• Q3_K_S
  • 3-bit quantization
  • 3.12GB
• Loading: 25 seconds
• Text generation: 1.89 t/s
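To put the two demo measurements into perspective, the sketch below estimates how long a user would wait for a complete reply, counting model load time plus generation time. The load times and token rates are the ones reported in the slides. The reply length of 100 tokens (`REPLY_TOKENS`) is a hypothetical value chosen only for illustration, and the estimate assumes a constant generation rate.

```python
def seconds_to_reply(load_s: float, tokens_per_s: float, n_tokens: int) -> float:
    """Total wait: model load time plus time to generate n_tokens at a constant rate."""
    return load_s + n_tokens / tokens_per_s


REPLY_TOKENS = 100  # hypothetical reply length, not from the talk

# Load time (s) and generation speed (tokens/s) as reported in the demo slides.
mistral = seconds_to_reply(load_s=15, tokens_per_s=8.66, n_tokens=REPLY_TOKENS)
calm2 = seconds_to_reply(load_s=25, tokens_per_s=1.89, n_tokens=REPLY_TOKENS)

print(f"Mistral 7B v0.1 Q4_K_S: {mistral:.1f} s")
print(f"Calm2 7B Chat Q3_K_S:   {calm2:.1f} s")
```

Under these assumptions, the slower generation rate dominates the total wait far more than the 10-second difference in load time.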
Impressions so far (note: based on only one or two trials)
• Answer content
• Inference speed
That said...
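For now, practical use in a product looks tough
• Size: models are 3GB or more even with 3-bit or 4-bit quantization
  • You cannot bundle that in the app, and you cannot really make users download it either
• Processing speed: loading the model takes time, and inference speed is still tough → calling an API is faster
• Memory usage: several GB are required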
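The size concern can be sanity-checked with simple arithmetic. The sketch below computes a nominal file size from parameter count and bit width, then works backwards from the Mistral 7B v0.1 file sizes listed earlier to an effective number of bits per weight. It assumes exactly 7e9 parameters (the nominal "7B") and 1 GB = 10^9 bytes. Both are simplifications, and the helper names are hypothetical.

```python
N_PARAMS = 7e9  # assumption: nominal "7B"; the real parameter count may differ


def nominal_size_gb(n_params: float, bits_per_weight: float) -> float:
    """File size in GB (10^9 bytes) if every weight took exactly bits_per_weight bits."""
    return n_params * bits_per_weight / 8 / 1e9


def effective_bits(file_size_gb: float, n_params: float) -> float:
    """Average bits per weight implied by a file size in GB (10^9 bytes)."""
    return file_size_gb * 1e9 * 8 / n_params


for bits in (3, 4):
    print(f"{bits}-bit nominal: {nominal_size_gb(N_PARAMS, bits):.2f} GB")

# File sizes reported in the slides for Mistral 7B v0.1.
for name, size_gb in (("Q3_K_S", 3.16), ("Q4_K_S", 4.14)):
    print(f"{name}: about {effective_bits(size_gb, N_PARAMS):.2f} bits per weight")
```

The effective figures come out above the nominal 3 and 4 bits. A plausible reason is that quantized formats store per-block scale data and metadata alongside the weights, and that the true parameter count is not exactly 7e9.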
Outlook
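Nothing but excitement
• Model performance: high-performing models with fewer parameters keep appearing
• Quantization techniques: evolving steadily, and something called BitNet has appeared
• Device performance: memory capacity and GPU / Neural Engine performance keep improving
• Variety of pre-converted models: LLMs already converted to Core ML should become more plentiful too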
The day LLMs run smoothly on-device is near!
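Wrap up
• There are broadly two ways to run a local LLM on iOS
  • llama.cpp: rich selection of pre-quantized, pre-converted models
  • Core ML: can also use the Neural Engine
• For now, the models are too big and too heavy to run on iOS devices
• But there is plenty of hope!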
Thank you for listening!