Basics of Speech Recognition with the Speech Framework
Masashi
January 29, 2025
Programming
Mobile Study Session: Wantedly × teamLab × Sansan #18 (Introducing New Technologies)
https://sansan.connpass.com/event/338706/
Transcript
Basics of Speech Recognition with the Speech Framework
Masashi Kawabe (河辺 雅史), Software Engineer, Smart Home team, NOT A HOTEL Inc.
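Home Controller
There are no switches or remote controls in the rooms; everything is operated from the Home Controller. At any NOT A HOTEL anywhere in the world, you can operate every device without hesitation, just as you would at home.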
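Speech Framework
• The standard speech recognition framework on iOS
• Can recognize prerecorded audio files as well as live audio from the microphone in real time
• Japanese is also supported
• Available on iOS 10.0 and later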
A minimal speech recognition implementation with the Speech Framework
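The code shown on this slide did not survive in the transcript. As a stand-in, the following is a minimal sketch of recognizing a prerecorded audio file. It assumes a Japanese recognizer (`ja-JP`) and an `NSSpeechRecognitionUsageDescription` entry in Info.plist; `FileTranscriber` and `TranscriptionError` are hypothetical names, not taken from the talk.

```swift
import Foundation
import Speech

// Hypothetical error type for this sketch.
enum TranscriptionError: Error {
    case notAuthorized
    case recognizerUnavailable
}

// Hypothetical wrapper that keeps the recognizer and task alive during recognition.
final class FileTranscriber {
    // The initializer returns nil when the locale is not supported.
    private let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "ja-JP"))
    private var task: SFSpeechRecognitionTask?

    func transcribe(fileAt url: URL,
                    completion: @escaping (Result<String, Error>) -> Void) {
        // Speech recognition requires explicit user permission.
        SFSpeechRecognizer.requestAuthorization { [self] status in
            guard status == .authorized else {
                completion(.failure(TranscriptionError.notAuthorized))
                return
            }
            guard let recognizer = self.recognizer, recognizer.isAvailable else {
                completion(.failure(TranscriptionError.recognizerUnavailable))
                return
            }
            let request = SFSpeechURLRecognitionRequest(url: url)
            // The handler can fire several times; forward only the final transcript.
            task = recognizer.recognitionTask(with: request) { result, error in
                if let error {
                    completion(.failure(error))
                } else if let result, result.isFinal {
                    completion(.success(result.bestTranscription.formattedString))
                }
            }
        }
    }
}
```

For live microphone input, the same `recognitionTask(with:resultHandler:)` call would take an `SFSpeechAudioBufferRecognitionRequest` fed with audio buffers instead of a file URL.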
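Getting partial results from speech recognition
• The shouldReportPartialResults property of SFSpeechRecognitionRequest controls whether partial (intermediate) recognition results are reported
• The default is true (partial results are reported)
• If only the final recognition result is needed, set shouldReportPartialResults to false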
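As a small sketch of the setting described above, the helper below builds a live-audio request that reports only the final result, and a second helper labels a result as partial or final. Both function names are hypothetical and used only for illustration.

```swift
import Speech

// Hypothetical helper: a request that skips intermediate results.
func makeFinalOnlyRequest() -> SFSpeechAudioBufferRecognitionRequest {
    let request = SFSpeechAudioBufferRecognitionRequest()
    // The default is true; false means the handler only receives the final result.
    request.shouldReportPartialResults = false
    return request
}

// Hypothetical helper: with partial results enabled, isFinal tells
// intermediate transcripts apart from the final one.
func label(for result: SFSpeechRecognitionResult) -> String {
    let text = result.bestTranscription.formattedString
    return result.isFinal ? "final: \(text)" : "partial: \(text)"
}
```

Leaving the default in place suits live captions that update while the user speaks; turning it off suits cases such as voice commands, where only the completed utterance matters.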
How behavior differs with and without partial results
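On-device speech recognition
• With privacy in mind, speech recognition can be run on the device
• Recognition works even without a network connection
• However, accuracy is lower than with server-based recognition
• Server-based recognition has its own drawbacks: a limit on the maximum recognition duration and a limit on the number of requests per day
• Enabled with the requiresOnDeviceRecognition property of SFSpeechRecognitionRequest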
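A minimal sketch of enabling on-device recognition might look like the following. It assumes iOS 13 or later, where `supportsOnDeviceRecognition` and `requiresOnDeviceRecognition` are available; `preferOnDevice` is a hypothetical helper name.

```swift
import Speech

// Hypothetical helper: turns on on-device recognition only when the
// recognizer for the chosen locale supports it, and reports whether it did.
@available(iOS 13.0, macOS 10.15, *)
@discardableResult
func preferOnDevice(_ request: SFSpeechRecognitionRequest,
                    recognizer: SFSpeechRecognizer) -> Bool {
    guard recognizer.supportsOnDeviceRecognition else {
        // Leave the request untouched so it can fall back to server-based recognition.
        return false
    }
    request.requiresOnDeviceRecognition = true
    return true
}
```

Checking support first avoids forcing on-device recognition for a locale or device that cannot provide it.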
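Customizing the language model
• Customizing the speech recognition language model can improve recognition accuracy for a specific use case
• Available on iOS 17.0 and later
• Using a customized language model requires on-device speech recognition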
Creating a customized language model
Creating a customized language model (continued)
• PhraseCount objects register exact phrases in the custom model
• This makes specific phrases easier to recognize
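The phrase registration described above can be sketched as follows, based on the `SFCustomLanguageModelData` builder API introduced in iOS 17. The locale, identifier, phrases, and counts are placeholder values chosen for illustration, and `exportPhraseData` is a hypothetical function name.

```swift
import Foundation
import Speech

// Hypothetical helper: builds training data from exact phrases and writes it to a file.
@available(iOS 17.0, macOS 14.0, *)
func exportPhraseData(to url: URL) async throws {
    let data = SFCustomLanguageModelData(
        locale: Locale(identifier: "en_US"),
        identifier: "com.example.homecontroller",  // placeholder identifier
        version: "1.0"
    ) {
        // count weights how strongly each phrase is represented in the data set.
        SFCustomLanguageModelData.PhraseCount(phrase: "Turn on the living room lights", count: 10)
        SFCustomLanguageModelData.PhraseCount(phrase: "Close the bedroom curtains", count: 10)
    }
    // The exported file is later passed to SFSpeechLanguageModel for preparation.
    try await data.export(to: url)
}
```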
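Creating a customized language model (continued)
• If the app uses uncommon words such as technical terms, you can define pairs of spelling and pronunciation and register them in the custom model
• Pronunciations are written in X-SAMPA format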
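A spelling and pronunciation pair can be sketched as below. The term and its X-SAMPA string follow the style of Apple's WWDC23 sample and are illustrative only; the identifier and `exportPronunciationData` are hypothetical, and the phoneme string should be checked against the X-SAMPA subset supported for the target locale.

```swift
import Foundation
import Speech

// Hypothetical helper: registers an uncommon term with its pronunciation.
@available(iOS 17.0, macOS 14.0, *)
func exportPronunciationData(to url: URL) async throws {
    let data = SFCustomLanguageModelData(
        locale: Locale(identifier: "en_US"),
        identifier: "com.example.homecontroller",  // placeholder identifier
        version: "1.0"
    ) {
        // grapheme is the spelling; phonemes holds the X-SAMPA pronunciation.
        SFCustomLanguageModelData.CustomPronunciation(
            grapheme: "Winawer",
            phonemes: ["w I n aU @r"]
        )
        // A phrase that uses the term, so the model sees it in context.
        SFCustomLanguageModelData.PhraseCount(phrase: "Play the Winawer variation", count: 10)
    }
    try await data.export(to: url)
}
```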
Using the customized language model
• The custom language model you created is used by setting it on the customizedLanguageModel property of SFSpeechRecognitionRequest
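Putting the pieces together, one possible sketch prepares the exported data and attaches the resulting configuration to a request. The cache file names (`LM`, `Vocab`) and both function names are assumptions made for this example. The `prepareCustomLanguageModel` overload with `clientIdentifier` is the one shipped with iOS 17; check the current SDK for the preferred overload.

```swift
import Foundation
import Speech

// Hypothetical helper: compiles exported training data into a model configuration.
@available(iOS 17.0, macOS 14.0, *)
func prepareConfiguration(assetURL: URL,
                          clientIdentifier: String) async throws -> SFSpeechLanguageModel.Configuration {
    let caches = FileManager.default.urls(for: .cachesDirectory, in: .userDomainMask)[0]
    let configuration = SFSpeechLanguageModel.Configuration(
        languageModel: caches.appendingPathComponent("LM"),   // assumed output location
        vocabulary: caches.appendingPathComponent("Vocab")    // assumed output location
    )
    // Preparation can take time, so run it off the main thread.
    try await SFSpeechLanguageModel.prepareCustomLanguageModel(
        for: assetURL,
        clientIdentifier: clientIdentifier,
        configuration: configuration
    )
    return configuration
}

// Hypothetical helper: attaches the custom model to a request.
@available(iOS 17.0, macOS 14.0, *)
func applyCustomModel(_ configuration: SFSpeechLanguageModel.Configuration,
                      to request: SFSpeechRecognitionRequest) {
    // Custom language models only work with on-device recognition.
    request.requiresOnDeviceRecognition = true
    request.customizedLanguageModel = configuration
}
```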
References
• Advances in Speech Recognition
  https://developer.apple.com/videos/play/wwdc2019/256
• Customize on-device speech recognition
  https://developer.apple.com/videos/play/wwdc2023/10101
• Recognizing speech in live audio
  https://developer.apple.com/documentation/speech/recognizing-speech-in-live-audio