iOSDC2023: Listening and Speaking iOS: Working with Real-World "Sound" (聴いて話すiOS 現実世界の「音」との連携)
たまねぎ
September 02, 2023
Transcript
Listening and Speaking iOS: Working with Real-World "Sound"
iOSDC Japan 2023
たまねぎ (@_chocoyama)
Self-introduction
• LayerX ← STORES ← Yahoo
• Works mostly with iOS and Flutter
• Brand new at the company, one month after changing jobs
たまねぎ
Self-introduction
This is actually my third consecutive appearance as a speaker
Company introduction
This talk is about how implementing features that use sound has become both simple and powerful.
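Agenda
What I will cover
• Characteristics and usage of the iOS standard features related to the title
• An introduction to the demos I implemented
What I will not cover
• Detailed explanations of the underlying technologies
• An in-depth look at how they work internally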
Agenda
1. Features that use real-world "sound"
2. Speech recognition and transcripts
3. Feedback through speech
4. Audio matching
5. Sound classification
6. Summary
Features that use real-world "sound"
The standard features are actually diverse
• Siri
• FaceTime
• Shazam
• Noise detection
• Voice Control
• VoiceOver
• Voice input
• Text read-aloud
• Voice Memos

Evolving frameworks
• AVFoundation
• CallKit
• Core Audio
• ShazamKit
• SiriKit
• SoundAnalysis
• Speech
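In practice…
• Most apps do not make use of them
• Experience design built around sound is rare
• It is hard to find the motivation to catch up
Make it easier to picture "what can be achieved" and "where it might be useful"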
Speech recognition and transcripts
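Speech recognition
• Audio data is captured from the microphone or from an audio file
• The system analyzes the audio and converts it to text data
• The app makes use of the resulting value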
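SFSpeechRecognizer
• The API for speech recognition
• Converts audio data to text, and can also be used to judge characteristics of the speech
• From iOS 17 onward, supports customizing the language model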
Preparation 1: an AudioEngine helper

    import AVFoundation

    struct AudioEngine {
        private let audioEngine = AVAudioEngine()

        func start(
            bufferSize: AVAudioFrameCount,
            handler: @escaping (AVAudioPCMBuffer, AVAudioTime) -> Void
        ) throws {
            // Audio session settings for the speech recognition case
            let audioSession = AVAudioSession.sharedInstance()
            try audioSession.setCategory(.record, mode: .measurement, options: [])
            try audioSession.setActive(true)

            // Prepare the audio input: every captured buffer is handed to `handler`
            audioEngine.inputNode.installTap(
                onBus: 0,
                bufferSize: bufferSize,
                format: audioEngine.inputNode.outputFormat(forBus: 0),
                block: handler
            )
            audioEngine.prepare()
            try audioEngine.start()
        }
    }
Create the speech recognition instance and request

    import Speech

    class Transcriptor: ObservableObject {
        // The instance that performs speech recognition
        private let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "ja-JP"))!
        // For file input rather than real-time input, use SFSpeechURLRecognitionRequest
        private let request = SFSpeechAudioBufferRecognitionRequest()
        private let audioEngine = AudioEngine()
        // Publishes the recognition result to the consumer
        @Published private(set) var bestTranscription: SFTranscription?

        init() {
            // Set up the request
            if speechRecognizer.supportsOnDeviceRecognition {
                // On-device: no usage limits, and privacy is preserved
                // Server: usage limits apply and data leaves the device, but accuracy is higher
                request.requiresOnDeviceRecognition = true
            }
            request.shouldReportPartialResults = true
        }
    }
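Start recording

    // Method of Transcriptor. TranscriptorError is not shown on the slides;
    // it is assumed to be an app-defined Error type.
    func startRecording() async throws {
        // Check that recognition is authorized and available
        guard
            case .authorized = await withCheckedContinuation({
                SFSpeechRecognizer.requestAuthorization($0.resume(returning:))
            }),
            speechRecognizer.isAvailable
        else {
            throw TranscriptorError.unavailable
        }

        // Start the audio engine
        try audioEngine.start(bufferSize: 2048) { buffer, _ in
            // Append each buffer-sized chunk of data to the recognition request
            // (explicit `self` is required inside an escaping closure)
            self.request.append(buffer)
        }

        // Start recognition
        speechRecognizer.recognitionTask(with: request) { result, _ in
            DispatchQueue.main.async {
                self.bestTranscription = result?.bestTranscription
            }
        }
    }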
DEMO: display the recognized speech

    import SwiftUI

    struct ContentView: View {
        @StateObject private var transcriptor = Transcriptor()

        private var recognizedText: String? {
            // Get the formatted string from the speech recognition result
            transcriptor.bestTranscription?.formattedString
        }

        var body: some View {
            VStack {
                if let recognizedText {
                    Text(recognizedText)
                }
                // RecordingButton is a custom view that is not shown on the slides
                RecordingButton {
                    try await transcriptor.startRecording()
                }
            }
        }
    }
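Changing the subject…
• The LayerX mobile team has only just been started
• I am the only engineer whose main focus is iOS
• There is nobody who comes along to take photos for me
• So I have been building a selfie app that releases the shutter on "はい、チーズ!" ("Say cheese!")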
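The selfie-app idea above comes down to watching the transcript for a trigger phrase. A minimal sketch of that check, assuming the app receives the recognizer's formatted string. `containsTrigger` is a hypothetical helper that does not appear in the talk, and stripping punctuation and whitespace before comparing is an assumption about how forgiving the match should be.

```swift
import Foundation

/// Hypothetical helper (not from the talk): reports whether a recognized
/// transcript contains a trigger phrase. Punctuation and whitespace are
/// removed from both strings first, since a recognizer may or may not insert them.
func containsTrigger(_ phrase: String, in transcript: String) -> Bool {
    let ignored = CharacterSet.punctuationCharacters.union(.whitespacesAndNewlines)

    func normalize(_ s: String) -> String {
        String(String.UnicodeScalarView(s.unicodeScalars.filter { !ignored.contains($0) }))
    }

    let needle = normalize(phrase)
    // An empty phrase never triggers.
    return !needle.isEmpty && normalize(transcript).contains(needle)
}
```

For example, `containsTrigger("はい、チーズ", in: recognizedText)` could gate the shutter each time a partial result arrives.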
Hiring information
Recruiting portal: https://jobs.layerx.co.jp/
Feedback through speech
Speech synthesis (Text to Speech)
• Technology that converts written text into speech
• Used as an accessibility tool and as a feedback interface for users
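AVSpeechSynthesizer
• The iOS API for converting text to speech
• Takes plain text or data in SSML format as input
  ※ SSML: Speech Synthesis Markup Language
• The spoken language, speed and so on can be controlled
• Supports Personal Voice from iOS 17 (English only)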
Speaking with AVSpeechSynthesizer
• Set the data to speak on an AVSpeechUtterance
• The various properties can be set even without using SSML
• Built-in voices can also be used by setting an AVSpeechSynthesisVoice

    import AVFoundation

    // Create an utterance from plain text (`text` is the String to speak)
    let utterance = AVSpeechUtterance(string: text)
    utterance.prefersAssistiveTechnologySettings = true // inherit the assistive technology settings
    utterance.rate = 0.5            // speed (0 ~ 1)
    utterance.pitchMultiplier = 1   // pitch (0.5 ~ 2)
    utterance.volume = 1            // volume (0 ~ 1)

    // Create an utterance from SSML.
    // This initializer is failable, so `ssmlUtterance` is optional.
    // The sample sentence means "Hello, I'm Tamanegi!"
    let ssml = """
    <speak>
    <prosody rate="fast" pitch="+2st" volume="loud">
    こんにちは、たまねぎです！
    </prosody>
    </speak>
    """
    let ssmlUtterance = AVSpeechUtterance(ssmlRepresentation: ssml)

    // Set the voice (three alternative ways)
    utterance.voice = .init(language: "ja-JP")
    utterance.voice = .init(identifier: AVSpeechSynthesisVoiceIdentifierAlex)
    utterance.voice = AVSpeechSynthesisVoice.speechVoices().randomElement()
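Because SSML is XML, text typed by a user needs escaping before it is placed inside `<prosody>`. The following is a sketch only: `xmlEscaped` and `makeSSML` are hypothetical helpers that are not part of AVFoundation, and the default attribute values are ordinary SSML labels chosen for illustration. Which SSML attributes a given voice honours is not covered in the talk.

```swift
import Foundation

/// Escapes the five characters that are reserved in XML text and attribute values.
func xmlEscaped(_ text: String) -> String {
    var out = ""
    for ch in text {
        switch ch {
        case "&": out += "&amp;"
        case "<": out += "&lt;"
        case ">": out += "&gt;"
        case "\"": out += "&quot;"
        case "'": out += "&apos;"
        default: out.append(ch)
        }
    }
    return out
}

/// Hypothetical helper: wraps plain text in the <speak>/<prosody> structure
/// used on the slide. The defaults are assumptions made for this example.
func makeSSML(
    text: String,
    rate: String = "medium",
    pitch: String = "medium",
    volume: String = "medium"
) -> String {
    """
    <speak>
    <prosody rate="\(xmlEscaped(rate))" pitch="\(xmlEscaped(pitch))" volume="\(xmlEscaped(volume))">
    \(xmlEscaped(text))
    </prosody>
    </speak>
    """
}
```

The resulting string could then be passed to `AVSpeechUtterance(ssmlRepresentation:)`, as on the slide.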
Speaking with AVSpeechSynthesizer
• Pass an AVSpeechUtterance to AVSpeechSynthesizer to speak it
• Fine-grained control is also possible through the control APIs

    // Create the AVSpeechSynthesizer
    let synthesizer = AVSpeechSynthesizer()

    // Play
    synthesizer.speak(utterance)

    // Pause (immediately, or at the next word boundary)
    synthesizer.pauseSpeaking(at: .immediate)
    synthesizer.pauseSpeaking(at: .word)

    // Resume
    synthesizer.continueSpeaking()

    // Stop (immediately, or at the next word boundary)
    synthesizer.stopSpeaking(at: .immediate)
    synthesizer.stopSpeaking(at: .word)
DEMO: speak the text that was entered

    import AVFoundation
    import Combine

    class SpeechSynthesizer: ObservableObject {
        // The default text means "Hello, I'm Tamanegi!"
        @Published var text = "こんにちは、たまねぎです！"
        @Published var selectedVoice = AVSpeechSynthesisVoice
            .speechVoices()
            .first { $0.language == "ja-JP" }!
        @Published var rate: Float = 0.5
        @Published var pitchMultiplier: Float = 1
        @Published var volume: Float = 1

        private let synthesizer: AVSpeechSynthesizer = {
            let s = AVSpeechSynthesizer()
            s.usesApplicationAudioSession = false
            return s
        }()

        // All voices available for selection
        var voices: [AVSpeechSynthesisVoice] {
            AVSpeechSynthesisVoice.speechVoices()
        }

        // Build an utterance from the current settings and speak it
        func speak() {
            let utterance = AVSpeechUtterance(string: text)
            utterance.voice = selectedVoice
            utterance.rate = rate
            utterance.pitchMultiplier = pitchMultiplier
            utterance.volume = volume
            synthesizer.speak(utterance)
        }
    }
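The demo binds rate, pitch and volume to user input, so it can help to keep the values inside the ranges quoted on the earlier slide (rate 0 to 1, pitch 0.5 to 2, volume 0 to 1) before they reach the utterance. A small sketch under that assumption; `SpeechParameters` is a hypothetical type that is not part of the talk or of AVFoundation.

```swift
/// Hypothetical value type (not from the talk) that clamps speech parameters
/// to the ranges quoted on the slide: rate 0...1, pitch 0.5...2, volume 0...1.
struct SpeechParameters: Equatable {
    let rate: Float
    let pitchMultiplier: Float
    let volume: Float

    init(rate: Float = 0.5, pitchMultiplier: Float = 1, volume: Float = 1) {
        self.rate = Self.clamp(rate, to: 0...1)
        self.pitchMultiplier = Self.clamp(pitchMultiplier, to: 0.5...2)
        self.volume = Self.clamp(volume, to: 0...1)
    }

    /// Returns `value` limited to the bounds of `range`.
    private static func clamp(_ value: Float, to range: ClosedRange<Float>) -> Float {
        min(max(value, range.lowerBound), range.upperBound)
    }
}
```

A `speak()` implementation could build one of these from the slider values and copy its three fields onto the `AVSpeechUtterance`.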
DEMO: a conversation by voice
SFSpeechRecognizer
AVSpeechSynthesizer
OpenAI API
Audio matching
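Audio matching
• Technology for finding data that matches a sound
• Audio is captured from the device microphone
• Identifies the target content itself and its playback position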
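ShazamKit
• Searching the music catalog
  • Identifies similar audio from an acoustic signature
• Creating custom catalogs
  • Build your own audio database and match audio against it
• Playlist management
  • Sync recognized songs to the library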
Identifying a song
• (Beforehand) add the App Service in the Developer portal
• Create an SHSession
• Feed the audio data into the session
• Get the results that matched the catalog data

    import ShazamKit

    class Matcher: ObservableObject {
        @Published private(set) var matchedItem: SHMatchedMediaItem?
        private let audioEngine = AudioEngine()

        func startMatching() async throws {
            let session = SHSession()

            // Prepare real-time audio matching
            try audioEngine.start(bufferSize: 2048) { buffer, audioTime in
                session.matchStreamingBuffer(buffer, at: audioTime)
            }

            // Receive the recognized media items
            for await case .match(let match) in session.results {
                await MainActor.run {
                    matchedItem = match.mediaItems.first
                }
            }
        }
    }
Properties of SHMatchedMediaItem

    func explore(_ mediaItem: SHMatchedMediaItem) {
        mediaItem.title                       // title
        mediaItem.subtitle                    // subtitle
        mediaItem.artist                      // artist name
        mediaItem.artworkURL                  // artwork URL
        mediaItem.genres                      // array of genres
        mediaItem.timeRanges                  // time ranges
        mediaItem.matchOffset                 // position of the match
        mediaItem.predictedCurrentMatchOffset // prediction of the current match position
        mediaItem.webURL                      // link to the Shazam catalog page
        mediaItem.appleMusicID                // Apple Music ID
        mediaItem.appleMusicURL               // link to the Apple Music page
        mediaItem.songs                       // MusicKit Song objects
        // etc…
    }
Using SHMatchedMediaItem

    // Use MusicKit to control playback of the song
    import MusicKit
    import ShazamKit

    func play(_ mediaItem: SHMatchedMediaItem) async throws {
        // Do nothing unless access to Apple Music is authorized
        guard case .authorized = await MusicAuthorization.request() else { return }
        // Refer to the Apple Music related properties of SHMatchedMediaItem
        SystemMusicPlayer.shared.queue = .init(for: mediaItem.songs)
        try await SystemMusicPlayer.shared.play()
    }
DEMO: identify the audio that was heard

    import SwiftUI

    struct ContentView: View {
        @StateObject private var matcher = Matcher()

        var body: some View {
            VStack(spacing: 56) {
                // MatchedMediaItemView, RecordingButton and matcher.isActive
                // are not shown on the slides
                if let mediaItem = matcher.matchedItem {
                    MatchedMediaItemView(mediaItem)
                } else if matcher.isActive {
                    MatchedMediaItemView.loading()
                }
                RecordingButton {
                    try? await matcher.startMatching()
                }
            }.padding()
        }
    }
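Custom catalogs
• Can be created with the Shazam CLI
• Create a Signature (.shazamsignature) from the audio source
• Prepare a file (.csv) containing arbitrary metadata
• Link the Signature and the CSV to create a catalog (.shazamcatalog)
• Can be used to build experiences based on audio matching and offsets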
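One way to picture an experience built on the match offset is to look up which piece of content belongs to the current position in the reference audio. This is a sketch under assumptions: `Cue` and `activeCue` are hypothetical names that do not appear in the talk or in ShazamKit, and the offset is assumed to be a value in seconds such as the one read from `predictedCurrentMatchOffset`.

```swift
import Foundation

/// Hypothetical cue: content that should appear from `start` seconds
/// into the reference audio (for example a caption or a scene label).
struct Cue: Equatable {
    let start: TimeInterval
    let text: String
}

/// Returns the last cue whose start time is at or before `offset`,
/// or nil if no cue has started yet. `cues` must be sorted by ascending `start`.
func activeCue(at offset: TimeInterval, in cues: [Cue]) -> Cue? {
    var low = 0
    var high = cues.count // the search window is [low, high)
    while low < high {
        let mid = (low + high) / 2
        if cues[mid].start <= offset {
            low = mid + 1
        } else {
            high = mid
        }
    }
    // `low` is now the number of cues that have already started.
    return low > 0 ? cues[low - 1] : nil
}
```

Calling `activeCue(at:in:)` each time a match arrives would keep the displayed content in step with the audio playing in the room.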
Sound classification
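Sound classification
• Identifies patterns in audio data
• Technology based on machine learning models
• Sorts the kind of sound source into specific categories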
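SoundAnalysis
• The framework for sound classification
• Runs on device
• From iOS 15 onward a built-in model can be used
• Recognizes characteristic patterns in the captured sound and classifies it into 300 categories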
SNAudioStreamAnalyzer

    import AVFoundation
    import SoundAnalysis

    class SoundAnalyzer: NSObject, ObservableObject {
        @Published private(set) var result: SNClassificationResult?
        private let audioEngine = AudioEngine()

        func startAnalyze() throws {
            // Create an SNClassifySoundRequest
            let request = try SNClassifySoundRequest(classifierIdentifier: .version1)

            // For file input, use SNAudioFileAnalyzer instead.
            // `audioEngine.format` is not part of the helper shown earlier; it is
            // assumed to expose the input node's output format.
            let analyzer = SNAudioStreamAnalyzer(format: audioEngine.format)
            try analyzer.add(request, withObserver: self)

            try audioEngine.start(bufferSize: 2048) { buffer, time in
                // Feed the audio data into the analyzer
                analyzer.analyze(buffer, atAudioFramePosition: time.sampleTime)
            }
        }
    }
SNAudioStreamAnalyzer

    extension SoundAnalyzer: SNResultsObserving {
        // Recognition results are delivered here
        func request(_ request: SNRequest, didProduce result: SNResult) {
            DispatchQueue.main.async {
                self.result = result as? SNClassificationResult
                self.result?.classifications.first?.identifier // label of the recognized sound
                self.result?.classifications.first?.confidence // confidence of the recognized sound
            }
        }
    }
DEMO: identify the instrument that is being played

    struct ContentView: View {
        // …
        @StateObject private var soundAnalyzer = SoundAnalyzer()

        var body: some View {
            ZStack(alignment: .bottom) {
                ScrollView {
                    LazyVStack(spacing: 0) {
                        // …
                        BandImage(soundAnalyzer.result)
                    }
                }
                RecordingButton {
                    // The slide calls analyze(); the method defined on SoundAnalyzer is startAnalyze()
                    try? soundAnalyzer.startAnalyze()
                }.padding(.vertical)
            }
        }
    }
DEMO: filter a video by "laughter" and "crying"
URL of the video file
SNAudioFileAnalyzer
confidence >= 0.9
identifier == "laughter"
CMTimeRange
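The filtering step in this demo can be sketched independently of the framework. `Detection` is a hypothetical stand-in for one classification of one analysis window (the talk's own code for this demo is not shown), the 0.9 threshold and the "laughter" identifier come from the slide, and merging overlapping windows into continuous ranges is an assumption about how the time ranges are produced.

```swift
import Foundation

/// Hypothetical, framework-free stand-in for one classification of one analysis window.
struct Detection {
    let identifier: String
    let confidence: Double
    let start: TimeInterval
    let duration: TimeInterval // assumed to be non-negative
    var end: TimeInterval { start + duration }
}

/// Keeps the windows labelled `identifier` with confidence >= `threshold`,
/// then merges overlapping or touching windows into continuous ranges.
func ranges(
    of identifier: String,
    in detections: [Detection],
    threshold: Double = 0.9
) -> [ClosedRange<TimeInterval>] {
    let hits = detections
        .filter { $0.identifier == identifier && $0.confidence >= threshold }
        .sorted { $0.start < $1.start }

    var merged: [ClosedRange<TimeInterval>] = []
    for hit in hits {
        if let last = merged.last, hit.start <= last.upperBound {
            // Extend the previous range instead of starting a new one.
            merged[merged.count - 1] = last.lowerBound...max(last.upperBound, hit.end)
        } else {
            merged.append(hit.start...hit.end)
        }
    }
    return merged
}
```

Each resulting range could then be converted to a `CMTimeRange` to cut or seek the video.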
Summary
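Summary
• SFSpeechRecognizer: spoken-language recognition
  • Transcripts, and actions triggered by language
• AVSpeechSynthesizer: speech
  • Feedback through audio output
• ShazamKit: matching
  • A catalog has to be prepared in advance
  • The audio does not need to be language, and the offset can also be put to use
• SoundAnalysis: classification
  • The built-in model can be used right away
  • A custom model can be prepared to build in arbitrary classifications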
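Summary
• The standard features alone can deliver a wide variety of experiences
• There are several more features that I could not cover
• The places to use them are limited, but used well they can provide unique experiences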
Thank you very much!