
Say It Ain't So: Implementing Speech Recognition in your app

Marc Brown
September 01, 2016


SiriKit was one of the more talked-about features announced at WWDC this year; unfortunately, its initial implementation is limited to a small number of use cases. But all is not lost! Apple introduced a collection of general-purpose Speech APIs in iOS 10 that provide simple speech-to-text conversion from streaming voice or audio files in over 50 languages.

In this talk I walk through the new APIs, discuss their limitations, and end with a practical use case: adding speech recognition to a text-based search app.

Presented @ try! Swift NYC: http://www.tryswiftnyc.com


Transcript

  1-4. A (Brief) History of Speech Recognition (build slides)
     1950s: Numbers + 10 words
     1960s: A few hundred words
     1970s: 1k words, Beam Search
     1980s: 20k words, HMM
     1990s: > Average human vocabulary
     2000s: LSTM, Smartphones
  5. "Hey Siri, send Rivers Cuomo 5 dollars on WeezerPay for recording the Blue Album"
     Domain: Payments
     Intent: sendPayment
     App: WeezerPay
     Payee: Rivers Cuomo
     Amount: 5
     Currency: USD
     Note: Recording the Blue Album
  6. Supported Domains
     1. Messaging
     2. VoIP calling
     3. Payments
     4. Workouts
     5. Ride booking
     6. Photo search
  7-8. Speech Framework (build slides)
     → Siri speech-to-text
     → Live audio or audio file
     → Recommended + alternative transcriptions
     → 50+ languages & dialects
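The slides that follow cover the live-audio path; for the audio-file path mentioned here, a minimal sketch might look like the following (the function name and `fileURL` parameter are illustrative, not from the deck):

```swift
import Speech

// Recognize speech from a pre-recorded audio file.
// `fileURL` is a hypothetical local recording in any format AVFoundation can read.
func transcribe(fileURL: URL) {
    guard let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US")),
          recognizer.isAvailable else { return }

    let request = SFSpeechURLRecognitionRequest(url: fileURL)
    recognizer.recognitionTask(with: request) { result, error in
        if let result = result, result.isFinal {
            // The recommended transcription; alternatives are in result.transcriptions.
            print(result.bestTranscription.formattedString)
        } else if let error = error {
            print("Recognition failed: \(error)")
        }
    }
}
```

`SFSpeechURLRecognitionRequest` is the file-based counterpart of the buffer-based request used for live audio later in the deck.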
  9. Speech Recognizer

     // Specific language
     private let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))

     // Native language
     private let speechRecognizer = SFSpeechRecognizer(locale: Locale.current)
  10. SFSpeechRecognizer.supportedLocales()

     ar-SA, ca-ES, cs-CZ, da-DK, de-AT, de-CH, de-DE, el-GR, en-AE, en-AU, en-CA, en-GB, en-ID, en-IE, en-IN, en-NZ, en-PH, en-SA, en-SG, en-US, en-ZA, es-CL, es-CO, es-ES, es-MX, es-US, fi-FI, fr-BE, fr-CA, fr-CH, fr-FR, he-IL, hr-HR, hu-HU, id-ID, it-CH, it-IT, ja-JP, ko-KR, ms-MY, nb-NO, nl-BE, nl-NL, pl-PL, pt-BR, pt-PT, ro-RO, ru-RU, sk-SK, sv-SE, th-TH, tr-TR, uk-UA, vi-VN, yue-CN, zh-CN, zh-HK, zh-TW
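One prerequisite the slides assume: speech recognition requires explicit user permission on iOS 10, via `SFSpeechRecognizer.requestAuthorization` and an `NSSpeechRecognitionUsageDescription` string in Info.plist (plus `NSMicrophoneUsageDescription` for live audio). A minimal sketch:

```swift
import Speech

// Ask the user for permission before creating any recognition task.
SFSpeechRecognizer.requestAuthorization { status in
    switch status {
    case .authorized:
        print("Ready to transcribe")
    default:
        // .denied, .restricted, or .notDetermined
        print("Speech recognition not authorized")
    }
}

// Optionally confirm the device locale is supported before creating a recognizer:
let isSupported = SFSpeechRecognizer.supportedLocales().contains(Locale.current)
```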
  11. Recognition Task

     private var recognitionTask: SFSpeechRecognitionTask?
     ...
     recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest) { result, error in
         var isFinal = false
         if let result = result {
             print(result.bestTranscription.formattedString)
             isFinal = result.isFinal
         }
         if error != nil || isFinal {
             // 60 sec limit reached
         }
     }
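The slide uses a `recognitionRequest` without showing its creation. For live audio it is a `SFSpeechAudioBufferRecognitionRequest`; a sketch of the setup (the commented calls refer to steps shown on neighboring slides):

```swift
import Speech

// Create the live-audio request that the recognition task consumes.
// shouldReportPartialResults = true delivers interim transcriptions while the user speaks.
let recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
recognitionRequest.shouldReportPartialResults = true

// Each microphone buffer is handed to the request (see the audio tap on the next slide):
// recognitionRequest.append(buffer)

// When input ends, signal that no more audio is coming so the task can finalize:
// recognitionRequest.endAudio()
```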
  12. Capture Audio Stream

     private let audioEngine = AVAudioEngine()
     ...
     let recordingFormat = audioEngine.inputNode.outputFormat(forBus: 0)
     audioEngine.inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) {
         (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
         self.recognitionRequest?.append(buffer)
     }
     audioEngine.prepare()
     try audioEngine.start()
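The deck shows starting the stream but not stopping it. A teardown sketch, reusing the `audioEngine`, `recognitionRequest`, and `recognitionTask` properties from the slides (the function name is illustrative):

```swift
// Tear down a live session: stop the engine, remove the tap,
// and signal end of audio so the recognizer can finalize its result.
func stopRecording() {
    audioEngine.stop()
    audioEngine.inputNode.removeTap(onBus: 0)
    recognitionRequest?.endAudio()
    recognitionTask?.cancel()
    recognitionTask = nil
}
```

Removing the tap matters: installing a second tap on the same bus without removing the first raises a runtime exception.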
  13. SFSpeechRecognizerDelegate

     import Speech

     public class ViewController: UIViewController, SFSpeechRecognizerDelegate {
         ...
         speechRecognizer?.delegate = self
         ...
         public func speechRecognizer(_ speechRecognizer: SFSpeechRecognizer,
                                      availabilityDidChange available: Bool) {
             ...
         }
     }
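A typical use of this delegate callback is toggling recording UI, since availability drops when the device loses network access or Siri is disabled. A sketch, assuming a hypothetical `recordButton` outlet on the view controller:

```swift
// `recordButton` is a hypothetical UIButton; disable it whenever
// the recognizer reports it can no longer service requests.
public func speechRecognizer(_ speechRecognizer: SFSpeechRecognizer,
                             availabilityDidChange available: Bool) {
    recordButton.isEnabled = available
    if !available {
        recordButton.setTitle("Recognition Unavailable", for: .disabled)
    }
}
```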
  14. UX Considerations
     → Choose your context wisely
     → Recording message
     → International language support