
Voice-enabled iOS Apps

Sebastian
December 20, 2018


Describes voice interactions in iOS apps with AVAudioRecorder, AVAudioEngine, the Speech framework, SiriKit, voice commands for Siri Shortcuts, and AVSpeechSynthesizer.


Transcript

  1. I DON’T KNOW WHAT YOU
    MEAN BY ‘…’. HOW ABOUT A
    WEB SEARCH FOR IT.
    Siri
    VOICE ASSISTANT REALITY


  2. VOICE-ENABLED APPS
    SEBASTIAN MESSINGFELD
    https://github.com/messeb/ios-voice-interaction (in progress)


  3. VOICE INTERACTIONS
    CURRENT USE CASES
    ▸ Siri interaction
    ▸ Integration in iOS with system apps
    ▸ Text dictation
    ▸ Telephony: recording of voice messages
    ▸ VoiceOver


  4. VOICE USER INTERFACES
    CHALLENGES
    ▸ Voice-only user interfaces are hidden
    ▸ Voice inputs are public
    ▸ Many utterances describe the same thing
    ▸ Everyone says it differently
    ▸ Follow-up queries for all required values


  5. VOICE USER INTERFACE - HOW IT WORKS
    VOICE ASSISTANT → SPEECH RECOGNITION → DOMAIN SELECTION → INTENT SELECTION → ENTITY EXTRACTION → FULFILLMENT
    ▸ Voice assistants: Siri, Alexa, Google Assistant, Cortana, Bixby, Magenta
    ▸ Domain selection, intent selection & entity extraction: Natural Language Understanding
    ▸ Fulfillment: content, logic, app selection


  6. VOICE INPUT INTERFACE - EXAMPLE
    VOICE ASSISTANT → SPEECH RECOGNITION → DOMAIN SELECTION → INTENT SELECTION → ENTITY EXTRACTION → FULFILLMENT
    "What will the weather be like tomorrow in Cologne?"
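For illustration, the pipeline stages could be sketched in Swift for this example utterance. This is a toy resolver with made-up names (`ResolvedQuery`, `resolve`), not a real framework API; a production assistant uses trained NLU models for domain selection, intent selection, and entity extraction:

```swift
import Foundation

// Illustrative result type for one pass through the pipeline.
struct ResolvedQuery {
    var domain: String
    var intent: String
    var entities: [String: String]
}

// Toy resolver: simple string checks stand in for real NLU models.
func resolve(_ transcription: String) -> ResolvedQuery? {
    let text = transcription.lowercased()
    // Domain selection: is the utterance about the weather at all?
    guard text.contains("weather") else { return nil }
    // Intent selection: within the weather domain, forecast vs. current conditions.
    let intent = text.contains("tomorrow") ? "forecast" : "currentConditions"
    // Entity extraction: fill the date and location slots.
    var entities: [String: String] = [:]
    if text.contains("tomorrow") { entities["date"] = "tomorrow" }
    if text.contains("cologne") { entities["location"] = "Cologne" }
    return ResolvedQuery(domain: "weather", intent: intent, entities: entities)
}

let query = resolve("What will the weather be like tomorrow in Cologne?")
print(query?.intent ?? "no match")  // prints "forecast"
```

Fulfillment would then take the resolved intent and entities and call the actual weather service.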


  7. IOS VOICE INTERACTION
    ▸ Input
    ▸ Dictation
    ▸ AVAudioRecorder / AVAudioEngine
    ▸ Speech framework
    ▸ SiriKit
    ▸ Voice commands for Siri Shortcuts


  8. IOS VOICE INTERACTION
    ▸ Output
    ▸ VoiceOver
    ▸ AVSpeechSynthesizer class
    ▸ AVAudioPlayer class


  9. IOS CONTEXT
    VOICE INPUT INTERFACE IN THE APP CONTEXT
    VOICE ASSISTANT → SPEECH RECOGNITION → DOMAIN SELECTION → INTENT SELECTION → ENTITY EXTRACTION → FULFILLMENT
    ▸ SiriKit & Shortcuts: integrate with the full Siri pipeline
    ▸ AVAudioRecorder / AVAudioEngine: raw voice input in the app
    ▸ Speech framework: speech recognition in the app


  10. VOICE INPUT
    AVAUDIORECORDER
    ▸ High-level API (in contrast to AVAudioEngine)
    ▸ Record microphone input
    ▸ Settings for quality
    ▸ Output into file


  11. EXAMPLE
    AVAUDIORECORDER
    let audioFilename = documentsDirectory.appendingPathComponent("recording.m4a")
    let settings = [
        AVFormatIDKey: Int(kAudioFormatMPEG4AAC),
        AVSampleRateKey: 12000,
        AVNumberOfChannelsKey: 1,
        AVEncoderAudioQualityKey: AVAudioQuality.high.rawValue
    ]

    let audioRecorder = try? AVAudioRecorder(url: audioFilename, settings: settings)
    audioRecorder?.delegate = self
    audioRecorder?.record()

    // Stop recording
    audioRecorder?.stop()


  12. TEXT
    AVAUDIOENGINE
    ▸ Interaction with raw PCM audio
    ▸ Access during voice input

    private let audioEngine = AVAudioEngine()
    let inputNode = audioEngine.inputNode
    let recordingFormat = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0,
                         bufferSize: 1024,
                         format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
        // Process the raw PCM buffer here.
    }


  13. ANALYSIS
    AVAUDIORECORDER / AVAUDIOENGINE
    ▸ Subsequent analysis
    ▸ Speech2Text
    ▸ Intents
    ▸ Entities
    ▸ Solutions:
    ▸ Azure Cognitive Speech Services
    ▸ …


  14. VOICE INPUT
    SPEECH FRAMEWORK
    ▸ Speech to text
    ▸ Audio file or Microphone
    ▸ Used by iOS text dictation
    ▸ Results
    ▸ can be delivered continuously (partial results)
    ▸ Transcriptions + alternative interpretations
    ▸ Confidence levels


  15. TEXT
    SPEECH FRAMEWORK
    let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "de-DE"))
    let recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
    recognitionRequest.taskHint = .search
    recognitionRequest.shouldReportPartialResults = true

    speechRecognizer?.recognitionTask(with: recognitionRequest) { result, error in
        if let result = result {
            print(result.bestTranscription.formattedString)
        }
    }

    inputNode.installTap(onBus: 0,
                         bufferSize: 1024,
                         format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
        recognitionRequest.append(buffer)
    }


  16. VOICE INPUT
    SPEECH FRAMEWORK USAGE
    ▸ Control of input interpretation
    ▸ Usage limits
    ▸ In-App text comparison
    ▸ Intent selection & entity extraction
    ▸ External services
    ▸ DialogFlow
    ▸ CoreML model
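One way to do the in-app text comparison without an external service is a plain phrase match over the transcribed text. This is a minimal sketch with hypothetical names (`AppCommand`, `command(for:)`); DialogFlow or a CoreML model would replace it for anything non-trivial:

```swift
import Foundation

// The commands this hypothetical app understands.
enum AppCommand {
    case startRecording, stopRecording, playback
}

// Several utterances per command, since everyone says it differently.
let phrases: [AppCommand: [String]] = [
    .startRecording: ["start recording", "record", "begin recording"],
    .stopRecording: ["stop recording", "stop", "end recording"],
    .playback: ["play it back", "playback", "play the recording"]
]

// Map a transcription to a command by substring matching.
func command(for transcription: String) -> AppCommand? {
    let text = transcription.lowercased()
    // Try longer phrases first so "stop recording" wins over plain "record".
    let ranked = phrases
        .flatMap { cmd, variants in variants.map { (cmd, $0) } }
        .sorted { $0.1.count > $1.1.count }
    return ranked.first { text.contains($0.1) }?.0
}

if let cmd = command(for: "Please stop recording now") {
    print(cmd)  // prints "stopRecording"
}
```

The phrase lists grow quickly in practice, which is exactly why the slide points to external NLU services for real intent selection and entity extraction.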


  17. INTENT & ENTITY EXTRACTION
    DIALOGFLOW
    ▸ Service to create conversational interfaces
    ▸ Custom interface with intents & entities
    ▸ Supports 14 languages
    ▸ Supports training of conversational model
    ▸ Interaction history


  18. INTENT & ENTITY EXTRACTION
    DIALOGFLOW


  19. INTENT HANDLER
    SIRIKIT
    ▸ Predefined intents for specific services
    ▸ Restaurant, Payments, Ride Booking, …
    ▸ No direct interaction with the user's voice or text input
    ▸ Added as extension
    ▸ Intents extension: handling requests from SiriKit
    ▸ Intents UI extension: displaying custom content in Siri


  20. INTENT HANDLER
    SIRIKIT
    class IntentHandler: INExtension {
        override func handler(for intent: INIntent) -> Any? {
            if intent is INRequestRideIntent {
                return VIRideIntentHandler()
            }
            if intent is INSendPaymentIntent {
                return VISendPaymentIntentHandler()
            }
            if intent is INGetVisualCodeIntent {
                return VIGetVisualCodeIntentHandler()
            }
            return .none
        }
    }


  21. ENTITY EXTRACTION
    SIRIKIT
    ▸ Predefined Entities
    ▸ Ride: pickupLocation, dropOffLocation,
    scheduledPickupTime, partySize, paymentMethod
    ▸ "Callback" functions for entities
    ▸ .needsValue(), .notRequired(), .unsupported()
    ▸ .success(), .confirmationRequired(), .unsupported()


  22. ENTITY EXTRACTION
    SIRIKIT
    func resolvePartySize(for intent: INRequestRideIntent,
                          with completion: @escaping (INIntegerResolutionResult) -> Void) {
        guard let partySize = intent.partySize else {
            completion(.needsValue())
            return
        }
        switch partySize {
        case 1...4:
            completion(.success(with: partySize))
        case 5...8:
            completion(.confirmationRequired(with: partySize))
        default:
            completion(.unsupported())
        }
    }


  23. SIRIKIT
    ▸ No customisation of the voice interface possible
    ▸ Custom vocabulary:
    ▸ App name
    ▸ Intent names
    ▸ Custom terms of the app


  24. TEXT
    SIRI SHORTCUTS
    ▸ Shortcuts for already performed actions
    ▸ App-specific
    ▸ Intent & entities are donated to Siri Shortcuts
    ▸ Shortcuts can be executed by voice


  25. TEXT
    SIRI SHORTCUTS
    ▸ Added by intent definition files
    ▸ Shortcut has a category


  26. DONATION
    SIRI SHORTCUTS
    ▸ Donation via NSUserActivity or Intent
    ▸ NSUserActivity: forwards to the app
    ▸ Intent: handled inside Siri

    let intent = PizzaOrderIntent()
    intent.pizza = "Tonno"
    intent.suggestedInvocationPhrase = "Order a Tonno pizza"

    let interaction = INInteraction(intent: intent, response: nil)
    interaction.donate { _ in
        // Handle donation errors here.
    }


  27. VOICE
    SIRI SHORTCUTS
    ▸ Voice commands connect to shortcuts in the app
    ▸ Suggested invocation phrase
    ▸ Custom responses possible


  28. TEXT
    AVSPEECHSYNTHESIZER
    ▸ Generate speech output in app
    ▸ Siri voices (AVSpeechSynthesisVoice.speechVoices())
    ▸ Control and monitoring of ongoing speech
    ▸ String & NSAttributedString
    ▸ Custom pronunciation
    ▸ International Phonetic Alphabet (IPA)


  29. TEXT
    AVSPEECHSYNTHESIZER
    let synthesizer = AVSpeechSynthesizer()
    synthesizer.delegate = self

    let utterance = AVSpeechUtterance(string: "...")
    utterance.voice = AVSpeechSynthesisVoice(language: "de-DE")
    synthesizer.speak(utterance)

    // Delegate (e.g. on the owning view controller):
    extension ViewController: AVSpeechSynthesizerDelegate {
        func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                               willSpeakRangeOfSpeechString characterRange: NSRange,
                               utterance: AVSpeechUtterance) {
            // Called for each range before it is spoken.
        }
    }


  30. TEXT
    CONCLUSION
    ▸ Different approach than Alexa, Google Assistant
    ▸ Custom conversational interface
    ▸ Apple's voice interface is more predictable for the user
    ▸ SiriKit: same interface for equal services
    ▸ Shortcuts: own voice commands
    ▸ Speech / AVAudioEngine: in app with additional GUI
    ▸ Are Siri Shortcuts & SiriKit the next 3D Touch?
