Slide 1

Slide 1 text

VOICE ASSISTANT REALITY Siri: “I don’t know what you mean by ‘…’. How about a web search for it?”

Slide 2

Slide 2 text

VOICE-ENABLED APPS SEBASTIAN MESSINGFELD https://github.com/messeb/ios-voice-interaction (in progress)

Slide 3

Slide 3 text

VOICE INTERACTIONS CURRENT USE CASES ▸ Siri interaction ▸ Integration in iOS with system apps ▸ Text dictation ▸ Telephony ▸ Recording of voice messages ▸ VoiceOver

Slide 4

Slide 4 text

VOICE USER INTERFACES CHALLENGES ▸ Voice-only user interfaces are hidden ▸ Voice inputs are public ▸ Many utterances to describe the same thing ▸ Everyone says it differently ▸ Queries for all needed values

Slide 5

Slide 5 text

VOICE USER INTERFACE - HOW IT WORKS VOICE ASSISTANT SPEECH RECOGNITION DOMAIN SELECTION INTENT SELECTION ENTITY EXTRACTION FULFILLMENT Siri Alexa Google Assistant Cortana Bixby Magenta Natural Language Understanding Content Logic App Selection

Slide 6

Slide 6 text

VOICE INPUT INTERFACE - EXAMPLE VOICE ASSISTANT SPEECH RECOGNITION DOMAIN SELECTION INTENT SELECTION ENTITY EXTRACTION FULFILLMENT How will the weather be tomorrow in Cologne?
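The stages after speech recognition can be sketched in a few lines of Swift. This is only an illustration with naive keyword matching; the types `Domain`, `Intent` and `ParsedQuery` and the function `parse` are hypothetical names, not part of any Apple framework, and a real assistant would use a trained language-understanding model instead.

```swift
import Foundation

// Hypothetical types for illustration only.
enum Domain { case weather, unknown }
enum Intent { case forecast, unknown }

struct ParsedQuery: Equatable {
    var domain: Domain
    var intent: Intent
    var entities: [String: String]
}

/// Takes a transcript (the output of speech recognition) and walks it
/// through domain selection, intent selection and entity extraction.
func parse(_ transcript: String) -> ParsedQuery {
    // Split into lowercase words, dropping punctuation.
    let words = transcript.lowercased()
        .components(separatedBy: CharacterSet.alphanumerics.inverted)
        .filter { !$0.isEmpty }

    // Domain selection: which area does the query belong to?
    let domain: Domain = words.contains("weather") ? .weather : .unknown

    // Intent selection: what does the user want within that domain?
    let intent: Intent = (domain == .weather) ? .forecast : .unknown

    // Entity extraction: collect the values needed for fulfillment.
    var entities: [String: String] = [:]
    if words.contains("tomorrow") {
        entities["date"] = "tomorrow"
    }
    if let index = words.firstIndex(of: "in"), index + 1 < words.count {
        entities["location"] = words[index + 1]
    }
    return ParsedQuery(domain: domain, intent: intent, entities: entities)
}

let query = parse("How will the weather be tomorrow in Cologne?")
```

Fulfillment would then take `query.intent` and `query.entities` and call the matching content logic, for example a weather service.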

Slide 7

Slide 7 text

IOS VOICE INTERACTION ▸ Input ▸ Dictation ▸ AVAudioRecorder / AVAudioEngine ▸ Speech framework ▸ SiriKit ▸ Voice command for Siri Shortcut

Slide 8

Slide 8 text

IOS VOICE INTERACTION ▸ Output ▸ VoiceOver ▸ AVSpeechSynthesizer class ▸ AVAudioPlayer class

Slide 9

Slide 9 text

IOS CONTEXT APP CONTEXT VOICE INPUT INTERFACE VOICE ASSISTANT SPEECH RECOGNITION DOMAIN SELECTION INTENT SELECTION ENTITY EXTRACTION FULFILLMENT SIRIKIT AVAUDIORECORDER / AVAUDIOENGINE SHORTCUTS SHORTCUTS SPEECH FRAMEWORK SHORTCUTS

Slide 10

Slide 10 text

VOICE INPUT AVAUDIORECORDER ▸ High-Level API in contrast to AVAudioEngine ▸ Record microphone input ▸ Settings for quality ▸ Output into file

Slide 11

Slide 11 text

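EXAMPLE AVAUDIORECORDER

    import AVFoundation

    // Destination file in the app's documents directory
    let documentsDirectory = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)[0]
    let audioFilename = documentsDirectory.appendingPathComponent("recording.m4a")

    // Quality settings: AAC, 12 kHz, mono, high encoder quality
    let settings = [
        AVFormatIDKey: Int(kAudioFormatMPEG4AAC),
        AVSampleRateKey: 12000,
        AVNumberOfChannelsKey: 1,
        AVEncoderAudioQualityKey: AVAudioQuality.high.rawValue
    ]

    // self must conform to AVAudioRecorderDelegate
    let audioRecorder = try? AVAudioRecorder(url: audioFilename, settings: settings)
    audioRecorder?.delegate = self
    audioRecorder?.record()

    // Stop recording
    audioRecorder?.stop()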

Slide 12

Slide 12 text

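AVAUDIOENGINE ▸ Interaction with raw PCM audio ▸ Access during voice input

    import AVFoundation

    private let audioEngine = AVAudioEngine()

    let inputNode = audioEngine.inputNode
    let recordingFormat = inputNode.outputFormat(forBus: 0)

    // The tap delivers raw PCM buffers while the engine is running
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
        // Process the buffer here
    }

    // Buffers only arrive once the engine has been started
    audioEngine.prepare()
    try audioEngine.start()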
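One thing that access to raw PCM audio makes possible is a simple input level meter. The sketch below computes the RMS level of a block of samples; `rmsLevel` and `isSilent` are hypothetical helpers, and the 0.01 silence threshold is an arbitrary assumption.

```swift
import Foundation

/// Root-mean-square level of one block of PCM samples (each in -1...1).
func rmsLevel(of samples: [Float]) -> Float {
    guard !samples.isEmpty else { return 0 }
    let sumOfSquares = samples.reduce(Float(0)) { $0 + $1 * $1 }
    return (sumOfSquares / Float(samples.count)).squareRoot()
}

/// Hypothetical silence check; the default threshold is an assumption.
func isSilent(_ samples: [Float], threshold: Float = 0.01) -> Bool {
    return rmsLevel(of: samples) < threshold
}
```

Inside the tap closure, the samples of the first channel could be read from `buffer.floatChannelData` together with `buffer.frameLength`, assuming the recording format uses 32-bit float samples.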

Slide 13

Slide 13 text

ANALYSIS AVAUDIORECORDER / AVAUDIOENGINE ▸ Subsequent analysis ▸ Speech2Text ▸ Intents ▸ Entities ▸ Solutions: ▸ Azure Cognitive Speech Services ▸ …

Slide 14

Slide 14 text

VOICE INPUT SPEECH FRAMEWORK ▸ Speech to text ▸ Audio file or microphone ▸ Used by iOS text dictation ▸ Results ▸ can be delivered continuously ▸ Transcriptions + alternative interpretations ▸ Confidence levels
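Confidence levels can be used to decide which words to double-check with the user. The sketch below works on a hypothetical `RecognizedWord` struct that stands in for the `substring` and `confidence` values of each `SFTranscriptionSegment`; the 0.5 threshold is an assumption, not a recommended value.

```swift
import Foundation

// Hypothetical stand-in for the values read from an SFTranscriptionSegment.
struct RecognizedWord {
    let text: String
    let confidence: Float   // 0.0 ... 1.0
}

/// Returns the words whose confidence is below `threshold`, so the UI can
/// highlight them or ask the user to confirm.
func uncertainWords(in words: [RecognizedWord], threshold: Float = 0.5) -> [String] {
    return words.filter { $0.confidence < threshold }.map { $0.text }
}

let words = [
    RecognizedWord(text: "Wetter", confidence: 0.9),
    RecognizedWord(text: "Köln", confidence: 0.3)
]
let toConfirm = uncertainWords(in: words)
```

In an app, the input array would be built from `result.bestTranscription.segments`; alternative interpretations are available through `result.transcriptions`.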

Slide 15

Slide 15 text

SPEECH FRAMEWORK

    import Speech

    // The initializer is failable: it returns nil for unsupported locales
    let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "de-DE"))
    let recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
    recognitionRequest.taskHint = .search
    recognitionRequest.shouldReportPartialResults = true

    let recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest) { result, error in
        // result is nil when recognition fails
        guard let result = result else { return }
        print(result.bestTranscription.formattedString)
    }

    // Feed microphone buffers from the AVAudioEngine tap into the request
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
        recognitionRequest.append(buffer)
    }

Slide 16

Slide 16 text

VOICE INPUT SPEECH FRAMEWORK USAGE ▸ Control of input interpretation ▸ Usage limits ▸ In-App text comparison ▸ Intent selection & entity extraction ▸ External services ▸ DialogFlow ▸ CoreML model
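In-app text comparison can be as simple as matching the transcript against a list of known phrases. The sketch below is a minimal version of that idea; the `VoiceCommand` cases and their phrases are hypothetical, and the matching is deliberately naive (whole-phrase match after normalization).

```swift
import Foundation

// Hypothetical command set; an app would define its own phrases.
enum VoiceCommand: String, CaseIterable {
    case play = "play music"
    case pause = "pause music"
    case next = "next song"
}

/// Lowercases the text, strips punctuation and collapses whitespace.
func normalize(_ text: String) -> String {
    return text.lowercased()
        .components(separatedBy: CharacterSet.alphanumerics.inverted)
        .filter { !$0.isEmpty }
        .joined(separator: " ")
}

/// Returns the first command whose phrase occurs as whole words in the transcript.
func matchCommand(in transcript: String) -> VoiceCommand? {
    // Padding with spaces prevents matches inside longer words.
    let padded = " " + normalize(transcript) + " "
    return VoiceCommand.allCases.first { padded.contains(" " + $0.rawValue + " ") }
}
```

This only covers utterances that contain the exact phrase. Handling the many different ways users phrase the same request is where an external service or a Core ML model comes in.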

Slide 17

Slide 17 text

INTENT & ENTITY EXTRACTION DIALOGFLOW ▸ Service to create conversational interfaces ▸ Custom interface with intents & entities ▸ Supports 14 languages ▸ Supports training of conversational model ▸ Interaction history
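To hand a transcript to such a service, the app sends it as a text query. The sketch below builds a JSON request body, assuming the layout of the Dialogflow v2 `detectIntent` REST method (`queryInput.text.text` and `queryInput.text.languageCode`); the deck does not say which API version was used, so treat the layout as an assumption and check it against the current Dialogflow documentation. Endpoint, project and authentication are left out.

```swift
import Foundation

// Request body for a text query; field names assume the v2 detectIntent layout.
struct DetectIntentRequest: Codable {
    struct QueryInput: Codable {
        struct TextInput: Codable {
            let text: String
            let languageCode: String
        }
        let text: TextInput
    }
    let queryInput: QueryInput
}

/// Encodes the transcript as a JSON body with stable key order.
func makeDetectIntentBody(for transcript: String, languageCode: String) throws -> Data {
    let input = DetectIntentRequest.QueryInput.TextInput(text: transcript, languageCode: languageCode)
    let request = DetectIntentRequest(queryInput: .init(text: input))
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]
    return try encoder.encode(request)
}
```

The response would carry the matched intent and the extracted entities, which the app can then map to its own fulfillment logic.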

Slide 18

Slide 18 text

INTENT & ENTITY EXTRACTION DIALOGFLOW

Slide 19

Slide 19 text

INTENT HANDLER SIRIKIT ▸ Predefined intents for specific services ▸ Restaurant, Payments, Ride Booking, … ▸ No interaction with user voice or text input ▸ Added as extension ▸ Intents: Handling request from SiriKit ▸ Intents UI: Displaying

Slide 20

Slide 20 text

INTENT HANDLER SIRIKIT

    import Intents

    class IntentHandler: INExtension {
        // Returns the object that handles the given intent, or nil if unsupported
        override func handler(for intent: INIntent) -> Any? {
            if intent is INRequestRideIntent {
                return VIRideIntentHandler()
            }
            if intent is INSendPaymentIntent {
                return VISendPaymentIntentHandler()
            }
            if intent is INGetVisualCodeIntent {
                return VIGetVisualCodeIntentHandler()
            }
            return nil
        }
    }

Slide 21

Slide 21 text

ENTITY EXTRACTION SIRIKIT ▸ Predefined entities ▸ Ride: pickupLocation, dropOffLocation, scheduledPickupTime, partySize, paymentMethod ▸ "Callback" function for entities ▸ .needsValue(), .notRequired(), .unsupported() ▸ .success(), .unsupported()

Slide 22

Slide 22 text

ENTITY EXTRACTION SIRIKIT

    func resolvePartySize(for intent: INRequestRideIntent, with completion: @escaping (INIntegerResolutionResult) -> Void) {
        // No value given: Siri asks the user for it
        guard let partySize = intent.partySize else {
            completion(.needsValue())
            return
        }
        switch partySize {
        case 1...4:
            completion(.success(with: partySize))
        case 5...8:
            // Larger groups must be confirmed by the user
            completion(.confirmationRequired(with: partySize))
        default:
            completion(.unsupported())
        }
    }

Slide 23

Slide 23 text

SIRIKIT ▸ No customisation of voice interface possible ▸ Custom Vocabulary ▸ App Name ▸ Intent Names ▸ Custom Terms of app

Slide 24

Slide 24 text

SIRI SHORTCUTS ▸ Shortcuts for already performed actions ▸ App-specific ▸ Intent & entities are donated to Siri Shortcuts ▸ Shortcuts can be executed by voice

Slide 25

Slide 25 text

SIRI SHORTCUTS ▸ Added by intent definition files ▸ Shortcut has a category

Slide 26

Slide 26 text

DONATION SIRI SHORTCUTS ▸ Donation via NSUserActivity or Intent ▸ NSUserActivity: forward to app ▸ Intent: inside Siri

    import Intents

    // PizzaOrderIntent is generated from the app's intent definition file
    let intent = PizzaOrderIntent()
    intent.pizza = "Tonno"
    intent.suggestedInvocationPhrase = "Bestelle eine Tonno Pizza" // German: "Order a tuna pizza"

    let interaction = INInteraction(intent: intent, response: nil)
    interaction.donate { _ in }
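The slide only shows the intent-based donation. A donation via user activity could look like the sketch below, which assumes iOS 12 or later; the activity type `com.example.pizza.order` and the helper `makeOrderActivity` are hypothetical, and the activity type would also have to be listed under `NSUserActivityTypes` in the app's Info.plist.

```swift
import Foundation
import Intents

/// Builds a user activity that Siri may suggest as a shortcut.
func makeOrderActivity(pizza: String) -> NSUserActivity {
    // Hypothetical activity type; must match an entry in Info.plist.
    let activity = NSUserActivity(activityType: "com.example.pizza.order")
    activity.title = "Order \(pizza) pizza"
    activity.userInfo = ["pizza": pizza]
    activity.isEligibleForSearch = true
    // Makes the activity available as a shortcut suggestion.
    activity.isEligibleForPrediction = true
    // Phrase proposed to the user when recording a voice command.
    activity.suggestedInvocationPhrase = "Order a \(pizza) pizza"
    return activity
}

let activity = makeOrderActivity(pizza: "Tonno")
```

Assigning the result to a view controller's `userActivity` property donates it while the screen is visible. When the shortcut is run, the system forwards the activity to the app, which continues the order from `userInfo`.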

Slide 27

Slide 27 text

VOICE SIRI SHORTCUTS ▸ Voice command connection in app ▸ Suggestion command ▸ Custom responses possible

Slide 28

Slide 28 text

AVSPEECHSYNTHESIZER ▸ Generate speech output in app ▸ Siri voices (AVSpeechSynthesisVoice.speechVoices()) ▸ Control and monitoring of ongoing speech ▸ String & NSAttributedString ▸ Custom pronunciation ▸ International Phonetic Alphabet (IPA)
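Voice selection can be sketched as follows. The helpers `voices(for:)` and `makeUtterance(_:language:)` are hypothetical names, and the pitch value is an arbitrary example.

```swift
import AVFoundation

/// Lists the installed voices for a BCP 47 language code such as "de-DE".
func voices(for language: String) -> [AVSpeechSynthesisVoice] {
    return AVSpeechSynthesisVoice.speechVoices().filter { $0.language == language }
}

/// Builds an utterance with an explicit voice, rate and pitch.
func makeUtterance(_ text: String, language: String) -> AVSpeechUtterance {
    let utterance = AVSpeechUtterance(string: text)
    // Falls back to the default voice for the language if none is listed.
    utterance.voice = voices(for: language).first ?? AVSpeechSynthesisVoice(language: language)
    utterance.rate = AVSpeechUtteranceDefaultSpeechRate
    utterance.pitchMultiplier = 1.1   // allowed range is 0.5 ... 2.0
    return utterance
}
```

Ongoing speech is controlled on the synthesizer itself with `pauseSpeaking(at:)`, `continueSpeaking()` and `stopSpeaking(at:)`.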

Slide 29

Slide 29 text

AVSPEECHSYNTHESIZER

    import AVFoundation

    let synthesizer = AVSpeechSynthesizer()
    synthesizer.delegate = self

    let utterance = AVSpeechUtterance(string: "...")
    utterance.voice = AVSpeechSynthesisVoice(language: "de-DE")
    synthesizer.speak(utterance)

    // Delegate (SpeechViewController is a hypothetical name for the conforming type)
    extension SpeechViewController: AVSpeechSynthesizerDelegate {
        func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, willSpeakRangeOfSpeechString characterRange: NSRange, utterance: AVSpeechUtterance) {
            // Called before each range of the text is spoken, e.g. to highlight it
        }
    }

Slide 30

Slide 30 text

CONCLUSION ▸ Different approach than Alexa, Google Assistant ▸ Custom conversational interface ▸ Apple's voice interface is more predictable for the user ▸ SiriKit: Same interface for equal services ▸ Shortcuts: Own voice command ▸ Speech / AVAudioEngine: In app with additional GUI ▸ Will Siri Shortcuts & SiriKit be the next 3D Touch?