
Voice-enabled iOS Apps

Sebastian
December 20, 2018

Describes voice interactions in iOS apps with AVAudioRecorder, AVAudioEngine, the Speech framework, SiriKit, voice commands for Siri Shortcuts, and AVSpeechSynthesizer.


Transcript

  1. "I DON'T KNOW WHAT YOU MEAN BY '…'. HOW ABOUT A WEB SEARCH FOR IT." (Siri) VOICE ASSISTANT REALITY
  2. VOICE INTERACTIONS CURRENT USE CASES ▸ Siri interaction ▸ Integration in iOS with system apps ▸ Text dictation ▸ Telephony: recording of voice messages ▸ VoiceOver
  3. VOICE USER INTERFACES CHALLENGES ▸ Voice-only user interfaces are hidden ▸ Voice inputs are public ▸ Many utterances describe the same thing ▸ Everyone says it differently ▸ Queries for all needed values
  4. VOICE USER INTERFACE - HOW IT WORKS: VOICE ASSISTANT (Siri, Alexa, Google Assistant, Cortana, Bixby, Magenta) → SPEECH RECOGNITION → DOMAIN SELECTION → INTENT SELECTION → ENTITY EXTRACTION (together: Natural Language Understanding) → FULFILLMENT (Content Logic, App Selection)
  5. VOICE INPUT INTERFACE - EXAMPLE: VOICE ASSISTANT → SPEECH RECOGNITION → DOMAIN SELECTION → INTENT SELECTION → ENTITY EXTRACTION → FULFILLMENT, applied to "What will the weather be tomorrow in Cologne?"
  6. IOS VOICE INTERACTION ▸ Input ▸ Dictation ▸ AVAudioRecorder / AVAudioEngine ▸ Speech framework ▸ SiriKit ▸ Voice commands for Siri Shortcuts
  7. IOS CONTEXT / APP CONTEXT - VOICE INPUT INTERFACE: diagram mapping the pipeline stages (VOICE ASSISTANT, SPEECH RECOGNITION, DOMAIN SELECTION, INTENT SELECTION, ENTITY EXTRACTION, FULFILLMENT) onto the iOS technologies SiriKit, AVAudioRecorder / AVAudioEngine, Speech framework, and Shortcuts
  8. VOICE INPUT AVAUDIORECORDER ▸ High-level API in contrast to AVAudioEngine ▸ Record microphone input ▸ Settings for quality ▸ Output into file
  9. EXAMPLE AVAUDIORECORDER

    import AVFoundation

    // Record microphone input into a file in the app's documents directory
    let documentsDirectory = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)[0]
    let audioFilename = documentsDirectory.appendingPathComponent("recording.m4a")
    let settings = [
        AVFormatIDKey: Int(kAudioFormatMPEG4AAC),
        AVSampleRateKey: 12000,
        AVNumberOfChannelsKey: 1,
        AVEncoderAudioQualityKey: AVAudioQuality.high.rawValue
    ]

    let audioRecorder = try? AVAudioRecorder(url: audioFilename, settings: settings)
    audioRecorder?.delegate = self
    audioRecorder?.record()

    // Stop recording
    audioRecorder?.stop()

  10. AVAUDIOENGINE ▸ Interaction with raw PCM audio ▸ Access during voice input

    private let audioEngine = AVAudioEngine()

    let inputNode = audioEngine.inputNode
    let recordingFormat = inputNode.outputFormat(forBus: 0)

    // Tap the input node to receive raw PCM buffers while the user speaks
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
        // Process the raw audio buffer here
    }

    audioEngine.prepare()
    try? audioEngine.start()

  11. ANALYSIS AVAUDIORECORDER / AVAUDIOENGINE ▸ Subsequent analysis ▸ Speech2Text ▸ Intents ▸ Entities ▸ Solutions: ▸ Azure Cognitive Speech Services ▸ …
  12. VOICE INPUT SPEECH FRAMEWORK ▸ Speech to text ▸ Audio file or microphone ▸ Used by iOS text dictation ▸ Results ▸ Can be delivered continuously (partial results) ▸ Transcriptions + alternative interpretations ▸ Confidence levels
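
Using the framework requires the user's permission; a minimal authorization sketch (assuming NSSpeechRecognitionUsageDescription, and NSMicrophoneUsageDescription for live input, are set in Info.plist):

    import Speech

    // Ask for speech recognition permission before starting any recognition task
    SFSpeechRecognizer.requestAuthorization { status in
        guard status == .authorized else { return }
        // Start recognition only after the user has granted permission
    }
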
  13. SPEECH FRAMEWORK

    import Speech

    let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "de-DE"))
    let recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
    recognitionRequest.taskHint = .search
    recognitionRequest.shouldReportPartialResults = true

    // Receive (partial) transcriptions as they become available
    speechRecognizer?.recognitionTask(with: recognitionRequest) { result, error in
        if let result = result {
            print(result.bestTranscription.formattedString)
        }
    }

    // Feed microphone buffers from AVAudioEngine into the recognition request
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
        recognitionRequest.append(buffer)
    }

  14. VOICE INPUT SPEECH FRAMEWORK USAGE ▸ Control of input interpretation ▸ Usage limits ▸ In-app text comparison (see the sketch below) ▸ Intent selection & entity extraction ▸ External services ▸ DialogFlow ▸ CoreML model
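
One lightweight option is to do intent selection directly in the app by comparing the recognized text against known phrases; a minimal sketch with hypothetical intents and keywords (not from the deck):

    // Hypothetical in-app "intent selection" via simple keyword matching
    enum AppIntent {
        case orderPizza
        case showMenu
        case unknown
    }

    func intent(for transcription: String) -> AppIntent {
        let text = transcription.lowercased()
        if text.contains("bestelle") || text.contains("order") { return .orderPizza }
        if text.contains("menü") || text.contains("menu") { return .showMenu }
        return .unknown
    }

    // Usage with a Speech framework result:
    // let detected = intent(for: result.bestTranscription.formattedString)
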
  15. INTENT & ENTITY EXTRACTION DIALOGFLOW ▸ Service to create conversational

    interfaces ▸ Custom interface with intents & entities ▸ Supports 14 languages ▸ Supports training of conversational model ▸ Interaction history
  16. INTENT HANDLER SIRIKIT ▸ Predefined intents for specific services ▸ Restaurant, Payments, Ride Booking, … ▸ No interaction with the user's voice or text input ▸ Added as an extension ▸ Intents: handling requests from SiriKit ▸ Intents UI: displaying custom UI
  17. INTENT HANDLER SIRIKIT

    import Intents

    class IntentHandler: INExtension {

        // Return the handler object responsible for the incoming intent
        override func handler(for intent: INIntent) -> Any? {
            if intent is INRequestRideIntent {
                return VIRideIntentHandler()
            }
            if intent is INSendPaymentIntent {
                return VISendPaymentIntentHandler()
            }
            if intent is INGetVisualCodeIntent {
                return VIGetVisualCodeIntentHandler()
            }
            return nil
        }
    }

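
Each handler class returned above adopts the handling protocol of its intent; a minimal sketch of what the deck's VIRideIntentHandler could look like (a real implementation would also populate an INRideStatus and implement the confirm/resolve methods):

    import Intents

    class VIRideIntentHandler: NSObject, INRequestRideIntentHandling {

        // Book the ride in the app's backend, then report the result to Siri
        func handle(intent: INRequestRideIntent, completion: @escaping (INRequestRideIntentResponse) -> Void) {
            let response = INRequestRideIntentResponse(code: .success, userActivity: nil)
            completion(response)
        }
    }
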
  18. ENTITY EXTRACTION SIRIKIT ▸ Predefined entities ▸ Ride: pickupLocation, dropOffLocation, scheduledPickupTime, partySize, paymentMethod ▸ "Callback" functions for entity resolution ▸ .needsValue(), .notRequired(), .unsupported() ▸ .success(), .unsupported()
  19. ENTITY EXTRACTION SIRIKIT

    // Resolve the partySize entity: ask for a value, confirm large groups, reject the rest
    func resolvePartySize(for intent: INRequestRideIntent, with completion: @escaping (INIntegerResolutionResult) -> Void) {
        guard let partySize = intent.partySize else {
            completion(.needsValue())
            return
        }
        switch partySize {
        case 1...4:
            completion(.success(with: partySize))
        case 5...8:
            completion(.confirmationRequired(with: partySize))
        default:
            completion(.unsupported())
        }
    }

  20. SIRIKIT ▸ No customisation of the voice interface possible ▸ Custom vocabulary (see the sketch below) ▸ App name ▸ Intent names ▸ Custom terms of the app
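
User-specific custom terms can be registered at runtime via INVocabulary; a minimal sketch (the string type and example values are illustrative, not from the deck):

    import Intents

    // Register user-specific terms so Siri can recognize them in this app's intents
    let accountNicknames = NSOrderedSet(array: ["Household", "Vacation fund"])
    INVocabulary.shared().setVocabularyStrings(accountNicknames, of: .paymentsAccountNickname)
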
  21. SIRI SHORTCUTS ▸ Shortcuts for already performed actions ▸ App-specific ▸ Intent & entities are donated to Siri Shortcuts ▸ Shortcuts can be executed by voice
  22. DONATION SIRI SHORTCUTS ▸ Donation via NSUserActivity or Intent ▸ NSUserActivity: forwards to the app ▸ Intent: handled inside Siri

    // Donate a custom intent so Siri can suggest it as a shortcut
    let intent = PizzaOrderIntent()
    intent.pizza = "Tonno"
    intent.suggestedInvocationPhrase = "Bestelle eine Tonno Pizza"

    let interaction = INInteraction(intent: intent, response: nil)
    interaction.donate { _ in }

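
For comparison, the NSUserActivity-based donation could look roughly like this; a minimal sketch, assuming a hypothetical activity type "com.example.order-pizza" that is declared under NSUserActivityTypes in Info.plist:

    import UIKit

    // Donate an NSUserActivity so the action shows up as a shortcut that reopens the app
    // (activity type, identifier and title are hypothetical)
    func donatePizzaOrderActivity(on viewController: UIViewController) {
        let activity = NSUserActivity(activityType: "com.example.order-pizza")
        activity.title = "Order a Tonno pizza"
        activity.isEligibleForSearch = true
        activity.isEligibleForPrediction = true   // required for Siri Shortcut suggestions (iOS 12)
        activity.suggestedInvocationPhrase = "Bestelle eine Tonno Pizza"
        activity.persistentIdentifier = "order-tonno-pizza"

        // Attaching it to the visible view controller makes it current and donates it
        viewController.userActivity = activity
    }
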
  23. VOICE SIRI SHORTCUTS ▸ Voice command can be set up from within the app (see the sketch below) ▸ Suggested invocation phrase ▸ Custom responses possible
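
The in-app setup uses the "Add to Siri" view controller from IntentsUI; a minimal sketch, reusing the deck's PizzaOrderIntent and assuming the presenting view controller adopts INUIAddVoiceShortcutViewControllerDelegate:

    import IntentsUI

    // Present Apple's "Add to Siri" screen so the user can record a voice phrase for the shortcut
    func presentAddVoiceShortcut(from viewController: UIViewController & INUIAddVoiceShortcutViewControllerDelegate) {
        let intent = PizzaOrderIntent()
        intent.pizza = "Tonno"
        intent.suggestedInvocationPhrase = "Bestelle eine Tonno Pizza"

        guard let shortcut = INShortcut(intent: intent) else { return }

        let addVoiceShortcutVC = INUIAddVoiceShortcutViewController(shortcut: shortcut)
        addVoiceShortcutVC.delegate = viewController
        viewController.present(addVoiceShortcutVC, animated: true)
    }
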
  24. AVSPEECHSYNTHESIZER ▸ Generate speech output in the app ▸ Siri voices (AVSpeechSynthesisVoice.speechVoices()) ▸ Control and monitoring of ongoing speech ▸ String & NSAttributedString ▸ Custom pronunciation ▸ International Phonetic Alphabet (IPA)
  25. AVSPEECHSYNTHESIZER

    import AVFoundation

    let synthesizer = AVSpeechSynthesizer()
    synthesizer.delegate = self

    let utterance = AVSpeechUtterance(string: "...")
    utterance.voice = AVSpeechSynthesisVoice(language: "de-DE")
    synthesizer.speak(utterance)

    // AVSpeechSynthesizerDelegate: monitor the ongoing speech
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, willSpeakRangeOfSpeechString characterRange: NSRange, utterance: AVSpeechUtterance) {
        // Called just before each range of the text is spoken
    }

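
The custom pronunciation mentioned on slide 24 is supplied through an attributed string carrying the IPA notation attribute; a minimal sketch (the word and its IPA transcription are illustrative):

    import AVFoundation

    // Override the pronunciation of a single word via the IPA notation attribute
    let attributedText = NSMutableAttributedString(string: "Hallo Sebastian")
    let ipaKey = NSAttributedString.Key(rawValue: AVSpeechSynthesisIPANotationAttribute)
    attributedText.addAttribute(ipaKey, value: "zebˈastjan", range: NSRange(location: 6, length: 9))

    let ipaUtterance = AVSpeechUtterance(attributedString: attributedText)
    ipaUtterance.voice = AVSpeechSynthesisVoice(language: "de-DE")

    let ipaSynthesizer = AVSpeechSynthesizer()
    ipaSynthesizer.speak(ipaUtterance)
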
  26. CONCLUSION ▸ Different approach than Alexa and Google Assistant ▸ Custom conversational interface ▸ Apple's voice interface is more predictable for the user ▸ SiriKit: same interface for similar services ▸ Shortcuts: own voice command ▸ Speech / AVAudioEngine: in-app with an additional GUI ▸ Will Siri Shortcuts & SiriKit be the next 3D Touch?