
Voice-enabled iOS Apps

Sebastian
December 20, 2018


Describes voice interactions in iOS apps with AVAudioRecorder, AVAudioEngine, the Speech framework, SiriKit, voice commands for Siri Shortcuts, and AVSpeechSynthesizer.


Transcript

  1. I DON’T KNOW WHAT YOU
    MEAN BY ‘…’. HOW ABOUT A
    WEB SEARCH FOR IT.
    Siri
    VOICE ASSISTANT REALITY


  2. VOICE-ENABLED APPS
    SEBASTIAN MESSINGFELD
    https://github.com/messeb/ios-voice-interaction (in progress)


  3. VOICE INTERACTIONS
    CURRENT USE CASES
    ▸ Siri interaction
    ▸ Integration in iOS with system apps
    ▸ Text dictation
    ▸ Telephony: recording of voice messages
    ▸ VoiceOver


  4. VOICE USER INTERFACES
    CHALLENGES
    ▸ Voice-only user interfaces are hidden
    ▸ Voice inputs are public
    ▸ Many utterances describe the same thing
    ▸ Everyone says it differently
    ▸ Follow-up queries for all required values


  5. VOICE USER INTERFACE - HOW IT WORKS
    VOICE ASSISTANT → SPEECH RECOGNITION → DOMAIN SELECTION → INTENT SELECTION → ENTITY EXTRACTION → FULFILLMENT
    ▸ Voice assistants: Siri, Alexa, Google Assistant, Cortana, Bixby, Magenta
    ▸ Domain selection, intent selection & entity extraction: Natural Language Understanding
    ▸ Fulfillment: content, logic, app selection


  6. VOICE INPUT INTERFACE - EXAMPLE
    VOICE ASSISTANT → SPEECH RECOGNITION → DOMAIN SELECTION → INTENT SELECTION → ENTITY EXTRACTION → FULFILLMENT
    "What will the weather be like tomorrow in Cologne?"
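For illustration, the pipeline stages could be sketched in Swift for this example utterance. This is a toy resolver with made-up names (`ResolvedQuery`, `resolve`), not a real framework API; a production assistant uses trained NLU models for domain selection, intent selection, and entity extraction:

```swift
import Foundation

// Illustrative result type for one pass through the pipeline.
struct ResolvedQuery {
    var domain: String
    var intent: String
    var entities: [String: String]
}

// Toy resolver: simple string checks stand in for real NLU models.
func resolve(_ transcription: String) -> ResolvedQuery? {
    let text = transcription.lowercased()
    // Domain selection: is the utterance about the weather at all?
    guard text.contains("weather") else { return nil }
    // Intent selection: within the weather domain, forecast vs. current conditions.
    let intent = text.contains("tomorrow") ? "forecast" : "currentConditions"
    // Entity extraction: fill the date and location slots.
    var entities: [String: String] = [:]
    if text.contains("tomorrow") { entities["date"] = "tomorrow" }
    if text.contains("cologne") { entities["location"] = "Cologne" }
    return ResolvedQuery(domain: "weather", intent: intent, entities: entities)
}

let query = resolve("What will the weather be like tomorrow in Cologne?")
print(query?.intent ?? "no match")  // prints "forecast"
```

Fulfillment would then take the resolved intent and entities and call the actual weather service.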


  7. IOS VOICE INTERACTION
    ▸ Input
    ▸ Dictation
    ▸ AVAudioRecorder / AVAudioEngine
    ▸ Speech framework
    ▸ SiriKit
    ▸ Voice commands for Siri Shortcuts


  8. IOS VOICE INTERACTION
    ▸ Output
    ▸ VoiceOver
    ▸ AVSpeechSynthesizer class
    ▸ AVAudioPlayer class


  9. IOS CONTEXT
    VOICE INPUT INTERFACE IN THE APP CONTEXT
    VOICE ASSISTANT → SPEECH RECOGNITION → DOMAIN SELECTION → INTENT SELECTION → ENTITY EXTRACTION → FULFILLMENT
    ▸ SiriKit & Shortcuts: integrate with the full Siri pipeline
    ▸ AVAudioRecorder / AVAudioEngine: raw voice input in the app
    ▸ Speech framework: speech recognition in the app


  10. VOICE INPUT
    AVAUDIORECORDER
    ▸ High-level API (in contrast to AVAudioEngine)
    ▸ Record microphone input
    ▸ Settings for quality
    ▸ Output into file


  11. EXAMPLE
    AVAUDIORECORDER
    let audioFilename = documentsDirectory.appendingPathComponent("recording.m4a")
    let settings = [
        AVFormatIDKey: Int(kAudioFormatMPEG4AAC),
        AVSampleRateKey: 12000,
        AVNumberOfChannelsKey: 1,
        AVEncoderAudioQualityKey: AVAudioQuality.high.rawValue
    ]

    let audioRecorder = try? AVAudioRecorder(url: audioFilename, settings: settings)
    audioRecorder?.delegate = self
    audioRecorder?.record()

    // Stop recording
    audioRecorder?.stop()


  12. TEXT
    AVAUDIOENGINE
    ▸ Interaction with raw PCM audio
    ▸ Access during voice input

    private let audioEngine = AVAudioEngine()
    let inputNode = audioEngine.inputNode
    let recordingFormat = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0,
                         bufferSize: 1024,
                         format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
        // Process the raw PCM buffer here.
    }


  13. ANALYSIS
    AVAUDIORECORDER / AVAUDIOENGINE
    ▸ Subsequent analysis
    ▸ Speech2Text
    ▸ Intents
    ▸ Entities
    ▸ Solutions:
    ▸ Azure Cognitive Speech Services
    ▸ …


  14. VOICE INPUT
    SPEECH FRAMEWORK
    ▸ Speech to text
    ▸ Audio file or Microphone
    ▸ Used by iOS text dictation
    ▸ Results
    ▸ can be delivered continuously (partial results)
    ▸ Transcriptions + alternative interpretations
    ▸ Confidence levels


  15. TEXT
    SPEECH FRAMEWORK
    let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "de-DE"))
    let recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
    recognitionRequest.taskHint = .search
    recognitionRequest.shouldReportPartialResults = true

    speechRecognizer?.recognitionTask(with: recognitionRequest) { result, error in
        if let result = result {
            print(result.bestTranscription.formattedString)
        }
    }

    inputNode.installTap(onBus: 0,
                         bufferSize: 1024,
                         format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
        recognitionRequest.append(buffer)
    }


  16. VOICE INPUT
    SPEECH FRAMEWORK USAGE
    ▸ Control of input interpretation
    ▸ Usage limits
    ▸ In-App text comparison
    ▸ Intent selection & entity extraction
    ▸ External services
    ▸ DialogFlow
    ▸ CoreML model
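One way to do the in-app text comparison without an external service is a plain phrase match over the transcribed text. This is a minimal sketch with hypothetical names (`AppCommand`, `command(for:)`); DialogFlow or a CoreML model would replace it for anything non-trivial:

```swift
import Foundation

// The commands this hypothetical app understands.
enum AppCommand {
    case startRecording, stopRecording, playback
}

// Several utterances per command, since everyone says it differently.
let phrases: [AppCommand: [String]] = [
    .startRecording: ["start recording", "record", "begin recording"],
    .stopRecording: ["stop recording", "stop", "end recording"],
    .playback: ["play it back", "playback", "play the recording"]
]

// Map a transcription to a command by substring matching.
func command(for transcription: String) -> AppCommand? {
    let text = transcription.lowercased()
    // Try longer phrases first so "stop recording" wins over plain "record".
    let ranked = phrases
        .flatMap { cmd, variants in variants.map { (cmd, $0) } }
        .sorted { $0.1.count > $1.1.count }
    return ranked.first { text.contains($0.1) }?.0
}

if let cmd = command(for: "Please stop recording now") {
    print(cmd)  // prints "stopRecording"
}
```

The phrase lists grow quickly in practice, which is exactly why the slide points to external NLU services for real intent selection and entity extraction.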


  17. INTENT & ENTITY EXTRACTION
    DIALOGFLOW
    ▸ Service to create conversational interfaces
    ▸ Custom interface with intents & entities
    ▸ Supports 14 languages
    ▸ Supports training of conversational model
    ▸ Interaction history


  18. INTENT & ENTITY EXTRACTION
    DIALOGFLOW


  19. INTENT HANDLER
    SIRIKIT
    ▸ Predefined intents for specific services
    ▸ Restaurant, Payments, Ride Booking, …
    ▸ No direct interaction with the user's voice or text input
    ▸ Added as extension
    ▸ Intents extension: handling requests from SiriKit
    ▸ Intents UI extension: displaying custom content in Siri


  20. INTENT HANDLER
    SIRIKIT
    class IntentHandler: INExtension {
        override func handler(for intent: INIntent) -> Any? {
            if intent is INRequestRideIntent {
                return VIRideIntentHandler()
            }
            if intent is INSendPaymentIntent {
                return VISendPaymentIntentHandler()
            }
            if intent is INGetVisualCodeIntent {
                return VIGetVisualCodeIntentHandler()
            }
            return .none
        }
    }


  21. ENTITY EXTRACTION
    SIRIKIT
    ▸ Predefined Entities
    ▸ Ride: pickupLocation, dropOffLocation,
    scheduledPickupTime, partySize, paymentMethod
    ▸ "Callback" functions for entities
    ▸ .needsValue(), .notRequired(), .unsupported()
    ▸ .success(), .confirmationRequired(), .unsupported()


  22. ENTITY EXTRACTION
    SIRIKIT
    func resolvePartySize(for intent: INRequestRideIntent,
                          with completion: @escaping (INIntegerResolutionResult) -> Void) {
        guard let partySize = intent.partySize else {
            completion(.needsValue())
            return
        }
        switch partySize {
        case 1...4:
            completion(.success(with: partySize))
        case 5...8:
            completion(.confirmationRequired(with: partySize))
        default:
            completion(.unsupported())
        }
    }


  23. SIRIKIT
    ▸ No customisation of the voice interface possible
    ▸ Custom vocabulary:
    ▸ App name
    ▸ Intent names
    ▸ Custom terms of the app


  24. TEXT
    SIRI SHORTCUTS
    ▸ Shortcuts for already performed actions
    ▸ App-specific
    ▸ Intent & entities are donated to Siri Shortcuts
    ▸ Shortcuts can be executed by voice


  25. TEXT
    SIRI SHORTCUTS
    ▸ Added by intent definition files
    ▸ Shortcut has a category


  26. DONATION
    SIRI SHORTCUTS
    ▸ Donation via NSUserActivity or Intent
    ▸ NSUserActivity: forwards to the app
    ▸ Intent: handled inside Siri

    let intent = PizzaOrderIntent()
    intent.pizza = "Tonno"
    intent.suggestedInvocationPhrase = "Order a Tonno pizza"

    let interaction = INInteraction(intent: intent, response: nil)
    interaction.donate { _ in
        // Handle donation errors here.
    }


  27. VOICE
    SIRI SHORTCUTS
    ▸ Voice commands connect to shortcuts in the app
    ▸ Suggested invocation phrase
    ▸ Custom responses possible


  28. TEXT
    AVSPEECHSYNTHESIZER
    ▸ Generate speech output in app
    ▸ Siri voices (AVSpeechSynthesisVoice.speechVoices())
    ▸ Control and monitoring of ongoing speech
    ▸ String & NSAttributedString
    ▸ Custom pronunciation
    ▸ International Phonetic Alphabet (IPA)


  29. TEXT
    AVSPEECHSYNTHESIZER
    let synthesizer = AVSpeechSynthesizer()
    synthesizer.delegate = self

    let utterance = AVSpeechUtterance(string: "...")
    utterance.voice = AVSpeechSynthesisVoice(language: "de-DE")
    synthesizer.speak(utterance)

    // Delegate (e.g. on the owning view controller):
    extension ViewController: AVSpeechSynthesizerDelegate {
        func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                               willSpeakRangeOfSpeechString characterRange: NSRange,
                               utterance: AVSpeechUtterance) {
            // Called for each range before it is spoken.
        }
    }


  30. TEXT
    CONCLUSION
    ▸ Different approach than Alexa, Google Assistant
    ▸ Custom conversational interface
    ▸ Apple's voice interface is more predictable for the user
    ▸ SiriKit: same interface for equal services
    ▸ Shortcuts: own voice commands
    ▸ Speech / AVAudioEngine: in app with additional GUI
    ▸ Are Siri Shortcuts & SiriKit the next 3D Touch?
