
Creating Conversational Interfaces on iOS - mDevCamp 2018

mDevCamp Prague 2018

We'll explore the latest technologies for creating conversational interfaces in your app, including speech recognition with Apple's APIs and other solutions such as Amazon Lex.

Wendy Lu

June 15, 2018


Transcript

  1. Agenda:
     1) Why conversational interfaces?
     2) Examples of apps
     3) In your own apps
     4) Design best practices
  4. iOS Speech Recognition API Amazon Transcribe Google Cloud Speech-to-Text IBM

    Watson Microsoft Speech API iOS Speech Recognition API
  5. Google Cloud / Dialogflow Bing Speech API / Azure Bot

    Service IBM Watson / Assistant Amazon Transcribe / Lex
  6. Speech platform comparison:

     | Service | Cost | Platforms | Free Trial | Languages | Accuracy (Speech-to-text) | API Response Time |
     |---|---|---|---|---|---|---|
     | iOS Speech Recognition API | Free | iOS only | n/a | 63 | n/a | n/a |
     | Amazon Transcribe / Lex | $0.004/request | iOS, Android, Web | 5000 requests/month | English + Spanish / English only | n/a | n/a |
     | Google Cloud / Dialogflow | $0.0065/request | iOS, Android, Web | 1000 requests/day | 100+ / 26 | 40% | 2200 ms |
     | Bing Speech API / Azure Bot Service | $0.004/request | iOS, Android, Web | 5000 requests/month | 34 / 18 | 28% | 2500 ms |
     | IBM Watson / Assistant | $0.0025/request | iOS, Android, Web | 10,000 requests | 9 | 33% | 4500 ms |

     Accuracy and API response time referenced from: https://recast.ai/blog/benchmarking-speech-recognition-api/
  8. Additional Resources:
     - WWDC - Speech Recognition API: https://developer.apple.com/videos/play/wwdc2016/509/
     - Lex iOS Demo: https://github.com/wendylu/TestLex
     - Benchmarking Speech APIs: https://recast.ai/blog/benchmarking-speech-recognition-api/
  9. import Speech

     func recognizeRecording() {
         // Transcribe a pre-recorded audio file bundled with the app
         guard let url = Bundle.main.url(forResource: "hi", withExtension: "m4a") else { return }
         guard let recognizer = SFSpeechRecognizer() else {
             // Device or locale not supported
             return
         }
         if !recognizer.isAvailable {
             // Internet connection may not be available
             return
         }
         let request = SFSpeechURLRecognitionRequest(url: url)
         recognizer.recognitionTask(with: request) { (result, error) in
             guard let result = result else { return }
             print("result: \(result.bestTranscription.formattedString)")
             if result.isFinal {
                 print("final result: \(result.bestTranscription.formattedString)")
             }
         }
     }
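     Before any recognition call will return results, the app must request speech-recognition authorization and declare the `NSSpeechRecognitionUsageDescription` key in Info.plist. The deck doesn't show this step, so here is a minimal sketch (the helper function name is my own):

     ```swift
     import Speech

     // One-time permission request; the system prompt text comes from the
     // NSSpeechRecognitionUsageDescription key in Info.plist.
     func requestSpeechAuthorization(completion: @escaping (Bool) -> Void) {
         SFSpeechRecognizer.requestAuthorization { status in
             // The callback may arrive on a background queue; hop to main for UI work.
             DispatchQueue.main.async {
                 completion(status == .authorized)
             }
         }
     }
     ```

     Call this (for example, in `viewDidAppear`) and only enable the record button once the completion reports `true`.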
  10. let audioEngine = AVAudioEngine()
      let speechRecognizer = SFSpeechRecognizer()
      let request = SFSpeechAudioBufferRecognitionRequest()

      func startRecording() throws {
          let node = audioEngine.inputNode
          let recordingFormat = node.outputFormat(forBus: 0)
          // Stream microphone buffers into the recognition request
          node.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { [weak self] (buffer, _) in
              self?.request.append(buffer)
          }
          audioEngine.prepare()
          try audioEngine.start()
          speechRecognizer?.recognitionTask(with: request, resultHandler: { (result, error) in
              guard let result = result else { return }
              print("result: \(result.bestTranscription.formattedString)")
          })
      }

      func stopRecording() {
          audioEngine.stop()
          request.endAudio()
      }
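      Live microphone capture additionally needs the `NSMicrophoneUsageDescription` Info.plist key and an audio session configured for recording, which the slide omits. A sketch using the Swift 4-era `AVAudioSession` API that matches the deck's vintage (the function name is my own):

      ```swift
      import AVFoundation

      // Configure the shared audio session for recording before calling startRecording().
      func configureAudioSession() throws {
          let session = AVAudioSession.sharedInstance()
          try session.setCategory(AVAudioSessionCategoryRecord,
                                  mode: AVAudioSessionModeMeasurement,
                                  options: .duckOthers)
          try session.setActive(true, with: .notifyOthersOnDeactivation)
      }
      ```

      The `.duckOthers` option lowers other apps' audio while recording, and `.notifyOthersOnDeactivation` lets background audio resume when the session ends.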
  11. func application(_ application: UIApplication,
                       didFinishLaunchingWithOptions launchOptions: [UIApplicationLaunchOptionsKey: Any]?) -> Bool {
          let credentialsProvider = AWSCognitoCredentialsProvider(regionType: AWSRegionType.USEast1,
                                                                  identityPoolId: "your-pool-id")
          let serviceConfiguration = AWSServiceConfiguration(region: AWSRegionType.USEast1,
                                                             credentialsProvider: credentialsProvider)
          AWSServiceManager.default().defaultServiceConfiguration = serviceConfiguration

          let config = AWSLexInteractionKitConfig.defaultInteractionKitConfig(withBotName: "PizzaBot",
                                                                              botAlias: "Prod")
          // 5000 seconds before timeout
          config.noSpeechTimeoutInterval = 5000
          config.maxSpeechTimeoutInterval = 5000

          // We will use this key to retrieve the interaction kit in our view controller
          AWSLexInteractionKit.register(with: serviceConfiguration!,
                                        interactionKitConfiguration: config,
                                        forKey: "USEast1InteractionKit")
          return true
      }

      Amazon Lex
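      The slide registers the kit under a key but doesn't show the retrieval side. A sketch of what the view controller might look like, following the AWS Mobile SDK's `forKey` convention; the class, outlet action, and exact method names should be checked against the SDK version in use:

      ```swift
      import AWSLex
      import UIKit

      // Hypothetical view controller: fetch the interaction kit registered in the
      // app delegate and start a voice interaction.
      class PizzaViewController: UIViewController, AWSLexInteractionDelegate {
          var interactionKit: AWSLexInteractionKit?

          override func viewDidLoad() {
              super.viewDidLoad()
              interactionKit = AWSLexInteractionKit(forKey: "USEast1InteractionKit")
              interactionKit?.interactionDelegate = self
          }

          @IBAction func micButtonTapped(_ sender: UIButton) {
              // Listen from the microphone and play Lex's spoken reply
              interactionKit?.audioInAudioOut()
          }

          func interactionKit(_ interactionKit: AWSLexInteractionKit, onError error: Error) {
              print("Lex error: \(error)")
          }
      }
      ```

      With the delegate assigned, the playback and switch-mode callbacks on the next slides will fire on this controller.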
  12. internal func interactionKit(onAudioPlaybackStarted _: AWSLexInteractionKit) {
          spinner.startAnimating()
      }

      internal func interactionKit(onAudioPlaybackFinished _: AWSLexInteractionKit) {
          spinner.stopAnimating()
      }

      Amazon Lex
  13. // Called after you finish speaking
      func interactionKit(_ interactionKit: AWSLexInteractionKit,
                          switchModeInput: AWSLexSwitchModeInput,
                          completionSource: AWSTaskCompletionSource<AWSLexSwitchModeResponse>?) {
          let switchModeResponse = AWSLexSwitchModeResponse()
          switchModeResponse.interactionMode = .speech
          switchModeResponse.sessionAttributes = switchModeInput.sessionAttributes
          completionSource?.set(result: switchModeResponse)

          if switchModeInput.dialogState == .fulfilled {
              // Check for slots
              guard let slots = switchModeInput.slots else { return }
              let toppings = slots["Toppings"]
              let drink = slots["Drink"]
              let address = slots["Address"]
              let intentName = switchModeInput.intent
              print("\(intentName.debugDescription) fulfilled. Toppings: \(toppings.debugDescription), Drink: \(drink.debugDescription), Address: \(address.debugDescription)")
          }
      }

      Amazon Lex