
Creating Conversational Interfaces on iOS - mDevCamp 2018

mDevCamp Prague 2018

We'll explore the latest technologies for creating conversational interfaces in your app, including speech recognition with Apple's APIs and other solutions such as Amazon Lex.

Wendy Lu

June 15, 2018


Transcript

  1. Agenda:
     1) Why conversational interfaces?
     2) Examples of apps
     3) In your own apps
     4) Design best practices
  4. iOS Speech Recognition API Amazon Transcribe Google Cloud Speech-to-Text IBM

    Watson Microsoft Speech API iOS Speech Recognition API
  5. Google Cloud / Dialogflow Bing Speech API / Azure Bot

    Service IBM Watson / Assistant Amazon Transcribe / Lex
  6. Speech platform comparison:

     | Service | Cost | Platforms | Free Trial | Languages | Accuracy (Speech-to-text) | API Response Time |
     |---|---|---|---|---|---|---|
     | iOS Speech Recognition API | Free | iOS only | n/a | 63 | n/a | n/a |
     | Amazon Transcribe / Lex | $0.004/request | iOS, Android, Web | 5000 requests/month | English + Spanish / English only | n/a | n/a |
     | Google Cloud / Dialogflow | $0.0065/request | iOS, Android, Web | 1000 requests/day | 100+ / 26 | 40% | 2200 ms |
     | Bing Speech API / Azure Bot Service | $0.004/request | iOS, Android, Web | 5000 requests/month | 34 / 18 | 28% | 2500 ms |
     | IBM Watson / Assistant | $0.0025/request | iOS, Android, Web | 10,000 requests | 9 | 33% | 4500 ms |

     Accuracy and API response time referenced from: https://recast.ai/blog/benchmarking-speech-recognition-api/
  8. Additional Resources:
     - WWDC - Speech Recognition API: https://developer.apple.com/videos/play/wwdc2016/509/
     - Lex iOS Demo: https://github.com/wendylu/TestLex
     - Benchmarking Speech APIs: https://recast.ai/blog/benchmarking-speech-recognition-api/
  9. import Speech

     func recognizeRecording() {
         // Transcribe a pre-recorded audio file bundled with the app
         guard let url = Bundle.main.url(forResource: "hi", withExtension: "m4a") else { return }
         guard let recognizer = SFSpeechRecognizer() else {
             // Device or locale not supported
             return
         }
         if !recognizer.isAvailable {
             // Internet connection may not be available
             return
         }
         let request = SFSpeechURLRecognitionRequest(url: url)
         recognizer.recognitionTask(with: request) { (result, error) in
             guard let result = result else { return }
             print("result: \(result.bestTranscription.formattedString)")
             if result.isFinal {
                 print("final result: \(result.bestTranscription.formattedString)")
             }
         }
     }
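     Before any recognition call will return results, the app must request speech-recognition authorization and declare the `NSSpeechRecognitionUsageDescription` key in Info.plist. The deck doesn't show this step, so here is a minimal sketch (the helper function name is my own):

     ```swift
     import Speech

     // One-time permission request; the system prompt text comes from the
     // NSSpeechRecognitionUsageDescription key in Info.plist.
     func requestSpeechAuthorization(completion: @escaping (Bool) -> Void) {
         SFSpeechRecognizer.requestAuthorization { status in
             // The callback may arrive on a background queue; hop to main for UI work.
             DispatchQueue.main.async {
                 completion(status == .authorized)
             }
         }
     }
     ```

     Call this (for example, in `viewDidAppear`) and only enable the record button once the completion reports `true`.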
  10. let audioEngine = AVAudioEngine()
      let speechRecognizer = SFSpeechRecognizer()
      let request = SFSpeechAudioBufferRecognitionRequest()

      func startRecording() throws {
          let node = audioEngine.inputNode
          let recordingFormat = node.outputFormat(forBus: 0)
          // Stream microphone buffers into the recognition request
          node.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { [weak self] (buffer, _) in
              self?.request.append(buffer)
          }
          audioEngine.prepare()
          try audioEngine.start()
          speechRecognizer?.recognitionTask(with: request, resultHandler: { (result, error) in
              guard let result = result else { return }
              print("result: \(result.bestTranscription.formattedString)")
          })
      }

      func stopRecording() {
          audioEngine.stop()
          request.endAudio()
      }
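      Live microphone capture additionally needs the `NSMicrophoneUsageDescription` Info.plist key and an audio session configured for recording, which the slide omits. A sketch using the Swift 4-era `AVAudioSession` API that matches the deck's vintage (the function name is my own):

      ```swift
      import AVFoundation

      // Configure the shared audio session for recording before calling startRecording().
      func configureAudioSession() throws {
          let session = AVAudioSession.sharedInstance()
          try session.setCategory(AVAudioSessionCategoryRecord,
                                  mode: AVAudioSessionModeMeasurement,
                                  options: .duckOthers)
          try session.setActive(true, with: .notifyOthersOnDeactivation)
      }
      ```

      The `.duckOthers` option lowers other apps' audio while recording, and `.notifyOthersOnDeactivation` lets background audio resume when the session ends.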
  11. func application(_ application: UIApplication,
                       didFinishLaunchingWithOptions launchOptions: [UIApplicationLaunchOptionsKey: Any]?) -> Bool {
          let credentialsProvider = AWSCognitoCredentialsProvider(regionType: AWSRegionType.USEast1,
                                                                  identityPoolId: "your-pool-id")
          let serviceConfiguration = AWSServiceConfiguration(region: AWSRegionType.USEast1,
                                                             credentialsProvider: credentialsProvider)
          AWSServiceManager.default().defaultServiceConfiguration = serviceConfiguration

          let config = AWSLexInteractionKitConfig.defaultInteractionKitConfig(withBotName: "PizzaBot",
                                                                              botAlias: "Prod")
          // 5000 seconds before timeout
          config.noSpeechTimeoutInterval = 5000
          config.maxSpeechTimeoutInterval = 5000

          // We will use this key to retrieve the interaction kit in our view controller
          AWSLexInteractionKit.register(with: serviceConfiguration!,
                                        interactionKitConfiguration: config,
                                        forKey: "USEast1InteractionKit")
          return true
      }

      Amazon Lex
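      The slide registers the kit under a key but doesn't show the retrieval side. A sketch of what the view controller might look like, following the AWS Mobile SDK's `forKey` convention; the class, outlet action, and exact method names should be checked against the SDK version in use:

      ```swift
      import AWSLex
      import UIKit

      // Hypothetical view controller: fetch the interaction kit registered in the
      // app delegate and start a voice interaction.
      class PizzaViewController: UIViewController, AWSLexInteractionDelegate {
          var interactionKit: AWSLexInteractionKit?

          override func viewDidLoad() {
              super.viewDidLoad()
              interactionKit = AWSLexInteractionKit(forKey: "USEast1InteractionKit")
              interactionKit?.interactionDelegate = self
          }

          @IBAction func micButtonTapped(_ sender: UIButton) {
              // Listen from the microphone and play Lex's spoken reply
              interactionKit?.audioInAudioOut()
          }

          func interactionKit(_ interactionKit: AWSLexInteractionKit, onError error: Error) {
              print("Lex error: \(error)")
          }
      }
      ```

      With the delegate assigned, the playback and switch-mode callbacks on the next slides will fire on this controller.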
  12. internal func interactionKit(onAudioPlaybackStarted _: AWSLexInteractionKit) {
          spinner.startAnimating()
      }

      internal func interactionKit(onAudioPlaybackFinished _: AWSLexInteractionKit) {
          spinner.stopAnimating()
      }

      Amazon Lex
  13. // Called after you finish speaking
      func interactionKit(_ interactionKit: AWSLexInteractionKit,
                          switchModeInput: AWSLexSwitchModeInput,
                          completionSource: AWSTaskCompletionSource<AWSLexSwitchModeResponse>?) {
          let switchModeResponse = AWSLexSwitchModeResponse()
          switchModeResponse.interactionMode = .speech
          switchModeResponse.sessionAttributes = switchModeInput.sessionAttributes
          completionSource?.set(result: switchModeResponse)

          if switchModeInput.dialogState == .fulfilled {
              // Check for slots
              guard let slots = switchModeInput.slots else { return }
              let toppings = slots["Toppings"]
              let drink = slots["Drink"]
              let address = slots["Address"]
              let intentName = switchModeInput.intent
              print("\(intentName.debugDescription) fulfilled. Toppings: \(toppings.debugDescription), Drink: \(drink.debugDescription), Address: \(address.debugDescription)")
          }
      }

      Amazon Lex