Empowering Mobile Applications with Speech Recognition (CodeMobile UK 2018)

Advances in artificial intelligence and machine learning for understanding spoken language have triggered a proliferation of commercial applications and services that understand our voice.

The likes of Siri, Alexa, Google Now, and Cortana are very impressive when it comes to understanding free-form speech, but is it possible to take advantage of the powerful speech recognition behind such services in your mobile app?

This talk will explore the speech recognition process and the relevant APIs that can be used to empower mobile applications with improved accessibility and new ways of user interaction.

Mihai Cîrlănaru

April 03, 2018

Transcript

  1. Speech Recognition API

    bit.ly/AndroidSpeechAPI
    // Speech Recognition package on Android (API level 8+)
    import android.speech.*
    // Relevant class
    class SpeechRecognizer {}
    // Getting a recognizer
    myRecognizer = SpeechRecognizer.createSpeechRecognizer(context)

    bit.ly/iOSSpeechAPI
    // Speech Recognition framework on iOS (10.0+)
    import Speech
    // Relevant class
    class SFSpeechRecognizer {}
    // Getting a recognizer
    myRecognizer = SFSpeechRecognizer()
  2. Permissions

    bit.ly/AndroidSpeechAPI
    android.permission.RECORD_AUDIO
    // Required if performing recognition with the cloud-based service
    android.permission.INTERNET

    bit.ly/iOSSpeechAPI
    // Relevant authStatus values:
    // .authorized
    // .denied
    // .restricted
    SFSpeechRecognizer.requestAuthorization() { (authStatus) in … }
  3. Start Recognition

    bit.ly/AndroidSpeechAPI
    // Requires a RecognizerIntent that contains recognition parameters:
    // - ACTION_RECOGNIZE_SPEECH // sets its action
    // - EXTRA_LANGUAGE // language to recognize
    // …
    myRecognizer.startListening(intent)

    bit.ly/iOSSpeechAPI
    // Needs a Speech Recognition request, can be
    // SFSpeechAudioBufferRecognitionRequest // for live audio (with AVAudioEngine) or
    // SFSpeechURLRecognitionRequest // for recognizing from file
    myRecognizer.recognitionTask(with: request) { (result, error) in … }
  4. Get Recognition Results

    bit.ly/AndroidSpeechAPI
    // Catch results with onActivityResult
    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        //…
        switch (requestCode) {
            case REQ_CODE_SPEECH_INPUT: {
                if (resultCode == RESULT_OK && null != data) {
                    ArrayList<String> result =
                        data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
                    System.out.println(result.get(0));
                }
                //…

    bit.ly/iOSSpeechAPI
    // Within recognitionTask handler
    guard let result = result else {
        // Recognition failed
        return
    }
    if result.isFinal {
        // Print the recognized speech
        print(result.bestTranscription.formattedString)
    }
  5. Stop Recognition

    bit.ly/AndroidSpeechAPI
    myRecognizer.stopListening()

    bit.ly/iOSSpeechAPI
    // Stop the speech recognition request
    // Also, don't forget about the AVAudioEngine
    // instance (if live audio was used)
    // …
    // For an SFSpeechAudioBufferRecognitionRequest, signal the end of audio
    request.endAudio()
  6. Web Speech API

    bit.ly/WebSpeechAPI
    // Read both names from window: a const cannot reference itself in its initializer
    const SpeechRecognition =
      window.SpeechRecognition || window.webkitSpeechRecognition;
    let recognizer = null;
    if (!SpeechRecognition) {
      console.error("Web Speech APIs not supported in your browser");
    } else {
      recognizer = new SpeechRecognition();
    }
  7. Speech Recognition Attributes

    bit.ly/WebSpeechAPI
    // Sets the language to be recognized
    // (32 languages supported, incl. Romanian)
    recognizer.lang = 'en-US';
    // Get recognition results as early as possible,
    // even if they will change
    recognizer.interimResults = true;
    // Continuously listen to speech, regardless of
    // whether the user takes pauses or not
    recognizer.continuous = true;
    // The number of alternative recognition
    // matches to be returned
    recognizer.maxAlternatives = 3;
    …
  8. Speech Recognition Event Handlers

    bit.ly/WebSpeechAPI
    recognizer.onresult // Whenever a speech recognition match is found
    recognizer.onnomatch // When no match was found for the current speech
    recognizer.onerror // When an error occurred
    …
  9. Speech Recognition Result format

    bit.ly/WebSpeechAPI
    // results[i][0].transcript -- top confidence transcription for result i
    {
      results: [ // SpeechRecognitionResultList
        [ // SpeechRecognitionResult
          { // SpeechRecognitionAlternative
            transcript: "hello",
            confidence: 0.8999,
            isFinal: false
          },
          { // SpeechRecognitionAlternative
            transcript: "world",
            confidence: 0.4792,
            isFinal: false
          }
        ],
      ]
      …
    }
  10. Best practices

    - Handle failures
    - Plan for short recordings (1 min limit)
    - Remind the user your app is recording
    - Do not perform speech recognition on private or sensitive information
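
The Web Speech API slides (6 to 9) and the "Handle failures" practice can be tied together in a small JavaScript sketch. It is not taken from the talk: the helper names `configureRecognizer`, `collectTranscripts`, `describeError`, and `attachHandlers` are hypothetical, as are the user-facing messages. The sketch assumes, as in current browsers, that `isFinal` is read from each `SpeechRecognitionResult` and that the error event carries a string code in `event.error`.

```javascript
// Hypothetical helpers: none of these function names belong to the Web Speech API.

// Apply the attributes from slide 7 to a recognizer (or any plain object).
function configureRecognizer(recognizer, options = {}) {
  recognizer.lang = options.lang ?? "en-US";
  recognizer.interimResults = options.interimResults ?? true;
  recognizer.continuous = options.continuous ?? true;
  recognizer.maxAlternatives = options.maxAlternatives ?? 3;
  return recognizer;
}

// Walk an array-like result list and take alternative 0 of each result,
// separating finalized text from interim text that may still change.
function collectTranscripts(results, startIndex = 0) {
  let finalText = "";
  let interimText = "";
  for (let i = startIndex; i < results.length; i++) {
    const result = results[i];
    if (result.length === 0) continue; // no alternatives for this result
    const top = result[0];
    if (result.isFinal) {
      finalText += top.transcript;
    } else {
      interimText += top.transcript;
    }
  }
  return { finalText, interimText };
}

// Illustrative messages for a few of the standard error codes (assumed wording).
const ERROR_MESSAGES = new Map([
  ["no-speech", "No speech was detected."],
  ["audio-capture", "No microphone is available."],
  ["not-allowed", "Microphone permission was denied."],
  ["network", "The recognition service could not be reached."],
]);

function describeError(code) {
  return ERROR_MESSAGES.get(code) ?? `Speech recognition failed (${code}).`;
}

// Wire the three handlers from slide 8 to two app-supplied callbacks.
function attachHandlers(recognizer, onText, onFailure) {
  recognizer.onresult = (event) =>
    onText(collectTranscripts(event.results, event.resultIndex));
  recognizer.onnomatch = () => onFailure("Speech was not recognized.");
  recognizer.onerror = (event) => onFailure(describeError(event.error));
  return recognizer;
}
```

In a browser this might be used as `attachHandlers(configureRecognizer(new SpeechRecognition()), showText, showError)` followed by `recognizer.start()`, where `showText` and `showError` are callbacks supplied by the app. Because the helpers only touch properties, they can also be exercised with plain objects outside a browser.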