Empowering Mobile Applications with Speech Recognition (CodeMobile UK 2018)

Advances in artificial intelligence and machine learning for understanding spoken language have triggered a proliferation of commercial applications and services that understand our voice.

The likes of Siri, Alexa, Google Now, and Cortana are very impressive when it comes to understanding free-form speech, but is it possible to take advantage of the powerful speech recognition behind such services in your mobile app?

This talk will explore the speech recognition process and the relevant APIs that can be used to empower mobile applications with improved accessibility and new ways of user interaction.

Mihai Cîrlănaru

April 03, 2018

Transcript

  1. None
  2. Mihai Cîrlănaru Senior Engineering Team Lead

  3. None
  4. None
  5. None
  6. None
  7. None
  8. None
  9. None
  10. None
  11. None
  12. None
  13. None
  14. None
  15. None
  16. None
  17. None
  18. None
  19. None
  20. None
  21. None
  22. DEMO github.com/mihai/talk.js

  23. None
  24. Speech Recognition API

    bit.ly/AndroidSpeechAPI
    // Speech Recognition package on Android (API level 8+)
    import android.speech.*
    // Relevant class
    class SpeechRecognizer {}
    // Getting a recognizer
    myRecognizer = SpeechRecognizer.createSpeechRecognizer(context)

    bit.ly/iOSSpeechAPI
    // Speech Recognition framework on iOS (10.0+)
    import Speech
    // Relevant class
    class SFSpeechRecognizer {}
    // Getting a recognizer
    myRecognizer = SFSpeechRecognizer()
  25. Permissions

    bit.ly/AndroidSpeechAPI
    android.permission.RECORD_AUDIO
    // Required if performing recognition with the cloud-based service
    android.permission.INTERNET

    bit.ly/iOSSpeechAPI
    // Relevant authStatus:
    //   .authorized
    //   .denied
    //   .restricted
    SFSpeechRecognizer.requestAuthorization() { (authStatus) in … }
  26. Start Recognition

    bit.ly/AndroidSpeechAPI
    // Requires a RecognizerIntent that contains recognition parameters:
    //   - ACTION_RECOGNIZE_SPEECH     // sets its action
    //   - EXTRA_LANGUAGE_PREFERENCE   // language to recognize
    //   …
    myRecognizer.startListening(intent)

    bit.ly/iOSSpeechAPI
    // Needs a Speech Recognition request, can be
    //   SFSpeechAudioBufferRecognitionRequest   // for live audio (with AVAudioEngine) or
    //   SFSpeechURLRecognitionRequest           // for recognizing from file
    myRecognizer.recognitionTask(with: request) { (result, error) in … }
  27. Get Recognition Results

    bit.ly/AndroidSpeechAPI
    // Catch results with onActivityResult
    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        //…
        switch (requestCode) {
            case REQ_CODE_SPEECH_INPUT: {
                if (resultCode == RESULT_OK && null != data) {
                    ArrayList<String> result =
                        data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
                    System.out.println(result.get(0));
                }
                //…

    bit.ly/iOSSpeechAPI
    // Within recognitionTask handler
    guard let result = result else {
        // Recognition failed
        return
    }
    if result.isFinal {
        // Print the recognized speech
        print(result.bestTranscription.formattedString)
    }
  28. Stop Recognition

    bit.ly/AndroidSpeechAPI
    myRecognizer.stopListening()

    bit.ly/iOSSpeechAPI
    // Stop the speech recognition request
    // Also, don't forget about the AVAudioEngine instance (if live audio was used)
    // …
    request.endAudio()   // for an SFSpeechAudioBufferRecognitionRequest
  29. None
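  30. Web Speech API

    bit.ly/WebSpeechAPI
    // Read both names from window: a bare `const SpeechRecognition = SpeechRecognition || …`
    // would throw a ReferenceError before the fallback is evaluated
    const SpeechRecognition =
        window.SpeechRecognition || window.webkitSpeechRecognition;
    let recognizer = null;
    if (!SpeechRecognition) {
        console.error("Web Speech APIs not supported in your browser");
    } else {
        recognizer = new SpeechRecognition();
    }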
  31. Speech Recognition Attributes

    bit.ly/WebSpeechAPI
    // Sets the language to be recognized (32 languages supported, incl. Romanian)
    recognizer.lang = 'en-US';
    // Get recognition results as early as possible, even if they will change
    recognizer.interimResults = true;
    // Continuously listen to speech, regardless of whether the user pauses
    recognizer.continuous = true;
    // The number of alternative recognition matches to be returned
    recognizer.maxAlternatives = 3;
    …
  32. Speech Recognition Event Handlers

    bit.ly/WebSpeechAPI
    recognizer.onresult    // Whenever a speech recognition match is found
    recognizer.onnomatch   // When no match was found for the current speech
    recognizer.onerror     // When an error occurred
    …
  33. Speech Recognition Control

    bit.ly/WebSpeechAPI
    recognizer.start();
    recognizer.abort();
    recognizer.stop();

  34. Speech Recognition Result format

    bit.ly/WebSpeechAPI
    // results[i][0].transcript -- top confidence transcription for result i
    {
      results: [        // SpeechRecognitionResultList
        [               // SpeechRecognitionResult
          {             // SpeechRecognitionAlternative
            transcript: "hello",
            confidence: 0.8999,
            isFinal: false
          },
          {             // SpeechRecognitionAlternative
            transcript: "world",
            confidence: 0.4792,
            isFinal: false
          }
        ],
      ]
      …
    }
  35. Privacy bit.ly/WebSpeechAPI

  36. Browser Support caniuse.com/#feat=speech-recognition

  37. Best practices

    Handle failures
    Plan for short recordings (1 min limit)
    Remind the user your app is recording
    Do not perform speech recognition on private or sensitive information
  38. None
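
The Web Speech API steps from slides 30 to 34 (feature detection, attributes, event handlers, result format) can be tied together in one small sketch. This is not code from the talk: the helper names `createRecognizer`, `topTranscripts`, and `attachHandlers` are hypothetical, and passing the global object in as a parameter is an assumption made so the logic can be exercised outside a browser. Only the property and event names shown on the slides (`lang`, `interimResults`, `continuous`, `maxAlternatives`, `onresult`, `onnomatch`, `onerror`) are taken from the API itself.

```javascript
// Sketch only: all three helper names are hypothetical, not part of the API.

// Default attribute values, matching the ones shown on slide 31.
const DEFAULT_SETTINGS = {
  lang: "en-US",
  interimResults: true,
  continuous: true,
  maxAlternatives: 3,
};

// Look up the (possibly prefixed) constructor on the supplied global object.
// In a page you would pass `window`. Returns null when the API is missing.
function createRecognizer(globalObject, options = {}) {
  const Ctor =
    globalObject.SpeechRecognition || globalObject.webkitSpeechRecognition;
  if (!Ctor) {
    return null;
  }
  const recognizer = new Ctor();
  // Caller-supplied options override the defaults.
  Object.assign(recognizer, DEFAULT_SETTINGS, options);
  return recognizer;
}

// Take the first alternative of every result, i.e. results[i][0].transcript.
// Works on any array-like list with `length` and index access.
function topTranscripts(results) {
  const transcripts = [];
  for (let i = 0; i < results.length; i++) {
    transcripts.push(results[i][0].transcript);
  }
  return transcripts;
}

// Route the three events from slide 32 to two plain callbacks.
function attachHandlers(recognizer, onText, onFailure) {
  recognizer.onresult = (event) => onText(topTranscripts(event.results));
  recognizer.onnomatch = () => onFailure("no-match");
  recognizer.onerror = (event) => onFailure(event.error);
}
```

In a browser this might be used as `const recognizer = createRecognizer(window);` followed by `attachHandlers(recognizer, console.log, console.error)` and `recognizer.start()`, guarded by a null check. Returning `null` instead of throwing keeps the "handle failures" advice from slide 37 in the caller's hands.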