
Say It Ain't So: Implementing Speech Recognition in your app

Marc Brown
September 01, 2016


SiriKit was one of the more talked-about features announced at WWDC this year; unfortunately, its initial implementation is limited to a small number of use cases. But all is not lost! Apple introduced a collection of general-purpose Speech APIs in iOS 10 that provide simple speech-to-text conversion from streaming voice or audio files in over 50 languages.

In this talk I walk through the new APIs, discuss their limitations, and end with a practical use case: adding speech recognition to a text-based search app.

Presented @ try! Swift NYC: http://www.tryswiftnyc.com


Transcript

  1-4. A (Brief) History of Speech Recognition (build slides)
     1950s: Numbers + 10 words
     1960s: A few hundred words
     1970s: 1k words, Beam Search
     1980s: 20k words, HMM
     1990s: > Average human vocabulary
     2000s: LSTM, Smartphones
  5. "Hey Siri, send Rivers Cuomo 5 dollars on WeezerPay for recording the Blue Album"
     Domain: Payments
     Intent: sendPayment
     App: WeezerPay
     Payee: Rivers Cuomo
     Amount: 5
     Currency: USD
     Note: Recording the Blue Album
  6. Supported Domains
     1. Messaging
     2. VoIP calling
     3. Payments
     4. Workouts
     5. Ride booking
     6. Photo search
  7-8. Speech Framework (build slides)
     → Siri speech-to-text
     → Live audio or audio file
     → Recommended + alternative transcriptions
     → 50+ languages & dialects
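The slides that follow cover the live-audio path; for the audio-file path mentioned here, a minimal sketch might look like the following (the function name and `fileURL` parameter are illustrative, not from the deck):

```swift
import Speech

// Recognize speech from a pre-recorded audio file.
// `fileURL` is a hypothetical local recording in any format AVFoundation can read.
func transcribe(fileURL: URL) {
    guard let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US")),
          recognizer.isAvailable else { return }

    let request = SFSpeechURLRecognitionRequest(url: fileURL)
    recognizer.recognitionTask(with: request) { result, error in
        if let result = result, result.isFinal {
            // The recommended transcription; alternatives are in result.transcriptions.
            print(result.bestTranscription.formattedString)
        } else if let error = error {
            print("Recognition failed: \(error)")
        }
    }
}
```

`SFSpeechURLRecognitionRequest` is the file-based counterpart of the buffer-based request used for live audio later in the deck.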
  9. Speech Recognizer

     // Specific language
     private let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))

     // Native language
     private let speechRecognizer = SFSpeechRecognizer(locale: Locale.current)
  10. SFSpeechRecognizer.supportedLocales()

     ar-SA, ca-ES, cs-CZ, da-DK, de-AT, de-CH, de-DE, el-GR, en-AE, en-AU, en-CA, en-GB, en-ID, en-IE, en-IN, en-NZ, en-PH, en-SA, en-SG, en-US, en-ZA, es-CL, es-CO, es-ES, es-MX, es-US, fi-FI, fr-BE, fr-CA, fr-CH, fr-FR, he-IL, hr-HR, hu-HU, id-ID, it-CH, it-IT, ja-JP, ko-KR, ms-MY, nb-NO, nl-BE, nl-NL, pl-PL, pt-BR, pt-PT, ro-RO, ru-RU, sk-SK, sv-SE, th-TH, tr-TR, uk-UA, vi-VN, yue-CN, zh-CN, zh-HK, zh-TW
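One prerequisite the slides assume: speech recognition requires explicit user permission on iOS 10, via `SFSpeechRecognizer.requestAuthorization` and an `NSSpeechRecognitionUsageDescription` string in Info.plist (plus `NSMicrophoneUsageDescription` for live audio). A minimal sketch:

```swift
import Speech

// Ask the user for permission before creating any recognition task.
SFSpeechRecognizer.requestAuthorization { status in
    switch status {
    case .authorized:
        print("Ready to transcribe")
    default:
        // .denied, .restricted, or .notDetermined
        print("Speech recognition not authorized")
    }
}

// Optionally confirm the device locale is supported before creating a recognizer:
let isSupported = SFSpeechRecognizer.supportedLocales().contains(Locale.current)
```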
  11. Recognition Task

     private var recognitionTask: SFSpeechRecognitionTask?
     ...
     recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest) { result, error in
         var isFinal = false
         if let result = result {
             print(result.bestTranscription.formattedString)
             isFinal = result.isFinal
         }
         if error != nil || isFinal {
             // 60 sec limit reached
         }
     }
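The slide uses a `recognitionRequest` without showing its creation. For live audio it is a `SFSpeechAudioBufferRecognitionRequest`; a sketch of the setup (the commented calls refer to steps shown on neighboring slides):

```swift
import Speech

// Create the live-audio request that the recognition task consumes.
// shouldReportPartialResults = true delivers interim transcriptions while the user speaks.
let recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
recognitionRequest.shouldReportPartialResults = true

// Each microphone buffer is handed to the request (see the audio tap on the next slide):
// recognitionRequest.append(buffer)

// When input ends, signal that no more audio is coming so the task can finalize:
// recognitionRequest.endAudio()
```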
  12. Capture Audio Stream

     private let audioEngine = AVAudioEngine()
     ...
     let recordingFormat = audioEngine.inputNode.outputFormat(forBus: 0)
     audioEngine.inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) {
         (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
         self.recognitionRequest?.append(buffer)
     }
     audioEngine.prepare()
     try audioEngine.start()
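The deck shows starting the stream but not stopping it. A teardown sketch, reusing the `audioEngine`, `recognitionRequest`, and `recognitionTask` properties from the slides (the function name is illustrative):

```swift
// Tear down a live session: stop the engine, remove the tap,
// and signal end of audio so the recognizer can finalize its result.
func stopRecording() {
    audioEngine.stop()
    audioEngine.inputNode.removeTap(onBus: 0)
    recognitionRequest?.endAudio()
    recognitionTask?.cancel()
    recognitionTask = nil
}
```

Removing the tap matters: installing a second tap on the same bus without removing the first raises a runtime exception.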
  13. SFSpeechRecognizerDelegate

     import Speech

     public class ViewController: UIViewController, SFSpeechRecognizerDelegate {
         ...
         speechRecognizer?.delegate = self
         ...
         public func speechRecognizer(_ speechRecognizer: SFSpeechRecognizer,
                                      availabilityDidChange available: Bool) {
             ...
         }
     }
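A typical use of this delegate callback is toggling recording UI, since availability drops when the device loses network access or Siri is disabled. A sketch, assuming a hypothetical `recordButton` outlet on the view controller:

```swift
// `recordButton` is a hypothetical UIButton; disable it whenever
// the recognizer reports it can no longer service requests.
public func speechRecognizer(_ speechRecognizer: SFSpeechRecognizer,
                             availabilityDidChange available: Bool) {
    recordButton.isEnabled = available
    if !available {
        recordButton.setTitle("Recognition Unavailable", for: .disabled)
    }
}
```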
  14. UX Considerations
     → Choose your context wisely
     → Recording message
     → International language support