Applied ASR

Ben Langfeld
@benlangfeld
[email protected]


WHAT IS ASR?
Automatic Speech Recognition: giving a computer system with an audio-based user interface the ability to “understand” human speech.

WHERE IS ASR USED?
• IVR
• Dictation
• Accessibility systems
• Artificial intelligence
• The Google search bar

ANATOMY OF AN ASR ENGINE
• Endpointing
• Feature Extraction
• Recognition
• Natural Language Understanding
• Dialog Management
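As an illustration of the first stage, here is a minimal sketch of energy-based endpointing: split the audio into frames, measure each frame's energy, and treat the first and last frames above a threshold as the boundaries of speech. The function names, frame length, and threshold are assumptions made for this example, and the "speech" is a synthetic tone. Production endpointers are considerably more robust to noise.

```python
import math

def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def find_endpoints(samples, frame_len=160, threshold=0.01):
    """Return (start, end) sample indices of the speech region, or None."""
    energies = frame_energies(samples, frame_len)
    voiced = [i for i, e in enumerate(energies) if e > threshold]
    if not voiced:
        return None  # nothing louder than the threshold was found
    return voiced[0] * frame_len, (voiced[-1] + 1) * frame_len

# Synthetic signal: silence, a 440 Hz tone standing in for speech, silence.
rate = 8000
silence = [0.0] * 1600
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(3200)]
signal = silence + tone + silence

endpoints = find_endpoints(signal)
print(endpoints)
```

Only the samples between the two endpoints would be passed on to feature extraction.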

HOW RECOGNITION WORKS
• Searches recognition model
  • Acoustic models
  • Dictionary
  • Grammar
    • Rule-based
    • Statistical Language Model
• Matches feature vectors to similarly represented models
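To make "matching feature vectors to similarly represented models" concrete, here is a toy sketch that compares an utterance's feature vectors against stored word templates using dynamic time warping (DTW). This is a deliberately simple stand-in: real engines typically search statistical acoustic models rather than fixed templates, and the two-dimensional feature vectors, the `TEMPLATES` table, and the function names are all invented for illustration.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            # Euclidean distance between the two frames being aligned.
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]

def recognize(features, templates):
    """Return the word whose template is closest to the observed features."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))

# Hypothetical templates: one short sequence of 2-D feature vectors per word.
TEMPLATES = {
    "yes": [(1.0, 0.0), (0.8, 0.2), (0.1, 0.9)],
    "no": [(0.0, 1.0), (0.2, 0.8), (0.9, 0.1)],
}

# An utterance that is slightly longer and noisier than the "yes" template.
utterance = [(0.9, 0.1), (0.9, 0.1), (0.7, 0.3), (0.2, 0.8)]
best = recognize(utterance, TEMPLATES)
print(best)
```

The warping step is what lets a slower or faster rendition of a word still line up with its model.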

NATURAL LANGUAGE UNDERSTANDING
• Assign meaning to human speech
• Unstructured -> structured data
  • Key-value
  • Maybe nesting
• Either statistical or rule-based
  • Rule-based = SISR (Semantic Interpretation in Speech Recognition) - http://www.w3.org/TR/semantic-interpretation/
• Can get very complex very quickly
• One of the main components of so-called AI
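The "unstructured -> structured data" idea can be sketched with a single hand-written rule. The example below is not SISR; it uses a plain regular expression to turn a recognized sentence into nested key-value data. The flight-booking domain, the city and day vocabularies, and the `interpret` function are all assumptions made for illustration.

```python
import re

# One named group per slot; the vocabulary is deliberately tiny.
CITY = r"(?P<{name}>boston|denver|atlanta)"
RULE = re.compile(
    r"(?:i want to )?fly from " + CITY.format(name="origin")
    + r" to " + CITY.format(name="destination")
    + r"(?: on (?P<day>monday|tuesday|friday))?"
)

def interpret(utterance):
    """Map a recognized utterance to nested key-value data, or None."""
    match = RULE.fullmatch(utterance.lower().strip())
    if match is None:
        return None  # the utterance is not covered by the rule
    result = {
        "intent": "book_flight",
        "route": {"origin": match["origin"], "destination": match["destination"]},
    }
    if match["day"]:
        result["day"] = match["day"]
    return result

meaning = interpret("I want to fly from Boston to Denver on Friday")
print(meaning)
```

Even this one rule hints at how quickly rule sets grow: every new phrasing, slot, or optional word needs to be covered explicitly.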

WHAT SOLUTIONS ARE AVAILABLE?
• LumenVox (UniMRCP & C API)
• Nuance - complex and confusing product range
• Vestec
• AT&T Speech API (HTTP)
• AT&T Watson
• CMU Sphinx & PocketSphinx - open source

HOW TO GET STARTED QUICKLY?
• Telephony Dev Box
• http://mojolingo.github.io/Telephony-Dev-Box
• Asterisk & LumenVox (MRCP)
• FreeSWITCH & PocketSphinx
• Adhearsion

GRAMMAR DEVELOPMENT MASTERCLASS
• SRGS - http://www.w3.org/TR/speech-grammar/
• RubySpeech - github.com/benlangfeld/ruby_speech
• Rules
• Tokens
• Sequences
• Alternatives
• Repeats
• Tags
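The building blocks listed above can be seen together in a small SRGS XML grammar. The sketch below embeds one as a string and parses it with Python's standard library purely to check that it is well-formed and to list its parts. The grammar content (an optional "please", the token "call", and three department names) is a made-up example, not taken from the talk.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"

# A hypothetical grammar showing rules, tokens, a sequence, alternatives
# (one-of), a repeat (the optional "please"), weights, and tags.
GRAMMAR = """<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-US" mode="voice" root="order"
         tag-format="semantics/1.0-literals">
  <rule id="order" scope="public">
    <item repeat="0-1">please</item>
    call
    <ruleref uri="#department"/>
  </rule>
  <rule id="department">
    <one-of>
      <item>sales <tag>sales</tag></item>
      <item>support <tag>support</tag></item>
      <item weight="2.0">billing <tag>billing</tag></item>
    </one-of>
  </rule>
</grammar>
"""

root = ET.fromstring(GRAMMAR)
ns = {"srgs": SRGS_NS}

# Each <rule> is a named, reusable piece of the grammar.
rule_ids = [rule.get("id") for rule in root.findall("srgs:rule", ns)]

# Each <item> inside <one-of> is one alternative; its <tag> carries the meaning.
alternatives = [
    item.find("srgs:tag", ns).text
    for item in root.findall(".//srgs:one-of/srgs:item", ns)
]

print(rule_ids, alternatives)
```

This grammar would accept "call sales" or "please call billing", with the tag supplying the value handed back to the application.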

TESTING AND TUNING
• Recognition-testing - scripted and small volume
• Usability testing
• Tuning
• Project pilot phase
• Call monitoring, call log analysis or UX research
• In and out of grammar examples
• Compare recognition with utterance
• Dictionary tuning
• Grammar probabilities and weighting
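One common way to "compare recognition with utterance" is to compute a word error rate (WER) over a set of transcribed calls: the word-level edit distance between what was said and what was recognized, divided by the number of words said. The sketch below assumes you already have such pairs; the sample transcriptions and function names are invented for illustration.

```python
def word_errors(reference, hypothesis):
    """Return (edit distance in words, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1], len(ref)

def word_error_rate(pairs):
    """Aggregate WER over (what was said, what was recognized) pairs."""
    errors = words = 0
    for reference, hypothesis in pairs:
        e, n = word_errors(reference, hypothesis)
        errors += e
        words += n
    return errors / words

# Hypothetical transcriptions from a pilot: (utterance, recognition result).
pairs = [
    ("call sales please", "call sales please"),
    ("call billing", "call building"),
    ("transfer me to support", "transfer to support"),
]
wer = word_error_rate(pairs)
print(round(wer, 3))
```

Tracking this figure before and after a dictionary or grammar-weight change gives a simple measure of whether the tuning helped.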

SPEECH RECOGNITION ON THE WEB
• bit.ly/HTML5_Speech_Input_API
• talater.com/annyang

SPEAKER VERIFICATION
• Enrolment -> Speech Model
• Impostor model: combination of models of other speakers
• Identity claim -> compared to speech model and impostor model
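The comparison against both models can be sketched as a likelihood ratio test. In this toy version each speaker model is a single Gaussian over one scalar feature, the impostor model is an equal-weight mixture of the other speakers' models, and a claim is accepted when the claimed speaker's model explains the audio better than the impostor model. The feature values, speaker names, threshold, and function names are all assumptions; real systems use far richer features and models.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def enrol(samples):
    """Build a toy speech model (mean, std) from enrolment samples."""
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    return mean, max(math.sqrt(var), 1e-3)

def verify(features, claimed, others, threshold=0.0):
    """Accept the identity claim if the average log-likelihood ratio is high enough."""
    score = 0.0
    for x in features:
        p_claimed = gaussian_pdf(x, *claimed)
        # Impostor model: equal-weight combination of the other speakers' models.
        p_impostor = sum(gaussian_pdf(x, *m) for m in others) / len(others)
        score += math.log(p_claimed) - math.log(p_impostor)
    return score / len(features) > threshold

# Hypothetical enrolment data: one scalar feature per utterance.
alice = enrol([120, 125, 118, 122])
bob = enrol([180, 176, 185, 179])
carol = enrol([150, 155, 148, 152])

genuine = verify([121, 123, 119], claimed=alice, others=[bob, carol])
impostor = verify([178, 182, 181], claimed=alice, others=[bob, carol])
print(genuine, impostor)
```

Raising the threshold trades fewer false accepts for more false rejects, which is the main tuning decision in a deployment.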

QUESTIONS?
@benlangfeld
