a PhD in CS or EE
• Software Engineer - BS CS or CompEng, MS a plus, C++ or C
• Speech UX Designer - varies: anthropology, BS CS, linguistics...
• Speech Product Manager - varies: BS CS, MS CS, Linguistics, BS/MS EE
of a word or words that represent a single meaning to the computer. Utterances can be a single word, a few words, a sentence, or even multiple sentences.
units of sound in a specified language that distinguish one word from another. There are 44 in American English. The word "dog" is made up of 3 phonemes: /d/-/o/-/g/
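To make "distinguish one word from another" concrete, here is a minimal sketch that checks whether two words differ in exactly one phoneme. The lexicon, its phoneme symbols, and the function names are hypothetical and chosen only for illustration; a real pronunciation dictionary would use a standard phoneme set.

```python
# Hypothetical toy lexicon: word -> phoneme sequence.
# The symbols follow the slide's /d/-/o/-/g/ notation, not a standard phoneme set.
LEXICON = {
    "dog": ["d", "o", "g"],
    "log": ["l", "o", "g"],
    "dot": ["d", "o", "t"],
}


def differing_positions(word_a, word_b):
    """Return the indices where the two phoneme sequences differ.

    Returns None when the sequences have different lengths.
    """
    a, b = LEXICON[word_a], LEXICON[word_b]
    if len(a) != len(b):
        return None
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def is_minimal_pair(word_a, word_b):
    """True when exactly one phoneme separates the two words."""
    diff = differing_positions(word_a, word_b)
    return diff is not None and len(diff) == 1
```

In this toy lexicon, "dog" and "log" form a minimal pair because only the first phoneme changes, while "log" and "dot" differ in two positions.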
of words to recognize
2. Utterances are received in wave form
3. SE looks at features, compares against acoustic model using grammars to guide
4. Determines which words match best and returns a result
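The steps above can be sketched as a toy matcher. This is only an illustration under strong assumptions: the "acoustic model" is a hypothetical table with one feature vector per word, the grammar is a plain list of allowed words, and the match score is Euclidean distance. Real engines use statistical models over sequences of feature vectors.

```python
import math

# Hypothetical acoustic model: one template feature vector per word.
ACOUSTIC_MODEL = {
    "play": [0.9, 0.1, 0.3],
    "pause": [0.2, 0.8, 0.5],
    "stop": [0.1, 0.2, 0.9],
}


def recognize(features, grammar):
    """Return (best_word, distance) among the words the grammar allows."""
    best_word, best_dist = None, math.inf
    for word in grammar:
        # Compare the incoming features against the model for this word.
        dist = math.dist(features, ACOUSTIC_MODEL[word])
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word, best_dist
```

The grammar guides the comparison by limiting which words are considered: the same features can yield a different result when the grammar changes, for example `recognize([0.85, 0.15, 0.25], ["play", "stop"])` versus the same call with `["pause", "stop"]`.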
spectral features, pronunciation models, and prior context
Collect lots of speech and transcribe all the words
Train the model on the labeled speech
How much speech was needed for one language for the Xbox Kinect?
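As a rough illustration of "train the model on the labeled speech", the sketch below averages the feature vectors collected for each transcribed word. The data layout and the `train` function are assumptions made for this example; production acoustic models are trained with far richer statistical methods and far more data.

```python
from collections import defaultdict


def train(labeled_examples):
    """Build a toy model: word -> mean feature vector.

    labeled_examples is a list of (word, feature_vector) pairs, standing in
    for transcribed speech. This layout is hypothetical.
    """
    sums = {}
    counts = defaultdict(int)
    for word, features in labeled_examples:
        if word not in sums:
            sums[word] = [0.0] * len(features)
        # Accumulate the per-dimension totals for this word.
        sums[word] = [s + f for s, f in zip(sums[word], features)]
        counts[word] += 1
    return {word: [s / counts[word] for s in total] for word, total in sums.items()}
```

More labeled examples per word give a more stable average, which is one intuition for why large amounts of transcribed speech are collected.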
[Diagram labels: Search, Language Model, Input Speech, Recognized Utterance]
• Signal is converted to a sequence of feature vectors based on spectral and temporal measurements
• Acoustic models represent sub-word units, such as phonemes
• Language model predicts the next set of words, and controls which models are hypothesized
• Search is crucial to the system, since many combinations of words must be investigated to find most probable word sequence
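The interplay of acoustic scores, language model, and search can be sketched with a small Viterbi-style search. All probabilities below are invented for illustration, and the bigram table, the `<s>` start symbol, and the function name are assumptions of this example rather than part of any particular recognizer.

```python
import math

# Hypothetical acoustic scores: for each segment, P(audio | candidate word).
ACOUSTIC = [
    {"recognize": 0.6, "wreck a nice": 0.4},
    {"speech": 0.5, "beach": 0.5},
]

# Hypothetical bigram language model: P(word | previous word).
BIGRAM = {
    ("<s>", "recognize"): 0.6,
    ("<s>", "wreck a nice"): 0.4,
    ("recognize", "speech"): 0.9,
    ("recognize", "beach"): 0.1,
    ("wreck a nice", "speech"): 0.2,
    ("wreck a nice", "beach"): 0.8,
}


def best_sequence(acoustic, bigram, floor=1e-6):
    """Return (log_score, words) for the most probable word sequence."""
    # paths maps the last word of a partial hypothesis to its best score so far.
    paths = {"<s>": (0.0, [])}
    for segment in acoustic:
        new_paths = {}
        for word, p_acoustic in segment.items():
            candidates = []
            for prev, (score, words) in paths.items():
                # Unseen word pairs get a small floor probability.
                p_lm = bigram.get((prev, word), floor)
                total = score + math.log(p_acoustic) + math.log(p_lm)
                candidates.append((total, words + [word]))
            # Keep only the best way of reaching this word.
            new_paths[word] = max(candidates, key=lambda c: c[0])
        paths = new_paths
    return max(paths.values(), key=lambda c: c[0])
```

Here the acoustic scores alone cannot separate "speech" from "beach"; the language model tips the search toward the sequence it considers more likely. Keeping only the best path per last word is what lets the search avoid enumerating every combination.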