CCXML subgroup, SSML 1.0/1.1 subgroup, HTML Speech Incubator Group
• Chief editor of
  – MRCPv2, CCXML, SSML 1.0 and 1.1, SRGS, VoiceXML 2.0, 2.1, and 3
• Editor/author of
  – SISR, PLS, EMMA, OneAPI
• First full commercial implementations of
  – VoiceXML 2.0 and 2.1, CCXML, OneAPI
Apps created by
• Voice calls | Wires and switchboards | Dedicated individual devices (DIDs) | ---
• Voice calls | Phone company hardware | DIDs | Central office programmers
  – Voice calls signaled and connected electronically
• Voice calls | Phone company hardware | DIDs *or* “automated agents” (AAs) | Central office programmers
  – The birth of Interactive Voice Response systems
• Voice calls | Third-party hardware | DIDs *or* AAs | PBX programmers
  – Programming moves closer to the end user
• Jacks and manual switches | Analog voice | Huh? | Probably
• Automated switches | Analog voice | Custom hardware | Various ITU
• Electronic switching | Digital voice | Nortel, Periphonics, etc. | SS7
  – IEEE starts being more about E than E
• Protocol-level | Anything | Internet Protocol | SIP/RTP, 3GPP, various IMS
  – WHOA, where did that Internet thing come from?!?!!
• XML-level | Anything | IP | CCXML
  – Protocol-level programming is HARD
• JavaScript libraries | Anything | HTML | WebRTC/RTCWeb, RTP
  – XML is no longer cool
JavaScript API for browser-to-browser media
• IETF RTCWeb
  – Protocol recommendations/enhancements to support WebRTC
• 90% overlap between group members
Three new objects/concepts
  – PeerConnection object, which contains
  – MediaStream objects, each of which contains
  – Track objects
• Probably using SDP offer/answer (O/A) to describe media
• “Signaling” mostly in client code
  – ICE/STUN/TURN required for NAT/firewall traversal, but some of it may be client JS code
• Big Question: How “simple” should the interface be?
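The containment hierarchy above (a PeerConnection holding MediaStreams, each holding Tracks) can be pictured as plain data. The following is a minimal JavaScript sketch; the class names SketchPeerConnection, SketchMediaStream, and SketchTrack and their methods are hypothetical stand-ins chosen for illustration, not the draft WebRTC API.

```javascript
// Hypothetical model of the containment hierarchy:
// a peer connection holds media streams, and each stream holds tracks.
// These classes are illustrative only; they are NOT the WebRTC API.

class SketchTrack {
  constructor(kind, label) {
    this.kind = kind;   // e.g. "audio" or "video"
    this.label = label; // human-readable name for the track
  }
}

class SketchMediaStream {
  constructor(id) {
    this.id = id;
    this.tracks = [];
  }
  // Returns this so tracks can be added in a chain.
  addTrack(track) {
    this.tracks.push(track);
    return this;
  }
}

class SketchPeerConnection {
  constructor() {
    this.streams = [];
  }
  addStream(stream) {
    this.streams.push(stream);
  }
  // Flatten the hierarchy: every track in every stream.
  allTracks() {
    return this.streams.flatMap((s) => s.tracks);
  }
}

// Usage: one camera stream carrying an audio track and a video track.
const pc = new SketchPeerConnection();
const camera = new SketchMediaStream("camera")
  .addTrack(new SketchTrack("audio", "microphone"))
  .addTrack(new SketchTrack("video", "webcam"));
pc.addStream(camera);

console.log(pc.allTracks().map((t) => t.kind)); // [ 'audio', 'video' ]
```

Keeping tracks inside streams, rather than attaching them directly to the connection, mirrors the three-level structure listed above and makes it easy to add or remove a whole stream at once.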
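<input type="button" value="Start" onclick="start()" id="startBtn">
<script>
  // Written against the editor's draft API current at the time
  // (PeerConnection with a TURN/STUN configuration string and a signaling callback).
  // Modern browsers instead expose RTCPeerConnection and
  // navigator.mediaDevices.getUserMedia({ audio: true, video: true }).
  var pc = new PeerConnection('TURNS example.net', sendSignalingChannel);

  function sendSignalingChannel(message) {
    // ... send message to the other side via the signaling channel
  }

  function receiveSignalingChannel(message) {
    // ... call this whenever we get a message on the signaling channel
  }

  var startBtn = document.getElementById('startBtn');

  function start() {
    // Ask for local audio and video; gotStream runs once access is granted.
    navigator.getUserMedia('audio,video', gotStream);
    startBtn.disabled = true;
  }

  function gotStream(stream) {
    // Attach the local stream so it is sent to the remote peer.
    pc.addStream(stream);
  }
</script>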
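Because signaling is left to client code, the sendSignalingChannel and receiveSignalingChannel placeholders in the example need some transport behind them. As a sketch under stated assumptions, the hypothetical createSignalingPair() below links two in-process endpoints so that an offer sent by one side reaches the other side's handler and triggers an answer. The message shape ({ type, sdp }) and the placeholder SDP strings are illustrative only; a real page would use a network channel such as a WebSocket or an HTTP server.

```javascript
// Hypothetical in-memory signaling channel linking two peers in one process.
// Each endpoint has send(message), a receive(message) handler, and a log
// of the messages delivered to it.

function createSignalingPair() {
  const a = { receive: null, log: [] };
  const b = { receive: null, log: [] };
  // Sending on one side delivers the message to the other side's handler.
  a.send = (message) => {
    b.log.push(message);
    if (b.receive) b.receive(message);
  };
  b.send = (message) => {
    a.log.push(message);
    if (a.receive) a.receive(message);
  };
  return [a, b];
}

const [caller, callee] = createSignalingPair();

// The callee answers any offer it receives (message format is hypothetical).
callee.receive = (message) => {
  if (message.type === "offer") {
    callee.send({ type: "answer", sdp: "placeholder answer for " + message.sdp });
  }
};

// The caller just reports what comes back.
caller.receive = (message) => {
  console.log("caller got:", message.type);
};

caller.send({ type: "offer", sdp: "placeholder-offer" });
// prints: caller got: answer
```

Swapping the in-memory link for a real transport changes only createSignalingPair(); the offer/answer handlers stay the same, which is the point of keeping signaling in application code.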
via RTP
• RTP session == media stream
  – RTP multiplexing allows for multiple tracks
  – Allows time sync across tracks, since they share a session
• Discussing data channel option (non-real-time)
• Carried over some form of TLS
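To illustrate how several tracks can share one RTP session, the sketch below reads the 12-byte RTP fixed header defined in RFC 3550 and groups packets by their SSRC (synchronization source) identifier, one common way a receiver tells multiplexed sources apart. It is a minimal illustration: the packets are synthetic, the payload type values 111 and 96 are arbitrary dynamic values, and CSRC lists, header extensions, and payloads are ignored. The helper names parseRtpHeader, makeRtpHeader, and demuxBySsrc are hypothetical.

```javascript
// Parse the RTP fixed header (RFC 3550, section 5.1).
function parseRtpHeader(bytes) {
  if (bytes.length < 12) throw new Error("packet shorter than RTP fixed header");
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const version = bytes[0] >> 6; // top two bits
  if (version !== 2) throw new Error("not RTP version 2");
  return {
    version,
    marker: (bytes[1] & 0x80) !== 0,
    payloadType: bytes[1] & 0x7f,
    sequenceNumber: view.getUint16(2), // big-endian (network byte order)
    timestamp: view.getUint32(4),
    ssrc: view.getUint32(8),
  };
}

// Build a minimal 12-byte RTP header for testing (no payload).
function makeRtpHeader(payloadType, seq, timestamp, ssrc) {
  const bytes = new Uint8Array(12);
  const view = new DataView(bytes.buffer);
  bytes[0] = 0x80;               // V=2, P=0, X=0, CC=0
  bytes[1] = payloadType & 0x7f; // M=0
  view.setUint16(2, seq);
  view.setUint32(4, timestamp);
  view.setUint32(8, ssrc);
  return bytes;
}

// Group the packets of one session into per-source (per-track) lists.
function demuxBySsrc(packets) {
  const tracks = new Map();
  for (const p of packets) {
    const h = parseRtpHeader(p);
    if (!tracks.has(h.ssrc)) tracks.set(h.ssrc, []);
    tracks.get(h.ssrc).push(h);
  }
  return tracks;
}

// Usage: two sources interleaved in one session.
const session = [
  makeRtpHeader(111, 1, 960, 0x1111),
  makeRtpHeader(96, 7, 3000, 0x2222),
  makeRtpHeader(111, 2, 1920, 0x1111),
];
const tracks = demuxBySsrc(session);
console.log(tracks.size);               // 2
console.log(tracks.get(0x1111).length); // 2
```

Grouping by SSRC keeps each source's sequence numbers and timestamps together, which a receiver needs for reordering and timing within each track.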
  – New drafts to enhance RTP for multiplexing media
  – Too much discussion of things in W3C domain
• WebRTC
  – Editor's draft only
  – First Public Working Draft this month
• RTCWeb
  – Group: http://tools.ietf.org/wg/rtcweb/
  – List archive: http://www.ietf.org/mail-archive/web/rtcweb/current/maillist.html
• Both groups operate in public
• Grammar optional
• Events for audio/sound/speech start/end, plus results and errors
• Default/remote speech services
• One-shot and continuous recognition
• Assumes local audio input/output is available
on(audio/sound/speech)(start/end), languages, maxresults, etc.
  – Methods: open, start, stop, abort
• Proposed: SpeechOutputObject, but
  – May be superseded by simple TTS element
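The proposed speech input interface is summarized above only at the level of events, attributes, and methods. The following mock is a hypothetical sketch of how one-shot and continuous recognition could differ in practice: MockSpeechInputRequest, its continuous option, and the simulateUtterance() helper are assumptions made for illustration and are not part of the proposal or any browser API. No audio or recognizer is involved; results are injected by hand.

```javascript
// Hypothetical mock loosely modeled on the proposal: start/end event
// handlers, a maxresults attribute, and open/start/stop/abort methods.
class MockSpeechInputRequest {
  constructor({ continuous = false, maxresults = 1 } = {}) {
    this.continuous = continuous;
    this.maxresults = maxresults;
    this.state = "idle"; // idle -> open -> listening -> idle
    this.onspeechstart = null;
    this.onspeechend = null;
    this.onresult = null;
  }
  open() { this.state = "open"; }
  start() {
    if (this.state !== "open") throw new Error("call open() first");
    this.state = "listening";
  }
  stop() { this.state = "idle"; }
  // In this mock, abort simply stops; a real service might also discard pending results.
  abort() { this.state = "idle"; }

  // Stand-in for an utterance arriving from the recognizer (hypothetical helper).
  simulateUtterance(hypotheses) {
    if (this.state !== "listening") return;
    if (this.onspeechstart) this.onspeechstart();
    if (this.onspeechend) this.onspeechend();
    if (this.onresult) this.onresult(hypotheses.slice(0, this.maxresults));
    // One-shot recognition ends after the first result; continuous keeps listening.
    if (!this.continuous) this.stop();
  }
}

// Usage: one-shot recognition keeps only the first utterance's results.
const heard = [];
const oneShot = new MockSpeechInputRequest({ maxresults: 2 });
oneShot.onresult = (results) => heard.push(...results);
oneShot.open();
oneShot.start();
oneShot.simulateUtterance(["three", "tree", "free"]);
oneShot.simulateUtterance(["ignored"]); // already stopped
console.log(heard); // [ 'three', 'tree' ]

// Continuous recognition keeps delivering results until stop() is called.
const transcript = [];
const dictation = new MockSpeechInputRequest({ continuous: true });
dictation.onresult = (results) => transcript.push(results[0]);
dictation.open();
dictation.start();
dictation.simulateUtterance(["hello"]);
dictation.simulateUtterance(["world"]);
dictation.stop();
console.log(transcript.join(" ")); // hello world
```

One-shot mode suits a form field like a single number; continuous mode suits dictation, where the page, not the recognizer, decides when listening ends.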
• <tts> element
<form>
  <label for="count">How many would you like:</label>
  <input type="number" id="count"/>
  <reco for="count" service="http://www.example.com/ASR"/>
  <tts src="Thankyou.ssml"/>
</form>
November
• IT IS *NOT* A W3C STANDARD AT THIS POINT
• So we might
  – Form a new W3C standards-track working group
  – Join an existing W3C standards-track working group, e.g.
    • WebApps WG
    • HTML WG
• Key goal: a Standard that is really implemented
as platform) via standards
• Speech technology access via browser is being standardized
• Browsers are increasingly built into mobile devices
• Ergo, all future voice and telephony apps will eventually be browser-based