The New HTML Standards

The New HTML Standards
Daniel C. Burnett
Director of Speech Technologies and Standards

Standards Prologue
• Chair/co-chair of
  – W3C Voice Browser Working Group, CCXML subgroup, SSML 1.0/1.1 subgroup, HTML Speech Incubator Group
• Chief editor of
  – MRCPv2, CCXML, SSML 1.0 and 1.1, SRGS, VoiceXML 2.0, 2.1, and 3
• Editor/author of
  – SISR, PLS, EMMA, OneAPI
• First full commercial implementations of
  – VoiceXML 2.0 and 2.1, CCXML, OneAPI

Who programs communication apps?

Communication | Platform | Endpoints | Apps created by
Voice calls | Wires and switchboards | Dedicated individual devices (DIDs) | ---
  – Voice calls signaled and connected by humans
Voice calls | Phone company hardware | DIDs | Central office programmers
  – Voice calls signaled and connected electronically
Voice calls | Phone company hardware | DIDs *or* "automated agents" (AAs) | Central office programmers
  – The birth of Interactive Voice Response systems
Voice calls | Third-party hardware | DIDs *or* AAs | PBX programmers
  – Programming moves closer to the end user
Voice communication | PC/Mobile phone | Logical endpoints | Device/network programmers
  – "Device" begins to lose meaning, and instead becomes "endpoint"
Voice/video communication | Software | Logical endpoints | XML/HTML, JS programmers
  – All hell breaks loose in telecom

Communication technology trends

Signaling | Media | Platform | Standards
Jacks and manual switches | Analog voice | Huh? | Probably
  – Cans + string == phone
Automated switches | Analog voice | Custom hardware | Various ITU
  – Mainframes rock
Electronic switching | Digital voice | Nortel, Periphonics, etc. | SS7
  – IEEE starts being more about E than E
Protocol-level | Anything | Internet Protocol | SIP/RTP, 3GPP, various IMS
  – WHOA, where did that Internet thing come from?!?!!
XML-level | Anything | IP | CCXML
  – Protocol-level programming is HARD
JavaScript libraries | Anything | HTML | WebRTC/RTCWeb, RTP
  – XML is no longer cool

Two standards efforts
• WebRTC/RTCWeb
  – just starting
  – Skype, Phono
• HTML Speech
  – one year old
  – Tropo

WebRTC/RTCWeb
• Standards for Skype, Phono
• W3C WebRTC
  – JavaScript API for browser-to-browser media
• IETF RTCWeb
  – Protocol recommendations/enhancements to support WebRTC
• 90% overlap between group members

WebRTC
• Goal: negotiate and manage media in JavaScript
• Three new objects/concepts
  – PeerConnection object, which contains
  – MediaStream objects, each of which contains
  – Track objects
• Probably using SDP offer/answer (O/A) to describe media
• "Signaling" mostly in client code
  – ICE/STUN/TURN required for NAT/firewall traversal, but some may be client JS code
• Big Question: How "simple" should the interface be?
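
The containment on this slide (a PeerConnection holding MediaStreams, each holding Tracks) can be pictured with a small plain-JavaScript model. This is only a sketch under assumptions: the class names SketchPeerConnection, SketchMediaStream, and SketchTrack, along with their fields and the describe() helper, are hypothetical stand-ins and not the API being drafted by the working group.

```javascript
// Hypothetical stand-ins for the three concepts; not the draft API.
class SketchTrack {
  constructor(kind, label) {
    this.kind = kind;   // 'audio' or 'video'
    this.label = label;
  }
}

class SketchMediaStream {
  constructor(label, tracks) {
    this.label = label;
    this.tracks = tracks.slice(); // copy so callers cannot mutate our list
  }
}

class SketchPeerConnection {
  constructor() {
    this.localStreams = [];
  }

  addStream(stream) {
    this.localStreams.push(stream);
  }

  // Flatten the hierarchy into "stream/track (kind)" strings.
  describe() {
    const lines = [];
    for (const stream of this.localStreams) {
      for (const track of stream.tracks) {
        lines.push(`${stream.label}/${track.label} (${track.kind})`);
      }
    }
    return lines;
  }
}

const camera = new SketchMediaStream('camera', [
  new SketchTrack('audio', 'mic'),
  new SketchTrack('video', 'webcam'),
]);
const pc = new SketchPeerConnection();
pc.addStream(camera);
console.log(pc.describe().join('\n'));
```

Keeping tracks inside streams, rather than directly on the connection, is what lets one stream group the audio and video that belong together.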

WebRTC example
• A rough direction . . .

<input type="button" value="Start" onclick="start()" id="startBtn">
<script>
  // Create the connection; outgoing signaling goes through sendSignalingChannel.
  var pc = new PeerConnection('TURNS example.net', sendSignalingChannel);

  function sendSignalingChannel(message) {
    // ... send message to the other side via the signaling channel
  }

  function receiveSignalingChannel(message) {
    // ... call this whenever we get a message on the signaling channel
  }

  var startBtn = document.getElementById('startBtn');

  function start() {
    // Ask for microphone and camera access; gotStream receives the result.
    navigator.getUserMedia('audio,video', gotStream);
    startBtn.disabled = true;
  }

  function gotStream(stream) {
    // Hand the captured media to the peer connection.
    pc.addStream(stream);
  }
</script>

RTCWeb
• No special signaling needs
• Real-time media carried via RTP
• RTP session == media stream
  – RTP multiplexing allows for multiple tracks
  – Allows for time synch for tracks, since in same session
• Discussing data channel option (non-real-time)
• Carried over TLS (some form)
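
One way to picture several tracks sharing a single RTP session is to separate received packets by their source identifier (SSRC), a standard RTP header field. The sketch below is a simplification under assumptions: packets are plain objects rather than binary RTP, the function name demuxBySsrc and the SSRC values are made up, and the slide does not say that SSRC is the demultiplexing key the drafts will settle on.

```javascript
// Simplified model: real RTP packets are binary. Only two header fields
// are kept here (ssrc, seq), and sequence-number wraparound is ignored.
function demuxBySsrc(packets) {
  const tracks = new Map(); // ssrc -> packets belonging to that source
  for (const packet of packets) {
    if (!tracks.has(packet.ssrc)) {
      tracks.set(packet.ssrc, []);
    }
    tracks.get(packet.ssrc).push(packet);
  }
  // Restore sending order within each source.
  for (const list of tracks.values()) {
    list.sort((a, b) => a.seq - b.seq);
  }
  return tracks;
}

// One session carrying two interleaved sources (made-up SSRC values).
const session = [
  { ssrc: 0x1111, seq: 2, payload: 'audio-2' },
  { ssrc: 0x2222, seq: 1, payload: 'video-1' },
  { ssrc: 0x1111, seq: 1, payload: 'audio-1' },
  { ssrc: 0x2222, seq: 2, payload: 'video-2' },
];

const demuxed = demuxBySsrc(session);
for (const [ssrc, list] of demuxed) {
  console.log(ssrc.toString(16), list.map((p) => p.payload).join(', '));
}
```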

Progress
• RTCWeb
  – Fairly stable use cases, requirements, and architecture
  – New drafts to enhance RTP for multiplexing media
  – Too much discussion of things in W3C domain
• WebRTC
  – Editor's draft only
  – First Public Working Draft this month

Getting involved
• WebRTC
  – Group: http://www.w3.org/2011/04/webrtc/
  – List archive: http://lists.w3.org/Archives/Public/public-webrtc
• RTCWeb
  – Group: http://tools.ietf.org/wg/rtcweb/
  – List archive: http://www.ietf.org/mail-archive/web/rtcweb/current/maillist.html
• Both groups operate in public

Two standards efforts
• WebRTC/RTCWeb
  – just starting
  – Skype, Phono
• HTML Speech
  – one year old
  – Tropo

HTML Speech
• Standards for Tropo
• Happening in W3C
  – Voxeo co-chairs
• Incubator Group only for now
  – NOT A STANDARD!!!!!!!!!!!!!!!!!!!!!!!!!!!
• Web world wants

HTML Speech – what it ain't
• NOT VoiceXML
• NOT IVR language
• NOT dialog language
• NOT telephony

HTML Speech – what it is
• ASR/TTS added to HTML
• Three pieces
  – Web API
  – HTML extensions
  – Browser-to-ASR/TTS protocol

Web API
• JavaScript API
• SRGS/SISR, SSML, EMMA support
• Grammar optional
• Events for audio/sound/speech start/end, plus results and errors
• Default/remote speech services
• One-shot and continuous recognition
• Assumes local audio input/output available

Web API
• Proposed: SpeechInputRequest object
  – Many attributes: uri, grammars, on(audio/sound/speech)(start/end), languages, maxresults, etc.
  – Methods: open, start, stop, abort
• Proposed: SpeechOutputObject, but
  – May be superseded by simple TTS element
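
To make the shape of the proposed object more concrete, here is a mock that can run without any speech engine. It is a sketch, not the proposal itself: the attribute and method names (uri, grammars, languages, maxresults, open, start, stop, abort) come from the slide, but the exact handler spellings, the order in which events fire, the onresult handler, and the cannedResults constructor argument are all assumptions made for illustration.

```javascript
// Mock only. Attribute and method names come from the slide; the handler
// spellings, the event order, onresult, and cannedResults are assumptions.
class MockSpeechInputRequest {
  constructor(cannedResults) {
    this.uri = null;       // null stands for the default speech service
    this.grammars = [];    // grammars are optional
    this.languages = [];
    this.maxresults = 1;
    this._canned = cannedResults;
    this._state = 'closed';
  }

  // Call a handler if the page has assigned one.
  _fire(name, arg) {
    if (typeof this[name] === 'function') this[name](arg);
  }

  open() {
    this._state = 'open';
  }

  start() {
    if (this._state !== 'open') throw new Error('open() must be called first');
    this._state = 'listening';
    this._fire('onaudiostart');
    this._fire('onsoundstart');
    this._fire('onspeechstart');
  }

  // Finish capture and deliver up to maxresults hypotheses.
  stop() {
    if (this._state !== 'listening') return;
    this._state = 'open';
    this._fire('onspeechend');
    this._fire('onsoundend');
    this._fire('onaudioend');
    this._fire('onresult', this._canned.slice(0, this.maxresults));
  }

  // Finish capture and discard any result.
  abort() {
    if (this._state !== 'listening') return;
    this._state = 'open';
    this._fire('onaudioend');
  }
}

const log = [];
const request = new MockSpeechInputRequest(['two', 'too', 'to']);
request.languages = ['en-US'];
request.maxresults = 2;
for (const name of ['onaudiostart', 'onsoundstart', 'onspeechstart',
                    'onspeechend', 'onsoundend', 'onaudioend']) {
  request[name] = () => log.push(name);
}
request.onresult = (results) => log.push('result: ' + results.join('|'));

request.open();
request.start();
request.stop();
console.log(log.join('\n'));
```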

HTML Extensions
• Link to "for" attribute
• <reco> element
• <tts> element

<form>
  <label for="count">How many would you like:</label>
  <input type="number" id="count"/>
  <reco for="count" service="http://www.example.com/ASR"/>
  <tts src="Thankyou.ssml"/>
</form>

Protocol
• Communication between browser and ASR/TTS service
• Based loosely on MRCP
• Runs over WebSockets
• Control and media mixed together in stream
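
When control and media share one WebSocket, the receiver needs some rule for telling them apart. A minimal sketch of one such rule follows, assuming control arrives as text messages and media as binary messages. That split is an assumption (the slide does not specify it), and the function name routeMessage, the handler names, and the control strings are invented placeholders.

```javascript
// Assumption: control arrives as text (string) and media as binary data.
// The slide does not say how the two are told apart.
function routeMessage(data, handlers) {
  if (typeof data === 'string') {
    handlers.onControl(data);
  } else if (data instanceof ArrayBuffer) {
    handlers.onMedia(new Uint8Array(data));
  } else if (ArrayBuffer.isView(data)) {
    handlers.onMedia(new Uint8Array(data.buffer, data.byteOffset, data.byteLength));
  } else {
    throw new TypeError('unsupported message type');
  }
}

// Simulated stream; the control strings are invented placeholders.
const incoming = [
  'control: start recognition',
  new Uint8Array([1, 2, 3]).buffer,
  new Uint8Array([4, 5]),
  'control: stop recognition',
];

const controlMessages = [];
let mediaBytes = 0;
const handlers = {
  onControl: (text) => controlMessages.push(text),
  onMedia: (bytes) => { mediaBytes += bytes.length; },
};

for (const message of incoming) {
  routeMessage(message, handlers);
}
console.log(controlMessages.length + ' control messages, ' + mediaBytes + ' media bytes');
```

In a browser, the same function could be attached to a real socket by setting its binaryType to 'arraybuffer' and calling routeMessage(event.data, handlers) from the onmessage handler.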

Schedule and next steps
• Incubator group will finish in November
• IT IS *NOT* A W3C STANDARD AT THIS POINT
• So we might
  – Form a new W3C standards-track working group
  – Join an existing W3C standards-track working group, e.g.
    • WebApps WG
    • HTML WG
• Key goal: a Standard that is really implemented

Getting involved
• Group page: http://www.w3.org/2005/Incubator/htmlspeech/
• Mailing list archives: http://lists.w3.org/Archives/Public/public-xg-htmlspeech/

Write this down
• Communications are becoming browser-based (HTML as platform) via standards
• Speech technology access via browser is being standardized
• Browsers are increasingly built into mobile devices
• Ergo, all future voice and telephony apps will eventually be browser-based

Unlocked and Loaded
Voxeo standards implementations UNLOCK your apps
Join us in designing good standards!

The New HTML Standards
Daniel C. Burnett
Director of Speech Technologies and Standards
Voxeo Customer Summit 2011
