The New HTML Standards

Voxeo
October 17, 2011

Dan Burnett, Voxeo Director of Speech Technologies and Standards

Transcript

  1. Standards Prologue

    •  Chair/co-chair of
       – W3C Voice Browser Working Group, CCXML subgroup, SSML 1.0/1.1 subgroup, HTML Speech Incubator Group
    •  Chief editor of
       – MRCPv2, CCXML, SSML 1.0 and 1.1, SRGS, VoiceXML 2.0, 2.1, and 3
    •  Editor/author of
       – SISR, PLS, EMMA, OneAPI
    •  First full commercial implementations of
       – VoiceXML 2.0 and 2.1, CCXML, OneAPI
  2. Who programs communication apps?

    Communication | Platform | Endpoints | Apps created by
    Voice calls | Wires and switchboards | Dedicated individual devices (DIDs) | ---

    Voice calls signaled and connected by humans
  3. Who programs communication apps?

    Communication | Platform | Endpoints | Apps created by
    Voice calls | Wires and switchboards | Dedicated individual devices (DIDs) | ---
    Voice calls | Phone company hardware | DIDs | Central office programmers

    Voice calls signaled and connected electronically
  4. Who programs communication apps?

    Communication | Platform | Endpoints | Apps created by
    Voice calls | Wires and switchboards | Dedicated individual devices (DIDs) | ---
    Voice calls | Phone company hardware | DIDs | Central office programmers
    Voice calls | Phone company hardware | DIDs *or* “automated agents” (AAs) | Central office programmers

    The birth of Interactive Voice Response systems
  5. Who programs communication apps?

    Communication | Platform | Endpoints | Apps created by
    Voice calls | Wires and switchboards | Dedicated individual devices (DIDs) | ---
    Voice calls | Phone company hardware | DIDs | Central office programmers
    Voice calls | Phone company hardware | DIDs *or* “automated agents” (AAs) | Central office programmers
    Voice calls | Third-party hardware | DIDs *or* AAs | PBX programmers

    Programming moves closer to the end user
  6. Who programs communication apps?

    Communication | Platform | Endpoints | Apps created by
    Voice calls | Wires and switchboards | Dedicated individual devices (DIDs) | ---
    Voice calls | Phone company hardware | DIDs | Central office programmers
    Voice calls | Phone company hardware | DIDs *or* “automated agents” (AAs) | Central office programmers
    Voice calls | Third-party hardware | DIDs *or* AAs | PBX programmers
    Voice communication | PC/Mobile phone | Logical endpoints | Device/network programmers

    "Device" begins to lose meaning, and instead becomes "endpoint"
  7. Who programs communication apps?

    Communication | Platform | Endpoints | Apps created by
    Voice calls | Wires and switchboards | Dedicated individual devices (DIDs) | ---
    Voice calls | Phone company hardware | DIDs | Central office programmers
    Voice calls | Phone company hardware | DIDs *or* “automated agents” (AAs) | Central office programmers
    Voice calls | Third-party hardware | DIDs *or* AAs | PBX programmers
    Voice communication | PC/Mobile phone | Logical endpoints | Device/network programmers
    Voice/video communication | Software | Logical endpoints | XML/HTML, JS programmers

    All hell breaks loose in telecom
  8. Communication technology trends

    Signaling | Media | Platform | Standards
    Jacks and manual switches | Analog voice | Huh? | Probably

    Cans + string == phone
  9. Communication technology trends

    Signaling | Media | Platform | Standards
    Jacks and manual switches | Analog voice | Huh? | Probably
    Automated switches | Analog voice | Custom hardware | Various ITU

    Mainframes rock
  10. Communication technology trends

    Signaling | Media | Platform | Standards
    Jacks and manual switches | Analog voice | Huh? | Probably
    Automated switches | Analog voice | Custom hardware | Various ITU
    Electronic switching | Digital voice | Nortel, Periphonics, etc. | SS7

    IEEE starts being more about E than E
  11. Communication technology trends

    Signaling | Media | Platform | Standards
    Jacks and manual switches | Analog voice | Huh? | Probably
    Automated switches | Analog voice | Custom hardware | Various ITU
    Electronic switching | Digital voice | Nortel, Periphonics, etc. | SS7
    Protocol-level | Anything | Internet Protocol | SIP/RTP, 3GPP, various IMS

    WHOA, where did that Internet thing come from?!?!!
  12. Communication technology trends

    Signaling | Media | Platform | Standards
    Jacks and manual switches | Analog voice | Huh? | Probably
    Automated switches | Analog voice | Custom hardware | Various ITU
    Electronic switching | Digital voice | Nortel, Periphonics, etc. | SS7
    Protocol-level | Anything | Internet Protocol | SIP/RTP, 3GPP, various IMS
    XML-level | Anything | IP | CCXML

    Protocol-level programming is HARD
  13. Communication technology trends

    Signaling | Media | Platform | Standards
    Jacks and manual switches | Analog voice | Huh? | Probably
    Automated switches | Analog voice | Custom hardware | Various ITU
    Electronic switching | Digital voice | Nortel, Periphonics, etc. | SS7
    Protocol-level | Anything | Internet Protocol | SIP/RTP, 3GPP, various IMS
    XML-level | Anything | IP | CCXML
    JavaScript libraries | Anything | HTML | WebRTC/RTCWeb, RTP

    XML is no longer cool
  14. WebRTC/RTCWeb

    •  Standards for Skype, Phono
    •  W3C WebRTC
       – JavaScript API for browser-to-browser media
    •  IETF RTCWeb
       – Protocol recommendations/enhancements to support WebRTC
    •  90% overlap between group members
  15. WebRTC

    •  Goal: negotiate and manage media in JavaScript
    •  Three new objects/concepts
       – PeerConnection object, which contains
       – MediaStream objects, each of which contains
       – Track objects
    •  Probably using SDP offer/answer (O/A) to describe media
    •  “Signaling” mostly in client code
       – ICE/STUN/TURN required for NAT/firewall traversal, but some may be client JS code
    •  Big Question: How “simple” should the interface be?
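The containment described on this slide (a connection holds streams, each stream holds tracks) can be pictured with a small sketch. The class names `SketchTrack`, `SketchStream`, and `SketchConnection`, and the `describe()` helper, are hypothetical and invented for illustration; they only model the hierarchy and are not the WebRTC API.

```javascript
// Hypothetical classes that model containment only; they are NOT the WebRTC API.
class SketchTrack {
  constructor(kind, label) {
    this.kind = kind;   // e.g. 'audio' or 'video'
    this.label = label; // human-readable name of the source
  }
}

class SketchStream {
  constructor(label) {
    this.label = label;
    this.tracks = [];   // a stream contains zero or more tracks
  }
  addTrack(track) {
    this.tracks.push(track);
    return this;        // return this to allow chaining
  }
}

class SketchConnection {
  constructor() {
    this.streams = [];  // a connection contains zero or more streams
  }
  addStream(stream) {
    this.streams.push(stream);
    return this;
  }
  // Flatten the hierarchy into "stream/kind:label" strings.
  describe() {
    return this.streams.flatMap((s) =>
      s.tracks.map((t) => `${s.label}/${t.kind}:${t.label}`));
  }
}

// One stream carrying an audio track and a video track.
const camera = new SketchStream('camera')
  .addTrack(new SketchTrack('audio', 'mic'))
  .addTrack(new SketchTrack('video', 'webcam'));
const connection = new SketchConnection().addStream(camera);
console.log(connection.describe());
```

Keeping tracks inside streams means related media (for example, a microphone and a webcam) can be added to or removed from a connection as one unit.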
  16. WebRTC example

    •  A rough direction . . .

    <input type="button" value="Start" onclick="start()" id="startBtn">
    <script>
      var pc = new PeerConnection('TURNS example.net', sendSignalingChannel);

      function sendSignalingChannel(message) {
        ... // send message to the other side via the signaling channel
      }

      function receiveSignalingChannel (message) {
        ... // call this whenever we get a message on the signaling channel
      }

      var startBtn = document.getElementById('startBtn');

      function start() {
        navigator.getUserMedia('audio,video', gotStream);
        startBtn.disabled = true;
      }

      function gotStream(stream) {
        pc.addStream(stream);
      }
    </script>
  17. RTCWeb

    •  No special signaling needs
    •  Real-time media carried via RTP
    •  RTP session == media stream
       – RTP multiplexing allows for multiple tracks
       – Allows for time synch for tracks, since in same session
    •  Discussing data channel option (non-real-time)
    •  Carried over TLS (some form)
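To make "multiple tracks in one RTP session" concrete, here is a minimal sketch that reads the 12-byte fixed RTP header (as laid out in RFC 3550) and groups packets by their SSRC field. Treating each SSRC as one track is an assumption made for illustration, not something the slide specifies, and the names `parseRtpHeader`, `demuxBySsrc`, and `makePacket` are hypothetical helpers.

```javascript
// Parse the 12-byte fixed RTP header (RFC 3550). Fields are big-endian.
function parseRtpHeader(bytes) {
  if (bytes.length < 12) {
    throw new RangeError('RTP packet shorter than the 12-byte fixed header');
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const b0 = view.getUint8(0);
  const b1 = view.getUint8(1);
  return {
    version: b0 >> 6,              // top two bits
    padding: (b0 & 0x20) !== 0,
    extension: (b0 & 0x10) !== 0,
    csrcCount: b0 & 0x0f,
    marker: (b1 & 0x80) !== 0,
    payloadType: b1 & 0x7f,
    sequenceNumber: view.getUint16(2),
    timestamp: view.getUint32(4),
    ssrc: view.getUint32(8),       // synchronization source identifier
  };
}

// Assumption: one SSRC corresponds to one track within the shared session.
function demuxBySsrc(packets) {
  const tracks = new Map();
  for (const packet of packets) {
    const header = parseRtpHeader(packet);
    if (!tracks.has(header.ssrc)) tracks.set(header.ssrc, []);
    tracks.get(header.ssrc).push(header);
  }
  return tracks;
}

// Hypothetical helper: build a header-only packet for demonstration.
function makePacket(payloadType, sequenceNumber, timestamp, ssrc) {
  const bytes = new Uint8Array(12);
  const view = new DataView(bytes.buffer);
  view.setUint8(0, 0x80);                 // version 2, no padding/extension/CSRCs
  view.setUint8(1, payloadType & 0x7f);   // marker bit clear
  view.setUint16(2, sequenceNumber);
  view.setUint32(4, timestamp);
  view.setUint32(8, ssrc);
  return bytes;
}

const packets = [
  makePacket(0, 1, 160, 0x1111),   // first source
  makePacket(96, 1, 3000, 0x2222), // second source
  makePacket(0, 2, 320, 0x1111),   // first source again
];
const tracksBySsrc = demuxBySsrc(packets);
console.log(tracksBySsrc.size, 'sources found');
```

Because every packet in the session carries a timestamp, a receiver that has separated the sources this way still has what it needs to line them up in time.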
  18. Progress

    •  RTCWeb
       – Fairly stable use cases, requirements, and architecture
       – New drafts to enhance RTP for multiplexing media
       – Too much discussion of things in W3C domain
    •  WebRTC
       – Editor's draft only
       – First Public Working Draft this month
  19. Getting involved

    •  WebRTC
       – Group: http://www.w3.org/2011/04/webrtc/
       – List archive: http://lists.w3.org/Archives/Public/public-webrtc
    •  RTCWeb
       – Group: http://tools.ietf.org/wg/rtcweb/
       – List archive: http://www.ietf.org/mail-archive/web/rtcweb/current/maillist.html
    •  Both groups operate in public
  20. HTML Speech

    •  Standards for Tropo
    •  Happening in W3C
       – Voxeo co-chairs
    •  Incubator Group only for now
       – NOT A STANDARD!!!!!!!!!!!!!!!!!!!!!!!!!!!
    •  Web world wants
  21. HTML Speech – what it ain’t

    •  NOT VoiceXML
    •  NOT IVR language
    •  NOT dialog language
    •  NOT telephony
  22. HTML Speech – what it is

    •  ASR/TTS added to HTML
    •  Three pieces
       – Web API
       – HTML extensions
       – Browser-to-ASR/TTS protocol
  23. Web API

    •  JavaScript API
    •  SRGS/SISR, SSML, EMMA support
    •  Grammar optional
    •  Events for audio/sound/speech start/end, plus results and errors
    •  Default/remote speech services
    •  One-shot and continuous recognition
    •  Assumes local audio input/output available
  24. Web API

    •  Proposed: SpeechInputRequest object
       – Many attributes: uri, grammars, on(audio/sound/speech)(start/end), languages, maxresults, etc.
       – Methods: open, start, stop, abort
    •  Proposed: SpeechOutputObject, but
       – May be superseded by simple TTS element
  25. HTML Extensions

    •  Link to "for" attribute
    •  <reco> element
    •  <tts> element

    <form>
      <label for="count">How many would you like:</label>
      <input type="number" id="count"/>
      <reco for="count" service="http://www.example.com/ASR"/>
      <tts src="Thankyou.ssml"/>
    </form>
  26. Protocol

    •  Communication between browser and ASR/TTS service
    •  Based loosely on MRCP
    •  Runs over WebSockets
    •  Control and media mixed together in stream
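One way a client could separate control from media arriving on the same WebSocket is by message type. The sketch below assumes, purely for illustration, that control messages arrive as text and media as binary; the slide does not say how the proposed protocol distinguishes them, and `createDemux` and the message content are hypothetical.

```javascript
// Assumption for illustration: control messages are text, media chunks are binary.
function createDemux(onControl, onMedia) {
  return function handleMessage(data) {
    if (typeof data === 'string') {
      onControl(data);                       // text message: treat as control
    } else if (data instanceof ArrayBuffer || ArrayBuffer.isView(data)) {
      onMedia(data);                         // binary message: treat as media
    } else {
      throw new TypeError('Unsupported message type');
    }
  };
}

const controlLog = [];
const mediaLog = [];
const handle = createDemux(
  (text) => controlLog.push(text),
  (chunk) => mediaLog.push(chunk.byteLength),
);

// Simulated incoming messages; the control strings are placeholders.
handle('hypothetical-control-message-1');
handle(new Uint8Array(320).buffer);
handle('hypothetical-control-message-2');
console.log(controlLog.length, 'control messages,', mediaLog.length, 'media chunks');
```

In a browser, the handler could be attached to a socket by setting `socket.binaryType = 'arraybuffer'` and `socket.onmessage = (event) => handle(event.data)`.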
  27. Schedule and next steps

    •  Incubator group will finish in November
    •  IT IS *NOT* A W3C STANDARD AT THIS POINT
    •  So we might
       – Form a new W3C standards-track working group
       – Join an existing W3C standards-track working group, e.g.
          • WebApps WG
          • HTML WG
    •  Key goal: a Standard that is really implemented
  28. Getting involved

    •  Group page: http://www.w3.org/2005/Incubator/htmlspeech/
    •  Mailing list archives: http://lists.w3.org/Archives/Public/public-xg-htmlspeech/
  29. Write this down

    •  Communications are becoming browser-based (HTML as platform) via standards
    •  Speech technology access via browser is being standardized
    •  Browsers are increasingly built into mobile devices
    •  Ergo, all future voice and telephony apps will eventually be browser-based
  30. The New HTML Standards

    Daniel C. Burnett
    Director of Speech Technologies and Standards
    Voxeo Customer Summit 2011