
Building a Voice Conferencing Platform with Node.js, SIP.js, and WebRTC

mickeydread
August 07, 2012

Tells the story of how Powwownow used Node.js and other components to build a resilient, high-definition, high-availability, scalable and infinitely customisable conferencing platform running on commodity servers.


Transcript

  1. • UK based Conference Service Provider (CSP)
     • Founded 2003
     • 50+ employees
     • 8 million+ minutes a week
     • Turnover 2012 predicted $20m
     • Diversifying into UC
     • Telco and DID provider
     • powwownow.com
  2. • Server-side development platform designed for building dynamic network applications
     • Built on Google's V8 JavaScript engine
     • Adds process, file and network I/O APIs to V8
     • Fast – V8 offers JavaScript performance wins "for free"
     • Fast by design – event-driven non-blocking I/O -> high concurrency
     • Scalable – data-intensive real-time applications across distributed devices
     • Lightweight and efficient
     • Supports heavy I/O
     • Event loop – memory efficient
  3. • Not just for the browser
     • Expressive and powerful dynamic object-oriented general-purpose language
     • Prototypal (not "class") inheritance
     • Functions are first-class objects
     • Code is data
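As a minimal sketch of the prototypal inheritance and first-class functions mentioned on this slide (written in modern Node.js syntax; the names `endpoint`, `sipPhone` and `withPrefix` are made up for illustration and do not come from the talk):

```javascript
// Hypothetical base object; there is no class declaration anywhere.
const endpoint = {
  describe() {
    return `${this.kind} endpoint at ${this.address}`;
  },
};

// Prototypal inheritance: sipPhone delegates missing properties to endpoint.
const sipPhone = Object.create(endpoint);
sipPhone.kind = 'SIP';
sipPhone.address = '192.0.2.10';

// Functions are first-class values: they can be passed in and returned.
function withPrefix(prefix, fn) {
  return (...args) => `${prefix}: ${fn(...args)}`;
}

const logLine = withPrefix('info', () => sipPhone.describe());
console.log(logLine()); // info: SIP endpoint at 192.0.2.10
```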
  4. • Multiple components functioning independently
     • Separate proprietary conference bridges
        - No failover
        - Expensive
        - Difficult to manage
        - Impossible to customise
        - 8000 Hz G.711 only
        - Not scalable
     • SBCs – PJSIP based
        - Logic and media in same place
        - Difficult to upgrade and customise
        - No hot failover
     • IVR servers – Freeswitch
        - Very reliable
        - Completely scriptable
  5. • Separation of logic from audio components
     • "Dumb" audio mixers
        - Hot failover
        - Runs on inexpensive commodity servers
        - Nothing to manage
        - Small basic set of intrinsic behaviours
        - 8000 to 32000 Hz
        - Highly scalable
     • SBC
        - Just an audio media relay + DSP
        - Removed all signalling and routing logic
     • "Brain"
        - Centralised logic and signalling
        - Management of SBCs and mixers in one place
        - Infinitely scriptable and customisable
        - Hot failover between instances
     • IVRs
        - Freeswitch is already perfect for our purposes!
  6. • Can choke under heavy load – unresponsive during garbage collection
     • No heavy-duty computation
     • Just event/response
     • Async I/O model
     • Can mitigate by moving some of the core functionality into C-compiled Node modules
  7. • Float both the IP address and the MAC address
        - Avoid OS/device-specific ARP cache behaviour
        - Instant re-routing between physical devices on the "same switch"
     • macvlan kernel module to create virtual ethernet device
     • Brain uses VIP for SIP signalling & control API
     • SBCs use VIPs for RTP / media
  8. • "Brain" creates compressed JSON objects containing everything needed to recreate:
        - each call – sent to the SBC carrying the call
        - each conference – sent to the mixer bridging the conference
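The slides do not show the object layout or the compression scheme, so the following is only a sketch of the idea using Node's built-in `zlib` module. The fields in `callState` and the helper names `packState` / `unpackState` are assumptions made for illustration:

```javascript
const zlib = require('zlib');

// Hypothetical call state; the real field set is not given in the slides.
const callState = {
  callId: 'call-0001',
  conferenceId: 'conf-42',
  codec: 'G.711',
  sampleRateHz: 8000,
  rtpPort: 20000,
};

// Serialise to JSON, then deflate so the snapshot is small to send and hold.
function packState(state) {
  return zlib.deflateSync(Buffer.from(JSON.stringify(state), 'utf8'));
}

// Reverse the process when the state has to be rebuilt from a snapshot.
function unpackState(buffer) {
  return JSON.parse(zlib.inflateSync(buffer).toString('utf8'));
}

const snapshot = packState(callState);
const restored = unpackState(snapshot);
console.log(restored.callId, snapshot.length);
```

Holding the snapshot on the component that carries the call or conference means the state lives next to the audio it describes, which fits the failover behaviour described on the next slide.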
  9. • If brain fails or is removed from service
        - no interruption in audio
        - all components retry connection to brain IP until success
        - standby brain takes the Virtual IP & MAC addresses
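A rough sketch of the "retry until success" behaviour on the component side, assuming a plain TCP connection and a fixed delay between attempts. The helper names, the address and the port are placeholders, not details from the talk:

```javascript
const net = require('net');

// Try to open a TCP connection once; resolve with the socket or reject on error.
function connectOnce(host, port) {
  return new Promise((resolve, reject) => {
    const socket = net.connect({ host, port });
    socket.once('connect', () => resolve(socket));
    socket.once('error', reject);
  });
}

// Keep calling `attempt` until it succeeds, pausing `delayMs` between tries.
async function retryUntilConnected(attempt, delayMs) {
  for (let tries = 1; ; tries += 1) {
    try {
      return { result: await attempt(), tries };
    } catch (err) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Usage sketch (192.0.2.1:5070 stands in for the brain's virtual IP and port):
// retryUntilConnected(() => connectOnce('192.0.2.1', 5070), 500)
//   .then(({ tries }) => console.log(`connected after ${tries} tries`));
```

Because the standby brain takes over the same virtual IP and MAC address, the components can keep dialling one fixed address and need no knowledge of which physical brain answers.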
  10. • Need not be loaded as modules. Node gives us the ability to run external sandboxed scripts:
      • Expose application functions & data through sandbox object
      • Execute script to expose its functions & data to application through sandbox object
  11. • Modules are cached by Node, but you can: •
      • The next time you require() the module, it is reloaded from disk
      • But what about data persistence?
         - Store module data in global shared object created in main file:
  12. • One TX thread + one RX thread per network device
         - can have multiple threads per device if CPU-bound (e.g., transcoding)
      • Custom UDP/IP stack built on Linux's "PACKET_MMAP"
         - same mechanism used by pcap / tcpdump
         - but using "TX_RING" to send ethernet frames
         - avoids kernel context switching when sending/receiving packets
         - avoids BSD sockets entirely
         - create/remove VIPs, MACs, IP routes easily and instantly in user-space
         - Brain therefore has comprehensive control over network behaviour of SBCs
      • Custom built & modified kernel
         - remove all unneeded network layers / protocols from kernel processing
         - instant 30-40% increase in packet rate
         - custom IOCTL to force PACKET_MMAP to front of network processing chain
         - custom IOCTL to disable generation of ICMP UNREACHABLE
      • Gigabit saturation per CPU core / network device pair
         - 7-figure packet rates on a $1500 server
  13. • Brain config defines port ranges for RTP/media
         - each port range represents a slice of audio "capacity"
         - SBCs can carry multiple port ranges
      • Brain gives each SBC some port ranges & associated VIPs
      • SBC <-> Brain TCP connection, with very low keep-alive timer
      • If SBC is removed from service (or crashes), the Brain:
         - removes port range from SBC
         - assigns port range to least-busy remaining SBC
         - audio interruption almost undetectable
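The reassignment step can be sketched as follows. The data model (a `Map` of SBC name to call count and port ranges) and the assumption that a failed SBC's calls are spread evenly across its ranges are simplifications made for this example, not details from the talk:

```javascript
// Hypothetical model: each SBC has a current call count and a list of port ranges.
const sbcs = new Map([
  ['sbc-a', { calls: 120, ranges: ['20000-20999'] }],
  ['sbc-b', { calls: 40, ranges: ['21000-21999'] }],
  ['sbc-c', { calls: 180, ranges: ['22000-22999', '23000-23999'] }],
]);

// Move every port range from a failed SBC to whichever survivor is least busy.
function failover(pool, failedName) {
  const failed = pool.get(failedName);
  if (!failed) return;
  pool.delete(failedName);
  if (pool.size === 0) throw new Error('no SBC left to take over');

  // Assumption: the failed SBC's calls were spread evenly over its ranges.
  const share = failed.ranges.length ? failed.calls / failed.ranges.length : 0;
  for (const range of failed.ranges) {
    // Re-evaluate load for each range, since the previous move changed it.
    const target = [...pool.values()].reduce((a, b) => (b.calls < a.calls ? b : a));
    target.ranges.push(range);
    target.calls += share;
  }
}

failover(sbcs, 'sbc-c');
console.log(sbcs.get('sbc-a').ranges); // [ '20000-20999', '23000-23999' ]
console.log(sbcs.get('sbc-b').ranges); // [ '21000-21999', '22000-22999' ]
```

In the design described here the trigger would be the SBC <-> Brain TCP connection dropping, which the very low keep-alive timer is there to detect quickly.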
  14. • SSE4 optimised
         - 5000 x 8kHz calls mixed per CPU core
         - that's 1650 conferences of 3 calls each (worst case)
         - $1500 server hardware
      • Shared NFS mount for audio recordings / prompts
      • No need for VIPs
         - SBC is only device which talks audio to Mixer
         - Brain therefore updates SBCs with new Mixer IP upon failure
      • Almost undetectable failover
         - Any audio prompt currently playing would be cut off
         - Audio recording barely interrupted
  15. • Fully featured suite of libraries
         - Audio processing (VAD, AGC, Noise-sup, PLC, AEC, and more)
         - Audio device management
         - Video encoding
         - Network transport
      • Based on GIPS but for free!
      • Aimed at desktop/mobile developers using API
         - BUT easy to commandeer specific low-level components for our purposes
      • Low-level implementation written by skilled DSP engineers
         - we've re-implemented a few key components in SSE4 for speed
      • Future client development
         - browser integration for Audio/Video, a leap above HTML5
         - High level API – can be used by web and mobile developers
      • Future video development?
  16. • Rapid prototyping and development
      • Can be clumsy
         - Loose typing can lead to object hell!
         - Nested callbacks – you need to understand them to avoid them
      • Thriving developer community
      • 10,000+ Node modules available
      • Still relatively young but maturing fast
      • Complete scriptability is the future of telephony – not just for callflows or IVR
  17. • Opus codec
      • SBC already in production for 20% of calls
      • Bridge already in production for our UC product (codename Visualise)
      • Needs some refactoring of code
         - Extended prototype
         - Lessons learned
      • Will roll out for various services Q4 2012
      • Expect to start full rollout Q1 2013
      • Further integration into Visualise and customer console (MyPowwownow)
      • Open Source SIP.js wrapper
      • Look at Open Sourcing more