Extend User Experience of WebRTC with Cool Sensor Devices
Cloud Expo 2017, original title was "Extend User Experience of WebRTC with Unique Sensor Devices"
Using WebRTC with 360 Camera, Microphone Array, 3D scan by RealSense, holographic devices such as Dreamoc HD3 and HoloLens
About myself • Masashi Ganeko / @massie_g – Manager of a research team – INFOCOM CORPORATION (Tokyo, Japan) • http://infocom.co.jp/english/index.html • One of the organizers of WebRTC Meetup Tokyo – https://atnd.org/groups/webrtc • English presentations on WebRTC (2013-2017) – https://speakerdeck.com/mganeko • Japanese presentations on WebRTC (2013-2017) – http://www.slideshare.net/mganeko
What is WebRTC • Web Real-Time Communication for – Video – Audio – Data • Open standard – W3C WebRTC Working Group: API – IETF RTCWEB Working Group: Protocol – Core library is open source software • Designed for web browsers and other web-connected devices • Easy to combine with other Web technologies
What I want to talk about today • WebRTC is a very useful tool for building your own communication application • WebRTC + sensor devices → a more interesting and exciting user experience • I will introduce two experimental projects to show “the Power of WebRTC” – Shotoku-Tamago – Virtual Teleport
Problem in Web Meeting • Web meetings are very common – They work pretty well for 1 to 1 – They work for 3 or 4 distributed members • But the experience is poor for a meeting between a group and 1 remote member – It is hard for the remote member to understand who is speaking
Purpose of Shotoku-Tamago • Improve the experience of the remote member – in a meeting between a group and 1 person • Make it easy to understand: – who is speaking – their expressions, such as smiling, angry, happy, disappointed, … • With inexpensive devices • With a fixed camera, without manual operation
Cool sensor devices in Shotoku-Tamago • RICOH THETA S (360 camera) – Dual fisheye lenses – Captures the whole meeting room at once, without panning or moving • SYSTEM IN FRONTIER TAMAGO-03 – Egg-shaped microphone array – Automatically locates and tracks who is speaking – http://www.sifi.co.jp/system/modules/pico/index.php?content_id=39&ml_lang=en
Origin of Name: Shotoku Taishi • Legendary prince of Japan, around AD 600 • Many episodes are told about him – Some might be true, some might not be • One of the most famous episodes: – When 10 people were talking to him at the same time, he could understand what each one said • So he is known as the “prince with multiple ears” • Shotoku-Tamago: “Tamago” means “egg” in Japanese
Whole architecture of Shotoku-Tamago [Architecture diagram. Labels: Web Browser (x4), Video/Audio media, 360 Video / mono Audio, Video/Audio, Direction of speaking member, WebSocket, WebRTC, Render with WebGL]
1. Detecting who is speaking • HARK - Robot Audition Software – http://www.hark.jp/ – By Honda Research Institute Japan with Kyoto University – Royalty free for research use • Microphone array – Consists of 8 small microphones – Works with HARK • Use the “source tracker” of the HARK tool to locate and track speaking members
2. Connecting HARK tool and Web Browser • The HARK tool is a standalone command-line native app • It cannot send data to a web browser directly • So I wrote a pipe tool in Go that acts as a WebSocket server • Data flow: microphone array → (USB) → HARK tool → (stdout / stdin) → convert tool (WebSocket server) → (WebSocket) → web browsers
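The conversion step of such a pipe tool can be sketched as follows. The original tool was written in Go; this TypeScript sketch only illustrates turning stdout lines into WebSocket-ready JSON messages. The line format ("<id> <azimuth in degrees>") and the field names `id` and `azimuthDeg` are assumptions for illustration, not the actual output format of HARK.

```typescript
// Hypothetical message shape; field names are assumptions, not HARK's format.
interface SpeakerDirection {
  id: number;         // source id assigned by the tracker
  azimuthDeg: number; // direction of the sound source, wrapped to [-180, 180)
}

// Parse one text line in the assumed format "<id> <azimuth in degrees>".
// Returns null for anything that does not match.
function parseDirectionLine(line: string): SpeakerDirection | null {
  const parts = line.trim().split(/\s+/);
  if (parts.length !== 2) return null;
  const id = Number(parts[0]);
  const azimuth = Number(parts[1]);
  if (Math.floor(id) !== id || id < 0 || !isFinite(azimuth)) return null;
  // Wrap the angle into [-180, 180).
  const wrapped = ((((azimuth + 180) % 360) + 360) % 360) - 180;
  return { id: id, azimuthDeg: wrapped };
}

// Convert a chunk of stdout text into JSON strings, one per valid line.
// A WebSocket server would broadcast each string to the connected browsers.
function linesToMessages(chunk: string): string[] {
  return chunk
    .split("\n")
    .map((line) => parseDirectionLine(line))
    .filter((d): d is SpeakerDirection => d !== null)
    .map((d) => JSON.stringify(d));
}
```

Keeping the parser separate from the transport makes it easy to swap the input format once the real tracker output is known.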
4. Capturing 360 video • Capture in a web browser with mediaDevices.getUserMedia() • The video is in dual-fisheye format
5. Sending 360 video with WebRTC • The dual-fisheye format video is sent to the remote web browser over WebRTC
6. Rendering 360 video with WebGL • Input: dual-fisheye format video • Map to a sphere, with UV mapping • Render with WebGL (three.js) • RICOH sample: https://github.com/ricohapi/video-streaming-sample-app/tree/master/samples/oneway-watch
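The UV mapping idea can be sketched as a function from a direction on the sphere to a texture coordinate in the dual-fisheye frame. This is a simplified sketch under stated assumptions: two circular images side by side (front lens on the left half, back lens on the right half), an equidistant fisheye projection, and a 180 degree field of view per lens by default. The real THETA S layout (image rotation, mirroring, and a field of view slightly above 180 degrees) differs, so the constants here are placeholders.

```typescript
interface Vec3 { x: number; y: number; z: number; }
interface UV { u: number; v: number; }

// Map a direction (x right, y up, z forward) to a UV coordinate in an
// assumed side-by-side dual-fisheye frame. Layout constants are placeholders.
function dualFisheyeUV(dir: Vec3, lensFovDeg: number = 180): UV {
  const len = Math.sqrt(dir.x * dir.x + dir.y * dir.y + dir.z * dir.z);
  const x = dir.x / len;
  const y = dir.y / len;
  const z = dir.z / len;
  const front = z >= 0;
  // Angle between the direction and the axis of the lens that sees it.
  const theta = Math.acos(Math.min(1, Math.abs(z)));
  // Equidistant projection: the radius grows linearly with the angle.
  const r = theta / ((lensFovDeg / 2) * Math.PI / 180);
  // The back lens looks the other way, so its horizontal axis is mirrored.
  const phi = Math.atan2(y, front ? x : -x);
  const centerU = front ? 0.25 : 0.75;
  return {
    u: centerU + 0.25 * r * Math.cos(phi),
    v: 0.5 + 0.5 * r * Math.sin(phi),
  };
}
```

In a renderer, a function like this would be evaluated once per sphere vertex and the results stored as the texture coordinates of the mesh, so the video frame itself never needs to be re-projected on the CPU.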
7. Cropping the faces of the members who are speaking • The areas to crop are decided by the sound direction located with HARK • Sphere of 360 video • Up to 5 WebGL cameras • Up to 5 canvas elements
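One way to turn sound directions into crop views is sketched below: each reported azimuth becomes a look-at direction for one of at most 5 virtual cameras placed at the center of the video sphere. The angle convention (0 degrees is straight ahead along +z, positive angles turn toward +x, elevation ignored) and the names `planCropViews` and `MAX_VIEWS` are assumptions for illustration.

```typescript
interface Direction3 { x: number; y: number; z: number; }
interface CropView { slot: number; lookAt: Direction3; }

// The slides mention up to 5 cameras and 5 canvas elements.
const MAX_VIEWS = 5;

// Convert a horizontal azimuth in degrees to a unit look-at direction.
// Assumed convention: 0 degrees looks along +z, 90 degrees along +x.
function azimuthToLookAt(azimuthDeg: number): Direction3 {
  const a = azimuthDeg * Math.PI / 180;
  return { x: Math.sin(a), y: 0, z: Math.cos(a) };
}

// Assign one view slot per speaking member, dropping any beyond MAX_VIEWS.
function planCropViews(azimuthsDeg: number[]): CropView[] {
  return azimuthsDeg
    .slice(0, MAX_VIEWS)
    .map((az, slot) => ({ slot: slot, lookAt: azimuthToLookAt(az) }));
}
```

Each slot would then drive one camera and one canvas, so a new speaker only changes a camera orientation and never requires re-encoding the video.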
Power of WebRTC in Shotoku-Tamago • Easy to handle 360 video with WebGL – Use VR technology in a web browser • Easy to utilize real-time data from sensor devices with WebSocket – data from a sensor device, such as a microphone array – data processed by signal processing software, such as HARK • Combining all of these technologies makes web meetings much more vivid
Virtual Teleport • Real-time communication tool with – Forward: real-time 3D-scanned hologram – Backward: 360 video • Demonstrated at the AppsJapan exhibition of Interop Tokyo, June 2017 (140,000 visitors in 3 days) – More than 800 guests enjoyed the new experience of holographic communication • Covered by Japanese web media – http://www.watch.impress.co.jp/headline/docs/extra/vr/1064673.html
Challenge in Virtual Teleport • Communication with web meetings today – Only 2D videos of faces are transferred • Try future communication with Virtual Teleport – Transfer your presence to a remote place – Show your whole body as a 3D hologram, as in “STAR WARS”
Cool devices in Virtual Teleport • Real-time 3D scan device – Intel RealSense R200 • https://www.intel.com/content/www/us/en/support/emerging-technologies/intel-realsense-technology/000016214.html • Depth camera (IR laser projector, dual IR camera) • RGB camera • Holographic display devices – Dreamoc HD3 • https://www.realfiction.com/solutions/dreamoc-hd3 – Microsoft HoloLens • https://www.microsoft.com/en-us/hololens
1. Capturing 3D in point cloud data • Capture with RealSense from 4 directions – The data is called a “point cloud”: a set of 3D points • Merge the 4 point clouds from the 4 directions – Shown in 4 different colors in the figure
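Merging clouds from several cameras means moving every point from its camera's coordinate system into one shared coordinate system. A minimal sketch, assuming each camera's pose is already known from calibration and can be described by a rotation about the vertical axis plus an offset (the `CameraPose` shape is hypothetical; real extrinsics are usually a full rotation and translation):

```typescript
interface Point3 { x: number; y: number; z: number; }

// Hypothetical, simplified camera pose: yaw about the Y axis plus a position.
interface CameraPose { yawDeg: number; offset: Point3; }

// Rotate a camera-space point about the Y axis, then translate it.
function toWorld(p: Point3, pose: CameraPose): Point3 {
  const a = pose.yawDeg * Math.PI / 180;
  const c = Math.cos(a);
  const s = Math.sin(a);
  return {
    x: c * p.x + s * p.z + pose.offset.x,
    y: p.y + pose.offset.y,
    z: -s * p.x + c * p.z + pose.offset.z,
  };
}

// Merge one cloud per camera into a single cloud in world coordinates.
function mergeClouds(clouds: Point3[][], poses: CameraPose[]): Point3[] {
  const merged: Point3[] = [];
  clouds.forEach((cloud, i) => {
    for (const p of cloud) {
      merged.push(toWorld(p, poses[i]));
    }
  });
  return merged;
}
```

For example, four cameras placed 1 m from the subject at 90 degree intervals, each seeing the same surface point 1 m straight ahead, should all map that point to the same world position.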
Merging 4 directions • Using multiple depth cameras is not easy – Each camera projects an IR laser pattern – Multiple patterns usually interfere with each other • Intel RealSense R200 can avoid this interference – With libRealsense – https://github.com/IntelRealSense/librealsense [Figure labels: IR Laser Projector 1, IR Laser Projector 2]
Processing point cloud data • Reduce points – to support HoloLens (not so powerful) – to keep the network bitrate under 100 Mbps • Remove noise – remove scattered stray points • Make a 3D mesh object – find triangles for polygons – connect polygons to make a mesh – repair holes in the mesh – reduce the polygons of the mesh • Result: 1.7 M points → 15 K points, 26 K polygons → 5.5 K polygons
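The slides do not say which point reduction method was used. One common approach is voxel-grid downsampling (PCL provides a VoxelGrid filter for it): space is divided into small cubes, and all points that fall in the same cube are replaced by their average. A minimal sketch of that idea, with `voxelSize` as a tuning parameter chosen by the caller:

```typescript
interface CloudPoint { x: number; y: number; z: number; }

// Replace all points inside each cube of edge length voxelSize by their centroid.
function voxelDownsample(points: CloudPoint[], voxelSize: number): CloudPoint[] {
  const cells: { [key: string]: { sx: number; sy: number; sz: number; n: number } } = {};
  for (const p of points) {
    // Integer cell coordinates identify the cube that contains the point.
    const key =
      Math.floor(p.x / voxelSize) + "," +
      Math.floor(p.y / voxelSize) + "," +
      Math.floor(p.z / voxelSize);
    const cell = cells[key] || (cells[key] = { sx: 0, sy: 0, sz: 0, n: 0 });
    cell.sx += p.x;
    cell.sy += p.y;
    cell.sz += p.z;
    cell.n += 1;
  }
  return Object.keys(cells).map((key) => {
    const c = cells[key];
    return { x: c.sx / c.n, y: c.sy / c.n, z: c.sz / c.n };
  });
}
```

A larger `voxelSize` removes more points at the cost of detail. Going from 1.7 M to 15 K points is a reduction of more than 100 times, so the cell size has to be fairly coarse.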
Point Cloud Library • A standalone, large-scale, open project for 2D/3D image and point cloud processing – http://pointclouds.org/ • Development has been active since Kinect V1 was released • Used together with libRealsense – https://github.com/lebronzhang/pcl – https://github.com/lebronzhang/pcl/blob/master/visualization/tools/real_sense_viewer.cpp
Inside of 3D data • Data – 3D mesh data • built from the point clouds of 4 IR cameras – Texture from the RGB cameras • 4 JPEG images, 640 x 480 – UV map • how to map the texture to the mesh • Convert between different coordinate systems – PCL: right-handed coordinate system – Unity: left-handed coordinate system • 2 – 3 frames / sec, 1 MB / frame, about 20 Mbit / sec
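Converting a mesh between a right-handed and a left-handed coordinate system can be sketched as mirroring one axis and reversing the vertex order of every triangle, so that the faces still point outward after the mirror. Which axis is mirrored depends on how the two systems are aligned; negating z here is an assumption, and the flat-array `Mesh` layout is hypothetical.

```typescript
// Hypothetical flat mesh layout: vertices as [x0, y0, z0, x1, ...],
// triangles as [a0, b0, c0, a1, ...] indexing into the vertex list.
interface Mesh { vertices: number[]; triangles: number[]; }

// Mirror the z axis and flip the triangle winding order.
function rightToLeftHanded(mesh: Mesh): Mesh {
  const vertices = mesh.vertices.slice();
  for (let i = 2; i < vertices.length; i += 3) {
    vertices[i] = -vertices[i];
  }
  const triangles = mesh.triangles.slice();
  for (let i = 0; i + 2 < triangles.length; i += 3) {
    // Swapping two indices reverses clockwise and counter-clockwise order.
    const tmp = triangles[i + 1];
    triangles[i + 1] = triangles[i + 2];
    triangles[i + 2] = tmp;
  }
  return { vertices: vertices, triangles: triangles };
}
```

Skipping the winding flip is a common source of meshes that render inside out, because back-face culling then hides the visible side.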
Building Dreamoc HD3 App • Built with Unity C# as a Windows app • Use 3 cameras and 3 images for the 3 mirrors – Front view, left view, right view • Unity Asset: WebRTC Network – https://www.assetstore.unity3d.com/en/#!/content/47846 [Figure labels: HDMI, DataChannel]
Building HoloLens App • Use Unity for 3D programming with C# – use MixedRealityToolkit-Unity (a.k.a HoloToolKit-Unity) • position detection, gesture detection – export Visual Studio project • Use Visual Studio 2017 to build Windows 10 UWP app – UWP: Universal Windows Platform (Store App)
Demo Video 2. Hologram on HoloLens • https://lab.infocom.co.jp/2017/06/appsjapan2017.html – https://lab.infocom.co.jp/2017/06/12/VirtualTeleport_Mixed_640.mp4
Real-time Hologram is not perfect yet • Many holes and bumps in the 3D mesh object – Algorithms for real-time point reduction and polygon detection are not mature yet – But they will improve soon with machine learning • Motion is not smooth – CPU power is not enough to handle a high frame rate • But CPUs / GPUs improve year by year – Bitrate is too high when transferring many frames per second • But 3D data compression methods are coming, such as Draco • I believe that it will be improved in 2 – 3 years.
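The bitrate problem follows from simple arithmetic on the figures given earlier in the talk (about 1 MB per frame, a network budget under 100 Mbps). A small sketch, taking 1 MB as 1,000,000 bytes, which is an assumption about the units:

```typescript
// Bitrate in Mbit/s needed to send frames of a given size at a given rate.
function requiredMbps(bytesPerFrame: number, framesPerSecond: number): number {
  return (bytesPerFrame * 8 * framesPerSecond) / 1e6;
}

// Highest frame rate that fits into a bitrate budget given in Mbit/s.
function maxFramesPerSecond(bytesPerFrame: number, budgetMbps: number): number {
  return (budgetMbps * 1e6) / (bytesPerFrame * 8);
}
```

With 1 MB frames, 2.5 frames per second needs 20 Mbit/s, while a video-like 30 frames per second would need 240 Mbit/s. A 100 Mbit/s budget allows at most 12.5 frames per second, which is why compression of the 3D data matters.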
Backward: 360 video with multiple displays • 360 camera to capture video • Multiple displays to cover a wide view (about 180°) – Synchronized scroll, as one large screen – No VR headset, so as not to hide the face • WebRTC for video / audio (MediaStream) • WebGL / three.js for rendering • 3 (or more) displays act as 1 large screen – The view direction is synchronized with WebSocket
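The synchronized scroll can be sketched as follows: one shared view direction is distributed to every display (for example over WebSocket), and each display adds its own fixed offset so that the displays together form one continuous picture. The function name, the parameters, and the even split of the total field of view are assumptions for illustration.

```typescript
// Yaw (in degrees, 0 to 360) that one display should render, given the shared
// center direction, the display's position in the row, and the total view width.
function displayYawDeg(
  sharedYawDeg: number,
  index: number,        // 0 is the leftmost display
  displayCount: number,
  totalFovDeg: number   // for example about 180 for the whole row
): number {
  const perDisplay = totalFovDeg / displayCount;
  // Offset of this display's center from the center of the whole row.
  const offset = (index - (displayCount - 1) / 2) * perDisplay;
  const yaw = sharedYawDeg + offset;
  return ((yaw % 360) + 360) % 360;
}
```

With 3 displays covering 180 degrees and a shared direction of 0, the displays render yaws of 300, 0 and 60 degrees. When the shared direction changes, all three scroll together.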
Demo Video 3. Wide video with multiple displays https://lab.infocom.co.jp/2017/06/appsjapan2017.html https://lab.infocom.co.jp/2017/06/01/02948838aa4562da5ebc936e8e11550f097db0b0.gif
Power of WebRTC in Virtual Teleport • Makes “holographic communication” possible today – Transfer 3D data in real time over WebRTC DataChannel – With a 3D scan camera, such as RealSense – With holographic devices, such as Dreamoc HD3 and HoloLens • Holographic communication is a very attractive experience – even with a rough model and motion that is not smooth • Real-time 3D scanning with depth cameras is evolving rapidly – There is a lot of useful open source software – Machine learning may improve 3D scanning dramatically • WebRTC works on many platforms, not only in web browsers – Linux C++ app – Windows Unity C# app
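A frame of about 1 MB is usually too large to send as a single DataChannel message, so one possible approach is to split each frame into small chunks and reassemble them on the receiver. The slides do not describe how the demo did this. The 8-byte header layout below (frame id, chunk index, chunk count) and the chunk size used in the example are assumptions for illustration.

```typescript
// Hypothetical header: frameId (uint32), chunkIndex (uint16), chunkCount (uint16).
const HEADER_BYTES = 8;

// Split one frame into chunks no larger than maxChunkBytes, header included.
function splitFrame(frameId: number, payload: Uint8Array, maxChunkBytes: number): Uint8Array[] {
  const bodySize = maxChunkBytes - HEADER_BYTES;
  const count = Math.max(1, Math.ceil(payload.length / bodySize));
  const chunks: Uint8Array[] = [];
  for (let i = 0; i < count; i++) {
    const body = payload.subarray(i * bodySize, (i + 1) * bodySize);
    const chunk = new Uint8Array(HEADER_BYTES + body.length);
    const view = new DataView(chunk.buffer);
    view.setUint32(0, frameId);
    view.setUint16(4, i);
    view.setUint16(6, count);
    chunk.set(body, HEADER_BYTES);
    chunks.push(chunk);
  }
  return chunks;
}

// Read the chunk index back from a chunk header.
function chunkIndex(chunk: Uint8Array): number {
  return new DataView(chunk.buffer, chunk.byteOffset, chunk.byteLength).getUint16(4);
}

// Reassemble the chunks of a single frame, in whatever order they arrived.
function joinFrame(chunks: Uint8Array[]): Uint8Array {
  const sorted = chunks.slice().sort((a, b) => chunkIndex(a) - chunkIndex(b));
  let total = 0;
  for (const c of sorted) {
    total += c.length - HEADER_BYTES;
  }
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of sorted) {
    out.set(c.subarray(HEADER_BYTES), offset);
    offset += c.length - HEADER_BYTES;
  }
  return out;
}
```

On the sending side each chunk would be passed to the channel's `send()` method. A receiver would collect chunks by frame id until the chunk count is reached and then call `joinFrame`.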
Conclusion • WebRTC is very powerful because it is easy to combine with – many Web technologies, such as WebSocket and WebGL – much open source software, such as three.js and PCL • It is possible to create exciting user experiences – with many cool sensor devices and display devices • I hope you will build your own great application with WebRTC!