11_22_CEE_Singapore_2024_Presentation.pdf

Slide 1

Slide 1 text

Slide 2

Slide 2 text

Copyright © Parallel Inc. All Rights Reserved.  ● X / GitHub: @yoheimuta    ● Recent Focus: SRE, MySQL, Vitess, Flink     ● Career  ○ Mobile Game : Worked on SNS game launches at GREE   ○ AdTech: Built the LINE Ads Platform at FreakOut and LINE   Yohei Yoshimuta   Head of Engineering, Parallel  Speaker  

Slide 3

Slide 3 text

Copyright © Parallel Inc. All Rights Reserved.  Our goal   Enjoying It with Friends   Enjoying Entertainment Alone   Parallel brings friends together for shared experiences, aiming to be a central hub for all types of entertainment in a world where affordable, segmented content often drives solo consumption.  

Slide 4

Slide 4 text

Copyright © Parallel Inc. All Rights Reserved.  About Parallel: Your Hangout Place   Parallel is a hangout app where you can hop on, see close friends already online, join them instantly, and enjoy content together, both in and outside the app.    

Slide 5

Slide 5 text

Copyright © Parallel Inc. All Rights Reserved.  Achievements   With MAU in Japan now reaching levels similar to Twitch and Roblox, 70% Gen Z, 3-hour daily call times, over $20 million raised, and all with a team of just 16.   # of Teams 6

Slide 6

Slide 6 text

Copyright © Parallel Inc. All Rights Reserved.  How Parallel Embraces AI   Parallel’s key feature is being a space where young people spend hours with friends, now enhanced with AI that joins voice chats to act as a gaming partner, study buddy and more!       Users ..    - hang out with friends in groups     - spend around 180 mins a day on voice calls    - do everything from gaming to studying and watching videos, both on and off the platform     Parallel’s standout features   Our AI ..    - acts as a conversation partner, gaming buddy, study helper, etc     - keeps group conversations on track and offers ideas    - can take on roles like a poker dealer, energize the group, spin as a DJ, or even step in as a study tutor   Parallel with AI  

Slide 7

Slide 7 text

Copyright © Parallel Inc. All Rights Reserved.  Our Initial Approach with Realtime API   With the release of the Realtime API, our initial approach was to create a demo specifically centered around gaming as an example.   1. We use a Wake Word to call AI whenever users need.     2. Our AI acts as a chat buddy or game partner until friends arrive.     3. Our AI mediates conversations when in a group.     4. Our AI becomes the game commentator when people playing together.  

Slide 8

Slide 8 text

Slide 9

Slide 9 text

Copyright © Parallel Inc. All Rights Reserved.  To enable AI-powered conversations, we have added:  - Agora Demo Server : Manages real-time communication with the OpenAI Realtime API.  - OpenAI Realtime API   System Overview   The core system is a group call application consisting of the following components:   - Parallel App  - Parallel API Server   - Agora Voice Server (SaaS)   Existing System   AI-Powered Conversations  

Slide 10

Slide 10 text

Copyright © Parallel Inc. All Rights Reserved.  Key Challenges in Implementing AI-Driven Group Conversations   While the API supports audio and text processing, achieving the following complex features wasn’t straightforward and required thoughtful customization and innovation to make it all work seamlessly.   1. Real-Time Group Conversation     2. Speaker Differentiation     3. Wake Word Detection     4. Real-Time Game State Synchronization  

Slide 11

Slide 11 text

Copyright © Parallel Inc. All Rights Reserved.  ● Solution   ○ Merge individual streams into a single mixed stream on the client side.    ● Implementation   ○ Used Agora’s onPlaybackAudioFrame to manage mixed audio streams.  AI facilitating real-time conversations with multiple users is essential for our application. A key challenge is that the Realtime API is not designed to support multiple audio streams from different users simultaneously.   Real-Time Group Conversation with AI  

Slide 12

Slide 12 text

Slide 13

Slide 13 text

Copyright © Parallel Inc. All Rights Reserved.  Speaker Identification   ● Solution   ○ Use speaker volume data to distinguish individual speakers    ● Implementation   ○ Leveraged Agora’s onAudioVolumeIndication to monitor volume levels for each participant   Identifying speakers and tailoring responses for each user is essential in group conversations, but the AI model can't distinguish speakers in a single mixed audio stream.    

Slide 14

Slide 14 text

Slide 15

Slide 15 text

Copyright © Parallel Inc. All Rights Reserved.  ● Solution   ○ Integrate wake word detection to activate the AI as needed.    ● Implementation   ○ Checked audio transcriptions for wake or sleep words  ○ If detected, VAD (Voice Activity Detection) is turned on or off  Wake Word Detection   To enable hands-free interaction, detecting wake word in real-time is crucial. However, the system needs to maintain session continuity while accurately identifying wake word.     📝VAD identifies speech in the audio stream. When off (default is on), the AI only responds when requested by the system client  

Slide 16

Slide 16 text

Copyright © Parallel Inc. All Rights Reserved.  Wake Word Detection System Flow   👍The Whisper Engine enables quick prototyping with real-time transcriptions.   🤖But a dedicated model will be needed in production for greater robustness and precision.  

Slide 17

Slide 17 text

Copyright © Parallel Inc. All Rights Reserved.  Real-Time Game Commentary   ● Solution   ○ Real-time game status updates are sent from the Parallel server to the AI as user text messages via the API.    ● Implementation   ○ Only the moves (where each player places a piece) are sent to the AI, rather than the entire board state.  In 1v1 games, conversations often dwindle, creating awkward silences. AI commentary helps maintain engagement, but relying on audio input alone is insufficient to fully convey the dynamic game state.   

Slide 18

Slide 18 text

Copyright © Parallel Inc. All Rights Reserved.  ⚠We learned that conveying the full board state in text led to misinterpretations.   ✅So we reduced the information to key moves and formatted it for clarity, using prompt engineering techniques.   Real-Time Game Commentary System Flow  

Slide 19

Slide 19 text

Copyright © Parallel Inc. All Rights Reserved.  ● Rapid Prototyping ⚡  ○ 1 engineer, 2 weeks prototype  ○ Previously, setting up AI-driven calls required complex infrastructure, and game commentary needed intricate rule-based systems.     ● Improved Tone Control 🛠  ○ AI responses sometimes lack tonal variety, especially in commentary, missing subtle cues like excitement or surprise.   ○ The recently added voices bring impressive tonal variety, offering natural and expressive delivery 🤯.  ○ Yet we’re still learning to tailor the prompt effectively as they were only just released.  Learnings and Future Enhancements  

Slide 20

Slide 20 text

Copyright © Parallel Inc. All Rights Reserved.  Key Takeaways   🧩Challenge 💡Solution ⚙Implementation 1. Real-Time Group Conversation Merge individual audio streams into a single mixed stream Used Agora’s onPlaybackAudioFrame for real-time audio management 2. Speaker Differentiation Identify active speaker based on volume data Leveraged onAudioVolumeIndication to detect and distinguish speakers 3. Wake Word Detection Enable AI activation via wake words Used Whisper Engine for transcription and VAD for response control 4. Real-Time Game State Synchronization Update AI with individual moves instead of full board Sent each move as a user message, refined through prompt engineering Scan the QR code to view the full presentation