Behind the Stream - How AbemaTV Engineers Build Video Apps at Scale
This presentation describes how AbemaTV engineers develop their video apps with limited resources while ensuring software quality, as ABEMA's business continues to expand in scale.
Behind the Stream: How AbemaTV Engineers Build Video Apps at Scale
January 15th, 2026 — Tokyo Video Tech #11 @ Orbit
Yusuke Goto (五藤 佑典) — Developer Expert for Video Technology at CyberAgent, Inc. / Director of Device Engineering at AbemaTV, Inc.
@ygoto3_
• Majored in Graphic Design at California State University, San Bernardino
• CyberAgent Developer Expert at CyberAgent, Inc.
• Director of Device Engineering at AbemaTV, Inc.
Career history: 1. Graphic / Web Designer → 2. Marketer → 3. Web Engineer → 4. Video Engineer → 5. Product Manager → 6. Director of Engineering
Linear & video on demand: enjoy both 24/7 linear streaming and video on demand, letting you watch missed or exclusive content at your preferred time without registration.
Largest collection of original episodes in Japan: over 30,000 episodes available for streaming at any given time, with the largest* collection of original episodes among domestic video services. (*as of January 2022, based on in-house research)
100% professional content: high-quality content delivered through a production system that leverages the respective strengths of CyberAgent and TV Asahi.
Diverse lineup: approximately 20 channels of diverse genres broadcasting 24/7, including Japan's only 24-hour news channel, as well as original dramas, romance programs, anime, sports, and more.
Business model evolution, 2016–2024:
• Fully free service offering linear broadcasts + advertising (access to past episodes was partially paid)
• Transition to a freemium model (hybrid of FAST + SVOD)
• Introduction of AVOD offerings
• Segmented ad delivery implementation
• Addition of pay-per-view models
• Implementation of programmatic advertising
• Introduction of split-screen advertising format
• Launch of partnership programs
• Release of an advertisement-supported premium subscription plan
What to do was obvious
• There's no VOD functionality => Let's implement it!
• There are no advertising features in VOD => Let's implement it!
• There's no favorite-list functionality => Let's implement it!
• There's no start-over playback functionality => Let's implement it!
Speed is paramount
Once the obvious features were implemented…
• We gradually had more occasions to question which features are most needed for business growth at this moment
• "Speed is paramount" had accumulated technical debt, slowing development velocity
Gradually shifted focus to the key performance factors of the experience at ABEMA
Key performance factors of the experience at ABEMA
• Navigation flow efficiency
• Natural conversion points for payment initiation
• Advertising revenue generation
• User viewing duration
How can we focus on them? => Transitioning toward a development structure focused on the key performance factors
Key service factors — 2016: a single App Development Group (server-side development teams, client-side development teams) → 2026: Navigation Development Group, Payment Development Group, Video Engineering Group, Ad Tech Group
AbemaTV in 2026: Navigation Development Group (server-side / client-side), Ad Tech Group (server-side), Payment Development Group (server-side / client-side), Video Engineering Group (server-side / client-side)
Establishing domain-specific commitments leads to a deeper understanding of each domain's challenges and to solving them more effectively
2016 model: lack of domain expertise accumulation
• e.g. Unable to diagnose why DRM does not function on specific devices
• e.g. Ends up with a tightly coupled UI and playback implementation
2026 model: domain experts evolve naturally over time
• e.g. Able to resolve issues by understanding device revocations and behavior variations across SDK versions
• e.g. Able to design a future-proof content protection system
Intersections of requests between groups
• Due to differing primary responsibilities and focus areas, deeper contextual requirements emerged that needed to be negotiated
Even if each group independently arrives at optimal solutions, delivering a single feature to users still requires coordination across multiple groups
[Diagram: Navigation Development Group, Payment Development Group, Video Engineering Group, Ad Tech Group] Request between groups: want video download capability to be premium, thus requiring download playback functionality
Intersections of requests cause:
• The probability of discovering issues during final integration increases => increased delivery time
• Priorities across groups frequently diverge => increased development time
• Old-system replacement goes slowly (scaling quality to enhance market competitiveness requires modifications to historic system infrastructure)
Request between groups: to launch the partnership collaboration business, need functionality to receive and distribute partner streams
Request between groups: want to quickly finalize payment system changes and conduct early testing of partner stream reception and playback
• The content of partner streams remains unknown until the systems are fully integrated
◦ General specifications are clear, but the actual output can only be determined after actual encoding
◦ Even if we could receive feeds in advance, there was no mechanism to test the actual viewing experience before integration
• ABEMA maintains 22 reference devices as testbeds, with validation based on whether these devices pass at least 2-hour aging tests
◦ 2 hours × 22 devices × 3 viewing modes = 132 hours required for one full aging-test pass
◦ At that time, aging testing was performed manually by human observation
We managed to complete the aging testing and necessary fixes within the limited timeframe and launch with satisfactory quality… however, we still incurred significant additional costs
We can't pay such a cost for every additional partner!!
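The 132-hour figure is just the product of the three factors stated on the slide; as a trivial sanity check:

```python
# Aging-test cost from the slide:
# hours per test x reference devices x viewing modes.
HOURS_PER_TEST = 2
DEVICES = 22
VIEWING_MODES = 3

total_hours = HOURS_PER_TEST * DEVICES * VIEWING_MODES
print(total_hours)  # 132 hours of manual observation for one full pass
```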
Problems we aim to solve
1. We cannot start new feature testing until the system is completely integrated
2. Aging tests for multi-device support take too long
To manage product quality at scale, what should we do? For video engineering, we decided to develop two quality-assurance products to scale the video viewing experience
Problem 1: We cannot start new feature testing until the system is completely integrated. What can we do to assure quality before the integration? => Develop a specialized sandbox testing system for viewing tests
Requirements from engineers' voices — Engineer S
• Video feature development often requires more time setting up reproduction conditions than the debugging itself
• With TV devices in particular, setting up reproduction conditions such as character input via remote control takes considerable time
• I want to enhance quality down to every detail, but there simply isn't enough time
Golem: a sandbox viewing-test environment
Purpose
• If quality assurance can be completed in Golem, all video functions should operate flawlessly
• Enable testing of actual viewing experiences even when the systems are not yet fully integrated
Main features
• Can run any staging video with arbitrary configurations
• Can configure advertising insertion, including arbitrary VMAP and VAST settings
• Can simplify difficult-to-use UI elements (such as keyboard operations via remote control)
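As a sketch of what "arbitrary configurations" could look like, here is a hypothetical session config for a Golem-like sandbox. All key and parameter names (`stream_url`, `vmap_url`, `simplified_input`) are illustrative assumptions, not Golem's actual API:

```python
# Hypothetical session configuration for a Golem-like sandbox environment.
# All keys and parameter names are illustrative assumptions, not Golem's API.

def build_session(stream_url, vmap_url=None, simplified_input=True):
    """Build a viewing-test session: any staging video, optional ad
    insertion, and simplified input instead of remote-control text entry."""
    session = {
        "stream_url": stream_url,              # any staging video
        "simplified_input": simplified_input,  # bypass remote-control keyboard
    }
    if vmap_url is not None:
        # Arbitrary VMAP/VAST ad-insertion settings
        session["ads"] = {"vmap_url": vmap_url}
    return session

session = build_session(
    "https://staging.example/stream.m3u8",
    vmap_url="https://staging.example/ads.vmap",
)
print(session)
```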
Simplified diagram: the Control Device hosts the web app; the DUTs (web and installed app) request the web app (HTML); the Controller sends commands, which are relayed to the DUTs as control messages over WebSocket
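The control messages in the diagram could be as simple as serialized commands; a minimal sketch, assuming a JSON message shape (`type`, `action`, `payload` are invented names, not Golem's actual protocol), with the WebSocket transport itself omitted:

```python
import json

# Hypothetical controller-to-DUT message shape; field names are assumptions,
# not Golem's actual protocol. The WebSocket transport is omitted here.

def make_command(action, **params):
    """Controller side: serialize a control command for delivery."""
    return json.dumps({"type": "command", "action": action, "payload": params})

def handle_message(raw):
    """DUT side: decode a message and describe what it would execute."""
    msg = json.loads(raw)
    if msg["type"] != "command":
        return "ignored"
    return f'run {msg["action"]} with {msg["payload"]}'

cmd = make_command("load_stream", url="https://staging.example/stream.m3u8")
print(handle_message(cmd))
```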
[Timeline comparison] Before: Video Engineering and Navigation Development develop and fix, then QA runs the E2E test and re-tests after fixes, then release. After: video features are tested and fixed during development, so the E2E test and the release come sooner => faster, with fewer bugs
Engineer S
• We should keep developing Golem to fully automate all repetitive viewing-related testing, so test frequency becomes effectively unlimited, enabling continuous quality improvement
However, automation is still limited
Requirements from engineers' voices — Engineer N
• We want to adjust encoding parameters to improve coding efficiency, but delays in aging tests make this difficult to do casually
• Coordinating with the QA team's planning takes much time
Engineers in general
• We should evaluate viewing-experience quality quantitatively
• We should automate the evaluation process so anyone can perform it
Quality assurance through aging testing
Purpose
• Reduce costs and increase the frequency of aging-test execution
• For aging tests of video playback experiences, eliminate human observation and automatically analyze metrics that visual inspection would otherwise miss
Use cases
• When modifying encoding parameters
• When changing advertising insertion methods
• When adjusting CDN configurations
[Pipeline diagram] Trigger => record the playback session (video data to external RAM, identifier metadata to system RAM) => analyze the video => aggregate the identifier and the analysis result
Setup is as easy as this
Able to analyze recordings even for DRM content!
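The final "aggregate" step above pairs each recording's identifier metadata with its analysis result; a minimal sketch, with data shapes and field names assumed for illustration:

```python
# Assumed data shapes for the aggregation step: each recorded session carries
# identifier metadata, and each analysis result is keyed by that identifier.
recordings = [
    {"id": "rec-001", "device": "device-a", "mode": "linear"},
    {"id": "rec-002", "device": "device-b", "mode": "vod"},
]
analyses = {
    "rec-001": {"freezes": 0},
    "rec-002": {"freezes": 2},
}

# Join each session's metadata with its analysis result into one report row.
report = [{**rec, **analyses[rec["id"]]} for rec in recordings]
for row in report:
    print(row)
```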
Detect video freezing by analyzing the recorded video's timestamps and frames with RTFDA (Real-Time Video Freezing Detection Algorithm) https://link.springer.com/article/10.1007/s11554-019-00873-y
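RTFDA itself is specified in the linked paper; the sketch below is only a naive freeze detector in the same spirit: flag a freeze when consecutive frames are nearly identical for longer than a threshold. Frame representation and thresholds are illustrative assumptions:

```python
# Naive freeze detection over recorded frames + timestamps. This is NOT RTFDA
# (see the linked paper); frames are flattened pixel lists for illustration.

def mean_abs_diff(a, b):
    """Mean absolute pixel difference between two frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def detect_freezes(frames, timestamps, pixel_eps=1.0, min_duration=0.5):
    """Return (start, end) timestamp pairs where playback appears frozen."""
    freezes = []
    start = None
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) < pixel_eps:
            if start is None:        # freeze candidate begins
                start = timestamps[i - 1]
        else:
            if start is not None and timestamps[i - 1] - start >= min_duration:
                freezes.append((start, timestamps[i - 1]))
            start = None
    if start is not None and timestamps[-1] - start >= min_duration:
        freezes.append((start, timestamps[-1]))  # freeze runs to the end
    return freezes

# Synthetic 1-fps "video": frames at t=2..4 are identical => freeze (2, 4).
frames = [[0], [10], [20], [20], [20], [30]]
timestamps = [0, 1, 2, 3, 4, 5]
print(detect_freezes(frames, timestamps))  # [(2, 4)]
```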
Intersections of requests cause problems; our approach addresses each:
• The probability of discovering issues during final integration increases => increased delivery time — addressed by testing all conditions in sandbox environments => reduced time to identify causes
• Priorities across groups frequently diverge => increased development time — addressed by developing and testing future features in advance => eliminated development-time overhead
• Old-system replacement goes slowly (scaling quality to enhance market competitiveness requires modifications to historic system infrastructure) — addressed by implementing replacements within the viewing system itself => the system evolves independently