RoboChallenge Annual Report Large Scale Real-Robot Evaluation for Embodied AI 2026 May 09

Emily Chen · GM of Dexmal · Committee of RoboChallenge

The Current Status of VLA Evaluation
The vast majority of VLA evaluation is conducted in simulation, while real-world testing remains very limited.

The Architecture of RoboChallenge

Leaderboard of RoboChallenge

Robot platforms: ARX-5 · Aloha (COBOT Magic) · UR5 · Franka Panda
What is the first VLA benchmark? Table30.

Distribution of Tasks

Data-Driven Growth Hacking
41,969 accumulated rollouts · 181 rollouts on the peak day · 39.2% user conversion rate · 17K Table30 dataset downloads

Model Capability Analysis

Where Models Win, Where Models Fail
Per-capability success rate; each row shows the leader of that dimension.
· Simple-pick: 85% (Spirit-v1.5)
· Manipulation: 68.3% (Spirit-v1.5)
· Classification: 52% (pi0.5)
· Temporal / Sequence: 40% (wall-oss-v0.1)
· Softbody: 13.3% (hardest dimension)
Hard surfaces are easy. Soft, deformable, and long-horizon tasks are still open research.
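A per-capability leader view like this one can be derived from raw rollouts with a simple group-and-compare step. The sketch below is illustrative only: it assumes each rollout is tagged with a model name, a capability label, and a success flag, and that a success rate is successes divided by trials pooled over all tasks in that capability. The `Rollout` type and `capability_leaders` function are hypothetical; the report does not describe its actual aggregation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical rollout record: (model_name, capability_label, succeeded)
Rollout = Tuple[str, str, bool]


def capability_leaders(rollouts: Iterable[Rollout]) -> Dict[str, Tuple[str, float]]:
    """Return {capability: (leading_model, success_rate)} from pooled rollouts."""
    # (capability, model) -> [successes, trials]
    counts = defaultdict(lambda: [0, 0])
    for model, capability, ok in rollouts:
        tally = counts[(capability, model)]
        tally[0] += int(ok)
        tally[1] += 1

    leaders: Dict[str, Tuple[str, float]] = {}
    for (capability, model), (successes, trials) in counts.items():
        rate = successes / trials
        best = leaders.get(capability)
        # Strictly greater: on a tie, the first model encountered keeps the lead.
        if best is None or rate > best[1]:
            leaders[capability] = (model, rate)
    return leaders
```

Pooling trials per capability weights tasks by how many rollouts they received; averaging per-task rates first would be an equally plausible choice, and the slide does not say which one RoboChallenge uses.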

Task Difficulty Spectrum: Three Tiers
Hello World, Easy, Specialty Win: what the 30 standardized tasks look like through the leaderboard lens.

Tier 1: Hello World
Definition: Top-3 = 100% (all three top models clear the task)
Representative tasks: · stack_bowls · stack_color_blocks
Signal: Foundational manipulation has a working baseline across the leading models.

Tier 2: Easy
Definition: Top-1 ≥ 90%, Top-3 ≥ 70% (a leader exists; followers are within reach)
Representative tasks: · place_shoes_on_rack · search_green_boxes
Signal: Visual discrimination is mostly solved; pick-up precision becomes the differentiator.

Tier 3: Specialty Win
Definition: Top-3 = 0%–10% (sharp drop after the leader)
Representative task: · press_three_buttons (wall-oss-v0.1 only)
Signal: Architectural specialization shows. Different VLAs encode different priors; the diversity is real.

Source: RoboChallenge per-task aggregate, snapshot 2026-01-23
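The tier definitions can be read as a simple rule over each task's ranked success rates. The sketch below assumes that "Top-k" means the success rate of the k-th best model, expressed as a fraction from 0.0 to 1.0, and checks the tiers in the order listed. The extra Tier 3 condition, that the leader must score above the third-ranked model so that a task nobody solves is not labeled a specialty win, is an assumption and not part of the slide. `classify_tier` is a hypothetical helper.

```python
from typing import Optional, Sequence


def classify_tier(success_rates: Sequence[float]) -> Optional[str]:
    """Assign a task to a difficulty tier from per-model success rates (0.0 to 1.0).

    Assumption: "Top-k" is the success rate of the k-th best model on the task.
    Returns None when the task matches none of the three tier definitions.
    """
    if len(success_rates) < 3:
        raise ValueError("need success rates for at least three models")
    ranked = sorted(success_rates, reverse=True)
    top1, top3 = ranked[0], ranked[2]

    # Tier 1: the third-best model is at 100%, so all top three clear the task.
    if top3 >= 1.0:
        return "Tier 1 - Hello World"
    # Tier 2: a clear leader, with the followers still within reach.
    if top1 >= 0.9 and top3 >= 0.7:
        return "Tier 2 - Easy"
    # Tier 3: the third-ranked model is near zero while the leader stands out
    # (the "leader beats Top-3" check is our assumption, not the slide's).
    if top3 <= 0.10 and top1 > top3:
        return "Tier 3 - Specialty Win"
    return None
```

Tasks that fall between the definitions return `None`; the slide only names representative tasks per tier, so how such in-between tasks are reported is not specified.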

Transparency and Traceability

Core Findings and Highlights
1. Demand for testing has grown exponentially, making real-robot validation a necessity for the industry.
2. Stacking bowls and moving objects into boxes have become "Hello World"-level tasks.
3. Organizing paper cups and making sandwiches remain challenging problems.
4. The top model's success rate is about 60%, leaving substantial room for improvement.
5. VLA models are still at a very early stage, operating at close to a basic level of human intelligence.

Roadmap 2026: Conferences & Platform Evolution
From a 90-day prototype to a permanent venue for real-robot evaluation.

Timeline
2025-10: Platform launch
2025-11-20: Committee founded
2026-01: Leaderboard milestones
2026-04-15 to 05-15: CVPR Track
2026-05-08 to 05-25: ICRA Track
2026 H2: Real 100 · Sim-vs-Real · Zero-shot

CVPR 2026 · Denver
Table30 v2: 18 new bimanual tasks
Open submissions: 2026-04-15 to 2026-05-15
120+ teams interested · 96 registered · 400+ participants · 68 universities / 28 enterprises
Workshop spotlight: bimanual coordination, scene generalization, and long-horizon planning under real-world distribution shift.

ICRA 2026 · Vienna
Real Supermarket: AGIBOT Track
Open submissions: 2026-05-08 to 2026-05-25
50+ teams · real supermarket scene · closed-loop embodied evaluation
Beyond tabletop: shopping-aisle navigation, cluttered-shelf retrieval, and multi-stage planning amid live human bystanders.

Competition at a top-tier conference (CVPR) https://robochallenge.cn/competition/cvpr

Table30 v2 for CVPR: a large-scale benchmark for evaluating generalization in frontier models

The Competition Is On Fire
Download the Table30 v2 dataset: https://huggingface.co/datasets/RoboChallenge/Table30v2
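For readers who want the data locally, here is a minimal sketch using the `huggingface_hub` library's `snapshot_download`. It assumes the repository id `RoboChallenge/Table30v2`, taken from the URL above, and that the dataset can be downloaded without special access; its size and access requirements have not been checked. `download_table30v2` is a hypothetical wrapper name.

```python
def download_table30v2(local_dir: str = "Table30v2") -> str:
    """Download the Table30 v2 dataset repository and return its local path.

    Assumes `pip install huggingface_hub`; the repo id comes from the dataset URL.
    """
    # Imported inside the function so that defining it needs no extra packages.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="RoboChallenge/Table30v2",
        repo_type="dataset",  # a dataset repository, not a model repository
        local_dir=local_dir,  # files are written into this directory
    )
```

Call `path = download_table30v2()` to fetch the full repository. If the dataset turns out to be gated, you may need to authenticate first, for example with `huggingface-cli login`.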

Competition at a top-tier conference (ICRA) https://robochallenge.cn/competition/icra

Real Scenarios: Supermarket
- Targets real retail supermarket scenarios
- Focuses on the deployable, real-world capabilities of embodied intelligence

RoboChallenge Partner Ecosystem – 2025 Oct

The Committee and Working Group

RoboChallenge Partner Ecosystem – 2025 Nov

RoboChallenge Partner Ecosystem – 2026 Apr

Join RoboChallenge
• 🌐 Website: RoboChallenge.ai
• 🐦 X: @robochallenge
• 💻 GitHub: RoboChallenge
• Hugging Face: huggingface.co/RoboChallenge


TAKASU Masakazu
