From Data to Decisions: VLA Pipelines for Real Robots

at the IEEE International Conference on Robotics and Automation (ICRA) 2026

📅 1-5 June 2026

📍 Vienna, Austria

🔴 Registration Now Open: Google Form

👁️ Vision
💬 Language
🤖 Action

Abstract

Vision-Language-Action (VLA) models have emerged as a promising paradigm for building generally capable robot agents. Leveraging scalable teleoperation, simulation, and web-scale video data, as well as pre-trained vision and language models, these systems have demonstrated impressive generalization and instruction-following capability across diverse manipulation tasks.

However, their real-world impact remains limited: most pipelines lack rigorous data curation, principled training regimes, and standardized evaluation, leading to inconsistent performance and safety concerns on physical robots. Meanwhile, performance on real robots is dictated by the underlying "data engine", highlighting the importance of scalable data collection, domain randomization, procedural synthesis, and human-in-the-loop feedback.

This workshop aims to catalyze progress by examining the end-to-end VLA pipeline, from scalable data collection and systematic curation to training strategies and inference-time reasoning. A core focus will be surfacing best practices and establishing shared metrics for generalization and safety.

We Welcome Contributions Across the Entire VLA Pipeline

Whether your work focuses on data collection & curation, preprocessing & augmentation, model architecture, training strategies, optimization techniques, inference & deployment, or evaluation & benchmarking, we want to hear from you if it advances Vision-Language-Action systems.

Submit Your Work

Important Dates

Now Open: 🏆 Competition Registration. Register your team now.
Apr 15: 📄 Paper Submission. Submit research papers.
May 1: 📧 Results Announced. Competition results & accepted papers.
Jun 1: 🎪 Workshop Day. ICRA 2026, Vienna.

Invited Speakers

Karl Pertsch

Physical Intelligence

Vision-Language-Action Models

Shuran Song

Stanford University

Robotic Perception & Learning

Alberto Rodriguez

Boston Dynamics

VLAs in Manufacturing

Yuke Zhu

NVIDIA / UT Austin

Embodied AI & Robotics

Tetsuya Ogata

Waseda University

Multimodal Learning

Russ Tedrake

MIT / TRI

Manipulation & Control

📄 Paper Track

Call for Extended Abstracts

We invite 2-4 page extended abstracts on any component of the VLA pipeline, including:

  • Data collection, curation, and preprocessing
  • Training regimes and post-training strategies
  • Inference-time decision making and reasoning
  • Evaluation protocols and deployment studies
  • Safety, robustness, and generalization analysis

Authors are encouraged to include links to code, configs, and data recipes.

Submission Guidelines

Format: 2-4 page extended abstracts
Review: Single-blind review
Deadline: April 15, 2026
Submission: Via OpenReview

Accepted Papers Receive:

  • 🎯 Lightning spotlights
  • 📋 Poster presentations
  • 🎤 Selected contributed orals

Submit Extended Abstract

🏆 Competition Track

Mobile Manipulation Challenge

Train on over 10,000 hours of data and evaluate your VLA systems on real robot hardware.

🤖 Real Robot Evaluation
📅 Bi-weekly Experiments
📊 Public Leaderboard
🎤 Top 3 Present at ICRA

Evaluation videos and data are provided for fine-tuning after each round.

🧪 Try with sample dataset (airoa-moma)

Standard Hardware: Toyota HSR

Competition Timeline

Feb 14: Dataset Release. 10,000+ hours of training data.
Mar 1: Evaluation Begins. Bi-weekly real robot evaluation starts.
May 1: Final Results. Competition winners announced.
Jun 1: Workshop Presentation. Top 3 teams present at ICRA.

The Data Engine

Competition teams will have access to:

  • 📊 Massive Dataset: Over 10,000 hours of curated data
  • 🔧 Baselines: Pre-trained models and training recipes
  • 📋 Evaluation: Standardized benchmarks and metrics
  • 🤝 Cross-pollination: Teams encouraged to submit methods papers

Prizes

🥇 1st Place: $2,000
🥈 2nd Place: $1,000
🥉 3rd Place: $600

Register for Competition

Tentative Schedule

Detailed program schedule, subject to minor changes

Morning Session (08:30 - 12:30)

08:30 - 08:40 | Opening Remarks | Organizers
08:40 - 09:10 | Invited Talk 1 | TBA
09:10 - 09:40 | Invited Talk 2 | TBA
09:40 - 10:10 | Invited Talk 3 | TBA
10:10 - 11:00 | Panel Discussion: Building Reliable VLA Pipelines | Moderator + Panelists
11:00 - 11:30 | Coffee Break
11:30 - 12:00 | Competition Track: Overview, Baselines & Evaluation | Starter kit and submission flow
12:00 - 12:30 | Lightning Spotlights (Accepted Papers) | 3-4 papers, 6-7 min each
12:30 - 13:30 | Lunch Break

Afternoon Session (13:30 - 17:30)

13:30 - 14:00 | Invited Talk 4 | TBA
14:00 - 14:30 | Invited Talk 5 | TBA
14:30 - 15:00 | Coffee Break
15:00 - 15:30 | Invited Talk 6 | TBA
15:30 - 16:00 | Contributed Talks (Selected Papers) | Selected papers (8-10 min each)
16:00 - 16:45 | Poster Session | Interactive posters and demos
16:45 - 17:15 | Competition Winner Talk | Winning team presentation
17:15 - 17:30 | Awards & Closing Remarks | Best Poster / Competition

Organizers

* indicates the primary contact person

Yueh-Hua (Kris) Wu*

AIRoA

kris.wu@airoa.org

Kei Ota

AIRoA

kei.ota@airoa.org

Jun Yamada

University of Oxford

jyamada@robots.ox.ac.uk

Ingmar Posner

University of Oxford

ingmar.posner@pmb.ox.ac.uk

Tatsuya Matsushima

University of Tokyo

matsushima@weblab.t.u-tokyo.ac.jp

Tetsuya Ogata

Waseda University

ogata@waseda.jp

Ishika Singh

University of Southern California

ishikasi@usc.edu