We present DEAS, an offline RL framework that learns from action sequences instead of single actions. Unlike prior methods that couple actor and critic training, our key insight is to train the action-sequence critic separately from the policy through detached value learning, which enables stable learning while avoiding value overestimation. We further enhance stability by combining detached value learning with distributional RL and dual discount factors. DEAS can be directly applied to improve the performance of vision-language-action (VLA) models in both real-world robotic manipulation and simulated environments.
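As a rough illustration of the training objective, the PyTorch-style sketch below shows one way to implement detached value learning over action sequences with dual discount factors. All names and design choices here (the expectile-regressed value network, how the within-chunk and bootstrap discounts are applied, network sizes) are illustrative assumptions rather than the official DEAS implementation, and the distributional critic is omitted for brevity.

# Hypothetical sketch of DEAS-like detached value learning over action
# sequences. Names, sizes, and the exact use of the two discount factors
# are illustrative assumptions, not the released DEAS code.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACT_DIM, H = 32, 8, 4  # H: length of the action sequence (chunk)

class QCritic(nn.Module):
    # Q(s, a_{t:t+H}): critic over a whole action sequence.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACT_DIM * H, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, s, a_seq):
        return self.net(torch.cat([s, a_seq.flatten(1)], dim=-1)).squeeze(-1)

class ValueNet(nn.Module):
    # V(s): trained from dataset transitions only, never by querying the policy.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, s):
        return self.net(s).squeeze(-1)

def critic_losses(q_fn, v_fn, batch, gamma_step=0.99, gamma_boot=0.995, expectile=0.9):
    # batch: state (B, STATE_DIM), action sequence (B, H, ACT_DIM),
    # per-step rewards (B, H), state after H steps (B, STATE_DIM), done (B,).
    s, a_seq, rewards, s_next, done = batch
    # "Dual discounts" (assumption): one discount inside the chunk,
    # another for the H-step bootstrap.
    step_disc = gamma_step ** torch.arange(H, dtype=rewards.dtype)
    r_chunk = (rewards * step_disc).sum(dim=-1)
    with torch.no_grad():
        # Bootstrap from V(s'), not from Q of policy-sampled actions:
        # the critic target is detached from the actor.
        target = r_chunk + (gamma_boot ** H) * (1.0 - done) * v_fn(s_next)
    q_loss = F.mse_loss(q_fn(s, a_seq), target)
    # IQL-style expectile regression toward Q of the dataset action sequence,
    # so V is also learned without sampling actions from the policy.
    diff = q_fn(s, a_seq).detach() - v_fn(s)
    weight = torch.abs(expectile - (diff < 0).float())
    v_loss = (weight * diff ** 2).mean()
    return q_loss, v_loss

The policy (e.g., a VLA head producing H-step action chunks) would then be trained separately, for instance by weighting behavior cloning with the learned action-sequence values; that actor objective is not shown here.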
We report the partial success rate (%, over 20 trials per task) on 3 tasks, each evaluated from 5 initial points. Bold and underline indicate best and runner-up results, respectively.
DEAS demonstrates consistent performance improvements across all tasks. In contrast, baseline methods show inconsistent performance: while some perform well on certain tasks, they exhibit significant degradation or minimal improvement on others.
Rollout videos comparing GR00T N1.5, Filtered BC, IQL, QC, and DEAS (Ours).
We report the success rate (%, over 50 trials per task) on 4 tasks, aggregated with 3 different seeds. Bold and underline indicate best and runner-up results, respectively.
DEAS achieves the highest success rates in 3 out of 4 tasks, with the remaining task also showing improved performance compared to the base model.
Rollout videos for each task, comparing GR00T N1.5, Filtered BC, IQL, QC, and DEAS (Ours).
DEAS consistently outperforms prior offline RL methods on OGBench, with especially large gains on the more challenging tasks (e.g., puzzle and cube-quadruple). Furthermore, DEAS maintains consistent performance across different data scales on diverse tasks.
@article{kim2025deas,
  title={DEAS: DEtached value learning with Action Sequence for Scalable Offline RL},
  author={Changyeon Kim and Haewon Lee and Younggyo Seo and Kimin Lee and Yuke Zhu},
  journal={arXiv preprint arXiv:},
  year={2025},
}