DEAS: DEtached value learning with Action Sequence for Scalable Offline RL

Changyeon Kim*1    Haewon Lee1    Younggyo Seo2    Kimin Lee1    Yuke Zhu3,4   
1 KAIST    2 UC Berkeley    3 The University of Texas at Austin    4 NVIDIA

Overview

We present DEAS, an offline RL framework that learns from action sequences instead of single actions. Unlike prior methods that couple actor and critic training, DEAS trains the action-sequence critic separately from the policy through detached value learning, which enables stable learning while avoiding value overestimation. We further improve stability by combining detached value learning with distributional RL and by using dual discount factors. DEAS can be applied directly to enhance vision-language-action (VLA) policies in both real-world robotic manipulation and simulated environments.


[Figure: Overview of DEAS]
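To make the decoupling concrete, below is a minimal PyTorch sketch of detached value learning over action sequences. It instantiates the idea with an IQL-style expectile critic and advantage-weighted policy extraction as one plausible recipe; the chunk horizon, network sizes, AWR temperature, and the scalar (non-distributional) critic are our assumptions, and the paper's distributional critic and dual discount factors are not reproduced here.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed sizes: H-step action chunks over a flat state vector (illustrative).
STATE_DIM, ACTION_DIM, HORIZON = 32, 7, 8
GAMMA = 0.99 ** HORIZON  # one discount per H-step chunk (dual discounts omitted)

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, 256), nn.ReLU(),
                         nn.Linear(256, out_dim))

q_net = mlp(STATE_DIM + ACTION_DIM * HORIZON, 1)  # Q(s, a_{t:t+H})
v_net = mlp(STATE_DIM, 1)                         # V(s)
policy = mlp(STATE_DIM, ACTION_DIM * HORIZON)     # deterministic chunk policy (sketch)

q_opt = torch.optim.Adam(q_net.parameters(), lr=3e-4)
v_opt = torch.optim.Adam(v_net.parameters(), lr=3e-4)
pi_opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

def expectile_loss(diff, tau=0.7):
    # Asymmetric L2 (IQL-style): pushes V toward an upper expectile of Q.
    weight = torch.where(diff > 0, torch.full_like(diff, tau),
                         torch.full_like(diff, 1 - tau))
    return (weight * diff.pow(2)).mean()

def update(batch):
    s, a_seq, r_chunk, s_next = batch  # a_seq: (B, H*A); r_chunk: (B,) chunk return
    sa = torch.cat([s, a_seq], dim=-1)

    # 1) Detached value learning: targets use only dataset action sequences,
    #    so the policy is never queried and overestimation is not amplified.
    with torch.no_grad():
        target_q = r_chunk.unsqueeze(-1) + GAMMA * v_net(s_next)
    q_loss = F.mse_loss(q_net(sa), target_q)
    q_opt.zero_grad(); q_loss.backward(); q_opt.step()

    v_loss = expectile_loss(q_net(sa).detach() - v_net(s))
    v_opt.zero_grad(); v_loss.backward(); v_opt.step()

    # 2) Policy is trained separately (advantage-weighted regression toward
    #    high-value dataset chunks); critic gradients never reach the actor.
    with torch.no_grad():
        w = torch.exp(3.0 * (q_net(sa) - v_net(s))).clamp(max=100.0)
    pi_loss = (w * (policy(s) - a_seq).pow(2).sum(-1, keepdim=True)).mean()
    pi_opt.zero_grad(); pi_loss.backward(); pi_opt.step()
    return q_loss.item(), v_loss.item(), pi_loss.item()

# Smoke test on random data.
B = 64
batch = (torch.randn(B, STATE_DIM), torch.randn(B, ACTION_DIM * HORIZON),
         torch.randn(B), torch.randn(B, STATE_DIM))
print(update(batch))

The key property to notice is that the critic updates in step 1 never query the learned policy, and the policy update in step 2 never backpropagates through the critic: the two stages are fully detached.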

Experiments

Franka Research 3 Kitchen

We report the partial success rate (%, over 20 trials per task) on 3 tasks, with rollouts starting from 5 initial points. Bold and underline indicate best and runner-up results, respectively.
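As a concrete reference for the metric, here is a hypothetical scoring helper: a partial success rate grants each trial fractional credit for the subtask stages it completes (the stage breakdown below is illustrative, not the paper's rubric).

# Hypothetical partial-success scoring: each trial earns fractional credit
# for the subtask stages it completes (stage breakdown is illustrative).
def partial_success_rate(trials):
    """trials: list of per-trial stage flags, one boolean per completed stage."""
    scores = [sum(stages) / len(stages) for stages in trials]
    return 100.0 * sum(scores) / len(scores)

# e.g., 20 trials of a 3-stage task (reach -> grasp -> place):
trials = [[True, True, False]] * 12 + [[True, True, True]] * 8
print(f"{partial_success_rate(trials):.1f}")  # 80.0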

[Table: Experimental results on Franka Research 3 Kitchen]

DEAS demonstrates consistent performance improvements across all tasks. In contrast, baseline methods are inconsistent: while some perform well on certain tasks, they show significant degradation or minimal improvement on others.

Peach

All videos are 2x real-time.

[Videos: GR00T N1.5 | Filtered BC | IQL | QC | DEAS (Ours)]


Hichew

All videos are 2x real-time.

[Videos: GR00T N1.5 | Filtered BC | IQL | QC | DEAS (Ours)]



RoboCasa Kitchen

We report the success rate (%, over 50 trials per task) on 4 tasks, aggregated with 3 different seeds. Bold and underline indicate best and runner-up results, respectively.

[Table: Experimental results on RoboCasa Kitchen]

DEAS achieves the highest success rate on 3 of the 4 tasks, and still improves over the base model on the remaining one.


CoffeeSetupMug

[Videos: GR00T N1.5 | Filtered BC | IQL | QC | DEAS (Ours)]


PnPCounterToMicrowave

[Videos: GR00T N1.5 | Filtered BC | IQL | QC | DEAS (Ours)]


PnPMicrowaveToCounter

[Videos: GR00T N1.5 | Filtered BC | IQL | QC | DEAS (Ours)]


TurnOffStove

[Videos: GR00T N1.5 | Filtered BC | IQL | QC | DEAS (Ours)]



OGBench

[Figures: Experimental results on OGBench; data scaling results on OGBench]

DEAS consistently outperforms prior offline RL methods on OGBench, with especially large gains on the more challenging tasks (e.g., puzzle and cube-quadruple). Furthermore, DEAS maintains consistent performance across different data scales on diverse tasks.


Citation


@article{kim2025deas,
    title={DEAS: DEtached value learning with Action Sequence for Scalable Offline RL},
    author={Changyeon Kim and Haewon Lee and Younggyo Seo and Kimin Lee and Yuke Zhu},
    journal={arXiv preprint arXiv:},
    year={2025},
}