DEAS: DEtached value learning with Action Sequence for Scalable Offline RL

Changyeon Kim*¹ Haewon Lee¹ Younggyo Seo² Kimin Lee¹ Yuke Zhu³,⁴

¹ KAIST ² UC Berkeley ³ The University of Texas at Austin ⁴ NVIDIA


Overview

We present DEAS, an offline RL framework that learns from action sequences rather than single actions. Unlike prior methods that couple actor and critic training, our key insight is to train the action-sequence critic separately from the policy through detached value learning, which keeps training stable and avoids value overestimation. We further improve stability by combining detached value learning with distributional RL and by using dual discount factors. DEAS can be applied directly to improve vision-language-action (VLA) models, both in real-world robotic manipulation and in simulated environments.
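The page does not spell out the training objective, so the following is only an illustrative sketch under stated assumptions, not the DEAS implementation. It assumes two common ingredients: an H-step target for a critic Q(s_t, a_{t:t+H}) that scores a whole action sequence of length H, and an IQL-style expectile loss that fits a value function V toward Q using only action sequences from the dataset, so no policy is queried during value learning. The function names, the expectile level `tau`, and the single discount `gamma` are hypothetical; the distributional critic and the dual discount factors are not modeled here.

```python
import numpy as np


def chunk_return(rewards: np.ndarray, gamma: float) -> float:
    """Discounted sum of the H rewards collected while executing one action sequence."""
    discounts = gamma ** np.arange(len(rewards))  # [1, gamma, gamma^2, ...]
    return float(np.sum(discounts * rewards))


def q_target(rewards: np.ndarray, next_value: float, gamma: float, done: bool = False) -> float:
    """Hypothetical H-step target for Q(s_t, a_{t:t+H}).

    The chunk's discounted return plus gamma^H * V(s_{t+H}); the bootstrap term
    is dropped when the episode terminates inside the chunk.
    """
    h = len(rewards)
    bootstrap = 0.0 if done else (gamma ** h) * next_value
    return chunk_return(rewards, gamma) + bootstrap


def expectile_loss(q_values: np.ndarray, v_values: np.ndarray, tau: float = 0.7) -> float:
    """IQL-style expectile regression of V toward Q on dataset action sequences.

    Errors where Q > V are weighted by tau and the rest by 1 - tau. Only dataset
    actions appear here, so this step never depends on the current policy.
    """
    diff = q_values - v_values
    weight = np.where(diff > 0, tau, 1.0 - tau)
    return float(np.mean(weight * diff ** 2))


if __name__ == "__main__":
    # Three-step chunk with a sparse reward at the last step.
    print(q_target(np.array([0.0, 0.0, 1.0]), next_value=2.0, gamma=0.5))  # 0.5
```

With `tau` above 0.5, the loss penalizes V more when it falls below Q, so V tracks an upper expectile of Q over in-dataset action sequences instead of a maximum over arbitrary actions. This is how IQL avoids evaluating out-of-distribution actions, which is one source of value overestimation in offline RL.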

Experiments

Franka Research 3 Kitchen

We report the partial success rate (%, over 20 trials per task) on 3 tasks, starting from 5 initial points. Bold and underline indicate the best and runner-up results, respectively.

DEAS improves performance consistently across all tasks. In contrast, the baseline methods are inconsistent: some perform well on certain tasks but degrade significantly or improve only marginally on others.

Videos (all 2x real-time) comparing GR00T N1.5, Filtered BC, IQL, and DEAS (Ours) on the Peach and Hichew tasks.

RoboCasa Kitchen

We report the success rate (%, over 50 trials per task) on 4 tasks, aggregated over 3 seeds. Bold and underline indicate the best and runner-up results, respectively.

DEAS achieves the highest success rate on 3 of the 4 tasks and still improves over the base model on the remaining task.

Videos comparing GR00T N1.5, Filtered BC, IQL, and DEAS (Ours) on CoffeeSetupMug, PnPCounterToMicrowave, PnPMicrowaveToCounter, and TurnOffStove.

OGBench

DEAS consistently outperforms a range of prior offline RL methods on OGBench, with especially large gains on the more challenging tasks (e.g., puzzle and cube-quadruple). DEAS also performs consistently across different dataset sizes on diverse tasks.

Citation

@article{kim2025deas,
    title={DEAS: DEtached value learning with Action Sequence for Scalable Offline RL},
    author={Changyeon Kim and Haewon Lee and Younggyo Seo and Kimin Lee and Yuke Zhu},
    journal={arXiv:2510.07730},
    year={2025},
}