MaskAdapt: Learning Flexible Motion Adaptation via Mask-Invariant Prior for Physics-Based Characters

KAIST
CVPR 2026 Highlight ✨
MaskAdapt Teaser

Abstract

We present MaskAdapt, a framework for flexible motion adaptation in physics-based humanoid control. The framework follows a two-stage residual learning paradigm. In the first stage, we train a mask-invariant base policy using stochastic body-part masking and a regularization term that enforces consistent action distributions across masking conditions. This yields a robust motion prior that remains stable under missing observations, anticipating later adaptation in those regions. In the second stage, a residual policy is trained atop the frozen base controller to modify only the targeted body parts while preserving the original behaviors elsewhere. We demonstrate the versatility of this design through two applications: (i) motion composition, where varying masks enable multi-part adaptation within a single sequence, and (ii) text-driven partial goal tracking, where designated body parts follow kinematic targets provided by a pre-trained text-conditioned autoregressive motion generator. Through experiments, MaskAdapt demonstrates strong robustness and adaptability, producing diverse behaviors under masked observations and delivering superior targeted motion adaptation compared to prior work.

Video

Method

Our framework follows a two-stage residual learning paradigm: the base controller first learns a robust action prior, and a residual controller is then trained on top of the frozen base policy to produce residual actions that adapt the base behavior.
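
The residual paradigm above can be illustrated with a minimal sketch. It assumes the residual action is simply added to the base action, which is a common convention but is not confirmed by this page; `compose_action`, `toy_base`, and `toy_residual` are hypothetical names used only for illustration.

```python
from typing import Callable, List, Sequence

# A policy here is any function mapping an observation to an action vector.
Policy = Callable[[Sequence[float]], List[float]]


def compose_action(base_policy: Policy,
                   residual_policy: Policy,
                   obs: Sequence[float]) -> List[float]:
    """Return the base action plus a residual correction.

    Assumption: the combination is additive. The base policy is treated as
    frozen (only queried, never updated); only the residual policy would be
    trained in the second stage.
    """
    base_action = base_policy(obs)
    residual = residual_policy(obs)
    return [b + r for b, r in zip(base_action, residual)]


# Toy stand-ins for the two controllers (hypothetical, not learned).
def toy_base(obs: Sequence[float]) -> List[float]:
    return [0.5 * x for x in obs]


def toy_residual(obs: Sequence[float]) -> List[float]:
    return [0.1 for _ in obs]


action = compose_action(toy_base, toy_residual, [1.0, -2.0, 0.0])
```

With a zero residual the composed action equals the base action, so the residual controller starts from the base behavior and only has to learn the correction.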

Key Challenges

  • Existing methods typically train the base policy without accounting for the substantial state-distribution shifts that arise during the subsequent adaptation phase.
  • They either restrict adaptation to fixed body regions or lack semantic control such as text-conditioned guidance.

To tackle these challenges, we introduce MaskAdapt, a framework that learns a robust motion prior to enable flexible residual adaptation.

Stage 1: Learning a Mask-Invariant Motion Prior

The base policy is trained with stochastic body-part masking and regularized using the mask-invariant loss to maintain consistent actions across masking conditions. This encourages a robust motion prior that remains stable under missing observations, anticipating later adaptation in those regions.
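
A minimal sketch of the two ingredients described above, stochastic body-part masking and a consistency regularizer, is given here under several assumptions: the policy outputs a diagonal Gaussian, masking zeroes the observation entries of a body part, and consistency is measured with a KL divergence between the action distributions under two masks. The exact loss and masking scheme are not specified on this page, and `BODY_PARTS`, `sample_mask`, and `mask_invariant_loss` are hypothetical names.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical layout: which observation indices belong to which body part.
BODY_PARTS = {"upper": [0, 1], "lower": [2, 3]}

GaussianPolicy = Callable[[Sequence[float]], Tuple[List[float], List[float]]]


def sample_mask(obs_dim: int, rng: random.Random, p_mask: float = 0.5) -> List[float]:
    """Hide each body part independently with probability p_mask (0 = hidden)."""
    mask = [1.0] * obs_dim
    for indices in BODY_PARTS.values():
        if rng.random() < p_mask:
            for i in indices:
                mask[i] = 0.0
    return mask


def apply_mask(obs: Sequence[float], mask: Sequence[float]) -> List[float]:
    return [o * m for o, m in zip(obs, mask)]


def gaussian_kl(mu_p, std_p, mu_q, std_q) -> float:
    """KL(p || q) for diagonal Gaussians, summed over action dimensions."""
    total = 0.0
    for mp, sp, mq, sq in zip(mu_p, std_p, mu_q, std_q):
        total += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2 * sq ** 2) - 0.5
    return total


def mask_invariant_loss(policy: GaussianPolicy, obs, mask_a, mask_b) -> float:
    """Penalize disagreement between action distributions under two masks."""
    mu_a, std_a = policy(apply_mask(obs, mask_a))
    mu_b, std_b = policy(apply_mask(obs, mask_b))
    return gaussian_kl(mu_a, std_a, mu_b, std_b)


# Toy policy (hypothetical): fixed unit variance, mean depends on the input.
def toy_policy(obs):
    return [sum(obs), obs[0] - obs[2]], [1.0, 1.0]


obs = [1.0, 2.0, 3.0, 4.0]
loss_same = mask_invariant_loss(toy_policy, obs, [1, 1, 1, 1], [1, 1, 1, 1])
loss_diff = mask_invariant_loss(toy_policy, obs, [1, 1, 1, 1], [0, 0, 1, 1])
```

In this toy setup the loss is zero when both masks agree and positive when hiding a body part changes the predicted action distribution, which is the behavior a mask-invariance regularizer is meant to discourage.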

Stage 2: Learning Flexible Motion Adaptation

The residual policy enables flexible motion adaptation, which we evaluate through two representative tasks: Dynamic Motion Composition, where varying masks allow multi-part adaptation within a single sequence, and Text-Driven Partial Motion Tracking, where designated body parts follow kinematic targets generated by a pre-trained text-conditioned autoregressive diffusion model.
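
The two tasks can be sketched in simplified form as follows. This is only an illustration under stated assumptions: the residual is gated per joint by a binary body-part mask, the mask changes over time according to a fixed schedule, and partial tracking is scored with a mean squared error over the designated joints. None of these mechanisms is confirmed by this page, and all names (`ARMS`, `LEGS`, `mask_schedule`, `gated_action`, `partial_tracking_error`) are hypothetical.

```python
from typing import List, Sequence

# Hypothetical 4-joint character: joints 0-1 are arms, joints 2-3 are legs.
ARMS = [1, 1, 0, 0]
LEGS = [0, 0, 1, 1]


def mask_schedule(step: int, switch_at: int = 50) -> List[int]:
    """Varying mask within one sequence: adapt the arms first, then the legs."""
    return ARMS if step < switch_at else LEGS


def gated_action(base_action: Sequence[float],
                 residual: Sequence[float],
                 part_mask: Sequence[int]) -> List[float]:
    """Apply the residual only on masked-in joints; others keep the base action."""
    return [b + m * r for b, r, m in zip(base_action, residual, part_mask)]


def partial_tracking_error(pose: Sequence[float],
                           target: Sequence[float],
                           part_mask: Sequence[int]) -> float:
    """Mean squared error restricted to the designated joints."""
    selected = [(p - t) ** 2 for p, t, m in zip(pose, target, part_mask) if m]
    return sum(selected) / len(selected) if selected else 0.0


base = [1.0, 1.0, 1.0, 1.0]
residual = [0.5, 0.5, 0.5, 0.5]
early = gated_action(base, residual, mask_schedule(10))   # arms adapted
late = gated_action(base, residual, mask_schedule(80))    # legs adapted
err = partial_tracking_error([0.0, 0.0, 3.0, 3.0], [1.0, 1.0, 0.0, 0.0], ARMS)
```

Because the error ignores joints outside the mask, the unmasked body parts are free to follow the base behavior while the designated parts follow the kinematic target.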

Mask-Invariant Motion Prior

Effect of Mask-Invariant Loss

Here, we compare base policies trained with and without the mask-invariant loss.

Comparison with Unmasked Baseline

The mask-invariant (MI) loss prevents the policy from collapsing under masking and lets it retain dataset-level diversity comparable to the unmasked baseline (AMP), which qualifies it as a robust motion prior.

Ours

AMP

Flexible Motion Adaptation

Here, we showcase the qualitative results of the residual adaptation stage across two representative tasks.

(i) Dynamic Motion Composition

Jump + Alternating Kick

Locomotion + Rotate Arms

Aim + Sneak

(ii) Text-Driven Partial Motion Tracking

"Raise right arm"

"Flap like a bird"

"Cross arms"

Goal-Driven and Complex Scenarios

Here, we showcase a wide range of goal-driven tasks (a-c) and complex scenarios (d-e) through motion composition.

(a) Target Location Task

(b) Strike Task

(c) Heading Task

(d) Multi-Motion Composition

(e) Adaptation for Real Humanoid Robot (Unitree G1)

Comparison with SOTA

We compare our method against Composite Motion Learning (CML) on both motion composition and partial tracking tasks.

Motion Composition

Jump + Alternating Kick

Locomotion + Rotate Arms

Aim + Sneak

Partial Tracking

"Dribble"

"Raise arms"

"Wave hands"

BibTeX

@article{park2026maskadapt,
  title={MaskAdapt: Learning Flexible Motion Adaptation via Mask-Invariant Prior for Physics-Based Characters},
  author={Park, Soomin and Lee, Eunseong and Lee, Kwang Bin and Lee, Sung-Hee},
  journal={arXiv preprint arXiv:2603.29272},
  year={2026}
}