MaskAdapt: Learning Flexible Motion Adaptation via Mask-Invariant Prior for Physics-Based Characters

Abstract

We present MaskAdapt, a framework for flexible motion adaptation in physics-based humanoid control. The framework follows a two-stage residual learning paradigm. In the first stage, we train a mask-invariant base policy using stochastic body-part masking and a regularization term that enforces consistent action distributions across masking conditions. This yields a robust motion prior that remains stable under missing observations, anticipating later adaptation in those regions. In the second stage, a residual policy is trained atop the frozen base controller to modify only the targeted body parts while preserving the original behaviors elsewhere. We demonstrate the versatility of this design through two applications: (i) motion composition, where varying masks enable multi-part adaptation within a single sequence, and (ii) text-driven partial goal tracking, where designated body parts follow kinematic targets provided by a pre-trained text-conditioned autoregressive motion generator. Through experiments, MaskAdapt demonstrates strong robustness and adaptability, producing diverse behaviors under masked observations and delivering superior targeted motion adaptation compared to prior work.

Video

Method

Our framework follows a two-stage residual learning paradigm: the base controller first learns a robust action prior, and a residual controller is then trained on top of the frozen base policy to produce residual actions that adapt the base behavior.
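The way a residual action adapts the base behavior can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the residual is added to the base action, and the names `compose_action`, `adapt_mask`, `toy_base`, and `toy_residual` are hypothetical. The per-dimension gate `adapt_mask` is likewise an assumption, used here only to show how edits could be confined to targeted action dimensions.

```python
from typing import Callable, List, Sequence

Policy = Callable[[Sequence[float]], List[float]]


def compose_action(
    base_policy: Policy,
    residual_policy: Policy,
    observation: Sequence[float],
    adapt_mask: Sequence[float],
) -> List[float]:
    """Add a gated residual correction on top of a frozen base action.

    adapt_mask is a hypothetical per-dimension gate: 1.0 where the residual
    may change the action, 0.0 where the base action is kept unchanged.
    """
    base_action = base_policy(observation)   # frozen base controller output
    residual = residual_policy(observation)  # output of the trainable residual policy
    if not (len(base_action) == len(residual) == len(adapt_mask)):
        raise ValueError("action, residual, and mask must have equal length")
    # Additive composition (an assumption): masked-out dimensions keep the base action.
    return [b + m * r for b, r, m in zip(base_action, residual, adapt_mask)]


# Toy stand-ins for trained networks, for illustration only.
def toy_base(obs: Sequence[float]) -> List[float]:
    return [0.5 * x for x in obs]


def toy_residual(obs: Sequence[float]) -> List[float]:
    return [1.0 for _ in obs]


action = compose_action(toy_base, toy_residual, [2.0, 4.0, 6.0], [1.0, 0.0, 0.0])
```

In this toy call only the first action dimension is altered; the other two remain exactly the base action. Changing `adapt_mask` from step to step would correspond to adapting different parts within one sequence.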

Key Challenges

Existing methods typically train the base policy without accounting for the substantial state-distribution shifts that arise during the subsequent adaptation phase.
In addition, they either restrict adaptation to fixed body regions or lack semantic control, such as text-conditioned guidance.

To tackle these challenges, we introduce MaskAdapt, a framework that learns a robust motion prior to enable flexible residual adaptation.

Stage 1: Learning a Mask-Invariant Motion Prior

The base policy is trained with stochastic body-part masking and regularized with a mask-invariant loss that keeps its actions consistent across masking conditions. This encourages a robust motion prior that remains stable under missing observations, anticipating later adaptation in those regions.
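One possible reading of this stage is sketched below. The page does not give the exact form of the loss, so everything here is an assumption made for illustration: body parts are masked by zeroing their observation slices, the policy outputs a diagonal Gaussian, and consistency is measured by a KL divergence between the unmasked and masked action distributions. All function names and the `part_slices` layout are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Gaussian = Tuple[List[float], List[float]]  # (mean, std) of a diagonal Gaussian


def sample_part_mask(part_names: Sequence[str], mask_prob: float,
                     rng: random.Random) -> Dict[str, bool]:
    """Independently hide each body part with probability mask_prob (True = visible)."""
    return {name: rng.random() >= mask_prob for name in part_names}


def apply_mask(observation: Sequence[float],
               part_slices: Dict[str, Tuple[int, int]],
               visible: Dict[str, bool]) -> List[float]:
    """Zero the observation entries of every hidden body part (assumed masking scheme)."""
    masked = list(observation)
    for name, (start, stop) in part_slices.items():
        if not visible[name]:
            for i in range(start, stop):
                masked[i] = 0.0
    return masked


def gaussian_kl(mean_p: Sequence[float], std_p: Sequence[float],
                mean_q: Sequence[float], std_q: Sequence[float]) -> float:
    """KL(p || q) between two diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mp, sp, mq, sq in zip(mean_p, std_p, mean_q, std_q):
        total += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2.0 * sq ** 2) - 0.5
    return total


def mask_invariant_loss(policy: Callable[[Sequence[float]], Gaussian],
                        observation: Sequence[float],
                        masked_observation: Sequence[float]) -> float:
    """Penalize disagreement between actions under full and masked observations."""
    mean_full, std_full = policy(observation)
    mean_masked, std_masked = policy(masked_observation)
    return gaussian_kl(mean_full, std_full, mean_masked, std_masked)
```

In training, a term like this would be added to the usual policy objective with some weight, so that the policy is rewarded for acting the same way whether or not a body part is observed.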

Stage 2: Learning Flexible Motion Adaptation

The residual policy enables flexible motion adaptation, which we evaluate through two representative tasks: Dynamic Motion Composition, where varying masks allow multi-part adaptation within a single sequence, and Text-Driven Partial Motion Tracking, where designated body parts follow kinematic targets generated by a pre-trained text-conditioned autoregressive diffusion model.
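To make "designated body parts follow kinematic targets" concrete, the sketch below scores tracking error only on the designated parts and ignores the rest of the body. It is an assumption-laden illustration, not the reward used in the paper: the exponential form, the `scale` value, and the name `partial_tracking_reward` are all hypothetical, and positions are taken to be 3D points keyed by body-part name.

```python
import math
from typing import Dict, Sequence


def partial_tracking_reward(sim_positions: Dict[str, Sequence[float]],
                            target_positions: Dict[str, Sequence[float]],
                            tracked_parts: Sequence[str],
                            scale: float = 2.0) -> float:
    """Reward in (0, 1] that depends only on the designated body parts.

    Parts not listed in tracked_parts contribute nothing, so their motion is
    left to the base behavior.
    """
    if not tracked_parts:
        raise ValueError("tracked_parts must name at least one body part")
    squared_errors = []
    for part in tracked_parts:
        sim = sim_positions[part]
        target = target_positions[part]
        squared_errors.append(sum((s - t) ** 2 for s, t in zip(sim, target)))
    mean_squared_error = sum(squared_errors) / len(squared_errors)
    # Exponential shaping (assumed): perfect tracking gives 1.0, large error tends to 0.
    return math.exp(-scale * mean_squared_error)


# Toy example: the right hand matches its target; the left foot is far from its own.
sim = {"right_hand": [0.0, 1.0, 0.0], "left_foot": [5.0, 5.0, 5.0]}
target = {"right_hand": [0.0, 1.0, 0.0], "left_foot": [0.0, 0.0, 0.0]}
arm_only_reward = partial_tracking_reward(sim, target, ["right_hand"])
```

Because only `right_hand` is tracked in the toy call, the large left-foot mismatch does not lower the reward.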

Flexible Motion Adaptation

Here, we showcase the qualitative results of the residual adaptation stage across two representative tasks.

(i) Dynamic Motion Composition

Jump + Alternating Kick

Locomotion + Rotate Arms

Aim + Sneak

(ii) Text-Driven Partial Motion Tracking

"Raise right arm"

"Flap like a bird"

"Cross arms"

Comparison with SOTA

We compare our method against Composite Motion Learning (CML) on both motion composition and partial tracking tasks.

Motion Composition

Jump + Alternating Kick

Locomotion + Rotate Arms

Aim + Sneak

Partial Tracking

"Dribble"

"Raise arms"

"Wave hands"

BibTeX

@article{park2026maskadapt,
  title={MaskAdapt: Learning Flexible Motion Adaptation via Mask-Invariant Prior for Physics-Based Characters},
  author={Park, Soomin and Lee, Eunseong and Lee, Kwang Bin and Lee, Sung-Hee},
  journal={arXiv preprint arXiv:2603.29272},
  year={2026}
}