Creating expressive character animations is labor-intensive, requiring intricate manual adjustments by animators across space and time. Previous works on controllable motion generation often rely on a predefined set of dense spatio-temporal specifications (e.g., dense pelvis trajectories with exact per-frame timing), limiting practicality for animators. To support high-level intent and intuitive control in diverse scenarios, we propose a practical controllable motion synthesis framework that respects sparse and flexible keyjoint signals. Our approach employs a decomposed diffusion-based framework that first synthesizes keyjoint movements from sparse input control signals and then generates full-body motion conditioned on the completed keyjoint trajectories. The low-dimensional keyjoint movements can easily adapt to various control signal types, such as end-effector positions for diverse goal-driven motion synthesis, and can incorporate functional constraints on a subset of keyjoints. Additionally, we introduce a time-agnostic control formulation that eliminates the need for frame-specific timing annotations and enhances control flexibility. The shared second stage then synthesizes natural whole-body motion that precisely satisfies the task requirements from the dense keyjoint movements. We demonstrate the effectiveness of sparse and flexible keyjoint control through comprehensive experiments on diverse datasets and scenarios.
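The decomposed two-stage pipeline described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, and the two learned diffusion stages are replaced by trivial stand-ins (linear interpolation and a pass-through).

```python
import numpy as np

def stage1_keyjoint_synthesis(sparse_controls, num_frames, num_keyjoints=4):
    """Stand-in for the first stage: complete dense keyjoint trajectories
    from sparse (frame, keyjoint, position) control signals.  Here we just
    linearly interpolate between the sparse anchors; the actual method uses
    a conditional diffusion model."""
    traj = np.zeros((num_frames, num_keyjoints, 3))
    for j in range(num_keyjoints):
        anchors = sorted((f, p) for f, jj, p in sparse_controls if jj == j)
        if not anchors:
            continue  # unconstrained keyjoint; a real model would still generate it
        frames = [f for f, _ in anchors]
        for d in range(3):
            vals = [p[d] for _, p in anchors]
            traj[:, j, d] = np.interp(np.arange(num_frames), frames, vals)
    return traj

def stage2_fullbody_synthesis(keyjoint_traj, num_joints=22):
    """Stand-in for the shared second stage: synthesize full-body motion
    conditioned on dense keyjoint trajectories."""
    num_frames, num_keyjoints, _ = keyjoint_traj.shape
    body = np.zeros((num_frames, num_joints, 3))
    body[:, :num_keyjoints] = keyjoint_traj  # keyjoints pass through exactly
    # the remaining joints would come from the learned model; zeros here
    return body

# sparse control: pelvis (keyjoint 0) position at frame 0 and frame 59 only
controls = [(0, 0, np.array([0.0, 0.9, 0.0])),
            (59, 0, np.array([2.0, 0.9, 0.0]))]
keyjoints = stage1_keyjoint_synthesis(controls, num_frames=60)
motion = stage2_fullbody_synthesis(keyjoints)
```

Because the second stage only ever sees dense keyjoint trajectories, it can be shared across control types; only the first stage needs to handle the sparsity and flexibility of the user's input.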
From dense trajectories of a single joint to multiple joints, we synthesize natural motion with high precision.
Dense Signal
a person walks in a curved line.
a person running to the right
a person walked in a clockwise circle.
jogging forward in medium pace.
he is leaning on something and cleaning it with a towel.
the person crouches and walks forward.
From a highly sparse signal, we synthesize plausible motion with high accuracy. We visualize the synthesized intermediate keyjoint trajectories in sky blue.
Sparse Signal
a person moves sideways to the right and then sideways back to the left and then one more step to the right.
a person slowly walks backwards.
a person takes a huge leap forward.
a person walks forward with a limp.
a person starts walking with their right foot first and takes eleven steps forward.
a person takes two steps forward, then walks sideways three steps, then walks forward diagonally and to the left three steps.
Existing controllable motion synthesis methods often struggle with sparse control signals, whereas our approach maintains high performance even as the control becomes sparse.
Comparison with Baselines
MotionLCM
TLControl
Ours
MotionLCM
TLControl
Ours
From highly sparse signals, namely an initial pose and a target end-effector goal, we synthesize goal-driven motion. We train a unified network for different tasks and demonstrate its performance across various goal-driven scenarios.
Goal-Driven Scenarios
Reaching Hand Target
Climbing with Rock Constraints
Sitting with Hand Control
We define control signals as functions derived from a set of keyjoints, enabling more user-friendly control.
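For instance, such a functional constraint need not specify a target position at all. Below is a hypothetical sketch, not the paper's exact formulation (the joint indices and contact threshold are assumptions), of expressing "head-hand touching" as a distance function over keyjoints:

```python
import numpy as np

# Assumed joint indices for illustration only
HEAD, RIGHT_HAND = 15, 21

def head_hand_touch_cost(joint_traj, frame, threshold=0.05):
    """Functional constraint as a scalar penalty: zero once the hand is
    within `threshold` meters of the head at the given frame, so the user
    never has to author an explicit target position."""
    d = np.linalg.norm(joint_traj[frame, RIGHT_HAND] - joint_traj[frame, HEAD])
    return max(0.0, d - threshold)
```

A constraint phrased this way leaves the solver free to choose where the contact happens, which is what makes the control user-friendly.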
With our decomposed framework, our approach better satisfies objective constraints while preserving the realism of the full-body motion.
Head-Hand Touching
DNO
Ours w/o decomposed
Ours
Without exact timing information, the model fails to accurately follow the trajectory. Our time-agnostic control enables trajectory control without the need for exact timesteps, allowing temporally flexible synthesis.
Time-Agnostic Control
w/o time-agnostic
Ours
Input (no timing information) & Our trajectory
w/o time-agnostic
Ours
Input (no timing information) & Our trajectory
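One simple way to evaluate time-agnostic control is to score each un-timed waypoint by its distance to the nearest frame of the synthesized trajectory, rather than comparing at a fixed timestep. This is an illustrative sketch of that idea, not the paper's exact objective:

```python
import numpy as np

def timeagnostic_waypoint_error(traj, waypoints):
    """Mean distance from each waypoint (no timestamp attached) to the
    closest point on the synthesized trajectory.  `traj` has shape
    (num_frames, 3); `waypoints` is a list of 3D positions."""
    errs = []
    for w in waypoints:
        d = np.linalg.norm(traj - w, axis=-1)  # distance at every frame
        errs.append(d.min())                   # nearest frame, whenever it occurs
    return float(np.mean(errs))
```

Because the minimum is taken over frames, the motion is free to reach each waypoint at whatever time is natural, which is exactly the flexibility that frame-specific timing annotations take away.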
@misc{hwang2025motionsynthesissparseflexible,
title={Motion Synthesis with Sparse and Flexible Keyjoint Control},
author={Inwoo Hwang and Jinseok Bae and Donggeun Lim and Young Min Kim},
year={2025},
eprint={2503.15557},
archivePrefix={arXiv},
primaryClass={cs.GR},
url={https://arxiv.org/abs/2503.15557},
}