EgoForce: Robust Online Egocentric Motion Reconstruction via Diffusion Forcing
1. Online Reconstruction Pipeline
EgoForce formulates egocentric reconstruction as a causal generation problem with temporally evolving uncertainty. At each step, the denoising network applies a fixed $\Delta k$ refinement that fully denoises the current pose while progressively refining future predictions. This refinement is conditioned only on the generated history and current observations under a strict causal setting, thereby adapting the expressive power of diffusion models to an online streaming environment.
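Conceptually, this resembles the diffusion-forcing idea of keeping a sliding buffer of future frames at increasing noise levels and advancing each by $\Delta k$ per step, so the nearest frame exits fully denoised while the horizon re-enters as pure noise. A minimal numpy sketch of such a scheduler, with a toy `denoise_fn` standing in for the actual network (all names here are illustrative, not the released implementation):

```python
import numpy as np

def streaming_denoise_step(frames, levels, denoise_fn, delta_k, max_level):
    """One causal step over a sliding buffer of future pose frames.

    frames:  (H, D) array, where frame i is held at noise level levels[i]
    levels:  ascending per-frame noise levels, with levels[0] == delta_k
    Returns the fully denoised current frame plus the shifted buffer.
    """
    refined, new_levels = [], []
    for x, k in zip(frames, levels):
        k_next = max(k - delta_k, 0)
        refined.append(denoise_fn(x, k, k_next))   # partial denoise: k -> k_next
        new_levels.append(k_next)
    assert new_levels[0] == 0, "current frame must reach noise level 0"
    current = refined.pop(0)                        # emit the clean current pose
    new_levels.pop(0)
    refined.append(np.random.randn(frames.shape[1]))  # fresh noise at horizon
    new_levels.append(max_level)
    return current, np.stack(refined), new_levels
```

Each call consumes one streaming step: future frames stay uncertain and only harden as they approach the present, which is what lets a single network both commit to the current pose and hedge on the future.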
2. Egocentric Reconstruction Results
We visualize the reconstruction results of EgoForce on streaming egocentric inputs. EgoForce addresses egocentric motion reconstruction under strict causal constraints, demonstrating the ability to reconstruct high-fidelity and long-term stable motion across diverse scenarios.
EgoForce (Ours)
Ground Truth
EgoForce (Ours)
Ground Truth
EgoForce (Ours)
Ground Truth
EgoForce (Ours)
Ground Truth
EgoForce (Ours)
Ground Truth
EgoForce (Ours)
Ground Truth
3. Comparisons to Previous Works
We provide comparisons with state-of-the-art online (RPM) and offline (UniEgoMotion) baselines.
- Online methods (RPM) struggle with abrupt changes in egocentric signals and have limited model capacity, leading to degraded motion quality.
- Offline methods (UniEgoMotion) achieve higher motion fidelity but violate causal constraints and rely on window-based generation and stitching for long-term motion reconstruction, which introduces temporal discontinuities.
In contrast, EgoForce generates globally coherent and long-term stable motion while strictly satisfying causal constraints, achieving state-of-the-art reconstruction performance.
RPM (Online)
UniEgoMotion (Offline)
EgoForce (Ours, Online)
Ground Truth
RPM (Online)
UniEgoMotion (Offline)
EgoForce (Ours, Online)
Ground Truth
4. Noise Robust Reconstruction
EgoForce demonstrates high robustness to input signal corruption by effectively handling noise during the diffusion process, whereas non-noise-aware models tend to directly track corrupted signals, resulting in motion artifacts.
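EgoForce handles corruption inside the diffusion process itself; as a stand-in for that mechanism, the general principle can be illustrated with a scalar steady-state Kalman blend (a hypothetical toy, not the paper's method): a noise-aware estimator downweights corrupted observations against its running estimate, whereas direct tracking copies the corruption into the output.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
truth = np.cumsum(rng.normal(0.0, 0.05, T))    # slowly drifting ground truth
obs = truth + rng.normal(0.0, 0.5, T)          # heavily corrupted input signal

# Non-noise-aware baseline: track the corrupted observation directly.
direct = obs

# Noise-aware fusion: blend observations into a running estimate using a
# steady-state Kalman gain derived from the assumed noise magnitudes.
q, r = 0.05**2, 0.5**2                          # process / observation variance
m = (q + np.sqrt(q * q + 4.0 * q * r)) / 2.0    # steady-state predicted variance
gain = m / (m + r)
aware = np.empty(T)
aware[0] = obs[0]
for t in range(1, T):
    aware[t] = aware[t - 1] + gain * (obs[t] - aware[t - 1])

mse_direct = np.mean((direct - truth) ** 2)
mse_aware = np.mean((aware - truth) ** 2)
```

Under these assumed variances the fused estimate tracks the drift while suppressing most of the observation noise; the direct tracker inherits the full corruption, mirroring the motion artifacts seen in non-noise-aware models.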
Without Noise-awareness
EgoForce (Ours)
Ground Truth
Without Noise-awareness
EgoForce (Ours)
Ground Truth
5. Results on Ego-Exo4D
Our framework generates stable and plausible full-body motions from real-world sequences using Ego-Exo4D SLAM and hand estimation inputs.
EgoForce (Ours)
EgoForce (Ours)
6. Limitations
Our reconstruction occasionally fails to capture subtle motion details and may produce head pose errors when the subject performs large global movements.
EgoForce (Ours)
Ground Truth
EgoForce (Ours)
Ground Truth