EgoForce: Robust Online Egocentric Motion Reconstruction via Diffusion Forcing
1. Online Reconstruction Pipeline
EgoForce formulates egocentric reconstruction as a causal generation problem with temporally evolving uncertainty. At each step, the denoising network applies a fixed $\Delta k$ refinement that fully denoises the current pose while progressively refining future predictions. This refinement is conditioned only on the generated history and current observations under a strict causal setting, thereby adapting the expressive power of diffusion models to an online streaming environment.
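Conceptually, this resembles the diffusion-forcing idea of keeping a sliding buffer of future frames at increasing noise levels and advancing each by $\Delta k$ per step, so the nearest frame exits fully denoised while the horizon re-enters as pure noise. A minimal numpy sketch of such a scheduler, with a toy `denoise_fn` standing in for the actual network (all names here are illustrative, not the released implementation):

```python
import numpy as np

def streaming_denoise_step(frames, levels, denoise_fn, delta_k, max_level):
    """One causal step over a sliding buffer of future pose frames.

    frames:  (H, D) array, where frame i is held at noise level levels[i]
    levels:  ascending per-frame noise levels, with levels[0] == delta_k
    Returns the fully denoised current frame plus the shifted buffer.
    """
    refined, new_levels = [], []
    for x, k in zip(frames, levels):
        k_next = max(k - delta_k, 0)
        refined.append(denoise_fn(x, k, k_next))   # partial denoise: k -> k_next
        new_levels.append(k_next)
    assert new_levels[0] == 0, "current frame must reach noise level 0"
    current = refined.pop(0)                        # emit the clean current pose
    new_levels.pop(0)
    refined.append(np.random.randn(frames.shape[1]))  # fresh noise at horizon
    new_levels.append(max_level)
    return current, np.stack(refined), new_levels
```

Each call consumes one streaming step: future frames stay uncertain and only harden as they approach the present, which is what lets a single network both commit to the current pose and hedge on the future.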
2. Egocentric Reconstruction Results
We visualize the reconstruction results of EgoForce on streaming egocentric inputs. EgoForce addresses egocentric motion reconstruction under strict causal constraints, demonstrating the ability to reconstruct high-fidelity and long-term stable motion across diverse scenarios.
EgoForce (Ours)
Ground Truth
EgoForce (Ours)
Ground Truth
EgoForce (Ours)
Ground Truth
EgoForce (Ours)
Ground Truth
EgoForce (Ours)
Ground Truth
EgoForce (Ours)
Ground Truth
3. Comparisons to Previous Works
We provide comparisons with state-of-the-art online (RPM) and offline (UniEgoMotion) baselines.
- Online methods (RPM) struggle with abrupt changes in egocentric signals and have limited model capacity, leading to degraded motion quality.
- Offline methods (UniEgoMotion) achieve higher motion fidelity but violate causal constraints and rely on window-based generation and stitching for long-term motion reconstruction, which introduces temporal discontinuities.
In contrast, EgoForce generates globally coherent and long-term stable motion while strictly satisfying causal constraints, achieving state-of-the-art reconstruction performance.
RPM (Online)
UniEgoMotion (Offline)
EgoForce (Ours, Online)
Ground Truth
RPM (Online)
UniEgoMotion (Offline)
EgoForce (Ours, Online)
Ground Truth
4. Noise Robust Reconstruction
EgoForce demonstrates high robustness to input signal corruption by effectively handling noise during the diffusion process, whereas non-noise-aware models tend to directly track corrupted signals, resulting in motion artifacts.
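EgoForce handles corruption inside the diffusion process itself; as a stand-in for that mechanism, the general principle can be illustrated with a scalar steady-state Kalman blend (a hypothetical toy, not the paper's method): a noise-aware estimator downweights corrupted observations against its running estimate, whereas direct tracking copies the corruption into the output.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
truth = np.cumsum(rng.normal(0.0, 0.05, T))    # slowly drifting ground truth
obs = truth + rng.normal(0.0, 0.5, T)          # heavily corrupted input signal

# Non-noise-aware baseline: track the corrupted observation directly.
direct = obs

# Noise-aware fusion: blend observations into a running estimate using a
# steady-state Kalman gain derived from the assumed noise magnitudes.
q, r = 0.05**2, 0.5**2                          # process / observation variance
m = (q + np.sqrt(q * q + 4.0 * q * r)) / 2.0    # steady-state predicted variance
gain = m / (m + r)
aware = np.empty(T)
aware[0] = obs[0]
for t in range(1, T):
    aware[t] = aware[t - 1] + gain * (obs[t] - aware[t - 1])

mse_direct = np.mean((direct - truth) ** 2)
mse_aware = np.mean((aware - truth) ** 2)
```

Under these assumed variances the fused estimate tracks the drift while suppressing most of the observation noise; the direct tracker inherits the full corruption, mirroring the motion artifacts seen in non-noise-aware models.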
Without Noise-awareness
EgoForce (Ours)
Ground Truth
Without Noise-awareness
EgoForce (Ours)
Ground Truth
5. Results on Ego-Exo4D
Our framework generates stable and plausible full-body motions from real-world sequences using Ego-Exo4D SLAM and hand estimation inputs.
EgoForce (Ours)
EgoForce (Ours)
6. Limitations
Our reconstruction occasionally fails to capture subtle motion details and may produce head pose errors when the subject performs large global movements.
EgoForce (Ours)
Ground Truth
EgoForce (Ours)
Ground Truth