
Abstract

We introduce PlayerOne, the first egocentric realistic world simulator, enabling immersive and unrestricted exploration within vividly dynamic environments. Given an egocentric scene image from the user, PlayerOne can accurately construct the corresponding world and generate egocentric videos that are strictly aligned with the user's real-world motion captured by an exocentric camera. PlayerOne is trained in a coarse-to-fine pipeline that first performs pretraining on large-scale egocentric text-video pairs for coarse-level egocentric understanding, followed by finetuning on synchronous motion-video data extracted from egocentric-exocentric video datasets with our automatic construction pipeline. Moreover, considering the varying importance of different motion components, we design a part-disentangled motion injection scheme that enables precise control of part-level movements. In addition, we devise a joint reconstruction framework that progressively models both the 4D scene and the video frames, ensuring scene consistency in long-form video generation. Experimental results demonstrate strong generalization in precisely controlling varied human movements and in world-consistent modeling of diverse scenarios. PlayerOne marks the first endeavor into egocentric real-world simulation and can pave the way for the community to explore new frontiers of world modeling and its diverse applications.
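To make the coarse-to-fine recipe concrete, below is a minimal sketch of the two-stage schedule: pretraining on text-video pairs, then finetuning on motion-video pairs. All names here (WorldSimulator, ToyPairs, train_stage) are hypothetical placeholders rather than the released implementation, and the noise-prediction loss is a generic stand-in for the actual diffusion objective.

import torch
from torch.utils.data import DataLoader, Dataset

class WorldSimulator(torch.nn.Module):
    """Stand-in for the video diffusion backbone; a single linear layer here."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.net = torch.nn.Linear(dim, dim)

    def forward(self, noisy_latents, cond):
        return self.net(noisy_latents + cond)

class ToyPairs(Dataset):
    """Toy stand-in for (conditioning, video-latent) pairs of either stage."""
    def __init__(self, n=32, dim=64):
        self.items = [{"video": torch.randn(dim), "cond": torch.randn(dim)} for _ in range(n)]
    def __len__(self):
        return len(self.items)
    def __getitem__(self, i):
        return self.items[i]

def train_stage(model, loader, epochs, lr):
    """One training stage with a generic noise-prediction objective."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for batch in loader:
            noise = torch.randn_like(batch["video"])
            pred = model(batch["video"] + noise, batch["cond"])
            loss = torch.nn.functional.mse_loss(pred, noise)
            opt.zero_grad(); loss.backward(); opt.step()

model = WorldSimulator()
# Stage 1: coarse pretraining on large-scale egocentric text-video pairs.
train_stage(model, DataLoader(ToyPairs(), batch_size=4), epochs=1, lr=1e-4)
# Stage 2: finetuning on synchronous motion-video pairs for precise motion control.
train_stage(model, DataLoader(ToyPairs(), batch_size=4), epochs=1, lr=1e-5)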

Overall Framework of PlayerOne


PlayerOne first converts the egocentric first frame into visual tokens. The human motion sequence is split into part groups and fed into separate motion encoders to generate part-wise motion latents, while the head parameters are converted into a rotation-only camera sequence. This camera sequence is then encoded via a camera encoder, and its output is injected into the noised video latents to improve view-change alignment. Next, we render a 4D scene point-map sequence from the ground-truth video, which is processed by a point-map encoder with an adapter to produce scene latents. We then feed the concatenation of these latents into the DiT model and perform noising and denoising on both the video and scene latents to ensure world-consistent generation. Finally, the denoised latents are decoded by VAE decoders to produce the final results. Note that only the first frame and the human motion sequence are needed for inference.
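The sketch below illustrates how these conditioning streams could be wired together. Every module (the frame tokenizer, part-wise motion encoders, camera encoder, point-map encoder and adapter, and the DiT) is a minimal placeholder with illustrative dimensions assumed for this example, not the actual architecture; the real tokenizers, latent shapes, and diffusion schedule differ.

import torch
import torch.nn as nn

D = 64  # shared latent width, illustrative only

class Enc(nn.Module):
    """Generic encoder placeholder: projects per-step features to D-dim tokens."""
    def __init__(self, in_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, D)
    def forward(self, x):          # x: (B, T, in_dim)
        return self.proj(x)        # -> (B, T, D)

class PlayerOneSketch(nn.Module):
    def __init__(self, n_parts=3, motion_dim=24, cam_dim=9, pmap_dim=192):
        super().__init__()
        self.frame_tokenizer = Enc(3 * 16 * 16)   # egocentric first frame -> visual tokens
        self.motion_encoders = nn.ModuleList(Enc(motion_dim) for _ in range(n_parts))
        self.camera_encoder = Enc(cam_dim)        # rotation-only head-camera sequence
        self.pointmap_encoder = Enc(pmap_dim)     # 4D scene point-map sequence
        self.pointmap_adapter = nn.Linear(D, D)
        layer = nn.TransformerEncoderLayer(D, nhead=4, batch_first=True)
        self.dit = nn.TransformerEncoder(layer, num_layers=2)
        self.video_head = nn.Linear(D, D)         # stand-in for the video VAE decoder
        self.scene_head = nn.Linear(D, D)         # stand-in for the scene decoder

    def forward(self, frame_patches, part_motions, head_cams, noisy_video, pointmaps):
        frame_tok = self.frame_tokenizer(frame_patches)                     # visual tokens
        motion_tok = torch.cat(                                             # part-wise motion latents
            [enc(m) for enc, m in zip(self.motion_encoders, part_motions)], dim=1)
        video_tok = noisy_video + self.camera_encoder(head_cams)            # camera latents injected
        scene_lat = self.pointmap_adapter(self.pointmap_encoder(pointmaps))
        noisy_scene = scene_lat + torch.randn_like(scene_lat)               # scene latents are noised too
        tokens = torch.cat([frame_tok, motion_tok, video_tok, noisy_scene], dim=1)
        out = self.dit(tokens)                                              # joint denoising
        n_vid, n_scn = video_tok.shape[1], noisy_scene.shape[1]
        video_out = self.video_head(out[:, -(n_vid + n_scn):-n_scn])
        scene_out = self.scene_head(out[:, -n_scn:])
        return video_out, scene_out

# Toy shapes only, to show how the pieces connect.
B, T = 1, 8
model = PlayerOneSketch()
video_out, scene_out = model(
    torch.randn(B, 1, 3 * 16 * 16),             # first-frame patches
    [torch.randn(B, T, 24) for _ in range(3)],  # grouped motion parameters
    torch.randn(B, T, 9),                       # flattened head-camera rotations
    torch.randn(B, T, D),                       # noised video latents
    torch.randn(B, T, 192))                     # rendered point maps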

Dataset Construction


By seamlessly integrating detection and human pose estimation models, we extract motion-video pairs from existing egocentric-exocentric video datasets and retain only high-quality data through our automatic filtering scheme.
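A schematic version of this extraction-and-filtering loop is sketched below. The detector, pose estimator, and the valid-frame threshold are generic placeholders introduced for illustration, not the specific models or filtering criteria used in the actual pipeline.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class MotionVideoPair:
    ego_frames: list     # synchronized egocentric frames (the generation target)
    motion_params: list  # per-frame human pose parameters recovered from the exo view

def detect_person(exo_frame):
    """Placeholder person detector: should return a bounding box or None."""
    ...

def estimate_pose(exo_frame, box):
    """Placeholder whole-body pose estimator: should return pose parameters or None."""
    ...

def extract_pair(ego_frames: List, exo_frames: List,
                 min_valid_ratio: float = 0.9) -> Optional[MotionVideoPair]:
    """Run detection + pose estimation on the exocentric view; keep the clip only
    if enough frames yield a pose (a simple stand-in for automatic filtering)."""
    motions, n_valid = [], 0
    for exo in exo_frames:
        box = detect_person(exo)
        pose = estimate_pose(exo, box) if box is not None else None
        motions.append(pose)
        n_valid += pose is not None
    if n_valid < min_valid_ratio * max(len(exo_frames), 1):
        return None  # discard low-quality or poorly tracked clips
    return MotionVideoPair(ego_frames=ego_frames, motion_params=motions)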

Experimental Results

More simulated videos

Ablation study on core components

Comparison with other works




Ours | Cosmos-7B | Cosmos-14B | Aether



Descriptions: First-person perspective, stretch out the left hand to high-five the man opposite




Descriptions: First-person perspective, I stand up and extend my right hand to shake hands with the man opposite me



Descriptions: First-person perspective, I stroll forward



Descriptions: First-person perspective, I extended my left hand to shake hands with the woman opposite me



Descriptions: First-person perspective, I squatted down and put my hands on both sides of the chair



Descriptions: First-person perspective, squat down and stretch out my hands to touch the dog's head



Descriptions: First-person perspective, I stretch out my right hand to pick up the triangular rice ball below



Descriptions: First-person perspective, I stretched out my left hand and high-fived the golden retriever.



Descriptions: First-person perspective, I reach out my left hand and stroke the golden retriever's head



Descriptions: First-person perspective, I stretch out my left hand and high-five the man opposite me


Reference

      
@article{tu2025PlayerOne,
  title   = {PlayerOne: Egocentric World Simulator},
  author  = {Tu, Yuanpeng and Luo, Hao and Chen, Xi and Bai, Xiang and Wang, Fan and Zhao, Hengshuang},
  journal = {arXiv preprint},
  year    = {2025}
}