Using Cosmos-Reason1 for Rejection Sampling

Rejection sampling workflow

Cosmos-Reason1 can evaluate whether a video adheres to fundamental physical laws, such as gravity, object permanence, collision dynamics, and cause-and-effect relationships.

When paired with a world model like Cosmos-Predict2, Cosmos-Reason1 enables best-of-N sampling by generating multiple video candidates and selecting the most physically accurate ones, thereby improving generation quality.
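The selection step of best-of-N sampling can be sketched in a few lines. This is a minimal illustration, not the Cosmos implementation: `best_of_n` and the `score_fn` callable are hypothetical names, and the file names and scores in the usage block are made-up stand-ins for world-model rollouts and critic scores.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Candidate = TypeVar("Candidate")


def best_of_n(
    candidates: Sequence[Candidate],
    score_fn: Callable[[Candidate], float],
    keep: int = 1,
) -> List[Tuple[float, Candidate]]:
    """Score every candidate and return the `keep` highest-scoring ones."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    scored = [(score_fn(candidate), candidate) for candidate in candidates]
    # sorted() is stable, so tied candidates keep their original order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return ranked[:keep]


if __name__ == "__main__":
    # Hypothetical stand-in: rollout file names mapped to critic scores in [0, 1].
    scores = {"rollout_a.mp4": 0.25, "rollout_b.mp4": 0.75, "rollout_c.mp4": 0.5}
    print(best_of_n(list(scores), scores.__getitem__, keep=1))
```

In a real pipeline, `score_fn` would wrap a call to the critic model, and the candidates would be the generated videos.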

Generating Multiple Rollouts

The Cosmos-Predict2 world model can generate multiple video rollouts using the same input. Among these, some will exhibit greater physical plausibility than others. For example, the following videos are generated by the Cosmos-Predict2-2B-Video2World model.

Generation 0

../_images/generation_0.gif

Generation 1

../_images/generation_1.gif

Generation 2

../_images/generation_2.gif

Generation 3

../_images/generation_3.gif

Evaluating Physical Accuracy Using Cosmos-Reason1

Cosmos-Reason1 can analyze each video element-by-element and organize the analysis process in a structured format for easy parsing. To obtain a fine-grained assessment, Cosmos-Reason1 evaluates each video multiple times (four times in this example). A quality score is calculated as the ratio of artifact-free evaluations to the total number of evaluations. The video with the highest score is retained as the final, high-quality output.
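The score described above is a simple ratio. The sketch below assumes each evaluation has already been reduced to a boolean verdict (True if the evaluation reported an artifact); how that verdict is extracted from the model output is not shown here, and `quality_score` is a hypothetical helper name.

```python
from typing import Sequence


def quality_score(artifact_verdicts: Sequence[bool]) -> float:
    """Return the fraction of evaluations that reported no artifact.

    artifact_verdicts[i] is True when evaluation i flagged an artifact.
    """
    if not artifact_verdicts:
        raise ValueError("at least one evaluation is required")
    artifact_free = sum(1 for has_artifact in artifact_verdicts if not has_artifact)
    return artifact_free / len(artifact_verdicts)


if __name__ == "__main__":
    # Hypothetical verdicts: one of four evaluations flagged an artifact.
    print(quality_score([False, True, False, False]))  # 0.75
```

With four evaluations per video, the possible scores are 0.0, 0.25, 0.5, 0.75, and 1.0, which matches the granularity of the scores reported for the four generations in this example.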

Video: Evaluation score

Generation 0: 0.0

Generation 1: 0.75 (best)

Generation 2: 0.25

Generation 3: 0.5

Example

The video_critic.py example script demonstrates how to parse the Cosmos-Reason1 output and reformat it into an HTML report:

./examples/video_critic/video_critic.py --video_path assets/sample.mp4

Launch a server to view the generated report (assets/sample.html):

python -m http.server 8000 -d assets

Example output:

User prompt: Is it safe to turn right?

../_images/sample.gif

Assistant response:

<think> Okay, let’s see. The user is asking if it’s safe to turn right based on the video provided. First, I need to … The key points here are the parked vehicles. There are cars parked on both sides of the road, … </think>

Based on the video, turning right may not be entirely safe due to the following factors: …
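A response in this shape can be split into its reasoning trace and its final answer before further formatting. The following is a minimal sketch that assumes the response contains at most one `<think>…</think>` block followed by the answer; `split_reasoning` is a hypothetical helper and is not taken from video_critic.py.

```python
import re
from typing import Tuple

# Non-greedy match so only the first <think> block is captured;
# DOTALL lets the reasoning span multiple lines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(response: str) -> Tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no <think> block exists."""
    match = THINK_RE.search(response)
    if match is None:
        return "", response.strip()
    reasoning = match.group(1).strip()
    answer = response[match.end():].strip()
    return reasoning, answer


if __name__ == "__main__":
    # Hypothetical, shortened response text for illustration only.
    text = "<think> Cars are parked on both sides. </think>\n\nTurning right may not be safe."
    reasoning, answer = split_reasoning(text)
    print("Reasoning:", reasoning)
    print("Answer:", answer)
```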

User prompt: Analyze the video for any artifacts or anomalies.

../_images/generation_3.gif

Assistant response:

<component name="Orange Bottle Interaction">

<analysis> The left robot arm approaches the orange bottle and makes contact with it. However, instead of grasping the bottle firmly, the gripper wraps around it in an awkward manner, failing to secure a proper hold … suggesting a glitch in the interaction physics. </analysis>