Model Reference#
This page details the options available when running inference with the Cosmos-Predict2.5 base models.
Auto Multiview Inference#
Multiview inference requires a minimum of 8 GPUs with at least 80GB memory each.
The following example runs multi-GPU inference with the example asset:
torchrun --nproc_per_node=8 examples/multiview.py -i assets/multiview/urban_freeway.json -o outputs/multiview_video2world --inference-type=video2world
All variants require a sample input video to be provided. Text2World does not use its frames, Image2World uses only the first frame, and Video2World uses the first two frames.
Variant | Arguments
---|---
Text2World | --inference-type=text2world
Image2World | --inference-type=image2world
Video2World | --inference-type=video2world
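For example, assuming the same entry point and example asset as above, a Text2World run would look like this (the output directory is illustrative):

torchrun --nproc_per_node=8 examples/multiview.py -i assets/multiview/urban_freeway.json -o outputs/multiview_text2world --inference-type=text2world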
Example Outputs#
Example output from the Text2World variant.
Robot Action-Conditioned Inference#
The following example runs inference with the example asset:
python examples/action_conditioned.py -i assets/action_conditioned/basic/inference_params.json -o outputs/action_conditioned/basic
Note
Action-conditioned inference does not yet support multi-GPU execution.
Configuration#
Configuration is split into two parts (a minimal construction sketch follows the lists):

1. Setup Arguments (ActionConditionedSetupArguments): model-related configuration that typically stays the same across runs.
   - model: Model variant to use (default: robot/multiview)
   - context_parallel_size: Context parallelism is not supported for the action-conditioned model; set this to 1
   - output_dir: Output directory for results
   - config_file: Model configuration file

2. Inference Arguments (ActionConditionedInferenceArguments): per-run parameters that can vary.
   - input_root: Root directory containing videos and annotations
   - input_json_sub_folder: Subdirectory containing JSON annotations
   - chunk_size: Action chunk size for processing
   - guidance: Guidance scale for generation
   - action_load_fn: Function to load action data
   - And many more…
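Putting the two together: this is a minimal sketch assuming both classes are importable from cosmos_predict2.action_conditioned and accept the documented fields as keyword arguments; the config_file path is hypothetical.

from cosmos_predict2.action_conditioned import (
    ActionConditionedInferenceArguments,
    ActionConditionedSetupArguments,
)

# Model-level settings that typically stay fixed across runs.
setup_args = ActionConditionedSetupArguments(
    model="robot/multiview",          # default model variant
    context_parallel_size=1,          # context parallelism is unsupported, so this must be 1
    output_dir="outputs/action_conditioned/basic",
    config_file="model_config.yaml",  # hypothetical path to the model configuration file
)

# Per-run parameters.
inference_args = ActionConditionedInferenceArguments(
    input_root="/path/to/input/data",
    input_json_sub_folder="annotations",
    chunk_size=12,
    guidance=7,
)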
JSON Configuration File#
The following is an example of a JSON configuration file:
{
    "name": "my_inference",
    "input_root": "/path/to/input/data",
    "input_json_sub_folder": "annotations",
    "save_root": "/path/to/output",
    "chunk_size": 12,
    "guidance": 7,
    "camera_id": "base",
    "start": 0,
    "end": 100,
    "action_load_fn": "cosmos_predict2.action_conditioned.load_default_action_fn"
}
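Assuming the same entry point as the basic example above, a custom configuration file is passed with -i (both paths here are illustrative):

python examples/action_conditioned.py -i /path/to/my_inference_params.json -o outputs/action_conditioned/my_inference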
Custom Action Loading#
To use a custom action loading function, implement a function following this signature:
def custom_action_load_fn():
    def load_fn(json_data: dict, video_path: str, args: ActionConditionedInferenceArguments) -> dict:
        # Your custom action loading logic here
        return {
            "actions": actions,          # numpy array of actions
            "initial_frame": img_array,  # first frame of the video
            "video_array": video_array,  # full video as an array
            "video_path": video_path,
        }
    return load_fn
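As a concrete sketch of this pattern, the loader below reads per-frame actions from a .npy file named in the annotation JSON and decodes the video with imageio. The action_file key and the use of imageio are illustrative assumptions, not part of the documented interface:

import os

import imageio.v3 as iio
import numpy as np


def npy_action_load_fn():
    def load_fn(json_data: dict, video_path: str, args) -> dict:
        # Load per-frame actions from a .npy file referenced by the annotation
        # (the "action_file" key is hypothetical).
        actions = np.load(os.path.join(args.input_root, json_data["action_file"]))
        # Decode the full video; iio.imread returns all frames as (frames, H, W, 3).
        video_array = iio.imread(video_path)
        return {
            "actions": actions,
            "initial_frame": video_array[0],
            "video_array": video_array,
            "video_path": video_path,
        }
    return load_fn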
You can then specify the function in your JSON config:
{
    "action_load_fn": "my_module.custom_action_load_fn"
}