CR2

Overview

Cosmos-Reason2 (CR2) is an open, customizable, reasoning vision language model (VLM) for physical AI and robotics. It enables vision AI agents to reason like humans, using prior knowledge, physics understanding, and common sense to understand and act in the real world.

Key Capabilities:

Understanding of space, time, and fundamental physics
Chain-of-thought reasoning to understand world dynamics without human annotations
Object detection with 2D/3D point localization and bounding-box coordinates
Long-context understanding up to 256K input tokens

The model is post-trained on physical common sense and embodied reasoning data, using supervised fine-tuning and reinforcement learning.

Usage

In the VSS Warehouse Blueprint, Cosmos-Reason2 serves as the Vision Language Model (VLM) for the agentic AI system, providing video understanding and analysis capabilities.

Key Functions:

Video Analysis: Analyzes warehouse camera footage to understand safety incidents and events
Incident Understanding: Interprets video clips to identify what happened, when, and who was involved
Report Generation: Provides detailed analysis that feeds into structured incident reports
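
As a rough illustration of how a clip analysis (what happened, when, and who was involved) might feed a structured incident report, here is a minimal sketch. The `IncidentAnalysis` class, the `to_report` helper, the field names, and the sample values are all hypothetical. They are not part of the VSS Warehouse Blueprint or of any Cosmos-Reason2 output schema.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class IncidentAnalysis:
    """Hypothetical container for what the VLM reports about one clip."""
    what: str        # what happened
    start_s: float   # when: start offset into the clip, in seconds
    end_s: float     # when: end offset into the clip, in seconds
    who: list[str] = field(default_factory=list)  # who was involved


def to_report(clip_id: str, analysis: IncidentAnalysis) -> dict:
    """Turn one analysis into a flat record for a structured report."""
    if analysis.end_s < analysis.start_s:
        raise ValueError("end_s must not precede start_s")
    record = asdict(analysis)
    record["clip_id"] = clip_id
    record["duration_s"] = analysis.end_s - analysis.start_s
    return record


# Made-up sample values, for illustration only.
example = IncidentAnalysis(
    what="Pedestrian entered forklift lane",
    start_s=12.0,
    end_s=19.5,
    who=["pedestrian", "forklift operator"],
)
report = to_report("camera-03/clip-0001", example)
print(json.dumps(report, indent=2))
```

Keeping the record flat and JSON-serializable makes it easy to hand to whichever component assembles the final report.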

Integration:

The agent system uses CR2 alongside the LLM (Nemotron) in a two-tier architecture:

LLM (Nemotron): Handles query routing, reasoning, and report generation
VLM (Cosmos-Reason2): Performs video understanding and visual analysis
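
The division of labor above can be sketched as follows. This is a toy under stated assumptions: `stub_llm` and `stub_vlm` are hypothetical stand-ins for the Nemotron and Cosmos-Reason2 endpoints, and the keyword check in `needs_video` is a placeholder for the routing that the LLM itself performs in the real system.

```python
from typing import Callable, Optional


def stub_vlm(clip_path: str, prompt: str) -> str:
    """Hypothetical stand-in for a Cosmos-Reason2 call."""
    return f"[visual analysis of {clip_path} for: {prompt}]"


def stub_llm(prompt: str) -> str:
    """Hypothetical stand-in for a Nemotron call."""
    return f"[LLM response to: {prompt}]"


# Assumption: a keyword match replaces LLM-driven routing in this toy.
VISUAL_KEYWORDS = ("video", "clip", "camera", "footage")


def needs_video(query: str) -> bool:
    """Decide whether the query requires visual analysis."""
    lowered = query.lower()
    return any(keyword in lowered for keyword in VISUAL_KEYWORDS)


def answer(
    query: str,
    clip_path: Optional[str],
    llm: Callable[[str], str] = stub_llm,
    vlm: Callable[[str, str], str] = stub_vlm,
) -> str:
    """Route a query: VLM for visual analysis, LLM for the final text."""
    if clip_path is not None and needs_video(query):
        observations = vlm(clip_path, query)
        return llm(f"Write a report for '{query}' using: {observations}")
    return llm(query)


print(answer("What happened in the camera footage?", "clip.mp4"))
print(answer("Summarize the safety policy", None))
```

Passing the two models in as callables keeps the routing logic testable without access to either model.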

For agent configuration and customization, see Agents.

Reference

For more information, see the Cosmos-Reason2 Documentation.