Cosmos-Reason2
Cosmos-Reason2 is an open, customizable reasoning vision language model (VLM) for physical AI and robotics. It enables robots and vision AI agents to reason like humans, using prior knowledge, physics understanding, and common sense to understand and act in the real world. The model understands space, time, and fundamental physics, and can serve as a planning model that reasons about what steps an embodied agent might take next.
Cosmos-Reason2 excels at navigating the long tail of diverse physical-world scenarios with spatio-temporal understanding. The model is post-trained on physical common sense and embodied reasoning data using supervised fine-tuning and reinforcement learning. It uses chain-of-thought reasoning to understand world dynamics without human annotations.
The following are new features in Cosmos-Reason2:
Enhanced physical AI reasoning with improved spatio-temporal understanding and timestamp precision.
Object detection with 2D/3D point localization and bounding-box coordinates, as well as reasoning explanations and labels.
Improved long-context understanding up to 256K input tokens.
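To make the 256K input limit concrete, the following minimal sketch checks whether a text prompt plus sampled video frames would fit in the input window. It is an illustration under stated assumptions, not part of the Cosmos-Reason2 API: the helper names fits_in_context and max_frames are hypothetical, "256K" is read here as 256 * 1024 tokens, and the number of tokens each frame consumes is a placeholder you would need to measure with the model's actual processor.

```python
# Assumption: "256K" is interpreted as 256 * 1024 tokens; confirm against the model card.
MAX_INPUT_TOKENS = 256 * 1024


def fits_in_context(text_tokens: int, num_frames: int, tokens_per_frame: int) -> bool:
    """Return True if the text and video tokens together fit in the input window."""
    if min(text_tokens, num_frames, tokens_per_frame) < 0:
        raise ValueError("token and frame counts must be non-negative")
    total = text_tokens + num_frames * tokens_per_frame
    return total <= MAX_INPUT_TOKENS


def max_frames(text_tokens: int, tokens_per_frame: int) -> int:
    """Largest number of frames that still fits after the text prompt."""
    if tokens_per_frame <= 0:
        raise ValueError("tokens_per_frame must be positive")
    # Clamp at zero when the text prompt alone already exceeds the window.
    return max(0, (MAX_INPUT_TOKENS - text_tokens) // tokens_per_frame)
```

For example, with a hypothetical cost of 256 tokens per frame and an empty prompt, max_frames(0, 256) allows 1024 frames under these assumptions; a longer prompt or a higher per-frame cost reduces that budget proportionally.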