Cosmos-Predict2.5
Cosmos-Predict2.5 is specialized for simulating and predicting the future state of the world in the form of video. It is a flow-based model that unifies Text2World, Image2World, and Video2World generation in a single model, and it uses Cosmos-Reason1, a Physical AI reasoning vision-language model (VLM), as its text encoder. Cosmos-Predict2.5 significantly improves on Cosmos-Predict1 in both quality and prompt alignment.
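To give a rough picture of what "flow-based" means, the sketch below shows a generic flow-matching setup with a straight-line (rectified-flow style) path between data and noise. It is an illustration under assumptions, not Cosmos-Predict2.5's actual training objective or sampler: the path convention, the function names `flow_matching_pair` and `euler_sample`, and the use of plain Python lists in place of video latents are all hypothetical simplifications.

```python
def flow_matching_pair(x0, noise, t):
    """Hypothetical helper: build a training pair for a linear flow path.

    Assumed convention: x_t = (1 - t) * x0 + t * noise, so t = 0 is data
    and t = 1 is pure noise. The velocity a model would learn to predict
    along this straight path is v = noise - x0 (constant in t).
    """
    xt = [(1.0 - t) * a + t * b for a, b in zip(x0, noise)]
    v = [b - a for a, b in zip(x0, noise)]
    return xt, v


def euler_sample(velocity_fn, noise, steps=10):
    """Hypothetical sampler: integrate dx/dt = v from t = 1 down to t = 0.

    velocity_fn(x, t) stands in for a trained network; each Euler step
    moves x against the predicted velocity by a step of size dt.
    """
    x = list(noise)
    dt = 1.0 / steps
    for i in range(steps):
        t = 1.0 - i * dt  # current time, from 1.0 down to dt (never 0)
        v = velocity_fn(x, t)
        x = [xi - dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    x0 = [0.5, -1.0, 2.0]      # stand-in for a clean latent
    noise = [1.0, 0.0, -0.5]   # stand-in for Gaussian noise
    xt, v = flow_matching_pair(x0, noise, 0.25)
    print(xt, v)

    # An "oracle" velocity that knows x0: along the straight path,
    # v = (x_t - x0) / t, so Euler integration recovers x0 exactly
    # (up to floating-point error).
    oracle = lambda x, t: [(xi - ai) / t for xi, ai in zip(x, x0)]
    print(euler_sample(oracle, noise, steps=10))
```

Because the assumed path is a straight line, a perfect velocity estimate lets even a coarse Euler sampler land on the data point; in a real model the network's velocity is only approximate, which is why step count and solver choice matter in practice.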