Important
NeMo 2.0 is an experimental feature and is currently released in the dev container only: nvcr.io/nvidia/nemo:dev. Please refer to the NeMo 2.0 overview for information on getting started.
Framework Inference
Query with Prompt Files
When using VideoNeVA models, the inference script generates responses for all prompts specified in an input .jsonl file. Here’s an example of what an input file might look like:
{"video": "001.mp4", "prompt": "What is the name of this famous sight in the video?\n<video>", "category": "conv", "question_id": 0}
{"video": "001.mp4", "prompt": "Describe this clip in detail.\n<video>", "category": "detail", "question_id": 1}
{"video": "001.mp4", "prompt": "What are the possible reasons for the formation of this sight?\n<video>", "category": "complex", "question_id": 2}
...
The crucial fields in each line are "video" and "prompt". The trained VideoNeVA model generates a response for each line, and the responses are shown on the console and written to the output file.
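The prompt-file format can be produced and sanity-checked with a few lines of standard-library Python. This is a minimal sketch, not part of NeMo: the helper names write_prompt_file and check_prompt_file are hypothetical, and the sketch assumes each record only needs the "video" and "prompt" fields used in the example lines, plus the optional "category" and "question_id".

```python
import json


def write_prompt_file(path, video, prompts):
    """Write one JSON object per line in the layout of the example prompt file.

    Hypothetical helper. `prompts` is a list of (prompt_text, category) pairs;
    the "<video>" placeholder is appended on its own line, as in the example.
    """
    with open(path, "w", encoding="utf-8") as f:
        for question_id, (text, category) in enumerate(prompts):
            record = {
                "video": video,
                "prompt": f"{text}\n<video>",
                "category": category,
                "question_id": question_id,
            }
            f.write(json.dumps(record) + "\n")


def check_prompt_file(path):
    """Parse the file and raise ValueError if a record lacks "video" or "prompt"."""
    records = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            missing = [key for key in ("video", "prompt") if key not in record]
            if missing:
                raise ValueError(f"line {lineno}: missing {missing}")
            records.append(record)
    return records
```

For example, write_prompt_file("prompts.jsonl", "001.mp4", [("Describe this clip in detail.", "detail")]) writes a single line shaped like the second example line, with question_id numbered from 0.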
To facilitate inference with VideoNeVA, follow the configuration steps in this section.
In the defaults section of conf/config.yaml, set the fw_inference field to the VideoNeVA inference configuration file you want. For instance, to use the video_neva/inference.yaml configuration, set fw_inference to video_neva/inference:
    defaults:
      - fw_inference: video_neva/inference
      ...
In the stages section of conf/config.yaml, ensure the fw_inference stage is present:
    stages:
      - fw_inference
      ...
Inside conf/fw_inference/video_neva/inference.yaml, set prompt_file, inference.media_base_path, neva_model_file, and base_model_file so that they point to the locations relevant to your inference task. The linear weights from NeVA model pretraining should be referenced under neva_model_file, and your LLM under base_model_file:
    inference:
      media_base_path: /path/to/videos_associate_with_prompt_file/
    prompt_file: /path/to/input_jsonl_file.jsonl
    neva_model_file: /path/to/trained_neva_checkpoint.nemo
    base_model_file: /path/to/base_model.nemo
Execute the launcher pipeline:
    python3 main.py
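Before launching, a quick preflight check can catch placeholder paths left in the configuration. The sketch below is a hypothetical helper, not part of the launcher. It assumes inference.yaml has already been loaded into a plain Python dict (for example with PyYAML's yaml.safe_load). It also assumes that prompt_file, neva_model_file, and base_model_file sit at the top level while media_base_path sits under inference, as the dotted name inference.media_base_path suggests.

```python
import os

# Hypothetical list of config keys that must point at existing files or
# directories before running `python3 main.py`.
REQUIRED_PATHS = (
    "prompt_file",
    "neva_model_file",
    "base_model_file",
    "inference.media_base_path",
)


def get_dotted(cfg, dotted_key):
    """Look up a nested key such as "inference.media_base_path" in a plain dict."""
    node = cfg
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def missing_paths(cfg):
    """Return (key, value) pairs whose value is unset or does not exist on disk."""
    problems = []
    for key in REQUIRED_PATHS:
        value = get_dotted(cfg, key)
        if not value or not os.path.exists(value):
            problems.append((key, value))
    return problems
```

An empty list means every listed path exists. Any pair still holding a /path/to/... placeholder shows up in the result.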
Remarks:
Ensure the value of run.model_train_name corresponds to the appropriate model size, either video_neva_llama2_7b_chat or video_neva_llama2_13b_chat.
Ensure the tensor model parallel sizes are correctly assigned. By default, the 7B model uses tensor_model_parallel_size=4, while the 13B model uses tensor_model_parallel_size=8.
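The two remarks amount to a small lookup from model name to default tensor-parallel size. The function below is a hypothetical consistency check, not a launcher feature. It only encodes the defaults stated in this guide and reports mismatches rather than rejecting them, since a non-default setting may be intentional.

```python
# Default tensor-parallel sizes per model name, as stated in this guide.
DEFAULT_TP_SIZE = {
    "video_neva_llama2_7b_chat": 4,
    "video_neva_llama2_13b_chat": 8,
}


def parallelism_warnings(model_train_name, tensor_model_parallel_size):
    """Return warnings; an empty list means the pair matches the documented defaults."""
    if model_train_name not in DEFAULT_TP_SIZE:
        return [
            f"unexpected run.model_train_name {model_train_name!r}; "
            f"expected one of {sorted(DEFAULT_TP_SIZE)}"
        ]
    expected = DEFAULT_TP_SIZE[model_train_name]
    if tensor_model_parallel_size != expected:
        return [
            f"{model_train_name} uses tensor_model_parallel_size={expected} "
            f"by default, got {tensor_model_parallel_size}"
        ]
    return []
```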
Use the Gradio Server (Experimental)
This section outlines the steps to set up and use a Gradio server for interacting with NeMo models. Follow these instructions after entering the NeMo container environment.
Set Up the Server
Edit the server configuration.
Navigate to the server script at /opt/NeMo/examples/multimodal/multimodal_llm/neva/eval/gradio_server.py and update it with the correct configuration and the path to your NeVA model.
Install Gradio.
Run the following command to install Gradio:
    pip install gradio
Set the environment variables and launch the server.
Set the necessary environment variables and start the Gradio server with the following commands:
    export NVTE_FLASH_ATTN=0
    export NVTE_FUSED_ATTN=0
    python /opt/NeMo/examples/multimodal/multimodal_llm/neva/eval/gradio_server.py
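If you prefer to start the server from Python rather than a shell, you can pass the same two environment variables to a child process. This is a hypothetical convenience wrapper, not a NeMo utility. It assumes the gradio_server.py path used in the commands for this step and that the container's Python interpreter is the one running the wrapper.

```python
import os
import subprocess
import sys

# Path taken from the launch command in this guide.
SERVER_SCRIPT = "/opt/NeMo/examples/multimodal/multimodal_llm/neva/eval/gradio_server.py"


def server_env(base_env=None):
    """Return a copy of the environment with NVTE_FLASH_ATTN and NVTE_FUSED_ATTN set to "0"."""
    env = dict(os.environ if base_env is None else base_env)
    env["NVTE_FLASH_ATTN"] = "0"
    env["NVTE_FUSED_ATTN"] = "0"
    return env


def launch_server(script=SERVER_SCRIPT):
    """Start the Gradio server as a child process; the caller should wait() on or terminate() it."""
    return subprocess.Popen([sys.executable, script], env=server_env())
```

Copying the environment, rather than mutating os.environ, keeps the attention settings scoped to the server process.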
Query the Server
Once the server is up and running, you can query it using a Python Gradio client.
The client sends text and image data as base64-encoded strings.
For an example of how to structure your client queries, refer to the script at /opt/NeMo/examples/multimodal/multimodal_llm/neva/eval/gradio_cli.py.
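The base64 step can be sketched with the standard library. The payload field names "prompt" and "image" below are placeholders, not the server's confirmed interface. Take the real argument names and order, and the transport call (for example, gradio_client's Client(...).predict(...)), from gradio_cli.py.

```python
import base64


def encode_media(path):
    """Read a file and return its contents as a base64-encoded ASCII string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")


def build_query(prompt, image_path):
    """Hypothetical payload: field names are placeholders to match against gradio_cli.py."""
    return {"prompt": prompt, "image": encode_media(image_path)}
```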