NVIDIA In-Game Inferencing (NVIGI) Developer Pack 1.1.0 Release

What is NVIGI?

The NVIDIA In-Game Inferencing (NVIGI) SDK streamlines AI model deployment and integration for PC application developers. The SDK pre-configures the PC with the necessary AI models, engines, and dependencies; orchestrates AI inference seamlessly across PC and cloud through a unified inference API; and supports all major inference backends across different hardware accelerators (GPU, NPU, CPU).

The system is meant to be integrated into end-user applications and games to provide a choice between running models locally or in the cloud (i.e., hybrid operation).

The high-level objectives are:

  • Allow models to execute across a variety of backends, devices and runtimes

  • Support a wide range of models and pipelines

  • Provide a seamless way for application developers to run models in the cloud or locally

  • Enable efficient in-app integration

The NVIDIA IGI SDK is architected as a suite of plugins, containing both core inferencing plugins and helper plugins, that is to be integrated into end-user applications. The “helper” plugins are shared amongst the various inference plugins; examples include network functionality such as gRPC, and D3D12 device/queue/command-list management for integrating 3D and AI workloads. Core AI inferencing plugins implement many different models using multiple runtimes, but all of them share the same creation and inference APIs. As a result, all of the LLM plugins share a functionality-specific API, as do all of the ASR (Speech Recognition) plugins, so an application can swap one plugin for another with minor code modifications. The core plugin architecture makes this possible by defining interfaces that are shared by all plugins implementing a given functionality.
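The swapping pattern described above can be sketched as follows. This is an illustrative C++ sketch only: the interface and class names below are hypothetical stand-ins, not the actual NVIGI API.

```cpp
// Illustrative sketch only: IGptPlugin, LocalGptPlugin, CloudGptPlugin, and
// createGpt are hypothetical names, not the real NVIGI headers.
#include <memory>
#include <string>

// Shared, functionality-specific interface implemented by every "GPT" plugin.
struct IGptPlugin {
    virtual ~IGptPlugin() = default;
    virtual std::string evaluate(const std::string& prompt) = 0;
};

// Two interchangeable backends implementing the same interface.
struct LocalGptPlugin : IGptPlugin {
    std::string evaluate(const std::string& prompt) override {
        return "[local] " + prompt;  // would run a local runtime (e.g. GGML)
    }
};

struct CloudGptPlugin : IGptPlugin {
    std::string evaluate(const std::string& prompt) override {
        return "[cloud] " + prompt;  // would call a cloud REST endpoint
    }
};

// Application code depends only on the shared interface, so switching
// between local and cloud execution is a one-line change at creation time.
std::unique_ptr<IGptPlugin> createGpt(bool useCloud) {
    if (useCloud)
        return std::make_unique<CloudGptPlugin>();
    return std::make_unique<LocalGptPlugin>();
}
```

The rest of the application calls only `evaluate`, which is why the command-line samples can switch between local and cloud models from the command line.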

The Current Release of the Developer Pack

The NVIDIA IGI Developer Pack is a release of several components: the IGI SDK, a 3D Sample, and models, together designed to show the architecture of NVIGI and its integration into interactive applications. The pack supports a set of local inference plugins along with a cloud inference plugin.

In addition, several samples exist in two locations. The samples are provided both as source code and precompiled, allowing for immediate experimentation with the plugins:

  • Command-line samples for specific plugins, located in plugins/SDK/source/samples with (pre-)built binaries in plugins/SDK/bin/x64. These include:

    • Basic (nvigi.basic): A command-line sample that shows the use of individual ASR and GPT plugins to implement conversational AI. The user can provide input by typing their queries or by using a microphone to pass their verbal query to speech recognition. The GPT plugin will respond to the query, with conversational context. The GPT plugin may be switched from local to cloud models via the command line.

    • Pipeline (nvigi.pipeline): A command-line sample that shows the use of a pipeline plugin, capable of running a sequence of ASR and GPT plugins via a single evaluation call. This sample uses audio input from an audio file.

    • RAG (nvigi.rag): A command-line sample that shows how to use the GPT and embedding plugins to implement Retrieval-Augmented Generation, or RAG. Specifically, the sample takes a text file to use as its reference, or “corpus”, when answering queries, along with a prompt to guide how it uses the corpus. The user may type in queries for the RAG.

  • A 3D sample exists in a top-level sample directory. It exercises a wider range of plugins and provides a GUI for interaction alongside a concurrently rendered 3D scene:

    • Support for local and cloud GPT

    • Support for ASR via GUI-based recording.
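The retrieval step at the heart of the RAG sample above can be sketched as follows. This is an illustrative, self-contained C++ sketch, not the sample's actual code: a toy bag-of-words "embedding" stands in for the real embedding plugin, the corpus chunk most similar to the query is retrieved by cosine similarity, and the result is prepended to the prompt that would be sent to the LLM.

```cpp
// Illustrative RAG sketch: a toy word-count "embedding" stands in for the
// real embedding model; only the shape of the flow mirrors the sample.
#include <cmath>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Toy embedding: a word-count map (the real sample uses an embedding model).
std::map<std::string, double> embed(const std::string& text) {
    std::map<std::string, double> v;
    std::istringstream in(text);
    std::string w;
    while (in >> w) v[w] += 1.0;
    return v;
}

double cosineSimilarity(const std::map<std::string, double>& a,
                        const std::map<std::string, double>& b) {
    double dot = 0, na = 0, nb = 0;
    for (const auto& [w, x] : a) {
        na += x * x;
        auto it = b.find(w);
        if (it != b.end()) dot += x * it->second;
    }
    for (const auto& [w, x] : b) nb += x * x;
    return (na == 0 || nb == 0) ? 0 : dot / (std::sqrt(na) * std::sqrt(nb));
}

// Retrieve the corpus chunk most similar to the query, then build the
// augmented prompt that would be passed to the GPT plugin.
std::string buildRagPrompt(const std::vector<std::string>& corpusChunks,
                           const std::string& guidePrompt,
                           const std::string& query) {
    const auto q = embed(query);
    size_t best = 0;
    double bestScore = -1;
    for (size_t i = 0; i < corpusChunks.size(); ++i) {
        double s = cosineSimilarity(q, embed(corpusChunks[i]));
        if (s > bestScore) { bestScore = s; best = i; }
    }
    return guidePrompt + "\nContext: " + corpusChunks[best] +
           "\nQuestion: " + query;
}
```

In the actual sample the corpus file is chunked and embedded by the embedding plugin, and the augmented prompt is evaluated by a GPT plugin; the retrieval logic follows this same pattern.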

Where Do I Start?

The recommended order of “getting started” with the pack is to:

  1. Read this entire document

  2. Run the setup_links.bat script at the top level of this pack, which will create the required cross-directory links needed by the various components (since links like these cannot be zipped)

  3. Download models as described in the section Getting Models. We recommend downloading all models available via the download.bat files in the nvigi.models tree, so that the 3D Sample and all command-line samples can be run immediately. We have also provided a script at the top level of this pack, download_data.bat, which downloads all supported models. Note that this script must be run after setup_links.bat, and it requires pressing the Enter key after each downloaded model. Downloading all of the models via download_data.bat requires approximately 9-10GB of disk space.

  4. Read the release notes and known issues sections at the end of this document

  5. Read the 3D Sample’s docs in detail and follow its instructions, especially those for things such as setting up API keys for the Cloud GPT Plugin.

  6. Run the precompiled 3D Sample and interact with it as described in the 3D Sample’s docs

  7. Read the documentation on how to run the command-line SDK Plugins samples, Basic sample (nvigi.basic.exe), Pipeline sample (nvigi.pipeline.exe) and RAG sample (nvigi.rag.exe)

  8. Review the SDK documentation for the other components, which should be linked from this top-level document (NVIGI Core, SDK Plugins docs)

  9. Review the NVIGI integration in the 3D Sample by examining the source code in sample/src/nvigi/NVIGIContext.[cpp, h]

  10. Review the API headers in the include subdirectories of the core and plugin components (nvigi_core, plugins/sdk)

  11. Follow the 3D Sample docs instructions to rebuild the 3D Sample from its source and run it in the debugger, allowing for easy stepping through the code to better understand how it integrates with the SDK

Getting Models

To avoid making the standard layout pack very large, the pack does not include any model data files. Models for NVIGI fall into one of a few categories:

  • Cloud models: these consist only of a configuration JSON file. They are shipped with this pack under nvigi.models/<plugin name>/{model GUID}

  • Manually-downloadable, public models: these consist of a configuration JSON file and a Windows batch file, download.bat. They are shipped in this pack under nvigi.models/<plugin name>/{model GUID}, and download.bat can be double-clicked to use curl to download the model without any form of authentication. In some cases, the batch file will also extract the required files if the downloaded file is a zip archive. Depending upon the security settings on the local system, Windows may ask for confirmation when running the batch file.
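For example, the on-disk layout of a manually-downloadable model entry follows this pattern. Only the nvigi.models/<plugin name>/{model GUID} pattern and download.bat are taken from the description above; the plugin directory name and configuration file name shown here are illustrative placeholders:

```
nvigi.models/
  <plugin name>/                              e.g. a GPT GGML plugin directory
    {01F43B70-CE23-42CA-9606-74E80C5ED0B6}/   {model GUID}
      <model config>.json                     configuration JSON (name is illustrative)
      download.bat                            fetches the model files via curl
```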

Cloud Models

The supported cloud models in this release include:

Plugin                       Model Name            GUID                                  URL
nvigi.plugin.gpt.cloud.rest  Llama3.2 3b Instruct  01F43B70-CE23-42CA-9606-74E80C5ED0B6  https://integrate.api.nvidia.com/v1/chat/completions
                             Nemotron Mini 4B      8E31808B-C182-4016-9ED8-64804FF5B40D  https://integrate.api.nvidia.com/v1/chat/completions
                             gpt-3.5-turbo         E9102ACB-8CD8-4345-BCBF-CCF6DC758E58  https://api.openai.com/v1/chat/completions

Manually-Downloadable, Public Models

Plugin                     Model Name                   GUID
nvigi.plugin.asr.ggml.*    Whisper Small                5CAD3A03-1272-4D43-9F3D-655417526170
nvigi.plugin.embed.ggml.*  E5 Large Unsupervised        5D458A64-C62E-4A9C-9086-2ADBF6B241C7
nvigi.plugin.gpt.ggml.*    Llama3.2 3b Instruct         01F43B70-CE23-42CA-9606-74E80C5ED0B6
                           Nemotron Mini 4B             8E31808B-C182-4016-9ED8-64804FF5B40D
                           Nemovision 4B Instruct FP16  0BAEDD5C-F2CA-49AA-9892-621C40030D12
                           Nemovision 4B Instruct Q4    0BAEDD5C-F2CA-49AA-9892-621C40030D13

Important Documentation in the Developer Pack

The main documentation for using the pack includes:

  • 3D Sample

    • The best “starting” doc, as the sample is provided pre-compiled and ready-to-run. It describes how to run, recompile, and debug the sample, which demonstrates both speech recognition and LLM-based text interaction.

  • nvigi_core

    • Detailed documentation on the components of the NVIGI core library, including:

    • Architecture Guide

      • Discusses the high-level architecture of the entire SDK, the data flow, etc

    • GPU Scheduling For AI Guide

      • A detailed document on how advanced applications can help ensure GPU AI work is scheduled optimally alongside 3D work

  • Plugin SDK Guide

    • The core documentation for getting started with the details of the AI plugins. It describes how to run a set of much more minimal samples, and how to run those samples in the debugger.

    • Detailed documentation on the components of the SDK plugins, including:

    • ASR Whisper Plugins Programming Guide

      • Detailed docs on how to program for the speech recognition plugins

    • Embedding Plugin Programming Guide

      • Detailed documentation on how to program for the Embedding plugins

    • GPT Plugins Programming Guide

      • Detailed documentation on how to program for the GPT/LLM plugins

Contents of the Developer Pack

The pack consists of:

  • nvigi_core: The NVIGI Core components, including:

    • The headers, libraries and DLLs that make up the main functionality of NVIGI

    • Basic documentation on the core architecture

  • plugins/sdk: The AI Plugins, including:

    • The headers (include) and DLLs (bin/x64) that comprise the AI Plugin functionalities such as ASR, GPT, and Embedding.

    • Basic, precompiled command-line samples (source/samples and bin/x64) that run a sequence of AI workloads

  • sample: The 3D Sample, including:

    • A precompiled, runnable 3D+GUI sample that allows easy experimentation with components of the SDK, including speech input and text input to an LLM

    • Source code for the sample

    • Build files to allow the source to be rebuilt as desired

  • nvigi.models: AI Models for use with the above, including:

    • Llama-3 GPT LLM, downloadable manually

    • Whisper Speech Recognition, downloadable manually

    • Note that this is a junction link, generated by setup_links.bat. It points into the plugins/sdk/data/nvigi.models directory.

  • nvigi.test: Basic AI Test Data, including:

    • A WAV file for use as input to speech recognition in the command-line sample

    • Note that this is a junction link, generated by setup_links.bat. It points into the plugins/sdk/data/nvigi.test directory.

  • docs: Documentation for all of the above components, repackaged and linked from this top-level document.

Known Issues and Important Notes

1.1.0 Release

  • TBD

1.0.0 Release

  • Initial public release

Beta 1

  • A significant change in the Beta 1 release compared to early-access releases is that independent components of NVIGI have been split into their own directories. Specifically, the core APIs and core DLLs for NVIGI are now in the nvigi_core package, and the AI Plugins are in the SDK package. The Beta 1 pack generally ships all of these, zipped as sibling directories, in a single release pack. However, future releases may be decoupled, especially w.r.t. nvigi_core, which should change much less frequently. Additionally, new plugins may be distributed independently.

  • Owing to a quirk of conversational/interactive mode in GGML, the GGML GPT plugin can in some cases produce truncated responses or an empty response. This can be seen in the Basic Sample or in the 3D Sample if the user interacts with the GPT for several iterations without resetting the conversation. If desired, the frequency of this can be lowered by increasing the nvigi::GPTRuntimeParameters::tokensToPredict value. A solution to this is being investigated.

  • When using CiG, it is currently important to keep a reference to the nvigi::plugin::hwi::cuda plugin for the life of the application. This ensures that if plugins using CiG are created and destroyed multiple times, the shared CUDA context stays active, rather than the plugin being unloaded and reloaded repeatedly. Keeping an active interface reference for the life of the application accomplishes this. See the CiG code in the 3D Sample for an example.
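The "hold one reference for the application's lifetime" pattern above can be sketched as follows. This is an illustrative C++ sketch with hypothetical names (SharedContext, Plugin, Application), not the actual NVIGI API; std::shared_ptr stands in for the SDK's interface reference counting, and SharedContext stands in for the shared CUDA context.

```cpp
// Illustrative sketch of holding one long-lived reference so that plugin
// create/destroy cycles never drop the shared context. All names here are
// hypothetical stand-ins, not the real NVIGI API.
#include <memory>
#include <utility>

struct SharedContext {
    // Stand-in for the shared CUDA context that the nvigi::plugin::hwi::cuda
    // interface keeps alive in the real SDK.
};

// Plugins share the context; it is destroyed only when the last reference
// goes away.
struct Plugin {
    explicit Plugin(std::shared_ptr<SharedContext> ctx) : ctx_(std::move(ctx)) {}
    std::shared_ptr<SharedContext> ctx_;
};

struct Application {
    // Acquired once at startup and held for the life of the application, so
    // creating and destroying plugins never drops the context to zero refs.
    std::shared_ptr<SharedContext> lifetimeRef = std::make_shared<SharedContext>();

    long contextUseCount() const { return lifetimeRef.use_count(); }

    Plugin createPlugin() { return Plugin{lifetimeRef}; }
};
```

In the real integration, the equivalent of lifetimeRef is the active nvigi::plugin::hwi::cuda interface that the application acquires once and releases only at shutdown.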

  • The OnnxGenAI Mistral Model responses can include some “system” text; a potential fix to the plugin is being investigated.