Ecosystem Architecture#

This section provides an overview of the hardware and software solutions in the enterprise ecosystem that leverage NVIDIA technology to form an NVIDIA Enterprise AI Factory. It also covers our ecosystem partners who offer solutions for components of the AI Factory, including Enterprise Kubernetes, storage, observability, security, and developer tools.

Hardware Infrastructure#

The hardware design for the Enterprise AI Factory prioritizes scalability and elasticity, facilitating horizontal scaling of compute with NVIDIA Blackwell GPUs, networking with Spectrum-X, and services with an enterprise-ready Kubernetes platform. This state-of-the-art hardware delivers the latency and throughput necessary for real-time inference and complex agent interactions. GPU resources are optimized through effective scheduling, utilization, and management of high-density GPU deployments.

The hardware design follows the NVIDIA Enterprise Reference Architecture (Enterprise RA) guidance, which is tailored for enterprise-class deployments of up to 256 GPUs. Depending on the base technology, configurations range from 4 to 32 nodes, complete with the appropriate networking topology, switching, and allocations for storage and control plane nodes. Enterprise RAs are right-sized for enterprise-scale deployments and provide deployment guides, cluster characterization, provisioning automation using NVIDIA Base Command Manager (BCM), and sizing guides for common enterprise AI implementations. NVIDIA Enterprise RAs are designed to support a diverse range of workloads, including AI pre-training, post-training, long-thinking inference, HPC, and data analytics. These designs provide a versatile foundation for enterprise AI with a focus on on-premises, single-tenant, Ethernet-based environments.

For more details on these prescriptive Blackwell design patterns and components for building Enterprise AI Factories—as well as the NVIDIA-Certified Systems used in the Enterprise AI Factory—please refer to the NVIDIA Enterprise Reference Architecture white paper.

When selecting hardware components for an AI platform, several factors come into play to ensure the system meets the demanding needs of AI workloads.

Accelerated Computing Platform#

Enterprise AI, particularly for complex agentic systems, demands substantial computational resources that challenge traditional data center capabilities. Agentic systems execute diverse workloads, from sequential task processing and logical reasoning to parallel data analysis and model inference. This operational diversity requires a balanced computing architecture that effectively utilizes CPUs for control, orchestration, and serial tasks, alongside GPUs for massively parallel computations inherent in AI model training, inference, and complex data manipulation.

Accelerated computing platforms, which integrate powerful GPUs and CPUs alongside high-speed networking, all optimized via specialized software stacks, provide the necessary performance and efficiency for these demanding workloads. This approach yields significant improvements in processing speed and energy efficiency over CPU-centric computing.

With the following NVIDIA Blackwell accelerated computing platforms in the NVIDIA Enterprise AI Factory Design Guide, enterprises can unlock the full potential of AI in their data center infrastructure, from accelerating simulations and data analysis to enabling real-time generative design and visualization.

  • The NVIDIA RTX PRO™ Server Edition is the ultimate data center GPU for AI and visual computing, delivering breakthrough acceleration for the most demanding enterprise workloads, from multimodal AI inference and physical AI to scientific computing, graphics, and video applications. Optimized for workloads requiring the compute density and scale that deploying in the data center offers, the RTX PRO 6000 features a passively cooled thermal design and 96 GB of ultra-fast GDDR7 memory. Enterprises can configure up to eight NVIDIA RTX PRO 6000 GPUs in a server to deliver unmatched levels of compute power, memory capacity, and throughput to power mission-critical AI-enabled applications and accelerate use cases across industries—from healthcare, manufacturing, and geoscience to retail, media, and live broadcast.

  • The NVIDIA HGX™ B200 propels the data center into a new era of accelerated computing and generative AI, integrating NVIDIA Blackwell Tensor Core GPUs with a high-speed interconnect to accelerate AI performance at scale. Configurations of eight GPUs deliver unparalleled generative AI acceleration alongside a remarkable 1.4 terabytes (TB) of GPU memory and 64 terabytes per second (TB/s) of memory bandwidth for 15X faster real-time trillion-parameter-model inference, 12X lower cost, and 12X less energy. This extraordinary combination positions HGX B200 as a premier accelerated x86 scale-up platform designed for the most demanding generative AI, data analytics, and high-performance computing (HPC) workloads.

As detailed in the NVIDIA Enterprise RAs, these systems are built on NVIDIA-Certified System servers, designed for optimal performance. For inference-focused platforms, selection criteria prioritize these characteristics:

  • Inference Performance: High efficiency at various precisions (e.g., FP16, INT8, and newer formats like FP4/FP6 for Blackwell) delivers low-latency and high-throughput model serving.

  • GPU Memory (VRAM): Sufficient VRAM capacity and high memory bandwidth are paramount for accommodating the large language models (LLMs) prevalent in Retrieval Augmented Generation (RAG) applications, handling large batch sizes during inference, and supporting the extensive context windows these sophisticated AI agents often require. Modern NVIDIA GPUs, such as the NVIDIA RTX™ PRO Server Edition with its flexible design and substantial VRAM, or the compute-focused NVIDIA B200 Tensor Core GPUs with their exceptionally large memory footprints, are designed to meet these demands. Reference configurations for AI platforms frequently specify significant memory per GPU so that these complex models and their data can be processed efficiently, enabling low-latency responses and high-throughput performance for AI factory operations; the sketch after this list illustrates how weights and KV cache drive that sizing.

  • Scalability and Interconnects: While massive multi-node training setups might be less of a focus, efficient GPU-to-GPU communication via technologies like NVIDIA NVLink can still be beneficial for certain inference scenarios (e.g., model or pipeline parallelism for very large models) and for accelerating the data processing stages in RAG. Server configurations like PCIe Optimized or HGX systems cater to different scales and performance needs.
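
To make the memory-sizing intuition concrete, the sketch below estimates serving memory as model weights plus KV cache. The formula is a simplification, and every figure in the example (layer count, hidden size, precision) is an illustrative assumption rather than a measurement:

```python
# Rough serving-memory estimate: model weights plus KV cache.
# All figures are illustrative assumptions, not measured values.

def serving_memory_gb(n_params_b: float, bytes_per_param: float,
                      n_layers: int, hidden_size: int,
                      batch_size: int, context_len: int,
                      kv_bytes: float = 2.0) -> float:
    weights = n_params_b * 1e9 * bytes_per_param
    # K and V tensors per layer, one hidden-size vector per token in flight
    kv_cache = 2 * n_layers * hidden_size * batch_size * context_len * kv_bytes
    return (weights + kv_cache) / 1e9

# Hypothetical 70B-parameter model quantized to 1 byte/param (FP8/INT8),
# 80 layers, 8192 hidden size, batch 4, 32k-token context:
print(f"~{serving_memory_gb(70, 1, 80, 8192, 4, 32768):.0f} GB")
```

Even under these simplified assumptions, the total exceeds any single GPU, which is why reference configurations emphasize per-GPU memory capacity and fast multi-GPU interconnects.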

Networking#

Low-latency networking facilitates efficient data exchange, which is valuable for AI inference, especially in multi-node scenarios. For user-facing applications like AI agents, low latency directly reduces perceived delay, improving key metrics like Time-To-First-Token (TTFT) and overall response time for a better user experience. When large models are split across multiple GPUs or nodes, using techniques like pipeline or tensor parallelism for inference, each stage of computation depends on timely communication between devices. In these scenarios, even microsecond-scale delays can accumulate, and tail latency, the slowest portion of the communication distribution, can significantly degrade overall performance. Reducing both average and tail latency is essential for consistent, fast, and responsive AI services at scale.
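
TTFT is straightforward to measure from the client side. Below is a minimal sketch against an OpenAI-compatible streaming endpoint, such as a locally deployed NIM; the base URL and model name are placeholders:

```python
import time
from openai import OpenAI

# Placeholder endpoint and model; any OpenAI-compatible server works.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

start = time.perf_counter()
stream = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Explain NVLink in one sentence."}],
    stream=True,
)
for chunk in stream:
    # The first chunk carrying content marks the Time-To-First-Token.
    if chunk.choices and chunk.choices[0].delta.content:
        print(f"TTFT: {(time.perf_counter() - start) * 1000:.1f} ms")
        break
```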

The NVIDIA Spectrum-X Networking Platform is purpose-built for AI factories, combining advanced transport offloads that accelerate collective operations with congestion-aware routing and hardware-based scheduling, directly addressing the tail-latency issues that bottleneck multi-node AI workloads.

NVIDIA BlueField data processing units (DPUs) are essential for creating high-performance, secure, and efficient AI factories. They offload and accelerate critical tasks such as software-defined networking, storage, and security, freeing up CPU and GPU resources to focus on AI computation. Leveraging purpose-built hardware accelerators and dedicated Arm cores, BlueField supports faster, more secure cloud deployment, zero-trust multi-tenancy, accelerated data access, and real-time threat detection. This enables enterprises to build AI systems that are more scalable, resilient, and optimized for modern, cloud-native infrastructure.

AI Enterprise Infrastructure Software#

The NVIDIA AI Enterprise Infrastructure software encompasses all components necessary for managing and optimizing infrastructure and AI workloads. NVIDIA provides Release Branches to meet organizational needs. The NVIDIA Kubernetes Operators facilitate standardized management of NVIDIA GPUs, AI models, and network resources within Kubernetes environments. The following table outlines the components and versions of the NVIDIA AI Enterprise Infrastructure software.

| Component | Software | Version | Notes |
|---|---|---|---|
| GPU Driver | NVIDIA Linux Driver | 570.133.20+ | Supported by the GPU Operator 25.03+. |
| GPU Management | NVIDIA GPU Operator | 25.03+ | Simplifies the deployment of NVIDIA AI Enterprise by automating the management of all NVIDIA software components needed to provision GPUs in Kubernetes (drivers, toolkit, DCGM). |
| Network Management (Hardware) | NVIDIA Network Operator | v25.1.0+ | Simplifies the provisioning and management of NVIDIA networking resources in a Kubernetes cluster (NVIDIA NICs; integrates with NetQ). |
| Network Management (Software) | NVIDIA NetQ | Required | Validated Ethernet fabric management. |
| AI Workload and GPU Orchestration | Run:ai | v2.21+ | Dynamic scheduling and orchestration to accelerate AI workload throughput and maximize GPU utilization. |
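
Once the GPU Operator is running, every node advertises its GPUs through the nvidia.com/gpu extended resource, which any Kubernetes client can inspect. A minimal sketch with the official Python client, assuming kubeconfig access to the cluster:

```python
from kubernetes import client, config

# List the GPUs each node advertises through the GPU Operator's
# device plugin as the nvidia.com/gpu extended resource.
config.load_kube_config()
for node in client.CoreV1Api().list_node().items:
    gpus = node.status.allocatable.get("nvidia.com/gpu", "0")
    print(f"{node.metadata.name}: {gpus} allocatable GPU(s)")
```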

AI Enterprise Application Software and Tools#

NVIDIA distributes AI and data science tools via NGC container images from its private registry. Release Branches are provided to meet organizational needs. Each image includes the necessary user-space software (e.g., CUDA libraries, cuDNN, TensorRT, and the relevant framework). These NVIDIA AI Enterprise container images, such as the core AI stack and NeMo, are deployed on Kubernetes for simplified management, zero-downtime upgrades, and repeatable deployment.

| Component | Software | Version | Notes |
|---|---|---|---|
| Core AI Stack | NVIDIA AI Enterprise | Latest Supported Version | Includes CUDA, cuDNN, TensorRT, Triton, etc. |
| Inference Serving | NVIDIA NIM Operator | v1.0.1+ | Manages deployment of NIM microservices. |
| RAG / Data Processing | NVIDIA NeMo Retriever | Latest Supported Version | Component of the NeMo framework for RAG. |
| RAG / Data Processing | NVIDIA NeMo Customizer | Latest Supported Version | For fine-tuning models. |
| RAG / Data Processing | NVIDIA NeMo Curator | Latest Supported Version | For data curation pipelines. |
| RAG / Guardrails | NVIDIA NeMo Guardrails | Latest Supported Version | For adding safety layers to LLM apps. |
| Model Evaluation | NVIDIA NeMo Evaluator | Latest Supported Version | Tools for evaluating LLM performance. |
| Vector Search Acceleration | NVIDIA cuVS | Latest Supported Version | GPU-accelerated vector search library. |
| Use Case Examples | NVIDIA Blueprints (e.g., AI-Q) | Latest Supported Version | Reference implementations for common use cases. |
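
As one example from the table above, cuVS exposes GPU-accelerated vector search through a compact Python API. A minimal CAGRA sketch, assuming the cuvs and cupy packages are installed (parameter names may vary slightly across releases):

```python
import cupy as cp
from cuvs.neighbors import cagra

# Random stand-in data: 100k 768-dim embeddings and 10 queries.
dataset = cp.random.random_sample((100_000, 768), dtype=cp.float32)
queries = cp.random.random_sample((10, 768), dtype=cp.float32)

# Build a CAGRA graph index on the GPU, then search it.
index = cagra.build(cagra.IndexParams(metric="sqeuclidean"), dataset)
distances, neighbors = cagra.search(cagra.SearchParams(), index, queries, k=5)
print(cp.asarray(neighbors))
```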

Note

A key component of the Agentic AI blueprints is a foundational Retrieval Augmented Generation (RAG) pipeline. NVIDIA NeMo tools are included for implementing RAG pipelines.
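
For orientation, the shape of such a pipeline can be sketched against two OpenAI-compatible NIM endpoints: an embedding model (NeMo Retriever) and a chat model. All URLs and model names below are placeholders, and the NIM-specific input_type field is passed through extra_body:

```python
import numpy as np
from openai import OpenAI

# Placeholder endpoints for an embedding NIM and a chat NIM.
embedder = OpenAI(base_url="http://localhost:8001/v1", api_key="not-used")
chat = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")
EMBED_MODEL, CHAT_MODEL = "nvidia/nv-embedqa-e5-v5", "meta/llama-3.1-8b-instruct"

def embed(texts, input_type):
    # input_type ("query"/"passage") is a NIM-specific extension field.
    resp = embedder.embeddings.create(model=EMBED_MODEL, input=texts,
                                      extra_body={"input_type": input_type})
    return np.array([d.embedding for d in resp.data])

docs = ["Spectrum-X is an Ethernet platform built for AI factories.",
        "BlueField DPUs offload networking, storage, and security tasks."]
doc_vecs = embed(docs, "passage")

question = "What offloads storage and security tasks?"
q_vec = embed([question], "query")[0]

# Retrieve the closest document by cosine similarity, then generate.
sims = doc_vecs @ q_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec))
context = docs[int(np.argmax(sims))]
answer = chat.chat.completions.create(
    model=CHAT_MODEL,
    messages=[{"role": "user",
               "content": f"Context: {context}\n\nQuestion: {question}"}],
)
print(answer.choices[0].message.content)
```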

Software Partner Integrations#

NVIDIA’s comprehensive ecosystem of technology experts includes ISVs that bring advanced skills to design, build, and deliver AI-accelerated computing solutions by integrating NVIDIA AI Enterprise libraries and developer tools into their platforms. Tight collaboration between NVIDIA and our software partners’ developers and engineers ensures highly optimized and reliable performance over the lifetime of supported application releases. The following software partners have enterprise product offerings that provide components for building Enterprise AI Factories and are categorized as follows:

Enterprise Kubernetes Platform

  • Canonical Kubernetes - Canonical provides several offerings for on-premise Kubernetes solutions with full-lifecycle automation and long-term support. Each integrates with the NVIDIA GPU Operator to leverage NVIDIA hardware acceleration, and each supports the deployment of NVIDIA AI Enterprise, enabling AI workloads with NIM and accelerated libraries. Canonical’s focus on open-source, model-driven operations and ease of use offers enterprises flexible options for building their AI Factory on NVIDIA-accelerated infrastructure.

  • Nutanix Kubernetes Platform (NKP) - As part of the on-premise Nutanix Cloud Platform (NCP), the Nutanix Kubernetes Platform (NKP) simplifies enterprise Kubernetes management by reducing operational complexity and ensuring consistent, secure deployment across hybrid multicloud environments. It provides centralized fleet management, policy enforcement, and AI-driven observability to streamline Day 2 operations. For AI workloads, NKP can run NVIDIA AI Enterprise, including NVIDIA NIM and NeMo, enabling enterprises to deploy and scale agentic AI applications efficiently. This collaboration allows IT teams to leverage optimized AI models, GPU-accelerated infrastructure, and secure endpoints while maintaining control over data privacy and costs.

  • Red Hat OpenShift - OpenShift is an enterprise Kubernetes platform with comprehensive, production-grade features that extend beyond standard Kubernetes. It offers enhanced out-of-the-box security capabilities (such as Security Context Constraints (SCCs) and an integrated container registry with security scanning), robust developer and operator tools, integrated CI/CD pipelines, and enterprise-level support. For an AI Factory, its ability to manage complex, stateful AI workloads, provide multi-tenancy, and integrate seamlessly with a wide range of hardware (especially GPUs via operators) and software makes it a strong foundation for AI. Its focus on a consistent operational experience across hybrid cloud environments is also a key advantage for enterprises.

  • VMware Tanzu Platform - VMware Tanzu is an application platform that enables enterprises to modernize infrastructure and streamline Kubernetes management, allowing IT teams to deploy scalable, containerized workloads across hybrid environments. Tanzu optimizes AI/ML workloads by leveraging NVIDIA GPU-accelerated Tanzu Kubernetes clusters, with NVIDIA Operators for seamless resource provisioning. This combination supports NVIDIA AI Enterprise and Agentic AI tooling like NVIDIA NIM and NeMo.
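
On any of the platforms above, once the NVIDIA GPU Operator is installed, workloads request accelerators through the nvidia.com/gpu resource. A minimal GPU smoke-test pod submitted with the Kubernetes Python client (the image tag is a placeholder):

```python
from kubernetes import client, config

config.load_kube_config()
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="gpu-smoke-test"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[client.V1Container(
            name="cuda",
            image="nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04",  # placeholder tag
            command=["nvidia-smi"],
            # Requesting the extended resource triggers GPU scheduling.
            resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
        )],
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```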

Storage Solution

  • DDN - DDN provides high-performance, on-premise storage solutions (e.g., EXAScaler) frequently used in NVIDIA DGX SuperPOD and other large-scale AI/HPC deployments. They are NVIDIA-Certified and offer strong support for NVIDIA GPUDirect Storage, ensuring maximum data throughput to NVIDIA GPUs. DDN’s focus on massive parallelism and scalability makes them ideal for the most demanding AI training workloads and data-intensive tasks within an AI Factory utilizing NVIDIA NIM and accelerated libraries. They provide CSI drivers for Kubernetes integration.

  • Dell - Dell offers on-premise scale-out NAS (PowerScale) and object storage (ECS), frequently part of NVIDIA DGX POD and NVIDIA-Certified Systems. These solutions are optimized for NVIDIA AI workloads and provide CSI drivers for Kubernetes, enabling dynamic storage provisioning and accelerated data access for applications using NVIDIA libraries and NIM.

  • Hitachi Vantara - Hitachi Vantara’s on-premise enterprise storage integrates into NVIDIA-powered AI infrastructures and can support NVIDIA GPUDirect Storage for NVIDIA GPUs. They provide CSI drivers for Kubernetes, allowing for automated provisioning and management of persistent storage, enhancing data throughput for AI tasks utilizing NVIDIA accelerated libraries and NIM on NVIDIA hardware.

  • HPE - HPE offers a range of on-premise enterprise storage solutions, including Alletra for mission-critical workloads and HPE GreenLake for File Storage, which are designed to support AI/ML data pipelines. These solutions can be part of NVIDIA-Certified configurations and support technologies like NVIDIA GPUDirect Storage. With CSI drivers for Kubernetes, HPE storage provides a scalable and resilient foundation for AI Factory data, supporting applications using NVIDIA NIM and accelerated libraries on NVIDIA hardware.

  • IBM - IBM Storage Scale is built for AI, high-performance computing, and analytics, supporting the full range of NVIDIA technologies. It provides the flexibility of a software-defined global data platform, extensible metadata, multi-tenancy, container-native/CSI support, and high-throughput object storage, while Storage Scale System is optimized for clustered, low-latency, scalable performance for the most demanding enterprise deployments.

  • NetApp - NetApp provides on-premise, NVIDIA-Certified storage solutions optimized for GPU-accelerated workloads on NVIDIA hardware. Through its Astra Trident CSI driver, it offers seamless, dynamic storage provisioning for Kubernetes, supporting high-performance access for applications using NVIDIA accelerated libraries and NIM. This ensures efficient data handling with robust data management features for AI.

  • Nutanix Unified Storage - As part of its on-premise Nutanix Cloud Platform (NCP), Nutanix Unified Storage offers integrated file, object, and block storage solutions. These are designed to support AI workloads running on the HCI platform, which itself supports NVIDIA AI Enterprise and vGPU. The storage is provisioned and managed within the Nutanix ecosystem, providing a simplified and scalable data foundation for AI applications, including those using NVIDIA NIM and accelerated libraries on NVIDIA hardware, with CSI driver support for Kubernetes.

  • Pure Storage - Pure Storage provides on-premise all-flash solutions like FlashBlade and FlashArray, optimized for NVIDIA GPU-direct technologies and NVIDIA-Certified Systems. They offer robust Kubernetes integration through their Pure Service Orchestrator (CSI driver) and Portworx by Pure Storage for cloud-native storage and data management, ensuring high IOPS and low latency for AI workloads using NVIDIA accelerated libraries and NIM on NVIDIA hardware.

  • VAST Data - VAST Data’s on-premise platform is designed for high-throughput, low-latency access, often integrating with NVIDIA GPUDirect Storage to accelerate AI/ML workloads (using NIM and accelerated libraries) on NVIDIA GPUs. VAST InsightEngine eliminates the bottlenecks of traditional AI architectures, enabling real-time, event-driven AI decision-making. Its CSI driver enables dynamic provisioning and simplified storage management for containerized AI applications within Kubernetes environments, ideal for massive datasets and vector databases.

  • Weka - Weka offers a high-performance, on-premise parallel file system (WEKApod, often built on NVIDIA-Certified Systems) specifically engineered for AI/ML and HPC workloads. It provides exceptional throughput and low latency, supports NVIDIA GPUDirect Storage, and is frequently chosen for large-scale NVIDIA GPU deployments. Its robust CSI driver ensures seamless integration with Kubernetes for demanding AI training and inference tasks utilizing NVIDIA NIM and accelerated libraries.
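
The common thread across these storage partners is a CSI driver, so AI workloads consume capacity through an ordinary PersistentVolumeClaim bound to the partner's StorageClass. A minimal sketch with the Kubernetes Python client; the StorageClass name is hypothetical:

```python
from kubernetes import client, config

config.load_kube_config()
pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="training-data"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteMany"],  # shared access across training pods
        storage_class_name="partner-csi-fast",  # hypothetical partner class
        resources=client.V1ResourceRequirements(requests={"storage": "500Gi"}),
    ),
)
client.CoreV1Api().create_namespaced_persistent_volume_claim(
    namespace="ai-factory", body=pvc)
```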

Agentic AI Developer Partner Tools

  • Accenture AI Refinery - AI Refinery is an enterprise generative and agentic AI platform designed to help companies turn raw AI technology into useful business solutions. Built on NVIDIA technology and NVIDIA AI Enterprise software, AI Refinery supports the entire lifecycle of enterprise generative and agentic AI, from model customization and serving, agent building and evaluation, and knowledge and data processing to governance and observability. Leveraging the AI Factory, AI Refinery can deploy pre-built industry solutions on-premise at an accelerated rate. The AI Refinery platform also enables orchestration of agents from ecosystem providers through Accenture’s proprietary trusted agent huddle.

  • CrewAI - CrewAI is an open-source framework for orchestrating role-playing, autonomous AI agents. Deployable on-premise, it allows developers to build sophisticated multi-agent systems that can collaborate to solve complex tasks. In an NVIDIA AI Factory, CrewAI can leverage LLMs deployed as NVIDIA NIMs for agent reasoning and decision-making, with the underlying computations accelerated by NVIDIA hardware. This enables the creation of powerful, customized agentic workflows that benefit from NVIDIA’s accelerated computing and AI software stack.

  • Dataiku - Dataiku is available as an on-premise enterprise AI platform that integrates with Kubernetes environments running NVIDIA GPUs. It allows for the development and operationalization of models that can leverage NVIDIA accelerated libraries and supports workflows that may incorporate NVIDIA NIM for inference, all accelerated by NVIDIA hardware. It also offers both low-code and advanced workflow-building capabilities.

  • DataStax - DataStax provides enterprise solutions built on Apache Cassandra® and OpenSearch that integrate NVIDIA technologies, deployable on-premise via Kubernetes operators such as KubeStax or the open-source K8ssandra. These offerings deliver a highly scalable NoSQL database with integrated vector search capabilities, making them well-suited as a foundational data layer for an AI Factory, particularly for powering real-time generative AI and RAG applications. The ability to handle massive datasets and provide low-latency vector search is critical for AI agents developed with frameworks like Langflow that rely on retrieving context for LLMs served via NVIDIA NIM. By serving as a robust backend for contextual data and vector embeddings, DataStax’s on-premise solutions support AI agents accelerated by NVIDIA hardware and NVIDIA’s accelerated libraries. DataStax products include Hyper-Converged Database (an on-premise database), AI Platform (powered by NVIDIA AI Enterprise), Langflow (with NIM integrations), and Hybrid Search (semantic and vector search powered by NeMo Retriever).

  • DataRobot - DataRobot significantly accelerates the AI development lifecycle by automating many of the complex and time-consuming tasks involved in building, training, deploying, and managing models and agents, including embedded support for deploying NVIDIA AI Enterprise and NVIDIA NIM. For an AI Factory focused on agents, DataRobot can help rapidly prototype and deploy the models that power the intelligence of these agents, allowing developers to focus on agentic logic and integration rather than model tuning from scratch. Its features also ensure models are monitored and managed effectively in production.

  • Deloitte Zora AI - Zora AI is an enterprise agentic AI platform that simplifies operations, boosts productivity and efficiency, and drives more confident decision-making in enterprises. Zora AI agents deliver industry-specific solutions, augmented with extensive industry knowledge and reasoning capabilities leveraging NVIDIA NIM and NeMo. Zora AI enacts Deloitte’s Trustworthy AI™ principles, including a human feedback loop, to establish transparency and trust with users. With the NVIDIA Enterprise AI Factory, Zora AI can help its customers in regulated industries deploy AI systems on-premises quickly, offering strong data security guarantees and flexibility in technology options within a trusted ecosystem.

  • Domino Data Lab - Domino Data Lab, an NVIDIA AI Accelerated program partner, offers an on-premise MLOps platform integrating with NVIDIA AI Enterprise. It scales GPU infrastructure efficiently on NVIDIA hardware and governs models, supporting AI agent development using NVIDIA accelerated libraries and NIM.

  • HPE Private Cloud AI - HPE provides enterprise-grade solutions for AI, with HPE Ezmeral software enabling the deployment and management of containerized AI/ML workloads on HPE’s infrastructure, which includes NVIDIA-Certified Systems. HPE Private Cloud AI (PC AI), co-developed with NVIDIA, offers a turnkey solution that integrates NVIDIA AI computing, networking, and software (like NVIDIA AI Enterprise) with HPE’s Ezmeral software and infrastructure, aiming to simplify and accelerate AI adoption for enterprises.

  • Elastic - The Elastic Stack (Elasticsearch, Kibana, Beats, Logstash) provides on-premise solutions for search, observability, and security. For AI Factories, Elasticsearch’s vector search is essential for RAG applications, storing embeddings generated by models on NVIDIA GPUs and queried by NIM-powered agents (a query sketch follows this list). The broader stack enables log aggregation, metrics monitoring (including NVIDIA GPU metrics via DCGM exporters), and visualization (Kibana) of the entire AI platform, supporting workloads using NVIDIA accelerated libraries and NIM.

  • EY.ai Agentic Platform - EY.ai Agentic Platform is designed to deliver secure and scalable AI solutions for organizations, starting with tax, risk, and finance. It automates processes and enhances decision-making for better business outcomes, while maintaining data privacy and regulatory compliance. The platform will be built on NVIDIA AI Enterprise software, NIM microservices, and the NeMo Framework, trained on EY’s curated data and shaped by EY’s deep domain expertise. EY.ai Agentic Platform with the NVIDIA Enterprise AI Factory delivers the data security, control, and low latency that enterprises need most in their transformation journey.

  • H2O.ai - H2O.ai offers an on-premise AI platform (H2O AI Cloud, including Enterprise h2oGPTe and Driverless AI) designed for building and deploying both predictive and generative AI models, including agentic AI applications. Their software is optimized for NVIDIA GPUs (leveraging NVIDIA RAPIDS and accelerated libraries) and supports Kubernetes for scalability. This enables enterprises to develop and operationalize AI agents that can utilize NVIDIA NIM for inference, all within their own data centers on NVIDIA hardware.

  • JFrog Artifactory - Artifactory is a universal artifact repository manager. In an AI Factory, this is crucial for managing the lifecycle of all binaries, including container images from NGC for AI applications and agents, Python packages, model files, and other dependencies. It provides a single source of truth for all build artifacts, supports versioning, and integrates with CI/CD tools to ensure reproducible and reliable builds and deployments. Its security features, particularly when combined with JFrog Xray, provide deep artifact analysis, vulnerability scanning, and license compliance, which are critical for maintaining a secure software supply chain. JFrog integrates with NVIDIA NIM by embedding NIM microservices and models into Artifactory’s unified artifact management framework, enabling centralized governance, secure distribution, and streamlined DevSecOps workflows post-organizational approval.

  • Nutanix Enterprise AI - Nutanix Enterprise AI provides a full-stack, on-premise AI software platform built on the Nutanix Cloud Platform (NCP), often incorporating their “GPT-in-a-Box” concept. Beyond core HCI, it offers integrated MLOps capabilities, tools for managing large language models, and simplified deployment of AI workloads. It’s designed in partnership with NVIDIA to run NVIDIA AI Enterprise software, including NIMs and accelerated libraries, on NVIDIA-Certified Systems or systems with NVIDIA GPUs, providing a streamlined path for developers to build and deploy AI agents.

  • OpenShift AI - OpenShift AI extends the core OpenShift Container Platform with a dedicated platform for on-premise AI/ML development and deployment. It provides data scientists and developers with integrated tools for the entire model lifecycle, including Jupyter notebooks, model training services, model serving capabilities (integrating with NVIDIA NIM), and monitoring tools. Its value lies in streamlining AI workflows and providing a consistent open-source environment for developing AI agents on NVIDIA-accelerated infrastructure, leveraging NVIDIA AI Enterprise and its libraries.

  • SuperAnnotate - SuperAnnotate provides a comprehensive data annotation platform that supports on-premise data storage and workflows, crucial for AI data preparation. It integrates with NVIDIA technologies like NVIDIA NeMo Evaluator, allowing AI teams to incorporate both human and AI-assisted (LLM-as-a-judge) evaluation for data quality and model assessment. This supports the development of high-quality datasets for training models that will be accelerated by NVIDIA hardware and potentially served via NIM.

  • Unstructured.io - Through its integration with NVIDIA NeMo Retriever Extraction, Unstructured enables high-performance processing of multimodal content—such as text, tables, and charts—from large-scale, complex documents like enterprise PDFs. This collaboration empowers enterprises to prepare vast and varied data efficiently for AI agents and RAG systems, leveraging the speed and accuracy of NVIDIA-accelerated libraries and hardware to meet the demands of scalable, high-performance AI deployments.

  • VMware Private AI Foundation with NVIDIA - VMware Private AI Foundation with NVIDIA allows enterprises to leverage their existing VMware infrastructure (vSphere, vCenter, Tanzu) to run AI/ML workloads using NVIDIA AI Enterprise and NVIDIA GPUs (including vGPU capabilities). This solution provides a familiar operational model for IT teams, enabling them to manage and scale AI applications alongside traditional enterprise workloads, with a focus on governance, security, and efficient resource utilization in virtualized environments.
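
As referenced in the Elastic entry above, vector retrieval for RAG reduces to a kNN query over stored embeddings. A minimal sketch with the elasticsearch Python client; the index name, field name, and query vector are placeholders:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# query_vector would come from an embedding model (e.g., a NeMo
# Retriever NIM); a fixed stub is used here for illustration.
query_vector = [0.1] * 768

resp = es.search(
    index="rag-chunks",
    knn={"field": "embedding", "query_vector": query_vector,
         "k": 5, "num_candidates": 50},
    source=["text"],
)
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["text"])
```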

Observability Partner Tools

  • Arize AI - Arize AI’s engineering platform offers AI observability and evaluation, integrating with NVIDIA NeMo microservices to enable enterprises to build reliable agentic AI. This collaboration creates an AI data flywheel, enhancing LLM performance by combining Arize’s evaluation tools with NVIDIA NeMo’s capabilities for model training, evaluation, and safety guardrails. Consequently, enterprises can automatically identify LLM failure modes, route complex cases for human feedback, and continuously refine their models through targeted fine-tuning. This process facilitates the development and deployment of accurate agentic AI systems, supported by Arize’s self-hosted deployment option for on-premise AI application management.

  • Datadog - Datadog is a comprehensive observability platform providing AI application monitoring and security insights. It offers broad visibility across infrastructure, including NVIDIA GPUs, applications, and logs. For an NVIDIA Enterprise AI Factory, Datadog can monitor the performance of deployed AI agents, track custom business metrics, provide distributed tracing to understand request flows through complex agent interactions, and offer security monitoring for production workloads. Its ability to correlate data from various sources, including OpenTelemetry (OTEL), helps in quickly identifying and resolving issues.

  • Dynatrace - Dynatrace delivers full-stack observability and AI-powered analytics to enterprises. It automatically discovers, maps, and monitors complex, dynamic cloud environments, providing real-time visibility into applications, infrastructure, user experience, and key business metrics. With its proprietary AI engine, Davis®, Dynatrace enables DevOps, SRE, and security teams to proactively detect anomalies, accelerate issue resolution, and optimize performance at scale. As an NVIDIA partner, Dynatrace also offers on-premise observability for Kubernetes clusters running NVIDIA GPUs, AI workloads, and applications leveraging NVIDIA AI Enterprise, NIM, and accelerated libraries.

  • Fiddler AI - Fiddler AI is an AI Observability and Model Performance Management (MPM) platform designed to help MLOps and data science teams monitor, explain, analyze, and improve production AI models. For an AI Factory deploying numerous agents powered by potentially complex models (including LLMs), Fiddler AI provides crucial capabilities for detecting model drift, data integrity issues, performance degradation, and biases. Its explainability features help understand model predictions, which is vital for debugging agent behavior and ensuring responsible AI practices. Integration with NVIDIA platforms can ensure that models running on NVIDIA hardware are effectively monitored for operational health and predictive performance.

  • Weights & Biases - Weights & Biases, an NVIDIA partner, provides powerful tools for visualizing, debugging, and iterating on AI/ML models, with W&B Weave being a primary component for these tasks. Weave allows AI developers to create dynamic, interactive dashboards and reports to deeply analyze model outputs, track predictions, compare model versions, and understand complex datasets. While W&B also supports experiment tracking (logging metrics, hyperparameters, and artifacts), its strength with Weave in providing rich, shareable insights into model behavior and data makes it invaluable for collaboration on complex AI agent development, debugging, and performance management within the NVIDIA Enterprise AI Factory.
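
Several of these platforms ingest GPU telemetry from the NVIDIA DCGM exporter, which publishes Prometheus-format metrics (by default on port 9400). The same feed can be read directly; a minimal sketch:

```python
import requests

# Print per-GPU utilization lines such as DCGM_FI_DEV_GPU_UTIL from the
# DCGM exporter's Prometheus endpoint (default port 9400).
metrics = requests.get("http://localhost:9400/metrics", timeout=5).text
for line in metrics.splitlines():
    if line.startswith("DCGM_FI_DEV_GPU_UTIL"):
        print(line)
```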

Security Partner Tools

  • ActiveFence - ActiveFence provides a Trust & Safety platform that connects to on-premise deployments and integrates with NVIDIA NeMo Guardrails, specializing in detecting and mitigating harmful content (e.g., hate speech, disinformation, CSAM) on online platforms. For an AI Factory deploying agents that interact with user-generated content or generate content themselves, ActiveFence helps ensure agent outputs are safe and compliant, protecting the integrity of applications running on NVIDIA hardware. A minimal NeMo Guardrails sketch follows this list.

  • CrowdStrike - CrowdStrike provides robust endpoint detection and response (EDR) and threat intelligence capabilities. In an AI Factory, CrowdStrike secures the hardware, containers, and AI application endpoints. Its cloud-native platform provides real-time visibility and threat hunting capabilities leveraging NVIDIA AI Enterprise, which are essential for securing the valuable IP (models, data) and the operational integrity of the AI platform.

  • Galileo - Galileo’s AI reliability platform enables enterprise-scale evaluation, iteration, monitoring, and protection of generative AI applications, with on-premises deployment. Its integration with NVIDIA NeMo microservices facilitates AI data flywheels for continuous optimization and high accuracy in agentic AI. This is achieved through comprehensive evaluation with NVIDIA NeMo Evaluator, assessing agent reasoning and awareness; real-time observability for production insights feeding the flywheel; and Galileo Protect with NVIDIA NeMo Guardrails for robust, low-latency safety measures against hallucinations and malicious inputs while ensuring compliance.

  • Securiti.ai - Securiti.ai offers an AI-powered Data+AI Security & Governance platform that can be deployed on-premise. For an AI Factory, it provides critical capabilities for discovering, classifying, and securing sensitive data used in AI model training and RAG pipelines, including those leveraging NVIDIA NIM and accelerated libraries. Its ability to enforce data privacy and governance policies across the AI lifecycle is crucial for responsible AI development on NVIDIA hardware.

  • Trend Micro - Trend Micro provides comprehensive cybersecurity solutions that can be deployed on-premise, offering protection for servers, containers, and networks within the AI Factory. Their solutions leverage NVIDIA AI and accelerated computing to help secure the underlying infrastructure (including NVIDIA-Certified systems) and workloads running NVIDIA AI Enterprise, NIM, and accelerated libraries from malware, vulnerabilities, and other threats, contributing to the overall security posture of the NVIDIA hardware-accelerated environment.
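
Both ActiveFence and Galileo build on NVIDIA NeMo Guardrails, whose Python entry point is compact. A minimal sketch, assuming a ./config directory containing a standard Guardrails config.yml and rail definitions:

```python
from nemoguardrails import LLMRails, RailsConfig

# Load rail definitions (models, input/output rails) from ./config.
config = RailsConfig.from_path("./config")
rails = LLMRails(config)

# Generation now passes through the configured safety rails.
response = rails.generate(messages=[
    {"role": "user", "content": "Summarize our refund policy."}
])
print(response["content"])
```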