Skip to main content
Ctrl+K
NVIDIA NeMo Microservices - Home NVIDIA NeMo Microservices - Home

NVIDIA NeMo Microservices

NVIDIA NeMo Microservices - Home NVIDIA NeMo Microservices - Home

NVIDIA NeMo Microservices

Table of Contents

About NeMo Microservices

  • Overview
  • Key Features
  • Concepts
    • Platform
    • Entities
    • Customization
    • Evaluation
    • Inference
    • Guardrails
  • Release Notes

Get Started

  • About Getting Started
  • Demo Cluster Setup on Minikube
    • Requirements
    • Set Up Using Deployment Scripts
    • Set Up Manually
  • Advanced Installation on Kubernetes
    • Install
    • Ingress Setup
    • Upgrade
    • Uninstall
  • Install the NeMo Microservices Python SDK
  • Beginner Tutorials
    • Deploy NIM
    • Customize and Evaluate LLMs
    • Add Safety Checks to LLMs
    • Use Llama Stack APIs

Jupyter Notebooks

  • Data Flywheel and Tool Calling

Manage Entities

  • About Managing Entities
  • Tutorials
    • Set Up Organizational Entities
    • Create Dataset Files
  • Namespaces
    • Create Namespace
    • Update Namespace
    • Get Namespace
    • List Namespaces
    • Delete Namespace
  • Projects
    • Create Project
    • Update Project
    • Get Project
    • List Projects
    • Delete Project
  • Datasets
    • Create Dataset
    • Get Dataset
    • Update Dataset
    • List Datasets
    • Delete Dataset
  • Models
    • Get Details of a Model
    • Update Model
    • List Models
    • Delete Model
  • Entity Fields Reference

Fine-Tune

  • About Fine-Tuning
  • Tutorials
    • Format Training Dataset
    • Start a LoRA Model Customization Job
    • Start a Full SFT Customization Job
    • Start a Knowledge Distillation (KD) Customization Job
    • Checking Your Customization Job Metrics
    • Optimize for Tokens/GPU Throughput
  • Model Catalog
    • Llama Models
    • Llama Nemotron Models
    • Phi Models
  • Manage Targets
    • Create Target
    • Get Target Details
    • List Targets
    • Update Target
    • Delete Target
    • Target Values
  • Manage Configs
    • Create Config
    • Get Config Details
    • List Configs
    • Update Config
  • Manage Jobs
    • Create Job
    • Get Job Status
    • List Active Jobs
    • Cancel Job
    • Hyperparameter Options

Evaluate

  • About Evaluating
  • Install Evaluator
    • Helm Chart
    • Docker Compose
    • Chart Config Options
  • Tutorials
    • Run a Simple Job
  • Evaluation Types
    • Agentic
    • BFCL
    • BigCode
    • Custom
    • LM Harness
    • RAG
    • Retriever
  • Targets
    • Create Target
    • Delete Target
    • Data Source Targets
    • LLM Model Targets
    • Retriever Pipeline Targets
    • RAG Pipeline Targets
    • Target Schema
  • Configurations
    • Create Config
    • Delete Config
    • Config Schema
  • Jobs
    • Create Job
    • Get Job Details
    • Get Job Status
    • List Jobs
    • Get Job Results
    • Download Detailed Results
    • Get Job Logs
    • Delete Job
    • Job Target & Config Matrix
    • Job Durations
    • Job Schema
  • Live Evaluations
  • Custom Evaluations
    • Data Format
    • Task Templating
    • Metrics
    • Output Format & Results
  • Results
  • Filter and Sort Responses
  • Support Matrix

Deploy NIM and Run Inference

  • About Deploying and Running Inference on NIM
  • Tutorials
    • Deploy NIM
    • Run Inference on NIM
  • Manage NIM Deployments
    • Deploy NIM Microservice
    • Get NIM Deployment Details
    • List Deployments
    • Update Deployment
    • Delete NIM Deployment
    • Create Configuration
    • Get Configuration
    • List Configurations
    • Update Configuration
    • Delete Configuration
  • Run Inference on NIM
    • Health Check
    • List Models
    • Chat Completions
    • Completions
    • Embeddings

Manage Guardrails

  • About Guardrails
  • Terminology
  • Tutorials
    • Demo Configuration
    • Multiple NIM for LLMs
    • NemoGuard NIM
    • Multimodal Data
    • Injection Detection
    • Custom Dependencies
    • Custom HTTP Headers
    • Custom LLM Providers
    • Deploying with Docker
  • Manage Configurations
    • Configuration Store
    • Creating a Configuration
    • Listing Configurations
    • Getting a Configuration
    • Updating a Configuration
    • Deleting a Configuration
  • Manage Access to Models
  • Check a Guardrail
  • Inference with Guardrails
  • Streaming Output
  • Reference

Admin Setup

  • About Admin Setup
  • Helm Installation Scenarios
  • Helm Installation Overview
  • Install as Platform
    • Install
    • Ingress Setup
    • Upgrade
    • Uninstall
  • Install Individually
    • NeMo Data Store
    • NeMo Entity Store
    • NeMo Operator
    • NeMo Customizer
    • NeMo Evaluator
      • Helm Chart
      • Docker Compose
      • Chart Config Options
    • NeMo Guardrails
    • DGX Cloud Admission Controller
    • NeMo Deployment Management
    • NeMo NIM Proxy
  • Configure Models
  • Manage GPUs
    • Configure Cluster GPUs
    • Model Configurations Matrix
    • Troubleshooting GPU Jobs
  • Custom Resource Definitions
  • Manage Storage
    • Databases
      • PostgreSQL
      • Milvus
    • PVCs
      • AWS Peristent Volumes
      • Oracle Persistent Volumes
    • Object Storage
      • Amazon S3
    • Backup and Restore
  • Manage Secrets
    • Secrets for Accessing NGC Catalog
    • External Database Secrets
    • JSON Web Token Secrets
    • Object Store Secrets
    • MLFlow Customizer Secrets
    • Weights & Biases Keys
  • Open Telemetry Setup
  • Tenant Configuration Options
  • Security for NeMo Microservices

Reference

  • NeMo Microservice APIs
    • Platform
    • Entity Store
    • Customizer
    • Evaluator
    • Guardrails
    • NIM Proxy
    • Deployment Management
  • NeMo Microservices Python SDK
    • Client APIs
    • Resource APIs
      • Customization
        • Customization Configs Resource
        • Customization Resource
        • Customization Jobs Resource
        • Customization Targets Resource
      • Deployment
        • Deployment Resource
        • Deployment Configs Resource
        • Deployment Model Deployments Resource
      • Entity
        • Entity Datasets Resource
        • Entity Models Resource
        • Entity Namespaces Resource
        • Entity Projects Resource
      • Evaluation
        • Evaluation Resource
        • Evaluation Configs Resource
        • Evaluation Jobs Resource
        • Evaluation Results Resource
        • Evaluation Targets Resource
      • Guardrails
        • Guardrail Resource
        • Guardrail Configs Resource
        • Guardrail Models Resource
        • Guardrail Completions Resource
      • Inference
        • Inference Resource
        • Inference Models Resource
        • Inference Chat Resource
        • Inference Chat Completions Resource
        • Inference Completions Resource
        • Inference Embeddings Resource
    • Type APIs
      • Chat Types
        • ChatCompletionResponse
        • ChatCompletionStreamResponse
        • CompletionCreateParams
      • Customization Types
        • ConfigCreateParams
        • ConfigUpdateParams
        • CustomizationJob
        • DatasetCu
        • DatasetParameters
        • Hyperparameters
        • JobEntry
        • JobCreateParams
        • LoraParameters
        • PTuningParameters
        • SftParameters
        • WandBIntegration
      • Deployment Types
        • ConfigCreateParams for Deployment
        • ConfigListParams
        • ConfigUpdateParams for Deployment
        • DeploymentConfigFilterParam
        • DeploymentConfigsPage
        • ModelDeployment
        • ModelDeploymentCreateParams
        • ModelDeploymentFilterParam
        • ModelDeploymentListParams
        • ModelDeploymentStatusDetails
        • ModelDeploymentUpdateParams
        • ModelDeploymentsPage
      • Evaluation Types
        • ConfigCreateParams for Evaluation
        • ConfigListParams for Evaluation
        • ConfigUpdateParams for Evaluation
        • EvaluationConfigsPage
        • EvaluationJob
        • EvaluationJobFilter
        • EvaluationJobFilterParam
        • EvaluationJobsPage
        • EvaluationResultFilter
        • EvaluationResultFilterParam
        • EvaluationResultsPage
        • EvaluationTargetsPage
        • GroupResultParam
        • JobCreateParams for Evaluation
        • JobListParams
        • MetricResultParam
        • ResultCreateParams
        • ResultListParams
        • ResultUpdateParams
        • TargetCreateParams
        • TargetListParams
        • TargetUpdateParams
        • TaskResultParam
      • Guardrail Types
        • CompletionCreateParams
        • ConfigCreateParams
        • ConfigListParams
        • ConfigUpdateParams
        • GuardrailCompletionResponse
        • GuardrailCompletionStreamResponse
        • GuardrailConfigFilterParam
        • GuardrailConfigsPage
        • ModelListParams
        • ModelsResponse
        • ModelsResponseEntry
      • Shared Types
        • ActionRails
        • APIEndpointData
        • APIEndpointFormat
        • ArtifactStatus
        • AutoAlignOptions
        • AutoAlignRailConfig
        • BackendEngineType
        • ChatCompletionAssistantMessageParam
        • ChatCompletionContentPartImageParam
        • ChatCompletionContentPartTextParam
        • ChatCompletionFunctionMessageParam
        • ChatCompletionMessageToolCallParam
        • ChatCompletionMessageToolCall
        • ChatCompletionMessage
        • ChatCompletionResponseChoice
        • ChatCompletionResponseStreamChoice
        • ChatCompletionSystemMessageParam
        • ChatCompletionTokenLogprob
        • ChatCompletionToolMessageParam
        • ChatCompletionUserMessageParam
        • ChoiceDeltaFunctionCall
        • ChoiceDeltaToolCallFunction
        • ChoiceDeltaToolCall
        • ChoiceLogprobs
        • ClavataRailConfig
        • ClavataRailOptions
        • CompletionResponseChoice
        • CompletionResponseStreamChoice
        • ConfigDataParam
        • ConfigData
        • DeleteResponse
        • DeltaMessage
        • DialogRails
        • ErrorResponse
        • FactCheckingRailConfig
        • FiddlerGuardrails
        • FinetuningType
        • FunctionCall
        • Function
        • GenericSortField
        • GuardrailConfigParam
        • GuardrailConfig
        • GuardrailModel
        • HTTPValidationError
        • ImageURL
        • InferenceParams
        • InjectionDetection
        • InputRails
        • Instruction
        • JailbreakDetectionConfig
        • JobStatus
        • LogProbs
        • LoraFinetuningData
        • MessageTemplate
        • ModelArtifact
        • ModelPrecision
        • ModelSpec
        • OutputRailsStreamingConfig
        • OutputRails
        • Ownership
        • PTuningFinetuningData
        • PaginationData
        • ParameterEfficientFinetuningData
        • PatronusEvaluateAPIParams
        • PatronusEvaluateConfigParam
        • PatronusEvaluateConfig
        • PatronusEvaluationSuccessStrategy
        • PatronusRailConfigParam
        • PatronusRailConfig
        • PrivateAIDetectionOptions
        • PrivateAIDetection
        • PromptData
        • RailsConfigDataParam
        • RailsConfigData
        • RailsParam
        • Rails
        • ReasoningModelConfig
        • ReasoningParams
        • RetrievalRails
        • SensitiveDataDetectionOptions
        • SensitiveDataDetection
        • SingleCallConfig
        • TaskPrompt
        • TopLogprob
        • UsageInfo
        • UserMessagesConfig
        • ValidationError
        • VersionTag
      • Shared Params Types
        • ActionRails
        • APIEndpointData
        • APIEndpointFormat
        • ArtifactStatus
        • AutoAlignOptions
        • AutoAlignRailConfig
        • BackendEngineType
        • ChatCompletionAssistantMessageParam
        • ChatCompletionContentPartImageParam
        • ChatCompletionContentPartTextParam
        • ChatCompletionFunctionMessageParam
        • ChatCompletionMessageToolCallParam
        • ChatCompletionSystemMessageParam
        • ChatCompletionToolMessageParam
        • ChatCompletionUserMessageParam
        • ClavataRailConfig
        • ClavataRailOptions
        • ConfigDataParam
        • DialogRails
        • FactCheckingRailConfig
        • FiddlerGuardrails
        • FinetuningType
        • Function
        • FunctionCall
        • GenericSortField
        • GuardrailConfigParam
        • GuardrailModel
        • ImageURL
        • InferenceParams
        • InjectionDetection
        • InputRails
        • Instruction
        • JailbreakDetectionConfig
        • JobStatus
        • LoraFinetuningData
        • MessageTemplate
        • ModelArtifact
        • ModelPrecision
        • ModelSpec
        • OutputRails
        • OutputRailsStreamingConfig
        • Ownership
        • ParameterEfficientFinetuningData
        • PatronusEvaluateAPIParams
        • PatronusEvaluateConfigParam
        • PatronusEvaluationSuccessStrategy
        • PatronusRailConfigParam
        • PTuningFinetuningData
        • PrivateAIDetection
        • PrivateAIDetectionOptions
        • PromptData
        • RailsConfigDataParam
        • RailsParam
        • ReasoningModelConfig
        • ReasoningParams
        • RetrievalRails
        • SensitiveDataDetection
        • SensitiveDataDetectionOptions
        • SingleCallConfig
        • TaskPrompt
        • UserMessagesConfig
        • VersionTag
      • Common Types
        • ActivatedRail
        • ArtifactStatusDe
        • BackendEngineTypeDe
        • BaseModelFilterParam
        • BaseModelFilter
        • CachedOutputsDataParam
        • CachedOutputsData
        • CompletionCreateParams
        • CompletionResponse
        • CompletionStreamResponse
        • CreateEmbeddingResponse
        • CreatedAtFilterParam
        • CreatedAtFilter
        • CustomizationConfigParam
        • CustomizationConfig
        • CustomizationTargetParam
        • CustomizationTarget
        • CustomizationTrainingOptionParam
        • CustomizationTrainingOption
        • DatasetCreateParams
        • DatasetEvParam
        • DatasetEv
        • DatasetFilterParam
        • DatasetFilter
        • DatasetListParams
        • DatasetSortField
        • DatasetUpdateParams
        • Dataset
        • DatasetsPage
        • DateTimeFilterParam
        • DateTimeFilter
        • DeploymentConfigParam
        • DeploymentConfig
        • EmbeddingCreateParams
        • Embedding
        • EvaluationConfigFilterParam
        • EvaluationConfigFilter
        • EvaluationConfigParam
        • EvaluationConfig
        • EvaluationLiveParams
        • EvaluationParamsParam
        • EvaluationParams
        • EvaluationResult
        • EvaluationStatusDetailsParam
        • EvaluationStatusDetails
        • EvaluationTargetFilterParam
        • EvaluationTargetFilter
        • EvaluationTargetParam
        • EvaluationTarget
        • ExecutedAction
        • ExternalEndpointConfigParam
        • ExternalEndpointConfig
        • FinetuningTypeDe
        • GenerationLogOptionsParam
        • GenerationLog
        • GenerationOptionsParam
        • GenerationRailsOptionsParam
        • GenerationStats
        • GroupConfigParam
        • GroupConfig
        • GroupResult
        • GuardrailCheckParams
        • GuardrailCheckResponse
        • GuardrailConfigDeParam
        • GuardrailConfigDe
        • GuardrailsDataParam
        • GuardrailsData
        • LabelSelectorRequirementParam
        • LabelSelectorRequirement
        • LabelSelectorTermParam
        • LabelSelectorTerm
        • LiveEvaluation
        • LlmCallInfo
        • MetricConfigParam
        • MetricConfig
        • MetricResult
        • ModelArtifactDeParam
        • ModelArtifactDe
        • ModelCreateParams
        • ModelDeParam
        • ModelDe
        • ModelEvParam
        • ModelEv
        • ModelFilterParam
        • ModelFilter
        • ModelListParams
        • ModelParam
        • ModelPeftFilterParam
        • ModelPeftFilter
        • ModelPrecisionDe
        • ModelSortField
        • ModelSpecDeParam
        • ModelSpecDe
        • ModelUpdateParams
        • Model
        • ModelsPage
        • NamespaceCreateParams
        • NamespaceListParams
        • NamespaceUpdateParams
        • Namespace
        • NamespacesPage
        • NIMDeploymentConfigParam
        • NIMDeploymentConfig
        • NodeAffinityParam
        • NodeAffinity
        • NodeSelectorParam
        • NodeSelectorTermParam
        • NodeSelectorTerm
        • NodeSelector
        • ParameterEfficientFinetuningDataDeParam
        • ParameterEfficientFinetuningDataDe
        • PreferredSchedulingTermParam
        • PreferredSchedulingTerm
        • ProjectCreateParams
        • ProjectFilterParam
        • ProjectFilter
        • ProjectListParams
        • ProjectSortField
        • ProjectUpdateParams
        • Project
        • ProjectsPage
        • PromptDataDeParam
        • PromptDataDe
        • RagPipelineDataParam
        • RagPipelineData
        • RagTargetParam
        • RagTarget
        • RailStatus
        • RetrieverPipelineDataParam
        • RetrieverPipelineData
        • RetrieverTargetParam
        • RetrieverTarget
        • ScoreParam
        • ScoreStatsParam
        • ScoreStats
        • Score
        • StatusEnum
        • TargetStatus
        • TargetType
        • TaskConfigParam
        • TaskConfig
        • TaskResult
        • TaskStatus
        • TolerationParam
        • Toleration
        • TrainingPodSpecParam
        • TrainingPodSpec
        • TrainingType
  • Requirements
  • NeMo Helm Chart Values
    • Platform
    • DGX Cloud Admission Controller
  • Troubleshooting Guide
    • Troubleshoot Customizer
    • Troubleshoot Evaluator
    • Troubleshoot Guardrails
    • Troubleshoot Setup
  • NVIDIA Distribution in Llama Stack
  • Governing Terms

Resources

  • OSS License Acknowledgements
  • NeMo Microservices Python SDK
  • Resource APIs
  • Inference Resource APIs

Inference Resource APIs#

The Inference Resources are for creating and managing chat sessions through inference microservices.

  • Inference Resource
  • Inference Models Resource
  • Inference Chat Resource
  • Inference Chat Completions Resource
  • Inference Completions Resource
  • Inference Embeddings Resource

previous

Guardrail Completions Resource

next

Inference Resource

NVIDIA NVIDIA
Privacy Policy | Manage My Privacy | Do Not Sell or Share My Data | Terms of Service | Accessibility | Corporate Policies | Product Security | Contact

Copyright © 2024-2025, NVIDIA Corporation.

Last updated on Jul 11, 2025.