- Introduction
- Release Notes
- Getting Started
- Deployment Guide
- Tutorials
- Multi-node Deployment
- Deploying with Helm
- Configuring a NIM
- Model Profiles
- Benchmarking
- Models
- Support Matrix
- Hardware
- Software
- GPUs
- Supported Models
- Code Llama 13B Instruct
- Code Llama 34B Instruct
- Code Llama 70B Instruct
- (Meta) Llama 2 7B Chat
- (Meta) Llama 2 13B Chat
- (Meta) Llama 2 70B Chat
- Llama 3 SQLCoder 8B Instruct
- Llama 3 Swallow 70B Instruct V0.1
- Llama 3 Taiwan 70B Instruct
- Llama 3.1 8B Base
- Llama 3.1 8B Instruct
- Llama 3.1 70B Instruct
- Llama 3.1 405B Instruct
- Llama 3.1 Nemotron 70B Instruct
- Meta Llama 3 8B Instruct
- Meta Llama 3 70B Instruct
- Mistral 7B Instruct V0.3
- Mistral NeMo Minitron 8B 8K Instruct
- Mistral NeMo 12B Instruct
- Mixtral 8x7B Instruct V0.1
- Mixtral 8x22B Instruct V0.1
- Nemotron 4 340B Instruct
- Nemotron 4 340B Instruct 128K
- Nemotron 4 340B Reward
- Phi 3 Mini 4K Instruct
- Phind Codellama 34B V2
- Examples with system role
- API Reference
- Function Calling
- Parameters
- tool_choice
- Example Workflows
- Parameters
- Using Reward Models
- Llama Stack API (Experimental)
- Utilities
- Fine-tuned model support
- Observability
- Structured Generation
- Parameter-Efficient Fine-Tuning
- KV Cache Reuse (a.k.a. prefix caching)
- Acknowledgements
- accelerate
- aiohttp
- aiosignal
- annotated-types
- anyio
- apscheduler
- async-timeout
- attrs
- build
- certifi
- charset-normalizer
- click
- cmake
- coloredlogs
- datasets
- dill
- diskcache
- dnspython
- einops
- email_validator
- exceptiongroup
- fastapi
- fastapi-cli
- filelock
- flash-attn
- frozenlist
- fsspec
- h11
- h5py
- httpcore
- httptools
- httpx
- huggingface-hub
- humanfriendly
- idna
- importlib_metadata
- interegular
- jinja2
- joblib
- jsonschema
- jsonschema-specifications
- lark
- llvmlite
- lm-format-enforcer
- markdown-it-py
- markupsafe
- mdurl
- ml-dtypes
- mpi4py
- mpmath
- msgpack
- multidict
- multiprocess
- nest-asyncio
- networkx
- ninja
- numba
- numpy
- onnx
- optimum
- orjson
- outlines
- packaging
- pandas
- polygraphy
- prometheus_client
- protobuf
- psutil
- pulp
- py-cpuinfo
- pyarrow
- pyarrow-hotfix
- pydantic
- pydantic_core
- pygments
- pynvml
- python-dateutil
- python-dotenv
- python-multipart
- pytz
- pyyaml
- ray
- referencing
- regex
- requests
- rich
- rpds-py
- scipy
- shellingham
- six
- sniffio
- starlette
- strenum
- sympy
- tensorstore
- tiktoken
- tomli
- torch
- tqdm
- transformers
- typer
- typing_extensions
- tzdata
- tzlocal
- ujson
- urllib3
- uvicorn
- uvloop
- vllm
- watchfiles
- websockets
- xformers
- xxhash
- yarl
- zipp
- Eula
NVIDIA NIM for Large Language Models (1.2.0)