OpenAI Compatible Services#
Connect to hosted model endpoints that implement the OpenAI API format, such as build.nvidia.com.
Before You Start#
Rate Limits#
OpenAI API compatible services typically have rate limits on:

- Number of requests per minute
- Number of tokens per minute
- Total tokens per request
For high-volume data generation, consider using NeMo Deploy to host your own models without rate limits.
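When a hosted endpoint does throttle you, a common client-side mitigation is to retry with exponential backoff. The helper below is an illustrative sketch, not part of NeMo Curator or the OpenAI SDK; the `with_backoff` name and its parameters are assumptions for the example.

```python
import random
import time


def with_backoff(fn, max_retries=5, base_delay=1.0, retryable=(Exception,)):
    """Call fn(), retrying on retryable errors with exponential backoff plus jitter.

    Illustrative helper: wrap a rate-limited API call, e.g.
    with_backoff(lambda: client.query_model(...)).
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except retryable:
            if attempt == max_retries - 1:
                raise
            # Sleep base_delay * 2^attempt seconds, with jitter to avoid
            # synchronized retries across workers.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

In practice you would narrow `retryable` to the SDK's rate-limit error type (for the OpenAI SDK, `openai.RateLimitError`) so that other failures surface immediately.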
Usage#
The following code demonstrates how to connect to build.nvidia.com to query Mixtral 8x7B Instruct:
```python
from openai import OpenAI

from nemo_curator import OpenAIClient

# Wrap an OpenAI SDK client pointed at the build.nvidia.com endpoint.
openai_client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key="<insert NVIDIA API key>",
)
client = OpenAIClient(openai_client)

responses = client.query_model(
    model="mistralai/mixtral-8x7b-instruct-v0.1",
    messages=[
        {
            "role": "user",
            "content": "Write a limerick about the wonders of GPU computing.",
        }
    ],
    temperature=0.2,
    top_p=0.7,
    max_tokens=1024,
)
print(responses[0])
# Output:
# A GPU with numbers in flight, Brings joy to programmers late at night.
# With parallel delight, Solving problems, so bright,
# In the realm of computing, it's quite a sight!
```
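For data-generation workloads you typically issue many single-turn requests against the same client. One way to keep that tidy is a small helper that builds the OpenAI-style `messages` payload; the `chat_messages` function below is a hypothetical convenience, not part of NeMo Curator.

```python
def chat_messages(prompt, system=None):
    """Build an OpenAI-style messages list for a single-turn chat request.

    Hypothetical helper: the OpenAI chat format expects a list of
    {"role": ..., "content": ...} dicts, with an optional system message first.
    """
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    return messages


# Sketch of reuse across many prompts (client as constructed above):
# responses = [
#     client.query_model(
#         model="mistralai/mixtral-8x7b-instruct-v0.1",
#         messages=chat_messages(p),
#     )[0]
#     for p in prompts
# ]
```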