OpenAI Compatible Services#
Connect to hosted model endpoints that implement the OpenAI API format, such as build.nvidia.com.
Before You Start#
Rate Limits#
OpenAI API compatible services typically have rate limits on:
- Number of requests per minute 
- Number of tokens per minute 
- Total tokens per request 
For high-volume data generation, consider using NeMo Deploy to host your own models without rate limits.
Usage#
The following code demonstrates how to connect to build.nvidia.com to query Mixtral 8x7B Instruct:
from openai import OpenAI
from nemo_curator import OpenAIClient
openai_client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key="<insert NVIDIA API key>",
)
client = OpenAIClient(openai_client)
responses = client.query_model(
    model="mistralai/mixtral-8x7b-instruct-v0.1",
    messages=[
        {
            "role": "user",
            "content": "Write a limerick about the wonders of GPU computing.",
        }
    ],
    temperature=0.2,
    top_p=0.7,
    max_tokens=1024,
)
print(responses[0])
# Output:
# A GPU with numbers in flight, Brings joy to programmers late at night.
# With parallel delight, Solving problems, so bright,
# In the realm of computing, it's quite a sight!