Guardrail Resource#

class nemo_microservices.resources.guardrail.GuardrailResource(client: NeMoMicroservices)#

Bases: SyncAPIResource

property completions: CompletionsResource#
property chat: ChatResource#
property configs: ConfigsResource#
property models: ModelsResource#
property with_raw_response: GuardrailResourceWithRawResponse#

This property can be used as a prefix for any HTTP method call to return the raw response object instead of the parsed content.

For more information, see https://docs.nvidia.com/nemo/microservices/latest/pysdk/index.html#accessing-raw-response-data-e-g-headers
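For example, a minimal sketch of reading raw response data (the base URL and model name are placeholder assumptions, not values defined on this page):

    from nemo_microservices import NeMoMicroservices

    client = NeMoMicroservices(base_url="http://localhost:8080")  # placeholder URL
    response = client.guardrail.with_raw_response.check(
        model="meta/llama-3.1-8b-instruct",  # placeholder model name
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(response.headers)    # raw HTTP headers
    result = response.parse()  # parsed GuardrailCheckResponse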

property with_streaming_response: GuardrailResourceWithStreamingResponse#

An alternative to .with_raw_response that doesn’t eagerly read the response body.

For more information, see https://docs.nvidia.com/nemo/microservices/latest/pysdk/index.html#with_streaming_response
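For example, a sketch under the same placeholder assumptions as above; with_streaming_response is used as a context manager so the connection is released when the block exits:

    with client.guardrail.with_streaming_response.check(
        model="meta/llama-3.1-8b-instruct",  # placeholder model name
        messages=[{"role": "user", "content": "Hello"}],
    ) as response:
        print(response.headers)             # headers available immediately
        for line in response.iter_lines():  # body is read lazily
            print(line)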

check(
*,
messages: Iterable[guardrail_check_params.Message],
model: str,
best_of: int | NotGiven = NOT_GIVEN,
echo: bool | NotGiven = NOT_GIVEN,
frequency_penalty: float | NotGiven = NOT_GIVEN,
function_call: str | object | NotGiven = NOT_GIVEN,
guardrails: GuardrailsDataParam | NotGiven = NOT_GIVEN,
ignore_eos: bool | NotGiven = NOT_GIVEN,
logit_bias: Dict[str, float] | NotGiven = NOT_GIVEN,
logprobs: bool | NotGiven = NOT_GIVEN,
max_tokens: int | NotGiven = NOT_GIVEN,
n: int | NotGiven = NOT_GIVEN,
presence_penalty: float | NotGiven = NOT_GIVEN,
response_format: Dict[str, str] | NotGiven = NOT_GIVEN,
seed: int | NotGiven = NOT_GIVEN,
stop: List[str] | str | NotGiven = NOT_GIVEN,
stream: bool | NotGiven = NOT_GIVEN,
suffix: str | NotGiven = NOT_GIVEN,
system_fingerprint: str | NotGiven = NOT_GIVEN,
temperature: float | NotGiven = NOT_GIVEN,
tool_choice: str | object | NotGiven = NOT_GIVEN,
tools: List[str] | NotGiven = NOT_GIVEN,
top_logprobs: int | NotGiven = NOT_GIVEN,
top_p: float | NotGiven = NOT_GIVEN,
user: str | NotGiven = NOT_GIVEN,
vision: bool | NotGiven = NOT_GIVEN,
extra_headers: Headers | None = None,
extra_query: Query | None = None,
extra_body: Body | None = None,
timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
) → GuardrailCheckResponse#

Runs a chat completion for the provided conversation with guardrail checks applied.

Parameters:
  • messages – A list of messages comprising the conversation so far

  • model – The model to use for completion. Must be one of the available models.

  • best_of – Not supported. Generates best_of completions server-side and returns the “best” (the one with the highest log probability per token). Results cannot be streamed. When used with n, best_of controls the number of candidate completions and n specifies how many to return; best_of must be greater than n.

  • echo – Not supported. If echo is true, the response will include the prompt and optionally its token IDs and logprobs.

  • frequency_penalty – Positive values penalize new tokens based on their existing frequency in the text.

  • function_call – Not supported. Deprecated in favor of tool_choice. ‘none’ means the model will not call a function and instead generates a message. ‘auto’ means the model can pick between generating a message or calling a function. Specifying a particular function via {‘name’: ‘my_function’} forces the model to call that function.

  • guardrails – Guardrails-specific options for the request (see the usage sketch after this parameter list).

  • ignore_eos – Ignore the end-of-sequence (EOS) token during generation.

  • logit_bias – Not supported. Modifies the likelihood of specified tokens appearing in the completion.

  • logprobs – Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of the message.

  • max_tokens – The maximum number of tokens that can be generated in the chat completion.

  • n – How many chat completion choices to generate for each input message.

  • presence_penalty – Positive values penalize new tokens based on whether they appear in the text so far.

  • response_format – Format of the response; set to {‘type’: ‘json_object’} to force the model to output valid JSON.

  • seed – If specified, the system attempts to sample deterministically, so repeated requests with the same seed and parameters should return the same result.

  • stop – Up to 4 sequences where the API will stop generating further tokens.

  • stream – If set, partial message deltas will be sent as they become available, as in ChatGPT.

  • suffix – Not supported. The suffix that comes after a completion of inserted text.

  • system_fingerprint – Represents the backend configuration that the model runs with. Used with seed for determinism.

  • temperature – What sampling temperature to use, between 0 and 2.

  • tool_choice – Not supported. Preferred over the deprecated function_call. Controls which (if any) function is called by the model.

  • tools – A list of tools the model may call.

  • top_logprobs – The number of most likely tokens to return at each token position.

  • top_p – An alternative to sampling with temperature, called nucleus sampling.

  • user – Not supported. A unique identifier representing your end user.

  • vision – Whether this is a vision-capable request with image inputs.

  • extra_headers – Send extra headers

  • extra_query – Add additional query parameters to the request

  • extra_body – Add additional JSON properties to the request

  • timeout – Override the client-level default timeout for this request, in seconds
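A hedged end-to-end sketch of check(); the base URL, model name, and guardrails config ID below are illustrative assumptions, not values defined on this page:

    from nemo_microservices import NeMoMicroservices

    client = NeMoMicroservices(base_url="http://localhost:8080")  # placeholder URL
    response = client.guardrail.check(
        model="meta/llama-3.1-8b-instruct",   # placeholder model name
        messages=[{"role": "user", "content": "How do I reset my password?"}],
        guardrails={"config_id": "default"},  # assumed GuardrailsDataParam shape
        temperature=0.2,
        max_tokens=256,
    )
    print(response)  # GuardrailCheckResponse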

create_from_dict(data: dict[str, object]) → object#

class nemo_microservices.resources.guardrail.AsyncGuardrailResource(client: AsyncNeMoMicroservices)#

Bases: AsyncAPIResource

property completions: AsyncCompletionsResource#
property chat: AsyncChatResource#
property configs: AsyncConfigsResource#
property models: AsyncModelsResource#
property with_raw_response: AsyncGuardrailResourceWithRawResponse#

This property can be used as a prefix for any HTTP method call to return the raw response object instead of the parsed content.

For more information, see https://docs.nvidia.com/nemo/microservices/latest/pysdk/index.html#accessing-raw-response-data-e-g-headers

property with_streaming_response: AsyncGuardrailResourceWithStreamingResponse#

An alternative to .with_raw_response that doesn’t eagerly read the response body.

For more information, see https://docs.nvidia.com/nemo/microservices/latest/pysdk/index.html#with_streaming_response
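The async properties mirror their sync counterparts. For example, a sketch of streaming inside an async context (placeholder model name):

    async with client.guardrail.with_streaming_response.check(
        model="meta/llama-3.1-8b-instruct",  # placeholder model name
        messages=[{"role": "user", "content": "Hello"}],
    ) as response:
        print(response.headers)                   # headers available immediately
        async for line in response.iter_lines():  # body is read lazily
            print(line)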

async check(
*,
messages: Iterable[guardrail_check_params.Message],
model: str,
best_of: int | NotGiven = NOT_GIVEN,
echo: bool | NotGiven = NOT_GIVEN,
frequency_penalty: float | NotGiven = NOT_GIVEN,
function_call: str | object | NotGiven = NOT_GIVEN,
guardrails: GuardrailsDataParam | NotGiven = NOT_GIVEN,
ignore_eos: bool | NotGiven = NOT_GIVEN,
logit_bias: Dict[str, float] | NotGiven = NOT_GIVEN,
logprobs: bool | NotGiven = NOT_GIVEN,
max_tokens: int | NotGiven = NOT_GIVEN,
n: int | NotGiven = NOT_GIVEN,
presence_penalty: float | NotGiven = NOT_GIVEN,
response_format: Dict[str, str] | NotGiven = NOT_GIVEN,
seed: int | NotGiven = NOT_GIVEN,
stop: List[str] | str | NotGiven = NOT_GIVEN,
stream: bool | NotGiven = NOT_GIVEN,
suffix: str | NotGiven = NOT_GIVEN,
system_fingerprint: str | NotGiven = NOT_GIVEN,
temperature: float | NotGiven = NOT_GIVEN,
tool_choice: str | object | NotGiven = NOT_GIVEN,
tools: List[str] | NotGiven = NOT_GIVEN,
top_logprobs: int | NotGiven = NOT_GIVEN,
top_p: float | NotGiven = NOT_GIVEN,
user: str | NotGiven = NOT_GIVEN,
vision: bool | NotGiven = NOT_GIVEN,
extra_headers: Headers | None = None,
extra_query: Query | None = None,
extra_body: Body | None = None,
timeout: float | httpx.Timeout | None | NotGiven = NOT_GIVEN,
) → GuardrailCheckResponse#

Runs a chat completion for the provided conversation with guardrail checks applied.

Parameters:
  • messages – A list of messages comprising the conversation so far

  • model – The model to use for completion. Must be one of the available models.

  • best_of – Not supported. Generates best_of completions server-side and returns the “best” (the one with the highest log probability per token). Results cannot be streamed. When used with n, best_of controls the number of candidate completions and n specifies how many to return; best_of must be greater than n.

  • echo – Not supported. If echo is true, the response will include the prompt and optionally its token IDs and logprobs.

  • frequency_penalty – Positive values penalize new tokens based on their existing frequency in the text.

  • function_call – Not supported. Deprecated in favor of tool_choice. ‘none’ means the model will not call a function and instead generates a message. ‘auto’ means the model can pick between generating a message or calling a function. Specifying a particular function via {‘name’: ‘my_function’} forces the model to call that function.

  • guardrails – Guardrails-specific options for the request (see the usage sketch after this parameter list).

  • ignore_eos – Ignore the end-of-sequence (EOS) token during generation.

  • logit_bias – Not supported. Modifies the likelihood of specified tokens appearing in the completion.

  • logprobs – Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of the message.

  • max_tokens – The maximum number of tokens that can be generated in the chat completion.

  • n – How many chat completion choices to generate for each input message.

  • presence_penalty – Positive values penalize new tokens based on whether they appear in the text so far.

  • response_format – Format of the response; set to {‘type’: ‘json_object’} to force the model to output valid JSON.

  • seed – If specified, the system attempts to sample deterministically, so repeated requests with the same seed and parameters should return the same result.

  • stop – Up to 4 sequences where the API will stop generating further tokens.

  • stream – If set, partial message deltas will be sent as they become available, as in ChatGPT.

  • suffix – Not supported. The suffix that comes after a completion of inserted text.

  • system_fingerprint – Represents the backend configuration that the model runs with. Used with seed for determinism.

  • temperature – What sampling temperature to use, between 0 and 2.

  • tool_choice – Not supported. Preferred over the deprecated function_call. Controls which (if any) function is called by the model.

  • tools – A list of tools the model may call.

  • top_logprobs – The number of most likely tokens to return at each token position.

  • top_p – An alternative to sampling with temperature, called nucleus sampling.

  • user – Not supported. A unique identifier representing your end user.

  • vision – Whether this is a vision-capable request with image inputs.

  • extra_headers – Send extra headers

  • extra_query – Add additional query parameters to the request

  • extra_body – Add additional JSON properties to the request

  • timeout – Override the client-level default timeout for this request, in seconds
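A hedged async sketch of check(); the base URL, model name, and guardrails config ID are illustrative assumptions, not values defined on this page:

    import asyncio

    from nemo_microservices import AsyncNeMoMicroservices

    async def main() -> None:
        client = AsyncNeMoMicroservices(base_url="http://localhost:8080")  # placeholder URL
        response = await client.guardrail.check(
            model="meta/llama-3.1-8b-instruct",   # placeholder model name
            messages=[{"role": "user", "content": "How do I reset my password?"}],
            guardrails={"config_id": "default"},  # assumed GuardrailsDataParam shape
        )
        print(response)  # GuardrailCheckResponse

    asyncio.run(main())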

create_from_dict(data: dict[str, object]) → object#