NVIDIA Agent Intelligence Toolkit API Server Endpoints#

There are currently four workflow transactions that can be initiated using HTTP or WebSocket when the AIQ toolkit server is running: generate non-streaming, generate streaming, chat non-streaming, and chat streaming. The following interfaces let you interact with your running workflows.

Generate Interface: Uses the transaction schema defined by your workflow. While the server is running, the interface documentation is available through Swagger at http://localhost:8000/docs.
Chat Interface: The OpenAI API documentation describes the chat formats compatible with the AIQ toolkit server.

Generate Non-Streaming Transaction#

Route: /generate
Description: A non-streaming transaction that waits until all workflow data is available before sending the result back to the client. The transaction schema is defined by the workflow.

HTTP Request Example:

curl --request POST \
  --url http://localhost:8000/generate \
  --header 'Content-Type: application/json' \
  --data '{
    "input_message": "Is 4 + 4 greater than the current hour of the day"
  }'

HTTP Response Example:

{
  "value":"No, 4 + 4 is not greater than the current hour of the day."
}
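
The request and response above can also be produced from a script. The following is a minimal sketch using only the Python standard library. It assumes the server is reachable at http://localhost:8000 and that the workflow schema uses the `input_message` and `value` fields shown in the examples; the function names are illustrative and not part of the toolkit.

```python
import json
import urllib.request


def build_generate_request(input_message, base_url="http://localhost:8000"):
    """Build the POST request for /generate (field name assumed from the example)."""
    body = json.dumps({"input_message": input_message}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_value(response_body):
    """Return the "value" field of a /generate response body."""
    return json.loads(response_body)["value"]


def generate(input_message, base_url="http://localhost:8000", timeout=60):
    """Send the request and block until the full result is returned."""
    request = build_generate_request(input_message, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_value(response.read().decode("utf-8"))
```

With a server running, `generate("Is 4 + 4 greater than the current hour of the day")` would return the answer string. If your workflow defines a different schema, adjust the field names accordingly.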

Generate Streaming Transaction#

Route: /generate/stream
Description: A streaming transaction that allows data to be sent in chunks as it becomes available from the workflow, rather than waiting for the complete response to be available.

HTTP Request Example:

curl --request POST \
  --url http://localhost:8000/generate/stream \
  --header 'Content-Type: application/json' \
  --data '{
    "input_message": "Is 4 + 4 greater than the current hour of the day"
  }'

HTTP Intermediate Step Stream Example:

"intermediate_data": {
  "id": "ba5191e6-b818-4206-ac14-863112e597fe",
  "parent_id": "5db32854-d9b2-4e75-9001-543da6a55dd0",
  "type": "markdown",
  "name": "meta/llama-3.1-70b-instruct",
  "payload": "**Input:**\n```python\n[SystemMessage(content='\\nAnswer the following questions as best you can. You
              may ask the human to use the following tools:\\n\\ncalculator_multiply: This is a mathematical tool used to multiply
              two numbers together. It takes 2 numbers as an input and computes their numeric product as the output.. . Arguments
              must be provided as a valid JSON object following this format: {\\'text\\': FieldInfo(annotation=str,
              required=True)}\\ncalculator_inequality: This is a mathematical tool used to perform an inequality comparison
              between two numbers. It takes two numbers as an input and determines if one is greater or are equal.. . Arguments
              must be provided as a valid JSON object following this format: {\\'text\\': FieldInfo(annotation=str,
              required=True)}\\ncurrent_datetime: Returns the current date and time in human readable format.. . Arguments must
              be provided as a valid JSON object following this format: {\\'unused\\': FieldInfo(annotation=str, required=True)}
              \\ncalculator_divide: This is a mathematical tool used to divide one number by another. It takes 2 numbers as an
              input and computes their numeric quotient as the output.. . Arguments must be provided as a valid JSON object
              following this format: {\\'text\\': FieldInfo(annotation=str, required=True)}\\n\\nYou may respond in one of two
              formats.\\nUse the following format exactly to ask the human to use a tool:\\n\\nQuestion: the input question you
              must answer\\nThought: you should always think about what to do\\nAction: the action to take, should be one of
              [calculator_multiply,calculator_inequality,current_datetime,calculator_divide]\\nAction Input: the input to the
              action (if there is no required input, include \"Action Input: None\")  \\nObservation: wait for the human to
              respond with the result from the tool, do not assume the response\\n\\n... (this Thought/Action/Action
              Input/Observation can repeat N times. If you do not need to use a tool, or after asking the human to use any tools
              and waiting for the human to respond, you might know the final answer.)\\nUse the following format once you have
              the final answer:\\n\\nThought: I now know the final answer\\nFinal Answer: the final answer to the original input
              question\\n', additional_kwargs={}, response_metadata={}), HumanMessage(content='\\nQuestion: Is 4 + 4 greater
              than the current hour of the day\\n', additional_kwargs={}, response_metadata={}), AIMessage(content='Thought:
              To answer this question, I need to know the current hour of the day and compare it to 4 + 4.\\n\\nAction:
              current_datetime\\nAction Input: None\\n\\n', additional_kwargs={}, response_metadata={}), HumanMessage(content='The
              current time of day is 2025-03-11 16:05:11', additional_kwargs={}, response_metadata={}),
              AIMessage(content=\"Thought: Now that I have the current time, I can extract the hour and compare it to 4 + 4.
              \\n\\nAction: calculator_multiply\\nAction Input: {'text': '4 + 4'}\", additional_kwargs={}, response_metadata={}),
              HumanMessage(content='The product of 4 * 4 is 16', additional_kwargs={}, response_metadata={}),
              AIMessage(content=\"Thought: Now that I have the result of 4 + 4, which is 8, I can compare it to the current
              hour.\\n\\nAction: calculator_inequality\\nAction Input: {'text': '8 > 16'}\", additional_kwargs={},
              response_metadata={}), HumanMessage(content='First number 8 is less than the second number 16',
              additional_kwargs={}, response_metadata={})]\n```\n\n**Output:**\nThought: I now know the final answer\n\nFinal
              Answer: No, 4 + 4 (which is 8) is not greater than the current hour of the day (which is 16)."
}
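
A client reading this stream needs to tell intermediate steps apart from the final result, which arrives under a `data` label. The sketch below is one way to do that with the standard library. It assumes each event arrives on a single line (the example above is wrapped for readability) and that the label may or may not be quoted, since the exact wire framing is not specified here. The function names are hypothetical.

```python
import json
import re

# Matches lines such as:  "intermediate_data": {...}   or   data: {...}
_EVENT_LINE = re.compile(r'^\s*"?(intermediate_data|data)"?\s*:\s*(.*)$', re.DOTALL)


def parse_stream_line(line):
    """Return (kind, payload) for an event line, or None for anything else."""
    match = _EVENT_LINE.match(line)
    if match is None:
        return None
    kind, raw = match.group(1), match.group(2).strip()
    if not raw:
        return None
    return kind, json.loads(raw)


def iter_stream_events(lines):
    """Yield (kind, payload) pairs, skipping blank or unrecognized lines."""
    for line in lines:
        event = parse_stream_line(line)
        if event is not None:
            yield event
```

A caller might display `intermediate_data` events as progress updates and treat the `data` event as the workflow result.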

HTTP Response Example:

"data": { "value": "No, 4 + 4 (which is 8) is not greater than the current hour of the day (which is 16)." }

Generate Streaming Full Transaction#

Route: /generate/full
Description: Same as /generate/stream, but returns raw IntermediateStep objects without any step adaptor translations. Use the filter_steps query parameter to filter steps by type (a comma-separated list), or set it to none to suppress all intermediate steps.

HTTP Request Example:

curl --request POST \
  --url http://localhost:8000/generate/full \
  --header 'Content-Type: application/json' \
  --data '{
    "input_message": "Is 4 + 4 greater than the current hour of the day"
  }'

HTTP Intermediate Step Stream Example:

"intermediate_data": {"id":"dda55b33-edd1-4dde-b938-182676a42a19","parent_id":"8282eb42-01dd-4db6-9fd5-915ed4a2a032","type":"LLM_END","name":"meta/llama-3.1-70b-instruct","payload":"{\"event_type\":\"LLM_END\",\"event_timestamp\":1744051441.449566,\"span_event_timestamp\":1744051440.5072863,\"framework\":\"langchain\",\"name\":\"meta/llama-3.1-70b-instruct\",\"tags\":null,\"metadata\":{\"chat_responses\":[{\"text\":\"Thought: I now know the final answer\\n\\nFinal Answer: No, 4 + 4 (which is 8) is not greater than the current hour of the day (which is 11).\",\"generation_info\":null,\"type\":\"ChatGenerationChunk\",\"message\":{\"content\":\"Thought: I now know the final answer\\n\\nFinal Answer: No, 4 + 4 (which is 8) is not greater than the current hour of the day (which is 11).\",\"additional_kwargs\":{},\"response_metadata\":{\"finish_reason\":\"stop\",\"model_name\":\"meta/llama-3.1-70b-instruct\"},\"type\":\"AIMessageChunk\",\"name\":null,\"id\":\"run-dda55b33-edd1-4dde-b938-182676a42a19\"}}],\"chat_inputs\":null,\"tool_inputs\":null,\"tool_outputs\":null,\"tool_info\":null},\"data\":{\"input\":\"First number 8 is less than the second number 11\",\"output\":\"Thought: I now know the final answer\\n\\nFinal Answer: No, 4 + 4 (which is 8) is not greater than the current hour of the day (which is 11).\",\"chunk\":null},\"usage_info\":{\"token_usage\":{\"prompt_tokens\":37109,\"completion_tokens\":902,\"total_tokens\":38011},\"num_llm_calls\":0,\"seconds_between_calls\":0},\"UUID\":\"dda55b33-edd1-4dde-b938-182676a42a19\"}"}

HTTP Response Example:

"data": {"value":"No, 4 + 4 (which is 8) is not greater than the current hour of the day (which is 11)."}

HTTP Request Examples with Filter: By default, all intermediate steps are streamed. Pass filter_steps as a query parameter to restrict the stream to specific step types (a comma-separated list), or set it to none to suppress all intermediate steps.

Suppress all intermediate steps (only get final output):

curl --request POST \
  --url 'http://localhost:8000/generate/full?filter_steps=none' \
  --header 'Content-Type: application/json' \
  --data '{"input_message": "Is 4 + 4 greater than the current hour of the day"}'

Get only specific step types:

curl --request POST \
  --url 'http://localhost:8000/generate/full?filter_steps=LLM_END,TOOL_END' \
  --header 'Content-Type: application/json' \
  --data '{"input_message": "Is 4 + 4 greater than the current hour of the day"}'
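
Two details of this route lend themselves to small helpers: building the filter_steps query string, and decoding the raw step, whose `payload` field is itself a JSON-encoded string in the example above. The following is a sketch under those assumptions, using only the standard library. The helper names are hypothetical, and the fields read from the decoded step (`event_type`, `usage_info.token_usage`) are taken from the sample event rather than from a documented schema.

```python
import json
from urllib.parse import urlencode


def build_full_url(base_url="http://localhost:8000", filter_steps=None):
    """Return the /generate/full URL, optionally with a filter_steps parameter.

    filter_steps may be None (no filtering), the string "none", or an
    iterable of step type names such as ["LLM_END", "TOOL_END"].
    """
    url = f"{base_url}/generate/full"
    if filter_steps is None:
        return url
    value = filter_steps if isinstance(filter_steps, str) else ",".join(filter_steps)
    # Keep commas literal so the URL matches the curl examples.
    return f"{url}?{urlencode({'filter_steps': value}, safe=',')}"


def decode_raw_step(event):
    """Return the step payload as a dict, decoding it if it is a JSON string."""
    payload = event.get("payload")
    return json.loads(payload) if isinstance(payload, str) else payload


def summarize_step(event):
    """Pick out a few fields from a raw step for logging or display."""
    step = decode_raw_step(event) or {}
    token_usage = (step.get("usage_info") or {}).get("token_usage") or {}
    return {
        "event_type": step.get("event_type", event.get("type")),
        "name": step.get("name", event.get("name")),
        "total_tokens": token_usage.get("total_tokens"),
    }
```

Filtering on the server with filter_steps reduces the data sent over the connection. `summarize_step` is only a client-side convenience for steps that do arrive.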

Chat Non-Streaming Transaction#

Route: /chat
Description: An OpenAI-compatible non-streaming chat transaction.

HTTP Request Example:

curl --request POST \
  --url http://localhost:8000/chat \
  --header 'Content-Type: application/json' \
  --data '{
    "messages": [
      {
        "role": "user",
        "content": "Is 4 + 4 greater than the current hour of the day"
      }
    ],
    "use_knowledge_base": true
  }'

HTTP Response Example:

{
  "id": "b92d1f05-200a-4540-a9f1-c1487bfb3685",
  "object": "chat.completion",
  "model": "",
  "created": "2025-03-11T21:12:43.671665Z",
  "choices": [
      {
          "message": {
              "content": "No, 4 + 4 (which is 8) is not greater than the current hour of the day (which is 16).",
              "role": null
          },
          "finish_reason": "stop",
          "index": 0
      }
  ],
  "usage": {
      "prompt_tokens": 0,
      "completion_tokens": 20,
      "total_tokens": 20
  }
}
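
Once the response body has been parsed as JSON, the answer sits in the first entry of `choices`. The helper below is an illustrative sketch of reading it defensively. It assumes the response shape shown above, and its name is not part of the toolkit.

```python
def summarize_chat_response(response):
    """Extract the reply text, finish reason, and token total from a /chat response."""
    choices = response.get("choices") or []
    first = choices[0] if choices else {}
    message = first.get("message") or {}
    usage = response.get("usage") or {}
    return {
        "content": message.get("content"),
        "finish_reason": first.get("finish_reason"),
        "total_tokens": usage.get("total_tokens"),
    }
```

Missing fields come back as `None` rather than raising, which can be convenient because the sample response itself contains empty or null values such as `"model": ""` and `"role": null`.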

Chat Streaming Transaction#

Route: /chat/stream
Description: An OpenAI-compatible streaming chat transaction.

HTTP Request Example:

curl --request POST \
  --url http://localhost:8000/chat/stream \
  --header 'Content-Type: application/json' \
  --data '{
    "messages": [
      {
        "role": "user",
        "content": "Is 4 + 4 greater than the current hour of the day"
      }
    ],
    "use_knowledge_base": true
  }'

HTTP Intermediate Step Example:

"intermediate_data": {
  "id": "9ed4bce7-191c-41cb-be08-7a72d30166cc",
  "parent_id": "136edafb-797b-42cd-bd11-29153359b193",
  "type": "markdown",
  "name": "meta/llama-3.1-70b-instruct",
  "payload": "**Input:**\n```python\n[SystemMessage(content='\\nAnswer the following questions as best you can. You
              may ask the human to use the following tools:\\n\\ncalculator_multiply: This is a mathematical tool used to multiply
              two numbers together. It takes 2 numbers as an input and computes their numeric product as the output.. . Arguments
              must be provided as a valid JSON object following this format: {\\'text\\': FieldInfo(annotation=str,
              required=True)}\\ncalculator_inequality: This is a mathematical tool used to perform an inequality comparison
              between two numbers. It takes two numbers as an input and determines if one is greater or are equal.. .
              Arguments must be provided as a valid JSON object following this format: {\\'text\\': FieldInfo(annotation=str,
              required=True)}\\ncurrent_datetime: Returns the current date and time in human readable format.. . Arguments
              must be provided as a valid JSON object following this format: {\\'unused\\': FieldInfo(annotation=str,
              required=True)}\\ncalculator_divide: This is a mathematical tool used to divide one number by another. It takes
              2 numbers as an input and computes their numeric quotient as the output.. . Arguments must be provided as a
              valid JSON object following this format: {\\'text\\': FieldInfo(annotation=str, required=True)}\\n\\nYou may
              respond in one of two formats.\\nUse the following format exactly to ask the human to use a tool:\\n\\nQuestion:
              the input question you must answer\\nThought: you should always think about what to do\\nAction: the action to
              take, should be one of [calculator_multiply,calculator_inequality,current_datetime,calculator_divide]\\nAction
              Input: the input to the action (if there is no required input, include \"Action Input: None\")  \\nObservation:
              wait for the human to respond with the result from the tool, do not assume the response\\n\\n...
              (this Thought/Action/Action Input/Observation can repeat N times. If you do not need to use a tool, or after
              asking the human to use any tools and waiting for the human to respond, you might know the final answer.)\\nUse
              the following format once you have the final answer:\\n\\nThought: I now know the final answer\\nFinal Answer:
              the final answer to the original input question\\n', additional_kwargs={}, response_metadata={}),
              HumanMessage(content='\\nQuestion: Is 4 + 4 greater than the current hour of the day\\n', additional_kwargs={},
              response_metadata={}), AIMessage(content='Thought: To answer this question, I need to know the current hour of
              the day and compare it to 4 + 4.\\n\\nAction: current_datetime\\nAction Input: None\\n\\n', additional_kwargs={},
              response_metadata={}), HumanMessage(content='The current time of day is 2025-03-11 16:24:52',
              additional_kwargs={}, response_metadata={}), AIMessage(content=\"Thought: Now that I have the current time, I can
              extract the hour and compare it to 4 + 4.\\n\\nAction: calculator_multiply\\nAction Input: {'text': '4 + 4'}\",
              additional_kwargs={}, response_metadata={}), HumanMessage(content='The product of 4 * 4 is 16',
              additional_kwargs={}, response_metadata={}), AIMessage(content=\"Thought: Now that I have the result of 4 + 4,
              which is 8, I can compare it to the current hour.\\n\\nAction: calculator_inequality\\nAction Input:
              {'text': '8 > 16'}\", additional_kwargs={}, response_metadata={}), HumanMessage(content='First number 8 is
              less than the second number 16', additional_kwargs={}, response_metadata={})]\n```\n\n**Output:**\nThought: I now
              know the final answer\n\nFinal Answer: No, 4 + 4 (which is 8) is not greater than the current hour of the day
              (which is 16)."
}

HTTP Response Example:

"data": {
  "id": "194d22dc-6c1b-44ee-a8d7-bf2b59c1cb6b",
  "choices": [
      {
          "message": {
              "content": "No, 4 + 4 (which is 8) is not greater than the current hour of the day (which is 16).",
              "role": null
          },
          "finish_reason": "stop",
          "index": 0
      }
  ],
  "created": "2025-03-11T21:24:56.961939Z",
  "model": "",
  "object": "chat.completion.chunk"
}
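
When a reply arrives as several chunks, the client concatenates the text of each chunk. The sketch below assumes the chunks have already been parsed into dictionaries. It reads `message.content` as in the example above and falls back to `delta.content`, the field the OpenAI streaming format uses, in case the server emits that shape instead. Both function names are hypothetical.

```python
def chunk_text(chunk):
    """Return the text carried by one chat.completion.chunk object."""
    parts = []
    for choice in chunk.get("choices") or []:
        body = choice.get("message") or choice.get("delta") or {}
        content = body.get("content")
        if content:
            parts.append(content)
    return "".join(parts)


def collect_stream_text(chunks):
    """Concatenate the text of all chunks in arrival order."""
    return "".join(chunk_text(chunk) for chunk in chunks)
```

A user interface would typically render each `chunk_text` result as soon as it arrives; `collect_stream_text` is useful when only the assembled reply is needed.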

Evaluation Endpoint#

You can also evaluate workflows via the AIQ toolkit evaluate endpoint. For more information, refer to the AIQ toolkit Evaluation Endpoint documentation.

Choosing between Streaming and Non-Streaming#

Use streaming when users expect immediate feedback, such as real-time updates or live communication, because results are sent in chunks as they become available. Use non-streaming when the client only needs the final result and can wait for the workflow to finish.
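
In client code, this choice comes down to selecting a route. The lookup below is a small illustrative sketch that maps the interface and the streaming preference to the routes documented on this page; the function and table names are hypothetical.

```python
# Routes for the four transactions described on this page.
ROUTES = {
    ("generate", False): "/generate",
    ("generate", True): "/generate/stream",
    ("chat", False): "/chat",
    ("chat", True): "/chat/stream",
}


def route_for(interface, streaming):
    """Return the route for an interface ("generate" or "chat") and streaming flag."""
    try:
        return ROUTES[(interface, bool(streaming))]
    except KeyError:
        raise ValueError(f"unknown interface: {interface!r}") from None
```

For example, `route_for("chat", streaming=True)` returns `/chat/stream`.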

AIQ Toolkit API Server Interaction Guide#

A custom user interface can communicate with the API server using both HTTP requests and WebSocket connections. For details on proper WebSocket messaging integration, refer to the WebSocket Messaging Interface documentation.