NVIDIA Agent Intelligence Toolkit API Server Endpoints
There are currently four workflow transactions that can be initiated using HTTP or WebSocket when the AIQ toolkit server is running: generate non-streaming, generate streaming, chat non-streaming, and chat streaming. The following interfaces can be used to interact with your running workflows.
Generate Interface: Uses the transaction schema defined by your workflow. While the server is running, the interface documentation is accessible through Swagger at http://localhost:8000/docs.
Chat Interface: The OpenAI API documentation provides details on the chat formats that are compatible with the AIQ toolkit server.
Generate Non-Streaming Transaction
Route: /generate
Description: A non-streaming transaction that waits until all workflow data is available before sending the result back to the client. The transaction schema is defined by the workflow.
HTTP Request Example:
curl --request POST \
  --url http://localhost:8000/generate \
  --header 'Content-Type: application/json' \
  --data '{"input_message": "Is 4 + 4 greater than the current hour of the day"}'
HTTP Response Example:
{ "value":"No, 4 + 4 is not greater than the current hour of the day." }
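The same non-streaming request can be sent from Python. This is a minimal sketch using only the standard library (json, urllib.request); it assumes the server listens on http://localhost:8000 and that your workflow accepts the input_message field shown above, so check your workflow's schema at /docs first. The helper names build_generate_request, parse_generate_response, and generate are illustrative, not part of the toolkit.

```python
import json
import urllib.request

# Assumption: default host and port taken from the examples on this page.
BASE_URL = "http://localhost:8000"


def build_generate_request(input_message: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the non-streaming /generate route.

    The body mirrors the curl example; a workflow with a different
    transaction schema needs different fields.
    """
    body = json.dumps({"input_message": input_message}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_generate_response(raw: bytes) -> str:
    """Return the "value" field of a /generate response body."""
    return json.loads(raw)["value"]


def generate(input_message: str, base_url: str = BASE_URL, timeout: float = 60.0) -> str:
    """Send the request and block until the complete result is returned."""
    request = build_generate_request(input_message, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_generate_response(response.read())
```

With the server running, calling generate("Is 4 + 4 greater than the current hour of the day") should return the string carried in the value field. Splitting request building and parsing from the network call keeps both testable without a live server.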
Generate Streaming Transaction
Route: /generate/stream
Description: A streaming transaction that allows data to be sent in chunks as it becomes available from the workflow, rather than waiting for the complete response to be available.
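A streaming client has to read the response incrementally instead of parsing one JSON body. The sketch below assumes, based on the examples in this section, that each event arrives on its own line as a name, a colon, and a JSON object (for example, data: {...} or intermediate_data: {...}), with blank lines between events. Verify the exact framing against your server before relying on it. iter_stream_events and stream_generate are hypothetical helper names.

```python
import json
import urllib.request
from typing import Iterable, Iterator, Tuple


def iter_stream_events(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, payload) pairs from streamed text lines.

    Assumed framing: one `<name>: <json object>` per line, e.g.
    `intermediate_data: {...}` or `data: {...}`. Blank lines are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, sep, rest = line.partition(":")
        # strip('"') also accepts the quoted "data": {...} form shown on this page
        name = name.strip().strip('"')
        if not sep or not name:
            continue  # not a "name: value" line, so ignore it
        yield name, json.loads(rest)


def stream_generate(input_message: str, base_url: str = "http://localhost:8000") -> Iterator[Tuple[str, dict]]:
    """POST to /generate/stream and yield events as they arrive."""
    request = urllib.request.Request(
        f"{base_url}/generate/stream",
        data=json.dumps({"input_message": input_message}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # Iterating an HTTP response yields raw byte lines as they are read.
        yield from iter_stream_events(raw.decode("utf-8") for raw in response)
```

A caller can loop over stream_generate(...) and handle intermediate_data events (for example, printing each step's name) separately from data events, which carry the workflow's value.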
HTTP Request Example:
curl --request POST \
  --url http://localhost:8000/generate/stream \
  --header 'Content-Type: application/json' \
  --data '{"input_message": "Is 4 + 4 greater than the current hour of the day"}'
HTTP Intermediate Step Stream Example:
"intermediate_data": { "id": "ba5191e6-b818-4206-ac14-863112e597fe", "parent_id": "5db32854-d9b2-4e75-9001-543da6a55dd0", "type": "markdown", "name": "meta/llama-3.1-70b-instruct", "payload": "**Input:**\n```python\n[SystemMessage(content='\\nAnswer the following questions as best you can. You may ask the human to use the following tools:\\n\\ncalculator_multiply: This is a mathematical tool used to multiply two numbers together. It takes 2 numbers as an input and computes their numeric product as the output.. . Arguments must be provided as a valid JSON object following this format: {\\'text\\': FieldInfo(annotation=str, required=True)}\\ncalculator_inequality: This is a mathematical tool used to perform an inequality comparison between two numbers. It takes two numbers as an input and determines if one is greater or are equal.. . Arguments must be provided as a valid JSON object following this format: {\\'text\\': FieldInfo(annotation=str, required=True)}\\ncurrent_datetime: Returns the current date and time in human readable format.. . Arguments must be provided as a valid JSON object following this format: {\\'unused\\': FieldInfo(annotation=str, required=True)} \\ncalculator_divide: This is a mathematical tool used to divide one number by another. It takes 2 numbers as an input and computes their numeric quotient as the output.. . 
Arguments must be provided as a valid JSON object following this format: {\\'text\\': FieldInfo(annotation=str, required=True)}\\n\\nYou may respond in one of two formats.\\nUse the following format exactly to ask the human to use a tool:\\n\\nQuestion: the input question you must answer\\nThought: you should always think about what to do\\nAction: the action to take, should be one of [calculator_multiply,calculator_inequality,current_datetime,calculator_divide]\\nAction Input: the input to the action (if there is no required input, include \"Action Input: None\") \\nObservation: wait for the human to respond with the result from the tool, do not assume the response\\n\\n... (this Thought/Action/Action Input/Observation can repeat N times. If you do not need to use a tool, or after asking the human to use any tools and waiting for the human to respond, you might know the final answer.)\\nUse the following format once you have the final answer:\\n\\nThought: I now know the final answer\\nFinal Answer: the final answer to the original input question\\n', additional_kwargs={}, response_metadata={}), HumanMessage(content='\\nQuestion: Is 4 + 4 greater than the current hour of the day\\n', additional_kwargs={}, response_metadata={}), AIMessage(content='Thought: To answer this question, I need to know the current hour of the day and compare it to 4 + 4.\\n\\nAction: current_datetime\\nAction Input: None\\n\\n', additional_kwargs={}, response_metadata={}), HumanMessage(content='The current time of day is 2025-03-11 16:05:11', additional_kwargs={}, response_metadata={}), AIMessage(content=\"Thought: Now that I have the current time, I can extract the hour and compare it to 4 + 4. 
\\n\\nAction: calculator_multiply\\nAction Input: {'text': '4 + 4'}\", additional_kwargs={}, response_metadata={}), HumanMessage(content='The product of 4 * 4 is 16', additional_kwargs={}, response_metadata={}), AIMessage(content=\"Thought: Now that I have the result of 4 + 4, which is 8, I can compare it to the current hour.\\n\\nAction: calculator_inequality\\nAction Input: {'text': '8 > 16'}\", additional_kwargs={}, response_metadata={}), HumanMessage(content='First number 8 is less than the second number 16', additional_kwargs={}, response_metadata={})]\n```\n\n**Output:**\nThought: I now know the final answer\n\nFinal Answer: No, 4 + 4 (which is 8) is not greater than the current hour of the day (which is 16)." }
HTTP Response Example:
"data": { "value": "No, 4 + 4 (which is 8) is not greater than the current hour of the day (which is 15)." }
Generate Streaming Full Transaction
Route: /generate/full
Description: Same as /generate/stream, but provides raw IntermediateStep objects without any step adaptor translations. Use the filter_steps query parameter to filter steps by type (comma-separated list), or set it to none to suppress all intermediate steps.
HTTP Request Example:
curl --request POST \
  --url http://localhost:8000/generate/full \
  --header 'Content-Type: application/json' \
  --data '{"input_message": "Is 4 + 4 greater than the current hour of the day"}'
HTTP Intermediate Step Stream Example:
"intermediate_data": {"id":"dda55b33-edd1-4dde-b938-182676a42a19","parent_id":"8282eb42-01dd-4db6-9fd5-915ed4a2a032","type":"LLM_END","name":"meta/llama-3.1-70b-instruct","payload":"{\"event_type\":\"LLM_END\",\"event_timestamp\":1744051441.449566,\"span_event_timestamp\":1744051440.5072863,\"framework\":\"langchain\",\"name\":\"meta/llama-3.1-70b-instruct\",\"tags\":null,\"metadata\":{\"chat_responses\":[{\"text\":\"Thought: I now know the final answer\\n\\nFinal Answer: No, 4 + 4 (which is 8) is not greater than the current hour of the day (which is 11).\",\"generation_info\":null,\"type\":\"ChatGenerationChunk\",\"message\":{\"content\":\"Thought: I now know the final answer\\n\\nFinal Answer: No, 4 + 4 (which is 8) is not greater than the current hour of the day (which is 11).\",\"additional_kwargs\":{},\"response_metadata\":{\"finish_reason\":\"stop\",\"model_name\":\"meta/llama-3.1-70b-instruct\"},\"type\":\"AIMessageChunk\",\"name\":null,\"id\":\"run-dda55b33-edd1-4dde-b938-182676a42a19\"}}],\"chat_inputs\":null,\"tool_inputs\":null,\"tool_outputs\":null,\"tool_info\":null},\"data\":{\"input\":\"First number 8 is less than the second number 11\",\"output\":\"Thought: I now know the final answer\\n\\nFinal Answer: No, 4 + 4 (which is 8) is not greater than the current hour of the day (which is 11).\",\"chunk\":null},\"usage_info\":{\"token_usage\":{\"prompt_tokens\":37109,\"completion_tokens\":902,\"total_tokens\":38011},\"num_llm_calls\":0,\"seconds_between_calls\":0},\"UUID\":\"dda55b33-edd1-4dde-b938-182676a42a19\"}"}
HTTP Response Example:
"data": {"value":"No, 4 + 4 (which is 8) is not greater than the current hour of the day (which is 11)."}
HTTP Request Example with Filter: By default, all intermediate steps are streamed. Use the filter_steps query parameter to filter steps by type (comma-separated list), or set it to none to suppress all intermediate steps.
Suppress all intermediate steps (only get the final output):
curl --request POST \
  --url 'http://localhost:8000/generate/full?filter_steps=none' \
  --header 'Content-Type: application/json' \
  --data '{"input_message": "Is 4 + 4 greater than the current hour of the day"}'
Get only specific step types:
curl --request POST \
  --url 'http://localhost:8000/generate/full?filter_steps=LLM_END,TOOL_END' \
  --header 'Content-Type: application/json' \
  --data '{"input_message": "Is 4 + 4 greater than the current hour of the day"}'
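Two details of /generate/full matter to a client: the filter_steps query value, and the fact that each raw step's payload field is itself a JSON-encoded string (note the escaped quotes in the intermediate step example). The sketch below builds the request URL and decodes one raw step. The field names it reads (event_type, data.output, usage_info.token_usage) are taken from the LLM_END example and may be absent on other step types; full_stream_url and summarize_raw_step are illustrative names.

```python
import json
from typing import Iterable, Optional
from urllib.parse import urlencode


def full_stream_url(
    base_url: str = "http://localhost:8000",
    step_types: Optional[Iterable[str]] = None,
    suppress_all: bool = False,
) -> str:
    """Build a /generate/full URL with an optional filter_steps query parameter."""
    if suppress_all:
        query = {"filter_steps": "none"}
    elif step_types:
        query = {"filter_steps": ",".join(step_types)}
    else:
        return f"{base_url}/generate/full"  # no filter: every intermediate step is streamed
    # safe="," keeps the commas literal, matching the curl examples above
    return f"{base_url}/generate/full?{urlencode(query, safe=',')}"


def summarize_raw_step(intermediate_data: dict) -> dict:
    """Decode the JSON string in a raw step's payload field and pick out a few fields."""
    payload = json.loads(intermediate_data["payload"])
    token_usage = (payload.get("usage_info") or {}).get("token_usage") or {}
    return {
        "event_type": payload.get("event_type"),
        "name": payload.get("name"),
        "output": (payload.get("data") or {}).get("output"),
        "total_tokens": token_usage.get("total_tokens"),
    }
```

Decoding the nested payload on the client keeps the raw step available unchanged, which is the point of this route compared with /generate/stream.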
Chat Non-Streaming Transaction
Route: /chat
Description: An OpenAI-compatible non-streaming chat transaction.
HTTP Request Example:
curl --request POST \
  --url http://localhost:8000/chat \
  --header 'Content-Type: application/json' \
  --data '{
    "messages": [
      { "role": "user", "content": "Is 4 + 4 greater than the current hour of the day" }
    ],
    "use_knowledge_base": true
  }'
HTTP Response Example:
{ "id": "b92d1f05-200a-4540-a9f1-c1487bfb3685", "object": "chat.completion", "model": "", "created": "2025-03-11T21:12:43.671665Z", "choices": [ { "message": { "content": "No, 4 + 4 (which is 8) is not greater than the current hour of the day (which is 16).", "role": null }, "finish_reason": "stop", "index": 0 } ], "usage": { "prompt_tokens": 0, "completion_tokens": 20, "total_tokens": 20 } }
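Because /chat follows the OpenAI chat completion format, a client sends a messages list and reads the reply from choices[0].message.content. The following standard-library sketch mirrors the request body above; use_knowledge_base is copied from the example and may not be required by every workflow. build_chat_request and chat_answer are illustrative names.

```python
import json
import urllib.request
from typing import Dict, List


def build_chat_request(
    messages: List[Dict[str, str]],
    base_url: str = "http://localhost:8000",
    use_knowledge_base: bool = True,
) -> urllib.request.Request:
    """Build a POST request for /chat with an OpenAI-style messages list."""
    body = {"messages": messages, "use_knowledge_base": use_knowledge_base}
    return urllib.request.Request(
        f"{base_url}/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat_answer(response: dict) -> str:
    """Return the assistant text from the first choice of a chat.completion response."""
    if response.get("object") != "chat.completion":
        raise ValueError(f"unexpected object type: {response.get('object')!r}")
    return response["choices"][0]["message"]["content"]
```

To use it against a running server, pass the request to urllib.request.urlopen, decode the body with json.loads, and hand the resulting dictionary to chat_answer.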
Chat Streaming Transaction
Route: /chat/stream
Description: An OpenAI-compatible streaming chat transaction.
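Each data event from this route carries a chat.completion.chunk object, as in the response example for this route. The sketch below concatenates the text of already-parsed chunks under two assumptions. First, the example on this page places the text under choices[i].message, while OpenAI-style streams usually use choices[i].delta, so both keys are checked. Second, each chunk is assumed to carry only new text. If your server instead repeats the full message in every chunk, keep only the last one. chunk_text and collect_chat_stream are hypothetical names.

```python
from typing import Iterable


def chunk_text(chunk: dict) -> str:
    """Return the text carried by one chat.completion.chunk object."""
    parts = []
    for choice in chunk.get("choices", []):
        # Prefer the OpenAI-style "delta"; fall back to "message" as in the example on this page.
        body = choice.get("delta") or choice.get("message") or {}
        parts.append(body.get("content") or "")
    return "".join(parts)


def collect_chat_stream(chunks: Iterable[dict]) -> str:
    """Concatenate the text of every chunk into the full reply."""
    return "".join(chunk_text(chunk) for chunk in chunks)
```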
HTTP Request Example:
curl --request POST \
  --url http://localhost:8000/chat/stream \
  --header 'Content-Type: application/json' \
  --data '{
    "messages": [
      { "role": "user", "content": "Is 4 + 4 greater than the current hour of the day" }
    ],
    "use_knowledge_base": true
  }'
HTTP Intermediate Step Example:
"intermediate_data": { "id": "9ed4bce7-191c-41cb-be08-7a72d30166cc", "parent_id": "136edafb-797b-42cd-bd11-29153359b193", "type": "markdown", "name": "meta/llama-3.1-70b-instruct", "payload": "**Input:**\n```python\n[SystemMessage(content='\\nAnswer the following questions as best you can. You may ask the human to use the following tools:\\n\\ncalculator_multiply: This is a mathematical tool used to multiply two numbers together. It takes 2 numbers as an input and computes their numeric product as the output.. . Arguments must be provided as a valid JSON object following this format: {\\'text\\': FieldInfo(annotation=str, required=True)}\\ncalculator_inequality: This is a mathematical tool used to perform an inequality comparison between two numbers. It takes two numbers as an input and determines if one is greater or are equal.. . Arguments must be provided as a valid JSON object following this format: {\\'text\\': FieldInfo(annotation=str, required=True)}\\ncurrent_datetime: Returns the current date and time in human readable format.. . Arguments must be provided as a valid JSON object following this format: {\\'unused\\': FieldInfo(annotation=str, required=True)}\\ncalculator_divide: This is a mathematical tool used to divide one number by another. It takes 2 numbers as an input and computes their numeric quotient as the output.. . 
Arguments must be provided as a valid JSON object following this format: {\\'text\\': FieldInfo(annotation=str, required=True)}\\n\\nYou may respond in one of two formats.\\nUse the following format exactly to ask the human to use a tool:\\n\\nQuestion: the input question you must answer\\nThought: you should always think about what to do\\nAction: the action to take, should be one of [calculator_multiply,calculator_inequality,current_datetime,calculator_divide]\\nAction Input: the input to the action (if there is no required input, include \"Action Input: None\") \\nObservation: wait for the human to respond with the result from the tool, do not assume the response\\n\\n... (this Thought/Action/Action Input/Observation can repeat N times. If you do not need to use a tool, or after asking the human to use any tools and waiting for the human to respond, you might know the final answer.)\\nUse the following format once you have the final answer:\\n\\nThought: I now know the final answer\\nFinal Answer: the final answer to the original input question\\n', additional_kwargs={}, response_metadata={}), HumanMessage(content='\\nQuestion: Is 4 + 4 greater than the current hour of the day\\n', additional_kwargs={}, response_metadata={}), AIMessage(content='Thought: To answer this question, I need to know the current hour of the day and compare it to 4 + 4.\\n\\nAction: current_datetime\\nAction Input: None\\n\\n', additional_kwargs={}, response_metadata={}), HumanMessage(content='The current time of day is 2025-03-11 16:24:52', additional_kwargs={}, response_metadata={}), AIMessage(content=\"Thought: Now that I have the current time, I can extract the hour and compare it to 4 + 4.\\n\\nAction: calculator_multiply\\nAction Input: {'text': '4 + 4'}\", additional_kwargs={}, response_metadata={}), HumanMessage(content='The product of 4 * 4 is 16', additional_kwargs={}, response_metadata={}), AIMessage(content=\"Thought: Now that I have the result of 4 + 4, which is 8, I can 
compare it to the current hour.\\n\\nAction: calculator_inequality\\nAction Input: {'text': '8 > 16'}\", additional_kwargs={}, response_metadata={}), HumanMessage(content='First number 8 is less than the second number 16', additional_kwargs={}, response_metadata={})]\n```\n\n**Output:**\nThought: I now know the final answer\n\nFinal Answer: No, 4 + 4 (which is 8) is not greater than the current hour of the day (which is 16)." }
HTTP Response Example:
"data": { "id": "194d22dc-6c1b-44ee-a8d7-bf2b59c1cb6b", "choices": [ { "message": { "content": "No, 4 + 4 (which is 8) is not greater than the current hour of the day (which is 16).", "role": null }, "finish_reason": "stop", "index": 0 } ], "created": "2025-03-11T21:24:56.961939Z", "model": "", "object": "chat.completion.chunk" }
Evaluation Endpoint
You can also evaluate workflows using the AIQ toolkit evaluate endpoint. For more information, refer to the AIQ toolkit Evaluation Endpoint documentation.
Choosing between Streaming and Non-Streaming
Use streaming when you need real-time updates or live communication where users expect immediate feedback, such as seeing intermediate steps while the workflow runs. Use non-streaming when your workflow returns a simple result and intermediate feedback is not needed.
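If a client switches between these modes at runtime, the decision reduces to picking one of the four routes documented on this page. A small illustrative helper follows; pick_route is a hypothetical name, and /generate/full is left out because it is a variant of the generate streaming route.

```python
# Routes documented on this page, keyed by (interface, streaming?).
ROUTES = {
    ("generate", False): "/generate",
    ("generate", True): "/generate/stream",
    ("chat", False): "/chat",
    ("chat", True): "/chat/stream",
}


def pick_route(interface: str, stream: bool) -> str:
    """Return the route for "generate" or "chat", streaming or not."""
    try:
        return ROUTES[(interface, stream)]
    except KeyError:
        raise ValueError(f"unknown interface: {interface!r}") from None
```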
AIQ Toolkit API Server Interaction Guide
A custom user interface can communicate with the API server using both HTTP requests and WebSocket connections. For details on proper WebSocket messaging integration, refer to the WebSocket Messaging Interface documentation.