Generic API status response message
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| stream_id | string |  | unique id to identify the client connection |
| response_msg | string |  | response message |
| status | APIStatus |  | API response status code as defined in `APIStatus` |
ASR Result
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| results | StreamingRecognitionResult |  | Complete ASR response in the Riva Skills ASR result schema |
| latency_ms | float |  | ASR latency in milliseconds |
| start_time | string |  | start time in ISO8601 format, e.g. 2024-03-08T13:33:30.736Z |
| stop_time | string |  | stop time in ISO8601 format |
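The start_time/stop_time strings above are ISO8601 with millisecond precision and a trailing `Z`. A minimal sketch of producing a matching timestamp in Python (the helper name is illustrative, not part of the API):

```python
from datetime import datetime, timezone

def iso8601_utc_now() -> str:
    """Format the current UTC time the way start_time/stop_time expect,
    e.g. "2024-03-08T13:33:30.736Z" (millisecond precision)."""
    now = datetime.now(timezone.utc)
    # strftime has no millisecond directive, so truncate microseconds manually.
    return now.strftime("%Y-%m-%dT%H:%M:%S.") + f"{now.microsecond // 1000:03d}Z"
```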
Chat Engine result in JSON format

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| stream_id | string |  | unique id to identify the client connection |
| result | string |  | chat engine result |
| latency_ms | float |  | chat engine latency in milliseconds |
Request message for the Chat API which will be sent to the chat engine

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| stream_id | string |  | unique id to identify the client connection |
| bot_name | string |  | bot name with version like {bot_name}_v{bot_version}, e.g. chitchat_bot_v1 |
| query | string |  | query |
| query_id | string |  | unique id for identifying the query |
| user_id | string |  | user id |
| source_language | string |  | The language of the supplied query string as a [BCP-47](https://www.rfc-editor.org/rfc/bcp/bcp47.txt) language tag. Example: "en-US". |
| target_language | string |  | The language of the response required from the chat engine as a [BCP-47](https://www.rfc-editor.org/rfc/bcp/bcp47.txt) language tag. Example: "en-US". |
| is_standalone | bool |  | Flag to send standalone text requests. When set to true, the response is not sent to TTS; when set to false, the response is sent to TTS. |
| user_context | ChatRequest.UserContextEntry | repeated | key-value pairs for user context to be sent to the chat engine |
| metadata | ChatRequest.MetadataEntry | repeated | key-value pairs for metadata to be sent to the chat engine |
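As a sketch of what a populated ChatRequest might carry, using a plain Python dict in place of the generated protobuf class (the helper function is illustrative; field names follow the table above):

```python
import uuid

def make_chat_request(stream_id: str, bot_name: str,
                      query: str, user_id: str) -> dict:
    """Assemble ChatRequest fields as a plain dict; in a real client these
    would be set on the generated protobuf message instead."""
    return {
        "stream_id": stream_id,
        "bot_name": bot_name,              # e.g. "chitchat_bot_v1"
        "query": query,
        "query_id": str(uuid.uuid4()),     # unique per query
        "user_id": user_id,
        "source_language": "en-US",        # BCP-47 tag
        "target_language": "en-US",
        "is_standalone": True,             # True: response not sent to TTS
        "user_context": {},                # repeated key-value pairs
        "metadata": {},
    }
```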
ChatRequest.UserContextEntry

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| key | string |  |  |
| value | string |  |  |

ChatRequest.MetadataEntry

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| key | string |  |  |
| value | string |  |  |
Response message from the chat engine for a Chat API invocation

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| stream_id | string |  | unique id to identify the client connection |
| query | string |  | query |
| query_id | string |  | unique id for identifying the query |
| user_id | string |  | user id |
| session_id | string |  | session id, if generated by the chat engine |
| text | string |  | chat engine response for the query passed in `ChatRequest` |
| cleaned_text | string |  | chat engine response text cleaned up after removal of markdown language tags |
| is_final | bool |  | flag to indicate whether this is a final or an intermediate response; when true, there will be no more responses for the requested `ChatRequest` |
| json_response | string |  | chat engine response in JSON format |
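Since the Chat API returns a response stream, a client typically concatenates streamed text until a message with is_final arrives. A sketch, with plain dicts standing in for ChatResponse messages:

```python
def read_chat_responses(responses) -> str:
    """Concatenate streamed ChatResponse text, stopping at the final message."""
    parts = []
    for resp in responses:
        parts.append(resp.get("text", ""))
        if resp.get("is_final"):   # no more responses for this ChatRequest
            break
    return "".join(parts)
```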
ConversationHistory

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| bot_name | string |  | bot name with version like {bot_name}_v{bot_version}, e.g. chitchat_bot_v1 |
| conversation | ConversationInstance | repeated |  |

ConversationInstance

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| role | Role |  |  |
| content | string |  |  |
Request message for the Event API which will be sent to the chat engine

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| stream_id | string |  | unique id to identify the client connection |
| bot_name | string |  | bot name with version like {bot_name}_v{bot_version}, e.g. chitchat_bot_v1 |
| event_type | string |  | event type |
| event_id | string |  | unique event id |
| user_id | string |  | user id |
| user_context | EventRequest.UserContextEntry | repeated | key-value pairs for user context to be sent to the chat engine |
| metadata | EventRequest.MetadataEntry | repeated | key-value pairs for metadata to be sent to the chat engine |
EventRequest.UserContextEntry

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| key | string |  |  |
| value | string |  |  |

EventRequest.MetadataEntry

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| key | string |  |  |
| value | string |  |  |
Response message from the chat engine for an Event API invocation

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| stream_id | string |  | unique id to identify the client connection |
| event_type | string |  | event type |
| event_id | string |  | unique event id |
| user_id | string |  | user id |
| text | string |  | text response |
| cleaned_text | string |  | chat engine response text cleaned up after removal of markdown language tags |
| is_final | bool |  | flag to indicate whether this is a final or an intermediate response |
| json_response | string |  | chat engine response in JSON format |
| events | string | repeated |  |
GetStatusRequest is used to get the Chat controller pipeline status on demand

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| stream_id | string |  | unique id to identify the client connection |
Chat controller pipeline status response

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| stream_id | string |  | unique id to identify the client connection |
| pipeline_state | PipelineStateResponse |  |  |
PipelineRequest is used to create/free the pipeline specified by stream_id

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| stream_id | string |  | A unique id sent by the client to identify the client connection. It is mapped to a unique pipeline on the Chat Controller server. |
| user_id | string |  | user id |
Chat controller pipeline state response

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| state | PipelineState |  |  |
ReceiveAudioRequest is used to request audio data for the specified stream_id.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| stream_id | string |  | unique id to identify the client connection |
Receive Audio API response

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| stream_id | string |  | unique id to identify the client connection |
| audio_content | bytes |  | synthesized audio data |
| encoding | AudioEncoding |  | The encoding of the audio data |
| sample_rate_hertz | int32 |  | The sample rate in hertz (Hz) of the audio data |
| audio_channel_count | int32 |  | The number of channels in the audio data. Only mono is supported |
| frame_size | int32 |  | frame size of the audio data |
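Given sample_rate_hertz, audio_channel_count, and frame_size from a ReceiveAudioResponse, the playback duration of one frame can be derived. A sketch, assuming LINEAR_PCM (16-bit samples) and that frame_size is in bytes:

```python
def frame_duration_ms(frame_size: int, sample_rate_hertz: int,
                      audio_channel_count: int = 1,
                      bytes_per_sample: int = 2) -> float:
    """Duration in milliseconds of one frame of LINEAR_PCM audio data."""
    samples_per_channel = frame_size / (bytes_per_sample * audio_channel_count)
    return 1000.0 * samples_per_channel / sample_rate_hertz
```

For example, a 320-byte mono frame at 16 kHz holds 160 samples, i.e. 10 ms of audio.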
Reload Speech Configs request

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| stream_id | string |  | unique id to identify the client connection |
The SendAudioRequest is used to send either StreamingRecognitionConfig message
or audio content. The first SendAudioRequest message must contain a
StreamingRecognitionConfig message, followed by the audio content messages.
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| stream_id | string |  | unique id to identify the client connection |
| streaming_config | StreamingRecognitionConfig |  | Provides information to the recognizer that specifies how to process the request. The first `SendAudioRequest` message must contain a `streaming_config` message. |
| audio_content | bytes |  | The audio data to be recognized. Sequential chunks of audio data are streamed from the client. |
| source_id | string |  | source id of the audio data |
| create_time | string |  | audio buffer creation timestamp in ISO8601 format |
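The ordering rule above (one streaming_config message first, then audio content) can be sketched as a request generator for the client-side stream; plain dicts stand in for the generated SendAudioRequest class:

```python
from typing import Iterable, Iterator

def send_audio_requests(stream_id: str, streaming_config: dict,
                        audio_chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield SendAudioRequest-shaped dicts: config first, then audio chunks."""
    # The first message must carry the StreamingRecognitionConfig.
    yield {"stream_id": stream_id, "streaming_config": streaming_config}
    # Subsequent messages carry sequential chunks of raw audio bytes.
    for chunk in audio_chunks:
        yield {"stream_id": stream_id, "audio_content": chunk}
```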
Alternative hypotheses (a.k.a. n-best list).

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| transcript | string |  | Transcript text representing the words that the user spoke. |
| confidence | float |  | The non-normalized confidence estimate. A higher number indicates an estimated greater likelihood that the recognized words are correct. This field is set only for a non-streaming result or for a streaming result where `is_final=true`. This field is not guaranteed to be accurate, and users should not rely on it to be always provided. |
| words | WordInfo | repeated | A list of word-specific information for each recognized word. Only populated if `is_final=true`. |
SpeechRecognitionControlRequest is used to control the input to ASR, for
example to internally mute ASR. It is also used to disable the DM-TTS flow
for the incoming ASR input.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| stream_id | string |  | unique id to identify the client connection |
| is_standalone | bool |  | Flag to mark the recognition as standalone; when set to true, ASR transcripts are not passed on to DM-TTS and only transcripts are returned |
Provides information to the ASR recognizer about the incoming audio data

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| encoding | AudioEncoding |  | The encoding of the audio data sent in the request. All encodings support only 1 channel (mono) audio. |
| sample_rate_hertz | int32 |  | The sample rate in hertz (Hz) of the audio data sent in the `SendAudioRequest` message. |
| language_code | string |  | The language of the supplied audio as a [BCP-47](https://www.rfc-editor.org/rfc/bcp/bcp47.txt) language tag. Example: "en-US". Default is en-US. |
| audio_channel_count | int32 |  | The number of channels in the input audio data. |
| model | string |  | Which model to select for the given request. |
A streaming speech recognition result corresponding to a portion of the audio
that is currently being processed.
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| alternatives | SpeechRecognitionAlternative | repeated | May contain one or more recognition hypotheses (up to the maximum specified in `max_alternatives`). These alternatives are ordered in terms of accuracy, with the top (first) alternative being the most probable, as ranked by the recognizer. |
| is_final | bool |  | If `false`, this `StreamingRecognitionResult` represents an interim result that may change. If `true`, this is the final time the speech service will return this particular `StreamingRecognitionResult`; the recognizer will not return any further hypotheses for this portion of the transcript and corresponding audio. |
| stability | float |  | An estimate of the likelihood that the recognizer will not change its guess about this interim result. Values range from 0.0 (completely unstable) to 1.0 (completely stable). This field is only provided for interim results (`is_final=false`). The default of 0.0 is a sentinel value indicating `stability` was not set. |
| channel_tag | int32 |  | For multi-channel audio, this is the channel number corresponding to the recognized result for the audio from that channel. For audio_channel_count = N, its output values can range from '1' to 'N'. |
| audio_processed | float |  | Length of audio processed so far, in seconds |
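A sketch of consuming these results: keep the top-ranked alternative, overwrite interim text, and commit it once is_final is true (dicts stand in for the protobuf messages):

```python
def collect_transcript(results) -> str:
    """Assemble a transcript from StreamingRecognitionResult-shaped dicts."""
    committed, interim = [], ""
    for res in results:
        alts = res.get("alternatives", [])
        if not alts:
            continue
        top = alts[0]["transcript"]   # alternatives are ranked best-first
        if res.get("is_final"):
            committed.append(top)     # final: this portion will not change
            interim = ""
        else:
            interim = top             # interim: may still be revised
    return " ".join(committed + ([interim] if interim else []))
```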
StreamingSpeechResultsRequest is used to request various results from the Chat
controller.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| stream_id | string |  | unique id to identify the client connection |
| request_id | string |  | uuid to identify a concurrent client request |
Chat controller metadata streaming response

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| stream_id | string |  | unique id to identify the client connection |
| message_type | MessageType |  | message type as defined in `MessageType` |
| asr_result | ASRResult |  |  |
| chat_engine_response | ChatEngineResponse |  |  |
| tts_result | TTSResult |  |  |
| pipeline_state | PipelineStateResponse |  |  |
| display_text | string |  |  |
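Consumers of StreamSpeechResults typically switch on message_type to pick the populated payload field. A sketch (dicts stand in for the protobuf messages; the dispatcher mapping is illustrative, keyed on the MessageType enum names):

```python
# Maps MessageType enum names to the corresponding payload field name.
_PAYLOAD_FIELD = {
    "ASR_RESPONSE": "asr_result",
    "CHAT_ENGINE_RESPONSE": "chat_engine_response",
    "TTS_RESPONSE": "tts_result",
    "PIPELINE_STATE_RESPONSE": "pipeline_state",
    "DISPLAY_TEXT": "display_text",
}

def extract_payload(response: dict):
    """Return (message_type, payload) for a StreamingSpeechResultsResponse."""
    mtype = response.get("message_type", "UNKNOWN_RESPONSE")
    return mtype, response.get(_PAYLOAD_FIELD.get(mtype, ""), None)
```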
Request message for standalone TTS synthesis of a provided text transcript

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| stream_id | string |  | unique id to identify the client connection |
| transcript | string |  | transcript text to be synthesized |
TTS result metadata

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| latency_ms | float |  | TTS latency in milliseconds |
| time_till_eos_ms | int32 |  | time in milliseconds remaining to complete TTS audio rendering. This is applicable when TTS is set to streaming and realtime in the pipeline graph. In non-streaming mode this is expected to be 0. |
UserContext data containing user-specific information.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| stream_id | string |  | unique id to identify the client connection |
| user_id | string |  | user id |
| bot_name | string |  | bot name with version like {bot_name}_v{bot_version}, e.g. chitchat_bot_v1 |
| conversation_history | ConversationHistory | repeated | conversation history of the user |
| context_json | string |  | JSON-formatted data of the user context |
UserContextRequest is used to request the user context

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| stream_id | string |  | unique id to identify the client connection |
| user_id | string |  | user id |
UserParametersRequest is used to set user parameters

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| stream_id | string |  | unique id to identify the client connection |
| user_id | string |  | user id |
| bot_name | string |  | bot name with version like {bot_name}_v{bot_version}, e.g. chitchat_bot_v1 |
Word-specific information for recognized words.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| start_time | int32 |  | Time offset relative to the beginning of the audio, in ms, corresponding to the start of the spoken word. This field is only set if `enable_word_time_offsets=true` and only in the top hypothesis. |
| end_time | int32 |  | Time offset relative to the beginning of the audio, in ms, corresponding to the end of the spoken word. This field is only set if `enable_word_time_offsets=true` and only in the top hypothesis. |
| word | string |  | The word corresponding to this set of information. |
| confidence | float |  | The non-normalized confidence estimate. A higher number indicates an estimated greater likelihood that the recognized words are correct. This field is not guaranteed to be accurate, and users should not rely on it to be always provided. The default of 0.0 is a sentinel value indicating confidence was not set. |
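WordInfo offsets are in milliseconds relative to the start of the audio. A small sketch converting them to (word, start_s, end_s) tuples in seconds (dicts stand in for the protobuf messages):

```python
def word_timings(words):
    """Convert WordInfo-shaped dicts (ms offsets) to (word, start_s, end_s)."""
    return [(w["word"], w["start_time"] / 1000.0, w["end_time"] / 1000.0)
            for w in words]
```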
Generic Chat controller API status
| Name | Number | Description |
| ---- | ------ | ----------- |
| UNKNOWN_STATUS | 0 |  |
| SUCCESS | 1 |  |
| PIPELINE_AVAILABLE | 2 |  |
| PIPELINE_NOT_AVAILABLE | 3 |  |
| BUSY | 4 |  |
| ERROR | 5 |  |
| INFO | 6 |  |
AudioEncoding specifies the encoding of the audio bytes in the encapsulating message.

| Name | Number | Description |
| ---- | ------ | ----------- |
| UNKNOWN | 0 | Not specified. |
| LINEAR_PCM | 1 | Uncompressed 16-bit signed little-endian samples (Linear PCM). |
| FLAC | 2 | `FLAC` (Free Lossless Audio Codec) is the recommended encoding because it is lossless, so recognition is not compromised, and it requires only about half the bandwidth of `LINEAR_PCM`. `FLAC` stream encoding supports 16-bit and 24-bit samples; however, not all fields in `STREAMINFO` are supported. |
| MULAW | 3 | 8-bit samples that compand 14-bit audio samples using G.711 PCMU/mu-law. |
| ALAW | 5 | 8-bit samples that compand 13-bit audio samples using G.711 PCMA/A-law. |
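For LINEAR_PCM, the audio bytes are 16-bit signed little-endian samples. A sketch of packing and unpacking such buffers with the standard library:

```python
import struct

def pcm16le_encode(samples):
    """Pack signed 16-bit integer samples as little-endian LINEAR_PCM bytes."""
    return struct.pack(f"<{len(samples)}h", *samples)

def pcm16le_decode(data: bytes):
    """Unpack LINEAR_PCM bytes back into a list of int samples."""
    return list(struct.unpack(f"<{len(data) // 2}h", data))
```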
Message type field for Chat controller metadata streaming
| Name | Number | Description |
| ---- | ------ | ----------- |
| UNKNOWN_RESPONSE | 0 |  |
| ASR_RESPONSE | 1 |  |
| CHAT_ENGINE_RESPONSE | 2 |  |
| TTS_RESPONSE | 3 |  |
| PIPELINE_STATE_RESPONSE | 4 |  |
| DISPLAY_TEXT | 5 |  |
Chat controller Pipeline States
| Name | Number | Description |
| ---- | ------ | ----------- |
| INIT | 0 |  |
| IDLE | 1 |  |
| WAIT_FOR_TRIGGER | 2 |  |
| ASR_ACTIVE | 3 |  |
| DM_ACTIVE | 4 |  |
| TTS_ACTIVE | 5 |  |
Used in storing conversation history for user and bot
| Name | Number | Description |
| ---- | ------ | ----------- |
| UNDEFINED | 0 |  |
| USER | 1 |  |
| BOT | 2 |  |
| SYSTEM | 3 |  |
The AceAgentGrpc service provides APIs to interact with the chat engine and
speech components.

| Method Name | Request Type | Response Type | Description |
| ----------- | ------------ | ------------- | ----------- |
| CreatePipeline | PipelineRequest | APIStatusResponse | CreatePipeline is used to create a new pipeline with the Chat controller. It creates a Chat controller pipeline with a unique stream_id populated by the client in PipelineRequest. |
| FreePipeline | PipelineRequest | APIStatusResponse | FreePipeline is used to free up a pipeline with the Chat controller that was created using the CreatePipeline API. The client needs to pass the same stream_id in PipelineRequest as used in CreatePipeline. |
| SendAudio | SendAudioRequest stream | APIStatusResponse | SendAudio is used to stream audio content to ASR through the Chat controller. This is a client-side streaming API. |
| ReceiveAudio | ReceiveAudioRequest | ReceiveAudioResponse stream | ReceiveAudio is used to receive synthesized audio from TTS through the Chat controller. This is a server-side streaming API. |
| StreamSpeechResults | StreamingSpeechResultsRequest | StreamingSpeechResultsResponse stream | StreamSpeechResults is used to receive all the metadata from the Chat controller, like ASR transcripts, Chat engine responses, pipeline states, etc. This is a broadcasting API, i.e. it can fan out responses to multiple concurrent client instances using the same stream_id. This is a server-side streaming API. |
| StartRecognition | SpeechRecognitionControlRequest | APIStatusResponse | StartRecognition is used to start ASR recognition in the Chat controller for the audio content streamed via the SendAudio API. This API also provides a flag to mark the ASR recognition as standalone, i.e. Chat Engine and TTS will not be invoked for the ASR transcript. |
| StopRecognition | SpeechRecognitionControlRequest | APIStatusResponse | StopRecognition is used to stop ASR recognition for the audio content streamed via the SendAudio API. |
| SetUserParameters | UserParametersRequest | APIStatusResponse | SetUserParameters can be used to set runtime user parameters, like user_id, on the Chat controller pipeline. |
| GetStatus | GetStatusRequest | GetStatusResponse | GetStatus can be used to get the latest state of the Chat controller pipeline. This API is not valid if UMIM is enabled. |
| ReloadSpeechConfigs | ReloadSpeechConfigsRequest | APIStatusResponse | ReloadSpeechConfigs can be used to reload the ASR word boosting and TTS ARPABET configs in the Chat controller. |
| SynthesizeSpeech | SynthesizeSpeechRequest | APIStatusResponse | SynthesizeSpeech is used to send a text transcript directly to TTS for standalone TTS audio synthesis. The generated audio will be routed to the path specified in the pipeline graph provided in the Chat controller. For example, if the TTS audio is routed to A2F in the graph, the audio will be sent to the A2F server; if the TTS audio is routed to the gRPC client, it will be available through the server-side streaming ReceiveAudio API. |
| GetUserContext | UserContextRequest | UserContext | GetUserContext is used to get the current user context from the Chat Engine. The API returns a UserContext message containing the current conversation history and any context attached to the active user_id. This API is not valid if UMIM is enabled. |
| SetUserContext | UserContext | APIStatusResponse | SetUserContext is used to set the current user context in the Chat Engine. The API accepts a UserContext message containing the conversation history and any context to be attached to the active user_id. This API is not valid if UMIM is enabled. |
| UpdateUserContext | UserContext | APIStatusResponse | UpdateUserContext is used to update the current user context in the Chat Engine. The API accepts a UserContext message containing any context to be attached to the active user_id. This API is not valid if UMIM is enabled. |
| DeleteUserContext | UserContextRequest | APIStatusResponse | DeleteUserContext is used to delete the current user context attached to a user_id in the Chat Engine. This API is not valid if UMIM is enabled. |
| Chat | ChatRequest | ChatResponse stream | Chat is used to send text queries to the Chat Engine via the Chat controller. This API also provides a flag to disable TTS synthesis for the response generated by the Chat Engine, which can be used for a text-in, text-out type of scenario. This API is not valid if UMIM is enabled. |
| Event | EventRequest | EventResponse stream | Event is used to send events to the Chat Engine via the Chat controller. This API is not valid if UMIM is enabled. |
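A sketch of the typical voice-session call order implied by the table above (CreatePipeline, StartRecognition, SendAudio, StopRecognition, FreePipeline). The `stub` here is any object exposing methods with those names; a real client would use the generated gRPC stub, while the test below exercises the sequence with a plain Python stand-in:

```python
def run_voice_session(stub, stream_id: str, requests) -> None:
    """Drive one session through the AceAgentGrpc call sequence."""
    stub.CreatePipeline({"stream_id": stream_id})
    # Standalone False: ASR transcripts flow on to the chat engine and TTS.
    stub.StartRecognition({"stream_id": stream_id, "is_standalone": False})
    stub.SendAudio(requests)   # client-side streaming call
    stub.StopRecognition({"stream_id": stream_id})
    stub.FreePipeline({"stream_id": stream_id})
```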
Scalar Value Types

| .proto Type | Notes | C++ | Java | Python | Go | C# | PHP | Ruby |
| ----------- | ----- | --- | ---- | ------ | -- | -- | --- | ---- |
| double |  | double | double | float | float64 | double | float | Float |
| float |  | float | float | float | float32 | float | float | Float |
| int32 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
| int64 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. | int64 | long | int/long | int64 | long | integer/string | Bignum |
| uint32 | Uses variable-length encoding. | uint32 | int | int/long | uint32 | uint | integer | Bignum or Fixnum (as required) |
| uint64 | Uses variable-length encoding. | uint64 | long | int/long | uint64 | ulong | integer/string | Bignum or Fixnum (as required) |
| sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
| sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | int64 | long | int/long | int64 | long | integer/string | Bignum |
| fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int | int | uint32 | uint | integer | Bignum or Fixnum (as required) |
| fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | long | int/long | uint64 | ulong | integer/string | Bignum |
| sfixed32 | Always four bytes. | int32 | int | int | int32 | int | integer | Bignum or Fixnum (as required) |
| sfixed64 | Always eight bytes. | int64 | long | int/long | int64 | long | integer/string | Bignum |
| bool |  | bool | boolean | boolean | bool | bool | boolean | TrueClass/FalseClass |
| string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | String | str/unicode | string | string | string | String (UTF-8) |
| bytes | May contain any arbitrary sequence of bytes. | string | ByteString | str | []byte | ByteString | string | String (ASCII-8BIT) |