Standard Events and Actions

The UMIM Specification defines a set of standard events, actions, and context data. They are grouped in several sections, each corresponding to a specific input modality, output modality or functionality.

Use Cases

While UMIM was designed to be easily extensible and to support a variety of interactive systems and types of multimodal interactions, the standard modalities, events and actions are optimized with the following use cases in mind.

Interactive System Type: Chatbot

  • Description – Text-focused chat interface with strict turn-taking interaction (user, bot, user, bot, …).

  • IM Features – Dialog-focused conversation logic; clear turn-taking structure.

  • Supported Channels – Text input / text output; may support showing images, URLs, etc. to the user.

  • Supported UMIM Actions – UtteranceBotAction, UtteranceUserAction, GestureBotAction, VisualInformationSceneAction, PresenceUserAction.

Interactive System Type: Voice assistant

  • Description – Same as chatbot but with a voice interface (speech in, speech out).

  • IM Features – Dialog-focused conversations; clear turn-taking structure.

  • Supported Channels – Audio in / out, ASR, TTS.

  • Supported UMIM Actions – UtteranceBotAction, UtteranceUserAction, PresenceUserAction.

Interactive System Type: Chatbot with avatar

  • Description – At its core a dialog-based system that drives an avatar experience; the focus is still on turn-taking logic.

  • IM Features – Dialog-focused conversation, with the possibility to emit additional events that control the avatar (e.g. change an animation at certain points).

  • Supported Channels – Avatar visualization, ASR, TTS, Audio in / out, UI, Camera / Vision.

  • Supported UMIM Actions – UtteranceBotAction, UtteranceUserAction, PresenceUserAction.

Interactive System Type: Interactive avatar

  • Description – Multimodal experience with an avatar; the IM can consider multiple modalities when generating multimodal outputs. The interaction is not tied to a turn-taking structure.

  • IM Features – Multimodal IM that can react in real time to events and orchestrate multiple dependent events relying on different modalities.

  • Supported Channels – Avatar visualization, ASR, TTS, Audio in / out, UI, Camera / Vision.

  • Supported UMIM Actions – UtteranceBotAction, UtteranceUserAction, PresenceUserAction, GestureBotAction, PostureBotAction, FacialExpressionBotAction, ShotCameraAction, EffectCameraAction, VisualInformationSceneAction, VisualChoiceSceneAction, VisualFormSceneAction.

Core Events

All event types in UMIM are built on a few generic core events, which this section defines. As a developer, you will rarely interact with these base events directly; you will typically use the more specific events derived from them.

Event(event_created_at: datetime, source_uid: str, tags: Dict[str, Any], type: str, uid: str)

The base event class. All other events inherit their properties from this base spec. Extra properties are allowed by default, but any property defined in the spec must conform to it.

Parameters
  • event_created_at (datetime) – The timestamp of when the event was created.

  • source_uid (str) – A unique id of the source component that generated the event.

  • tags (Dict[str, Any]) – A dictionary of key-value pairs associated with the event. Tags are propagated from input events to output events.

  • type (str) – The type of the event.

  • uid (str) – A unique identifier for the event.
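
To make the field list above concrete, the following is a minimal sketch of a base event using plain dataclasses. It is not the official UMIM class: the defaults (a random UUID for uid, the current UTC time for event_created_at) and the helper derive_output_event are assumptions made for illustration. The helper shows one way tags might be propagated from an input event to an output event.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict


@dataclass
class Event:
    """Illustrative stand-in for the UMIM base event, not the official class."""

    type: str
    source_uid: str
    # Assumption: a random UUID string is an acceptable unique identifier.
    uid: str = field(default_factory=lambda: str(uuid.uuid4()))
    event_created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    tags: Dict[str, Any] = field(default_factory=dict)


def derive_output_event(input_event: Event, event_type: str, source_uid: str) -> Event:
    """Create an output event that carries over a copy of the input event's tags."""
    return Event(type=event_type, source_uid=source_uid, tags=dict(input_event.tags))


# Hypothetical component names ("asr-server", "interaction-manager").
incoming = Event(
    type="UtteranceUserActionFinished",
    source_uid="asr-server",
    tags={"session": "demo"},
)
outgoing = derive_output_event(incoming, "StartUtteranceBotAction", "interaction-manager")
print(outgoing.tags)
```

Copying the tags dictionary rather than sharing it keeps later changes to one event from leaking into the other.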

CustomEvent(data: Dict[str, Any], name: str)

A custom event that can be used in addition to the standardized ones.

Parameters
  • data (Dict[str, Any]) – Any data relevant for the event.

  • name (str) – Name of the custom event.

  • ... – Additional parameters/payload inherited from Event().

UserIntent(intent: str)

The structured representation of the intent of the user. This event should be generated by the IM when it has inferred a user’s intent. A user intent can be inferred based on verbal or non-verbal communication of the user.

Parameters
  • intent (str) – Canonical form of the user’s intent

  • ... – Additional parameters/payload inherited from Event().

BotIntent(intent: str)

The structured representation of the intent of the bot. This event should be generated by the IM if it communicates the current intent of the bot. A bot intent can lead to different multimodal expressions of that intent.

Parameters
  • intent (str) – Canonical form of the bot’s intent

  • ... – Additional parameters/payload inherited from Event().

Base Action Specification

This section defines the abstract base action events that actions can inherit from. All action events inherit from one of these base specs.

Action lifecycle and action IDs. Actions are referenced by a unique ID called action_uid. All events related to the same action must reference the same action_uid. In this way, the individual events with the same action_uid represent the lifecycle of an action from start to finish, including any updates in between. The component that sends out the first event of an action (StartAction or ActionStarted) should generate a new unique action_uid. Which event that is depends on whether the action is a bot action or a user action. For bot actions (which are started by the IM), the IM is responsible for generating the unique ID. For user actions, the Action Server is responsible for generating the UID once the user action starts. Once the UID has been generated, all subsequent events for the same action must reference it, so that every event can be associated with a particular action “instance”.
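
As a rough illustration of how action_uid ties events together, the sketch below groups a mixed event log into per-action lifecycles. Representing events as plain dictionaries, using UUID strings as IDs, and the helper names new_action_uid and group_by_action are all assumptions for this example and not part of the specification.

```python
import uuid
from collections import defaultdict
from typing import Any, Dict, List


def new_action_uid() -> str:
    """Assumption: a random UUID string is unique enough for an action_uid."""
    return str(uuid.uuid4())


def group_by_action(events: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Collect event types per action_uid, preserving arrival order."""
    lifecycles: Dict[str, List[str]] = defaultdict(list)
    for event in events:
        lifecycles[event["action_uid"]].append(event["type"])
    return dict(lifecycles)


bot_uid = new_action_uid()   # a bot action: the IM generates the ID
user_uid = new_action_uid()  # a user action: the Action Server generates the ID

event_log = [
    {"type": "StartUtteranceBotAction", "action_uid": bot_uid},
    {"type": "UtteranceUserActionStarted", "action_uid": user_uid},
    {"type": "UtteranceBotActionStarted", "action_uid": bot_uid},
    {"type": "UtteranceBotActionFinished", "action_uid": bot_uid},
    {"type": "UtteranceUserActionFinished", "action_uid": user_uid},
]
lifecycles = group_by_action(event_log)
print(lifecycles[bot_uid])
```

Even though the events of the two actions are interleaved on the bus, each lifecycle can be recovered from the shared action_uid alone.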

Lifetime of an action. Besides the affected modality, UMIM actions fall into two main categories that differ in how an interactive system typically performs or recognizes them.

  • Effect Actions. Effect actions (e.g. a GestureBotAction or an UtteranceBotAction) have a limited lifetime and an immediate effect. Because of this limited lifetime, the IM typically only starts these actions and does not actively stop them.

  • State Actions. Actions that lead to a temporary state change (e.g. a PostureBotAction or PositionBotAction). These actions do not finish “on their own” but must be stopped. When such an action is stopped, the affected state is restored to what it was before the action ran (e.g. override modality).

The main difference between these two action types is that state actions run until explicitly stopped, while effect actions run until they complete naturally. To make this property available to programmers and story designers, all actions have a property action_info_lifetime that is either indefinite or limited.
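
The distinction above can be sketched in a few lines. The enum values mirror the two lifetimes named in the text, but the class below is a local stand-in rather than the real umim.messages.types.ActionLifetime. The OverrideModality class is an assumption as well: it models "restoring the previous state" with a simple stack, which is only one possible way to realize that behavior.

```python
from enum import Enum
from typing import List, Optional


class ActionLifetime(str, Enum):
    """Local stand-in for the lifetime values described in the text."""

    LIMITED = "limited"
    INDEFINITE = "indefinite"


def needs_explicit_stop(lifetime: Optional[ActionLifetime]) -> bool:
    """State actions (indefinite lifetime) run until the IM stops them."""
    return lifetime is ActionLifetime.INDEFINITE


class OverrideModality:
    """Hypothetical state holder: stopping the newest state restores the previous one."""

    def __init__(self, default_state: str) -> None:
        self._stack: List[str] = [default_state]

    @property
    def current(self) -> str:
        return self._stack[-1]

    def start(self, state: str) -> None:
        self._stack.append(state)

    def stop(self) -> None:
        # Never remove the default state at the bottom of the stack.
        if len(self._stack) > 1:
            self._stack.pop()


posture = OverrideModality(default_state="idle")
posture.start("attentive")            # e.g. a PostureBotAction starts
state_while_running = posture.current
posture.stop()                        # the IM stops the state action
state_after_stop = posture.current
```

This sketch only handles stopping the most recent state; a fuller implementation would also need to handle stopping an action that has since been overridden.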

ActionEvent(action_info_modality: ActionModality, action_info_modality_policy: ActionModalityPolicy, action_uid: str, action_info_lifetime: Optional[umim.messages.types.ActionLifetime])

An event related to the lifetime of an action

Parameters
  • action_info_modality (ActionModality) – The name of the modality. Specific to each action. Cannot be changed.

  • action_info_modality_policy (ActionModalityPolicy) – The policy of the modality. Specific to each action and its modality.

  • action_uid (str) – A unique id for the action.

  • action_info_lifetime (Optional[umim.messages.types.ActionLifetime]) – Indicates if the Action has a limited lifetime or will run indefinitely until actively stopped by the IM

  • ... – Additional parameters/payload inherited from Event().

StartAction()

Event to start an action. All other actions that can be started inherit from this base spec. The action_uid is used to differentiate between multiple runs of the same action.

Parameters

... – Additional parameters/payload inherited from ActionEvent().

ActionStarted(action_started_at: datetime)

The execution of an action has started.

Parameters
  • action_started_at (datetime) – The timestamp of when the action has started.

  • ... – Additional parameters/payload inherited from ActionEvent().

ChangeAction()

The parameters of a running action need to be changed. Updating running actions is useful for longer-running actions (e.g. an avatar animation) that can adapt their behavior dynamically. For example, a nodding animation can change its speed depending on the voice activity level.

Parameters

... – Additional parameters/payload inherited from ActionEvent().

ActionUpdated(action_updated_at: datetime)

A running action provides a (partial) result. Ongoing actions can provide partial updates on the current status of the action. An ActionUpdated should always update the payload of the action object and provide the type of update.

Parameters
  • action_updated_at (datetime) – The timestamp of when the action has been updated.

  • ... – Additional parameters/payload inherited from ActionEvent().

StopAction()

An action needs to be stopped. This should be used to proactively stop an action that can take a longer period of time, e.g., a gesture.

Parameters

... – Additional parameters/payload inherited from ActionEvent().

ActionFinished(action_finished_at: datetime, is_success: bool, failure_reason: Optional[str], was_stopped: Optional[bool])

An action has finished its execution. An action can finish either naturally (because it has completed or failed) or because it was stopped by the IM. The success (or failure) of the execution is indicated by the is_success attribute.

Parameters
  • action_finished_at (datetime) – The timestamp of when the action has finished.

  • is_success (bool) – Did the action finish successfully

  • failure_reason (Optional[str]) – Reason for action failure in case the action did not execute successfully

  • was_stopped (Optional[bool]) – Was the action stopped by a Stop event

  • ... – Additional parameters/payload inherited from ActionEvent().
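
A consumer of ActionFinished usually needs to distinguish three outcomes: completed, stopped, or failed. The sketch below shows one way to derive that from the fields listed above. The dataclass is a simplified stand-in for the real event, and the precedence used (was_stopped is checked before is_success) is an assumption, since the text does not say how the two fields combine.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ActionFinished:
    """Simplified stand-in carrying only the fields discussed here."""

    action_uid: str
    action_finished_at: datetime
    is_success: bool
    failure_reason: Optional[str] = None
    was_stopped: Optional[bool] = None


def describe_outcome(event: ActionFinished) -> str:
    """Classify how an action ended (assumed precedence: stopped, success, failure)."""
    if event.was_stopped:
        return "stopped"
    if event.is_success:
        return "completed"
    return f"failed: {event.failure_reason or 'unknown reason'}"


now = datetime.now(timezone.utc)
completed = ActionFinished("a1", now, is_success=True)
stopped = ActionFinished("a2", now, is_success=True, was_stopped=True)
failed = ActionFinished("a3", now, is_success=False, failure_reason="TTS unavailable")

for finished_event in (completed, stopped, failed):
    print(finished_event.action_uid, describe_outcome(finished_event))
```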

User Actions

All user actions are based on the UserAction base specification. User actions represent UMIM actions that are started by the interactive system and not by the IM. That is why these actions do not have StartXY events associated.

UserActionStarted(user_id: Optional[str])
Parameters
  • user_id (Optional[str]) – An ID identifying the user performing the action. This field is required if you support multi-user interactions.

  • ... – Additional parameters/payload inherited from ActionStarted().

UserActionUpdated(user_id: Optional[str])
Parameters
  • user_id (Optional[str]) – An ID identifying the user performing the action. This field is required if you support multi-user interactions.

  • ... – Additional parameters/payload inherited from ActionUpdated().

StopUserAction(user_id: Optional[str])
Parameters
  • user_id (Optional[str]) – An ID identifying the user performing the action. This field is required if you support multi-user interactions.

  • ... – Additional parameters/payload inherited from ActionEvent().

UserActionFinished(user_id: Optional[str])
Parameters
  • user_id (Optional[str]) – An ID identifying the user performing the action. This field is required if you support multi-user interactions.

  • ... – Additional parameters/payload inherited from ActionFinished().
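
Because user_id is optional in general but required for multi-user interactions, a receiving component may want to check it explicitly. The following is a minimal sketch of such a check. The function name resolve_user_id, the multi_user flag, and the dictionary representation of events are assumptions for illustration.

```python
from typing import Any, Dict, Optional


def resolve_user_id(event: Dict[str, Any], multi_user: bool) -> Optional[str]:
    """Return the event's user_id, insisting on one in multi-user setups."""
    user_id = event.get("user_id")
    if multi_user and not user_id:
        raise ValueError(
            f"{event.get('type', 'event')} requires user_id in multi-user interactions"
        )
    return user_id


single_user_event = {"type": "UtteranceUserActionStarted"}
multi_user_event = {"type": "UtteranceUserActionStarted", "user_id": "user-1"}

print(resolve_user_id(single_user_event, multi_user=False))
print(resolve_user_id(multi_user_event, multi_user=True))
```

The same pattern applies to bot_id on bot action events in multi-bot interactions.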

Bot Actions

All bot actions are based on the BotAction base specification. Bot actions represent UMIM actions that are initiated by the IM.

StartBotAction(bot_id: Optional[str])
Parameters
  • bot_id (Optional[str]) – An ID identifying the bot performing the action. This field is required if you support multi-bot interactions.

  • ... – Additional parameters/payload inherited from ActionEvent().

BotActionStarted(bot_id: Optional[str])
Parameters
  • bot_id (Optional[str]) – An ID identifying the bot performing the action. This field is required if you support multi-bot interactions.

  • ... – Additional parameters/payload inherited from ActionStarted().

BotActionUpdated(bot_id: Optional[str])
Parameters
  • bot_id (Optional[str]) – An ID identifying the bot performing the action. This field is required if you support multi-bot interactions.

  • ... – Additional parameters/payload inherited from ActionUpdated().

StopBotAction(bot_id: Optional[str])
Parameters
  • bot_id (Optional[str]) – An ID identifying the bot performing the action. This field is required if you support multi-bot interactions.

  • ... – Additional parameters/payload inherited from ActionEvent().

BotActionFinished(bot_id: Optional[str])
Parameters
  • bot_id (Optional[str]) – An ID identifying the bot performing the action. This field is required if you support multi-bot interactions.

  • ... – Additional parameters/payload inherited from ActionFinished().

Custom User Action

This section defines a custom user action that can be used in addition to the standardized actions. If an interactive system can identify user actions that are not part of the UMIM standard, the Custom User Action allows the IM to interact with such actions. It also serves as a mechanism for integrating new actions into UMIM.

CustomUserActionStarted(custom_action_name: str, parameters: Dict[str, Any])

The execution of the custom user action has started.

Parameters
  • custom_action_name (str) – The name of the custom action.

  • parameters (Dict[str, Any]) – Any parameters for the action.

  • ... – Additional parameters/payload inherited from UserActionStarted().

ChangeCustomUserAction(custom_action_name: str, parameters: Dict[str, Any])

The parameters of a running action need to be changed. Updating running actions is useful for longer-running actions (e.g. an avatar animation) that can adapt their behavior dynamically. For example, a nodding animation can change its speed depending on the voice activity level.

Parameters
  • custom_action_name (str) – The name of the custom action.

  • parameters (Dict[str, Any]) – Any parameters for the action change.

  • ... – Additional parameters/payload inherited from ChangeUserAction().

CustomUserActionUpdated(custom_action_name: str, updates: Dict[str, Any])

A running action provides a (partial) result. Ongoing actions can provide partial updates on the current status of the action. An ActionUpdated should always update the payload of the action object and provide the type of update.

Parameters
  • custom_action_name (str) – The name of the custom action.

  • updates (Dict[str, Any]) – Any updates from the action.

  • ... – Additional parameters/payload inherited from UserActionUpdated().

CustomUserActionFinished(custom_action_name: str, results: Dict[str, Any])

An action has finished its execution. An action can finish either naturally (because it has completed or failed) or because it was stopped by the IM. The success (or failure) of the execution is indicated by the is_success attribute.

Parameters
  • custom_action_name (str) – The name of the custom action.

  • results (Dict[str, Any]) – Any results from the action.

  • ... – Additional parameters/payload inherited from UserActionFinished().

Custom Bot Action

This section defines a custom bot action that can be used in addition to the standardized actions. If an interactive system is able to execute additional actions that are not part of the UMIM standard, the Custom Bot Action allows the IM to interact with and execute such actions. It also serves as a mechanism for integrating new actions into UMIM.

StartCustomBotAction(custom_action_name: str, parameters: Dict[str, Any])

Event to start an action. All other actions that can be started inherit from this base spec. The action_uid is used to differentiate between multiple runs of the same action.

Parameters
  • custom_action_name (str) – The name of the custom action.

  • parameters (Dict[str, Any]) – Any parameters for the action.

  • ... – Additional parameters/payload inherited from StartBotAction().

CustomBotActionStarted(custom_action_name: str, parameters: Dict[str, Any])

The execution of the custom bot action has started.

Parameters
  • custom_action_name (str) – The name of the custom action.

  • parameters (Dict[str, Any]) – Any parameters for the action.

  • ... – Additional parameters/payload inherited from BotActionStarted().

ChangeCustomBotAction(custom_action_name: str, parameters: Dict[str, Any])

Change parameters of the custom action (if supported by the custom action)

Parameters
  • custom_action_name (str) – The name of the custom action.

  • parameters (Dict[str, Any]) – Any parameters for the action change.

  • ... – Additional parameters/payload inherited from ChangeBotAction().

CustomBotActionUpdated(custom_action_name: str, updates: Dict[str, Any])

Something happened during the execution of the custom action (if supported by the custom action).

Parameters
  • custom_action_name (str) – The name of the custom action.

  • updates (Dict[str, Any]) – Any updates from the action.

  • ... – Additional parameters/payload inherited from BotActionUpdated().

StopCustomBotAction(custom_action_name: str, parameters: Dict[str, Any])

An action needs to be stopped.

Parameters
  • custom_action_name (str) – The name of the custom action.

  • parameters (Dict[str, Any]) – Any parameters for the action.

  • ... – Additional parameters/payload inherited from StopBotAction().

CustomBotActionFinished(custom_action_name: str, results: Dict[str, Any])

The custom action has finished its execution.

Parameters
  • custom_action_name (str) – The name of the custom action.

  • results (Dict[str, Any]) – Any results from the action.

  • ... – Additional parameters/payload inherited from BotActionFinished().
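
To show how the custom bot action events fit together, here is a small sketch that builds a start event and the matching finished event as plain dictionaries. The action name "DimLightsAction", its parameters, and the two helper functions are hypothetical; only the field names (custom_action_name, parameters, results, action_uid, is_success) come from the definitions above.

```python
import uuid
from typing import Any, Dict


def start_custom_bot_action(name: str, parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Build a StartCustomBotAction payload; the IM generates the action_uid."""
    return {
        "type": "StartCustomBotAction",
        "action_uid": str(uuid.uuid4()),
        "custom_action_name": name,
        "parameters": parameters,
    }


def custom_bot_action_finished(
    start_event: Dict[str, Any], results: Dict[str, Any], is_success: bool = True
) -> Dict[str, Any]:
    """Build the CustomBotActionFinished payload for a previously started action."""
    return {
        "type": "CustomBotActionFinished",
        "action_uid": start_event["action_uid"],  # same ID across the lifecycle
        "custom_action_name": start_event["custom_action_name"],
        "results": results,
        "is_success": is_success,
    }


# "DimLightsAction" is a made-up action an interactive system might support.
start = start_custom_bot_action("DimLightsAction", {"level": 0.3})
finished = custom_bot_action_finished(start, {"final_level": 0.3})
print(finished["type"], finished["action_uid"] == start["action_uid"])
```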

Expectation Bot Action

To optimize the multimodal experience, it can be beneficial for the interactive system to know what events the interaction manager is expecting next from the user or the system. This allows the interactive system to optimize its functionality to better detect that particular event. UMIM defines a special action to communicate these expectations to the interactive system. This is a completely optional action that does not need to be implemented by an interactive system. However, interactive systems that react to these expectations can improve the overall user experience.

A few examples:

  • The IM is expecting an UtteranceUserActionStarted() event (the user said something). The interactive system can therefore optimize the ASR pipeline, e.g. by turning down the speaker volume or increasing the microphone sensitivity. In noisy environments this could even mean that ASR is only active when the IM is expecting the user to speak. (Note: this can severely impact the naturalness of the conversation with e.g. an interactive avatar.)

  • In a chatbot system the UI might want to show a thinking indicator when the bot is processing a request. This use case is supported by communicating expectations, since during processing the IM stops all ongoing expectations. This allows the UI to indicate that “processing” state. If the IM is expecting e.g. a text answer it will communicate that again with an expectation action. This is different from actively waiting for a user input which is an action we introduce below.

  • Running computer vision algorithms is typically resource intensive. Since the IM can communicate the types of vision events it is expecting at the current point in the interaction, the interactive system can enable or disable vision algorithms on the fly. Examples include QR code reading, object recognition, and user movement detection.

Important

Communicating Bot Expectations does not come with any guaranteed action by the interactive system. It is the sole responsibility of the interactive system to decide what measures to take (or to do nothing) when certain expectations are communicated.

So even if an interactive system supports this action the IM cannot make any assumptions about effects of sending out expectations.

StartExpectationBotAction(expected_event: Annotated[Union[umim.messages.modalities.chat.ChangeUtteranceBotAction, umim.messages.modalities.chat.StartUtteranceBotAction, umim.messages.modalities.chat.StopUtteranceBotAction, umim.messages.modalities.chat.UtteranceBotActionFinished, umim.messages.modalities.chat.UtteranceBotActionStarted, umim.messages.modalities.chat.UtteranceBotActionScriptUpdated, umim.messages.modalities.chat.UtteranceUserActionFinished, umim.messages.modalities.chat.UtteranceUserActionStarted, umim.messages.modalities.chat.UtteranceUserActionTranscriptUpdated, umim.messages.modalities.chat.UtteranceUserActionIntensityUpdated], FieldInfo(default=PydanticUndefined, discriminator='type', extra={})])

The bot expects a certain event on the UMIM event bus in the near future. This optional event allows Action Servers to optimize their functions. For example, an AS responsible for processing camera frames can enable or disable certain vision algorithms depending on what the IM is expecting (e.g. BotExpectation(event=PositionChangeUserActionStarted) can allow the AS to start a computationally more expensive motion tracker for better resolution/accuracy).

Parameters
  • expected_event (Annotated[Union[umim.messages.modalities.chat.ChangeUtteranceBotAction, umim.messages.modalities.chat.StartUtteranceBotAction, umim.messages.modalities.chat.StopUtteranceBotAction, umim.messages.modalities.chat.UtteranceBotActionFinished, umim.messages.modalities.chat.UtteranceBotActionStarted, umim.messages.modalities.chat.UtteranceBotActionScriptUpdated, umim.messages.modalities.chat.UtteranceUserActionFinished, umim.messages.modalities.chat.UtteranceUserActionStarted, umim.messages.modalities.chat.UtteranceUserActionTranscriptUpdated, umim.messages.modalities.chat.UtteranceUserActionIntensityUpdated], FieldInfo(default=PydanticUndefined, discriminator='type', extra={})]) –

  • ... – Additional parameters/payload inherited from StartBotAction().

ExpectationBotActionStarted()

The interactive system communicates to the IM that it is able to handle the expectation for the specified events. In case the system is able to handle the expectation it has to send out the ExpectationBotActionStarted event. Receiving the ActionStarted event does not come with any guarantees on how the expectation is handled, but it provides the IM with a way to know if the system is even capable of handling expectations. For expectations for events that are not supported by any Action Server in the interactive system, no ExpectationBotActionStarted event will be sent out. If a system is not capable of handling certain bot expectations the IM might stop communicating them.

Parameters

... – Additional parameters/payload inherited from BotActionStarted().

StopExpectationBotAction()

The IM communicates that it stopped its expectations. This normally happens when the expectation has been met (e.g. the event has been received) or something else happened to change the course of the interaction.

Parameters

... – Additional parameters/payload inherited from StopBotAction().

ExpectationBotActionFinished()

The interactive system acknowledges that the bot expectation is finished.

Parameters

... – Additional parameters/payload inherited from BotActionFinished().
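
The four expectation events above can be handled by an Action Server roughly as sketched below. The detector names and the mapping from expected event types to detectors are hypothetical, as is the ExpectationHandler class itself. The sketch follows the text in one respect: for an unsupported expectation, no ExpectationBotActionStarted reply is produced.

```python
from typing import Dict, Optional, Set

# Hypothetical mapping from an expected event type to the detector that can
# produce it; neither the detector names nor the mapping come from UMIM.
DETECTOR_FOR_EVENT: Dict[str, str] = {
    "UtteranceUserActionStarted": "asr",
    "PositionChangeUserActionStarted": "motion_tracker",
}


class ExpectationHandler:
    """Tracks running expectations and derives which detectors should be on."""

    def __init__(self) -> None:
        self._running: Dict[str, str] = {}  # action_uid -> detector name

    def on_start(self, action_uid: str, expected_event_type: str) -> Optional[str]:
        """Handle StartExpectationBotAction; return the reply event type, if any."""
        detector = DETECTOR_FOR_EVENT.get(expected_event_type)
        if detector is None:
            return None  # unsupported: no ExpectationBotActionStarted is sent
        self._running[action_uid] = detector
        return "ExpectationBotActionStarted"

    def on_stop(self, action_uid: str) -> Optional[str]:
        """Handle StopExpectationBotAction; acknowledge only known actions."""
        if self._running.pop(action_uid, None) is None:
            return None
        return "ExpectationBotActionFinished"

    def active_detectors(self) -> Set[str]:
        return set(self._running.values())


handler = ExpectationHandler()
supported_reply = handler.on_start("exp-1", "PositionChangeUserActionStarted")
unsupported_reply = handler.on_start("exp-2", "SomeUnsupportedEvent")
detectors_while_expecting = handler.active_detectors()
stop_reply = handler.on_stop("exp-1")
detectors_after_stop = handler.active_detectors()
```

What the system actually does with the set of active detectors remains its own decision; the IM cannot rely on any particular effect.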

Expectation Signaling Bot Action

Besides communicating to the interactive system that the IM is expecting certain events to happen it might be important to signal to the user that the bot is waiting for an input on a certain user modality. For this UMIM offers the ExpectationSignaling action. This action is meant to allow the interactive system to provide subtle clues to the user about what the bot is expecting from the user (e.g., the avatar’s ears could grow if it is waiting for user input). This action is not meant to communicate specific expectations to the user (e.g., you cannot communicate something like “Please show me a picture of an elephant” - for this you have to explicitly communicate the bot expectation as part of the interaction).

Modality. The active waiting action is associated with its own modality “BotExpectationSignaling” with an override policy.

A few examples of how this action can be used:

  • In a chatbot system, the user might be required to enter certain information for a process to complete (e.g. “Please enter your date of birth to confirm the order”). In such a situation, the IM wants to signal to the user that it is actively waiting for a response. For this, the IM sends out a StartExpectationSignalingBotAction(modality=UserSpeech) event.

  • The interactive avatar might be waiting for a specific gesture from the user. The IM might want to actively communicate this with the user (e.g. showing an animation). For this the IM sends out StartExpectationSignalingBotAction(modality=UserGesture). If there are other ongoing actions with conflicting system channels (e.g. multiple upper body animations) it is the responsibility of the action server to resolve any potential conflicts.

Action Server Implementation Notes:

  • The action is not meant to be used to implement an active listening behavior by an avatar. For an interactive avatar the IM might first start ActiveWaiting for UserSpeech and once the UtteranceUserActionStarted has been received a certain flow would handle starting small head nodding and reacting with vocal bursts to certain events.

StartExpectationSignalingBotAction(modality: ActionModality)

The bot is waiting for an event for a specific modality.

Parameters
  • modality (ActionModality) –

  • ... – Additional parameters/payload inherited from StartBotAction().

ExpectationSignalingBotActionStarted()

The bot has started actively waiting for an event on the specified modality.

Parameters

... – Additional parameters/payload inherited from BotActionStarted().

StopExpectationSignalingBotAction()

Stop waiting for an event on the modality.

Parameters

... – Additional parameters/payload inherited from StopBotAction().

ExpectationSignalingBotActionFinished()

The bot has stopped actively waiting. Note that this action is only stopped on explicit request, by sending the StopExpectationSignalingBotAction event. Otherwise the action will continue indefinitely.

Parameters

... – Additional parameters/payload inherited from BotActionFinished().
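
From the IM's side, expectation signaling might be driven as in the sketch below: start signaling for a modality, then send the explicit stop once a matching user event arrives. The ExpectationSignaler class, the dictionary event format, and the mapping from user event types to modalities are assumptions for illustration. The modality name UserSpeech is taken from the examples above.

```python
import uuid
from typing import Any, Dict, List

# Assumption: which user event type satisfies which signaled modality.
MODALITY_FOR_EVENT: Dict[str, str] = {"UtteranceUserActionStarted": "UserSpeech"}


class ExpectationSignaler:
    """Hypothetical IM helper that starts and stops expectation signaling."""

    def __init__(self) -> None:
        self.outbox: List[Dict[str, Any]] = []   # events the IM would send
        self._active: Dict[str, str] = {}        # modality -> action_uid

    def start(self, modality: str) -> str:
        action_uid = str(uuid.uuid4())
        self._active[modality] = action_uid
        self.outbox.append(
            {
                "type": "StartExpectationSignalingBotAction",
                "action_uid": action_uid,
                "modality": modality,
            }
        )
        return action_uid

    def on_event(self, event: Dict[str, Any]) -> None:
        """Stop signaling explicitly once the awaited user event arrives."""
        modality = MODALITY_FOR_EVENT.get(event["type"])
        if modality is None:
            return
        action_uid = self._active.pop(modality, None)
        if action_uid is not None:
            self.outbox.append(
                {"type": "StopExpectationSignalingBotAction", "action_uid": action_uid}
            )


signaler = ExpectationSignaler()
signaling_uid = signaler.start("UserSpeech")
signaler.on_event({"type": "UtteranceUserActionStarted"})
print([sent["type"] for sent in signaler.outbox])
```

Because the action has no natural end, the stop event is the only thing in this sketch that ends the signaling.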