Examples

The UI Application provides several examples that you can explore to better understand how the agent works. For each example, we’ve provided a Prompt, a Caption Summarization Prompt, and a Summary Aggregation Prompt to get started. Refer to Tuning Prompts for more details on the prompts.
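To make the role of the three prompts concrete, here is a minimal sketch of how they could feed a three-stage flow: one caption per chunk, a condensing pass over groups of captions, and a final aggregation. The function `summarize_video`, the callables `caption_chunk` and `run_llm`, and the `batch_size` parameter are all hypothetical names introduced for illustration; they are not the agent's actual API, and the real pipeline may batch or order these steps differently.

```python
from typing import Callable, List, Sequence


def summarize_video(
    chunks: Sequence[str],
    prompt: str,
    caption_summarization_prompt: str,
    summary_aggregation_prompt: str,
    caption_chunk: Callable[[str, str], str],
    run_llm: Callable[[str, str], str],
    batch_size: int = 5,  # assumption: captions are condensed in fixed-size groups
) -> str:
    """Illustrative three-stage flow; all callables are supplied by the caller."""
    # Stage 1: the Prompt drives one caption per video chunk.
    captions: List[str] = [caption_chunk(prompt, chunk) for chunk in chunks]

    # Stage 2: the Caption Summarization Prompt condenses groups of captions.
    batch_summaries: List[str] = []
    for i in range(0, len(captions), batch_size):
        batch_text = "\n".join(captions[i:i + batch_size])
        batch_summaries.append(run_llm(caption_summarization_prompt, batch_text))

    # Stage 3: the Summary Aggregation Prompt produces the final summary.
    return run_llm(summary_aggregation_prompt, "\n".join(batch_summaries))
```

Passing the model calls in as plain callables keeps the sketch independent of any particular model or service, so it can be exercised with simple stand-in functions.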

Traffic Camera Video

This is a 2-minute 10-second video (its.mp4) consisting of a synthetically generated traffic scene.

Chunk Size: 10 sec
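As a rough sanity check on the chunk size, the number of chunks a video yields can be estimated from its duration. This is a minimal sketch, assuming chunks do not overlap and that a trailing partial chunk still counts as a chunk; `chunk_count` is a hypothetical helper, not part of the application.

```python
import math


def chunk_count(duration_sec: float, chunk_size_sec: float) -> int:
    """Estimate how many chunks a video is split into.

    Assumes non-overlapping chunks; a trailing partial chunk counts as one.
    """
    if duration_sec < 0 or chunk_size_sec <= 0:
        raise ValueError("duration must be >= 0 and chunk size must be > 0")
    return math.ceil(duration_sec / chunk_size_sec)


# its.mp4: 2 minutes 10 seconds at a 10-second chunk size
print(chunk_count(2 * 60 + 10, 10))  # 13
```

Under the same assumptions, a larger chunk size means fewer captions to summarize, at the cost of coarser timestamps.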

Prompt

You are an intelligent traffic system.
You must monitor and take note of all traffic related events.
Start and end each sentence with a time stamp.

Caption Summarization Prompt

You will be given captions from sequential clips of a video.
Aggregate captions in the format start_time:end_time:caption
based on whether captions are related to one another or create
a continuous scene.

Summary Aggregation Prompt

Based on the available information, generate a traffic report
that is organized chronologically and in logical sections.
This should be a concise, yet descriptive summary of all the important events.
The format should be intuitive and easy for a user to read and understand what happened.
Format the output in Markdown so it can be displayed nicely.

Sample questions

  1. Did a car crash occur?

  2. When do the police arrive at the crash?

  3. What cars were involved in the crash?

Warehouse Video (short)

This is a 3-minute 30-second video (warehouse.mp4) consisting of clips within a warehouse environment.

Chunk Size: 10 sec

Prompt

You are a warehouse monitoring system. Describe the events
in this warehouse and look for any anomalies.
Start and end each sentence with a time stamp.

Caption Summarization Prompt

Summarize similar captions that are sequential to one another,
while maintaining the details of each caption, in the format
start_time:end_time:caption. The output should be bullet points
in the format start_time:end_time: detailed_event_description.

Summary Aggregation Prompt

Aggregate captions in the format start_time:end_time:caption
based on whether captions are related to one another or create
a continuous scene. The output should only be bullet points in
the format start_time:end_time: detailed_event_description.

Sample questions

  1. When did the forklift first arrive?

  2. Did a worker drop any boxes?

  3. What breaches of safety protocol took place?

Bridge Inspection Video

This is a 3-minute video (bridge.mp4) consisting of drone footage inspecting a bridge.

Chunk Size: 20 sec

Prompt

You are a bridge inspection system. Describe the condition of
the bridge. Start and end each sentence with a time stamp.

Caption Summarization Prompt

You will be given captions from sequential clips of a video.
Aggregate captions in the format start_time:end_time:caption
based on whether captions are related to one another or create
a continuous scene.

Summary Aggregation Prompt

Based on the available information, generate a summary that
describes the condition of the bridge. The summary should be
organized chronologically and in logical sections. This should be a
concise, yet descriptive summary of all the important events.
The format should be intuitive and easy for a user to understand what happened.
Format the output in Markdown so it can be displayed nicely.

Sample questions

  1. Where is graffiti located?

  2. At what time do you see graffiti?

  3. Are there people on the bridge?

Warehouse Video (long)

This is an 82-minute video (warehouse_82min.mp4) consisting of clips within a warehouse environment.

Chunk Size: 60 sec

Prompt

Write a concise and clear dense caption for the provided warehouse video,
focusing on irregular or hazardous events such as boxes falling, workers
not wearing PPE, workers falling, workers taking photographs,
workers chitchatting, forklift stuck, etc. Start and end each sentence
with a time stamp.

Caption Summarization Prompt

You should summarize the following events of a warehouse in the format
start_time:end_time:caption. For start_time and end_time use . to
separate seconds, minutes, hours. If during a time segment only regular
activities happen, then ignore them, else note any irregular activities
in detail. The output should be bullet points in the format
start_time:end_time:detailed_event_description. Don't return anything
else except the bullet points.
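Because this prompt asks for "." inside the time stamps, the ":" character is left free to separate the three fields, which makes the bullet points straightforward to post-process. The sketch below splits one bullet into its fields; `Event` and `parse_bullet` are hypothetical names, the sample line is invented for illustration, and the time stamps are kept as strings because their exact layout is not specified here.

```python
from typing import NamedTuple, Optional


class Event(NamedTuple):
    start_time: str
    end_time: str
    description: str


def parse_bullet(line: str) -> Optional[Event]:
    """Split 'start_time:end_time:description' into fields.

    Returns None when the line does not have all three fields.
    """
    # Drop a leading bullet marker ("-", "*", or "•") and surrounding spaces.
    text = line.strip().lstrip("-*• ").strip()
    # Split on the first two colons only, so colons in the description survive.
    parts = text.split(":", 2)
    if len(parts) != 3:
        return None
    start, end, description = (part.strip() for part in parts)
    if not (start and end and description):
        return None
    return Event(start, end, description)


# Invented sample line, not real model output.
print(parse_bullet("- 0.05.10:0.05.42: Worker drops a box"))
```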

Summary Aggregation Prompt

You are a warehouse monitoring system. Given the captions in the form
start_time:end_time:caption, aggregate the following captions in the
format start_time:end_time:event_description. If the event_description
is the same as another event_description, aggregate the captions in
the format start_time1:end_time1,...,start_timek:end_timek:event_description.
If any two adjacent end_time1 and start_time2 are within a few tenths of a second,
merge the captions in the format start_time1:end_time2. The output should
only contain bullet points. Cluster the output into Unsafe Behavior,
Operational Inefficiencies, Potential Equipment Damage and Unauthorized Personnel.
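The merging rules in this prompt can be sketched as ordinary code, which may help when checking whether the model's output follows them. This is only an illustration under stated assumptions: times are plain seconds rather than the "."-separated format, "a few tenths of a second" is taken as 0.5 seconds, and adjacent spans are merged only when they share the same description. The names `aggregate` and `format_bullets` are hypothetical, and the clustering into categories is left out because it depends on the model's judgment.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Span = Tuple[float, float]


def aggregate(
    events: Iterable[Tuple[float, float, str]],
    tolerance: float = 0.5,  # assumption for "a few tenths of a second"
) -> Dict[str, List[Span]]:
    """Group spans by description and merge spans that nearly touch."""
    grouped: Dict[str, List[List[float]]] = defaultdict(list)
    for start, end, description in sorted(events):
        spans = grouped[description]
        if spans and start - spans[-1][1] <= tolerance:
            # The gap to the previous span is small: extend that span.
            spans[-1][1] = max(spans[-1][1], end)
        else:
            spans.append([start, end])
    return {d: [(s, e) for s, e in spans] for d, spans in grouped.items()}


def format_bullets(aggregated: Dict[str, List[Span]]) -> List[str]:
    """Render start1:end1,...,startk:endk:description bullet points."""
    lines = []
    for description, spans in aggregated.items():
        times = ",".join(f"{start:g}:{end:g}" for start, end in spans)
        lines.append(f"- {times}:{description}")
    return lines


# Invented sample events (start_sec, end_sec, description).
sample = [
    (0.0, 60.0, "Worker without helmet"),
    (60.2, 120.0, "Worker without helmet"),
    (120.0, 180.0, "Forklift stuck"),
    (300.0, 360.0, "Worker without helmet"),
]
for bullet in format_bullets(aggregate(sample)):
    print(bullet)
```

In the sample, the first two helmet events merge because the gap between them is 0.2 seconds, while the third stays separate and is listed after a comma under the same description.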

Sample questions

  1. When did the red forklift first appear in the scene?

  2. Was anyone not wearing personal protective equipment?

  3. What were workers doing?

Traffic Camera Images

This example is a sequence of images sampled from a traffic camera with a timestamp overlay. Multiple images can be provided under the “IMAGE FILE SUMMARIZATION & Q&A” tab of the UI.

Images

Select multiple images from the samples list sequentially until all images are loaded, as shown below.

Select multiple images from samples.

Prompt

You are an intelligent traffic system. You will be given a set of images from a traffic intersection.
Write a detailed caption for each image to capture all traffic related events and details. For each caption,
include the timestamp from the image.

Caption Summarization Prompt

Combine the captions if needed. Do not lose any information.

Summary Aggregation Prompt

You will be given a set of captions describing several images from a traffic intersection.
Write a summary of the events from the captions and include the timestamp information.

Sample questions

  1. What time did the car crash occur?

  2. Did a police car respond to the crash?

  3. When did the firetruck arrive?