Configure the Databases#

VSS supports integrating different databases for storing and retrieving video data.

The default deployment uses Milvus as the vector database and Neo4j as the graph database, but you can configure a different database to fit your specific needs.

To use a different database, modify the config.yaml file. For information on how to obtain the config.yaml file, refer to Deploy Using Docker Compose X86.
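Each database-specific configuration below follows the same wiring: the functions section references a database tool by name through its db entry, and the top-level tools section defines that named tool's type and connection parameters. The fragment below only illustrates this pattern; the tool name example_db and the DB_HOST/DB_PORT variable names are placeholders for illustration, not values shipped with VSS.

functions:
  retriever_function:
    type: vector_retrieval
    tools:
      db: example_db          # refers to the database tool defined under tools below
tools:
  example_db:                 # placeholder tool name for illustration
    type: elasticsearch       # database backend, for example elasticsearch or arango
    params:
      host: !ENV ${DB_HOST}   # placeholder variable names; the sections below use backend-specific ones
      port: !ENV ${DB_PORT}
    tools:
      embedding: nvidia_embedding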

ArangoDB#

CA-RAG config:

functions:
  summarization:
    type: batch_summarization
    params:
      batch_size: 6 # Use an even batch size if speech recognition is enabled.
      batch_max_concurrency: 20
      prompts:
        caption: "Write a concise and clear dense caption for the provided warehouse video, focusing on irregular or hazardous events such as boxes falling, workers not wearing PPE, workers falling, workers taking photographs, workers chitchatting, forklift stuck, etc. Start and end each sentence with a time stamp."
        caption_summarization: "You should summarize the following events of a warehouse in the format start_time:end_time:caption. For start_time and end_time use . to separate seconds, minutes, hours. If during a time segment only regular activities happen, then ignore them, else note any irregular activities in detail. The output should be bullet points in the format start_time:end_time: detailed_event_description. Don't return anything else except the bullet points."
        summary_aggregation: "You are a warehouse monitoring system. Given the caption in the form start_time:end_time: caption, aggregate the following captions in the format start_time:end_time:event_description. If the event_description is the same as another event_description, aggregate the captions in the format start_time1:end_time1,...,start_timek:end_timek:event_description. If any two adjacent end_time1 and start_time2 are within a few tenths of a second, merge the captions in the format start_time1:end_time2. The output should only contain bullet points. Cluster the output into Unsafe Behavior, Operational Inefficiencies, Potential Equipment Damage and Unauthorized Personnel."
    tools:
      llm: summarization_llm
      db: arango_db

  ingestion_function:
    type: graph_ingestion
    params:
      batch_size: 1
    tools:
      db: arango_db
      llm: chat_llm

  retriever_function:
    type: graph_retrieval
    params:
      top_k: 5
    tools:
      db: arango_db
      llm: chat_llm

tools:
  arango_db:
    type: arango
    params:
      host: !ENV ${ARANGO_DB_HOST}
      port: !ENV ${ARANGO_DB_PORT}
      username: !ENV ${ARANGO_DB_USERNAME}
      password: !ENV ${ARANGO_DB_PASSWORD}
    tools:
      embedding: nvidia_embedding

Note

ArangoDB is not supported on aarch64 platforms.
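The !ENV ${...} entries are resolved from environment variables when the CA-RAG config is loaded. To try the configuration against a standalone ArangoDB instance, you could instead hard-code the connection details, assuming the config loader also accepts literal values in place of the !ENV references. The sketch below uses placeholder hostname and credentials (8529 is ArangoDB's default HTTP port), not values shipped with VSS.

tools:
  arango_db:
    type: arango
    params:
      host: my-arango-host      # placeholder; use your ArangoDB service name or IP
      port: 8529                # ArangoDB's default HTTP port; adjust if yours differs
      username: root            # placeholder credentials; replace with your own
      password: my-password
    tools:
      embedding: nvidia_embedding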

Elasticsearch#

CA-RAG config:

functions:
  summarization:
    type: batch_summarization
    params:
      batch_size: 6 # Use an even batch size if speech recognition is enabled.
      batch_max_concurrency: 20
      prompts:
        caption: "Write a concise and clear dense caption for the provided warehouse video, focusing on irregular or hazardous events such as boxes falling, workers not wearing PPE, workers falling, workers taking photographs, workers chitchatting, forklift stuck, etc. Start and end each sentence with a time stamp."
        caption_summarization: "You should summarize the following events of a warehouse in the format start_time:end_time:caption. For start_time and end_time use . to separate seconds, minutes, hours. If during a time segment only regular activities happen, then ignore them, else note any irregular activities in detail. The output should be bullet points in the format start_time:end_time: detailed_event_description. Don't return anything else except the bullet points."
        summary_aggregation: "You are a warehouse monitoring system. Given the caption in the form start_time:end_time: caption, aggregate the following captions in the format start_time:end_time:event_description. If the event_description is the same as another event_description, aggregate the captions in the format start_time1:end_time1,...,start_timek:end_timek:event_description. If any two adjacent end_time1 and start_time2 are within a few tenths of a second, merge the captions in the format start_time1:end_time2. The output should only contain bullet points. Cluster the output into Unsafe Behavior, Operational Inefficiencies, Potential Equipment Damage and Unauthorized Personnel."
    tools:
      llm: summarization_llm
      db: elasticsearch_db

  ingestion_function:
    type: vector_ingestion
    tools:
      db: elasticsearch_db
      llm: chat_llm

  retriever_function:
    type: vector_retrieval
    params:
      top_k: 5
    tools:
      db: elasticsearch_db
      reranker: nvidia_reranker
      llm: chat_llm

tools:
  elasticsearch_db:
    type: elasticsearch
    params:
      host: !ENV ${ES_HOST}
      port: !ENV ${ES_PORT}
    tools:
      embedding: nvidia_embedding
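
As with ArangoDB, the !ENV references resolve from environment variables at load time. For a quick local test you could substitute literal values, assuming the config loader accepts them in place of the !ENV references; the host below is a placeholder and 9200 is Elasticsearch's default HTTP port, not a value shipped with VSS.

tools:
  elasticsearch_db:
    type: elasticsearch
    params:
      host: my-elasticsearch-host   # placeholder; use your Elasticsearch service name or IP
      port: 9200                    # Elasticsearch's default HTTP port; adjust if yours differs
    tools:
      embedding: nvidia_embedding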