Embedding Search Providers | NVIDIA NeMo Guardrails Library Developer Guide

NeMo Guardrails uses embedding search, also known as vector search, to implement the guardrails process and the knowledge base functionality.

To enhance the efficiency of the embedding search process, NeMo Guardrails can employ a caching mechanism for embeddings. This mechanism stores computed embeddings, thereby reducing the need for repeated computations and accelerating the search process. By default, the caching mechanism is disabled.

The default embedding search uses FastEmbed for computing embeddings (the all-MiniLM-L6-v2 model) and an exact NumPy index for similarity search. The default configuration is as follows:

core:
  embedding_search_provider:
    name: default
    parameters:
      embedding_engine: FastEmbed
      embedding_model: all-MiniLM-L6-v2
      use_batching: False
      max_batch_size: 10
      max_batch_hold: 0.01
      search_threshold: None
    cache:
      enabled: False
      key_generator: sha256
      store: filesystem
      store_config: {}

knowledge_base:
  embedding_search_provider:
    name: default
    parameters:
      embedding_engine: FastEmbed
      embedding_model: all-MiniLM-L6-v2
      use_batching: False
      max_batch_size: 10
      max_batch_hold: 0.01
      search_threshold: None
    cache:
      enabled: False
      key_generator: sha256
      store: filesystem
      store_config: {}

The default embedding search provider can also work with OpenAI embeddings:

core:
  embedding_search_provider:
    name: default
    parameters:
      embedding_engine: openai
      embedding_model: text-embedding-3-small
    cache:
      enabled: False
      key_generator: sha256
      store: filesystem
      store_config: {}

knowledge_base:
  embedding_search_provider:
    name: default
    parameters:
      embedding_engine: openai
      embedding_model: text-embedding-3-small
    cache:
      enabled: False
      key_generator: sha256
      store: filesystem
      store_config: {}

The default implementation also supports asynchronous execution of the embedding computation, which improves the efficiency of the search functionality.

The cache configuration is optional and the cache is disabled by default. When enabled, it uses the specified key_generator and store to cache the embeddings, and store_config can supply any additional options that the store requires. The default cache configuration uses the sha256 key generator and the filesystem store.
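
The idea behind a key generator and a store can be illustrated with a minimal Python sketch. This is not the library's implementation: the names FilesystemEmbeddingCache, sha256_key, and embed_with_cache are hypothetical, and the one-JSON-file-per-key layout is an assumption made only for illustration.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, List, Optional


def sha256_key(text: str) -> str:
    """Derive a stable cache key from the input text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class FilesystemEmbeddingCache:
    """Hypothetical store: one JSON file per embedding, named by its key."""

    def __init__(self, cache_dir: Path, key_generator: Callable[[str], str] = sha256_key):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.key_generator = key_generator

    def _path(self, text: str) -> Path:
        return self.cache_dir / f"{self.key_generator(text)}.json"

    def get(self, text: str) -> Optional[List[float]]:
        path = self._path(text)
        if not path.exists():
            return None  # cache miss
        return json.loads(path.read_text())

    def set(self, text: str, embedding: List[float]) -> None:
        self._path(text).write_text(json.dumps(embedding))


def embed_with_cache(
    text: str,
    cache: FilesystemEmbeddingCache,
    compute: Callable[[str], List[float]],
) -> List[float]:
    """Return the cached embedding if present; otherwise compute and store it."""
    cached = cache.get(text)
    if cached is not None:
        return cached
    embedding = compute(text)
    cache.set(text, embedding)
    return embedding
```

With a cache like this, the second request for the same text is served from disk and the embedding model is not called again, which is the saving the caching mechanism aims for.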

Batch Implementation

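The default embedding provider includes a batch processing feature that optimizes embedding generation. When batching is enabled (use_batching), embedding generation starts only after a predefined latency of 10 milliseconds, which corresponds to the max_batch_hold value of 0.01 in the default configuration. Requests that arrive during that window can then be computed together, in batches of up to max_batch_size texts.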
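
The hold-then-batch behavior can be sketched with asyncio as follows. This is an illustrative sketch, not the library's code: the EmbeddingBatcher class and its embed_batch callback are hypothetical names, and only max_batch_size and max_batch_hold are borrowed from the default configuration.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Tuple


class EmbeddingBatcher:
    """Hypothetical helper that groups concurrent embedding requests into batches."""

    def __init__(
        self,
        embed_batch: Callable[[List[str]], Awaitable[List[List[float]]]],
        max_batch_size: int = 10,
        max_batch_hold: float = 0.01,  # seconds, i.e. 10 milliseconds
    ):
        self._embed_batch = embed_batch
        self._max_batch_size = max_batch_size
        self._max_batch_hold = max_batch_hold
        self._pending: List[Tuple[str, asyncio.Future]] = []
        self._flush_task: Optional[asyncio.Task] = None

    async def get_embedding(self, text: str) -> List[float]:
        future = asyncio.get_running_loop().create_future()
        self._pending.append((text, future))
        # The first request of an idle period schedules the delayed flush.
        if self._flush_task is None or self._flush_task.done():
            self._flush_task = asyncio.create_task(self._flush_after_hold())
        return await future

    async def _flush_after_hold(self) -> None:
        # Hold briefly so that concurrent callers can join the batch.
        await asyncio.sleep(self._max_batch_hold)
        while self._pending:
            batch = self._pending[: self._max_batch_size]
            self._pending = self._pending[self._max_batch_size :]
            texts = [text for text, _ in batch]
            try:
                embeddings = await self._embed_batch(texts)
            except Exception as exc:
                # Propagate the failure to every caller waiting on this batch.
                for _, future in batch:
                    if not future.done():
                        future.set_exception(exc)
                continue
            for (_, future), embedding in zip(batch, embeddings):
                if not future.done():
                    future.set_result(embedding)
```

The trade-off is a small, bounded delay on each request in exchange for fewer calls to the embedding model when many requests arrive at once.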

Custom Embedding Search Providers

You can implement your own custom embedding search provider by subclassing EmbeddingsIndex. For quick reference, the complete interface is included below:

from dataclasses import dataclass, field
from typing import Dict, List, Optional


# Defined first so that the type annotations below can refer to it.
@dataclass
class IndexItem:
    text: str
    meta: Dict = field(default_factory=dict)


class EmbeddingsIndex:
    """The embeddings index is responsible for computing and searching a set of embeddings."""

    @property
    def embedding_size(self):
        raise NotImplementedError

    @property
    def cache_config(self):
        raise NotImplementedError

    async def _get_embeddings(self, texts: List[str]):
        raise NotImplementedError

    async def add_item(self, item: IndexItem):
        """Adds a new item to the index."""
        raise NotImplementedError()

    async def add_items(self, items: List[IndexItem]):
        """Adds multiple items to the index."""
        raise NotImplementedError()

    async def build(self):
        """Build the index, after the items are added.

        This is optional, might not be needed for all implementations."""
        pass

    async def search(self, text: str, max_results: int, threshold: Optional[float]) -> List[IndexItem]:
        """Searches the index for the closest matches to the provided text."""
        raise NotImplementedError()
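
As a rough illustration of what an implementation of this interface might look like, the sketch below builds a toy in-memory index from hashed bag-of-words vectors and cosine similarity. Several things here are assumptions rather than library behavior: the class name SimpleEmbeddingSearchProvider is hypothetical, the toy embedding stands in for a real model, and threshold is treated as a minimum cosine similarity. To stay runnable without the library installed, the sketch defines a local stand-in for IndexItem and does not inherit from anything; a real provider would subclass EmbeddingsIndex and use the library's IndexItem.

```python
import asyncio
import math
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class IndexItem:  # local stand-in that mirrors the interface shown above
    text: str
    meta: Dict = field(default_factory=dict)


class SimpleEmbeddingSearchProvider:
    """Toy brute-force index: hashed bag-of-words vectors and cosine similarity."""

    def __init__(self, embedding_size: int = 64):
        self._embedding_size = embedding_size
        self._items: List[IndexItem] = []
        self._embeddings: List[List[float]] = []

    @property
    def embedding_size(self) -> int:
        return self._embedding_size

    @property
    def cache_config(self):
        return None  # this sketch does no caching

    async def _get_embeddings(self, texts: List[str]) -> List[List[float]]:
        vectors = []
        for text in texts:
            vector = [0.0] * self._embedding_size
            for word in text.lower().split():
                # crc32 is deterministic across runs, unlike the built-in hash().
                vector[zlib.crc32(word.encode("utf-8")) % self._embedding_size] += 1.0
            norm = math.sqrt(sum(v * v for v in vector)) or 1.0
            vectors.append([v / norm for v in vector])  # unit-length vectors
        return vectors

    async def add_item(self, item: IndexItem):
        await self.add_items([item])

    async def add_items(self, items: List[IndexItem]):
        self._embeddings.extend(await self._get_embeddings([i.text for i in items]))
        self._items.extend(items)

    async def build(self):
        pass  # nothing to precompute for a brute-force index

    async def search(
        self, text: str, max_results: int, threshold: Optional[float] = None
    ) -> List[IndexItem]:
        (query,) = await self._get_embeddings([text])
        # Dot product of unit vectors equals cosine similarity.
        scored = [
            (sum(q * e for q, e in zip(query, emb)), item)
            for emb, item in zip(self._embeddings, self._items)
        ]
        # Assumption: threshold is a minimum similarity; None disables filtering.
        if threshold is not None:
            scored = [(score, item) for score, item in scored if score >= threshold]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [item for _, item in scored[:max_results]]
```

A real provider would replace the hashing step with calls to an embedding model and, for large collections, the linear scan with an approximate nearest-neighbor index.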

To use your custom embedding search provider, register it in your config.py:

from nemoguardrails import LLMRails

def init(app: LLMRails):
    app.register_embedding_search_provider("simple", SimpleEmbeddingSearchProvider)  # your EmbeddingsIndex subclass

For a complete example, check out this test configuration.