Embedding Search Providers
NeMo Guardrails uses embedding search (the capability typically provided by vector databases) both to implement the guardrails process and to power the knowledge base functionality.
To enhance the efficiency of the embedding search process, NeMo Guardrails can employ a caching mechanism for embeddings. This mechanism stores computed embeddings, thereby reducing the need for repeated computations and accelerating the search process. By default, the caching mechanism is disabled.
The default embedding search uses FastEmbed for computing embeddings (the all-MiniLM-L6-v2 model) and an exact NumPy index for similarity search. The default configuration is as follows:
The default embedding search provider can also work with OpenAI embeddings:
The default implementation also supports computing embeddings asynchronously, so embedding work can overlap with other tasks instead of blocking them, which improves the efficiency of the search.
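To make exact search and asynchronous embedding more concrete, here is a minimal sketch of brute-force cosine-similarity search with NumPy, with embeddings computed in a worker thread via `asyncio.to_thread`. This is an illustration under stated assumptions, not NeMo Guardrails' code. `toy_embed` is a hypothetical letter-count stand-in for FastEmbed's all-MiniLM-L6-v2 model, and `ExactNumpyIndex` is a hypothetical class.

```python
import asyncio

import numpy as np

DIM = 26  # one dimension per ASCII letter in the toy embedding


def toy_embed(texts):
    # Hypothetical stand-in for a real embedding model: counts letters a-z
    # so the example runs without downloading a model.
    vecs = np.zeros((len(texts), DIM), dtype=np.float32)
    for i, text in enumerate(texts):
        for ch in text.lower():
            if "a" <= ch <= "z":
                vecs[i, ord(ch) - ord("a")] += 1.0
    return vecs


async def embed_async(texts):
    # Run the blocking embedding call in a worker thread so the event loop
    # stays free for other coroutines while embeddings are computed.
    return await asyncio.to_thread(toy_embed, texts)


def normalize(matrix):
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for texts with no letters
    return matrix / norms


class ExactNumpyIndex:
    """Hypothetical exact index: every query is compared with every stored item."""

    def __init__(self):
        self.texts = []
        self.matrix = np.zeros((0, DIM), dtype=np.float32)

    async def add_items(self, texts):
        vecs = normalize(await embed_async(texts))
        self.texts.extend(texts)
        self.matrix = np.vstack([self.matrix, vecs])

    async def search(self, text, max_results):
        query = normalize(await embed_async([text]))[0]
        scores = self.matrix @ query  # cosine similarity, since rows are unit length
        top = np.argsort(-scores)[:max_results]
        return [self.texts[i] for i in top]


async def main():
    index = ExactNumpyIndex()
    await index.add_items(["hello there", "goodbye now", "what is the weather"])
    return await index.search("hello", max_results=1)


print(asyncio.run(main()))  # ['hello there']
```

Because every query is compared with every stored vector, the results are exact; the trade-off is that search cost grows linearly with the number of indexed items.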
The cache configuration is optional. When the cache is enabled, embeddings are cached using the specified key_generator and store; store_config supplies any additional configuration options the store requires.
The default cache configuration uses the sha256 key generator and the filesystem store. The cache is disabled by default.
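The following sketch shows how a sha256 key generator and a filesystem store can work together so that each text is embedded only once. `FilesystemStore`, `CachedEmbedder`, and `fake_embed` are hypothetical, and the temporary directory stands in for whatever location store_config would supply; the library's actual cache classes may differ.

```python
import hashlib
import json
import os
import tempfile


def sha256_key(text):
    # Key generator: hash the text so arbitrary strings map to safe file names.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class FilesystemStore:
    """Hypothetical store that keeps one JSON file per cached embedding."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, key):
        return os.path.join(self.cache_dir, key + ".json")

    def get(self, key):
        path = self._path(key)
        if not os.path.exists(path):
            return None  # cache miss
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)

    def set(self, key, value):
        with open(self._path(key), "w", encoding="utf-8") as f:
            json.dump(value, f)


class CachedEmbedder:
    """Wraps an embedding function and only computes embeddings on cache misses."""

    def __init__(self, embed_fn, store, key_generator=sha256_key):
        self.embed_fn = embed_fn
        self.store = store
        self.key_generator = key_generator

    def embed(self, texts):
        results = []
        for text in texts:
            key = self.key_generator(text)
            vector = self.store.get(key)
            if vector is None:
                vector = self.embed_fn([text])[0]
                self.store.set(key, vector)
            results.append(vector)
        return results


calls = []


def fake_embed(texts):
    # Hypothetical embedding function that records each call it receives.
    calls.append(list(texts))
    return [[float(len(t)), float(t.count(" "))] for t in texts]


embedder = CachedEmbedder(fake_embed, FilesystemStore(tempfile.mkdtemp()))
embedder.embed(["hi there"])
embedder.embed(["hi there"])  # served from the cache, no new computation
print(len(calls))  # 1
```

Hashing the text keeps the file names fixed-length and filesystem-safe, and a persistent store lets the cache survive process restarts.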
Batch Implementation
The default embedding provider includes a batch processing feature that optimizes embedding generation. Instead of computing each embedding as soon as it is requested, the provider waits for a predefined latency of 10 milliseconds, so that requests arriving within that window can be processed together before embedding generation starts.
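A minimal asyncio sketch of such a hold window is shown below, assuming a batch embedding function that accepts a list of texts. `BatchingEmbedder` and `fake_embed_batch` are hypothetical, and the `max_batch_hold` parameter name is used here for illustration; this is not NeMo Guardrails' own implementation. The first request in a batch starts a 10 ms timer, later requests join the pending list, and when the timer fires all pending texts are embedded in a single call.

```python
import asyncio


class BatchingEmbedder:
    """Hypothetical sketch: hold requests briefly, then embed them together."""

    def __init__(self, embed_batch, max_batch_hold=0.01):
        self.embed_batch = embed_batch        # computes embeddings for a list of texts
        self.max_batch_hold = max_batch_hold  # seconds to wait for more requests (10 ms)
        self.pending = []                     # (text, future) pairs awaiting a batch
        self.flush_task = None

    async def embed(self, text):
        future = asyncio.get_running_loop().create_future()
        self.pending.append((text, future))
        if self.flush_task is None:
            # First request of a new batch: start the hold timer.
            self.flush_task = asyncio.create_task(self._flush_after_hold())
        return await future

    async def _flush_after_hold(self):
        await asyncio.sleep(self.max_batch_hold)
        batch, self.pending = self.pending, []
        self.flush_task = None
        try:
            vectors = self.embed_batch([text for text, _ in batch])
        except Exception as exc:
            # Propagate the failure to every waiting caller instead of hanging.
            for _, future in batch:
                future.set_exception(exc)
            return
        for (_, future), vector in zip(batch, vectors):
            future.set_result(vector)


batch_sizes = []


def fake_embed_batch(texts):
    # Hypothetical batch embedding function; records the size of each batch.
    batch_sizes.append(len(texts))
    return [[float(len(t))] for t in texts]


async def main():
    embedder = BatchingEmbedder(fake_embed_batch)
    # Three concurrent requests arrive within the same 10 ms window.
    return await asyncio.gather(*(embedder.embed(t) for t in ["a", "bb", "ccc"]))


print(asyncio.run(main()))  # [[1.0], [2.0], [3.0]]
print(batch_sizes)          # [3]
```

The hold adds up to 10 ms of latency to a single request, in exchange for one model call serving many concurrent requests.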
Custom Embedding Search Providers
You can implement your own custom embedding search provider by subclassing EmbeddingsIndex. For quick reference, the complete interface is included below:
To use your custom embedding search provider, register it in your config.py:
For a complete example, check out this test configuration.
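As an illustration of the subclass-and-register workflow, the sketch below implements a hypothetical `KeywordOverlapIndex` that ranks items by word overlap (Jaccard similarity) instead of embeddings. To keep it runnable on its own, `IndexItem` and `EmbeddingsIndex` are minimal stand-ins that mirror only the methods used here (`add_item`, `add_items`, `build`, `search`), and `StubRails` is a hypothetical stand-in for the app object that `init` in config.py receives. The `register_embedding_search_provider(name, cls)` call is assumed to match the real registration method; check it against the test configuration.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Dict, List


# Minimal stand-ins so the sketch runs on its own. In a real project, import the
# library's IndexItem and EmbeddingsIndex instead of defining these.
@dataclass
class IndexItem:
    text: str
    meta: Dict = field(default_factory=dict)


class EmbeddingsIndex:
    async def add_item(self, item: IndexItem):
        raise NotImplementedError

    async def add_items(self, items: List[IndexItem]):
        raise NotImplementedError

    async def build(self):
        pass  # optional step after items are added

    async def search(self, text: str, max_results: int) -> List[IndexItem]:
        raise NotImplementedError


class KeywordOverlapIndex(EmbeddingsIndex):
    """Hypothetical provider that ranks items by shared lowercase words."""

    def __init__(self, **kwargs):
        # Assumption: ignore any configuration parameters passed at construction.
        self.items: List[IndexItem] = []

    async def add_item(self, item: IndexItem):
        self.items.append(item)

    async def add_items(self, items: List[IndexItem]):
        self.items.extend(items)

    async def search(self, text: str, max_results: int) -> List[IndexItem]:
        query = set(text.lower().split())

        def score(item: IndexItem) -> float:
            words = set(item.text.lower().split())
            union = query | words
            return len(query & words) / len(union) if union else 0.0

        return sorted(self.items, key=score, reverse=True)[:max_results]


class StubRails:
    """Hypothetical stand-in for the app object passed to init() in config.py."""

    def __init__(self):
        self.providers = {}

    def register_embedding_search_provider(self, name, cls):
        self.providers[name] = cls


def init(app):
    # config.py hook: register the provider class under a name.
    app.register_embedding_search_provider("keyword_overlap", KeywordOverlapIndex)


async def demo():
    index = KeywordOverlapIndex()
    await index.add_items([IndexItem("how do I reset my password"),
                           IndexItem("store opening hours")])
    await index.build()
    results = await index.search("reset password", max_results=1)
    return [item.text for item in results]


app = StubRails()
init(app)
print(asyncio.run(demo()))  # ['how do I reset my password']
```

Once registered, the provider's name is what you would then select in your configuration in place of the default provider.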