# Embedding Search Providers

NeMo Guardrails uses embedding search (also known as vector search, typically backed by a vector database) both for implementing the guardrails process and for the knowledge base functionality.

To speed up embedding search, NeMo Guardrails can cache computed embeddings, avoiding repeated computation for texts it has already seen. The caching mechanism is disabled by default.

The default embedding search uses FastEmbed for computing the embeddings (the `all-MiniLM-L6-v2` model) and Annoy for performing the search. The default configuration is as follows:

```yaml
core:
  embedding_search_provider:
    name: default
    parameters:
      embedding_engine: FastEmbed
      embedding_model: all-MiniLM-L6-v2
      use_batching: False
      max_batch_size: 10
      max_batch_hold: 0.01
      search_threshold: None
    cache:
      enabled: False
      key_generator: md5
      store: filesystem
      store_config: {}

knowledge_base:
  embedding_search_provider:
    name: default
    parameters:
      embedding_engine: FastEmbed
      embedding_model: all-MiniLM-L6-v2
      use_batching: False
      max_batch_size: 10
      max_batch_hold: 0.01
      search_threshold: None
    cache:
      enabled: False
      key_generator: md5
      store: filesystem
      store_config: {}
```

The default embedding search provider can also work with OpenAI embeddings:

```yaml
core:
  embedding_search_provider:
    name: default
    parameters:
      embedding_engine: openai
      embedding_model: text-embedding-ada-002
    cache:
      enabled: False
      key_generator: md5
      store: filesystem
      store_config: {}

knowledge_base:
  embedding_search_provider:
    name: default
    parameters:
      embedding_engine: openai
      embedding_model: text-embedding-ada-002
    cache:
      enabled: False
      key_generator: md5
      store: filesystem
      store_config: {}
```

The default implementation also supports asynchronous execution of the embedding computation, which improves the responsiveness of the search functionality.

The cache configuration is optional. When enabled, it uses the specified `key_generator` and `store` to cache the embeddings, and `store_config` can supply any additional options the store requires. The defaults are the `md5` key generator and the `filesystem` store, with the cache disabled.
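
For example, a configuration that turns the cache on might look like the following sketch. The `cache_dir` key under `store_config` is illustrative; check the documentation of the store you use for the exact options it accepts:

```yaml
core:
  embedding_search_provider:
    name: default
    parameters:
      embedding_engine: FastEmbed
      embedding_model: all-MiniLM-L6-v2
    cache:
      enabled: True
      key_generator: md5
      store: filesystem
      store_config:
        cache_dir: .cache/embeddings  # illustrative option name
```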

## Batch Implementation

The default embedding provider includes a batch processing feature that optimizes embedding generation. When `use_batching` is enabled, incoming requests are collected for up to `max_batch_hold` seconds (0.01, i.e., 10 milliseconds, by default) and then embedded together, in batches of at most `max_batch_size` requests.
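
For example, to enable batching for the core embedding search provider with the default batch limits:

```yaml
core:
  embedding_search_provider:
    name: default
    parameters:
      embedding_engine: FastEmbed
      embedding_model: all-MiniLM-L6-v2
      use_batching: True
      max_batch_size: 10
      max_batch_hold: 0.01
```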

## Custom Embedding Search Providers

You can implement your own custom embedding search provider by subclassing `EmbeddingsIndex`. For quick reference, the complete interface is included below:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IndexItem:
    text: str
    meta: Dict = field(default_factory=dict)


class EmbeddingsIndex:
    """The embeddings index is responsible for computing and searching a set of embeddings."""

    @property
    def embedding_size(self):
        raise NotImplementedError

    @property
    def cache_config(self):
        raise NotImplementedError

    async def _get_embeddings(self, texts: List[str]) -> List[List[float]]:
        raise NotImplementedError

    async def add_item(self, item: IndexItem):
        """Adds a new item to the index."""
        raise NotImplementedError()

    async def add_items(self, items: List[IndexItem]):
        """Adds multiple items to the index."""
        raise NotImplementedError()

    async def build(self):
        """Build the index, after the items are added.

        This is optional, might not be needed for all implementations."""
        pass

    async def search(self, text: str, max_results: int) -> List[IndexItem]:
        """Searches the index for the closest matches to the provided text."""
        raise NotImplementedError()
```
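
To make this concrete, here is a minimal sketch of a custom provider: it keeps items in memory and performs a brute-force cosine-similarity search. The hashed bag-of-words embedding is a placeholder so the example stays dependency-free (a real provider would call an embedding model in `_get_embeddings`), and the `**kwargs` constructor is an assumption about how configuration parameters are forwarded:

```python
import math
from typing import List


class SimpleEmbeddingSearchProvider(EmbeddingsIndex):
    """Toy provider: in-memory store, brute-force cosine-similarity search."""

    def __init__(self, **kwargs):
        # Any parameters from the YAML config would arrive here; the toy ignores them.
        self._items: List[IndexItem] = []
        self._embeddings: List[List[float]] = []

    @property
    def embedding_size(self):
        return 128

    @property
    def cache_config(self):
        # This toy provider does not use the embeddings cache.
        return None

    async def _get_embeddings(self, texts: List[str]) -> List[List[float]]:
        # Hashed bag-of-words: deterministic and dependency-free, but not semantic.
        embeddings = []
        for text in texts:
            vector = [0.0] * self.embedding_size
            for token in text.lower().split():
                vector[hash(token) % self.embedding_size] += 1.0
            embeddings.append(vector)
        return embeddings

    async def add_item(self, item: IndexItem):
        await self.add_items([item])

    async def add_items(self, items: List[IndexItem]):
        self._items.extend(items)
        self._embeddings.extend(await self._get_embeddings([it.text for it in items]))

    async def search(self, text: str, max_results: int) -> List[IndexItem]:
        query = (await self._get_embeddings([text]))[0]

        def cosine(a: List[float], b: List[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
            return dot / norm if norm else 0.0

        # Rank all stored items by similarity to the query and keep the top results.
        ranked = sorted(
            range(len(self._items)),
            key=lambda i: cosine(query, self._embeddings[i]),
            reverse=True,
        )
        return [self._items[i] for i in ranked[:max_results]]
```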

To use your custom embedding search provider, you must register it in your `config.py`:

```python
from nemoguardrails import LLMRails


def init(app: LLMRails):
    app.register_embedding_search_provider("simple", SimpleEmbeddingSearchProvider)
```
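
Once registered, the provider can be selected by the name it was registered under; for example, the following configuration switches the core embedding search to it (the same applies to `knowledge_base`):

```yaml
core:
  embedding_search_provider:
    name: simple
```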

For a complete example, check out this test configuration.