aiq.retriever.nemo_retriever.retriever#

Attributes#

Exceptions#

CollectionUnavailableError

Common base class for all non-exit exceptions.

Classes#

Collection

RetrieverPayload

NemoRetriever

Client for retrieving document chunks from a Nemo Retriever service.

NemoLangchainRetriever

Abstract base class for a Document retrieval system.

Functions#

_wrap_nemo_results(output, content_field)

_wrap_nemo_single_results(output, content_field)

_flatten(→ list[str])

Module Contents#

logger#
class Collection(/, **data: Any)#

Bases: pydantic.BaseModel

id: str#
name: str#
meta: Any#
pipeline: str#
created_at: str#
class RetrieverPayload(/, **data: Any)#

Bases: pydantic.BaseModel

query: str#
top_k: int = None#
exception CollectionUnavailableError#

Bases: aiq.retriever.models.RetrieverError

Common base class for all non-exit exceptions.

class NemoRetriever(
uri: str | pydantic.HttpUrl,
timeout: int = 60,
nvidia_api_key: str = None,
**kwargs,
)#

Bases: aiq.retriever.interface.AIQRetriever

Client for retrieving document chunks from a Nemo Retriever service.

base_url = ''#
timeout = 60#
_search_func#
api_key#
_bound_params = []#
bind(**kwargs) None#

Bind default values to the search method. Cannot bind the ‘query’ parameter.

Args:

kwargs (dict): Key-value pairs giving the default values for search parameters.

get_unbound_params() list[str]#

Returns a list of unbound parameters which will need to be passed to the search function.
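
The interaction between bind and get_unbound_params can be sketched as follows. This is a minimal illustration, not the NemoRetriever implementation: the BindableSearch class, the fake_search function, and its collection_name and top_k parameters are hypothetical stand-ins, and the sketch assumes that call-time arguments override bound defaults.

```python
import inspect


class BindableSearch:
    """Hypothetical stand-in that mimics bind() and get_unbound_params()."""

    def __init__(self, search_func):
        self._search_func = search_func
        self._bound = {}  # parameter name -> bound default value

    def bind(self, **kwargs) -> None:
        # 'query' must be supplied on every call, so it cannot be bound.
        if "query" in kwargs:
            raise ValueError("The 'query' parameter cannot be bound.")
        self._bound.update(kwargs)

    def get_unbound_params(self) -> list[str]:
        # Any parameter of the search function without a bound default
        # still has to be passed by the caller.
        params = inspect.signature(self._search_func).parameters
        return [name for name in params if name not in self._bound]

    def search(self, query: str, **kwargs):
        # Assumption: call-time keyword arguments win over bound defaults.
        merged = {**self._bound, **kwargs}
        return self._search_func(query=query, **merged)


def fake_search(query: str, collection_name: str, top_k: int) -> str:
    """Hypothetical search function used only for this illustration."""
    return f"{query}|{collection_name}|{top_k}"


searcher = BindableSearch(fake_search)
searcher.bind(top_k=3)
unbound = searcher.get_unbound_params()
result = searcher.search("what is RAG?", collection_name="docs")
```

Binding top_k once means later calls only need to supply the remaining parameters, which is what get_unbound_params reports.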

async get_collections(client) list[Collection]#

Get a list of all available collections as pydantic Collection objects.

async get_collection_by_name(collection_name, client) Collection#

Retrieve a collection using its name. Returns the first collection found if the name is ambiguous.
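
The first-match behavior for ambiguous names might look like the sketch below. CollectionStub is a dataclass stand-in that mirrors the documented Collection fields, and first_collection_named is a hypothetical helper. What the real method does when no collection matches is not documented here, so the LookupError is an assumption.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class CollectionStub:
    """Stand-in mirroring the documented Collection fields."""
    id: str
    name: str
    meta: Any
    pipeline: str
    created_at: str


def first_collection_named(collections: list[CollectionStub], name: str) -> CollectionStub:
    # next() stops at the first match, so later duplicates are ignored.
    match = next((c for c in collections if c.name == name), None)
    if match is None:
        # Assumption: behavior for a missing name is not stated in the docs.
        raise LookupError(f"No collection named {name!r}")
    return match


collections = [
    CollectionStub("a1", "docs", None, "hybrid", "2024-01-01"),
    CollectionStub("b2", "docs", None, "dense", "2024-02-01"),
    CollectionStub("c3", "faq", None, "dense", "2024-03-01"),
]
chosen = first_collection_named(collections, "docs")
```

Because two collections share the name "docs", only the first one in the list is returned; callers that need a specific collection would have to disambiguate by id.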

async search(query: str, **kwargs)#

Retrieve at most top_k items from the data store based on vector similarity search (implementation dependent).

Retrieve document chunks from the configured Nemo Retriever Service.
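
To make "at most top_k items by vector similarity" concrete, here is a small self-contained sketch. It assumes cosine similarity over toy two-dimensional vectors; the actual similarity measure and index used by the service are implementation dependent, and the cosine and top_k_chunks helpers are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vec, chunks, top_k: int) -> list[str]:
    # Rank chunks by similarity to the query, highest first, then truncate.
    ranked = sorted(chunks, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


# Toy corpus of (chunk text, embedding) pairs, invented for illustration.
chunks = [
    ("gpu memory", [1.0, 0.0]),
    ("cooking pasta", [0.0, 1.0]),
    ("gpu kernels", [0.9, 0.1]),
]
hits = top_k_chunks([1.0, 0.0], chunks, top_k=2)
all_hits = top_k_chunks([1.0, 0.0], chunks, top_k=10)
```

Asking for more items than the store holds simply returns everything available, which is why top_k is an upper bound rather than an exact count.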

_wrap_nemo_results(output: list[dict], content_field: str)#
_wrap_nemo_single_results(output: dict, content_field: str)#
_flatten(obj: dict, output_fields: list[str]) list[str]#
class NemoLangchainRetriever(/, **data: Any)#

Bases: langchain_core.retrievers.BaseRetriever, pydantic.BaseModel

Abstract base class for a Document retrieval system.

A retrieval system is defined as something that can take string queries and return the most ‘relevant’ Documents from some source.

Usage:

A retriever follows the standard Runnable interface, and should be used via the standard Runnable methods of invoke, ainvoke, batch, abatch.

Implementation:

When implementing a custom retriever, the class should implement the _get_relevant_documents method to define the logic for retrieving documents.

Optionally, an async-native implementation can be provided by overriding the _aget_relevant_documents method.

Example: A retriever that returns the first 5 documents from a list of documents

from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever
from typing import List

class SimpleRetriever(BaseRetriever):
    docs: List[Document]
    k: int = 5

    def _get_relevant_documents(self, query: str) -> List[Document]:
        """Return the first k documents from the list of documents"""
        return self.docs[:self.k]

    async def _aget_relevant_documents(self, query: str) -> List[Document]:
        """(Optional) async native implementation."""
        return self.docs[:self.k]

Example: A simple retriever based on a scikit-learn vectorizer

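from typing import Any, List

from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever
from pydantic import BaseModel
from sklearn.metrics.pairwise import cosine_similarity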

class TFIDFRetriever(BaseRetriever, BaseModel):
    vectorizer: Any
    docs: List[Document]
    tfidf_array: Any
    k: int = 4

    class Config:
        arbitrary_types_allowed = True

    def _get_relevant_documents(self, query: str) -> List[Document]:
        # Transform the query into a TF-IDF vector of shape (1, n_features)
        query_vec = self.vectorizer.transform([query])
        # Cosine similarity of the query with each document, flattened to shape (n_docs,)
        results = cosine_similarity(self.tfidf_array, query_vec).reshape((-1,))
        return [self.docs[i] for i in results.argsort()[-self.k :][::-1]]

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

client: NemoRetriever#
abstractmethod _get_relevant_documents(query, *, run_manager, **kwargs)#

Get documents relevant to a query.

Args:

query: String to find relevant documents for.
run_manager: The callback handler to use.

Returns:

List of relevant documents.

async _aget_relevant_documents(query, *, run_manager, **kwargs)#

Asynchronously get documents relevant to a query.

Args:

query: String to find relevant documents for.
run_manager: The callback handler to use.

Returns:

List of relevant documents.
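
The general pattern of wrapping an async search client behind the two retrieval hooks can be sketched as below. This is an illustration under stated assumptions, not the NemoLangchainRetriever source: DocumentStub, FakeClient, and ClientBackedRetriever are hypothetical stand-ins that avoid importing langchain_core or aiq, and the run_manager argument is omitted for brevity.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class DocumentStub:
    """Stand-in for a LangChain Document (page_content plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class FakeClient:
    """Hypothetical async client exposing a search(query, **kwargs) coroutine."""

    async def search(self, query: str, **kwargs) -> list[DocumentStub]:
        top_k = kwargs.get("top_k", 2)
        corpus = ["alpha chunk", "beta chunk", "gamma chunk"]
        return [DocumentStub(text, {"query": query}) for text in corpus[:top_k]]


class ClientBackedRetriever:
    """Delegates both retrieval hooks to the wrapped client."""

    def __init__(self, client: FakeClient):
        self.client = client

    async def _aget_relevant_documents(self, query: str, **kwargs):
        # Async path: await the client's search coroutine directly.
        return await self.client.search(query=query, **kwargs)

    def _get_relevant_documents(self, query: str, **kwargs):
        # Sync path: drive the coroutine to completion. asyncio.run() raises
        # if an event loop is already running, so this only suits sync callers.
        return asyncio.run(self._aget_relevant_documents(query, **kwargs))


retriever = ClientBackedRetriever(FakeClient())
docs = retriever._get_relevant_documents("what is RAG?", top_k=1)
```

Keeping the async method as the single source of truth avoids duplicating the search logic; the synchronous hook is only a thin bridge for callers that are not running inside an event loop.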