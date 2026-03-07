Use NVIDIA NeMo Retriever Reranking NIM
This section provides examples of reranking, describes best practices, and covers the security issues you need to consider when you work with NVIDIA NeMo Retriever Reranking NIM.
Examples
Shell (cURL)
Ranking
Request
curl -X 'POST' \
  'http://localhost:8000/v1/ranking' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "nvidia/llama-nemotron-rerank-1b-v2",
    "query": {"text": "which way should i go?"},
    "passages": [
      {"text": "two roads diverged in a yellow wood, and sorry i could not travel both and be one traveler, long i stood and looked down one as far as i could to where it bent in the undergrowth;"},
      {"text": "then took the other, as just as fair, and having perhaps the better claim because it was grassy and wanted wear, though as for that the passing there had worn them really about the same,"},
      {"text": "and both that morning equally lay in leaves no step had trodden black. oh, i marked the first for another day! yet knowing how way leads on to way i doubted if i should ever come back."},
      {"text": "i shall be telling this with a sigh somewhere ages and ages hence: two roads diverged in a wood, and i, i took the one less traveled by, and that has made all the difference."}
    ],
    "truncate": "END"
  }'
Response
{
  "rankings": [
    { "index": 0, "logit": -1.2421875 },
    { "index": 3, "logit": -3.029296875 },
    { "index": 2, "logit": -5.41015625 },
    { "index": 1, "logit": -8.2421875 }
  ]
}
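The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the NIM is listening at http://localhost:8000 as in the cURL example. The helper names `build_ranking_request` and `rerank` are hypothetical and are not part of any NIM SDK.

```python
import json
import urllib.request

# Assumption: the NIM is reachable at this URL, as in the cURL example.
RANKING_URL = "http://localhost:8000/v1/ranking"


def build_ranking_request(query, passages,
                          model="nvidia/llama-nemotron-rerank-1b-v2",
                          truncate="END"):
    """Build the same JSON body as the cURL example: model, query, passages, truncate."""
    return {
        "model": model,
        "query": {"text": query},
        "passages": [{"text": p} for p in passages],
        "truncate": truncate,
    }


def rerank(query, passages, url=RANKING_URL, timeout=30.0):
    """POST a ranking request and return the parsed "rankings" list."""
    body = json.dumps(build_ranking_request(query, passages)).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["rankings"]
```

With the NIM running, `rerank("which way should i go?", ["two roads diverged in a yellow wood", "then took the other, as just as fair"])` returns a list of `index`/`logit` objects in the same shape as the response above.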
Best Practices
A request to the NeMo Retriever Reranking API includes a query, a list of passages, and an optional truncate parameter (either NONE or END, defaulting to NONE). The API reranks the passages by their relevance to the query. Although many datastores return scores for passages, the Reranking API does not use those scores. It uses only the text of the query and the candidate passages, and ranks them according to the model’s understanding of the content.
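Because only text is used, any datastore score can be dropped when you turn retriever hits into the passages list. A minimal sketch follows; the hit layout (a dict with `id`, `text`, and `score` fields) and the `hits_to_passages` helper are hypothetical examples, not formats the NIM defines.

```python
from typing import Any, Dict, List


def hits_to_passages(hits: List[Dict[str, Any]], text_key: str = "text") -> List[Dict[str, str]]:
    """Keep only the passage text from retriever hits.

    Datastore scores are discarded because the Reranking API ranks on text alone.
    The list order is preserved so a response "index" still points at the right hit.
    """
    return [{"text": hit[text_key]} for hit in hits]


# Hypothetical hits from a vector store, each carrying its own score.
hits = [
    {"id": "doc-7", "text": "two roads diverged in a yellow wood", "score": 0.91},
    {"id": "doc-2", "text": "then took the other, as just as fair", "score": 0.45},
]
passages = hits_to_passages(hits)
```

Keeping `hits` alongside the request lets you recover metadata later with `hits[r["index"]]["id"]` for each entry `r` in the response.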
If truncate is NONE, the container returns an error for any input whose tokenized representation exceeds the token limit of the underlying model. If truncate is END, all tokens beyond the token limit are ignored (see below).
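The following sketch models the two truncate settings for a single query/passage pair, to help reason about what the model sees. It is an illustration rather than the NIM's implementation: whitespace-split tokens stand in for the model's real tokenizer, and `apply_truncate` is a hypothetical name.

```python
from typing import List, Sequence


def apply_truncate(query_tokens: Sequence[str], passage_tokens: Sequence[str],
                   limit: int, truncate: str = "NONE") -> List[str]:
    """Return the passage tokens the model would see under the documented semantics."""
    if truncate not in ("NONE", "END"):
        raise ValueError(f"truncate must be 'NONE' or 'END', got {truncate!r}")
    total = len(query_tokens) + len(passage_tokens)
    if total <= limit:
        return list(passage_tokens)
    if truncate == "NONE":
        # NONE: the container returns an error instead of truncating.
        raise ValueError(f"input is {total} tokens, limit is {limit}")
    # END: drop every token past the limit, which is the tail of the passage.
    # If the query alone reaches the limit, no passage tokens remain.
    keep = max(0, limit - len(query_tokens))
    return list(passage_tokens[:keep])
```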
Token Limits & Truncation
The NeMo Retriever Reranking API accepts more than 9,000 characters of text for the query and passages; however, this is far higher than current model limits. The token limit depends on the underlying model. For NV-Rerank-QA-Mistral-4B, the total token limit is 503, including the query. So if your query is 200 tokens and a passage is 400 tokens, the rightmost 97 tokens of the passage are truncated.
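Written out, with q the query length, p the passage length, and L the model's total token limit (all counted in tokens), the example works out as follows:

```latex
\begin{aligned}
\text{truncated} &= \max(0,\; q + p - L) = \max(0,\; 200 + 400 - 503) = 97 \\
\text{kept} &= p - \text{truncated} = 400 - 97 = 303 = L - q
\end{aligned}
```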
This means that if your query is 503 tokens and truncate is END, the entire passage is truncated, which renders the reranking service useless.
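A client-side guard can catch this case before the request is sent. The sketch below assumes you already know the query's token count from the model's tokenizer; `passage_token_budget` is a hypothetical helper, and its default limit is the 503-token figure quoted above for NV-Rerank-QA-Mistral-4B (other models have different limits).

```python
def passage_token_budget(query_token_count: int, limit: int = 503) -> int:
    """Return how many passage tokens survive END truncation for this query.

    Raises ValueError when the query alone fills the limit, because every
    passage would then be truncated away and the ranking would be meaningless.
    """
    budget = limit - query_token_count
    if budget <= 0:
        raise ValueError(
            f"query uses {query_token_count} of {limit} tokens; no room left for passages"
        )
    return budget
```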
Max Passages
You can pass up to 512 passages in a single reranking call.
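If you have more than 512 candidates, one approach is to split them across several calls and merge the results. The sketch below assumes that logits for the same query are comparable across calls, which holds when each query/passage pair is scored independently; verify this for your model before relying on it. `batch_passages` and `merge_rankings` are hypothetical helpers.

```python
from typing import Iterator, List, Sequence, Tuple

MAX_PASSAGES_PER_CALL = 512  # documented per-request maximum


def batch_passages(passages: Sequence[str],
                   size: int = MAX_PASSAGES_PER_CALL) -> Iterator[Tuple[int, List[str]]]:
    """Yield (offset, batch) pairs; offset maps a batch-local index back to the full list."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(passages), size):
        yield start, list(passages[start:start + size])


def merge_rankings(batched: Sequence[Tuple[int, List[dict]]]) -> List[dict]:
    """Combine per-batch rankings into one list sorted by descending logit.

    Each batch's "index" is shifted by its offset so it refers to the full passage list.
    """
    merged = [
        {"index": offset + r["index"], "logit": r["logit"]}
        for offset, rankings in batched
        for r in rankings
    ]
    return sorted(merged, key=lambda r: r["logit"], reverse=True)
```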
Understanding Results
The results of a reranking request include a list of objects with index and logit keys, sorted in descending order of logit value. The logit is the raw, unnormalized prediction that the model generates for each query/passage pair.
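Because logits are unnormalized, some applications map them into (0, 1) with the logistic sigmoid, for example to apply a relevance threshold. Whether the result is a calibrated probability depends on how the model was trained, so treat that as an assumption. The sigmoid is monotonic, so it never changes the ranking order. The helper names below are hypothetical; the input values come from the example response above.

```python
import math
from typing import Dict, List


def sigmoid(x: float) -> float:
    """Logistic function mapping any real logit into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # Equivalent form for negative x that avoids overflow in exp(-x).
    z = math.exp(x)
    return z / (1.0 + z)


def add_scores(rankings: List[Dict]) -> List[Dict]:
    """Attach a sigmoid-transformed "score" to each ranking, keeping the original order."""
    return [{**r, "score": sigmoid(r["logit"])} for r in rankings]


# The rankings from the example response above.
rankings = [
    {"index": 0, "logit": -1.2421875},
    {"index": 3, "logit": -3.029296875},
    {"index": 2, "logit": -5.41015625},
    {"index": 1, "logit": -8.2421875},
]
scored = add_scores(rankings)
```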
The index is the position of the corresponding passage in the request’s passages list. For example, if the request included the passages ["bears", "house", "grass"] and the indexes in the response are 1, 2, 0, then the sorted passage order is ["house", "grass", "bears"].
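Mapping the response back to passage text is a direct list lookup. In this sketch the passages match the example above, while the logit values are made up for illustration; `sort_passages` is a hypothetical helper.

```python
from typing import Dict, List, Sequence


def sort_passages(passages: Sequence[str], rankings: List[Dict]) -> List[str]:
    """Return the request's passages in the order given by the response rankings.

    Each ranking's "index" is the passage's position in the request.
    """
    return [passages[r["index"]] for r in rankings]


passages = ["bears", "house", "grass"]
# Hypothetical response, already sorted by descending logit.
rankings = [
    {"index": 1, "logit": 0.5},
    {"index": 2, "logit": -0.3},
    {"index": 0, "logit": -2.0},
]
print(sort_passages(passages, rankings))  # ['house', 'grass', 'bears']
```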
Security & Authentication
As a developer, you are responsible for securing access to any application that uses the NeMo ecosystem, including adding an authentication layer between users and your application and securing communication between the services in your application.
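As one illustration of an authentication layer, the sketch below checks a bearer token in the application that sits in front of the NIM. The header scheme, token storage, and the `is_authorized` helper are assumptions for this example; the NIM itself does not define them.

```python
import hmac
from typing import Mapping, Optional


def is_authorized(headers: Mapping[str, str], expected_token: str) -> bool:
    """Check an "Authorization: Bearer <token>" header in constant time."""
    # HTTP header names are case-insensitive, so match the name without regard to case.
    value: Optional[str] = next(
        (v for k, v in headers.items() if k.lower() == "authorization"), None
    )
    if value is None or not value.startswith("Bearer "):
        return False
    presented = value[len("Bearer "):]
    # compare_digest avoids leaking the token through timing differences.
    return hmac.compare_digest(presented.encode("utf-8"), expected_token.encode("utf-8"))
```

Reject the request (for example with HTTP 401) before forwarding anything to the NIM when this returns False.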
Rate Limiting
NeMo Retriever Reranking NIM does not impose rate limits. If you want to restrict access to your application, you are responsible for implementing a rate-limiting strategy.
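One common strategy is a token bucket, which allows short bursts up to a fixed capacity and refills at a steady rate. The sketch below is an illustration you could run in your own application layer, for example one bucket per API key; it is not a NIM feature, and `TokenBucket` is a hypothetical class. It is not thread-safe as written, so guard it with a lock if several threads share one bucket.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity` requests, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        if rate <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.clock = clock
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens if available; return False if the request should be rejected."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```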
Ports
NeMo Retriever Reranking NIM uses multiple ports, but only the API port (8000) needs to be accessible from outside the cluster. The service ports are set at startup for both NeMo Retriever Embedding NIM and NeMo Retriever Reranking NIM.
Additional Security Reminders
As a developer, you must secure your own API endpoints. We suggest placing them behind a proxy and using HTTPS with TLS 1.2.
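On the client side, a Python application calling the HTTPS proxy in front of the NIM can refuse anything older than TLS 1.2. The sketch below uses the standard `ssl` module; `make_client_tls_context` is a hypothetical helper, and `cafile` is an assumed path to your proxy's CA bundle (None uses the system trust store). The proxy's own TLS configuration is separate and depends on the proxy you choose.

```python
import ssl
from typing import Optional


def make_client_tls_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """Client TLS context that verifies certificates and requires TLS 1.2 or newer."""
    # create_default_context enables certificate and hostname verification.
    ctx = ssl.create_default_context(cafile=cafile)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

Pass the context to `urllib.request.urlopen(req, context=ctx)` when sending ranking requests through the proxy.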
Incident Response
Secrets
If you deploy Text Retriever NIM components using Helm charts, please follow the instructions in the Creating Secrets section.