Configuration
This section describes how to configure the MSA Search NIM container using environment variables, GPU selection, and volume mounting.
GPU Selection
By default, Docker uses all available GPUs on the system when started with the NVIDIA Container Runtime:
docker run --runtime=nvidia ...
To expose only specific GPUs to the container, use the --gpus flag:
docker run --gpus='"device=1"' ...
The device IDs are listed in the output of nvidia-smi -L. For more information on GPU selection, refer to the NVIDIA Container Toolkit documentation.
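As a minimal sketch of exposing more than one GPU, the helper below joins device IDs into the quoted `device=...` value that the `--gpus` flag expects. The function name `gpus_value` is hypothetical, and `<image>` in the usage comment is a placeholder rather than a value taken from this page.

```shell
# Hypothetical helper: join GPU device IDs into the value for docker's --gpus flag.
# The inner double quotes are part of the value, so Docker does not split
# "device=0,1" at the comma.
gpus_value() {
  local IFS=,
  printf '"device=%s"' "$*"
}

# Usage (look up the IDs with `nvidia-smi -L`; <image> is a placeholder):
#   docker run --gpus "$(gpus_value 0 1)" <image>
gpus_value 0 1   # prints "device=0,1" including the double quotes
echo
```

For a single device this produces the same value as the `--gpus='"device=1"'` form shown above.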
Environment Variables
The following environment variables can be passed to the NIM using the -e flag with docker run:
| ENV | Required? | Default | Notes |
|---|---|---|---|
| | Yes | None | You must set this variable to the value of your personal NGC API key. |
| | No | | Location (in container) where the container caches model artifacts. |
| | No | | Port inside the container where the NIM service is published. When using the |
| | No | | Controls the GPU Server for MMseqs2. The GPU Server is enabled by default for optimal performance with the included ColabFold databases. Set to |
| | No | | Sets the maximum number of MSA sequences returned, and is reflected as both the default and the maximum allowed value for the |
| | No | | Maximum number of databases that can be specified in a single API request. |
| | No | | Maximum value for the |
| | No | | Maximum number of chains (sequences) allowed in a paired MSA search request. |
| | No | | Maximum length of an input protein sequence. |
| | No | | Maximum value for the |
| | No | | Regex pattern used to validate input protein sequences. Modify to accept non-standard amino acid characters. |
| | No | | Comma-separated list of databases to search for structural templates by default. |
| | No | | Database used as the profile (first) database in the cascaded ColabFold search pipeline. Must be one of the available databases. |
| | No | | Comma-separated list of databases that use the full cascaded expansion pipeline. Databases not in this list use a simpler non-expansion follow-up search. |
| | No | | Comma-separated list of |
| | No | (empty) | Comma-separated list of |
| | No | | Specifies the level of logging detail. Available options: |
| | No | None | Path to custom database directory inside the container. Use with volume mounts to load external or custom MMseqs2 databases. Refer to Task-Specific Profiles and Custom Databases for details. |
| | No | None | Select a specific database profile by ID. Profiles allow reduced storage by loading only the databases needed for a particular task. Refer to Task-Specific Profiles and Custom Databases. |
| | No | | Glob pattern for discovering sequence databases. The NIM scans for files matching this pattern under |
| | No | | Glob pattern for discovering the PDB mmCIF structure archive. Used for retrieving 3D structures during structural template search. |
| | No | | Maximum time (seconds) to wait for a GPU server to become ready at startup. |
| | No | | Interval (seconds) between GPU server health-check status reports during startup. |
| | No | | Controls telemetry collection. Set to |
| | No | | Enables logging for telemetry operations when set to |
Volumes
The following container paths can be mounted to local directories for persistent storage.
| Container Path | Required | Notes | Docker Argument Example |
|---|---|---|---|
| | Recommended | Directory where model artifacts and databases are cached. If not mounted, the container re-downloads them on each startup. Ensure the local directory has read-write-execute permissions ( | |
| Custom path (set using | Optional | Mount custom databases by setting | |
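A minimal sketch of preparing a host directory for the cache mount follows. The helper name `prepare_cache_dir` is hypothetical, and `<container-cache-path>` and `<image>` are placeholders, not values from this page. Granting permissions to all users is also an assumption, made for the case where the container runs as a different user than the one who owns the host directory.

```shell
# Hypothetical helper: create a host cache directory and make it usable by the container.
prepare_cache_dir() {
  dir="$1"
  mkdir -p "$dir" || return 1
  # Assumption: the container user may differ from the host user, so grant
  # read, write, and execute to everyone. Tighten this if you know the UID.
  chmod a+rwx "$dir" || return 1
  printf '%s\n' "$dir"
}

# Usage (container path and image are placeholders):
#   cache="$(prepare_cache_dir "$HOME/.cache/nim")"
#   docker run -v "$cache:<container-cache-path>" <image>
```

Keeping the directory outside the container is what lets a restarted container reuse the cached artifacts instead of downloading them again.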
Note
The MODEL_PATH environment variable is also supported for backward compatibility and behaves the same as NIM_MODEL_NAME.
MMseqs2 Pipeline Tuning Environment Variables
The MSA Search NIM exposes several environment variables that control internal MMseqs2 pipeline parameters. These are set at container startup and apply globally to all requests. The cascaded search pipeline (used by both colabfold and template search) follows these steps per database:
1. Search: Iterative profile search to find initial hits
2. Expand: Recover remote homologs similar to initial hits
3. Align: Sensitive realignment of expanded candidates
4. Filter: Quality filtering (for example, query coverage and sequence identity)
5. result2msa: Convert filtered results to A3M alignment
The variables below tune the align and filter steps, which directly affect per-database output depth and, consequently, the size of the merged colabfold alignment.
Align-Step Parameters
These control how many targets and alternative alignments are kept during the post-expansion realignment. Together, max_accept and alt_ali bound the untrimmed per-database sequence count U: U ≤ max_accept × (1 + alt_ali), because each accepted target contributes its primary alignment plus up to alt_ali alternative alignments.
| Variable | Default | Description |
|---|---|---|
| | | Maximum number of accepted targets per query in the profile (first-database) align step. Corresponds to |
| | | Same as above, but for follow-up (subsequent database) align steps. |
| | | Maximum number of alternative non-overlapping alignments per accepted target in the profile align step. Corresponds to |
| | | Same as above, but for follow-up align steps. |
| | | E-value threshold for the profile align step. Alignments above this e-value are discarded. |
| | | E-value threshold for the follow-up search step. |
| | | E-value threshold for the follow-up align step. |
With defaults, each database can produce up to 100 × (1 + 10) = 1,100 sequences before trimming. Per-database API results are trimmed to max_msa_sequences (N, default 500), but the merged colabfold entry concatenates untrimmed results from all D databases, yielding up to D × 1,100 sequences.
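The arithmetic above can be sketched with shell integer math, which is handy when sizing storage or memory for non-default settings. The function names `per_db_bound` and `merged_bound` are hypothetical, and the database count of 3 is an illustrative value, not a property of any particular deployment.

```shell
# Upper bound on untrimmed sequences from one database:
#   max_accept * (1 + alt_ali)
per_db_bound() {
  echo $(( $1 * (1 + $2) ))
}

# Upper bound for the merged entry across D databases:
#   D * max_accept * (1 + alt_ali)
merged_bound() {
  echo $(( $3 * $(per_db_bound "$1" "$2") ))
}

per_db_bound 100 10      # prints 1100
merged_bound 100 10 3    # prints 3300 (D = 3 is illustrative)
```

These are upper bounds only; the filter step usually removes sequences before the results are merged.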
Filter-Step Parameters
These control quality filtering applied after the align step. Separate variables are provided for the profile (first-database) and follow-up (subsequent database) filter steps.
| Variable | Default | Description |
|---|---|---|
| | | Minimum sequence identity with query (0–1) in the profile filter step. Set to 0 to disable. |
| | | Minimum score per aligned residue with query sequence ( |
| | | Diversity filter: keep at least this many sequences in each MSA block of length 50 ( |
| | | Maximum pairwise sequence identity (0–1) for redundancy reduction in the profile filter step. Set to 1.0 to disable. |
| | | Only apply filtering when the MSA has more than this many sequences in the profile filter step. Set to 0 to always filter. |
| | | Minimum sequence identity with query (0–1) in the follow-up filter step. Set to 0 to disable. |
| | | Minimum score per aligned residue with query sequence (range −50 to 100) in the follow-up filter step. |
| | | Diversity filter: keep at least this many sequences in each MSA block of length 50 in the follow-up filter step. Set to 0 to disable diversity filtering. |
| | | Maximum pairwise sequence identity (0–1) for redundancy reduction in the follow-up filter step. Set to 1.0 to disable. |
| | | Only apply filtering when the MSA has more than this many sequences in the follow-up filter step. Set to 0 to always filter. |
Expand-Step Parameters
| Variable | Default | Description |
|---|---|---|
| | | E-value threshold for the follow-up expansion step. |
Search Iteration Parameters
| Variable | Default | Description |
|---|---|---|
| | | Number of profile-search iterations in the profile (first-database) search. |
| | | Number of profile-search iterations in follow-up searches. |
| | | Number of profile-search iterations in paired (multimer) search. |
| | | Number of profile-search iterations for follow-up databases that are not in |
Other Pipeline Parameters
| Variable | Default | Description |
|---|---|---|
| | | Default number of threads for MMseqs2 commands. |
| | | Sensitivity parameter for template search. Higher values are more sensitive but slower. |
| | | E-value threshold for template search hits. |
| | | Output format for |
| | | Maximum accepted targets in paired search alignment. Set very high by default to maximize pairing candidates. |
| | | E-value threshold for paired search alignment. |
| | | Maximum sequence identity threshold during paired search expansion. |
Note
These environment variables are global: they apply to all requests served by the NIM instance. To change them, restart the container with the desired values. A future release may expose some of these as per-request API parameters.