Configuration

This section describes how to configure the MSA Search NIM container using environment variables, GPU selection, and volume mounting.

GPU Selection

By default, Docker uses all available GPUs on the system when started with the NVIDIA Container Runtime:

docker run --runtime=nvidia ...

To expose only specific GPUs to the container, use the --gpus flag:

docker run --gpus='"device=1"' ...

The device IDs are listed in the output of nvidia-smi -L. For more information on GPU selection, refer to the NVIDIA Container Toolkit documentation.
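
When the container is launched from a script, the quoting of the --gpus value is easy to get wrong. The following is a minimal sketch, not part of the NIM itself, that builds the argument in Python; the helper name gpus_argument is hypothetical. It relies on Docker parsing the --gpus value as a comma-separated list, which is why the device list is wrapped in literal double quotes.

```python
import shlex


def gpus_argument(device_ids):
    """Return the value for docker's --gpus flag (hypothetical helper)."""
    if not device_ids:
        raise ValueError("at least one device ID is required")
    # The literal double quotes keep a multi-device list such as 0,1
    # together when Docker splits the value on commas.
    return '"device=' + ",".join(str(d) for d in device_ids) + '"'


# Build an argv list; shlex.join shows the equivalent shell command line.
command = ["docker", "run", "--gpus", gpus_argument([1])]
print(shlex.join(command))  # docker run --gpus '"device=1"'
```

Passing the list to subprocess.run without shell=True avoids a second layer of shell quoting.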

Environment Variables

The following environment variables can be passed to the NIM using the -e flag with docker run:

| Variable | Required? | Default | Notes |
| --- | --- | --- | --- |
| NGC_API_KEY | Yes | None | You must set this variable to the value of your personal NGC API key. |
| NIM_CACHE_PATH | No | /opt/nim/.cache | Location (in container) where the container caches model artifacts. |
| NIM_HTTP_API_PORT | No | 8000 | Port inside the container where the NIM service is published. When using the -p flag with docker run, ensure the right-hand side of the colon matches this value (for example, -p 8000:8000). |
| NIM_DISABLE_GPU_SERVER | No | False | Controls the GPU Server for MMseqs2. The GPU Server is enabled by default for optimal performance with the included ColabFold databases. Set to True to disable it when using custom databases that do not support GPU Server. |
| NIM_GLOBAL_MAX_MSA_DEPTH | No | 500 | Sets the maximum number of MSA sequences returned. This value is both the default and the maximum allowed value for the max_msa_sequences API parameter. Must be set globally for all requests when GPU Server is enabled (default in version 2.0.0). |
| NIM_MSA_API_MAX_DATABASES | No | 5 | Maximum number of databases that can be specified in a single API request. |
| NIM_MSA_API_MAX_ITERATIONS | No | 6 | Maximum value for the iterations API parameter. |
| NIM_MSA_API_MAX_CHAINS | No | 20 | Maximum number of chains (sequences) allowed in a paired MSA search request. |
| NIM_MSA_API_MAX_SEQUENCE_LENGTH | No | 4096 | Maximum length of an input protein sequence. |
| NIM_MSA_API_MAX_STRUCTURES | No | 1000 | Maximum value for the max_structures API parameter in template search requests. |
| NIM_MSA_API_SEQ_PATTERN | No | ^[ARNDCQEGHILKMFPSTWYVX]+$ | Regex pattern used to validate input protein sequences. Modify to accept non-standard amino acid characters. |
| NIM_MSA_API_DEFAULT_STRUCTURAL_TEMPLATE_DBS | No | pdb70_220313 | Comma-separated list of databases to search for structural templates by default. |
| NIM_MSA_COLABFOLD_DEFAULT_PROFILE_DB | No | uniref30_2302 | Database used as the profile (first) database in the cascaded ColabFold search pipeline. Must be one of the available databases. |
| NIM_MSA_EXPANDABLE_DBS | No | uniref30_2302,colabfold_envdb_202108 | Comma-separated list of databases that use the full cascaded expansion pipeline. Databases not in this list use a simpler non-expansion follow-up search. |
| NIM_MSA_DB_NAME_MAPPINGS | No | uniref30_2302:Uniref30_2302,pdb70_220313:PDB70_220313 | Comma-separated list of lowercase:DisplayName mappings for database names in API responses when using ["all"]. Used for backwards compatibility; will be removed in MSA 3.0, when all database names will resolve to lowercase. |
| NIM_MSA_DB_DISPLAY_NAMES | No | (empty) | Comma-separated list of db_name:Display Name overrides for human-readable database names returned by the /biology/colabfold/msa-search/config/msa-database-configs endpoint. Overrides the defaults in /opt/nim/msa/config.py. Example: my_custom_db:My Custom Database. |
| NIM_LOG_LEVEL | No | INFO | Specifies the level of logging detail. Available options: DEBUG, INFO, WARNING, ERROR. |
| NIM_MODEL_NAME | No | None | Path to the custom database directory inside the container. Use with volume mounts to load external or custom MMseqs2 databases. Refer to Task-Specific Profiles and Custom Databases for details. |
| NIM_MODEL_PROFILE | No | None | Select a specific database profile by ID. Profiles allow reduced storage by loading only the databases needed for a particular task. Refer to Task-Specific Profiles and Custom Databases. |
| NIM_MSA_DB_INDEX_PATTERN | No | **/*.idx | Glob pattern for discovering sequence databases. The NIM scans for files matching this pattern under NIM_MODEL_NAME at startup. Database names are derived from the parent directory. |
| NIM_MSA_PDB_CIFS_ZIP | No | **/pdb*.zip | Glob pattern for discovering the PDB mmCIF structure archive. Used for retrieving 3D structures during structural template search. |
| NIM_GPUSERVER_STARTUP_TIMEOUT_S | No | 300 | Maximum time (seconds) to wait for a GPU server to become ready at startup. |
| NIM_GPUSERVER_STATUS_INTERVAL_S | No | 3 | Interval (seconds) between GPU server health-check status reports during startup. |
| NIM_TELEMETRY_MODE | No | 0 | Controls telemetry collection. Set to 0 to disable telemetry (default), set to 1 to enable telemetry. Telemetry helps NVIDIA improve performance, compatibility, and reliability while maintaining strict privacy protections. For more information, refer to NVIDIA’s Privacy Policy and NIM Telemetry Settings. |
| NIM_TELEMETRY_ENABLE_LOGGING | No | true | Enables logging for telemetry operations when set to true. Only applicable when NIM_TELEMETRY_MODE=1. |
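
Several of the variables above define request limits, so a client can check a request before sending it. The following is a minimal client-side sketch under stated assumptions: the helper names load_limits and find_problems are hypothetical, the defaults are copied from the table, and the use of re.fullmatch is an assumption about how the pattern is applied, not a description of the NIM's own validation code.

```python
import os
import re

# Defaults copied from the environment variable table.
DEFAULTS = {
    "NIM_MSA_API_SEQ_PATTERN": r"^[ARNDCQEGHILKMFPSTWYVX]+$",
    "NIM_MSA_API_MAX_SEQUENCE_LENGTH": "4096",
    "NIM_MSA_API_MAX_CHAINS": "20",
    "NIM_MSA_API_MAX_DATABASES": "5",
}


def load_limits(env=None):
    """Read the limits from a mapping (os.environ by default)."""
    env = os.environ if env is None else env

    def value(name):
        return env.get(name, DEFAULTS[name])

    return {
        "pattern": re.compile(value("NIM_MSA_API_SEQ_PATTERN")),
        "max_length": int(value("NIM_MSA_API_MAX_SEQUENCE_LENGTH")),
        "max_chains": int(value("NIM_MSA_API_MAX_CHAINS")),
        "max_databases": int(value("NIM_MSA_API_MAX_DATABASES")),
    }


def find_problems(sequences, databases, limits, paired=False):
    """Return a list of human-readable problems; empty means no issue found."""
    problems = []
    # The chain limit is documented for paired MSA search requests only.
    if paired and len(sequences) > limits["max_chains"]:
        problems.append(f"too many chains: {len(sequences)}")
    if len(databases) > limits["max_databases"]:
        problems.append(f"too many databases: {len(databases)}")
    for index, sequence in enumerate(sequences):
        if len(sequence) > limits["max_length"]:
            problems.append(f"sequence {index} is too long: {len(sequence)}")
        if not limits["pattern"].fullmatch(sequence):
            problems.append(f"sequence {index} does not match the allowed pattern")
    return problems


limits = load_limits({})  # empty mapping: use the documented defaults
print(find_problems(["MKTAYIAKQR"], ["uniref30_2302"], limits))  # []
```

If the container is started with different values, pass the same values to load_limits so the client-side check stays in step with the server.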

Volumes

The following container paths can be mounted to local directories for persistent storage.

| Container Path | Required | Notes | Docker Argument Example |
| --- | --- | --- | --- |
| /opt/nim/.cache (or NIM_CACHE_PATH if set) | Recommended | Directory where model artifacts and databases are cached. If not mounted, the container re-downloads them on each startup. Ensure the local directory has read-write-execute permissions (chmod 0777). Example: mkdir -p ~/.cache/nim && chmod 0777 ~/.cache/nim | -v ~/.cache/nim:/opt/nim/.cache |
| Custom path (set using NIM_MODEL_NAME env variable) | Optional | Mount custom databases by setting NIM_MODEL_NAME to point to your database directory inside the container, then mount the host directory to that path. Useful for external or custom MMseqs2 databases. Refer to Task-Specific Profiles and Custom Databases for detailed examples. | -e NIM_MODEL_NAME=/databases -v /host/path/to/db:/databases/mydb:ro |

Note

The MODEL_PATH environment variable is also supported for backward compatibility with the same behavior as NIM_MODEL_NAME.
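
The cache mount, the port mapping, and the API key can be combined into a single launch command. The following is a minimal sketch that only assembles the argument list and does not start a container. The function names and the image argument are hypothetical placeholders, because this section does not name the container image. Passing -e NGC_API_KEY without a value makes Docker forward the variable from the host environment.

```python
import os
import shlex

CONTAINER_CACHE = "/opt/nim/.cache"  # default NIM_CACHE_PATH


def prepare_cache_dir(host_dir):
    """Create the host cache directory and set its mode to 0777."""
    os.makedirs(host_dir, exist_ok=True)
    os.chmod(host_dir, 0o777)
    return host_dir


def build_run_command(image, host_cache, host_port=8000, container_port=8000):
    """Assemble a docker run argv list; nothing is executed here."""
    command = [
        "docker", "run", "--runtime=nvidia",
        "-e", "NGC_API_KEY",  # no value: forwarded from the host environment
        "-p", f"{host_port}:{container_port}",
        "-v", f"{host_cache}:{CONTAINER_CACHE}",
    ]
    if container_port != 8000:
        # Keep NIM_HTTP_API_PORT equal to the right-hand side of -p.
        command += ["-e", f"NIM_HTTP_API_PORT={container_port}"]
    return command + [image]


def as_shell(command):
    """Render the argv list as a single shell command line for review."""
    return shlex.join(command)
```

A typical call would be build_run_command(image, prepare_cache_dir(os.path.expanduser("~/.cache/nim"))), with image set to the image reference you actually pull.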

MMseqs2 Pipeline Tuning Environment Variables

The MSA Search NIM exposes several environment variables that control internal MMseqs2 pipeline parameters. These are set at container startup and apply globally to all requests. The cascaded search pipeline (used by both colabfold and template search) follows these steps per database:

  1. Search: Iterative profile search to find initial hits

  2. Expand: Recover remote homologs similar to initial hits

  3. Align: Sensitive realignment of expanded candidates

  4. Filter: Quality filtering (for example, query coverage and sequence identity)

  5. result2msa: Convert filtered results to A3M alignment

The variables below tune the align and filter steps, which directly affect per-database output depth and, consequently, the size of the merged colabfold alignment.

Align-Step Parameters

These control how many targets and alternative alignments are kept during the post-expansion realignment. Together, max_accept and alt_ali determine the upper bound U on the untrimmed per-database sequence count: U = max_accept × (1 + alt_ali).

| Variable | Default | Description |
| --- | --- | --- |
| NIM_MMSEQS_PROFILE_ALIGN_MAX_ACCEPT | 100 | Maximum number of accepted targets per query in the profile (first-database) align step. Corresponds to mmseqs align --max-accept. |
| NIM_MMSEQS_FOLLOWUP_ALIGN_MAX_ACCEPT | 100 | Same as above, but for follow-up (subsequent database) align steps. |
| NIM_MMSEQS_PROFILE_ALIGN_ALT_ALI | 10 | Maximum number of alternative non-overlapping alignments per accepted target in the profile align step. Corresponds to mmseqs align --alt-ali. Each accepted target can produce up to 1 + alt_ali alignment rows. |
| NIM_MMSEQS_FOLLOWUP_ALIGN_ALT_ALI | 10 | Same as above, but for follow-up align steps. |
| NIM_MMSEQS_PROFILE_ALIGN_EVALUE | 1e-5 | E-value threshold for the profile align step. Alignments above this e-value are discarded. |
| NIM_MMSEQS_FOLLOWUP_SEARCH_EVALUE | 1e-5 | E-value threshold for the follow-up search step. |
| NIM_MMSEQS_FOLLOWUP_ALIGN_EVALUE | 1e-5 | E-value threshold for the follow-up align step. |

With defaults, each database can produce up to 100 × (1 + 10) = 1,100 sequences before trimming. Per-database API results are trimmed to max_msa_sequences (N, default 500), but the merged colabfold entry concatenates untrimmed results from all D databases, yielding up to D × 1,100 sequences.
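
The arithmetic above can be reproduced for other settings with a short calculation. This is a sketch under the assumption that the first database uses the profile values and every later database uses the follow-up values; the function names are hypothetical.

```python
def per_database_bound(max_accept=100, alt_ali=10):
    """Upper bound U on untrimmed rows from one database."""
    return max_accept * (1 + alt_ali)


def merged_bound(num_databases, profile=(100, 10), followup=(100, 10)):
    """Upper bound on the merged entry: one profile database plus follow-ups.

    Each tuple is (max_accept, alt_ali) for that stage.
    """
    if num_databases < 1:
        return 0
    return (per_database_bound(*profile)
            + (num_databases - 1) * per_database_bound(*followup))


def api_bound(max_msa_sequences=500, max_accept=100, alt_ali=10):
    """Per-database API results are trimmed to max_msa_sequences."""
    return min(max_msa_sequences, per_database_bound(max_accept, alt_ali))


print(per_database_bound())  # 1100
print(merged_bound(3))       # 3300
print(api_bound())           # 500
```

Lowering alt_ali has the largest effect on the merged size: with max_accept left at 100 and alt_ali set to 0, the per-database bound drops from 1,100 to 100.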

Filter-Step Parameters

These control quality filtering applied after the align step. Separate variables are provided for the profile (first-database) and follow-up (subsequent database) filter steps.

| Variable | Default | Description |
| --- | --- | --- |
| NIM_MMSEQS_PROFILE_FILTER_QID | 0 | Minimum sequence identity with query (0–1) in the profile filter step. Set to 0 to disable. |
| NIM_MMSEQS_PROFILE_FILTER_QSC | 0.0 | Minimum score per aligned residue with query sequence (mmseqs filterresult --qsc, range −50 to 100) in the profile filter step. |
| NIM_MMSEQS_PROFILE_FILTER_DIFF | 0 | Diversity filter: keep at least this many sequences in each MSA block of length 50 (--diff) in the profile filter step. Set to 0 to disable diversity filtering. |
| NIM_MMSEQS_PROFILE_FILTER_MAX_SEQ_ID | 1.0 | Maximum pairwise sequence identity (0–1) for redundancy reduction in the profile filter step. Set to 1.0 to disable. |
| NIM_MMSEQS_PROFILE_FILTER_MIN_ENABLE | 100 | Only apply filtering when the MSA has more than this many sequences in the profile filter step. Set to 0 to always filter. |
| NIM_MMSEQS_FOLLOWUP_FILTER_QID | 0 | Minimum sequence identity with query (0–1) in the follow-up filter step. Set to 0 to disable. |
| NIM_MMSEQS_FOLLOWUP_FILTER_QSC | 0.0 | Minimum score per aligned residue with query sequence (range −50 to 100) in the follow-up filter step. |
| NIM_MMSEQS_FOLLOWUP_FILTER_DIFF | 0 | Diversity filter: keep at least this many sequences in each MSA block of length 50 in the follow-up filter step. Set to 0 to disable diversity filtering. |
| NIM_MMSEQS_FOLLOWUP_FILTER_MAX_SEQ_ID | 1.0 | Maximum pairwise sequence identity (0–1) for redundancy reduction in the follow-up filter step. Set to 1.0 to disable. |
| NIM_MMSEQS_FOLLOWUP_FILTER_MIN_ENABLE | 100 | Only apply filtering when the MSA has more than this many sequences in the follow-up filter step. Set to 0 to always filter. |
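
Because these values are only read at container startup, it can help to range-check them before launching. The following is a minimal sketch with hypothetical helper names; the ranges and defaults come from the table above, and the gating function encodes the documented "more than this many sequences" rule rather than the NIM's actual implementation.

```python
import os


def load_filter_settings(stage, env=None):
    """Read and range-check the filter variables for 'PROFILE' or 'FOLLOWUP'."""
    if stage not in ("PROFILE", "FOLLOWUP"):
        raise ValueError(f"unknown stage: {stage}")
    env = os.environ if env is None else env
    prefix = f"NIM_MMSEQS_{stage}_FILTER_"
    settings = {
        "qid": float(env.get(prefix + "QID", "0")),
        "qsc": float(env.get(prefix + "QSC", "0.0")),
        "diff": int(env.get(prefix + "DIFF", "0")),
        "max_seq_id": float(env.get(prefix + "MAX_SEQ_ID", "1.0")),
        "min_enable": int(env.get(prefix + "MIN_ENABLE", "100")),
    }
    # Documented ranges: identities in 0-1, qsc in -50 to 100.
    ranges = [("qid", 0.0, 1.0), ("qsc", -50.0, 100.0), ("max_seq_id", 0.0, 1.0)]
    for name, low, high in ranges:
        if not low <= settings[name] <= high:
            raise ValueError(f"{prefix}{name.upper()} must be in [{low}, {high}]")
    for name in ("diff", "min_enable"):
        if settings[name] < 0:
            raise ValueError(f"{prefix}{name.upper()} must not be negative")
    return settings


def filtering_applies(num_sequences, settings):
    """Filtering runs only when the MSA has more than min_enable sequences."""
    return num_sequences > settings["min_enable"]


defaults = load_filter_settings("PROFILE", {})
print(filtering_applies(100, defaults))  # False
print(filtering_applies(101, defaults))  # True
```

With MIN_ENABLE set to 0, filtering_applies returns True for any non-empty MSA, which matches the "always filter" behavior described in the table.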

Expand-Step Parameters

| Variable | Default | Description |
| --- | --- | --- |
| NIM_MMSEQS_FOLLOWUP_EXPAND_EVALUE | inf | E-value threshold for the follow-up expansion step. inf means all expanded hits are kept. |

Search Iteration Parameters

| Variable | Default | Description |
| --- | --- | --- |
| NIM_MMSEQS_PROFILE_SEARCH_ITERATIONS | 3 | Number of profile-search iterations in the profile (first-database) search. |
| NIM_MMSEQS_FOLLOWUP_SEARCH_ITERATIONS | 3 | Number of profile-search iterations in follow-up searches. |
| NIM_MMSEQS_PAIRED_SEARCH_ITERATIONS | 3 | Number of profile-search iterations in paired (multimer) search. |
| NIM_MMSEQS_FOLLOWUP_NONEXP_SEARCH_ITERATIONS | 3 | Number of profile-search iterations for follow-up databases that are not in NIM_MSA_EXPANDABLE_DBS (that is, non-expandable databases). |
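
Which iteration variable applies to a given database depends on its role in the pipeline. The following sketch expresses one reading of the descriptions above; the order of the checks (paired first, then profile, then expandable follow-up) is an assumption, the helper names are hypothetical, and my_custom_db stands in for any database outside NIM_MSA_EXPANDABLE_DBS.

```python
import os


def split_csv(value):
    """Split a comma-separated variable into trimmed, non-empty names."""
    return [item.strip() for item in value.split(",") if item.strip()]


def iterations_variable(database, env=None, paired=False):
    """Return the name of the iteration variable assumed to govern a search."""
    env = os.environ if env is None else env
    profile_db = env.get("NIM_MSA_COLABFOLD_DEFAULT_PROFILE_DB", "uniref30_2302")
    expandable = split_csv(
        env.get("NIM_MSA_EXPANDABLE_DBS", "uniref30_2302,colabfold_envdb_202108")
    )
    if paired:
        return "NIM_MMSEQS_PAIRED_SEARCH_ITERATIONS"
    if database == profile_db:
        return "NIM_MMSEQS_PROFILE_SEARCH_ITERATIONS"
    if database in expandable:
        return "NIM_MMSEQS_FOLLOWUP_SEARCH_ITERATIONS"
    return "NIM_MMSEQS_FOLLOWUP_NONEXP_SEARCH_ITERATIONS"


def iterations_for(database, env=None, paired=False):
    """Look up the iteration count, falling back to the documented default of 3."""
    env = os.environ if env is None else env
    return int(env.get(iterations_variable(database, env, paired), "3"))


print(iterations_variable("my_custom_db", {}))
# NIM_MMSEQS_FOLLOWUP_NONEXP_SEARCH_ITERATIONS
```

This sketch does not model how the per-request iterations API parameter interacts with these variables.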

Other Pipeline Parameters

| Variable | Default | Description |
| --- | --- | --- |
| NIM_MMSEQS_NUM_THREADS | 12 | Default number of threads for MMseqs2 commands. |
| NIM_MMSEQS_TEMPLATE_SENSITIVITY | 7.5 | Sensitivity parameter for template search. Higher values are more sensitive but slower. |
| NIM_MMSEQS_TEMPLATE_EVALUE | 0.1 | E-value threshold for template search hits. |
| NIM_MMSEQS_TEMPLATE_CONVERTALIS_FORMAT | query,target,fident,... | Output format for convertalis (template search hits). Refer to the MMseqs2 documentation for column options. |
| NIM_MMSEQS_PAIRED_ALIGN_MAX_ACCEPT | 1000000 | Maximum accepted targets in paired search alignment. Set very high by default to maximize pairing candidates. |
| NIM_MMSEQS_PAIRED_ALIGN_EVALUE | 0.001 | E-value threshold for paired search alignment. |
| NIM_MMSEQS_PAIRED_EXPAND_MAX_SEQ_ID | 0.95 | Maximum sequence identity threshold during paired search expansion. |

Note

These environment variables are global: they apply to all requests served by the NIM instance. To change them, restart the container with the desired values. A future release may expose some of these as per-request API parameters.