NVIDIA Parabricks v4.6.0

Improvements

For all tools: Added support for preserving file symlinks (--preserve-file-symlinks) when processing input files. This is useful for reference files such as FASTA sequences because it ensures that inferred files, such as reference indexes, can be located at their symlink destinations.
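The sketch below illustrates why preserving symlinks matters for inferred files. It assumes, as a simplification, that a tool infers the FASTA index path by appending `.fai` to the reference path, and that "preserving" a symlink means using its path as given instead of resolving it with `os.path.realpath`. The helper `infer_index_path` is hypothetical and not part of Parabricks.

```python
import os
import tempfile

def infer_index_path(reference: str, preserve_symlinks: bool) -> str:
    """Hypothetical helper: where a tool would look for the .fai index.

    With preserve_symlinks=False the reference path is first resolved to its
    final target; with preserve_symlinks=True the symlink path is used as-is.
    """
    base = reference if preserve_symlinks else os.path.realpath(reference)
    return base + ".fai"

with tempfile.TemporaryDirectory() as tmp:
    store = os.path.join(tmp, "store")  # where the real FASTA lives
    work = os.path.join(tmp, "work")    # where the symlink and its index live
    os.makedirs(store)
    os.makedirs(work)

    target = os.path.join(store, "ref.fa")
    open(target, "w").close()
    link = os.path.join(work, "ref.fa")
    os.symlink(target, link)
    open(link + ".fai", "w").close()  # index created next to the symlink only

    found_resolved = os.path.exists(infer_index_path(link, preserve_symlinks=False))
    found_preserved = os.path.exists(infer_index_path(link, preserve_symlinks=True))

print(found_resolved, found_preserved)  # False True
```

In this layout the index exists only beside the symlink, so it is found only when the symlink path is kept rather than resolved.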

Tool Updates

fq2bam and fq2bam_meth:

Significantly improved performance for Hopper, Blackwell, Blackwell Ultra, and RTX Pro Blackwell GPU architectures (compute capabilities 9.0, 10.0, 10.3, and 12.0).
Improved performance across all other supported GPU architectures.
Optimized the thread scheduling algorithm for multi-threaded CPU processing stages.
Introduced a new --bwa-nstreams option with an auto mode (now the default) that automatically configures the number of streams based on the GPU's device memory. This maximizes stream utilization for performance while reducing spurious errors caused by memory limitations. Users can still specify the number of streams manually.
Added the --bwa-primary-cpus option. Its default preserves the previous behavior of one primary CPU thread per GPU. Each primary CPU thread drives P thread-pool threads, as specified with the --bwa-cpu-thread-pool option. The total number of CPU threads processing the CPU stages of alignment is the product of the --bwa-primary-cpus and --bwa-cpu-thread-pool values. This lets the user control the ratio of primary CPU threads, which act independently, to thread-pool threads, which act in unison. Changing the number of primary CPU threads may increase the CPU resources required. This is an advanced performance-tuning option.
GPU CRAM writer now has reduced CPU memory requirements.
Improved the performance of CPU threads during alignment on x86-64 processors with AVX2 and AVX-512 (if supported), and on Arm processors with Neon instructions.
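As a worked example of the thread arithmetic for --bwa-primary-cpus and --bwa-cpu-thread-pool, the snippet below computes the total number of CPU alignment threads for a few hypothetical settings that keep the total fixed while changing the ratio of primary to thread-pool threads. The function name is illustrative; it only restates the product rule given above and does not model how Parabricks actually schedules threads.

```python
def total_alignment_cpu_threads(bwa_primary_cpus: int, bwa_cpu_thread_pool: int) -> int:
    """Product rule from the release notes: primary threads x thread-pool threads."""
    if bwa_primary_cpus < 1 or bwa_cpu_thread_pool < 1:
        raise ValueError("both values must be at least 1")
    return bwa_primary_cpus * bwa_cpu_thread_pool

# Hypothetical settings: the same 16-thread total, split differently.
settings = [(1, 16), (2, 8), (4, 4)]
for primary, pool in settings:
    total = total_alignment_cpu_threads(primary, pool)
    print(f"--bwa-primary-cpus {primary} --bwa-cpu-thread-pool {pool}: {total} threads")
```

Moving from (1, 16) toward (4, 4) trades threads that work in unison for more independent primary threads; the total stays the same, but the CPU resources required may change.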

deepvariant and deepsomatic:

Introduced a new --num-streams-per-gpu option with an auto mode (now the default) that tries to use an optimal number of streams based on the GPU's device memory.
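These notes do not describe how the auto mode chooses a stream count. Purely to illustrate the idea of sizing streams from device memory, the sketch below uses a hypothetical rule: fit as many streams as the memory budget allows, capped at a maximum and never fewer than one. The function name, the 12 GiB per-stream footprint, and the cap of 6 are assumptions, not Parabricks internals.

```python
def auto_streams_per_gpu(device_mem_gib: float, mem_per_stream_gib: float, max_streams: int) -> int:
    """Hypothetical auto-mode rule: as many streams as fit in device memory,
    clamped to the range [1, max_streams]."""
    if mem_per_stream_gib <= 0 or max_streams < 1:
        raise ValueError("mem_per_stream_gib must be > 0 and max_streams >= 1")
    fit = int(device_mem_gib // mem_per_stream_gib)
    return max(1, min(max_streams, fit))

# Hypothetical GPU memory sizes with an assumed 12 GiB per stream and a cap of 6.
for mem in (16, 24, 48, 80):
    print(f"{mem} GiB -> {auto_streams_per_gpu(mem, 12, 6)} stream(s)")
```

A rule of this shape keeps small GPUs from oversubscribing memory while letting large GPUs run more streams concurrently.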

giraffe:

Optimized the device code for the cluster extension routine, improving performance for both single-end and paired-end alignment.
Added the --minimizers-gpu option to compute minimizers and seeds on the GPU for single-end alignment. This improves performance and is recommended for GPUs with more than 80 GB of memory.
Added the --minimizers-gpu-sort option to perform the minimizer sorting step on the GPU. This improves performance but may produce a different BAM file. See Giraffe's man page for more details.
The configuration of CPU threads has been streamlined for ease of use. Previously, users had the option to set both --num-primary-cpus-per-gpu and --cpu-thread-pool. Now, this has been simplified to a single option: --num-cpu-threads-per-gpu, which specifies the number of CPU threads per GPU. The total number of threads utilized will be the value of --num-cpu-threads-per-gpu multiplied by the number of GPUs. This change removes the complexity of managing the thread pool size and simplifies the configuration process.
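A small worked example of the new giraffe rule, total threads = --num-cpu-threads-per-gpu × number of GPUs. The second helper, which derives a per-GPU value by splitting a host's cores evenly across GPUs, is a hypothetical sizing rule for illustration only, not a Parabricks recommendation.

```python
import os
from typing import Optional

def giraffe_total_threads(num_cpu_threads_per_gpu: int, num_gpus: int) -> int:
    # Rule from the release notes: threads per GPU multiplied by the GPU count.
    return num_cpu_threads_per_gpu * num_gpus

def even_split_threads_per_gpu(num_gpus: int, cores: Optional[int] = None) -> int:
    # Hypothetical sizing rule: divide the host's cores evenly among the GPUs.
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if cores is None:
        cores = os.cpu_count() or 1
    return max(1, cores // num_gpus)

per_gpu = even_split_threads_per_gpu(num_gpus=4, cores=64)
print(per_gpu, giraffe_total_threads(per_gpu, num_gpus=4))  # 16 64
```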

mutectcaller:

Added a mitochondrial mode, --mitochondria-mode.
Added the following new options:
- --minimum-mapping-quality.
- --min-base-quality-score.
- --f1r2-median-mq.
- --base-quality-score-threshold.
- --normal-lod.
- --allow-non-unique-kmers-in-ref.
- --enable-dynamic-read-disqualification-for-genotyping.
- --recover-all-dangling-branches.
- --pileup-detection.
- -A AssemblyComplexity (through --mutectcaller-options).
- -min-dangling-branch-length (through --mutectcaller-options).
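The last two options above are not standalone flags; they are forwarded through --mutectcaller-options, which takes them as a single string. The sketch below assembles such a command line in Python without running it. The required arguments shown (--ref, --in-tumor-bam, --tumor-name, --out-vcf), the file names, and the value 3 for -min-dangling-branch-length are assumptions for illustration; check the mutectcaller reference for the exact argument set and valid values.

```python
import shlex
from typing import List

def mutectcaller_argv(ref: str, tumor_bam: str, tumor_name: str, out_vcf: str,
                      passthrough: List[str]) -> List[str]:
    """Build (but do not run) a tumor-only pbrun mutectcaller command line."""
    argv = [
        "pbrun", "mutectcaller",
        "--ref", ref,
        "--in-tumor-bam", tumor_bam,
        "--tumor-name", tumor_name,
        "--out-vcf", out_vcf,
    ]
    if passthrough:
        # Forwarded options travel as ONE argument, e.g. "-A AssemblyComplexity ...".
        argv += ["--mutectcaller-options", " ".join(passthrough)]
    return argv

argv = mutectcaller_argv(
    "ref.fa", "tumor.bam", "tumor", "out.vcf",
    ["-A AssemblyComplexity", "-min-dangling-branch-length 3"],
)
print(shlex.join(argv))
```

Passing the list to subprocess.run (rather than a shell string) avoids quoting mistakes; shlex.join is used here only to print a copy-pasteable command.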

haplotypecaller:

Added the --activeregion-alt-multiplier argument (through --haplotypecaller-options).

minimap2:

Implemented a new alignment batching scheme that improves performance on PacBio datasets and gives better speedup with more than two GPUs.
GPU CRAM writer now has reduced CPU memory requirements.

rna_fq2bam:

Added a new performance option, --enable-gpu-helper-threads, which dedicates a number of CPU threads to assisting with GPU workloads. These helper threads improve performance on GPUs with lower compute capability.
Added the GeneCount option to --quantMode.
Code optimizations yield significant performance improvements.
GPU CRAM writer now has reduced CPU memory requirements.