4.6.0-1 Release Notes

Highlights:

New variant caller pangenome_aware_deepvariant for GPU-accelerated Pangenome-aware DeepVariant variant calling.
Performance improvements in giraffe, rna_fq2bam, minimap2, fq2bam, and fq2bam_meth.
New features in mutectcaller, including mitochondrial mode and pileup detection.
Updated deepvariant and deepsomatic to match the baseline Google version 1.9.
Added support for --mode ont and --mode pacbio in deepsomatic.
Added WES support in deepsomatic.

New Tools

With Parabricks 4.6.0 we are releasing a GPU-accelerated pangenome_aware_deepvariant tool. Pangenome-aware DeepVariant is an enhanced version of Google's DeepVariant that leverages pangenome reference graphs (GBZ files) to improve variant calling accuracy, particularly in complex and highly variable genomic regions. It generates pileup images of both reads and pangenome haplotypes near potential variants and uses a Convolutional Neural Network to infer genotypes.

Improvements

For all tools: Added support for preserving file symlinks (--preserve-file-symlinks) when processing input files. This feature is useful for reference files such as FASTA sequences, ensuring that inferred files such as reference indexes can be located at their symlink destinations.

Tool Updates

fq2bam and fq2bam_meth:

Significantly improved performance for Hopper, Blackwell, Blackwell Ultra, and RTX Pro Blackwell GPU architectures (compute capabilities 9.0, 10.0, 10.3, and 12.0).
Improved performance across all other supported GPU architectures.
Optimized thread scheduling algorithm for multi-threaded CPU processing stages.
Introduced new --bwa-nstreams option (now default) with auto mode that automatically configures the number of streams based on the GPU's device memory specifications. This option optimizes performance by maximizing stream utilization while reducing false errors due to memory limitations. Users retain the ability to manually specify the number of streams when desired.
Added the option --bwa-primary-cpus. The default maintains the previous behavior of one primary CPU thread per GPU. Each primary CPU thread drives P thread pool threads, as specified with the option --bwa-cpu-thread-pool. The total number of CPU threads processing the CPU stages of alignment is the product of the --bwa-primary-cpus and --bwa-cpu-thread-pool parameters. This allows the user to control the ratio of "primary" CPU threads, which act independently, to thread pool threads, which act in unison. Changing the number of primary CPU threads may increase the CPU resources required. This is an advanced performance tuning option.
GPU CRAM writer now has reduced CPU memory requirements.
Improved performance of CPU threads during alignment on x86-64 processors with AVX2, AVX512 (if supported) and on ARM processors with Neon instructions.
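The thread arithmetic described for --bwa-primary-cpus and --bwa-cpu-thread-pool can be sketched in a few lines of Python. This is illustrative arithmetic only, not Parabricks code: the function name and the example values (2 and 16) are assumptions chosen for the illustration.

```python
# Illustrative arithmetic only; this is not part of Parabricks.
def total_alignment_cpu_threads(primary_cpus: int, cpu_thread_pool: int) -> int:
    """Return the product of the --bwa-primary-cpus and --bwa-cpu-thread-pool values."""
    if primary_cpus < 1 or cpu_thread_pool < 1:
        raise ValueError("both values must be positive integers")
    return primary_cpus * cpu_thread_pool


# Hypothetical settings: 2 primary CPU threads, each driving a pool of 16 threads.
print(total_alignment_cpu_threads(2, 16))  # prints 32
```

Doubling the number of primary CPU threads at a fixed pool size doubles the thread count in this model, which is one way to read the note that changing the primary thread count may increase the CPU resources required.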

deepvariant and deepsomatic:

Introduced a new --num-streams-per-gpu option with an auto mode (now the default) that automatically selects the number of streams based on the GPU's device memory specifications.

giraffe:

Optimized device code for cluster extension routine. This yields improved performance for both single- and paired-end alignment.
Added the option to compute minimizers and seeds on the GPU for single-end alignment using --minimizers-gpu. This yields improved performance and is recommended for GPUs with more than 80GB of memory.
Added the option to perform the sorting step for minimizers on the GPU using --minimizers-gpu-sort. This yields improved performance, but it may produce a different BAM. See the giraffe documentation page for more details.
The configuration of CPU threads has been streamlined for ease of use. Previously, users had the option to set both --num-primary-cpus-per-gpu and --cpu-thread-pool. Now, this has been simplified to a single option: --num-cpu-threads-per-gpu, which specifies the number of CPU threads per GPU. The total number of threads utilized will be the value of --num-cpu-threads-per-gpu multiplied by the number of GPUs. This change removes the complexity of managing the thread pool size and simplifies the configuration process.
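As a small worked example of the simplified giraffe thread configuration, the sketch below multiplies the --num-cpu-threads-per-gpu value by the number of GPUs, as the note above states. The function name and the example values (16 threads per GPU, 4 GPUs) are assumptions for illustration only.

```python
# Illustrative arithmetic only; this is not part of Parabricks.
def giraffe_total_cpu_threads(num_cpu_threads_per_gpu: int, num_gpus: int) -> int:
    """Return the --num-cpu-threads-per-gpu value multiplied by the number of GPUs."""
    if num_cpu_threads_per_gpu < 1 or num_gpus < 1:
        raise ValueError("both values must be positive integers")
    return num_cpu_threads_per_gpu * num_gpus


# Hypothetical run: 16 CPU threads per GPU on a 4-GPU system.
print(giraffe_total_cpu_threads(16, 4))  # prints 64
```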

mutectcaller:

Added mitochondrial mode --mitochondria-mode.
Added the following new options:
- --minimum-mapping-quality.
- --min-base-quality-score.
- --f1r2-median-mq.
- --base-quality-score-threshold.
- --normal-lod.
- --allow-non-unique-kmers-in-ref.
- --enable-dynamic-read-disqualification-for-genotyping.
- --recover-all-dangling-branches.
- --pileup-detection.
- -A AssemblyComplexity (through --mutectcaller-options).
- -min-dangling-branch-length (through --mutectcaller-options).

haplotypecaller:

Added argument --activeregion-alt-multiplier (through --haplotypecaller-options).

minimap2:

A new alignment batching scheme has been implemented, which improves performance for PacBio datasets and allows for better speedup with more than 2 GPUs.
GPU CRAM writer now has reduced CPU memory requirements.

rna_fq2bam:

Added a new performance knob, --enable-gpu-helper-threads, which enables a certain number of CPU threads to help with GPU workloads. These helper threads improve performance when the GPU has lower compute capabilities.
Added the GeneCount option in --quantMode.
Optimized code, leading to significant performance improvements.
GPU CRAM writer now has reduced CPU memory requirements.

Bug Fixes

fq2bam, fq2bam_meth, deepvariant, deepsomatic, and pangenome_aware_deepvariant:

In the 4.6.0-2 release we have fixed a bug which caused the above tools to crash on DGX Spark unless performance parameters were manually set.

fq2bam and fq2bam_meth:

In the 4.6.0-2 release we have fixed a memory leak which occurred during CPU recovery.
Resolved potential deadlock conditions that could occur when work queues become saturated.
Fixed an error that could be triggered when --bwa-cpu-thread-pool was set to 1 thread.
Fixed a benign error that was reported when setting the environment variable CUDA_LOG_FILE=stderr.

fq2bam, fq2bam_meth, giraffe:

Fixed a memory leak that occurred when the --align-only option is on and an unsorted BAM file is output.

fq2bam, fq2bam_meth, giraffe, minimap2, rna_fq2bam:

Fixed a missing comparison between the sum of base qualities and INT8_MAX / 2 in mark duplicates.

giraffe:

Fixed a bug where batches could fail to enter CPU recovery mode when an error occurred during GPU extension.

Known Issues:

fq2bam, fq2bam_meth, giraffe, minimap2, rna_fq2bam, bamsort, and associated pipelines: The parameter --gpuwrite is not recommended for use on DGX Spark as it may fail.

For further information see the Parabricks datasheet.
