4.6.0-1 Release Notes
Highlights:
New variant caller pangenome_aware_deepvariant for GPU-accelerated Pangenome-aware DeepVariant variant calling.
Performance improvements in giraffe, rna_fq2bam, minimap2, fq2bam, and fq2bam_meth.
New features in mutectcaller, including mitochondrial mode and pileup detection.
Updated deepvariant and deepsomatic to match the baseline Google version 1.9.
Added support for --mode ont and --mode pacbio in deepsomatic.
Added WES support in deepsomatic.
With Parabricks 4.6.0 we are releasing a GPU-accelerated pangenome_aware_deepvariant tool. Pangenome-aware DeepVariant is an enhanced version of Google's DeepVariant that leverages pangenome reference graphs (GBZ files) to improve variant calling accuracy, particularly in complex and highly variable genomic regions. It generates pileup images of both reads and pangenome haplotypes near potential variants and uses a Convolutional Neural Network to infer genotypes.
For all tools: Added support for preserving file symlinks (--preserve-file-symlinks) when processing input files. This feature is useful for reference files such as FASTA sequences, ensuring that inferred files such as reference indexes can be located at their symlink destinations.
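To illustrate why symlink handling matters for inferred files, the following minimal Python sketch compares two lookup strategies for a FASTA index: deriving the index path from the path as given (symlink preserved) versus from the fully resolved target. The helper name `inferred_index_path`, the directory layout, and the `.fai` suffix are assumptions made for illustration. This is not Parabricks code, and the exact lookup logic used by the tools is not described in these notes.

```python
import os
import tempfile


def inferred_index_path(fasta_path: str, preserve_symlinks: bool) -> str:
    """Return the path where a '.fai' index would be looked up.

    Hypothetical helper for illustration only; not part of Parabricks.
    """
    base = fasta_path if preserve_symlinks else os.path.realpath(fasta_path)
    return base + ".fai"


def demo() -> tuple:
    """Build a symlinked reference and report which lookup finds the index."""
    with tempfile.TemporaryDirectory() as root:
        # Resolve the temp dir itself so only our own symlink is in play.
        root = os.path.realpath(root)
        store = os.path.join(root, "store")
        work = os.path.join(root, "work")
        os.makedirs(store)
        os.makedirs(work)

        # The real FASTA lives in 'store'.
        real_fasta = os.path.join(store, "ref.fasta")
        with open(real_fasta, "w") as handle:
            handle.write(">chr1\nACGT\n")

        # The working directory only holds a symlink to it.
        link = os.path.join(work, "ref.fasta")
        os.symlink(real_fasta, link)

        # Assumed layout: the index sits next to the symlink, not the target.
        with open(link + ".fai", "w") as handle:
            handle.write("chr1\t4\t6\t4\t5\n")

        found_preserved = os.path.exists(inferred_index_path(link, True))
        found_resolved = os.path.exists(inferred_index_path(link, False))
        return found_preserved, found_resolved


if __name__ == "__main__":
    print(demo())
```

In this assumed layout, only the lookup that keeps the symlink path finds the index, because resolving the symlink moves the search to the directory of the real file.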
Tool Updates
fq2bam and fq2bam_meth:
- Significantly improved performance for Hopper, Blackwell, Blackwell Ultra, and RTX Pro Blackwell GPU architectures (compute capabilities 9.0, 10.0, 10.3, and 12.0).
- Improved performance across all other supported GPU architectures.
- Optimized the thread scheduling algorithm for multi-threaded CPU processing stages.
- Introduced a new --bwa-nstreams option (now default) with an auto mode that automatically configures the number of streams based on the GPU's device memory specifications. This option optimizes performance by maximizing stream utilization while reducing spurious errors caused by memory limitations. Users can still specify the number of streams manually when desired.
- Added the --bwa-primary-cpus option. Its default maintains the previous behavior of one primary CPU per GPU. Each primary CPU thread drives P CPU thread pool threads, as specified with the --bwa-cpu-thread-pool option. The total number of CPU threads processing the CPU stages of alignment is the product of the --bwa-primary-cpus and --bwa-cpu-thread-pool parameters. This allows the user to control the ratio of "primary" CPU threads, which act independently, to thread pool threads, which act in unison. Changing the number of primary CPU threads may increase the CPU resources required. This is an advanced performance tuning option.
- The GPU CRAM writer now has reduced CPU memory requirements.
- Improved performance of CPU threads during alignment on x86-64 processors with AVX2 or AVX512 (if supported) and on ARM processors with Neon instructions.
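The relationship between --bwa-primary-cpus and --bwa-cpu-thread-pool described above is a simple product, which the following Python sketch makes concrete. The function names are hypothetical and exist only for this illustration; the sketch covers the arithmetic stated in these notes and says nothing about which split performs best on a given system.

```python
def total_alignment_cpu_threads(primary_cpus: int, thread_pool: int) -> int:
    """Total CPU threads for the CPU stages of alignment (product of the two values)."""
    if primary_cpus < 1 or thread_pool < 1:
        raise ValueError("both values must be positive integers")
    return primary_cpus * thread_pool


def splits_for_budget(budget: int) -> list:
    """List every (primary_cpus, thread_pool) pair whose product equals budget."""
    if budget < 1:
        raise ValueError("budget must be a positive integer")
    return [(p, budget // p) for p in range(1, budget + 1) if budget % p == 0]


def as_flags(primary_cpus: int, thread_pool: int) -> list:
    """Render one split as command-line arguments using the option names above."""
    return [
        "--bwa-primary-cpus", str(primary_cpus),
        "--bwa-cpu-thread-pool", str(thread_pool),
    ]


if __name__ == "__main__":
    # Enumerate the ways a fixed budget of 8 threads can be divided.
    for primary, pool in splits_for_budget(8):
        total = total_alignment_cpu_threads(primary, pool)
        print(primary, pool, total, " ".join(as_flags(primary, pool)))
```

Enumerating the splits for a fixed thread budget shows the trade-off the option exposes: more primary threads acting independently, or fewer primary threads each driving a larger pool.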
deepvariant and deepsomatic:
- Introduced a new --num-streams-per-gpu option (now default) with an auto mode that automatically tries to use an optimal number of streams based on the GPU's device memory specifications.
giraffe:
- Optimized device code for the cluster extension routine. This yields improved performance for both single- and paired-end alignment.
- Added the option to compute minimizers and seeds on the GPU for single-end alignment using --minimizers-gpu. This yields improved performance and is recommended for GPUs with more than 80 GB of memory.
- Added the option to perform the sorting step for minimizers on the GPU using --minimizers-gpu-sort. This yields improved performance but may produce a different BAM. Please see Giraffe's man page for more details.
- The configuration of CPU threads has been streamlined for ease of use. Previously, users could set both --num-primary-cpus-per-gpu and --cpu-thread-pool. This has been simplified to a single option, --num-cpu-threads-per-gpu, which specifies the number of CPU threads per GPU. The total number of threads used is the value of --num-cpu-threads-per-gpu multiplied by the number of GPUs. This change removes the need to manage the thread pool size and simplifies configuration.
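As a small worked example of the thread arithmetic and the memory recommendation above, the Python sketch below computes the total thread count from --num-cpu-threads-per-gpu and checks whether --minimizers-gpu falls within the stated recommendation. Both function names are hypothetical, and the memory check simply encodes the "more than 80 GB, single-end" guidance from these notes. It is not a rule enforced by the tool.

```python
def total_cpu_threads(threads_per_gpu: int, num_gpus: int) -> int:
    """Total CPU threads: --num-cpu-threads-per-gpu multiplied by the GPU count."""
    if threads_per_gpu < 1 or num_gpus < 1:
        raise ValueError("both values must be positive integers")
    return threads_per_gpu * num_gpus


def minimizers_gpu_recommended(gpu_memory_gb: float, single_end: bool) -> bool:
    """True when --minimizers-gpu matches the guidance in the notes.

    Assumption: the option applies to single-end alignment and is
    recommended only when the GPU has more than 80 GB of memory.
    """
    return single_end and gpu_memory_gb > 80


if __name__ == "__main__":
    # Example: 16 CPU threads per GPU on a 4-GPU system.
    print(total_cpu_threads(16, 4))
    # Example: a GPU with exactly 80 GB does not exceed the threshold.
    print(minimizers_gpu_recommended(80, single_end=True))
    print(minimizers_gpu_recommended(96, single_end=True))
```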
mutectcaller:
- Added mitochondrial mode (--mitochondria-mode).
- Added the following new options:
  - --minimum-mapping-quality
  - --min-base-quality-score
  - --f1r2-median-mq
  - --base-quality-score-threshold
  - --normal-lod
  - --allow-non-unique-kmers-in-ref
  - --enable-dynamic-read-disqualification-for-genotyping
  - --recover-all-dangling-branches
  - --pileup-detection
  - -A AssemblyComplexity (through --mutectcaller-options)
  - -min-dangling-branch-length (through --mutectcaller-options)
haplotypecaller:
- Added the argument --activeregion-alt-multiplier (through --haplotypecaller-options).
minimap2:
- A new alignment batching scheme has been implemented, which improves performance for PacBio datasets and allows for better speedup with more than 2 GPUs.
- The GPU CRAM writer now has reduced CPU memory requirements.
rna_fq2bam:
- Added a new performance knob, --enable-gpu-helper-threads, which enables a certain number of CPU threads to help with GPU workloads. These helper threads improve performance when the GPU has lower compute capabilities.
- Added the GeneCount option in --quantMode.
- Code optimizations that lead to significant performance improvements.
- The GPU CRAM writer now has reduced CPU memory requirements.
Bug Fixes
fq2bam and fq2bam_meth:
- Resolved potential deadlock conditions that could occur when work queues become saturated.
- Fixed an error that could be triggered if --bwa-cpu-thread-pool was set to 1 thread.
- Fixed a benign error that was reported when setting the environment variable CUDA_LOG_FILE=stderr.
- Fixed a memory leak that occurred when the --align-only option is on and an unsorted BAM file is output.
fq2bam, fq2bam_meth, giraffe, minimap2, rna_fq2bam:
- Fixed a missing comparison between the sum of base qualities and INT8_MAX / 2 in mark duplicates.
- Fixed a bug where batches could fail to enter CPU recovery mode when an error occurred during GPU extension.
For further information see the Parabricks datasheet.