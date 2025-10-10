Parabricks Pangenome-aware DeepVariant does not lose any accuracy compared with Google's Pangenome-aware DeepVariant, but several factors can cause the two tools to produce different output files:

CNN Inference

Google DeepVariant uses a convolutional neural network (CNN) to predict the probabilities for each variant candidate. The model is trained and run for inference with Keras. Parabricks DeepVariant converts this Keras model to a TensorRT engine file to accelerate deep learning inference on NVIDIA GPUs. Because of TensorRT's optimizations, the final probability scores after inference differ slightly (on the order of 10^-5), which can change a few variants in the final VCF output. Based on current observations, the mismatches affect only RefCalls with a quality score of zero.
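As a rough illustration of how a difference of this size can change a call, the sketch below picks the most likely class from two sets of class probabilities that differ by less than 10^-5. The function name, the three-class layout, and the probability values are hypothetical and chosen only to show a near-tie; they are not taken from either implementation.

```cpp
#include <array>
#include <cstddef>

// Hypothetical class probabilities for one candidate. The two sets differ by
// less than 1e-5 per class, but the near-tie between classes 0 and 1 is
// resolved differently.
const std::array<double, 3> kKerasProbs = {0.499996, 0.500002, 0.000002};
const std::array<double, 3> kTensorRtProbs = {0.500004, 0.499994, 0.000002};

// Returns the index of the largest probability; ties go to the lowest index.
std::size_t MostLikelyClass(const std::array<double, 3>& probs) {
  std::size_t best = 0;
  for (std::size_t i = 1; i < probs.size(); ++i) {
    if (probs[i] > probs[best]) best = i;
  }
  return best;
}
```

With these values, MostLikelyClass(kKerasProbs) selects class 1 and MostLikelyClass(kTensorRtProbs) selects class 0. A flip like this is only possible when the top two probabilities are almost equal, that is, when the call is already very low confidence.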

Read Sorting Differences

The Google Pangenome-aware DeepVariant implementation uses std::sort instead of std::stable_sort to sort reads by position, fragment_name, and read_number. When the keep-supplementary-alignments option is enabled, duplicate reads can occur, and because std::sort does not guarantee the relative order of elements that compare equal, these reads may be ordered non-deterministically. The Parabricks implementation uses std::stable_sort to resolve this. To make Google's implementation produce identical results, change the std::sort call in BuildPileupForOneSample (pileup_image_native.cc) to std::stable_sort.
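The difference between the two sorts can be sketched with a simplified example. The Read struct, the comparator, and the function names below are hypothetical stand-ins, not the types used in pileup_image_native.cc; the is_supplementary field exists only to tell apart two records that share all three sort keys.

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical, simplified read record (not DeepVariant's actual read type).
struct Read {
  long position;
  std::string fragment_name;
  int read_number;
  bool is_supplementary;  // not part of the sort key
};

// Orders reads by (position, fragment_name, read_number) only.
bool ByPositionNameNumber(const Read& a, const Read& b) {
  return std::tie(a.position, a.fragment_name, a.read_number) <
         std::tie(b.position, b.fragment_name, b.read_number);
}

// std::stable_sort keeps reads that compare equal in their input order,
// so the same input always yields the same output.
std::vector<Read> SortReadsStable(std::vector<Read> reads) {
  std::stable_sort(reads.begin(), reads.end(), ByPositionNameNumber);
  return reads;
}

// std::sort makes no promise about the relative order of equal reads;
// the result may differ between standard library implementations.
std::vector<Read> SortReadsUnstable(std::vector<Read> reads) {
  std::sort(reads.begin(), reads.end(), ByPositionNameNumber);
  return reads;
}
```

Given the input {200, "frag2", 1, false}, {100, "frag1", 1, false}, {100, "frag1", 1, true}, SortReadsStable always places the non-supplementary record at position 100 before the supplementary one, because that is their input order. SortReadsUnstable is free to return those two records in either order.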

GBZ Reader Caching Mechanism