QUALITY CONTROL AND BAM PROCESSING

Accelerated SplitNCigarReads functionality from GATK. This tool splits reads that contain Ns in their CIGAR string (e.g. reads spanning splicing events in RNA-seq data).

QUICK START


$ pbrun splitncigar --ref Ref.fa --in-bam in.bam --out-bam out.bam


COMPATIBLE GATK COMMAND

The commands below are the GATK counterpart of the Parabricks command above. Their output matches the output of the Parabricks command exactly.


gatk SplitNCigarReads --reference Ref.fa --input in.bam --output tmp.bam
gatk SortSam --java-options -Xmx30g --MAX_RECORDS_IN_RAM=5000000 -I=tmp.bam \
    -O=out.bam --SORT_ORDER=coordinate --TMP_DIR=/raid/myrun


OPTIONS

--ref

Path to the reference file (default: None)

--in-bam

Path to the input BAM/CRAM file. (default: None)

--knownSites

Path to a known indels file. Must be in vcf/vcf.gz format. This option can be used multiple times (default: None)

--out-bam

Path to the output BAM/CRAM file. (default: None)

--out-recal-file

Path of report file after Base Quality Score Recalibration. Path can be a Google Cloud Storage object or AWS S3 Storage object (default: None)

--num-cpu-threads

Number of CPU threads to traverse separate chromosomes in splitncigar (default: 4)

--no-ignore-mark

Do not ignore marked reads in sorted output (default: None)
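
To check whether an alignment file contains reads that splitncigar would act on, one can count the records whose CIGAR string includes an N operation. The sketch below is not part of Parabricks: the helper name count_n_cigars is hypothetical, and it assumes SAM text on standard input (for example from samtools view, which must be installed separately).

```shell
#!/bin/sh
# Hypothetical helper (not a Parabricks command): count SAM records whose
# CIGAR string (column 6) contains an N operation. Header lines start
# with "@" and are skipped.
count_n_cigars() {
    awk -F '\t' '!/^@/ && $6 ~ /N/ { n++ } END { print n + 0 }'
}

# Usage, assuming samtools is installed and in.bam exists:
#   samtools view in.bam | count_n_cigars
```

Running the helper on in.bam before processing and on out.bam afterwards gives a quick sanity check; the count for out.bam would be expected to be 0 if every such read was split.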

Accelerated mpileup functionality from samtools

QUICK START


$ pbrun samtoolsmpileup --in-bam wgs.bam --out-file pileup.txt


COMPATIBLE SAMTOOLS COMMAND

The command below is the samtools counterpart of the Parabricks command above. Its output matches the output of the Parabricks command exactly.


samtools mpileup wgs.bam -o pileup.txt -d 0


OPTIONS

--in-bam

(required) Path to the input BAM/CRAM file.

--ref

Path to the reference file.

--out-file

Path to the output text pileup file. If this option is not used, output is written to standard output (default: None)

--num-threads

Number of worker threads (default: 12)

--min-mapq

Skip alignments with mapping quality smaller than this value (default: 0)

--enable-baq

Enable BAQ (per-Base Alignment Quality) (default: None)

--anomalous-reads

Do not discard anomalous read pairs (default: None)

--interval-file

Path to a BED file (.bed) for selective access. This option can be used multiple times (default: None)

-L, --interval

Interval within which to call the variants from the BAM file. Interval files should be passed using the --interval-file option. This option can be used multiple times, e.g. "-L chr1 -L chr2:10000 -L chr3:20000+ -L chr4:10000-20000" (default: None)

--tmp-dir TMP_DIR

Full path to the directory where temporary files will be stored.

--no-seccomp-override

Do not override seccomp options for Docker.

--with-petagene-dir WITH_PETAGENE_DIR

Full path to the PetaGene installation directory where bin/ and species/ folders are located.

--keep-tmp

Do not delete the directory storing temporary files after completion.

--license-file LICENSE_FILE

Path to license file license.bin if not in installation directory.

--version

View compatible software versions.
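
Because -L can be repeated, a wrapper script can assemble one -L per region from a list. The following is a minimal dry-run sketch, not a Parabricks feature: the function name build_mpileup_cmd is hypothetical, the file names are the placeholders from the quick start, and the command is only printed, not executed. It assumes paths and regions contain no whitespace.

```shell
#!/bin/sh
# Hypothetical helper: print a samtoolsmpileup command with one -L per region.
# Arguments: input BAM, output file, then any number of interval strings.
build_mpileup_cmd() {
    in_bam=$1
    out_file=$2
    shift 2
    cmd="pbrun samtoolsmpileup --in-bam $in_bam --out-file $out_file"
    for region in "$@"; do
        cmd="$cmd -L $region"   # repeat -L once per interval string
    done
    printf '%s\n' "$cmd"
}

build_mpileup_cmd wgs.bam pileup.txt chr1 chr2:10000 chr4:10000-20000
# prints: pbrun samtoolsmpileup --in-bam wgs.bam --out-file pileup.txt -L chr1 -L chr2:10000 -L chr4:10000-20000
```

Printing the command first makes it easy to review the interval list before running the printed line on a machine where pbrun is installed.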

Accelerated mpileup functionality from bcftools

QUICK START


$ pbrun bcftoolsmpileup --in-bam wgs.bam --out-file pileup.vcf


COMPATIBLE BCFTOOLS COMMAND

The command below is the bcftools counterpart of the Parabricks command above. Its output matches the output of the Parabricks command exactly.


bcftools mpileup wgs.bam -o pileup.vcf -d 2147483647


OPTIONS

--in-bam

(required) Path to the input BAM/CRAM file.

--ref

Path to the reference file.

--out-file

Path to the output pileup file. If this option is not used, output is written to standard output. By default, the output is uncompressed VCF; to output uncompressed BCF, use the --bcf option (default: None)

--num-threads

Number of worker threads (default: 1)

--min-mapq

Skip alignments with mapping quality smaller than this value (default: 0)

--disable-baq

Disable BAQ (per-Base Alignment Quality) (default: None)

--anomalous-reads

Do not discard anomalous read pairs (default: None)

--bcf

Output uncompressed BCF (default: None)

--no-reference

Do not require fasta reference file (default: None)

--no-version

Do not append version and command line to the header (default: None)

--interval-file

Path to a BED file (.bed) for selective access. This option can be used multiple times (default: None)

-L, --interval

Interval within which to call the variants from the BAM file. Interval files should be passed using the --interval-file option. This option can be used multiple times, e.g. "-L chr1 -L chr2:10000 -L chr3:20000+ -L chr4:10000-20000" (default: None)

--tmp-dir TMP_DIR

Full path to the directory where temporary files will be stored.

--no-seccomp-override

Do not override seccomp options for Docker.

--with-petagene-dir WITH_PETAGENE_DIR

Full path to the PetaGene installation directory where bin/ and species/ folders are located.

--keep-tmp

Do not delete the directory storing temporary files after completion.

--license-file LICENSE_FILE

Path to license file license.bin if not in installation directory.

--version

View compatible software versions.
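
Since the output is VCF unless --bcf is given, a wrapper can pick the flag from the output file name. The sketch below is only an illustration: the function name build_bcftools_mpileup_cmd and the rule "a .bcf extension implies --bcf" are conventions of this example, not Parabricks behavior. The command is printed rather than executed, and paths are assumed to contain no whitespace.

```shell
#!/bin/sh
# Hypothetical helper: print a bcftoolsmpileup command, adding --bcf
# when the requested output file name ends in .bcf.
build_bcftools_mpileup_cmd() {
    in_bam=$1
    out_file=$2
    cmd="pbrun bcftoolsmpileup --in-bam $in_bam --out-file $out_file"
    case $out_file in
        *.bcf) cmd="$cmd --bcf" ;;   # uncompressed BCF instead of VCF
    esac
    printf '%s\n' "$cmd"
}

build_bcftools_mpileup_cmd wgs.bam pileup.vcf
# prints: pbrun bcftoolsmpileup --in-bam wgs.bam --out-file pileup.vcf
build_bcftools_mpileup_cmd wgs.bam pileup.bcf
# prints: pbrun bcftoolsmpileup --in-bam wgs.bam --out-file pileup.bcf --bcf
```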

Accelerated CollectWGSMetrics functionality from GATK4

bammetrics collects whole genome sequencing metrics, similar to CollectWGSMetrics from GATK4, but in a highly accelerated manner. The output metrics exactly match those of GATK4.

QUICK START


$ pbrun bammetrics --ref Ref.fa --bam wgs.bam --out-metrics-file metrics.txt


COMPATIBLE GATK4 COMMAND

The command below is the GATK4 counterpart of the Parabricks command above. Its output matches the output of the Parabricks command exactly.


gatk CollectWGSMetrics -R Ref.fa -I wgs.bam -O metrics.txt


OPTIONS

--ref

(required) Path to the reference file.

--bam

(required) Path to the input BAM/CRAM file.

--out-metrics-file

(required) Output metrics file.

--minimum-base-quality

Minimum base quality for a base to contribute coverage (default: 20)

--minimum-mapping-quality

Minimum mapping quality for a read to contribute coverage (default: 20)

--count-unpaired

If true, count unpaired reads, and paired reads with one end unmapped (default: None)

--coverage-cap

Treat positions with coverage exceeding this value as if they had coverage at this value (but calculate the difference for PCT_EXC_CAPPED) (default: 250)

--num-threads

Number of parallel threads to use (default: 12)

-L, --interval

Interval strings. Overlapping intervals will be combined. Interval files should be passed using the --interval-file option. This option can be used multiple times, e.g. "-L chr1 -L chr2:10000 -L chr3:20000+ -L chr4:10000-20000" (default: None)

--interval-file

Path to an interval file with possible formats: Picard-style (.interval_list or .picard), GATK-style (.list or .intervals), or BED file (.bed). This option can be used multiple times.

--tmp-dir TMP_DIR

Full path to the directory where temporary files will be stored.

--no-seccomp-override

Do not override seccomp options for Docker.

--with-petagene-dir WITH_PETAGENE_DIR

Full path to the PetaGene installation directory where bin/ and species/ folders are located.

--keep-tmp

Do not delete the directory storing temporary files after completion.

--license-file LICENSE_FILE

Path to license file license.bin if not in installation directory.

--version

View compatible software versions.
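
If the metrics file follows the usual Picard/GATK metrics layout (a "## METRICS CLASS" line, then a tab-separated header row, then a row of values), a single value such as PCT_EXC_CAPPED can be pulled out with awk. This is a rough sketch under that assumption; the helper name metric_value is hypothetical and not part of Parabricks or GATK.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one named column from the
# metrics table of a Picard-style metrics file.
# Arguments: metrics file, column name.
metric_value() {
    awk -F '\t' -v col="$2" '
        /^## METRICS CLASS/ { want_header = 1; next }
        want_header {
            # Header row: remember which field holds the requested column.
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            want_header = 0; want_values = 1; next
        }
        want_values {
            # Value row: print the requested field (nothing if not found).
            if (idx) print $idx
            exit
        }
    ' "$1"
}

# Example, run only if the quick start output is present:
if [ -f metrics.txt ]; then
    metric_value metrics.txt PCT_EXC_CAPPED
fi
```

The helper prints nothing when the column name does not appear in the header row, so an empty result is a hint to check the spelling of the column.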

Accelerated CollectMultipleMetrics from GATK4

collectmultiplemetrics collects whole genome sequencing metrics, similar to CollectMultipleMetrics from GATK4, but in a highly accelerated manner. The output metrics exactly match those of GATK4.

QUICK START



$ pbrun collectmultiplemetrics --ref Ref.fa \
    --bam wgs.bam \
    --out-qc-metrics-dir output-qc \
    --gen-all-metrics


COMPATIBLE GATK4 COMMAND

The command below is the GATK4 counterpart of the Parabricks command above. Its output matches the output of the Parabricks command exactly.


gatk CollectMultipleMetrics --REFERENCE_SEQUENCE Ref.fa -I wgs.bam -O metrics \
    --PROGRAM CollectAlignmentSummaryMetrics \
    --PROGRAM CollectInsertSizeMetrics \
    --PROGRAM QualityScoreDistribution \
    --PROGRAM MeanQualityByCycle \
    --PROGRAM CollectBaseDistributionByCycle \
    --PROGRAM CollectGcBiasMetrics \
    --PROGRAM CollectSequencingArtifactMetrics \
    --PROGRAM CollectQualityYieldMetrics


OPTIONS

--ref

(required) Path to the reference file.

--bam

(required) Path to the input BAM file. (No CRAM support yet)

--out-qc-metrics-dir

This option will automatically run every analysis. The output file of each analysis will start with this prefix name (default: None)

--gen-all-metrics

Generate QC for every analysis (default: None)

--gen-alignment

Generate QC for alignment summary metric (default: None)

--gen-quality-score

Generate QC for quality score distribution metric (default: None)

--gen-insert-size

Generate QC for insert size metric (default: None)

--gen-mean-quality-by-cycle

Generate QC for mean quality by cycle metric (default: None)

--gen-base-distribution-by-cycle

Generate QC for base distribution by cycle metric (default: None)

--gen-gc-bias

Prefix name used to generate detail and summary files for GC bias metric (default: None)

--gen-seq-artifact

Generate QC for sequencing artifact metric (default: None)

--gen-quality-yield

Generate QC for quality yield metric (default: None)

--bam-decompressor-threads

Number of threads for BAM decompression (default: 3)

--num-gpus NUM_GPUS

Number of GPUs to use for a run. GPUs 0..(NUM_GPUS-1) will be used. If you are using Flexera, also include --gpu-devices.

--gpu-devices GPU_DEVICES

Which GPU devices to use for a run. By default, all GPU devices will be used. To use specific GPU devices, enter a comma-separated list of GPU device numbers. Possible device numbers can be found by examining the output of the nvidia-smi command. For example, using --gpu-devices 0,1 would only use the first two GPUs.

--tmp-dir TMP_DIR

Full path to the directory where temporary files will be stored.

--no-seccomp-override

Do not override seccomp options for Docker.

--with-petagene-dir WITH_PETAGENE_DIR

Full path to the PetaGene installation directory where bin/ and species/ folders are located.

--keep-tmp

Do not delete the directory storing temporary files after completion.

--license-file LICENSE_FILE

Path to license file license.bin if not in installation directory.

--version

View compatible software versions.
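
When only some analyses are needed, the individual --gen-* flags listed above can be assembled from short metric names. The sketch below is a dry run under stated assumptions: the function name build_qc_cmd is hypothetical, it assumes the individual flags can be combined in one invocation, and it leaves out --gen-gc-bias because that option is described as a prefix name. The command is printed, not executed, and paths are assumed to contain no whitespace.

```shell
#!/bin/sh
# Hypothetical helper: print a collectmultiplemetrics command that enables
# only the requested analyses.
# Arguments: reference, input BAM, output directory, then metric names.
build_qc_cmd() {
    ref=$1
    bam=$2
    out_dir=$3
    shift 3
    cmd="pbrun collectmultiplemetrics --ref $ref --bam $bam --out-qc-metrics-dir $out_dir"
    for metric in "$@"; do
        case $metric in
            alignment|quality-score|insert-size|mean-quality-by-cycle|base-distribution-by-cycle|seq-artifact|quality-yield)
                cmd="$cmd --gen-$metric" ;;
            *)
                echo "unknown metric: $metric" >&2
                return 1 ;;
        esac
    done
    printf '%s\n' "$cmd"
}

build_qc_cmd Ref.fa wgs.bam output-qc alignment insert-size
# prints: pbrun collectmultiplemetrics --ref Ref.fa --bam wgs.bam --out-qc-metrics-dir output-qc --gen-alignment --gen-insert-size
```

Rejecting unknown names early avoids passing a mistyped flag to pbrun after a long-running job has already been queued.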

© Copyright 2021, NVIDIA. Last updated on Oct 8, 2021.