QUALITY CONTROL AND BAM PROCESSING
Accelerated SplitNCigarReads functionality from GATK. This tool splits reads that contain Ns in their CIGAR string (e.g., reads spanning splicing events in RNA-seq data).
QUICK START
$ pbrun splitncigar --ref Ref.fa --in-bam in.bam --out-bam out.bam
COMPATIBLE GATK COMMAND
The commands below are the GATK counterpart of the Parabricks command above. Together they produce exactly the same results as the Parabricks command.
gatk SplitNCigarReads --reference Ref.fa --input in.bam --output tmp.bam
gatk SortSam --java-options -Xmx30g --MAX_RECORDS_IN_RAM=5000000 -I=tmp.bam \
-O=out.bam --SORT_ORDER=coordinate --TMP_DIR=/raid/myrun
OPTIONS
- --ref
  Path to the reference file (default: None)
- --in-bam
  Path to the input BAM/CRAM file. (default: None)
- --knownSites
  Path to a known indels file. Must be in vcf/vcf.gz format. This option can be used multiple times (default: None)
- --out-bam
  Path to the output BAM/CRAM file. (default: None)
- --out-recal-file
  Path of report file after Base Quality Score Recalibration. Path can be a Google Cloud Storage object or AWS S3 Storage object (default: None)
- --num-cpu-threads
  Number of CPU threads to traverse separate chromosomes in splitncigar (default: 4)
- --no-ignore-mark
  Do not ignore marked reads in sorted output (default: None)
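As a usage sketch built only from the options listed above, the helper below assembles a splitncigar command with an explicit CPU thread count. The function name `splitncigar_cmd` and the file names are hypothetical, and the function only prints the command so it can be inspected before it is run.

```shell
# Hypothetical helper: print a pbrun splitncigar command line.
# Arguments: reference, input BAM, output BAM, CPU thread count.
# The thread count falls back to 4, the documented default.
splitncigar_cmd() {
    ref=$1
    in_bam=$2
    out_bam=$3
    threads=${4:-4}
    printf 'pbrun splitncigar --ref %s --in-bam %s --out-bam %s --num-cpu-threads %s\n' \
        "$ref" "$in_bam" "$out_bam" "$threads"
}

# Print the command for an 8-thread run.
splitncigar_cmd Ref.fa in.bam out.bam 8
```

On a machine where pbrun is installed, the printed line can be piped to `sh` to execute it.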
Accelerated mpileup functionality from samtools
QUICK START
$ pbrun samtoolsmpileup --in-bam wgs.bam --out-file pileup.txt
COMPATIBLE SAMTOOLS COMMAND
The command below is the samtools counterpart of the Parabricks command above. Both commands produce exactly the same results.
samtools mpileup /w/wgs.bam -o pileup.txt -d 0
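One way to check this equivalence is a byte-for-byte comparison of the two pileup files. The sketch below assumes both commands have already been run and have written to the hypothetical files `pileup_pb.txt` and `pileup_st.txt`; `same_output` is an illustrative helper, not part of Parabricks or samtools.

```shell
# Hypothetical helper: report whether two output files are byte-identical.
same_output() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
    fi
}

# Assumed file names; the comparison runs only when both outputs exist.
if [ -f pileup_pb.txt ] && [ -f pileup_st.txt ]; then
    same_output pileup_pb.txt pileup_st.txt
fi
```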
OPTIONS
- --in-bam
  (required) Path to the input BAM file. (No CRAM support yet)
- --ref
  Path to the reference file.
- --out-file
  Path of output text pileup. If this option is not used, it will write to standard output (default: None)
- --num-threads
  Number of threads for worker (default: 12)
- --min-mapq
  Skip alignments with mapping quality smaller than this value (default: 0)
- --enable-baq
  Enable BAQ (per-Base Alignment Quality) (default: None)
- --anomalous-reads
  Do not discard anomalous read pairs (default: None)
- --interval-file
  Path to a BED file (.bed) for selective access. This option can be used multiple times (default: None)
- -L, --interval
  Interval within which to call the variants from the BAM file. Interval files should be passed using the --interval-file option. This option can be used multiple times, e.g. "-L chr1 -L chr2:10000 -L chr3:20000+ -L chr4:10000-20000" (default: None)
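Because -L can be given several times, a list of regions has to be expanded into repeated options. The sketch below shows one way to do that; `interval_args` is a hypothetical helper, the regions are made-up examples, and the final line only prints the resulting command rather than running it.

```shell
# Hypothetical helper: turn a list of interval strings into repeated -L options.
interval_args() {
    args=""
    for iv in "$@"; do
        args="$args -L $iv"
    done
    # Strip the leading space before printing.
    echo "${args# }"
}

# Restrict the pileup to chr1 and chr4:10000-20000, and skip alignments
# with mapping quality below 20.
echo "pbrun samtoolsmpileup --in-bam wgs.bam --out-file pileup.txt --min-mapq 20 $(interval_args chr1 chr4:10000-20000)"
```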
- --tmp-dir TMP_DIR
  Full path to the directory where temporary files will be stored.
- --seccomp-override
  Do not override seccomp options for docker
- --with-petagene-dir WITH_PETAGENE_DIR
  Full path to the PetaGene installation directory where bin/ and species/ folders are located.
- --keep-tmp
  Do not delete the directory storing temporary files after completion.
- --license-file LICENSE_FILE
  Path to license file license.bin if not in installation directory.
- --version
  View compatible software versions.
Accelerated mpileup functionality from bcftools
QUICK START
$ pbrun bcftoolsmpileup --in-bam wgs.bam --out-file pileup.vcf
COMPATIBLE BCFTOOLS COMMAND
The command below is the bcftools counterpart of the Parabricks command above. Both commands produce exactly the same results.
bcftools mpileup wgs.bam -o pileup.txt -d 2147483647
OPTIONS
- --in-bam
  (required) Path to the input BAM file. (No CRAM support yet)
- --ref
  Path to the reference file.
- --out-file
  Path of output text pileup. If this option is not used, it will write to standard output. By default, the output will be uncompressed VCF. To output uncompressed BCF, use the --bcf option (default: None)
- --num-threads
  Number of threads for worker (default: 1)
- --min-mapq
  Skip alignments with mapping quality smaller than this value (default: 0)
- --disable-baq
  Disable BAQ (per-Base Alignment Quality) (default: None)
- --anomalous-reads
  Do not discard anomalous read pairs (default: None)
- --bcf
  Output uncompressed BCF (default: None)
- --no-reference
  Do not require fasta reference file (default: None)
- --no-version
  Do not append version and command line to the header (default: None)
- --interval-file
  Path to a BED file (.bed) for selective access. This option can be used multiple times (default: None)
- -L, --interval
  Interval within which to call the variants from the BAM file. Interval files should be passed using the --interval-file option. This option can be used multiple times, e.g. "-L chr1 -L chr2:10000 -L chr3:20000+ -L chr4:10000-20000" (default: None)
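Since the output is uncompressed VCF unless --bcf is given, a wrapper can pick the format from the output file name. The sketch below assumes a naming convention in which BCF outputs end in `.bcf`; the helper name `bcftoolsmpileup_cmd` and the file names are hypothetical, and the helper only prints the command.

```shell
# Hypothetical helper: print a pbrun bcftoolsmpileup command, adding --bcf
# when the requested output file name ends in .bcf.
bcftoolsmpileup_cmd() {
    in_bam=$1
    ref=$2
    out_file=$3
    extra=""
    case "$out_file" in
        *.bcf) extra=" --bcf" ;;
    esac
    echo "pbrun bcftoolsmpileup --in-bam $in_bam --ref $ref --out-file $out_file$extra"
}

# VCF output (the default), then BCF output.
bcftoolsmpileup_cmd wgs.bam Ref.fa pileup.vcf
bcftoolsmpileup_cmd wgs.bam Ref.fa pileup.bcf
```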
- --tmp-dir TMP_DIR
  Full path to the directory where temporary files will be stored.
- --seccomp-override
  Do not override seccomp options for docker
- --with-petagene-dir WITH_PETAGENE_DIR
  Full path to the PetaGene installation directory where bin/ and species/ folders are located.
- --keep-tmp
  Do not delete the directory storing temporary files after completion.
- --license-file LICENSE_FILE
  Path to license file license.bin if not in installation directory.
- --version
  View compatible software versions.
Accelerated CollectWGSMetrics functionality from GATK4
bammetrics collects whole genome sequencing metrics, similar to CollectWGSMetrics from GATK4, but in a highly accelerated manner. The output metrics exactly match those of GATK4.
QUICK START
$ pbrun bammetrics --ref Ref.fa --bam wgs.bam --out-metrics-file metrics.txt
COMPATIBLE GATK4 COMMAND
The command below is the GATK4 counterpart of the Parabricks command above. Both commands produce exactly the same results.
gatk CollectWGSMetrics -R Ref.fa -I wgs.bam -O metrics.txt
OPTIONS
- --ref
  (required) Path to the reference file.
- --bam
  (required) Path to the input BAM/CRAM file.
- --out-metrics-file
  (required) Output metrics file.
- --minimum-base-quality
  Minimum base quality for a base to contribute coverage (default: 20)
- --minimum-mapping-quality
  Minimum mapping quality for a read to contribute coverage (default: 20)
- --count-unpaired
  If true, count unpaired reads, and paired reads with one end unmapped (default: None)
- --coverage-cap
  Treat positions with coverage exceeding this value as if they had coverage at this value (but calculate the difference for PCT_EXC_CAPPED) (default: 250)
- --num-threads
  Number of parallel threads to use (default: 12)
- -L, --interval
  Interval strings. Overlapping intervals will be combined. Interval files should be passed using the --interval-file option. This option can be used multiple times, e.g. "-L chr1 -L chr2:10000 -L chr3:20000+ -L chr4:10000-20000" (default: None)
- --interval-file
  Path to an interval file with possible formats: Picard-style (.interval_list or .picard), GATK-style (.list or .intervals), or BED file (.bed). This option can be used multiple times
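To illustrate the interval-file option, the sketch below writes a small BED file and prints a bammetrics command that restricts the metrics to those regions and raises the coverage cap. The regions, the file location, and the cap of 500 are made-up values chosen for illustration.

```shell
# Write a small BED file (tab-separated: chromosome, 0-based start, end).
# The two regions are made-up examples.
bed_file=${TMPDIR:-/tmp}/targets.bed
printf 'chr1\t10000\t20000\nchr2\t50000\t60000\n' > "$bed_file"

# Print the command that limits metrics to those regions and caps
# per-position coverage at 500 instead of the default 250.
echo "pbrun bammetrics --ref Ref.fa --bam wgs.bam --out-metrics-file metrics.txt --interval-file $bed_file --coverage-cap 500"
```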
- --tmp-dir TMP_DIR
  Full path to the directory where temporary files will be stored.
- --seccomp-override
  Do not override seccomp options for docker
- --with-petagene-dir WITH_PETAGENE_DIR
  Full path to the PetaGene installation directory where bin/ and species/ folders are located.
- --keep-tmp
  Do not delete the directory storing temporary files after completion.
- --license-file LICENSE_FILE
  Path to license file license.bin if not in installation directory.
- --version
  View compatible software versions.
Accelerated CollectMultipleMetrics from GATK4
collectmultiplemetrics collects whole genome sequencing metrics, similar to CollectMultipleMetrics from GATK4, but in a highly accelerated manner. The output metrics exactly match those of GATK4.
QUICK START
$ pbrun collectmultiplemetrics --ref Ref.fa \
--bam wgs.bam \
--out-qc-metrics-dir output-qc \
--gen-all-metrics
COMPATIBLE GATK4 COMMAND
The command below is the GATK4 counterpart of the Parabricks command above. Both commands produce exactly the same results.
gatk CollectMultipleMetrics --REFERENCE_SEQUENCE Ref.fa -I wgs.bam -O metrics \
--PROGRAM CollectAlignmentSummaryMetrics \
--PROGRAM CollectInsertSizeMetrics \
--PROGRAM QualityScoreDistribution \
--PROGRAM MeanQualityByCycle \
--PROGRAM CollectBaseDistributionByCycle \
--PROGRAM CollectGcBiasMetrics \
--PROGRAM CollectSequencingArtifactMetrics \
--PROGRAM CollectQualityYieldMetrics
OPTIONS
- --ref
  (required) Path to the reference file.
- --bam
  (required) Path to the input BAM file. (No CRAM support yet)
- --out-qc-metrics-dir
  This option will automatically run every analysis. The output file of each analysis will start with this prefix name (default: None)
- --gen-all-metrics
  Generate QC for every analysis (default: None)
- --gen-alignment
  Generate QC for alignment summary metric (default: None)
- --gen-quality-score
  Generate QC for quality score distribution metric (default: None)
- --gen-insert-size
  Generate QC for insert size metric (default: None)
- --gen-mean-quality-by-cycle
  Generate QC for mean quality by cycle metric (default: None)
- --gen-base-distribution-by-cycle
  Generate QC for base distribution by cycle metric (default: None)
- --gen-gc-bias
  Prefix name used to generate detail and summary files for gc bias metric (default: None)
- --gen-seq-artifact
  Generate QC for sequencing artifact metric (default: None)
- --gen-quality-yield
  Generate QC for quality yield metric (default: None)
- --processor-threads
  Number of threads for processing; performance keeps improving up to 20 threads (default: 8)
- --bam-decompressor-threads
  Number of threads for BAM decompression (default: 3)
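When only some analyses are needed, the individual --gen-* options listed above can be combined instead of using --gen-all-metrics. The sketch below maps short metric names to those options; `metric_flags` is a hypothetical helper, the short names are simply the option names with the `--gen-` prefix removed, and the final line only prints the command.

```shell
# Hypothetical helper: map short metric names to the documented --gen-* options.
# Unknown names are rejected with a non-zero exit status.
metric_flags() {
    flags=""
    for m in "$@"; do
        case "$m" in
            alignment|quality-score|insert-size|mean-quality-by-cycle|base-distribution-by-cycle|gc-bias|seq-artifact|quality-yield)
                flags="$flags --gen-$m" ;;
            *)
                echo "unknown metric: $m" >&2
                return 1 ;;
        esac
    done
    # Strip the leading space before printing.
    echo "${flags# }"
}

# Request only the alignment summary and insert size metrics.
echo "pbrun collectmultiplemetrics --ref Ref.fa --bam wgs.bam --out-qc-metrics-dir output-qc $(metric_flags alignment insert-size)"
```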
- --tmp-dir TMP_DIR
  Full path to the directory where temporary files will be stored.
- --seccomp-override
  Do not override seccomp options for docker
- --with-petagene-dir WITH_PETAGENE_DIR
  Full path to the PetaGene installation directory where bin/ and species/ folders are located.
- --keep-tmp
  Do not delete the directory storing temporary files after completion.
- --license-file LICENSE_FILE
  Path to license file license.bin if not in installation directory.
- --version
  View compatible software versions.