QUALITY CONTROL AND BAM PROCESSING
SPLITNCIGAR
Accelerated SplitNCigarReads functionality from GATK. This tool splits reads that contain N operators in their CIGAR string (for example, reads spanning splice junctions in RNA-seq data) into one read per exon segment.
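The following minimal Python sketch shows what splitting on N means. It cuts one alignment's CIGAR string at every N (skipped reference region) operator and computes the reference start of each piece. This is a simplified illustration, not the GATK or Parabricks implementation. The helper names parse_cigar and split_on_n are hypothetical, clipping operators simply stay with the piece they touch, and GATK's additional handling of the split reads (such as clipping overhangs into introns) is not modeled.

```python
import re

# CIGAR operators that consume reference bases, per the SAM specification.
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Turn a CIGAR string such as '10M500N15M' into [(10, 'M'), (500, 'N'), (15, 'M')]."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def split_on_n(pos, cigar):
    """Split one alignment at every N operator (hypothetical, simplified helper).

    pos is the 1-based leftmost aligned reference position (SAM POS).
    Returns a list of (reference_start, cigar) pairs, one per segment.
    """
    segments = []
    seg_start = pos   # reference start of the segment being built
    ref = pos         # current reference position
    current = []      # CIGAR operations of the segment being built
    for length, op in parse_cigar(cigar):
        if op == "N":
            # Close the current segment and jump over the skipped region.
            if current:
                segments.append((seg_start, current))
                current = []
            ref += length
            seg_start = ref
            continue
        current.append((length, op))
        if op in REF_CONSUMING:
            ref += length
    if current:
        segments.append((seg_start, current))
    return [(start, "".join(f"{n}{op}" for n, op in ops)) for start, ops in segments]
```

For example, split_on_n(100, "5S10M500N15M") returns [(100, "5S10M"), (610, "15M")]. The 10M block covers positions 100 to 109, the 500-base N skips positions 110 to 609, and the second block starts at 610.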
QUICK START
$ pbrun splitncigar --ref Ref.fa --in-bam in.bam --out-bam out.bam
COMPATIBLE GATK COMMAND
The commands below are the GATK counterpart of the Parabricks command above: SplitNCigarReads followed by SortSam, which coordinate-sorts the result. Together they produce exactly the same output as the Parabricks command.
gatk SplitNCigarReads --reference Ref.fa --input in.bam --output tmp.bam
gatk SortSam --java-options -Xmx30g --MAX_RECORDS_IN_RAM=5000000 -I=tmp.bam \
-O=out.bam --SORT_ORDER=coordinate --TMP_DIR=/raid/myrun
OPTIONS
- --ref
Path to the reference file (default: None)
- --in-bam
Path to the input BAM file. Path can be a Google Cloud Storage object or AWS S3 Storage object (default: None)
- --knownSites
Path to a known indels file. Must be in vcf/vcf.gz format. This option can be used multiple times (default: None)
- --out-bam
Output BAM file. Path can be a Google Cloud Storage object or AWS S3 Storage object (default: None)
- --out-recal-file
Path of report file after Base Quality Score Recalibration. Path can be a Google Cloud Storage object or AWS S3 Storage object (default: None)
- --num-cpu-threads
Number of CPU threads to traverse separate chromosomes in splitncigar (default: 4)
- --no-ignore-mark
Do not ignore marked reads in sorted output (default: None)
SAMTOOLS MPILEUP
Accelerated mpileup functionality from samtools.
QUICK START
$ pbrun samtoolsmpileup --in-bam wgs.bam --out-file pileup.txt
COMPATIBLE SAMTOOLS COMMAND
The command below is the samtools counterpart of the Parabricks command above. It produces exactly the same output as the Parabricks command.
samtools mpileup wgs.bam -o pileup.txt -d 0
OPTIONS
- --in-bam
(required) Path to the input bam file.
- --ref
Path to the reference file.
- --out-file
Path of the output text pileup. If this option is not used, the pileup is written to standard output (default: None)
- --num-threads
Number of worker threads (default: 1)
- --min-mapq
Skip alignments with mapping quality smaller than this value (default: 0)
- --disable-baq
Disable BAQ (per-Base Alignment Quality) (default: None)
- --anomalous-reads
Do not discard anomalous read pairs (default: None)
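For a single input BAM, each line of the text pileup holds six tab-separated columns: chromosome, 1-based position, reference base, number of covering reads, read bases, and base qualities. The read-bases column uses the standard samtools encoding. The Python sketch below shows one way to tally that column per position. The function names are hypothetical. The sketch assumes the default output columns and that a reference was supplied with --ref, so that the third column holds the real reference base.

```python
from collections import Counter


def count_pileup_bases(ref_base, bases):
    """Tally the read-bases column of one samtools text pileup line."""
    counts = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            i += 2          # read start marker plus its mapping-quality character
        elif c == "$":
            i += 1          # read end marker
        elif c in "+-":
            # Indel after this position: sign, length digits, then that many bases.
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            i = j + int(bases[i + 1:j])
        elif c in "<>":
            i += 1          # reference skip (for example an RNA-seq N), not a base
        else:
            # '.'/',' match the reference on the forward/reverse strand; letters
            # are mismatches (lower case = reverse strand); '*' is a deleted base.
            counts[ref_base.upper() if c in ".," else c.upper()] += 1
            i += 1
    return counts


def parse_pileup_line(line):
    """Return (chrom, pos, ref_base, depth, base_counts) for a 6-column line."""
    chrom, pos, ref_base, depth, bases, _quals = line.rstrip("\n").split("\t")[:6]
    depth = int(depth)
    # Zero-depth lines carry placeholder columns, so there is nothing to count.
    counts = Counter() if depth == 0 else count_pileup_bases(ref_base, bases)
    return chrom, int(pos), ref_base, depth, counts
```

Skipping indel sequences by their stated length, rather than by matching letters, keeps the parser correct even when the inserted or deleted bases happen to look like other pileup symbols.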
BCFTOOLS MPILEUP
Accelerated mpileup functionality from bcftools.
QUICK START
$ pbrun bcftoolsmpileup --in-bam wgs.bam --out-file pileup.vcf
COMPATIBLE BCFTOOLS COMMAND
The command below is the bcftools counterpart of the Parabricks command above. It produces exactly the same output as the Parabricks command.
bcftools mpileup wgs.bam -o pileup.vcf -d 2147483647
OPTIONS
- --in-bam
(required) Path to the input bam file.
- --ref
Path to the reference file.
- --out-file
Path of the output pileup file. If this option is not used, the output is written to standard output. By default, the output is uncompressed VCF; to output uncompressed BCF, use the --bcf option (default: None)
- --num-threads
Number of worker threads (default: 1)
- --min-mapq
Skip alignments with mapping quality smaller than this value (default: 0)
- --disable-baq
Disable BAQ (per-Base Alignment Quality) (default: None)
- --anomalous-reads
Do not discard anomalous read pairs (default: None)
- --bcf
Output uncompressed BCF (default: None)
- --no-reference
Do not require a FASTA reference file (default: None)
- --no-version
Do not append version and command line to the header (default: None)
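The ## meta-information lines record tool versions and command lines, which is the information --no-version leaves out. Because of these lines, a byte-for-byte diff of the Parabricks and bcftools outputs can differ even when the records agree. The minimal Python sketch below checks the records themselves. first_difference is a hypothetical helper, not part of Parabricks or bcftools, and it assumes both files are uncompressed VCF (the default).

```python
from itertools import zip_longest


def records(lines):
    """Yield VCF lines except '##' meta lines (tool versions, command lines, ...)."""
    for line in lines:
        if not line.startswith("##"):
            yield line.rstrip("\n")


def first_difference(lines_a, lines_b):
    """Return (record_index, line_a, line_b) for the first mismatch, or None.

    Either input can be an open file or any iterable of lines; records are
    streamed, so large whole-genome outputs are not loaded into memory.
    """
    pairs = zip_longest(records(lines_a), records(lines_b))
    for index, (a, b) in enumerate(pairs):
        if a != b:
            return index, a, b   # a or b is None if one input is shorter
    return None
```

To compare two files, pass open file handles, for example with open("pileup.vcf") as a, open("bcftools.vcf") as b: print(first_difference(a, b)). A result of None means every remaining line matches, including the #CHROM header line.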
BAMMETRICS
Accelerated CollectWgsMetrics functionality from GATK4
bammetrics collects whole genome sequencing metrics, similar to CollectWgsMetrics from GATK4, but much faster. The output metrics match those of GATK4 exactly.
QUICK START
$ pbrun bammetrics --ref Ref.fa --bam wgs.bam --out-metrics-file metrics.txt
COMPATIBLE GATK4 COMMAND
The command below is the GATK4 counterpart of the Parabricks command above. It produces exactly the same results as the Parabricks command.
gatk CollectWgsMetrics -R Ref.fa -I wgs.bam -O metrics.txt
OPTIONS
- --ref
(required) Path to the reference file.
- --bam
(required) Path to the input bam file.
- --out-metrics-file
(required) Output metrics file.
- --minimum-base-quality
Minimum base quality for a base to contribute coverage (default: 20)
- --minimum-mapping-quality
Minimum mapping quality for a read to contribute coverage (default: 20)
- --count-unpaired
Count unpaired reads, and paired reads with one end unmapped (default: None)
- --coverage-cap
Treat positions with coverage exceeding this value as if they had coverage at this value; the bases above the cap are still reported, as a fraction, in PCT_EXC_CAPPED (default: 250)
- --num-threads
Number of parallel threads to use (default: 12)
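A toy Python model can illustrate the effect of --minimum-base-quality, --minimum-mapping-quality and --coverage-cap. This is a simplified sketch, not the GATK or Parabricks implementation. The function wgs_coverage and its read tuples are hypothetical, reads are assumed to align without gaps, and duplicates, overlapping mates, --count-unpaired and N bases in the reference are ignored. The returned fractions are named after GATK's PCT_EXC_MAPQ, PCT_EXC_BASEQ and PCT_EXC_CAPPED columns and are computed here over all aligned bases. In this sketch, the mean coverage is taken over capped depths.

```python
def wgs_coverage(reads, genome_length, min_baseq=20, min_mapq=20, coverage_cap=250):
    """Simplified, CollectWgsMetrics-style coverage summary (illustration only).

    reads: iterable of (start, mapq, base_qualities), where start is a 0-based
    reference position and each read aligns without gaps.
    Returns (mean_coverage, pct_exc_mapq, pct_exc_baseq, pct_exc_capped).
    """
    depth = [0] * genome_length
    total = exc_mapq = exc_baseq = 0
    for start, mapq, quals in reads:
        total += len(quals)
        if mapq < min_mapq:
            # The whole read fails the mapping-quality filter.
            exc_mapq += len(quals)
            continue
        for offset, qual in enumerate(quals):
            if qual < min_baseq:
                # This base fails the base-quality filter.
                exc_baseq += 1
            else:
                depth[start + offset] += 1
    # Depth above the cap is clipped to the cap; the excess is tallied separately.
    capped = [min(d, coverage_cap) for d in depth]
    exc_capped = sum(depth) - sum(capped)
    mean_coverage = sum(capped) / genome_length
    denom = total or 1  # avoid dividing by zero when there are no reads
    return mean_coverage, exc_mapq / denom, exc_baseq / denom, exc_capped / denom
```

With reads = [(0, 60, [30, 30, 10]), (1, 10, [30, 30]), (0, 60, [30, 30, 30])] on a 3-base genome, wgs_coverage(reads, 3, coverage_cap=1) returns (1.0, 0.25, 0.125, 0.25). The second read is dropped for MAPQ, one base fails the quality filter, and the cap removes two of the five remaining bases.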
COLLECTMULTIPLEMETRICS
Accelerated CollectMultipleMetrics functionality from GATK4
collectmultiplemetrics collects several classes of metrics from a BAM file (alignment summary, insert size, quality score distribution, GC bias and others), similar to CollectMultipleMetrics from GATK4, but much faster. The output metrics match those of GATK4 exactly.
QUICK START
$ pbrun collectmultiplemetrics --ref Ref.fa \
--bam wgs.bam \
--out-all-metrics metrics
COMPATIBLE GATK4 COMMAND
The command below is the GATK4 counterpart of the Parabricks command above. It produces exactly the same results as the Parabricks command.
gatk CollectMultipleMetrics --REFERENCE_SEQUENCE Ref.fa -I wgs.bam -O metrics \
--PROGRAM CollectAlignmentSummaryMetrics \
--PROGRAM CollectInsertSizeMetrics \
--PROGRAM QualityScoreDistribution \
--PROGRAM MeanQualityByCycle \
--PROGRAM CollectBaseDistributionByCycle \
--PROGRAM CollectGcBiasMetrics \
--PROGRAM CollectSequencingArtifactMetrics \
--PROGRAM CollectQualityYieldMetrics
OPTIONS
- --ref
(required) Path to the reference file.
- --bam
(required) Path to the input bam file.
- --out-all-metrics
Automatically run every analysis. The output file of each analysis will start with this prefix name. This is required if no individual metrics are specified.
- --out-alignment
Output file for alignment summary metrics. This is not required if --out-all-metrics is specified.
- --out-quality-score
Output file for quality score distribution metrics. This is not required if --out-all-metrics is specified.
- --out-insert-size
Output file for insert size metrics. This is not required if --out-all-metrics is specified.
- --out-mean-quality-by-cycle
Output file for mean quality by cycle metrics. This is not required if --out-all-metrics is specified.
- --out-base-distribution-by-cycle
Output file for base distribution by cycle metrics. This is not required if --out-all-metrics is specified.
- --out-gc-bias
Prefix name used to generate the detail and summary files for GC bias metrics. This is not required if --out-all-metrics is specified.
- --out-seq-artifact
Output file for sequencing artifact metrics. This is not required if --out-all-metrics is specified.
- --out-quality-yield
Output file for quality yield metrics. This is not required if --out-all-metrics is specified.
- --processor-threads
Number of threads for processing; performance keeps improving up to 20 threads (default: 8)
- --bam-decompressor-threads
Number of threads for BAM decompression (default: 3)
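The sections above state that the metrics from collectmultiplemetrics and bammetrics match GATK4. Assuming the files therefore use the GATK4/Picard text layout, each file contains a '## METRICS CLASS' line followed by a tab-separated header row and one or more data rows, optionally followed by a '## HISTOGRAM' section. A minimal Python reader for the metrics table is sketched below. read_metrics is a hypothetical helper, and it relies on that assumed layout.

```python
def read_metrics(lines):
    """Parse the '## METRICS CLASS' table of a Picard/GATK-style metrics file.

    Returns a list of dicts (one per data row) mapping column name to the raw
    string value; the optional '## HISTOGRAM' section is ignored.
    """
    rows, header, in_table = [], None, False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("## METRICS CLASS"):
            in_table = True
            continue
        if not in_table:
            continue
        if not line.strip() or line.startswith("#"):
            if header is not None:
                break  # a blank or comment line ends the metrics table
            continue
        fields = line.split("\t")
        if header is None:
            header = fields
        else:
            # Pad defensively in case a row has fewer fields than the header.
            fields += [""] * (len(header) - len(fields))
            rows.append(dict(zip(header, fields)))
    return rows
```

For example, with open("metrics.txt") as handle: rows = read_metrics(handle) gives a list of dicts whose values are strings, so convert numeric columns such as MEAN_COVERAGE with float() as needed.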