Run GATK best practices for RNAseq short variant discovery (SNPs + Indels)
The RNA GATK pipeline processes the input FASTQ files and writes its output in VCF format.

CLI
# The command below runs the RNA GATK pipeline.
$ pbrun rna_gatk --ref Ref/Homo_sapiens_assembly38.fasta \
--in-fq Data/sample_1.fq.gz Data/sample_2.fq.gz \
--read-files-command zcat \
--genome-lib-dir Ref/ \
--out-variants output.vcf \
--out-bam tumor.bam \
--output-dir output
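
Before launching the command above, it can help to confirm that the input paths it names actually exist. The following is a minimal sketch, not part of Parabricks: `check_inputs` is a hypothetical helper, and the paths are simply the ones used in the example command.

```shell
#!/bin/sh
# Hypothetical helper (not a pbrun feature): report any path that is missing.
# Returns 0 when every path exists, 1 otherwise.
check_inputs() {
  rc=0
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      echo "missing input: $f" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Paths taken from the example command above.
if check_inputs Ref/Homo_sapiens_assembly38.fasta \
                Data/sample_1.fq.gz Data/sample_2.fq.gz Ref/; then
  echo "inputs present; ready to run pbrun rna_gatk"
else
  echo "fix the paths above before running pbrun rna_gatk" >&2
fi
```

The check only tests for existence; it does not verify that the STAR index in the genome library directory is complete.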
- --ref
Path to the reference file (default: None)
- --in-fq
Path to the paired-end fastq files, followed by an optional read group in quotes (Example: "@RG\tID:foo\tLB:lib1\tPL:bar\tSM:sample\tPU:foo"). Files can be in fastq or fastq.gz format, or a Google Cloud Storage object. If no read group is provided, the pipeline adds one automatically. Example 1: --in-fq sampleX_1_1.fastq.gz sampleX_1_2.fastq.gz. Example 2: --in-fq sampleX_1_1.fastq.gz sampleX_1_2.fastq.gz "@RG\tID:foo\tLB:lib1\tPL:bar\tSM:sample\tPU:unit1" (default: None)
- --in-se-fq
Path to the single-end fastq file, followed by an optional read group in quotes (Example: "@RG\tID:foo\tLB:lib1\tPL:bar\tSM:sample\tPU:foo"). The file can be in fastq or fastq.gz format, or a Google Cloud Storage object. Either every set of inputs has a read group or none does; in the latter case the pipeline adds read groups automatically. This option can be repeated multiple times. Example 1: --in-se-fq sampleX_1.fastq.gz --in-se-fq sampleX_2.fastq.gz. Example 2: --in-se-fq sampleX_1.fastq.gz "@RG\tID:foo\tLB:lib1\tPL:bar\tSM:sample\tPU:unit1" --in-se-fq sampleX_2.fastq.gz "@RG\tID:foo2\tLB:lib1\tPL:bar\tSM:sample\tPU:unit2". For the same sample, read groups should share the sample name (SM) and differ in ID and PU (default: None)
- --genome-lib-dir
Path to a genome resource library directory. The indexing required to run STAR is assumed to have been completed by the user beforehand. (default: None)
- --knownSites
Path to a known indels file. Must be in vcf/vcf.gz format. This option can be used multiple times (default: None)
- --interval-file
Path to an interval file with possible formats: Picard-style (.interval_list or .picard), GATK-style (.list or .intervals), or BED file (.bed). This option can be used multiple times (default: None)
- --output-dir
Path to the directory that will contain all of the generated files (default: None)
- --out-recal-file
Path of report file after Base Quality Score Recalibration. (default: None)
- --read-files-command
Command line to execute for each of the input files. This command should generate FASTA or FASTQ text and send it to stdout. For example, zcat uncompresses .gz files and bzcat uncompresses .bz2 files. (default: None)
- --out-bam
Path of output BAM/CRAM file. (default: None)
- --out-variants
Path of vcf/g.vcf/gvcf.gz file after variant calling. (default: None)
- --num-cpu-threads
Number of CPU threads to traverse separate chromosomes in splitncigar (default: 4)
- --no-ignore-mark
Do not ignore marked reads in sorted output (default: None)
- --num-threads
Number of running worker threads per GPU (default: 4)
- --out-prefix
Prefix filename for output data (default: None)
- --read-group-sm
SM tag for read groups in this run (default: None)
- --read-group-lb
LB tag for read groups in this run (default: None)
- --read-group-pl
PL tag for read groups in this run (default: None)
- --read-group-id-prefix
Prefix for the ID and PU tags of read groups in this run. The prefix is used for every pair of fastq files in this run. Each ID and PU tag consists of this prefix plus an identifier that is unique to a pair of fastq files (default: None)
- --two-pass-mode
2-pass mapping mode. The string can be "None" for 1-pass mapping or "Basic" for basic 2-pass mapping with all 1st pass junctions inserted into the genome indices on the fly (default: Basic)
- --read-length
Input read length used to determine sjdbOverhang (default: None)
- --haplotypecaller-options
Pass supported haplotype caller options as one string. Currently supported original haplotypecaller options: -min-pruning <int>, -standard-min-confidence-threshold-for-calling <int>, -max-reads-per-alignment-start <int>, -min-dangling-branch-length <int>, -pcr-indel-model <NONE, HOSTILE, AGGRESSIVE, CONSERVATIVE>. e.g. --haplotypecaller-options="-min-pruning 4 -standard-min-confidence-threshold-for-calling 30" (default: None)
- --static-quantized-quals
Use static quantized quality scores to a given number of levels. Repeat this option multiple times for multiple bins (default: None)
- --gvcf
Generate variant calls in gVCF format (default: None)
- --batch
Given an input list of BAMs, run the variant calling of each BAM using one GPU, and process BAMs in parallel based on how many GPUs the system has (default: None)
- --disable-read-filter
Disable the read filters for bam entries. Currently supported read filters that can be disabled: MappingQualityAvailableReadFilter, MappingQualityReadFilter, NotSecondaryAlignmentReadFilter, WellformedReadFilter (default: None)
- --max-alternate-alleles
Maximum number of alternate alleles to genotype (default: None)
- -G, --annotation-group
Which groups of annotations to add to the output variant calls. Currently supported annotation groups: StandardAnnotation, StandardHCAnnotation, AS_StandardAnnotation (default: None)
- -GQB, --gvcf-gq-bands
Exclusive upper bounds for reference confidence GQ bands. Must be in the range [1, 100] and specified in increasing order (default: None)
- --rna
Run haplotypecaller optimized for RNA Data (default: None)
- --dont-use-soft-clipped-bases
Do not use soft-clipped bases for variant calling (default: None)
- --ploidy
Ploidy assumed for the bam file. Currently only haploid (ploidy 1) and diploid (ploidy 2) are supported (default: 2)
- -L, --interval
Interval within which to call the variants from the bam/cram file. All intervals will have a padding of 100 to get read records and overlapping intervals will be combined. Interval files should be passed using the --interval-file option. This option can be used multiple times. e.g. "-L chr1 -L chr2:10000 -L chr3:20000+ -L chr4:10000-20000" (default: None)
- -ip, --interval-padding
Amount of padding (in base pairs) to add to each interval you are including (default: None)
- --num-gpus NUM_GPUS
Number of GPUs to use for a run. GPUs 0..(NUM_GPUS-1) will be used. If you are using Flexera, please include --gpu-devices too.
- --gpu-devices GPU_DEVICES
Which GPU devices to use for a run. By default, all GPU devices will be used. To use specific GPU devices, enter a comma-separated list of GPU device numbers. Possible device numbers can be found by examining the output of the nvidia-smi command. For example, using --gpu-devices 0,1 would only use the first two GPUs.
- --tmp-dir TMP_DIR
Full path to the directory where temporary files will be stored.
- --no-seccomp-override
Do not override seccomp options for Docker.
- --with-petagene-dir WITH_PETAGENE_DIR
Full path to the PetaGene installation directory where bin/ and species/ folders are located.
- --keep-tmp
Do not delete the directory storing temporary files after completion.
- --license-file LICENSE_FILE
Path to license file license.bin if not in installation directory.
- --version
View compatible software versions.
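
The read group strings accepted by --in-fq and --in-se-fq can be assembled in a script rather than typed by hand. The sketch below assumes the fields are joined by a literal backslash-t sequence, as in the option examples. `make_rg` is a hypothetical helper, and the script prints the arguments instead of invoking pbrun.

```shell
#!/bin/sh
# Hypothetical helper: build a read group string with literal "\t" separators.
# Arguments: ID, LB, PL, SM, PU
make_rg() {
  printf '@RG\\tID:%s\\tLB:%s\\tPL:%s\\tSM:%s\\tPU:%s' "$1" "$2" "$3" "$4" "$5"
}

# Two single-end files from the same sample: same SM, different ID and PU.
rg1=$(make_rg foo lib1 bar sample unit1)
rg2=$(make_rg foo2 lib1 bar sample unit2)

# Print the arguments instead of running pbrun, so the quoting can be inspected.
printf '%s\n' "--in-se-fq sampleX_1.fastq.gz \"$rg1\"" \
              "--in-se-fq sampleX_2.fastq.gz \"$rg2\""
```

Keeping SM fixed while varying ID and PU follows the rule stated under --in-se-fq for multiple files from one sample.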