GERMLINE PIPELINE - NVIDIA Docs

Given one or more pairs of fastq files, you can run the germline variant pipeline workflow to generate variants, an aligned BAM file, and a BQSR recalibration report.

The germline pipeline resembles the GATK4 best practices pipeline. The inputs are BWA-indexed reference files, paired-end fastq files, and knownSites for BQSR calculation. The outputs of this pipeline are:

Aligned, coordinate-sorted, duplicate-marked BAM
BQSR report
Variants in vcf/g.vcf/g.vcf.gz format

QUICK START

Run a germline pipeline:

            $ pbrun germline --ref Ref/Homo_sapiens_assembly38.fasta \
                 --in-fq Data/sample_1.fq.gz Data/sample_2.fq.gz \
                 --knownSites Ref/Homo_sapiens_assembly38.known_indels.vcf.gz \
                 --out-bam output.bam \
                 --out-variants output.vcf \
                 --out-recal-file report.txt
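
Before starting a long run, it can help to confirm that every input is readable and to review the exact command. The sketch below is one way to do that under stated assumptions: germline_cmd is a hypothetical helper name, not part of Parabricks, and it only prints the quick-start command with the same flags and output names; it does not run pbrun.

```shell
# Hypothetical helper (not part of Parabricks): verify that every input is
# readable, then print the quick-start command so it can be reviewed first.
# Arguments: reference fasta, first fastq, second fastq, known-sites VCF.
germline_cmd() {
    local ref="$1" fq1="$2" fq2="$3" known="$4" f
    for f in "$ref" "$fq1" "$fq2" "$known"; do
        if [ ! -r "$f" ]; then
            echo "missing or unreadable input: $f" >&2
            return 1
        fi
    done
    # Same flags and output names as the quick-start command.
    printf '%s ' pbrun germline \
        --ref "$ref" \
        --in-fq "$fq1" "$fq2" \
        --knownSites "$known" \
        --out-bam output.bam \
        --out-variants output.vcf \
        --out-recal-file report.txt
    printf '\n'
}
```

Calling germline_cmd with the four quick-start paths prints the command on one line, or reports the first missing file and returns a non-zero status.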

DNAnexus

Note

You’ll first need to install the application to your DNAnexus project. See our DNAnexus installation guide.

            $ dx run pbgermline

COMPATIBLE CPU-BASED BWA-MEM AND GATK4 COMMANDS

The commands below are the bwa-0.7.12 and GATK4 counterparts of the Parabricks command above. They produce exactly the same results as the Parabricks command. See the Output Comparison page for how to compare the results.

# Run bwa-mem and pipe the output to SortSam to create a coordinate-sorted BAM
$ bwa mem -t 32 -K 10000000 -R '@RG\tID:sample_rg1\tLB:lib1\tPL:bar\tSM:sample\tPU:sample_rg1' \
    Ref/Homo_sapiens_assembly38.fasta S1_1.fastq.gz S1_2.fastq.gz | gatk \
    SortSam --java-options -Xmx30g --MAX_RECORDS_IN_RAM=5000000 -I=/dev/stdin \
    -O=cpu.bam --SORT_ORDER=coordinate --TMP_DIR=/raid/myrun

# Mark duplicates
$ gatk MarkDuplicates --java-options -Xmx30g -I=cpu.bam -O=mark_dups_cpu.bam \
    -M=metrics.txt --TMP_DIR=/raid/myrun

# Generate the BQSR report
$ gatk BaseRecalibrator --java-options -Xmx30g --input mark_dups_cpu.bam --output \
    recal_cpu.txt --known-sites Ref/Homo_sapiens_assembly38.known_indels.vcf.gz \
    --reference Ref/Homo_sapiens_assembly38.fasta

# Run the ApplyBQSR step, using the report written by BaseRecalibrator above
$ gatk ApplyBQSR --java-options -Xmx30g -R Ref/Homo_sapiens_assembly38.fasta \
    -I=mark_dups_cpu.bam --bqsr-recal-file=recal_cpu.txt -O=cpu_nodups_BQSR.bam

# Run HaplotypeCaller
$ gatk HaplotypeCaller --java-options -Xmx30g --input cpu_nodups_BQSR.bam --output \
    result_cpu.vcf --reference Ref/Homo_sapiens_assembly38.fasta \
    --native-pair-hmm-threads 16
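
As a quick check of the equivalence described above, the variant records of two uncompressed VCF files can be compared directly. This is a minimal sketch and not necessarily the method described on the Output Comparison page: same_vcf_records is a hypothetical helper name, and it assumes that header lines (those starting with '#') may legitimately differ because they can record tool names and command lines.

```shell
# Hypothetical helper: succeed only if two uncompressed VCF files contain
# identical variant records in the same order. Header lines (starting
# with '#') are skipped before comparing.
same_vcf_records() {
    local a b status
    a=$(mktemp) || return 2
    b=$(mktemp) || return 2
    grep -v '^#' "$1" > "$a"
    grep -v '^#' "$2" > "$b"
    # cmp -s is silent and returns 0 only when the files are byte-identical.
    cmp -s "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return "$status"
}
```

For example, same_vcf_records output.vcf result_cpu.vcf && echo "records match" prints the message only when the two files carry the same records.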

OPTIONS

--ref
--in-fq
--in-se-fq
--out-bam
--out-variants
--in-mba-file
--out-recal-file
--knownSites
--tmp-dir
--num-gpus
--no-markdups
--out-duplicate-metrics
--markdups-assume-sortorder-queryname
--optical-duplicate-pixel-distance
--mba
--read-group-sm
--read-group-lb
--read-group-pl
--read-group-id-prefix
--static-quantized-quals
--batch
--disable-read-filter
--ploidy
--interval-file
--interval
--interval-padding
--gvcf
--bwa-options
--haplotypecaller-options
--max-alternate-alleles
--annotation-group
--gvcf-gq-bands
--gpu-devices

DNANEXUS

Option 1: Use the DNAnexus command line dialog to run the workflow:

            $ dx run pbgermline

Option 2: Specify your inputs via command line options:

            $ dx run pbgermline -y \
                    --allow-ssh \
                    --destination=MY_OUTPUT_DIR \
                    -iref=MY_REF_FILE_ID \
                    -iknown_sites=MY_KNOWN_INDELS_FILE_ID \
                    -iin_fq=MY_FIRST_FQ_FILE_ID \
                    -iin_fq=MY_SECOND_FQ_FILE_ID