FASTQ AND BAM PROCESSING OVERVIEW

NVIDIA Clara Parabricks Pipelines provides tools that process FASTQ files and refine BAM files.

Here are the articles in this section:

FQ2BAM

Generates BAM output from one or more pairs of FASTQ files. Optionally generates a BQSR report.
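The quick start in this section passes a single pair of FASTQ files. As a rough sketch of the "one or more pairs" case, the helper below prints (but does not run) a command line with one --in-fq group per pair. It assumes that --in-fq may be repeated once per pair, which this page does not state. The function name fq2bam_multi_cmd and the lane file names are hypothetical.

```shell
#!/bin/sh
# Sketch only: print a fq2bam command line with one --in-fq group per FASTQ pair.
# Assumption: --in-fq can be repeated, once per pair (not stated on this page).
# fq2bam_multi_cmd and the lane file names are hypothetical.

# Usage: fq2bam_multi_cmd OUT_BAM R1 R2 [R1 R2 ...]
fq2bam_multi_cmd() {
    out_bam=$1
    shift
    # FASTQ files must arrive in pairs.
    if [ "$#" -eq 0 ] || [ $(( $# % 2 )) -ne 0 ]; then
        echo "error: expected an even, non-zero number of FASTQ files" >&2
        return 1
    fi
    cmd="pbrun fq2bam --ref Ref/Homo_sapiens_assembly38.fasta"
    while [ "$#" -gt 0 ]; do
        cmd="$cmd --in-fq $1 $2"
        shift 2
    done
    echo "$cmd --out-bam $out_bam"
}

# Print the command for two lanes of the same sample.
fq2bam_multi_cmd sample.bam \
    Data/lane1_1.fq.gz Data/lane1_2.fq.gz \
    Data/lane2_1.fq.gz Data/lane2_2.fq.gz
```

The helper only builds a string, so paths containing spaces are not handled. Review the printed command before running it.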

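fq2bam aligns the reads (the equivalent of bwa mem), sorts the output by coordinate, marks duplicates, and optionally generates a BQSR report. Duplicate marking can be turned off with --no-markdups. The BQSR step runs only if both the --knownSites and --out-recal-file options are provided.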
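One way to keep the two BQSR options in step is to add them to the command line together or not at all. The sketch below assembles the argument string that way and prints the resulting command without running it. KNOWN_SITES, RECAL_OUT, SKIP_MARKDUPS, and fq2bam_args are hypothetical names chosen for illustration. The flags themselves are taken from the options list in this section.

```shell
#!/bin/sh
# Sketch only: build fq2bam arguments so that BQSR is requested only when
# both a known-sites file and a recalibration output path are supplied.
# KNOWN_SITES, RECAL_OUT, SKIP_MARKDUPS and fq2bam_args are hypothetical names.

fq2bam_args() {
    args="--ref Ref/Homo_sapiens_assembly38.fasta"
    args="$args --in-fq Data/sample_1.fq.gz Data/sample_2.fq.gz"
    args="$args --out-bam mark_dups_gpu.bam"
    # BQSR needs both options; pass neither if one is missing.
    if [ -n "${KNOWN_SITES:-}" ] && [ -n "${RECAL_OUT:-}" ]; then
        args="$args --knownSites $KNOWN_SITES --out-recal-file $RECAL_OUT"
    fi
    # Duplicates are marked unless --no-markdups is passed.
    if [ "${SKIP_MARKDUPS:-0}" = "1" ]; then
        args="$args --no-markdups"
    fi
    echo "$args"
}

# Print the command instead of running it.
echo "pbrun fq2bam $(fq2bam_args)"
```

Setting only one of KNOWN_SITES or RECAL_OUT leaves both flags off the command line.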

QUICK START

$ pbrun fq2bam --ref Ref/Homo_sapiens_assembly38.fasta \
    --in-fq Data/sample_1.fq.gz Data/sample_2.fq.gz \
    --knownSites Ref/Homo_sapiens_assembly38.known_indels.vcf.gz \
    --out-bam mark_dups_gpu.bam \
    --out-recal-file recal_gpu.txt \
    --tmp-dir /raid/myrun

COMPATIBLE CPU BASED BWA-MEM, GATK4 COMMANDS

The commands below are the bwa-0.7.15 and GATK4 counterparts of the Parabricks command above. They produce exactly the same results as the Parabricks command. See the Output Comparison page for how to compare the results.
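As a minimal illustration of such a check for the text-based BQSR reports, the helper below reports whether two files match when whitespace is ignored. This is a sketch under the assumption that a whitespace-insensitive diff is adequate for these reports. The Output Comparison page remains the reference procedure, and BAM files need a dedicated comparison tool. The name reports_match is hypothetical.

```shell
#!/bin/sh
# Sketch only: report whether two text outputs (for example BQSR reports) match.
# Assumption: a whitespace-insensitive diff is an adequate check for these files.
# reports_match is a hypothetical helper name.

reports_match() {
    # diff -w ignores whitespace differences; a missing file also counts as a mismatch.
    if diff -w "$1" "$2" >/dev/null 2>&1; then
        echo "MATCH: $1 $2"
    else
        echo "DIFFER: $1 $2"
        return 1
    fi
}

# Example, once both reports exist:
# reports_match recal_gpu.txt recal_cpu.txt
```

The non-zero return status on a mismatch lets the helper be used directly in a script condition.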

# Run bwa-mem and pipe output to create sorted bam
$ bwa mem -t 32 -K 10000000 \
    -R '@RG\tID:sample_rg1\tLB:lib1\tPL:bar\tSM:sample\tPU:sample_rg1' \
    Ref/Homo_sapiens_assembly38.fasta Data/sample_1.fq.gz Data/sample_2.fq.gz | \
    gatk SortSam --java-options -Xmx30g --MAX_RECORDS_IN_RAM=5000000 -I=/dev/stdin \
    -O=cpu.bam --SORT_ORDER=coordinate --TMP_DIR=/raid/myrun

# Mark Duplicates
$ gatk MarkDuplicates --java-options -Xmx30g -I=cpu.bam -O=mark_dups_cpu.bam \
    -M=metrics.txt --TMP_DIR=/raid/myrun

# Generate BQSR Report
$ gatk BaseRecalibrator --java-options -Xmx30g --input mark_dups_cpu.bam \
    --output recal_cpu.txt \
    --known-sites Ref/Homo_sapiens_assembly38.known_indels.vcf.gz \
    --reference Ref/Homo_sapiens_assembly38.fasta

OPTIONS

--ref
--in-fq
--in-se-fq
--out-bam
--out-recal-file
--out-duplicate-metrics
--mba
--in-mba-file
--knownSites
--interval-file
--interval
--interval-padding
--no-markdups
--bwa-options
--markdups-assume-sortorder-queryname
--optical-duplicate-pixel-distance
--read-group-sm
--read-group-lb
--read-group-pl
--read-group-id-prefix
--tmp-dir
--num-gpus
--gpu-devices

BQSR

bqsr performs Base Quality Score Recalibration (BQSR) as a standalone step, generating a recalibration report from an existing BAM file.

QUICK START

$ pbrun bqsr --ref Ref/Homo_sapiens_assembly38.fasta \
    --in-bam mark_dups_gpu.bam \
    --knownSites Ref/Homo_sapiens_assembly38.known_indels.vcf.gz \
    --out-recal-file recal_gpu.txt

COMPATIBLE GATK4 COMMAND

The command below is the GATK4 counterpart of the Parabricks command above. It produces exactly the same results as the Parabricks command.

$ gatk BaseRecalibrator --java-options -Xmx30g --input mark_dups_gpu.bam \
    --output recal_cpu.txt \
    --known-sites Ref/Homo_sapiens_assembly38.known_indels.vcf.gz \
    --reference Ref/Homo_sapiens_assembly38.fasta

OPTIONS

--ref
--in-bam
--knownSites
--interval-file
--interval
--interval-padding
--out-recal-file
--num-gpus
--gpu-devices

APPLYBQSR

applybqsr updates the base quality scores in a BAM file using a BQSR report.
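Because applybqsr consumes a report such as the one written by bqsr, the two tools can be run back to back. The sketch below chains them behind a dry-run switch, so by default it only prints the two commands. DRY_RUN, run_step, and recalibrate are hypothetical helpers and are not part of Parabricks. The report file name derived from the input BAM is likewise an illustrative choice.

```shell
#!/bin/sh
# Sketch only: run bqsr, then applybqsr on the report it wrote.
# DRY_RUN, run_step and recalibrate are hypothetical helpers, not part of Parabricks.
DRY_RUN=${DRY_RUN:-1}

run_step() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"      # print the command instead of running it
    else
        "$@"
    fi
}

# Usage: recalibrate IN_BAM OUT_BAM
recalibrate() {
    in_bam=$1
    out_bam=$2
    recal=${in_bam%.bam}.recal.txt   # report name derived from the input BAM
    run_step pbrun bqsr --ref Ref/Homo_sapiens_assembly38.fasta \
        --in-bam "$in_bam" \
        --knownSites Ref/Homo_sapiens_assembly38.known_indels.vcf.gz \
        --out-recal-file "$recal" || return 1
    run_step pbrun applybqsr --ref Ref/Homo_sapiens_assembly38.fasta \
        --in-bam "$in_bam" \
        --in-recal-file "$recal" \
        --out-bam "$out_bam"
}

recalibrate mark_dups_gpu.bam S1_updated.bam
```

Setting DRY_RUN=0 executes the commands, and the second step is skipped if the first one fails.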

QUICK START

$ pbrun applybqsr --ref Ref/Homo_sapiens_assembly38.fasta \
    --in-bam mark_dups_gpu.bam \
    --in-recal-file recal_gpu.txt \
    --out-bam S1_updated.bam

COMPATIBLE GATK4 COMMAND

The command below is the GATK4 counterpart of the Parabricks command above. It produces exactly the same results as the Parabricks command.

$ gatk ApplyBQSR --java-options -Xmx30g -R Ref/Homo_sapiens_assembly38.fasta \
    -I=mark_dups_gpu.bam --bqsr-recal-file=recal_cpu.txt -O=S1_updated.bam

OPTIONS

--ref
--in-bam
--in-recal-file
--out-bam
--interval-file
--interval
--interval-padding
--num-threads
--num-gpus
--gpu-devices