fq2bam

Generate BAM/CRAM output given one or more pairs of FASTQ files. Can also optionally generate a BQSR report.

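fq2bam aligns reads with GPU-accelerated BWA-MEM, sorts the alignments by coordinate, marks duplicates, and optionally runs Base Quality Score Recalibration (BQSR). Duplicate marking can be turned off with --no-markdups. The BQSR step runs only if both the --knownSites input option and the --out-recal-file output option are provided.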
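The BQSR condition above can be sketched as a small shell helper that emits the two BQSR flags together or not at all. The function name `bqsr_args` and its arguments are hypothetical and exist only for illustration; only the flag names come from this page.

```shell
#!/bin/sh
# Hypothetical helper (not part of Parabricks): prints the BQSR-related
# arguments for pbrun fq2bam. BQSR runs only when BOTH --knownSites and
# --out-recal-file are supplied, so the pair is printed together or omitted.
bqsr_args() {
    known_sites="$1"
    recal_file="$2"
    if [ -n "$known_sites" ] && [ -n "$recal_file" ]; then
        printf '%s\n' "--knownSites $known_sites --out-recal-file $recal_file"
    fi
}

# Both values present: the BQSR flags are printed.
bqsr_args Ref/Homo_sapiens_assembly38.known_indels.vcf.gz recal_gpu.txt

# Recalibration file missing: nothing is printed, so BQSR would be skipped.
bqsr_args Ref/Homo_sapiens_assembly38.known_indels.vcf.gz ""
```

A wrapper script could append the output of such a helper to the rest of the pbrun fq2bam command line, which avoids passing one flag without the other.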

Quick Start

$ pbrun fq2bam \
    --ref Ref/Homo_sapiens_assembly38.fasta \
    --in-fq Data/sample_1.fq.gz Data/sample_2.fq.gz \
    --knownSites Ref/Homo_sapiens_assembly38.known_indels.vcf.gz \
    --out-bam mark_dups_gpu.bam \
    --out-recal-file recal_gpu.txt \
    --tmp-dir /raid/myrun

Compatible CPU-based BWA-MEM, GATK4 Commands

The commands below are the bwa-0.7.15 and GATK4 counterparts of the Parabricks command above. Their output is identical to the output of the Parabricks command. See the Output Comparison page for how to compare the results.

# Run bwa-mem and pipe output to create sorted BAM
$ bwa mem -t 32 -K 10000000 -R '@RG\tID:sample_rg1\tLB:lib1\tPL:bar\tSM:sample\tPU:sample_rg1' \
Ref/Homo_sapiens_assembly38.fasta Data/sample_1.fq.gz Data/sample_2.fq.gz | gatk \
SortSam --java-options -Xmx30g --MAX_RECORDS_IN_RAM=5000000 -I=/dev/stdin \
-O=cpu.bam --SORT_ORDER=coordinate --TMP_DIR=/raid/myrun

# Mark Duplicates
$ gatk MarkDuplicates --java-options -Xmx30g -I=cpu.bam -O=mark_dups_cpu.bam \
-M=metrics.txt --TMP_DIR=/raid/myrun

# Generate BQSR Report
$ gatk BaseRecalibrator --java-options -Xmx30g --input mark_dups_cpu.bam --output \
recal_cpu.txt --known-sites Ref/Homo_sapiens_assembly38.known_indels.vcf.gz \
--reference Ref/Homo_sapiens_assembly38.fasta

fq2bam Reference

Run GPU-accelerated BWA-MEM, coordinate sorting, duplicate marking, and Base Quality Score Recalibration to convert FASTQ to BAM/CRAM.

Input/Output file options:

--ref REF
--in-fq [IN_FQ [IN_FQ ...]]
--in-se-fq [IN_SE_FQ [IN_SE_FQ ...]]
--in-fq-list IN_FQ_LIST
--knownSites KNOWNSITES
--interval-file INTERVAL_FILE
--out-recal-file OUT_RECAL_FILE
--out-bam OUT_BAM
--out-duplicate-metrics OUT_DUPLICATE_METRICS
--out-qc-metrics-dir OUT_QC_METRICS_DIR

Tool options:

-L INTERVAL, --interval INTERVAL
--bwa-options BWA_OPTIONS
--no-warnings
--no-markdups
--fix-mate
--markdups-assume-sortorder-queryname
--markdups-picard-version-2182
--optical-duplicate-pixel-distance OPTICAL_DUPLICATE_PIXEL_DISTANCE
--read-group-sm READ_GROUP_SM
--read-group-lb READ_GROUP_LB
--read-group-pl READ_GROUP_PL
--read-group-id-prefix READ_GROUP_ID_PREFIX
-ip INTERVAL_PADDING, --interval-padding INTERVAL_PADDING

Common options:

--logfile LOGFILE
--tmp-dir TMP_DIR
--with-petagene-dir WITH_PETAGENE_DIR
--keep-tmp
--license-file LICENSE_FILE
--no-seccomp-override
--version

GPU options:

--num-gpus NUM_GPUS
--gpu-devices GPU_DEVICES

Note

The --in-fq option takes the names of two FASTQ files, optionally followed by a quoted read group. The FASTQ filenames must not start with a hyphen.
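The note above can be illustrated with a minimal pre-flight sketch. The function `check_fastq_names` is hypothetical, and it checks each argument exactly as it would be passed on the command line. The read group string is copied from the bwa mem command on this page; whether fq2bam expects exactly this form is an assumption. The script prints the pbrun command instead of running it, so it works on a machine without Parabricks installed.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of Parabricks): returns non-zero
# if any FASTQ argument starts with a hyphen, since such a name could be
# mistaken for an option.
check_fastq_names() {
    for fq in "$@"; do
        case "$fq" in
            -*) echo "invalid FASTQ name: $fq" >&2; return 1 ;;
        esac
    done
    return 0
}

FQ1=Data/sample_1.fq.gz
FQ2=Data/sample_2.fq.gz
# Assumption: same read group string as in the bwa mem command on this page.
RG='@RG\tID:sample_rg1\tLB:lib1\tPL:bar\tSM:sample\tPU:sample_rg1'

if check_fastq_names "$FQ1" "$FQ2"; then
    # Print the command rather than execute it; the read group follows the
    # two FASTQ files and is wrapped in quotes.
    printf '%s\n' "pbrun fq2bam --ref Ref/Homo_sapiens_assembly38.fasta --in-fq $FQ1 $FQ2 \"$RG\" --out-bam mark_dups_gpu.bam"
fi
```

Running the check before launching a long alignment job surfaces a badly named input immediately rather than as an argument-parsing error.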