somatic (Somatic Variant Caller)

Run a somatic variant workflow.

The somatic tool processes tumor FASTQ files, and optionally normal FASTQ files and knownSites files, and runs either a tumor-only or a tumor-normal analysis. The output is a VCF file.

Internally, the somatic tool runs several other Parabricks tools, which simplifies your workflow.

Quick Start

# The command line below will run tumor-only analysis.
# This command assumes all the inputs are in INPUT_DIR and all the outputs go to OUTPUT_DIR.
$ docker run --rm --gpus all --volume INPUT_DIR:/workdir --volume OUTPUT_DIR:/outputdir \
    --workdir /workdir \
    nvcr.io/nvidia/clara/clara-parabricks:4.1.0-1 \
    pbrun somatic \
    --ref /workdir/${REFERENCE_FILE} \
    --in-tumor-fq /workdir/${INPUT_FASTQ_1} /workdir/${INPUT_FASTQ_2} \
    --bwa-options="-Y" \
    --out-vcf /outputdir/${OUTPUT_VCF} \
    --out-tumor-bam /outputdir/${OUTPUT_BAM}

# The command line below will run tumor-normal analysis.
# This command assumes all the inputs are in INPUT_DIR and all the outputs go to OUTPUT_DIR.
$ docker run --rm --gpus all --volume INPUT_DIR:/workdir --volume OUTPUT_DIR:/outputdir \
    --workdir /workdir \
    nvcr.io/nvidia/clara/clara-parabricks:4.1.0-1 \
    pbrun somatic \
    --ref /workdir/${REFERENCE_FILE} \
    --knownSites /workdir/${KNOWN_SITES_FILE} \
    --in-tumor-fq /workdir/${INPUT_TUMOR_FASTQ_1} /workdir/${INPUT_TUMOR_FASTQ_2} "@RG\tID:sm_tumor_rg1\tLB:lib1\tPL:bar\tSM:sm_tumor\tPU:sm_tumor_rg1" \
    --bwa-options="-Y" \
    --out-vcf /outputdir/${OUTPUT_VCF} \
    --out-tumor-bam /outputdir/${OUTPUT_TUMOR_BAM} \
    --out-tumor-recal-file /outputdir/${OUTPUT_RECAL_FILE} \
    --in-normal-fq /workdir/${INPUT_NORMAL_FASTQ_1} /workdir/${INPUT_NORMAL_FASTQ_2} "@RG\tID:sm_normal_rg1\tLB:lib1\tPL:bar\tSM:sm_normal\tPU:sm_normal_rg1" \
    --out-normal-bam /outputdir/${OUTPUT_NORMAL_BAM}
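
Because both commands above assume that every input is already in INPUT_DIR, it can help to confirm that before starting the container. The following is a minimal sketch of such a pre-flight check. The function name `check_inputs` is hypothetical and not part of Parabricks; it only tests that each named file exists and is non-empty.

```shell
#!/usr/bin/env bash
# check_inputs is a hypothetical helper, not a Parabricks command.
# Usage: check_inputs DIR FILE [FILE ...]
# Returns 0 if every FILE exists in DIR and is non-empty, 1 otherwise.
check_inputs() {
    dir=$1
    shift
    missing=0
    for f in "$@"; do
        if [ ! -s "${dir}/${f}" ]; then
            # Report each problem file on stderr and keep checking the rest.
            echo "missing or empty: ${dir}/${f}" >&2
            missing=1
        fi
    done
    return "${missing}"
}
```

For the tumor-only command, a call might look like `check_inputs INPUT_DIR "${REFERENCE_FILE}" "${INPUT_FASTQ_1}" "${INPUT_FASTQ_2}" && docker run ...`, so that the container only starts when all inputs are present.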

Compatible CPU Command

# The commands below will run tumor-normal analysis.
#
# Run bwa mem on the tumor FASTQ files then sort the BAM by coordinates.
$ bwa mem \
    -t 32 \
    -K 10000000 \
    -Y \
    -R '@RG\tID:tumor_rg1\tLB:lib1\tPL:bar\tSM:tumor\tPU:tumor_rg1' \
    ${REFERENCE_FILE} ${TUMOR_FASTQ_1} ${TUMOR_FASTQ_2} | \
  gatk SortSam \
    --java-options -Xmx30g \
    --MAX_RECORDS_IN_RAM 5000000 \
    -I /dev/stdin \
    -O tumor_cpu.bam \
    --SORT_ORDER coordinate

# Mark duplicates.
$ gatk MarkDuplicates \
    --java-options -Xmx30g \
    -I tumor_cpu.bam \
    -O tumor_mark_dups_cpu.bam \
    -M tumor_metrics.txt

# Generate a BQSR report.
$ gatk BaseRecalibrator \
    --java-options -Xmx30g \
    --input tumor_mark_dups_cpu.bam \
    --output ${OUTPUT_TUMOR_RECAL_FILE} \
    --known-sites ${KNOWN_SITES_FILE} \
    --reference ${REFERENCE_FILE}

# Apply the BQSR report.
$ gatk ApplyBQSR \
    --java-options -Xmx30g \
    -R ${REFERENCE_FILE} \
    -I tumor_mark_dups_cpu.bam \
    --bqsr-recal-file ${OUTPUT_TUMOR_RECAL_FILE} \
    -O ${OUTPUT_TUMOR_BAM}

# Now repeat the steps above with the normal FASTQ data.
$ bwa mem \
    -t 32 \
    -K 10000000 \
    -Y \
    -R '@RG\tID:normal_rg1\tLB:lib1\tPL:bar\tSM:normal\tPU:normal_rg1' \
    ${REFERENCE_FILE} ${NORMAL_FASTQ_1} ${NORMAL_FASTQ_2} | \
  gatk SortSam \
    --java-options -Xmx30g \
    --MAX_RECORDS_IN_RAM 5000000 \
    -I /dev/stdin \
    -O normal_cpu.bam \
    --SORT_ORDER coordinate

# Mark duplicates.
$ gatk MarkDuplicates \
    --java-options -Xmx30g \
    -I normal_cpu.bam \
    -O normal_mark_dups_cpu.bam \
    -M normal_metrics.txt

# Generate a BQSR report.
$ gatk BaseRecalibrator \
    --java-options -Xmx30g \
    --input normal_mark_dups_cpu.bam \
    --output ${OUTPUT_NORMAL_RECAL_FILE} \
    --known-sites ${KNOWN_SITES_FILE} \
    --reference ${REFERENCE_FILE}

# Apply the BQSR report.
$ gatk ApplyBQSR \
    --java-options -Xmx30g \
    -R ${REFERENCE_FILE} \
    -I normal_mark_dups_cpu.bam \
    --bqsr-recal-file ${OUTPUT_NORMAL_RECAL_FILE} \
    -O ${OUTPUT_NORMAL_BAM}

# Finally, run Mutect2 on the normal and tumor data.
$ gatk Mutect2 \
    -R ${REFERENCE_FILE} \
    --input ${OUTPUT_TUMOR_BAM} \
    --tumor-sample tumor \
    --input ${OUTPUT_NORMAL_BAM} \
    --normal-sample normal \
    --output ${OUTPUT_VCF}
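
Mutect2 matches the names given to --tumor-sample and --normal-sample against the SM field of the read groups in each BAM header, so it is worth confirming those fields before the final step. The sketch below assumes the header is supplied as text on standard input (for example from `samtools view -H`). The function name `rg_sample_names` is hypothetical.

```shell
#!/usr/bin/env bash
# rg_sample_names is a hypothetical helper, not part of GATK or Parabricks.
# It reads SAM header text on stdin and prints each distinct SM value
# found on the @RG lines, one per line.
rg_sample_names() {
    awk -F'\t' '
        /^@RG/ {
            # Scan the tab-separated fields of the read group line for SM:.
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SM:/) print substr($i, 4)
            }
        }' | sort -u
}
```

A typical use would be `samtools view -H ${OUTPUT_TUMOR_BAM} | rg_sample_names`, which should print the same name that is passed to --tumor-sample, and likewise for the normal BAM and --normal-sample.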

somatic Reference

Run the tumor-normal somatic pipeline from FASTQ to VCF.

Input/Output file options:

--ref REF
--in-tumor-fq [IN_TUMOR_FQ [IN_TUMOR_FQ ...]]
--in-se-tumor-fq [IN_SE_TUMOR_FQ [IN_SE_TUMOR_FQ ...]]
--in-normal-fq [IN_NORMAL_FQ [IN_NORMAL_FQ ...]]
--in-se-normal-fq [IN_SE_NORMAL_FQ [IN_SE_NORMAL_FQ ...]]
--knownSites KNOWNSITES
--interval-file INTERVAL_FILE
--out-vcf OUT_VCF
--out-tumor-bam OUT_TUMOR_BAM
--out-normal-bam OUT_NORMAL_BAM
--out-tumor-recal-file OUT_TUMOR_RECAL_FILE
--out-normal-recal-file OUT_NORMAL_RECAL_FILE

Tool Options:

-L INTERVAL, --interval INTERVAL
--bwa-options BWA_OPTIONS
--no-warnings
--gpuwrite
--gpusort
--low-memory
--filter-flag FILTER_FLAG
--skip-multiple-hits
--min-read-length MIN_READ_LENGTH
--align-only
--no-markdups
--fix-mate
--markdups-assume-sortorder-queryname
--markdups-picard-version-2182
--optical-duplicate-pixel-distance OPTICAL_DUPLICATE_PIXEL_DISTANCE
-ip INTERVAL_PADDING, --interval-padding INTERVAL_PADDING
--ploidy PLOIDY
--max-mnp-distance MAX_MNP_DISTANCE
--mutectcaller-options MUTECTCALLER_OPTIONS
--tumor-read-group-sm TUMOR_READ_GROUP_SM
--tumor-read-group-lb TUMOR_READ_GROUP_LB
--tumor-read-group-pl TUMOR_READ_GROUP_PL
--tumor-read-group-id-prefix TUMOR_READ_GROUP_ID_PREFIX
--normal-read-group-sm NORMAL_READ_GROUP_SM
--normal-read-group-lb NORMAL_READ_GROUP_LB
--normal-read-group-pl NORMAL_READ_GROUP_PL
--normal-read-group-id-prefix NORMAL_READ_GROUP_ID_PREFIX

Common options:

--logfile LOGFILE
--tmp-dir TMP_DIR
--with-petagene-dir WITH_PETAGENE_DIR
--keep-tmp
--no-seccomp-override
--version

GPU options:

--num-gpus NUM_GPUS
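
As an illustration of how the options listed above combine with the Quick Start command, the sketch below assembles a tumor-only `pbrun somatic` invocation with --num-gpus and --logfile added. It only prints the command rather than running it. The function name `build_somatic_cmd` and the log path `/outputdir/somatic.log` are assumptions for illustration; the variables are the same placeholders used in the Quick Start section.

```shell
#!/usr/bin/env bash
# build_somatic_cmd is a hypothetical helper that prints (does not run)
# a tumor-only pbrun somatic command line.
# Usage: build_somatic_cmd NUM_GPUS
build_somatic_cmd() {
    num_gpus=$1
    # printf repeats the '%s ' format once per argument, so each word of
    # the command is followed by a single space.
    printf '%s ' pbrun somatic \
        --ref "/workdir/${REFERENCE_FILE}" \
        --in-tumor-fq "/workdir/${INPUT_FASTQ_1}" "/workdir/${INPUT_FASTQ_2}" \
        --out-vcf "/outputdir/${OUTPUT_VCF}" \
        --out-tumor-bam "/outputdir/${OUTPUT_BAM}" \
        --num-gpus "${num_gpus}" \
        --logfile /outputdir/somatic.log
    printf '\n'
}
```

The printed line can be reviewed and then appended after the image name in the `docker run` command from the Quick Start section.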