mutectcaller

mutectcaller is an accelerated version of Mutect2, the GATK somatic variant caller. It takes aligned BAM files, such as those produced by the FQ2BAM tool, and outputs a VCF file. The input can be either a single BAM (“tumor-only”) or a pair of BAMs (“tumor-normal”), where the normal sample provides a baseline against which somatic variants are called.

The figure below shows the high-level functionality of mutectcaller. All dotted boxes indicate optional data, with some constraints.

The names of the tumor sample (for the --tumor-name option) and the normal sample (for the --normal-name option) can be extracted from the headers of their respective BAM files with this command:

$ samtools view NA12878.bam -H | grep '@RG'
@RG ID:HJYFJ.4  SM:NA12878  LB:Pond-492093  PL:illumina PU:HJYFJCCXX160204.4.GCCGCAAC   CN:BI   DT:2016-02-04T00:00:00-0500

The sample name is the value after "SM:" (NA12878, in this example).

If there are multiple read group (@RG) lines and all of them have the same sample name, you may safely use the common sample name. If there are multiple read group lines with different sample names, choose one sample name as the input. All reads with that sample name will be processed by mutectcaller and all other reads will be ignored. Currently, only one sample name per BAM file is supported.
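
To see which sample names a header contains before choosing one, the SM values can be collected from the @RG lines. The following is a minimal sketch: the function name list_sample_names is hypothetical, and the header text is an inline example rather than output from a real BAM file.

```shell
# Print the unique SM: values found in @RG header lines read from stdin.
list_sample_names() {
    grep '^@RG' \
        | tr '\t' '\n' \
        | grep '^SM:' \
        | cut -d: -f2- \
        | sort -u
}

# Example header with two read groups that share one sample name.
# Header fields are tab-separated, as in a real SAM/BAM header.
printf '@HD\tVN:1.6\tSO:coordinate\n@RG\tID:rg1\tSM:NA12878\tPL:illumina\n@RG\tID:rg2\tSM:NA12878\tPL:illumina\n' \
    | list_sample_names

# With a real file (assumes samtools is installed):
#   samtools view -H NA12878.bam | list_sample_names
```

A single line of output means every read group shares one sample name. More than one line means a sample name has to be chosen, and no output means read group information has to be added first.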

If there are no read group lines in the BAM header, or there is no sample name in the read group line, you will need to add read group information to the BAM file. This may be done by running this command:

$ samtools addreplacerg \
    -r "@RG\tID:sample_rg1\tLB:lib1\tPL:bar\tSM:sample_sm\tPU:sample_rg1" \
    original_file.bam \
    -o updated_file.bam \
    -O BAM

This will update the sample name of all reads in this BAM file to "sample_sm", and you can pass "sample_sm" as the sample name of this BAM file. Make sure you use the updated_file.bam as input to mutectcaller.

Quick Start

# This command assumes all the inputs are in INPUT_DIR and all the outputs go to OUTPUT_DIR.
docker run --rm --gpus all --volume INPUT_DIR:/workdir --volume OUTPUT_DIR:/outputdir \
    --workdir /workdir \
    nvcr.io/nvidia/clara/clara-parabricks:4.2.1-1 \
    pbrun mutectcaller \
    --ref /workdir/${REFERENCE_FILE} \
    --tumor-name tumor_name_inside_bam_file \
    --in-tumor-bam /workdir/${INPUT_TUMOR_BAM} \
    --in-normal-bam /workdir/${INPUT_NORMAL_BAM} \
    --normal-name normal_name_inside_bam_file \
    --out-vcf /outputdir/${OUTPUT_VCF}
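
The command above runs in tumor-normal mode. As a rough sketch of how the same call differs in tumor-only mode, the helper below assembles the pbrun arguments and adds the normal-sample options only when a normal BAM is given. The function name build_mutectcaller_args and the file names are hypothetical. The helper only prints the command, it does not run it, and it uses only options listed in the reference section of this page.

```shell
# Assemble a pbrun mutectcaller command line and print it (dry run).
# Arguments: ref, tumor BAM, tumor name, output VCF, [normal BAM, normal name]
build_mutectcaller_args() {
    ref=$1; tumor_bam=$2; tumor_name=$3; out_vcf=$4
    normal_bam=${5:-}; normal_name=${6:-}

    # Options needed in both tumor-only and tumor-normal mode.
    set -- pbrun mutectcaller \
        --ref "$ref" \
        --in-tumor-bam "$tumor_bam" \
        --tumor-name "$tumor_name" \
        --out-vcf "$out_vcf"

    # Normal-sample options are appended only in tumor-normal mode.
    if [ -n "$normal_bam" ]; then
        set -- "$@" --in-normal-bam "$normal_bam" --normal-name "$normal_name"
    fi

    printf '%s\n' "$*"
}

# Tumor-only: no normal BAM supplied.
build_mutectcaller_args /workdir/ref.fa /workdir/tumor.bam tumor /outputdir/out.vcf

# Tumor-normal: normal BAM and its sample name supplied.
build_mutectcaller_args /workdir/ref.fa /workdir/tumor.bam tumor /outputdir/out.vcf \
    /workdir/normal.bam normal
```

The printed line can be appended to the docker run invocation shown above in place of the hand-written pbrun arguments.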

Compatible GATK4 Command

The command below is the GATK4 counterpart of the Parabricks command above. The output from this command will be identical to the output from the above command. See the Output Comparison page for comparing the results.

$ gatk Mutect2 \
    -R <INPUT_DIR>/${REFERENCE_FILE} \
    --input <INPUT_DIR>/${INPUT_TUMOR_BAM} \
    --tumor-sample tumor_name_inside_bam_file \
    --input <INPUT_DIR>/${INPUT_NORMAL_BAM} \
    --normal-sample normal_name_inside_bam_file \
    --output <OUTPUT_DIR>/${OUTPUT_VCF}
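
One simple way to check that two VCF files carry the same calls is to compare their variant records while ignoring header lines, which typically differ because they record the tool and command line used. This is only a sketch and not necessarily the method the Output Comparison page describes. The function names and the tiny example files are hypothetical.

```shell
# Print only the variant records of a VCF (drop header lines starting with '#').
vcf_records() {
    grep -v '^#' "$1"
}

# Return 0 when two VCFs contain the same records in the same order.
same_vcf_records() {
    a=$(mktemp) && b=$(mktemp) || return 2
    vcf_records "$1" > "$a"
    vcf_records "$2" > "$b"
    cmp -s "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return "$status"
}

# Two small example files: headers differ, records are identical.
tmp=$(mktemp -d)
printf '##fileformat=VCFv4.2\n##source=pbrun\n#CHROM\tPOS\tID\tREF\tALT\nchr1\t100\t.\tA\tT\n' \
    > "$tmp/parabricks.vcf"
printf '##fileformat=VCFv4.2\n##source=gatk\n#CHROM\tPOS\tID\tREF\tALT\nchr1\t100\t.\tA\tT\n' \
    > "$tmp/gatk.vcf"

if same_vcf_records "$tmp/parabricks.vcf" "$tmp/gatk.vcf"; then
    echo "records match"
else
    echo "records differ"
fi
rm -rf "$tmp"
```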

Mutect2 with Panel of Normals

Parabricks mutectcaller has supported a Panel of Normals (PON) for processing variants since version 3.7.0-1. There are three steps involved:

Run prepon to generate an index for the PON file.
Run mutectcaller with the index generated by prepon.
Run postpon to update the VCF with PON information.

# The first command generates input.pon. It only needs to be run once for a given input.vcf.gz.
# This command assumes all the inputs are in INPUT_DIR and all the outputs go to OUTPUT_DIR.
docker run --rm --gpus all --volume INPUT_DIR:/workdir --volume OUTPUT_DIR:/outputdir \
    --workdir /workdir \
    nvcr.io/nvidia/clara/clara-parabricks:4.2.1-1 \
    pbrun prepon --in-pon-file /workdir/${INPUT_PON_VCF}

# Run mutectcaller with the pon index
# This command assumes all the inputs are in INPUT_DIR and all the outputs go to OUTPUT_DIR.
docker run --rm --gpus all --volume INPUT_DIR:/workdir --volume OUTPUT_DIR:/outputdir \
    --workdir /workdir \
    nvcr.io/nvidia/clara/clara-parabricks:4.2.1-1 \
    pbrun mutectcaller \
    --ref /workdir/${REFERENCE_FILE} \
    --tumor-name tumor \
    --in-tumor-bam /workdir/${INPUT_TUMOR_BAM} \
    --in-normal-bam /workdir/${INPUT_NORMAL_BAM} \
    --pon /workdir/${INPUT_PON_VCF} \
    --normal-name normal \
    --out-vcf /outputdir/${OUTPUT_VCF}

# Add the annotation to the output.vcf generated above
# This command assumes all the inputs are in INPUT_DIR and all the outputs go to OUTPUT_DIR.
docker run --rm --gpus all --volume INPUT_DIR:/workdir --volume OUTPUT_DIR:/outputdir \
    --workdir /workdir \
    nvcr.io/nvidia/clara/clara-parabricks:4.2.1-1 \
    pbrun postpon \
    --in-vcf /outputdir/${OUTPUT_VCF} \
    --in-pon-file /workdir/${INPUT_PON_VCF} \
    --out-vcf /outputdir/${OUTPUT_ANNOTATED_VCF}

mutectcaller Reference

Run GPU-accelerated Mutect2 to convert BAM/CRAM to VCF.

Input/Output file options:

--ref REF
--out-vcf OUT_VCF
--in-tumor-bam IN_TUMOR_BAM
--in-normal-bam IN_NORMAL_BAM
--in-tumor-recal-file IN_TUMOR_RECAL_FILE
--in-normal-recal-file IN_NORMAL_RECAL_FILE
--interval-file INTERVAL_FILE
--mutect-bam-output MUTECT_BAM_OUTPUT
--pon PON

Tool Options:

--max-mnp-distance MAX_MNP_DISTANCE
--mutectcaller-options MUTECTCALLER_OPTIONS
--initial-tumor-lod INITIAL_TUMOR_LOD
--tumor-lod-to-emit TUMOR_LOD_TO_EMIT
--pruning-lod-threshold PRUNING_LOD_THRESHOLD
--active-probability-threshold ACTIVE_PROBABILITY_THRESHOLD
--no-alt-contigs
--genotype-germline-sites
--genotype-pon-sites
--tumor-name TUMOR_NAME
--normal-name NORMAL_NAME
-L INTERVAL, --interval INTERVAL
-ip INTERVAL_PADDING, --interval-padding INTERVAL_PADDING

Performance Options:

--mutect-low-memory
--run-partition
--gpu-num-per-partition GPU_NUM_PER_PARTITION
--num-htvc-threads NUM_HTVC_THREADS

Common options:

--logfile LOGFILE
--tmp-dir TMP_DIR
--with-petagene-dir WITH_PETAGENE_DIR
--keep-tmp
--no-seccomp-override
--version

GPU options:

--num-gpus NUM_GPUS