Run a GPU-accelerated version of GATK’s CollectMultipleMetrics.
This tool runs an accelerated version of GATK CollectMultipleMetrics to assess BAM file metrics such as alignment success, quality score distributions, GC bias, and sequencing artifacts. It functions as a ‘meta-metrics’ tool that can run any combination of the available GATK metrics tools to give an overall assessment of how well a sequencing run performed. The available metrics tools (PROGRAMs) are listed in the reference section below.
# This command assumes all the inputs are in <INPUT_DIR> and all the outputs go to <OUTPUT_DIR>.
$ docker run --rm --gpus all --volume <INPUT_DIR>:/workdir --volume <OUTPUT_DIR>:/outputdir \
-w /workdir \
nvcr.io/nvidia/clara/clara-parabricks:4.0.0-1 \
pbrun collectmultiplemetrics \
--ref /workdir/${REFERENCE_FILE} \
--bam /workdir/${INPUT_BAM} \
--out-qc-metrics-dir /outputdir/${OUTPUT_DIR} \
--gen-all-metrics
The command below is the GATK4 counterpart of the Parabricks command above. The output from this command will be identical to the output from the above command.
gatk CollectMultipleMetrics \
--REFERENCE_SEQUENCE <INPUT_DIR>/${REFERENCE_FILE} \
-I <INPUT_DIR>/${INPUT_BAM} \
-O <OUTPUT_DIR>/${OUTPUT_DIR} \
--PROGRAM CollectAlignmentSummaryMetrics \
--PROGRAM CollectInsertSizeMetrics \
--PROGRAM QualityScoreDistribution \
--PROGRAM MeanQualityByCycle \
--PROGRAM CollectBaseDistributionByCycle \
--PROGRAM CollectGcBiasMetrics \
--PROGRAM CollectSequencingArtifactMetrics \
--PROGRAM CollectQualityYieldMetrics
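The eight `--PROGRAM` values above line up by name with the individual `--gen-*` flags documented under Tool Options on this page. The lookup below is a minimal sketch of that correspondence. The pairing is inferred from the option names rather than stated by either tool, and `gatk_program_for` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the GATK PROGRAM that appears to correspond
# to a pbrun collectmultiplemetrics flag. The pairing is inferred from the
# option names and is an assumption, not a documented mapping.
gatk_program_for() {
    case "$1" in
        --gen-alignment)                  echo CollectAlignmentSummaryMetrics ;;
        --gen-insert-size)                echo CollectInsertSizeMetrics ;;
        --gen-quality-score)              echo QualityScoreDistribution ;;
        --gen-mean-quality-by-cycle)      echo MeanQualityByCycle ;;
        --gen-base-distribution-by-cycle) echo CollectBaseDistributionByCycle ;;
        --gen-gc-bias)                    echo CollectGcBiasMetrics ;;
        --gen-seq-artifact)               echo CollectSequencingArtifactMetrics ;;
        --gen-quality-yield)              echo CollectQualityYieldMetrics ;;
        *) echo "unknown flag: $1" >&2; return 1 ;;
    esac
}

# Print the GATK arguments matching a set of pbrun flags.
for flag in --gen-alignment --gen-gc-bias; do
    printf '%s %s\n' --PROGRAM "$(gatk_program_for "$flag")"
done
```

This only prints argument text. It does not invoke `gatk` or `pbrun`.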
Run collectmultiplemetrics on a BAM file to generate files for multiple classes of metrics.
Input/Output file options:
- --ref REF
-
Path to the reference file. (default: None)
Option is required.
- --bam BAM
-
Path to the BAM file. (default: None)
Option is required.
- --out-qc-metrics-dir OUT_QC_METRICS_DIR
-
Output directory to store results of each analysis. (default: None)
Option is required.
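Because all three options above are required, it can help to check them before starting a run. The function below is a minimal sketch of such a preflight check. `check_required_inputs` is a hypothetical helper, not part of Parabricks, and creating the output directory up front is an assumption about the desired workflow.

```shell
#!/bin/sh
# Hypothetical preflight check for the three required options.
# Returns 0 when the inputs look usable, 1 otherwise.
check_required_inputs() {
    ref=$1
    bam=$2
    outdir=$3
    status=0
    [ -r "$ref" ] || { echo "missing or unreadable --ref: $ref" >&2; status=1; }
    [ -r "$bam" ] || { echo "missing or unreadable --bam: $bam" >&2; status=1; }
    # Create the output directory if it does not exist yet (an assumption;
    # the tool itself may handle this).
    mkdir -p "$outdir" 2>/dev/null || {
        echo "cannot create --out-qc-metrics-dir: $outdir" >&2
        status=1
    }
    return "$status"
}

# Example usage (paths are placeholders):
#   check_required_inputs ref.fa sample.bam qc_out && echo "inputs look usable"
```

When the command runs inside the container, the check should be given the host-side paths that are mounted into it.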
Tool Options:
- --bam-decompressor-threads BAM_DECOMPRESSOR_THREADS
-
Number of threads for BAM decompression. (default: 3)
- --gen-all-metrics
-
Generate QC for every analysis. (default: None)
- --gen-alignment
-
Generate QC for alignment summary metric. (default: None)
- --gen-quality-score
-
Generate QC for quality score distribution metric. (default: None)
- --gen-insert-size
-
Generate QC for insert size metric. (default: None)
- --gen-mean-quality-by-cycle
-
Generate QC for mean quality by cycle metric. (default: None)
- --gen-base-distribution-by-cycle
-
Generate QC for base distribution by cycle metric. (default: None)
- --gen-gc-bias
-
Generate detail and summary files for the GC bias metric. (default: None)
- --gen-seq-artifact
-
Generate QC for sequencing artifact metric. (default: None)
- --gen-quality-yield
-
Generate QC for quality yield metric. (default: None)
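The individual `--gen-*` flags above can be used in place of `--gen-all-metrics` when only some metrics are needed. The helper below is a dry-run sketch that prints such a command without executing it. `build_metrics_cmd` and the file names are hypothetical, and it assumes several `--gen-*` flags may be passed together, in line with the "any combination" description at the top of this page.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print a pbrun command that enables only the
# named metrics. Names are the flag suffixes documented above, for example
# "alignment" for --gen-alignment. The output is for display only; paths
# containing spaces are not quoted.
build_metrics_cmd() {
    ref=$1
    bam=$2
    outdir=$3
    shift 3
    cmd="pbrun collectmultiplemetrics --ref $ref --bam $bam --out-qc-metrics-dir $outdir"
    for metric in "$@"; do
        case "$metric" in
            alignment|quality-score|insert-size|mean-quality-by-cycle)
                cmd="$cmd --gen-$metric" ;;
            base-distribution-by-cycle|gc-bias|seq-artifact|quality-yield)
                cmd="$cmd --gen-$metric" ;;
            *)
                echo "unknown metric: $metric" >&2
                return 1 ;;
        esac
    done
    echo "$cmd"
}

# Print (do not run) a command that collects two of the eight metrics.
build_metrics_cmd ref.fa sample.bam qc_out alignment insert-size
```

Rejecting unknown names up front catches typos before a long GPU job is launched.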
Common options:
- --logfile LOGFILE
-
Path to the log file. If not specified, messages will only be written to the standard error output. (default: None)
- --tmp-dir TMP_DIR
-
Full path to the directory where temporary files will be stored.
- --with-petagene-dir WITH_PETAGENE_DIR
-
Full path to the PetaGene installation directory. By default, this should have been installed at /opt/petagene. Use of this option also requires that the PetaLink library has been preloaded by setting the LD_PRELOAD environment variable. Optionally, set the PETASUITE_REFPATH and PGCLOUD_CREDPATH environment variables, which are used for data and credentials. (default: None)
- --keep-tmp
-
Do not delete the directory storing temporary files after completion.
- --no-seccomp-override
-
Do not override seccomp options for docker. (default: None)
- --version
-
View compatible software versions.
GPU options:
- --num-gpus NUM_GPUS
-
Number of GPUs to use for a run. GPUs 0..(NUM_GPUS-1) will be used.
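As a worked example of the rule above, the sketch below lists the GPU indices that a given `--num-gpus` value would select. `gpu_ids_for` is a hypothetical helper written only to illustrate the 0..(NUM_GPUS-1) range. It does not query the hardware.

```shell
#!/bin/sh
# Print the GPU indices selected by a given --num-gpus value,
# following the documented rule: GPUs 0..(NUM_GPUS-1) are used.
gpu_ids_for() {
    num_gpus=$1
    i=0
    ids=""
    while [ "$i" -lt "$num_gpus" ]; do
        # Append the index, separated by a space once the list is non-empty.
        ids="${ids:+$ids }$i"
        i=$((i + 1))
    done
    echo "$ids"
}

gpu_ids_for 4   # prints: 0 1 2 3
```

So `--num-gpus 4` uses the first four devices, and a machine with more GPUs leaves the higher-numbered ones idle for that run.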