cnvkit

CPU-accelerated copy number variant calling. Pass "--extra-tools" to the installer to use this tool.

Run CNVkit with accelerated coverage calculation from read depths.

Quick Start

$ pbrun cnvkit \
    --sub-command batch \
    --ref Ref/Homo_sapiens_assembly38.fasta \
    --in-bam mark_dups_gpu.bam \
    --batch-output-dir outputFolder

Compatible Baseline Command

The command below is the baseline CNVkit counterpart of the Parabricks command above. Both commands produce identical output.

$ cnvkit.py batch mark_dups_gpu.bam --fasta Ref/Homo_sapiens_assembly38.fasta \
    --output-dir outputFolder -m wgs -n -p

cnvkit Reference

Accelerated CNVkit. Currently we support three CNVkit sub-commands: batch, autobin and coverage. The help below is divided into a section for options common to all subcommands, followed by options for the batch, autobin and coverage subcommands.

Input/Output file options

--ref REF

Path to the reference file. (default: None)

--in-bam IN_BAM

Path to the BAM file. (default: None)

Option is required.

Options specific to this tool

(none)

Common Tool Options:

--sub-command SUB_COMMAND

The CNVkit sub-command to run. Value must be one of [batch, autobin, coverage]. (default: None)

Batch Input Output file options:

--batch-output-dir BATCH_OUTPUT_DIR

Path to the directory that will contain all of the generated files. (default: None)

--batch-access BATCH_ACCESS

Regions of accessible sequence on chromosomes (.bed) (default: None)

--batch-targets BATCH_TARGETS

Target intervals (.bed or .list) (default: None)

--batch-annotate BATCH_ANNOTATE

Use gene models from this file to assign names to the target regions. Format: UCSC refFlat.txt or ensFlat.txt file (preferred), or BED, interval list, GFF, or similar. (default: None)

--batch-cnn-reference BATCH_CNN_REFERENCE

Copy number reference file (.cnn), to reuse an existing reference (default: None)

Options for the batch sub-command:

--generate-vcf

Export the output CNS to VCF after running batch. (default: None)

--batch-cnvkit-options BATCH_CNVKIT_OPTIONS

Pass supported batch CNVkit options as a single quoted string (e.g. --batch-cnvkit-options="--count-reads --drop-low-coverage --target-avg-size 5000"). (default: None)
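As a sketch of how the batch options above combine, the shell function below wraps a batch run that also exports a VCF and forwards extra CNVkit options as one quoted string. The function name cnvkit_batch_vcf and its positional arguments are hypothetical, chosen only for illustration. The flags are the ones documented in this section, and the forwarded options are copied from the example string above.

```shell
# Hypothetical helper: run the batch sub-command and export the CNS output to VCF.
# Arguments: <reference.fasta> <input.bam> <output-dir>
cnvkit_batch_vcf() {
    # The forwarded CNVkit options must stay inside one quoted string.
    pbrun cnvkit \
        --sub-command batch \
        --ref "$1" \
        --in-bam "$2" \
        --batch-output-dir "$3" \
        --generate-vcf \
        --batch-cnvkit-options="--count-reads --drop-low-coverage --target-avg-size 5000"
}
```

With the Quick Start inputs, it could be called as:

$ cnvkit_batch_vcf Ref/Homo_sapiens_assembly38.fasta mark_dups_gpu.bam outputFolder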

Autobin Input Output file options:

--autobin-access AUTOBIN_ACCESS

Sequencing-accessible genomic regions, or exons to use as possible targets. (default: None)

--autobin-targets AUTOBIN_TARGETS

Potentially targeted genomic regions, e.g. all possible exons for the reference genome. Format: BED, interval list, etc. (default: None)

--autobin-annotate AUTOBIN_ANNOTATE

Use gene models from this file to assign names to the target regions. Format: UCSC refFlat.txt or ensFlat.txt file (preferred), or BED, interval list, GFF, or similar. (default: None)

--target-output-bed TARGET_OUTPUT_BED

Filename for target BED output. (default: None)

--antitarget-output-bed ANTITARGET_OUTPUT_BED

Filename for antitarget BED output. (default: None)

Options for autobin sub-command:

--bp-per-bin BP_PER_BIN

Desired average number of sequencing read bases mapped to each bin. (default: 100000.0)

--target-max-size TARGET_MAX_SIZE

Maximum size of target bins. (default: 20000)

--target-min-size TARGET_MIN_SIZE

Minimum size of target bins. (default: 20)

--antitarget-max-size ANTITARGET_MAX_SIZE

Maximum size of antitarget bins. (default: 500000)

--antitarget-min-size ANTITARGET_MIN_SIZE

Minimum size of antitarget bins. (default: 500)

--short-names

Reduce multi-accession bait labels to be short and consistent. (default: None)
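The autobin options above might be combined as in the following sketch. The function name cnvkit_autobin, its positional arguments, and the output file suffixes are hypothetical. The bin-size values simply restate the documented defaults so that they are easy to adjust, and passing --ref mirrors the Quick Start command (this page does not state whether autobin requires it).

```shell
# Hypothetical helper: estimate bin sizes and write target/antitarget BED files.
# Arguments: <reference.fasta> <input.bam> <targets.bed> <output-prefix>
cnvkit_autobin() {
    # Size values below are the documented defaults, spelled out for tuning.
    pbrun cnvkit \
        --sub-command autobin \
        --ref "$1" \
        --in-bam "$2" \
        --autobin-targets "$3" \
        --target-output-bed "$4.target.bed" \
        --antitarget-output-bed "$4.antitarget.bed" \
        --bp-per-bin 100000 \
        --target-min-size 20 \
        --target-max-size 20000
}
```

For example, with a hypothetical targets.bed and output prefix sample1:

$ cnvkit_autobin Ref/Homo_sapiens_assembly38.fasta mark_dups_gpu.bam targets.bed sample1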

Coverage Input Output file options:

--coverage-output COVERAGE_OUTPUT

Name of the coverage output file. (default: None)

--coverage-interval COVERAGE_INTERVAL

Name of the input interval file over which coverage is calculated. (default: None)

Options for coverage sub-command:

--count

Get read depths by counting read midpoints within each bin (an alternative algorithm). (default: None)

--processes PROCESSES

Number of subprocesses to calculate coverage in parallel. (default: 4)
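A coverage run using the options above could look like the sketch below. The function name cnvkit_coverage and its positional arguments are hypothetical. The optional last argument falls back to the documented default of 4 processes, and passing --ref mirrors the Quick Start command (this page does not state whether coverage requires it).

```shell
# Hypothetical helper: compute per-bin coverage for one interval file.
# Arguments: <reference.fasta> <input.bam> <intervals.bed> <output-file> [processes]
cnvkit_coverage() {
    # ${5:-4} uses the fifth argument if given, otherwise the documented default of 4.
    pbrun cnvkit \
        --sub-command coverage \
        --ref "$1" \
        --in-bam "$2" \
        --coverage-interval "$3" \
        --coverage-output "$4" \
        --processes "${5:-4}"
}
```

For example, to compute coverage over a hypothetical targets.bed with 8 processes:

$ cnvkit_coverage Ref/Homo_sapiens_assembly38.fasta mark_dups_gpu.bam targets.bed sample1.targetcoverage.cnn 8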

Common options:

--logfile LOGFILE

Path to the log file. If not specified, messages will only be written to the standard error output. (default: None)

--tmp-dir TMP_DIR

Full path to the directory where temporary files will be stored.

--with-petagene-dir WITH_PETAGENE_DIR

Full path to the PetaGene installation directory. By default, this should have been installed at /opt/petagene. Use of this option also requires that the PetaLink library has been preloaded by setting the LD_PRELOAD environment variable. Optionally set the PETASUITE_REFPATH and PGCLOUD_CREDPATH environment variables that are used for data and credentials (default: None)

--keep-tmp

Do not delete the directory storing temporary files after completion.

--license-file LICENSE_FILE

Path to license file license.bin if not in the installation directory.

--no-seccomp-override

Do not override seccomp options for Docker. (default: None)

--version

View compatible software versions.
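The common options can be added to any sub-command. As a minimal sketch, the function below repeats the Quick Start batch run while writing a log file and keeping temporary files for inspection. The function name cnvkit_batch_logged, its positional arguments, and the layout of the work directory are hypothetical. Because --tmp-dir expects a full path, the work directory should be given as an absolute path.

```shell
# Hypothetical helper: Quick Start batch run with a log file and preserved temp files.
# Arguments: <reference.fasta> <input.bam> <output-dir> <absolute-work-dir>
cnvkit_batch_logged() {
    # Create the temporary directory first; run pbrun only if that succeeds.
    mkdir -p "$4/tmp" &&
    pbrun cnvkit \
        --sub-command batch \
        --ref "$1" \
        --in-bam "$2" \
        --batch-output-dir "$3" \
        --logfile "$4/cnvkit.log" \
        --tmp-dir "$4/tmp" \
        --keep-tmp
}
```

For example, using the current directory as the work directory:

$ cnvkit_batch_logged Ref/Homo_sapiens_assembly38.fasta mark_dups_gpu.bam outputFolder "$PWD/work"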