cnvkit¶
CPU-accelerated copy number variant calling. To use this tool, pass "--extra-tools" to the installer.
Run CNVkit with accelerated coverage calculation from read depths.
Quick Start¶
$ pbrun cnvkit \
--sub-command batch \
--ref Ref/Homo_sapiens_assembly38.fasta \
--in-bam mark_dups_gpu.bam \
--batch-output-dir outputFolder
Compatible Baseline Command¶
The command below is the baseline CNVkit counterpart of the Parabricks command above. The output from this command will be identical to the output from the above command.
$ cnvkit.py batch mark_dups_gpu.bam --fasta Ref/Homo_sapiens_assembly38.fasta \
--output-dir outputFolder -m wgs -n -p
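One way to check the equivalence claim is to write the two runs to separate directories and compare them recursively. The sketch below is only an illustration: the directory names `out_pbrun` and `out_baseline` are hypothetical (both commands above write to `outputFolder`), and the helper `same_outputs` is not part of Parabricks or CNVkit.

```shell
# Hypothetical helper: succeeds when two output directories have identical contents.
same_outputs() {
    diff -r -q "$1" "$2" >/dev/null
}

# Compare only if both (assumed) output directories exist.
if [ -d out_pbrun ] && [ -d out_baseline ]; then
    if same_outputs out_pbrun out_baseline; then
        echo "outputs match"
    else
        echo "outputs differ" >&2
    fi
fi
```

Running `diff -r out_pbrun out_baseline` without `-q` shows the differing lines when the check fails.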
cnvkit Reference¶
Accelerated CNVkit. Three CNVkit sub-commands are currently supported: batch, autobin, and coverage. The help below is divided into a section for options common to all sub-commands, followed by options for the batch, autobin, and coverage sub-commands.
Input/Output file options¶
- --ref REF
Path to the reference file. (default: None)
- --in-bam IN_BAM
Path to the BAM file. (default: None)
Option is required.
Options specific to this tool¶
(none)
Common Tool Options:¶
- --sub-command SUB_COMMAND
The sub-command to call in CNVkit tool. Value can be one of [batch, autobin, coverage]. (default: None)
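Because `--sub-command` accepts only three values, a wrapper script can reject anything else before launching a run. A minimal sketch in POSIX shell; the function name `is_valid_subcommand` is hypothetical and not part of Parabricks:

```shell
# Hypothetical helper: succeeds only for the sub-commands listed above.
is_valid_subcommand() {
    case "$1" in
        batch|autobin|coverage) return 0 ;;
        *) return 1 ;;
    esac
}

sub_command="autobin"
if is_valid_subcommand "$sub_command"; then
    echo "ok: $sub_command"
else
    echo "unsupported sub-command: $sub_command" >&2
fi
```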
Batch Input Output file options:¶
- --batch-output-dir BATCH_OUTPUT_DIR
Path to the directory that will contain all of the generated files. (default: None)
- --batch-access BATCH_ACCESS
Regions of accessible sequence on chromosomes (.bed) (default: None)
- --batch-targets BATCH_TARGETS
Target intervals (.bed or .list) (default: None)
- --batch-annotate BATCH_ANNOTATE
Use gene models from this file to assign names to the target regions. Format: UCSC refFlat.txt or ensFlat.txt file (preferred), or BED, interval list, GFF, or similar. (default: None)
- --batch-cnn-reference BATCH_CNN_REFERENCE
Copy number reference file (.cnn), to reuse an existing reference (default: None)
Options for the batch sub-command:¶
- --generate-vcf
Export the output CNS to VCF after running batch. (default: None)
- --batch-cnvkit-options BATCH_CNVKIT_OPTIONS
Pass supported batch CNVkit options as one string (e.g. --batch-cnvkit-options="--count-reads --drop-low-coverage --target-avg-size 5000") (default: None)
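The option string for `--batch-cnvkit-options` has to reach the tool as a single argument, so it needs quoting in the shell. The sketch below reuses the Quick Start paths and the example option string, and only prints the assembled command. Whether this exact combination of flags runs on a given installation has not been verified here.

```shell
# The whole CNVkit option string must stay one shell word, hence the quotes.
cnvkit_opts="--count-reads --drop-low-coverage --target-avg-size 5000"

# Build the argument list with POSIX positional parameters.
set -- pbrun cnvkit \
    --sub-command batch \
    --ref Ref/Homo_sapiens_assembly38.fasta \
    --in-bam mark_dups_gpu.bam \
    --batch-output-dir outputFolder \
    --generate-vcf \
    --batch-cnvkit-options="$cnvkit_opts"

nargs=$#
# Print one argument per line to confirm the option string was not split.
printf '%s\n' "$@"

# To launch the run for real, replace the printf line with: "$@"
```

The last printed line should contain the complete option string, showing that it was passed as one argument.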
Autobin Input Output file options:¶
- --autobin-access AUTOBIN_ACCESS
Sequencing-accessible genomic regions, or exons to use as possible targets. (default: None)
- --autobin-targets AUTOBIN_TARGETS
Potentially targeted genomic regions, e.g. all possible exons for the reference genome. Format: BED, interval list, etc. (default: None)
- --autobin-annotate AUTOBIN_ANNOTATE
Use gene models from this file to assign names to the target regions. Format: UCSC refFlat.txt or ensFlat.txt file (preferred), or BED, interval list, GFF, or similar. (default: None)
- --target-output-bed TARGET_OUTPUT_BED
Filename for target BED output. (default: None)
- --antitarget-output-bed ANTITARGET_OUTPUT_BED
Filename for antitarget BED output. (default: None)
Options for autobin sub-command:¶
- --bp-per-bin BP_PER_BIN
Desired average number of sequencing read bases mapped to each bin. (default: 100000.0)
- --target-max-size TARGET_MAX_SIZE
Maximum size of target bins. (default: 20000)
- --target-min-size TARGET_MIN_SIZE
Minimum size of target bins. (default: 20)
- --antitarget-max-size ANTITARGET_MAX_SIZE
Maximum size of antitarget bins. (default: 500000)
- --antitarget-min-size ANTITARGET_MIN_SIZE
Minimum size of antitarget bins. (default: 500)
- --short-names
Reduce multi-accession bait labels to be short and consistent. (default: None)
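An autobin run combines the input/output options with the bin-size options. The following is a sketch only: `targets.bed`, `access.bed`, and the two output names are placeholder files, the numeric values simply restate the documented defaults, and the flag combination has not been checked against a live installation. The `run` helper is hypothetical; it prints the command unless `DRY_RUN=0` is set.

```shell
# Hypothetical dry-run helper: prints the command by default, runs it if DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

run pbrun cnvkit \
    --sub-command autobin \
    --ref Ref/Homo_sapiens_assembly38.fasta \
    --in-bam mark_dups_gpu.bam \
    --autobin-targets targets.bed \
    --autobin-access access.bed \
    --target-output-bed my_targets.bed \
    --antitarget-output-bed my_antitargets.bed \
    --bp-per-bin 100000 \
    --target-max-size 20000 \
    --target-min-size 20
```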
Coverage Input Output file options:¶
- --coverage-output COVERAGE_OUTPUT
Output file name of coverage. (default: None)
- --coverage-interval COVERAGE_INTERVAL
Input interval file name of coverage. (default: None)
Options for coverage sub-command:¶
- --count
Get read depths by counting read midpoints within each bin (an alternative algorithm). (default: None)
- --processes PROCESSES
Number of subprocesses to calculate coverage in parallel. (default: 4)
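Coverage is computed per BAM file, so running it over several samples is a natural loop. This sketch derives one output name per BAM and prints the resulting commands. The BAM names, `my_targets.bed`, and the `.targetcoverage.cnn` suffix (a CNVkit naming habit, not something this page requires) are assumptions.

```shell
interval="my_targets.bed"   # placeholder interval file, e.g. from an autobin run

for bam in sampleA.bam sampleB.bam; do
    # Strip the directory and the .bam extension to get a sample name.
    sample=$(basename "$bam" .bam)
    out="${sample}.targetcoverage.cnn"
    # Drop the leading "echo" to launch the runs instead of printing them.
    echo pbrun cnvkit \
        --sub-command coverage \
        --in-bam "$bam" \
        --coverage-interval "$interval" \
        --coverage-output "$out" \
        --processes 4
done
```

Adding `--count` to the command would switch to the midpoint-counting algorithm described above.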
Common options:¶
- --logfile LOGFILE
Path to the log file. If not specified, messages will only be written to the standard error output. (default: None)
- --tmp-dir TMP_DIR
Full path to the directory where temporary files will be stored.
- --with-petagene-dir WITH_PETAGENE_DIR
Full path to the PetaGene installation directory. By default, this should have been installed at /opt/petagene. Use of this option also requires that the PetaLink library has been preloaded by setting the LD_PRELOAD environment variable. Optionally, set the PETASUITE_REFPATH and PGCLOUD_CREDPATH environment variables, which are used for data and credentials. (default: None)
- --keep-tmp
Do not delete the directory storing temporary files after completion.
- --license-file LICENSE_FILE
Path to license file license.bin if not in the installation directory.
- --no-seccomp-override
Do not override seccomp options for Docker. (default: None)
- --version
View compatible software versions.
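The common options can be attached to any of the sub-commands. The sketch below, which assumes a scratch directory created with `mktemp` and an illustrative log file name, shows a batch command that keeps its temporary files and writes a log for later inspection. It prints the command rather than running it.

```shell
# Create a dedicated scratch directory; mktemp prints the path it created.
tmp_dir=$(mktemp -d)
log_file="cnvkit_batch.log"   # illustrative name

# Printed rather than executed; drop the leading "echo" to run it.
echo pbrun cnvkit \
    --sub-command batch \
    --ref Ref/Homo_sapiens_assembly38.fasta \
    --in-bam mark_dups_gpu.bam \
    --batch-output-dir outputFolder \
    --tmp-dir "$tmp_dir" \
    --keep-tmp \
    --logfile "$log_file"
```

Since `--keep-tmp` leaves the directory in place, it has to be removed by hand once it is no longer needed.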