cnvkit
CPU-accelerated copy number variant calling. You need to pass "--extra-tools" to the installer to use this tool.
Run CNVkit with accelerated coverage calculation from read depths.
$ pbrun cnvkit \
--sub-command batch \
--ref Ref/Homo_sapiens_assembly38.fasta \
--in-bam mark_dups_gpu.bam \
--batch-output-dir outputFolder
The command below is the baseline CNVkit counterpart of the Parabricks command above. The output from this command will be identical to the output from the above command.
$ cnvkit.py batch mark_dups_gpu.bam --fasta Ref/Homo_sapiens_assembly38.fasta \
--output-dir outputFolder -m wgs -n -p
Accelerated CNVkit. Currently we support three CNVkit sub-commands: batch, autobin and coverage. The help below is divided into a section for options common to all subcommands, followed by options for the batch, autobin and coverage subcommands.
Input/Output file options
- --ref REF
  Path to the reference file. (default: None)
- --in-bam IN_BAM
  Path to the BAM file. (default: None)
  Option is required.
Options specific to this tool
(none)
Common Tool Options:
- --sub-command SUB_COMMAND
The sub-command to call in CNVkit tool. Value can be one of [batch, autobin, coverage]. (default: None)
Batch Input Output file options:
- --batch-output-dir BATCH_OUTPUT_DIR
  Path to the directory that will contain all of the generated files. (default: None)
- --batch-access BATCH_ACCESS
  Regions of accessible sequence on chromosomes (.bed) (default: None)
- --batch-targets BATCH_TARGETS
  Target intervals (.bed or .list) (default: None)
- --batch-annotate BATCH_ANNOTATE
  Use gene models from this file to assign names to the target regions. Format: UCSC refFlat.txt or ensFlat.txt file (preferred), or BED, interval list, GFF, or similar. (default: None)
- --batch-cnn-reference BATCH_CNN_REFERENCE
  Copy number reference file (.cnn), to reuse an existing reference (default: None)
Options for the batch sub-command:
- --generate-vcf
  Export the output CNS to VCF after running batch. (default: None)
- --batch-cnvkit-options BATCH_CNVKIT_OPTIONS
  Pass supported batch CNVkit options as one string (e.g. --batch-cnvkit-options="--count-reads --drop-low-coverage --target-avg-size 5000 ") (default: None)
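As an illustration of the batch options above, the sketch below combines --generate-vcf with a pass-through option string. The paths are the ones used in the example at the top of this page, and the option string is a subset of the one shown in the --batch-cnvkit-options description; whether a given CNVkit option is accepted depends on your Parabricks version, so treat this as an assumption to verify. The script only prints the command unless DRY_RUN=0 is set (DRY_RUN is a variable introduced for this example, not a pbrun feature).

```shell
# Paths reused from the example at the top of this page.
REF=Ref/Homo_sapiens_assembly38.fasta
BAM=mark_dups_gpu.bam
OUT_DIR=outputFolder
# Extra CNVkit batch options, passed through as one string.
EXTRA="--drop-low-coverage --target-avg-size 5000"

# Assemble the command as positional parameters so the quoted
# option string stays a single argument.
set -- pbrun cnvkit \
    --sub-command batch \
    --ref "$REF" \
    --in-bam "$BAM" \
    --batch-output-dir "$OUT_DIR" \
    --generate-vcf \
    "--batch-cnvkit-options=$EXTRA"

# Dry run by default: print the command, execute only when DRY_RUN=0.
CMD="$*"
echo "$CMD"
if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
fi
```

Keeping the extra options in one quoted argument matters: if the string were split by the shell, pbrun would see --drop-low-coverage as its own option rather than as part of --batch-cnvkit-options.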
Autobin Input Output file options:
- --autobin-access AUTOBIN_ACCESS
  Sequencing-accessible genomic regions, or exons to use as possible targets. (default: None)
- --autobin-targets AUTOBIN_TARGETS
  Potentially targeted genomic regions, e.g. all possible exons for the reference genome. Format: BED, interval list, etc. (default: None)
- --autobin-annotate AUTOBIN_ANNOTATE
  Use gene models from this file to assign names to the target regions. Format: UCSC refFlat.txt or ensFlat.txt file (preferred), or BED, interval list, GFF, or similar. (default: None)
- --target-output-bed TARGET_OUTPUT_BED
  Filename for target BED output. (default: None)
- --antitarget-output-bed ANTITARGET_OUTPUT_BED
  Filename for antitarget BED output. (default: None)
Options for autobin sub-command:
- --bp-per-bin BP_PER_BIN
  Desired average number of sequencing read bases mapped to each bin. (default: 100000.0)
- --target-max-size TARGET_MAX_SIZE
  Maximum size of target bins. (default: 20000)
- --target-min-size TARGET_MIN_SIZE
  Minimum size of target bins. (default: 20)
- --antitarget-max-size ANTITARGET_MAX_SIZE
  Maximum size of antitarget bins. (default: 500000)
- --antitarget-min-size ANTITARGET_MIN_SIZE
  Minimum size of antitarget bins. (default: 500)
- --short-names
  Reduce multi-accession bait labels to be short and consistent. (default: None)
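A possible autobin invocation using the options listed above is sketched here. All BED file names are hypothetical placeholders, and the sketch assumes that --in-bam plus the autobin inputs are sufficient; this page does not state whether --ref is also needed for this sub-command. The command is only printed unless DRY_RUN=0 is set (DRY_RUN is a variable introduced for this example).

```shell
# Hypothetical input and output file names; replace with your own.
BAM=mark_dups_gpu.bam
TARGETS=baits.bed
ACCESS=access.hg38.bed
TARGET_OUT=sample.target.bed
ANTITARGET_OUT=sample.antitarget.bed

# Assemble the autobin command from documented options only.
set -- pbrun cnvkit \
    --sub-command autobin \
    --in-bam "$BAM" \
    --autobin-targets "$TARGETS" \
    --autobin-access "$ACCESS" \
    --target-output-bed "$TARGET_OUT" \
    --antitarget-output-bed "$ANTITARGET_OUT" \
    --bp-per-bin 100000

# Dry run by default: print the command, execute only when DRY_RUN=0.
CMD="$*"
echo "$CMD"
if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
fi
```

The --bp-per-bin value shown equals the documented default and is included only to show where bin-size tuning would go.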
Coverage Input Output file options:
- --coverage-output COVERAGE_OUTPUT
  Output file name of coverage. (default: None)
- --coverage-interval COVERAGE_INTERVAL
  Input interval file name of coverage. (default: None)
Options for coverage sub-command:
- --count
  Get read depths by counting read midpoints within each bin (an alternative algorithm). (default: None)
- --processes PROCESSES
  Number of subprocesses to calculate coverage in parallel. (default: 4)
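The coverage options above could be combined as in the following sketch. The interval and output file names are hypothetical, and the value of 8 for --processes is an arbitrary choice for illustration. The command is only printed unless DRY_RUN=0 is set (DRY_RUN is a variable introduced for this example).

```shell
# Hypothetical file names; replace with your own.
BAM=mark_dups_gpu.bam
INTERVALS=sample.target.bed
COVERAGE_OUT=sample.targetcoverage.cnn

# Assemble the coverage command from documented options only.
set -- pbrun cnvkit \
    --sub-command coverage \
    --in-bam "$BAM" \
    --coverage-interval "$INTERVALS" \
    --coverage-output "$COVERAGE_OUT" \
    --processes 8

# Dry run by default: print the command, execute only when DRY_RUN=0.
CMD="$*"
echo "$CMD"
if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
fi
```

Adding --count to the argument list would switch to the midpoint-counting algorithm described above.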
Common options:
- --logfile LOGFILE
  Path to the log file. If not specified, messages will only be written to the standard error output. (default: None)
- --tmp-dir TMP_DIR
  Full path to the directory where temporary files will be stored.
- --with-petagene-dir WITH_PETAGENE_DIR
  Full path to the PetaGene installation directory. By default, this should have been installed at /opt/petagene. Use of this option also requires that the PetaLink library has been preloaded by setting the LD_PRELOAD environment variable. Optionally set the PETASUITE_REFPATH and PGCLOUD_CREDPATH environment variables that are used for data and credentials. (default: None)
- --keep-tmp
  Do not delete the directory storing temporary files after completion.
- --license-file LICENSE_FILE
  Path to license file license.bin if not in the installation directory.
- --no-seccomp-override
  Do not override seccomp options for docker. (default: None)
- --version
  View compatible software versions.
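The common options can be appended to any sub-command. The sketch below adds logging and temporary-directory handling to the batch example from the top of this page; the log file name and temporary directory path are hypothetical. The command is only printed unless DRY_RUN=0 is set (DRY_RUN is a variable introduced for this example).

```shell
# Paths reused from the example at the top of this page.
REF=Ref/Homo_sapiens_assembly38.fasta
BAM=mark_dups_gpu.bam
OUT_DIR=outputFolder
# Hypothetical log file and temporary directory (full path, as documented).
LOG=cnvkit_batch.log
TMP=/tmp/pb_cnvkit

# Batch run with log output captured and temporary files kept for inspection.
set -- pbrun cnvkit \
    --sub-command batch \
    --ref "$REF" \
    --in-bam "$BAM" \
    --batch-output-dir "$OUT_DIR" \
    --logfile "$LOG" \
    --tmp-dir "$TMP" \
    --keep-tmp

# Dry run by default: print the command, execute only when DRY_RUN=0.
CMD="$*"
echo "$CMD"
if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
fi
```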