giraffe (vg giraffe + GATK)
Note that the Parabricks GPU-accelerated Giraffe tool is currently in beta.
Generate BAM output given one or a pair of FASTQ files using the pangenome aligner VG Giraffe [1] [2].
See the giraffe Reference section for a detailed listing of all available options.
VG Giraffe is a short-read mapping tool developed by Dr. Benedict Paten's lab at the University of California, Santa Cruz (UCSC). This innovative tool can align reads to a graph representation of multiple reference genomes, enhancing the quality of downstream analyses. By accurately mapping reads to thousands of genomes simultaneously, VG Giraffe offers a substantial improvement over traditional single-reference aligners.
By utilizing a graph-based approach, VG Giraffe can more effectively handle genetic diversity and structural variations across populations. Here are three key benefits of using VG Giraffe:
Improved accuracy: VG Giraffe achieves higher precision and recall in read mapping compared to linear genome aligners, especially when dealing with complex genomic regions or populations with significant genetic diversity.
Reduced reference bias (or mapping bias): By incorporating multiple haplotypes and known variants into its graph structure, VG Giraffe minimizes the reference bias inherent in traditional linear genome aligners. This leads to more comprehensive and unbiased characterization of genetic variation, especially for samples that diverge significantly from the standard reference genome.
Faster performance: Despite working with more complex graph structures, VG Giraffe is significantly faster than its predecessor VG Map and comparable in speed to popular linear genome mappers. It can map sequencing reads to thousands of human genomes at a speed similar to methods that map to a single reference genome.
VG Giraffe can be used within Parabricks, a software suite designed for accelerated
secondary analysis in genomics. Our wrapper (pbrun giraffe) will run our GPU-accelerated
VG Giraffe and sort the output BAM by coordinate.
While users can build custom reference graphs for VG Giraffe using the VG Autoindex tool, pre-built pangenome graphs are also available. Dr. Paten's lab and the Human Pangenome Consortium have made these resources publicly accessible, allowing researchers to leverage high-quality, ready-to-use pangenome graphs for their analyses (HPRC data).
The index files .gbz, .dist, .min, and .zipcodes are required to run Giraffe.
A reference paths file is also needed to define the set of paths used for BAM output.
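For orientation, here is a minimal sketch showing which pbrun giraffe option each of these files maps to. The file names below are placeholders; fully worked commands follow in the rest of this section.
# Required inputs and the pbrun giraffe options they map to:
#   GBZ graph        -> --gbz-name
#   distance index   -> --dist-name
#   minimizer index  -> --minimizer-name
#   zipcodes file    -> --zipcodes-name
#   reference paths  -> --ref-paths
pbrun giraffe \
    --gbz-name graph.gbz \
    --dist-name graph.dist \
    --minimizer-name graph.min \
    --zipcodes-name graph.zipcodes \
    --ref-paths graph.paths \
    --in-fq reads_1.fastq.gz reads_2.fastq.gz \
    --out-bam sample.bam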
The index files can be generated from a GBZ graph using vg autoindex. The following
example uses the HPRC v1.1 Minigraph-Cactus pangenome graph aligned to GRCh38:
# Download GBZ
# https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/freeze1/minigraph-cactus/hprc-v1.1-mc-grch38/hprc-v1.1-mc-grch38.d9.gbz
aws s3 cp \
s3://human-pangenomics/pangenomes/freeze/freeze1/minigraph-cactus/hprc-v1.1-mc-grch38/hprc-v1.1-mc-grch38.d9.gbz \
. \
--no-sign-request
# Extract index files from GBZ
docker run --rm --volume $(pwd):/workdir \
--workdir /workdir \
--user $(id -u):$(id -g) \
quay.io/vgteam/vg:v1.70.0 \
vg autoindex \
-p hprc-v1.1-mc-grch38.d9.autoindex.1.70 \
-G hprc-v1.1-mc-grch38.d9.gbz \
-w giraffe
# Extract paths from GBZ
docker run --rm \
--user $(id -u):$(id -g) \
--volume $(pwd):/workdir \
--workdir /workdir \
quay.io/vgteam/vg:v1.70.0 \
vg paths -x hprc-v1.1-mc-grch38.d9.gbz \
-L --paths-by GRCh38 > hprc-v1.1-mc-grch38.d9.paths
# As per best practices, remove decoys, unplaced/unlocalized contigs,
# and other non-primary paths unnecessary for pangenome graph alignment.
grep -v _decoy hprc-v1.1-mc-grch38.d9.paths \
| grep -v _random \
| grep -v chrUn_ \
| grep -v chrEBV \
| grep -v chrM \
| grep -v chain_ > hprc-v1.1-mc-grch38.d9.paths.sub
Before running giraffe, ensure you have generated the required index files. See the index generation section above for instructions.
# This command assumes all the inputs are in the current working directory and all the outputs go to the same place.
docker run --rm --gpus all --volume $(pwd):/workdir --volume $(pwd):/outputdir \
--workdir /workdir \
nvcr.io/nvidia/clara/clara-parabricks:4.7.0-1 \
pbrun giraffe --read-group "sample_rg1" \
--sample "sample-name" --read-group-library "library" \
--read-group-platform "platform" --read-group-pu "pu" \
--gbz-name /workdir/hprc-v1.1-mc-grch38.d9.gbz \
--dist-name /workdir/hprc-v1.1-mc-grch38.d9.autoindex.1.70.dist \
--minimizer-name /workdir/hprc-v1.1-mc-grch38.d9.autoindex.1.70.shortread.withzip.min \
--zipcodes-name /workdir/hprc-v1.1-mc-grch38.d9.autoindex.1.70.shortread.zipcodes \
--ref-paths /workdir/hprc-v1.1-mc-grch38.d9.paths.sub \
--in-fq /workdir/${INPUT_FASTQ_1} /workdir/${INPUT_FASTQ_2} \
--out-bam /outputdir/${OUTPUT_BAM}
To ensure optimal performance with VG Giraffe, please consider the following system requirements based on your GPU configuration:
A 2 GPU system should have at least 100GB CPU RAM and at least 32 CPU threads.
A 4 GPU system should have at least 200GB CPU RAM and at least 64 CPU threads.
For GPUs with less than 22 GB of device memory, use
--low-memory.
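As a quick sanity check, the following sketch uses standard Linux and NVIDIA tools to confirm that the host and GPUs meet these requirements; the thresholds in the comments restate the guidance above.
# Check CPU threads, host RAM, and per-GPU device memory before a run.
nproc                                                  # >= 32 threads for 2 GPUs, >= 64 for 4 GPUs
free -g | awk '/^Mem:/ {print $2 " GB host RAM"}'      # >= 100 GB for 2 GPUs, >= 200 GB for 4 GPUs
nvidia-smi --query-gpu=name,memory.total --format=csv  # if a GPU has < 22 GB, add --low-memory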
By default, --nstreams is set to auto, which enables auto mode. In this
mode, Giraffe automatically configures the number of CUDA streams, batch size, and GPU
acceleration options based on the available GPU device memory. Auto mode is designed to
provide sensible defaults, but may still require further optimization for each specific
system. The following table summarizes the auto mode configuration based on GPU
device memory:
| GPU Memory | Streams | Batch Size | minimizers-gpu (SE only) |
|---|---|---|---|
| < 22 GB | 1 (low-memory) | 5000 | No |
| 22-32 GB | 1 | 8000 | No |
| 32-40 GB | 2 | 10000 | No |
| 40-80 GB | 3 | 10000 | No |
| 80-120 GB | 4 | 10000 | Yes |
| >= 120 GB | 5 | 10000 | Yes |
Auto mode also takes host memory into account. If host memory is insufficient,
--minimizers-gpu may be disabled, and batch size and work queue capacity may
be reduced.
Note that --minimizers-gpu is only enabled for single-end (SE) reads.
For paired-end (PE) reads, the number of streams and batch size are configured as
shown above, but --minimizers-gpu is always disabled.
For best performance, auto mode can be overridden by explicitly setting --nstreams
and other options. For example, the following configurations are recommended for L4, H100,
and RTX PRO 6000 Blackwell Server Edition GPUs:
L4 (16 GB): --batch-size 5000 --nstreams 2
H100 (80 GB): --nstreams 5 --num-cpu-threads-per-gpu 24 --minimizers-gpu
RTX PRO 6000 Blackwell Server Edition (96 GB): --nstreams 4 --num-cpu-threads-per-gpu 24 --minimizers-gpu
Note: While a fixed base memory allocation exists per device, the number of streams and batch size are the primary factors affecting total device memory consumption.
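As an illustration, here is a hedged sketch of an H100 run that overrides auto mode with the settings above. Single-end input is shown so that --minimizers-gpu can take effect; the FASTQ and BAM variable names are placeholders.
# Override auto mode on an 80 GB H100 (single-end reads).
# ${INPUT_FASTQ_SE} and ${OUTPUT_BAM} are placeholders for your own file names.
docker run --rm --gpus all --volume $(pwd):/workdir --workdir /workdir \
    nvcr.io/nvidia/clara/clara-parabricks:4.7.0-1 \
    pbrun giraffe \
    --gbz-name /workdir/hprc-v1.1-mc-grch38.d9.gbz \
    --dist-name /workdir/hprc-v1.1-mc-grch38.d9.autoindex.1.70.dist \
    --minimizer-name /workdir/hprc-v1.1-mc-grch38.d9.autoindex.1.70.shortread.withzip.min \
    --zipcodes-name /workdir/hprc-v1.1-mc-grch38.d9.autoindex.1.70.shortread.zipcodes \
    --ref-paths /workdir/hprc-v1.1-mc-grch38.d9.paths.sub \
    --in-se-fq /workdir/${INPUT_FASTQ_SE} \
    --out-bam /workdir/${OUTPUT_BAM} \
    --nstreams 5 --num-cpu-threads-per-gpu 24 --minimizers-gpu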
To use Giraffe-aligned BAM files for variant calling, you need to extract the appropriate reference file from the Giraffe index files. Run the following commands from the directory containing the Giraffe index files:
# Extract the sequences corresponding to the list of paths to a FASTA file
docker run --rm \
--user $(id -u):$(id -g) \
--volume $(pwd):/workdir \
--workdir /workdir \
quay.io/vgteam/vg:v1.70.0 \
vg paths -x hprc-v1.1-mc-grch38.d9.gbz \
-p hprc-v1.1-mc-grch38.d9.paths.sub \
-F > hprc-v1.1-mc-grch38.d9.fa
# Index the FASTA file
docker run --rm \
--user $(id -u):$(id -g) \
--volume $(pwd):/workdir \
--workdir /workdir \
quay.io/biocontainers/samtools:1.17--hd87286a_2 \
samtools faidx hprc-v1.1-mc-grch38.d9.fa
These commands will generate a FASTA file (hprc-v1.1-mc-grch38.d9.fa)
and the corresponding index (hprc-v1.1-mc-grch38.d9.fa.fai), which can
be used as the reference for variant calling. Note that these files can also be used
for BQSR (bqsr). We can now run Giraffe to obtain the aligned BAM as follows:
# This command assumes all the inputs are in the current working directory and all the outputs go to the same place.
docker run --rm --gpus all --volume $(pwd):/workdir --volume $(pwd):/outputdir \
--workdir /workdir \
nvcr.io/nvidia/clara/clara-parabricks:4.7.0-1 \
pbrun giraffe --read-group "sample_rg1" \
--sample "sample-name" --read-group-library "library" \
--read-group-platform "platform" --read-group-pu "pu" \
--gbz-name /workdir/hprc-v1.1-mc-grch38.d9.gbz \
--dist-name /workdir/hprc-v1.1-mc-grch38.d9.autoindex.1.70.dist \
--minimizer-name /workdir/hprc-v1.1-mc-grch38.d9.autoindex.1.70.shortread.withzip.min \
--zipcodes-name /workdir/hprc-v1.1-mc-grch38.d9.autoindex.1.70.shortread.zipcodes \
--ref-paths /workdir/hprc-v1.1-mc-grch38.d9.paths.sub \
--in-fq /workdir/${INPUT_FASTQ_1} /workdir/${INPUT_FASTQ_2} \
--out-bam /outputdir/${OUTPUT_BAM}
Once you have the Giraffe-aligned BAM file and the extracted reference FASTA, you can proceed with variant calling using HaplotypeCaller, DeepVariant, Pangenome-Aware DeepVariant, or the end-to-end pangenome_germline pipeline.
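The HaplotypeCaller command below consumes a recalibration table (${INPUT_RECAL_FILE}). As noted above, the extracted FASTA can also be used for BQSR; a hedged sketch of generating that table with the bqsr tool is shown here, where ${KNOWN_SITES_VCF} is a placeholder for a known-sites VCF whose contigs match the extracted FASTA.
# BQSR (sketch): produce the recalibration table used by HaplotypeCaller below.
# ${KNOWN_SITES_VCF} is a placeholder and is not defined elsewhere on this page.
docker run --rm --gpus all --volume $(pwd):/workdir --volume $(pwd):/outputdir \
    --workdir /workdir \
    nvcr.io/nvidia/clara/clara-parabricks:4.7.0-1 \
    pbrun bqsr \
    --ref /workdir/hprc-v1.1-mc-grch38.d9.fa \
    --in-bam /workdir/${INPUT_BAM} \
    --knownSites /workdir/${KNOWN_SITES_VCF} \
    --out-recal-file /outputdir/${INPUT_RECAL_FILE}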
# Haplotype Caller
# This command assumes all the inputs are in the current working directory and all the outputs go to the same place.
docker run --rm --gpus all --volume $(pwd):/workdir --volume $(pwd):/outputdir \
--workdir /workdir \
nvcr.io/nvidia/clara/clara-parabricks:4.7.0-1 \
pbrun haplotypecaller \
--ref /workdir/hprc-v1.1-mc-grch38.d9.fa \
--in-bam /workdir/${INPUT_BAM} \
--in-recal-file /workdir/${INPUT_RECAL_FILE} \
--out-variants /outputdir/${OUTPUT_VCF}
# Deepvariant
# This command assumes all the inputs are in the current working directory and all the outputs go to the same place.
docker run --rm --gpus all --volume $(pwd):/workdir --volume $(pwd):/outputdir \
--workdir /workdir \
nvcr.io/nvidia/clara/clara-parabricks:4.7.0-1 \
pbrun deepvariant \
--ref /workdir/hprc-v1.1-mc-grch38.d9.fa \
--in-bam /workdir/${INPUT_BAM} \
--out-variants /outputdir/${OUTPUT_VCF}
# Pangenome_aware_deepvariant
# This command assumes all the inputs are in the current working directory and all the outputs go to the same place.
docker run --rm --gpus all --volume $(pwd):/workdir --volume $(pwd):/outputdir \
--workdir /workdir \
nvcr.io/nvidia/clara/clara-parabricks:4.7.0-1 \
pbrun pangenome_aware_deepvariant \
--ref /workdir/hprc-v1.1-mc-grch38.d9.fa \
--pangenome /workdir/hprc-v1.1-mc-grch38.d9.gbz \
--in-bam /workdir/${INPUT_BAM} \
--out-variants /outputdir/${OUTPUT_VCF}
For more detailed instructions on variant calling, please refer to the tool-specific documentation (haplotypecaller, deepvariant, pangenome_aware_deepvariant, pangenome_germline).
Giraffe's haplotype sampling functionality, activated using arguments
--haplotype-name and --kff-name, was introduced to significantly enhance
alignment accuracy by tailoring the reference graph to the specific genetic profile
of a sample. This process begins by analyzing sequencing reads with a kmer counter to
identify patterns of kmer presence and frequency. Using this information, Giraffe
sub-samples the GBWT (using the original .hapl and .gbz files)
to select haplotypes that best represent the sample, creating a customized graph.
From this tailored graph, Giraffe also generates new index files
(.dist, .min, and .zipcodes) that are optimized for the sample to be analyzed.
These steps can be performed using the baseline VG container for graph customization and index generation, followed by Parabricks' accelerated Giraffe for high-performance alignment, as demonstrated below.
The required .hapl and .gbz files can be downloaded as follows:
aws s3 cp \
s3://human-pangenomics/pangenomes/freeze/freeze1/minigraph-cactus/hprc-v1.1-mc-grch38/hprc-v1.1-mc-grch38.gbz \
. \
--no-sign-request
aws s3 cp \
s3://human-pangenomics/pangenomes/freeze/freeze1/minigraph-cactus/hprc-v1.1-mc-grch38/hprc-v1.1-mc-grch38.hapl \
. \
--no-sign-request
You will also need to extract and filter reference paths for this graph:
docker run --rm \
--user $(id -u):$(id -g) \
--volume $(pwd):/workdir \
--workdir /workdir \
quay.io/vgteam/vg:v1.70.0 \
vg paths -x hprc-v1.1-mc-grch38.gbz \
-L --paths-by GRCh38 > hprc-v1.1-mc-grch38.paths
grep -v _decoy hprc-v1.1-mc-grch38.paths \
| grep -v _random \
| grep -v chrUn_ \
| grep -v chrEBV \
| grep -v chrM \
| grep -v chain_ > hprc-v1.1-mc-grch38.paths.sub
# Run KMC on the input reads to obtain the .kff file
mkdir kmc_tmp_dir
cat > input.fq.paths <<- EOM
${INPUT_FASTQ_1}
${INPUT_FASTQ_2}
EOM
docker run --rm --volume $(pwd):/workdir \
--workdir /workdir \
quay.io/biocontainers/kmc:3.2.4--haf24da9_3 \
kmc \
-k29 \
-m128 \
-okff \
-t64 \
@input.fq.paths \
input.fq.distr kmc_tmp_dir
# Compute the sampled .gbz file using the baseline container
docker run --rm --volume $(pwd):/workdir \
--workdir /workdir \
quay.io/vgteam/vg:v1.70.0 \
vg haplotypes \
-v 2 -t 64 \
--include-reference \
--diploid-sampling \
-i hprc-v1.1-mc-grch38.hapl \
-k input.fq.distr.kff \
-g hprc-v1.1-mc-grch38.sampled.gbz \
hprc-v1.1-mc-grch38.gbz
# Generate index files from the sampled graph using autoindex
docker run --rm --volume $(pwd):/workdir \
--workdir /workdir \
--user $(id -u):$(id -g) \
quay.io/vgteam/vg:v1.70.0 \
vg autoindex \
-p hprc-v1.1-mc-grch38.sampled.autoindex.1.70 \
-G hprc-v1.1-mc-grch38.sampled.gbz \
-w giraffe
# Align the reads to the sampled graph using Parabricks Giraffe
# This command assumes all the inputs are in the current working directory and all the outputs go to the same place.
docker run --rm --gpus all --volume $(pwd):/workdir --volume $(pwd):/outputdir \
--workdir /workdir \
nvcr.io/nvidia/clara/clara-parabricks:4.7.0-1 \
pbrun giraffe --read-group "sample_rg1" \
--sample "sample-name" --read-group-library "library" \
--read-group-platform "platform" --read-group-pu "pu" \
--gbz-name hprc-v1.1-mc-grch38.sampled.gbz \
--dist-name hprc-v1.1-mc-grch38.sampled.autoindex.1.70.dist \
--minimizer-name hprc-v1.1-mc-grch38.sampled.autoindex.1.70.shortread.withzip.min \
--zipcodes-name hprc-v1.1-mc-grch38.sampled.autoindex.1.70.shortread.zipcodes \
--ref-paths hprc-v1.1-mc-grch38.paths.sub \
--in-fq ${INPUT_FASTQ_1} ${INPUT_FASTQ_2} \
--out-bam /outputdir/${OUTPUT_BAM}
The commands below are the vg-1.70.0 and GATK4 counterparts of the Parabricks command above. The output from these commands will be identical to the output from the above command. See the Output Comparison page for comparing the results.
The index files used below are generated in the index generation section.
# Run giraffe and pipe the output to create a sorted BAM.
$ vg giraffe \
-t 16 \
-Z /workdir/hprc-v1.1-mc-grch38.d9.gbz \
-d /workdir/hprc-v1.1-mc-grch38.d9.autoindex.1.70.dist \
-m /workdir/hprc-v1.1-mc-grch38.d9.autoindex.1.70.shortread.withzip.min \
-z /workdir/hprc-v1.1-mc-grch38.d9.autoindex.1.70.shortread.zipcodes \
--ref-paths /workdir/hprc-v1.1-mc-grch38.d9.paths.sub \
-f /workdir/${INPUT_FASTQ_1} \
-f /workdir/${INPUT_FASTQ_2} \
--output-format bam | \
gatk SortSam \
--java-options -Xmx30g \
--MAX_RECORDS_IN_RAM 5000000 \
-I /dev/stdin \
-O cpu.bam \
--SORT_ORDER coordinate
# Mark duplicates.
$ gatk MarkDuplicates \
-I cpu.bam \
-O cpu.markdup.bam \
-M metrics.txt
When comparing output with the CPU counterpart, the following can be sources of small differences.
Baseline VG Container
Single-end (SE) reads: Parabricks matches the baseline quay.io/vgteam/vg:v1.70.0 container exactly. No modifications to the baseline container are needed.
Paired-end (PE) reads: A bug fix for fragment distance recording is required in the baseline container. You need to cherry-pick the fix and rebuild the container as follows:
# Clone the repo (full history needed for cherry-pick)
git clone https://github.com/vgteam/vg.git
cd vg
# Checkout v1.70.0 tag and create a patch branch
git checkout v1.70.0
git checkout -b v1.70.0-fragment-fix
# Initialize submodules (required for build)
git submodule update --init --recursive
# Cherry-pick the bug fix
git cherry-pick d99a2a4d4b16500ec8dd4bd9d9d93c7fbec26ed1
# Build the Docker container
make version
docker build --build-arg THREADS=64 -t vg:v1.70.0-fragment-fix .
Unmapped reads
Parabricks giraffe sorts unmapped reads slightly differently than baseline GATK SortSam. Unmapped reads can be filtered with samtools by running samtools view -F 4.
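For example, a short sketch of dropping unmapped reads from both BAMs before comparing them (file names follow the commands above):
# Remove unmapped reads (flag 0x4) so the two BAMs can be compared record for record.
samtools view -b -F 4 ${OUTPUT_BAM} > parabricks.mapped.bam
samtools view -b -F 4 cpu.markdup.bam > cpu.markdup.mapped.bam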
Align reads to a pangenome graph.
| Type | Name | Required? | Description |
|---|---|---|---|
| I/O | --in-fq [IN_FQ ...] | No | Path to the paired-end FASTQ files. The files must be in fastq or fastq.gz format. Example 1: --in-fq sampleX_1_1.fastq.gz sampleX_1_2.fastq.gz. |
| I/O | --in-fq-list IN_FQ_LIST | No | Path to a file that contains the locations of pair-ended FASTQ files. Each line must contain the location of the FASTQ files followed by a read group, each separated by a space. Each pair of files (and associated read group) must be on a separate line. Files must be in fastq/fastq.gz format. Line syntax: |
| I/O | --in-se-fq [IN_SE_FQ ...] | No | Path to the single-end FASTQ file. The file must be in fastq or fastq.gz format. |
| I/O | --in-se-fq-list IN_SE_FQ_LIST | No | Path to a file that contains the locations of single-ended FASTQ files. Each line must contain the location of the FASTQ files followed by a read group, each separated by a space. Each file (and associated read group) must be on a separate line. Files must be in fastq/fastq.gz format. Line syntax: |
| I/O | -d DIST_NAME, --dist-name DIST_NAME | Yes | Cluster using this distance index. |
| I/O | -m MINIMIZER_NAME, --minimizer-name MINIMIZER_NAME | Yes | Use this minimizer index. |
| I/O | -Z GBZ_NAME, --gbz-name GBZ_NAME | Yes | Map to this GBZ graph. |
| I/O | -z ZIPCODES_NAME, --zipcodes-name ZIPCODES_NAME | Yes | Use this zipcodes file for clustering. |
| I/O | -x XG_NAME, --xg-name XG_NAME | No | XG graph used for BAM output. |
| I/O | -g GRAPH_NAME, --graph-name GRAPH_NAME | No | GBWTGraph used for mapping. |
| I/O | -H GBWT_NAME, --gbwt-name GBWT_NAME | No | GBWT index for mapping. |
| I/O | --out-bam OUT_BAM | Yes | Path of a BAM file for output. |
| I/O | --ref-paths REF_PATHS | No | Path to file containing ordered list of paths in the graph, one per line or HTSlib .dict, for HTSlib @SQ headers. |
| I/O | --out-duplicate-metrics OUT_DUPLICATE_METRICS | No | Path of duplicate metrics file after marking duplicates. |
| Tool | --read-group READ_GROUP | No | Read group ID for this run. |
| Tool | --sample SAMPLE | No | Sample (SM) tag for read group in this run. |
| Tool | --read-group-library READ_GROUP_LIBRARY | No | Library (LB) tag for read group in this run. |
| Tool | --read-group-platform READ_GROUP_PLATFORM | No | Platform (PL) tag for read group in this run; refers to platform/technology used to produce reads. |
| Tool | --read-group-pu READ_GROUP_PU | No | Platform unit (PU) tag for read group in this run. |
| Tool | --prune-low-cplx | No | Prune short and low complexity anchors during linear format realignment. |
| Tool | --max-fragment-length MAX_FRAGMENT_LENGTH | No | Assume that fragment lengths should be smaller than MAX-FRAGMENT-LENGTH when estimating the fragment length distribution. |
| Tool | --fragment-mean FRAGMENT_MEAN | No | Force the fragment length distribution to have this mean. |
| Tool | --fragment-stdev FRAGMENT_STDEV | No | Force the fragment length distribution to have this standard deviation. |
| Tool | --align-only | No | Generate output BAM after vg-giraffe alignment. The output will not be coordinate sorted. |
| Tool | --copy-comment | No | Append FASTQ comment to BAM output via auxiliary tag. |
| Tool | --no-markdups | No | Do not perform the Mark Duplicates step. Return BAM after sorting. |
| Tool | --markdups-single-ended-start-end | No | Mark duplicates on single-ended reads by 5' and 3' end. |
| Tool | --ignore-rg-markdups-single-ended | No | Ignore read group info in marking duplicates on single-ended reads. This option must be used with --markdups-single-ended-start-end. |
| Tool | --markdups-assume-sortorder-queryname | No | Assume the reads are sorted by queryname for marking duplicates. This will mark secondary, supplementary, and unmapped reads as duplicates as well. This flag will not impact variant calling while increasing processing times. |
| Tool | --markdups-picard-version-2182 | No | Assume marking duplicates to be similar to Picard version 2.18.2. |
| Tool | --optical-duplicate-pixel-distance OPTICAL_DUPLICATE_PIXEL_DISTANCE | No | The maximum offset between two duplicate clusters in order to consider them optical duplicates. Ignored if --out-duplicate-metrics is not passed. |
| Tool | --monitor-usage | No | Monitor approximate CPU utilization and host memory usage during execution. |
| Tool | --max-read-length MAX_READ_LENGTH | No | Maximum read length/size (i.e., sequence length) used for giraffe and filtering FASTQ input. (default: 480) |
| Tool | --min-read-length MIN_READ_LENGTH | No | Minimum read length/size (i.e., sequence length) used for giraffe and filtering FASTQ input. (default: 1) |
| Performance | --nstreams NSTREAMS | No | Number of streams per GPU to use; use 'auto' to set from GPU and host memory (may enable low-memory, dozeu/minimizers for SE). Integer overrides. More streams increases device and host memory usage. (default: auto) |
| Performance | --num-cpu-threads-per-gpu NUM_CPU_THREADS_PER_GPU | No | Number of primary CPU threads to use per GPU. (default: 16) |
| Performance | --batch-size BATCH_SIZE | No | Batch size used for processing alignments. (default: 10000) |
| Performance | --write-threads WRITE_THREADS | No | Number of threads used for writing and pre-sorting output. (default: 4) |
| Performance | --gpuwrite | No | Use one GPU to accelerate writing final BAM/CRAM. |
| Performance | --gpuwrite-deflate-algo GPUWRITE_DEFLATE_ALGO | No | Choose the nvCOMP DEFLATE algorithm to use with --gpuwrite. Note these options do not correspond to CPU DEFLATE options. Valid options are 1, 2, and 4. Option 1 is fastest, while options 2 and 4 have progressively lower throughput but higher compression ratios. The default value is 1 when the user does not provide an input (i.e., None). |
| Performance | --gpusort | No | Use GPUs to accelerate sorting and marking. |
| Performance | --use-gds | No | Use GPUDirect Storage (GDS) to enable a direct data path for direct memory access (DMA) transfers between GPU memory and storage. Must be used concurrently with --gpuwrite. Please refer to Parabricks Documentation > Best Performance for information on how to set up and use GPUDirect Storage. |
| Performance | --memory-limit MEMORY_LIMIT | No | System memory limit in GBs during sorting and postsorting. By default, the limit is half of the total system memory. (default: 62) |
| Performance | --low-memory | No | Use low memory mode; will lower the number of streams per GPU and decrease the batch size. |
| Performance | --minimizers-gpu | No | (SE only) Use GPU for minimizers and seeds. (default: False) |
| Performance | --work-queue-capacity WORK_QUEUE_CAPACITY | No | Soft limit for the capacity of the work queues in between stages. (default: 40) |
| Runtime | --verbose | No | Enable verbose output. |
| Runtime | --x3 | No | Show full command line arguments. |
| Runtime | --logfile LOGFILE | No | Path to the log file. If not specified, messages will only be written to the standard error output. |
| Runtime | --tmp-dir TMP_DIR | No | Full path to the directory where temporary files will be stored. (default: .) |
| Runtime | --with-petagene-dir WITH_PETAGENE_DIR | No | Full path to the PetaGene installation directory. By default, this should have been installed at /opt/petagene. Use of this option also requires that the PetaLink library has been preloaded by setting the LD_PRELOAD environment variable. Optionally set the PETASUITE_REFPATH and PGCLOUD_CREDPATH environment variables that are used for data and credentials. Optionally set the PetaLinkMode environment variable that is used to further configure PetaLink, notably setting it to "+write" to enable outputting compressed BAM and .fastq files. |
| Runtime | --keep-tmp | No | Do not delete the directory storing temporary files after completion. |
| Runtime | --no-seccomp-override | No | Do not override seccomp options for docker. |
| Runtime | --version | No | View compatible software versions. |
| Runtime | --preserve-file-symlinks | No | Override default behavior to keep file symlinks intact and not resolve the symlink. |
| Runtime | --num-gpus NUM_GPUS | No | Number of GPUs to use for a run. (default: 1) |
[1] Jouni Sirén et al., Pangenomics enables genotyping of known structural variants in 5202 diverse genomes. Science 374, abg8871 (2021). DOI: 10.1126/science.abg8871
[2] Baseline VG Giraffe: https://github.com/vgteam/vg