Clara Parabricks 4.2.1

rna_fq2bam

This tool is the equivalent of fq2bam for RNA-Seq samples. It takes FASTQ input, aligns the reads with the splice-aware STAR algorithm, optionally marks duplicate reads, and outputs an aligned BAM file ready for variant and fusion calling.

Quick Start

# This command assumes all the inputs are in INPUT_DIR and all the outputs go to OUTPUT_DIR.
docker run --rm --gpus all --volume INPUT_DIR:/workdir --volume OUTPUT_DIR:/outputdir \
    --workdir /workdir \
    nvcr.io/nvidia/clara/clara-parabricks:4.2.1-1 \
    pbrun rna_fq2bam \
    --in-fq /workdir/${INPUT_FASTQ_1} /workdir/${INPUT_FASTQ_2} \
    --genome-lib-dir /workdir/${PATH_TO_GENOME_LIBRARY}/ \
    --output-dir /outputdir/${PATH_TO_OUTPUT_DIRECTORY} \
    --ref /workdir/${REFERENCE_FILE} \
    --out-bam /outputdir/${OUTPUT_BAM} \
    --read-files-command zcat
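The `--read-files-command zcat` option makes STAR decompress each input with `zcat`, which is only appropriate for gzip-compressed FASTQ files. The following is a minimal bash sketch, assuming that a `.gz` suffix reliably marks gzip input, of deciding whether to pass the option. The function name `read_files_args` and the file names are hypothetical and not part of Parabricks.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the --read-files-command option (one word per
# line) only when every FASTQ argument ends in .gz. Assumes the .gz suffix
# reliably identifies gzip-compressed input.
read_files_args() {
    local f gz=0
    for f in "$@"; do
        case "$f" in
            *.gz) gz=$((gz + 1)) ;;
        esac
    done
    if [ "$#" -gt 0 ] && [ "$gz" -eq "$#" ]; then
        printf '%s\n' "--read-files-command" "zcat"
    elif [ "$gz" -eq 0 ]; then
        :   # uncompressed FASTQ: omit the option entirely
    else
        echo "error: mix of compressed and uncompressed FASTQ files" >&2
        return 1
    fi
}

# Example with hypothetical file names; mapfile splits the output into words.
mapfile -t extra < <(read_files_args sample_R1.fastq.gz sample_R2.fastq.gz)
echo "${extra[@]}"   # prints: --read-files-command zcat
```

The resulting array can be appended to the `pbrun rna_fq2bam` command line as `"${extra[@]}"`. A mixed set of inputs is treated as an error because a single read command is applied to every file.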

Compatible CPU Command

The output from these commands will be identical to the output from the above command. See the Output Comparison page for details on comparing the results.

# STAR Alignment
$ ./STAR \
      --genomeDir <INPUT_DIR>/${PATH_TO_GENOME_LIBRARY} \
      --readFilesIn <INPUT_DIR>/${INPUT_FASTQ_1} <INPUT_DIR>/${INPUT_FASTQ_2} \
      --outFileNamePrefix <OUTPUT_DIR>/${PATH_TO_OUTPUT_DIRECTORY}/ \
      --outSAMtype BAM SortedByCoordinate \
      --readFilesCommand zcat

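# Mark Duplicates
# STAR names its coordinate-sorted output Aligned.sortedByCoord.out.bam and
# writes it under the --outFileNamePrefix used above.
$ gatk MarkDuplicates \
    --java-options "-Xmx30g" \
    -I <OUTPUT_DIR>/${PATH_TO_OUTPUT_DIRECTORY}/Aligned.sortedByCoord.out.bam \
    -O <OUTPUT_DIR>/${NAME_OF_OUTPUT_BAM_FILE} \
    -M metrics.txt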
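STAR builds its output file names by prepending the `--outFileNamePrefix` value verbatim, so the trailing slash in the prefix above is what places `Aligned.sortedByCoord.out.bam` inside the output directory. The following is a minimal bash sketch of that naming rule; `star_sorted_bam` is a hypothetical helper and the directory names are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compute where STAR writes its coordinate-sorted BAM
# (--outSAMtype BAM SortedByCoordinate) for a given --outFileNamePrefix.
# STAR concatenates the prefix and the file name with no separator.
star_sorted_bam() {
    printf '%sAligned.sortedByCoord.out.bam\n' "$1"
}

star_sorted_bam results/sample1/   # prints: results/sample1/Aligned.sortedByCoord.out.bam
star_sorted_bam results/sample1    # prints: results/sample1Aligned.sortedByCoord.out.bam
```

The second call shows the common pitfall: without the trailing slash, STAR writes next to the intended directory rather than inside it. The printed path is what `gatk MarkDuplicates -I` should receive.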

Note

Make sure the installed version of STAR is the same version that was used to build the genome index.

The Parabricks version of STAR is compatible with the 2.7.2a CPU-only version of STAR.

rna_fq2bam Reference

Run RNA-Seq data through the fq2bam pipeline. It runs the STAR aligner, coordinate sorting, and duplicate marking.

Input/Output file options:

--ref REF
--in-fq [IN_FQ [IN_FQ ...]]
--in-se-fq [IN_SE_FQ [IN_SE_FQ ...]]
--genome-lib-dir GENOME_LIB_DIR
--output-dir OUTPUT_DIR
--out-bam OUT_BAM

Tool Options:

--out-prefix OUT_PREFIX
--read-files-command READ_FILES_COMMAND
--read-group-sm READ_GROUP_SM
--read-group-lb READ_GROUP_LB
--read-group-pl READ_GROUP_PL
--read-group-id-prefix READ_GROUP_ID_PREFIX
--num-sa-bases NUM_SA_BASES
--max-intron-size MAX_INTRON_SIZE
--min-intron-size MIN_INTRON_SIZE
--min-match-filter MIN_MATCH_FILTER
--min-match-filter-normalized MIN_MATCH_FILTER_NORMALIZED
--out-filter-intron-motifs OUT_FILTER_INTRON_MOTIFS
--max-out-filter-mismatch MAX_OUT_FILTER_MISMATCH
--max-out-filter-mismatch-ratio MAX_OUT_FILTER_MISMATCH_RATIO
--max-out-filter-multimap MAX_OUT_FILTER_MULTIMAP
--out-reads-unmapped OUT_READS_UNMAPPED
--out-sam-unmapped OUT_SAM_UNMAPPED
--out-sam-attributes OUT_SAM_ATTRIBUTES [OUT_SAM_ATTRIBUTES ...]
--out-sam-strand-field OUT_SAM_STRAND_FIELD
--out-sam-mode OUT_SAM_MODE
--out-sam-mapq-unique OUT_SAM_MAPQ_UNIQUE
--min-score-filter MIN_SCORE_FILTER
--min-spliced-mate-length MIN_SPLICED_MATE_LENGTH
--max-junction-mismatches MAX_JUNCTION_MISMATCHES MAX_JUNCTION_MISMATCHES MAX_JUNCTION_MISMATCHES MAX_JUNCTION_MISMATCHES
--max-out-read-size MAX_OUT_READ_SIZE
--max-alignments-per-read MAX_ALIGNMENTS_PER_READ
--score-gap SCORE_GAP
--seed-search-start SEED_SEARCH_START
--max-bam-sort-memory MAX_BAM_SORT_MEMORY
--align-ends-type ALIGN_ENDS_TYPE
--align-insertion-flush ALIGN_INSERTION_FLUSH
--max-align-mates-gap MAX_ALIGN_MATES_GAP
--min-align-spliced-mate-map MIN_ALIGN_SPLICED_MATE_MAP
--max-collapsed-junctions MAX_COLLAPSED_JUNCTIONS
--min-align-sj-overhang MIN_ALIGN_SJ_OVERHANG
--min-align-sjdb-overhang MIN_ALIGN_SJDB_OVERHANG
--sjdb-overhang SJDB_OVERHANG
--min-chim-overhang MIN_CHIM_OVERHANG
--min-chim-segment MIN_CHIM_SEGMENT
--max-chim-multimap MAX_CHIM_MULTIMAP
--chim-multimap-score-range CHIM_MULTIMAP_SCORE_RANGE
--chim-score-non-gtag CHIM_SCORE_NON_GTAG
--min-non-chim-score-drop MIN_NON_CHIM_SCORE_DROP
--out-chim-format OUT_CHIM_FORMAT
--two-pass-mode TWO_PASS_MODE
--out-chim-type OUT_CHIM_TYPE
--no-markdups
--read-name-separator READ_NAME_SEPARATOR [READ_NAME_SEPARATOR ...]
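Tool options are ordinary command-line flags appended to `pbrun rna_fq2bam`. The following is a minimal bash sketch, using hypothetical sample, library, and platform values, of collecting the optional read-group flags and the `--no-markdups` switch into an array so they can be toggled per run. The function `build_tool_opts` and the array `TOOL_OPTS` are illustrative names, not Parabricks settings; only the flag names come from the list above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: collect optional rna_fq2bam tool flags into the
# global array TOOL_OPTS. Empty arguments are skipped; pass "yes" as the
# fourth argument to disable duplicate marking.
build_tool_opts() {
    local sample="$1" library="$2" platform="$3" skip_markdups="$4"
    TOOL_OPTS=()
    [ -n "$sample" ]   && TOOL_OPTS+=(--read-group-sm "$sample")
    [ -n "$library" ]  && TOOL_OPTS+=(--read-group-lb "$library")
    [ -n "$platform" ] && TOOL_OPTS+=(--read-group-pl "$platform")
    [ "$skip_markdups" = "yes" ] && TOOL_OPTS+=(--no-markdups)
    return 0   # the tests above may return 1; the function itself succeeded
}

# Example with placeholder values:
build_tool_opts sample1 lib1 ILLUMINA no
echo "${TOOL_OPTS[@]}"
# prints: --read-group-sm sample1 --read-group-lb lib1 --read-group-pl ILLUMINA
```

The array would then be expanded as `"${TOOL_OPTS[@]}"` at the end of the `pbrun rna_fq2bam` command, which keeps values containing spaces intact as single arguments.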

Performance Options:

--num-threads NUM_THREADS

Common options:

--logfile LOGFILE
--tmp-dir TMP_DIR
--with-petagene-dir WITH_PETAGENE_DIR
--keep-tmp
--no-seccomp-override
--version

GPU options:

--num-gpus NUM_GPUS

Note

The --in-fq option takes the names of two FASTQ files, optionally followed by a quoted read group. The FASTQ filenames must not start with a hyphen.
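These constraints can be checked before launching a run. The following is a minimal bash sketch of a hypothetical `check_in_fq` helper that accepts exactly two FASTQ paths plus an optional read-group string (passed as one quoted argument) and rejects any FASTQ argument that starts with a hyphen. As a conservative assumption, it also rejects paths whose final file name component starts with a hyphen. It does not validate the contents of the read group.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for --in-fq arguments:
#   check_in_fq FASTQ_1 FASTQ_2 [READ_GROUP]
# Returns 0 if the arguments look acceptable, 1 otherwise.
check_in_fq() {
    if [ "$#" -lt 2 ] || [ "$#" -gt 3 ]; then
        echo "error: expected two FASTQ files and an optional read group" >&2
        return 1
    fi
    local fq
    for fq in "$1" "$2"; do
        # Leading hyphen in the argument itself.
        case "$fq" in
            -*) echo "error: FASTQ name starts with '-': $fq" >&2; return 1 ;;
        esac
        # Leading hyphen in the base name (conservative assumption).
        case "${fq##*/}" in
            -*) echo "error: FASTQ name starts with '-': $fq" >&2; return 1 ;;
        esac
    done
    return 0
}

# Example with hypothetical paths:
check_in_fq /workdir/sample_R1.fastq.gz /workdir/sample_R2.fastq.gz && echo ok
```

The optional third argument is only counted, not parsed, so its quoting must still be correct on the real `pbrun` command line.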