rna_fq2bam - NVIDIA Docs

Spliced Transcripts Alignment to a Reference.

Quick Start

$ pbrun rna_fq2bam \
    --in-fq sample_X_1.fq.gz sample_X_2.fq.gz \
    --genome-lib-dir HG38 \
    --output-dir sample_X/ \
    --ref ref.fasta

Compatible CPU Command

The output from these commands is identical to the output from the Quick Start command above. See the Output Comparison page for how to compare the results.

# STAR alignment
$ ./STAR --genomeDir HG38 --readFilesIn sample_X_1.fq.gz sample_X_2.fq.gz \
    --readFilesCommand zcat \
    --outFileNamePrefix sample_X/ --outSAMtype BAM SortedByCoordinate

# Coordinate sorting
$ gatk SortSam --java-options -Xmx30g --MAX_RECORDS_IN_RAM=5000000 \
    -I=sample_X/Aligned.sortedByCoord.out.bam \
    -O=cpu.bam --SORT_ORDER=coordinate --TMP_DIR=/raid/myrun

# Mark duplicates
$ gatk MarkDuplicates --java-options -Xmx30g -I=cpu.bam -O=mark_dups_cpu.bam \
    -M=metrics.txt --TMP_DIR=/raid/myrun
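The three CPU steps above depend on each other's file names: with `--outSAMtype BAM SortedByCoordinate`, STAR writes `Aligned.sortedByCoord.out.bam` under its `--outFileNamePrefix`, and that file is the input to SortSam. The following is a minimal bash sketch, not an official script, that chains the steps, stops at the first failure, and derives each path from a single prefix. It defaults to a dry run that only prints the commands. The `run` helper, the `DRY_RUN` variable, and the `mkdir -p` step are hypothetical additions; the tool arguments mirror the commands above, plus STAR's `--readFilesCommand zcat` so it can read gzipped FASTQ.

```shell
#!/usr/bin/env bash
# Sketch: chain the CPU-equivalent steps with consistent paths.
# DRY_RUN=1 (the default in this sketch) prints each command instead of running it.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
prefix="sample_X/"      # STAR --outFileNamePrefix; ends in '/', so it names a directory
tmp_dir="/raid/myrun"   # scratch directory passed to GATK as --TMP_DIR

# Hypothetical helper: print the shell-quoted command in dry-run mode, otherwise execute it.
run() {
  if [[ "$DRY_RUN" == "1" ]]; then
    printf '%q ' "$@"
    printf '\n'
  else
    "$@"
  fi
}

# File name STAR uses for its coordinate-sorted BAM output.
star_bam="${prefix}Aligned.sortedByCoord.out.bam"

# Make sure the output directory named by the prefix exists before STAR writes to it.
run mkdir -p "$prefix"

run ./STAR --genomeDir HG38 --readFilesIn sample_X_1.fq.gz sample_X_2.fq.gz \
  --readFilesCommand zcat \
  --outFileNamePrefix "$prefix" --outSAMtype BAM SortedByCoordinate

run gatk SortSam --java-options -Xmx30g --MAX_RECORDS_IN_RAM=5000000 \
  -I="$star_bam" -O=cpu.bam --SORT_ORDER=coordinate --TMP_DIR="$tmp_dir"

run gatk MarkDuplicates --java-options -Xmx30g -I=cpu.bam -O=mark_dups_cpu.bam \
  -M=metrics.txt --TMP_DIR="$tmp_dir"
```

Running it with `DRY_RUN=0` executes the commands; because of `set -e`, a failed STAR run stops the script before SortSam is attempted on a missing BAM.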

rna_fq2bam Reference

Run RNA-seq data through the fq2bam pipeline. It runs the STAR aligner, coordinate sorting, and duplicate marking.

Input/Output file options

--ref REF
--in-fq [IN_FQ [IN_FQ ...]]
--in-se-fq [IN_SE_FQ [IN_SE_FQ ...]]
--genome-lib-dir GENOME_LIB_DIR
--output-dir OUTPUT_DIR
--out-bam OUT_BAM

Options specific to this tool

--num-threads NUM_THREADS
--out-prefix OUT_PREFIX
--read-files-command READ_FILES_COMMAND
--read-group-sm READ_GROUP_SM
--read-group-lb READ_GROUP_LB
--read-group-pl READ_GROUP_PL
--read-group-id-prefix READ_GROUP_ID_PREFIX
--num-sa-bases NUM_SA_BASES
--max-intron-size MAX_INTRON_SIZE
--min-intron-size MIN_INTRON_SIZE
--min-match-filter MIN_MATCH_FILTER
--min-match-filter-normalized MIN_MATCH_FILTER_NORMALIZED
--out-filter-intron-motifs OUT_FILTER_INTRON_MOTIFS
--max-out-filter-mismatch MAX_OUT_FILTER_MISMATCH
--max-out-filter-mismatch-ratio MAX_OUT_FILTER_MISMATCH_RATIO
--max-out-filter-multimap MAX_OUT_FILTER_MULTIMAP
--out-reads-unmapped OUT_READS_UNMAPPED
--out-sam-unmapped OUT_SAM_UNMAPPED
--out-sam-attributes OUT_SAM_ATTRIBUTES [OUT_SAM_ATTRIBUTES ...]
--out-sam-strand-field OUT_SAM_STRAND_FIELD
--out-sam-mode OUT_SAM_MODE
--out-sam-mapq-unique OUT_SAM_MAPQ_UNIQUE
--min-score-filter MIN_SCORE_FILTER
--min-spliced-mate-length MIN_SPLICED_MATE_LENGTH
--max-junction-mismatches MAX_JUNCTION_MISMATCHES MAX_JUNCTION_MISMATCHES MAX_JUNCTION_MISMATCHES MAX_JUNCTION_MISMATCHES
--max-out-read-size MAX_OUT_READ_SIZE
--max-alignments-per-read MAX_ALIGNMENTS_PER_READ
--score-gap SCORE_GAP
--seed-search-start SEED_SEARCH_START
--max-bam-sort-memory MAX_BAM_SORT_MEMORY
--align-ends-type ALIGN_ENDS_TYPE
--align-insertion-flush ALIGN_INSERTION_FLUSH
--max-align-mates-gap MAX_ALIGN_MATES_GAP
--min-align-spliced-mate-map MIN_ALIGN_SPLICED_MATE_MAP
--max-collapsed-junctions MAX_COLLAPSED_JUNCTIONS
--min-align-sj-overhang MIN_ALIGN_SJ_OVERHANG
--min-align-sjdb-overhang MIN_ALIGN_SJDB_OVERHANG
--sjdb-overhang SJDB_OVERHANG
--min-chim-overhang MIN_CHIM_OVERHANG
--min-chim-segment MIN_CHIM_SEGMENT
--max-chim-multimap MAX_CHIM_MULTIMAP
--chim-multimap-score-range CHIM_MULTIMAP_SCORE_RANGE
--chim-score-non-gtag CHIM_SCORE_NON_GTAG
--min-non-chim-score-drop MIN_NON_CHIM_SCORE_DROP
--out-chim-format OUT_CHIM_FORMAT
--two-pass-mode TWO_PASS_MODE
--out-chim-type OUT_CHIM_TYPE
--read-name-separator READ_NAME_SEPARATOR [READ_NAME_SEPARATOR ...]

Common options

--logfile LOGFILE
--tmp-dir TMP_DIR
--with-petagene-dir WITH_PETAGENE_DIR
--keep-tmp
--license-file LICENSE_FILE
--no-seccomp-override
--version

GPU options

--num-gpus NUM_GPUS
--gpu-devices GPU_DEVICES
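The option lists above name the flags without describing them. As a rough illustration of how several of them might be combined, the following bash sketch assembles an rna_fq2bam invocation in an array and prints it rather than running it. The helper name, the read-group values, the GPU count, the scratch directory, and the log file name are hypothetical placeholders. That `--read-group-sm`, `--read-group-lb`, and `--read-group-pl` set the SM, LB, and PL read-group tags is inferred from the option names, not stated on this page.

```shell
#!/usr/bin/env bash
# Dry-run sketch: assemble a pbrun rna_fq2bam command for one sample and print it.
# Flag names come from the option lists above; every value is a hypothetical placeholder.
set -euo pipefail

# Hypothetical helper: print the command for sample $1, using $2 GPUs (default 1).
print_rna_fq2bam_cmd() {
  local sample="$1" gpus="${2:-1}"
  local -a cmd
  cmd=(
    pbrun rna_fq2bam
    --ref ref.fasta
    --in-fq "${sample}_1.fq.gz" "${sample}_2.fq.gz"
    --genome-lib-dir HG38
    --output-dir "${sample}/"
    --read-group-sm "${sample}"   # assumed to set the SM read-group tag
    --read-group-lb lib1          # assumed to set the LB read-group tag
    --read-group-pl ILLUMINA      # assumed to set the PL read-group tag
    --num-gpus "${gpus}"
    --tmp-dir /raid/myrun         # hypothetical scratch directory
    --logfile "${sample}.log"
  )
  # %q shell-quotes each argument so the printed line can be pasted back into a shell.
  printf '%q ' "${cmd[@]}"
  printf '\n'
}

print_rna_fq2bam_cmd sample_X 2
```

Building the command as an array keeps each argument intact even if a sample name contains spaces; to actually run it, the helper could execute `"${cmd[@]}"` instead of printing it.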

Note

The --in-fq option takes the names of two FASTQ files, optionally followed by a quoted read group. The FASTQ filenames must not start with a hyphen.
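The Note above describes the `--in-fq` argument layout. The following minimal bash sketch runs two pre-flight checks for it: it rejects FASTQ names that begin with a hyphen and builds a read-group string to pass as the optional third argument. The helper names and the read-group field values are hypothetical. The `@RG\tID:...` form with literal `\t` separators is an assumption modeled on the SAM header read-group line; this page does not specify the format.

```shell
#!/usr/bin/env bash
# Sketch: pre-flight checks for the --in-fq arguments described in the Note.
# Assumption: the optional read group is an "@RG\tID:...\t..." string with literal
# backslash-t separators; all field values used here are hypothetical.
set -euo pipefail

# Hypothetical helper: print a read-group string from ID, LB, PL, SM, and PU values.
make_read_group() {
  # '\\t' in a printf format emits a literal backslash followed by 't'.
  printf '@RG\\tID:%s\\tLB:%s\\tPL:%s\\tSM:%s\\tPU:%s' "$1" "$2" "$3" "$4" "$5"
}

# Hypothetical helper: fail if any FASTQ name starts with a hyphen (the Note disallows this).
check_fastq_names() {
  local fq
  for fq in "$@"; do
    if [[ "$fq" == -* ]]; then
      echo "error: FASTQ file name must not start with '-': $fq" >&2
      return 1
    fi
  done
}

fq1=sample_X_1.fq.gz
fq2=sample_X_2.fq.gz
check_fastq_names "$fq1" "$fq2"
rg=$(make_read_group sample_X_rg1 lib1 ILLUMINA sample_X unit1)

# Print the --in-fq portion of a command line, with the read group quoted as one argument.
printf '%s\n' "--in-fq $fq1 $fq2 \"$rg\""
```

The quoted read group then follows the two FASTQ names on the real command line, for example `--in-fq sample_X_1.fq.gz sample_X_2.fq.gz "$rg"`.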