Release Notes#
PVA SDK v2.7.1#
PVA SDK 2.7.1 is a patch release for PVA SDK 2.7. PVA SDK 2.7 is a feature release which adds support for improved internal driver APIs and various new runtime features, described in these release notes.
Supported Platforms and Devices#
Host#
The PVA SDK is designed for cross-compilation from an x86-64 host development environment. Supported host operating systems for this release are:
Ubuntu 22.04
Ubuntu 24.04
The PVA SDK may be used on a supported OS installation or via Docker. Refer to Installation and PVA SDK samples for more details on host setup and dependencies.
Targets#
Supported targets are:
DRIVE OS 6.5 or greater
JetPack 6.0 or greater
Native x86
PVA Architectures#
This release of PVA SDK supports Orin and Thor PVA architectures.
New Features#
System API Support#
Added support for new system APIs which are present in driver version 2.7. Driver version 2.7 is expected to be available with DRIVE OS 7.0.3.0 and JetPack 7.0. These new system APIs offer a considerable performance improvement and enable new runtime functionalities. It is recommended to migrate to platforms that provide support for these APIs. Older driver versions will remain supported with this release.
Runtime API Features#
Introduced cupva::cuda::Submit(), a free function that allows submitting cuPVA workloads directly to CUDA streams. CUDA stream VPU affinity may be set via cupva::cuda::SetAffinity().
Added cupva::mem::MapL2(), which allows mapping DRAM device pointers to L2SRAM. CmdPrograms that use the mapped device pointer can automatically share persistent L2SRAM with other CmdPrograms, regardless of whether they are submitted as part of the same batch. Explicit coherency operations may also be inserted with cupva::CmdL2Ops.
Introduced cupvaGSDFUpdatePlaneIndices() for using cupva::GatherScatterDataFlow with multi-plane surfaces. This replaces cupvaGSDFUpdateTilesWithOffsets(), which is now deprecated.
Introduced host-side APIs to support cupvaGSDFUpdatePlaneIndices(), including new src and dst overloads which allow specifying PlanarGeometry.
Added new overloads for cupvaSQDFUpdateAddr() which allow updating source and destination line pitches along with addresses.
Samples#
Added a new sample, warp_gsdf, which demonstrates using cupva::GatherScatterDataFlow for an image warping kernel.
Tools#
Introduced a new CMake global flag, PVA_DISASSEMBLE. Previously, all device code object files and executables were disassembled after building. Now, users should opt in to this behavior by passing -DPVA_DISASSEMBLE=ON to CMake.
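As a minimal sketch of opting in, the flag can be passed when configuring the build. The source directory (`.`) and build directory name (`build`) are assumptions for illustration; only the `-DPVA_DISASSEMBLE=ON` option comes from these release notes.

```shell
# Configure the project with disassembly of device code enabled.
# "." and "build" are placeholder source and build directories.
cmake -S . -B build -DPVA_DISASSEMBLE=ON

# Build as usual; disassembly of device object files and executables
# is expected to run only because the flag above was set.
cmake --build build
```

Because the value is stored in the CMake cache, an existing build directory keeps the setting until it is reconfigured with `-DPVA_DISASSEMBLE=OFF`.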
Deprecations and Compatibility#
cupva::cuda::CreateStream() and CupvaCudaCreateStream() are deprecated. Users are encouraged to instead submit directly to CUDA streams using cupva::cuda::Submit() and CupvaCudaStreamSubmit().
cupvaGSDFUpdateTilesWithOffsets() is deprecated. Use cupvaGSDFUpdatePlaneIndices() instead.
A cupva::SyncObj created with cupva::SyncWaitMode::SPIN can no longer be shared between multiple cupva::Context objects in the same process. Use cupva::SyncWaitMode::YIELD or NvSciSync for these use cases.
Previously, the last cupva::CmdRequestFences submitted as part of a batch to a profiling stream did not count as part of the batch being processed. Now, such a Cmd is profiled as part of the profiling stream batch. Therefore, waiting on such a fence no longer guarantees that the batch has completed and its profiling data is available. Instead, users should submit a separate fence to the profiling stream and wait on it to ensure all prior batches have completed before reading the profiling data.
In failing submissions containing more than one command, only the failing program will report the reason for failure in its output status. Other programs submitted near the failing command in the same submission may report cupva::Error::AbortedCmdBuffer. In cupva::OrderType::OUT_OF_ORDER mode, this does not necessarily indicate that the command itself was aborted, as it may have been partially or fully executed before the abort occurred.
The APIs that allocate cuPVA device pointers directly no longer accept access modes other than cupva::mem::AccessType::READ_WRITE. This affects cupva::mem::Alloc() and CupvaMemAlloc(). Trying to pass other access modes is always an application error and is now reported as such. The accessType parameter will be removed in a future release.
Bug Fixes#
The following bugs are resolved with this release.
Reference | Description | Fix information | Affected platforms |
---|---|---|---|
PVAAS-14050 | | Fixed with driver version >=2.7 | Orin and Thor |
PVAAS-16962 | Some parameter configurations to PVA APL APIs were giving incorrect outputs. | Fixed | Orin and Thor |
PVAAS-20713 | PVA failure during NvSciSyncObj import | Fixed | Orin and Thor |
Known Issues#
This release contains the following known issues:
Reference | Description | Suggested workaround | Projected closure |
---|---|---|---|
PVAAS-16827 | When linking StaticDataFlows with constant padding, the constant value must be the same even if px/py are both zero for one of the dataflows. Failure to do so leads to Dataflow compilation failing. | Use the same constant value in these cases. | Not planned |
PVAAS-12732 | A StaticDataFlow specifying VMEM->VMEM transfer cannot be linked to a StaticDataFlow specifying DRAM->VMEM transfer. | Use unlinked DataFlows for these cases. | Not planned |
PVAAS-6633 | NvSci APIs and CUDA interoperability are not supported for native. | Decouple interop from execution to allow testing kernels in isolation in native. | Not planned |
PVAAS-16828 | Multiple | Use separate handlers for such cases. | Not planned |
PVAAS-16829 | Chess compiler may optimize away some VMEM store instructions when calling dataflow Open/Update and Trig APIs back to back, leading to data corruption. | Insert a chess_memory_fence() between Open and Trig APIs. | Future feature release |
PVAAS-17465 | When using a single UnifiedRDFHandler for multiple tile buffers in circular buffer mode, only the tilebuffer specified to | Explicitly copy 64B of secondary tilebuffers using | Future feature release |