PVA VPU compiler#
PVA SDK uses the Synopsys CHESS compiler to build VPU code. The CHESS compiler has two front ends.
The CHESS front end (noodle) is the legacy front end. It supports the C language and a limited set of C++ features, such as references and function and operator overloading.
The LLVM front end is a newer Clang-based front end. It conforms to recent C and C++ language standards and supports C++17 with one restriction: C++ exceptions are not supported.
LLVM Migration Guide#
Function Inlining Control#
The LLVM front end supports automatic function inlining. It may decide not to inline a function declared with the inline specifier, or to inline a function declared without it. The following annotations override the LLVM front end's inlining decisions:
Force the LLVM front end to inline a function using the always_inline annotation.
[[gnu::always_inline]] inline int foo(int x) { ... }
Instruct the LLVM front end to never inline a function using the noinline annotation.
[[gnu::noinline]] int foo(int x) { ... }
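As a minimal sketch of how the two annotations combine, the following host-side example marks a small helper as always_inline and its caller as noinline. The function names scale_by_two and accumulate are hypothetical and are not part of the PVA SDK. The example assumes a GCC- or Clang-compatible compiler and does not use any VPU intrinsics.

```cpp
// Hypothetical helper: small enough that inlining it into the loop body
// avoids call overhead on every iteration.
[[gnu::always_inline]] inline int scale_by_two(int x) {
    return 2 * x;
}

// Hypothetical caller: kept as a separate function so that it remains a
// distinct symbol (for example, for profiling or code placement).
[[gnu::noinline]] int accumulate(const int* data, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += scale_by_two(data[i]);  // helper is inlined here
    }
    return sum;
}
```

The annotations change only how the code is generated, not what it computes, so the result is the same with or without them.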
Memory Fence#
The LLVM front end can perform more aggressive optimizations than the CHESS front end, especially for inline functions.
In some cases, the LLVM front end may optimize away VMEM store instructions, for example when it fails to detect a data dependency between a VMEM store and a gpo_set instruction. To resolve this issue, insert a chess_memory_fence before the gpo_set instruction.
Prevent Breaking-Up Load and AGEN Restore Operations#
The LLVM front end may optimize aggressively and separate the load and AGEN restore operations, which causes compilation to fail. To resolve this issue, use the chess_protect_access annotation to protect accesses to AGEN variables.
chess_protect_access agen_A in1 = init_agen_A_from_cfg(agens.cfg[0]);
Vector Demotion Operation with Interleaving#
The LLVM front end does not support passing the .lo and .hi fields of an XARF (extended accumulator register file) double-vector type by reference for some intrinsics, for example the vdemote_i variant from VRF to XARF.
void vdemote_i(vintx src1, vintx src2, xvshortx &dst);
To work around this issue, declare a single-vector XARF variable and pass it to vdemote_i.
dvintx a;
xvshortx b;
vdemote_i(a.lo, a.hi, b);
Function Name Mangling#
The Chess linker (bridge) uses a BCF file to control linking and memory layout for VPU ELFs. VPU developers can create a custom BCF to control VPU code layout for performance reasons, which requires the use of mangled function names.
The noodle front end uses the CHESS function name mangling scheme to support function overloading. By default, this name mangling scheme is enabled in PVA SDK. The LLVM front end uses standard Itanium name mangling instead. To migrate an existing VPU kernel with a custom BCF file to the LLVM front end, you need to convert all mangled function names in the BCF to LLVM C++ symbol names. PVA SDK provides a Python script to facilitate this conversion.
pvasdkConvertToLLVMBCF.py
Assume that you have a cuPVA device target foo with an existing noodle BCF file foo.bcf. The following steps use the script to convert the noodle BCF to an LLVM BCF.
1. Copy foo.bcf to foo_noodle.bcf.
2. Remove or disable all lines starting with _symbol in foo.bcf.
3. Build the foo target to generate the foo lst file, which is needed in step 4.
4. Run pvasdkConvertToLLVMBCF.py to convert noodle symbol names to C++ symbol names.
pvasdkConvertToLLVMBCF.py -i foo_noodle.bcf -l <FOO_LST_FILE> -o foo.bcf
5. Rebuild the foo target with the LLVM front end and the converted foo.bcf.
Note
Chess name mangling is a Synopsys proprietary name mangling scheme. The pvasdkConvertToLLVMBCF.py script may not handle all mangling cases.
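To get a feel for what Itanium-mangled names look like, the following host-side sketch demangles two overloads of a hypothetical function foo. It assumes a GCC- or Clang-based host toolchain, whose `<cxxabi.h>` header provides abi::__cxa_demangle. It is an illustration only and is not part of the PVA SDK. The symbol names that the VPU build actually emits should be taken from the generated lst file.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (Itanium C++ ABI, GCC/Clang)
#include <cstdlib>   // std::free
#include <string>

// Demangle an Itanium-mangled symbol name.
// Returns an empty string if the name cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    // With a null output buffer, the result is allocated with malloc.
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    std::string result;
    if (status == 0 && text != nullptr) {
        result = text;
    }
    std::free(text);  // free(nullptr) is a no-op
    return result;
}

// Two overloads of a hypothetical foo get distinct mangled names:
//   demangle("_Z3fooi")  yields "foo(int)"
//   demangle("_Z3foof")  yields "foo(float)"
```

Because the parameter types are encoded in the symbol, each overload needs its own entry when a custom BCF refers to functions by name.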
Performance Optimization Tips#
The CHESS compiler provides the chess_guard annotation for predicated execution. This annotation is currently not supported by the LLVM front end. The LLVM front end does not try to map the operations inside if-statements with chess_guard onto guarded instructions. Instead, it turns the if-statements into real branches using jumps, which can adversely impact performance, especially when the if-statements are inside a loop.
To resolve this issue, remove the if-statements where possible and use ternary operators instead.
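The rewrite from an if-statement to a ternary operator can be sketched in plain C++ as follows. The function names clamp_branch and clamp_select are hypothetical, and the example uses scalar host code rather than VPU intrinsics. Whether the compiler emits a select instead of a jump is not guaranteed, so it is worth checking the generated assembly.

```cpp
// Branch form: the conditional store inside the loop may be compiled
// into a jump on every iteration.
void clamp_branch(int* data, int n, int limit) {
    for (int i = 0; i < n; ++i) {
        if (data[i] > limit) {
            data[i] = limit;
        }
    }
}

// Ternary form: every iteration performs an unconditional store of a
// selected value, which gives the compiler a chance to avoid the branch.
void clamp_select(int* data, int n, int limit) {
    for (int i = 0; i < n; ++i) {
        const int v = data[i];
        data[i] = (v > limit) ? limit : v;
    }
}
```

Both functions produce the same result. The ternary form always writes to data[i], so it is only a valid replacement when an unconditional store is acceptable.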