PVA VPU compiler#

PVA SDK uses the Synopsys CHESS compiler to build VPU code. The CHESS compiler has two front ends.

  • The CHESS front end (noodle) is the legacy front end. It supports the C language and a limited set of C++ features, such as references and function and operator overloading.

  • The LLVM front end is a newer, Clang-based front end that conforms to recent C and C++ language standards. It supports C++17 with one restriction: C++ exceptions are not supported.

LLVM Migration Guide#

Function Inlining Control#

The LLVM front end performs automatic function inlining. It may decide not to inline a function declared with the inline specifier, or it may inline a function that is not declared inline. The following annotations let you control function inlining in the LLVM front end:

  • Force the LLVM front end to inline a function using the always_inline annotation.

    [[gnu::always_inline]] inline int foo(int x)
    { ... }
    
  • Instruct the LLVM front end never to inline a function using the noinline annotation.

    [[gnu::noinline]] int foo(int x)
    { ... }
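The two annotations can be tried out on a host compiler before moving to the VPU build. The sketch below is a minimal, host-compilable example, assuming a GCC- or Clang-compatible compiler that understands the `[[gnu::...]]` attribute namespace; the functions `scale`, `offset`, and `scale_then_offset` are hypothetical and only show where the annotations go.

```cpp
#include <cassert>

// Requests that the compiler always expand this function at its call sites,
// even where its own heuristics would decide otherwise.
[[gnu::always_inline]] inline int scale(int x) { return 3 * x; }

// Requests that the compiler never inline this function, so it remains a
// separate function with its own symbol.
[[gnu::noinline]] int offset(int x) { return x + 7; }

// Caller: scale() is expected to be expanded here, offset() to stay a call.
int scale_then_offset(int x) { return offset(scale(x)); }
```

Because noinline keeps a function as a distinct symbol, it is also useful when a function must appear by name in a custom BCF file. The attributes only affect code generation, not results: `scale_then_offset(2)` returns 13 either way.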
    

Memory Fence#

The LLVM front end can optimize more aggressively than the CHESS front end, especially across inline functions. In some cases it may optimize away VMEM store instructions, for example when it fails to detect a data dependency between a VMEM store and a subsequent gpo_set instruction. To prevent this, insert a chess_memory_fence before the gpo_set instruction.

Prevent Breaking Up Load and AGEN Restore Operations#

The LLVM front end may optimize aggressively and split the load and AGEN restore operations apart, which causes compilation to fail. To resolve this issue, use the chess_protect_access annotation to protect accesses to AGEN variables:

chess_protect_access agen_A in1 = init_agen_A_from_cfg(agens.cfg[0]);

Vector Demotion Operation with Interleaving#

The LLVM front end does not support passing the .lo and .hi fields of an XARF (extended accumulator register file) double-vector type by reference to some intrinsics, for example vdemote_i, which demotes from VRF to XARF:

void vdemote_i(vintx src1, vintx src2, xvshortx &dst);

To work around this issue, declare a single-vector XARF variable and pass it to vdemote_i as the destination.

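dvintx a;                  // double-vector VRF source
xvshortx b;                // single-vector XARF destination
vdemote_i(a.lo, a.hi, b);  // pass b, not a .lo/.hi field of a double XARF vector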

Function Name Mangling#

The CHESS linker (bridge) uses a BCF file to control linking and memory layout for VPU ELFs. VPU developers may create a custom BCF to control VPU code layout for performance reasons; doing so requires using mangled function names.

The noodle front end uses the CHESS function name mangling scheme to support function overloading; this scheme is enabled by default in PVA SDK. The LLVM front end uses the standard Itanium C++ name mangling scheme instead. To migrate an existing VPU kernel with a custom BCF file to the LLVM front end, you need to convert all mangled function names in the BCF to LLVM C++ symbol names. PVA SDK provides a Python script to facilitate this conversion:

pvasdkConvertToLLVMBCF.py
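To get a feel for what Itanium-mangled names look like, they can be decoded on a host machine. The sketch below is a minimal example, assuming a GCC or Clang host toolchain that provides `abi::__cxa_demangle` in `<cxxabi.h>`; the `demangle` helper and the `foo` overloads in the comments are hypothetical, and the exact way symbols are written inside a BCF file is not shown here. It is only an aid for reading names; pvasdkConvertToLLVMBCF.py remains the conversion tool.

```cpp
#include <cassert>
#include <cstdlib>   // std::free
#include <cxxabi.h>  // abi::__cxa_demangle (GCC/Clang host toolchains)
#include <string>

// Itanium mangling encodes the parameter types, so overloads get distinct
// symbols, for example:
//   int foo(int)   -> _Z3fooi
//   int foo(float) -> _Z3foof
// Functions declared extern "C" are not mangled.

// Hypothetical helper: returns the readable form of an Itanium-mangled name,
// or the input unchanged if it is not a valid mangled name.
std::string demangle(const char* mangled) {
    int status = 0;
    char* readable = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) {
        return mangled;
    }
    std::string result(readable);
    std::free(readable);  // __cxa_demangle allocates with malloc
    return result;
}
```

For example, `demangle("_Z3foof")` yields `foo(float)`, which helps when checking that a converted BCF refers to the intended overload.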

Assume that you have a cuPVA device target foo with an existing noodle BCF file, foo.bcf. Use the following steps to convert the noodle BCF to an LLVM BCF with the script.

  1. Copy the foo.bcf to foo_noodle.bcf.

  2. Remove or disable all lines starting with _symbol in foo.bcf.

  3. Build the foo target to generate the foo .lst file, which is needed in step 4.

  4. Run pvasdkConvertToLLVMBCF.py to convert noodle symbol names to C++ symbol names.

    pvasdkConvertToLLVMBCF.py -i foo_noodle.bcf -l <FOO_LST_FILE> -o foo.bcf
    
  5. Rebuild the foo target with the LLVM front end and the converted foo.bcf.
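Step 2 can be done in a text editor. As a minimal sketch of the same filtering, the hypothetical helper `strip_symbol_lines` below (not part of PVA SDK) removes every line whose first non-blank text starts with `_symbol`. It removes the lines rather than commenting them out because the BCF comment syntax is not described here, and it assumes one directive per line.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: returns the BCF text with all lines that start with
// "_symbol" (after optional spaces or tabs) removed. Other lines are kept as-is.
std::string strip_symbol_lines(const std::string& bcf) {
    std::istringstream in(bcf);
    std::string line;
    std::string out;
    while (std::getline(in, line)) {
        const auto first = line.find_first_not_of(" \t");
        const bool is_symbol_line =
            first != std::string::npos && line.compare(first, 7, "_symbol") == 0;
        if (!is_symbol_line) {
            out += line;
            out += '\n';
        }
    }
    return out;
}
```

Keeping the untouched copy foo_noodle.bcf from step 1 matters here, because the removed _symbol lines are exactly the input that pvasdkConvertToLLVMBCF.py converts in step 4.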

Note

CHESS name mangling is a proprietary Synopsys scheme. The pvasdkConvertToLLVMBCF.py script may not handle all mangling cases.

Performance Optimization Tips#

The CHESS compiler provides the chess_guard annotation for predicated execution, but the LLVM front end does not currently support it. Instead of mapping the operations inside if-statements annotated with chess_guard onto guarded instructions, the LLVM front end turns these if-statements into real branches (jumps). This can hurt performance, especially when the if-statements are inside a loop.

To avoid this, rewrite the code where possible to remove such if-statements and use ternary operators instead.
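The rewrite can be illustrated with plain C++. The sketch below is host-compilable and hedged: the functions `clamp_negatives_if` and `clamp_negatives_ternary` are hypothetical, chess_guard itself is not shown, and whether the ternary form is actually lowered to a guarded or select-style VPU instruction is not guaranteed, so check the generated assembly.

```cpp
#include <cassert>

// Branchy form: the condition guards a statement inside the loop, which the
// LLVM front end may compile into a real jump on every iteration.
void clamp_negatives_if(const int* in, int* out, int n) {
    for (int i = 0; i < n; ++i) {
        int v = in[i];
        if (v < 0) {
            v = 0;
        }
        out[i] = v;
    }
}

// Branch-free form: the ternary operator selects between two values, which
// gives the compiler the chance to use a select instead of a jump.
void clamp_negatives_ternary(const int* in, int* out, int n) {
    for (int i = 0; i < n; ++i) {
        out[i] = (in[i] < 0) ? 0 : in[i];
    }
}
```

Both functions compute the same result; only the control flow presented to the compiler differs.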