NvSci Interoperability#

The NVIDIA NvStreams SDK, which includes the NvSciBuf and NvSciSync libraries, provides a common way of exchanging data between the various NVIDIA APIs used to program different hardware blocks.

These libraries allow resources to be allocated up front with access restrictions defined by the application(s), which is vital functionality for safety-critical systems. The NvSci libraries also allow resources to be exchanged between libraries that otherwise have no knowledge of each other.

NvSciBuf allows applications to allocate and exchange buffers in memory. NvSciSync allows applications to manage synchronization objects that coordinate when sequences of operations begin and end.

The cuPVA runtime provides NvSci Interoperability APIs so that PVA can be used harmoniously in a complex application involving multiple NVIDIA APIs. NvSciSync and NvSciBuf objects may be imported into cuPVA to allow synchronization and data sharing between the PVA engine and other engines.

We reuse the contrast stretching example described in the previous tutorial. The first stage of the algorithm, which computes a histogram to determine the input dynamic range of the image pixels, is performed by a separate CPU thread. The subsequent pixel processing stage, which stretches the image contrast, is mapped to the PVA.


Host Code#

  1. The main function starts by initializing the image buffer and parameter pointers and NvSci variables to nullptr.

    int main(int argc, char **argv)
    {
        int err = 0;
        if (GetAssetsDirectory(argc, argv, assetsDirectory, MAX_IMAGE_PATH_LENGTH) != 0)
        {
            return 1;
        }
    
        NvSciSyncModule nvSciSyncModule        = nullptr;
        NvSciBufModule nvSciBufModule          = nullptr;
        NvSciSyncCpuWaitContext cpuWaitContext = nullptr;
    
        NvSciBufObj image_nvsci = nullptr;
        uint8_t *image_cpu      = nullptr;
        uint8_t *image_pva      = nullptr;
    
        NvSciBufObj stretchParams_nvsci          = nullptr;
        ContrastStretchParams *stretchParams_cpu = nullptr;
        ContrastStretchParams *stretchParams_pva = nullptr;
    
        NvSciSyncObj cpuStartSyncObj_nvsci      = nullptr;
        NvSciSyncObj cpuCompletionSyncObj_nvsci = nullptr;
        NvSciSyncFence cpuStartFence_nvsci      = NvSciSyncFenceInitializer;
        NvSciSyncFence cpuCompletionFence_nvsci = NvSciSyncFenceInitializer;
    
        try
        {
    
  2. NvSciBuf and NvSciSync modules should be opened before invoking NvSciBuf and NvSciSync APIs. Modules represent the corresponding library’s instance created for the application and act as containers for other NvSciBuf and NvSciSync resources. Creating an NvSciSync CPU wait context is also required to enable waiting on NvSciSync fences within CPU threads.

            NVSCI_CALL(NvSciSyncModuleOpen(&nvSciSyncModule));
            NVSCI_CALL(NvSciSyncCpuWaitContextAlloc(nvSciSyncModule, &cpuWaitContext));
            NVSCI_CALL(NvSciBufModuleOpen(&nvSciBufModule));
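
    The NVSCI_CALL() macro used throughout the C++ host code is an error-handling wrapper whose definition is not shown in this tutorial. A minimal sketch, assuming the macro simply converts a failing NvSciError into a std::runtime_error that the try/catch block in main() handles, might be:

    // Hypothetical sketch of NVSCI_CALL for the C++ host code; the
    // tutorial's actual definition is not shown here.
    #define NVSCI_CALL(call)                                   \
        do                                                     \
        {                                                      \
            NvSciError e_ = (call);                            \
            if (e_ != NvSciError_Success)                      \
            {                                                  \
                throw std::runtime_error(#call " failed");     \
            }                                                  \
        } while (0)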
    
  3. NvSciBufObj instances that hold the image buffer and algorithm parameters are created in this step.

            CreateNvSciBuf(&image_nvsci, nvSciBufModule, IMAGE_SIZE * sizeof(uint8_t));
            CreateNvSciBuf(&stretchParams_nvsci, nvSciBufModule, sizeof(ContrastStretchParams));
    

    A buffer attribute list should be created and set for each accessor of the NvSci buffer. For instance, if two or more hardware engines want to access a common buffer (e.g., one engine is writing data into the buffer and the other engine is reading from the buffer), an attribute list should be created for each engine.

    Buffer type and size attributes should be set for both the CPU and the PVA. cuPVA supports importing the NvSciBuf types Image, RawBuffer, and Tensor. NvSciBufType_RawBuffer is used in this example since our image data has a single color plane and a pitch-linear layout. The cupva::nvsci::mem::FillAttributes() API fills in the PVA-specific attributes. The NeedCpuAccess and RequiredPerm attributes should also be set so that the CPU can access the created buffer.

    The image buffer is created with the NvSci NvSciBufObjAlloc() API call and should be freed using the NvSciBufObjFree() API at the end. You may refer to the NvSciBuf library documentation for further details.

    void CreateNvSciBuf(NvSciBufObj *bufobj, NvSciBufModule sciBufModule, int64_t size)
    {
        NvSciBufAttrList pvaAttrList          = NULL;
        NvSciBufAttrList cpuAttrList          = NULL;
        NvSciBufAttrList unreconciledLists[2] = {NULL};
        NvSciBufAttrList reconciled_attrlist  = NULL;
        NvSciBufAttrList conflictattrlist     = NULL;
        NvSciBufAttrValAccessPerm access_perm = NvSciBufAccessPerm_ReadWrite;
        NvSciBufType bufTypes[]               = {NvSciBufType_RawBuffer};
        bool cpu_access                       = true;
    
        // Setup CPU attrlist
        NvSciBufAttrKeyValuePair cpu_attr_kvp[] = {{NvSciBufGeneralAttrKey_Types, &bufTypes, sizeof(bufTypes)},
                                                   {NvSciBufRawBufferAttrKey_Size, &size, sizeof(size)},
                                                   {NvSciBufGeneralAttrKey_RequiredPerm, &access_perm, sizeof(access_perm)},
                                                   {NvSciBufGeneralAttrKey_NeedCpuAccess, &cpu_access, sizeof(cpu_access)}};
        const size_t num_cpu_kvp                = 4;
        NVSCI_CALL(NvSciBufAttrListCreate(sciBufModule, &cpuAttrList));
        NVSCI_CALL(NvSciBufAttrListSetAttrs(cpuAttrList, cpu_attr_kvp, num_cpu_kvp));
    
        // Setup PVA attrlist
        NvSciBufAttrKeyValuePair pva_attr_kvp[] = {{NvSciBufGeneralAttrKey_Types, (void *)&bufTypes, sizeof(bufTypes)},
                                                   {NvSciBufRawBufferAttrKey_Size, (void *)&size, sizeof(size)}};
        const size_t num_pva_kvp                = 2;
        NVSCI_CALL(NvSciBufAttrListCreate(sciBufModule, &pvaAttrList));
        NVSCI_CALL(NvSciBufAttrListSetAttrs(pvaAttrList, pva_attr_kvp, num_pva_kvp));
    
        // cupva call to signal that the buffer will be used by PVA
        cupva::nvsci::mem::FillAttributes(pvaAttrList);
    
        unreconciledLists[0] = pvaAttrList;
        unreconciledLists[1] = cpuAttrList;
        NVSCI_CALL(NvSciBufAttrListReconcile(unreconciledLists, 2U, &reconciled_attrlist, &conflictattrlist));
        NVSCI_CALL(NvSciBufObjAlloc(reconciled_attrlist, bufobj));
    
        NvSciBufAttrListFree(pvaAttrList);
        NvSciBufAttrListFree(cpuAttrList);
        NvSciBufAttrListFree(reconciled_attrlist);
        NvSciBufAttrListFree(conflictattrlist);
    };
    
  4. cuPVA device pointers for the image buffer and the algorithm parameters structure are created from the NvSciBufObj instances using the cupva::nvsci::mem::Import() API call. Importing an NvSci buffer creates a mapping that must be released by calling cupva::mem::Free() when the application has finished using the buffer.

            image_pva = (uint8_t *)cupva::nvsci::mem::Import(image_nvsci);
            image_cpu = (uint8_t *)mem::GetHostPointer(image_pva);
    
            stretchParams_pva = (ContrastStretchParams *)cupva::nvsci::mem::Import(stretchParams_nvsci);
            stretchParams_cpu = (ContrastStretchParams *)mem::GetHostPointer(stretchParams_pva);
    
  5. The image buffer is filled and the algorithm parameters are initialized in this step.

            if (ReadImageBuffer(inputImageName.c_str(), assetsDirectory, image_cpu, IMAGE_SIZE) != 0)
            {
                err = -1;
                throw std::runtime_error("Cannot read input image");
            }
    
            ContrastStretchParams algParams = {
                .inputLowPixelValue           = 0,
                .outputLowPixelValue          = 0,
                .inputHighPixelValue          = 0,
                .outputHighPixelValue         = 255,
                .saturationHistogramCountLow  = IMAGE_SIZE * SATURATED_PIXEL_PERCENTAGE_LOW_INTENSITY / 100,
                .saturationHistogramCountHigh = IMAGE_SIZE * SATURATED_PIXEL_PERCENTAGE_HIGH_INTENSITY / 100};
    
            memcpy(stretchParams_cpu, &algParams, sizeof(ContrastStretchParams));
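
    The ContrastStretchParams structure is not defined in this excerpt. A sketch consistent with the designated initializers above and with the types used by the CPU-side code (uint8_t pixel values, int32_t histogram counts) might look like the following; the exact field types and ordering are assumptions:

    // Hypothetical layout of ContrastStretchParams, inferred from its
    // usage in this tutorial; the actual definition is not shown here.
    typedef struct
    {
        uint8_t inputLowPixelValue;
        uint8_t outputLowPixelValue;
        uint8_t inputHighPixelValue;
        uint8_t outputHighPixelValue;
        int32_t saturationHistogramCountLow;
        int32_t saturationHistogramCountHigh;
    } ContrastStretchParams;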
    
  6. Synchronization of the CPU and PVA tasks is achieved using NvSciSync objects and fences. APIs for signaling the created NvSciSync objects and waiting on NvSciSync fences are demonstrated in the following steps.

            CreateNvSciSyncObj(&cpuStartSyncObj_nvsci, nvSciSyncModule, cupva::SyncClientType::WAITER,
                               NvSciSyncAccessPerm_WaitSignal);
            NVSCI_CALL(NvSciSyncObjGenerateFence(cpuStartSyncObj_nvsci, &cpuStartFence_nvsci));
    
            CreateNvSciSyncObj(&cpuCompletionSyncObj_nvsci, nvSciSyncModule, cupva::SyncClientType::WAITER,
                               NvSciSyncAccessPerm_SignalOnly);
            NVSCI_CALL(NvSciSyncObjGenerateFence(cpuCompletionSyncObj_nvsci, &cpuCompletionFence_nvsci));
    

    NvSciSync clients must supply the properties and constraints of an NvSciSync object to NvSciSync before allocating the object. This is expressed with attributes: an attribute is a key-value pair. Each application wanting to use a sync object indicates its needs in the form of various attributes before the sync object is created. The cupva::nvsci::FillAttributes() API fills in the PVA-specific attributes of the list.

    The NvSciSyncObjAlloc() NvSci API call creates the NvSciSyncObj instance. The instance should be freed with the NvSciSyncObjFree() call at the end. You may refer to the NvSciSync library documentation for further details.

    void CreateNvSciSyncObj(NvSciSyncObj *syncObj, NvSciSyncModule sciSyncModule, cupva::SyncClientType pvaPerm,
                            NvSciSyncAccessPerm cpuPerm)
    {
        NvSciSyncAttrList cpuAttrList          = nullptr;
        NvSciSyncAttrList pvaAttrList          = nullptr;
        NvSciSyncAttrList unreconciledLists[2] = {NULL};
        NvSciSyncAttrKeyValuePair keyValues[2];
        bool cpuAccess                    = true;
        NvSciSyncAttrList reconciledList  = nullptr;
        NvSciSyncAttrList newConflictList = nullptr;
        // Create NvSciSyncAttrList for cpu & pva
        NVSCI_CALL(NvSciSyncAttrListCreate(sciSyncModule, &cpuAttrList));
        NVSCI_CALL(NvSciSyncAttrListCreate(sciSyncModule, &pvaAttrList));
    
        // Setup CPU list
        keyValues[0].attrKey = NvSciSyncAttrKey_NeedCpuAccess;
        keyValues[0].value   = (void *)&cpuAccess;
        keyValues[0].len     = sizeof(cpuAccess);
        keyValues[1].attrKey = NvSciSyncAttrKey_RequiredPerm;
        keyValues[1].value   = (void *)&cpuPerm;
        keyValues[1].len     = sizeof(cpuPerm);
        NVSCI_CALL(NvSciSyncAttrListSetAttrs(cpuAttrList, keyValues, 2));
    
        // Fill PVA list with cupva API
        cupva::nvsci::FillAttributes(pvaAttrList, pvaPerm);
    
        // Reconcile cpu Signaler and pva waiter NvSciSyncAttrList
        unreconciledLists[0] = cpuAttrList;
        unreconciledLists[1] = pvaAttrList;
        NVSCI_CALL(NvSciSyncAttrListReconcile(unreconciledLists, 2, &reconciledList, &newConflictList));
    
        // Create NvSciSync object and get the syncObj
        NVSCI_CALL(NvSciSyncObjAlloc(reconciledList, syncObj));
    
        NvSciSyncAttrListFree(pvaAttrList);
        NvSciSyncAttrListFree(cpuAttrList);
        NvSciSyncAttrListFree(reconciledList);
        NvSciSyncAttrListFree(newConflictList);
    };
    
  7. A cuPVA SyncObj instance is imported from cpuCompletionSyncObj_nvsci using the cupva::nvsci::Import() cuPVA API call. We use the imported SyncObj to create a cuPVA Fence. The cupva::Fence object is then filled from the NvSciSyncFence object using the cupva::nvsci::Import() function. The Fence object must have been created with a SyncObj imported from the same NvSciSyncObj that was used to create the NvSciSyncFence.

            cupva::SyncObj cpuCompletionSyncObj_pva = cupva::nvsci::Import(cpuCompletionSyncObj_nvsci);
            cupva::Fence cpuCompletionFence_pva(cpuCompletionSyncObj_pva);
            cupva::nvsci::Import(cpuCompletionFence_pva, cpuCompletionFence_nvsci);
    
  8. The PVA program that performs image contrast stretching is created. The steps involving the CmdProgram initialization are similar to the previous tutorials that use the contrast stretching example. The imported image buffer and algorithm parameter structure device pointers are used as the inputs for the program. We also create a cuPVA Stream to submit the program and the synchronization commands.

            SyncObj pvaCompletionSyncObj_pva = SyncObj::Create();
            cupva::Fence pvaCompletionFence_pva(pvaCompletionSyncObj_pva);
    
            Executable execContrastStretch =
                Executable::Create(PVA_EXECUTABLE_DATA(nvsci_interoperability_contrast_stretch_dev),
                                   PVA_EXECUTABLE_SIZE(nvsci_interoperability_contrast_stretch_dev));
    
            CmdProgram progContrastStretch = CreateContrastStretchProg(
                execContrastStretch, image_pva, IMAGE_WIDTH, IMAGE_HEIGHT, TILE_WIDTH, TILE_HEIGHT, stretchParams_pva);
    
            Stream cupvaStream = Stream::Create();
    
  9. The CPU thread that carries out image dynamic range computation is launched.

            LaunchCpuThreadComputeDynamicRange(image_nvsci, stretchParams_nvsci, IMAGE_SIZE, &cpuStartFence_nvsci,
                                               &cpuWaitContext, &cpuCompletionSyncObj_nvsci);
    

    The thread is blocked until cpuStartSyncObj_nvsci is signaled.

    void LaunchCpuThreadComputeDynamicRange(NvSciBufObj image_nvsci, NvSciBufObj contrastStretchParams_nvsci,
                                            int32_t imageSize, NvSciSyncFence *cpuStartFence_nvsci,
                                            NvSciSyncCpuWaitContext *cpuWaitContext,
                                            NvSciSyncObj *cpuCompletionSyncObj_nvsci)
    {
        computeDynamicRangeThread =
            new std::thread(ComputeDynamicRangeCpu, image_nvsci, contrastStretchParams_nvsci, imageSize,
                            cpuStartFence_nvsci, cpuWaitContext, cpuCompletionSyncObj_nvsci);
    }
    
  10. progContrastStretch is submitted to the cuPVA stream after the cmdWaitCpuCompletion command, so the PVA task starts only after the first stage running on the CPU completes. Execution of the cmdSignalPvaCompletion command signals that the enhanced image is ready.

            cupva::CmdWaitOnFences cmdWaitCpuCompletion(cpuCompletionFence_pva);
            cupva::CmdRequestFences cmdSignalPvaCompletion(pvaCompletionFence_pva);
            cupvaStream.submit({&cmdWaitCpuCompletion, &progContrastStretch, &cmdSignalPvaCompletion});
    
  11. The cpuStartFence_nvsci fence, which the CPU thread is waiting on, is triggered by signaling the cpuStartSyncObj_nvsci with the NvSciSyncObjSignal() API call.

            NVSCI_CALL(NvSciSyncObjSignal(cpuStartSyncObj_nvsci));
    

    Once the CPU computes the image dynamic range, it unblocks the PVA task by signaling the cpuCompletionSyncObj_nvsci.

    void ComputeDynamicRangeCpu(NvSciBufObj image_nvsci, NvSciBufObj contrastStretchParams_nvsci, int32_t imageSize,
                                NvSciSyncFence *cpuStartFence_nvsci, NvSciSyncCpuWaitContext *cpuWaitContext,
                                NvSciSyncObj *cpuCompletionSyncObj_nvsci)
    {
        NVSCI_CALL(NvSciSyncFenceWait(cpuStartFence_nvsci, *cpuWaitContext, -1));
    
        uint8_t *image_cpu;
        NVSCI_CALL(NvSciBufObjGetCpuPtr(image_nvsci, (void **)&image_cpu));
    
        ContrastStretchParams *contrastStretchParams_cpu;
        NVSCI_CALL(NvSciBufObjGetCpuPtr(contrastStretchParams_nvsci, (void **)&contrastStretchParams_cpu));
    
        // Initialize histogram buffer
        int32_t histogram[BIN_COUNT];
        memset(histogram, 0, BIN_COUNT * sizeof(int32_t));
    
        // Compute histogram
        for (int32_t i = 0; i < imageSize; i++)
        {
            histogram[image_cpu[i]]++;
        }
    
        // Compute dynamic range
        uint8_t inputLowPixelValue = 0;
        int32_t pixelCount         = 0;
        while (inputLowPixelValue < 255 && pixelCount < contrastStretchParams_cpu->saturationHistogramCountLow)
        {
            pixelCount += histogram[inputLowPixelValue];
            inputLowPixelValue++;
        }
        inputLowPixelValue--;
    
        uint8_t inputHighPixelValue = 255;
        pixelCount                  = 0;
        while (inputHighPixelValue > inputLowPixelValue &&
               pixelCount < contrastStretchParams_cpu->saturationHistogramCountHigh)
        {
            pixelCount += histogram[inputHighPixelValue];
            inputHighPixelValue--;
        }
        inputHighPixelValue++;
    
        contrastStretchParams_cpu->inputLowPixelValue  = inputLowPixelValue;
        contrastStretchParams_cpu->inputHighPixelValue = inputHighPixelValue;
    
        NVSCI_CALL(NvSciSyncObjSignal(*cpuCompletionSyncObj_nvsci));
    }
    
  12. We wait until the second stage of the algorithm is completed and then join the computeDynamicRangeThread running on the CPU.

            pvaCompletionFence_pva.wait();
    
            JoinCpuThreadComputeDynamicRange();
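
    The JoinCpuThreadComputeDynamicRange() helper and the computeDynamicRangeThread global it joins are not listed in this tutorial. A minimal sketch consistent with the launcher shown in step 9, assuming the thread handle is a file-scope pointer, might be:

    // Hypothetical definitions; assumes computeDynamicRangeThread is the
    // file-scope pointer set by LaunchCpuThreadComputeDynamicRange().
    static std::thread *computeDynamicRangeThread = nullptr;

    void JoinCpuThreadComputeDynamicRange(void)
    {
        if (computeDynamicRangeThread != nullptr)
        {
            computeDynamicRangeThread->join();
            delete computeDynamicRangeThread;
            computeDynamicRangeThread = nullptr;
        }
    }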
    
  13. The enhanced image is written to the output file and allocated resources are freed in the last step.

            if (WriteImageBuffer(outputImageName.c_str(), ".", image_cpu, IMAGE_SIZE) != 0)
            {
                err = -1;
            }
        }
        catch (cupva::Exception const &e)
        {
            std::cout << "Caught a cuPVA exception with message: " << e.what() << std::endl;
            err = 1;
        }
    catch (const std::runtime_error &e)
    {
        std::cout << "Caught an NvSci exception with message: " << e.what() << std::endl;
        if (err == 0)
        {
            err = 1;
        }
    }
        mem::Free(image_pva);
        NvSciBufObjFree(image_nvsci);
        mem::Free(stretchParams_pva);
        NvSciBufObjFree(stretchParams_nvsci);
    
        NvSciSyncFenceClear(&cpuStartFence_nvsci);
        NvSciSyncFenceClear(&cpuCompletionFence_nvsci);
    
        NvSciSyncObjFree(cpuStartSyncObj_nvsci);
        NvSciSyncObjFree(cpuCompletionSyncObj_nvsci);
    
        NvSciSyncCpuWaitContextFree(cpuWaitContext);
        NvSciSyncModuleClose(nvSciSyncModule);
        NvSciBufModuleClose(nvSciBufModule);
        return err;
    }
    
The same application is also implemented with the cuPVA C API. The C host code follows the same steps:

  1. The main function starts by initializing the image buffer and parameter pointers and the NvSci variables to NULL.

    int main(int argc, char **argv)
    {
        int32_t err = 0;
    
        if (GetAssetsDirectory(argc, argv, assetsDirectory, MAX_IMAGE_PATH_LENGTH) != 0)
        {
            return 1;
        }
    
        NvSciSyncModule nvSciSyncModule        = NULL;
        NvSciBufModule nvSciBufModule          = NULL;
        NvSciSyncCpuWaitContext cpuWaitContext = NULL;
    
        NvSciBufObj image_nvsci = NULL;
        uint8_t *image_cpu      = NULL;
        uint8_t *image_pva      = NULL;
    
        NvSciBufObj stretchParams_nvsci          = NULL;
        ContrastStretchParams *stretchParams_cpu = NULL;
        ContrastStretchParams *stretchParams_pva = NULL;
    
        NvSciSyncObj cpuStartSyncObj_nvsci      = NULL;
        NvSciSyncObj cpuCompletionSyncObj_nvsci = NULL;
        NvSciSyncFence cpuStartFence_nvsci      = NvSciSyncFenceInitializer;
        NvSciSyncFence cpuCompletionFence_nvsci = NvSciSyncFenceInitializer;
    
  2. NvSciBuf and NvSciSync modules should be opened before invoking NvSciBuf and NvSciSync APIs. Modules represent the corresponding library’s instance created for the application and act as containers for other NvSciBuf and NvSciSync resources. Creating an NvSciSync CPU wait context is also required to enable waiting on NvSciSync fences within CPU threads.

        NVSCI_CALL(NvSciSyncModuleOpen(&nvSciSyncModule), err, MemAllocFailed);
        NVSCI_CALL(NvSciSyncCpuWaitContextAlloc(nvSciSyncModule, &cpuWaitContext), err, MemAllocFailed);
        NVSCI_CALL(NvSciBufModuleOpen(&nvSciBufModule), err, MemAllocFailed);
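
    The NVSCI_CALL() and CHECK_ERROR_GOTO() macros used by the C host code are not defined in this tutorial. Minimal sketches consistent with their usage, assuming both record a failing status in err and jump to a cleanup label, might look like the following (the shared CPU-thread helpers shown later use a single-argument NVSCI_CALL variant instead):

    /* Hypothetical sketches; the tutorial's actual macro definitions are
     * not shown. CHECK_ERROR_GOTO assumes a zero status indicates success. */
    #define NVSCI_CALL(call, err, label)                       \
        do                                                     \
        {                                                      \
            (err) = (call);                                    \
            if ((err) != NvSciError_Success)                   \
            {                                                  \
                goto label;                                    \
            }                                                  \
        } while (0)

    #define CHECK_ERROR_GOTO(call, err, label)                 \
        do                                                     \
        {                                                      \
            (err) = (call);                                    \
            if ((err) != 0)                                    \
            {                                                  \
                goto label;                                    \
            }                                                  \
        } while (0)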
    
  3. NvSciBufObj instances that hold the image buffer and algorithm parameters are created in this step.

        NVSCI_CALL(CreateNvSciBuf(&image_nvsci, nvSciBufModule, IMAGE_SIZE * sizeof(uint8_t)), err, MemAllocFailed);
        NVSCI_CALL(CreateNvSciBuf(&stretchParams_nvsci, nvSciBufModule, sizeof(ContrastStretchParams)), err,
                   MemAllocFailed);
    

    A buffer attribute list should be created and set for each accessor of the NvSci buffer. For instance, if two or more hardware engines want to access a common buffer (e.g., one engine is writing data into the buffer and the other engine is reading from the buffer), an attribute list should be created for each engine.

    Buffer type and size attributes should be set for both the CPU and the PVA. cuPVA supports importing the NvSciBuf types Image, RawBuffer, and Tensor. NvSciBufType_RawBuffer is used in this example since our image data has a single color plane and a pitch-linear layout. The CupvaMemFillAttributes() API fills in the PVA-specific attributes. The NeedCpuAccess and RequiredPerm attributes should also be set so that the CPU can access the created buffer.

    The image buffer is created with the NvSci NvSciBufObjAlloc() API call and should be freed using the NvSciBufObjFree() API at the end. You may refer to the NvSciBuf library documentation for further details.

    NvSciError CreateNvSciBuf(NvSciBufObj *bufobj, NvSciBufModule sciBufModule, int64_t size)
    {
        NvSciError err                        = 0;
        NvSciBufAttrList pvaAttrList          = NULL;
        NvSciBufAttrList cpuAttrList          = NULL;
        NvSciBufAttrList unreconciledLists[2] = {NULL};
        NvSciBufAttrList reconciled_attrlist  = NULL;
        NvSciBufAttrList conflictattrlist     = NULL;
        NvSciBufAttrValAccessPerm access_perm = NvSciBufAccessPerm_ReadWrite;
        NvSciBufType bufTypes[]               = {NvSciBufType_RawBuffer};
        bool cpu_access                       = true;
    
        // Setup CPU attrlist
        NvSciBufAttrKeyValuePair cpu_attr_kvp[] = {{NvSciBufGeneralAttrKey_Types, &bufTypes, sizeof(bufTypes)},
                                                   {NvSciBufRawBufferAttrKey_Size, &size, sizeof(size)},
                                                   {NvSciBufGeneralAttrKey_RequiredPerm, &access_perm, sizeof(access_perm)},
                                                   {NvSciBufGeneralAttrKey_NeedCpuAccess, &cpu_access, sizeof(cpu_access)}};
        const size_t num_cpu_kvp                = 4;
        NVSCI_CALL(NvSciBufAttrListCreate(sciBufModule, &cpuAttrList), err, CreateNvSciBufFailed);
        NVSCI_CALL(NvSciBufAttrListSetAttrs(cpuAttrList, cpu_attr_kvp, num_cpu_kvp), err, CreateNvSciBufFailed);
    
        // Setup PVA attrlist
        NvSciBufAttrKeyValuePair pva_attr_kvp[] = {{NvSciBufGeneralAttrKey_Types, (void *)&bufTypes, sizeof(bufTypes)},
                                                   {NvSciBufRawBufferAttrKey_Size, (void *)&size, sizeof(size)}};
        const size_t num_pva_kvp                = 2;
        NVSCI_CALL(NvSciBufAttrListCreate(sciBufModule, &pvaAttrList), err, CreateNvSciBufFailed);
        NVSCI_CALL(NvSciBufAttrListSetAttrs(pvaAttrList, pva_attr_kvp, num_pva_kvp), err, CreateNvSciBufFailed);
    
        // cupva call to signal that the buffer will be used by PVA
        CupvaMemFillAttributes(pvaAttrList);
    
        unreconciledLists[0] = pvaAttrList;
        unreconciledLists[1] = cpuAttrList;
        NVSCI_CALL(NvSciBufAttrListReconcile(unreconciledLists, 2U, &reconciled_attrlist, &conflictattrlist), err,
                   CreateNvSciBufFailed);
        NVSCI_CALL(NvSciBufObjAlloc(reconciled_attrlist, bufobj), err, CreateNvSciBufFailed);
    CreateNvSciBufFailed:
        NvSciBufAttrListFree(pvaAttrList);
        NvSciBufAttrListFree(cpuAttrList);
        NvSciBufAttrListFree(reconciled_attrlist);
        NvSciBufAttrListFree(conflictattrlist);
        return err;
    };
    
  4. cuPVA device pointers for the image buffer and the algorithm parameters structure are created from the NvSciBufObj instances using the CupvaMemImport() API call. Importing an NvSci buffer creates a mapping that must be released by calling CupvaMemFree() when the application has finished using the buffer.

        CHECK_ERROR_GOTO(CupvaMemImport((void **)&image_pva, image_nvsci, CUPVA_READ_WRITE), err, MemAllocFailed);
        CHECK_ERROR_GOTO(CupvaMemGetHostPointer((void **)&image_cpu, (void *)image_pva), err, MemAllocFailed);
    
        CHECK_ERROR_GOTO(CupvaMemImport((void **)&stretchParams_pva, stretchParams_nvsci, CUPVA_READ_WRITE), err,
                         MemAllocFailed);
        CHECK_ERROR_GOTO(CupvaMemGetHostPointer((void **)&stretchParams_cpu, (void *)stretchParams_pva), err,
                         MemAllocFailed);
    
  5. The image buffer is filled and the algorithm parameters are initialized in this step.

        if (ReadImageBuffer(INPUT_IMAGE_NAME, assetsDirectory, image_cpu, IMAGE_SIZE) != 0)
        {
            err = -1;
            goto MemAllocFailed;
        }
    
        ContrastStretchParams algParams = {
            .inputLowPixelValue           = 0,
            .outputLowPixelValue          = 0,
            .inputHighPixelValue          = 0,
            .outputHighPixelValue         = 255,
            .saturationHistogramCountLow  = IMAGE_SIZE * SATURATED_PIXEL_PERCENTAGE_LOW_INTENSITY / 100,
            .saturationHistogramCountHigh = IMAGE_SIZE * SATURATED_PIXEL_PERCENTAGE_HIGH_INTENSITY / 100};
    
        memcpy(stretchParams_cpu, &algParams, sizeof(ContrastStretchParams));
    
  6. Synchronization of the CPU and PVA tasks is achieved using NvSciSync objects and fences. APIs for signaling the created NvSciSync objects and waiting on NvSciSync fences are demonstrated in the following steps.

        NVSCI_CALL(
            CreateNvSciSyncObj(&cpuStartSyncObj_nvsci, nvSciSyncModule, CUPVA_WAITER, NvSciSyncAccessPerm_WaitSignal), err,
            SyncObjCreateFailed);
        NVSCI_CALL(NvSciSyncObjGenerateFence(cpuStartSyncObj_nvsci, &cpuStartFence_nvsci), err, SyncObjCreateFailed);
    
        NVSCI_CALL(
            CreateNvSciSyncObj(&cpuCompletionSyncObj_nvsci, nvSciSyncModule, CUPVA_WAITER, NvSciSyncAccessPerm_SignalOnly),
            err, SyncObjCreateFailed);
        NVSCI_CALL(NvSciSyncObjGenerateFence(cpuCompletionSyncObj_nvsci, &cpuCompletionFence_nvsci), err,
                   SyncObjCreateFailed);
    

    NvSciSync clients must supply the properties and constraints of an NvSciSync object to NvSciSync before allocating the object. This is expressed with attributes: an attribute is a key-value pair. Each application wanting to use a sync object indicates its needs in the form of various attributes before the sync object is created. The CupvaSyncObjFillAttributes() API fills in the PVA-specific attributes of the list.

    The NvSciSyncObjAlloc() NvSci API call creates the NvSciSyncObj instance. The instance should be freed with the NvSciSyncObjFree() call at the end. You may refer to the NvSciSync library documentation for further details.

    NvSciError CreateNvSciSyncObj(NvSciSyncObj *syncObj, NvSciSyncModule sciSyncModule, cupvaSyncClientType_t pvaPerm,
                                  NvSciSyncAccessPerm cpuPerm)
    {
        NvSciError err                         = 0;
        NvSciSyncAttrList cpuAttrList          = NULL;
        NvSciSyncAttrList pvaAttrList          = NULL;
        NvSciSyncAttrList unreconciledLists[2] = {NULL};
        NvSciSyncAttrKeyValuePair keyValues[2];
        bool cpuAccess                    = true;
        NvSciSyncAttrList reconciledList  = NULL;
        NvSciSyncAttrList newConflictList = NULL;
        // Create NvSciSyncAttrList for cpu & pva
        NVSCI_CALL(NvSciSyncAttrListCreate(sciSyncModule, &cpuAttrList), err, CreateNvSciSyncObjFailed);
        NVSCI_CALL(NvSciSyncAttrListCreate(sciSyncModule, &pvaAttrList), err, CreateNvSciSyncObjFailed);
    
        // Setup CPU list
        keyValues[0].attrKey = NvSciSyncAttrKey_NeedCpuAccess;
        keyValues[0].value   = (void *)&cpuAccess;
        keyValues[0].len     = sizeof(cpuAccess);
        keyValues[1].attrKey = NvSciSyncAttrKey_RequiredPerm;
        keyValues[1].value   = (void *)&cpuPerm;
        keyValues[1].len     = sizeof(cpuPerm);
        NVSCI_CALL(NvSciSyncAttrListSetAttrs(cpuAttrList, keyValues, 2), err, CreateNvSciSyncObjFailed);
    
        // Fill PVA list with cupva API
        CupvaSyncObjFillAttributes(&pvaAttrList, pvaPerm);
    
        // Reconcile cpu Signaler and pva waiter NvSciSyncAttrList
        unreconciledLists[0] = cpuAttrList;
        unreconciledLists[1] = pvaAttrList;
        NVSCI_CALL(NvSciSyncAttrListReconcile(unreconciledLists, 2, &reconciledList, &newConflictList), err,
                   CreateNvSciSyncObjFailed);
    
        // Create NvSciSync object and get the syncObj
        NVSCI_CALL(NvSciSyncObjAlloc(reconciledList, syncObj), err, CreateNvSciSyncObjFailed);
    CreateNvSciSyncObjFailed:
        NvSciSyncAttrListFree(pvaAttrList);
        NvSciSyncAttrListFree(cpuAttrList);
        NvSciSyncAttrListFree(reconciledList);
        NvSciSyncAttrListFree(newConflictList);
        return err;
    };
    
  7. A cuPVA SyncObj instance is imported from cpuCompletionSyncObj_nvsci using the CupvaSyncObjCreateFromNvsci() cuPVA API call. We use the imported SyncObj to create a cuPVA Fence. The cuPVA Fence is then filled from the NvSciSyncFence object using the CupvaFenceImport() function. The Fence object must have been created with a SyncObj imported from the same NvSciSyncObj that was used to create the NvSciSyncFence.

        cupvaSyncObj_t cpuCompletionSyncObj_pva;
        CHECK_ERROR_GOTO(CupvaSyncObjCreateFromNvsci(&cpuCompletionSyncObj_pva, cpuCompletionSyncObj_nvsci, CUPVA_WAITER),
                         err, SyncObjCreateFailed);
    
        cupvaFence_t cpuCompletionFence_pva;
        CHECK_ERROR_GOTO(CupvaFenceInit(&cpuCompletionFence_pva, cpuCompletionSyncObj_pva), err, SyncObjCreateFailed);
    
        CHECK_ERROR_GOTO(CupvaFenceImport(&cpuCompletionFence_pva, &cpuCompletionFence_nvsci), err, SyncObjCreateFailed);
    
  8. The PVA program that performs image contrast stretching is created. The steps involving the CmdProgram initialization are similar to the previous tutorials that use the contrast stretching example. The imported image buffer and algorithm parameter structure device pointers are used as the inputs for the program. We also create a cuPVA Stream to submit the program and the synchronization commands.

        cupvaSyncObj_t pvaCompletionSyncObj_pva;
        CHECK_ERROR_GOTO(CupvaSyncObjCreate(&pvaCompletionSyncObj_pva, false, CUPVA_SIGNALER_WAITER, CUPVA_SYNC_YIELD), err,
                         SyncObjCreateFailed);
    
        cupvaFence_t pvaCompletionFence_pva;
        CHECK_ERROR_GOTO(CupvaFenceInit(&pvaCompletionFence_pva, pvaCompletionSyncObj_pva), err, ExecutableCreateFailed);
    
        cupvaExecutable_t execContrastStretch;
        CHECK_ERROR_GOTO(
            CupvaExecutableCreate(&execContrastStretch,
                                  PVA_EXECUTABLE_DATA(nvsci_interoperability_contrast_stretch_dev),
                                  PVA_EXECUTABLE_SIZE(nvsci_interoperability_contrast_stretch_dev)),
            err, ExecutableCreateFailed);
    
        int32_t createdCmdProgramCount = 0;
        cupvaCmd_t progContrastStretch;
        CHECK_ERROR_GOTO(
            CreateContrastStretchProg(&progContrastStretch, &execContrastStretch, image_pva, IMAGE_WIDTH, IMAGE_HEIGHT,
                                      TILE_WIDTH, TILE_HEIGHT, stretchParams_pva, &createdCmdProgramCount),
            err, CmdProgramCreateFailed);
    
        cupvaStream_t stream;
        CHECK_ERROR_GOTO(CupvaStreamCreate(&stream, CUPVA_PVA0, CUPVA_VPU_ANY), err, StreamCreateFailed);
    
  9. The CPU thread that carries out image dynamic range computation is launched.

        LaunchCpuThreadComputeDynamicRange(image_nvsci, stretchParams_nvsci, IMAGE_SIZE, &cpuStartFence_nvsci,
                                           &cpuWaitContext, &cpuCompletionSyncObj_nvsci);
    

    The thread is blocked until cpuStartSyncObj_nvsci is signaled.

    void LaunchCpuThreadComputeDynamicRange(NvSciBufObj image_nvsci, NvSciBufObj contrastStretchParams_nvsci,
                                            int32_t imageSize, NvSciSyncFence *cpuStartFence_nvsci,
                                            NvSciSyncCpuWaitContext *cpuWaitContext,
                                            NvSciSyncObj *cpuCompletionSyncObj_nvsci)
    {
        computeDynamicRangeThread =
            new std::thread(ComputeDynamicRangeCpu, image_nvsci, contrastStretchParams_nvsci, imageSize,
                            cpuStartFence_nvsci, cpuWaitContext, cpuCompletionSyncObj_nvsci);
    }
    
  10. progContrastStretch is submitted to the cuPVA stream after the cmdWaitCpuCompletion command, so the PVA task starts only after the first stage running on the CPU completes. Execution of the cmdSignalPvaCompletion command signals that the enhanced image is ready.

        cupvaCmd_t cmdWaitCpuCompletion;
        CHECK_ERROR_GOTO(CupvaCmdWaitOnFencesInit(&cmdWaitCpuCompletion, &cpuCompletionFence_pva, 1), err,
                         DeallocateAllResources);
    
        cupvaCmd_t cmdSignalPvaCompletion;
        CHECK_ERROR_GOTO(CupvaCmdRequestFencesInit(&cmdSignalPvaCompletion, &pvaCompletionFence_pva, 1), err,
                         DeallocateAllResources);
    
        cupvaCmd_t const *cmd[3] = {&cmdWaitCpuCompletion, &progContrastStretch, &cmdSignalPvaCompletion};
        CHECK_ERROR_GOTO(CupvaStreamSubmit(stream, cmd, NULL, 3, CUPVA_IN_ORDER, -1, -1), err, DeallocateAllResources);
    
  11. The cpuStartFence_nvsci fence, which the CPU thread is waiting on, is triggered by signaling the cpuStartSyncObj_nvsci with the NvSciSyncObjSignal() API call.

        NVSCI_CALL(NvSciSyncObjSignal(cpuStartSyncObj_nvsci), err, DeallocateAllResources);
    

    Once the CPU computes the image dynamic range, it unblocks the PVA task by signaling the cpuCompletionSyncObj_nvsci.

    void ComputeDynamicRangeCpu(NvSciBufObj image_nvsci, NvSciBufObj contrastStretchParams_nvsci, int32_t imageSize,
                                NvSciSyncFence *cpuStartFence_nvsci, NvSciSyncCpuWaitContext *cpuWaitContext,
                                NvSciSyncObj *cpuCompletionSyncObj_nvsci)
    {
        NVSCI_CALL(NvSciSyncFenceWait(cpuStartFence_nvsci, *cpuWaitContext, -1));
    
        uint8_t *image_cpu;
        NVSCI_CALL(NvSciBufObjGetCpuPtr(image_nvsci, (void **)&image_cpu));
    
        ContrastStretchParams *contrastStretchParams_cpu;
        NVSCI_CALL(NvSciBufObjGetCpuPtr(contrastStretchParams_nvsci, (void **)&contrastStretchParams_cpu));
    
        // Initialize histogram buffer
        int32_t histogram[BIN_COUNT];
        memset(histogram, 0, BIN_COUNT * sizeof(int32_t));
    
        // Compute histogram
        for (int32_t i = 0; i < imageSize; i++)
        {
            histogram[image_cpu[i]]++;
        }
    
        // Compute dynamic range
        uint8_t inputLowPixelValue = 0;
        int32_t pixelCount         = 0;
        while (inputLowPixelValue < 255 && pixelCount < contrastStretchParams_cpu->saturationHistogramCountLow)
        {
            pixelCount += histogram[inputLowPixelValue];
            inputLowPixelValue++;
        }
        inputLowPixelValue--;
    
        uint8_t inputHighPixelValue = 255;
        pixelCount                  = 0;
        while (inputHighPixelValue > inputLowPixelValue &&
               pixelCount < contrastStretchParams_cpu->saturationHistogramCountHigh)
        {
            pixelCount += histogram[inputHighPixelValue];
            inputHighPixelValue--;
        }
        inputHighPixelValue++;
    
        contrastStretchParams_cpu->inputLowPixelValue  = inputLowPixelValue;
        contrastStretchParams_cpu->inputHighPixelValue = inputHighPixelValue;
    
        NVSCI_CALL(NvSciSyncObjSignal(*cpuCompletionSyncObj_nvsci));
    }
    
  12. We wait until the second stage of the algorithm is completed and then join the computeDynamicRangeThread running on the CPU.

        CHECK_ERROR_GOTO(CupvaFenceWait(&pvaCompletionFence_pva, -1, NULL), err, DeallocateAllResources);
    
        JoinCpuThreadComputeDynamicRange();
    
  13. The enhanced image is written to the output file and allocated resources are freed in the last step.

        if (WriteImageBuffer(OUTPUT_IMAGE_NAME, ".", image_cpu, IMAGE_SIZE) != 0)
        {
            err = -1;
            goto DeallocateAllResources;
        }
    
    DeallocateAllResources:
        CupvaStreamDestroy(stream);
    StreamCreateFailed:
    CmdProgramCreateFailed:
        if (createdCmdProgramCount > 0)
        {
            CupvaCmdDestroy(&progContrastStretch);
        }
        CupvaExecutableDestroy(execContrastStretch);
    ExecutableCreateFailed:
        CupvaSyncObjDestroy(pvaCompletionSyncObj_pva);
    SyncObjCreateFailed:
        NvSciSyncFenceClear(&cpuCompletionFence_nvsci);
        NvSciSyncFenceClear(&cpuStartFence_nvsci);
        NvSciSyncObjFree(cpuStartSyncObj_nvsci);
        NvSciSyncObjFree(cpuCompletionSyncObj_nvsci);
    MemAllocFailed:
        CupvaMemFree(image_pva);
        NvSciBufObjFree(image_nvsci);
        CupvaMemFree(stretchParams_pva);
        NvSciBufObjFree(stretchParams_nvsci);
        NvSciBufModuleClose(nvSciBufModule);
        NvSciSyncCpuWaitContextFree(cpuWaitContext);
        NvSciSyncModuleClose(nvSciSyncModule);
        return err;
    }
    

Output#

The NvSci libraries must be present in your build environment to build this tutorial. The tegralibs_example.cmake file provided in the tutorial source directory can be used to bring the required NvSci library targets into a CMake build through the use of environment variables. Rename this file to tegralibs-config.cmake, edit it to match your environment, and point CMake to its location by adding -Dtegralibs_DIR=<path> to your CMake command.
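
For example, if the edited tegralibs-config.cmake is placed in /opt/tegralibs (an illustrative path), the configure step might look like:

$ cmake -Dtegralibs_DIR=/opt/tegralibs <other CMake options> <path to tutorial source>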

The path to the Tutorial assets directory containing the input image file low-contrast-kodim08-768x512-grayscale.data should be provided as an argument.

The enhanced image output file contrast-stretched-kodim08-768x512-grayscale.data is written to the current working directory.

$ ./nvsci_interoperability_cpp -a <Tutorial Assets Directory Path>
Read 393216 bytes from <Tutorial Assets Directory Path>/low-contrast-kodim08-768x512-grayscale.data
Wrote 393216 bytes to ./contrast-stretched-kodim08-768x512-grayscale.data
$ ./nvsci_interoperability_c -a <Tutorial Assets Directory Path>
Read 393216 bytes from <Tutorial Assets Directory Path>/low-contrast-kodim08-768x512-grayscale.data
Wrote 393216 bytes to ./contrast-stretched-kodim08-768x512-grayscale.data