NVIDIA BlueField Platform Software Troubleshooting Guide

DOCA PCC

This page is intended to help developers, system administrators, and users address common issues with the DOCA SDK PCC library.

Whether you are integrating the SDK into your application, troubleshooting development issues, or managing production deployment issues, this guide provides comprehensive support.

This page provides a collection of troubleshooting tips, solutions, and best practices for resolving issues with the DOCA PCC library. Each section focuses on specific problem categories, offering detailed steps and explanations to help you diagnose and resolve issues effectively.

The guide addresses various topics, including installation issues, configuration challenges, runtime errors, and performance optimizations. It also includes debugging, logging, and monitoring advice to enhance your understanding of the SDK's behavior and streamline issue diagnosis.

Effective logging and monitoring are essential for diagnosing issues and understanding the DOCA PCC library's behavior.

For best practices in logging and tracing, refer to the DOCA Debuggability documentation for host-side logging and to the DOCA PCC documentation for device-side logging, which offers optimized tracing for application data paths.

Additionally, the DOCA PCC Counters Tool can be used to display PCC-related hardware counters.

The DOCA PCC library includes support for the PPCC registers, providing a range of commands to configure and monitor PCC algorithms and their parameters and counters. For detailed information on these commands and options, please refer to Port Programmable Congestion Control Register.

This section addresses common scenarios that developers and users may face, offering step-by-step instructions to help resolve them.

Configuration Problems

Device Does Not Support PCC

If you are unable to start a DOCA PCC context because the device reports PCC as unsupported, follow these steps:

Solution For Reaction Point Context

  • Check the device configuration, run:

    mlxconfig -y -d /dev/mst/mt41692_pciconf0 q | grep USER_PROGRAMMABLE_CC

  • If USER_PROGRAMMABLE_CC is not enabled, enable it with mlxconfig:

    mlxconfig -y -d /dev/mst/mt41692_pciconf0 set USER_PROGRAMMABLE_CC=1

  • Reset the firmware or power cycle the host to apply the configuration changes.

Solution For Notification Point Context

  • Check the device configuration, run:

    mlxconfig -y -d /dev/mst/mt41692_pciconf0 q | grep PCC_INT_EN

  • If PCC_INT_EN is enabled and you need to use the DOCA PCC Notification Point, disable PCC_INT_EN with:

    mlxconfig -y -d /dev/mst/mt41692_pciconf0 set PCC_INT_EN=0

  • Reset the firmware or power cycle the host to apply the configuration changes.
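
Both checks in this section come down to reading one parameter from the mlxconfig query output and comparing it with the value DOCA PCC needs. The following is a minimal sketch of that comparison, not an official tool. The function names mlx_param_value and check_param are hypothetical, and the parsing assumes that each parameter is printed on its own line as its name followed by a value such as False(0); adjust the pattern if your mlxconfig version formats values differently.

```shell
#!/bin/sh
# mlx_param_value NAME
# Reads `mlxconfig ... q` output on stdin and prints the value of NAME.
# ASSUMPTION: the line looks like "NAME  Label(N)", e.g.
# "USER_PROGRAMMABLE_CC  False(0)"; only the number N is printed.
mlx_param_value() {
    awk -v name="$1" '$1 == name {
        v = $2
        # Keep only the number inside the parentheses, if there is one.
        if (match(v, /\([0-9]+\)/)) v = substr(v, RSTART + 1, RLENGTH - 2)
        print v
        exit
    }'
}

# check_param NAME WANT
# Reads query output on stdin, prints a one-line verdict, and returns
# non-zero when the current value differs from WANT or is missing.
check_param() {
    got=$(mlx_param_value "$1")
    if [ "$got" = "$2" ]; then
        echo "$1=$got OK"
    else
        echo "$1=${got:-missing} expected $2"
        return 1
    fi
}

# Example usage (device path taken from the commands above):
#   mlxconfig -d /dev/mst/mt41692_pciconf0 q | check_param USER_PROGRAMMABLE_CC 1
#   mlxconfig -d /dev/mst/mt41692_pciconf0 q | check_param PCC_INT_EN 0
```

A non-zero return from check_param indicates that the corresponding mlxconfig set command and a firmware reset or power cycle are still needed.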

Error Starting PCC Threads

If your application is unable to create DPA threads, it may be due to missing DPA configurations.

Solution

Refer to DOCA DPA Execution Unit Management Tool for guidance on managing the DPA Execution Units (EUs) required by your application.

Runtime Problems

Coredump Crash

If your application crashes and generates a coredump file with no clear error message or cause, follow these steps:

Solution

  1. Locate the coredump file specified by the -f runtime option (see DOCA PCC Application Command Line Flags). For example, assume it is located at /tmp/pcc_core.

  2. Extract the .elf file from the DOCA PCC application using the dpacc-extract tool (see DOCA DPA Tools for more information). For example:

    dpacc-extract <DOCA PCC application path> -o <elf file>.elf

  3. Load the coredump together with the extracted .elf file into a GNU debugger (e.g., gdb-multiarch) to analyze the crash:

    gdb-multiarch -c /tmp/pcc_core.<PID>.core <elf file>.elf
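
The three steps above can be chained in a small script. This is a sketch under assumptions rather than a supported workflow: the function names latest_core and analyze_core and the temporary .elf file name are hypothetical, and the core file naming (prefix, then PID, then .core) is inferred from the example command above. The dpacc-extract and gdb-multiarch invocations mirror the ones shown in the steps; -batch and -ex bt are standard GDB options that print a backtrace and exit.

```shell
#!/bin/sh
# latest_core PREFIX
# Prints the most recently modified file named PREFIX.<PID>.core,
# or nothing when no such file exists.
# ASSUMPTION: core files follow the PREFIX.<PID>.core pattern.
latest_core() {
    # ls -t lists newest first; the error for "no match" is hidden.
    ls -t "$1".*.core 2>/dev/null | head -n 1
}

# analyze_core APP PREFIX
# Extracts the device .elf from APP and prints a backtrace for the
# newest core file that matches PREFIX.
analyze_core() {
    app=$1
    core=$(latest_core "$2")
    if [ -z "$core" ]; then
        echo "no core file matching $2.*.core" >&2
        return 1
    fi
    # Hypothetical output location for the extracted .elf file.
    elf="${TMPDIR:-/tmp}/pcc_device.$$.elf"
    dpacc-extract "$app" -o "$elf" || return 1
    gdb-multiarch -batch -ex bt -c "$core" "$elf"
}

# Example usage:
#   analyze_core <DOCA PCC application path> /tmp/pcc_core
```

Dropping -batch -ex bt from the last command opens an interactive debugger session instead of printing a single backtrace.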

PCC Process in Standby State

If your application is in standby state and is not responding to requests or performing expected actions, follow these steps:

Solution

This issue occurs when another DOCA PCC application is already running on the same server. Identify any background DOCA PCC process with:

ps -ef | grep doca_pcc
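
The ps output above usually also lists the grep command itself, which is easy to mistake for a running DOCA PCC process. A small filter can hide that line and print only the PIDs of interest. This is a sketch: the function name list_pcc_pids is hypothetical, and it assumes the Linux ps -ef column layout, in which the PID is the second column and the command starts at the eighth.

```shell
#!/bin/sh
# list_pcc_pids
# Reads `ps -ef` output on stdin and prints the PID of every process
# whose line mentions doca_pcc, skipping the grep helper itself.
# ASSUMPTION: columns are UID PID PPID C STIME TTY TIME CMD.
list_pcc_pids() {
    # [d]oca_pcc matches "doca_pcc" but not this awk command line.
    awk '/[d]oca_pcc/ && $8 != "grep" { print $2 }'
}

# Example usage:
#   ps -ef | list_pcc_pids
```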

PCC Process in Deactivated State Without Coredump

If your application process enters a deactivated state and stops responding to requests, but no coredump is generated, this may be due to a timeout mechanism or similar system feature.

Symptoms

  • The application process stops responding to requests or interactions.

  • Users experience delays or timeouts when accessing the application.

  • Application logs or monitoring tools may indicate a sudden halt in activity or processing.

Possible Cause

  • A system-level timeout mechanism may trigger a reset or termination of the application process if it does not respond within a specified time period.

  • Long-running tasks or blocking operations within the application may prevent it from responding to requests in a timely manner.

Solution

Check if any user callbacks within your application, specifically those defined by the library, are taking too long to return control. Optimize these callbacks or reduce the number of iterations they require.

© Copyright 2024, NVIDIA. Last updated on Nov 12, 2024.