Debuggability Guide#

Overview#

NVIDIA Cloud Functions (NVCF) provides two main debugging features:

  1. Real-Time Logs

    • Access near real-time logs for faster debugging

    • Available through both NGC UI and CLI

    • No long-term storage: logs are ephemeral and exist only during the workload lifecycle

    • Significantly reduced latency compared to traditional logging solutions

  2. Remote Command Execution

    • Execute commands on function containers for debugging purposes

    • Support for common Linux commands in NGC CLI

    • Secure, controlled access to container environments

Real-Time Logs#

Real-time logs allow you to view function logs with minimal latency, providing immediate feedback during development and troubleshooting.

Key Benefits#

  • Immediate Feedback: View logs in near real-time, reducing debugging cycles

  • Reduced Latency: Significantly faster than historical log solutions

  • Multiple Access Methods: Available through both NGC UI and CLI interfaces

Getting Started#

Real-time logs are accessible for deployed NVIDIA Cloud Functions.

  1. Access Logs via NGC UI

    • Navigate to your function in the NGC UI

    • Click the 3-dots button next to an active function

    • Select “View Logs” or “View Version Logs”

    • In the Logs page, you’ll see two tabs:

      • History Logs: For historical log analysis across different function version instances

      • Live Tail Logs: For near real-time log streaming

  2. Using Live Tail Logs in NGC UI

    • Select the “Live Tail Logs” tab

    • Choose the Cluster name and Instance ID

    • Click “Start Session” to begin viewing live logs

    • Use “Pause Session” to temporarily halt the log stream

    • Use “Resume Session” to continue viewing logs

    • Click “Stop Session” to end the streaming session

    • Filter logs using the search box for quick identification of specific events

Note

  • Real-time logs are available after a function instance is actively running and a real-time logging session has begun. Once an instance terminates or restarts, these logs are no longer accessible. For historical log analysis, use the History Logs tab.

  • Live Tail logs are not stored and cannot be replayed after a session ends or after the 50,000-line console buffer is exceeded.

  • Currently, live tail logs are only supported for functions deployed to GFN and DGXC cloud environments (note that not all GFN and DGXC environments may be supported).

Remote Command Execution#

Remote command execution allows you to run commands directly in your function’s container environment for advanced debugging purposes. What you can do depends on your own container image: for example, if the container is distroless, you may not be able to access the file system of the target function container. Commands run with the root directory of the target container as the default working directory.

Key Benefits#

  • Interactive Debugging: Execute commands for troubleshooting without redeploying

  • Container Inspection: Examine file systems, processes, and environment variables

  • Secure Access: Commands are executed in a controlled, secure environment

  • Distroless Support: Debug containers with minimal operating system components

Getting Started#

  1. View Available Instances

    • Navigate to your function in the NGC UI or use the CLI

    • Use the CLI to list instances:

    ngc cf fn instance ls <function-id>:<version-id>
        --org <org-id>          # NGC Organization ID
        --team <team-name>      # Team name in an org
  2. Execute Commands via NGC CLI

Syntax#
    ngc cf fn instance exec <function-id>:<version-id>
        --org <org-id>                       # NGC Organization ID
        --team <team-name>                   # Team name in an org
        --instance-id <instance-id>          # Instance ID
        --pod-name <pod-name>                # Pod name used
        --container-name <container-name>    # Container name used
        --command "<linux-command>"          # Linux command to be executed
Example#
    ngc cf fn instance exec my-function:v1 \
        --org my-organization \
        --team my-team \
        --instance-id instance-1 \
        --pod-name pod-1234 \
        --container-name main \
        --command "ls -la"
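
When you run several commands against the same container, repeating every flag becomes tedious. The following is a minimal sketch of a shell wrapper around the documented exec call. The function name nvcf_exec and the NVCF_* variables are hypothetical names chosen for this illustration; they are not part of the NGC CLI.

```shell
# Hypothetical wrapper (not part of the NGC CLI): fills in the flags of the
# documented `ngc cf fn instance exec` call from NVCF_* variables that you
# define yourself, so only the command string changes between calls.
nvcf_exec() {
    # $1: the Linux command to run inside the target container
    ngc cf fn instance exec "${NVCF_FUNCTION}:${NVCF_VERSION}" \
        --org "$NVCF_ORG" \
        --team "$NVCF_TEAM" \
        --instance-id "$NVCF_INSTANCE" \
        --pod-name "$NVCF_POD" \
        --container-name "$NVCF_CONTAINER" \
        --command "$1"
}
```

With the variables set (for example NVCF_FUNCTION=my-function and NVCF_VERSION=v1), calling nvcf_exec "ls -la" passes the same flags as the example above, and nvcf_exec "ps" reuses the same target.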

NGC CLI Requirements and Examples#

CLI Version Requirements#

The debuggability features are only available in NGC CLI versions 3.131.5 and newer.
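
A scripted check of this requirement needs a numeric, component-by-component comparison: 3.40.0 is older than 3.131.5, even though "3.40" sorts after "3.131" as text. Below is a minimal sketch under that assumption. The function name version_at_least is hypothetical, and obtaining the installed version string is left to you, since this guide does not describe the output format of the CLI's version command.

```shell
# Hypothetical helper: succeeds when dotted version $1 is >= dotted version $2.
# Components are compared numerically, left to right; missing ones count as 0.
version_at_least() {
    awk -v have="$1" -v need="$2" 'BEGIN {
        n = split(have, h, ".")
        m = split(need, r, ".")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            if (h[i] + 0 > r[i] + 0) exit 0   # newer: requirement met
            if (h[i] + 0 < r[i] + 0) exit 1   # older: requirement not met
        }
        exit 0                                # identical versions
    }'
}

# Minimum version stated in this guide.
NVCF_MIN_CLI_VERSION="3.131.5"

if version_at_least "3.40.0" "$NVCF_MIN_CLI_VERSION"; then
    echo "3.40.0 is new enough"
else
    echo "3.40.0 is too old"
fi
```

The sample call prints "3.40.0 is too old", which a plain string comparison would get wrong.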

Detailed CLI Examples#

  1. List Function Instances, Containers, and Pods

Syntax#
    ngc cf fn instance ls <function-id>:<version-id>
        --org <org-id>          # NGC Organization ID
        --team <team-name>      # Team name in an org
Example#
    ngc cf fn instance ls my-function:v1 \
        --org my-organization \
        --team my-team
  2. Execute Commands on Target Containers

Syntax#
    ngc cf fn instance exec <function-id>:<version-id>
        --org <org-id>                       # NGC Organization ID
        --team <team-name>                   # Team name in an org
        --instance-id <instance-id>          # Instance ID
        --pod-name <pod-name>                # Pod name used
        --container-name <container-name>    # Container name used
        --command "<linux-command>"          # Linux command to be executed
Example#
    ngc cf fn instance exec my-function:v1 \
        --org my-organization \
        --team my-team \
        --instance-id instance-1 \
        --pod-name pod-1234 \
        --container-name main \
        --command "ls -la"
  3. Attach Log Output from a Specific Pod Container

Syntax#
    ngc cf fn instance logs <function-id>:<version-id>
        --org <org-id>                       # NGC Organization ID
        --team <team-name>                   # Team name in an org
        --instance-id <instance-id>          # Instance ID
        --pod-name <pod-name>                # Pod name used
        --container-name <container-name>    # Container name used
Example#
    ngc cf fn instance logs my-function:v1 \
        --org my-organization \
        --team my-team \
        --instance-id instance-1 \
        --pod-name pod-1234 \
        --container-name main
  4. Attach Log Output from an Entire Instance

Syntax#
    ngc cf fn instance logs <function-id>:<version-id>
        --org <org-id>                  # NGC Organization ID
        --team <team-name>              # Team name in an org
        --instance-id <instance-id>     # Instance ID
Example#
    ngc cf fn instance logs my-function:v1 \
        --org my-organization \
        --team my-team \
        --instance-id instance-1
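
Because live logs are not stored once a session ends, you may want to keep a local copy while you watch them. The sketch below assumes that the logs command writes the streamed lines to standard output, which this guide does not confirm. The function name nvcf_save_logs and the NVCF_* variables are hypothetical names used only for this illustration.

```shell
# Hypothetical helper (not part of the NGC CLI). Assumes that
# `ngc cf fn instance logs` prints the streamed lines to standard output.
nvcf_save_logs() {
    # $1: local file that receives a copy of every log line
    ngc cf fn instance logs "${NVCF_FUNCTION}:${NVCF_VERSION}" \
        --org "$NVCF_ORG" \
        --team "$NVCF_TEAM" \
        --instance-id "$NVCF_INSTANCE" | tee -a "$1"
}
```

The use of tee keeps the lines visible in the terminal while appending them to the file, so the local copy can still be searched with grep after the session is over.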

Supported Commands#

The following commands are supported for remote execution:

Command/Method         Description
---------------------  ------------------------------------
cat                    Display file contents
ls                     List directory contents
cd                     Change directory
pwd                    Print working directory
man                    Display manual pages
sort                   Sort lines of text files
df                     Report file system disk space usage
du                     Estimate file space usage
grep                   Search for patterns in files
find                   Search for files
head                   Display beginning of files
more                   Page through text
less                   Page through text with more features
tail                   Display end of files
wc                     Print newline, word, and byte counts
cut                    Remove sections from lines
echo                   Display a line of text
printf                 Format and print data
print                  Print data
ps                     Report process status
base64                 Base64 encode/decode
Pipe (|)               Pipe output
Input redirect (<)     Redirect input
Command separator (;)  Separate commands
Command chaining (&&)  Chain commands

Note

The command execution environment is isolated and has no impact on the function’s running state. Command execution is logged for security and audit purposes.
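
Since only the listed commands and operators are supported, it can save a round trip to check a command string locally before sending it. The following is a rough sketch of such a pre-flight check. The names NVCF_ALLOWED and nvcf_command_allowed are hypothetical, the parsing is deliberately naive (it does not understand quoting), and it does not replace whatever validation the service itself applies.

```shell
# Commands from the "Supported Commands" table, as a space-separated list.
NVCF_ALLOWED="cat ls cd pwd man sort df du grep find head more less tail wc cut echo printf print ps base64"

# Hypothetical client-side pre-flight check. Splits the command string at
# |, ; and & and checks the first word of each segment against the list.
# An operator character inside a quoted argument is treated as a separator too.
nvcf_command_allowed() {
    while read -r first _rest; do
        [ -n "$first" ] || continue          # skip empty segments (e.g. from &&)
        case " $NVCF_ALLOWED " in
            *" $first "*) ;;                 # listed: keep checking
            *) echo "not in the supported list: $first" >&2
               return 1 ;;
        esac
    done <<EOF
$(printf '%s\n' "$1" | tr '|;&' '\n\n\n')
EOF
    return 0
}

nvcf_command_allowed "cat /etc/os-release | grep NAME && pwd" && echo "ok to send"
```

The sample call prints "ok to send" because cat, grep, and pwd are all listed; a string such as "ls && rm -rf /tmp/x" would be rejected at rm.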

Security#

NVCF ensures secure debugging capabilities:

  • Authentication and authorization for all debugging actions

  • Container isolation prevents unauthorized access

  • Limited command set to prevent system modifications

  • Access control based on NGC permissions

  • All debugging actions are logged and auditable

Troubleshooting#

Common Error Codes#

Error Code                      Description                                          Possible Resolution
------------------------------  ---------------------------------------------------  -------------------------------------------------------------------------
400 (BadRequestException)       Function is inactive or invalid parameters provided  Ensure function is active and parameters are correct
401 (NotAuthorizedException)    Invalid authentication token                         Check that your NGC API key or SSA token is valid
403 (ForbiddenException)        Insufficient permissions or function does not exist  Verify that your token has the appropriate scopes and the function exists
404 (NotFoundException)         Selected pod/container/instance does not exist       Verify that the specified resources exist and are correctly named
429 (TooManyRequestsException)  Rate limit exceeded                                  Reduce the frequency of requests and try again later
500 (UpstreamException)         Internal service error                               Contact support if the issue persists
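
In automation, the resolutions above can be surfaced next to a failed call. The sketch below is one hypothetical way to do that; the function name nvcf_error_hint is invented for this example, and how you obtain the numeric status code from a failed request is left open.

```shell
# Hypothetical helper: maps a status code from the error table to its
# suggested resolution. The texts are copied from the table.
nvcf_error_hint() {
    case "$1" in
        400) echo "Ensure function is active and parameters are correct" ;;
        401) echo "Check that your NGC API key or SSA token is valid" ;;
        403) echo "Verify that your token has the appropriate scopes and the function exists" ;;
        404) echo "Verify that the specified resources exist and are correctly named" ;;
        429) echo "Reduce the frequency of requests and try again later" ;;
        500) echo "Contact support if the issue persists" ;;
        *)   echo "No resolution documented for status $1" ;;
    esac
}

nvcf_error_hint 404
```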

Required Permissions#

To use the debuggability features, ensure your NGC API key has the correct permissions:

  • When generating an NGC API key from the NGC console, select the “Cloud Function” permission

  • This permission grants the necessary access to use both Live Tail Logs and Command Execution features

Limitations#

  • Real-time logs are ephemeral with no long-term storage

  • Historical logs are still available through the standard logging system

  • Command execution is limited to a predefined set of commands

  • Debugging sessions have a maximum duration of 2 hours

  • Output size is limited to 2 MB per command

  • Live tail logs are only supported for functions deployed to GFN and DGXC cloud environments

  • Live tail logs view maintains a maximum of 50,000 lines in the console buffer

  • Real-time logs cannot be searched on aggregate across all functions (e.g., searching for a string across all functions in an organization)
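
The per-command output limit matters most when reading large files. One way to stay under it, sketched below, is to check the file size first (for example with wc -c, which is in the supported list) and fall back to tail -c for oversized files. The function name, the example path, and the reading of the limit as 2,000,000 bytes are assumptions made for illustration, as is the availability of the -c option in the target container.

```shell
# Assumption: the limit is treated as 2,000,000 bytes, the more conservative reading.
NVCF_MAX_OUTPUT_BYTES=2000000

# Hypothetical helper: prints the command string to send for a file of a
# known size. Small files are read whole; larger ones are cut to the last
# NVCF_MAX_OUTPUT_BYTES bytes.
nvcf_read_command() {
    # $1: file path inside the container, $2: file size in bytes
    if [ "$2" -le "$NVCF_MAX_OUTPUT_BYTES" ]; then
        printf 'cat %s\n' "$1"
    else
        printf 'tail -c %s %s\n' "$NVCF_MAX_OUTPUT_BYTES" "$1"
    fi
}

nvcf_read_command /tmp/app.log 5000000
```

For the 5,000,000-byte example file, the helper prints "tail -c 2000000 /tmp/app.log", which could then be passed as the command string of a remote execution call.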

Appendix A: Terminology#

Term                  Definition
--------------------  --------------------------------------------------------------------------------------------------
NGC                   NVIDIA GPU Cloud which provides a way for users to set up and manage access to NVIDIA cloud services
NVCF                  NVIDIA Cloud Functions
Ephemeral Container   A temporary container created within a pod for debugging purposes
Real-time Logs        Logs streamed with minimal latency during function execution
DGXC                  DGX Cloud service
History Logs          Logs stored for longer-term analysis with search capabilities
Live Tail Logs        Near real-time streaming logs with minimal latency
Distroless Container  A container image with minimal operating system components