Troubleshooting#
Container Issues#
1. Verifying container configuration#
Use a login shell (bash -l) to see the banner displaying your container configuration:
docker exec -it aerial-framework-base-$USER bash -l
The banner shows your environment setup:
Entered Aerial Framework Container
Container Configuration:
User ID: 100963
Group ID: 100963
Dev workspace: /path/to/aerial-framework
Build workspace: /path/to/aerial-framework/out
GPU Configuration:
Visible devices: 1
Networking Configuration:
VFIO: /dev/vfio/vfio
InfiniBand: /dev/infiniband
GDRCopy: /dev/gdrdrv
Hugepages: /dev/hugepages
SSH Configuration:
SSH agent: /ssh-agent
Note: Real-time and networking device checks in container/setup_container.sh are optional.
If devices show /dev/null, those features are unavailable but the container will still function
for basic development and testing. DPDK/DOCA and fronthaul tests require proper networking hardware.
2. Container setup issues#
If container/setup_container.sh reports warnings or errors:
# Rerun setup with checks
bash container/setup_container.sh
# Regenerate .env file
bash container/setup_container.sh --overwrite-env
# Skip checks (for automation/CI)
bash container/setup_container.sh --env-only
See Container for detailed container setup and configuration.
3. “UID: readonly variable” or missing GROUP_ID#
Set the user id and group id variables in your .env file:
echo -e "\nUSER_ID=$(id -u)\nGROUP_ID=$(id -g)" >> container/.env
4. “service refers to undefined volume” or “empty section between colons”#
Set the DEV_WORKSPACE variable in your .env file:
echo -e "\nDEV_WORKSPACE=/path/to/your/dev/workspace" >> container/.env
5. Container won’t start#
docker compose build --no-cache
6. Permission errors with files#
# Fix file ownership in local workspace
sudo chown -R $USER:$USER workspace/
# Ensure USER_ID and GROUP_ID exist in .env
echo -e "\nUSER_ID=$(id -u)\nGROUP_ID=$(id -g)" >> container/.env
7. GPU not working#
# Test GPU access
docker compose -f container/compose.yaml run --rm aerial-framework-base nvidia-smi
# Check your .env file settings
grep NVIDIA_VISIBLE_DEVICES container/.env
8. Development workspace not accessible#
# Check if directory exists on host
ls -la /path/to/your/dev/workspace
# Check your .env file settings
grep DEV_WORKSPACE container/.env
Build Issues#
1. Build directory conflicts#
rm -rf out/build/<preset-name>
cmake --preset <preset-name>
2. Clean all Python virtual environments#
cmake --build out/build/<preset-name> --target py_all_clean_venv
Runtime Issues#
1. Real-time priority permission denied#
If an application fails with “Permission denied” when attempting to set real-time priority:
# The build system automatically sets CAP_SYS_NICE capability
# If you see permission errors, manually set capabilities:
sudo setcap cap_sys_nice=eip out/build/<preset-name>/path/to/executable
Note: The development container is not run as privileged for security reasons. Applications
requiring real-time priority or DPDK operations use Linux capabilities (via cmake/CapPermissions.cmake)
instead of requiring root or privileged containers. The build system automatically applies these
capabilities during the build process.
2. PTP clock synchronization for loopback#
Symptom: Late packets, timing violations, or timing window errors in fronthaul/RU emulator tests.
Cause: One or both NICs not synchronized to system clock.
Solution: Verify both phc2sys services are running (one for each NIC):
systemctl status 'phc2sys*'
You should see two active services like this:
● phc2sys1.service - running
● phc2sys.service - running
If only one service is running, configure the missing NIC following the NVIDIA Aerial CUDA-Accelerated RAN Installation Guide.
Critical: For loopback configurations, both NICs must be synchronized to avoid timing skew between the DU and RU sides. This is a common oversight that causes test failures.
3. Fronthaul integration test failure#
Symptom: fronthaul_app.integration_test fails with validation errors.
Common error in logs:
[ERROR] [fronthaul_app] [fronthaul_app_utils.cpp:380] [Stats] Validation FAILED:
PRB mismatch - expected 76440, got 0
Root causes:
Clock synchronization missing - Second NIC not synchronized to system clock (most common)
RU emulator not running - Check that ru_emulator started successfully
Network configuration issue - NIC addresses or loopback connection problems
FAPI capture file issue - Missing or corrupt FAPI capture files
Diagnostic steps:
# 1. CRITICAL: Verify BOTH phc2sys services are running (one per NIC)
systemctl status 'phc2sys*'
# Should show TWO active services for loopback configuration
# 2. Check if ru_emulator is running
ps aux | grep ru_emulator
# 3. Verify both network interfaces are up
ip link show aerial00
ip link show aerial02
# 4. Regenerate FAPI capture files
ctest --preset <preset> -R fapi_sample.integration_test
# 5. Run with verbose output for detailed error messages
ctest --preset <preset> -R fronthaul_app.integration_test --verbose
See Real-Time Applications for detailed clock synchronization requirements.
4. CTest timeout for long-running integration tests#
Symptom: Integration tests terminate prematurely at exactly 1500 seconds when running
with large TEST_SLOTS values.
Cause: CTest has a default timeout of 1500 seconds.
Solution: Increase the timeout using the --timeout flag:
# Increase timeout for long-running tests
TEST_SLOTS=200000 ctest --preset gcc-release --timeout 3000 -R fronthaul_app.integration_test