Limitations
Known Limitations
The cuPHY library and binaries are intended for the Linux environment on the qualified platforms only.
The supported configurations are limited to those listed above. Other configurations are not supported and may not perform well.
Cell life cycle management:
All cells have to be configured before any cell start.
In-service configuration updates are not supported.
A CONFIG.request received in the CONFIGURED (Out-of-Service) state can be used to change only the PCI and the supported PRACH parameters specified in the dynamic PRACH section of the cuBB quickstart guide. PHY ignores any other TLVs received in the CONFIG.request. If the CONFIG.response indicates success, only the PCI and the supported PRACH parameters are changed. All other parameters remain as in the initial CONFIG.request received for the cell.
PHY reconfiguration of a cell in the CONFIGURED (Out-of-Service) state can take up to 40 ms to complete (details below). If another CONFIG.request for any cell is sent during this time (around 20 ms), before the CONFIG.response is received, PHY returns a CONFIG.response with the error code “MSG_INVALID_STATE”. An ERROR.indication is NOT sent for this error. L2 must wait for the CONFIG.response before sending a CONFIG.request for another cell in the CONFIGURED state.
If Aerial is configured for 4 cells and 3 cells are In-service with data running, reconfiguration of 1 cell (Out-of-Service) can take around 40 ms to complete.
If Aerial is configured for 4 cells and 3 cells are In-service with no data running, reconfiguration of 1 cell (O-RU) can take around 20 ms to complete.
If a CONFIG.response is received with the error code “MSG_INVALID_CONFIG”, the reconfiguration was unsuccessful and the cell retains the configuration received in the initial CONFIG.request.
UE attach is not allowed in any cell during the reconfiguration time.
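The one-request-at-a-time rule above can be enforced on the L2 side before anything is sent to PHY. The following is a minimal sketch, not part of Aerial: the class ConfigRequestGate and its methods are hypothetical names, and the success code "MSG_OK" is assumed from the SCF FAPI error-code list rather than stated in this section.

```python
from typing import Optional


class ConfigRequestGate:
    """Hypothetical L2-side helper: allows one outstanding CONFIG.request at a time."""

    def __init__(self) -> None:
        # Cell whose CONFIG.response has not arrived yet, if any.
        self._pending_cell: Optional[int] = None

    def try_send(self, cell_id: int) -> bool:
        """Return True if a CONFIG.request for cell_id may be sent now."""
        if self._pending_cell is not None:
            # PHY would reply with MSG_INVALID_STATE, so hold the request back.
            return False
        self._pending_cell = cell_id
        return True

    def on_response(self, cell_id: int, error_code: str) -> bool:
        """Record a CONFIG.response; return True if the new configuration took effect."""
        if cell_id == self._pending_cell:
            self._pending_cell = None
        # With MSG_INVALID_CONFIG the cell keeps its initial configuration.
        # "MSG_OK" as the success code is an assumption of this sketch.
        return error_code == "MSG_OK"
```

A caller would invoke try_send() before building the message and on_response() from its CONFIG.response handler, queuing any cell for which try_send() returns False.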
Dynamic M-plane parameters:
When OAM sends a gRPC message to change the MAC address in the M-plane, it must be a valid O-RU MAC address.
The cuPHY LDPC decoder ‘tensor interface’ API is deprecated and will be removed in a future release. Please use the ‘transport block’ interface API.
Early HARQ in UCI.indication:
This feature is supported only for the first UL slot (x4 slots) and when all the early-HARQ bits are resident in symbols 0-3 for 4T4R and symbols 0-4 for 64T64R.
A UCI.indication with early HARQ does not carry any measurement values.
If only HARQ is scheduled on PUSCH and this feature is enabled, no UCI.indication is sent to L2 after full-slot processing of PUSCH. Consequently, no measurements for that slot are reported to L2.
If CSI reports are also scheduled on PUSCH along with HARQ, the UCI.indication with early HARQ does not carry any measurement values, but the UCI.indication sent after full-slot processing of PUSCH does carry the measurements.
Early-HARQ requires these HARQ bits to be fully resident in OFDM symbols 0-3. HARQ bits resident in OFDM symbols 0-3 are reported in the first UCI.indication (that is, the early-HARQ indication), and all other HARQ bits are reported in the subsequent UCI.indication (that is, after full-slot PUSCH processing completes).
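The symbol-range condition above can be expressed as a small check. This is an illustrative sketch only: the function name and the string keys are hypothetical, it uses the per-antenna ranges given earlier in this list (symbols 0-3 for 4T4R, symbols 0-4 for 64T64R), and it does not model the first-UL-slot condition.

```python
from typing import Iterable

# Last OFDM symbol (inclusive) that may carry early-HARQ bits, per antenna configuration.
EARLY_HARQ_LAST_SYMBOL = {"4T4R": 3, "64T64R": 4}


def early_harq_eligible(antenna_config: str, harq_symbols: Iterable[int]) -> bool:
    """Return True if every HARQ-bearing symbol lies inside the early-HARQ window."""
    last = EARLY_HARQ_LAST_SYMBOL.get(antenna_config)
    symbols = list(harq_symbols)
    if last is None or not symbols:
        # Unknown antenna configuration or no HARQ bits at all.
        return False
    return all(0 <= s <= last for s in symbols)
```

For example, early_harq_eligible("4T4R", [2, 4]) is False because symbol 4 falls outside symbols 0-3, so those HARQ bits would be reported only after full-slot PUSCH processing.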
Multiple cell operation without issuing dummy config.req:
L2 should wait at least 40 ms between two CONFIG.request messages, even at the initial stage, so that the CONFIG.response is received by L2 before the next request is sent.
L2 can retry a failed CONFIG.request for a given cell after 1 second.
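The two timing rules above can be turned into a simple send plan. The sketch below is an assumption about how an L2 might apply them, not Aerial code: the function names are hypothetical, and times are kept in integer milliseconds to avoid floating-point rounding.

```python
from typing import Dict, Iterable

MIN_GAP_MS = 40        # minimum spacing between two CONFIG.request messages
RETRY_DELAY_MS = 1000  # delay before retrying a failed CONFIG.request


def schedule_config_requests(cell_ids: Iterable[int], start_ms: int = 0) -> Dict[int, int]:
    """Map each cell to the earliest time (in ms) at which its CONFIG.request is sent."""
    return {cell: start_ms + i * MIN_GAP_MS for i, cell in enumerate(cell_ids)}


def retry_time_ms(failed_at_ms: int) -> int:
    """Earliest time (in ms) at which a failed CONFIG.request may be retried."""
    return failed_at_ms + RETRY_DELAY_MS
```

The plan gives only the earliest permitted send times; an L2 that also waits for each CONFIG.response would send no earlier than these values.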
Multi-L2 with single cuphycontroller per GPU:
The total number of cells across all L2 instances cannot exceed the cell_group_num configured in the cuphycontroller yaml.
nvIPC supports only the static cell allocation defined in nvipc_multi_instances.yaml for multiple L2 instances. The number of cells and the cell mapping in each L2 instance cannot change after L1 is configured.
Dynamic cell start/stop is supported in each L2 instance. Dynamic L2 restart is not supported. Each L2 instance needs to hold its nvIPC instance after connecting to L1.
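The cell-count limit above can be checked before the L2 instances are started. This is a minimal sketch under stated assumptions: the function name and the dictionary layout (L2 instance name mapped to its list of cell IDs) are hypothetical and do not reflect the actual schema of nvipc_multi_instances.yaml.

```python
from typing import Dict, List


def total_cells_within_limit(cells_per_l2: Dict[str, List[int]], cell_group_num: int) -> bool:
    """Return True if the cells of all L2 instances together fit within cell_group_num."""
    total = sum(len(cells) for cells in cells_per_l2.values())
    return total <= cell_group_num
```

For example, two L2 instances with two cells each pass the check when cell_group_num is 4, while a 3 + 2 split does not.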
64T64R TDD single cell:
PDSCH Resource Allocation Type 0 (RAT0) is not supported.
SRS reports related to antennaSwitching (FAPI 222.10.04, Table 3-133 - Channel SVD Representation) are not supported.
When L2 restarts without restarting L1, L2 has to keep the same cell config and cell sequence. If not, it should restart L1 to ensure a clean state in PHY.
When running the cuBB 59c peak cell test on MIG 4g.48gb and an LLM on MIG 3g.48gb, validation was done up to 11C.
Mixing ModComp and BFP is not supported.
All UEs in the same UEG need to have the same QAM.
cuMAC-CP has 1 receiver thread and multiple worker threads, which need to be bound to dedicated CPU cores. The current release supports only 1 worker thread per core.
Known Issues
A SCHED_FIFO poll thread running at 100% CPU causes the system to hang on the 5.4.0-65-lowlatency kernel. Use one of the following solutions:
Configure the kernel option CONFIG_RCU_NOCB_CPU=y, recompile, and install the kernel.
Upgrade the host system to 5.15.0-71-lowlatency or later.
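Whether a host already satisfies one of the two solutions above can be checked from the kernel release string and the kernel build configuration. The sketch below is illustrative only: the function names are hypothetical, and it assumes release strings of the form 5.15.0-71-lowlatency and a kernel config supplied as plain text.

```python
import re
from typing import Tuple

# First kernel release named above as the upgrade target.
MIN_KERNEL = (5, 15, 0, 71)


def parse_kernel_release(release: str) -> Tuple[int, ...]:
    """Turn a release such as '5.15.0-71-lowlatency' into (5, 15, 0, 71)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(group) for group in match.groups())


def meets_documented_workaround(release: str, kernel_config_text: str = "") -> bool:
    """Return True if the kernel is new enough or was built with CONFIG_RCU_NOCB_CPU=y."""
    if parse_kernel_release(release) >= MIN_KERNEL:
        return True
    return any(
        line.strip() == "CONFIG_RCU_NOCB_CPU=y"
        for line in kernel_config_text.splitlines()
    )
```

On a live system the release string can be taken from platform.release(); reading the config text from /boot/config-<release> is an assumption that holds on typical Ubuntu installations.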
CUDA application on Grace Hopper:
CUDA applications on the Grace Hopper platform require ATS support. Currently, ATS is not enabled on the arm64 platform when IOMMU passthrough is enabled.
NIC string conversion issue on Grace Hopper:
When working on dynamic CPU core assignments in a K8s pod, the cuphycontroller config yaml file needs to be parsed and dumped. On Grace Hopper, nic: 0000:01:00.0 is converted to nic: 60.0, because the YAML parser may interpret the PCIe address as a base-60 (sexagesimal) number according to https://yaml.org/type/int.html. The fix is to explicitly tell the YAML parser to interpret the PCIe address as a string, either by putting single quotation marks around it or by putting !!str before it, e.g., nic: '0000:01:00.0' or nic: !!str 0000:01:00.0.
For example, change sed -i "s/nic:.*/nic: 0000:01:00.0/" ${cuBB_SDK}/cuPHY-CP/cuphycontroller/config/cuphycontroller_P5G_FXN.yaml to sed -i "s/nic:.*/nic: '0000:01:00.0'/" ${cuBB_SDK}/cuPHY-CP/cuphycontroller/config/cuphycontroller_P5G_FXN.yaml (or sed -i "s/nic:.*/nic: \!\!str 0000:01:00.0/" ${cuBB_SDK}/cuPHY-CP/cuphycontroller/config/cuphycontroller_P5G_FXN.yaml).
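To see why the unquoted address turns into 60.0, the following sketch reimplements the base-60 float rule from the YAML 1.1 type repository in plain Python. It is an illustration, not the parser used by any cuBB tooling (which this section does not name), and the function name is hypothetical.

```python
import re

# Sexagesimal (base-60) float form from the YAML 1.1 float type:
# colon-separated groups followed by a fractional part.
_SEXAGESIMAL_FLOAT = re.compile(r"[-+]?[0-9][0-9_]*(?::[0-5]?[0-9])+\.[0-9_]*")


def resolve_yaml11_scalar(text: str):
    """Return a float if text matches the YAML 1.1 base-60 float form, else the text itself."""
    if not _SEXAGESIMAL_FLOAT.fullmatch(text):
        return text
    sign = -1.0 if text.startswith("-") else 1.0
    digits = text.lstrip("+-").replace("_", "")
    value = 0.0
    for part in digits.split(":"):
        # Each colon shifts the accumulated value by one base-60 place.
        value = value * 60 + float(part)
    return sign * value


print(resolve_yaml11_scalar("0000:01:00.0"))  # 0*3600 + 1*60 + 0.0 -> 60.0
print(resolve_yaml11_scalar("0000:b1:00.0"))  # hex digit breaks the pattern -> stays a string
```

An address that contains a hexadecimal letter does not match the pattern and stays a string, which is why the problem shows up only for some PCIe addresses. Quoting the value or tagging it with !!str bypasses this implicit resolution in all cases.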
There is a known issue (DL C-plane send error) when running multiple L2 instances if H2D copy thread is enabled. The workaround is to disable the H2D copy thread when running multiple L2 instances.
F08 20C_60c: cuPHYController reports ‘DL C-plane Send Error’ (Error Type 1) during initialization. This is a transient error that occurs only at startup.
Duplicate configuration doesn’t work with mMIMO pattern 6C_69b.
There is a known issue (UE attach failure) when using cuBB compiled with the SCF_FAPI_10_04 flag for E2E integration. The workaround is for L2 to send the TRP scheme field when sending the RX Beamforming PDU, according to the SCF FAPI 10.04 release.
The following test cases are not passing. They could be functionality issues or test framework issues:
| Channel | Test Cases | Feature |
|---|---|---|
| DL MIX | 193, 194 | 32-port CSI-RS |
| CSI-RS | 4058, 4064 | CSI-RS ModComp |
| mSlot_mCell | 90628, 90629, 90636 | 64TR |
If the error below is observed on startup of cuphycontroller, try a different port for ‘aerial_metrics_backend_address’ in the cuphycontroller yaml file. For example, the current default address is 127.0.0.1:8081; change it to 127.0.0.1:8082. This issue has been seen with RHOCP.
null context when constructing CivetServer. Possible problem binding to port.
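Before changing the port, it can help to confirm that the configured address is actually occupied. The sketch below uses only the Python standard library and attempts a TCP bind on the address. The function name is hypothetical, and the check is a best-effort probe, since another process could take the port between the check and the start of cuphycontroller.

```python
import socket


def port_is_free(host: str, port: int) -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use": something else holds the port.
            return False
    return True
```

For example, if port_is_free("127.0.0.1", 8081) returns False while port_is_free("127.0.0.1", 8082) returns True, 127.0.0.1:8082 is a reasonable candidate for aerial_metrics_backend_address.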