OAM Configuration#
Startup Configuration (cuphycontroller)#
The application binary name for the combined cuPHY-CP + cuPHY is cuphycontroller. When cuphycontroller starts, it reads static configuration from configuration YAML files. This section describes the fields in the YAML files.
l2adapter_filename#
This field contains the filename of the YAML-format config file for l2 adapter configuration.
aerial_metrics_backend_address#
Aerial Prometheus metrics backend address.
low_priority_core#
CPU core shared by all low-priority threads. An isolated CPU core is preferred; a non-isolated core can be used, but make sure no other heavy workload runs on it.
nic_tput_alert_threshold_mbps#
This parameter is used to monitor NIC throughput. The units are Mbps, that is, 85000 = 85 Gbps. This value is close to the maximum throughput that can be achieved with accurate send scheduling on a 100 Gbps link. A gRPC client (reference: $cuBB_SDK/cuPHY-CP/cuphyoam/examples/test_grpc_push_notification_client.cpp) needs to be implemented to receive the alert.
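To show how these top-level fields fit together, here is a minimal sketch of a cuphycontroller YAML fragment; the file name, address, and core number are illustrative placeholders (not recommended values), and exact nesting may differ between releases.
l2adapter_filename: l2_adapter_config_example.yaml   # placeholder L2 adapter config file name
aerial_metrics_backend_address: "0.0.0.0:8081"       # Prometheus metrics backend address
low_priority_core: 5                                  # core shared by low-priority threads
nic_tput_alert_threshold_mbps: 85000                  # 85 Gbps alert threshold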
cuphydriver_config#
This container holds configuration for cuphydriver.
standalone#
0 - run cuphydriver integrated with other cuPHY-CP components
1 - run cuphydriver in standalone mode (no l2adapter, etc)
validation#
Enables additional validation checks at run-time.
0 - Disabled
1 - Enabled
num_slots#
Number of slots to run in the cuphydriver standalone test.
log_level#
cuPHYDriver log level: DBG, INFO, ERROR.
profiler_sec#
Number of seconds to run the CUDA profiling tool.
dpdk_thread#
Sets the CPU core used by the primary DPDK thread. It does not have to be an isolated core. The DPDK thread itself defaults to SCHED_FIFO with priority 95.
dpdk_verbose_logs#
Enable maximum log level in DPDK.
0 - Disable
1 - Enable
accu_tx_sched_res_ns#
Sets the accuracy of the accurate transmit scheduling, in units of nanoseconds.
accu_tx_sched_disable#
Disable accurate TX scheduling.
0 - packets are sent according to the TX timestamp
1 - packets are sent whenever it is convenient
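To make the container structure concrete, the following is a hedged sketch of the cuphydriver_config fields described so far; the numeric values are placeholders only.
cuphydriver_config:
  standalone: 0              # run integrated with the other cuPHY-CP components
  validation: 0              # additional run-time validation disabled
  log_level: INFO            # DBG, INFO, or ERROR
  profiler_sec: 0            # seconds to run the CUDA profiling tool
  dpdk_thread: 10            # CPU core for the primary DPDK thread (placeholder)
  dpdk_verbose_logs: 0
  accu_tx_sched_res_ns: 500  # accurate TX scheduling resolution (placeholder)
  accu_tx_sched_disable: 0   # packets are sent according to the TX timestamp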
fh_stats_dump_cpu_core#
Sets the CPU core used by the FH stats logging thread. It does not have to be an isolated core. The default FH stats polling interval is currently 500 ms.
pdump_client_thread#
CPU core to use for pdump client. Set to -1 to disable fronthaul RX traffic PCAP capture.
See:
https://doc.dpdk.org/guides/howto/packet_capture_framework.html
aerial-fh README.md
mps_sm_pusch#
Number of SMs for PUSCH channel.
mps_sm_pucch#
Number of SMs for PUCCH channel.
mps_sm_prach#
Number of SMs for PRACH channel.
mps_sm_ul_order#
Number of SMs for UL order kernel.
mps_sm_pdsch#
Number of SMs for PDSCH channel.
mps_sm_pdcch#
Number of SMs for PDCCH channel.
mps_sm_pbch#
Number of SMs for PBCH channel.
mps_sm_srs#
Number of SMs for SRS channel.
mps_sm_gpu_comms#
Number of SMs for GPU comms.
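The per-channel mps_sm_* fields partition the GPU SMs among the channels. A hedged illustration follows; these counts are placeholders, not a tuned partition, and are assumed to sit under cuphydriver_config.
  mps_sm_pusch: 16
  mps_sm_pucch: 4
  mps_sm_prach: 2
  mps_sm_ul_order: 2
  mps_sm_pdsch: 16
  mps_sm_pdcch: 4
  mps_sm_pbch: 2
  mps_sm_srs: 4
  mps_sm_gpu_comms: 2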
nics#
Container for NIC configuration parameters.
nic#
PCIe bus address of the NIC port.
mtu#
Maximum transmission unit, in bytes, supported for Fronthaul U-plane and C-plane packets.
cpu_mbufs#
Number of preallocated DPDK memory buffers (mbufs) used for Ethernet packets.
uplane_tx_handles#
The number of pre-allocated transmit handles that link the U-plane prepare() and transmit() functions.
txq_count#
NIC transmit queue count.
Must be large enough to handle all cells attached to this NIC port.
Each cell uses one TXQ for C-plane and txq_count_uplane TXQs for U-plane.
rxq_count#
Receive queue count.
This value must be large enough to handle all cells attached to this NIC port.
Each cell uses one RXQ to receive all uplink traffic.
txq_size#
Number of packets that can fit in each transmit queue.
rxq_size#
Number of packets that can be buffered in each receive queue.
gpu#
CUDA device to receive uplink packets from this NIC port.
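For illustration, here is a hedged sketch of one entry in the nics list; the PCIe address and the queue/buffer sizes are placeholders.
  nics:
    - nic: 0000:b5:00.0       # PCIe bus address of the NIC port (placeholder)
      mtu: 1514
      cpu_mbufs: 65536        # preallocated DPDK mbufs
      uplane_tx_handles: 64
      txq_count: 48
      rxq_count: 16
      txq_size: 8192
      rxq_size: 16384
      gpu: 0                  # CUDA device that receives uplink packets from this port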
gpus#
List of GPU device IDs. To use gpudirect, the GPU must be on the same PCIe root complex as the NIC. To maximize performance, the GPU should be on the same PCIe switch as the NIC. Only the first entry in the list is used.
workers_ul#
List of pinned CPU cores used for uplink worker threads.
workers_dl#
List of pinned CPU cores used for downlink worker threads.
debug_worker#
For performance debugging. Set this to a free core so that it can work with the enable_*_tracing logs.
workers_sched_priority#
cuPHYDriver worker threads scheduling priority.
dpdk_file_prefix#
Shared data file prefix to use for the underlying DPDK process.
wfreq#
Filename containing the coefficients for channel estimation filters, in HDF5 (.h5) format.
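A hedged sketch of the GPU and worker-thread placement fields follows; the core IDs, prefix, and coefficient file name are placeholders.
  gpus: [0]                          # only the first entry is used
  workers_ul: [11, 12, 13]
  workers_dl: [14, 15, 16]
  workers_sched_priority: 95
  dpdk_file_prefix: cuphycontroller  # placeholder prefix
  wfreq: wfreq_coeffs_example.h5     # placeholder HDF5 filter-coefficient file name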
cell_group#
Enable cuPHY cell groups.
0 - disable
1 - enable
cell_group_num#
Number of cells to be configured in L1 for the test.
enable_h2d_copy_thread#
Enable/disable offloading of the H2D copy in L2A to a separate copy thread.
h2d_copy_thread_cpu_affinity#
CPU core on which the h2d copy thread in L2A should run. Applicable only if enable_h2d_copy_thread is 1.
h2d_copy_thread_sched_priority#
h2d copy thread priority in L2A. Applicable only if enable_h2d_copy_thread is 1.
fix_beta_dl#
Fix the beta_dl for local tests with the RU Emulator so that the output values byte-match the TV.
prometheus_thread#
Pinned CPU core for updating NIC metrics once per second.
start_section_id_srs#
ORAN CUS start section ID for the SRS channel.
start_section_id_prach#
ORAN CUS start section ID for the PRACH channel.
enable_ul_cuphy_graphs#
Enable UL processing with CUDA graphs.
enable_dl_cuphy_graphs#
Enable DL processing with CUDA graphs.
section_3_time_offset#
Time offset, in units of nanoseconds, for the PRACH channel.
ul_order_timeout_cpu_ns#
Timeout, in units of nanoseconds, for the uplink order kernel to receive any U-plane packets for this slot.
ul_order_timeout_gpu_ns#
Timeout, in units of nanoseconds, for the order kernel to complete execution on the GPU.
pusch_sinr#
Enable PUSCH SINR calculation (0 by default).
pusch_rssi#
Enable PUSCH RSSI calculation (0 by default).
pusch_tdi#
Enable PUSCH TDI processing (0 by default).
pusch_cfo#
Enable PUSCH CFO calculations (0 by default).
pusch_dftsofdm#
DFT-s-OFDM enable/disable flag: 0 - disable, 1 - enable.
pusch_to#
Enables PUSCH timing offset estimation. It is only used for timing offset reporting to L2; if the timing offset estimate is not used by L2, it can be disabled.
pusch_select_eqcoeffalgo#
Algorithm selector for PUSCH noise/interference estimation and channel equalization. The following values are supported:
0 - Regularized zero-forcing (RZF)
1 - Diagonal MMSE regularization
2 - Minimum mean square error - interference rejection combining (MMSE-IRC)
3 - MMSE-IRC with RBLW covariance shrinkage
4 - MMSE-IRC with OAS covariance shrinkage
pusch_select_chestalgo#
Channel estimation algorithm selection: 0 - legacy MMSE, 1 - multi-stage MMSE with delay estimation.
pusch_tbsizecheck#
TB size verification enable/disable flag: 0 - disable, 1 - enable.
pusch_subSlotProcEn#
Sub-slot processing enable/disable flag: 0 - disable, 1 - enable. The early HARQ feature is enabled accordingly when this flag is enabled. To get HARQ values in UCI.indication for UCI on PUSCH before complete PUSCH slot processing, L2 should include PHY configuration TLV 0x102B (indicationInstancesPerSlot) with UCI.indication set to 2, per Table 3-36 in SCF FAPI 222.10.04. If UCI.indication is set to 2 in CONFIG.request for any cell, the early HARQ feature is activated for all cells.
pusch_deviceGraphLaunchEn#
Static flag to allow device graph launch in PUSCH.
pusch_waitTimeOutPreEarlyHarqUs#
Timeout threshold in microseconds for receiving OFDM symbols for PUSCH early-HARQ processing.
pusch_waitTimeOutPostEarlyHarqUs#
Timeout threshold in microseconds for receiving OFDM symbols for PUSCH non-early-HARQ processing (essentially all the PUSCH symbols).
puxch_polarDcdrListSz#
List size used in List Decoding of Polar codes.
enable_cpu_task_tracing#
The flag is used to trace and instrument DL/UL CPU tasks running on existing cuphydriver cores.
enable_prepare_tracing#
Used for tracing the U-plane packet preparation kernel durations and end times; requires the debug worker to be enabled.
enable_dl_cqe_tracing#
Enables tracing of DL CQEs (debug feature to check for DL U-plane packets’ timing at the NIC).
ul_rx_pkt_tracing_level#
This YAML parameter can be set to three different values:
0 (default, recommended) - Only keeps count of the early/on-time/late packet counters per slot, as seen by the DU (reorder kernel), for the uplink U-plane packets.
1 - Also captures and logs the earliest/latest packet timestamp per symbol per slot as seen by the DU.
2 - Also captures and logs the timestamp of each packet received per symbol per slot as seen by the DU.
split_ul_cuda_streams#
Keep the default of 0. When enabled, this allows back-to-back UL slots to overlap their processing; keep it disabled to maintain the performance of the first UL slot in every group of two.
aggr_obj_non_avail_th#
Keep the default value of 5. This parameter sets the threshold for successive non-availability of L1 objects (an L1 object can be interpreted as the L1 handler necessary to schedule PHY compute tasks to the GPU). Unavailability can indicate that the execution timeline is falling behind the expected L1 timeline budget.
dl_wait_th_ns#
This parameter is used for error handling in the event of GPU failure. You must keep the defaults.
sendCPlane_timing_error_th_ns#
Keep the default value of 50000 (50 us). The threshold is used to check how close the current time, during the C-plane task's execution, is to the scheduled C-plane packet transmission time. If the threshold check is met, C-plane packet transmission is dropped for that slot.
pusch_forcedNumCsi2Bits#
Debug feature: if > 0, this overrides the number of PUSCH CSI-P2 bits for all CSI-P2 UCIs with the non-zero value provided. Setting it to 0 is recommended.
mMIMO_enable#
Keep at default of 0. This flag is reserved for future capability.
enable_srs#
Enable/disable SRS
enable_csip2_v3#
Enable/disable support of CSI part 2 as defined by FAPI 10.03 Table 3-77.
pusch_aggr_per_ctx#
Number of PUSCH objects per context (3 by default).
prach_aggr_per_ctx#
Number of PRACH objects per context (2 by default).
pucch_aggr_per_ctx#
Number of PUCCH objects per context (4 by default).
srs_aggr_per_ctx#
Number of SRS objects per context (2 by default).
ul_input_buffer_per_cell#
Number of UL buffers allocated per cell (10 by default).
ul_input_buffer_per_cell_srs#
Number of UL buffers allocated per cell for SRS (4 by default).
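The per-context aggregation and UL buffer fields above all have documented defaults; the sketch below simply restates those defaults in YAML form.
  pusch_aggr_per_ctx: 3
  prach_aggr_per_ctx: 2
  pucch_aggr_per_ctx: 4
  srs_aggr_per_ctx: 2
  ul_input_buffer_per_cell: 10
  ul_input_buffer_per_cell_srs: 4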
ue_mode#
Flag for the spectral efficiency feature. Must be enabled in the RU-side YAML to emulate UE operation.
cplane_disable#
Disable C-plane for all cells.
0 - Enable C-plane
1 - Disable C-plane
cells#
List of containers of cell parameters.
name#
Name of the cell
cell_id#
ID of the cell.
src_mac_addr#
Source MAC address for U-plane and C-plane packets. Set to 00:00:00:00:00:00 to use the MAC address of the NIC port in use.
dst_mac_addr#
Destination MAC address for U-plane and C-plane packets.
nic#
gNB NIC port to which the cell is attached.
Must match the 'nic' key value in one of the elements of the 'nics' list.
vlan#
VLAN ID used for C-plane and U-plane packets.
pcp#
QoS priority codepoint used for C-plane and U-plane Ethernet packets.
txq_count_uplane#
Number of transmit queues used for U-plane.
eAxC_id_ssb_pbch#
List of eAxC IDs to use for SSB/PBCH.
eAxC_id_pdcch#
List of eAxC IDs to use for PDCCH.
eAxC_id_pdsch#
List of eAxC IDs to use for PDSCH.
eAxC_id_csirs#
List of eAxC IDs to use for CSI RS.
eAxC_id_pusch#
List of eAxC IDs to use for PUSCH.
eAxC_id_pucch#
List of eAxC IDs to use for PUCCH.
eAxC_id_srs#
List of eAxC IDs to use for SRS.
eAxC_id_prach#
List of eAxC IDs to use for PRACH.
dl_iq_data_fmt:comp_meth#
DL U-plane compression method:
0 - Fixed point
1 - BFP
dl_iq_data_fmt:bit_width#
Number of bits used for each RE on DL U-plane channels. Fixed point supported value: 16. BFP supported values: 9, 14, 16.
ul_iq_data_fmt:comp_meth#
UL U-plane compression method:
0 - Fixed point
1 - BFP
ul_iq_data_fmt:bit_width#
Number of bits used per RE on uplink U-plane channels. Fixed point supported value: 16. BFP supported values: 9, 14, 16.
fs_offset_dl#
Downlink U-plane scaling per ORAN CUS 6.1.3.
exponent_dl#
Downlink U-plane scaling per ORAN CUS 6.1.3.
ref_dl#
Downlink U-plane scaling per ORAN CUS 6.1.3.
fs_offset_ul#
Uplink U-plane scaling per ORAN CUS 6.1.3.
exponent_ul#
Uplink U-plane scaling per ORAN CUS 6.1.3.
max_amp_ul#
Maximum full scale amplitude used in uplink U-plane scaling per ORAN CUS 6.1.3.
mu#
3GPP subcarrier spacing index 'mu'.
0 - 15 kHz
1 - 30 kHz
2 - 60 kHz
3 - 120 kHz
4 - 240 kHz
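To show how the per-cell fields fit together, here is a hedged sketch of one entry in the cells list; the MAC addresses, VLAN, eAxC IDs, and compression settings are placeholders, and the nested layout of dl_iq_data_fmt/ul_iq_data_fmt is assumed from the field names above.
  cells:
    - name: cell_0                      # placeholder name
      cell_id: 1
      src_mac_addr: 00:00:00:00:00:00   # use the MAC address of the NIC port in use
      dst_mac_addr: aa:bb:cc:dd:ee:ff   # placeholder RU MAC address
      nic: 0000:b5:00.0                 # must match an entry in the nics list
      vlan: 2
      pcp: 7
      txq_count_uplane: 2
      eAxC_id_pdsch: [0, 1, 2, 3]
      eAxC_id_pusch: [0, 1, 2, 3]
      eAxC_id_prach: [4, 5, 6, 7]
      dl_iq_data_fmt: {comp_meth: 1, bit_width: 9}   # BFP, 9-bit
      ul_iq_data_fmt: {comp_meth: 1, bit_width: 9}   # BFP, 9-bit
      mu: 1                                          # 30 kHz subcarrier spacing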
T1a_max_up_ns#
Scheduled timing advance before time-zero for downlink U-plane egress from DU, per ORAN CUS.
T1a_max_cp_ul_ns#
Scheduled timing advance before time-zero for uplink C-plane egress from DU, per ORAN CUS.
Ta4_min_ns#
Start of DU reception window after time-zero, per ORAN CUS.
Ta4_max_ns#
End of DU reception window after time-zero, per ORAN CUS.
Tcp_adv_dl_ns#
Downlink C-plane timing advance ahead of U-plane, in units of nanoseconds, per ORAN CUS.
ul_u_plane_tx_offset_ns#
Flag for the spectral efficiency feature. Must be set in the RU-side YAML to offset the UL transmission start from T0.
pusch_prb_stride#
Memory stride, in units of PRBs, for the PUSCH channel. Affects GPU memory layout.
prach_prb_stride#
Memory stride, in units of PRBs, for the PRACH channel. Affects GPU memory layout.
srs_prb_stride#
Memory stride, in units of PRBs, for the SRS. Affects GPU memory layout.
pusch_ldpc_max_num_itr_algo_type#
0 - Fixed LDPC iteration count
1 - MCS-based LDPC iteration count
Recommended setting: pusch_ldpc_max_num_itr_algo_type: 1
pusch_fixed_max_num_ldpc_itrs#
Unused currently, reserved to replace pusch_ldpc_n_iterations.
pusch_ldpc_n_iterations#
The iteration count is set to pusch_ldpc_n_iterations when the fixed LDPC iteration count option is selected (pusch_ldpc_max_num_itr_algo_type: 0). Because the default value of pusch_ldpc_max_num_itr_algo_type is 1 (iteration count optimized based on MCS), pusch_ldpc_n_iterations is unused by default.
pusch_ldpc_algo_index#
Algorithm index for LDPC decoder: 0 - automatic choice.
pusch_ldpc_flags#
pusch_ldpc_flags are flags that configure the LDPC decoder. pusch_ldpc_flags: 2 selects an LDPC decoder that optimizes for throughput, i.e., it processes more than one codeword (for example, 2) at a time instead of optimizing for latency.
pusch_ldpc_use_half#
Indication of the input data type of the LDPC decoder:
0 - single precision
1 - half precision
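A hedged sketch of these LDPC-related fields, using the values recommended or described above, is shown below.
  pusch_ldpc_max_num_itr_algo_type: 1   # MCS-based iteration count (recommended above)
  pusch_ldpc_algo_index: 0              # automatic decoder choice
  pusch_ldpc_flags: 2                   # throughput-optimized decoder
  pusch_ldpc_use_half: 1                # half-precision LDPC decoder input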
pusch_nMaxPrb#
This is used for memory allocation of the maximum PRB range of peak cells compared to average cells.
ul_gain_calibration#
UL Configured Gain used to convert dBFS to dBm. Default value, if unspecified: 48.68
lower_guard_bw#
Lower guard bandwidth, expressed in kHz. Used for deriving freqOffset for each RACH occasion. Default is 845.
tv_pusch#
HDF5 file containing static configuration (for example, filter coefficients) for the PUSCH channel.
tv_prach#
HDF5 file containing static configuration (for example, filter coefficients) for the PRACH channel.
pusch_ldpc_n_iterations#
PUSCH LDPC channel coding iteration count.
pusch_ldpc_early_termination#
PUSCH LDPC channel coding early termination.
0 - Disable
1 - Enable
Startup Configuration (l2_adapter_config)#
msg_type#
Defines the L2/L1 interface API. Supported options are:
scf_fapi_gnb - Use the small cell forum API.
phy_class#
Same as msg_type.
tick_generator_mode#
The SLOT.indication interval generator mode:
0 - poll + sleep. During each tick, the threads sleep for some time to release the CPU core and avoid hanging the system, then poll the system time.
1 - sleep. Sleep to an absolute timestamp, no polling.
2 - timer_fd. Start a timer and call epoll_wait() on the timer_fd.
allowed_fapi_latency#
Maximum allowed latency of SLOT FAPI messages sent from L2 to L1; messages exceeding it are ignored and dropped.
Unit: slots. The default is 0, which means the L2 message must be received in the current slot.
allowed_tick_error#
Allowed tick interval error.
Unit: us
The tick interval error is reported as a statistic. If the observed tick error exceeds the allowed value, the log is printed at Error level.
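For orientation, a hedged sketch of these l2_adapter_config fields follows; the tick error value is a placeholder.
msg_type: scf_fapi_gnb
phy_class: scf_fapi_gnb      # described above as the same as msg_type
tick_generator_mode: 0       # poll + sleep
allowed_fapi_latency: 0      # slots
allowed_tick_error: 10       # us (placeholder)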
timer_thread_config#
Configuration for the timer thread.
name#
Name of thread.
cpu_affinity#
Id of pinned CPU core used for timer thread.
sched_priority#
Scheduling priority of timer thread.
message_thread_config#
Configuration container for the L2/L1 message processing thread.
name#
Name of thread.
cpu_affinity#
ID of pinned CPU core used for the message thread.
sched_priority#
Scheduling priority of message thread.
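A hedged sketch of the two thread-configuration containers; the thread names, core IDs, and priorities are placeholders.
timer_thread_config:
  name: l2a_timer        # placeholder name
  cpu_affinity: 5
  sched_priority: 95
message_thread_config:
  name: l2a_msg          # placeholder name
  cpu_affinity: 6
  sched_priority: 95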
ptp#
PTP configuration for GPS_ALPHA and GPS_BETA.
gps_alpha#
GPS Alpha value for ORAN WG4 CUS section 9.7.2. Default value = 0, if undefined.
gps_beta#
GPS Beta value for ORAN WG4 CUS section 9.7.2. Default value = 0, if undefined.
mu_highest#
Highest supported mu, used for scheduling TTI tick rate.
slot_advance#
Timing advance ahead of time-zero, in units of slots, for L1 to notify L2 of a slot request.
enableTickDynamicSfnSlot#
Enable dynamic slot/sfn.
staticPucchSlotNum#
Debugging parameter for testing against the RU Emulator: sends a static PUCCH slot number.
staticPuschSlotNum#
Debugging parameter for testing against the RU Emulator: sends a static PUSCH slot number.
staticPdschSlotNum#
Debugging parameter for testing against the RU Emulator: sends a static PDSCH slot number.
staticPdcchSlotNum#
Debugging parameter for testing against the RU Emulator: sends a static PDCCH slot number.
staticCsiRsSlotNum#
Debugging parameter for testing against the RU Emulator: sends a static CSI-RS slot number.
staticSsbSlotNum#
Override the incoming slot number with the YAML configured SlotNumber for SS/PBCH.
Example:
staticSsbSlotNum: 10
staticSsbPcid#
Debugging parameter for testing against the RU Emulator: sends a static SSB phyCellId.
staticSsbSFN#
Debugging parameter for testing against the RU Emulator: sends a static SSB SFN.
pucch_dtx_thresholds#
Array of scale factors for DTX Thresholds of each PUCCH format.
Default value, if not present, is 1.0, which means the thresholds are not scaled.
For PUCCH format 0 and 1, -100.0 is replaced with 1.0.
Example:
pucch_dtx_thresholds: [-100.0, -100.0, 1.0, 1.0, -100.0]
pusch_dtx_thresholds#
Scale factor for DTX Thresholds of UCI on PUSCH.
Default value, if not present, is 1.0, which means the threshold is not scaled.
Example:
pusch_dtx_thresholds: 1.0
enable_precoding#
Enable/Disable parsing of Precoding PDUs in L2Adapter.
Default value is 0.
enable_precoding: 0/1
prepone_h2d_copy#
Enable/Disable preponing of the H2D copy in L2Adapter.
Default value is 1.
prepone_h2d_copy: 0/1
enable_beam_forming#
Enables/Disables parsing of BeamIds in L2Adapter.
Default value is 0.
enable_beam_forming: 1
dl_tb_loc#
Transport block location inside the nvipc buffer.
Default value is 1.
dl_tb_loc: 0 # TB is located inline with nvipc's msg buffer.
dl_tb_loc: 1 # TB is located in nvipc's CPU data buffer.
dl_tb_loc: 2 # TB is located in nvipc's GPU buffer.
instances#
Container for cell instances.
name#
Name of the instance.
nvipc_config_file#
Dedicated YAML config file for nvipc. Example: nvipc_multi_instances.yaml
transport#
Configuration container for L2/L1 message transport parameters.
type#
Transport type. One of shm, dpdk, or udp.
udp_config#
Configuration container for the udp transport type.
local_port#
UDP port used by L1.
remote_port#
UDP port used by L2.
shm_config#
Configuration container for the shared memory transport type.
primary#
Indicates process is primary for shared memory access.
prefix#
Prefix used in creating shared memory filename.
cuda_device_id#
Set this parameter to a valid GPU device ID to enable CPU data memory pool allocation in host pinned memory. Set to -1 to disable this feature.
ring_len#
Length, in bytes, of the ring used for shared memory transport.
mempool_size#
Configuration container for the memory pools used in shared memory transport.
cpu_msg#
Configuration container for the shared memory transport for CPU messages (that is, L2/L1 FAPI messages).
buf_size#
Buffer size in bytes.
pool_len#
Pool length in buffers.
cpu_data#
Configuration container for the shared memory transport for CPU data elements (that is, downlink and uplink transport blocks).
buf_size#
Buffer size in bytes.
pool_len#
Pool length in buffers.
cuda_data#
Configuration container for the shared memory transport for GPU data elements.
buf_size#
Buffer size in bytes.
pool_len#
Pool length in buffers.
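As an illustration of how the transport container nests, here is a hedged sketch of an shm configuration; the prefix, ring length, and pool sizes are placeholders.
transport:
  type: shm
  shm_config:
    primary: 1
    prefix: nvipc              # placeholder prefix
    cuda_device_id: -1         # host-pinned CPU data memory pool disabled
    ring_len: 8192             # placeholder
    mempool_size:
      cpu_msg:
        buf_size: 8192
        pool_len: 4096
      cpu_data:
        buf_size: 576000
        pool_len: 1024
      cuda_data:
        buf_size: 307200
        pool_len: 0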
dpdk_config#
Configurations for the DPDK over NIC transport type.
primary#
Indicates process is primary for shared memory access.
prefix#
The name used in creating shared memory files and searching DPDK memory pools.
local_nic_pci#
The NIC address or name used in IPC.
peer_nic_mac#
The peer NIC MAC address; it only needs to be set in the secondary process (L2/MAC).
cuda_device_id#
Set this parameter to a valid GPU device ID to enable CPU data memory pool allocation in host pinned memory. Set to -1 to disable this feature.
need_eal_init#
Whether nvipc needs to call rte_eal_init() to initialize the DPDK context. 1 - initialized by nvipc; 0 - initialized by another module in the same process.
lcore_id#
The logical core number for the nvipc_nic_poll thread.
mempool_size#
Configuration container for the memory pools used in shared memory transport.
cpu_msg#
Configuration container for the shared memory transport for CPU messages (that is, L2/L1 FAPI messages).
buf_size#
Buffer size in bytes.
pool_len#
Pool length in buffers.
cpu_data#
Configuration container for the shared memory transport for CPU data elements (that is, downlink and uplink transport blocks).
buf_size#
Buffer size in bytes.
pool_len#
Pool length in buffers.
cuda_data#
Configuration container for the shared memory transport for GPU data elements.
buf_size#
Buffer size in bytes.
pool_len#
Pool length in buffers.
app_config#
Configurations for all transport types, mostly used for debug.
grpc_forward#
Whether to enable forwarding of nvipc messages, and how many messages to forward automatically from initialization. A count of 0 means forward every message forever.
0 - disabled
1 - enabled, but does not start forwarding at initialization
-1 - enabled, starts forwarding at initialization with count = 0
Other positive number - enabled, starts forwarding at initialization with count = grpc_forward
debug_timing#
For debug only.
Whether to record the timestamps of allocating, sending, receiving, and releasing for all nvipc messages.
pcap_enable#
For debug only.
Whether to capture nvipc messages to pcap file.
pcap_cpu_core#
CPU core of background pcap log save thread.
pcap_cache_size_bits#
Size of /dev/shm/${prefix}_pcap. If set to 29, size is 2^29 = 512MB.
pcap_file_size_bits#
Max size of /dev/shm/${prefix}_pcap. If set to 31, size is 2^31 = 2GB.
pcap_max_data_size#
Maximum DL/UL FAPI data size to capture, to reduce the pcap size.
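A hedged sketch of the app_config debug settings described above; the core ID and size values are placeholders.
app_config:
  grpc_forward: 0            # forwarding disabled
  debug_timing: 0
  pcap_enable: 0
  pcap_cpu_core: 17          # placeholder core
  pcap_cache_size_bits: 29   # 2^29 = 512 MB
  pcap_file_size_bits: 31    # 2^31 = 2 GB
  pcap_max_data_size: 8000   # placeholder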
Startup Configuration (ru-emulator)#
The application binary name for the combined O-RU + UE emulator is ru-emulator. When ru-emulator starts, it reads static configuration from a configuration YAML file. This section describes the fields in the YAML file.
core_list#
List of CPU cores that RU Emulator could use.
nic_interface#
PCIe address of the NIC to use, for example, b5:00.1.
peerethaddr#
MAC address of cuPHYController port.
nvlog_name#
The nvlog instance name for ru-emulator. Detailed nvlog configurations are in nvlog_config.yaml.
cell_configs#
Cell configs agreed upon with DU.
name#
Cell string name (largely unused).
eth#
Cell MAC address.
dl_iq_data_fmt:comp_meth#
DL U-plane compression method:
0 - Fixed point
1 - BFP
dl_iq_data_fmt:bit_width#
Number of bits used for each RE on DL U-plane channels. Fixed point supported value: 16. BFP supported values: 9, 14, 16.
ul_iq_data_fmt:comp_meth#
UL U-plane compression method:
0 - Fixed point
1 - BFP
ul_iq_data_fmt:bit_width#
Number of bits used for each RE on UL U-plane channels. Fixed point supported value: 16. BFP supported values: 9, 14, 16.
flow_list#
eAxC list
eAxC_prach_list#
eAxC prach list
vlan#
VLAN ID to use for RX and TX.
nic#
Index of the nic to use in the nics list.
tti#
Slot indication interval.
validate_dl_timing#
Validate DL timing (requires PTP synchronization).
timing_histogram#
Generate a timing histogram.
timing_histogram_bin_size#
Histogram bin size.
oran_timing_info#
dl_c_plane_timing_delay#
T1a_max_up from ORAN.
dl_c_plane_window_size#
DL C Plane RX ontime window size.
ul_c_plane_timing_delay#
T1a_max_cp_ul from ORAN.
ul_c_plane_window_size#
UL C Plane RX ontime window size.
dl_u_plane_timing_delay#
T2a_max_up from ORAN.
dl_u_plane_window_size#
DL U Plane RX ontime window size.
ul_u_plane_tx_offset#
Ta4_min_up from ORAN.
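To show the overall shape of the ru-emulator YAML, here is a hedged sketch; the addresses, eAxC IDs, and timing values are placeholders, and the nesting of the per-cell fields is assumed from the ordering above.
core_list: [1, 2, 3, 4]
nic_interface: b5:00.1             # placeholder PCIe address
peerethaddr: aa:bb:cc:dd:ee:ff     # cuPHYController port MAC (placeholder)
cell_configs:
  - name: cell_0
    eth: 11:22:33:44:55:66         # placeholder cell MAC address
    dl_iq_data_fmt: {comp_meth: 1, bit_width: 9}
    ul_iq_data_fmt: {comp_meth: 1, bit_width: 9}
    flow_list: [0, 1, 2, 3]
    eAxC_prach_list: [4, 5]
    vlan: 2
    nic: 0                         # index into the nics list
oran_timing_info:
  dl_c_plane_timing_delay: 419000  # placeholder value
  dl_c_plane_window_size: 250000   # placeholder value
  ul_c_plane_timing_delay: 336000  # placeholder value
  ul_c_plane_window_size: 250000   # placeholder value
  dl_u_plane_timing_delay: 345000  # placeholder value
  dl_u_plane_window_size: 250000   # placeholder value
  ul_u_plane_tx_offset: 50000      # placeholder value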
Run-time Configuration/Status#
During run-time, Aerial components can be re-configured or queried for status through gRPC remote procedure calls (RPCs). The RPCs are defined in “protocol buffers” syntax, allowing support for clients written in any of the languages supported by gRPC and protocol buffers.
More information about gRPC may be found at: https://grpc.io/docs/what-is-grpc/core-concepts/
More information about protocol buffers may be found at: https://developers.google.com/protocol-buffers
Simple Request/Reply Flow#
Aerial applications support a request/reply flow using the gRPC framework with protobufs messages. At run-time, certain configuration items may be updated and certain status information may be queried. An external OAM client interfaces with the Aerial application acting as the gRPC server.

Streaming Request/Replies#
Aerial applications support the gRPC streaming feature for sending periodic status between client and server.

Asynchronous Interthread Communication#
Certain request/reply scenarios require interaction with the high-priority CPU-pinned threads orchestrating GPU work. These interactions occur through Aerial-internal asynchronous queues, and requests are processed on a best effort basis that prioritizes the orchestration of GPU kernel launches and other L1 tasks.

Aerial Common Service Definition#
/*
 * Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
 *
 * NVIDIA CORPORATION and its licensors retain all intellectual property
 * and proprietary rights in and to this software, related documentation
 * and any modifications thereto. Any use, reproduction, disclosure or
 * distribution of this software and related documentation without an express
 * license agreement from NVIDIA CORPORATION is strictly prohibited.
 */
syntax = "proto3";
package aerial;
service Common {
rpc GetSFN (GenericRequest) returns (SFNReply) {}
rpc GetCpuUtilization (GenericRequest) returns (CpuUtilizationReply) {}
rpc SetPuschH5DumpNextCrc (GenericRequest) returns (DummyReply) {}
rpc GetFAPIStream (FAPIStreamRequest) returns (stream FAPIStreamReply) {}
}
message GenericRequest {
string name = 1;
}
message SFNReply {
int32 sfn = 1;
int32 slot = 2;
}
message DummyReply {
}
message CpuUtilizationPerCore {
int32 core_id = 1;
int32 utilization_x1000 = 2;
}
message CpuUtilizationReply {
repeated CpuUtilizationPerCore core = 1;
}
message FAPIStreamRequest {
int32 client_id = 1;
int32 total_msgs_requested = 2;
}
message FAPIStreamReply {
int32 client_id = 1;
bytes msg_buf = 2;
bytes data_buf = 3;
}
rpc GetCpuUtilization#
The GetCpuUtilization RPC returns a variable-length array of CPU utilization values, one per high-priority core.
CPU utilization is available through the Prometheus node exporter; however, the design approach used by Aerial high-priority threads results in a false 100% CPU core utilization per thread. This RPC allows retrieval of the actual CPU utilization of high-priority threads. High-priority threads are pinned to specific CPU cores.
rpc GetFAPIStream#
This RPC requests snooping of one or more SCF FAPI messages (up to an unlimited number). The snooped messages are delivered from the Aerial gRPC server to a third-party client. See cuPHY-CP/cuphyoam/examples/aerial_get_l2msgs.py for an example client.
rpc TerminateCuphycontroller#
This RPC message terminates cuPHYController with immediate effect.
rpc CellParamUpdateRequest#
This RPC message updates cell configuration without stopping the cell. Message specification:
message CellParamUpdateRequest {
int32 cell_id = 1;
string dst_mac_addr = 2;
int32 vlan_tci = 3;
}
dst_mac_addr must be in ‘XX:XX:XX:XX:XX:XX’ format.
vlan_tci must include the 16-bit TCI value of the 802.1Q tag.
List of Parameters Supported by Dynamic OAM via gRPC and CONFIG.request (M-plane)#
The configuration unit is either across all cells or per-cell config. The cell outage state is either in-service or out-of-service.
Note
With OAM commands, you can use localhost for the $SERVER_IP when running on the DU server. Otherwise, use the DU server's numeric IP address. $CELL_ID is the M-plane ID, which starts from 1. The default values of the parameters can be found in the corresponding cuphycontroller YAML config file: $cuBB_SDK/cuPHY-CP/cuphycontroller/config/cuphycontroller_xxx.yaml
ru_type#
Parameter Name:
ru_type
Configuration Unit: per-cell config
Cell Outage: out-of-service
OAM Command:
cd $cuBB_SDK/build/cuPHY-CP/cuphyoam && python3 $cuBB_SDK/cuPHY-CP/cuphyoam/examples/aerial_cell_multi_attrs_update.py --server_ip $SERVER_IP --cell_id $CELL_ID --ru_type $RU_TYPE
Notes: $RU_TYPE: 1 for FXN_RU, 2 for FJT_RU, 3 for OTHER_RU (including ru_emulator).
nic#
Parameter Name:
nic
Configuration Unit: per-cell config
Cell Outage: out-of-service
OAM Command:
cd $cuBB_SDK/build/cuPHY-CP/cuphyoam && python3 $cuBB_SDK/cuPHY-CP/cuphyoam/examples/aerial_cell_multi_attrs_update.py --server_ip $SERVER_IP --cell_id $CELL_ID --nic $NIC
Notes: The nic PCIe address. It has to be one of the NIC ports configured in the cuphycontroller YAML file.
dst_mac_addr#
Parameter Name:
dst_mac_addr
Configuration Unit: per-cell config
Cell Outage: out-of-service
OAM Command:
cd $cuBB_SDK/build/cuPHY-CP/cuphyoam && python3 $cuBB_SDK/cuPHY-CP/cuphyoam/examples/aerial_cell_multi_attrs_update.py --server_ip $SERVER_IP --cell_id $CELL_ID --dst_mac_addr $DST_MAC_ADDR --vlan_id $VLAN_ID --pcp $PCP
Notes: The dst_mac_addr, vlan_id, and pcp parameters must be updated together.
vlan_id#
Parameter Name:
vlan_id
Configuration Unit: per-cell config
Cell Outage: out-of-service
OAM Command:
cd $cuBB_SDK/build/cuPHY-CP/cuphyoam && python3 $cuBB_SDK/cuPHY-CP/cuphyoam/examples/aerial_cell_multi_attrs_update.py --server_ip $SERVER_IP --cell_id $CELL_ID --dst_mac_addr $DST_MAC_ADDR --vlan_id $VLAN_ID --pcp $PCP
Notes: The dst_mac_addr, vlan_id, and pcp parameters must be updated together.
pcp#
Parameter Name:
pcp
Configuration Unit: per-cell config
Cell Outage: out-of-service
OAM Command:
cd $cuBB_SDK/build/cuPHY-CP/cuphyoam && python3 $cuBB_SDK/cuPHY-CP/cuphyoam/examples/aerial_cell_multi_attrs_update.py --server_ip $SERVER_IP --cell_id $CELL_ID --dst_mac_addr $DST_MAC_ADDR --vlan_id $VLAN_ID --pcp $PCP
Notes: The dst_mac_addr, vlan_id, and pcp parameters must be updated together.
dl_iq_data_fmt#
Parameter Name:
dl_iq_data_fmt
Configuration Unit: per-cell config
Cell Outage: out-of-service
OAM Command:
cd $cuBB_SDK/build/cuPHY-CP/cuphyoam && python3 $cuBB_SDK/cuPHY-CP/cuphyoam/examples/aerial_cell_multi_attrs_update.py --server_ip $SERVER_IP --cell_id $CELL_ID --dl_comp_meth $COMP_METH --dl_bit_width $BIT_WIDTH
ul_iq_data_fmt#
Parameter Name:
ul_iq_data_fmt
Configuration Unit: per-cell config
Cell Outage: out-of-service
OAM Command:
cd $cuBB_SDK/build/cuPHY-CP/cuphyoam && python3 $cuBB_SDK/cuPHY-CP/cuphyoam/examples/aerial_cell_multi_attrs_update.py --server_ip $SERVER_IP --cell_id $CELL_ID --ul_comp_meth $COMP_METH --ul_bit_width $BIT_WIDTH
exponent_dl#
Parameter Name:
exponent_dl
Configuration Unit: per-cell config
Cell Outage: out-of-service
OAM Command:
cd $cuBB_SDK/build/cuPHY-CP/cuphyoam && python3 $cuBB_SDK/cuPHY-CP/cuphyoam/examples/aerial_cell_multi_attrs_update.py --server_ip $SERVER_IP --cell_id $CELL_ID --exponent_dl $EXPONENT_DL
exponent_ul#
Parameter Name:
exponent_ul
Configuration Unit: per-cell config
Cell Outage: out-of-service
OAM Command:
cd $cuBB_SDK/build/cuPHY-CP/cuphyoam && python3 $cuBB_SDK/cuPHY-CP/cuphyoam/examples/aerial_cell_multi_attrs_update.py --server_ip $SERVER_IP --cell_id $CELL_ID --exponent_ul $EXPONENT_UL
pusch_prb_stride#
Parameter Name:
pusch_prb_stride
Configuration Unit: per-cell config
Cell Outage: out-of-service
OAM Command:
cd $cuBB_SDK/build/cuPHY-CP/cuphyoam && python3 $cuBB_SDK/cuPHY-CP/cuphyoam/examples/aerial_cell_multi_attrs_update.py --server_ip $SERVER_IP --cell_id $CELL_ID --pusch_prb_stride $PUSCH_PRB_STRIDE
prach_prb_stride#
Parameter Name:
prach_prb_stride
Configuration Unit: per-cell config
Cell Outage: out-of-service
OAM Command:
cd $cuBB_SDK/build/cuPHY-CP/cuphyoam && python3 $cuBB_SDK/cuPHY-CP/cuphyoam/examples/aerial_cell_multi_attrs_update.py --server_ip $SERVER_IP --cell_id $CELL_ID --prach_prb_stride $PRACH_PRB_STRIDE
max_amp_ul#
Parameter Name:
max_amp_ul
Configuration Unit: per-cell config
Cell Outage: out-of-service
OAM Command:
cd $cuBB_SDK/build/cuPHY-CP/cuphyoam && python3 $cuBB_SDK/cuPHY-CP/cuphyoam/examples/aerial_cell_multi_attrs_update.py --server_ip $SERVER_IP --cell_id $CELL_ID --max_amp_ul $MAX_AMP_UL
section_3_time_offset#
Parameter Name:
section_3_time_offset
Configuration Unit: per-cell config
Cell Outage: out-of-service
OAM Command:
cd $cuBB_SDK/build/cuPHY-CP/cuphyoam && python3 $cuBB_SDK/cuPHY-CP/cuphyoam/examples/aerial_cell_multi_attrs_update.py --server_ip $SERVER_IP --cell_id $CELL_ID --section_3_time_offset $SECTION_3_TIME_OFFSET
fh_distance_range#
Parameter Name:
fh_distance_range
Configuration Unit: per-cell config
Cell Outage: out-of-service
OAM Command:
cd $cuBB_SDK/build/cuPHY-CP/cuphyoam && python3 $cuBB_SDK/cuPHY-CP/cuphyoam/examples/aerial_cell_multi_attrs_update.py --server_ip $SERVER_IP --cell_id $CELL_ID --fh_distance_range $FH_DISTANCE_RANGE
Notes:
$FH_DISTANCE_RANGE: 0 for 0~30km, 1 for 20~50km.
Suppose the following are the default configs in the cuphycontroller YAML config file that correspond to FH_DISTANCE_RANGE option 0 (0~30km):
t1a_max_up_ns: d1
t1a_max_cp_ul_ns: d2
ta4_min_ns: d3
ta4_max_ns: d4
Updating the FH_DISTANCE_RANGE option to 1 (20~50km) adjusts the values as follows:
t1a_max_up_ns: d1 + $FH_EXTENSION_DELAY_ADJUSTMENT
t1a_max_cp_ul_ns: d2 + $FH_EXTENSION_DELAY_ADJUSTMENT
ta4_min_ns: d3 + $FH_EXTENSION_DELAY_ADJUSTMENT
ta4_max_ns: d4 + $FH_EXTENSION_DELAY_ADJUSTMENT
$FH_EXTENSION_DELAY_ADJUSTMENT is 100us for now and can be tuned in the source file ${cuBB_SDK}/cuPHY-CP/cuphydriver/include/constant.hpp#L207:
static constexpr uint32_t FH_EXTENSION_DELAY_ADJUSTMENT = 100000; //100us
ul_gain_calibration#
Parameter Name:
ul_gain_calibration
Configuration Unit: per-cell config
Cell Outage: out-of-service
OAM Command:
cd $cuBB_SDK/build/cuPHY-CP/cuphyoam && python3 $cuBB_SDK/cuPHY-CP/cuphyoam/examples/aerial_cell_multi_attrs_update.py --server_ip $SERVER_IP --cell_id $CELL_ID --ul_gain_calibration $UL_GAIN_CALIBRATION
lower_guard_bw#
Parameter Name:
lower_guard_bw
Configuration Unit: per-cell config
Cell Outage: out-of-service
OAM Command:
cd $cuBB_SDK/build/cuPHY-CP/cuphyoam && python3 $cuBB_SDK/cuPHY-CP/cuphyoam/examples/aerial_cell_multi_attrs_update.py --server_ip $SERVER_IP --cell_id $CELL_ID --lower_guard_bw $LOWER_GUARD_BW
ref_dl#
Parameter Name:
ref_dl
Configuration Unit: per-cell config
Cell Outage: out-of-service
OAM Command:
cd $cuBB_SDK/build/cuPHY-CP/cuphyoam && python3 $cuBB_SDK/cuPHY-CP/cuphyoam/examples/aerial_cell_multi_attrs_update.py --server_ip $SERVER_IP --cell_id $CELL_ID --ref_dl $REF_DL
attenuation_db#
Parameter Name:
attenuation_db
Configuration Unit: per-cell config
Cell Outage: out-of-service
OAM Command:
cd $cuBB_SDK/build/cuPHY-CP/cuphyoam && python3 $cuBB_SDK/cuPHY-CP/cuphyoam/examples/aerial_cell_param_attn_update.py $CELL_ID $ATTENUATION_DB
gps_alpha#
Parameter Name:
gps_alpha
Configuration Unit: across all cells
Cell Outage: out-of-service
OAM Command:
cd $cuBB_SDK/build/cuPHY-CP/cuphyoam && python3 $cuBB_SDK/cuPHY-CP/cuphyoam/examples/aerial_cell_multi_attrs_update.py --server_ip $SERVER_IP --gps_alpha $GPS_ALPHA
gps_beta#
Parameter Name:
gps_beta
Configuration Unit: across all cells
Cell Outage: out-of-service
OAM Command:
cd $cuBB_SDK/build/cuPHY-CP/cuphyoam && python3 $cuBB_SDK/cuPHY-CP/cuphyoam/examples/aerial_cell_multi_attrs_update.py --server_ip $SERVER_IP --gps_beta $GPS_BETA
Notes: All cells have to be in the idle state before configuring this parameter.
prachRootSequenceIndex#
Parameter Name:
prachRootSequenceIndex
Configuration Unit: per-cell config
Cell Outage: out-of-service
OAM Command: Via FAPI
CONFIG.request
. Refer to the Dynamic PRACH Configuration and Init Sequence Test section for more details.
prachZeroCorrConf#
Parameter Name:
prachZeroCorrConf
Configuration Unit: per-cell config
Cell Outage: out-of-service
OAM Command: Via FAPI
CONFIG.request
. Refer to the Dynamic PRACH Configuration and Init Sequence Test section for more details.
numPrachFdOccasions#
Parameter Name:
numPrachFdOccasions
Configuration Unit: per-cell config
Cell Outage: out-of-service
OAM Command: Via FAPI
CONFIG.request
. Refer to the Dynamic PRACH Configuration and Init Sequence Test section for more details.
restrictedSetConfig#
Parameter Name:
restrictedSetConfig
Configuration Unit: per-cell config
Cell Outage: out-of-service
OAM Command: Via FAPI
CONFIG.request
. Refer to the Dynamic PRACH Configuration and Init Sequence Test section for more details.
prachConfigIndex#
Parameter Name:
prachConfigIndex
Configuration Unit: per-cell config
Cell Outage: out-of-service
OAM Command: Via FAPI
CONFIG.request
. Refer to the Dynamic PRACH Configuration and Init Sequence Test section for more details.
K1#
Parameter Name:
K1
Configuration Unit: per-cell config
Cell Outage: out-of-service
OAM Command: Via FAPI
CONFIG.request
. Refer to the Dynamic PRACH Configuration and Init Sequence Test section for more details.
M-Plane Hybrid Mode ORAN YANG Model Provisioning#
Aerial supports M-plane hybrid mode, which allows the NMS/SMO, using ORAN YANG data models, to pass RU capabilities, C/U-plane transport config, and U-plane config to L1.
Here is the high-level sequence diagram:

Data Model Procedures-Yang data tree write procedure

Data Model Procedures-Yang data tree read procedure

Data Model Transfer APIs(gRPC ProtoBuf contract)#
syntax = "proto3";
package p9_messages.v1;
service P9Messages {
rpc HandleMsg (Msg) returns (Msg) {}
}
message Msg
{
Header header = 1;
Body body = 2;
}
message Header
{
string msg_id = 1; // Message identifier to
// 1) Identify requests and notifications
// 2) Correlate requests and response
optional string oru_name = 2; // The name (identifier) of the O-RU, if present.
int32 vf_id = 3; // The identifier for the FAPI VF ID
int32 phy_id = 4; // The identifier for the FAPI PHY ID
optional int32 trp_id = 5; // The identifier of the PHY's TRP, if any
}
message Body
{
oneof msg_body
{
Request request = 1;
Response response = 2;
}
}
message Request
{
oneof req_type
{
Get get = 1;
EditConfig edit_config = 2;
}
}
message Response
{
oneof resp_type
{
GetResp get_resp = 1;
EditConfigResp edit_config_resp = 2;
}
}
message Get { repeated bytes filter = 1; }
message GetResp
{
Status status_resp = 1;
bytes data = 2;
}
message EditConfig
{
bytes delta_config = 1; // List of Node changes with the associated operation to apply to the node
}
message EditConfigResp { Status status_resp = 1; }
message Error
{ // Type of error as defined in RFC 6241 section 4.3
string error_type = 1; // Error type defined in RFC 6241, Appendix B
string error_tag = 2; // Error tag defined in RFC 6241, Appendix B
string error_severity = 3; // Error severity defined in RFC 6241, Appendix B
string error_app_tag = 4; // Error app tag defined in RFC 6241, Appendix B
string error_path = 5; // Error path defined in RFC 6241, Appendix B
string error_message = 6; // Error message defined in RFC 6241, Appendix B
}
message Status
{
enum StatusCode
{
OK = 0;
ERROR_GENERAL = 1;
}
StatusCode status_code = 1;
repeated Error error = 2; // Optional: Error information
}
List of Parameters Supported by YANG Model#
The configuration unit is either across all cells or per-cell config. The cell outage state is either in-service or out-of-service.
o-du-mac-address#
Parameter Name:
o-du-mac-address
Configuration Unit: Per-cell config
Cell Outage: Out-of-service
Description: DU-side MAC address. It is translated to the corresponding 'nic' internally.
YANG Model:
o-ran-uplane-conf.yang
o-ran-processing-element.yang
ietf-interfaces.yang
xpath:
/processing-elements/ru-elements/transport-flow/eth-flow/o-du-mac-address
ru-mac-address#
Parameter Name:
ru-mac-address
Configuration Unit: Per-cell config
Cell Outage: Out-of-service
Description: MAC address of the corresponding RU
YANG Model:
o-ran-uplane-conf.yang
o-ran-processing-element.yang
ietf-interfaces.yang
xpath:
/processing-elements/ru-elements/transport-flow/eth-flow/ru-mac-address
vlan-id#
Parameter Name:
vlan-id
Configuration Unit: Per-cell config
Cell Outage: Out-of-service
Description: VLAN ID
YANG Model:
ietf-interfaces.yang
o-ran-interfaces.yang
o-ran-processing-element.yang
xpath:
/processing-elements/ru-elements/transport-flow/eth-flow/vlan-id
pcp#
Parameter Name:
pcp
Configuration Unit: Per-cell config
Cell Outage: Out-of-service
Description: VLAN priority level
YANG Model:
ietf-interfaces.yang
o-ran-interfaces.yang
o-ran-processing-element.yang
xpath:
/interfaces/interface/class-of-service/u-plane-marking
ul_iq_data_fmt: bit_width#
Parameter Name:
ul_iq_data_fmt: bit_width
Configuration Unit: Per-cell config
Cell Outage: Out-of-service
Description: Indicates the bit length after compression.
BFP values: 9 and 14; 16 for no compression
Fixed point values: Currently only supports 16
YANG Model: o-ran-uplane-conf.yang
xpath:
/user-plane-configuration/low-level-tx-endpoints/compression/iq-bitwidth
ul_iq_data_fmt: comp_meth#
Parameter Name:
ul_iq_data_fmt: comp_meth
Configuration Unit: Per-cell config
Cell Outage: Out-of-service
Description: Indicates the UL compression method.
BFP values: BLOCK_FLOATING_POINT
Fixed point values: NO_COMPRESSION
YANG Model: o-ran-uplane-conf.yang
xpath:
/user-plane-configuration/low-level-tx-endpoints/compression/compression-method
dl_iq_data_fmt: bit_width#
Parameter Name:
dl_iq_data_fmt: bit_width
Configuration Unit: Per-cell config
Cell Outage: Out-of-service
Description: Indicates the bit length after compression.
BFP values: 9 and 14; 16 for no compression
Fixed point values: Currently only supports 16
YANG Model: o-ran-uplane-conf.yang
xpath:
/user-plane-configuration/low-level-rx-endpoints/compression/iq-bitwidth
dl_iq_data_fmt: comp_meth#
Parameter Name:
dl_iq_data_fmt: comp_meth
Configuration Unit: Per-cell config
Cell Outage: Out-of-service
Description: Indicates the DL compression method.
BFP values: BLOCK_FLOATING_POINT
Fixed point values: NO_COMPRESSION
YANG Model: o-ran-uplane-conf.yang
xpath:
/user-plane-configuration/low-level-rx-endpoints/compression/compression-method
exponent_dl#
Parameter Name:
exponent_dl
Configuration Unit: Per-cell config
Cell Outage: Out-of-service
Description:
YANG Model:
o-ran-uplane-conf.yang
o-ran-compression-factors.yang
xpath:
/user-plane-configuration/low-level-rx-endpoints/compression/exponent
exponent_ul#
Parameter Name:
exponent_ul
Configuration Unit: Per-cell config
Cell Outage: out-of-service
Description:
YANG Model:
o-ran-uplane-conf.yang
o-ran-compression-factors.yang
xpath:
/user-plane-configuration/low-level-tx-endpoints/compression/exponent
Reference Examples#
The following is a client-side reference implementation:
$cuBB_SDK/cuPHY-CP/cuphyoam/examples/p9_msg_client_grpc_test.cpp
Below are a few examples of updating and retrieving related parameters.
Update ru-mac-address, vlan-id, and pcp#
#step 1: Edit $cuBB_SDK/cuPHY-CP/cuphyoam/examples/mac_vlan_pcp.xml and update ru_mac, vlan_id and pcp accordingly
#step 2: Run below cmd to do the provisioning
$cuBB_SDK/build/cuPHY-CP/cuphyoam/p9_msg_client_grpc_test --phy_id $mplane_id --cmd edit_config --xml_file $cuBB_SDK/cuPHY-CP/cuphyoam/examples/mac_vlan_pcp.xml
#step 3: Run below cmds to retrieve the config
$cuBB_SDK/build/cuPHY-CP/cuphyoam/p9_msg_client_grpc_test --phy_id $mplane_id --cmd get --xpath /o-ran-processing-element:processing-elements
$cuBB_SDK/build/cuPHY-CP/cuphyoam/p9_msg_client_grpc_test --phy_id $mplane_id --cmd get --xpath /ietf-interfaces:interfaces
Update o-du-mac-address(du nic port)#
#step 1: Edit $cuBB_SDK/cuPHY-CP/cuphyoam/examples/nic_du_mac.xml and update du_mac, which is translated to the corresponding nic port internally
#step 2: Run below cmd to do the provisioning
$cuBB_SDK/build/cuPHY-CP/cuphyoam/p9_msg_client_grpc_test --phy_id $mplane_id --cmd edit_config --xml_file $cuBB_SDK/cuPHY-CP/cuphyoam/examples/nic_du_mac.xml
#step 3: Run below cmd to retrieve the config
$cuBB_SDK/build/cuPHY-CP/cuphyoam/p9_msg_client_grpc_test --phy_id $mplane_id --cmd get --xpath /o-ran-processing-element:processing-elements
Update DL/UL IQ data format#
#step 1: Edit $cuBB_SDK/cuPHY-CP/cuphyoam/examples/iq_data_fmt.xml and update DL/UL IQ data format accordingly
(compression-method: BLOCK_FLOATING_POINT for BFP or NO_COMPRESSION for fixed point)
(iq-bitwidth: 9, 14, 16 for BFP or 16 for fixed point)
#step 2: Run below cmd to do the provisioning
$cuBB_SDK/build/cuPHY-CP/cuphyoam/p9_msg_client_grpc_test --phy_id $mplane_id --cmd edit_config --xml_file $cuBB_SDK/cuPHY-CP/cuphyoam/examples/iq_data_fmt.xml
#step 3: Run below cmd to retrieve the config
$cuBB_SDK/build/cuPHY-CP/cuphyoam/p9_msg_client_grpc_test --phy_id $mplane_id --cmd get --xpath /o-ran-uplane-conf:user-plane-configuration
Update dl and ul Exponent#
#step 1: Edit $cuBB_SDK/cuPHY-CP/cuphyoam/examples/dl_ul_exponent.xml and update the dl and ul exponent accordingly
#step 2: Run below cmd to do the provisioning
$cuBB_SDK/build/cuPHY-CP/cuphyoam/p9_msg_client_grpc_test --phy_id $mplane_id --cmd edit_config --xml_file $cuBB_SDK/cuPHY-CP/cuphyoam/examples/dl_ul_exponent.xml
#step 3: Run below cmd to retrieve the config
$cuBB_SDK/build/cuPHY-CP/cuphyoam/p9_msg_client_grpc_test --phy_id $mplane_id --cmd get --xpath /o-ran-uplane-conf:user-plane-configuration
Logging#
Log Levels#
Nvlog supports the following log levels: Fatal, Error, Console, Warning, Info, Debug, and Verbose.
A Fatal log message results in process termination. For other log levels, the process continues execution. A typical deployment sends Fatal, Error, and Console levels to stdout. The Console level is for printing something that is neither a warning nor an error but that you still want to print to stdout.
nvlog#
This YAML container contains parameters related to nvlog configuration, see nvlog_config.yaml.
name#
Used to create the shared memory log file. The shared memory handle is /dev/shm/${name}.log and the temp logfile is named /tmp/${name}.log.
primary#
Among all processes logging to the same file, set the first started process to be primary and the others to be secondary.
shm_log_level#
Sets the log level threshold for the high performance shared memory logger. Log messages with a level at or below this threshold are sent to the shared memory logger.
Log levels: 0 - NONE, 1 - FATAL, 2 - ERROR, 3 - CONSOLE, 4 - WARNING, 5 - INFO, 6 - DEBUG, 7 - VERBOSE
Setting the log level to LOG_NONE means no logs are sent to the shared memory logger.
console_log_level#
Sets the log level threshold for printing to the console. Log messages with a level at or below this threshold are printed to stdout.
max_file_size_bits#
Defines the size of the rotating log file /var/log/aerial/${name}.log. Size = 2 ^ bits.
shm_cache_size_bits#
Defines the size of the SHM cache file /dev/shm/${name}.log. Size = 2 ^ bits.
log_buf_size#
Maximum log string length for a single call of the nvlog API.
max_threads#
The maximum number of threads that use nvlog in total.
save_to_file#
Whether to copy and save the SHM cache log to a rotating log file under the /var/log/aerial/ folder.
cpu_core_id#
CPU core ID for the background log saving thread. -1 means the core is not pinned.
prefix_opts#
bit 5 - thread_id
bit 4 - sequence number
bit 3 - log level
bit 2 - module type
bit 1 - date
bit 0 - timestamp
Refer to nvlog.h for more details.
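For illustration, a hedged sketch of the nvlog container in nvlog_config.yaml follows; every value is a placeholder, not a recommended setting.
nvlog:
  name: phy                # shared memory log file name (placeholder)
  primary: 1
  shm_log_level: 5         # INFO
  console_log_level: 3     # CONSOLE
  max_file_size_bits: 29   # 2^29 = 512 MB rotating file
  shm_cache_size_bits: 29  # 2^29 = 512 MB SHM cache
  log_buf_size: 1024
  max_threads: 64
  save_to_file: 1
  cpu_core_id: -1          # background saving thread not pinned
  prefix_opts: 0x2F        # placeholder bitmask (see bit definitions above)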
Metrics#
The OAM Metrics API is used internally by cuPHY-CP components to report metrics (counters, gauges, and histograms). The metrics are exposed via a Prometheus Aerial exporter.
Host Metrics#
Host metrics are provided via the Prometheus node exporter. The node exporter provides many thousands of metrics about the host hardware and OS, such as but not limited to:
CPU statistics
Disk statistics
Filesystem statistics
Memory statistics
Network statistics
See prometheus/node_exporter and https://prometheus.io/docs/guides/node-exporter/ for detailed documentation on the node exporter.
GPU Metrics#
GPU hardware metrics are provided through the GPU Operator via the Prometheus DCGM-Exporter. The DCGM-Exporter provides many thousands of metrics about the GPU and PCIe bus connection, such as but not limited to:
GPU hardware clock rates
GPU hardware temperatures
GPU hardware power consumption
GPU memory utilization
GPU hardware errors including ECC
PCIe throughput
See NVIDIA/gpu-operator for details on the GPU operator.
See NVIDIA/gpu-monitoring-tools for detailed documentation on the DCGM-Exporter.
An example Grafana dashboard is available at https://grafana.com/grafana/dashboards/12239.
Aerial Metric Naming Conventions#
In addition to metrics available through the node exporter and DCGM-Exporter, Aerial exposes several application metrics.
Metric names are per https://prometheus.io/docs/practices/naming/ and follow the format aerial_<component>_<sub-component>_<metricdescription>_<units>.
Metric types are per https://prometheus.io/docs/concepts/metric_types/.
The component and sub-component definitions are in the table below. For each metric, the description, metric type, and metric tags are provided. Tags are a way of providing granularity to metrics without creating new metrics.
Component | Sub-Component | Description
---|---|---
cuphycp | | cuPHY Control Plane application
| fapi | L2/L1 interface metrics
| cplane | Fronthaul C-plane metrics
| uplane | Fronthaul U-plane metrics
| net | Generic network interface metrics
cuphy | | cuPHY L1 library
| pbch | Physical Broadcast Channel metrics
| pdsch | Physical Downlink Shared Channel metrics
| pdcch | Physical Downlink Control Channel metrics
| pusch | Physical Uplink Shared Channel metrics
| pucch | Physical Uplink Control Channel metrics
| prach | Physical Random Access Channel metrics
Metrics Exporter Port#
Aerial metrics are exported on port 8081. This is configurable in the cuphycontroller YAML file via 'aerial_metrics_backend_address'.
L2/L1 Interface Metrics#
aerial_cuphycp_slots_total#
Counts the total number of processed slots.
Metric type: counter
Metric tags:
type: “UL” or “DL”
cell: “cell number”
aerial_cuphycp_fapi_rx_packets#
Counts the total number of messages L1 receives from L2.
Metric type: counter
Metric tags:
msg_type: “type of PDU”
cell: “cell number”
aerial_cuphycp_fapi_tx_packets#
Counts the total number of messages L1 transmits to L2.
Metric type: counter
Metric tags:
msg_type: “type of PDU”
cell: “cell number”
Fronthaul Interface Metrics#
aerial_cuphycp_cplane_tx_packets_total#
Counts the total number of C-plane packets transmitted by L1 over ORAN Fronthaul interface.
Metric type: counter
Metric tags:
cell: “cell number”
aerial_cuphycp_cplane_tx_bytes_total#
Counts the total number of C-plane bytes transmitted by L1 over ORAN Fronthaul interface.
Metric type: counter
Metric tags:
cell: “cell number”
aerial_cuphycp_uplane_rx_packets_total#
Counts the total number of U-plane packets received by L1 over ORAN Fronthaul interface.
Metric type: counter
Metric tags:
cell: “cell number”
aerial_cuphycp_uplane_rx_bytes_total#
Counts the total number of U-plane bytes received by L1 over ORAN Fronthaul interface.
Metric type: counter
Metric tags:
cell: “cell number”
aerial_cuphycp_uplane_tx_packets_total#
Counts the total number of U-plane packets transmitted by L1 over ORAN Fronthaul interface.
Metric type: counter
Metric tags:
cell: “cell number”
aerial_cuphycp_uplane_tx_bytes_total#
Counts the total number of U-plane bytes transmitted by L1 over ORAN Fronthaul interface.
Metric type: counter
Metric tags:
cell: “cell number”
aerial_cuphycp_uplane_lost_prbs_total#
Counts the total number of PRBs expected but not received by L1 over ORAN Fronthaul interface.
Metric type: counter
Metric tags:
cell: “cell number”
channel: One of “prach” or “pusch”
NIC Metrics#
aerial_cuphycp_net_rx_failed_packets_total#
Counts the total number of erroneous packets received.
Metric type: counter
Metric tags:
nic: “nic port BDF address”
aerial_cuphycp_net_rx_nombuf_packets_total#
Counts the total number of receive packets dropped due to the lack of free mbufs.
Metric type: Counter
Metric tags:
nic: “nic port BDF address”
aerial_cuphycp_net_rx_dropped_packets_total#
Counts the total number of receive packets dropped by the NIC hardware.
Metric type: Counter
Metric tags:
nic: “nic port BDF address”
aerial_cuphycp_net_tx_failed_packets_total#
Counts the total number of instances a packet failed to transmit.
Metric type: Counter
Metric tags:
nic: “nic port BDF address”
aerial_cuphycp_net_tx_accu_sched_missed_interrupt_errors_total#
Counts the total number of instances accurate send scheduling missed an interrupt.
Metric type: Counter
Metric tags:
nic: “nic port BDF address”
aerial_cuphycp_net_tx_accu_sched_rearm_queue_errors_total#
Counts the total number of accurate send scheduling rearm queue errors.
Metric type: Counter
Metric tags:
nic: “nic port BDF address”
aerial_cuphycp_net_tx_accu_sched_clock_queue_errors_total#
Counts the total number of accurate send scheduling clock queue errors.
Metric type: Counter
Metric tags:
nic: “nic port BDF address”
aerial_cuphycp_net_tx_accu_sched_timestamp_past_errors_total#
Counts the total number of accurate send scheduling timestamp in the past errors.
Metric type: Counter
Metric tags:
nic: “nic port BDF address”
aerial_cuphycp_net_tx_accu_sched_timestamp_future_errors_total#
Counts the total number of accurate send scheduling timestamp in the future errors.
Metric type: Counter
Metric tags:
nic: “nic port BDF address”
aerial_cuphycp_net_tx_accu_sched_clock_queue_jitter_ns#
Current measurement of accurate send scheduling clock queue jitter, in units of nanoseconds.
Metric type: Gauge
Metric tags:
nic: “nic port BDF address”
Details:
This gauge shows the TX scheduling timestamp jitter, that is, how far each individual Clock Queue (CQ) completion is from UTC time.
tx_pp_jitter is the time difference between two consecutive CQ completions.
aerial_cuphycp_net_tx_accu_sched_clock_queue_wander_ns#
Current measurement of the divergence of Clock Queue (CQ) completions from UTC time over a longer time period (~8s).
Metric type: Gauge
Metric tags:
nic: “nic port BDF address”
Application Performance Metrics#
aerial_cuphycp_slot_processing_duration_us#
Counts the total number of slots with GPU processing duration in each 250us-wide histogram bin.
Metric type: Histogram
Metric tags:
cell: “cell number”
channel: one of “pbch”, “pdcch”, “pdsch”, “prach”, or “pusch”
le: histogram less-than-or-equal-to 250us-wide histogram bins, for 250, 500, …, 2000, +inf bins.
aerial_cuphycp_slot_pusch_processing_duration_us#
Counts the total number of PUSCH slots with GPU processing duration in each 250us-wide histogram bin.
Metric type: Histogram
Metric tags:
cell: “cell number”
le: histogram less-than-or-equal-to 250us-wide histogram bins, range 0 to 2000us.
aerial_cuphycp_pusch_rx_tb_bytes_total#
Counts the total number of transport block bytes received in the PUSCH channel.
Metric type: Counter
Metric tags:
cell: “cell number”
aerial_cuphycp_pusch_rx_tb_total#
Counts the total number of transport blocks received in the PUSCH channel.
Metric type: Counter
Metric tags:
cell: “cell number”
aerial_cuphycp_pusch_rx_tb_crc_error_total#
Counts the total number of transport blocks received with CRC errors in the PUSCH channel.
Metric type: Counter
Metric tags:
cell: “cell number”
aerial_cuphycp_pusch_nrofuesperslot#
Counts the total number of UEs processed in each slot, per histogram bin, for the PUSCH channel.
Metric type: Histogram
Metric tags:
cell: “cell number”
le: Histogram bin less-than-or-equal-to for 2, 4, …, 24, +inf bins.
PRACH Metrics#
aerial_cuphy_prach_rx_preambles_total#
Counts the total number of detected preambles in PRACH channel.
Metric type: Counter
Metric tags:
cell: “cell number”
PDSCH Metrics#
aerial_cuphycp_slot_pdsch_processing_duration_us#
Counts the total number of PDSCH slots with GPU processing duration in each 250us-wide histogram bin.
Metric type: Histogram
Metric tags:
cell: “cell number”
le: histogram less-than-or-equal-to 250us-wide histogram bins, range 0 to 2000us.
aerial_cuphy_pdsch_tx_tb_bytes_total#
Counts the total number of transport block bytes transmitted in the PDSCH channel.
Metric type: Counter
Metric tags:
cell: “cell number”
aerial_cuphy_pdsch_tx_tb_total#
Counts the total number of transport blocks transmitted in the PDSCH channel.
Metric type: Counter
Metric tags:
cell: “cell number”
aerial_cuphycp_pdsch_nrofuesperslot#
Counts the total number of UEs processed in each slot, per histogram bin, for the PDSCH channel.
Metric type: Histogram
Metric tags:
cell: “cell number”
le: Histogram bin less-than-or-equal-to for 2, 4, …, 24, +inf bins.