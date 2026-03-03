Field Description Data Type and Constraints Recommended/Typical Value

big_anchor_shape, mid_anchor_shape, and small_anchor_shape
Description: Each of these settings is a 1-D array written inside quotation marks. The elements of the array are tuples that give the predefined anchor shapes in the order “width, height”. The default YOLOv4 configuration has nine predefined anchor shapes, divided into three groups that correspond to big, medium, and small objects. The detection outputs for the different groups come from different depths in the network. Run the kmeans command (tao model yolo_v4 kmeans) to determine the best anchor shapes for your dataset, and put those anchor shapes in the spec file. The number of anchor shapes in a field is not limited to three; each of the three fields only needs at least one anchor shape.
Data type and constraints: string
Recommended/typical value: Use the tao model yolo_v4 kmeans command to determine these shapes.
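The quoted-array format and the big/medium/small split can be illustrated with a short Python sketch. It assumes you already have anchor (width, height) pairs, for example from the kmeans command, and that they are split evenly into three groups by area; the helpers group_anchors and format_anchor_field are hypothetical and not part of TAO, and the anchor values in the usage example are illustrative only.

```python
from typing import Dict, List, Tuple

Anchor = Tuple[float, float]  # (width, height)


def group_anchors(anchors: List[Anchor]) -> Dict[str, List[Anchor]]:
    """Sort anchors by area and split them evenly into small/mid/big groups.

    Assumption (hypothetical helper): the anchor count is a non-zero multiple
    of three. The spec itself only requires at least one anchor per field.
    """
    if not anchors or len(anchors) % 3 != 0:
        raise ValueError("expected a non-zero multiple of three anchors")
    ordered = sorted(anchors, key=lambda wh: wh[0] * wh[1])
    n = len(ordered) // 3
    return {
        "small_anchor_shape": ordered[:n],
        "mid_anchor_shape": ordered[n:2 * n],
        "big_anchor_shape": ordered[2 * n:],
    }


def format_anchor_field(name: str, shapes: List[Anchor]) -> str:
    """Render one field as a quoted 1-D array of (width, height) tuples."""
    body = ", ".join(f"({w:.2f}, {h:.2f})" for w, h in shapes)
    return f'{name}: "[{body}]"'


if __name__ == "__main__":
    # Illustrative (width, height) values, not tuned for any dataset.
    anchors = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
               (59, 119), (116, 90), (156, 198), (373, 326)]
    for field, shapes in group_anchors(anchors).items():
        print(format_anchor_field(field, shapes))
```

Sorting by area is one simple way to decide which anchors count as "small" or "big"; if your kmeans output already arrives grouped, only the formatting helper is needed.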

box_matching_iou
Description: A float between 0 and 1. Any anchor whose IoU with some ground truth box is at least this value is matched to the ground truth box with which it has the largest IoU. Unlike YOLOv3, YOLOv4 can match one ground truth box to multiple anchors.
Data type and constraints: float
Recommended/typical value: 0.5
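A minimal Python sketch of this matching rule, assuming boxes in (x1, y1, x2, y2) corner format. The function names iou and match_anchors are hypothetical, and this is an illustration of the rule as described, not TAO's actual target encoder. It shows a single ground truth box being matched to more than one anchor.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def match_anchors(anchors: List[Box], gt_boxes: List[Box],
                  box_matching_iou: float = 0.5) -> List[Optional[int]]:
    """For each anchor, return the index of its matched ground truth box, or None.

    An anchor is matched to the ground truth box with the largest IoU,
    provided that IoU is at least box_matching_iou. Several anchors can
    therefore be matched to the same ground truth box.
    """
    matches: List[Optional[int]] = []
    for anchor in anchors:
        ious = [iou(anchor, gt) for gt in gt_boxes]
        if not ious:
            matches.append(None)
            continue
        best = max(range(len(ious)), key=ious.__getitem__)
        matches.append(best if ious[best] >= box_matching_iou else None)
    return matches


if __name__ == "__main__":
    gt = [(0, 0, 10, 10)]
    anchors = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
    # The first two anchors overlap the single ground truth box by IoU 1.0 and 0.81.
    print(match_anchors(anchors, gt))
```

With the threshold at 0.5, both overlapping anchors point at ground truth box 0, which is the behavior that distinguishes YOLOv4 matching from the one-best-anchor rule of YOLOv3.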

matching_neutral_box_iou
Description: A float between 0.25 and 1. Any inferred bounding box whose IoU with some ground truth box is at least this value is not treated as a negative box: it becomes a neutral box, and its negative objectiveness loss is set to 0.
Data type and constraints: float
Recommended/typical value: 0.5
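The neutral-box rule can be sketched in Python as a mask over predicted boxes. This assumes (x1, y1, x2, y2) boxes and a binary cross-entropy form, -log(1 - p), for the negative objectiveness term; both are assumptions, the function names are hypothetical, and the sketch ignores the separate handling of positive (matched) boxes.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def neutral_mask(pred_boxes: List[Box], gt_boxes: List[Box],
                 matching_neutral_box_iou: float = 0.5) -> List[int]:
    """1 keeps a box's negative objectiveness loss; 0 marks it as neutral."""
    mask = []
    for box in pred_boxes:
        best = max((iou(box, gt) for gt in gt_boxes), default=0.0)
        mask.append(0 if best >= matching_neutral_box_iou else 1)
    return mask


def negative_objectiveness_loss(pred_boxes: List[Box], pred_obj: List[float],
                                gt_boxes: List[Box],
                                matching_neutral_box_iou: float = 0.5) -> float:
    """Sum of -log(1 - p) over non-neutral boxes (assumed BCE form)."""
    mask = neutral_mask(pred_boxes, gt_boxes, matching_neutral_box_iou)
    eps = 1e-7  # avoid log(0) when p is exactly 1
    return sum(m * -math.log(max(1.0 - p, eps))
               for m, p in zip(mask, pred_obj))
```

For example, a confident prediction that overlaps a ground truth box by IoU 0.81 contributes nothing to this term at the default threshold, so the network is not penalized for a near-miss detection.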

loss_loc_weight, loss_neg_obj_weights, and loss_class_weights
Description: Float weights for the loss terms. The YOLOv4 loss is the weighted sum of the localization loss, negative objectiveness loss, positive objectiveness loss, and classification loss. The weight of the positive objectiveness loss is fixed at 1; the weights of the other three losses are read from the config file.
Data type and constraints: float
Recommended/typical value: loss_loc_weight: 5.0, loss_neg_obj_weights: 50.0, loss_class_weights: 1.0
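The weighting can be written out as a small Python function. This is a sketch that assumes each component loss has already been reduced to a scalar; yolo_v4_total_loss is a hypothetical name, and its defaults simply mirror the typical values listed for these fields.

```python
def yolo_v4_total_loss(loc_loss: float, neg_obj_loss: float,
                       pos_obj_loss: float, class_loss: float,
                       loss_loc_weight: float = 5.0,
                       loss_neg_obj_weights: float = 50.0,
                       loss_class_weights: float = 1.0) -> float:
    """Weighted sum of the four YOLOv4 loss terms.

    The positive objectiveness term always has weight 1; the other
    weights correspond to the spec-file fields of the same names.
    """
    return (loss_loc_weight * loc_loss
            + loss_neg_obj_weights * neg_obj_loss
            + 1.0 * pos_obj_loss
            + loss_class_weights * class_loss)


if __name__ == "__main__":
    # 5.0 * 0.2 + 50.0 * 0.01 + 0.3 + 1.0 * 0.4
    print(yolo_v4_total_loss(0.2, 0.01, 0.3, 0.4))
```

Because the weights multiply already-reduced terms, raising one weight increases that term's share of the gradient without changing how the term itself is computed.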

label_smoothing
Description: The label-smoothing factor applied to the classification loss.
Data type and constraints: float in [0, 0.3]
Recommended/typical value: 0, 0.1, or 0.2
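The spec does not say exactly how the smoothing factor is applied. The sketch below uses the common categorical formulation, y * (1 - eps) + eps / num_classes, as an assumption; TAO may use a different variant (for example one suited to sigmoid outputs), and smooth_labels is a hypothetical helper.

```python
from typing import List


def smooth_labels(one_hot: List[float], label_smoothing: float) -> List[float]:
    """Common label smoothing: y * (1 - eps) + eps / num_classes.

    The range check mirrors the documented constraint [0, 0.3].
    """
    if not 0.0 <= label_smoothing <= 0.3:
        raise ValueError("label_smoothing is expected in [0, 0.3]")
    k = len(one_hot)
    return [y * (1.0 - label_smoothing) + label_smoothing / k for y in one_hot]


if __name__ == "__main__":
    # Four classes, true class at index 1, label_smoothing = 0.1.
    print(smooth_labels([0.0, 1.0, 0.0, 0.0], 0.1))
```

Under this formulation the targets still sum to 1, while the true class target drops below 1 and every other class gets a small positive target.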

big_grid_xy_extend, mid_grid_xy_extend, and small_grid_xy_extend
Description: Small positive floats. The calculated box center relative to the anchor box is re-calibrated as follows:
center_xy = calculated_xy * (grid_xy_extend + 1.0) - grid_xy_extend / 2.0
As with the anchor shapes, the nine default anchors are divided into three groups corresponding to big, medium, and small objects, whose detection outputs come from different depths in the network. The three grid_xy_extend settings let you set a different value for each anchor-shape group. The grid_xy_extend settings make it easier for the network to propose an inferred box whose center is close to or on the anchor border.
Data type and constraints: float in [0, 0.3]
Recommended/typical value: 0.05, 0.1, 0.2
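A short Python sketch of the re-calibration formula. It assumes calculated_xy is a sigmoid output in (0, 1), as in typical YOLO heads; recalibrate_center and border_logit are hypothetical names. The sketch shows why a positive extension helps: the center range grows to (-grid_xy_extend / 2, 1 + grid_xy_extend / 2), so a border position is reachable with a finite raw output instead of only asymptotically.

```python
import math


def recalibrate_center(calculated_xy: float, grid_xy_extend: float) -> float:
    """center_xy = calculated_xy * (grid_xy_extend + 1.0) - grid_xy_extend / 2.0"""
    return calculated_xy * (grid_xy_extend + 1.0) - grid_xy_extend / 2.0


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def border_logit(grid_xy_extend: float) -> float:
    """Raw output t whose sigmoid yields center_xy == 0.0 (the near border)."""
    c = (grid_xy_extend / 2.0) / (grid_xy_extend + 1.0)
    if c <= 0.0:
        # Without extension, sigmoid(t) only approaches 0 as t -> -infinity.
        return -math.inf
    return math.log(c / (1.0 - c))


if __name__ == "__main__":
    for ext in (0.0, 0.05, 0.1, 0.2):
        lo, hi = recalibrate_center(0.0, ext), recalibrate_center(1.0, ext)
        print(f"extend={ext}: center range ({lo:.3f}, {hi:.3f}), "
              f"raw output for border: {border_logit(ext):.3f}")
```

A larger extension lets the border be reached with a less extreme raw output, which is one way to read the different per-group values.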

arch
Description: The backbone for feature extraction. Currently, “resnet”, “vgg”, “darknet”, “googlenet”, “mobilenet_v1”, “mobilenet_v2”, “cspdarknet”, and “squeezenet” are supported.
Data type and constraints: string
Recommended/typical value: resnet

activation
Description: The activation type used in the YOLOv4 CSPDarkNet backbone. Only “relu”, “leaky_relu”, and “mish” are supported. This parameter has no effect on other backbones.
Data type and constraints: string
Recommended/typical value: “leaky_relu”

nlayers
Description: The number of convolutional layers in the chosen architecture. For “resnet”, 10, 18, 34, 50, and 101 are supported; for “vgg”, 16 and 19; for “darknet” and “cspdarknet”, 19 and 53. Other architectures do not use this setting, so remove it from the config file when using them.
Data type and constraints: unsigned int
Recommended/typical value: –
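The arch, activation, and nlayers constraints can be collected into a small pre-flight check. This Python sketch is hypothetical (TAO performs its own validation); it only encodes the rules stated for these three fields, treats a missing nlayers as an error for architectures that use it, and checks activation only for cspdarknet because it has no effect elsewhere.

```python
from typing import Optional

SUPPORTED_ARCHS = {"resnet", "vgg", "darknet", "googlenet", "mobilenet_v1",
                   "mobilenet_v2", "cspdarknet", "squeezenet"}
# Architectures that take nlayers, with their supported depths.
NLAYERS = {"resnet": {10, 18, 34, 50, 101}, "vgg": {16, 19},
           "darknet": {19, 53}, "cspdarknet": {19, 53}}
CSPDARKNET_ACTIVATIONS = {"relu", "leaky_relu", "mish"}


def check_backbone(arch: str, nlayers: Optional[int] = None,
                   activation: Optional[str] = None) -> None:
    """Raise ValueError if arch/nlayers/activation break the documented rules."""
    if arch not in SUPPORTED_ARCHS:
        raise ValueError(f"unsupported arch: {arch!r}")
    if arch in NLAYERS:
        if nlayers not in NLAYERS[arch]:
            raise ValueError(
                f"nlayers for {arch!r} must be one of {sorted(NLAYERS[arch])}")
    elif nlayers is not None:
        raise ValueError(f"{arch!r} takes no nlayers; remove it from the spec")
    # activation only matters for the CSPDarkNet backbone.
    if arch == "cspdarknet" and activation is not None \
            and activation not in CSPDARKNET_ACTIVATIONS:
        raise ValueError(f"unsupported activation: {activation!r}")


if __name__ == "__main__":
    check_backbone("cspdarknet", 53, "mish")  # passes silently
    check_backbone("googlenet")               # passes: no nlayers needed
```

Catching these mismatches before launching a training job is cheaper than discovering them from a failed run.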

freeze_bn
Description: A flag specifying whether to freeze all batch normalization layers during training.
Data type and constraints: Boolean
Recommended/typical value: False

freeze_blocks
Description: The list of block IDs to freeze during training. Freezing some of the CNN blocks in the model can make training more stable and/or easier to converge. How a block is defined is a heuristic specific to each architecture (for example, by stride or by logical blocks in the model). However, block IDs number the blocks sequentially, so you do not need to know the exact location of each block: the smaller the block ID, the closer the block is to the model input; the larger the block ID, the closer it is to the model output. Note that for FasterRCNN, you can only freeze the blocks before the ROI pooling layer; layers after the ROI pooling layer are never frozen. The number of blocks and the valid block IDs differ by backbone:

ResNet series: any subset of [0, 1, 2, 3]

VGG series: any subset of [1, 2, 3, 4, 5]

GoogLeNet: any subset of [0, 1, 2, 3, 4, 5, 6, 7]

MobileNet V1: any subset of [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

MobileNet V2: any subset of [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]

DarkNet 19 and DarkNet 53: any subset of [0, 1, 2, 3, 4, 5]

Data type and constraints: list (repeated integers)
Recommended/typical value: –
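The per-backbone block ID ranges can be encoded as a lookup table and checked before training. This Python sketch is hypothetical: the arch keys follow the names accepted by the arch field, "darknet" covers both DarkNet 19 and DarkNet 53, and backbones without listed block IDs (such as cspdarknet and squeezenet) are rejected only because their IDs are not given here.

```python
from typing import Iterable, List

# Valid freeze_blocks IDs per backbone, as listed for this field.
VALID_FREEZE_BLOCKS = {
    "resnet": set(range(0, 4)),         # [0, 3]
    "vgg": set(range(1, 6)),            # [1, 5]
    "googlenet": set(range(0, 8)),      # [0, 7]
    "mobilenet_v1": set(range(0, 12)),  # [0, 11]
    "mobilenet_v2": set(range(0, 14)),  # [0, 13]
    "darknet": set(range(0, 6)),        # DarkNet 19 and 53: [0, 5]
}


def check_freeze_blocks(arch: str, freeze_blocks: Iterable[int]) -> List[int]:
    """Return the sorted, de-duplicated block IDs, or raise ValueError."""
    if arch not in VALID_FREEZE_BLOCKS:
        raise ValueError(f"no documented freeze_blocks IDs for {arch!r}")
    blocks = sorted(set(freeze_blocks))
    invalid = [b for b in blocks if b not in VALID_FREEZE_BLOCKS[arch]]
    if invalid:
        raise ValueError(f"invalid block IDs for {arch!r}: {invalid}")
    return blocks


if __name__ == "__main__":
    # Freeze the two blocks closest to the input of a ResNet backbone.
    print(check_freeze_blocks("resnet", [1, 0]))
```

Note that VGG is the one backbone here whose IDs start at 1, so freezing block 0 is an error only for VGG.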