date 2025-07-18

Field Description Data Type and Constraints Recommended/Typical Value

aspect_ratios_global Anchor boxes of aspect ratios defined in aspect_ratios_global will be generated for each feature layer used for prediction. Note: Only one of aspect_ratios_global or aspect_ratios is required. string “[1.0, 2.0, 0.5]”

aspect_ratios The length of the outer list must equal the number of feature layers used for anchor box generation. The i-th layer will have anchor boxes with the aspect ratios defined in aspect_ratios[i]. Note: Only one of aspect_ratios_global or aspect_ratios is required. string “[[1.0,2.0,0.5], [1.0,2.0,0.5], [1.0,2.0,0.5], [1.0,2.0,0.5], [1.0,2.0,0.5], [1.0, 2.0, 0.5, 3.0, 0.33]]”
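The relationship between the two aspect-ratio fields can be sketched in a few lines of Python. This is an illustration only: the helper name `resolve_aspect_ratios` is hypothetical, parsing the quoted string as JSON is an assumption, and treating "both fields set" as an error is a stricter reading than the table states.

```python
import json


def resolve_aspect_ratios(n_layers, aspect_ratios_global=None, aspect_ratios=None):
    """Return one aspect-ratio list per feature layer (hypothetical helper)."""
    # Assumption: exactly one of the two fields should be set.
    if (aspect_ratios_global is None) == (aspect_ratios is None):
        raise ValueError("set exactly one of aspect_ratios_global or aspect_ratios")

    if aspect_ratios_global is not None:
        # The same ratios are reused for every feature layer.
        ratios = json.loads(aspect_ratios_global)
        return [list(ratios) for _ in range(n_layers)]

    per_layer = json.loads(aspect_ratios)
    # The outer list must have one entry per feature layer.
    if len(per_layer) != n_layers:
        raise ValueError(
            f"aspect_ratios has {len(per_layer)} entries, expected {n_layers}"
        )
    return per_layer
```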

two_boxes_for_ar1 This setting is only relevant for layers that have 1.0 as an aspect ratio. If two_boxes_for_ar1 is true, two boxes will be generated with an aspect ratio of 1: one whose scale is the scale for this layer, and another whose scale is the geometric mean of the scale for this layer and the scale for the next layer. Boolean True

clip_boxes If true, all corner anchor boxes will be truncated so they are fully inside the feature images. Boolean False

scales scales is a list of positive floats containing scaling factors per convolutional predictor layer. This list must be one element longer than the number of predictor layers, so if two_boxes_for_ar1 is true, the second aspect ratio 1.0 box for the last layer can have a proper scale. Except for the last element in this list, each positive float is the scaling factor for boxes in that layer. For example, if for one layer the scale is 0.1, then the generated anchor box with aspect ratio 1 for that layer (the first aspect ratio 1 box if two_boxes_for_ar1 is true) will have its height and width as 0.1*min(img_h, img_w). min_scale and max_scale are two positive floats. If both of them appear in the config, the program can automatically generate the scales by evenly splitting the space between min_scale and max_scale. string “[0.05, 0.1, 0.25, 0.4, 0.55, 0.7, 0.85]”
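To make the interaction between scales, aspect ratios, and two_boxes_for_ar1 concrete, here is a minimal Python sketch of the box sizes for one predictor layer. The function name `anchor_sizes` is hypothetical. The aspect-ratio-1 sizes follow the description above; the formula for other aspect ratios (height divided by, and width multiplied by, the square root of the ratio) is the common SSD convention and is an assumption here, not something this table confirms.

```python
import math


def anchor_sizes(img_h, img_w, scale, next_scale, aspect_ratios, two_boxes_for_ar1=True):
    """Return (height, width) pairs in pixels for one predictor layer."""
    size = min(img_h, img_w)
    boxes = []
    for ar in aspect_ratios:
        if ar == 1.0:
            # First aspect-ratio-1 box: side length is scale * min(img_h, img_w).
            boxes.append((scale * size, scale * size))
            if two_boxes_for_ar1:
                # Second box: geometric mean of this layer's and the next layer's scale.
                extra = math.sqrt(scale * next_scale)
                boxes.append((extra * size, extra * size))
        else:
            # Assumption: common SSD convention, with ar read as width / height.
            boxes.append((scale * size / math.sqrt(ar), scale * size * math.sqrt(ar)))
    return boxes


# Example: one layer of a 300x400 image with scale 0.1 and next-layer scale 0.4.
for height, width in anchor_sizes(300, 400, 0.1, 0.4, [1.0, 2.0, 0.5]):
    print(f"{height:.1f} x {width:.1f}")
```

This also shows why the scales list needs one extra element: the last layer still needs a "next" scale for its second aspect-ratio-1 box.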

min_scale/max_scale If both appear in the config, scales will be generated by evenly splitting the space between min_scale and max_scale. float –
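One plausible reading of "evenly splitting" is a linear interpolation that includes both endpoints and yields one more value than the number of predictor layers. The sketch below assumes that reading; the function name `linear_scales` is hypothetical and the exact generation rule used by the tool is not stated in this table.

```python
def linear_scales(min_scale, max_scale, n_layers):
    """Return n_layers + 1 evenly spaced scales from min_scale to max_scale."""
    if n_layers < 1:
        raise ValueError("n_layers must be at least 1")
    step = (max_scale - min_scale) / n_layers
    # One extra element so the last layer has a "next" scale.
    return [min_scale + i * step for i in range(n_layers + 1)]


# Example values are illustrative, not recommended settings.
print(linear_scales(0.1, 0.9, 5))
```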

loss_loc_weight This is a positive float controlling how much location regression loss should contribute to the final loss. The final loss is calculated as classification_loss + loss_loc_weight * loc_loss float 1.0

focal_loss_alpha Alpha in the focal loss equation. float 0.25

focal_loss_gamma Gamma in the focal loss equation. float 2.0
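For reference, the sketch below shows where alpha, gamma, and loss_loc_weight enter the loss. It uses the standard binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), and assumes the tool follows that form. Both function names are hypothetical, and the per-box reduction and normalization are not described in this table.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for a predicted probability p in (0, 1) and a label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma down-weights examples that are already well classified.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def total_loss(classification_loss, loc_loss, loss_loc_weight=1.0):
    """Final loss as described above: classification_loss + loss_loc_weight * loc_loss."""
    return classification_loss + loss_loc_weight * loc_loss


# A confident correct prediction contributes far less than an uncertain one.
print(focal_loss(0.95, 1), focal_loss(0.55, 1))
```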

variances Variances should be a list of 4 positive floats. The four floats, in order, represent variances for box center x, box center y, log box height, log box width. The box offset for box center (cx, cy) and log box size (height/width) w.r.t. anchor will be divided by their respective variance value. Therefore, larger variances result in less significant differences between two different boxes on encoded offsets. –
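The effect of the variances can be sketched with the usual SSD-style box encoding. This is a sketch under assumptions: normalizing the center offsets by the anchor width and height is the common convention rather than something stated here, the function name `encode_box` is hypothetical, and the variance values in the example are purely illustrative.

```python
import math


def encode_box(gt, anchor, variances):
    """Encode a ground-truth box against an anchor; both are (cx, cy, h, w)."""
    cx, cy, h, w = gt
    acx, acy, ah, aw = anchor
    # Order from the description: center x, center y, log height, log width.
    v_cx, v_cy, v_h, v_w = variances
    return (
        (cx - acx) / aw / v_cx,   # assumption: center offsets normalized by anchor size
        (cy - acy) / ah / v_cy,
        math.log(h / ah) / v_h,
        math.log(w / aw) / v_w,
    )


gt_box = (52.0, 48.0, 40.0, 30.0)
anchor_box = (50.0, 50.0, 32.0, 32.0)
# Larger variances shrink the encoded offsets, as the description notes.
print(encode_box(gt_box, anchor_box, (0.1, 0.1, 0.2, 0.2)))
print(encode_box(gt_box, anchor_box, (1.0, 1.0, 1.0, 1.0)))
```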

steps An optional list inside quotation marks whose length is the number of feature layers for prediction. The elements should be floats or tuples/lists of two floats. Steps define how many pixels apart the anchor box center points should be. If an element is a float, the vertical and horizontal steps are the same. Otherwise, the first value is step_vertical and the second value is step_horizontal. If steps are not provided, anchor boxes will be distributed uniformly inside the image. string –

offsets An optional list of floats inside quotation marks whose length is the number of feature layers for prediction. The first anchor box will have offsets[i]*steps[i] pixels margin from the left and top borders. If offsets are not provided, 0.5 will be used as default value. string –
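How steps and offsets place the anchor centers can be sketched as follows. The function name `anchor_centers` is hypothetical; the formula follows the description above (the first center sits offset * step pixels from the top and left borders, and subsequent centers are one step apart), and the feature-map size is an illustrative input.

```python
def anchor_centers(n_rows, n_cols, step, offset=0.5):
    """Return (cy, cx) pixel centers for a feature map of n_rows x n_cols cells."""
    if isinstance(step, (int, float)):
        # A single number applies to both directions.
        step_vertical = step_horizontal = float(step)
    else:
        # Otherwise the pair is (step_vertical, step_horizontal).
        step_vertical, step_horizontal = step
    return [
        ((offset + r) * step_vertical, (offset + c) * step_horizontal)
        for r in range(n_rows)
        for c in range(n_cols)
    ]


# A 2x2 feature map with an 8-pixel step and the default 0.5 offset.
print(anchor_centers(2, 2, 8))
```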

arch Backbone for feature extraction. Currently, “resnet”, “vgg”, “darknet”, “googlenet”, “mobilenet_v1”, “mobilenet_v2”, “squeezenet”, and “efficientnet_b0” are supported. string resnet

nlayers Number of convolutional layers in the chosen arch. For “resnet”, 10, 18, 34, 50 and 101 are supported. For “vgg”, 16 and 19 are supported. For “darknet”, 19 and 53 are supported. All other networks do not have this configuration, and the field should be removed from the config file. Unsigned int –
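The arch and nlayers combinations listed above can be checked before training with a small lookup. This is a hypothetical pre-flight check, not part of the tool; the names `SUPPORTED_NLAYERS`, `ARCHS_WITHOUT_NLAYERS`, and `check_backbone` are invented for illustration.

```python
# Depths listed in the table for backbones that take nlayers.
SUPPORTED_NLAYERS = {
    "resnet": {10, 18, 34, 50, 101},
    "vgg": {16, 19},
    "darknet": {19, 53},
}
# Backbones for which nlayers should be left out of the config.
ARCHS_WITHOUT_NLAYERS = {
    "googlenet", "mobilenet_v1", "mobilenet_v2", "squeezenet", "efficientnet_b0",
}


def check_backbone(arch, nlayers=None):
    """Raise ValueError if the arch/nlayers pair is not one listed in the table."""
    if arch in SUPPORTED_NLAYERS:
        if nlayers not in SUPPORTED_NLAYERS[arch]:
            allowed = sorted(SUPPORTED_NLAYERS[arch])
            raise ValueError(f"nlayers for {arch} must be one of {allowed}")
    elif arch in ARCHS_WITHOUT_NLAYERS:
        if nlayers is not None:
            raise ValueError(f"{arch} does not take nlayers; remove it from the config")
    else:
        raise ValueError(f"unsupported arch: {arch}")
    return True
```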

freeze_bn Whether to freeze all batch normalization layers during training. boolean False

freeze_blocks The list of block IDs to be frozen in the model during training. You can choose to freeze some of the CNN blocks in the model to make training more stable and/or easier to converge. The definition of a block is heuristic for a specific architecture (for example, by stride or by logical blocks in the model). However, the block ID numbers identify the blocks in the model in sequential order, so you don’t have to know the exact locations of the blocks when you train. A general principle to keep in mind: the smaller the block ID, the closer it is to the model input; the larger the block ID, the closer it is to the model output. list(repeated integers) ResNet series. For the ResNet series, the block IDs valid for freezing are any subset of [0, 1, 2, 3] (inclusive)

VGG series. For the VGG series, the block IDs valid for freezing are any subset of [1, 2, 3, 4, 5] (inclusive)

GoogLeNet. For GoogLeNet, the block IDs valid for freezing are any subset of [0, 1, 2, 3, 4, 5, 6, 7] (inclusive)

MobileNet V1. For MobileNet V1, the block IDs valid for freezing are any subset of [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] (inclusive)

MobileNet V2. For MobileNet V2, the block IDs valid for freezing are any subset of [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13] (inclusive)

DarkNet. For DarkNet 19 and DarkNet 53, the block IDs valid for freezing are any subset of [0, 1, 2, 3, 4, 5] (inclusive) –
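The valid block ID ranges above lend themselves to a simple lookup. The sketch below is a hypothetical validation helper (the names `VALID_FREEZE_BLOCKS` and `check_freeze_blocks` are not from the tool); it covers only the backbones for which this table lists block IDs.

```python
# Valid block IDs per backbone, copied from the table (ranges are end-exclusive).
VALID_FREEZE_BLOCKS = {
    "resnet": range(0, 4),         # 0..3
    "vgg": range(1, 6),            # 1..5
    "googlenet": range(0, 8),      # 0..7
    "mobilenet_v1": range(0, 12),  # 0..11
    "mobilenet_v2": range(0, 14),  # 0..13
    "darknet": range(0, 6),        # 0..5
}


def check_freeze_blocks(arch, freeze_blocks):
    """Return the sorted, de-duplicated block IDs, or raise ValueError if any is invalid."""
    valid = VALID_FREEZE_BLOCKS.get(arch)
    if valid is None:
        raise ValueError(f"no block ID list documented for arch {arch!r}")
    invalid = sorted(set(freeze_blocks) - set(valid))
    if invalid:
        raise ValueError(f"invalid block IDs for {arch}: {invalid}")
    return sorted(set(freeze_blocks))


# Freezing the two blocks closest to the input of a ResNet backbone.
print(check_freeze_blocks("resnet", [0, 1]))
```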

n_kernels This setting controls the number of convolutional layers in the RetinaNet subnets for classification and anchor box regression. A larger value generates a larger network and usually means the network is harder to train. Unsigned int 2

feature_size This setting controls the number of channels of the convolutional layers in the RetinaNet subnets for classification and anchor box regression. A larger value gives a larger network and usually means the network is harder to train. Note that RetinaNet FPN generates 5 feature maps, thus the scales field requires a list of 6 scaling factors. The last number is not used if two_boxes_for_ar1 is set to False. There are also three underlying scaling factors at each feature map level (2^0, 2^⅓, 2^⅔ ). Unsigned int 256
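The three per-level factors can be illustrated numerically. The sketch below assumes they multiply the level's base scale, which is the usual RetinaNet convention but is not spelled out in this table; `level_scales` is a hypothetical name.

```python
def level_scales(base_scale, n_sub=3):
    """Return base_scale multiplied by 2^0, 2^(1/3), 2^(2/3) for the default n_sub=3."""
    return [base_scale * 2 ** (i / n_sub) for i in range(n_sub)]


# Illustrative list of 6 scaling factors for the 5 FPN feature maps.
scales = [0.05, 0.2, 0.35, 0.5, 0.65, 0.8]
# The last entry is only the "next" scale for the final level, so it is skipped here.
for level, base in enumerate(scales[:-1]):
    print(level, [round(s, 4) for s in level_scales(base)])
```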