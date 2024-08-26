Below is a sample EfficientDet spec file. It has five major components: model_config, training_config, eval_config, augmentation_config, and dataset_config. The spec file is a protobuf text (.prototxt) message, and each of its fields can be either a basic data type or a nested message.

training_config {
  train_batch_size: 16
  iterations_per_loop: 10
  checkpoint_period: 10
  num_examples_per_epoch: 14700
  num_epochs: 300
  model_name: 'efficientdet-d0'
  profile_skip_steps: 100
  tf_random_seed: 42
  lr_warmup_epoch: 5
  lr_warmup_init: 0.00005
  learning_rate: 0.1
  amp: True
  moving_average_decay: 0.9999
  l2_weight_decay: 0.00004
  l1_weight_decay: 0.0
  checkpoint: "/path/to/your/pretrained_model"
  # pruned_model_path: "/path/to/your/pruned/model"
}
dataset_config {
  num_classes: 91
  image_size: "512,512"
  training_file_pattern: "/path/to/coco/train-*"
  validation_file_pattern: "/path/to/coco/val-*"
  validation_json_file: "/path/to/coco/annotations/instances_val2017.json"
}
eval_config {
  eval_batch_size: 16
  eval_epoch_cycle: 10
  eval_after_training: True
  eval_samples: 5000
  min_score_thresh: 0.4
  max_detections_per_image: 100
}
model_config {
  model_name: 'efficientdet-d0'
  min_level: 3
  max_level: 7
  num_scales: 3
}
augmentation_config {
  rand_hflip: True
  random_crop_min_scale: 0.1
  random_crop_max_scale: 2.0
}

The top-level structure of the spec file is summarized in the following tables.

The training configuration (training_config) defines the parameters needed for training, evaluation, and inference. Details are summarized in the table below.

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| train_batch_size | The batch size for each GPU. The effective batch size is batch_size_per_gpu * num_gpus. | Unsigned int, positive | 16 |
| num_epochs | The number of epochs to train the network | Unsigned int, positive | 300 |
| num_examples_per_epoch | The total number of images in the training set divided by the number of GPUs | Unsigned int, positive | – |
| checkpoint | The path to the pretrained model, if any | String | – |
| pruned_model_path | The path to the TAO pruned model for re-training, if any | String | – |
| checkpoint_period | The number of training epochs that should run per model checkpoint/validation | Unsigned int, positive | 10 |
| amp | A flag specifying whether to use mixed precision training | Boolean | – |
| moving_average_decay | The moving average decay | Float | 0.9999 |
| l2_weight_decay | The L2 weight decay | Float | – |
| l1_weight_decay | The L1 weight decay | Float | – |
| lr_warmup_epoch | The number of warmup epochs in the learning rate schedule | Unsigned int, positive | – |
| lr_warmup_init | The initial learning rate in the warmup period | Float | – |
| learning_rate | The maximum learning rate | Float | – |
| tf_random_seed | The random seed | Unsigned int, positive | 42 |
| clip_gradients_norm | The norm value to which gradients are clipped | Float | 5 |
| skip_checkpoint_variables | If specified, the weights of layers whose names match the regular expression will not be loaded. This is especially helpful for transfer learning. | String | "-predict*" |
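
The per-GPU arithmetic behind train_batch_size and num_examples_per_epoch can be sketched as follows. This is only an illustration of the two formulas in the table: the helper name `per_gpu_settings`, the use of integer division, and the example dataset size are assumptions, not part of the toolkit.

```python
def per_gpu_settings(total_images: int, num_gpus: int, train_batch_size: int) -> dict:
    """Derive spec values from the dataset size and GPU count (hypothetical helper)."""
    if total_images < 1 or num_gpus < 1 or train_batch_size < 1:
        raise ValueError("all arguments must be positive integers")
    return {
        # Table: total number of training images divided by the number of GPUs.
        # Integer division is an assumption; the field must be an unsigned int.
        "num_examples_per_epoch": total_images // num_gpus,
        # Table: effective batch size = batch size per GPU * number of GPUs.
        "effective_batch_size": train_batch_size * num_gpus,
    }


if __name__ == "__main__":
    # Hypothetical setup: 117600 training images spread over 8 GPUs.
    print(per_gpu_settings(total_images=117600, num_gpus=8, train_batch_size=16))
```

With these example inputs the helper yields the num_examples_per_epoch value used in the sample spec (14700) and an effective batch size of 128.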

The evaluation configuration (eval_config) defines the parameters needed for evaluation, whether it runs during training or standalone. Details are summarized in the table below.

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| eval_epoch_cycle | The number of training epochs that should run per validation | Unsigned int, positive | 10 |
| max_detections_per_image | The maximum number of detections to visualize | Unsigned int, positive | 100 |
| min_score_thresh | The minimum confidence of the predicted box that can be considered a match | Float | 0.5 |
| eval_batch_size | The batch size for each GPU. The effective batch size is batch_size_per_gpu * num_gpus. | Unsigned int, positive | 16 |
| eval_samples | The number of samples for evaluation | Unsigned int | – |
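
To make the interaction between min_score_thresh and max_detections_per_image concrete, here is a minimal sketch of one plausible post-filtering step. It assumes detections are (score, box) pairs, that a score equal to the threshold is kept, and that the highest-scoring boxes are retained first; the function name `filter_detections` is hypothetical and the toolkit's internal behavior may differ.

```python
from typing import List, Tuple

# A detection is (confidence score, (ymin, xmin, ymax, xmax)); this layout is an assumption.
Detection = Tuple[float, Tuple[float, float, float, float]]


def filter_detections(
    detections: List[Detection],
    min_score_thresh: float = 0.5,
    max_detections_per_image: int = 100,
) -> List[Detection]:
    """Keep boxes at or above the score threshold, highest score first, capped in count."""
    kept = [d for d in detections if d[0] >= min_score_thresh]
    kept.sort(key=lambda d: d[0], reverse=True)
    return kept[:max_detections_per_image]


if __name__ == "__main__":
    box = (0.0, 0.0, 10.0, 10.0)
    sample = [(0.9, box), (0.3, box), (0.6, box)]
    print(filter_detections(sample, min_score_thresh=0.4, max_detections_per_image=100))
```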

The dataset configuration (dataset_config) specifies the input data source and format. It is used for training, evaluation, and inference. Details are summarized in the table below.

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| image_size | The image dimension as a tuple within quote marks: "(height, width)". This indicates the dimension of the resized and padded input. | String | "(512, 512)" |
| training_file_pattern | The TFRecord path for training | String | – |
| validation_file_pattern | The TFRecord path for validation | String | – |
| validation_json_file | The annotation file path for validation | String | – |
| num_classes | The number of classes. If there are N categories in the annotation, num_classes should be N+1 (background class). | Unsigned int | – |
| max_instances_per_image | The maximum number of object instances to parse (default: 100) | Unsigned int | 100 |
| skip_crowd_during_training | Specifies whether to skip crowd during training | Boolean | True |
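
The image_size string and the N+1 rule for num_classes can be illustrated with a small sketch. Both helper names are hypothetical, and the lenient parsing (accepting either "512,512" as in the sample spec or "(512, 512)" as in the table) is an assumption made for illustration, not a statement about how the toolkit parses the field.

```python
import re
from typing import Tuple


def parse_image_size(value: str) -> Tuple[int, int]:
    """Return (height, width) from a string such as "512,512" or "(512, 512)"."""
    numbers = re.findall(r"\d+", value)
    if len(numbers) != 2:
        raise ValueError(f"expected two integers in image_size, got {value!r}")
    height, width = (int(n) for n in numbers)
    return height, width


def num_classes_with_background(num_categories: int) -> int:
    """Table rule: N annotated categories plus one background class."""
    if num_categories < 1:
        raise ValueError("num_categories must be positive")
    return num_categories + 1


if __name__ == "__main__":
    print(parse_image_size("512,512"))
    print(num_classes_with_background(3))
```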

The model configuration (model_config) specifies the model structure. Details are summarized in the table below.

| Field | Description | Data Type and Constraints | Recommended/Typical Value |
|---|---|---|---|
| model_name | The EfficientDet model name | String | "efficientdet-d0" |
| min_level | The minimum level of the output feature pyramid | Unsigned int | 3 (only 3 is supported) |
| max_level | The maximum level of the output feature pyramid | Unsigned int | 7 (only 7 is supported) |
| num_scales | The number of anchor octave scales on each pyramid level (e.g. if set to 3, the anchor scales are [2^0, 2^(1/3), 2^(2/3)]) | Unsigned int | 3 |
| max_instances_per_image | The maximum number of object instances to parse (default: 100) | Unsigned int | 100 |
| aspect_ratios | A list of tuples representing the aspect ratios of anchors on each pyramid level | String | "[(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]" |
| anchor_scale | The scale of the base-anchor size to the feature-pyramid stride | Unsigned int | 4 |
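
The anchor parameters above combine roughly as sketched below. The octave-scale formula follows the num_scales description in the table. The rest is an assumption for illustration: a feature stride of 2^level (the usual feature-pyramid convention), each aspect-ratio tuple read as (width factor, height factor), and the helper names `octave_scales` and `anchor_sizes`.

```python
from typing import List, Sequence, Tuple


def octave_scales(num_scales: int) -> List[float]:
    """num_scales = 3 gives [2^0, 2^(1/3), 2^(2/3)], as described in the table."""
    return [2 ** (i / num_scales) for i in range(num_scales)]


def anchor_sizes(
    level: int,
    num_scales: int = 3,
    anchor_scale: float = 4.0,
    aspect_ratios: Sequence[Tuple[float, float]] = ((1.0, 1.0), (1.4, 0.7), (0.7, 1.4)),
) -> List[Tuple[float, float]]:
    """Return (width, height) in pixels for every anchor at one pyramid level."""
    stride = 2 ** level  # assumed stride of the feature map at this level
    sizes = []
    for scale in octave_scales(num_scales):
        base = anchor_scale * stride * scale  # base anchor size relative to the stride
        for aspect_w, aspect_h in aspect_ratios:
            sizes.append((base * aspect_w, base * aspect_h))
    return sizes


if __name__ == "__main__":
    for level in range(3, 8):  # min_level = 3 through max_level = 7
        print(level, anchor_sizes(level)[0])
```

Under these assumptions, each pyramid level carries num_scales * len(aspect_ratios) anchors per location (9 with the typical values), and the smallest square anchor at level 3 is 32 x 32 pixels.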

The augmentation_config parameter defines image augmentation after preprocessing.