server_status.proto
===================

.. cpp:namespace:: nvidia::inferenceserver

.. cpp:var:: message StatDuration

   Statistic collecting a duration metric.

   .. cpp:var:: uint64 count

      Cumulative number of times this metric occurred.

   .. cpp:var:: uint64 total_time_ns

      Total collected duration of this metric in nanoseconds.

.. cpp:var:: message StatusRequestStats

   Statistics collected for Status requests.

   .. cpp:var:: StatDuration success

      Total time required to handle successful Status requests, not
      including HTTP or gRPC endpoint termination time.

.. cpp:var:: message HealthRequestStats

   Statistics collected for Health requests.

   .. cpp:var:: StatDuration success

      Total time required to handle successful Health requests, not
      including HTTP or gRPC endpoint termination time.

.. cpp:var:: message ModelControlRequestStats

   Statistics collected for ModelControl requests.

   .. cpp:var:: StatDuration success

      Total time required to handle successful ModelControl requests,
      not including HTTP or gRPC endpoint termination time.

.. cpp:var:: message SharedMemoryControlRequestStats

   Statistics collected for SharedMemoryControl requests.

   .. cpp:var:: StatDuration success

      Total time required to handle successful SharedMemoryControl
      requests, not including HTTP or gRPC endpoint termination time.

.. cpp:var:: message RepositoryRequestStats

   Statistics collected for Repository requests.

   .. cpp:var:: StatDuration success

      Total time required to handle successful Repository requests, not
      including HTTP or gRPC endpoint termination time.

.. cpp:var:: message InferRequestStats

   Statistics collected for Infer requests.

   .. cpp:var:: StatDuration success

      Total time required to handle successful Infer requests, not
      including HTTP or gRPC endpoint handling time.

   .. cpp:var:: StatDuration failed

      Total time required to handle failed Infer requests, not
      including HTTP or gRPC endpoint handling time.

   .. cpp:var:: StatDuration compute

      Time required to run inferencing for an inference request,
      including time copying input tensors to GPU memory, time
      executing the model, and time copying output tensors from GPU
      memory.

   .. cpp:var:: StatDuration queue

      Time an inference request waits in the scheduling queue for an
      available model instance.
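Every request-statistics message above is built from ``StatDuration``,
so an average latency can always be recovered as
``total_time_ns / count``. As a rough sketch of what this part of the
documented schema looks like in proto form (the field numbers here are
illustrative assumptions, not the ones in the actual
``server_status.proto``):

.. code-block:: protobuf

   // Sketch of the documented duration statistics; field numbers are
   // illustrative assumptions.
   syntax = "proto3";

   message StatDuration {
     uint64 count = 1;          // number of times the metric occurred
     uint64 total_time_ns = 2;  // cumulative duration in nanoseconds
   }

   message InferRequestStats {
     StatDuration success = 1;  // successful Infer requests
     StatDuration failed = 2;   // failed Infer requests
     StatDuration compute = 3;  // input copy, model execution, output copy
     StatDuration queue = 4;    // wait for an available model instance
   }

For example, the average queue latency in milliseconds is
``queue.total_time_ns / queue.count / 1e6`` once ``queue.count`` is
non-zero.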
.. cpp:enum:: ModelReadyState

   Readiness status for models.

   .. cpp:enumerator:: ModelReadyState::MODEL_UNKNOWN = 0

      The model is in an unknown state. The model is not available for
      inferencing.

   .. cpp:enumerator:: ModelReadyState::MODEL_READY = 1

      The model is ready and available for inferencing.

   .. cpp:enumerator:: ModelReadyState::MODEL_UNAVAILABLE = 2

      The model is unavailable, indicating that the model failed to
      load or has been implicitly or explicitly unloaded. The model is
      not available for inferencing.

   .. cpp:enumerator:: ModelReadyState::MODEL_LOADING = 3

      The model is being loaded by the inference server. The model is
      not available for inferencing.

   .. cpp:enumerator:: ModelReadyState::MODEL_UNLOADING = 4

      The model is being unloaded by the inference server. The model is
      not available for inferencing.

.. cpp:var:: message ModelReadyStateReason

   Detail associated with a model's readiness status.

   .. cpp:var:: string message

      The message that explains the cause of the current readiness
      state.

.. cpp:var:: message ModelVersionStatus

   Status for a version of a model.

   .. cpp:var:: ModelReadyState ready_state

      Current readiness state for the model.

   .. cpp:var:: ModelReadyStateReason ready_state_reason

      Supplemental information regarding the current readiness state.

   .. cpp:var:: map<uint32, InferRequestStats> infer_stats

      Inference statistics for the model, as a map from batch size to
      the statistics. A batch size will not occur in the map unless
      there has been at least one inference request of that batch size.

   .. cpp:var:: uint64 model_execution_count

      Cumulative number of model executions performed for the model. A
      single model execution performs inferencing for the entire
      request batch and can perform inferencing for multiple requests
      if dynamic batching is enabled.

   .. cpp:var:: uint64 model_inference_count

      Cumulative number of model inferences performed for the model.
      Each inference in a batched request is counted as an individual
      inference.

   .. cpp:var:: uint64 last_inference_timestamp_milliseconds

      The timestamp of the last inference request made for this model,
      given as milliseconds since the epoch.

.. cpp:var:: message ModelStatus

   Status for a model.

   .. cpp:var:: ModelConfig config

      The configuration for the model.

   .. cpp:var:: map<int64, ModelVersionStatus> version_status

      Duration statistics for each version of the model, as a map from
      version to the status. A version will not occur in the map unless
      there has been at least one inference request of that model
      version. A version of -1 indicates the status is for requests for
      which the version could not be determined.
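Both maps gain entries lazily, per batch size and per model version
respectively. A hedged proto sketch of these two messages may make the
shape concrete; the field numbers are assumptions, and the referenced
types are the ones documented in this file except ``ModelConfig``,
which is defined in ``model_config.proto``:

.. code-block:: protobuf

   // Sketch of the per-model status messages documented above; field
   // numbers are illustrative assumptions.
   message ModelVersionStatus {
     ModelReadyState ready_state = 1;
     ModelReadyStateReason ready_state_reason = 2;
     // Keyed by batch size; an entry exists only after at least one
     // request of that batch size has been handled.
     map<uint32, InferRequestStats> infer_stats = 3;
     uint64 model_execution_count = 4;  // batches executed
     uint64 model_inference_count = 5;  // individual inferences
     uint64 last_inference_timestamp_milliseconds = 6;
   }

   message ModelStatus {
     ModelConfig config = 1;  // defined in model_config.proto
     // Keyed by model version; version -1 collects requests whose
     // version could not be determined.
     map<int64, ModelVersionStatus> version_status = 2;
   }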
.. cpp:enum:: ServerReadyState

   Readiness status for the inference server.

   .. cpp:enumerator:: ServerReadyState::SERVER_INVALID = 0

      The server is in an invalid state and will likely not respond
      correctly to any requests.

   .. cpp:enumerator:: ServerReadyState::SERVER_INITIALIZING = 1

      The server is initializing.

   .. cpp:enumerator:: ServerReadyState::SERVER_READY = 2

      The server is ready and accepting requests.

   .. cpp:enumerator:: ServerReadyState::SERVER_EXITING = 3

      The server is exiting and will not respond to requests.

   .. cpp:enumerator:: ServerReadyState::SERVER_FAILED_TO_INITIALIZE = 10

      The server did not initialize correctly. Most requests will fail.

.. cpp:var:: message SharedMemoryRegion

   The metadata for a shared memory region registered with the
   inference server.

   .. cpp:var:: string name

      The name for this shared memory region.

   .. cpp:var:: string shared_memory_key

      The name of the shared memory region that holds the input data
      (or where the output data should be written).

   .. cpp:var:: uint64 offset

      The offset of the shared memory block from the start of the
      shared memory region: start = offset, end = offset + byte_size.

   .. cpp:var:: int64 device_id

      The GPU device ID on which the cudaIPC handle was created.

   .. cpp:var:: oneof shared_memory_types

      Types of shared memory identifiers.

      .. cpp:var:: SystemSharedMemory system_shared_memory

         The status of this system shared memory region.

      .. cpp:var:: CudaSharedMemory cuda_shared_memory

         The status of this CUDA shared memory region.

   .. cpp:var:: uint64 byte_size

      Size of the shared memory block, in bytes.

.. cpp:var:: message ServerStatus

   Status for the inference server.

   .. cpp:var:: string id

      The server's ID.

   .. cpp:var:: string version

      The server's version.

   .. cpp:var:: ServerReadyState ready_state

      Current readiness state for the server.

   .. cpp:var:: uint64 uptime_ns

      Server uptime in nanoseconds.

   .. cpp:var:: map<string, ModelStatus> model_status

      Status for each model, as a map from model name to the status.

   .. cpp:var:: StatusRequestStats status_stats

      Statistics for Status requests.

   .. cpp:var:: HealthRequestStats health_stats

      Statistics for Health requests.

   .. cpp:var:: ModelControlRequestStats model_control_stats

      Statistics for ModelControl requests.

   .. cpp:var:: SharedMemoryControlRequestStats shm_control_stats

      Statistics for SharedMemoryControl requests.

   .. cpp:var:: RepositoryRequestStats repository_stats

      Statistics for Repository requests.

.. cpp:var:: message SharedMemoryStatus

   Shared memory status for the inference server.

   .. cpp:var:: SharedMemoryRegion shared_memory_region (repeated)

      The list of active/registered shared memory regions.

.. cpp:var:: message ModelRepositoryIndex

   Index of the model repository monitored by the inference server.

   .. cpp:var:: message ModelEntry

      The basic information for a model.

      .. cpp:var:: string name

         The model's name.

   .. cpp:var:: ModelEntry models (repeated)

      The list of models in the model repository.
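To make the remaining structure concrete, here is a hedged proto
sketch of the shared memory region and repository index messages,
following the field layout as documented above. The field numbers are
illustrative assumptions, and ``SystemSharedMemory`` and
``CudaSharedMemory`` are the identifier types named above; the oneof
means exactly one of the two identifier kinds is set per region:

.. code-block:: protobuf

   // Sketch following the field layout documented above; field numbers
   // are illustrative assumptions.
   message SharedMemoryRegion {
     string name = 1;               // registration name of the region
     string shared_memory_key = 2;  // key of the underlying region
     uint64 offset = 3;             // block start within the region
     int64 device_id = 4;           // GPU device of the cudaIPC handle
     oneof shared_memory_types {    // exactly one identifier kind is set
       SystemSharedMemory system_shared_memory = 5;
       CudaSharedMemory cuda_shared_memory = 6;
     }
     uint64 byte_size = 7;          // block size in bytes
   }

   message ModelRepositoryIndex {
     message ModelEntry {
       string name = 1;  // the model's name
     }
     repeated ModelEntry models = 1;  // one entry per repository model
   }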