server_status.proto
===================

.. cpp:namespace:: nvidia::inferenceserver

.. cpp:var:: message StatDuration

   Statistic collecting a duration metric.

   .. cpp:var:: uint64 count

      Cumulative number of times this metric occurred.

   .. cpp:var:: uint64 total_time_ns

      Total collected duration of this metric in nanoseconds.

.. cpp:var:: message StatusRequestStats

   Statistics collected for Status requests.

   .. cpp:var:: StatDuration success

      Total time required to handle successful Status requests, not
      including HTTP or gRPC endpoint termination time.

.. cpp:var:: message HealthRequestStats

   Statistics collected for Health requests.

   .. cpp:var:: StatDuration success

      Total time required to handle successful Health requests, not
      including HTTP or gRPC endpoint termination time.

.. cpp:var:: message ModelControlRequestStats

   Statistics collected for ModelControl requests.

   .. cpp:var:: StatDuration success

      Total time required to handle successful ModelControl requests,
      not including HTTP or gRPC endpoint termination time.

.. cpp:var:: message SharedMemoryControlRequestStats

   Statistics collected for SharedMemoryControl requests.

   .. cpp:var:: StatDuration success

      Total time required to handle successful SharedMemoryControl
      requests, not including HTTP or gRPC endpoint termination time.

.. cpp:var:: message RepositoryRequestStats

   Statistics collected for Repository requests.

   .. cpp:var:: StatDuration success

      Total time required to handle successful Repository requests, not
      including HTTP or gRPC endpoint termination time.

.. cpp:var:: message InferRequestStats

   Statistics collected for Infer requests.

   .. cpp:var:: StatDuration success

      Total time required to handle successful Infer requests, not
      including HTTP or gRPC endpoint handling time.

   .. cpp:var:: StatDuration failed

      Total time required to handle failed Infer requests, not
      including HTTP or gRPC endpoint handling time.

   .. cpp:var:: StatDuration compute

      Time required to run inferencing for an inference request,
      including time copying input tensors to GPU memory, time
      executing the model, and time copying output tensors from GPU
      memory.

   .. cpp:var:: StatDuration queue

      Time an inference request waits in the scheduling queue for an
      available model instance.
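Every request-statistics message above is built from ``StatDuration``,
so an average latency can always be recovered as
``total_time_ns / count``. As a rough sketch of what this part of the
documented schema looks like in proto form (the field numbers here are
illustrative assumptions, not the ones in the actual
``server_status.proto``):

.. code-block:: protobuf

   // Sketch of the documented duration statistics; field numbers are
   // illustrative assumptions.
   syntax = "proto3";

   message StatDuration {
     uint64 count = 1;          // number of times the metric occurred
     uint64 total_time_ns = 2;  // cumulative duration in nanoseconds
   }

   message InferRequestStats {
     StatDuration success = 1;  // successful Infer requests
     StatDuration failed = 2;   // failed Infer requests
     StatDuration compute = 3;  // input copy, model execution, output copy
     StatDuration queue = 4;    // wait for an available model instance
   }

For example, the average queue latency in milliseconds is
``queue.total_time_ns / queue.count / 1e6`` once ``queue.count`` is
non-zero.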
.. cpp:enum:: ModelReadyState

   Readiness status for models.

   .. cpp:enumerator:: ModelReadyState::MODEL_UNKNOWN = 0

      The model is in an unknown state. The model is not available for
      inferencing.

   .. cpp:enumerator:: ModelReadyState::MODEL_READY = 1

      The model is ready and available for inferencing.

   .. cpp:enumerator:: ModelReadyState::MODEL_UNAVAILABLE = 2

      The model is unavailable, indicating that the model failed to
      load or has been implicitly or explicitly unloaded. The model is
      not available for inferencing.

   .. cpp:enumerator:: ModelReadyState::MODEL_LOADING = 3

      The model is being loaded by the inference server. The model is
      not available for inferencing.

   .. cpp:enumerator:: ModelReadyState::MODEL_UNLOADING = 4

      The model is being unloaded by the inference server. The model is
      not available for inferencing.

.. cpp:var:: message ModelReadyStateReason

   Detail associated with a model's readiness status.

   .. cpp:var:: string message

      The message that explains the cause of the current readiness
      state.

.. cpp:var:: message ModelVersionStatus

   Status for a version of a model.

   .. cpp:var:: ModelReadyState ready_state

      Current readiness state for the model.

   .. cpp:var:: ModelReadyStateReason ready_state_reason

      Supplemental information regarding the current readiness state.

   .. cpp:var:: map<uint32, InferRequestStats> infer_stats

      Inference statistics for the model, as a map from batch size to
      the statistics. A batch size will not occur in the map unless
      there has been at least one inference request of that batch size.

   .. cpp:var:: uint64 model_execution_count

      Cumulative number of model executions performed for the model. A
      single model execution performs inferencing for the entire
      request batch and can perform inferencing for multiple requests
      if dynamic batching is enabled.

   .. cpp:var:: uint64 model_inference_count

      Cumulative number of model inferences performed for the model.
      Each inference in a batched request is counted as an individual
      inference.

   .. cpp:var:: uint64 last_inference_timestamp_milliseconds

      The timestamp of the last inference request made for this model,
      given as milliseconds since the epoch.

.. cpp:var:: message ModelStatus

   Status for a model.

   .. cpp:var:: ModelConfig config

      The configuration for the model.

   .. cpp:var:: map<int64, ModelVersionStatus> version_status

      Duration statistics for each version of the model, as a map from
      version to the status. A version will not occur in the map unless
      there has been at least one inference request of that model
      version. A version of -1 indicates the status is for requests for
      which the version could not be determined.
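Both maps gain entries lazily, per batch size and per model version
respectively. A hedged proto sketch of these two messages may make the
shape concrete; the field numbers are assumptions, and the referenced
types are the ones documented in this file except ``ModelConfig``,
which is defined in ``model_config.proto``:

.. code-block:: protobuf

   // Sketch of the per-model status messages documented above; field
   // numbers are illustrative assumptions.
   message ModelVersionStatus {
     ModelReadyState ready_state = 1;
     ModelReadyStateReason ready_state_reason = 2;
     // Keyed by batch size; an entry exists only after at least one
     // request of that batch size has been handled.
     map<uint32, InferRequestStats> infer_stats = 3;
     uint64 model_execution_count = 4;  // batches executed
     uint64 model_inference_count = 5;  // individual inferences
     uint64 last_inference_timestamp_milliseconds = 6;
   }

   message ModelStatus {
     ModelConfig config = 1;  // defined in model_config.proto
     // Keyed by model version; version -1 collects requests whose
     // version could not be determined.
     map<int64, ModelVersionStatus> version_status = 2;
   }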
.. cpp:enum:: ServerReadyState

   Readiness status for the inference server.

   .. cpp:enumerator:: ServerReadyState::SERVER_INVALID = 0

      The server is in an invalid state and will likely not respond
      correctly to any requests.

   .. cpp:enumerator:: ServerReadyState::SERVER_INITIALIZING = 1

      The server is initializing.

   .. cpp:enumerator:: ServerReadyState::SERVER_READY = 2

      The server is ready and accepting requests.

   .. cpp:enumerator:: ServerReadyState::SERVER_EXITING = 3

      The server is exiting and will not respond to requests.

   .. cpp:enumerator:: ServerReadyState::SERVER_FAILED_TO_INITIALIZE = 10

      The server did not initialize correctly. Most requests will fail.

.. cpp:var:: message SharedMemoryRegion

   The metadata for a shared memory region registered with the
   inference server.

   .. cpp:var:: string name

      The name for this shared memory region.

   .. cpp:var:: string shared_memory_key

      The name of the shared memory region that holds the input data
      (or where the output data should be written).

   .. cpp:var:: uint64 offset

      The offset of the shared memory block from the start of the
      shared memory region: start = offset, end = offset + byte_size.

   .. cpp:var:: int64 device_id

      The GPU device ID on which the cudaIPC handle was created.

   .. cpp:var:: oneof shared_memory_types

      Types of shared memory identifiers.

      .. cpp:var:: SystemSharedMemory system_shared_memory

         The status of this system shared memory region.

      .. cpp:var:: CudaSharedMemory cuda_shared_memory

         The status of this CUDA shared memory region.

   .. cpp:var:: uint64 byte_size

      Size of the shared memory block, in bytes.

.. cpp:var:: message ServerStatus

   Status for the inference server.

   .. cpp:var:: string id

      The server's ID.

   .. cpp:var:: string version

      The server's version.

   .. cpp:var:: ServerReadyState ready_state

      Current readiness state for the server.

   .. cpp:var:: uint64 uptime_ns

      Server uptime in nanoseconds.

   .. cpp:var:: map<string, ModelStatus> model_status

      Status for each model, as a map from model name to the status.

   .. cpp:var:: StatusRequestStats status_stats

      Statistics for Status requests.

   .. cpp:var:: HealthRequestStats health_stats

      Statistics for Health requests.

   .. cpp:var:: ModelControlRequestStats model_control_stats

      Statistics for ModelControl requests.

   .. cpp:var:: SharedMemoryControlRequestStats shm_control_stats

      Statistics for SharedMemoryControl requests.

   .. cpp:var:: RepositoryRequestStats repository_stats

      Statistics for Repository requests.

.. cpp:var:: message SharedMemoryStatus

   Shared memory status for the inference server.

   .. cpp:var:: SharedMemoryRegion shared_memory_region (repeated)

      The list of active/registered shared memory regions.

.. cpp:var:: message ModelRepositoryIndex

   Index of the model repository monitored by the inference server.

   .. cpp:var:: message ModelEntry

      The basic information for a model.

      .. cpp:var:: string name

         The model's name.

   .. cpp:var:: ModelEntry models (repeated)

      The list of models in the model repository.
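To make the remaining structure concrete, here is a hedged proto
sketch of the shared memory region and repository index messages,
following the field layout as documented above. The field numbers are
illustrative assumptions, and ``SystemSharedMemory`` and
``CudaSharedMemory`` are the identifier types named above; the oneof
means exactly one of the two identifier kinds is set per region:

.. code-block:: protobuf

   // Sketch following the field layout documented above; field numbers
   // are illustrative assumptions.
   message SharedMemoryRegion {
     string name = 1;               // registration name of the region
     string shared_memory_key = 2;  // key of the underlying region
     uint64 offset = 3;             // block start within the region
     int64 device_id = 4;           // GPU device of the cudaIPC handle
     oneof shared_memory_types {    // exactly one identifier kind is set
       SystemSharedMemory system_shared_memory = 5;
       CudaSharedMemory cuda_shared_memory = 6;
     }
     uint64 byte_size = 7;          // block size in bytes
   }

   message ModelRepositoryIndex {
     message ModelEntry {
       string name = 1;  // the model's name
     }
     repeated ModelEntry models = 1;  // one entry per repository model
   }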