grpc_service.proto
====================

.. cpp:namespace:: nvidia::inferenceserver

.. cpp:var:: service GRPCService

   Inference Server GRPC endpoints.

   .. cpp:var:: rpc Status(StatusRequest) returns (StatusResponse)

      Get status for the entire inference server or for a specified
      model.

   .. cpp:var:: rpc Profile(ProfileRequest) returns (ProfileResponse)

      Enable and disable low-level GPU profiling.

   .. cpp:var:: rpc Health(HealthRequest) returns (HealthResponse)

      Check liveness and readiness of the inference server.

   .. cpp:var:: rpc Infer(InferRequest) returns (InferResponse)

      Request inference using a specific model. [ To handle large input
      tensors the client and server will likely need to increase the
      maximum message size so that the tensors can be transmitted in
      one pass; see the channel-options sketch at the end of this
      section. ]

   .. cpp:var:: rpc StreamInfer(stream InferRequest) returns (stream InferResponse)

      Request inferences using a specific model in a streaming manner.
      Individual inference requests sent through the same stream will
      be processed in order and returned on completion.

.. cpp:var:: message StatusRequest

   Request message for Status gRPC endpoint.

   .. cpp:var:: string model_name

      The specific model status to be returned. If empty, return status
      for all models.

.. cpp:var:: message StatusResponse

   Response message for Status gRPC endpoint.

   .. cpp:var:: RequestStatus request_status

      The status of the request, indicating success or failure.

   .. cpp:var:: ServerStatus server_status

      The server and model status.

.. cpp:var:: message ProfileRequest

   Request message for Profile gRPC endpoint.

   .. cpp:var:: string cmd

      The requested profiling action: 'start' requests that GPU
      profiling be enabled on all GPUs controlled by the inference
      server; 'stop' requests that GPU profiling be disabled on all
      GPUs controlled by the inference server.

.. cpp:var:: message ProfileResponse

   Response message for Profile gRPC endpoint.

   .. cpp:var:: RequestStatus request_status

      The status of the request, indicating success or failure.

.. cpp:var:: message HealthRequest

   Request message for Health gRPC endpoint.

   .. cpp:var:: string mode

      The requested health action: 'live' requests the liveness state
      of the inference server; 'ready' requests the readiness state of
      the inference server.

.. cpp:var:: message HealthResponse

   Response message for Health gRPC endpoint.

   .. cpp:var:: RequestStatus request_status

      The status of the request, indicating success or failure.

   .. cpp:var:: bool health

      The result of the request. True indicates the inference server is
      live/ready, false indicates the inference server is not
      live/ready.

.. cpp:var:: message InferRequest

   Request message for Infer gRPC endpoint.

   .. cpp:var:: string model_name

      The name of the model to use for inferencing.

   .. cpp:var:: int64 version

      The version of the model to use for inference. If -1, the
      latest/most-recent version of the model is used.

   .. cpp:var:: InferRequestHeader meta_data

      Meta-data for the request: the input tensors being provided and
      the output tensors being requested.

   .. cpp:var:: bytes raw_input (repeated)

      The raw input tensor data, in the order specified in 'meta_data'.

.. cpp:var:: message InferResponse

   Response message for Infer gRPC endpoint.

   .. cpp:var:: RequestStatus request_status

      The status of the request, indicating success or failure.

   .. cpp:var:: InferResponseHeader meta_data

      The response meta-data for the output tensors.

   .. cpp:var:: bytes raw_output (repeated)

      The raw output tensor data, in the order specified in
      'meta_data'.
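
As noted for the Infer RPC, large raw tensors can exceed gRPC's default
message-size limit (about 4 MB). The sketch below shows one way to raise
the limit on a Python client channel. The two channel options are
standard gRPC channel arguments; the server address and the 64 MiB limit
are illustrative assumptions, not values mandated by this proto.

.. code-block:: python

   import grpc

   # Assumption: size the limit to the largest request/response expected.
   MAX_MSG_BYTES = 64 * 1024 * 1024  # 64 MiB

   # 'localhost:8001' is a placeholder address. The options below are
   # standard gRPC channel arguments that raise the default message cap.
   channel = grpc.insecure_channel(
       'localhost:8001',
       options=[
           ('grpc.max_send_message_length', MAX_MSG_BYTES),
           ('grpc.max_receive_message_length', MAX_MSG_BYTES),
       ])

The same options must also be set on the server side for large messages
to be accepted in both directions.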
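
For illustration, a minimal unary Infer call might look as follows. This
is a sketch under several assumptions: that the Python modules protoc
generates from grpc_service.proto are importable as grpc_service_pb2 /
grpc_service_pb2_grpc with a stub class named GRPCServiceStub, that the
model takes a single input tensor, and that a request_status code of 0
means success. Adjust the module, stub, and InferRequestHeader meta_data
fields to match your generated code and model configuration.

.. code-block:: python

   import grpc
   import grpc_service_pb2        # assumed name of the protoc output
   import grpc_service_pb2_grpc   # assumed name of the protoc output

   stub = grpc_service_pb2_grpc.GRPCServiceStub(
       grpc.insecure_channel('localhost:8001'))  # placeholder address

   request = grpc_service_pb2.InferRequest()
   request.model_name = 'my_model'  # placeholder model name
   request.version = -1             # -1 selects the latest model version
   # request.meta_data (an InferRequestHeader) must describe the input
   # tensors being provided and the output tensors being requested; its
   # fields are left schematic here.
   request.raw_input.append(b'\x00' * 16)  # placeholder raw tensor
                                           # bytes, ordered per meta_data

   response = stub.Infer(request)
   if response.request_status.code == 0:    # assumption: 0 == SUCCESS
       outputs = list(response.raw_output)  # raw bytes, per meta_data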
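
The Health endpoint can be probed the same way. Reusing the assumed stub
from the previous sketch, 'live' and 'ready' are the two modes this
proto defines:

.. code-block:: python

   live = stub.Health(grpc_service_pb2.HealthRequest(mode='live'))
   ready = stub.Health(grpc_service_pb2.HealthRequest(mode='ready'))
   print(live.health, ready.health)  # True/False liveness and readiness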