@@ @@.. cpp:var:: service InferenceService @@ @@ Inference Server GRPC endpoints. @@
@@ .. cpp:var:: rpc ModelConfig(ModelConfigRequest) returns @@ (ModelConfigResponse) @@ @@ Get model configuration. @@
@@ @@.. cpp:var:: message ModelConfigRequest @@ @@ Request message for ModelConfig. @@
@@ @@ .. cpp:var:: string name @@ @@ The name of the model. @@
@@ .. cpp:var:: string version @@ @@ The version of the model. If not given the model version @@ is selected automatically based on the version policy. @@
@@ @@.. cpp:var:: message ModelConfigResponse @@ @@ Response message for ModelConfig. @@
@@ @@ .. cpp:var:: ModelConfig config @@ @@ The model configuration. @@
@@ .. cpp:var:: rpc ModelInfer(ModelInferRequest) returns @@ (ModelInferResponse) @@ @@ Perform inference using a specific model. @@
@@ .. cpp:var:: rpc ModelMetadata(ModelMetadataRequest) returns @@ (ModelMetadataResponse) @@ @@ Get model metadata. @@
@@ @@.. cpp:var:: message ModelMetadataRequest @@ @@ Request message for ModelMetadata. @@
@@ @@ .. cpp:var:: string name @@ @@ The name of the model. @@
@@ .. cpp:var:: string version @@ @@ The version of the model whose metadata is requested. If not @@ given the server will choose a version based on the @@ model and internal policy. @@
@@ @@.. cpp:var:: message ModelMetadataResponse @@ @@ Response message for ModelMetadata. @@
@@ @@ .. cpp:var:: string name @@ @@ The model name. @@
@@ @@ .. cpp:var:: string versions (repeated) @@ @@ The versions of the model. @@
@@ @@ .. cpp:var:: string platform @@ @@ The model's platform. @@
@@ @@ .. cpp:var:: TensorMetadata inputs (repeated) @@ @@ The model's inputs. @@
@@ @@ .. cpp:var:: TensorMetadata outputs (repeated) @@ @@ The model's outputs. @@
@@ .. cpp:var:: rpc ModelReady(ModelReadyRequest) returns @@ (ModelReadyResponse) @@ @@ Check readiness of a model in the inference server. @@
@@ @@.. cpp:var:: message ModelReadyRequest @@ @@ Request message for ModelReady. @@
@@ @@ .. cpp:var:: string name @@ @@ The name of the model to check for readiness. @@
@@ .. cpp:var:: string version @@ @@ The version of the model to check for readiness. If not given the @@ server will choose a version based on the model and internal policy. @@
@@ @@.. cpp:var:: message ModelReadyResponse @@ @@ Response message for ModelReady. @@
@@ @@ .. cpp:var:: bool ready @@ @@ True if the model is ready, false if not ready. @@
@@ .. cpp:var:: rpc ModelStatistics( @@ ModelStatisticsRequest) @@ returns (ModelStatisticsResponse) @@ @@ Get the cumulative inference statistics for a model. @@
@@ @@.. cpp:var:: message ModelStatisticsRequest @@ @@ Request message for ModelStatistics. @@
@@ .. cpp:var:: string name @@ @@ The name of the model. @@
@@ .. cpp:var:: string version @@ @@ The version of the model. If not given returns statistics for @@ all model versions. @@
@@ @@.. cpp:var:: message ModelStatisticsResponse @@ @@ Response message for ModelStatistics. @@
@@ .. cpp:var:: map<string, InferStatistics> inference @@ @@ Map from version to inference statistics for that version. @@
@@ .. cpp:var:: rpc ModelStreamInfer(stream ModelInferRequest) returns @@ (stream ModelStreamInferResponse) @@ @@ Perform streaming inference. @@
@@ @@.. cpp:var:: message ModelStreamInferResponse @@ @@ Response message for ModelStreamInfer. @@
@@ @@ .. cpp:var:: string error_message @@ @@ The message describing the error. The empty message @@ indicates the inference was successful without errors. @@
@@ @@ .. cpp:var:: ModelInferResponse infer_response @@ @@ Holds the results of the request. @@
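The `error_message` convention above (an empty string means success) can be sketched in client code as follows. This is a minimal illustration, not generated gRPC code: `StreamResponse` is a hypothetical stand-in for `ModelStreamInferResponse`, and the result is modeled as a plain dict.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreamResponse:
    """Hypothetical stand-in for ModelStreamInferResponse."""
    error_message: str = ""
    infer_response: Optional[dict] = None


def unwrap(resp: StreamResponse) -> Optional[dict]:
    # An empty error_message indicates the inference succeeded.
    if resp.error_message:
        raise RuntimeError(resp.error_message)
    return resp.infer_response
```

Raising on a non-empty message keeps the success path free of per-response status checks; a real client might prefer to log and continue reading the stream.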
@@ .. cpp:var:: rpc RepositoryIndex(RepositoryIndexRequest) returns @@ (RepositoryIndexResponse) @@ @@ Get the index of model repository contents. @@
@@ @@.. cpp:var:: message RepositoryIndexRequest @@ @@ Request message for RepositoryIndex. @@
@@ .. cpp:var:: string repository_name @@ @@ The name of the repository. If empty the index is returned @@ for all repositories. @@
@@ @@.. cpp:var:: message RepositoryIndexResponse @@ @@ Response message for RepositoryIndex. @@
@@ @@ .. cpp:var:: ModelIndex models (repeated) @@ @@ An index entry for each model. @@
@@ .. cpp:var:: rpc RepositoryModelLoad(RepositoryModelLoadRequest) returns @@ (RepositoryModelLoadResponse) @@ @@ Load or reload a model from a repository. @@
@@ @@.. cpp:var:: message RepositoryModelLoadRequest @@ @@ Request message for RepositoryModelLoad. @@
@@ .. cpp:var:: string repository_name @@ @@ The name of the repository to load from. If empty the model @@ is loaded from any repository. @@
@@ .. cpp:var:: string model_name @@ @@ The name of the model to load, or reload. @@
@@ @@.. cpp:var:: message RepositoryModelLoadResponse @@ @@ Response message for RepositoryModelLoad. @@
(message has no fields)
@@ .. cpp:var:: rpc RepositoryModelUnload(RepositoryModelUnloadRequest) returns @@ (RepositoryModelUnloadResponse) @@ @@ Unload a model. @@
@@ @@.. cpp:var:: message RepositoryModelUnloadRequest @@ @@ Request message for RepositoryModelUnload. @@
@@ .. cpp:var:: string repository_name @@ @@ The name of the repository from which the model was originally @@ loaded. If empty the repository is not considered. @@
@@ .. cpp:var:: string model_name @@ @@ The name of the model to unload. @@
@@ @@.. cpp:var:: message RepositoryModelUnloadResponse @@ @@ Response message for RepositoryModelUnload. @@
(message has no fields)
@@ .. cpp:var:: rpc ServerLive(ServerLiveRequest) returns @@ (ServerLiveResponse) @@ @@ Check liveness of the inference server. @@
@@ @@.. cpp:var:: message ServerLiveRequest @@ @@ Request message for ServerLive. @@
(message has no fields)
@@ @@.. cpp:var:: message ServerLiveResponse @@ @@ Response message for ServerLive. @@
@@ @@ .. cpp:var:: bool live @@ @@ True if the inference server is live, false if not live. @@
@@ .. cpp:var:: rpc ServerMetadata(ServerMetadataRequest) returns @@ (ServerMetadataResponse) @@ @@ Get server metadata. @@
@@ @@.. cpp:var:: message ServerMetadataRequest @@ @@ Request message for ServerMetadata. @@
(message has no fields)
@@ @@.. cpp:var:: message ServerMetadataResponse @@ @@ Response message for ServerMetadata. @@
@@ @@ .. cpp:var:: string name @@ @@ The server name. @@
@@ @@ .. cpp:var:: string version @@ @@ The server version. @@
@@ @@ .. cpp:var:: string extensions (repeated) @@ @@ The extensions supported by the server. @@
@@ .. cpp:var:: rpc ServerReady(ServerReadyRequest) returns @@ (ServerReadyResponse) @@ @@ Check readiness of the inference server. @@
@@ @@.. cpp:var:: message ServerReadyRequest @@ @@ Request message for ServerReady. @@
(message has no fields)
@@ @@.. cpp:var:: message ServerReadyResponse @@ @@ Response message for ServerReady. @@
@@ @@ .. cpp:var:: bool ready @@ @@ True if the inference server is ready, false if not ready. @@
@@ @@.. cpp:var:: service GRPCService @@ @@ Inference Server GRPC endpoints. @@
@@ .. cpp:var:: rpc Health(HealthRequest) returns (HealthResponse) @@ @@ Check liveness and readiness of the inference server. @@
@@ @@.. cpp:var:: message HealthRequest @@ @@ Request message for Health gRPC endpoint. @@
@@ @@ .. cpp:var:: string mode @@ @@ The requested health action: 'live' requests the liveness @@ state of the inference server; 'ready' requests the readiness state @@ of the inference server. @@
@@ @@.. cpp:var:: message HealthResponse @@ @@ Response message for Health gRPC endpoint. @@
@@ @@ .. cpp:var:: RequestStatus request_status @@ @@ The status of the request, indicating success or failure. @@
@@ @@ .. cpp:var:: bool health @@ @@ The result of the request. True indicates the inference server is @@ live/ready, false indicates the inference server is not live/ready. @@
@@ .. cpp:var:: rpc Infer(InferRequest) returns (InferResponse) @@ @@ Request inference using a specific model. To handle large input @@ tensors, the maximum message size likely needs to be set so that they @@ can be transmitted in one pass. @@
@@ .. cpp:var:: rpc ModelControl(ModelControlRequest) returns @@ (ModelControlResponse) @@ @@ Request to load / unload a specified model. @@
@@ @@.. cpp:var:: message ModelControlRequest @@ @@ Request message for ModelControl gRPC endpoint. @@
@@ @@ .. cpp:var:: string model_name @@ @@ The target model name. @@
@@ @@ .. cpp:var:: Type type @@ @@ The control type that is operated on the specified model. @@
@@ @@.. cpp:var:: message ModelControlResponse @@ @@ Response message for ModelControl gRPC endpoint. @@
@@ @@ .. cpp:var:: RequestStatus request_status @@ @@ The status of the request, indicating success or failure. @@
@@ .. cpp:var:: rpc Repository(RepositoryRequest) returns (RepositoryResponse) @@ @@ Get status associated with the model repository. @@
@@ @@.. cpp:var:: message RepositoryRequest @@ @@ Request message for Repository gRPC endpoint. @@
@@ .. cpp:var:: oneof request_type @@ @@ Types of the repository request @@
@@ @@ .. cpp:var:: bool index @@ @@ Request for the index of the model repository. @@
@@ @@.. cpp:var:: message RepositoryResponse @@ @@ Response message for Repository gRPC endpoint. @@
@@ @@ .. cpp:var:: RequestStatus request_status @@ @@ The status of the request, indicating success or failure. @@
@@ .. cpp:var:: oneof response_type @@ @@ Types of the repository response, which is a one-to-one mapping to @@ the repository request type. @@
@@ @@ .. cpp:var:: bool index @@ @@ The index of the model repository. @@
@@ .. cpp:var:: rpc Status(StatusRequest) returns (StatusResponse) @@ @@ Get status for entire inference server or for a specified model. @@
@@ @@.. cpp:var:: message StatusRequest @@ @@ Request message for Status gRPC endpoint. @@
@@ @@ .. cpp:var:: string model_name @@ @@ The specific model status to be returned. If empty return status @@ for all models. @@
@@ @@.. cpp:var:: message StatusResponse @@ @@ Response message for Status gRPC endpoint. @@
@@ @@ .. cpp:var:: RequestStatus request_status @@ @@ The status of the request, indicating success or failure. @@
@@ @@ .. cpp:var:: ServerStatus server_status @@ @@ The server and model status. @@
@@ .. cpp:var:: rpc StreamInfer(stream InferRequest) returns (stream @@ InferResponse) @@ @@ Request inferences using a specific model in a streaming manner. @@ Individual inference requests sent through the same stream will be @@ processed in order and returned on completion. @@
@@ @@ .. cpp:var:: message RegionStatus @@ @@ Status for a shared memory region. @@
@@ @@ .. cpp:var:: string name @@ @@ The name for the shared memory region. @@
@@ .. cpp:var:: uint64 device_id @@ @@ The GPU device ID where the cudaIPC handle was created. @@
@@ .. cpp:var:: uint64 byte_size @@ @@ Size of the shared memory region, in bytes. @@
@@ @@.. cpp:enum:: DataType @@ @@ Data types supported for input and output tensors. @@
@@ .. cpp:enumerator:: DataType::INVALID = 0
@@ .. cpp:enumerator:: DataType::BOOL = 1
@@ .. cpp:enumerator:: DataType::UINT8 = 2
@@ .. cpp:enumerator:: DataType::UINT16 = 3
@@ .. cpp:enumerator:: DataType::UINT32 = 4
@@ .. cpp:enumerator:: DataType::UINT64 = 5
@@ .. cpp:enumerator:: DataType::INT8 = 6
@@ .. cpp:enumerator:: DataType::INT16 = 7
@@ .. cpp:enumerator:: DataType::INT32 = 8
@@ .. cpp:enumerator:: DataType::INT64 = 9
@@ .. cpp:enumerator:: DataType::FP16 = 10
@@ .. cpp:enumerator:: DataType::FP32 = 11
@@ .. cpp:enumerator:: DataType::FP64 = 12
@@ .. cpp:enumerator:: DataType::STRING = 13
@@ @@.. cpp:var:: message HealthRequestStats @@ @@ Statistics collected for Health requests. @@
@@ .. cpp:var:: StatDuration success @@ @@ Total time required to handle successful Health requests, not @@ including HTTP or gRPC endpoint termination time. @@
@@ @@.. cpp:var:: message InferParameter @@ @@ An inference parameter value. @@
@@ .. cpp:var:: oneof parameter_choice @@ @@ The parameter value can be a string, an int64 or @@ a boolean @@
@@ .. cpp:var:: bool bool_param @@ @@ A boolean parameter value. @@
@@ .. cpp:var:: int64 int64_param @@ @@ An int64 parameter value. @@
@@ .. cpp:var:: string string_param @@ @@ A string parameter value. @@
@@ @@.. cpp:var:: message InferRequest @@ @@ Request message for Infer gRPC endpoint. @@
Used as request type in: GRPCService.Infer, GRPCService.StreamInfer
@@ .. cpp:var:: string model_name @@ @@ The name of the model to use for inferencing. @@
@@ .. cpp:var:: int64 version @@ @@ The version of the model to use for inference. If -1 @@ the latest/most-recent version of the model is used. @@
@@ .. cpp:var:: InferRequestHeader meta_data @@ @@ Meta-data for the request: input tensors, output @@ tensors, etc. @@
@@ .. cpp:var:: bytes raw_input (repeated) @@ @@ The raw input tensor data in the order specified in 'meta_data'. @@
@@ @@.. cpp:var:: message InferRequestHeader @@ @@ Meta-data for an inferencing request. The actual input data is @@ delivered separate from this header, in the HTTP body for an HTTP @@ request, or in the :cpp:var:`InferRequest` message for a gRPC request. @@
@@ .. cpp:var:: uint64 id @@ @@ The ID of the inference request. The response of the request will @@ have the same ID in InferResponseHeader. The request sender can use @@ the ID to correlate the response to corresponding request if needed. @@
@@ .. cpp:var:: uint32 flags @@ @@ The flags associated with this request. This field holds a bitwise-or @@ of all flag values. @@
@@ .. cpp:var:: uint64 correlation_id @@ @@ The correlation ID of the inference request. Default is 0, which @@ indicates that the request has no correlation ID. The correlation ID @@ is used to indicate that two or more inference requests are related to @@ each other. How this relationship is handled by the inference @@ server is determined by the model's scheduling policy. @@
@@ .. cpp:var:: uint32 batch_size @@ @@ The batch size of the inference request. This must be >= 1. For @@ models that don't support batching, batch_size must be 1. @@
@@ .. cpp:var:: Input input (repeated) @@ @@ The input meta-data for the inputs provided with the inference @@ request. @@
@@ .. cpp:var:: Output output (repeated) @@ @@ The output meta-data for the outputs requested with the inference @@ request. @@
@@ .. cpp:var:: uint32 priority @@ @@ The priority value of this request. If priority handling is not @@ enabled for the model, then this value is ignored. The default value @@ is 0 which indicates that the request will be assigned the default @@ priority associated with the model. @@
@@ .. cpp:var:: uint64 timeout_microseconds @@ @@ The timeout for this request. This value overrides the timeout @@ specified by the model, if the model allows timeout override and if @@ the value is less than the default timeout specified by the model. @@ If the request cannot be processed within this timeout, the request @@ will be handled based on the model's timeout policy. @@ Note that a request for an ensemble model cannot override the timeout @@ values for the composing models. @@ The default value is 0 which indicates that the request does not @@ override the model's timeout value. @@
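The override rules for `timeout_microseconds` can be read as a small decision function. The sketch below is an interpretation of the field description, not server code: the parameter names `model_timeout_us` and `allow_override` are hypothetical, and it assumes the model specifies a non-zero default timeout.

```python
def effective_timeout_us(request_timeout_us: int,
                         model_timeout_us: int,
                         allow_override: bool) -> int:
    """Return the timeout that would apply to a request (assumed semantics)."""
    # 0 means the request does not override the model's timeout.
    if request_timeout_us == 0:
        return model_timeout_us
    # An override applies only if the model allows it and the
    # requested value is less than the model's default timeout.
    if allow_override and request_timeout_us < model_timeout_us:
        return request_timeout_us
    return model_timeout_us
```

For example, `effective_timeout_us(500, 1000, True)` yields 500, while the same request against a model that disallows overrides keeps the model's 1000.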
@@ .. cpp:enum:: Flag @@ @@ Flags that can be associated with an inference request. @@ All flags are packed bitwise into the 'flags' field and @@ so the value of each must be a power-of-2. @@
@@ .. cpp:enumerator:: Flag::FLAG_NONE = 0 @@ @@ Value indicating no flags are enabled. @@
@@ .. cpp:enumerator:: Flag::FLAG_SEQUENCE_START = 1 << 0 @@ @@ This request is the start of a related sequence of requests. @@
@@ .. cpp:enumerator:: Flag::FLAG_SEQUENCE_END = 1 << 1 @@ @@ This request is the end of a related sequence of requests. @@
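Because each flag is a power of two, the `flags` field is built with a bitwise OR. The following sketch mirrors the enum values listed above using Python's `IntFlag`; the helper `sequence_flags` and its arguments are illustrative names, not part of the API.

```python
from enum import IntFlag


class Flag(IntFlag):
    # Values copied from the Flag enum documented above.
    FLAG_NONE = 0
    FLAG_SEQUENCE_START = 1 << 0
    FLAG_SEQUENCE_END = 1 << 1


def sequence_flags(index: int, length: int) -> int:
    """Flags for request number `index` in a sequence of `length` requests."""
    flags = Flag.FLAG_NONE
    if index == 0:
        flags |= Flag.FLAG_SEQUENCE_START
    if index == length - 1:
        flags |= Flag.FLAG_SEQUENCE_END
    return int(flags)
```

A single-request sequence carries both flags at once, which is why the values must not share bits.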
@@ .. cpp:var:: message Input @@ @@ Meta-data for an input tensor provided as part of an inferencing @@ request. @@
@@ .. cpp:var:: string name @@ @@ The name of the input tensor. @@
@@ .. cpp:var:: int64 dims (repeated) @@ @@ The shape of the input tensor, not including the batch dimension. @@ Optional if the model configuration for this input explicitly @@ specifies all dimensions of the shape. Required if the model @@ configuration for this input has any wildcard dimensions (-1). @@
@@ .. cpp:var:: uint64 batch_byte_size @@ @@ The size of the full batch of the input tensor, in bytes. @@ Optional for tensors with fixed-sized datatypes. Required @@ for tensors with a non-fixed-size datatype (like STRING). @@
@@ .. cpp:var:: InferSharedMemory shared_memory @@ @@ It is the location in shared memory that contains the tensor data @@ for this input. Using shared memory is optional but if this @@ message is used, all fields are required. @@
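For fixed-size datatypes, `batch_byte_size` follows from the element size, `dims`, and the batch size. The sketch below shows that arithmetic; the `ELEMENT_SIZE` table is an assumption (for instance, one byte per BOOL element), and STRING is deliberately absent because its size cannot be derived from the shape.

```python
from math import prod

# Assumed per-element sizes in bytes for fixed-size datatypes.
ELEMENT_SIZE = {
    "BOOL": 1, "UINT8": 1, "UINT16": 2, "UINT32": 4, "UINT64": 8,
    "INT8": 1, "INT16": 2, "INT32": 4, "INT64": 8,
    "FP16": 2, "FP32": 4, "FP64": 8,
}


def batch_byte_size(data_type: str, dims: list, batch_size: int) -> int:
    """Size in bytes of a full batch of one input tensor."""
    if data_type not in ELEMENT_SIZE:
        raise ValueError(f"{data_type} has no fixed size; specify it explicitly")
    if any(d < 0 for d in dims):
        raise ValueError("wildcard (-1) dimensions must be resolved first")
    # dims excludes the batch dimension, so multiply it in separately.
    return ELEMENT_SIZE[data_type] * prod(dims) * batch_size
```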
@@ .. cpp:var:: message Output @@ @@ Meta-data for a requested output tensor as part of an inferencing @@ request. @@
@@ .. cpp:var:: string name @@ @@ The name of the output tensor. @@
@@ .. cpp:var:: Class cls @@ @@ Optional. If defined return this output as a classification @@ instead of raw data. The output tensor will be interpreted as @@ probabilities and the classifications associated with the @@ highest probabilities will be returned. @@
@@ .. cpp:var:: InferSharedMemory shared_memory @@ @@ It is the location in shared memory that the result tensor data @@ for this output will be written. Using shared memory is optional @@ but if this message is used, all fields are required. @@
@@ .. cpp:var:: message Class @@ @@ Options for an output returned as a classification. @@
@@ .. cpp:var:: uint32 count @@ @@ Indicates how many classification values should be returned @@ for the output. The 'count' highest-probability values are @@ returned. @@
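The effect of `count` can be illustrated with a small top-k selection. This is a sketch of the described behavior rather than the server's implementation; the function name, the tuple layout `(idx, value, label)`, and the tie-breaking order are assumptions.

```python
def top_classes(values, count, labels=None):
    """Return (idx, value, label) for the `count` highest values."""
    # Sort indices by value, highest first; ties keep their original order.
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return [(i, values[i], labels[i] if labels else "") for i in ranked[:count]]
```

For a three-class output `[0.1, 0.7, 0.2]` with `count` set to 2, the sketch returns the entries for indices 1 and 2, in that order.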
@@ @@.. cpp:var:: message InferRequestStats @@ @@ Statistics collected for Infer requests. @@
@@ .. cpp:var:: StatDuration success @@ @@ Total time required to handle successful Infer requests, not @@ including HTTP or GRPC endpoint handling time. @@
@@ .. cpp:var:: StatDuration failed @@ @@ Total time required to handle failed Infer requests, not @@ including HTTP or GRPC endpoint handling time. @@
@@ .. cpp:var:: StatDuration compute @@ @@ Time required to run inferencing for an inference request; @@ including time copying input tensors to GPU memory, time @@ executing the model, and time copying output tensors from GPU @@ memory. @@
@@ .. cpp:var:: StatDuration queue @@ @@ Time an inference request waits in scheduling queue for an @@ available model instance. @@
@@ .. cpp:var:: StatisticDuration compute_input @@ @@ The count and cumulative duration to prepare input tensor data as @@ required by the model framework / backend. For example, this duration @@ should include the time to copy input tensor data to the GPU. @@
@@ .. cpp:var:: StatisticDuration compute_infer @@ @@ The count and cumulative duration to execute the model. @@
@@ .. cpp:var:: StatisticDuration compute_output @@ @@ The count and cumulative duration to extract output tensor data @@ produced by the model framework / backend. For example, this duration @@ should include the time to copy output tensor data from the GPU. @@
@@ @@.. cpp:var:: message InferResponse @@ @@ Response message for Infer gRPC endpoint. @@
Used as response type in: GRPCService.Infer, GRPCService.StreamInfer
@@ @@ .. cpp:var:: RequestStatus request_status @@ @@ The status of the request, indicating success or failure. @@
@@ .. cpp:var:: InferResponseHeader meta_data @@ @@ The response meta-data for the output tensors. @@
@@ .. cpp:var:: bytes raw_output (repeated) @@ @@ The raw output tensor data in the order specified in 'meta_data'. @@
@@ @@.. cpp:var:: message InferResponseHeader @@ @@ Meta-data for the response to an inferencing request. The actual output @@ data is delivered separate from this header, in the HTTP body for an HTTP @@ request, or in the :cpp:var:`InferResponse` message for a gRPC request. @@
@@ .. cpp:var:: uint64 id @@ @@ The ID of the inference response. The response will have the same ID @@ as its originating request. The request sender can use @@ the ID to correlate the response to the corresponding request if needed. @@
@@ .. cpp:var:: string model_name @@ @@ The name of the model that produced the outputs. @@
@@ .. cpp:var:: int64 model_version @@ @@ The version of the model that produced the outputs. @@
@@ .. cpp:var:: uint32 batch_size @@ @@ The batch size of the outputs. This will always be equal to the @@ batch size of the inputs. For models that don't support @@ batching the batch_size will be 1. @@
@@ .. cpp:var:: Output output (repeated) @@ @@ The outputs, in the same order as they were requested in @@ :cpp:var:`InferRequestHeader`. @@
@@ .. cpp:var:: message Output @@ @@ Meta-data for an output tensor requested as part of an inferencing @@ request. @@
@@ .. cpp:var:: string name @@ @@ The name of the output tensor. @@
@@ .. cpp:var:: DataType data_type @@ @@ The datatype of the output tensor. @@
@@ .. cpp:var:: Raw raw @@ @@ If specified deliver results for this output as raw tensor data. @@ The actual output data is delivered in the HTTP body for an HTTP @@ request, or in the :cpp:var:`InferResponse` message for a gRPC @@ request. Only one of 'raw' and 'batch_classes' may be specified. @@
@@ .. cpp:var:: Classes batch_classes (repeated) @@ @@ If specified deliver results for this output as classifications. @@ There is one :cpp:var:`Classes` object for each batch entry in @@ the output. Only one of 'raw' and 'batch_classes' may be @@ specified. @@
@@ .. cpp:var:: message Class @@ @@ Information about each classification for this output. @@
@@ .. cpp:var:: int32 idx @@ @@ The classification index. @@
@@ .. cpp:var:: float value @@ @@ The classification value as a float (typically a @@ probability). @@
@@ .. cpp:var:: string label @@ @@ The label for the class (optional, only available if provided @@ by the model). @@
@@ .. cpp:var:: message Classes @@ @@ Meta-data for an output tensor being returned as classifications. @@
@@ .. cpp:var:: Class cls (repeated) @@ @@ The top-k classes for this output. @@
@@ .. cpp:var:: message Raw @@ @@ Meta-data for an output tensor being returned as raw data. @@
@@ .. cpp:var:: int64 dims (repeated) @@ @@ The shape of the output tensor, not including the batch @@ dimension. @@
@@ .. cpp:var:: uint64 batch_byte_size @@ @@ The full size of the output tensor, in bytes. For a @@ batch output, this is the size of the entire batch. @@
@@.. cpp:var:: message InferSharedMemory @@ @@ The meta-data for the shared memory from which to read the input @@ data and/or write the output data. @@
@@ .. cpp:var:: string name @@ @@ The name given during registration of a shared memory region that @@ holds the input data (or where the output data should be written). @@
@@ .. cpp:var:: uint64 offset @@ @@ The offset from the start of the shared memory region. @@ The block spans start = offset to end = offset + byte_size. @@
@@ .. cpp:var:: uint64 byte_size @@ @@ Size of the memory block, in bytes. @@
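The `offset` and `byte_size` fields together select a block inside a registered region. The sketch below checks that the block fits and returns its bounds; `region_size` (the registered size of the region) and the function name are assumptions for illustration, and the region is modeled as an ordinary `bytearray`.

```python
def region_slice(region_size: int, offset: int, byte_size: int):
    """Return (start, end) of the block, or raise if it leaves the region."""
    if offset < 0 or byte_size < 0:
        raise ValueError("offset and byte_size must be non-negative")
    end = offset + byte_size
    if end > region_size:
        raise ValueError("block extends past the end of the region")
    return offset, end


# Usage: read a 32-byte block starting 16 bytes into a 64-byte region.
region = bytearray(64)
start, end = region_slice(len(region), 16, 32)
block = memoryview(region)[start:end]
```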
@@ @@.. cpp:var:: message InferStatistics @@ @@ Inference statistics. @@
@@ .. cpp:var:: StatisticDuration success @@ @@ Cumulative count and duration for successful inference @@ requests. @@
@@ .. cpp:var:: StatisticDuration fail @@ @@ Cumulative count and duration for failed inference @@ requests. @@
@@ .. cpp:var:: StatisticDuration queue @@ @@ The count and cumulative duration that inference requests wait in @@ scheduling or other queues. @@
@@ .. cpp:var:: StatisticDuration compute_input @@ @@ The count and cumulative duration to prepare input tensor data as @@ required by the model framework / backend. For example, this duration @@ should include the time to copy input tensor data to the GPU. @@
@@ .. cpp:var:: StatisticDuration compute_infer @@ @@ The count and cumulative duration to execute the model. @@
@@ .. cpp:var:: StatisticDuration compute_output @@ @@ The count and cumulative duration to extract output tensor data @@ produced by the model framework / backend. For example, this duration @@ should include the time to copy output tensor data from the GPU. @@
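Since each statistic is a cumulative count paired with a cumulative duration, a per-request average is the ratio of the two. The sketch below assumes `StatisticDuration` exposes a `count` field and a duration field `ns` in nanoseconds; neither field is shown in this section, so treat both names and the unit as assumptions.

```python
from dataclasses import dataclass


@dataclass
class StatisticDuration:
    """Hypothetical mirror of the message: cumulative count and duration."""
    count: int = 0
    ns: int = 0  # assumed unit: nanoseconds


def mean_us(stat: StatisticDuration) -> float:
    """Average duration per request in microseconds (0.0 if no requests)."""
    if stat.count == 0:
        return 0.0
    return stat.ns / stat.count / 1000
```

Because the values are cumulative, a rate over a time window is obtained by subtracting two snapshots before dividing.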
@@ @@.. cpp:var:: message InferTensorContents @@ @@ The data contained in a tensor. For a given data type the @@ tensor contents can be represented in "raw" bytes form or in @@ the repeated type that matches the tensor's data type. Protobuf @@ oneof is not used because oneofs cannot contain repeated fields. @@
@@ @@ .. cpp:var:: bytes raw_contents @@ @@ Raw representation of the tensor contents. The size of this @@ content must match what is expected by the tensor's shape @@ and data type. The raw data must be the flattened, one-dimensional, @@ row-major order of the tensor elements without any stride or padding @@ between the elements. Note that the FP16 data type must be @@ represented as raw content as there is no standard support for a @@ 16-bit float type. @@
@@ @@ .. cpp:var:: bool bool_contents (repeated) @@ @@ Representation for BOOL data type. The size must match what is @@ expected by the tensor's shape. The contents must be the flattened, @@ one-dimensional, row-major order of the tensor elements. @@
@@ @@ .. cpp:var:: int32 int_contents (repeated) @@ @@ Representation for INT8, INT16, and INT32 data types. The size @@ must match what is expected by the tensor's shape. The contents @@ must be the flattened, one-dimensional, row-major order of the @@ tensor elements. @@
@@ @@ .. cpp:var:: int64 int64_contents (repeated) @@ @@ Representation for INT64 data types. The size must match what @@ is expected by the tensor's shape. The contents must be the @@ flattened, one-dimensional, row-major order of the tensor elements. @@
@@ @@ .. cpp:var:: uint32 uint_contents (repeated) @@ @@ Representation for UINT8, UINT16, and UINT32 data types. The size @@ must match what is expected by the tensor's shape. The contents @@ must be the flattened, one-dimensional, row-major order of the @@ tensor elements. @@
@@ @@ .. cpp:var:: uint64 uint64_contents (repeated) @@ @@ Representation for UINT64 data types. The size must match what @@ is expected by the tensor's shape. The contents must be the @@ flattened, one-dimensional, row-major order of the tensor elements. @@
@@ @@ .. cpp:var:: float fp32_contents (repeated) @@ @@ Representation for FP32 data type. The size must match what is @@ expected by the tensor's shape. The contents must be the flattened, @@ one-dimensional, row-major order of the tensor elements. @@
@@ @@ .. cpp:var:: double fp64_contents (repeated) @@ @@ Representation for FP64 data type. The size must match what is @@ expected by the tensor's shape. The contents must be the flattened, @@ one-dimensional, row-major order of the tensor elements. @@
@@ @@ .. cpp:var:: bytes byte_contents (repeated) @@ @@ Representation for BYTES data type. The size must match what is @@ expected by the tensor's shape. The contents must be the flattened, @@ one-dimensional, row-major order of the tensor elements. @@
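The `raw_contents` layout (flattened, row-major, no stride or padding) can be produced with Python's `struct` module, which also covers FP16 through the `e` format character. This is a sketch under one assumption not stated above: elements are encoded little-endian. The helper names are illustrative.

```python
import struct


def flatten(tensor):
    """Yield the elements of a nested list in row-major order."""
    for item in tensor:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)
        else:
            yield item


def to_raw_contents(tensor, fmt: str) -> bytes:
    """Pack a nested list into contiguous bytes.

    fmt is a struct format character, e.g. 'e' (FP16), 'f' (FP32), 'i' (INT32).
    Little-endian byte order is an assumption of this sketch.
    """
    flat = list(flatten(tensor))
    return struct.pack(f"<{len(flat)}{fmt}", *flat)
```

A 2x2 FP16 tensor packs into 8 bytes, matching the product of the shape and the 2-byte element size.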
@@ @@.. cpp:var:: message ModelConfig @@ @@ A model configuration. @@
@@ .. cpp:var:: string name @@ @@ The name of the model. @@
@@ .. cpp:var:: string platform @@ @@ The framework for the model. Possible values are @@ "tensorrt_plan", "tensorflow_graphdef", @@ "tensorflow_savedmodel", "caffe2_netdef", @@ "onnxruntime_onnx", "pytorch_libtorch" and "custom". @@
@@ .. cpp:var:: ModelVersionPolicy version_policy @@ @@ Policy indicating which version(s) of the model will be served. @@
@@ .. cpp:var:: int32 max_batch_size @@ @@ Maximum batch size allowed for inference. This can only decrease @@ what is allowed by the model itself. A max_batch_size value of 0 @@ indicates that batching is not allowed for the model and the @@ dimension/shape of the input and output tensors must exactly @@ match what is specified in the input and output configuration. A @@ max_batch_size value > 0 indicates that batching is allowed and @@ so the model expects the input tensors to have an additional @@ initial dimension for the batching that is not specified in the @@ input (for example, if the model supports batched inputs of @@ 2-dimensional tensors then the model configuration will specify @@ the input shape as [ X, Y ] but the model will expect the actual @@ input tensors to have shape [ N, X, Y ]). For max_batch_size > 0 @@ returned outputs will also have an additional initial dimension @@ for the batch. @@
@@ .. cpp:var:: ModelInput input (repeated) @@ @@ The inputs requested by the model. @@
@@ .. cpp:var:: ModelOutput output (repeated) @@ @@ The outputs produced by the model. @@
@@ .. cpp:var:: ModelOptimizationPolicy optimization @@ @@ Optimization configuration for the model. If not specified @@ then default optimization policy is used. @@
@@ .. cpp:var:: oneof scheduling_choice @@ @@ The scheduling policy for the model. If not specified the @@ default scheduling policy is used for the model. The default @@ policy is to execute each inference request independently. @@
@@ .. cpp:var:: ModelDynamicBatching dynamic_batching @@ @@ If specified, enables the dynamic-batching scheduling @@ policy. With dynamic-batching the scheduler may group @@ together independent requests into a single batch to @@ improve inference throughput. @@
@@ .. cpp:var:: ModelSequenceBatching sequence_batching @@ @@ If specified, enables the sequence-batching scheduling @@ policy. With sequence-batching, inference requests @@ with the same correlation ID are routed to the same @@ model instance. Multiple sequences of inference requests @@ may be batched together into a single batch to @@ improve inference throughput. @@
@@ .. cpp:var:: ModelEnsembling ensemble_scheduling @@ @@ If specified, enables the model-ensembling scheduling @@ policy. With model-ensembling, inference requests @@ will be processed according to the specification, such as an @@ execution sequence of models. The input specified in this model @@ config will be the input for the ensemble, and the output @@ specified will be the output of the ensemble. @@
@@ .. cpp:var:: ModelInstanceGroup instance_group (repeated) @@ @@ Instances of this model. If not specified, one instance @@ of the model will be instantiated on each available GPU. @@
@@ .. cpp:var:: string default_model_filename @@ @@ Optional filename of the model file to use if a @@ compute-capability specific model is not specified in @@ :cpp:var:`cc_model_filenames`. If not specified the default name @@ is 'model.graphdef', 'model.savedmodel', 'model.plan' or @@ 'model.netdef' depending on the model type. @@
@@ .. cpp:var:: map<string,string> cc_model_filenames @@ @@ Optional map from CUDA compute capability to the filename of @@ the model that supports that compute capability. The filename @@ refers to a file within the model version directory. @@
@@ .. cpp:var:: map<string,string> metric_tags @@ @@ Optional metric tags. User-specific key-value pairs for metrics @@ reported for this model. These tags are applied to the metrics @@ reported on the HTTP metrics port. @@
@@ .. cpp:var:: map<string,ModelParameter> parameters @@ @@ Optional model parameters. User-specified parameter values that @@ are made available to custom backends. @@
@@ .. cpp:var:: ModelWarmup model_warmup (repeated) @@ @@ Warmup setting of this model. If specified, all instances @@ will be run with the request samples in sequence before @@ serving the model. @@ This field can only be specified if the model is not an ensemble @@ model. @@
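The relationship between `max_batch_size` and tensor shapes described in this message can be sketched as a shape calculation: with batching disabled the tensor shape equals the configured shape, otherwise a leading batch dimension is added. The function name and the range check on `batch_size` are assumptions made for illustration.

```python
def expected_shape(max_batch_size: int, config_dims: list, batch_size: int = 1) -> list:
    """Shape the model expects for a tensor configured with `config_dims`."""
    if max_batch_size == 0:
        # Batching not allowed: shape must match the configuration exactly.
        return list(config_dims)
    if not 1 <= batch_size <= max_batch_size:
        raise ValueError("batch_size must be in [1, max_batch_size]")
    # Batching allowed: [X, Y] in the configuration becomes [N, X, Y].
    return [batch_size, *config_dims]
```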
@@ .. cpp:enum:: Type @@ @@ Types of control operation @@
@@ .. cpp:enumerator:: Type::UNLOAD = 0 @@ @@ To unload the specified model. @@
@@ .. cpp:enumerator:: Type::LOAD = 1 @@ @@ To load the specified model. If the model has been loaded, @@ it will be reloaded to fetch the latest change. @@
@@ @@.. cpp:var:: message ModelControlRequestStats @@ @@ Statistics collected for ModelControl requests. @@
@@ .. cpp:var:: StatDuration success @@ @@ Total time required to handle successful ModelControl requests, not @@ including HTTP or gRPC endpoint termination time. @@
@@ @@.. cpp:var:: message ModelDynamicBatching @@ @@ Dynamic batching configuration. These settings control how dynamic @@ batching operates for the model. @@
Used in:
@@ .. cpp:var:: int32 preferred_batch_size (repeated) @@ @@ Preferred batch sizes for dynamic batching. If a batch of one of @@ these sizes can be formed it will be executed immediately. If @@ not specified a preferred batch size will be chosen automatically @@ based on model and GPU characteristics. @@
@@ .. cpp:var:: uint64 max_queue_delay_microseconds @@ @@ The maximum time, in microseconds, a request will be delayed in @@ the scheduling queue to wait for additional requests for @@ batching. Default is 0. @@
@@ .. cpp:var:: bool preserve_ordering @@ @@ Whether the dynamic batcher should preserve the ordering of responses to @@ match the order of requests received by the scheduler. Default is @@ false. If true, the responses will be returned in the same order as @@ the order of requests sent to the scheduler. If false, the responses @@ may be returned in arbitrary order. This option is specifically @@ needed when a sequence of related inference requests (i.e. inference @@ requests with the same correlation ID) is sent to the dynamic @@ batcher to ensure that the sequence responses are in the correct @@ order. @@
@@ .. cpp:var:: uint32 priority_levels @@ @@ The number of priority levels to be enabled for the model. @@ Priority levels start from 1, and 1 is the highest priority. @@ Requests are handled in priority order with all priority 1 requests @@ processed before priority 2, all priority 2 requests processed before @@ priority 3, etc. Requests with the same priority level will be @@ handled in the order that they are received. @@
@@ .. cpp:var:: uint32 default_priority_level @@ @@ The priority level used for requests that don't specify their @@ priority. The value must be in the range [ 1, 'priority_levels' ]. @@
@@ .. cpp:var:: ModelQueuePolicy default_queue_policy @@ @@ The default queue policy used for requests that don't require @@ priority handling and requests that specify priority levels where @@ there is no specific policy given. If not specified, a policy with @@ default field values will be used. @@
@@ .. cpp:var:: map<uint32, ModelQueuePolicy> priority_queue_policy @@ @@ Specify the queue policy for the priority level. The default queue @@ policy will be used if a priority level doesn't specify a queue @@ policy. @@
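As a rough illustration of the ordering described for 'priority_levels' and 'default_priority_level', the following Python sketch drains a queue in priority order, with arrival order breaking ties inside a level. The `PriorityQueueSketch` class and its method names are hypothetical and are not part of the inference server; queue policies and timeouts are ignored.

```python
import heapq
import itertools


class PriorityQueueSketch:
    """Hypothetical model of priority-ordered request handling."""

    def __init__(self, priority_levels, default_priority_level):
        if not 1 <= default_priority_level <= priority_levels:
            raise ValueError("default_priority_level must be in [1, priority_levels]")
        self.priority_levels = priority_levels
        self.default_priority_level = default_priority_level
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker: arrival order

    def enqueue(self, request_id, priority=None):
        # Requests that do not specify a priority use the default level.
        level = self.default_priority_level if priority is None else priority
        if not 1 <= level <= self.priority_levels:
            raise ValueError("priority out of range")
        heapq.heappush(self._heap, (level, next(self._arrival), request_id))

    def drain(self):
        # Level 1 is the highest priority, so the smallest level pops first.
        order = []
        while self._heap:
            order.append(heapq.heappop(self._heap)[2])
        return order


queue = PriorityQueueSketch(priority_levels=3, default_priority_level=2)
queue.enqueue("a", priority=3)
queue.enqueue("b")              # falls back to level 2
queue.enqueue("c", priority=1)
queue.enqueue("d", priority=2)
print(queue.drain())
```

Requests "b" and "d" share level 2, so they come out in the order they were enqueued.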
@@ @@.. cpp:var:: message ModelEnsembling @@ @@ Model ensembling configuration. These settings specify the models that @@ compose the ensemble and how data flows between the models. @@
Used in:
@@ .. cpp:var:: Step step (repeated) @@ @@ The models and the input / output mappings used within the ensemble. @@
@@ .. cpp:var:: message Step @@ @@ Each step specifies a model included in the ensemble, @@ maps ensemble tensor names to the model input tensors, @@ and maps model output tensors to ensemble tensor names @@
Used in:
@@ .. cpp:var:: string model_name @@ @@ The name of the model to execute for this step of the ensemble. @@
@@ .. cpp:var:: int64 model_version @@ @@ The version of the model to use for inference. If -1 @@ the latest/most-recent version of the model is used. @@
@@ .. cpp:var:: map<string,string> input_map @@ @@ Map from name of an input tensor on this step's model to ensemble @@ tensor name. The ensemble tensor must have the same data type and @@ shape as the model input. Each model input must be assigned to @@ one ensemble tensor, but the same ensemble tensor can be assigned @@ to multiple model inputs. @@
@@ .. cpp:var:: map<string,string> output_map @@ @@ Map from name of an output tensor on this step's model to ensemble @@ tensor name. The data type and shape of the ensemble tensor will @@ be inferred from the model output. It is optional to assign all @@ model outputs to ensemble tensors. One ensemble tensor name @@ can appear in an output map only once. @@
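The mapping rules for 'input_map' and 'output_map' can be checked mechanically. The Python sketch below represents each step as a plain dictionary (a hypothetical stand-in for the Step message) and reports ensemble tensor names that are written by more than one 'output_map' entry. Treating the "only once" rule as applying across all steps is an assumption made for this example.

```python
def find_duplicate_outputs(steps):
    """Return ensemble tensor names produced by more than one output_map entry."""
    seen = set()
    duplicates = []
    for step in steps:
        # output_map: model output tensor name -> ensemble tensor name
        for ensemble_tensor in step["output_map"].values():
            if ensemble_tensor in seen and ensemble_tensor not in duplicates:
                duplicates.append(ensemble_tensor)
            seen.add(ensemble_tensor)
    return duplicates


# Hypothetical two-step ensemble; model and tensor names are illustrative only.
steps = [
    {
        "model_name": "preprocess",
        "model_version": -1,
        "input_map": {"RAW": "IMAGE"},      # model input -> ensemble tensor
        "output_map": {"OUT": "scaled"},    # model output -> ensemble tensor
    },
    {
        "model_name": "classify",
        "model_version": -1,
        "input_map": {"INPUT": "scaled"},
        "output_map": {"PROB": "scores"},
    },
]
print(find_duplicate_outputs(steps))
```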
@@ @@.. cpp:var:: message ModelInferRequest @@ @@ Request message for ModelInfer. @@
Used as request type in: GRPCInferenceService.ModelInfer, GRPCInferenceService.ModelStreamInfer
@@ .. cpp:var:: string model_name @@ @@ The name of the model to use for inferencing. @@
@@ .. cpp:var:: string model_version @@ @@ The version of the model to use for inference. If not @@ given the latest/most-recent version of the model is used. @@
@@ .. cpp:var:: string id @@ @@ Optional identifier for the request. If specified will be @@ returned in the response. @@
@@ .. cpp:var:: map<string,InferParameter> parameters @@ @@ Optional inference parameters. @@
@@ @@ .. cpp:var:: InferInputTensor inputs (repeated) @@ @@ The input tensors for the inference. @@
@@ @@ .. cpp:var:: InferRequestedOutputTensor outputs (repeated) @@ @@ The requested output tensors for the inference. Optional, if not @@ specified all outputs specified in the model config will be @@ returned. @@
@@ @@ .. cpp:var:: message InferInputTensor @@ @@ An input tensor for an inference request. @@
Used in:
@@ @@ .. cpp:var:: string name @@ @@ The tensor name. @@
@@ @@ .. cpp:var:: string datatype @@ @@ The tensor data type. @@
@@ @@ .. cpp:var:: int64 shape (repeated) @@ @@ The tensor shape. @@
@@ .. cpp:var:: map<string,InferParameter> parameters @@ @@ Optional inference input tensor parameters. @@
@@ .. cpp:var:: InferTensorContents @@ @@ The input tensor data. @@
@@ @@ .. cpp:var:: message InferRequestedOutputTensor @@ @@ An output tensor requested for an inference request. @@
Used in:
@@ @@ .. cpp:var:: string name @@ @@ The tensor name. @@
@@ .. cpp:var:: map<string,InferParameter> parameters @@ @@ Optional requested output tensor parameters. @@
@@ @@.. cpp:var:: message ModelInferResponse @@ @@ Response message for ModelInfer. @@
Used as response type in: GRPCInferenceService.ModelInfer
Used as field type in:
@@ .. cpp:var:: string model_name @@ @@ The name of the model used for inference. @@
@@ .. cpp:var:: string model_version @@ @@ The version of the model used for inference. @@
@@ .. cpp:var:: string id @@ @@ The id of the inference request if one was specified. @@
@@ .. cpp:var:: map<string,InferParameter> parameters @@ @@ Optional inference response parameters. @@
@@ @@ .. cpp:var:: InferOutputTensor outputs (repeated) @@ @@ The output tensors holding inference results. @@
@@ @@ .. cpp:var:: message InferOutputTensor @@ @@ An output tensor returned for an inference request. @@
Used in:
@@ @@ .. cpp:var:: string name @@ @@ The tensor name. @@
@@ @@ .. cpp:var:: string datatype @@ @@ The tensor data type. @@
@@ @@ .. cpp:var:: int64 shape (repeated) @@ @@ The tensor shape. @@
@@ .. cpp:var:: InferTensorContents @@ @@ The output tensor data. @@
@@ @@.. cpp:var:: message ModelInput @@ @@ An input required by the model. @@
Used in:
@@ .. cpp:var:: string name @@ @@ The name of the input. @@
@@ .. cpp:var:: DataType data_type @@ @@ The data-type of the input. @@
@@ .. cpp:var:: Format format @@ @@ The format of the input. Optional. @@
@@ .. cpp:var:: int64 dims (repeated) @@ @@ The dimensions/shape of the input tensor that must be provided @@ when invoking the inference API for this model. @@
@@ .. cpp:var:: ModelTensorReshape reshape @@ @@ The shape expected for this input by the backend. The input will @@ be reshaped to this before being presented to the backend. The @@ reshape must have the same number of elements as the input shape @@ specified by 'dims'. Optional. @@
@@ .. cpp:var:: bool is_shape_tensor @@ @@ Whether or not the input is a shape tensor to the model. This field @@ is currently supported only for the TensorRT model. An error will be @@ generated if this specification does not comply with underlying @@ model. @@
@@ .. cpp:var:: bool allow_ragged_batch @@ @@ Whether or not the input is allowed to be "ragged" in a dynamically @@ created batch. Default is false indicating that two requests will @@ only be batched if this tensor has the same shape in both requests. @@ True indicates that two requests can be batched even if this tensor @@ has a different shape in each request. A true value is currently @@ supported only for custom models. @@
@@ @@ .. cpp:enum:: Format @@ @@ The format for the input. @@
Used in:
@@ .. cpp:enumerator:: Format::FORMAT_NONE = 0 @@ @@ The input has no specific format. This is the default. @@
@@ .. cpp:enumerator:: Format::FORMAT_NHWC = 1 @@ @@ HWC image format. Tensors with this format require 3 dimensions @@ if the model does not support batching (max_batch_size = 0) or 4 @@ dimensions if the model does support batching (max_batch_size @@ >= 1). In either case the 'dims' below should only specify the @@ 3 non-batch dimensions (i.e. HWC or CHW). @@
@@ .. cpp:enumerator:: Format::FORMAT_NCHW = 2 @@ @@ CHW image format. Tensors with this format require 3 dimensions @@ if the model does not support batching (max_batch_size = 0) or 4 @@ dimensions if the model does support batching (max_batch_size @@ >= 1). In either case the 'dims' below should only specify the @@ 3 non-batch dimensions (i.e. HWC or CHW). @@
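To make the dimension rule for FORMAT_NHWC and FORMAT_NCHW concrete, this Python sketch builds the full tensor shape from the three non-batch 'dims' and the model's 'max_batch_size'. The helper name `full_input_shape` and its `batch_size` argument are hypothetical.

```python
def full_input_shape(dims, max_batch_size, batch_size=1):
    """Return the tensor shape a request would carry for an image-format input."""
    if len(dims) != 3:
        raise ValueError("image formats need exactly 3 non-batch dims")
    if max_batch_size == 0:
        # Model does not support batching: the tensor has 3 dimensions.
        return list(dims)
    if not 1 <= batch_size <= max_batch_size:
        raise ValueError("batch_size must be in [1, max_batch_size]")
    # Model supports batching: the batch dimension is prepended.
    return [batch_size] + list(dims)


print(full_input_shape([224, 224, 3], max_batch_size=0))                 # HWC, no batching
print(full_input_shape([3, 224, 224], max_batch_size=8, batch_size=4))   # CHW, batched
```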
@@ @@.. cpp:var:: message ModelInstanceGroup @@ @@ A group of one or more instances of a model and resources made @@ available for those instances. @@
Used in:
@@ .. cpp:var:: string name @@ @@ Optional name of this group of instances. If not specified the @@ name will be formed as <model name>_<group number>. The name of @@ individual instances will be further formed by a unique instance @@ number and GPU index: @@
@@ .. cpp:var:: Kind kind @@ @@ The kind of this instance group. Default is KIND_AUTO. If @@ KIND_AUTO or KIND_GPU then both 'count' and 'gpu' are valid and @@ may be specified. If KIND_CPU or KIND_MODEL only 'count' is valid @@ and 'gpu' cannot be specified. @@
@@ .. cpp:var:: int32 count @@ @@ For a group assigned to GPU, the number of instances created for @@ each GPU listed in 'gpus'. For a group assigned to CPU the number @@ of instances created. Default is 1. @@
@@ .. cpp:var:: int32 gpus (repeated) @@ @@ GPU(s) where instances should be available. For each GPU listed, @@ 'count' instances of the model will be available. Setting 'gpus' @@ to empty (or not specifying at all) is equivalent to listing all @@ available GPUs. @@
@@ .. cpp:var:: string profile (repeated) @@ @@ For TensorRT models, using inputs with dynamic shape, this @@ parameter specifies a set of optimization profiles available to this @@ instance group. The inference server will choose the optimal profile @@ based on the shapes of the input tensors. This field should lie @@ between 0 and <TotalNumberOfOptimizationProfilesInPlanModel> - 1 @@ and be specified only for TensorRT backend, otherwise an error will @@ be generated. @@
@@ @@ .. cpp:enum:: Kind @@ @@ Kind of this instance group. @@
Used in:
@@ .. cpp:enumerator:: Kind::KIND_AUTO = 0 @@ @@ This instance group represents instances that can run on either @@ CPU or GPU. If all GPUs listed in 'gpus' are available then @@ instances will be created on GPU(s), otherwise instances will @@ be created on CPU. @@
@@ .. cpp:enumerator:: Kind::KIND_GPU = 1 @@ @@ This instance group represents instances that must run on the @@ GPU. @@
@@ .. cpp:enumerator:: Kind::KIND_CPU = 2 @@ @@ This instance group represents instances that must run on the @@ CPU. @@
@@ .. cpp:enumerator:: Kind::KIND_MODEL = 3 @@ @@ This instance group represents instances that should run on the @@ CPU and/or GPU(s) as specified by the model or backend itself. @@ The inference server will not override the model/backend @@ settings. @@ Currently, this option is supported only for TensorFlow models. @@
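The interaction between 'count' and 'gpus' can be summarized with a small calculation. The Python sketch below covers only KIND_GPU and KIND_CPU; the function name `instance_count` and the `available_gpus` argument are hypothetical, and KIND_AUTO and KIND_MODEL are deliberately left out because their placement depends on runtime conditions.

```python
def instance_count(kind, count=1, gpus=(), available_gpus=()):
    """Estimate how many model instances an instance group creates."""
    if kind == "KIND_CPU":
        if gpus:
            raise ValueError("'gpus' cannot be specified for KIND_CPU")
        return count
    if kind == "KIND_GPU":
        # An empty 'gpus' list is equivalent to listing all available GPUs.
        targets = list(gpus) if gpus else list(available_gpus)
        return count * len(targets)
    raise ValueError("this sketch only handles KIND_GPU and KIND_CPU")


print(instance_count("KIND_GPU", count=2, gpus=[0, 1]))
print(instance_count("KIND_GPU", count=1, available_gpus=[0, 1, 2]))
print(instance_count("KIND_CPU", count=3))
```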
@@ @@ .. cpp:var:: message TensorMetadata @@ @@ Metadata for a tensor. @@
Used in:
@@ @@ .. cpp:var:: string name @@ @@ The tensor name. @@
@@ @@ .. cpp:var:: string datatype @@ @@ The tensor data type. @@
@@ @@ .. cpp:var:: int64 shape (repeated) @@ @@ The tensor shape. A variable-size dimension is represented @@ by a -1 value. @@
@@ @@.. cpp:var:: message ModelOptimizationPolicy @@ @@ Optimization settings for a model. These settings control if/how a @@ model is optimized and prioritized by the backend framework when @@ it is loaded. @@
Used in:
@@ .. cpp:var:: Graph graph @@ @@ The graph optimization setting for the model. Optional. @@
@@ .. cpp:var:: ModelPriority priority @@ @@ The priority setting for the model. Optional. @@
@@ .. cpp:var:: Cuda cuda @@ @@ CUDA-specific optimization settings. Optional. @@
@@ .. cpp:var:: ExecutionAccelerators execution_accelerators @@ @@ The accelerators used for the model. Optional. @@
@@ .. cpp:var:: PinnedMemoryBuffer input_pinned_memory @@ @@ Use pinned memory buffer when the data transfer for inputs @@ is between GPU memory and non-pinned system memory. @@ Default is true. @@
@@ .. cpp:var:: PinnedMemoryBuffer output_pinned_memory @@ @@ Use pinned memory buffer when the data transfer for outputs @@ is between GPU memory and non-pinned system memory. @@ Default is true. @@
@@ @@ .. cpp:var:: message Cuda @@ @@ CUDA-specific optimization settings. @@
Used in:
@@ .. cpp:var:: bool graphs @@ @@ Use CUDA graphs API to capture model operations and execute @@ them more efficiently. Currently only recognized by TensorRT @@ backend. @@
@@ @@ .. cpp:var:: message ExecutionAccelerators @@ @@ Specify the preferred execution accelerators to be used to execute @@ the model. Currently only recognized by ONNX Runtime backend and @@ TensorFlow backend. @@ @@ For ONNX Runtime backend, it will deploy the model with the execution @@ accelerators by priority, the priority is determined based on the @@ order that they are set, i.e. the provider at the front has highest @@ priority. Overall, the priority will be in the following order: @@ <gpu_execution_accelerator> (if instance is on GPU) @@ CUDA Execution Provider (if instance is on GPU) @@ <cpu_execution_accelerator> @@ Default CPU Execution Provider @@
Used in:
@@ .. cpp:var:: Accelerator gpu_execution_accelerator (repeated) @@ @@ The preferred execution provider to be used if the model instance @@ is deployed on GPU. @@ @@ For ONNX Runtime backend, possible value is "tensorrt" as name, @@ and no parameters are required. @@ @@ For TensorFlow backend, possible values are "tensorrt", "gpu_io". @@ @@ For "tensorrt", the following parameters can be specified: @@ "precision_mode": The precision used for optimization. @@ Allowed values are "FP32" and "FP16". Default value is "FP32". @@ @@ "max_cached_engines": The maximum number of cached TensorRT @@ engines in dynamic TensorRT ops. Default value is 100. @@ @@ "minimum_segment_size": The smallest model subgraph that will @@ be considered for optimization by TensorRT. Default value is 3. @@ @@ "max_workspace_size_bytes": The maximum GPU memory the model @@ can use temporarily during execution. Default value is 1GB. @@ @@ For "gpu_io", no parameters are required. If set, the model will @@ be executed using TensorFlow Callable API to set input and output @@ tensors in GPU memory if possible, which can reduce data transfer @@ overhead if the model is used in ensemble. However, the Callable @@ object will be created on model creation and it will request all @@ outputs for every model execution, which may impact the @@ performance if a request does not require all outputs. This @@ optimization will only take effect if the model instance is @@ created with KIND_GPU. @@
@@ .. cpp:var:: Accelerator cpu_execution_accelerator (repeated) @@ @@ The preferred execution provider to be used if the model instance @@ is deployed on CPU. @@ @@ For ONNX Runtime backend, possible value is "openvino" as name, @@ and no parameters are required. @@
@@ @@ .. cpp:var:: message Accelerator @@ @@ Specify the accelerator to be used to execute the model. @@ Accelerator with the same name may accept different parameters @@ depending on the backends. @@
Used in:
@@ .. cpp:var:: string name @@ @@ The name of the execution accelerator. @@
@@ .. cpp:var:: map<string, string> parameters @@ @@ Additional parameters used to configure the accelerator. @@
@@ @@ .. cpp:var:: message Graph @@ @@ Enable generic graph optimization of the model. If not specified @@ the framework's default level of optimization is used. Supports @@ TensorFlow graphdef and savedmodel and ONNX models. For TensorFlow, @@ causes XLA to be enabled/disabled for the model. For ONNX, defaults @@ to enabling all optimizations, -1 enables only basic optimizations, @@ +1 enables only basic and extended optimizations. @@
Used in:
@@ .. cpp:var:: int32 level @@ @@ The optimization level. Defaults to 0 (zero) if not specified. @@ @@ - -1: Disabled @@ - 0: Framework default @@ - 1+: Enable optimization level (greater values indicate @@ higher optimization levels) @@
@@ @@ .. cpp:enum:: ModelPriority @@ @@ Model priorities. A model will be given scheduling and execution @@ preference over models at lower priorities. Current model @@ priorities only work for TensorRT models. @@
Used in:
@@ .. cpp:enumerator:: ModelPriority::PRIORITY_DEFAULT = 0 @@ @@ The default model priority. @@
@@ .. cpp:enumerator:: ModelPriority::PRIORITY_MAX = 1 @@ @@ The maximum model priority. @@
@@ .. cpp:enumerator:: ModelPriority::PRIORITY_MIN = 2 @@ @@ The minimum model priority. @@
@@ @@ .. cpp:var:: message PinnedMemoryBuffer @@ @@ Specify whether to use a pinned memory buffer when transferring data @@ between non-pinned system memory and GPU memory. Using a pinned @@ memory buffer for system from/to GPU transfers will typically provide @@ increased performance. For example, in the common use case where the @@ request provides inputs and delivers outputs via non-pinned system @@ memory, if the model instance accepts GPU IOs, the inputs will be @@ processed by two copies: from non-pinned system memory to pinned @@ memory, and from pinned memory to GPU memory. Similarly, pinned @@ memory will be used for delivering the outputs. @@
Used in:
@@ .. cpp:var:: bool enable @@ @@ Use pinned memory buffer. Default is true. @@
@@ @@.. cpp:var:: message ModelOutput @@ @@ An output produced by the model. @@
Used in:
@@ .. cpp:var:: string name @@ @@ The name of the output. @@
@@ .. cpp:var:: DataType data_type @@ @@ The data-type of the output. @@
@@ .. cpp:var:: int64 dims (repeated) @@ @@ The dimensions/shape of the output tensor. @@
@@ .. cpp:var:: ModelTensorReshape reshape @@ @@ The shape produced for this output by the backend. The output will @@ be reshaped from this to the shape specified in 'dims' before being @@ returned in the inference response. The reshape must have the same @@ number of elements as the output shape specified by 'dims'. Optional. @@
@@ .. cpp:var:: string label_filename @@ @@ The label file associated with this output. Should be specified only @@ for outputs that represent classifications. Optional. @@
@@ .. cpp:var:: bool is_shape_tensor @@ @@ Whether or not the output is a shape tensor to the model. This field @@ is currently supported only for the TensorRT model. An error will be @@ generated if this specification does not comply with underlying @@ model. @@
@@ @@.. cpp:var:: message ModelParameter @@ @@ A model parameter. @@
Used in:
@@ .. cpp:var:: string string_value @@ @@ The string value of the parameter. @@
@@ @@.. cpp:var:: message ModelQueuePolicy @@ @@ Queue policy for inference requests. @@
Used in:
@@ @@ .. cpp:var:: TimeoutAction timeout_action @@ @@ The action applied to timed-out requests. @@ The default action is REJECT. @@
@@ @@ .. cpp:var:: uint64 default_timeout_microseconds @@ @@ The default timeout for every request, in microseconds. @@ The default value is 0 which indicates that no timeout is set. @@
@@ @@ .. cpp:var:: bool allow_timeout_override @@ @@ Whether an individual request can override the default timeout value. @@ When true, individual requests can set a timeout that is less than @@ the default timeout value but may not increase the timeout. @@ The default value is false. @@
@@ @@ .. cpp:var:: uint32 max_queue_size @@ @@ The maximum queue size for holding requests. A request will be @@ rejected immediately if it can't be enqueued because the queue is @@ full. The default value is 0 which indicates that no maximum @@ queue size is enforced. @@
@@ @@ .. cpp:enum:: TimeoutAction @@ @@ The action applied to timed-out requests. @@
Used in:
@@ .. cpp:enumerator:: TimeoutAction::REJECT = 0 @@ @@ Reject the request and return an error message accordingly. @@
@@ .. cpp:enumerator:: TimeoutAction::DELAY = 1 @@ @@ Delay the request until all other requests at the same @@ (or higher) priority levels that have not reached their timeouts @@ are processed. A delayed request will eventually be processed, @@ but may be delayed indefinitely due to newly arriving requests. @@
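The relationship between 'default_timeout_microseconds' and 'allow_timeout_override' can be sketched as a small function. This is an illustration only: the function name `effective_timeout_us` is hypothetical, and two behaviors are assumptions not stated above, namely that a request timeout of 0 means "not set" and that any request timeout is accepted when the default is 0 (no timeout).

```python
def effective_timeout_us(default_us, allow_override, request_us=0):
    """Return the timeout (microseconds) applied to a request; 0 means none."""
    # Without override permission the default always applies.
    # Assumption: request_us == 0 means the request did not set a timeout.
    if not allow_override or request_us == 0:
        return default_us
    if default_us == 0:
        # Assumption: with no default timeout, the request's value is used.
        return request_us
    # A request may shorten the timeout but may not increase it.
    return min(default_us, request_us)


print(effective_timeout_us(5000, allow_override=True, request_us=2000))
print(effective_timeout_us(5000, allow_override=True, request_us=9000))
print(effective_timeout_us(5000, allow_override=False, request_us=2000))
```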
@@ @@.. cpp:enum:: ModelReadyState @@ @@ Readiness status for models. @@
Used in:
@@ .. cpp:enumerator:: ModelReadyState::MODEL_UNKNOWN = 0 @@ @@ The model is in an unknown state. The model is not available for @@ inferencing. @@
@@ .. cpp:enumerator:: ModelReadyState::MODEL_READY = 1 @@ @@ The model is ready and available for inferencing. @@
@@ .. cpp:enumerator:: ModelReadyState::MODEL_UNAVAILABLE = 2 @@ @@ The model is unavailable, indicating that the model failed to @@ load or has been implicitly or explicitly unloaded. The model is @@ not available for inferencing. @@
@@ .. cpp:enumerator:: ModelReadyState::MODEL_LOADING = 3 @@ @@ The model is being loaded by the inference server. The model is @@ not available for inferencing. @@
@@ .. cpp:enumerator:: ModelReadyState::MODEL_UNLOADING = 4 @@ @@ The model is being unloaded by the inference server. The model is @@ not available for inferencing. @@
@@ @@.. cpp:var:: message ModelReadyStateReason @@ @@ Detail associated with a model's readiness status. @@
Used in:
@@ .. cpp:var:: string message @@ @@ The message that explains the cause of being in the current readiness @@ state. @@
@@ @@.. cpp:var:: message ModelRepositoryIndex @@ @@ Index of the model repository monitored by the inference server. @@
Used in:
@@ @@ .. cpp:var:: ModelEntry models (repeated) @@ @@ The list of models in the model repository. @@
@@ @@ .. cpp:var:: message ModelEntry @@ @@ The basic information for a model. @@
Used in:
@@ .. cpp:var:: string name @@ @@ The model's name. @@
@@ @@.. cpp:var:: message ModelSequenceBatching @@ @@ Sequence batching configuration. These settings control how sequence @@ batching operates for the model. @@
Used in:
@@ .. cpp:var:: oneof strategy_choice @@ @@ The strategy used by the sequence batcher. Default strategy @@ is 'direct'. @@
@@ .. cpp:var:: StrategyDirect direct @@ @@ StrategyDirect scheduling strategy. @@
@@ .. cpp:var:: StrategyOldest oldest @@ @@ StrategyOldest scheduling strategy. @@
@@ .. cpp:var:: uint64 max_sequence_idle_microseconds @@ @@ The maximum time, in microseconds, that a sequence is allowed to @@ be idle before it is aborted. The inference server considers a @@ sequence idle when it does not have any inference request queued @@ for the sequence. If this limit is exceeded, the inference server @@ will free the sequence slot allocated by the sequence and make it @@ available for another sequence. If not specified (or specified as @@ zero) a default value of 1000000 (1 second) is used. @@
@@ .. cpp:var:: ControlInput control_input (repeated) @@ @@ The model input(s) that the server should use to communicate @@ sequence start, stop, ready and similar control values to the @@ model. @@
@@ .. cpp:var:: message Control @@ @@ A control is a signal that the sequence batcher uses to @@ communicate with a backend. @@
Used in:
@@ .. cpp:var:: Kind kind @@ @@ The kind of this control. @@
@@ .. cpp:var:: int32 int32_false_true (repeated) @@ @@ The control's true and false setting is indicated by setting @@ a value in an int32 tensor. The tensor must be a @@ 1-dimensional tensor with size equal to the batch size of @@ the request. 'int32_false_true' must have two entries: the @@ first the false value and the second the true value. @@
@@ .. cpp:var:: float fp32_false_true (repeated) @@ @@ The control's true and false setting is indicated by setting @@ a value in a fp32 tensor. The tensor must be a @@ 1-dimensional tensor with size equal to the batch size of @@ the request. 'fp32_false_true' must have two entries: the @@ first the false value and the second the true value. @@
@@ .. cpp:var:: DataType data_type @@ @@ The control's datatype. @@
@@ @@ .. cpp:enum:: Kind @@ @@ The kind of the control. @@
Used in:
@@ .. cpp:enumerator:: Kind::CONTROL_SEQUENCE_START = 0 @@ @@ A new sequence is/is-not starting. If true a sequence is @@ starting, if false a sequence is continuing. Must @@ specify either int32_false_true or fp32_false_true for @@ this control. This control is optional. @@
@@ .. cpp:enumerator:: Kind::CONTROL_SEQUENCE_READY = 1 @@ @@ A sequence is/is-not ready for inference. If true the @@ input tensor data is valid and should be used. If false @@ the input tensor data is invalid and inferencing should @@ be "skipped". Must specify either int32_false_true or @@ fp32_false_true for this control. This control is optional. @@
@@ .. cpp:enumerator:: Kind::CONTROL_SEQUENCE_END = 2 @@ @@ A sequence is/is-not ending. If true a sequence is @@ ending, if false a sequence is continuing. Must @@ specify either int32_false_true or fp32_false_true for @@ this control. This control is optional. @@
@@ .. cpp:enumerator:: Kind::CONTROL_SEQUENCE_CORRID = 3 @@ @@ The correlation ID of the sequence. The correlation ID @@ is a uint64_t value that is communicated in whole or @@ in part by the tensor. The tensor's datatype must be @@ specified by data_type and must be TYPE_UINT64, TYPE_INT64, @@ TYPE_UINT32 or TYPE_INT32. If a 32-bit datatype is specified @@ the correlation ID will be truncated to the low-order 32 @@ bits. This control is optional. @@
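The truncation rule for CONTROL_SEQUENCE_CORRID amounts to keeping the low-order 32 bits when a 32-bit datatype is chosen. A minimal Python sketch follows; the function name `corrid_tensor_value` is hypothetical, and the result is returned as an unsigned bit pattern, so how a signed tensor type would interpret those bits is not modeled.

```python
def corrid_tensor_value(correlation_id, data_type):
    """Return the bits of a uint64 correlation ID carried by the control tensor."""
    if not 0 <= correlation_id < 2**64:
        raise ValueError("correlation ID must fit in uint64")
    if data_type in ("TYPE_UINT64", "TYPE_INT64"):
        return correlation_id                 # communicated in whole
    if data_type in ("TYPE_UINT32", "TYPE_INT32"):
        return correlation_id & 0xFFFFFFFF    # low-order 32 bits only
    raise ValueError("unsupported data_type for CONTROL_SEQUENCE_CORRID")


print(corrid_tensor_value(0x100000005, "TYPE_UINT64"))
print(corrid_tensor_value(0x100000005, "TYPE_UINT32"))
```

Two correlation IDs that differ only in their upper 32 bits become indistinguishable to the model once truncated.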
@@ .. cpp:var:: message ControlInput @@ @@ The sequence control values to communicate by a model input. @@
Used in:
@@ .. cpp:var:: string name @@ @@ The name of the model input. @@
@@ .. cpp:var:: Control control (repeated) @@ @@ The control value(s) that should be communicated to the @@ model using this model input. @@
@@ .. cpp:var:: message StrategyDirect @@ @@ The sequence batcher uses a specific, unique batch @@ slot for each sequence. All inference requests in a @@ sequence are directed to the same batch slot in the same @@ model instance over the lifetime of the sequence. This @@ is the default strategy. @@
Used in:
(message has no fields)
@@ .. cpp:var:: message StrategyOldest @@ @@ The sequence batcher maintains up to 'max_candidate_sequences' @@ candidate sequences. 'max_candidate_sequences' can be greater @@ than the model's 'max_batch_size'. For inferencing the batcher @@ chooses from the candidate sequences up to 'max_batch_size' @@ inference requests. Requests are chosen in an oldest-first @@ manner across all candidate sequences. A given sequence is @@ not guaranteed to be assigned to the same batch slot for @@ all inference requests of that sequence. @@
Used in:
@@ .. cpp:var:: int32 max_candidate_sequences @@ @@ Maximum number of candidate sequences that the batcher @@ maintains. Excess sequences are kept in an ordered backlog @@ and become candidates when existing candidate sequences @@ complete. @@
@@ .. cpp:var:: int32 preferred_batch_size (repeated) @@ @@ Preferred batch sizes for dynamic batching of candidate @@ sequences. If a batch of one of these sizes can be formed @@ it will be executed immediately. If not specified a @@ preferred batch size will be chosen automatically @@ based on model and GPU characteristics. @@
@@ .. cpp:var:: uint64 max_queue_delay_microseconds @@ @@ The maximum time, in microseconds, a candidate request @@ will be delayed in the dynamic batch scheduling queue to @@ wait for additional requests for batching. Default is 0. @@
@@ @@.. cpp:var:: message ModelStatus @@ @@ Status for a model. @@
Used in:
@@ .. cpp:var:: ModelConfig config @@ @@ The configuration for the model. @@
@@ .. cpp:var:: map<int64, ModelVersionStatus> version_status @@ @@ Status for each version of the model, as a map @@ from version to the status. A version will not occur in the map @@ unless there has been at least one inference request of @@ that model version. A version of -1 indicates the status is @@ for requests for which the version could not be determined. @@
@@ @@.. cpp:var:: message ModelTensorReshape @@ @@ Reshape specification for input and output tensors. @@
Used in:
@@ .. cpp:var:: int64 shape (repeated) @@ @@ The shape to use for reshaping. @@
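Both ModelInput and ModelOutput require that a reshape has the same number of elements as the shape given by 'dims'. The Python sketch below checks that condition for fixed-size shapes; the function name `reshape_is_compatible` is hypothetical, and variable-size dimensions (-1) are rejected here as a simplifying assumption.

```python
from math import prod


def reshape_is_compatible(dims, reshape_shape):
    """True if 'dims' and the reshape shape hold the same number of elements."""
    if any(d < 0 for d in list(dims) + list(reshape_shape)):
        raise ValueError("this sketch only handles fixed-size dimensions")
    # prod([]) is 1, so an empty reshape shape matches a single-element 'dims'.
    return prod(dims) == prod(reshape_shape)


print(reshape_is_compatible([1], []))
print(reshape_is_compatible([2, 3], [6]))
print(reshape_is_compatible([2, 3], [4]))
```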
@@ @@.. cpp:var:: message ModelVersionPolicy @@ @@ Policy indicating which versions of a model should be made @@ available by the inference server. @@
Used in:
@@ .. cpp:var:: oneof policy_choice @@ @@ Each model must implement only a single version policy. The @@ default policy is 'Latest'. @@
@@ .. cpp:var:: Latest latest @@ @@ Serve only latest version(s) of the model. @@
@@ .. cpp:var:: All all @@ @@ Serve all versions of the model. @@
@@ .. cpp:var:: Specific specific @@ @@ Serve only specific version(s) of the model. @@
@@ .. cpp:var:: message All @@ @@ Serve all versions of the model. @@
Used in:
(message has no fields)
@@ .. cpp:var:: message Latest @@ @@ Serve only the latest version(s) of a model. This is @@ the default policy. @@
Used in:
@@ .. cpp:var:: uint32 num_versions @@ @@ Serve only the 'num_versions' highest-numbered versions. @@ The default value of 'num_versions' is 1, indicating that by @@ default only the single highest-numbered version of a @@ model will be served. @@
@@ .. cpp:var:: message Specific @@ @@ Serve only specific versions of the model. @@
Used in:
@@ .. cpp:var:: int64 versions (repeated) @@ @@ The specific versions of the model that will be served. @@
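The three version policies can be compared with a small selection function. This Python sketch is illustrative: the function name `versions_to_serve` and the lowercase policy strings are hypothetical, and ignoring requested versions that are not available is an assumption.

```python
def versions_to_serve(available, policy="latest", num_versions=1, specific=()):
    """Return the sorted list of model versions a policy would make available."""
    versions = sorted(set(available))
    if policy == "all":
        return versions
    if policy == "latest":
        # Keep the 'num_versions' highest-numbered versions.
        return sorted(sorted(versions, reverse=True)[:num_versions])
    if policy == "specific":
        wanted = set(specific)
        # Assumption: requested versions that are not available are skipped.
        return [v for v in versions if v in wanted]
    raise ValueError("unknown policy")


available = [1, 2, 3, 5]
print(versions_to_serve(available))                             # default policy
print(versions_to_serve(available, "latest", num_versions=2))
print(versions_to_serve(available, "all"))
print(versions_to_serve(available, "specific", specific=[2, 5, 9]))
```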
@@ @@.. cpp:var:: message ModelVersionStatus @@ @@ Status for a version of a model. @@
Used in:
@@ .. cpp:var:: ModelReadyState ready_state @@ @@ Current readiness state for the model. @@
@@ .. cpp:var:: ModelReadyStateReason ready_state_reason @@ @@ Supplemental information regarding the current readiness state. @@
@@ .. cpp:var:: map<uint32, InferRequestStats> infer_stats @@ @@ Inference statistics for the model, as a map from batch size @@ to the statistics. A batch size will not occur in the map @@ unless there has been at least one inference request of @@ that batch size. However, for V2 API all InferRequestStats are @@ recorded at a single key which is 1. @@
@@ .. cpp:var:: uint64 model_execution_count @@ @@ Cumulative number of model executions performed for the @@ model. A single model execution performs inferencing for @@ the entire request batch and can perform inferencing for multiple @@ requests if dynamic batching is enabled. @@
@@ .. cpp:var:: uint64 model_inference_count @@ @@ Cumulative number of model inferences performed for the @@ model. Each inference in a batched request is counted as @@ an individual inference. @@
@@ .. cpp:var:: uint64 last_inference_timestamp_milliseconds @@ @@ The timestamp of the last inference request made for this model, @@ given as milliseconds since the epoch. @@
@@ @@.. cpp:var:: message ModelWarmup @@ @@ Settings used to construct the request sample for model warmup. @@
Used in:
@@ .. cpp:var:: string name @@ @@ The name of the request sample. @@
@@ .. cpp:var:: uint32 batch_size @@ @@ The batch size of the inference request. This must be >= 1. For @@ models that don't support batching, batch_size must be 1. If @@ batch_size > 1, the 'inputs' specified below will be duplicated to @@ match the batch size requested. @@
@@ .. cpp:var:: map<string, Input> inputs @@ @@ The warmup meta data associated with every model input, including @@ control tensors. @@
@@ @@ .. cpp:var:: message Input @@ @@ Meta data associated with an input. @@
Used in:
@@ .. cpp:var:: DataType data_type @@ @@ The data-type of the input. @@
@@ .. cpp:var:: int64 dims (repeated) @@ @@ The shape of the input tensor, not including the batch dimension. @@
@@ .. cpp:var:: oneof input_data_type @@ @@ Specify how the input data is generated. If the input has STRING @@ data type and 'random_data' is set, the data generation will fall @@ back to 'zero_data'. @@
@@ @@ .. cpp:var:: bool zero_data @@ @@ The identifier for using zeros as input data. Note that the @@ value of 'zero_data' will not be checked, instead, zero data @@ will be used as long as the field is set. @@
@@ @@ .. cpp:var:: bool random_data @@ @@ The identifier for using random data as input data. Note that @@ the value of 'random_data' will not be checked, instead, @@ random data will be used as long as the field is set. @@
@@ .. cpp:var:: string input_data_file @@ @@ The file whose content will be used as raw input data in @@ row-major order. The file must be provided in a sub-directory @@ 'warmup' under the model directory. @@
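As a rough illustration of what "raw input data in row-major order" could look like for `input_data_file`, the sketch below flattens a nested list row by row and packs it as 32-bit floats. The helper names, the FP32 data type, and the little-endian byte order are assumptions made for this example; the description above does not specify them.

```python
import struct
from math import prod


def flatten_row_major(nested):
    """Flatten nested lists so that the last dimension varies fastest."""
    if isinstance(nested, (list, tuple)):
        flat = []
        for item in nested:
            flat.extend(flatten_row_major(item))
        return flat
    return [nested]


def pack_fp32(nested, dims):
    """Pack one sample (no batch dimension) as little-endian FP32 bytes.

    Little-endian FP32 is an assumption of this sketch.
    """
    values = flatten_row_major(nested)
    expected = prod(dims)
    if len(values) != expected:
        raise ValueError(f"expected {expected} elements for dims {dims}, got {len(values)}")
    return struct.pack(f"<{expected}f", *values)


# A single 2 x 3 sample; 'dims' excludes the batch dimension.
payload = pack_fp32([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]], dims=[2, 3])
print(len(payload))  # 24
```

The resulting bytes would then be written to a file in the `warmup` sub-directory of the model directory, with `input_data_file` set to that file's name.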
@@ @@ .. cpp:var:: message ModelIndex @@ @@ Index entry for a model. @@
Used in:
@@ @@ .. cpp:var:: string name @@ @@ The name of the model. @@
@@ @@.. cpp:var:: message RepositoryRequestStats @@ @@ Statistics collected for Repository requests. @@
Used in:
@@ .. cpp:var:: StatDuration success @@ @@ Total time required to handle successful Repository requests, not @@ including HTTP or gRPC endpoint termination time. @@
@@ @@.. cpp:var:: message RequestStatus @@ @@ Status returned for all inference server requests. The @@ RequestStatus provides a :cpp:enum:`RequestStatusCode`, an @@ optional status message, and server and request IDs. @@
@@ .. cpp:var:: RequestStatusCode code @@ @@ The status code. @@
@@ .. cpp:var:: string msg @@ @@ The optional status message. @@
@@ .. cpp:var:: string server_id @@ @@ The identifying string for the server that is returning @@ this status. @@
@@ .. cpp:var:: string request_id @@ @@ Unique identifier for the request assigned by the inference @@ server. Value 0 (zero) indicates the request ID is not known. @@
@@ @@.. cpp:enum:: RequestStatusCode @@ @@ Status codes returned for inference server requests. The @@ :cpp:enumerator:`RequestStatusCode::SUCCESS` status code indicates @@ no error; all other codes indicate an error. @@
Used in:
@@ .. cpp:enumerator:: RequestStatusCode::INVALID = 0 @@ @@ Invalid status. Used internally but should not be returned as @@ part of a :cpp:var:`RequestStatus`. @@
@@ .. cpp:enumerator:: RequestStatusCode::SUCCESS = 1 @@ @@ Status code indicating success. @@
@@ .. cpp:enumerator:: RequestStatusCode::UNKNOWN = 2 @@ @@ Error code indicating an unknown failure. @@
@@ .. cpp:enumerator:: RequestStatusCode::INTERNAL = 3 @@ @@ Error code indicating an internal failure. @@
@@ .. cpp:enumerator:: RequestStatusCode::NOT_FOUND = 4 @@ @@ Error code indicating a resource or request was not found. @@
@@ .. cpp:enumerator:: RequestStatusCode::INVALID_ARG = 5 @@ @@ Error code indicating a failure caused by an unknown argument or @@ value. @@
@@ .. cpp:enumerator:: RequestStatusCode::UNAVAILABLE = 6 @@ @@ Error code indicating an unavailable resource. @@
@@ .. cpp:enumerator:: RequestStatusCode::UNSUPPORTED = 7 @@ @@ Error code indicating an unsupported request or operation. @@
@@ .. cpp:enumerator:: RequestStatusCode::ALREADY_EXISTS = 8 @@ @@ Error code indicating an already existing resource. @@
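The enumerators above can be mirrored on the client side to make status checks explicit. The following is a minimal sketch, not generated protobuf code: the numeric values are copied from the listing above, while the `is_error` helper is a hypothetical name introduced here.

```python
from enum import IntEnum


class RequestStatusCode(IntEnum):
    """Client-side mirror of the RequestStatusCode enumerators."""
    INVALID = 0
    SUCCESS = 1
    UNKNOWN = 2
    INTERNAL = 3
    NOT_FOUND = 4
    INVALID_ARG = 5
    UNAVAILABLE = 6
    UNSUPPORTED = 7
    ALREADY_EXISTS = 8


def is_error(code: int) -> bool:
    """Return True for every code other than SUCCESS, including INVALID."""
    return RequestStatusCode(code) is not RequestStatusCode.SUCCESS


print(is_error(RequestStatusCode.SUCCESS))    # False
print(is_error(RequestStatusCode.NOT_FOUND))  # True
```

Passing a value outside the listed range raises `ValueError`, which can be useful for detecting a mismatch between client and server versions.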
@@ @@.. cpp:enum:: ServerReadyState @@ @@ Readiness status for the inference server. @@
Used in:
@@ .. cpp:enumerator:: ServerReadyState::SERVER_INVALID = 0 @@ @@ The server is in an invalid state and will likely not @@ respond correctly to any requests. @@
@@ .. cpp:enumerator:: ServerReadyState::SERVER_INITIALIZING = 1 @@ @@ The server is initializing. @@
@@ .. cpp:enumerator:: ServerReadyState::SERVER_READY = 2 @@ @@ The server is ready and accepting requests. @@
@@ .. cpp:enumerator:: ServerReadyState::SERVER_EXITING = 3 @@ @@ The server is exiting and will not respond to requests. @@
@@ .. cpp:enumerator:: ServerReadyState::SERVER_FAILED_TO_INITIALIZE = 10 @@ @@ The server did not initialize correctly. Most requests will fail. @@
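A client might map these readiness states to a simple decision about whether to send requests, wait, or give up. The sketch below assumes such a policy; the `next_action` helper and its return strings are hypothetical, and only the enumerator values come from the listing above.

```python
from enum import IntEnum


class ServerReadyState(IntEnum):
    """Client-side mirror of the ServerReadyState enumerators."""
    SERVER_INVALID = 0
    SERVER_INITIALIZING = 1
    SERVER_READY = 2
    SERVER_EXITING = 3
    SERVER_FAILED_TO_INITIALIZE = 10


def next_action(state: int) -> str:
    """Illustrative client policy: 'send', 'wait', or 'give_up'."""
    state = ServerReadyState(state)
    if state is ServerReadyState.SERVER_READY:
        return "send"
    if state is ServerReadyState.SERVER_INITIALIZING:
        # Initialization may still complete, so polling again is reasonable.
        return "wait"
    # INVALID, EXITING, and FAILED_TO_INITIALIZE are treated as terminal here.
    return "give_up"


print(next_action(ServerReadyState.SERVER_INITIALIZING))  # wait
```

Note that the enumerator values are not contiguous (`SERVER_FAILED_TO_INITIALIZE` is 10), so range checks such as `state <= 3` would miss it.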
@@ @@.. cpp:var:: message ServerStatus @@ @@ Status for the inference server. @@
Used in:
@@ .. cpp:var:: string id @@ @@ The server's ID. @@
@@ .. cpp:var:: string version @@ @@ The server's version. @@
@@ .. cpp:var:: ServerReadyState ready_state @@ @@ Current readiness state for the server. @@
@@ .. cpp:var:: uint64 uptime_ns @@ @@ Server uptime in nanoseconds. @@
@@ .. cpp:var:: map<string, ModelStatus> model_status @@ @@ Status for each model, as a map from model name to the @@ status. @@
@@ .. cpp:var:: StatusRequestStats status_stats @@ @@ Statistics for Status requests. @@
@@ .. cpp:var:: HealthRequestStats health_stats @@ @@ Statistics for Health requests. @@
@@ .. cpp:var:: ModelControlRequestStats model_control_stats @@ @@ Statistics for ModelControl requests. @@
@@ .. cpp:var:: SharedMemoryControlRequestStats shm_control_stats @@ @@ [DEPRECATED] Statistics for SharedMemoryControl requests. @@
@@ .. cpp:var:: RepositoryRequestStats repository_stats @@ @@ Statistics for Repository requests. @@
@@ .. cpp:var:: message Register @@ @@ Register a shared memory region. @@
Used in:
@@ @@ .. cpp:var:: string name @@ @@ The name for this shared memory region. @@
@@ .. cpp:var:: oneof shared_memory_types @@ @@ Types of shared memory identifiers @@
@@ @@ .. cpp:var:: SystemSharedMemoryIdentifier system_shared_memory @@ @@ The identifier for this system shared memory region. @@
@@ @@ .. cpp:var:: CUDASharedMemoryIdentifier cuda_shared_memory @@ @@ The identifier for this CUDA shared memory region. @@
@@ .. cpp:var:: uint64 byte_size @@ @@ Size of the shared memory block, in bytes. @@
@@ @@ .. cpp:var:: message CUDASharedMemoryIdentifier @@ @@ The identifier for this CUDA shared memory region. @@
Used in:
@@ .. cpp:var:: bytes raw_handle @@ @@ The raw serialized cudaIPC handle. @@
@@ .. cpp:var:: int64 device_id @@ @@ The GPU device ID on which the cudaIPC handle was created. @@
@@ @@ .. cpp:var:: message SystemSharedMemoryIdentifier @@ @@ The identifier for this system shared memory region. @@
Used in:
@@ .. cpp:var:: string shared_memory_key @@ @@ The name of the shared memory region that holds the input data @@ (or where the output data should be written). @@
@@ .. cpp:var:: uint64 offset @@ @@ This is the offset of the shared memory block from the start @@ of the shared memory region. @@ start = offset, end = offset + byte_size; @@
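The `start = offset, end = offset + byte_size` rule can be sketched as a bounds-checked slice. In this example a `bytearray` stands in for a mapped shared memory region, and the helper name `block_range` and all sizes are hypothetical.

```python
def block_range(offset: int, byte_size: int, region_size: int):
    """Return (start, end) of a block inside a region, end exclusive."""
    if offset < 0 or byte_size < 0:
        raise ValueError("offset and byte_size must be non-negative")
    start = offset
    end = offset + byte_size
    if end > region_size:
        raise ValueError("block extends past the end of the region")
    return start, end


# Stand-in for a 64-byte shared memory region (all zeros initially).
region = bytearray(64)

# An 8-byte block that begins 16 bytes into the region.
start, end = block_range(offset=16, byte_size=8, region_size=len(region))
memoryview(region)[start:end] = b"\x01" * 8

print(start, end)  # 16 24
```

Checking `end` against the region size before writing avoids touching memory that belongs to another block in the same region.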
@@ .. cpp:var:: message GetStatus @@ @@ Get the status of all active shared memory regions. @@
Used in:
(message has no fields)
@@ .. cpp:var:: message Unregister @@ @@ Unregister a specified shared memory region. @@
Used in:
@@ @@ .. cpp:var:: string name @@ @@ The name for this shared memory region to unregister. @@
@@ .. cpp:var:: message UnregisterAll @@ @@ Unregister all shared memory regions. @@
Used in:
(message has no fields)
@@ @@.. cpp:var:: message SharedMemoryControlRequestStats @@ @@ Statistics for SharedMemoryControl requests @@ @@ [DEPRECATED] The message has been deprecated and will @@ always report 0. @@
Used in:
@@ .. cpp:var:: StatDuration success @@ @@ Total time required to handle successful SharedMemoryControl @@ requests, not including HTTP or gRPC endpoint termination time. @@
@@ @@.. cpp:var:: message Status @@ @@ Status of all active shared memory regions. @@
Used in:
@@ @@ .. cpp:var:: SharedMemoryRegion shared_memory_region @@ @@ The list of active/registered shared memory regions. @@
@@.. cpp:var:: message SharedMemoryRegion @@ @@ The metadata for the shared memory region registered in the inference @@ server. @@
@@ @@ .. cpp:var:: string name @@ @@ The name for this shared memory region. @@
@@ .. cpp:var:: oneof shared_memory_types @@ @@ Types of shared memory identifiers @@
@@ @@ .. cpp:var:: SystemSharedMemory system_shared_memory @@ @@ The status of this system shared memory region. @@
@@ @@ .. cpp:var:: CudaSharedMemory cuda_shared_memory @@ @@ The status of this CUDA shared memory region. @@
@@ .. cpp:var:: uint64 byte_size @@ @@ Size of the shared memory block, in bytes. @@
Used in:
@@ .. cpp:var:: int64 device_id @@ @@ The GPU device ID on which the cudaIPC handle was created. @@
Used in:
@@ .. cpp:var:: string shared_memory_key @@ @@ The name of the shared memory region that holds the input data @@ (or where the output data should be written). @@
@@ .. cpp:var:: uint64 offset @@ @@ This is the offset of the shared memory block from the start @@ of the shared memory region. @@ start = offset, end = offset + byte_size; @@
@@ @@.. cpp:var:: message SharedMemoryStatus @@ @@ Shared memory status for the inference server. @@
@@ @@ .. cpp:var:: SharedMemoryRegion shared_memory_region (repeated) @@ @@ The list of active/registered shared memory regions. @@
@@ @@.. cpp:var:: message StatDuration @@ @@ Statistic collecting a duration metric. @@
@@ .. cpp:var:: uint64 count @@ @@ Cumulative number of times this metric occurred. @@
@@ .. cpp:var:: uint64 total_time_ns @@ @@ Total collected duration of this metric in nanoseconds. @@
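Because both fields are cumulative, a mean duration can be derived either over the whole lifetime or between two snapshots. The helpers below are a minimal sketch with hypothetical names and sample values, assuming snapshots are given as `(count, total_time_ns)` pairs.

```python
def mean_duration_ms(count: int, total_time_ns: int) -> float:
    """Lifetime mean duration in milliseconds, or 0.0 if nothing was recorded."""
    if count == 0:
        return 0.0
    return total_time_ns / count / 1e6


def interval_mean_ms(earlier, later) -> float:
    """Mean duration between two snapshots, each a (count, total_time_ns) pair."""
    d_count = later[0] - earlier[0]
    d_time_ns = later[1] - earlier[1]
    return mean_duration_ms(d_count, d_time_ns)


# Hypothetical snapshots: 4 events totalling 10 ms, then 6 events totalling 20 ms.
print(mean_duration_ms(4, 10_000_000))                        # 2.5
print(interval_mean_ms((4, 10_000_000), (6, 20_000_000)))     # 5.0
```

The interval form is usually more useful for monitoring, since the lifetime mean changes slowly once `count` grows large.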
@@ @@.. cpp:var:: message StatisticDuration @@ @@ Statistic recording a cumulative duration metric. @@
Used in:
@@ .. cpp:var:: uint64 count @@ @@ Cumulative number of times this metric occurred. @@
@@ .. cpp:var:: uint64 total_time_ns @@ @@ Total collected duration of this metric in nanoseconds. @@
@@ @@.. cpp:var:: message StatusRequestStats @@ @@ Statistics collected for Status requests. @@
Used in:
@@ .. cpp:var:: StatDuration success @@ @@ Total time required to handle successful Status requests, not @@ including HTTP or gRPC endpoint termination time. @@
@@ @@ .. cpp:var:: message RegionStatus @@ @@ Status for a shared memory region. @@
Used in:
@@ @@ .. cpp:var:: string name @@ @@ The name for the shared memory region. @@
@@ .. cpp:var:: string shared_memory_key @@ @@ The key of the underlying memory object that contains the @@ shared memory region. @@
@@ .. cpp:var:: uint64 offset @@ @@ Offset, in bytes, within the underlying memory object to @@ the start of the shared memory region. @@
@@ .. cpp:var:: uint64 byte_size @@ @@ Size of the shared memory region, in bytes. @@