ModelService provides methods to query and update the state of the server, e.g. which models/versions are being served.
Gets status of model. If the ModelSpec in the request does not specify version, information about all versions of the model will be returned. If the ModelSpec in the request does specify a version, the status of only that version will be returned.
GetModelStatusRequest contains a ModelSpec indicating the model for which to get status.
Model Specification. If version is not specified, information about all versions of the model will be returned. If a version is specified, the status of only that version will be returned.
Response for GetModelStatusRequest on a successful run.
Version number and status information for applicable model version(s).
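The version-selection rule described for GetModelStatus can be sketched in plain Python. This is only an illustration, not the TensorFlow Serving implementation: `select_statuses`, the `served` dictionary, and the state strings are hypothetical names chosen for the example, and raising `KeyError` for an unknown version is an assumption.

```python
from typing import Dict, Optional


def select_statuses(served: Dict[int, str], version: Optional[int] = None) -> Dict[int, str]:
    """Return the status of one version, or of every version if none is given."""
    if version is None:
        # No version in the request: report all versions of the model.
        return dict(served)
    if version not in served:
        # Assumption: an unknown version is reported as an error.
        raise KeyError(f"version {version} is not known")
    return {version: served[version]}


# Hypothetical server state: version number -> state string.
served = {1: "END", 2: "AVAILABLE"}
all_versions = select_statuses(served)
only_v2 = select_statuses(served, version=2)
```

Here `all_versions` holds both entries, while `only_v2` holds only the entry for version 2.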
Reloads the set of served models. The new config supersedes the old one, so if a model is omitted from the new config it will be unloaded and no longer served.
PredictionService provides access to machine-learned models loaded by model_servers.
Classify.
GetModelMetadata - provides access to metadata for loaded models.
Model Specification indicating which model we are querying for metadata. If version is not specified, will use the latest (numerical) version.
Metadata fields to get. Currently supported: "signature_def".
Model Specification indicating which model this metadata belongs to.
Map of metadata field name to metadata field. The options for metadata field name are listed in GetModelMetadataRequest. Currently supported: "signature_def".
MultiInference API for multi-headed models.
Predict -- provides access to loaded TensorFlow model.
Regress.
SessionService defines a service with which a client can interact to execute TensorFlow model inference. The SessionService::SessionRun method is similar to MasterService::RunStep of TensorFlow, except that all sessions are ready to run, and you request a specific model/session with ModelSpec.
Runs inference of a given model.
Batching parameters. Each individual parameter is optional. If omitted, the default value from the relevant batching config struct (SharedBatchScheduler::Options or BatchSchedulerRetrier::Options) is used.
SharedBatchScheduler options (see shared_batch_scheduler.h):
Used in:
The maximum size of each batch. IMPORTANT: As discussed in the SessionBundleConfig batching description, use 'max_batch_size * 2' client threads to achieve high throughput with batching.
If a task has been enqueued for this amount of time (in microseconds), and a thread is available, the scheduler will immediately form a batch from enqueued tasks and assign the batch to the thread for processing, even if the batch's size is below 'max_batch_size'.
The maximum length of the queue, in terms of the number of batches. (A batch that has been scheduled on a thread is considered to have been removed from the queue.)
The number of threads to use to process batches. Must be >= 1, and should be tuned carefully.
The name to use for the pool of batch threads.
The allowed batch sizes. (Ignored if left empty.) Requirements: the entries must be in increasing order, and the final entry must equal 'max_batch_size'.
Whether to pad variable-length inputs when a batch is formed.
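The two requirements on the allowed batch sizes can be checked with a few lines of Python. This is a minimal sketch, not the server's own validation: the function name `validate_allowed_batch_sizes` is hypothetical, and treating "increasing" as strictly increasing is an assumption.

```python
from typing import List


def validate_allowed_batch_sizes(allowed: List[int], max_batch_size: int) -> None:
    """Raise ValueError if `allowed` breaks the documented requirements."""
    if not allowed:
        # An empty list means the setting is ignored.
        return
    # Assumption: "increasing order" is read as strictly increasing.
    for earlier, later in zip(allowed, allowed[1:]):
        if later <= earlier:
            raise ValueError("entries must be in increasing order")
    if allowed[-1] != max_batch_size:
        raise ValueError("the final entry must equal max_batch_size")


validate_allowed_batch_sizes([8, 16, 32], max_batch_size=32)  # passes silently
```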
A single class.
Used in:
Label or name of the class.
Score for this class (e.g., the probability the item belongs to this class). As per the proto3 default-value semantics, if the score is missing, it should be treated as 0.
Used as request type in: PredictionService.Classify
Used as field type in:
Model Specification. If version is not specified, will use the latest (numerical) version.
Input data.
Used as response type in: PredictionService.Classify
Used as field type in:
Effective Model Specification used for classification.
Result of the classification.
Contains one result per input example, in the same order as the input in ClassificationRequest.
Used in:
List of classes for a single item (tensorflow.Example).
Used in:
Used in:
Specifies one or more fully independent input Examples. See examples at: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/example/example.proto
Used in:
Specifies one or more independent input Examples, with a common context Example. The common use case for context is to cleanly and optimally specify some features that are common across multiple examples. See the example below, with a search query as the context and multiple restaurants to perform some inference on.

    context: {
      feature: { key: "query" value: { bytes_list: { value: [ "pizza" ] } } }
    }
    examples: {
      feature: { key: "cuisine" value: { bytes_list: { value: [ "Pizzeria" ] } } }
    }
    examples: {
      feature: { key: "cuisine" value: { bytes_list: { value: [ "Taqueria" ] } } }
    }

Implementations of ExampleListWithContext merge the context Example into each of the Examples. Note that feature keys must not be duplicated between the Examples and context Example, or the behavior is undefined. See also: tensorflow/core/example/example.proto and https://developers.google.com/protocol-buffers/docs/proto3#maps
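The merge of a context Example into each Example can be illustrated with plain dictionaries. This is a sketch under stated assumptions, not the library's code: features are modeled as a `dict` from key to a list of byte strings, `merge_context` is a hypothetical helper, and raising on a duplicated key is a choice made here because the documented behavior is undefined.

```python
from typing import Dict, List

Features = Dict[str, List[bytes]]


def merge_context(context: Features, examples: List[Features]) -> List[Features]:
    """Return a copy of each example with the context features added."""
    merged = []
    for example in examples:
        overlap = context.keys() & example.keys()
        if overlap:
            # Documented as undefined behavior; this sketch chooses to fail loudly.
            raise ValueError(f"duplicated feature keys: {sorted(overlap)}")
        merged.append({**context, **example})
    return merged


context = {"query": [b"pizza"]}
examples = [{"cuisine": [b"Pizzeria"]}, {"cuisine": [b"Taqueria"]}]
merged = merge_context(context, examples)
```

Each entry of `merged` carries both the shared "query" feature and its own "cuisine" feature.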
Used in:
Config proto for FileSystemStoragePathSource.
The servables to monitor for new versions, and aspire.
A single servable name/base_path pair to monitor. DEPRECATED: Use 'servables' instead. TODO(b/30898016): Stop using these fields, and ultimately remove them here.
How long to wait between file-system polling to look for children of 'base_path', in seconds. For testing use only: a negative value disables the polling thread.
If true, then FileSystemStoragePathSource::Create() and ::UpdateConfig() fail if, for any configured servables, the file system doesn't currently contain at least one version under the base path. (Otherwise, it will emit a warning and keep pinging the file system to check for a version to appear later.)
A servable name and base path to look for versions of the servable.
Used in:
The servable name to supply in aspired-versions callback calls. Child paths of 'base_path' are considered to be versions of this servable.
The path to monitor, i.e. look for child paths of the form base_path/123.
The policy that determines the number of versions of the servable to be served at the same time.
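The idea that child paths of the form base_path/123 are treated as versions can be sketched as a directory scan. This is an illustration only: `find_versions` is a hypothetical helper, and the rule that only all-digit directory names count as versions is an assumption made for the example.

```python
import os
import tempfile
from typing import List


def find_versions(base_path: str) -> List[int]:
    """Return the numeric child directories of `base_path`, sorted ascending."""
    versions = []
    for child in os.listdir(base_path):
        full_path = os.path.join(base_path, child)
        # Assumption: a version is a directory whose name is all ASCII digits.
        if os.path.isdir(full_path) and child.isascii() and child.isdigit():
            versions.append(int(child))
    return sorted(versions)


# Build a throwaway base path with two version directories and one stray one.
with tempfile.TemporaryDirectory() as base_path:
    for name in ("1", "123", "notes"):
        os.mkdir(os.path.join(base_path, name))
    found = find_versions(base_path)
```

After the scan, `found` contains the versions 1 and 123; the "notes" directory is skipped.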
A policy that dictates which version(s) of a servable should be served.
Used in:
Serve all versions found on disk.
Used in:
(message has no fields)
Serve the latest versions (i.e. the ones with the highest version numbers), among those found on disk. This is the default policy, with the default number of versions as 1.
Used in:
Number of latest versions to serve. (The default is 1.)
Serve a specific version (or set of versions). This policy is useful for rolling back to a specific version, or for canarying a specific version while still serving a separate stable version.
Used in:
The version numbers to serve.
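The three version policies (all, latest, specific) can be compared side by side with a small Python sketch. The function names are hypothetical and this is not how the server is implemented; rejecting a latest count below 1 and silently ignoring requested versions that are not on disk are assumptions made for the example.

```python
from typing import Iterable, List


def serve_all(on_disk: Iterable[int]) -> List[int]:
    """Serve every version found on disk."""
    return sorted(on_disk)


def serve_latest(on_disk: Iterable[int], num_versions: int = 1) -> List[int]:
    """Serve the `num_versions` highest version numbers (default 1)."""
    if num_versions < 1:
        # Assumption: a count below 1 is rejected in this sketch.
        raise ValueError("num_versions must be >= 1")
    return sorted(on_disk)[-num_versions:]


def serve_specific(on_disk: Iterable[int], versions: Iterable[int]) -> List[int]:
    """Serve only the requested versions that exist on disk."""
    return sorted(set(on_disk) & set(versions))


on_disk = [3, 1, 7, 5]
```

With these four versions on disk, the latest policy with its default selects only version 7, while a specific policy naming versions 1 and 9 selects only version 1 because 9 is absent.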
Config proto for HashmapSourceAdapter.
The format used by the file containing a serialized hashmap.
Used in:
A simple kind of CSV text file of the form: key0,value0\n key1,value1\n ...
Inference result, matches the type of request or is an error.
Used in:
Inference request such as classification, regression, etc.
Used in:
Model Specification. If version is not specified, will use the latest (numerical) version. All ModelSpecs in a MultiInferenceRequest must access the same model name.
Signature's method_name. Should be one of the method names defined in third_party/tensorflow/python/saved_model/signature_constants.py, e.g. "tensorflow/serving/classify".
Used in:
Used in:
Identifies the type of the LogCollector we will use to collect these logs.
The prefix to use for the filenames of the logs.
Metadata logged along with the request logs.
Used in:
TODO(b/33279154): Add more metadata as mentioned in the bug.
Configuration for logging query/responses.
Used in:
Common configuration for loading a model being served.
Used in:
Name of the model.
Base path to the model, excluding the version directory. E.g., for a model at /foo/bar/my_model/123, where 123 is the version, the base path is /foo/bar/my_model. (This can be changed once a model is in serving, *if* the underlying data remains the same. Otherwise there are no guarantees about whether the old or new data will be used for model versions currently loaded.)
Type of model. TODO(b/31336131): DEPRECATED. Please use 'model_platform' instead.
Type of model (e.g. "tensorflow"). (This cannot be changed once a model is in serving.)
Version policy for the model indicating which version(s) of the model to load and make available for serving simultaneously. The default option is to serve only the latest version of the model. (This can be changed once a model is in serving.)
String labels to associate with versions of the model, allowing inference queries to refer to versions by label instead of number. Multiple labels can map to the same version, but not vice-versa. An envisioned use-case for these labels is canarying tentative versions. For example, one can assign labels "stable" and "canary" to two specific versions. Perhaps initially "stable" is assigned to version 0 and "canary" to version 1. Once version 1 passes canary, one can shift the "stable" label to refer to version 1 (at that point both labels map to the same version -- version 1 -- which is fine). Later once version 2 is ready to canary one can move the "canary" label to version 2. And so on.
Configures logging of requests and responses to the model. (This can be changed once a model is in serving.)
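The canarying flow described for version labels can be traced with a label-to-version dictionary. This is a minimal sketch: `move_label` is a hypothetical helper, and a plain `dict` is used because it lets several labels point at one version while each label points at exactly one version.

```python
from typing import Dict


def move_label(labels: Dict[str, int], label: str, version: int) -> Dict[str, int]:
    """Return a new label map with `label` pointing at `version`."""
    updated = dict(labels)
    updated[label] = version
    return updated


# Initially "stable" is version 0 and "canary" is version 1.
step0 = {"stable": 0, "canary": 1}
# Version 1 passes canary: both labels now refer to version 1, which is allowed.
step1 = move_label(step0, "stable", 1)
# Version 2 is ready to canary.
step2 = move_label(step1, "canary", 2)
```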
Static list of models to be loaded for serving.
Used in:
ModelServer config.
Used in:
ModelServer takes either a static file-based model config list or an Any proto representing custom model config that is fetched dynamically at runtime (through network RPC, custom service, etc.).
Metadata for an inference request such as the model name and version.
Used in:
Required servable name.
Optional choice of which version of the model to use. Recommended to be left unset in the common case. Should be specified only when there is a strong version consistency requirement. When left unspecified, the system will serve the best available version. This is typically the latest version, though during version transitions, notably when serving on a fleet of instances, it may be either the previous or new version.
Use this specific version number.
Use the version associated with the given label.
A named signature to evaluate. If unspecified, the default signature will be used.
The type of model. TODO(b/31336131): DEPRECATED.
Used in:
Version number, state, and status for a single version of a model.
Used in:
Model version.
Model state.
Model status.
States that map to ManagerState enum in tensorflow_serving/core/servable_state.h
Used in:
Default value.
The manager is tracking this servable, but has not initiated any action pertaining to it.
The manager has decided to load this servable. In particular, checks around resource availability and other aspects have passed, and the manager is about to invoke the loader's Load() method.
The manager has successfully loaded this servable and made it available for serving (i.e. GetServableHandle(id) will succeed). To avoid races, this state is not reported until *after* the servable is made available.
The manager has decided to make this servable unavailable, and unload it. To avoid races, this state is reported *before* the servable is made unavailable.
This servable has reached the end of its journey in the manager. Either it loaded and ultimately unloaded successfully, or it hit an error at some point in its lifecycle.
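The lifecycle implied by these state descriptions can be sketched as a small transition table. Everything here is an assumption made for illustration: the state names `START`, `LOADING`, `AVAILABLE`, `UNLOADING`, and `END` are labels chosen to match the descriptions in order, and the allowed transitions (including the jump to `END` on error) are inferred from the prose rather than taken from servable_state.h.

```python
from enum import Enum


class State(Enum):
    UNKNOWN = "unknown"      # default value
    START = "start"          # tracked, no action yet
    LOADING = "loading"      # manager decided to load
    AVAILABLE = "available"  # loaded and servable
    UNLOADING = "unloading"  # manager decided to unload
    END = "end"              # finished, successfully or with an error


# Assumed transitions: the normal path plus an early exit to END on error.
ALLOWED = {
    State.START: {State.LOADING, State.END},
    State.LOADING: {State.AVAILABLE, State.END},
    State.AVAILABLE: {State.UNLOADING},
    State.UNLOADING: {State.END},
    State.END: set(),
}


def can_transition(current: State, target: State) -> bool:
    """Return True if the sketch's table allows moving from current to target."""
    return target in ALLOWED.get(current, set())
```

A client polling model status could use a table like this to flag surprising sequences, for example a servable that appears to move from `END` back to `LOADING`.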
Configuration for monitoring.
Used in:
Inference request containing one or more requests.
Used as request type in: PredictionService.MultiInference
Used as field type in:
Inference tasks.
Input data.
Inference response containing one or more results.
Used as response type in: PredictionService.MultiInference
Used as field type in:
List of results; one for each InferenceTask in the request, returned in the same order as the request.
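Two constraints on multi-inference appear in this reference: all tasks must name the same model, and results come back in task order. A minimal sketch of both follows. The names `Task`, `run_tasks`, and `handlers` are hypothetical, tasks are modeled as plain tuples rather than protobuf messages, and the handler functions stand in for real inference.

```python
from typing import Callable, Dict, List, Tuple

Task = Tuple[str, str]  # (model name, signature method name)


def run_tasks(tasks: List[Task], handlers: Dict[str, Callable[[str], str]]) -> List[str]:
    """Run every task and return one result per task, in request order."""
    if not tasks:
        raise ValueError("at least one task is required")
    model_names = {model for model, _ in tasks}
    if len(model_names) != 1:
        raise ValueError("all tasks must access the same model name")
    # A list comprehension over `tasks` preserves the request order.
    return [handlers[method](model) for model, method in tasks]


# Stand-in handlers keyed by method name; real inference is out of scope here.
handlers = {
    "tensorflow/serving/classify": lambda model: f"classified by {model}",
    "tensorflow/serving/regress": lambda model: f"regressed by {model}",
}
tasks = [("m", "tensorflow/serving/regress"), ("m", "tensorflow/serving/classify")]
results = run_tasks(tasks, handlers)
```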
Configuration for a servable platform e.g. tensorflow or other ML systems.
Used in:
The config proto for a SourceAdapter in the StoragePathSourceAdapter registry.
A map from a platform name to a platform config. The platform name is used in ModelConfig.model_platform.
Used in:
PredictRequest specifies which TensorFlow model to run, as well as how inputs are mapped to tensors and how outputs are filtered before returning to user.
Used as request type in: PredictionService.Predict
Used as field type in:
Model Specification. If version is not specified, will use the latest (numerical) version.
Input tensors. Names of input tensors are alias names. The mapping from aliases to real input tensor names is stored in the SavedModel export as a prediction SignatureDef under the 'inputs' field.
Output filter. Names specified are alias names. The mapping from aliases to real output tensor names is stored in the SavedModel export as a prediction SignatureDef under the 'outputs' field. Only tensors specified here will be run/fetched and returned, with the exception that when none is specified, all tensors specified in the named signature will be run/fetched and returned.
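The output filter rule (return only the named aliases, or everything when no alias is named) can be sketched with a dictionary of outputs keyed by alias. This is an illustration only: `apply_output_filter` is a hypothetical helper, tensors are modeled as plain lists, and raising `KeyError` for an unknown alias is an assumption.

```python
from typing import Dict, List


def apply_output_filter(outputs: Dict[str, list], output_filter: List[str]) -> Dict[str, list]:
    """Keep only the aliases named in `output_filter`; keep all if it is empty."""
    if not output_filter:
        # No filter given: every output of the signature is returned.
        return dict(outputs)
    missing = [alias for alias in output_filter if alias not in outputs]
    if missing:
        # Assumption: asking for an unknown alias is an error.
        raise KeyError(f"unknown output aliases: {missing}")
    return {alias: outputs[alias] for alias in output_filter}


# Hypothetical signature outputs keyed by alias.
outputs = {"scores": [0.9, 0.1], "classes": ["cat", "dog"]}
only_scores = apply_output_filter(outputs, ["scores"])
everything = apply_output_filter(outputs, [])
```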
Response for PredictRequest on successful run.
Used as response type in: PredictionService.Predict
Used as field type in:
Effective Model Specification used to process PredictRequest.
Output tensors.
Logged model inference request.
Configuration for Prometheus monitoring.
Used in:
Whether to expose Prometheus metrics.
The endpoint to expose Prometheus metrics. If not specified, PrometheusExporter::kPrometheusPath value is used.
Used in:
Regression result for a single item (tensorflow.Example).
Used in:
Used as request type in: PredictionService.Regress
Used as field type in:
Model Specification. If version is not specified, will use the latest (numerical) version.
Input data.
Used as response type in: PredictionService.Regress
Used as field type in:
Effective Model Specification used for regression.
Contains one result per input example, in the same order as the input in RegressionRequest.
Used in:
One kind of resource on one device (or type of device).
Used in:
The type of device on which the resource resides, e.g. CPU or GPU.
A specific instance of the device of type 'device' to which the resources are bound (instances are assumed to be numbered 0, 1, ...). When representing the resources required by a servable that has yet to be loaded, this field is optional. If not set, it denotes that the servable's resources are not (yet) bound to a specific instance.
The kind of resource on the device (instance), e.g. RAM or compute share. A given type of resource should have a standard unit that represents the smallest useful quantization. We strongly recommend including the unit (e.g. bytes or millicores) in this string, as in "ram_bytes".
An allocation of one or more kinds of resources, along with the quantity of each. Used to denote the resources that a servable (or collection of servables) will use or is currently using. Also used to denote resources available to the serving system for loading more servables.
A collection of resources, each with a quantity. Treated as a resource->quantity map, i.e. no resource can repeat and the order is immaterial.
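Treating the collection as a resource-to-quantity map can be sketched by converting a list of entries into a dictionary and rejecting repeats. The names below are hypothetical: a resource is modeled as a `(device, device_instance, kind)` tuple, with `None` standing for an unbound device instance.

```python
from typing import Dict, List, Optional, Tuple

Resource = Tuple[str, Optional[int], str]  # (device, device_instance, kind)


def to_quantity_map(entries: List[Tuple[Resource, int]]) -> Dict[Resource, int]:
    """Build a resource -> quantity map, rejecting repeated resources."""
    quantities: Dict[Resource, int] = {}
    for resource, quantity in entries:
        if resource in quantities:
            raise ValueError(f"resource listed more than once: {resource}")
        quantities[resource] = quantity
    return quantities


ram = (("CPU", None, "ram_bytes"), 4096)
gpu_ram = (("GPU", 0, "ram_bytes"), 1024)
# Order is immaterial: both orderings give the same map.
same = to_quantity_map([ram, gpu_ram]) == to_quantity_map([gpu_ram, ram])
```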
Used in:
Configuration for a secure gRPC channel.
Private server key for SSL.
Public server certificate.
Custom certificate authority.
Whether a valid client certificate is required.
Used in:
Requests will be logged uniformly at random with this probability. Valid range: [0, 1.0].
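Uniform random sampling with a fixed probability can be sketched as one comparison against a random draw. `should_log` is a hypothetical helper, not the server's logging code; a seeded `random.Random` is passed in so the example is repeatable.

```python
import random


def should_log(sampling_rate: float, rng: random.Random) -> bool:
    """Decide whether to log one request, with probability `sampling_rate`."""
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be in [0, 1.0]")
    # rng.random() is in [0.0, 1.0), so a rate of 0 never logs and 1.0 always logs.
    return rng.random() < sampling_rate


rng = random.Random(0)
logged = sum(should_log(0.25, rng) for _ in range(10_000))
fraction = logged / 10_000
```

Over many requests `fraction` should land close to the configured rate of 0.25.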
Config proto for SavedModelBundleSourceAdapter.
A SessionBundleConfig. FOR INTERNAL USE ONLY DURING TRANSITION TO SAVED_MODEL. WILL BE DEPRECATED. TODO(b/32248363): Replace this field with the "real" field(s).
Configuration parameters for a SessionBundle, with optional batching.
Used in:
The TensorFlow runtime to connect to. See full documentation in tensorflow/core/public/session_options.h. For single machine serving, we recommend using the empty string "", which will configure the local TensorFlow runtime implementation. This provides the best isolation currently available across multiple Session servables.
TensorFlow Session configuration options. See details at tensorflow/core/protobuf/config.proto.
If set, each emitted session is wrapped with a layer that schedules Run() calls in batches. The batching layer is transparent to the client (implements the tensorflow::Session API). IMPORTANT: With batching enabled, client threads will spend most of their time blocked on Session::Run() calls, waiting for enough peer threads to also call Session::Run() such that a large batch can be formed. For good throughput, we recommend setting the number of client threads equal to roughly twice the maximum batch size ('max_batch_size' below). The batching layer uses a SharedBatchScheduler to coordinate batching across multiple session servables emitted by this source adapter. A BatchSchedulerRetrier is added on top of each batching session.
If set, session run calls use a separate threadpool for restore and init ops as part of loading the session-bundle. The value of this field should correspond to the index of the tensorflow::ThreadPoolOptionProto defined as part of `session_config.session_inter_op_thread_pool`.
EXPERIMENTAL. THIS FIELD MAY CHANGE OR GO AWAY. USE WITH CAUTION. Transient memory used while loading a model, which is released once the loading phase has completed. (This is on top of the memory used in steady state while the model is in memory after it has finished loading.) TODO(b/38376838): This is a temporary hack, and it applies to all models. Remove it once resource estimates are moved inside SavedModel.
Set of SavedModel tags identifying the specific meta graph def to be loaded.
EXPERIMENTAL. THIS FIELD MAY CHANGE OR GO AWAY. USE WITH CAUTION. Input tensors to append to every Session::Run() call.
Enables model warmup.
Config proto for SessionBundleSourceAdapter.
Used in:
Used as request type in: SessionService.SessionRun
Used as field type in:
Model Specification. If version is not specified, will use the latest (numerical) version.
Tensors to be fed in the step. Each feed is a named tensor.
Fetches. A list of tensor names. The caller expects a tensor to be returned for each fetch[i] (see RunResponse.tensor). The order of specified fetches does not change the execution order.
Target Nodes. A list of node names. The named nodes will be run to but their outputs will not be fetched.
Options for the run call. **Currently ignored.**
Used as response type in: SessionService.SessionRun
Used as field type in:
Effective Model Specification used for session run.
NOTE: The order of the returned tensors may or may not match the fetch order specified in RunRequest.
Returned metadata if requested in the options.
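Because the returned tensors may not follow the fetch order, a client can match them to fetches by name rather than by position. The sketch below assumes each returned tensor carries its name, modeled here as `(name, value)` pairs; `tensors_in_fetch_order` is a hypothetical helper.

```python
from typing import List, Tuple


def tensors_in_fetch_order(fetches: List[str], returned: List[Tuple[str, list]]) -> List[list]:
    """Reorder returned (name, value) pairs to follow the order of `fetches`."""
    by_name = dict(returned)
    missing = [name for name in fetches if name not in by_name]
    if missing:
        raise KeyError(f"no tensor returned for fetches: {missing}")
    return [by_name[name] for name in fetches]


fetches = ["logits", "probs"]
# The response lists the tensors in a different order than the request.
returned = [("probs", [0.7, 0.3]), ("logits", [1.2, 0.4])]
ordered = tensors_in_fetch_order(fetches, returned)
```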
Message returned for "signature_def" field.
Config proto for StaticStoragePathSource.
The single servable name, version number and path to supply statically.
Status that corresponds to Status in third_party/tensorflow/core/lib/core/status.h.
Used in:
Error code.
Error message. Will only be set if an error was encountered.