Called by clients.
Starts serving a model on N model servers.
(message has no fields)
Updates a published model.
(message has no fields)
Stops serving a model.
(message has no fields)
Lists actively serving models.
If empty, lists all actively serving models in the system.
Gets stats of a cell.
If empty, returns stats for all actively serving models in the system.
This counts the maximum number of servers that can be used to serve a servable model path. Clients can use this field to calculate their expected serving capacity.
Watches for changes of model server address(es) for a given model.
An ID to identify the model. Must be globally unique, e.g., /sax/bar/lm_cloud_spmd_1024b
Identifier of the admin server incarnation. If it does not match the current admin server's id, the server will send back a full set and the current admin server's id.
The client has synchronized its local state about addresses of servers serving this model right before 'seqno'.
Waits for a certain number of replicas to be ready for a given model.
(message has no fields)
Periodically called by a model server to tell the admin server it has come/is online. The admin server keeps track of healthy model servers.
The network address and port identifying a model server, e.g., [1::2]:8888. An RPC server listens at this address for ModeletService.{Load, Unload, Status}, etc.
If non-empty, the server has a status HTTP server at this address for diagnostic purposes. Otherwise, uses 'address'.
Clients connect to the server at this address if non-empty. Otherwise, they use 'address'.
(message has no fields)
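The fallback rule for the two optional addresses above can be sketched as follows. This is a minimal illustration, not the Sax implementation: the class `JoinInfo` and the field names `debug_address` and `data_address` are hypothetical stand-ins for the fields described above.

```python
from dataclasses import dataclass


@dataclass
class JoinInfo:
    """Hypothetical stand-in for the address fields a model server reports."""
    address: str              # RPC address, e.g. "[1::2]:8888"
    debug_address: str = ""   # assumed name for the status HTTP server address
    data_address: str = ""    # assumed name for the client-facing address


def effective_debug_address(info: JoinInfo) -> str:
    # Use the dedicated status address if set, otherwise fall back to 'address'.
    return info.debug_address or info.address


def effective_data_address(info: JoinInfo) -> str:
    # Use the dedicated client-facing address if set, otherwise 'address'.
    return info.data_address or info.address
```

With only `address` set, both helpers return it; setting one of the optional fields overrides just that lookup.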
Loads a model onto the model server.
TODO(yuanzx): Add a way to override static model parameters.
Key identifying the model to load.
Path of the model in Sax's model registry linked in the server binary. This is the name used to locate a model in Sax, e.g., lingvo.lm.lm_cloud.LmCloudSpmd1024B
Path to checkpoint, e.g., gs://model/path/checkpoints/checkpoint_00050000
ACLs protecting data methods supported by this model.
model config overrides, e.g. BATCH_SIZE: 1
(message has no fields)
Updates a model loaded on the model server.
Key identifying the model to update.
ACLs protecting data methods supported by this model.
Checkpoint path.
(message has no fields)
Unloads a model from the model server.
(message has no fields)
Exports a method of a model.
Key identifying the model to export.
The names of the methods to export.
The signatures of the exported methods. If unspecified, defaults to `serving_default`, which only works when exactly one method name is specified. When exporting multiple method names, the signatures must be a list that corresponds to the method names, e.g., if we export with `method_names : [Generate, GenerateStream]`, the signatures need to be ['signature_for_generate', 'signature_for_generate_stream'].
Path in which to save the exported model.
The format of the serialized model.
The RNG seed mode.
If true, enable the multi-device execution type for GPU.
(message has no fields)
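The pairing rule between exported method names and signatures in the export request above can be sketched like this. The function name `resolve_signatures` is hypothetical, and rejecting an empty list of method names is an assumption of this sketch rather than something the reference states.

```python
from typing import Dict, List, Optional

DEFAULT_SIGNATURE = "serving_default"


def resolve_signatures(
    method_names: List[str], signatures: Optional[List[str]] = None
) -> Dict[str, str]:
    """Maps each exported method name to its signature name."""
    if not method_names:
        # Assumption of this sketch: exporting nothing is treated as an error.
        raise ValueError("at least one method name is required")
    if not signatures:
        # The default signature only works with exactly one method name.
        if len(method_names) != 1:
            raise ValueError("signatures are required for multiple method names")
        return {method_names[0]: DEFAULT_SIGNATURE}
    if len(signatures) != len(method_names):
        raise ValueError("signatures must correspond one-to-one to method names")
    # Signatures are matched to method names by position.
    return dict(zip(method_names, signatures))
```

For example, `resolve_signatures(["Generate"])` falls back to the default signature, while two method names require two signatures in the same order.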
Reports server status such as models loaded.
TODO(jiawenhao): Add MemoryStats and LoadStats. MemoryStats: Per-device/total used, free, etc. LoadStats: Per-model/method RPCs minute/hour/total.
Saves checkpoint of a model.
Key identifying the model to save.
Path to checkpoint, e.g., gs://model/path/checkpoints/checkpoint_00050000
(message has no fields)
Wakes up a dormant server.
(message has no fields)
(message has no fields)
items[method] specifies the access control list name for the given method. A method corresponds to the model data method, e.g., lm.score, lm.generate, vm.classify. The ACL name is up to the implementation to interpret, but in general the ACL name is a group name. E.g., the following ACLs open the scoring method to all and restrict the generation method to the group foo. items { "lm.score" : "all" "lm.generate" : "foo" }
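One possible way an implementation might interpret such a map is sketched below. Since the ACL name is left to the implementation, everything here is an assumption: the function `is_allowed` is hypothetical, "all" is treated as a wildcard only because the example above uses it that way, and denying methods without an entry is a choice of this sketch.

```python
from typing import Dict, Set


def is_allowed(acl_items: Dict[str, str], method: str, caller_groups: Set[str]) -> bool:
    """Checks a caller against a method -> ACL name map (illustrative only)."""
    acl_name = acl_items.get(method)
    if acl_name is None:
        # Assumption: methods with no ACL entry are denied in this sketch.
        return False
    if acl_name == "all":
        # Assumption: "all" opens the method to every caller.
        return True
    # Otherwise interpret the ACL name as a group the caller must belong to.
    return acl_name in caller_groups


acls = {"lm.score": "all", "lm.generate": "foo"}
```

With these `acls`, any caller may use `lm.score`, while `lm.generate` requires membership in the group `foo`.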
The file system root under which all Sax cell states are stored, e.g., gs://sax-data/
ACL protecting admin methods running in this cell, including publish, update, and unpublish. The content is up to the implementation to interpret, but in general it is a group name.
If this is left unspecified, it is up to Sax to determine a mode that matches its native serving behavior. Currently, the native serving behavior is STATEFUL.
The exported method takes a uint32 tensor of shape `[batch_size]` and named `rng_seed`. `rng_seed[0]` will be used as the seed for the whole batch and other entries in `rng_seed` are ignored.
The exported method uses an in-graph tf.random.uniform internally to generate the rng seed.
The exported method uses an in-graph tf.constant() as the random seed.
Invalid.
The TensorFlow SavedModel format.
items[input_key] specifies the value set for an input_key. E.g., the following extra inputs will change input.temperature to 0.1 in sampling decode. items { "temperature" : "0.1" }
tensors[input_key] specifies the tensors set for an input_key. E.g., the following extra inputs will set input.tensors as a soft prompt. tensors { "prompt_embeddings" : [0.1, 0.2, 0.3, 0.4] } It is invalid for the same key to appear in both items and tensors.
strings[input_key] specifies the string value set for an input_key. E.g., the following extra inputs will set input.strings as a decoding constraint. strings { "regex" : "a*b*c*d*e*f*g*h*" } It is invalid for a key here to also appear in items or tensors.
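The key-uniqueness rule across the three maps can be checked with a short helper. This is a sketch under assumptions: the function `validate_extra_inputs` is hypothetical, and the maps are modeled as plain Python dictionaries rather than generated protobuf types.

```python
from typing import Dict, List, Mapping


def validate_extra_inputs(
    items: Mapping[str, object],
    tensors: Mapping[str, List[float]],
    strings: Mapping[str, str],
) -> None:
    """Raises ValueError if a key appears in more than one of the maps."""
    seen: Dict[str, str] = {}  # key -> name of the map where it was first seen
    for map_name, mapping in (("items", items), ("tensors", tensors), ("strings", strings)):
        for key in mapping:
            if key in seen:
                raise ValueError(
                    f"key {key!r} appears in both {seen[key]} and {map_name}"
                )
            seen[key] = map_name
```

Calling it with `{"temperature": 0.1}`, `{"prompt_embeddings": [0.1, 0.2, 0.3, 0.4]}`, and `{"regex": "a*b*"}` passes, while reusing `temperature` in a second map raises `ValueError`.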
Method stats shown on modelet home pages.
The method name.
The number of calls on this model/method waiting on the server.
The QPS of failed requests in the past minute.
The QPS of succeeded requests in the past minute.
The mean latency of succeeded requests in the past minute.
The 50th-percentile latency of succeeded requests in the past minute.
The 95th-percentile latency of succeeded requests in the past minute.
The 99th-percentile latency of succeeded requests in the past minute.
The 10 most recent batch sizes.
Only filled in if requested.
Only filled if request.include_method_stats=true.
Optional human-readable explanation of the server state.
The server is usable and ready to serve.
The server is offline and unusable now, but the job can come back online and become active when needed.
The QPS of early-rejected requests in the past 10s. Early-rejected requests are requests that the server rejects with a "kUnavailable" error before they are processed, e.g., because the server is dormant.
The state of a joined model server.
milliseconds since Unix epoch
model ID: status
model ID: error message
state
stats
e.g. IP:port
The configuration of a model.
An ID to identify the model. Must be globally unique, e.g., /sax/test/lm_cloud_spmd_2b
Path to a model in Sax's model registry linked in the server binary, e.g., saxml.server.pax.lm.params.lm_cloud.LmCloudSpmd2B
Path to a checkpoint, e.g., gs://sax-data/checkpoints/checkpoint_00000000
The number of model servers to serve this model on. The admin server periodically examines active/available model servers and tries its best to keep this many replicas active for this model.
ACLs protecting data methods supported by this model.
ACL protecting admin methods running on this model, including update and unpublish. The content is up to the implementation to interpret, but in general it is a group name.
model config overrides, e.g. BATCH_SIZE: 1
Identifies a specific deployment of the model to Sax. It spans the lifetime of the model from publish to unpublish. It is a random 128-bit number represented as an array of bytes.
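A random 128-bit identifier stored as bytes could be produced as in the sketch below. How Sax actually generates this value is not stated here; the function name `new_deployment_id` is hypothetical.

```python
import secrets


def new_deployment_id() -> bytes:
    # 16 bytes == 128 bits of randomness from the OS's secure random source.
    return secrets.token_bytes(16)
```

The result is a 16-byte `bytes` object; calling `.hex()` on it gives a printable form for logs.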
The capabilities of a model server.
The server informs the admin which model paths it supports. Hence, the admin is expected to ask this server to load only models whose model paths are in this list. E.g., saxml.lm.params.Gemma7B
A set of strings associated with this server. Each tag is a free form string. The admin may use these tags during the model assignment.
number of active replicas
Unused: unloaded models are removed from responses.
This model is being loaded and can't serve yet.
This model is loaded and ready to serve.
This model failed to load or unload.
This model is being unloaded and can't serve anymore.
The state of a published model.
Tensors in float, flattened to 1D. Reshaping information can be inferred from model attributes or other extra inputs.
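As an illustration of the flattening convention, the sketch below flattens a 2D list of floats in row-major order and restores it from a known shape. Row-major order and the helper names `flatten` and `reshape_2d` are assumptions of this sketch; the reference only says that the shape comes from model attributes or other extra inputs.

```python
from typing import List, Sequence, Tuple


def flatten(rows: Sequence[Sequence[float]]) -> List[float]:
    # Row-major flattening: all of row 0, then all of row 1, and so on.
    return [value for row in rows for value in row]


def reshape_2d(flat: Sequence[float], shape: Tuple[int, int]) -> List[List[float]]:
    num_rows, num_cols = shape
    if num_rows * num_cols != len(flat):
        raise ValueError(f"cannot reshape {len(flat)} values into {shape}")
    # Slice consecutive runs of num_cols values back into rows.
    return [list(flat[r * num_cols:(r + 1) * num_cols]) for r in range(num_rows)]
```

Here `reshape_2d(flatten(x), (rows, cols))` recovers `x` whenever `x` has that shape.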
The seqno the client should use for the next Watch call.
If has_fullset is true, the server sends back the complete set in 'values' together with a sequence of changes in 'changelog'. If has_fullset is false, 'changelog' contains mutations within [req.seqno .. next_seqno).
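A client-side view of this protocol might look like the sketch below: replace the local address set when a full set arrives, replay the changelog, then remember the next sequence number. The names `next_seqno`, `has_fullset`, `values`, and `changelog` follow the descriptions above; the `Mutation` shape (one added or one deleted address), the class names, and `apply_watch_result` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Mutation:
    """Assumed shape of one changelog entry: an added or a deleted address."""
    addition: str = ""
    deletion: str = ""


@dataclass
class WatchResult:
    next_seqno: int
    has_fullset: bool = False
    values: List[str] = field(default_factory=list)
    changelog: List[Mutation] = field(default_factory=list)


@dataclass
class LocalState:
    seqno: int = 0
    addresses: Set[str] = field(default_factory=set)


def apply_watch_result(state: LocalState, result: WatchResult) -> None:
    if result.has_fullset:
        # A full set replaces whatever the client knew before.
        state.addresses = set(result.values)
    # Changes are applied in order, on top of the full set if one was sent.
    for mutation in result.changelog:
        if mutation.addition:
            state.addresses.add(mutation.addition)
        if mutation.deletion:
            state.addresses.discard(mutation.deletion)
    # The next Watch call should carry this sequence number.
    state.seqno = result.next_seqno
```

A client would call `apply_watch_result` after every Watch response and send `state.seqno` in the following request.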