This is a gRPC version of the external model-mesh interface for managing and serving models
Registers a trained model to this model-mesh cluster
whether the model should be loaded immediately
if loadNow is true, whether this method should block until the load completes
OPTIONAL, ADVANCED - lastUsed timestamp to assign to newly registered model, for initial priority in cache. This should not typically be set (defaults to "recent")
Unregisters (deletes) a model from this model-mesh cluster; has no effect if the specified model isn't found
(message has no fields)
Returns the status of the specified model. See the ModelStatus enum
Ensures the model with the specified id is loaded in this model-mesh cluster
timestamp to use when touching the model, 0 for "now" (default)
whether to block until specified model completes loading
Creates a new vmodel id (alias) which maps to a new or existing concrete model, or sets the target model for an existing vmodel to a new or existing concrete model
if set and the vmodel does not already exist, it will be created with this owner. If set and the vmodel already exists, the existing vmodel's owner must match or else the call will fail with an ALREADY_EXISTS error
if true, the request will fail with NOT_FOUND if the vmodel does not already exist; if false, non-existent vmodel ids will be created
optional ModelInfo for target model - if provided then target model will be created, otherwise it's expected to already exist
whether the newly created target model should be automatically deleted once no longer referenced by any vmodel(s); applies only if modelInfo is provided
whether the new target model should be loaded immediately, even if the current active model isn't loaded (otherwise the target model will be loaded to the same scale as the current active model before it becomes the active model)
if true, the active model will be updated immediately, regardless of the relative states of the target and currently-active models
whether this method should block until the transition completes. If the vmodel didn't already exist and loadNow is set to true, this will cause the method to block until the target of the newly created vmodel has completed loading
if provided, the request will only succeed (atomically) if the value matches the vmodel's current targetModelId. If the provided value is equal to the targetModelId in this same request message, the request will succeed only if the vmodel doesn't already exist *or* exists with the same targetModelId (in the latter case having no effect)
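The owner, update-only, and expected-target conditions described above can be sketched as a single precondition check. This is a minimal in-memory illustration, not the model-mesh implementation: the `VModel` class, the `check_set_vmodel` function, its parameter names, the order in which the checks run, and the `PRECONDITION_NOT_MET` label are all assumptions made for the example (the source does not say which error a mismatched expected target produces).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class VModel:
    """Hypothetical stand-in for a stored vmodel record."""
    target_model_id: str
    owner: str = ""


def check_set_vmodel(
    vmodels: Dict[str, VModel],
    vmodel_id: str,
    target_model_id: str,
    owner: str = "",
    update_only: bool = False,
    expected_target_model_id: str = "",
) -> str:
    """Return "OK" or the name of the error the request would fail with."""
    existing: Optional[VModel] = vmodels.get(vmodel_id)

    if existing is None:
        if update_only:
            return "NOT_FOUND"
        # A missing vmodel has no current target, so the only expected value
        # that can pass is one equal to the target in this same request.
        if expected_target_model_id and expected_target_model_id != target_model_id:
            return "PRECONDITION_NOT_MET"  # hypothetical label
        return "OK"

    if owner and existing.owner != owner:
        return "ALREADY_EXISTS"
    if expected_target_model_id and existing.target_model_id != expected_target_model_id:
        return "PRECONDITION_NOT_MET"  # hypothetical label
    return "OK"
```

Passing the same value as both the target and the expected target gives "create if absent" behavior: the call passes when the vmodel is missing or already points at that target, and fails if it points anywhere else.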
Deletes a vmodel, optionally deleting any referenced concrete models at the same time
if provided the specified vmodel will be deleted only if its owner matches
(message has no fields)
Gets the status of a vmodel, including associated target/active model ids. If the vmodel is not found, the returned VModelStatusInfo will have empty active and target model ids and an active model status of NOT_FOUND
if provided the specified vmodel must have matching owner or else the returned response will indicate not found
This is the internal "sidecar" API for interfacing with a colocated model runtime container
Load a model, returning when the model is fully loaded. Include the size of the loaded model in the response if it can be determined at no additional cost. A gRPC error code of FAILED_PRECONDITION or INVALID_ARGUMENT should be returned if no attempt to load the model was made (so that the caller can be sure no space remains used). Note that the RPC may be cancelled by model-mesh prior to completion, after which an unloadModel call will immediately be sent for the same model. To avoid state inconsistency and "leaking" memory, implementors should ensure that this case is properly handled, i.e. that the model doesn't remain loaded after returning successfully from this unloadModel call.
OPTIONAL - If nontrivial cost is involved in determining the size, return 0 here and do the sizing in the modelSize function
EXPERIMENTAL - Applies only if limitModelConcurrency = true was returned from runtimeStatus rpc. See RuntimeStatusResponse.limitModelConcurrency for more detail
Unload a previously loaded (or failed) model. Return when model is fully unloaded, or immediately if not found/loaded.
(message has no fields)
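The cancellation case described for loadModel, together with the "return immediately if not found/loaded" behavior of unloadModel, can be sketched with a small in-memory bookkeeping class. This is only an illustration under stated assumptions: `SketchRuntime` and its methods are hypothetical names, and the load is split into `begin_load` / `finish_load` purely so the race between a load in flight and an incoming unload can be shown without threads.

```python
from typing import Dict, Set


class SketchRuntime:
    """In-memory stand-in for a model runtime; not the real sidecar API."""

    def __init__(self) -> None:
        self.loaded: Dict[str, int] = {}  # model id -> size in bytes
        self.loading: Set[str] = set()    # loads currently in flight
        self.cancelled: Set[str] = set()  # loads unloaded before finishing

    def begin_load(self, model_id: str) -> None:
        """Mark a load as started, clearing any stale cancellation marker."""
        self.cancelled.discard(model_id)
        self.loading.add(model_id)

    def finish_load(self, model_id: str, size_bytes: int) -> bool:
        """Complete a load; returns False if it was unloaded in the meantime."""
        self.loading.discard(model_id)
        if model_id in self.cancelled:
            self.cancelled.discard(model_id)
            return False  # drop the model rather than leave it loaded
        self.loaded[model_id] = size_bytes
        return True

    def unload(self, model_id: str) -> None:
        """Idempotent: returns immediately if the model is not loaded."""
        if model_id in self.loading:
            # Remember the unload so the in-flight load discards its result.
            self.cancelled.add(model_id)
        self.loaded.pop(model_id, None)
```

The point of the `cancelled` set is that an unload arriving while a load is still running must win: once unload has returned, a late-finishing load may not leave the model resident.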
Predict size of not-yet-loaded model - must return almost immediately. Should not perform expensive computation or remote lookups. Should be a conservative estimate. NOTE: Implementation of this RPC is optional.
Calculate size (memory consumption) of currently-loaded model. NOTE: Implementation of this RPC is only required if models' size is not returned in the response to loadModel. If the size computation takes a nontrivial amount of time, it's better to return from loadModel immediately and implement this to perform the sizing separately.
Provide basic runtime status and parameters; called only during startup. Before returning a READY status, implementations should check for and purge any/all currently-loaded models. Since this is only called during startup, there should very rarely be any, but if there are it implies the model-mesh container restarted unexpectedly and such a purge must be done to ensure continued consistency of state and avoid over-committing resources.
(message has no fields)
memory capacity for static loaded models, in bytes
maximum number of model loads that can be in-flight at the same time
timeout for model loads in milliseconds
conservative "default" model size, such that "most" models are smaller than this
version string for this model server code
DEPRECATED - the value of this field is not used, it will be removed in a future update
Map containing information about specific inferencing gRPC methods exposed by this runtime, such as a path within the protobuf message indicating where the model id should be injected. If non-empty, and allowAnyMethod is not set to true, only RPCs of inference methods contained in this map will be forwarded to the runtime (acts as an allow-list). The method name keys in the map must be fully qualified, including the service name, i.e. "package.ServiceName/MethodName"
EXPERIMENTAL - Set to true to enable the mode where each loaded model reports a maximum inferencing concurrency via the maxConcurrency field of the LoadModelResponse message. Additional requests are queued in the model-mesh framework. Turning this on will also enable latency-based autoscaling for the models, which attempts to minimize request queueing time and requires no other configuration/tuning.
If true, any/all RPCs will be forwarded to the runtime irrespective of the service/method name. Otherwise, only those present in the methodInfos map will be permitted. NOTE that this will default to being effectively true if the methodInfos map is empty.
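The interaction between the methodInfos map and allowAnyMethod amounts to a simple allow-list rule, sketched below. The function name `is_forwarded` and the example method names are hypothetical; only the rule itself comes from the field descriptions above.

```python
from typing import Collection


def is_forwarded(
    full_method: str,
    method_infos: Collection[str],
    allow_any_method: bool = False,
) -> bool:
    """Decide whether an RPC would be forwarded to the runtime.

    full_method is the fully qualified name, "package.ServiceName/MethodName".
    method_infos is the set of keys of the methodInfos map.
    """
    # An empty map behaves as if allowAnyMethod were true.
    if allow_any_method or not method_infos:
        return True
    # Otherwise the map keys act as an allow-list.
    return full_method in method_infos
```

For example, with a map containing only `"mypkg.Predictor/Predict"`, a call to `"mypkg.Predictor/Explain"` would be rejected unless allowAnyMethod is set.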
Parameters holding information necessary to locate and load a given model, optional and for use only by your model runtime logic - they are passed to the model runtime loadModel api each time the model is loaded. These should *not* be used to store large amounts of data - the size of the strings should be as small as possible.
Used in:
arbitrary model metadata parameter, must be non-empty
arbitrary model metadata parameter
arbitrary model metadata parameter
Used as response type in: ModelMesh.ensureLoaded, ModelMesh.getModelStatus, ModelMesh.registerModel
Used as field type in:
Internal state of individual copies of this model - intended for debugging/advanced uses only. The top-level model status field should be sufficient for most cases. Arranged in reverse chronological order.
Used in:
id of instance in which the model copy resides
status of this copy, one of LOADING, LOADED, LOADING_FAILED, UNKNOWN
time of latest state change
Used in:
model is not registered with the cluster
model is registered but not currently loaded anywhere
model is in the process of loading somewhere (and otherwise not loaded)
model is loaded in at least one cluster instance
model loading failed; will be retried periodically
Used in:
Optional path of protobuf field numbers, pointing to a string field within the RPC's request message that should be replaced with the model id to which the request applies. All but the last field in the list must be of "embedded message" type; the last one must be of string type.
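To illustrate how such a path is followed, the sketch below models a request message as nested dictionaries keyed by field number and overwrites the string at the end of the path. This is a simplification for illustration only: real injection operates on protobuf messages, and `inject_model_id` is a hypothetical helper, not part of model-mesh.

```python
from typing import Any, Dict, Sequence


def inject_model_id(message: Dict[int, Any], path: Sequence[int], model_id: str) -> None:
    """Replace the string field addressed by `path` with `model_id`, in place."""
    if not path:
        return  # the path is optional; with no path there is nothing to inject

    node = message
    # Every field except the last must be an embedded message.
    for field_number in path[:-1]:
        child = node.setdefault(field_number, {})
        if not isinstance(child, dict):
            raise TypeError(f"field {field_number} is not an embedded message")
        node = child

    # The last field must be a string (an unset field is treated as "").
    last = path[-1]
    if not isinstance(node.get(last, ""), str):
        raise TypeError(f"field {last} is not a string field")
    node[last] = model_id
```

With a request of `{1: {2: "placeholder"}}` and a path of `[1, 2]`, the nested string at field 2 of the message in field 1 is overwritten with the model id.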
Used in:
not used yet
Used as response type in: ModelMesh.getVModelStatus, ModelMesh.setVModel
id of underlying model to which apply/prediction requests sent to this vmodel will be routed
if targetModelId is not equal to activeModelId then the vmodel is in a transitional state (waiting for the target model to be in an appropriate state before it's promoted to be the active model)
status of the currently active model
status of the target model, set only if targetModelId != activeModelId
the owner of this vmodel, if any
Used in:
vmodel is not registered with the cluster
vmodel is registered and in a steady-state (activeModelId == targetModelId)
vmodel is waiting for a new target model to be ready before transitioning to it (activeModelId != targetModelId)
the target model failed to load and so the transition is blocked; will be retried periodically so *may* automatically recover from this state
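The four vmodel states above follow from the active and target model ids plus whether the target's load has failed. A minimal sketch of that classification is shown below; the enum member names and the `classify_vmodel` function are placeholders chosen for the example, since the value names are not given in this text.

```python
from enum import Enum


class VModelStatus(Enum):
    """Placeholder names for the four states described above."""
    NOT_FOUND = 0
    DEFINED = 1
    TRANSITIONING = 2
    TRANSITION_FAILED = 3


def classify_vmodel(
    active_model_id: str,
    target_model_id: str,
    target_load_failed: bool = False,
) -> VModelStatus:
    """Derive the vmodel state from its active/target ids."""
    # An unknown vmodel is reported with empty active and target ids.
    if not active_model_id and not target_model_id:
        return VModelStatus.NOT_FOUND
    # Steady state: the active model is the target model.
    if active_model_id == target_model_id:
        return VModelStatus.DEFINED
    # Otherwise a transition is pending, possibly blocked by a failed load.
    if target_load_failed:
        return VModelStatus.TRANSITION_FAILED
    return VModelStatus.TRANSITIONING
```

Because failed loads are retried periodically, a vmodel in the failed state may later be classified as transitioning or steady-state without any client action.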