Allocate attempts to allocate a Unit for this invocation. If successful, the invocation keeps the allocation alive by calling Refresh(). If the invocation is instead queued, it continues polling with Allocate() until allocation succeeds. Returns:
* NOT_FOUND if the Unit is not known to the server
* INVALID_ARGUMENT if the request is malformed (see request type for details)
Invocation details for this allocation
Returned when allocation is successful; the invocation should be able to proceed while simultaneously calling Refresh().
Returned when allocation is unsuccessful due to license contention. The invocation should continue to poll by calling Allocate() passing the invocation_id returned in this message.
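The polling behaviour described above can be sketched in Python as follows. This is only an illustration under stated assumptions: `FakeServer`, `Allocated`, `Queued`, `retry_delay_s` and `allocate_blocking` are hypothetical names, and the in-memory server stands in for the real RPC stub, whose generated API is not shown in this document. Only `invocation_id` and `queue_position` are names taken from the field descriptions.

```python
import itertools
import time
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Allocated:
    invocation_id: str


@dataclass
class Queued:
    invocation_id: str
    queue_position: int
    retry_delay_s: float  # hypothetical stand-in for the "retry after" time


class FakeServer:
    """Hypothetical in-memory stand-in: grants a Unit after N queued polls."""

    def __init__(self, polls_before_grant: int) -> None:
        self._remaining = polls_before_grant
        self._ids = itertools.count(1)
        self.seen_ids = []  # invocation ids received, for inspection

    def allocate(self, invocation_id: Optional[str]) -> Union[Allocated, Queued]:
        self.seen_ids.append(invocation_id)
        # The server picks the id on the first call; the client echoes it back.
        inv = invocation_id or f"inv-{next(self._ids)}"
        if self._remaining == 0:
            return Allocated(invocation_id=inv)
        position = self._remaining
        self._remaining -= 1
        return Queued(invocation_id=inv, queue_position=position, retry_delay_s=0.0)


def allocate_blocking(server, sleep=time.sleep) -> Allocated:
    """Poll Allocate until the server reports a successful allocation."""
    invocation_id = None
    while True:
        resp = server.allocate(invocation_id)
        if isinstance(resp, Allocated):
            return resp
        # Queued: reuse the returned id so the queue spot is kept.
        invocation_id = resp.invocation_id
        sleep(resp.retry_delay_s)
```

Passing the returned `invocation_id` on every later poll is the important part: per the queued-response description, a poll without it would be treated as a new request.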
Refresh refreshes an allocation while an invocation is still potentially using the Unit. If the invocation fails to refresh the allocation, the Unit may be allocated to another invocation. Returns:
* NOT_FOUND if the allocation is not known to the server; the client should then kill the invocation.
Existing invocation to refresh. Must have been allocated by a call to Allocate() successfully (not queued).
Invocation details for this allocation
Allocated Topology
AllocateResponse id that allocated this Unit.
Time at which the request license will be revoked. The client should issue another RefreshRequest for this invocation_id before this time.
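A keep-alive loop for Refresh might look like the sketch below. It assumes a hypothetical `refresh` callable that wraps the RPC and raises a hypothetical `NotFound` exception when the server answers NOT_FOUND; the real client bindings and error types are not described in this document.

```python
import threading


class NotFound(Exception):
    """Hypothetical error raised when the server answers NOT_FOUND."""


def keep_alive(refresh, stop: threading.Event, on_lost, interval_s: float) -> bool:
    """Call refresh() every interval_s seconds until stop is set.

    Returns True if stopped normally, False if the allocation was lost.
    """
    # Event.wait returns False on timeout, True once stop has been set.
    while not stop.wait(interval_s):
        try:
            refresh()
        except NotFound:
            # The server no longer knows the allocation: kill the invocation.
            on_lost()
            return False
    return True
```

In practice `interval_s` would be chosen so that each refresh lands before the revocation time returned by the previous response.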
Release immediately returns an allocated Unit to the pool. This should be called by clients to return Units so they can be quickly allocated to other queued invocations; otherwise, the server will need to wait for the client to time out issuing Refresh calls. Returns:
* NOT_FOUND if the allocation is not known to the server.
AllocateResponse that allocated this license.
Empty response
(message has no fields)
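One way a client could make sure Release is always called, even when the work fails, is a context manager. The sketch below is hypothetical: `held_unit`, `allocate` and `release` are illustrative callables wrapping the RPCs, not part of the documented service.

```python
from contextlib import contextmanager


@contextmanager
def held_unit(allocate, release):
    """Hold a Unit for the duration of a with-block, then release it."""
    invocation_id = allocate()  # assumed to block until allocation succeeds
    try:
        yield invocation_id
    finally:
        # Runs on success and on error, so the Unit returns to the pool
        # without the server waiting for Refresh calls to time out.
        release(invocation_id)
```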
Status returns all Unit configurations and their status.
Empty request
(message has no fields)
TODO: Fill in ACF fields
Used in:
(message has no fields)
Used in:
Opaque identifier for this invocation, determined by the server. This id should be used by the client in subsequent RefreshRequests for this Unit.
Time at which the request license will be revoked. The client should issue a RefreshRequest for this invocation_id before this time.
Actual allocated topology
Used in:
Unit currently allocated
Unit may be available soon, in deferred release state
Unit not currently allocated
Message used for server config file
OBSOLETED: Replaced with list of TopologyConfigs, below. Old field: repeated UnitConfig units = 1;
List of known Topology configurations
Contains information about a single CPU on a host machine
Used in:
The 0-based index of the CPU, as seen in /proc/cpuinfo
The family code for the CPU (e.g. 25 for Zen)
The model string of the CPU (e.g. AMD EPYC 7413 24-Core Processor)
The speed of the CPU, in MHz (e.g. 2650.0)
The number of cores the CPU has (e.g. 24)
The amount of cache, represented by a string (e.g. 512 KB). TODO: Parse the string properly, so we can have the cache size in int format
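The TODO above could be addressed along these lines. This is a sketch, not the project's parser: `parse_cache_size` is a hypothetical helper, and it assumes that "KB" means 1024 bytes and that the string always has the shape "<integer> <unit>".

```python
import re

# Assumption: units are binary multiples (KB = 1024 bytes).
_UNIT_BYTES = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}
_CACHE_RE = re.compile(r"\s*(\d+)\s*([KMG]?B)\s*", re.IGNORECASE)


def parse_cache_size(text: str) -> int:
    """Convert a cache string such as '512 KB' to a size in bytes."""
    match = _CACHE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognised cache size: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_BYTES[unit.upper()]
```

For the example value in the field description, `parse_cache_size("512 KB")` gives 524288.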
Allocates licenses so that they are spread evenly across users. This means that a user's request can jump ahead in the line, or be bumped behind newer requests.
Used in:
(message has no fields)
Default prioritizer. Licenses are allocated in the order they are requested.
Used in:
(message has no fields)
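The difference between the two prioritizers can be illustrated with a small sketch. The even-spread rule used here (serve the owner currently holding the fewest Units, breaking ties by arrival order) is one plausible reading of "spread evenly across users", not necessarily the server's actual algorithm; `Request`, `fifo_order` and `even_spread_order` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    seq: int    # arrival order, lower means earlier
    owner: str  # user or system that issued the request


def fifo_order(queue):
    """Default strategy: serve requests in the order they arrived."""
    return sorted(queue, key=lambda r: r.seq)


def even_spread_order(queue, held):
    """Assumed even-spread strategy.

    held maps owner -> number of Units that owner currently holds.
    """
    counts = Counter(held)
    remaining = sorted(queue, key=lambda r: r.seq)
    ordered = []
    while remaining:
        # Prefer the owner with the fewest Units; ties go to the oldest request.
        nxt = min(remaining, key=lambda r: (counts[r.owner], r.seq))
        remaining.remove(nxt)
        ordered.append(nxt)
        counts[nxt.owner] += 1  # treat the request as granted for ranking
    return ordered
```

With alice already holding one Unit and a queue of alice, alice, bob, the FIFO order is unchanged while the even-spread order moves bob's request to the front, which is the "jump ahead" effect the description mentions.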
Contains information about a single GPU on a host machine
Used in:
The bus_id of the GPU on the host (e.g. 46:00.0)
The model string of the GPU (e.g. NVIDIA Corporation GP104GL)
The model string of the GPU's card (e.g. Tesla P4)
The amount of video memory available to the GPU, in megabytes
Used in:
Unit is healthy
Unit health is unknown
Unit health is broken/stuck/bad, try automated repair
Unit is being sidelined, do not attempt automated repair
Contains information about a single Host machine
Used in:
Hostname of the host (e.g. nc-gpu-17.rdu)
Indicates whether this host was reachable when the inventory was created
Indicates if the host's bb_clientd service was healthy at inventory creation time
Indicates if the host's bb_clientd_fuse service was healthy at inventory creation time
A list of information about GPUs found on this host (GpuInfo below)
A list of information about CPUs found on this host (CpuInfo below)
Contains the entire inventory of known hosts, along with information gathered about those hosts. Map is hostname -> HostInfo (below)
If no fields are provided (empty message), any host will match
Used in:
If specified, you are requesting a specific host by hostname. Other fields will be ignored
If specified, return a host that has the given number of CPUs, or more
If specified, return a host that has the given number of GPUs, or more
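The matching rules for host criteria can be sketched as a predicate. The class and field names below (`Host`, `HostCriteria`, `min_cpus`, `min_gpus`) are hypothetical; only the semantics come from the field descriptions: an empty request matches any host, a hostname overrides every other field, and the CPU and GPU counts are minimums.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    hostname: str
    num_cpus: int
    num_gpus: int


@dataclass
class HostCriteria:
    hostname: Optional[str] = None  # unset fields impose no constraint
    min_cpus: Optional[int] = None
    min_gpus: Optional[int] = None


def matches(host: Host, criteria: HostCriteria) -> bool:
    """Return True if host satisfies criteria."""
    if criteria.hostname is not None:
        # A specific host was requested: all other fields are ignored.
        return host.hostname == criteria.hostname
    if criteria.min_cpus is not None and host.num_cpus < criteria.min_cpus:
        return False
    if criteria.min_gpus is not None and host.num_gpus < criteria.min_gpus:
        return False
    return True
```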
Used in:
Opaque identifier for this invocation, determined by the server. This id should be used by the client in subsequent RefreshRequests for this Unit.
Owning entity issuing the allocation request. Used for logging purposes only. This could be the name of the user or system issuing the request. This must be sent on every Allocate() and Refresh() call in case the server is restarted.
TODO: For CI use, use the test target so we can estimate run duration. What else can we use this for?
Topology request
Used in:
Opaque identifier for this invocation, determined by the server. This id associates the invocation with a specific spot in the queue; it should be used in subsequent Allocate RPCs or the invocation will be placed at the back of the queue.
Position of this invocation in the queue. Invocations in position 1 are next to be allocated, with higher positions getting allocations later than lower positions. The queue_position can increase or decrease over time depending on the prioritization strategy, which may allow users to jump ahead of the line, or be bumped to the end of the line.
Time at which the client should retry Allocate. The client should issue its next RPC after this time; if it polls significantly later than this (>5s), it may be moved to the back of the queue.
General options for the entire instance
Used in:
Interval on which actions should refresh their queue position while in queue for an allocation. Default: 15s
Interval on which actions should refresh their allocation while the action is executing. Default: 30s
Interval on which to clean up expired/released allocations and queue entries, and promote queued entries to allocations. Default: 1s
When the service first starts, for this period of time it holds off on allocating any new Units, and instead listens for and "adopts" allocations that it hears about via the "Refresh" RPC. This duration should be >= allocation_refresh_duration_seconds, to guarantee that all clients Refresh() their allocation before the server moves into the normal operating state. Default: 45s
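The defaults and the constraint on the startup adoption period can be captured in a small config object. This is a sketch under assumptions: apart from `allocation_refresh_duration_seconds`, which is named in the description above, the field names are hypothetical, and the real config may use a different duration type than whole seconds.

```python
from dataclasses import dataclass


@dataclass
class GeneralOptions:
    # Hypothetical name: interval for refreshing queue position while queued.
    queue_refresh_duration_seconds: int = 15
    # Name taken from the description above.
    allocation_refresh_duration_seconds: int = 30
    # Hypothetical name: cleanup and promotion interval.
    prune_interval_seconds: int = 1
    # Hypothetical name: startup period spent adopting refreshed allocations.
    adoption_duration_seconds: int = 45

    def validate(self) -> None:
        """Reject configs where clients may not refresh before adoption ends."""
        if self.adoption_duration_seconds < self.allocation_refresh_duration_seconds:
            raise ValueError(
                "adoption period must be >= allocation_refresh_duration_seconds"
            )
```

With the documented defaults (45s adoption, 30s refresh) validation passes; shortening the adoption period below the refresh interval would let the server start allocating Units that are still in use.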
OBSOLETED: Replaced with UnitInfo, below. Old field: Topology topology = 1;
Used in:
Current state of the Unit
Unit's info
Used in:
OBSOLETED: Replaced with list of HostInfo, etc. Old field: string config = 1;
Used in:
Unique name extracted from the config for speed/convenience.
Start with just hosts in the topology as a first step
Used in:
Strategy to distribute Units.
Used in:
If specified, you are requesting a known topology by name. Other fields will be ignored
If specified, checks for hosts that match the given criteria
A "polymorphic" representation of the various types of info a Unit can hold
Used in: