* ResourceManagerService describes the internal interface of Resource Manager to other Peloton applications such as Job Manager and Placement Engine. This includes the EnqueueGangs and GetPlacements APIs called by Job Manager, and DequeueGangs and SetPlacements APIs called by Placement Engine.
* Enqueue a list of Gangs, each of which is a list of one or more tasks, to a given leaf resource pool for scheduling. The Gangs start in PENDING state and transition to READY state when the resource pool has available resources. This method is called by Job Manager when a new job is created or new Gangs are added. If any Gangs fail to enqueue, Job Manager should retry those failed Gangs.
ResourcePool
The list of gangs to enqueue
The reason for enqueuing the gang, needed for debugging resmgr internal task state. E.g., tasks returned by Placement Engine should carry a specific reason explaining why they could not be placed.
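The retry behaviour described for EnqueueGangs can be sketched as follows. This is a minimal illustration, not the Job Manager implementation: `enqueue_with_retry`, `make_flaky_enqueue`, the `max_attempts` parameter, and the modelling of a gang as a list of task IDs are all assumptions made for the example.

```python
from typing import Callable, List, Sequence, Set, Tuple

Gang = List[str]  # assumption: a gang is modelled as a list of task IDs


def enqueue_with_retry(
    enqueue: Callable[[Sequence[Gang]], List[Gang]],
    gangs: Sequence[Gang],
    max_attempts: int = 3,
) -> List[Gang]:
    """Call `enqueue` until every gang is accepted or attempts run out.

    `enqueue` is a hypothetical stand-in for the EnqueueGangs call: it takes
    gangs and returns the subset that failed to enqueue. The return value is
    the list of gangs that still failed after the last attempt.
    """
    pending = list(gangs)
    for _ in range(max_attempts):
        if not pending:
            break
        pending = enqueue(pending)  # resend only the gangs that failed
    return pending


def make_flaky_enqueue(fail_once: Set[Tuple[str, ...]]):
    """Build a fake enqueue that rejects each gang in `fail_once` the first time."""
    seen = set()

    def enqueue(batch: Sequence[Gang]) -> List[Gang]:
        failed = []
        for gang in batch:
            key = tuple(gang)
            if key in fail_once and key not in seen:
                seen.add(key)
                failed.append(gang)
        return failed

    return enqueue
```

Only the failed gangs are resent, so gangs that were already accepted are not enqueued twice.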
* Dequeue a list of Gangs, each composed of tasks that are in READY state for placement. The tasks transition from READY to PLACING state after this method returns. This method is called by Placement Engine to retrieve a list of gangs for computing placement. If tasks stay in PLACING state for too long, e.g. because of Placement Engine failures, they are timed out and transition back to READY state.
Max number of ready gangs to dequeue
Timeout in milliseconds if no gangs are ready
Task Type to identify which kind of tasks need to be dequeued
The list of gangs that have been dequeued
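The limit and timeout fields of the dequeue request can be illustrated with a small in-memory queue. This is a sketch under assumptions: `dequeue_gangs` is a hypothetical helper, and waiting only for the first gang before draining whatever else is ready is one plausible reading of "timeout if no gangs are ready", not confirmed behaviour of the service.

```python
import queue
from typing import List


def dequeue_gangs(ready: "queue.Queue", limit: int, timeout_ms: int) -> List[list]:
    """Return up to `limit` gangs, waiting at most `timeout_ms` for the first one."""
    gangs: List[list] = []
    if limit <= 0:
        return gangs
    try:
        # Block only for the first gang, so an empty queue returns after the timeout.
        gangs.append(ready.get(timeout=timeout_ms / 1000.0))
        # Take any further gangs that are already ready, without waiting.
        while len(gangs) < limit:
            gangs.append(ready.get_nowait())
    except queue.Empty:
        pass
    return gangs
```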
* Sets the placement information for successfully placed tasks and the failure reason for tasks that could not be placed. Successfully placed tasks transition from PLACING to PLACED state after this call. This method is called by Placement Engine after it computes the placement decision for those tasks. Unsuccessful tasks are returned to the resource manager along with the reason for the failure, and placement is retried for them at a later time.
List of successful task placements to set
List of failed task placements to return
* Get the placement information for a list of tasks. The tasks transition from PLACED to LAUNCHING state after this call. This method is called by Job Manager to launch the tasks on Mesos. If the tasks stay in LAUNCHING state for too long without transitioning to RUNNING state, they are timed out and transition back to PLACED state.
Max number of placements to retrieve
Timeout in milliseconds if no placements
List of task placements to return
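The state changes described for the enqueue, dequeue, and placement methods above can be summarised as a transition table. The sketch below only encodes the transitions stated in those descriptions; the `ALLOWED` table and the `transition` helper are illustrative names, and the real resource manager may define more states and transitions.

```python
# Transitions taken from the method descriptions above:
#   PENDING -> READY        resource pool has available resources
#   READY -> PLACING        gang dequeued by Placement Engine
#   PLACING -> PLACED       placement set successfully
#   PLACING -> READY        placing timed out
#   PLACED -> LAUNCHING     placement retrieved by Job Manager
#   LAUNCHING -> RUNNING    task started
#   LAUNCHING -> PLACED     launching timed out
ALLOWED = {
    "PENDING": {"READY"},
    "READY": {"PLACING"},
    "PLACING": {"PLACED", "READY"},
    "PLACED": {"LAUNCHING"},
    "LAUNCHING": {"RUNNING", "PLACED"},
}


def transition(current: str, target: str) -> str:
    """Return `target` if the move is listed in ALLOWED, else raise ValueError."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {target}")
    return target
```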
* Notifies the resource manager of task status updates. This is called by Host Manager.
* Get the list of tasks running on the given list of hosts. Placement engines need this information to find out which tasks are running on which hosts, so that they can take it into account when placing tasks.
Task type identifying which kind of tasks to include. If this is left out, all tasks will be returned.
This will return a map from hostname to a list of tasks running on the host.
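A minimal sketch of the hostname-to-tasks grouping with the optional task type filter. The dictionary keys (`id`, `hostname`, `type`), the type strings, and the choice to omit hosts without matching tasks are assumptions for illustration, not the message definitions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def tasks_by_hosts(
    tasks: Iterable[dict],
    hostnames: Iterable[str],
    task_type: Optional[str] = None,
) -> Dict[str, List[str]]:
    """Group task IDs by hostname for the requested hosts.

    If `task_type` is None, tasks of every type are returned.
    Hosts with no matching tasks are omitted (an assumption of this sketch).
    """
    wanted = set(hostnames)
    result: Dict[str, List[str]] = defaultdict(list)
    for task in tasks:
        if task["hostname"] not in wanted:
            continue
        if task_type is not None and task["type"] != task_type:
            continue
        result[task["hostname"]].append(task["id"])
    return dict(result)
```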
* Get task to state map. This information is helpful for debugging.
optional jobID to filter out tasks
optional respoolID to filter out tasks
optional states to filter out tasks
This will return a map from task id to state. DEPRECATED
This will return a map from state to list of tasks.
* Returns the tasks which are waiting on resources in a resource pool in the order in which they were added, up to a maximum number of gangs. E.g., specifying a limit of 10 would return pending tasks from the first 10 gangs in the queue. The tasks are grouped according to their gang membership, since one gang can contain multiple tasks and the gang is the unit of scheduling.
Returns the pending tasks in a resource pool in the order in which they will be processed, grouped by the gang in which they belong.
respoolID of the pool
limit is the number of gangs to be returned.
* Response message for the GetPendingTasks method. Return errors: NOT_FOUND if the resource pool is not found; INVALID_ARGUMENT if the resource pool is not supplied or is not a leaf node; INTERNAL if getting the pending tasks failed because of internal errors.
This will return a map from queue type to the pending gangs
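The gang-based limit and the NOT_FOUND and INVALID_ARGUMENT conditions can be sketched as below. The `pools` dictionary layout, the function name, and the mapping of error codes to Python exceptions are assumptions, and the sketch uses a single pending queue per pool rather than the map from queue type to pending gangs that the response carries.

```python
from typing import Dict, List


def get_pending_tasks(pools: Dict[str, dict], respool_id: str, limit: int) -> List[List[str]]:
    """Return pending task IDs from the first `limit` gangs, grouped by gang.

    `pools` maps a resource pool ID to {"is_leaf": bool, "pending": [gang, ...]},
    where each gang is a list of task IDs in queue order (hypothetical layout).
    """
    if not respool_id:
        raise ValueError("INVALID_ARGUMENT: resource pool not supplied")
    if respool_id not in pools:
        raise LookupError("NOT_FOUND: resource pool not found")
    pool = pools[respool_id]
    if not pool["is_leaf"]:
        raise ValueError("INVALID_ARGUMENT: resource pool is not a leaf node")
    # The limit counts gangs, not tasks, because the gang is the unit of scheduling.
    return [list(gang) for gang in pool["pending"][:limit]]
```

With a limit of 2, a queue holding gangs of sizes 1, 2, and 1 yields three tasks in two groups.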
* Kill Tasks kills/deletes the tasks in Resource Manager.
Peloton Task Ids for
* Get the list of tasks to preempt. The tasks will transition from RUNNING to PREEMPTING state after the return of this method. This method will be called by the job manager to kill the tasks and re-enqueue them.
Max number of running tasks to dequeue
Timeout in milliseconds if no tasks are ready
DEPRECATED in favor of preemptionCandidates. The list of tasks that have been dequeued.
The list of tasks to be preempted
* UpdateTasksState is used to let the resource manager know that the tasks in the request have been moved to corresponding state.
UpdateTasksStateRequest is the request message for updating task's state to a desired state in resource manager
List of UpdateTaskEntry
UpdateTasksStateResponse is the response message for UpdateTasksState
(message has no fields)
* GetOrphanTasks returns the list of orphan tasks in resource manager. This API is for debugging purposes only.
GetOrphanTasksRequest is the request message for GetOrphanTasks
optional respoolID to filter out tasks
GetOrphanTasksResponse is the response message for GetOrphanTasks
* GetHostsByScores returns a ranked list of batch hosts. Hosts are ranked by their suitability for draining, based on factors including the number of running tasks, normalized task priorities, average task runtime, etc.
GetHostsByScoresRequest is the request message for GetHostsByScores
Max number of hosts to retrieve
GetHostsByScoresResponse is the response message for GetHostsByScores
The list of hosts to be moved to different partition
Used in:
EnqueueGangsFailure is returned as part of the failure response when Gangs fail to enqueue
Used in:
List of failed tasks in gangs that failed to enqueue/requeue
ErrorCode is the error code for the failure
Used in:
Error code UNKNOWN
Error code if the task failed to be enqueued/requeued
Error code if same task is already present
Error code if other tasks in gang failed
Used in:
Resmgr task that failed to enqueue/requeue
Error message with the failure reason
Error code associated with the failure, so that the caller can identify the failure
Used in:
List of tasks to be scheduled together
Used in:
Mesos task ID of the task.
State of the task.
Reason for the task being the current state.
Last time the state was updated
Depending on the state of the task, this can either mean the host where the task has been placed OR where the task is running. This field will not be set for tasks in PENDING and PLACING states.
List of pending task IDs in a gang
Used in:
List of pending gangs
Used in:
* Placement describes the mapping of a list of tasks to a host so that Job Manager can launch the tasks on the host.
Used in:
The name of the host where the tasks are placed
The Mesos agent ID of the host where the tasks are placed
The list of allocated ports which should be sufficient for all placed tasks
Type of the tasks in the placement. Note that all tasks must be of the same type. By default the type is batch task.
The unique offer id of the offers on the host where the tasks are placed
The list of tasks to be placed
Task to be placed
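The requirement that the allocated ports be sufficient for all placed tasks can be checked as in the sketch below. The helper name `assign_ports`, the per-task port counts passed as a dictionary, and the in-order slicing of ports to tasks are all assumptions for illustration; the document does not state how ports are divided among tasks.

```python
from typing import Dict, List, Sequence


def assign_ports(allocated_ports: Sequence[int], ports_needed: Dict[str, int]) -> Dict[str, List[int]]:
    """Split the placement's allocated ports among its tasks.

    `ports_needed` maps a task ID to its number of dynamic ports.
    Raises ValueError if the placement does not carry enough ports.
    """
    total = sum(ports_needed.values())
    if len(allocated_ports) < total:
        raise ValueError(f"need {total} ports, placement has {len(allocated_ports)}")
    assignment: Dict[str, List[int]] = {}
    offset = 0
    for task_id, count in ports_needed.items():
        # Hand out ports in order (an assumption of this sketch).
        assignment[task_id] = list(allocated_ports[offset:offset + count])
        offset += count
    return assignment
```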
Used in:
PreemptionCandidate represents a task which has been chosen to be preempted
Used in:
The unique ID of the task. Deprecated in favor of task_id.
The reason for choosing the task for preemption
The unique ID of the task
The reason for choosing the task for preemption
Used in:
Reserved for compatibility
Resource Preemption
Host maintenance
Used in:
Represents a failed gang which couldn't be placed.
Used in:
The reason for the failure.
The gang which couldn't be placed.
Used in:
* Task describes a task instance at Resource Manager layer. Only includes the minimal set of fields required for Resource Manager and Placement Engine, such as resource config, constraint etc.
Used in:
Name of the task
The unique ID of the task
The Job ID of the task for use cases like gang scheduling
The Mesos task ID of the task
Resource config of the task
Priority of a task. Higher value takes priority over lower value when making scheduling decisions as well as preemption decisions
Whether the task is preemptible. If a task is not preemptible, then it will have to be launched using reserved resources.
List of user-defined labels for the task. These are used to enforce the constraint and are copied from the TaskConfig.
Constraint on the labels of the host or tasks on the host that this task should run on. This is copied from the TaskConfig.
Type of the Task
Number of dynamic ports
Minimum number of running instances. Value > 1 indicates task is in scheduling gang of that size; task instanceID is in [0..minInstances-1]. If value <= 1, task is not in scheduling gang and is scheduled singly.
Hostname of the host on which the task is running.
Whether this is a controller task. A controller is a special batch task which controls other tasks inside a job. E.g., the Spark driver task in a Spark job is a controller task.
The timeout for the placement. It needs to be set to a different timeout value for each placement iteration. It is also used for communicating with the placement engine in case of host reservation.
Retry count for how many cycles this task has failed to be placed. This is needed for calculating the next backoff period and for deciding when host reservation is needed.
Whether the task is revocable. If true, it will be launched with usage slack resources. Revocable tasks will be killed by the QoS controller if the resources are required by the tasks to which the initial allocation was made.
The name of the host where the instance should run upon restart. It is used for best-effort in-place update/restart. When this field is set on enqueue, the task moves directly to the ready queue.
Number of attempts this task has been retried for placement in a cycle.
Flag to indicate the task is ready for host reservation
Preference for placing tasks of the job on hosts.
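The minimum-instances rule in the Task fields above (a value greater than 1 puts the task in a scheduling gang, otherwise the task is scheduled singly) can be sketched as a grouping step. The dictionary keys and the assumption that all gang tasks of one job form a single gang are illustrative, not taken from the message definition.

```python
from typing import Dict, Iterable, List


def group_into_gangs(tasks: Iterable[dict]) -> List[List[str]]:
    """Group task IDs into gangs based on each task's `min_instances`.

    Each task is a dict with hypothetical keys "id", "job_id", "min_instances".
    Tasks with min_instances > 1 share one gang per job (an assumption);
    tasks with min_instances <= 1 each form a gang of one.
    """
    gangs: List[List[str]] = []
    gang_by_job: Dict[str, List[str]] = {}
    for task in tasks:
        if task["min_instances"] > 1:
            if task["job_id"] not in gang_by_job:
                gang_by_job[task["job_id"]] = []
                gangs.append(gang_by_job[task["job_id"]])
            gang_by_job[task["job_id"]].append(task["id"])
        else:
            gangs.append([task["id"]])  # scheduled singly
    return gangs
```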
Used in:
* TaskType defines the type of a task, such as batch, service, or infra agent.
Used in:
This is the unknown type. It is also used in DequeueGangsRequest to indicate that we want tasks of any task type back.
Normal batch task
STATELESS task which is long running and will be restarted upon failures.
STATEFUL task which is using persistent volume and is long running
DAEMON task which has one instance running on each host for infra agents like muttley, m3collector etc.
Used in:
UpdateTaskStateEntry is the entry for UpdateTaskState. The request will have a list of UpdateTaskStateEntry.
Used in:
Peloton Task ID
Mesos task ID for this instance
Desired state for the resource manager task