Collection events apply to the collection as a whole.
Common fields shared between instances and collections.
Timestamp, in microseconds since the start of the trace.
What type of event is this?
The identity of the collection.
How latency-sensitive is the collection?
Was there any missing data? If so, why?
What type of collection is this?
Cluster-level scheduling priority for the collection.
The ID of the alloc set that this job is to run in, or NO_ALLOC_COLLECTION (only for jobs).
The user who runs the collection.
Obfuscated name of the collection.
Obfuscated logical name of the collection.
ID of the collection that this is a child of. (Used for stopping a collection when the parent terminates.)
IDs of collections that must finish before this collection may start.
Maximum number of instances of this collection that may be placed on one machine (or 0 if unlimited).
Maximum number of instances of this collection that may be placed on machines connected to a single Top of Rack switch (or 0 if unlimited).
How/whether vertical scaling should be done for this collection.
The preferred cluster scheduler to use.
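The two per-machine and per-switch limits above treat 0 as "unlimited". A minimal sketch of how a trace consumer might check them is shown here; the function name `may_place`, its parameters, and the use of counters keyed by machine and switch ID are illustrative assumptions, not part of the schema or of any scheduler implementation.

```python
from collections import Counter


def may_place(max_per_machine, max_per_switch, machine_id, switch_id,
              per_machine, per_switch):
    """Return True if one more instance fits under both placement limits.

    A limit of 0 means "unlimited", as stated in the field descriptions.
    per_machine / per_switch are hypothetical counters of instances of this
    collection already placed on each machine / Top of Rack switch.
    """
    if max_per_machine != 0 and per_machine[machine_id] >= max_per_machine:
        return False
    if max_per_switch != 0 and per_switch[switch_id] >= max_per_switch:
        return False
    return True


# Hypothetical current placement of one collection.
per_machine = Counter({"m1": 2})
per_switch = Counter({"tor-a": 5})

unlimited = may_place(0, 0, "m1", "tor-a", per_machine, per_switch)
machine_full = may_place(2, 0, "m1", "tor-a", per_machine, per_switch)
switch_full = may_place(3, 5, "m1", "tor-a", per_machine, per_switch)
```

A `Counter` returns 0 for unseen keys, so machines or switches with no instances yet need no special handling.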
Collections are either jobs (which have tasks) or alloc sets (which have alloc instances).
Values used to indicate "not present" for special cases.
The thing is not bound to a machine.
The thing is bound to a dedicated machine.
The thing is not running in an alloc set.
The thing does not have an alloc instance index.
This enum is used in the 'type' field of the CollectionEvent and InstanceEvent tables.
The collection or instance was submitted to the scheduler for scheduling.
The collection or instance was marked not eligible for scheduling by the batch scheduler.
The collection or instance became eligible for scheduling.
The collection or instance started running.
The collection or instance was descheduled because of a higher priority collection or instance, or because the scheduler overcommitted resources.
The collection or instance was descheduled due to a failure.
The collection or instance completed normally.
The collection or instance was cancelled by the user or because a depended-upon collection died.
The collection or instance was presumably terminated, but due to missing data there is insufficient information to identify when or how.
The collection or instance was updated (scheduling class or resource requirements) while it was waiting to be scheduled.
The collection or instance was updated while it was scheduled somewhere.
Instance and collection events both share a common prefix, followed by specific fields.
Information about an instance event (task or alloc instance).
Common fields shared between instances and collections.
Timestamp, in microseconds since the start of the trace.
What type of event is this?
The identity of the collection that this instance is part of.
How latency-sensitive is the instance?
Was there any missing data? If so, why?
What type of collection this instance belongs to.
Cluster-level scheduling priority for the instance.
(Tasks only) The ID of the alloc set that this task is running in, or NO_ALLOC_COLLECTION if it is not running in an alloc.
Begin: fields specific to instances.
The index of the instance in its collection (starts at 0).
The ID of the machine on which this instance is placed (or NO_MACHINE if not placed on one, or DEDICATED_MACHINE if it's on a dedicated machine).
(Tasks only) The index of the alloc instance that this task is running in, or NO_ALLOC_INDEX if it is not running in an alloc.
The resources requested when the instance was submitted or last updated.
Currently active scheduling constraints.
Information about resource consumption (usage) during a sample window (which is typically 300s, but may be shorter if the instance started and/or ended during a measurement window).
Sample window end points, in microseconds since the start of the trace.
ID of collection that this instance belongs to.
Index of this instance's position in that collection (starts at 0).
Unique ID of the machine on which the instance has been placed.
ID and index of the alloc collection + instance in which this instance is running, or NO_ALLOC_COLLECTION / NO_ALLOC_INDEX if it is not running inside an alloc.
Type of the collection that this instance belongs to.
Average (mean) usage over the measurement period.
Observed maximum usage over the measurement period. This measurement may be fully or partially missing in some cases.
Observed CPU usage during a randomly-sampled second within the measurement window. (No memory data is provided here.)
The memory limit imposed on this instance; normally, it will not be allowed to exceed this amount of memory.
Amount of memory that is used for the instance's file page cache in the OS kernel.
Average (mean) number of processor and memory cycles per instruction.
The average (mean) number of data samples collected per second (e.g., sample_rate=0.5 means a sample every 2 seconds on average).
CPU usage percentile data. The cpu_usage_distribution vector contains 11 elements, representing 0%ile (aka min), 10%ile, 20%ile, ... 90%ile, 100%ile (aka max) of the normalized CPU usage in NCUs. Note that the 100%ile may not exactly match the maximum_usage value because of interpolation effects.
The tail_cpu_usage_distribution vector contains 9 elements, representing 91%ile, 92%ile, 93%ile, ... 98%ile, 99%ile of the normalized CPU resource usage in NCUs.
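The usage fields above can be tied together with a little arithmetic. The sketch below converts the sample window end points from microseconds to seconds, estimates the number of samples from sample_rate, and attaches percentile labels to the two distribution vectors. The helper names and the `start_us` / `end_us` parameter names are assumptions made for illustration. The enumerated percentiles 0, 10, ..., 100 number eleven, so the sketch expects eleven values in cpu_usage_distribution and nine (91 through 99) in tail_cpu_usage_distribution.

```python
MAIN_PERCENTILES = list(range(0, 101, 10))  # 0, 10, ..., 100 (11 values)
TAIL_PERCENTILES = list(range(91, 100))     # 91, 92, ..., 99 (9 values)


def window_seconds(start_us, end_us):
    """Length of a sample window in seconds, given microsecond end points."""
    return (end_us - start_us) / 1_000_000


def approx_sample_count(sample_rate, start_us, end_us):
    """Approximate number of samples: mean samples per second times seconds."""
    return sample_rate * window_seconds(start_us, end_us)


def label_percentiles(cpu_usage_distribution, tail_cpu_usage_distribution):
    """Map percentile -> NCU value, merging the main and tail vectors."""
    if len(cpu_usage_distribution) != len(MAIN_PERCENTILES):
        raise ValueError("expected 11 values in cpu_usage_distribution")
    if len(tail_cpu_usage_distribution) != len(TAIL_PERCENTILES):
        raise ValueError("expected 9 values in tail_cpu_usage_distribution")
    labelled = dict(zip(MAIN_PERCENTILES, cpu_usage_distribution))
    labelled.update(zip(TAIL_PERCENTILES, tail_cpu_usage_distribution))
    return dict(sorted(labelled.items()))


# A typical 300 s window with sample_rate = 0.5 (one sample every 2 s).
duration = window_seconds(0, 300_000_000)
samples = approx_sample_count(0.5, 0, 300_000_000)
```

Because the 100%ile may differ from maximum_usage through interpolation, the sketch keeps the two separate rather than reconciling them.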
How latency-sensitive a thing is to CPU scheduling delays when running on a machine, in increasing-sensitivity order. Note that this is _not_ the same as the thing's cluster-scheduling priority although latency-sensitive things do tend to have higher priorities.
Also known as "best effort".
Often used for batch jobs.
Used for latency-sensitive jobs.
Used for the most latency-sensitive jobs.
A machine attribute update or (if time = 0) its initial value.
Timestamp, in microseconds since the start of the trace. [key]
Unique ID of the machine within the cluster. [key]
Obfuscated unique name of the attribute (unique across all clusters). [key]
Value of the attribute. If this is unset, then 'deleted' must be true.
True if the attribute is being deleted at this time.
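The value/deleted rule above lends itself to a small replay routine. The following is a sketch only: the function name, the tuple layout, and the dictionary keyed by (machine ID, attribute name) are assumptions chosen for illustration, and an unset value is represented as `None`.

```python
def apply_attribute_update(state, machine_id, name, value, deleted):
    """Apply one machine attribute update to `state` in place.

    state maps (machine_id, attribute_name) -> value.
    An unset value (None here) is only valid when `deleted` is true.
    """
    if value is None and not deleted:
        raise ValueError("value is unset, so 'deleted' must be true")
    if deleted:
        state.pop((machine_id, name), None)
    else:
        state[(machine_id, name)] = value


# Hypothetical updates as (time_us, machine_id, name, value, deleted).
# Entries with time 0 carry the initial values.
updates = [
    (0, 17, "attr_a", "4", False),
    (0, 17, "attr_b", "x", False),
    (5_000_000, 17, "attr_a", "8", False),
    (9_000_000, 17, "attr_b", None, True),
]

state = {}
for time_us, machine_id, name, value, deleted in sorted(updates, key=lambda u: u[0]):
    apply_attribute_update(state, machine_id, name, value, deleted)
```

After the replay, only `attr_a` remains for machine 17, holding its most recent value.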
A constraint represents a request for a thing to be placed on a machine (or machines) with particular attributes.
Obfuscated name of the constraint.
Target value for the constraint (e.g., a minimum or equality).
Comparison operator.
Comparison operation between the supplied value and the machine's value. For EQUAL and NOT_EQUAL relationships, the comparison is a string comparison; for LESS_THAN, GREATER_THAN, etc., the values are converted to floating point numbers first; for PRESENT and NOT_PRESENT, the test is merely whether the supplied attribute exists for the machine in question, and the value field of the constraint is ignored.
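The comparison rules above can be sketched as a single function. Several points are assumptions rather than facts from the schema: the operators are passed as strings; the names LESS_THAN_EQUAL and GREATER_THAN_EQUAL stand in for the "etc." in the description; ordering comparisons are read as "machine value OP constraint value"; and a machine that lacks the attribute is treated as failing every operator except NOT_PRESENT.

```python
import operator

# Ordering operators compare floating point values.
# The *_EQUAL names are assumed; the source only says "etc.".
_NUMERIC_OPS = {
    "LESS_THAN": operator.lt,
    "GREATER_THAN": operator.gt,
    "LESS_THAN_EQUAL": operator.le,
    "GREATER_THAN_EQUAL": operator.ge,
}


def constraint_satisfied(op, constraint_value, machine_value):
    """Check one constraint against a machine's attribute value.

    machine_value is None when the machine does not have the attribute.
    """
    if op == "PRESENT":
        return machine_value is not None   # constraint value is ignored
    if op == "NOT_PRESENT":
        return machine_value is None       # constraint value is ignored
    if machine_value is None:
        return False                       # assumption: absent attribute fails
    if op == "EQUAL":
        return machine_value == constraint_value   # string comparison
    if op == "NOT_EQUAL":
        return machine_value != constraint_value   # string comparison
    if op in _NUMERIC_OPS:
        return _NUMERIC_OPS[op](float(machine_value), float(constraint_value))
    raise ValueError(f"unknown comparison operator: {op}")


# Numeric conversion matters: as strings "10" sorts before "9",
# but as numbers 10.0 is greater than 9.0.
numeric_gt = constraint_satisfied("GREATER_THAN", "9", "10")
# Equality is a string comparison, so "1.0" and "1" differ.
string_eq = constraint_satisfied("EQUAL", "1", "1.0")
```

Keeping the string and numeric paths separate mirrors the description: "1.0" and "1" are unequal under EQUAL even though they are numerically the same.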
Machine events describe the addition, removal, or update (change) of a machine in the cluster at a particular time.
Timestamp, in microseconds since the start of the trace. [key]
Unique ID of the machine within the cluster. [key]
Specifies the type of event.
Obfuscated name of the Top of Rack switch that this machine is attached to.
Available resources that the machine supplies. (Note: may be smaller than the physical machine's raw capacity.)
An obfuscated form of the machine platform (microarchitecture + motherboard design).
Did we detect possibly-missing data?
Should never happen :-).
Machine added to the cluster.
Machine removed from cluster (usually due to failure or repairs).
Machine capacity updated (while not removed).
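One way to use machine events is to replay them in timestamp order to recover which machines are in the cluster and with what capacity. The sketch below assumes string event types named ADD, REMOVE, and UPDATE; these names, the tuple layout, and the capacity dictionaries are illustrative and do not come from the descriptions above.

```python
def replay_machine_events(events):
    """Return {machine_id: capacity} after applying events in time order.

    events: iterable of (time_us, machine_id, event_type, capacity) tuples.
    Event type names here are hypothetical stand-ins for the enum values.
    """
    machines = {}
    for time_us, machine_id, event_type, capacity in sorted(events, key=lambda e: e[0]):
        if event_type == "ADD":
            machines[machine_id] = capacity
        elif event_type == "REMOVE":
            machines.pop(machine_id, None)
        elif event_type == "UPDATE":
            # Capacity updates only apply while the machine is not removed.
            if machine_id in machines:
                machines[machine_id] = capacity
        else:
            raise ValueError(f"unexpected machine event type: {event_type}")
    return machines


# Hypothetical events; capacities are normalized CPU and memory values.
events = [
    (0, 1, "ADD", {"cpus": 0.5, "memory": 0.25}),
    (0, 2, "ADD", {"cpus": 1.0, "memory": 1.0}),
    (7_000_000, 1, "UPDATE", {"cpus": 0.5, "memory": 0.5}),
    (9_000_000, 2, "REMOVE", None),
]
cluster = replay_machine_events(events)
```

Sorting on the timestamp alone keeps events with equal timestamps in their input order, since Python's sort is stable.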
If we detect that data is missing, how do we know this?
No data is missing.
We observed that a change to the state of a machine must have occurred from an internal state snapshot, but did not see a corresponding transition event during the trace.
Represents reasons why we synthesized a scheduler event to replace apparently missing data.
No data was missing.
A common structure for CPU and memory resource units. All resource measurements are normalized and scaled.
Normalized GCUs (NCUs).
Normalized RAM bytes.
Represents the type of scheduler that is handling a job.
Handled by the default cluster scheduler.
Handled by a secondary scheduler, optimized for batch loads.
How the collection is vertically auto-scaled.
We were unable to determine the setting.
Vertical scaling was disabled, e.g., in the collection creation request.
Vertical scaling was enabled, with user-supplied lower and/or upper bounds for GCU and/or RAM.
Vertical scaling was enabled, with no user-provided bounds.