A segment of a lane with a given adjacent boundary.
Used in:
The index into the lane's polyline where this lane boundary starts.
The index into the lane's polyline where this lane boundary ends.
The adjacent boundary feature ID of the MapFeature for the boundary. This can either be a RoadLine feature or a RoadEdge feature.
The adjacent boundary type. If the boundary is a road edge instead of a road line, this will be set to TYPE_UNKNOWN.
Box coordinates in image frame.
Dimensions of the box. length: dim x. width: dim y.
The heading of the bounding box (in radians). The heading is the angle required to rotate +x to the surface normal of the box front face. It is normalized to [-pi, pi).
Used in:
Box coordinates in image frame.
Dimensions of the box. length: dim x. width: dim y.
The heading of the bounding box (in radians). The heading is the angle required to rotate +x to the surface normal of the box front face. It is normalized to [-pi, pi).
A breakdown generator defines a way to shard a set of objects such that users can compute metrics for different subsets of objects. Each breakdown generator comes with a unique breakdown generator ID.
Used in:
The breakdown generator ID.
The breakdown generator shard.
The difficulty level.
Used in:
Everything is in one shard.
Shard by object types.
Shard by box center distance.
Shard by the time of day at which the scene occurs.
Shard by location of the scene.
Shard by the weather of the scene.
Shard by object velocity.
All types except SIGN. This is NOT the same as ALL_NS in the leaderboard!! ALL_NS in the leaderboard is the mean of VEHICLE, PED, CYCLIST metrics.
Shard by the object size (the max of length, width, height).
Shard by the corresponding camera.
Used in:
1d array of [f_u, f_v, c_u, c_v, k1, k2, p1, p2, k3]. Note that this intrinsic corresponds to the images after scaling. Camera model: pinhole camera. Lens distortion: radial distortion coefficients k1, k2, k3; tangential distortion coefficients p1, p2. k1, k2, k3, p1, p2 follow the same definitions as OpenCV. https://en.wikipedia.org/wiki/Distortion_(optics) https://docs.opencv.org/2.4/doc/tutorials/calib3d/camera_calibration/camera_calibration.html
Camera frame to vehicle frame.
Camera image size.
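A minimal sketch (assuming numpy; not part of the dataset tooling) of unpacking the intrinsic vector described above into a 3x3 pinhole camera matrix and OpenCV-ordered distortion coefficients:

import numpy as np

def unpack_intrinsics(intrinsic):
    """intrinsic: [f_u, f_v, c_u, c_v, k1, k2, p1, p2, k3] as described above."""
    f_u, f_v, c_u, c_v, k1, k2, p1, p2, k3 = intrinsic
    camera_matrix = np.array([[f_u, 0.0, c_u],
                              [0.0, f_v, c_v],
                              [0.0, 0.0, 1.0]])
    # OpenCV orders distortion coefficients as (k1, k2, p1, p2, k3).
    distortion = np.array([k1, k2, p1, p2, k3])
    return camera_matrix, distortion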
Used in:
All timestamps in this proto are represented as seconds since Unix epoch.
Used in:
JPEG image.
SDC pose.
SDC velocity at 'pose_timestamp' below. The velocity value is represented in the *global* frame. With this velocity, the pose can be extrapolated (see the sketch below):
r(t+dt) = r(t) + dr/dt * dt, where dr/dt = v_{x,y,z}.
dR(t)/dt = W*R(t), where W = SkewSymmetric(w_{x,y,z}).
This differential equation solves to R(t) = exp(Wt)*R(0) if W is constant. When dt is small: R(t+dt) = (I + W*dt)R(t).
Here r(t) = (x(t), y(t), z(t)) is the vehicle location at time t in the global frame, R(t) is the 3x3 rotation matrix from the body frame to the global frame at time t, and SkewSymmetric(x,y,z) is the cross-product matrix defined in: https://en.wikipedia.org/wiki/Cross_product#Conversion_to_matrix_multiplication
Timestamp of the `pose` above.
Rolling shutter params. The following explanation assumes left->right rolling shutter. Rolling shutter cameras expose and read the image column by column, offset by the read out time for each column. The desired timestamp for each column is the middle of the exposure of that column as outlined below for an image with 3 columns:
------time------>
|---- exposure col 1----| read |
-------|---- exposure col 2----| read |
--------------|---- exposure col 3----| read |
^trigger time                                    ^readout end time
^time for row 1 (= middle of exposure of row 1)
^time image center (= middle of exposure of middle row)
Shutter duration in seconds. Exposure time per column.
Time when the sensor was triggered and when last readout finished. The difference between trigger time and readout done time includes the exposure time and the actual sensor readout time.
Panoptic segmentation labels for this camera image. NOTE: Not every image has panoptic segmentation labels.
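A minimal sketch (assuming numpy) of the small-dt pose extrapolation given above for the velocity field: r(t+dt) = r(t) + v*dt and R(t+dt) = (I + W*dt)R(t), with W the skew-symmetric cross-product matrix of the angular velocity.

import numpy as np

def skew_symmetric(w):
    """Cross-product matrix of w = (wx, wy, wz)."""
    wx, wy, wz = w
    return np.array([[0.0, -wz,  wy],
                     [ wz, 0.0, -wx],
                     [-wy,  wx, 0.0]])

def extrapolate_pose(r, R, v, w, dt):
    """r: position (3,), R: body-to-global rotation (3, 3), v/w: linear/angular velocity."""
    r_next = np.asarray(r) + np.asarray(v) * dt
    R_next = (np.eye(3) + skew_symmetric(w) * dt) @ R
    return r_next, R_next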
The camera labels associated with a given camera image. This message indicates the ground truth information for the camera image recorded by the given camera. If there are no labeled objects in the image, then the labels field is empty.
Used in:
(message has no fields)
Used in:
Semantic classes for the camera segmentation labels.
(message has no fields)
Anything that does not fit the other classes or is too ambiguous to label.
The Waymo vehicle.
Small vehicle such as a sedan, SUV, pickup truck, minivan or golf cart.
Large vehicle that carries cargo.
Large vehicle that carries more than 8 passengers.
Large vehicle that is not a truck or a bus.
Bicycle with no rider.
Motorcycle with no rider.
Trailer attached to another vehicle or horse.
Pedestrian. Does not include objects associated with the pedestrian, such as suitcases, strollers or cars.
Bicycle with rider.
Motorcycle with rider.
Birds, including ones on the ground.
Animal on the ground such as a dog, cat, cow, etc.
Cone or short pole related to construction.
Permanent horizontal and vertical lamp pole, traffic sign pole, etc.
Large object carried/pushed/dragged by a pedestrian.
Sign related to traffic, including front and back facing signs.
The box that contains traffic lights regardless of front or back facing.
Permanent building and walls, including solid fences.
Drivable road with proper markings, including parking lots and gas stations.
Marking on the road that is parallel to the ego vehicle and defines lanes.
All markings on the road other than lane markers.
Paved walkable surface for pedestrians, including curbs.
Vegetation including tree trunks, tree branches, bushes, tall grasses, flowers and so on.
The sky, including clouds.
Other horizontal surfaces that are drivable or walkable.
Object that is not permanent in its current position and does not belong to any of the above classes.
Object that is permanent in its current position and does not belong to any of the above classes.
Used in:
Segmentation label for a camera.
These must be set when evaluating on the leaderboard. This should be set to Context.name defined in dataset.proto::Context.
This should be set to Frame.timestamp_micros defined in dataset.proto::Frame.
The camera associated with this label.
Used in:
Panoptic (instance + semantic) segmentation labels for a given camera image. Associations can also be provided between each instance ID and a globally unique ID across all frames.
Used in:
The value used to separate instance_ids from different semantic classes. See the panoptic_label field for how this is used. Must be set to be greater than the maximum instance_id.
A uint16 png encoded image, with the same resolution as the corresponding camera image. Each pixel contains a panoptic segmentation label, which is computed as: semantic_class_id * panoptic_label_divisor + instance_id. We set instance_id = 0 for pixels for which there is no instance_id. NOTE: Instance IDs in this label are only consistent within this camera image. Use instance_id_to_global_id_mapping to get cross-camera consistent instance IDs.
The sequence id for this label. The above instance_id_to_global_id_mapping is only valid with other labels with the same sequence id.
A uint8 png encoded image, with the same resolution as the corresponding camera image. The value on each pixel indicates the number of cameras that overlap with this pixel. Used for the weighted Segmentation and Tracking Quality (wSTQ) metric.
A mapping between each panoptic label with an instance_id and a globally unique ID across all frames within the same sequence. This can be used to match instances across cameras and over time. i.e. instances belonging to the same object will map to the same global ID across all frames in the same sequence. NOTE: These unique IDs are not consistent with other IDs in the dataset, e.g. the bounding box IDs.
Used in:
If false, the corresponding instance will not have consistent global ids between frames.
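A minimal sketch (assuming numpy) of decoding the panoptic_label encoding described above, semantic_class_id * panoptic_label_divisor + instance_id, once the uint16 PNG has been decoded into an integer array:

import numpy as np

def decode_panoptic_label(panoptic_label, panoptic_label_divisor):
    """panoptic_label: integer array decoded from the uint16 PNG."""
    panoptic_label = np.asarray(panoptic_label)
    semantic_class = panoptic_label // panoptic_label_divisor
    instance_id = panoptic_label % panoptic_label_divisor  # 0 means no instance
    return semantic_class, instance_id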
Panoptic segmentation metrics. Weighted Segmentation and Tracking Quality (wSTQ).
weighted Association Quality.
mean Intersection over Union.
User reported number of frames between inference runs.
Runtime for the method in milliseconds.
Next ID: 10.
This must be set as the full email used to register at waymo.com/open.
This name needs to be short, descriptive and unique. Only the latest result of the method from a user will show up on the leaderboard.
Link to paper or other link that describes the method.
The number of frames skipped between each prediction during inference. Usually 0 (inference on every frame) or 1 (inference on every other frame). e.g. the validation and test groundtruth is provided with frame_dt = 1.
(Optional) The time for the method to run in ms.
Inference results.
Camera tokens for a single camera sensor.
Used in:
Camera sensor name.
Camera tokens are a sequence of integers corresponding to codebook indices.
A set of predictions for a single scenario.
Used in:
The unique ID of the scenario being predicted. This ID must match the scenario_id field in the test or validation set tf.Example or scenario proto corresponding to this set of predictions.
The predictions for the scenario. For the motion prediction challenge, populate the predictions field. For the interaction prediction challenge, populate the joint_predictions_field.
Single object predictions. This must be populated for the motion prediction challenge.
Joint predictions for the interacting objects. This must be populated for the interaction prediction challenge.
Lidar data of a frame.
Used in:
The Lidar data for each timestamp.
Laser calibration data has the same length as that of lasers.
Poses of the SDC corresponding to the track states for each step in the scenario, similar to the one in the Frame proto.
Compressed Laser data.
Used in:
Range image is a 2d tensor. The first dimension (rows) represents pitch. The second dimension (columns) represents yaw. Zlib compressed range images include the raw range image: a range image with a non-empty 'range_image_pose_delta_compressed', which gives the vehicle pose of each range image cell. NOTE: 'range_image_pose_delta_compressed' is only populated for the first range image return. The second return has exactly the same range image pose as the first one.
Used in:
Zlib compressed [H, W, 4] serialized DeltaEncodedData message version which stores MatrixFloat.
  MatrixFloat range_image;
  range_image.ParseFromString(val);
Inner dimensions are:
  * channel 0: range
  * channel 1: intensity
  * channel 2: elongation
  * channel 3: is in any no label zone.
Zlib compressed [H, W, 6] serialized DeltaEncodedData message version which stores MatrixFloat. To decompress (please see the documentation for lidar delta encoding):
  string val = delta_encoder.decompress(range_image_pose_compressed);
  MatrixFloat range_image_pose;
  range_image_pose.ParseFromString(val);
Inner dimensions are [roll, pitch, yaw, x, y, z], representing a transform from vehicle frame to global frame for every range image pixel. This is ONLY populated for the first return. The second return is assumed to have exactly the same range_image_pose_compressed. The roll, pitch and yaw are specified as 3-2-1 Euler angle rotations, meaning that rotating from the navigation to vehicle frame consists of a yaw, then pitch and finally roll rotation about the z, y and x axes respectively. All rotations use the right hand rule and are positive in the counter clockwise direction.
Configuration to compute detection/tracking metrics.
Score cutoffs used to remove predictions with lower Object::score during matching in order to compute precision-recall pairs at different operating points.
If `score_cutoffs` above is not set, the cutoffs are generated based on the score distributions in the predictions and produce `num_desired_score_cutoffs`. NOTE: this field is to be deprecated. Manually set score_cutoffs above to [0:0.01:1]. TODO: clean this up.
Breakdown generator IDs. Note that users only need to specify the IDs but NOT other information about this generator such as number of shards.
This has the same size as breakdown_generator_ids. Each entry indicates the set of difficulty levels to be considered for each breakdown generator.
Indexed by label type. Size = Label::TYPE_MAX+1. The thresholds must be within [0.0, 1.0].
Desired recall delta when sampling the P/R curve to compute mean average precision.
Users do not need to modify the following features. If set, all precisions below this value are considered 0.
Any matching with a heading accuracy lower than this is considered a false match.
When enabled, the details in the matching such as index of the false positives, false negatives or true positives will be included.
Longitudinal error tolerant (LET) metrics config for Camera-Only (Mono) 3D Detection. By enabling this metric, the prediction-groundtruth matching becomes more tolerant to longitudinal noise rather than relying only on IoU. The tolerance is larger at long range, but only along the line of sight from the sensor origin.
Used in:
When enabled, calculate the longitudinal error tolerant 3D AP (LET-3D-AP).
Location of the sensor used to infer the predictions (e.g., camera). The location is relative to the vehicle origin. It is used to translate the centers of prediction and ground truth boxes to the sensor coordinate system so that the range to the sensor origin can be calculated correctly.
The percentage of allowed longitudinal error for a given ground truth object. The final longitudinal tolerance tol_lon in meters, given a ground truth object with range r_gt, is computed as: tol_lon = max(longitudinal_tolerance_percentage * r_gt, min_longitudinal_tolerance_meter), where min_longitudinal_tolerance_meter is introduced to handle near-range ground truth objects so that they have a minimum longitudinal error tolerance in meters. A prediction bounding box can be matched with a ground truth bounding box only if the range error between them is less than the tolerance.
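A minimal sketch of the tolerance formula above (the numeric values in the example calls are illustrative, not the challenge defaults):

def longitudinal_tolerance(r_gt,
                           longitudinal_tolerance_percentage,
                           min_longitudinal_tolerance_meter):
    """tol_lon = max(longitudinal_tolerance_percentage * r_gt, minimum tolerance)."""
    return max(longitudinal_tolerance_percentage * r_gt,
               min_longitudinal_tolerance_meter)

# Example with a 10% tolerance and a 0.5 m floor (assumed values):
print(longitudinal_tolerance(40.0, 0.1, 0.5))  # 4.0 m at 40 m range
print(longitudinal_tolerance(3.0, 0.1, 0.5))   # 0.5 m floor at 3 m range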
Describes how a prediction box aligns with a ground truth box to minimize the longitudinal error.
Used in:
No alignment is performed.
The center of the prediction box moves along the line of sight such that it has the closest distance to the center of the ground truth box.
The center of the prediction box moves to the center of the ground truth box, which means no localization error after alignment.
The center of the prediction box moves along the line of sight such that it has the closest distance to the center of the ground truth box. Same as `TYPE_RANGE_ALIGNED` except this only applies if the prediction is beyond the ground truth. Example: given O is sensor origin, G ground truth center, and P prediction center (O -> G [P]) P will only be moved if it is beyond G in reference to O.
The center of the prediction box moves along the line of sight such that it has the closest distance to the center of the ground truth box. Same as `TYPE_RANGE_ALIGNED` except this only applies if the prediction is before the ground truth in reference to the sensor origin. Example: given O is the sensor origin, G the ground truth center, and P(1/2) the prediction center ([P1] O -> [P2] -> G), P will only be moved if it is before G in reference to O.
The center of the prediction box moves along the line of sight such that it has the closest distance to the center of the ground truth box. Same as `TYPE_RANGE_ALIGNED` except this only applies if the prediction is between the sensor origin and ground truth. Example: given O is sensor origin, G ground truth center, and P prediction center (O -> [P] -> G ) P will only be moved if it is between G and O.
Location in 3D space described in a Cartesian coordinate system.
Used in:
Used in:
A unique name that identifies the frame sequence.
Some stats for the run segment used.
Used in:
Day, Dawn/Dusk, or Night, determined from sun elevation.
Human readable location (e.g. CHD, SF) of the run segment.
Currently either Sunny or Rain.
Used in:
The number of unique objects with the type in the segment.
Used in:
The polygon defining the outline of the crosswalk. The polygon is assumed to be closed (i.e. a segment exists between the last point and the first point).
Delta encoded data structure. The metadata, data, mask, and residuals are concatenated and compressed via zlib: compressed_bytes = zlib.compress(metadata + data_bytes + mask_bytes + residuals_bytes). The range_image_delta_compressed and range_image_pose_delta_compressed fields in CompressedRangeImage are both encoded using this method.
Used in:
Number of false positives.
Number of true positives.
Number of false negatives.
If set, will include the ids of the fp/tp/fn objects. Each element corresponds to one frame of matching.
Sum of heading accuracy (ha) for all TPs.
Sum of longitudinal affinity for all TPs.
The score cutoff used to compute this measurement. Optional.
Detailed information regarding the results.
Used in:
False positive prediction ids.
False negative ground truth ids.
True positive ground truth ids. Should have the same length as tp_pr_ids and tp_ious. Each pair of ids at the same index corresponds to a matched ground truth object and prediction object.
True positive prediction ids.
IoU values of the true positive pairs.
Heading accuracies of the true positive pairs.
Longitudinal affinities of the true positive pairs.
Used in:
The breakdown the detection measurements are computed for.
Heading accuracy weighted mean average precision.
Longitudinal affinity weighted mean average precision.
The breakdown the detection metrics are computed for.
Raw measurements.
A set of difficulty levels.
Used in:
If no levels are set, the highest difficulty level is assumed.
Used in:
The polygon defining the outline of the driveway region. The polygon is assumed to be closed (i.e. a segment exists between the last point and the first point).
The dynamic map information at a single time step.
Used in:
The traffic signal states for all observed signals at this time step.
Used in:
The timestamp associated with the dynamic feature data.
The set of traffic signal states for the associated time step.
Message packaging a full submission to the challenge.
The set of trajectories to evaluate. One entry should exist for every frame in the test set.
Identifier of the submission type. Has to be set for the submission to be valid.
This must be set as the full email used to register at waymo.com/open.
This name needs to be short, descriptive and unique. Only the latest result of the method from a user will show up on the leaderboard.
Author information.
A brief description of the method.
Link to paper or other link that describes the method.
Set this to true if your model used publicly available open-source LLM/VLM(s) for pre-training. This field is now REQUIRED for a valid submission.
If any open-source model was used, specify their names and configuration.
Specify an estimate of the number of parameters of the model used to generate this submission. The number must be specified as an integer number followed by a multiplier suffix (from the set [K, M, B, T, ...], e.g. "200K"). This field is now REQUIRED for a valid submission.
The challenge submission type.
Used in:
A submission for the Waymo open dataset end-to-end driving challenge.
This proto contains the Waymo Open Dataset End-to-End Driving (E2ED) data format.
WOD frame object populated with camera image, calibration, and metadata. Populated fields:
frame.context
  .name = unique identifier for this frame.
  .camera_calibrations = calibration metadata for all cameras.
  All other fields in `frame.context` are unused.
frame.timestamp_micros = current frame timestamp.
frame.images = camera images.
All other fields in `frame` are unused.
For details about frame.context.camera_calibrations and frame.images, see the CameraCalibration and CameraImage protos.
t = (0, 5s] future log states at 4Hz. Only position fields are populated. Future position x,y coords are used as prediction targets. z coords are included for visualization, but are not used as prediction targets.
t = (-4s, 0] past history states at 4Hz.
Driving intent of the ego-vehicle at this timestep.
Future trajectories with human-labeled rater scores. Only x,y position fields are populated, along with the rated score. This field is valid for only a subset of frames. For these frames, there are up to 3 rated trajectories. In all other frames, this field is marked as invalid with assigned rater scores of -1 or left empty. Valid scores range from [0, 10].
The final score averaged over all scenario clusters.
Rater feedback scores for each scenario cluster.
First, we compute per frame ADE using the ground truth trajectory with the highest rater score. Then, we average the ADE scores over all frames in the test set.
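A hedged sketch (assuming numpy) of the per-frame ADE described above: the mean Euclidean distance between predicted and ground-truth future (x, y) positions, computed against the rated trajectory with the highest score and then averaged over all frames.

import numpy as np

def average_displacement_error(pred_xy, gt_xy):
    """pred_xy, gt_xy: [num_future_steps, 2] arrays of (x, y) positions."""
    errors = np.linalg.norm(np.asarray(pred_xy) - np.asarray(gt_xy), axis=-1)
    return float(np.mean(errors))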
(message has no fields)
Driving intent of the ego-vehicle at a given timestep.
Used in:
Used in:
Position in meters. Right-handed coordinate system. +x = forward, +y = left, +z = up. The origin (0, 0, 0) is at the middle of the ego vehicle's rear axle.
Velocity in m/s.
Acceleration in m/s^2.
Only populated for trajectories with human-labeled scores. Valid scores range from [0, 10], inclusive.
Used in:
This context is the same for all frames belonging to the same driving run segment. Use context.name to identify frames belonging to the same driving segment. We do not store all frames from one driving segment in one proto to avoid huge protos.
Frame start time, which is the timestamp of the first top LiDAR scan within this frame. Note that this timestamp does not correspond to the provided vehicle pose (pose).
Frame vehicle pose. Note that unlike in CameraImage, the Frame pose does not correspond to the provided timestamp (timestamp_micros). Instead, it roughly (but not exactly) corresponds to the vehicle pose in the middle of the given frame. The frame vehicle pose defines the coordinate system which the 3D laser labels are defined in.
The camera images.
The LiDAR sensor data.
Native 3D labels that correspond to the LiDAR sensor data. The 3D labels are defined w.r.t. the frame vehicle pose coordinate system (pose).
The native 3D LiDAR labels (laser_labels) projected to camera images. A projected label is the smallest image axis aligned rectangle that can cover all projected points from the 3d LiDAR label. The projected label is ignored if the projection is fully outside a camera image. The projected label is clamped to the camera image if it is partially outside.
Native 2D camera labels. Note that if a camera identified by CameraLabels.name has an entry in this field, then it has been labeled, even though it is possible that there are no labeled objects in the corresponding image, which is identified by a zero sized CameraLabels.labels.
No label zones in the *global* frame.
Map features. Only the first frame in a segment will contain map data. This field will be empty for other frames as the map is identical for all frames.
Map pose offset. This offset must be added to lidar points from this frame to compensate for pose drift and align with the map features.
Camera tokens for all sensors of a frame.
Used in:
Camera tokens for all sensors in a frame.
Used in:
The unique identifier for this frame. This should match the name field in the Context proto (E2EDFrame.frame.context.name).
The ego-vehicle future trajectory prediction for this frame.
Used in:
A set of up to 6 predictions with varying confidences - all for the same pair of objects. All prediction entries must contain trajectories for the same set of objects or an error will be returned. Any joint predictions past the first six will be discarded.
Used in:
Collection of simulated object trajectories defining a full simulated scene. This needs to be the product of a joint simulation of all the included objects. An object is to be included if it is valid in the last history step of the original scenario (11th step).
A message containing a prediction for either a single object or a joint prediction for a set of objects.
Used in:
The trajectories for each object in the set being predicted. This may contain a single trajectory for a single object or a set of trajectories representing a joint prediction of a set of objects.
An optional confidence measure for this prediction. These should not be normalized across the set of trajectories.
Used in:
Object ID.
Difficulty level for detection problem.
Difficulty level for tracking problem.
The total number of lidar points in this box.
The total number of top lidar points in this box.
Used if the Label is a part of `Frame.laser_labels`.
Used if the Label is a part of `Frame.camera_labels`.
Used by Lidar labels to store in which camera it is mostly visible.
Used by Lidar labels to store a camera-synchronized box corresponding to the camera indicated by `most_visible_camera_name`. Currently, the boxes are shifted to the time when the most visible camera captures the center of the box, taking into account the rolling shutter of that camera.
Specifically, given the object box living at the start of the Open Dataset frame (t_frame) with center position (c) and velocity (v), we aim to find the camera capture time (t_capture) when the camera indicated by `most_visible_camera_name` captures the center of the object. To this end, we solve the rolling shutter optimization considering both ego and object motion:
  t_capture = image_column_to_time(
      camera_projection(c + v * (t_capture - t_frame),
                        transform_vehicle(t_capture - t_ref),
                        cam_params)),
where transform_vehicle(t_capture - t_frame) is the vehicle transform from a pose reference time t_ref to t_capture considering the ego motion, and cam_params is the camera extrinsic and intrinsic parameters.
We then move the label box to t_capture by updating the center of the box as follows:
  c_camera_synced = c + v * (t_capture - t_frame),
while keeping the box dimensions and heading direction. We use the camera_synced_box as the ground truth box for the 3D Camera-Only Detection Challenge. This makes the assumption that the users provide the detection at the same time as the most visible camera captures the object center.
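A minimal sketch (assuming numpy) of only the final center update above; solving the rolling shutter optimization for t_capture is not shown.

import numpy as np

def camera_synced_center(c, v, t_capture, t_frame):
    """c: box center at t_frame (3,), v: box velocity (3,), times in seconds."""
    # Box dimensions and heading are kept unchanged.
    return np.asarray(c) + np.asarray(v) * (t_capture - t_frame)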
Information to cross reference between labels for different modalities.
Used in:
Currently only CameraLabels with class `TYPE_PEDESTRIAN` store information about associated lidar objects.
Upright box, zero pitch and roll.
Used in:
Box coordinates in vehicle frame.
Dimensions of the box. length: dim x. width: dim y. height: dim z.
The heading of the bounding box (in radians). The heading is the angle required to rotate +x to the surface normal of the box front face. It is normalized to [-pi, pi).
Used in:
7-DOF 3D (a.k.a upright 3D box).
5-DOF 2D. Mostly used for laser top down representation.
Axis aligned 2D. Mostly used for image.
The difficulty level of this label. The higher the level, the harder it is.
Used in:
Used in:
Used in:
Used in:
The speed limit for this lane.
True if the lane interpolates between two other lanes.
The polyline data for the lane. A polyline is a list of points with segments defined between consecutive points.
A list of IDs for lanes that this lane may be entered from.
A list of IDs for lanes that this lane may exit to.
The boundaries to the left of this lane. There may be different boundary types along this lane. Each BoundarySegment defines a section of the lane with a given boundary feature to the left. Note that some lanes do not have any boundaries (i.e. lane centers in intersections).
The boundaries to the right of this lane. See left_boundaries for details.
A list of neighbors to the left of this lane. Neighbor lanes include only adjacent lanes going the same direction.
A list of neighbors to the right of this lane. Neighbor lanes include only adjacent lanes going the same direction.
Type of this lane.
Used in:
Used in:
The feature ID of the neighbor lane.
The self adjacency segment. The other lane may be a neighbor for only part of this lane. These indices define the points within this lane's polyline for which feature_id is a neighbor. If the lanes are neighbors at disjoint places (e.g., a median between them appears and then goes away), multiple neighbors will be listed. A lane change can only happen from this segment of this lane into the segment of the neighbor lane defined by neighbor_start_index and neighbor_end_index.
The neighbor adjacency segment. These indices define the valid portion of the neighbor lane's polyline where that lane is a neighbor to this lane. A lane change can only happen into this segment of the neighbor lane from the segment of this lane defined by self_start_index and self_end_index.
A list of segments within the self adjacency segment that have different boundaries between this lane and the neighbor lane. Each entry in this field contains the boundary type between this lane and the neighbor lane along with the indices into this lane's polyline where the boundary type begins and ends.
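A hedged sketch of using the adjacency segments described above: a lane change from a given point of this lane's polyline is only possible if that point index falls inside the neighbor's self adjacency segment (field names as in the descriptions above).

def can_change_into_neighbor(neighbor, polyline_index):
    """neighbor: a LaneNeighbor-like object with self_start_index/self_end_index."""
    return neighbor.self_start_index <= polyline_index <= neighbor.self_end_index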
Used in:
Used in:
If non-empty, the beam pitch (in radians) is non-uniform. When constructing a range image, this mapping is used to map from beam pitch to range image row. If this is empty, we assume a uniform distribution.
beam_inclination_{min,max} (in radians) are used to determine the mapping.
Lidar frame to vehicle frame.
'Laser' is used interchangeably with 'Lidar' in this file.
(message has no fields)
Used in:
The full set of map features.
A set of dynamic states per time step. These are ordered in consecutive time steps.
Used in:
A unique ID to identify this feature.
Type specific data.
Used in:
Position in meters. The origin is an arbitrary location.
Different types of matchers can be supported. Each matcher has a unique ID.
(message has no fields)
Used in:
The Hungarian algorithm based matching that maximizes the sum of IoUs of all matched pairs. Detection scores have no effect on this matcher. https://en.wikipedia.org/wiki/Hungarian_algorithm
A COCO-style matcher: matches detections (ordered by score) one by one to the ground truth with the largest IoU.
TEST ONLY.
Row-major matrix. Requires: data.size() = product(shape.dims()).
Used in:
Row-major matrix. Requires: data.size() = product(shape.dims()).
Used in:
Dimensions for the Matrix messages defined below. Must not be empty. The order of entries in 'dims' matters, as it indicates the layout of the values in the tensor in-memory representation. The first entry in 'dims' is the outermost dimension used to lay out the values; the last entry is the innermost dimension. This matches the in-memory layout of row-major matrices.
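A minimal sketch (assuming numpy) of the row-major layout described above: the flat data buffer is reshaped with the outermost dimension listed first in dims.

import numpy as np

dims = [2, 3]              # example MatrixShape.dims
data = [0, 1, 2, 3, 4, 5]  # len(data) == product(dims)
matrix = np.asarray(data).reshape(dims)
# matrix[0] == [0, 1, 2] and matrix[1] == [3, 4, 5] under row-major layout.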
Metadata used for delta encoder.
Used in:
Range image's shape information in the compressed data.
Range image quantization precision for each range image channel.
A set of ScenarioPredictions protos. A ScenarioPredictions proto for each example in the test or validation set must be included for a valid submission.
This must be set as the full email used to register at waymo.com/open.
This name needs to be short, descriptive and unique. Only the latest result of the method from a user will show up on the leaderboard.
Author information.
A brief description of the method.
Link to paper or other link that describes the method.
The challenge submission type.
Set this to true if your model uses the lidar data provided in the motion dataset. This field is now REQUIRED for a valid submission.
Set this to true if your model uses the camera data provided in the motion dataset. This field is now REQUIRED for a valid submission.
Set this to true if your model used publicly available open-source LLM/VLM(s) for pre-training. This field is now REQUIRED for a valid submission.
If any open-source model was used, specify their names and configuration.
Specify an estimate of the number of parameters of the model used to generate this submission. The number must be specified as an integer number followed by a multiplier suffix (from the set [K, M, B, T, ...], e.g. "200K"). This field is now REQUIRED for a valid submission.
The set of scenario predictions to evaluate. One entry should exist for every record in the test set.
Used in:
A submission for the Waymo open dataset motion prediction challenge.
A submission for the Waymo open dataset interaction prediction challenge.
A configuration for converting Scenario protos to tf.Example protos.
The maximum number of agents to populate in the tf.Example.
The maximum number of modeled agents to populate in the tf.Example. This field should not be changed from 8 for open dataset motion challenge uses.
The number of past steps (including the current step) in each trajectory. The defaults correspond to the open motion dataset data: 11 past steps (including the current step) and 80 future steps. Changing these values will make the data incompatible with the open dataset motion challenges.
The number of future steps in each trajectory.
The maximum number of map points to store in each example. This defines the sizes of the roadgraph_samples/* tensors. Any additional samples in the source Scenario protos will be truncated. Lane centers and lane boundaries are prioritized over other types. This parameter along with the polyline_sample_spacing and polygon_sample_spacing fields will determine if points are truncated. For reference, in the current waymo open motion dataset the vast majority of Scenarios have less than 60,000 samples at 0.5m spacing (not including polygon samples). Only a few outliers exceed this where the largest has approximately 75,000 samples at 0.5m.
The input source polyline sample spacing. Do not change this from the default when using open dataset input data.
The roadgraph points will be re-sampled with this spacing. Note that decreasing this parameter may require an increase in the max_roadgraph_samples parameter to avoid truncating roadgraph data. If this is set to <= 0, the value in source_polyline_spacing will be used.
Features like speed bumps and crosswalks are defined only by polygon corner points. If this value is > 0, samples along the sides of the polygons will be added, spaced apart by this value. If this value is <= 0, only the polygon vertices will be added as sample points. Note that decreasing this parameter may require an increase in the max_roadgraph_samples parameter to avoid truncating roadgraph data.
The maximum number of traffic light points per time step.
A set of metrics broken down by measurement time step and object type.
Used in:
The object type these metrics were filtered by. All metrics below are only for this type of object. If not set, the metrics are aggregated for all types.
The prediction time step used to compute the metrics. The metrics are computed as if this was the last time step in the trajectory.
For each object, the average difference from the ground truth in meters up to the measurement time step is computed for all trajectory predictions for that object. The value with the minimum error is kept (minADE). The resulting values are accumulated over all predicted objects in all scenarios.
For each object, the error for a given trajectory at the measurement time step is computed for all trajectory predictions for that object. The value with the minimum error is kept (minFDE). The mean of all measurements in the accumulator is the average minFDE.
The miss rate is calculated by computing the displacement from ground truth at the measurement time step. If the displacement is greater than the miss rate threshold it is considered a miss. The number of misses for all objects divided by the total number of objects is equal to the miss rate.
Overlaps are detected as any intersection of the bounding boxes of the highest confidence predicted object trajectory with those of any other valid object at the same time step for time steps up to the measurement time step. Only objects that were valid at the prediction time step are considered. If one or more overlaps occur up to the measurement step it is considered a single overlap measurement. The total number of overlaps divided by the total number of objects is equal to the overall overlap rate.
The mAP metric is computed by accumulating true and false positive measurements based on thresholding the FDE at the measurement time step over all object predictions. The measurements are separated into buckets based on the trajectory shape. The mean average precision of each bucket is computed as described in "The PASCAL Visual Object Classes (VOC) Challenge" (Everingham, 2009, p. 11), using the newer method that includes all samples in the computation, consistent with the current PASCAL challenge metrics. The mean of the AP value across all trajectory shape buckets is equal to this mAP value.
Same as mean_average_precision but duplicate true positives per ground truth trajectory are ignored rather than counted as false positives.
Custom metrics (those not already included above) can be stored in the following map, identified by name.
Configuration to compute motion metrics.
The sampling rates for the scenario track data and the prediction data. The track sampling must be an integer multiple of the prediction sampling.
The number of samples for both the history and the future track data. Tracks must be of length track_history_samples + track_future_samples + 1 (one extra for the current time step). Predictions must be length (track_history_samples + track_future_samples) * prediction_steps_per_second / track_steps_per_second (current time is not included in the predictions). IMPORTANT: Note that the first element of the prediction corresponds to time (1.0 / prediction_steps_per_second) NOT time 0.
Parameters for miss rate and mAP threshold scaling as a function of the object initial speed. If the object speed is below speed_lower_bound, the scale factor for the thresholds will equal speed_scale_lower. Above speed_upper_bound, the scale factor will equal speed_scale_upper. In between the two bounds, the scale factor will be interpolated linearly between the lower and upper scale factors. Both the lateral and longitudinal displacement thresholds for miss rate and mAP will be scaled by this factor before the thresholds are applied.
The prediction samples and parameters used to compute metrics at a specific time step. Time in seconds can be computed as (measurement_step + 1) / prediction_steps_per_second. Metrics are computed for each step in the list as if the given measurement_step were the last step in the predicted trajectory.
The maximum number of predictions to use as K in all min over K metrics computations.
Used in:
The prediction step to use to measure all metrics. The metrics are computed as if this were the last step in the predicted trajectory. Time in seconds can be computed as (measurement_step + 1) / prediction_steps_per_second.
The threshold for lateral distance error in meters for miss rate and mAP computations.
The threshold for longitudinal distance error in meters for miss rate and mAP computations.
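A minimal sketch of the measurement-step timing described above, using an illustrative (not necessarily official) prediction rate:

prediction_steps_per_second = 2  # assumed example rate

for measurement_step in (5, 9, 15):
    # Metrics at this step treat it as the last step of the predicted trajectory.
    time_seconds = (measurement_step + 1) / prediction_steps_per_second
    print(f"measurement_step={measurement_step} -> t={time_seconds:.1f}s")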
Used in:
A set of predictions (or joint predictions) with varying confidences - all for the same object or group of objects. All prediction entries must contain trajectories for the same set of objects or an error will be returned. Any predictions past the max number of predictions set in the metrics config will be discarded.
Used in:
This is a wrapper on waymo.open_dataset.Label. We have another proto to add more information such as class confidence for metrics computation.
Used in:
The confidence within [0, 1] of the prediction. Defaults to 1.0 for ground truths.
Whether this object overlaps with any NLZ (no label zone). Users do not need to set this field when evaluating on the eval leaderboard as the leaderboard does this computation.
These must be set when evaluating on the leaderboard. This should be set to Context.name defined in dataset.proto::Context.
This should be set to Frame.timestamp_micros defined in dataset.proto::Frame.
Optionally, if this object is used for camera image labels or predictions, this needs to be populated to uniquely identify which image this object is for.
Used in:
Coordinates of the center of the object bounding box.
The dimensions of the bounding box in meters.
The yaw angle in radians of the forward direction of the bounding box (the vector from the center of the box to the middle of the front box segment) counter clockwise from the X-axis (right hand system about the Z axis). This angle is normalized to [-pi, pi).
The velocity vector in m/s. This vector direction may be slightly different from the heading of the bounding box.
False if the state data is invalid or missing.
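A minimal sketch of normalizing a heading/yaw angle to [-pi, pi), as stated for the heading field above:

import math

def normalize_heading(heading):
    """Wraps an angle in radians to the interval [-pi, pi)."""
    return (heading + math.pi) % (2.0 * math.pi) - math.pi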
Used in:
The ID of the object being predicted. This must match the object_id field in the test or validation set tf.Example or scenario proto corresponding to this prediction. Note this must be the same as the object_id in the scenario track or the state/id field in the tf.Example, not the track index.
The trajectory for the object.
Used in:
Users do not need to set this when evaluating on the leaderboard.
Occupancy and flow metrics averaged over all prediction waypoints. Please refer to occupancy_flow_metrics.py for an implementation of these metrics.
The metrics stored in this proto are averages over all waypoints. However, blank waypoints, which contain no occupancy or flow ground-truth, are excluded when computing the metrics. The following fields record the number of waypoints which are used for computing each of the 3 categories of metrics.
Treating occupancy in each grid cell as an independent binary prediction, this metric measures the area under the precision-recall curve of all grid cells in the future occupancy of currently-observed vehicles.
Measures the soft intersection-over-union between ground-truth bounding boxes and predicted future occupancy grids of currently-observed vehicles.
Same as above, but for currently-occluded vehicles. NOTE: All agents in future timesteps are divided into the two categories (currently-observed and currently-occluded) depending on whether the agent is present (valid) at the current timestep. Agents which are not valid at the current time, but become valid later are considered currently- occluded. The model is expected to predict the two categories separately, and the occupancy metrics are also computed separately for the two categories.
End-point-error between ground-truth and predicted flow fields, averaged over all cells in the grid. Flow end-point-error measures the Euclidean distance between the predicted and ground-truth flow vectors.
Configuration for all parameters defining the occupancy flow task.
The following default values reflect the size of sequences in the Waymo Open Motion Dataset.
Number of predicted waypoints (snapshots over time) for each scene. The waypoints uniformly divide the future timesteps (num_future_steps) into num_waypoints equal intervals.
When cumulative_waypoints is false, ground-truth waypoints are created by sampling individual timesteps from the future timesteps. For example, for num_future_steps = 80 and num_waypoints = 8, ground-truth occupancy is taken from timesteps {10, 20, 30, ..., 80}, and ground-truth flow fields are constructed from the displacements between timesteps {0 -> 10, 10 -> 20, ..., 70 -> 80} where 0 is the current time and 1-80 are the future timesteps. When cumulative_waypoints is true, ground-truth waypoints are created by aggregating occupancy and flow over all the timesteps that fall inside each waypoint. For example, the last waypoint's occupancy is constructed by accumulating occupancy over timesteps [71, 72, ..., 80] and the last waypoint's flow field is constructed by averaging all 10 flow fields between timesteps [61 -> 71, 62 -> 72, ..., 70 -> 80]. The code provided in occupancy_flow_data.py implements the above logic to construct the ground truth (see also the sketch below).
Whether to rotate the scene such that the SDC is heading up in ground-truth grids.
Occupancy grids are organized [grid_height_cells, grid_width_cells, 1]. Flow fields are organized as [grid_height_cells, grid_width_cells, 2].
The ground-truth occupancy and flow for all future waypoints are rendered with reference to the location of the autonomous vehicle at the current time. The autonomous vehicle's current location is mapped to the following coordinates.
Prediction scale. With a value of 3.2, the 256x256 grid covers an 80m x 80m area of the world.
Ground-truth occupancy grids are constructed by sampling the specified number of points along the length and width from the interior of agent boxes and scattering those points on the grid. Similarly, ground-truth flow fields are constructed from the (dx, dy) displacements of such points over time.
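A minimal sketch of the non-cumulative waypoint sampling described above for cumulative_waypoints = false, using the example values from the description (num_future_steps = 80, num_waypoints = 8):

num_future_steps = 80
num_waypoints = 8

step_size = num_future_steps // num_waypoints
# Ground-truth occupancy is taken from future timesteps {10, 20, ..., 80}.
occupancy_steps = [step_size * (i + 1) for i in range(num_waypoints)]
# Flow fields are displacements over {0 -> 10, 10 -> 20, ..., 70 -> 80}.
flow_pairs = [(step - step_size, step) for step in occupancy_steps]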
Non-self-intersecting 2d polygons. This polygon is not necessarily convex.
Used in:
A globally unique ID.
Used in:
A list of predictions for the required objects in the scene. These must exactly match the objects in the tracks_to_predict field of the test scenario or tf.Example.
Range image is a 2d tensor. The first dim (row) represents pitch. The second dim represents yaw. There are two types of range images:
1. Raw range image: a range image with a non-empty 'range_image_pose_compressed' which tells the vehicle pose of each range image cell.
2. Virtual range image: a range image with an empty 'range_image_pose_compressed'. This range image is constructed by transforming all lidar points into a fixed vehicle frame (usually the vehicle frame of the middle scan).
NOTE: 'range_image_pose_compressed' is only populated for the first range image return. The second return has exactly the same range image pose as the first one.
Used in:
Zlib compressed [H, W, 4] serialized version of MatrixFloat. To decompress:
  string val = ZlibDecompress(range_image_compressed);
  MatrixFloat range_image;
  range_image.ParseFromString(val);
Inner dimensions are:
  * channel 0: range
  * channel 1: intensity
  * channel 2: elongation
  * channel 3: is in any no label zone.
Lidar point to camera image projections. A point can be projected to multiple camera images. We pick the first two in the following order: [FRONT, FRONT_LEFT, FRONT_RIGHT, SIDE_LEFT, SIDE_RIGHT]. Zlib compressed [H, W, 6] serialized version of MatrixInt32. To decompress:
  string val = ZlibDecompress(camera_projection_compressed);
  MatrixInt32 camera_projection;
  camera_projection.ParseFromString(val);
Inner dimensions are:
  * channel 0: CameraName.Name of 1st projection. Set to UNKNOWN if no projection.
  * channel 1: x (axis along image width)
  * channel 2: y (axis along image height)
  * channel 3: CameraName.Name of 2nd projection. Set to UNKNOWN if no projection.
  * channel 4: x (axis along image width)
  * channel 5: y (axis along image height)
Note: pixel 0 corresponds to the left edge of the first pixel in the image.
Zlib compressed [H, W, 6] serialized version of MatrixFloat. To decompress:
  string val = ZlibDecompress(range_image_pose_compressed);
  MatrixFloat range_image_pose;
  range_image_pose.ParseFromString(val);
Inner dimensions are [roll, pitch, yaw, x, y, z], representing a transform from vehicle frame to global frame for every range image pixel. This is ONLY populated for the first return. The second return is assumed to have exactly the same range_image_pose_compressed. The roll, pitch and yaw are specified as 3-2-1 Euler angle rotations, meaning that rotating from the navigation to vehicle frame consists of a yaw, then pitch and finally roll rotation about the z, y and x axes respectively. All rotations use the right hand rule and are positive in the counter clockwise direction.
Zlib compressed [H, W, 4] serialized version of MatrixFloat. To decompress:
  string val = ZlibDecompress(range_image_flow_compressed);
  MatrixFloat range_image_flow;
  range_image_flow.ParseFromString(val);
Inner dimensions are [vx, vy, vz, pointwise class]. If the point is not annotated with scene flow information, class is set to -1. A point is not annotated if it is in a no-label zone or if its label bounding box does not have a corresponding match in the previous frame, making it infeasible to estimate the motion of the point. Otherwise, (vx, vy, vz) are the velocity along the (x, y, z)-axes for this point and class is set to one of the following values:
  -1: no-flow-label, the point has no flow information.
   0: unlabeled or "background", i.e., the point is not contained in a bounding box.
   1: vehicle, i.e., the point corresponds to a vehicle label box.
   2: pedestrian, i.e., the point corresponds to a pedestrian label box.
   3: sign, i.e., the point corresponds to a sign label box.
   4: cyclist, i.e., the point corresponds to a cyclist label box.
Zlib compressed [H, W, 2] serialized version of MatrixInt32. To decompress:
  string val = ZlibDecompress(segmentation_label_compressed);
  MatrixInt32 segmentation_label;
  segmentation_label.ParseFromString(val);
Inner dimensions are [instance_id, semantic_class].
NOTE:
1. Only the TOP LiDAR has segmentation labels.
2. Not every frame has segmentation labels. This field is not set if a frame is not labeled.
3. There can be points missing segmentation labels within a labeled frame. Their labels are set to TYPE_NOT_LABELED when that happens.
Deprecated, do not use.
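A hedged sketch (assuming the waymo_open_dataset Python package) of the ZlibDecompress + ParseFromString pattern described above, applied to range_image_compressed:

import zlib

import numpy as np
from waymo_open_dataset import dataset_pb2

def decode_range_image(range_image_compressed):
    """Returns an [H, W, 4] array: range, intensity, elongation, NLZ flag."""
    range_image = dataset_pb2.MatrixFloat()
    range_image.ParseFromString(zlib.decompress(range_image_compressed))
    return np.asarray(range_image.data, dtype=np.float32).reshape(
        list(range_image.shape.dims))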
An object that must be predicted for the scenario.
Used in:
An index into the Scenario `tracks` field for the object to be predicted.
The difficulty level for this object.
A difficulty level for predicting a given track.
Used in:
Used in:
The type of road edge.
The polyline defining the road edge. A polyline is a list of points with segments defined between consecutive points.
Type of this road edge.
Used in:
Physical road boundary that doesn't have traffic on the other side (e.g., a curb or the k-rail on the right side of a freeway).
Physical road boundary that separates the car from other traffic (e.g. a k-rail or an island).
Used in:
The type of the lane boundary.
The polyline defining the road line. A polyline is a list of points with segments defined between consecutive points.
Type of this road line.
Used in:
The unique ID for this scenario.
Timestamps corresponding to the track states for each step in the scenario. The length of this field is equal to tracks[i].states_size() for all tracks i and equal to the length of the dynamic_map_states field.
The index into timestamps_seconds for the current time. All time steps after this index are future data to be predicted. All steps before this index are history data.
Tracks for all objects in the scenario. All object tracks in all scenarios in the dataset have the same number of object states. In this way, the tracks field forms a 2 dimensional grid with objects on one axis and time on the other. Each state can be associated with a timestamp in the 'timestamps_seconds' field by its index. E.g., tracks[i].states[j] indexes the i^th agent's state at time timestamps_seconds[j].
The dynamic map states in the scenario (e.g. traffic signal states). This field has the same length as timestamps_seconds. Each entry in this field can be associated with a timestamp in the 'timestamps_seconds' field by its index. E.g., dynamic_map_states[i] indexes the dynamic map state at time timestamps_seconds[i].
The set of static map features for the scenario.
The index into the tracks field of the autonomous vehicle object.
A list of object IDs in the scene detected to have interactive behavior. The objects in this list form an interactive group. These IDs correspond to IDs in the tracks field above.
A list of tracks to generate predictions for. For the challenges, exactly these objects must be predicted in each scenario for test and validation submissions. This field is populated in the training set only as a suggestion of objects to train on.
Per time step Lidar data. This contains lidar up to the current time step such that compressed_frame_laser_data[i] corresponds to the states at timestamps_seconds[i] where i <= current_time_index. This field is not populated in all versions of the dataset.
Per time step camera tokens. This contains camera tokens up to the current time step such that frame_camera_tokens[i] corresponds to the states at timestamps_seconds[i] where i <= current_time_index. This field is not populated in all versions of the dataset.
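A hedged sketch (assuming the waymo_open_dataset Python package; the file path and the center_x/center_y state field names are illustrative) of indexing tracks against timestamps_seconds and tracks_to_predict as described above:

import tensorflow as tf
from waymo_open_dataset.protos import scenario_pb2

dataset = tf.data.TFRecordDataset('/path/to/scenarios.tfrecord')  # hypothetical path
for raw_record in dataset:
    scenario = scenario_pb2.Scenario.FromString(raw_record.numpy())
    t = scenario.current_time_index  # steps after t are the future to be predicted
    for required in scenario.tracks_to_predict:
        track = scenario.tracks[required.track_index]
        state = track.states[t]  # the state at time timestamps_seconds[t]
        if state.valid:
            print(scenario.scenario_id, track.id, state.center_x, state.center_y)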
A set of predictions used for metrics evaluation.
The unique ID of the scenario being predicted. This ID must match the scenario_id field in the test or validation set tf.Example or scenario proto corresponding to this set of predictions.
The predictions for the scenario. These represent either single object predictions or joint predictions for a group of objects.
Used in:
String ID of the original scenario proto used as initial conditions.
Collection of multiple `JointScene`s simulated from the same initial conditions (corresponding to the original Scenario proto). This needs to include exactly 32 parallel simulations.
A message containing a prediction for either a single object or a joint prediction for a set of objects.
Used in:
The trajectories for the objects in the scenario being predicted. For the interactive challenge, this must contain exactly 2 trajectories for the pair of objects listed in the tracks_to_predict field of the Scenario or tf.Example proto.
An optional confidence measure for this joint prediction. These confidence scores should reflect confidence in the existence of the trajectory across scenes, not normalized within a scene or per-agent.
Used in:
The object predicted trajectory.
An optional confidence measure for this joint prediction. These confidence scores should reflect confidence in the existence of the trajectory across scenes, not normalized within a scene or per-agent.
(message has no fields)
Used in:
Other small vehicles (e.g. pedicab) and large vehicles (e.g. construction vehicles, RV, limo, tram).
Lamp post, traffic sign pole etc.
Construction cone/pole.
Bushes, tree branches, tall grasses, flowers etc.
Curb on the edge of roads. This does not include road boundaries if there’s no curb.
Surface a vehicle could drive on. This includes the driveway connecting a parking lot and the road over a section of sidewalk.
Marking on the road that’s specifically for defining lanes such as single/double white/yellow lines.
Marking on the road other than lane markers, bumps, cateyes, railtracks etc.
Most horizontal surfaces that are not drivable, e.g. grassy hill, pedestrian walkway stairs, etc.
Nicely paved walkable surface where pedestrians are most likely to walk.
Used in:
Segmentation labels by lasers.
These must be set when evaluating on the leaderboard. This should be set to Context.name defined in dataset.proto::Context.
This should be set to Frame.timestamp_micros defined in dataset.proto::Frame.
Used in:
Used in:
Stats for each class. The length of each field should be equal to the number of classes. The number of points with matching prediction and ground truth for this class.
The total number of points for this class in both prediction and groundtruth.
Per class IOU (Intersection Over Union). Keyed by class index.
The list of segmentation_types to eval.
If your inference results are too large to fit in one proto, you can shard them to multiple files by sharding the inference_results field. Next ID: 11.
This must be set as the full email used to register at waymo.com/open.
This name needs to be short, descriptive and unique. Only the latest result of the method from a user will show up on the leaderboard.
Link to paper or other link that describes the method.
Number of frames used.
Inference results.
Used in:
Aggregation (at the dataset-level or scenario-level) of the lower-level features into proper metrics.
If these metrics are at the scenario level, specify the ID of the scenario they relate to. If not specified, this represents the aggregation of the per-scenario metrics at the dataset level.
The meta-metric, i.e. the weighted aggregation of all the lower-level features. This score is used to rank the submissions for the Sim Agents challenge.
Average displacement error (average or minimum over simulations).
Dynamic features, i.e. speeds and accelerations.
Interactive features.
Map-based features: distance to road edge, offroad indication.
Fraction of simulated objects that collide for at least one step with any other simulated object.
Fraction of simulated objects that drive offroad for at least one step.
Fraction of simulated objects that violate a traffic light for at least one step.
Configuration for the Sim Agents metrics.
Dynamics features.
Interactive features.
Map-based features.
The Bernoulli estimator is used for boolean features, e.g. collision.
Used in:
Additive smoothing to apply to the underlying 2-bins histogram, to avoid infinite values for empty bins.
Each of the features used to evaluate sim agents has one of the following configs.
Used in:
To estimate the likelihood of the logged features under the simulated distribution of features, an approximator of such a distribution is needed. For continuous values we support histogram-based and kernel-density-based estimators.
Based on this flag, the distribution of simulated features will be aggregated over time to approximate one single (per-scenario, per-object) distribution instead of `N_STEP` per-step distributions. Example: When using `independent_timesteps=False` for speed, each logged step will be evaluated under the speed distribution of the 32 parallel simulations at that specific step. When `independent_timesteps=True`, each logged step will be evaluated against the same distribution over all the steps (32 * 80 total samples).
For each of the features, we extract a likelihood score in the range [0,1]. The meta-metric (i.e. how all the submissions are finally scored and ranked) is just a weighted average of these scores.
Based on this flag, the distribution of simulated features will be aggregated over all the objects at every single time step. Examples: (1) The SIM_AGENTS challenge uses `aggregate_objects=False` for all histogram-based features; e.g. for speed, each logged step of each single agent is evaluated under the speed distribution of the 32 parallel simulations at that specific step. (2) The SCENARIO_GEN challenge uses `aggregate_objects=True` for all histogram-based features; here, for each logged step, all objects are evaluated against the same distribution over all the objects of the 32 parallel simulated scenarios (parallel_simulations * num_valid_objects samples).
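To make the two aggregation flags concrete, here is a shapes-only numpy sketch (not the official implementation) of how simulated feature samples could be pooled under each setting, assuming 32 parallel rollouts, 4 objects and 80 simulated steps.

```python
import numpy as np

# Simulated feature values (e.g. linear speed): [n_rollouts=32, n_objects=4, n_steps=80].
sim = np.random.rand(32, 4, 80)

# independent_timesteps=False, aggregate_objects=False:
# one distribution per (object, step), built from the 32 rollouts.
per_object_step = sim.transpose(1, 2, 0)             # [4, 80, 32]

# independent_timesteps=True: pool all steps together,
# one distribution per object with 32 * 80 samples.
per_object = sim.transpose(1, 0, 2).reshape(4, -1)   # [4, 2560]

# aggregate_objects=True: pool all objects together at each step,
# one distribution per step with 32 * 4 samples.
per_step = sim.transpose(2, 0, 1).reshape(80, -1)    # [80, 128]
```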
Configuration for the histogram-based likelihood estimation.
Used in:
Extremes on which the histogram is defined. The default configuration provided for the challenge has these values carefully set based on ground truth data. Any user submission exceeding these thresholds will be clipped, resulting in a lower score for the submission.
Number of bins for the histogram to be discretized into.
Additive smoothing to apply to the histogram, to avoid infinite values when one or more bins are empty.
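A minimal sketch of a histogram-based likelihood estimate using the three fields above (clipping to [min, max], discretization into num_bins, additive smoothing); the official metric code may differ in details such as bin-width normalization.

```python
import numpy as np

def histogram_log_likelihood(sim_samples, logged_value, val_min, val_max,
                             num_bins, additive_smoothing):
    # Clip simulated samples to the configured support, as described above.
    edges = np.linspace(val_min, val_max, num_bins + 1)
    counts, _ = np.histogram(np.clip(sim_samples, val_min, val_max), bins=edges)
    smoothed = counts + additive_smoothing        # avoids log(0) for empty bins
    probs = smoothed / smoothed.sum()
    # Locate the (clipped) logged value and return its log-probability.
    idx = np.digitize(np.clip(logged_value, val_min, val_max), edges) - 1
    idx = np.clip(idx, 0, num_bins - 1)
    return np.log(probs[idx])
```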
Used in:
Bandwidth for the Kernel Density estimation. For more details, check the sklearn documentation: https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KernelDensity.html This field must be set to a strictly positive value; otherwise an error is raised at runtime.
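For reference, a kernel-density estimate with a fixed bandwidth can be computed with scikit-learn as below; this only illustrates the estimator family the config refers to, not the challenge's exact evaluation code.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

sim_samples = np.random.rand(32 * 80, 1)   # pooled simulated feature values, shape [N, 1]
logged_value = np.array([[0.4]])           # feature observed in the logged scenario

# Bandwidth comes from the config field described above; it must be > 0.
kde = KernelDensity(kernel="gaussian", bandwidth=0.2).fit(sim_samples)
log_likelihood = kde.score_samples(logged_value)  # natural-log density at the logged value
```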
Bucketed version of the sim agent metrics. This aggregated message is used in the challenge leaderboard to provide an easy-to-read but still informative metric output format. All the bucketed metrics are rescaled to the range [0, 1], while still respecting the meta-metric weights defined in the metrics config.
Realism meta-metric.
Kinematic metrics: a linear combination of the kinematic-related likelihoods, namely `linear_speed`, `linear_acceleration`, `angular_speed` and `angular_acceleration`.
Interactive metrics: a linear combination of the object-interaction likelihoods, namely `distance_to_nearest_object`, `collision_indication` and `time_to_collision`.
Map-based metrics: a linear combination of the map-related likelihoods, namely `distance_to_road_edge` and `offroad_indication`.
MinADE.
Fraction of simulated objects that collide for at least one step with any other simulated object.
Fraction of simulated objects that drive offroad for at least one step.
Fraction of simulated objects that violate a traffic light for at least one step.
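As a sketch of the weighted aggregation behind the meta-metric and the bucketed metrics above (the feature names come from the comments above; the weights here are made up, the real ones live in the metrics config):

```python
# Hypothetical per-feature likelihood scores in [0, 1] and made-up weights.
scores = {"linear_speed": 0.42, "collision_indication": 0.91, "distance_to_road_edge": 0.77}
weights = {"linear_speed": 1.0, "collision_indication": 2.0, "distance_to_road_edge": 1.0}

# Weighted average of the per-feature scores, as described for the meta-metric.
metametric = sum(weights[k] * scores[k] for k in scores) / sum(weights.values())
```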
Message packaging a full submission to the challenge.
The set of scenario rollouts to evaluate. One entry should exist for every record in the test set.
Identifier of the submission type. Has to be set for the submission to be valid.
This must be set as the full email used to register at waymo.com/open.
This name needs to be short, descriptive and unique. Only the latest result of the method from a user will show up on the leaderboard.
Author information.
A brief description of the method.
Link to paper or other link that describes the method.
Set this to true if your model uses the lidar data provided in the motion dataset. This field is now REQUIRED for a valid submission.
Set this to true if your model uses the camera data provided in the motion dataset. This field is now REQUIRED for a valid submission.
Set this to true if your model used publicly available open-source LLM/VLM(s) for pre-training. This field is now REQUIRED for a valid submission.
If any open-source model was used, specify their names and configuration.
Specify an estimate of the number of parameters of the model used to generate this submission. The number must be specified as an integer number followed by a multiplier suffix (from the set [K, M, B, T, ...], e.g. "200K"). This field is now REQUIRED for a valid submission.
Several submissions for the 2023 challenge did not comply with the closed-loop, 10 Hz requirement specified both on the website https://waymo.com/open/challenges/2024/sim-agents/ and in the NeurIPS paper https://arxiv.org/abs/2305.12032, Section 3, "Task constraints". Please make sure your method complies with these rules before submitting, to ensure the leaderboard stays fair.
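The parameter-count string described two fields above can be produced with a tiny helper like the following; the suffix handling is an assumption based on that description, not the server-side validation code.

```python
_SUFFIXES = {"K": 1e3, "M": 1e6, "B": 1e9, "T": 1e12}

def format_num_params(n: int) -> str:
    # 213_000_000 -> "213M"; values below 1000 are returned as plain integers.
    for suffix, scale in sorted(_SUFFIXES.items(), key=lambda kv: kv[1], reverse=True):
        if n >= scale:
            return f"{int(round(n / scale))}{suffix}"
    return str(n)
```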
The challenge submission type.
Used in:
A submission for the Waymo open dataset Sim Agents challenge.
Used in:
The simulated trajectory for a single object, including position and heading. The (x, y, z) coordinates identify the centroid of the modeled object, defined in the same coordinate frame as the original input scenario. Heading is defined in radians, counterclockwise from East. See https://waymo.com/open/data/motion/ for more info. The length of these fields must be exactly 80, encoding the 8 seconds of future simulation at the same frequency of the Scenario proto (10Hz). These objects will only be considered if they are valid at the `current_time_index` step (which is hardcoded to 10, with 0-indexing). These objects will be assumed to be valid for the whole duration of the simulation after `current_time_index`, maintaining the latest box sizes (width, length and height) observed in the original scenario at the `current_time_index`.
Optional fields. These fields represent the dimensions of the object's bounding box over time (one value per step), with the same conventions as above. If these are not required by the challenge, fixed box dimensions are assumed. Please refer to the challenge specification to check whether these fields are used.
Specifies an object field, when required by the challenge, otherwise ignored. Please refer to challenge specification to check if these fields are used.
ID of the object.
Optional field, representing the type of the object. If this is not required by the challenge, this field is ignored. Please refer to challenge specification to check if these fields are used.
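A minimal sketch of filling one simulated trajectory as described above (80 steps at 10 Hz, heading in radians counterclockwise from East). The module path and field names are assumptions inferred from these comments; verify them against the sim agents submission proto.

```python
# Sketch only -- module and field names are assumptions, not verified API.
import numpy as np
from waymo_open_dataset.protos import sim_agents_submission_pb2

xs = np.zeros(80)        # 8 s of simulated centroids at 10 Hz
ys = np.zeros(80)
zs = np.zeros(80)
headings = np.zeros(80)  # radians, counterclockwise from East

trajectory = sim_agents_submission_pb2.SimulatedTrajectory(
    center_x=xs, center_y=ys, center_z=zs, heading=headings, object_id=123)
assert len(trajectory.center_x) == 80
```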
Used in:
The ID of the object being predicted. This must match the object_id field in the test or validation set tf.Example or scenario proto corresponding to this prediction. Note this must be the same as the object_id in the scenario track or the state/id field in the tf.Example, not the track index.
A set of up to 6 trajectory predictions for this object with varying confidences. Any predictions past the first six will be discarded.
Used in:
The ID of the object being predicted. This must match the object_id field in the test or validation set tf.Example or scenario proto corresponding to this prediction.
The predicted trajectory positions.
Used in:
The polygon defining the outline of the speed bump. The polygon is assumed to be closed (i.e. a segment exists between the last point and the first point).
Used in:
The IDs of lane features controlled by this stop sign.
The position of the stop sign.
If your inference results are too large to fit in one proto, you can shard them into multiple files by sharding the inference_results field. Next ID: 17.
This specifies which task this submission is for.
This must be set as the full email used to register at waymo.com/open.
This name needs to be short, descriptive and unique. Only the latest result of the method from a user will show up on the leaderboard.
Link to paper or other link that describes the method.
Link to the latency submission Docker image stored in Google Storage bucket or pushed to Google Container/Artifact Registry. Google Storage bucket example: gs://example_bucket_name/example_folder/example_docker_image.tar.gz Google Container/Artifact Registry example: us-west1-docker.pkg.dev/example-registry-name/example-folder/example-image@sha256:example-sha256-hash Follow latency/README.md to create a docker file.
Number of frames used.
Inference results.
Object types this submission contains. By default, we assume all types.
Self-reported end-to-end inference latency in seconds. This is NOT shown on the leaderboard for now, but it is still recommended to set it. Do not confuse this with the `docker_image_source` field above, which is needed to evaluate your model's latency on our server.
Used in:
These values correspond to the tasks on the waymo.com/open site.
Used in:
The object states for a single object through the scenario.
Used in:
The unique ID of the object being tracked. The IDs start from zero and are non-negative.
The type of object being tracked.
The object states through the track. States include the 3D bounding boxes and velocities.
Used in:
This is an invalid state that indicates an error.
Used in:
The number of misses (false negatives).
The number of false positives.
The number of mismatches.
The sum of matching costs for all matched objects.
Total number of matched objects.
Total number of ground truth objects (i.e. labeled objects).
The score cutoff used to compute this measurement.
If set, will include the ids of the fp/tp/fn objects. Each element corresponds to one frame of matching.
Used in:
False positive prediction ids.
False negative ground truth ids.
True positive ground truth ids. Should be the same length as tp_pr_ids and tp_ious. Each pair of ids at the same index corresponds to a matched ground truth object and prediction object.
True positive prediction ids.
Used in:
The breakdown these measurements are computed for.
Multiple object tracking accuracy (sum of the miss, mismatch and fp ratios).
Multiple object tracking precision (matching_cost / num_matches).
Miss ratio (num_misses / num_objects_gt).
Mismatch ratio (num_mismatches / num_objects_gt).
False positive ratio (num_fps / num_objects_gt).
Total number of ground truth objects (i.e. labeled objects).
The breakdown these metrics are computed for.
Raw measurements.
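A minimal sketch of how the ratios above derive from one raw measurement, using the field names quoted in these comments (num_misses, num_fps, num_mismatches, matching_cost, num_matches, num_objects_gt) and reading the MOTA definition as the sum of the three ratios:

```python
def tracking_metrics(m):
    # m is a single measurement with the count fields described above.
    gt = max(m.num_objects_gt, 1)
    miss_ratio = m.num_misses / gt
    mismatch_ratio = m.num_mismatches / gt
    fp_ratio = m.num_fps / gt
    motp = m.matching_cost / max(m.num_matches, 1)
    mota = miss_ratio + mismatch_ratio + fp_ratio
    return {"mota": mota, "motp": motp, "miss": miss_ratio,
            "mismatch": mismatch_ratio, "fp": fp_ratio}
```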
Used in:
The ID for the MapFeature corresponding to the lane controlled by this traffic signal state.
The state of the traffic signal.
The stopping point along the lane controlled by the traffic signal. This is the point where dynamic objects must stop when the signal is in a stop state.
Used in:
States for traffic signals with arrows.
Standard round traffic signals.
Flashing light signals.
Used in:
The predicted trajectory positions. For the Waymo prediction challenges, these fields must have exactly 16 entries: 8 seconds of future sampled at 2 steps per second, starting at timestamp 1.5 seconds (step 15) in the scenario. IMPORTANT: For the challenges, the first entry in each of these fields must correspond to time step 15 in the scenario, NOT step 10 or 11 (i.e. the entries in these fields must correspond to steps 15, 20, 25, ..., 85, 90 in the scenario).
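The required sampling can be written down directly: at the 10 Hz scenario resolution, the 2 Hz prediction steps are 15, 20, ..., 90, i.e. 16 entries over 8 seconds.

```python
# Scenario steps (10 Hz) that the 2 Hz challenge trajectories must correspond to.
prediction_steps = list(range(15, 91, 5))
assert len(prediction_steps) == 16
assert prediction_steps[0] == 15 and prediction_steps[-1] == 90
```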
Used in:
Position in meters. Right-handed coordinate system. +x = forward, +y = left, +z = up. The ego-vehicle is located at (0, 0, 0) at t=0. The prediction length should be 5s at 4Hz, containing 20 waypoints. The first waypoint should be at t+0.25s and the last waypoint should be at t+5s. Only x,y coordinates are included. The z coordinate is not used.
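The expected waypoint timestamps follow directly from the 4 Hz / 5 s specification:

```python
# 20 (x, y) waypoints at 4 Hz: the first at t + 0.25 s, the last at t + 5.0 s.
waypoint_times = [0.25 * (i + 1) for i in range(20)]
assert len(waypoint_times) == 20
assert waypoint_times[0] == 0.25 and waypoint_times[-1] == 5.0
```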
4x4 row-major transform matrix that transforms 3D points from one frame to another.
Used in:
Used in:
Used in:
Used in:
Velocity in m/s.
Angular velocity in rad/s.